High Availability Configuration

Overview

Morpheus provides a wide array of options when it comes to deployment architectures. It can start as a simple single-machine instance where all services run on the same machine, or the services can be split out onto individual machines and configured for high availability, either within the same region or across regions. High availability can naturally grow more complicated depending on the desired configuration, and this article covers the basic concepts of the Morpheus HA architecture, which can be applied to a wide array of configurations.

There are four primary tiers of services represented within the Morpheus appliance: the App Tier, Transactional Database Tier, Non-Transactional Database Tier, and Message Tier. Each of these tiers has its own recommendations for high availability deployments, covered below.

[Image: Morpheus HA architecture diagram (morpheusHA.png)]

Important

This is a sample configuration only. Customer configurations and requirements will vary.

Transactional Database Tier

The Transactional database tier usually consists of a MySQL-compatible database. A lockable, clustered configuration is recommended; currently, Percona XtraDB Cluster running in permissive mode is the most recommended. There are several documents online covering configuration and setup of an XtraDB Cluster, but at its simplest it can be laid out in a multi-master configuration. Some nodes can be set up with replication delay while others have none. It is common practice to run with no replication delay within the same region and allow some replication delay cross-region. This does increase the risk of job run overlap between the two regions; however, the concurrent operations typically self-correct and this is a non-issue.

Non-Transactional Database Tier

The Non-Transactional tier consists of an Elasticsearch (version 5.6.10) cluster. Elasticsearch is used for log aggregation data and temporal aggregation data (essentially stats, metrics, and logs), which enables high write throughput at scale. Elasticsearch is a clustered database, meaning all nodes, no matter the region, need to be connected to each other over what it calls the "Transport" protocol. It is fairly simple to set up, as all nodes are identical. It is also a Java-based system and requires a sizable chunk of memory for larger data sets; 8 GB is recommended, and more nodes can be added to scale either horizontally or vertically.

Messaging Tier

The Messaging tier is an AMQP-based tier that also uses the STOMP protocol (for agent communication). The primary recommended model is RabbitMQ for queue services. RabbitMQ is a cluster-based queuing system and needs at least three instances for an HA configuration; this is due to the elections RabbitMQ runs in failover scenarios. If running a cross-region HA RabbitMQ cluster, it is recommended to have at least 3 rabbit queue clusters per region. Typically, to handle HA, a RabbitMQ cluster should be placed between a load balancer and the front-end application server to handle cross-host connections. The ports that must be forwarded in a RabbitMQ cluster are 5672 (AMQP) and 61613 (STOMP). A RabbitMQ cluster can run on machines with less memory, depending on how frequently large request bursts occur; 4 to 8 GB of memory is recommended to start.
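
As a minimal sketch, an HAProxy TCP frontend could forward both ports to the cluster (HAProxy and the node IPs here are illustrative assumptions; any TCP load balancer works):

# /etc/haproxy/haproxy.cfg fragment -- hypothetical RabbitMQ node IPs
listen rabbitmq_amqp
    bind *:5672
    mode tcp
    balance roundrobin
    server rabbit-1 10.30.20.101:5672 check
    server rabbit-2 10.30.20.102:5672 check
    server rabbit-3 10.30.20.103:5672 check

listen rabbitmq_stomp
    bind *:61613
    mode tcp
    balance roundrobin
    server rabbit-1 10.30.20.101:61613 check
    server rabbit-2 10.30.20.102:61613 check
    server rabbit-3 10.30.20.103:61613 check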

Application Tier

The application tier is easily installed with the same Debian or yum repository package that Morpheus is normally distributed with. Advanced configuration allows the additional tiers to be skipped, leaving only the "stateless" services that need to run. These stateless services include Nginx, Tomcat, and Redis (to be phased out at a later date). These machines should also have at least 8 GB of memory. They can be configured across all regions and placed behind a central or Geo-based load balancer. They typically connect to all other tiers, as none of the other tiers talk to each other except through the central application tier. One final piece when setting up the application tier: shared storage is necessary for maintaining things like deployment archives, virtual image catalogs, and backups. These can be externalized to an object storage service such as Amazon S3 or OpenStack Swift. If not using those options, a simple NFS cluster can be used to handle the shared storage.

[Image: Morpheus HA multi-tier configuration diagram (morpheus-ha-multi-configuration.png)]

Database Tier

Installation and configuration of Percona XtraDB Cluster on CentOS/RHEL 7

Important

This is a sample configuration only. Customer configurations and requirements will vary.

Requirements

Percona requires the following ports for the cluster nodes. Please create the appropriate firewall rules on your Percona nodes; a firewalld sketch follows the list below.

  • 3306 (MySQL client connections)
  • 4444 (State Snapshot Transfer, SST)
  • 4567 (Galera group replication)
  • 4568 (Incremental State Transfer, IST)
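
If the nodes run firewalld, the rules can be added with a short sketch like the following (TCP only; adjust zones and sources to your environment):

sudo firewall-cmd --permanent --add-port={3306,4444,4567,4568}/tcp
sudo firewall-cmd --reload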

Percona also recommends setting the SELinux policy to permissive. You can temporarily set enforcement to permissive by running

sudo setenforce 0

For the change to take effect permanently, edit the SELinux configuration file, /etc/selinux/config, and set SELINUX=permissive.
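
For example, assuming the stock /etc/selinux/config format, the change can be made in place with:

sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config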

Add Percona Repo

  1. Add the Percona repo to your Linux distro.
sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm

Note

For the most up to date repo please visit this link https://www.percona.com/doc/percona-repo-config/yum-repo.html

  2. Check the repo by running the below command.

    sudo yum list | grep percona
    
  3. The below commands will clean the repos and update the server.

    sudo yum clean all
    sudo yum update -y
    

Installing Percona XtraDB Cluster

  1. The below command will install the Percona XtraDB Cluster software and its dependencies.

    sudo yum install Percona-XtraDB-Cluster-57
    

    Note

    During the installation you will receive the below message. Accept the Percona GPG key to install the software.

    retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Percona
    Importing GPG key 0xCD2EFD2A:
    Userid     : "Percona MySQL Development Team <mysql-dev@percona.com>"
    Fingerprint: 430b df5c 56e7 c94e 848e e60c 1c4c bdcd cd2e fd2a
    Package    : percona-release-0.1-4.noarch (installed)
    From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-Percona
    Is this ok [y/N]: y
    
  2. Next, enable the mysql service so that it starts at boot.

    sudo systemctl enable mysql
    
  3. Next, start mysql.

    sudo systemctl start mysql
    
  4. Next we will log in to the MySQL server and set a new password. To get the temporary root MySQL password, run the below command. The command will print the password to the screen; copy the password.

    sudo grep 'temporary password' /var/log/mysqld.log
    
  5. Log in to mysql

    mysql -u root -p
    password: `enter password copied above`
    
  6. Change the root user's password for the MySQL database.

    ALTER USER 'root'@'localhost' IDENTIFIED BY '$root_db_user_pw';
    
  7. Create the sstuser user and grant the permissions.

    mysql> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY '$sstuser_db_user_pw';
    

    Note

    The sstuser and password will be used in the /etc/my.cnf configuration.

    mysql> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
    
    mysql> FLUSH PRIVILEGES;
    
  8. Exit mysql then stop the mysql services:

    mysql> exit
    Bye
    $ sudo systemctl stop mysql.service
    
  9. Now install the Percona software onto the other nodes using the same steps.

Once the service is stopped on all nodes, move on to the next step.

Add the [mysqld] configuration to /etc/my.cnf

  1. Copy the below contents to /etc/my.cnf. The node name and node address need to be unique on each of the nodes. The first node does not require the gcomm value to be set.

    $ sudo vi /etc/my.cnf
    
    [mysqld]
    wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
    
    wsrep_cluster_name=$dbclustername
    wsrep_cluster_address=gcomm://  # Leave blank on the first (bootstrap) node. The other nodes require this field: list the IP address of the primary node first, then the remaining nodes, separated by commas, e.g. gcomm://10.30.20.196,10.30.20.197,10.30.20.198
    
    wsrep_node_name=$nodename
    wsrep_node_address=$nodeip
    
    wsrep_sst_method=xtrabackup-v2
    wsrep_sst_auth=sstuser:$sstuser_db_user_pw
    pxc_strict_mode=PERMISSIVE
    
    binlog_format=ROW
    default_storage_engine=InnoDB
    innodb_autoinc_lock_mode=2
    
  2. Save /etc/my.cnf. A filled-in example for a joiner node is shown below.
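
For reference, a filled-in /etc/my.cnf for a hypothetical second (joiner) node, using the sample IPs above (the cluster name, node name, and password are placeholders):

    [mysqld]
    wsrep_provider=/usr/lib64/galera3/libgalera_smm.so

    wsrep_cluster_name=morpheusdb-cluster
    wsrep_cluster_address=gcomm://10.30.20.196,10.30.20.197,10.30.20.198

    wsrep_node_name=db-node-2
    wsrep_node_address=10.30.20.197

    wsrep_sst_method=xtrabackup-v2
    wsrep_sst_auth=sstuser:$sstuser_db_user_pw
    pxc_strict_mode=PERMISSIVE

    binlog_format=ROW
    default_storage_engine=InnoDB
    innodb_autoinc_lock_mode=2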

Bootstrapping the first Node in the cluster

Important

Ensure mysql.service is stopped prior to bootstrap.

  1. To bootstrap the first node in the cluster run the below command.

    sudo systemctl start mysql@bootstrap.service
    

    Note

    The mysql service will start during the bootstrap.

  2. To verify the bootstrap, log in to mysql on the master node and run show status like 'wsrep%';

    # mysql -u root -p
    
       mysql>  show status like 'wsrep%';
       +----------------------------------+--------------------------------------+
       | Variable_name                    | Value                                |
       +----------------------------------+--------------------------------------+
       | wsrep_local_state_uuid           | 591179cb-a98e-11e7-b9aa-07df8a228fe9 |
       | wsrep_protocol_version           | 7                                    |
       | wsrep_last_committed             | 1                                    |
       | wsrep_replicated                 | 0                                    |
       | wsrep_replicated_bytes           | 0                                    |
       | wsrep_repl_keys                  | 0                                    |
       | wsrep_repl_keys_bytes            | 0                                    |
       | wsrep_repl_data_bytes            | 0                                    |
       | wsrep_repl_other_bytes           | 0                                    |
       | wsrep_received                   | 2                                    |
       | wsrep_received_bytes             | 141                                  |
       | wsrep_local_commits              | 0                                    |
       | wsrep_local_cert_failures        | 0                                    |
       | wsrep_local_replays              | 0                                    |
       | wsrep_local_send_queue           | 0                                    |
       | wsrep_local_send_queue_max       | 1                                    |
       | wsrep_local_send_queue_min       | 0                                    |
       | wsrep_local_send_queue_avg       | 0.000000                             |
       | wsrep_local_recv_queue           | 0                                    |
       | wsrep_local_recv_queue_max       | 2                                    |
       | wsrep_local_recv_queue_min       | 0                                    |
       | wsrep_local_recv_queue_avg       | 0.500000                             |
       | wsrep_local_cached_downto        | 0                                    |
       | wsrep_flow_control_paused_ns     | 0                                    |
       | wsrep_flow_control_paused        | 0.000000                             |
       | wsrep_flow_control_sent          | 0                                    |
       | wsrep_flow_control_recv          | 0                                    |
       | wsrep_flow_control_interval      | [ 100, 100 ]                         |
       | wsrep_flow_control_interval_low  | 100                                  |
       | wsrep_flow_control_interval_high | 100                                  |
       | wsrep_flow_control_status        | OFF                                  |
       | wsrep_cert_deps_distance         | 0.000000                             |
       | wsrep_apply_oooe                 | 0.000000                             |
       | wsrep_apply_oool                 | 0.000000                             |
       | wsrep_apply_window               | 0.000000                             |
       | wsrep_commit_oooe                | 0.000000                             |
       | wsrep_commit_oool                | 0.000000                             |
       | wsrep_commit_window              | 0.000000                             |
       | wsrep_local_state                | 4                                    |
       | wsrep_local_state_comment        | Synced                               |
       | wsrep_cert_index_size            | 0                                    |
       | wsrep_cert_bucket_count          | 22                                   |
       | wsrep_gcache_pool_size           | 1320                                 |
       | wsrep_causal_reads               | 0                                    |
       | wsrep_cert_interval              | 0.000000                             |
       | wsrep_ist_receive_status         |                                      |
       | wsrep_ist_receive_seqno_start    | 0                                    |
       | wsrep_ist_receive_seqno_current  | 0                                    |
       | wsrep_ist_receive_seqno_end      | 0                                    |
       | wsrep_incoming_addresses         | 10.30.20.196:3306                    |
       | wsrep_desync_count               | 0                                    |
       | wsrep_evs_delayed                |                                      |
       | wsrep_evs_evict_list             |                                      |
       | wsrep_evs_repl_latency           | 0/0/0/0/0                            |
       | wsrep_evs_state                  | OPERATIONAL                          |
       | wsrep_gcomm_uuid                 | 07c8c8fe-a998-11e7-883e-06949cfe5af3 |
       | wsrep_cluster_conf_id            | 1                                    |
       | wsrep_cluster_size               | 1                                    |
       | wsrep_cluster_state_uuid         | 591179cb-a98e-11e7-b9aa-07df8a228fe9 |
       | wsrep_cluster_status             | Primary                              |
       | wsrep_connected                  | ON                                   |
       | wsrep_local_bf_aborts            | 0                                    |
       | wsrep_local_index                | 0                                    |
       | wsrep_provider_name              | Galera                               |
       | wsrep_provider_vendor            | Codership Oy <info@codership.com>    |
       | wsrep_provider_version           | 3.22(r8678538)                       |
       | wsrep_ready                      | ON                                   |
       +----------------------------------+--------------------------------------+
        67 rows in set (0.01 sec)
    

    The output should show wsrep_cluster_size of 1 (only the bootstrap node so far), wsrep_local_state_comment of Synced, and wsrep_ready of ON.
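
    For a quick non-interactive spot check of those values (prompts for the root password):

    mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_local_state_comment','wsrep_ready');"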

  3. Next, create the database you will be using with Morpheus.

    mysql> CREATE DATABASE morpheusdb;
    
    mysql> show databases;
    
  4. Next, create your Morpheus database user. The user needs to be created either for the IP address of the Morpheus application server or with @'%' in the user name to allow the user to log in from anywhere.

    mysql> CREATE USER '$morpheus_db_user_name'@'$source_ip' IDENTIFIED BY '$morpheus_db_user_pw';
    
  5. Next, grant your new Morpheus user permissions on the database.

    mysql> GRANT ALL PRIVILEGES ON *.* TO '$morpheus_db_user_name'@'$source_ip' IDENTIFIED BY '$morpheus_db_user_pw' with grant option;
    
    mysql> FLUSH PRIVILEGES;
    
  6. Check the permissions for your user.

    SHOW GRANTS FOR '$morpheus_db_user_name'@'$source_ip';
    

Bootstrap the Remaining Nodes

  1. To join the remaining nodes to the cluster, run the following command on each node:

    sudo systemctl start mysql.service
    

    The services will automatically connect to the cluster using the sstuser we created earlier.

    Note

    Bootstrap failures are commonly caused by misconfigured /etc/my.cnf files.
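
    If a node fails to join, the MySQL error log usually shows why (for example, a bad gcomm:// address or wrong sstuser credentials):

    sudo tail -50 /var/log/mysqld.log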

Verification

  1. To verify the cluster, log in to mysql on the master and run show status like 'wsrep%';

    $ mysql -u root -p
    
     mysql>  show status like 'wsrep%';
    
    +----------------------------------+-------------------------------------------------------+
     | Variable_name                    | Value                                                 |
     +----------------------------------+-------------------------------------------------------+
     | wsrep_local_state_uuid           | 591179cb-a98e-11e7-b9aa-07df8a228fe9                  |
     | wsrep_protocol_version           | 7                                                     |
     | wsrep_last_committed             | 4                                                     |
     | wsrep_replicated                 | 3                                                     |
     | wsrep_replicated_bytes           | 711                                                   |
     | wsrep_repl_keys                  | 3                                                     |
     | wsrep_repl_keys_bytes            | 93                                                    |
     | wsrep_repl_data_bytes            | 426                                                   |
     | wsrep_repl_other_bytes           | 0                                                     |
     | wsrep_received                   | 10                                                    |
     | wsrep_received_bytes             | 774                                                   |
     | wsrep_local_commits              | 0                                                     |
     | wsrep_local_cert_failures        | 0                                                     |
     | wsrep_local_replays              | 0                                                     |
     | wsrep_local_send_queue           | 0                                                     |
     | wsrep_local_send_queue_max       | 1                                                     |
     | wsrep_local_send_queue_min       | 0                                                     |
     | wsrep_local_send_queue_avg       | 0.000000                                              |
     | wsrep_local_recv_queue           | 0                                                     |
     | wsrep_local_recv_queue_max       | 2                                                     |
     | wsrep_local_recv_queue_min       | 0                                                     |
     | wsrep_local_recv_queue_avg       | 0.100000                                              |
     | wsrep_local_cached_downto        | 2                                                     |
     | wsrep_flow_control_paused_ns     | 0                                                     |
     | wsrep_flow_control_paused        | 0.000000                                              |
     | wsrep_flow_control_sent          | 0                                                     |
     | wsrep_flow_control_recv          | 0                                                     |
     | wsrep_flow_control_interval      | [ 173, 173 ]                                          |
     | wsrep_flow_control_interval_low  | 173                                                   |
     | wsrep_flow_control_interval_high | 173                                                   |
     | wsrep_flow_control_status        | OFF                                                   |
     | wsrep_cert_deps_distance         | 1.000000                                              |
     | wsrep_apply_oooe                 | 0.000000                                              |
     | wsrep_apply_oool                 | 0.000000                                              |
     | wsrep_apply_window               | 1.000000                                              |
     | wsrep_commit_oooe                | 0.000000                                              |
     | wsrep_commit_oool                | 0.000000                                              |
     | wsrep_commit_window              | 1.000000                                              |
     | wsrep_local_state                | 4                                                     |
     | wsrep_local_state_comment        | Synced                                                |
     | wsrep_cert_index_size            | 1                                                     |
     | wsrep_cert_bucket_count          | 22                                                    |
     | wsrep_gcache_pool_size           | 2413                                                  |
     | wsrep_causal_reads               | 0                                                     |
     | wsrep_cert_interval              | 0.000000                                              |
     | wsrep_ist_receive_status         |                                                       |
     | wsrep_ist_receive_seqno_start    | 0                                                     |
     | wsrep_ist_receive_seqno_current  | 0                                                     |
     | wsrep_ist_receive_seqno_end      | 0                                                     |
     | wsrep_incoming_addresses         | 10.30.20.196:3306,10.30.20.197:3306,10.30.20.198:3306 |
     | wsrep_desync_count               | 0                                                     |
     | wsrep_evs_delayed                |                                                       |
     | wsrep_evs_evict_list             |                                                       |
     | wsrep_evs_repl_latency           | 0/0/0/0/0                                             |
     | wsrep_evs_state                  | OPERATIONAL                                           |
     | wsrep_gcomm_uuid                 | 07c8c8fe-a998-11e7-883e-06949cfe5af3                  |
     | wsrep_cluster_conf_id            | 3                                                     |
     | wsrep_cluster_size               | 3                                                     |
     | wsrep_cluster_state_uuid         | 591179cb-a98e-11e7-b9aa-07df8a228fe9                  |
     | wsrep_cluster_status             | Primary                                               |
     | wsrep_connected                  | ON                                                    |
     | wsrep_local_bf_aborts            | 0                                                     |
     | wsrep_local_index                | 1                                                     |
     | wsrep_provider_name              | Galera                                                |
     | wsrep_provider_vendor            | Codership Oy <info@codership.com>                     |
     | wsrep_provider_version           | 3.22(r8678538)                                        |
     | wsrep_ready                      | ON                                                    |
     +----------------------------------+-------------------------------------------------------+
    
  2. Verify that you can log in to the MySQL server by running the below command on the Morpheus application server(s).

    mysql -u $morpheus_db_user_name -p  -h 192.168.10.100
    

    Note

    This command requires the mysql client to be installed. If you are on a Windows machine you can connect to the server using MySQL Workbench, which can be found here: https://www.mysql.com/products/workbench/
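
    On a CentOS/RHEL 7 application server the client can be installed from the base repositories; the mariadb package (an assumption; adjust for your distro) provides the mysql command:

    sudo yum install mariadb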

RabbitMQ Cluster

RabbitMQ Installation and Configuration

Important

This is a sample configuration only. Customer configurations and requirements will vary.

Prerequisites

yum install epel-release
yum install erlang

Install RabbitMQ on the 3 nodes

wget https://dl.bintray.com/rabbitmq/rabbitmq-server-rpm/rabbitmq-server-3.6.12-1.el7.noarch.rpm

rpm --import https://www.rabbitmq.com/rabbitmq-release-signing-key.asc

yum -y install rabbitmq-server-3.6.12-1.el7.noarch.rpm

chkconfig rabbitmq-server on

rabbitmq-server -detached

On Node 1:

cat /var/lib/rabbitmq/.erlang.cookie

Copy this value

On Nodes 2 & 3:

  1. Overwrite /var/lib/rabbitmq/.erlang.cookie with the value from the previous step and change its permissions using the following commands.

    chown rabbitmq:rabbitmq /var/lib/rabbitmq/*
    chmod 400 /var/lib/rabbitmq/.erlang.cookie
    
  2. Edit the /etc/hosts file so that it resolves the short name of node 1.

    example:

    10.30.20.100 rabbit-1
    
  3. Run the following commands to join each node to the cluster

    rabbitmqctl stop
    rabbitmq-server -detached
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@<<node 1 shortname>>
    rabbitmqctl start_app
    

Note

If you receive the error ERROR: unable to connect to node 'rabbit@ha': nodedown, run the following commands

sudo ps aux | grep rabbit | grep -v grep | awk '{print $2}' | xargs kill -9
ps aux | grep rabbit        # verify rabbit is down
rabbitmq-server -detached
ps aux | grep rabbit        # verify rabbit is up and running

Now rabbitmqctl stop should work

On Node 1:

rabbitmqctl add_user <<admin username>> <<password>>
rabbitmqctl set_permissions -p / <<admin username>> ".*" ".*" ".*"
rabbitmqctl set_user_tags <<admin username>> administrator

On All Nodes:

rabbitmq-plugins enable rabbitmq_stomp
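
To confirm that all three nodes have joined the cluster, run the following on any node:

rabbitmqctl cluster_status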

Elasticsearch

Install a 3-node Elasticsearch cluster on CentOS 7

Important

This is a sample configuration only. Customer configurations and requirements will vary.

Requirements

  1. Three Existing CentOS 7+ nodes accessible to the Morpheus Appliance

  2. Install Java on each node

    You can install the latest OpenJDK with the command:

    sudo yum install java-1.8.0-openjdk.x86_64
    

    To verify your JRE is installed and can be used, run the command:

    java -version
    

    The result should look like this:

    Output of java -version
    openjdk version "1.8.0_65"
    OpenJDK Runtime Environment (build 1.8.0_65-b17)
    OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
    

Installation

  1. Download and Install Elasticsearch

    Elasticsearch can be downloaded directly from elastic.co in zip, tar.gz, deb, or rpm packages. For CentOS, it’s best to use the native rpm package which will install everything you need to run Elasticsearch. Download it in a directory of your choosing with the command:

    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.10.rpm
    

    Then install it in the usual CentOS way with the rpm command like this:

    sudo rpm -ivh elasticsearch-5.6.10.rpm
    

    This results in Elasticsearch being installed in /usr/share/elasticsearch/ with its configuration files placed in /etc/elasticsearch and its service registered with systemd.

    To make sure Elasticsearch starts and stops automatically with the server, enable the service with the command:

    sudo systemctl enable elasticsearch.service
    

Note

If you manage an ElasticSearch cluster externally from Morpheus, follow the steps located on the ElasticSearch website to upgrade to the latest version compatible with Morpheus

  2. Configuring Elastic

    Now that Elasticsearch and its Java dependencies have been installed, it is time to configure Elasticsearch.

    The Elasticsearch configuration files are in the /etc/elasticsearch directory. There are two files:

    sudo vi /etc/elasticsearch/elasticsearch.yml
    
    elasticsearch.yml

    Configures the Elasticsearch server settings. This is where all options, except those for logging, are stored, which is why we are mostly interested in this file.

    logging.yml

    Provides configuration for logging. In the beginning, you don’t have to edit this file. You can leave all default logging options. You can find the resulting logs in /var/log/elasticsearch by default.

    The first variables to customize on any Elasticsearch server are node.name and cluster.name in elasticsearch.yml. As their names suggest, node.name specifies the name of the server (node), and cluster.name the name of the cluster to which the node belongs.

    Important

    Make sure to uncomment each of the following listed below in /etc/elasticsearch/elasticsearch.yml

    Node 1

    cluster.name: morpheusha1
    node.name: "morpheuses1"
    network.host: 10.30.20.91  # enter the IP of this node
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["10.30.20.91","10.30.20.149","10.30.20.165"]
    

    Node 2

    cluster.name: morpheusha1
    node.name: "morpheuses2"
    network.host: 10.30.20.149  # enter the IP of this node
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["10.30.20.91","10.30.20.149","10.30.20.165"]
    

    Node 3

    cluster.name: morpheusha1
    node.name: "morpheuses3"
    network.host: 10.30.20.165  # enter the IP of this node
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["10.30.20.91","10.30.20.149","10.30.20.165"]
    

    For the above changes to take effect, you will have to restart Elasticsearch with the command:

    sudo service elasticsearch restart
    

    Next restart the network with the command:

    sudo service network restart
    
  3. Testing

    By now, Elasticsearch should be running on port 9200. You can test it with curl, the command-line tool for client-side URL transfers, using a simple GET request like this:

    [~]$ sudo curl -X GET 'http://10.30.20.149:9200'
          {
            "status" : 200,
            "name" : "morpheuses1",
            "cluster_name" : "morpheusha1",
            "version" : {
              "number" : "1.7.3",
              "build_hash" : "05d4530971ef0ea46d0f4fa6ee64dbc8df659682",
              "build_timestamp" : "2015-10-15T09:14:17Z",
              "build_snapshot" : false,
              "lucene_version" : "4.10.4"
            },
            "tagline" : "You Know, for Search"
          }
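
    Once all three nodes are configured and restarted, you can also confirm cluster formation; number_of_nodes should be 3 (the same endpoint is used again in the Recovery section below):

    sudo curl -X GET 'http://10.30.20.149:9200/_cluster/health?pretty=true'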
    

Application Tier

Morpheus configuration is controlled by a configuration file located at /etc/morpheus/morpheus.rb. This file is read when you run morpheus-ctl reconfigure after installing the appliance package. Each section is tied to a deployment tier: database is mysql, message queue is rabbitmq, search index is elasticsearch. There are no entries for the web and application tiers since those are part of the core application server where the configuration file resides.

  1. Download and install the Morpheus Appliance package.
  2. Next we must install the package onto the machine and configure the morpheus services:

    sudo rpm -i morpheus-appliance-x.x.x-1.x86_64.rpm

  3. After installing and prior to reconfiguring, edit the morpheus.rb file

    sudo vi /etc/morpheus/morpheus.rb

Change the values to match your configured services:

Note

The values below are examples. Update hosts, ports, usernames and password with your specifications. Only include entries for services you wish to externalize.

mysql['enable'] = false
mysql['host'] = {'10.30.20.139' => 3306,  '10.30.20.153' => 3306,  '10.30.20.196' => 3306}
mysql['morpheus_db'] = 'morpheusdb'
mysql['morpheus_db_user'] = 'morpheusadmin'
mysql['morpheus_password'] = 'morpheus4admin!'
rabbitmq['enable'] = false
rabbitmq['vhost'] = 'morph'
rabbitmq['queue_user'] = 'lbuser'
rabbitmq['queue_user_password'] = 'morpheus4admin'
rabbitmq['host'] = 'morpheus-ha-mq-lb-1.den.morpheusdata.com'
rabbitmq['port'] = '5672'
rabbitmq['stomp_port'] = '61613'
rabbitmq['heartbeat'] = 50
elasticsearch['enable'] = false
elasticsearch['cluster'] = 'morpheusha1'
elasticsearch['es_hosts'] = {'10.30.20.91' => 9200, '10.30.20.149' => 9200, '10.30.20.165' => 9200}
  4. Reconfigure Morpheus

    sudo morpheus-ctl reconfigure

3 Node with Externalized DB Configuration

Assumptions

This guide assumes the following:

  • There is an externalized database running for Morpheus to access.
  • The database service is a MySQL dialect (MySQL, MariaDB, Galera, etc…)
  • A database has been created for Morpheus as well as a user and proper grants have been run for the user. Morpheus will create the schema.
  • The Baremetal nodes cannot access the public internet
  • The base OS is RHEL 7.x
  • Shortname versions of hostnames will be resolvable
  • All nodes have access to a shared volume for /var/opt/morpheus/morpheus-ui. This can be done as a post-startup step (see the sketch after this list).
  • This configuration will support the complete loss of a single node, but no more. Specifically, the Elasticsearch tier requires at least two nodes to always be clustered.
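
For the shared /var/opt/morpheus/morpheus-ui volume, a minimal NFS sketch (the server IP and export path are hypothetical; stop all Morpheus services on all nodes before moving files, per the note in the steps below):

    # On each app node, mount the shared export over the morpheus-ui directory
    sudo mount -t nfs 10.100.10.200:/export/morpheus-ui /var/opt/morpheus/morpheus-ui

    # Or persist the mount in /etc/fstab
    echo '10.100.10.200:/export/morpheus-ui /var/opt/morpheus/morpheus-ui nfs defaults 0 0' | sudo tee -a /etc/fstab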

Steps

  1. First begin by downloading the requisite Morpheus packages either to the nodes or to your workstation for transfer. These packages need to be made available on the nodes you wish to install Morpheus on.

    [root@app-server-1 ~]# wget https://example/path/morpheus-appliance-ver-1.el7.x86_64.rpm
    [root@app-server-1 ~]# wget https://example/path/morpheus-appliance-offline-ver-1.noarch.rpm
    
  2. Once the packages are available on the nodes they can be installed. Make sure that no steps beyond the rpm install are run.

    [root@app-server-1 ~] rpm -i morpheus-appliance-ver-1.el7.x86_64.rpm
    [root@app-server-1 ~] rpm -i morpheus-appliance-offline-ver-1.noarch.rpm
    
  3. Next you will need to edit the Morpheus configuration file /etc/morpheus/morpheus.rb on each node.

    Node 1

    appliance_url 'https://morpheus1.localdomain'
    elasticsearch['es_hosts'] = {'10.100.10.121' => 9200, '10.100.10.122' => 9200, '10.100.10.123' => 9200}
    elasticsearch['node_name'] = 'morpheus1'
    elasticsearch['host'] = '0.0.0.0'
    rabbitmq['host'] = '0.0.0.0'
    rabbitmq['nodename'] = 'rabbit@node01'
    mysql['enable'] = false
    mysql['host'] = '10.100.10.111'
    mysql['morpheus_db'] = 'morpheusdb'
    mysql['morpheus_db_user'] = 'morpheus'
    mysql['morpheus_password'] = 'password'
    

    Node 2

    appliance_url 'https://morpheus2.localdomain'
    elasticsearch['es_hosts'] = {'10.100.10.121' => 9200, '10.100.10.122' => 9200, '10.100.10.123' => 9200}
    elasticsearch['node_name'] = 'morpheus2'
    elasticsearch['host'] = '0.0.0.0'
    rabbitmq['host'] = '0.0.0.0'
    rabbitmq['nodename'] = 'rabbit@node02'
    mysql['enable'] = false
    mysql['host'] = '10.100.10.112'
    mysql['morpheus_db'] = 'morpheusdb'
    mysql['morpheus_db_user'] = 'morpheus'
    mysql['morpheus_password'] = 'password'
    

    Node 3

    appliance_url 'https://morpheus3.localdomain'
    elasticsearch['es_hosts'] = {'10.100.10.121' => 9200, '10.100.10.122' => 9200, '10.100.10.123' => 9200}
    elasticsearch['node_name'] = 'morpheus3'
    elasticsearch['host'] = '0.0.0.0'
    rabbitmq['host'] = '0.0.0.0'
    rabbitmq['nodename'] = 'rabbit@node03'
    mysql['enable'] = false
    mysql['host'] = '10.100.10.113'
    mysql['morpheus_db'] = 'morpheusdb'
    mysql['morpheus_db_user'] = 'morpheus'
    mysql['morpheus_password'] = 'password'
    

    Note

    If you are running MySQL in a Master/Master configuration we will need to slightly alter the mysql[‘host’] line in the morpheus.rb to account for both masters in a failover configuration. As an example: mysql['host'] = '10.100.10.111:3306,10.100.10.112'. Morpheus will append the ‘3306’ port to the end of the final IP in the string, which is why we leave it off but explicitly type it for the first IP in the string. The order of IPs matters in that it should be the same across all three Morpheus Application Servers. As mentioned, this will be a failover configuration for MySQL in that the application will only read/write from the second master if the first master becomes unavailable. This way we can avoid commit lock issues that might arise from a load balanced Master/Master.

  4. Run the reconfigure on all nodes

    [root@app-server-1 ~] morpheus-ctl reconfigure
    

    Morpheus will come up on all nodes and Elasticsearch will auto-cluster. The only item left is the manual clustering of RabbitMQ.

  5. Select one of the nodes to be your Source Of Truth (SOT) for RabbitMQ clustering. We need to copy the secrets for RabbitMQ, copy the erlang cookie and join the other nodes to the SOT node.

    Begin by copying secrets from the SOT node to the other nodes.

    [root@app-server-1 ~] cat /etc/morpheus/morpheus-secrets.json
    
      "rabbitmq": {
        "morpheus_password": "***REDACTED***",
        "queue_user_password": "***REDACTED***",
        "cookie": "***REDACTED***"
      },
    

    Then copy the erlang.cookie from the SOT node to the other nodes

    [root@app-server-1 ~]# cat /opt/morpheus/embedded/rabbitmq/.erlang.cookie
    
    # 754363AD864649RD63D28
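
    A sketch of pushing the cookie to another node (the hostname is illustrative; afterwards make sure ownership and permissions match the original file):

    scp /opt/morpheus/embedded/rabbitmq/.erlang.cookie root@app-server-2:/opt/morpheus/embedded/rabbitmq/.erlang.cookie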
    
  6. Once this is done, run a reconfigure on the two nodes that are NOT the SOT node.

    [root@app-server-2 ~] morpheus-ctl reconfigure
    

    Note

    This step will fail. This is ok, and expected. If the reconfigure hangs then use Ctrl+C to quit the reconfigure run and force a failure.

  7. Subsequently we need to stop and start RabbitMQ on the non-SOT nodes.

    Important

    The commands below must be run as root

    [root@app-server-2 ~]# morpheus-ctl stop rabbitmq
    [root@app-server-2 ~]# morpheus-ctl start rabbitmq
    [root@app-server-2 ~]# PATH=/opt/morpheus/sbin:/opt/morpheus/embedded/sbin:/opt/morpheus/embedded/bin:$PATH
    [root@app-server-2 ~]# rabbitmqctl stop_app
    
    Stopping node 'rabbit@app-server-2' ...
    
    [root@app-server-2 ~]# rabbitmqctl join_cluster rabbit@app-server-1
    
    Clustering node 'rabbit@app-server-2' with 'rabbit@app-server-1' ...
    
    [root@app-server-2 ~]# rabbitmqctl start_app
    
    Starting node 'rabbit@app-server-2' ...
    
  8. Now make sure to reconfigure

    [root@app-server-2 ~] morpheus-ctl reconfigure
    
  9. Once the Rabbit services are up and clustered on all nodes they need to be set to HA/Mirrored Queues:

    [root@app-server-2 ~]# rabbitmqctl set_policy -p morpheus --priority 1 --apply-to all ha ".*" '{"ha-mode": "all"}'
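
    You can verify the policy was applied with:

    [root@app-server-2 ~]# rabbitmqctl list_policies -p morpheus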
    
  10. The last thing to do is restart the Morpheus UI on the two nodes that are NOT the SOT node.

    [root@app-server-2 ~]# morpheus-ctl restart morpheus-ui
    

    If this command times out then run:

    [root@app-server-2 ~]# morpheus-ctl kill morpheus-ui
    [root@app-server-2 ~]# morpheus-ctl start morpheus-ui
    
  11. You will be able to verify that the UI services have restarted properly by inspecting the logfiles. A standard practice after running a restart is to tail the UI log file.

    [root@app-server-2 ~]# morpheus-ctl tail morpheus-ui
    
  12. Lastly, we need to ensure that Elasticsearch is configured in such a way as to support a quorum of 2. We need to do this step on EVERY NODE.

    [root@app-server-2 ~]# echo "discovery.zen.minimum_master_nodes: 2" >> /opt/morpheus/embedded/elasticsearch/config/elasticsearch.yml
    [root@app-server-2 ~]# morpheus-ctl restart elasticsearch
    

    Note

    For moving /var/opt/morpheus/morpheus-ui files into a shared volume make sure ALL Morpheus services on ALL three nodes are down before you begin.

    [root@app-server-1 ~]# morpheus-ctl stop
    
  13. Permissions are as important as content, so make sure ownership and permissions are preserved when copying the directory contents to the shared volume.

  14. Subsequently you can start all Morpheus services on all three nodes and tail the Morpheus UI log file to inspect errors.

Database Migration

If your new installation is part of a migration, you need to move the data from your original Morpheus database to the new one. This is easily accomplished by taking a dump of the database and importing it.

  1. To begin this, stop the Morpheus UI on your original Morpheus server:

    [root@app-server-old ~]# morpheus-ctl stop morpheus-ui
    
  2. Once this is done you can safely export. To access the MySQL shell we will need the password for the Morpheus DB user. We can find this in the morpheus-secrets file:

    [root@app-server-old ~]# cat /etc/morpheus/morpheus-secrets.json
    
    {
      "mysql": {
          "root_password": "***REDACTED***",
          "morpheus_password": "***REDACTED***",
          "ops_password": "***REDACTED***"
            },
      "rabbitmq": {
                "morpheus_password": "***REDACTED***",
                "queue_user_password": "***REDACTED***",
                "cookie": "***REDACTED***"
      },
      "vm-images": {
        "s3": {
            "aws_access_id": "***REDACTED***",
            "aws_secret_key": "***REDACTED***"
          }
        }
    }
    
  3. Take note of this password as it will be used to invoke a dump. Morpheus provides embedded MySQL binaries for this task. Invoke mysqldump via the embedded path and specify the host. In this example we are using the Morpheus database on the MySQL instance listening on localhost. Enter the password copied from the previous step when prompted:

    [root@app-server-old ~]# /opt/morpheus/embedded/mysql/bin/mysqldump -u morpheus -h 127.0.0.1 morpheus -p > /tmp/morpheus_backup.sql
    
    Enter password:
    

    This file needs to be pushed to the new Morpheus Installation’s backend. Depending on the GRANTS in the new MySQL backend, this will likely require moving this file to one of the new Morpheus frontend servers.

  4. Once the file is in place it can be imported into the backend. Begin by ensuring the Morpheus UI service is stopped on all of the application servers:

    [root@app-server-1 ~]# morpheus-ctl stop morpheus-ui
    [root@app-server-2 ~]# morpheus-ctl stop morpheus-ui
    [root@app-server-3 ~]# morpheus-ctl stop morpheus-ui
    
  5. Then you can import the MySQL dump into the target database using the embedded MySQL binaries, specifying the database host, and entering the password for the Morpheus user when prompted:

    [root@app-server-1 ~]# /opt/morpheus/embedded/mysql/bin/mysql -u morpheus -h 10.130.2.38 morpheus -p < /tmp/morpheus_backup.sql
    Enter password:
    

Recovery

If a node happens to crash, most of the time Morpheus will start upon boot of the server and the services will self-recover. However, there can be cases where RabbitMQ and Elasticsearch are unable to recover in a clean fashion and require minor manual intervention. Regardless, it is considered best practice when recovering from a restart to perform some manual health checks.

[root@app-server-1 ~]# morpheus-ctl status
run: check-server: (pid 17808) 7714s;
run: log: (pid 549) 8401s
run: elasticsearch: (pid 19207) 5326s;
run: log: (pid 565) 8401s
run: guacd: (pid 601) 8401s;
run: log: (pid 573) 8401s
run: morpheus-ui: (pid 17976) 7633s;
run: log: (pid 555) 8401s
run: nginx: (pid 581) 8401s;
run: log: (pid 544) 8401s
run: rabbitmq: (pid 17850) 7708s;
run: log: (pid 542) 8401s
run: redis: (pid 572) 8401s;
run: log: (pid 548) 8401s

But, a status can report false positives if, say, RabbitMQ is in a boot loop or Elasticsearch is up, but not able to join the cluster. It is always advisable to tail the logs of the services to investigate their health.

[root@app-server-1 ~]# morpheus-ctl tail rabbitmq
[root@app-server-1 ~]# morpheus-ctl tail elasticsearch

To minimize disruption to the user interface, it is advisable to remedy Elasticsearch clustering first. Due to write locking in Elasticsearch it can be required to restart other nodes in the cluster to allow the recovering node to join. Begin by determining which Elasticsearch node became the master during the outage. On one of the two other nodes (not the recovered node):

[root@app-server-2 ~]# curl localhost:9200/_cat/nodes
app-server-1 10.100.10.121 7 47 0.21 d * morpheus1
localhost 127.0.0.1 4 30 0.32 d m morpheus2

The master is determined by identifying the row with the ‘*’ in it. SSH to this node (if different) and restart Elasticsearch.

[root@app-server-1 ~]# morpheus-ctl restart elasticsearch

Go to the other of the two ‘up’ nodes and run the curl command again. If the output contains three nodes then Elasticsearch has been recovered and you can move on to re-clustering RabbitMQ. Otherwise you will see output that contains only the node itself:

[root@app-server-2 ~]# curl localhost:9200/_cat/nodes
localhost 127.0.0.1 4 30 0.32 d * morpheus2

If this is the case then restart Elasticsearch on this node as well:

[root@app-server-2 ~]# morpheus-ctl restart elasticsearch

After this you should be able to run the curl command and see all three nodes have rejoined the cluster:

[root@app-server-2 ~]# curl localhost:9200/_cat/nodes
app-server-1 10.100.10.121 9 53 0.31 d * morpheus1
localhost 127.0.0.1 7 32 0.22 d m morpheus2
app-server-3 10.100.10.123 3 28 0.02 d m morpheus3

The most frequent case of restart errors for RabbitMQ is with epmd failing to restart. Morpheus’s recommendation is to ensure the epmd process is running and daemonized by starting it:

[root@app-server-1 ~]# /opt/morpheus/embedded/lib/erlang/erts-5.10.4/bin/epmd -daemon

And then restarting RabbitMQ:

[root@app-server-1 ~]# morpheus-ctl restart rabbitmq

And then restarting the Morpheus UI service:

[root@app-server-1 ~]# morpheus-ctl restart morpheus-ui

Again, it is always advisable to monitor the startup to ensure the Morpheus Application is starting without error:

[root@app-server-1 ~]# morpheus-ctl tail morpheus-ui

Recovery Thoughts/Further Discussion: If Morpheus UI cannot connect to RabbitMQ, Elasticsearch or the database tier it will fail to start. The Morpheus UI logs can indicate if this is the case.

Aside from RabbitMQ, there can be issues with false positives concerning Elasticsearch’s running status. The biggest challenge with Elasticsearch, for instance, is that a restarted node has trouble joining the ES cluster. This is fine in the case of ES, though, because the minimum_master_nodes setting will not allow the un-joined singleton to be consumed until it joins. Morpheus will still start if it can reach the other two ES hosts, which are still clustered.

The challenge with RabbitMQ is that it is load balanced behind Morpheus for requests, but each Morpheus application server needs to bootstrap the RabbitMQ instance tied to it. Thus, if it cannot reach its own RabbitMQ on startup, it will fail.

Similarly, if a Morpheus UI service cannot reach the database, startup will fail. However, if the database is externalized and failover is configured for Master/Master, then there should be ample opportunity for Morpheus to connect to the database tier.

Because Morpheus can start even though the Elasticsearch node on the same host fails to join the cluster, it is advisable to investigate the health of ES on the restarted node after the services are up. This can be done by accessing the endpoint with curl and inspecting the output. The status should be “green” and number of nodes should be “3”:

[root@app-server-1 ~]# curl localhost:9200/_cluster/health?pretty=true
{
"cluster_name" : "morpheus",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 110,
"active_shards" : 220,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
}

If this is not the case it is worth investigating the Elasticsearch logs to understand why the singleton node is having trouble joining the cluster. These can be found at:

/var/log/morpheus/elasticsearch/current

Outside of these stateful tiers, the “morpheus-ctl status” command will not output a “run” status unless the service is successfully running. If a stateless service reports a failure to run, the logs should be investigated and/or sent to Morpheus for additional support. Logs for all Morpheus embedded services are found in /var/log/morpheus.