High Availability Configuration

Overview

Morpheus supports a wide range of deployment architectures. It can start as a simple single-machine instance where all services run on the same machine, or it can be split into individual services per machine and configured for high availability, either within one region or across regions. High availability grows more complicated depending on the configuration you choose. This article covers the basic concepts of the Morpheus HA architecture, which can be applied to a wide range of configurations.

There are four primary tiers of services within the Morpheus appliance: the App Tier, Transactional Database Tier, Non-Transactional Database Tier, and Message Tier. Each tier has its own recommendations for high availability deployments, covered below.

../../_images/morpheusHA.png

Important

This is a sample configuration only. Customer configurations and requirements will vary.

Transactional Database Tier

The Transactional Database tier usually consists of a MySQL-compatible database. A lockable clustered configuration is recommended (currently, Percona XtraDB Cluster in Permissive Mode is the most recommended option). There are several documents online about configuring and setting up an XtraDB Cluster, but at its simplest it can be laid out in a multi-master configuration. Some nodes can be set up with replication delay and others with none. It is common practice to have no replication delay within the same region and to allow some replication delay across regions. This does increase the risk of job run overlap between the two regions; however, the concurrent operations typically self-correct, so this is a non-issue.

Non-Transactional Database Tier

The Non-Transactional tier consists of an Elasticsearch (version 5.6.10) cluster. Elasticsearch is used for log aggregation data and temporal aggregation data (essentially stats, metrics, and logs), which allows high write throughput at scale. Elasticsearch is a clustered database, meaning all nodes, regardless of region, need to be connected to each other over what Elasticsearch calls a “Transport” protocol. It is fairly simple to set up because all nodes are identical. It is also a Java-based system and requires a sizable amount of memory for larger data sets: 8 GB is recommended, and more nodes can be added to scale out.

Messaging Tier

The Messaging tier is an AMQP-based tier that also uses the STOMP protocol (used for agent communication). The primary recommended model is to use RabbitMQ for queue services. RabbitMQ is also a cluster-based queuing system and needs at least 3 instances for HA configurations, because of the elections RabbitMQ manages in failover scenarios. If doing a cross-region HA RabbitMQ cluster, it is recommended to have at least 3 Rabbit queue clusters per region. Typically, to handle HA, a load balancer should be placed between the RabbitMQ cluster and the front-end application servers to handle cross-host connections. The ports that must be forwarded for a RabbitMQ cluster are 5672 (AMQP) and 61613 (STOMP). A RabbitMQ cluster can run on smaller-memory machines, depending on how frequently large request bursts occur; 4 to 8 GB of memory is recommended to start.

Application Tier

The application tier is easily installed with the same Debian or yum repository package that Morpheus is normally distributed with. Advanced configuration allows the additional tiers to be skipped, leaving only the “stateless” services that need to run. These stateless services include Nginx, Tomcat, and Redis (to be phased out at a later date). These machines should also have at least 8 GB of memory. They can be configured across all regions and placed behind a central load balancer or a geo-based load balancer. They typically connect to all other tiers, as none of the other tiers talk to each other except through the central application tier.

One final piece of setting up the Application tier is shared storage, which is necessary for maintaining things like deployment archives, virtual image catalogs, and backups. These can be externalized to an object storage service such as Amazon S3 or OpenStack Swift. If those options are not used, a simple NFS cluster can also handle the shared storage.

../../_images/morpheus-ha-multi-configuration.png

Database Tier

Out of the box, Morpheus uses MySQL, but it supports any MySQL-compatible database. There are many ways to set up a highly available database based on the MySQL dialect. One that has found favor with many of our customers is Percona’s XtraDB Cluster. Percona’s product is based on Galera’s WSREP clustering, which is also supported.

If you’re not as familiar with WSREP and prefer replication, some of our customers prefer to configure a failover connection to a MariaDB or MySQL based Master/Master Replication cluster. Less often used, though still a viable option, is MySQL based NDB Clustering. Wonderful guides for each of these HA and DR based database management strategies can be found here: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/index.html

Requirements

Note

By default, Morpheus connects to database nodes over port 3306.

Once you have your database installed and configured:

  1. Create the database you will be using with Morpheus.

    mysql> CREATE DATABASE morpheusdb;
    
    mysql> SHOW DATABASES;
    
  2. Next, create your Morpheus database user. The user must be defined either at the IP address of the Morpheus application server or with @'%' as the host part of the user name to allow the user to log in from anywhere.

    mysql> CREATE USER '$morpheus_db_user_name'@'$source_ip' IDENTIFIED BY '$morpheus_db_user_pw';
    
  3. Next, grant your new Morpheus user permissions on the database. Use the same user name, source IP, and password as in the previous step.

    mysql> GRANT ALL PRIVILEGES ON morpheusdb.* TO '$morpheus_db_user_name'@'$source_ip' IDENTIFIED BY '$morpheus_db_user_pw' WITH GRANT OPTION;
    
    
    mysql> GRANT SELECT, PROCESS, SHOW DATABASES, SUPER ON *.* TO '$morpheus_db_user_name'@'$source_ip' IDENTIFIED BY '$morpheus_db_user_pw';
    
    mysql> FLUSH PRIVILEGES;
    
  4. Check the permissions for your user.

    mysql> SHOW GRANTS FOR '$morpheus_db_user_name'@'$source_ip';
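
The statements above can be generated from a single set of values so that the database name, user, and source host stay consistent across the CREATE and GRANT steps. The following is a minimal sketch, not part of the Morpheus tooling: the function name morpheus_db_sql and its argument order are hypothetical, it inserts its inputs verbatim without quoting or escaping, and it issues the grants without an IDENTIFIED BY clause on the assumption that the password set by CREATE USER is sufficient.

```shell
# Hypothetical helper (not shipped with Morpheus): print the setup SQL for
# one database, user, source host, and password.
# Inputs are inserted verbatim; do not pass untrusted values.
morpheus_db_sql() {
  db=$1; user=$2; host=$3; pw=$4
  cat <<EOF
CREATE DATABASE ${db};
CREATE USER '${user}'@'${host}' IDENTIFIED BY '${pw}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'${host}' WITH GRANT OPTION;
GRANT SELECT, PROCESS, SHOW DATABASES, SUPER ON *.* TO '${user}'@'${host}';
FLUSH PRIVILEGES;
SHOW GRANTS FOR '${user}'@'${host}';
EOF
}

# Example (values are placeholders): review the output before running it
# through the mysql client as an administrative user.
# morpheus_db_sql morpheusdb dbuser '%' 'dbuserpassword'
```

Printing the statements rather than executing them keeps the helper safe to run anywhere and leaves the decision of how to connect to the database with the operator.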
    

RabbitMQ Cluster

An HA deployment will also include a highly available RabbitMQ. This can be achieved through RabbitMQ’s HA mirrored queues on at least 3 independent nodes. To accomplish this, we recommend following Pivotal’s documentation on RabbitMQ: https://www.rabbitmq.com/ha.html and https://www.rabbitmq.com/clustering.html

Install RabbitMQ on the 3 nodes and create a cluster.

Note

For the most up to date RPM package we recommend using this link: https://www.rabbitmq.com/install-rpm.html#downloads

Important

Morpheus connects to AMQP over 5672 or 5671 (SSL) and to STOMP over 61613 or 61614 (SSL).

Enable the STOMP plugin, which Morpheus uses for agent communication:

rabbitmq-plugins enable rabbitmq_stomp

The following policies must be set on the morpheus vhost. Failure to apply these policies will cause performance and/or stability issues.

rabbitmqctl set_policy -p morpheus --apply-to queues --priority 2 statCommands "statCommands.*" '{"expires":1800000, "ha-mode":"all"}'
rabbitmqctl set_policy -p morpheus --apply-to queues --priority 2 morpheusAgentActions "morpheusAgentActions.*" '{"expires":1800000, "ha-mode":"all"}'
rabbitmqctl set_policy -p morpheus --apply-to all --priority 1 ha ".*" '{"ha-mode":"all"}'
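
Because missing policies cause performance or stability issues, it can be useful to confirm that all three are present after setting them. The sketch below is an illustration only: the function name missing_policies is hypothetical, and it assumes that `rabbitmqctl list_policies -p morpheus` prints tab-separated rows with the policy name in the second column, which should be checked against the RabbitMQ version in use.

```shell
# Hypothetical check: read `rabbitmqctl list_policies -p morpheus` output on
# stdin and print any of the three required policy names that are missing.
# Assumes tab-separated rows with the policy name in the second column.
missing_policies() {
  awk -F '\t' '
    { seen[$2] = 1 }
    END {
      n = split("statCommands morpheusAgentActions ha", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) print want[i]
    }'
}

# Example: empty output means all three policies were found.
# rabbitmqctl list_policies -p morpheus | missing_policies
```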

Elasticsearch

Install a 3-node Elasticsearch cluster on CentOS 7

Important

This is a sample configuration only. Customer configurations and requirements will vary.

Important

Enabling watcher and monitoring in elasticsearch with default configurations (xpack) can cause storage constraints in /var/log/elasticsearch and /var/lib/elasticsearch. While this is not related to Morpheus and not controlled by Morpheus, please be aware of and monitor available storage for your elasticsearch cluster configuration.

Requirements

  1. Three existing CentOS 7+ nodes accessible to the Morpheus Appliance

  2. Install Java on each node

    You can install OpenJDK 8 with the command:

    sudo yum install java-1.8.0-openjdk.x86_64
    

    To verify your JRE is installed and can be used, run the command:

    java -version
    

    The result should look like this:

    openjdk version "1.8.0_65"
    OpenJDK Runtime Environment (build 1.8.0_65-b17)
    OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
    

Installation

To install Elasticsearch please use the following instructions

https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html#install-rpm

Once installed, to make sure Elasticsearch starts automatically at boot, enable its systemd service with the command:

sudo systemctl enable elasticsearch.service

Configuring Elastic

Now that Elasticsearch and its Java dependencies have been installed, it is time to configure Elasticsearch.

The Elasticsearch configuration files are in the /etc/elasticsearch directory. There are two files:

elasticsearch.yml
Configures the Elasticsearch server settings. All options except those for logging are stored here, which is why this is the file we are mostly interested in.
logging.yml
Provides configuration for logging (Elasticsearch 5.x and later use log4j2.properties instead). You do not need to edit this file initially and can leave all default logging options. The resulting logs are written to /var/log/elasticsearch by default.

Open the main configuration file for editing:

sudo vi /etc/elasticsearch/elasticsearch.yml

The first variables to customize on any Elasticsearch server are node.name and cluster.name in elasticsearch.yml. As their names suggest, node.name specifies the name of the server (node) and cluster.name specifies the cluster to which that node belongs.

Important

Make sure to uncomment each of the settings listed below in /etc/elasticsearch/elasticsearch.yml

Node 1

cluster.name: morpheusha1
node.name: "morpheuses1"
network.host: 10.30.22.130   # example only; replace with the IP of this node
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.30.20.91","10.30.20.149","10.30.20.165"]

Node 2

cluster.name: morpheusha1
node.name: "morpheuses2"
network.host: 10.30.22.130   # example only; replace with the IP of this node
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.30.20.91","10.30.20.149","10.30.20.165"]

Node 3

cluster.name: morpheusha1
node.name: "morpheuses3"
network.host: 10.30.22.130   # example only; replace with the IP of this node
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.30.20.91","10.30.20.149","10.30.20.165"]
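
The three node blocks above differ only in node.name and network.host, so they can be produced from one template. The following is a minimal sketch under that assumption; the function name es_node_config is hypothetical, the cluster name and port are the sample values from this guide, and pairing a given node name with a given IP is left to the caller.

```shell
# Hypothetical generator: print the elasticsearch.yml settings for one node.
# Usage: es_node_config NODE_NAME NODE_IP HOST1 [HOST2 ...]
es_node_config() {
  name=$1; ip=$2; shift 2
  hosts=""
  for h in "$@"; do
    # Build a comma-separated list of quoted host addresses.
    hosts="${hosts:+$hosts,}\"$h\""
  done
  printf 'cluster.name: morpheusha1\n'
  printf 'node.name: "%s"\n' "$name"
  printf 'network.host: %s\n' "$ip"
  printf 'http.port: 9200\n'
  printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$hosts"
}

# Example (the name-to-IP pairing here is an assumption):
# es_node_config morpheuses1 10.30.20.91 10.30.20.91 10.30.20.149 10.30.20.165
```

The output is meant to be reviewed and merged into /etc/elasticsearch/elasticsearch.yml by hand rather than written over the file.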

For the above changes to take effect, you will have to restart Elasticsearch with the command:

sudo service elasticsearch restart

Next restart the network with the command:

sudo service network restart

Testing

To make sure Elasticsearch is running, test it as follows.

Elasticsearch should be running on port 9200. You can test it with curl, the command-line URL transfer tool, and a simple GET request like this:

[~]$ curl -X GET 'http://10.30.20.149:9200'
      {
        "status" : 200,
        "name" : "morpheuses1",
        "cluster_name" : "morpheusha1",
        ...
      }

For more detail, see https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html#rpm-check-running

Application Tier

Morpheus configuration is controlled by a configuration file located at /etc/morpheus/morpheus.rb. This file is read when you run morpheus-ctl reconfigure after installing the appliance package. Each section is tied to a deployment tier: database is mysql, message queue is rabbitmq, search index is elasticsearch. There are no entries for the web and application tiers since those are part of the core application server where the configuration file resides.

  1. Download and install the Morpheus Appliance Package

  2. Next we must install the package onto the machine and configure the morpheus services:

    sudo rpm -i morpheus-appliance-x.x.x-1.x86_64.rpm
    
  3. After installing and prior to reconfiguring, edit the morpheus.rb file

    sudo vi /etc/morpheus/morpheus.rb
    

Change the values to match your configured services:

Note

The values below are examples. Update hosts, ports, usernames and password with your specifications. Only include entries for services you wish to externalize.

mysql['enable'] = false
mysql['host'] = '10.30.20.139:3306,10.30.20.153:3306,10.30.20.196'
mysql['morpheus_db'] = 'morpheusdb'
mysql['morpheus_db_user'] = 'dbuser'
mysql['morpheus_password'] = 'dbuserpassword'
rabbitmq['enable'] = false
rabbitmq['vhost'] = 'morpheus'
rabbitmq['queue_user'] = 'lbuser'
rabbitmq['queue_user_password'] = 'lbuserpassword'
rabbitmq['host'] = 'rabbitvip'
rabbitmq['port'] = '5672'
rabbitmq['stomp_port'] = '61613'
rabbitmq['heartbeat'] = 50
elasticsearch['enable'] = false
elasticsearch['cluster'] = 'esclustername'
elasticsearch['es_hosts'] = {'10.30.20.91' => 9200, '10.30.20.149' => 9200, '10.30.20.165' => 9200}

  4. Reconfigure Morpheus

    sudo morpheus-ctl reconfigure
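
The mysql['host'] value in the example lists every database node with an explicit :3306 except the last one, because Morpheus appends the port to the final address itself (see the Master/Master note in the 3-node section). A minimal sketch of building that string from a list of addresses follows; the function name mysql_host_string is hypothetical and the port is fixed at 3306 for illustration.

```shell
# Hypothetical helper: join database node addresses into a mysql['host'] value,
# adding :3306 to every address except the last.
mysql_host_string() {
  out=""
  while [ $# -gt 1 ]; do
    out="${out}$1:3306,"
    shift
  done
  printf '%s\n' "${out}$1"
}

# Example: pass the addresses in the same order on every application server.
# mysql_host_string 10.30.20.139 10.30.20.153 10.30.20.196
```

Generating the value once and reusing it on each node helps keep the address order identical across application servers.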

Storage

When Morpheus is in a High Availability configuration, the required Local Storage File Shares need to be copied to a shared file system so that all nodes within the Morpheus cluster are able to connect to the assets.

Assets

  • White label images
  • Uploaded virtual images
  • Deploy uploads
  • Ansible Plays
  • Terraform
  • Morpheus backups.

Tip

Backups, deployments and virtual images can be overridden within the Morpheus-UI. You can find more information on storage here: Storage

To copy the `morpheus-ui` directory to the shared storage follow the below steps:

  1. SSH into the Appliance
  2. sudo su (or login as root)
  3. cd into `/var/opt/morpheus/`
  4. Back up the morpheus-ui directory by running the command below. This will create a new directory in `/var/opt/morpheus/` called morpheus-ui-bkp and copy the contents of morpheus-ui into the new directory

    cp -r morpheus-ui morpheus-ui-bkp

  5. Move morpheus-ui to your shared storage. Example below:

    mv morpheus-ui /nfs/appliance-files/

  6. Mount your shared storage volume to `/var/opt/morpheus/morpheus-ui`. How you mount it depends on what kind of storage it is. If you mount the volume after the package install but before the reconfigure, you don’t need to copy anything to a backup.
  7. SSH into the second Appliance and then back up the morpheus-ui directory by running:

    cp -r morpheus-ui morpheus-ui-bkp

Tip

When adding additional nodes, you only need to run steps 6 and 7.
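
The backup and move steps above can be wrapped in a small function that refuses to run when a directory is missing or a backup already exists. This is a sketch only: the function name relocate_ui_dir is hypothetical, it uses `cp -a` in place of `cp -r` so that ownership and permissions are carried into the backup, and it does not mount the shared volume, since that depends on the storage type.

```shell
# Hypothetical helper: back up BASE/morpheus-ui to BASE/morpheus-ui-bkp,
# then move BASE/morpheus-ui into the SHARE directory.
# Usage: relocate_ui_dir BASE SHARE
relocate_ui_dir() {
  base=$1
  share=$2
  if [ ! -d "$base/morpheus-ui" ]; then
    echo "missing directory: $base/morpheus-ui" >&2
    return 1
  fi
  if [ ! -d "$share" ]; then
    echo "missing directory: $share" >&2
    return 1
  fi
  if [ -e "$base/morpheus-ui-bkp" ]; then
    echo "backup already exists: $base/morpheus-ui-bkp" >&2
    return 1
  fi
  # -a preserves ownership, permissions, and timestamps in the backup copy.
  cp -a "$base/morpheus-ui" "$base/morpheus-ui-bkp" || return 1
  mv "$base/morpheus-ui" "$share/"
}

# Example, run as root with Morpheus services stopped:
# relocate_ui_dir /var/opt/morpheus /nfs/appliance-files
```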

3 Node with Externalized DB Configuration

Assumptions

This guide assumes the following:

  • There is an externalized database running for Morpheus to access.
  • The database service is a MySQL dialect (MySQL, MariaDB, Galera, etc…)
  • A database has been created for Morpheus as well as a user and proper grants have been run for the user. Morpheus will create the schema.
  • The Baremetal nodes cannot access the public internet
  • The base OS is RHEL 7.x
  • Shortname versions of hostnames will be resolvable
  • All nodes have access to a shared volume for /var/opt/morpheus/morpheus-ui. This can be done as a post startup step.
  • This configuration will support the complete loss of a single node, but no more. Specifically, the Elasticsearch tier requires at least two nodes to always be clustered.

Morpheus 3-Node HA Architecture

Steps

  1. First begin by downloading the requisite Morpheus packages either to the nodes or to your workstation for transfer. These packages need to be made available on the nodes you wish to install Morpheus on.

    [root@app-server-1 ~]# wget https://example/path/morpheus-appliance-ver-1.el7.x86_64.rpm
    [root@app-server-1 ~]# wget https://example/path/morpheus-appliance-offline-ver-1.noarch.rpm
    
  2. Once the packages are available on the nodes they can be installed. Make sure that no steps beyond the rpm install are run.

    [root@app-server-1 ~]# rpm -i morpheus-appliance-ver-1.el7.x86_64.rpm
    [root@app-server-1 ~]# rpm -i morpheus-appliance-offline-ver-1.noarch.rpm
    
  3. Next you will need to edit the Morpheus configuration file /etc/morpheus/morpheus.rb on each node.

    Node 1

    appliance_url 'https://morpheus1.localdomain'
    elasticsearch['es_hosts'] = {'10.100.10.121' => 9200, '10.100.10.122' => 9200, '10.100.10.123' => 9200}
    elasticsearch['node_name'] = 'morpheus1'
    elasticsearch['host'] = '0.0.0.0'
    rabbitmq['host'] = '0.0.0.0'
    rabbitmq['nodename'] = 'rabbit@node01'
    mysql['enable'] = false
    mysql['host'] = '10.100.10.111'
    mysql['morpheus_db'] = 'morpheusdb'
    mysql['morpheus_db_user'] = 'morpheus'
    mysql['morpheus_password'] = 'password'
    

    Node 2

    appliance_url 'https://morpheus2.localdomain'
    elasticsearch['es_hosts'] = {'10.100.10.121' => 9200, '10.100.10.122' => 9200, '10.100.10.123' => 9200}
    elasticsearch['node_name'] = 'morpheus2'
    elasticsearch['host'] = '0.0.0.0'
    rabbitmq['host'] = '0.0.0.0'
    rabbitmq['nodename'] = 'rabbit@node02'
    mysql['enable'] = false
    mysql['host'] = '10.100.10.112'
    mysql['morpheus_db'] = 'morpheusdb'
    mysql['morpheus_db_user'] = 'morpheus'
    mysql['morpheus_password'] = 'password'
    

    Node 3

    appliance_url 'https://morpheus3.localdomain'
    elasticsearch['es_hosts'] = {'10.100.10.121' => 9200, '10.100.10.122' => 9200, '10.100.10.123' => 9200}
    elasticsearch['node_name'] = 'morpheus3'
    elasticsearch['host'] = '0.0.0.0'
    rabbitmq['host'] = '0.0.0.0'
    rabbitmq['nodename'] = 'rabbit@node03'
    mysql['enable'] = false
    mysql['host'] = '10.100.10.113'
    mysql['morpheus_db'] = 'morpheusdb'
    mysql['morpheus_db_user'] = 'morpheus'
    mysql['morpheus_password'] = 'password'
    

    Note

    If you are running MySQL in a Master/Master configuration, the mysql['host'] line in morpheus.rb needs to be altered slightly to account for both masters in a failover configuration. As an example: mysql['host'] = '10.100.10.111:3306,10.100.10.112'. Morpheus appends the port 3306 to the final IP in the string, which is why it is left off there but typed explicitly for the first IP. The order of IPs matters: it should be the same across all three Morpheus Application Servers. As mentioned, this is a failover configuration for MySQL, in that the application will only read from and write to the second master if the first master becomes unavailable. This avoids the commit lock issues that might arise from a load-balanced Master/Master configuration.

  4. Run the reconfigure on all nodes

    [root@app-server-1 ~]# morpheus-ctl reconfigure
    

    Morpheus will come up on all nodes and Elasticsearch will auto-cluster. The only item left is the manual clustering of RabbitMQ.

  5. Select one of the nodes to be your Source Of Truth (SOT) for RabbitMQ clustering. We need to copy the secrets for RabbitMQ, copy the erlang cookie and join the other nodes to the SOT node.

    Begin by copying secrets from the SOT node to the other nodes.

    [root@app-server-1 ~]# cat /etc/morpheus/morpheus-secrets.json
    
      "rabbitmq": {
        "morpheus_password": "***REDACTED***",
        "queue_user_password": "***REDACTED***",
        "cookie": "***REDACTED***"
      },
    

    Then copy the Erlang cookie (.erlang.cookie) from the SOT node to the other nodes

    [root@app-server-1 ~]# cat /opt/morpheus/embedded/rabbitmq/.erlang.cookie
    
    # 754363AD864649RD63D28
    
  6. Once this is done, run a reconfigure on the two nodes that are NOT the SOT node.

    [root@app-server-2 ~]# morpheus-ctl reconfigure
    

    Note

    This step will fail. This is ok, and expected. If the reconfigure hangs then use Ctrl+C to quit the reconfigure run and force a failure.

  7. Subsequently, we need to stop and start RabbitMQ on the non-SOT nodes.

    Important

    The commands below must be run as root

    Note

    If you receive the error "unable to connect to epmd (port 4369) on app-server-1: nxdomain (non-existing domain)", make sure to add all IPs and hostnames to the /etc/hosts file like so:

    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    127.0.0.1 app-server-1.localdomain app-server-2 localhost
    127.0.0.1 container16
    10.100.10.113 app-server-1
    10.100.10.114 app-server-2
    10.100.10.115 app-server-3
    
    [root@app-server-2 ~]# morpheus-ctl stop rabbitmq
    [root@app-server-2 ~]# morpheus-ctl start rabbitmq
    [root@app-server-2 ~]# PATH=/opt/morpheus/sbin:/opt/morpheus/embedded/sbin:/opt/morpheus/embedded/bin:$PATH
    [root@app-server-2 ~]# rabbitmqctl stop_app
    
    Stopping node 'rabbit@app-server-2' ...
    
    [root@app-server-2 ~]# rabbitmqctl join_cluster rabbit@app-server-1
    
    Clustering node 'rabbit@app-server-2' with 'rabbit@app-server-1' ...
    
    [root@app-server-2 ~]# rabbitmqctl start_app
    
    Starting node 'rabbit@app-server-2' ...
    
  8. Now make sure to reconfigure

    [root@app-server-2 ~]# morpheus-ctl reconfigure
    
  9. Once the Rabbit services are up and clustered on all nodes they need to be set to HA/Mirrored Queues:

    [root@app-server-2 ~]# rabbitmqctl set_policy -p morpheus --priority 1 --apply-to all ha ".*" '{"ha-mode": "all"}'
    
  10. The last thing to do is restart the Morpheus UI on the two nodes that are NOT the SOT node.

    [root@app-server-2 ~]# morpheus-ctl restart morpheus-ui
    

    If this command times out then run:

    [root@app-server-2 ~]# morpheus-ctl kill morpheus-ui
    [root@app-server-2 ~]# morpheus-ctl start morpheus-ui
    
  11. You will be able to verify that the UI services have restarted properly by inspecting the logfiles. A standard practice after running a restart is to tail the UI log file.

    [root@app-server-2 ~]# morpheus-ctl tail morpheus-ui
    
  12. Lastly, we need to ensure that Elasticsearch is configured in such a way as to support a quorum of 2. We need to do this step on EVERY NODE.

    [root@app-server-2 ~]# echo "discovery.zen.minimum_master_nodes: 2" >> /opt/morpheus/embedded/elasticsearch/config/elasticsearch.yml
    [root@app-server-2 ~]# morpheus-ctl restart elasticsearch
    

    Note

    For moving /var/opt/morpheus/morpheus-ui files into a shared volume make sure ALL Morpheus services on ALL three nodes are down before you begin.

    [root@app-server-1 ~]# morpheus-ctl stop
    
  13. Permissions are as important as content, so make sure to preserve ownership and permissions when copying the directory contents to the shared volume.

  14. Subsequently you can start all Morpheus services on all three nodes and tail the Morpheus UI log file to inspect errors.
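
The quorum value of 2 used in step 12 follows the usual majority rule for discovery.zen.minimum_master_nodes: half the number of master-eligible nodes, rounded down, plus one. The sketch below works that arithmetic for any cluster size; the function name es_quorum is hypothetical, and it assumes every node in the cluster is master-eligible, as in this 3-node layout.

```shell
# Majority quorum for N master-eligible nodes: floor(N / 2) + 1.
# Usage: es_quorum N
es_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Example: compute the setting for a 3-node cluster.
# es_quorum 3
```

With three nodes the majority is two, which is why this configuration tolerates the loss of one node but not two.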

Database Migration

If your new installation is part of a migration then you need to move the data from your original Morpheus database to your new one. This is easily accomplished by using a stateful dump.

  1. To begin this, stop the Morpheus UI on your original Morpheus server:

    [root@app-server-old ~]# morpheus-ctl stop morpheus-ui
    
  2. Once this is done you can safely export. To access the MySQL shell we will need the password for the Morpheus DB user. We can find this in the morpheus-secrets file:

    [root@app-server-old ~]# cat /etc/morpheus/morpheus-secrets.json
    
    {
      "mysql": {
          "root_password": "***REDACTED***",
          "morpheus_password": "***REDACTED***",
          "ops_password": "***REDACTED***"
            },
      "rabbitmq": {
                "morpheus_password": "***REDACTED***",
                "queue_user_password": "***REDACTED***",
                "cookie": "***REDACTED***"
      },
      "vm-images": {
        "s3": {
            "aws_access_id": "***REDACTED***",
            "aws_secret_key": "***REDACTED***"
          }
        }
    }
    
  3. Take note of this password as it will be used to invoke a dump. Morpheus provides embedded binaries for this task. Invoke it via the embedded path and specify the host. In this example we are using the Morpheus database on the MySQL listening on localhost. Enter the password copied from the previous step when prompted:

    [root@app-server-old ~]# /opt/morpheus/embedded/mysql/bin/mysqldump -u morpheus -h 127.0.0.1 morpheus -p > /tmp/morpheus_backup.sql
    
    Enter password:
    

    This file needs to be pushed to the new Morpheus Installation’s backend. Depending on the GRANTS in the new MySQL backend, this will likely require moving this file to one of the new Morpheus frontend servers.

  4. Once the file is in place it can be imported into the backend. Begin by ensuring the Morpheus UI service is stopped on all of the application servers:

    [root@app-server-1 ~]# morpheus-ctl stop morpheus-ui
    [root@app-server-2 ~]# morpheus-ctl stop morpheus-ui
    [root@app-server-3 ~]# morpheus-ctl stop morpheus-ui
    
  5. Then you can import the MySQL dump into the target database using the embedded MySQL binaries, specifying the database host, and entering the password for the Morpheus user when prompted:

    [root@app-server-1 ~]# /opt/morpheus/embedded/mysql/bin/mysql -u morpheus -h 10.130.2.38 morpheus -p < /tmp/morpheus_backup.sql
    Enter password:
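
Before running the import in step 5, it can be worth confirming that the dump written in step 3 is complete, since a truncated file would import only part of the data. The check below is a sketch: the function name dump_looks_complete is hypothetical, and it relies on the "-- Dump completed" comment that mysqldump writes at the end of a dump by default, so it will report failure for dumps taken with comments disabled.

```shell
# Hypothetical pre-import check: succeed only if the dump file is non-empty
# and its last lines include mysqldump's completion marker.
# Usage: dump_looks_complete FILE
dump_looks_complete() {
  [ -s "$1" ] && tail -n 5 "$1" | grep -q '^-- Dump completed'
}

# Example:
# dump_looks_complete /tmp/morpheus_backup.sql && echo "dump looks complete"
```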
    

Recovery

If a node happens to crash, most of the time Morpheus will start when the server boots and the services will self-recover. However, there can be cases where RabbitMQ and Elasticsearch are unable to recover cleanly and require minor manual intervention. Regardless, it is considered best practice to perform some manual health checks when recovering from a restart.

[root@app-server-1 ~]# morpheus-ctl status
run: check-server: (pid 17808) 7714s; run: log: (pid 549) 8401s
run: elasticsearch: (pid 19207) 5326s; run: log: (pid 565) 8401s
run: guacd: (pid 601) 8401s; run: log: (pid 573) 8401s
run: morpheus-ui: (pid 17976) 7633s; run: log: (pid 555) 8401s
run: nginx: (pid 581) 8401s; run: log: (pid 544) 8401s
run: rabbitmq: (pid 17850) 7708s; run: log: (pid 542) 8401s
run: redis: (pid 572) 8401s; run: log: (pid 548) 8401s

However, the status output can report false positives, for example if RabbitMQ is in a boot loop or Elasticsearch is up but unable to join the cluster. It is always advisable to tail the logs of the services to investigate their health.

[root@app-server-1 ~]# morpheus-ctl tail rabbitmq
[root@app-server-1 ~]# morpheus-ctl tail elasticsearch

To minimize disruption to the user interface, it is advisable to remedy Elasticsearch clustering first. Due to write locking in Elasticsearch it can be required to restart other nodes in the cluster to allow the recovering node to join. Begin by determining which Elasticsearch node became the master during the outage. On one of the two other nodes (not the recovered node):

[root@app-server-2 ~]# curl localhost:9200/_cat/nodes
app-server-1 10.100.10.121 7 47 0.21 d * morpheus1
localhost 127.0.0.1 4 30 0.32 d m morpheus2

The master is determined by identifying the row with the ‘*’ in it. SSH to this node (if different) and restart Elasticsearch.

[root@app-server-1 ~]# morpheus-ctl restart elasticsearch

Go to the other of the two ‘up’ nodes and run the curl command again. If the output contains three nodes then Elasticsearch has been recovered and you can move on to re-clustering RabbitMQ. Otherwise you will see output that contains only the node itself:

[root@app-server-2 ~]# curl localhost:9200/_cat/nodes
localhost 127.0.0.1 4 30 0.32 d * morpheus2

If this is the case then restart Elasticsearch on this node as well:

[root@app-server-2 ~]# morpheus-ctl restart elasticsearch

After this you should be able to run the curl command and see all three nodes have rejoined the cluster:

[root@app-server-2 ~]# curl localhost:9200/_cat/nodes
app-server-1 10.100.10.121 9 53 0.31 d * morpheus1
localhost 127.0.0.1 7 32 0.22 d m morpheus2
app-server-3 10.100.10.123 3 28 0.02 d m morpheus3
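
The two checks used in this procedure, finding the row marked with ‘*’ and counting how many nodes are listed, can be scripted against the same output. The helpers below are a sketch: the function names are hypothetical, and they assume the column layout shown in the samples above, with the node name in the last column.

```shell
# Hypothetical helpers that read `_cat/nodes` output on stdin.

# es_master_name: print the last column (node name) of the row marked with '*'.
es_master_name() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "*") print $NF }'
}

# es_node_count: print the number of non-empty rows, i.e. nodes in the cluster.
es_node_count() {
  awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Example:
# curl -s localhost:9200/_cat/nodes | es_master_name
# curl -s localhost:9200/_cat/nodes | es_node_count
```

A node count below three on any node indicates that the cluster has not fully re-formed.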

The most frequent cause of restart errors for RabbitMQ is epmd failing to restart. The Morpheus recommendation is to ensure the epmd process is running and daemonized by starting it:

[root@app-server-1 ~]# /opt/morpheus/embedded/lib/erlang/erts-5.10.4/bin/epmd -daemon

And then restarting RabbitMQ:

[root@app-server-1 ~]# morpheus-ctl restart rabbitmq

And then restarting the Morpheus UI service:

[root@app-server-1 ~]# morpheus-ctl restart morpheus-ui

Again, it is always advisable to monitor the startup to ensure the Morpheus Application is starting without error:

[root@app-server-1 ~]# morpheus-ctl tail morpheus-ui

Recovery Thoughts/Further Discussion

If Morpheus UI cannot connect to RabbitMQ, Elasticsearch, or the database tier, it will fail to start. The Morpheus UI logs can indicate if this is the case.

Aside from RabbitMQ, there can be issues with false positives concerning Elasticsearch’s running status. The biggest challenge with Elasticsearch, for instance, is that a restarted node has trouble joining the ES cluster. This is fine in the case of ES, though, because the minimum_master_nodes setting will not allow the un-joined singleton to be consumed until it joins. Morpheus will still start if it can reach the other two ES hosts, which are still clustered.

The challenge with RabbitMQ is that it is load balanced behind Morpheus for requests, but each Morpheus application server needs to bootstrap the RabbitMQ instance tied to it. Thus, if an application server cannot reach its own RabbitMQ, startup will fail.

Similarly, if a Morpheus UI service cannot reach the database, startup will fail. However, if the database is externalized and failover is configured for Master/Master, then there should be ample opportunity for Morpheus to connect to the database tier.

Because Morpheus can start even though the Elasticsearch node on the same host fails to join the cluster, it is advisable to investigate the health of ES on the restarted node after the services are up. This can be done by accessing the endpoint with curl and inspecting the output. The status should be “green” and number of nodes should be “3”:

[root@app-server-1 ~]# curl 'localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "morpheus",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 110,
  "active_shards" : 220,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

If this is not the case it is worth investigating the Elasticsearch logs to understand why the singleton node is having trouble joining the cluster. These can be found at:

/var/log/morpheus/elasticsearch/current
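
The health check described above, a “green” status and three nodes, can be turned into a pass/fail test for use after a restart. The following is a minimal sketch: the function name es_health_ok is hypothetical, and it matches the two fields with grep rather than a JSON parser, which is adequate for the output format shown above but not for arbitrary JSON.

```shell
# Hypothetical check: read _cluster/health JSON on stdin; succeed only if
# status is "green" and number_of_nodes equals the expected count (default 3).
# Usage: es_health_ok [EXPECTED_NODES]
es_health_ok() {
  want=${1:-3}
  json=$(cat)
  printf '%s' "$json" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"green"' || return 1
  printf '%s' "$json" | grep -Eq "\"number_of_nodes\"[[:space:]]*:[[:space:]]*${want}([^0-9]|\$)" || return 1
}

# Example:
# curl -s 'localhost:9200/_cluster/health' | es_health_ok 3 || echo "cluster not healthy"
```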

Outside of these stateful tiers, the “morpheus-ctl status” command will not output a “run” status unless the service is successfully running. If a stateless service reports a failure to run, the logs should be investigated and/or sent to Morpheus for additional support. Logs for all Morpheus embedded services are found in /var/log/morpheus.
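
Since only a successfully running service reports “run”, the status output can be filtered to list anything that is in another state. The filter below is a sketch: the function name services_not_running is hypothetical, and it assumes each status line starts with the state followed by the service name, as in the sample output earlier in this section.

```shell
# Hypothetical filter: read `morpheus-ctl status` output on stdin and print
# the name of every service whose state is not "run". Entries for the log
# supervisor (service name "log") are ignored.
services_not_running() {
  awk 'NF >= 2 && $1 != "run:" && $2 != "log:" { sub(/:$/, "", $2); print $2 }'
}

# Example: empty output means every service reports "run".
# morpheus-ctl status | services_not_running
```

This does not replace tailing the logs, because a service can report “run” while it is still failing to join its cluster.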