The OviOS HA Cluster Guide

OviOS Linux in an HA Cluster setup.

The recommended setup is a 3-node cluster with OviOS Linux for automatic failover.

A 2-node cluster setup is possible, but it won't allow for automatic failover. In a 2-node HA cluster setup the storage admin has to manually fail over the resources (pools and virtual IP) if the main node experiences downtime (an example is given at the end of this guide).

The following is the cluster setup that has been tested with OviOS to achieve the best results.

This use case consists of 3 OviOS Linux nodes with shared storage.

Requirements:

1. Three OviOS Linux nodes with shared storage.

This means all nodes in the cluster must use the same RAID controller, JBODs, drives, etc.

A 2-node setup has also been tested with the same configuration. The difference is that a 2-node cluster does not perform automatic failover.


2. One Resource Group containing the ZFS resource for storage pools, the virtual IP plugin and the SCSI-3 fence agent


3. The following options must be set on ALL nodes:

Run the following command on all nodes in the cluster. This will automatically enable other required options for the cluster.

ovios-shell> options cluster.enable 1
Changing option: cluster.enable ==> on
Changing option: autosync.cluster ==> on if not on already
Changing option: autosync.cluster ==> on
Changing option: skip.import ==> on if not on already
Option skip.import is already on
ovios-shell>


4. The /etc/hosts file on every node must be configured to contain each node's IP address, hostname and FQDN

EX:

192.168.86.101  cluster1
192.168.86.102  cluster2
192.168.86.103  cluster3

172.21.11.101 cluster1.localdomain
172.21.11.102 cluster2.localdomain
172.21.11.103 cluster3.localdomain
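
To confirm that the entries resolve on every node, run a quick lookup (a sketch using the example hostnames above; substitute your own):

# getent hosts cluster2
# getent hosts cluster2.localdomain

Each lookup should return the IP address configured in /etc/hosts.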


5. Passwordless authentication between ALL nodes in the cluster must be configured (for auto-sync)

Run the following command on all nodes in the cluster to allow SSH connections with root.

ovios-shell> options ssh.allow.root 1
Changing option: ssh.allow.root ==> on
ovios-shell>

Run: ssh-keygen on all nodes to generate public and private keys.

Run: ssh-copy-id -i ~/.ssh/id_rsa.pub <hostname of the remote server>

to complete passwordless authentication.

Do this for all nodes in the cluster.
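
To verify passwordless authentication, each node should be able to log in to every other node without a password prompt (a quick check using the example hostnames above; adjust to your environment):

# ssh root@cluster2 hostname
# ssh root@cluster3 hostname

If a command still asks for a password, repeat the ssh-copy-id step for that node.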


6. On each node configure a bonded interface with the SAME name for the VIP.

EX: do this on ALL nodes in the cluster!

ovios-indt:~ # bondadm -n ovios-ha -i eth1 -i eth2 -m 0
  *  Adding eth1 as slave...    [  OK  ]
  *  Adding eth2 as slave...    [  OK  ]
     Finished setting up ovios-ha
     Run : netsetup : to set up the IPs.
ovios-indt:~ #

This creates an interface named ovios-ha.
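
To verify the bond before assigning the VIP to it, check that both slaves (eth1 and eth2) are attached and the interface is up (a quick check, assuming bondadm uses the standard Linux bonding driver):

# cat /proc/net/bonding/ovios-ha
# ip link show ovios-ha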


SET UP the cluster.

Run the following commands on ALL nodes in the cluster:

  • These commands are run in the Linux shell (not the ovios-shell):
# pcs cluster setup --local --name ovios-cluster cluster1,cluster1.localdomain cluster2,cluster2.localdomain cluster3,cluster3.localdomain

The following errors can be ignored:

Shutting down pacemaker/corosync services...
sh: service: command not found
sh: service: command not found
sh: service: command not found
Killing any remaining services...
Removing all cluster configuration files...

Do not use "pcs cluster start", as pcs looks for a "service" command to start corosync and pacemaker.

OviOS Linux uses its own implementation of the service commands.

Run "cluster start" on ALL nodes to start the cluster.

Verify the cluster status: "crm_mon -1"

When the cluster is up and running, on ONLY ONE node run the following commands to set up the Resource Group:

# pcs resource create STORAGE lsb:zfs-hac is-managed=true op monitor interval=10s meta resource-stickiness=100

This creates the resource called "STORAGE" which is managed by the zfs-hac script.

# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=172.21.11.104 cidr_netmask=24 nic=ovios-ha op monitor interval=15s meta resource-stickiness=100

This creates a VIP on the bonded ovios-ha interface. The storage services will be available via this IP, which will be migrated between nodes during failover.

# pcs stonith create SCSI-RES fence_scsi devices="/dev/disk/by-path/disk1,/dev/disk/by-path/disk2" pcmk_host_list="cluster1 cluster2 cluster3" \
pcmk_host_map="cluster1=cluster1.localdomain;cluster2=cluster2.localdomain;cluster3=cluster3.localdomain" meta provides=unfencing resource-stickiness=100 power_wait=3 op monitor interval=20s

This creates a SCSI-3 persistent-reservation fence agent that protects against data corruption.
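
Once the stonith resource is running, the registrations and the active reservation on a fenced device can be inspected with sg_persist (a sketch, assuming sg3_utils is installed; the device path is the placeholder used in the command above):

# sg_persist --in --read-keys /dev/disk/by-path/disk1
# sg_persist --in --read-reservation /dev/disk/by-path/disk1

Each cluster node should show up with its own registration key.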


At this point the HA-Cluster is up and running.

Should a node import the storage pools while they are already active on another node, it will receive a reservation conflict and panic.

The devices=" " parameter must contain the drives used by the storage pools. It must include at least one drive from each pool (if multiple pools have been created).
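
To identify suitable devices, list the drives that make up each pool together with their by-path names (a sketch; pool and device names will differ on your system):

# zpool status
# ls -l /dev/disk/by-path/

Pick at least one drive from every pool and use its /dev/disk/by-path/ entry in the devices= list.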

# pcs resource group add RG SCSI-RES STORAGE VIP

This creates a resource group containing all resources.

The Resource Group will now start on one node in the cluster.
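
To confirm where the group is running, check the cluster status on any node; on the active node the VIP should also be visible on the bonded interface:

# pcs status
# ip addr show ovios-ha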

Make sure to set:

# pcs property set stonith-enabled=true


The following property must be set to "stop" on a 3-node cluster or "ignore" on a 2-node cluster:

3-node cluster:

# pcs property set no-quorum-policy=stop

2-node cluster:

# pcs property set no-quorum-policy=ignore
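
As noted at the beginning of this guide, a 2-node cluster requires manual failover. A minimal sketch, assuming the resource group is named RG as above and the surviving node is cluster2:

# pcs resource move RG cluster2

When the failed node is back and the move is no longer needed, remove the location constraint that the move created:

# pcs resource clear RG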