HAC Setup

OviOS Linux in an HA Cluster setup.

The recommended setup is a 3-node OviOS Linux cluster, which supports automatic failover.
A 2-node cluster setup is possible, but this won't allow for automatic failover.
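
One way to see the difference between the two layouts is quorum arithmetic. The sketch below assumes the usual majority rule (more than half of the nodes, one vote per node); the function names are illustrative and are not OviOS commands.

```shell
#!/bin/sh
# Majority quorum: more than half of the nodes must be reachable.
# Assumes one vote per node.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the survivors still hold quorum.
tolerated_failures() {
    echo $(( $1 - $(quorum_votes "$1") ))
}

for n in 2 3; do
    echo "$n nodes: quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under that assumption, a 3-node cluster keeps quorum after losing one node, while a 2-node cluster cannot lose any node and still hold a majority.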

The following is the cluster setup that has been tested with OviOS to achieve the best results. This use case consists of 3 OviOS Linux nodes with shared storage.


1. Three OviOS Linux nodes
A 2-node setup has also been tested with the same configuration. The difference is that a 2-node cluster doesn't do automatic failover.

2. One Resource Group containing the ZFS resource for storage pools, the virtual IP plugin and the SCSI-3 fence agent

3. The following option must be set on ALL nodes:
Run: options cluster.enable 1

4. The /etc/hosts file on each node must contain the IP, hostname and FQDN of every node

EX:  cluster1  cluster2  cluster3 cluster1.localdomain cluster2.localdomain cluster3.localdomain
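
As a sketch of what the entries could look like, the helper below prints one line per node in the order IP, hostname, FQDN. The addresses are placeholders from the 192.0.2.0/24 documentation range and must be replaced with the real node IPs; the function name and the HOSTS_DOMAIN variable are hypothetical.

```shell
#!/bin/sh
# Print one /etc/hosts line per node: IP, hostname, FQDN.
# Arguments are ip=hostname pairs. The domain defaults to localdomain
# and can be overridden with the (hypothetical) HOSTS_DOMAIN variable.
hosts_entries() {
    domain="${HOSTS_DOMAIN:-localdomain}"
    for pair in "$@"; do
        ip="${pair%%=*}"      # text before the first "="
        name="${pair#*=}"     # text after the first "="
        printf '%s\t%s\t%s.%s\n' "$ip" "$name" "$name" "$domain"
    done
}

# Placeholder addresses only; substitute the real IP of each node.
hosts_entries 192.0.2.11=cluster1 192.0.2.12=cluster2 192.0.2.13=cluster3
```

Review the output before appending it to /etc/hosts, and use the same entries on every node.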

5. Passwordless SSH authentication between ALL nodes in the cluster must be configured (for auto-sync)
Run: options ssh.allow.root on
on all nodes.
Run: ssh-keygen on all nodes to generate public and private keys.
Run: ssh-copy-id -i ~/.ssh/id_rsa.pub <hostname of the remote server>
to complete passwordless authentication.
Do this for all nodes in the cluster.
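
To keep track of which copies are still needed, a small dry-run helper can list the ssh-copy-id commands for one node. This is only a sketch: the function name is illustrative, it assumes the local node does not need its own key copied to itself, and it prints the commands instead of running them.

```shell
#!/bin/sh
# Print the ssh-copy-id command for every peer of the local node.
# $1 is the local hostname; the remaining arguments are all cluster nodes.
# Nothing is executed; the commands are only printed for review.
copy_id_commands() {
    self="$1"
    shift
    for node in "$@"; do
        # Skip the local node (assumption: only peers need the key).
        [ "$node" = "$self" ] && continue
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $node"
    done
}

# Example: commands to run on cluster1.
copy_id_commands cluster1 cluster1 cluster2 cluster3
```

Repeat with each hostname as the first argument to get the list for the other nodes.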

6. On each node, configure a bond with the SAME name for the VIP.
EX: bondadm -n eth0 -i eno23s0 -i eno23s1 -i eno32s2 -m 0
This creates an interface named eth0.
Run netsetup to set up a unique IP for eth0 on each node.

SETUP the cluster.

Run the following commands on each node in the cluster:

# pcs cluster setup --local --name ovios-cluster cluster1,cluster1.localdomain cluster2,cluster2.localdomain cluster3,cluster3.localdomain

The following errors can be ignored:
Shutting down pacemaker/corosync services...
sh: service: command not found
sh: service: command not found
sh: service: command not found
Killing any remaining services...
Removing all cluster configuration files...
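
The node arguments of the setup command above follow a hostname,FQDN pattern. For clusters with different hostnames, the argument list can be generated rather than typed by hand. The sketch below only builds and prints the command line; node_args is a hypothetical helper name and the domain is passed in explicitly.

```shell
#!/bin/sh
# Build the "hostname,FQDN" argument list used by the setup command.
# $1 is the domain; the remaining arguments are the node hostnames.
node_args() {
    domain="$1"
    shift
    args=""
    for node in "$@"; do
        # Separate entries with a single space, with no leading space.
        args="$args${args:+ }$node,$node.$domain"
    done
    echo "$args"
}

# Print the full command for review; it is not executed here.
echo "pcs cluster setup --local --name ovios-cluster $(node_args localdomain cluster1 cluster2 cluster3)"
```

The printed line matches the command shown above and can be pasted on each node.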

Do not use "pcs cluster start": pcs looks for a "service" command to start corosync and pacemaker, but OviOS Linux uses its own implementation of services.
Run "cluster start" on each node to start the cluster instead.

Verify the cluster status: "crm_mon -1"
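
For a scripted check, the number of online nodes can be counted from that output. This sketch assumes crm_mon prints a line of the form "Online: [ node1 node2 ... ]", which may vary between Pacemaker versions; online_count is a hypothetical helper that reads the text from standard input.

```shell
#!/bin/sh
# Count the nodes on the first "Online: [ ... ]" line read from stdin.
# Prints 0 when no such line is present.
online_count() {
    awk '
        /Online: \[/ {
            gsub(/.*\[|\].*/, "")   # keep only the names between the brackets
            print NF
            found = 1
            exit
        }
        END { if (!found) print 0 }
    '
}

# Intended usage on a cluster node:
#   crm_mon -1 | online_count
```

On a healthy 3-node cluster the expected count is 3.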

When the cluster is up and running, run the following commands on ONLY ONE node to set up the Resource Group:

# pcs resource create STORAGE lsb:zfs-hac is-managed=true op monitor interval=10s meta resource-stickiness=100
This creates the resource called "STORAGE" which is managed by the zfs-hac script.

pcs resource create VIP ocf:heartbeat:IPaddr2 ip= cidr_netmask=24 nic=eth0 op monitor interval=15s meta resource-stickiness=100
This creates a VIP assigned to eth0; set ip= to the virtual IP address. The storage services will be available via this IP, which is migrated between nodes during failover.

# pcs stonith create SCSI-RES fence_scsi devices="/dev/disk/by-path/disk1,/dev/disk/by-path/disk2" pcmk_host_list="cluster1 cluster2 cluster3" \
pcmk_host_map="cluster1=cluster1.localdomain;cluster2=cluster2.localdomain;cluster3=cluster3.localdomain" meta provides=unfencing resource-stickiness=100 power_wait=3 op monitor interval=20s

This creates a SCSI-3 reservation fence agent to protect against data corruption.
Should a node import the storage pools while they are already active on another node, it will receive a reservation conflict and panic.
The devices="" list must contain the drives used by the storage pools, with at least one drive from each pool if multiple pools have been created.
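
The devices and pcmk_host_map values are delimiter-separated strings, which are easy to mistype. The sketch below builds both from plain argument lists; join_devices and host_map are hypothetical helper names, and the disk paths are the placeholder paths from the command above.

```shell
#!/bin/sh
# Join device paths with commas for the devices="" option.
join_devices() {
    # The subshell keeps the IFS change local; "$*" joins on its first character.
    ( IFS=,; echo "$*" )
}

# Build "node=node.domain" entries separated by semicolons for pcmk_host_map.
# $1 is the domain; the remaining arguments are the node hostnames.
host_map() {
    domain="$1"
    shift
    map=""
    for node in "$@"; do
        map="$map${map:+;}$node=$node.$domain"
    done
    echo "$map"
}

echo "devices=\"$(join_devices /dev/disk/by-path/disk1 /dev/disk/by-path/disk2)\""
echo "pcmk_host_map=\"$(host_map localdomain cluster1 cluster2 cluster3)\""
```

Pass at least one drive from every pool to join_devices so that each pool is covered by a reservation.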

pcs resource group add RG SCSI-RES STORAGE VIP
This creates a resource group containing all resources.

The Resource Group will now start on one node in the cluster.

Make sure to set:
pcs property set stonith-enabled=true

The no-quorum-policy property must be set to "stop" on a 3-node cluster or "ignore" on a 2-node cluster:

3-node cluster:
pcs property set no-quorum-policy=stop

2-node cluster:
pcs property set no-quorum-policy=ignore
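
The choice between the two values can be captured in a small helper so that a setup script does not apply the wrong policy. This is a sketch covering only the 2-node and 3-node layouts described here; no_quorum_policy is a hypothetical function name, and the pcs command is printed rather than executed.

```shell
#!/bin/sh
# Map the cluster size to the no-quorum-policy value given above.
# Only the two tested layouts are handled; anything else is rejected.
no_quorum_policy() {
    case "$1" in
        2) echo ignore ;;
        3) echo stop ;;
        *) echo "unsupported node count: $1" >&2; return 1 ;;
    esac
}

# Print the command for a 3-node cluster.
echo "pcs property set no-quorum-policy=$(no_quorum_policy 3)"
```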