HAC Setup

OviOS Linux in an HA cluster setup.

The recommended setup is a 3-node cluster with OviOS Linux for automatic failover.
A 2-node cluster setup is possible, but it won't allow for automatic failover.

The following is the cluster setup that has been tested with OviOS to achieve the best results.

Requirements:

1. Three OviOS Linux nodes
A 2-node setup has also been tested with the same configuration; the difference is that a 2-node cluster doesn't do automatic failover.

2. One Resource Group containing the ZFS resource for storage pools, the virtual IP plugin and the SCSI-3 fence agent

3. The following options set in /etc/sysconfig/ovios.conf (on ALL nodes)
In version 2.10, run: options cluster.enable 1
and skip to the next step.

In previous versions, enable the cluster manually:

This option ensures pools are not imported by zfs-admin when nodes start.
Pools will be imported by the cluster.
If this option is not set to on, data corruption will occur.
SKIP_IMPORT_POOLS=on

This option ensures the cluster is started when the system starts
HAC_ENABLED=on

This option ensures the ovios shell autosyncs the configuration on all (available) nodes in the cluster, every time the admin
changes something in the SMB or iSCSI configuration.
HAC_AUTOSYNC=on

If more files than the default need to be synced in the cluster, add them here:
SYNC_FILES=" "
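
EX: on a pre-2.10 node the cluster section of /etc/sysconfig/ovios.conf would then contain the following (the path listed in SYNC_FILES is only an illustration of the syntax, not a required entry):

SKIP_IMPORT_POOLS=on
HAC_ENABLED=on
HAC_AUTOSYNC=on
SYNC_FILES="/etc/my-extra.conf"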

4. Passwordless authentication between ALL nodes in the cluster must be configured (for auto-sync)
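
EX: with the standard OpenSSH tools, and assuming root is the account used for cluster administration, key-based login can be set up on cluster1 like this (repeat from every node to every other node; if ssh-copy-id is not available, append the public key to ~/.ssh/authorized_keys manually):

ssh-keygen -t rsa
ssh-copy-id root@cluster2
ssh-copy-id root@cluster3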

5. The /etc/hosts file on each node must be configured to contain each node's IP, hostname and FQDN

EX:
192.168.86.101  cluster1
192.168.86.102  cluster2
192.168.86.103  cluster3

172.21.11.101 cluster1.localdomain
172.21.11.102 cluster2.localdomain
172.21.11.103 cluster3.localdomain
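
A quick way to verify resolution once /etc/hosts is in place on each node (getent is part of the standard glibc tools):

getent hosts cluster2
getent hosts cluster2.localdomain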

6. On each node, configure a bond with the SAME name for the VIP.
EX: bondadm -n eth0 -i eno23s0 -i eno23s1 -i eno32s2 -m 0
This creates an interface named eth0.
Run netsetup to set up a unique IP for eth0 on each node.
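
EX: after netsetup has run on every node, confirm that each node received a distinct address on the bond (assuming the standard iproute2 tools are available):

ip -4 addr show eth0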

SET UP the cluster.

Run the following commands on each node in the cluster:

# pcs cluster setup --local --name ovios-cluster cluster1,cluster1.localdomain cluster2,cluster2.localdomain cluster3,cluster3.localdomain

The following errors can be ignored:
Shutting down pacemaker/corosync services...
sh: service: command not found
sh: service: command not found
sh: service: command not found
Killing any remaining services...
Removing all cluster configuration files...
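
After the setup command completes, it is worth confirming that the corosync configuration was written on each node and lists all three nodes (the path below assumes the standard corosync location):

cat /etc/corosync/corosync.conf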

Do not use "pcs cluster start": pcs looks for a "service" command to start corosync and pacemaker, and OviOS Linux uses its own
implementation of services.
Run "cluster start" on each node to start the cluster.

Verify the cluster status: "crm_mon -1"

When the cluster is up and running, on ONLY ONE node run the following commands to set up the Resource Group:


# pcs resource create STORAGE lsb:zfs-hac is-managed=true op monitor interval=10s meta resource-stickiness=100
This creates the resource called "STORAGE" which is managed by the zfs-hac script.

# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=172.21.11.104 cidr_netmask=24 nic=eth0 op monitor interval=15s meta resource-stickiness=100
This creates a VIP assigned to eth0. The storage services will be available via this IP and will be migrated between nodes during failover.


# pcs stonith create SCSI-RES fence_scsi devices="/dev/disk/by-path/disk1,/dev/disk/by-path/disk2" pcmk_host_list="cluster1 cluster2 cluster3" \
pcmk_host_map="cluster1=cluster1.localdomain;cluster2=cluster2.localdomain;cluster3=cluster3.localdomain" meta provides=unfencing resource-stickiness=100 power_wait=3 op monitor interval=20s

This creates a SCSI-3 reservation fence agent to provide protection against data corruption.
Should a node import the storage pools while they are already active on another node, it will receive a reservation conflict and panic.
The devices=" " list must contain the drives used by the storage pools, with at least one drive from each pool (if multiple pools have been created).
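
EX: one way to collect the device paths for the devices=" " list, assuming a pool named pool1 (the pool name here is only an illustration). Recent ZFS releases print full device paths with -P:

zpool status -P pool1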

# pcs resource group add RG SCSI-RES STORAGE VIP
This creates a resource group containing all resources. Resources in the group start in the listed order (fencing first, then storage, then the VIP) and stop in reverse order.

The Resource Group will now start on one node in the cluster.
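
Once the group is running, crm_mon -1 shows which node is hosting RG, and on that node the VIP appears as an additional address on the bond (iproute2 assumed):

ip -4 addr show eth0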

Make sure to set:
pcs property set stonith-enabled=true

The following option must be set to "stop" on a 3-node cluster or "ignore" on a 2-node cluster:

3-node cluster:
pcs property set no-quorum-policy=stop

2-node cluster:

pcs property set no-quorum-policy=ignore
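
To confirm both properties were applied cluster-wide, list the configured values (assuming the pcs version in use provides the property list subcommand):

pcs property list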