Sunday, February 23, 2014

I/O Fencing



Oracle RAC: I/O Fencing

There are situations where leftover write operations from a database instance can still reach the storage system even though the cluster stack on that node has failed: the node is no longer a functioning cluster member, but it is still running at the OS level. Because these writes are no longer coordinated with the rest of the cluster, they can damage the consistency of the stored data. Therefore, when a cluster node fails, the failed node must be fenced off from all shared disk devices or disk groups. This technique is called I/O fencing, disk fencing or failure fencing.



Functions of I/O fencing 

I/O fencing prevents updates from failed instances, detects node failures, and prevents split-brain in the cluster.

The cluster volume manager and cluster file system play a significant role in preventing failed nodes from accessing shared devices. To decide which nodes need to be fenced, Oracle uses an algorithm similar to STONITH (shoot the other node in the head) implementations, in which the healthy nodes kill the sick node. Oracle Clusterware does not literally do this; instead, it simply tells the sick node "Please reboot". The node bounces itself and rejoins the cluster.

There are other methods of fencing used by different hardware and software vendors. With Veritas Storage Foundation for RAC (VxSF RAC), for example, you can implement I/O fencing instead of node fencing: rather than asking a server to reboot, you simply cut it off from shared storage.
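
As a rough illustration of the difference between the two approaches, here is a small Python sketch that contrasts node fencing (the sick node is asked to reboot itself) with I/O fencing (the sick node is cut off from shared storage, in the spirit of SCSI-3 persistent-reservation keys). All class and function names are invented for illustration; this is not Oracle or Veritas code.

    # Conceptual sketch only -- the names here are invented, not a real Oracle/Veritas API.

    class SharedStorage:
        """Models a shared disk that only registered nodes may write to."""
        def __init__(self, registered_nodes):
            self.registration_keys = set(registered_nodes)

        def write(self, node, data):
            if node not in self.registration_keys:
                raise PermissionError(f"{node} is fenced off from shared storage")
            print(f"{node} wrote {data!r}")

        def revoke_key(self, node):
            # I/O fencing: drop the node's registration so its late writes are
            # rejected, similar in spirit to a SCSI-3 "preempt and abort".
            self.registration_keys.discard(node)

    def node_fencing(sick_node):
        # Node fencing (STONITH-style): the sick node is told to reset itself
        # and rejoins the cluster after the reboot.
        print(f"{sick_node}: please reboot")

    def io_fencing(storage, sick_node):
        # I/O fencing: the sick node may keep running at the OS level,
        # but its writes can no longer reach the shared disks.
        storage.revoke_key(sick_node)

    storage = SharedStorage(["node1", "node2"])
    io_fencing(storage, "node2")
    storage.write("node1", "redo block")       # still allowed
    try:
        storage.write("node2", "stale block")  # rejected: node2 is fenced
    except PermissionError as err:
        print(err)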

In versions before 11.2.0.2, Oracle Clusterware tried to prevent a split-brain with a fast reboot (more precisely, a reset) of the affected server(s), without waiting for ongoing I/O operations to complete or for the file systems to be synchronized. This mechanism was changed in version 11.2.0.2 (the first 11g Release 2 patch set). After deciding which node to evict, the Clusterware (sketched in code after the list below):

  • attempts to shut down all Oracle resources/processes on the server (especially processes generating I/Os)

  • stops itself (the Clusterware stack) on the node

  • afterwards, the Oracle High Availability Services daemon (OHASD) tries to start the Cluster Ready Services (CRS) stack again; once the cluster interconnect is back online, all relevant cluster resources on that node start automatically

  • kills the node if stopping the resources or the processes generating I/O is not possible (for example, because they are hanging in kernel mode or in the I/O path)
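
The eviction flow described above can be sketched schematically as follows. Every function here is an illustrative stand-in for behaviour described in this post, not a real Clusterware interface.

    # Schematic of the 11.2.0.2+ "rebootless restart" flow described above.
    # All functions are illustrative stand-ins, not real Clusterware interfaces.

    def stop_oracle_resources(node):
        print(f"{node}: stopping database instances, ASM, listeners ...")
        return True   # pretend the graceful stop succeeded

    def stop_io_generating_processes(node):
        print(f"{node}: stopping remaining processes that generate I/O ...")
        return True

    def restart_crs_stack(node):
        print(f"{node}: interconnect is back, OHASD restarts the CRS stack")

    def reboot_node(node):
        print(f"{node}: resources could not be stopped, resetting the server")

    def evict_node(node):
        # Try the graceful path first; only reset the box if I/O cannot be
        # stopped (e.g. processes hanging in kernel mode or in the I/O path).
        if stop_oracle_resources(node) and stop_io_generating_processes(node):
            restart_crs_stack(node)
        else:
            reboot_node(node)

    evict_node("rac2")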

In general, Oracle Clusterware uses two rules to choose which nodes should leave the cluster in order to preserve cluster integrity (a toy sketch of these rules follows the list):


  • In configurations with two nodes, the node with the lowest node ID (the first node that joined the cluster) survives; the other node is asked to leave the cluster

  • With more cluster nodes, the Clusterware tries to keep the largest sub-cluster running
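
A toy version of these two rules might look like the sketch below. It only mirrors the decision described here, assuming that node IDs reflect the order in which nodes joined; the tie-break on the lowest node ID is an added assumption, and this is not how Clusterware actually computes the result.

    # Toy illustration of the two survivor rules described above.
    # sub_clusters: lists of node IDs that can still see each other after an
    # interconnect failure; lower IDs are assumed to have joined the cluster earlier.

    def surviving_sub_cluster(sub_clusters):
        if len(sub_clusters) == 2 and all(len(s) == 1 for s in sub_clusters):
            # Two-node cluster: the node with the lowest ID (first joiner) survives.
            return min(sub_clusters, key=lambda s: s[0])
        # Larger clusters: keep the largest sub-cluster; on a tie, prefer the
        # sub-cluster containing the lowest node ID (an assumption for this sketch).
        return max(sub_clusters, key=lambda s: (len(s), -min(s)))

    print(surviving_sub_cluster([[1], [2]]))            # -> [1]
    print(surviving_sub_cluster([[1, 2], [3, 4, 5]]))   # -> [3, 4, 5]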
