Oracle RAC: I/O Fencing
There are situations where leftover write operations from a database instance reach the storage system after the cluster function on that node has failed, even though the node is still running at the OS level. Since these operations are no longer in serial order, they can damage the consistency of the stored data. Therefore, when a cluster node fails, the failed node must be fenced off from all shared disk devices or disk groups. This technique is called I/O fencing, disk fencing, or failure fencing.
Functions of I/O fencing
I/O fencing prevents updates by failed instances, detects node failure, and prevents split-brain in the cluster.
The cluster volume manager and cluster file system play a significant role in preventing failed nodes from accessing shared devices. Oracle uses an algorithm common to STONITH (shoot the other node in the head) implementations to determine which nodes need to be fenced. In a classic STONITH setup, the healthy nodes kill the sick node. Oracle Clusterware does not do this directly; instead, it simply sends a "please reboot" message to the sick node. The node bounces itself and rejoins the cluster.
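To make this concrete, here is a minimal Python sketch of a STONITH-style fencing decision based on missed heartbeats. This is not Oracle's actual implementation; the timeout value, node names, and timestamps are assumptions for illustration:

```python
import time

# Assumed value: seconds without a heartbeat before a node is considered sick.
HEARTBEAT_TIMEOUT = 30

# Hypothetical heartbeat table: node name -> timestamp of last heartbeat seen.
last_heartbeat = {
    "rac1": time.time(),
    "rac2": time.time() - 45,  # missed heartbeats -> fencing candidate
}

def nodes_to_fence(heartbeats, now=None):
    """Return nodes whose last heartbeat is older than the timeout."""
    now = now or time.time()
    return [node for node, ts in heartbeats.items()
            if now - ts > HEARTBEAT_TIMEOUT]

for node in nodes_to_fence(last_heartbeat):
    # A classic STONITH implementation would power-cycle the node here;
    # Oracle Clusterware instead asks the node to reboot itself.
    print(f"fencing candidate: {node}")
```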
Other fencing methods are used by various hardware/software vendors. With Veritas Storage Foundation for RAC (VxSF RAC), for example, you can implement I/O fencing instead of node fencing: rather than asking a server to reboot, you simply cut it off from shared storage.
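Veritas bases its I/O fencing on SCSI-3 persistent reservations: each node registers a key on the shared disks, and ejecting a node's key makes the storage reject that node's writes. The following toy Python model illustrates the idea only; the class, keys, and method names are made up, not a real SCSI interface:

```python
class SharedDisk:
    """Toy model of a disk that honors per-node registration keys."""

    def __init__(self):
        self.registered_keys = set()

    def register(self, key):
        self.registered_keys.add(key)

    def eject(self, key):
        # Revoking the key is the fencing step: further writes with it fail.
        self.registered_keys.discard(key)

    def write(self, key, data):
        if key not in self.registered_keys:
            raise PermissionError(f"key {key!r} is fenced off from this disk")
        print(f"write accepted from {key!r}: {data}")

disk = SharedDisk()
disk.register("node1-key")
disk.register("node2-key")

disk.eject("node2-key")            # fence node 2 off from storage
disk.write("node1-key", "redo")    # the healthy node keeps writing
try:
    disk.write("node2-key", "stale redo")  # the fenced node's I/O is rejected
except PermissionError as e:
    print(e)
```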
In versions before 11.2.0.2, Oracle Clusterware tried to prevent a split-brain with a fast reboot (more precisely: a reset) of the server(s), without waiting for ongoing I/O operations or synchronization of the file systems. This mechanism was changed in version 11.2.0.2 (the first 11g Release 2 patch set). After deciding which node to evict, the Clusterware (sketched in code after this list):
- attempts to shut down all Oracle resources/processes on the server (especially processes generating I/O)
- stops itself on the node
- afterwards, the Oracle High Availability Services daemon (OHASD) will try to start the Cluster Ready Services (CRS) stack again; once the cluster interconnect is back online, all relevant cluster resources on that node start automatically
- kills the node if stopping the resources or the processes generating I/O is not possible (e.g., hanging in kernel mode or stuck in the I/O path)
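A minimal Python sketch of that decision flow, with placeholder functions that merely print what Clusterware would do (none of these are real Oracle APIs):

```python
def stop_io_generating_resources(node):
    """Attempt a clean stop of Oracle resources/processes on the node.

    Returns False when processes hang (kernel mode, stuck in the I/O
    path) and cannot be stopped.
    """
    print(f"{node}: stopping database instances, ASM, listeners ...")
    return True  # pretend the clean stop succeeded in this example

def evict(node):
    if stop_io_generating_resources(node):
        print(f"{node}: stopping the Clusterware stack on the node")
        # OHASD stays up and retries starting the CRS stack; once the
        # cluster interconnect is back online, the node's cluster
        # resources start again automatically.
        print(f"{node}: OHASD will restart the CRS stack when possible")
    else:
        # Fallback to the pre-11.2.0.2 behavior: a fast reset of the
        # server, without waiting for outstanding I/O or fs sync.
        print(f"{node}: killing (rebooting) the node")

evict("rac2")
```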
Generally, Oracle Clusterware uses two rules to choose which nodes should leave the cluster to preserve cluster integrity (illustrated in the sketch after this list):
- In a two-node configuration, the node with the lowest node ID survives (the first node that joined the cluster); the other node is asked to leave the cluster
- With more cluster nodes, the Clusterware tries to keep the largest sub-cluster running
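As a rough illustration, this Python sketch applies the two rules to a set of sub-clusters (the groups of nodes that can still reach each other after an interconnect failure). The node IDs and the lowest-node-ID tie-break for equally sized sub-clusters are assumptions for the example:

```python
def surviving_sub_cluster(sub_clusters):
    """Pick the sub-cluster that survives; all other nodes are evicted."""
    if len(sub_clusters) == 2 and all(len(s) == 1 for s in sub_clusters):
        # Two-node split: the node with the lowest node ID (the first
        # node that joined the cluster) survives.
        return min(sub_clusters, key=min)
    # Otherwise keep the largest sub-cluster; the tie-break by lowest
    # node ID is an assumption for this sketch.
    return max(sub_clusters, key=lambda s: (len(s), -min(s)))

# A 5-node cluster splits into {1, 3} and {2, 4, 5}.
print(surviving_sub_cluster([{1, 3}, {2, 4, 5}]))  # -> {2, 4, 5}
# A 2-node cluster splits into {1} and {2}.
print(surviving_sub_cluster([{1}, {2}]))           # -> {1}
```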