Wednesday, February 26, 2014

Configuring Transparent Application Failover (TAF)




The TAF policy can be configured either on the client side, by defining the FAILOVER_MODE parameter within the CONNECT_DATA section of the TNS connect descriptor, or on the server side, through the service attributes. When both are defined, the server-side service attributes take precedence.

The following parameters are associated with FAILOVER_MODE:

TYPE (type of failover)
  • SESSION: When a user connection is lost due to an instance crash, a new session is automatically established on a surviving instance. This type of failover does not support replay of the queries that were in progress.
  • SELECT: Re-establishes the lost user connection on a surviving instance and replays the queries that were in progress.
  • NONE: The default; provides no failover functionality.

METHOD (how the failover connection is established)
  • BASIC: Re-establishes the lost user connections at failover time. Does not require much work on the backup server until failover time.
  • PRECONNECT: Pre-establishes a connection on another instance to provide rapid failover.

DELAY: Specifies the amount of time (in seconds) to wait between connect attempts.

RETRIES: Specifies the number of times to attempt to connect after a failover.

The following types of operations do not automatically fail over and must be restarted by the application after a TAF failover:

  • Transactional statements. Transactions involving INSERT, UPDATE, or DELETE statements are not supported by TAF.
  • ALTER SESSION statements. ALTER SESSION and SQL*Plus SET statements do not fail over.
  • Temporary objects. Transactions using temporary segments in the TEMP tablespace and global temporary tables do not fail over.
  • PL/SQL package states. PL/SQL package states are lost during failover.
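
On the server side, the TAF attributes of a service can be set with the DBMS_SERVICE.MODIFY_SERVICE procedure, as in the following example for the OLTP_SERVICE service used throughout this post: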

BEGIN
  DBMS_SERVICE.MODIFY_SERVICE(
    service_name        => 'OLTP_SERVICE',
    aq_ha_notifications => TRUE,
    failover_method     => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
    failover_type       => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
    failover_retries    => 180,
    failover_delay      => 5,
    clb_goal            => DBMS_SERVICE.CLB_GOAL_LONG);
END;
/
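
On the client side, the equivalent policy can be placed in the tnsnames.ora connect descriptor. A minimal sketch, assuming a hypothetical SCAN address rac-scan.example.com listening on port 1521:

OLTP_SERVICE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = OLTP_SERVICE)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

Whether a session is protected by TAF, and whether it has already failed over, can be checked from V$SESSION:

SQL> SELECT username, failover_type, failover_method, failed_over FROM v$session;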

Tuesday, February 25, 2014

How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems (11gR2)







This document assumes that the CRS diskgroup was completely lost and that the name of the OCR diskgroup remains unchanged. If a different diskgroup name has to be used, the name of the OCR diskgroup must be modified in /etc/oracle/ocr.loc on all nodes before executing the following steps.
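
For reference, on 11gR2 this file typically contains little more than the diskgroup reference, along these lines (shown with the +CRS diskgroup name used throughout this example):

ocrconfig_loc=+CRS
local_only=FALSE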

Locate the latest automatic OCR backup

When using a non-shared CRS home, automatic OCR backups can be located on any node of the cluster, so all nodes need to be checked for the most recent backup:

$ ls -lrt $CRS_HOME/cdata/rac_cluster1/

Make sure the Grid Infrastructure is shut down on all nodes

Given that the OCR diskgroup is missing, the GI stack will not be functional on any node; however, various daemon processes may still be running. On each node, shut down the GI stack using the force (-f) option:

# $CRS_HOME/bin/crsctl stop crs -f

Start the CRS stack in exclusive mode

On the node that has the most recent OCR backup, log on as root and start CRS in exclusive mode. This mode allows ASM to start and stay up without the presence of a Voting disk and without the CRS daemon process (crsd.bin) running.

11.2.0.1:
# $CRS_HOME/bin/crsctl start crs excl

11.2.0.2 and above:
# $CRS_HOME/bin/crsctl start crs -excl -nocrs

Note
A new option, '-nocrs', was introduced with 11.2.0.2; it prevents the start of the ora.crsd resource. It is vital that this option is specified, because a failure to start the ora.crsd resource will tear down ora.cluster_interconnect.haip, which in turn will cause ASM to crash.

Label the CRS disk for ASMLIB use

If using ASMLIB, the disk to be used for the CRS disk group needs to be stamped first. As the root user:

# /usr/sbin/oracleasm createdisk ASMD40 /dev/sdh1

Create the CRS diskgroup via sqlplus

The disk group can now be (re-)created via sqlplus as the grid user. The compatible.asm attribute must be set to 11.2 in order for the disk group to be usable by CRS:

SQL> create diskgroup CRS external redundancy disk 'ORCL:ASMD40' attribute 'COMPATIBLE.ASM' = '11.2';
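
As a quick sanity check, the new disk group should show as MOUNTED in the same ASM session:

SQL> select name, state from v$asm_diskgroup;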

Restore the latest OCR backup

Now that the CRS disk group is created and mounted, the OCR can be restored. This must be done as the root user:

# cd $CRS_HOME/cdata/rac_cluster1/
# $CRS_HOME/bin/ocrconfig -restore backup00.ocr
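
The integrity of the restored OCR can then be checked with ocrcheck, still as the root user:

# $CRS_HOME/bin/ocrcheck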

Start the CRS daemon on the current node (11.2.0.1 only!)

Now that the OCR has been restored, the CRS daemon can be started; this is needed to recreate the Voting file. Skip this step on 11.2.0.2 and above.

# $CRS_HOME/bin/crsctl start res ora.crsd -init

Recreate the Voting file

The Voting file needs to be initialized in the CRS disk group:

# $CRS_HOME/bin/crsctl replace votedisk +CRS
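
To confirm that the Voting file was created in the +CRS disk group, it can be listed with:

# $CRS_HOME/bin/crsctl query css votedisk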

Shut down CRS

Since CRS is running in exclusive mode, it needs to be shut down to allow CRS to run on all nodes again. Use of the force (-f) option may be required:

# $CRS_HOME/bin/crsctl stop crs -f

Start CRS

As the root user, start CRS on all cluster nodes:

# $CRS_HOME/bin/crsctl start crs

Verify CRS

To verify that CRS is fully functional again:

# $CRS_HOME/bin/crsctl check cluster -all

How to Recreate OCR/Voting Disk Accidentally Deleted


This note applies to 10gR2 and 11gR1.

  • Shut down the Oracle Clusterware stack on all nodes as the root user, using the command crsctl stop crs.

  • Back up the entire Oracle Clusterware home.

  • Execute <CRS_HOME>/install/rootdelete.sh on all nodes.

  • Execute <CRS_HOME>/install/rootdeinstall.sh on the node that is supposed to be the first node.

The following commands should return nothing:

ps -e | grep -i 'ocs[s]d' 
ps -e | grep -i 'cr[s]d.bin' 
ps -e | grep -i 'ev[m]d.bin'
  • Execute <CRS_HOME>/root.sh on the first node.

  • After root.sh completes successfully on the first node, execute root.sh on the remaining nodes of the cluster.

  • For 10gR2, use the racgons command; for 11gR1, use the onsconfig command. onsconfig stops and starts ONS so that the changes take effect, while racgons does not, so the changes will not take effect until ONS is restarted on all nodes. Examples of each are provided below.

For 10gR2, execute the following as the owner of the CRS_HOME (generally oracle):

<CRS_HOME>/bin/racgons add_config hostname1:port hostname2:port

For 11gR1, execute the following as the owner of the CRS_HOME (generally oracle):

<CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port


  • Execute <CRS_HOME>/bin/oifcfg setif -global as the owner of the CRS_HOME (generally oracle), for example:

$ oifcfg setif -global  eth0/192.168.0.0:cluster_interconnect eth1/10.35.140.0:public
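
The resulting interface configuration can be listed afterwards with:

$ oifcfg getif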

  • Add the listener using netca. This may give errors if listener.ora already contains the entries. If so, move listener.ora from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory, if the TNS_ADMIN environment variable is defined) to /tmp and then run netca. Add all the listeners that were configured earlier.

  • Add the ASM and database resources to the OCR using the appropriate srvctl add commands, as the user who owns the ASM and database resources. Please ensure that this is not run as the root user (see the sketch after this list).

  • Add the instances and services using the appropriate srvctl add commands. Please refer to the documentation for the exact syntax.
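
As a rough sketch of the 10gR2/11gR1 srvctl syntax, assuming a two-node cluster (the node names, instance names, database name, service name and Oracle homes below are placeholders and must be replaced with the actual values):

$ srvctl add asm -n node1 -i +ASM1 -o /u01/app/oracle/product/10.2.0/asm
$ srvctl add asm -n node2 -i +ASM2 -o /u01/app/oracle/product/10.2.0/asm
$ srvctl add database -d ORCL -o /u01/app/oracle/product/10.2.0/db_1
$ srvctl add instance -d ORCL -i ORCL1 -n node1
$ srvctl add instance -d ORCL -i ORCL2 -n node2
$ srvctl add service -d ORCL -s OLTP -r ORCL1,ORCL2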