<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on "About Oracle RAC Node Eviction"	</title>
	<atom:link href="http://www.killdb.com/2011/09/25/%E5%85%B3%E4%BA%8Eoracle-rac%E8%8A%82%E7%82%B9%E7%9A%84%E9%A9%B1%E9%80%90/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.killdb.com/2011/09/25/%e5%85%b3%e4%ba%8eoracle-rac%e8%8a%82%e7%82%b9%e7%9a%84%e9%a9%b1%e9%80%90/</link>
	<description>Phone: 18180207355. Professional Oracle/MySQL/PostgreSQL data recovery, performance optimization, migration and upgrade, and emergency rescue services</description>
	<lastBuildDate>Sat, 08 Oct 2011 13:42:48 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.3.18</generator>
			<item>
				<title>
				By: roger				</title>
				<link>http://www.killdb.com/2011/09/25/%e5%85%b3%e4%ba%8eoracle-rac%e8%8a%82%e7%82%b9%e7%9a%84%e9%a9%b1%e9%80%90/#comment-156</link>
		<dc:creator><![CDATA[roger]]></dc:creator>
		<pubDate>Sun, 25 Sep 2011 09:10:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.killdb.com/?p=511#comment-156</guid>
					<description><![CDATA[10g RAC: Steps To Increase CSS Misscount, Reboottime and Disktimeout [ID 284752.1] 

--------------------------------------------------------------------------------
 
  Modified: 26-OCT-2010     Type: BULLETIN     Status: PUBLISHED   


PURPOSE
-------
The purpose of this note is to document the steps needed to modify the CSS 
misscount, reboottime and  disktimeout settings.  Please review Note 294430.1 
to understand the implications before editing these settings.
 
SCOPE &#038; APPLICATION
-------------------
Customers should not modify  CSS settings unless guided by either Oracle support or Oracle development to do so.


Steps To Modify The CSS Misscount
--------------------------------

  1) Shut down CRS on all but one node. For exact steps use Note 309542.1 
  2) Execute crsctl as root to modify the misscount:
     $ORA_CRS_HOME/bin/crsctl set css misscount <n>
     where <n> is the maximum i/o latency to the voting disk + 1 second
  3) Reboot the node where adjustment was made
  4) Start all other nodes shutdown in step 1
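
As an illustrative sketch of steps 1-4 (the ORA_CRS_HOME path, the latency figure, and node handling are assumptions; this dry run only prints the commands it would execute on a real cluster):

```shell
# Illustrative dry run of the misscount change procedure (steps 1-4 above).
# ORA_CRS_HOME and the measured latency are assumed example values.
ORA_CRS_HOME=${ORA_CRS_HOME:-/u01/crs}
MAX_VOTING_DISK_LATENCY=59                       # measured max i/o latency to the voting disk, seconds (example)
NEW_MISSCOUNT=$((MAX_VOTING_DISK_LATENCY + 1))   # per the note: max latency + 1 second

run() { echo "WOULD RUN: $*"; }                  # dry run; on a real cluster, execute as root instead

run "crsctl stop crs"                                            # step 1: on all but one node (Note 309542.1)
run "$ORA_CRS_HOME/bin/crsctl set css misscount $NEW_MISSCOUNT"  # step 2: as root on the remaining node
run "reboot"                                                     # step 3: reboot the node where the change was made
run "crsctl start crs"                                           # step 4: on the nodes shut down in step 1
```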

With the Patch:4896338 for 10.2.0.1 there are two additional settings that can 
be tuned.  This change is incorporated into the 10.2.0.2 and 10.1.0.6 patchsets.   


The following are only relevant on 10.2.0.1 with Patch:4896338.
In addition to MissCount, CSS now has two more parameters:
  1) reboottime (default 3 seconds) - the amount of time allowed for a node 
     to complete a reboot after the CSS daemon has been evicted (i.e. how 
     long it takes for the machine to completely shut down when you do a 
     reboot)
  2) disktimeout (default 200 seconds) - the maximum amount of time allowed 
     for a voting file I/O to complete; if this time is exceeded the voting 
     disk will be marked as offline.  Note that this is also the amount of 
     time that will be required for initial cluster formation, i.e. when no 
     nodes have previously been up and in a cluster.

       $CRS_HOME/bin/crsctl set css reboottime <r> [-force]  (<r> is seconds)
       $CRS_HOME/bin/crsctl set css disktimeout <d> [-force] (<d> is seconds)

Confirm the new CSS misscount setting via ocrdump.
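
A hedged dry-run sketch of setting the two additional parameters and confirming via ocrdump (the CRS_HOME path and dump file location are assumptions; commands are only printed, not executed):

```shell
# Illustrative dry run for the two additional CSS settings (defaults from the note).
CRS_HOME=${CRS_HOME:-/u01/crs}
REBOOTTIME=3       # seconds; default per the note
DISKTIMEOUT=200    # seconds; default per the note

run() { echo "WOULD RUN: $*"; }   # dry run only; real changes require root and an outage window

run "$CRS_HOME/bin/crsctl set css reboottime $REBOOTTIME -force"
run "$CRS_HOME/bin/crsctl set css disktimeout $DISKTIMEOUT -force"
# Confirm the new misscount setting via ocrdump, as the note suggests
run "$CRS_HOME/bin/ocrdump /tmp/ocr.dmp"
run "grep -i misscount /tmp/ocr.dmp"
```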

RELATED DOCUMENTS
-----------------
Note:294430.1 CSS Timeout Computation in 10g RAC (10gR1 and 10gR2)
Note:309542.1 How to start/stop 10g CRS ClusterWare]]></description>
		<content:encoded><![CDATA[<p>10g RAC: Steps To Increase CSS Misscount, Reboottime and Disktimeout [ID 284752.1] </p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>  Modified: 26-OCT-2010     Type: BULLETIN     Status: PUBLISHED   </p>
<p>PURPOSE<br />
&#8212;&#8212;-<br />
The purpose of this note is to document the steps needed to modify the CSS<br />
misscount, reboottime and  disktimeout settings.  Please review Note 294430.1<br />
to understand the implications before editing these settings.</p>
<p>SCOPE &amp; APPLICATION<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
Customers should not modify  CSS settings unless guided by either Oracle support or Oracle development to do so.</p>
<p>Steps To Modify The CSS Misscount<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>  1) Shut down CRS on all but one node. For exact steps use Note 309542.1<br />
  2) Execute crsctl as root to modify the misscount:<br />
     $ORA_CRS_HOME/bin/crsctl set css misscount &lt;n&gt;<br />
     where &lt;n&gt; is the maximum i/o latency to the voting disk + 1 second<br />
  3) Reboot the node where adjustment was made<br />
  4) Start all other nodes shutdown in step 1</p>
<p>With the Patch:4896338 for 10.2.0.1 there are two additional settings that can<br />
be tuned.  This change is incorporated into the 10.2.0.2 and 10.1.0.6 patchsets.   </p>
<p>The following are only relevant on 10.2.0.1 with Patch:4896338.<br />
In addition to MissCount, CSS now has two more parameters:<br />
  1) reboottime (default 3 seconds) &#8211; the amount of time allowed for a node<br />
     to complete a reboot after the CSS daemon has been evicted. (I.E. how<br />
     long does it take for the machine to completely shutdown when you do a<br />
     reboot)<br />
  2) disktimeout (default 200 seconds) &#8211; the maximum amount of time allowed<br />
     for a voting file I/O to complete; if this time is exceeded the voting<br />
     disk will be marked as offline.  Note that this is also the amount of<br />
     time that will be required for initial cluster formation, i.e. when no<br />
     nodes have previously been up and in a cluster.</p>
<p>       $CRS_HOME/bin/crsctl set css reboottime &lt;r&gt; [-force]  (&lt;r&gt; is seconds)<br />
       $CRS_HOME/bin/crsctl set css disktimeout &lt;d&gt; [-force] (&lt;d&gt; is seconds)</p>
<p>Confirm the new css  misscount setting via ocrdump</p>
<p>RELATED DOCUMENTS<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<br />
Note:294430.1 CSS Timeout Computation in 10g RAC (10gR1 and 10gR2)<br />
Note:309542.1 How to start/stop 10g CRS ClusterWare</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: roger				</title>
				<link>http://www.killdb.com/2011/09/25/%e5%85%b3%e4%ba%8eoracle-rac%e8%8a%82%e7%82%b9%e7%9a%84%e9%a9%b1%e9%80%90/#comment-155</link>
		<dc:creator><![CDATA[roger]]></dc:creator>
		<pubDate>Sun, 25 Sep 2011 09:08:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.killdb.com/?p=511#comment-155</guid>
					<description><![CDATA[Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot [ID 395878.1] 

--------------------------------------------------------------------------------
 
  Modified: 18-AUG-2011     Type: BULLETIN     Status: PUBLISHED   

In this Document
  Purpose
  Scope and Application
  Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot
     Disk / Storage Hardware  I/O Timeout Values
     Linux OS Software I/O Timeout Values
     Oracle Cluster File System 2 (OCFS2) Heartbeat Timeouts
     Real Application Clusters / Cluster Synchronization Services Timeouts
  References



--------------------------------------------------------------------------------



Applies to: 
Linux OS - Version: 2.6 to 2.6
Linux OS - Version: 1.2.0-1 to 1.2.1-1   [Release: OCFS2 to OCFS2]
Linux OS - Version: 2.6 to 2.6
Oracle Server - Enterprise Edition - Version: 10.2.0.2 and later    [Release: 10.2 and later]
Linux x86
Linux x86-64
***Checked for relevance on 18-Aug-2011*** 
Linux Kernel - Version: 2.6 to 2.6
Linux Kernel - Version: 2.6 to 2.6
Linux Kernel - Version: 1.2 to 1.2.1
OCFS2 1.2.1 and up
Oracle Real Application Clusters 10.2.0.2 and up 
Purpose
This document aims to guide customers to configure the heartbeat, voting or quorum related timeouts for their cluster configuration with:

Linux with 2.6 Kernel 
OCFS2 

Real Application Clusters 

Scope and Application
There are other documents providing technical information about how to configure the RAC/CSS voting/heartbeat timeouts and the OCFS2 heartbeat timeouts based on storage-layer I/O timeout values. This document does not aim to replicate that information, but it covers pointers to configure the complete stack of components.

This document does not apply to RAC configurations with Oracle&#039;s Automatic Storage Manager (ASM).

Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot
To be able to understand the nature of the timeout related settings, we need to approach each component in the cluster stack from bottom to top, i.e. from hardware to application level. The following sections are laid out from that perspective.

Disk / Storage Hardware  I/O Timeout Values
The modern storage systems and Storage Area Networks (SAN) have inherent failover capability when a hardware failure happens. There are various RAID (at different levels) and multipathing solutions provided by different storage system vendors to provide transparent failover for:

Internal disk failures 
Host Bus Adapter (HBA) failures 
Network connection failures (Network Attached Storage) 
Fiber switch failures 
SCSI connection failures 
etc. 
Each configuration, model, and brand of hardware from different vendors has some maximum time to perform a failover for different specific types of failures. The maximum timeout for the whole storage hardware stack provided to the computing server hardware is the greatest of all these maximums. We define that number (in seconds) as HW_STORAGE_TIMEOUT.

Most multipath solutions have a timeout ranging from 60 secs to 120 secs. This value affects all the components below that depend on the storage hardware.

Linux OS Software I/O Timeout Values
The RAID and Multi Pathing solutions for storage failures can also be implemented in the software layer. For a clustered configuration and for RAC/CRS, this software layer must be cluster-aware. Note that &quot;md&quot; devices on Linux are not cluster-aware. Cluster-aware tools based on device-mapper include device-mapper-multipath, LVM2, and EVMS.

Note that software multipathing or RAID is not a good practice for a production cluster: it cannot be as reliable as dedicated hardware, and it imposes a performance impact on the operating system, notably by consuming part of the CPU resources.

In case such a configuration is used, we need to know the greatest maximum failover time. device-mapper-multipath (dm-multipath) allows the system to re-route I/O requests from failed paths to available paths; an I/O path generally refers to a connection from an initiator port to a target port. When an outage happens, the failover time of device-mapper-multipath mainly depends on two things:
1. How long the low-level driver underneath it takes to detect that the path is broken.
2. The interval at which dm-multipath itself polls the paths.

For 1, there is ordinarily a timeout setting for the commands sent from the initiator to the target; when that timeout elapses, dm-multipath is notified.

For 2, see /etc/multipath.conf:

    polling_interval        10

Please refer to /usr/share/doc/device-mapper-multipath-x.x.x/multipath.conf.annotated for this interval. For other multipath software, please refer to the application-specific documentation for any reference to the maximum failover time.

We define the time (in seconds) needed for the low-level driver timeout plus the multipath polling interval as SW_STORAGE_TIMEOUT.
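
The composition of SW_STORAGE_TIMEOUT and its check against the hardware value can be sketched numerically (all figures below are assumed examples, not recommendations):

```shell
# Example numbers only; measure your own driver timeout and polling interval.
LOW_LEVEL_DRIVER_TIMEOUT=120   # seconds for the driver to detect a broken path (assumed example)
POLLING_INTERVAL=10            # polling_interval from /etc/multipath.conf (default shown above)
HW_STORAGE_TIMEOUT=120         # greatest failover maximum of the storage hardware stack (assumed example)

# Per the note: SW_STORAGE_TIMEOUT = low-level driver timeout + multipath polling interval
SW_STORAGE_TIMEOUT=$((LOW_LEVEL_DRIVER_TIMEOUT + POLLING_INTERVAL))

# The note requires SW_STORAGE_TIMEOUT > HW_STORAGE_TIMEOUT
if [ "$SW_STORAGE_TIMEOUT" -gt "$HW_STORAGE_TIMEOUT" ]; then
  echo "OK: SW_STORAGE_TIMEOUT ($SW_STORAGE_TIMEOUT) exceeds HW_STORAGE_TIMEOUT ($HW_STORAGE_TIMEOUT)"
fi
```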

The setting should satisfy:

SW_STORAGE_TIMEOUT &#062; HW_STORAGE_TIMEOUT

Oracle Cluster File System 2 (OCFS2) Heartbeat Timeouts
Every node writes every two secs to its block in the heartbeat system file. The block offset is equal to its global node number. So node 0 writes to the first block, node 1 to the second, etc. All the nodes also read the heartbeat sysfile every two secs. As long as the timestamp is changing, that node is deemed alive. An active node is deemed dead if it does not update its timestamp for O2CB_HEARTBEAT_THRESHOLD (default=7 or 31, depending on the OCFS2 version) loops. Once a node is deemed dead, the surviving node that manages to cluster-lock the dead node&#039;s journal recovers it by replaying the journal.

The setting should satisfy:

O2CB_HEARTBEAT_THRESHOLD &#062;= ((max(HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT) / 2) + 1)

Note that any change in O2CB_HEARTBEAT_THRESHOLD requires that the o2cb services on all cluster nodes are stopped at once and then started again. For how to set this parameter and for more information, please refer to Note 377616.1 and Note 391771.1.
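
As a numeric sketch of this bound (the storage timeout values are assumed examples; the division by 2 reflects the two-second heartbeat loop described above):

```shell
# Example values; use your own measured storage timeouts.
HW_STORAGE_TIMEOUT=120
SW_STORAGE_TIMEOUT=130

# max(HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)
if [ "$SW_STORAGE_TIMEOUT" -gt "$HW_STORAGE_TIMEOUT" ]; then
  MAX_TIMEOUT=$SW_STORAGE_TIMEOUT
else
  MAX_TIMEOUT=$HW_STORAGE_TIMEOUT
fi

# Per the note: O2CB_HEARTBEAT_THRESHOLD >= (max / 2) + 1
# (the threshold counts two-second heartbeat loops, hence the division by 2)
O2CB_HEARTBEAT_THRESHOLD=$((MAX_TIMEOUT / 2 + 1))
echo "O2CB_HEARTBEAT_THRESHOLD must be at least $O2CB_HEARTBEAT_THRESHOLD"
```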

Real Application Clusters / Cluster Synchronization Services Timeouts
When the Oracle Cluster Registry (OCR) and/or the voting disk is on an OCFS2 volume, the disk timeout for CSS should be set accordingly. Where OCFS2 is not used, only the HW and SW storage timeout values apply. 

The Oracle Clusterware has two heartbeat mechanisms:

Disk heartbeat (voting device) - IOT

Network heartbeat (across the interconnect) - misscount

The disk heartbeat is important here because the voting disk is on an OCFS2 volume. On different versions of RAC/CRS, the internal I/O timeout (IOT) of CSS depends differently on the CSS misscount (network heartbeat failures) value. Please see Note 294430.1 for details. 

It is not recommended to change the misscount; with 10gR2 (10.2.0.2), a new setting named disktimeout was introduced. The value of IOT is:

same as misscount, if the cluster is being formed initially - i.e. the cluster nodes are starting up from scratch 
same as disktimeout, at all other times

The setting has been backported to previous versions too; see Note 294430.1 for details. The setting is available with:

10.2.0.2 
10.2.0.1 + Patch 4896338 
10.1.0.5 + CRS Bundle Patch #1 
10.1.0.4 + CRS Bundle Patch #2

To summarize, the following should be satisfied:

IOT &#062; max((O2CB_HEARTBEAT_THRESHOLD - 1) * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)

e.g. for CSS 10.2.0.2: 

disktimeout &#062; max((O2CB_HEARTBEAT_THRESHOLD - 1) * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)

For the default values of the settings, please see Note 294430.1. For how to set and check the values, please see Note 284752.1.]]></description>
		<content:encoded><![CDATA[<p>Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot [ID 395878.1] </p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>  Modified: 18-AUG-2011     Type: BULLETIN     Status: PUBLISHED   </p>
<p>In this Document<br />
  Purpose<br />
  Scope and Application<br />
  Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot<br />
     Disk / Storage Hardware  I/O Timeout Values<br />
     Linux OS Software I/O Timeout Values<br />
     Oracle Cluster File System 2 (OCFS2) Heartbeat Timeouts<br />
     Real Application Clusters / Cluster Synchronization Services Timeouts<br />
  References</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>Applies to:<br />
Linux OS &#8211; Version: 2.6 to 2.6<br />
Linux OS &#8211; Version: 1.2.0-1 to 1.2.1-1   [Release: OCFS2 to OCFS2]<br />
Linux OS &#8211; Version: 2.6 to 2.6<br />
Oracle Server &#8211; Enterprise Edition &#8211; Version: 10.2.0.2 and later    [Release: 10.2 and later]<br />
Linux x86<br />
Linux x86-64<br />
***Checked for relevance on 18-Aug-2011***<br />
Linux Kernel &#8211; Version: 2.6 to 2.6<br />
Linux Kernel &#8211; Version: 2.6 to 2.6<br />
Linux Kernel &#8211; Version: 1.2 to 1.2.1<br />
OCFS2 1.2.1 and up<br />
Oracle Real Application Clusters 10.2.0.2 and up<br />
Purpose<br />
This document aims to guide customers to configure the heartbeat, voting or quorum related timeouts for their cluster configuration with:</p>
<p>Linux with 2.6 Kernel<br />
OCFS2 </p>
<p>Real Application Clusters </p>
<p>Scope and Application<br />
There are other documents that are providing technical information about how to configure the RAC/CSS voting/heartbeat timeouts, OCFS2 heartbeat timeouts based on storage layer I/O timeout values. This document does not aim to replicate the information but it covers pointers to configure the complete stack of components.</p>
<p>This document does not apply to RAC configurations with Oracle&#8217;s Automatic Storage Manager (ASM).</p>
<p>Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot<br />
To be able to understand the nature of the timeout related settings, we need to approach each component in the cluster stack from bottom to top, i.e. from hardware to application level. The following sections are laid out from that perspective.</p>
<p>Disk / Storage Hardware  I/O Timeout Values<br />
The modern storage systems and Storage Area Networks (SAN) have inherent failover capability when a hardware failure happens. There are various RAID (at different levels) and multipathing solutions provided by different storage system vendors to provide transparent failover for:</p>
<p>Internal disk failures<br />
Host Bus Adapter (HBA) failures<br />
Network connection failures (Network Attached Storage)<br />
Fiber switch failures<br />
SCSI connection failures<br />
etc.<br />
Each configuration, model, and brand of hardware from different vendors has some maximum time to perform a failover for different specific types of failures. The maximum timeout for the whole storage hardware stack provided to the computing server hardware is the greatest of all these maximums. We define that number (in seconds) as HW_STORAGE_TIMEOUT.</p>
<p>Most multipath solutions have a timeout ranging from 60 secs to 120 secs. This value affects all the components below that depend on the storage hardware.</p>
<p>Linux OS Software I/O Timeout Values<br />
The RAID and Multi Pathing solutions for storage failures can also be implemented in the software layer. For a clustered configuration and for RAC/CRS, this software layer must be cluster-aware. Note that &#8220;md&#8221; devices on Linux are not cluster-aware. Cluster-aware tools based on device-mapper include device-mapper-multipath, LVM2, and EVMS.</p>
<p>Note that software multipathing or RAID is not a good practice for a production cluster: it cannot be as reliable as dedicated hardware, and it imposes a performance impact on the operating system, notably by consuming part of the CPU resources.</p>
<p>In case such a configuration is used, we need to know the greatest maximum failover time. device-mapper-multipath (dm-multipath) allows the system to re-route I/O requests from failed paths to available paths; an I/O path generally refers to a connection from an initiator port to a target port. When an outage happens, the failover time of device-mapper-multipath mainly depends on two things:<br />
1. How long the low-level driver underneath it takes to detect that the path is broken.<br />
2. The interval at which dm-multipath itself polls the paths.</p>
<p>For 1, there is ordinarily a timeout setting for the commands sent from the initiator to the target; when that timeout elapses, dm-multipath is notified.</p>
<p>For 2, see the /etc/multipath.conf:</p>
<p>    polling_interval        10</p>
<p>Please refer to /usr/share/doc/device-mapper-multipath-x.x.x/multipath.conf.annotated for this interval. For other multipath software, please refer to the application-specific documentation for any reference to the maximum failover time.</p>
<p>We define the time (in seconds) needed for the low-level driver timeout plus the multipath polling interval as SW_STORAGE_TIMEOUT.</p>
<p>The setting should satisfy:</p>
<p>SW_STORAGE_TIMEOUT &gt; HW_STORAGE_TIMEOUT</p>
<p>Oracle Cluster File System 2 (OCFS2) Heartbeat Timeouts<br />
Every node writes every two secs to its block in the heartbeat system file. The block offset is equal to its global node number. So node 0 writes to the first block, node 1 to the second, etc. All the nodes also read the heartbeat sysfile every two secs. As long as the timestamp is changing, that node is deemed alive. An active node is deemed dead if it does not update its timestamp for O2CB_HEARTBEAT_THRESHOLD (default=7 or 31 based on the OCFS2 version) loops. Once a node is deemed dead, the surviving node which manages to cluster lock the dead node&#8217;s journal, recovers it by replaying the journal.</p>
<p>The setting should satisfy:</p>
<p>O2CB_HEARTBEAT_THRESHOLD &gt;= ((max(HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT) / 2) + 1)</p>
<p>Note that any change in O2CB_HEARTBEAT_THRESHOLD requires that the o2cb services on all cluster nodes are stopped at once and then started again. For how to set this parameter and for more information, please refer to Note 377616.1 and Note 391771.1.</p>
<p>Real Application Clusters / Cluster Synchronization Services Timeouts<br />
When the Oracle Cluster Registry (OCR) and/or the voting disk is on an OCFS2 volume, the disk timeout for CSS should be set accordingly. Where OCFS2 is not used, only the HW and SW storage timeout values apply.</p>
<p>The Oracle Clusterware has two heartbeat mechanisms:</p>
<p>Disk heartbeat (voting device) &#8211; IOT</p>
<p>Network heartbeat (across the interconnect) &#8211; misscount</p>
<p>The disk heartbeat is important here because the voting disk is on an OCFS2 volume. On different versions of RAC/CRS, the internal I/O timeout (IOT) of CSS depends differently on the CSS misscount (network heartbeat failures) value. Please see Note 294430.1 for details.</p>
<p>It is not recommended to change the misscount; with 10gR2 (10.2.0.2), a new setting named disktimeout was introduced. The value of IOT is:</p>
<p>same as misscount , if the cluster is being formed initially &#8211; the cluster nodes are starting up from scratch<br />
same as disktimeout, for all other times</p>
<p>The setting has been backported to previous versions too; see Note 294430.1 for details. The setting is available with:</p>
<p>10.2.0.2<br />
10.2.0.1 + Patch 4896338<br />
10.1.0.5 + CRS Bundle Patch #1<br />
10.1.0.4 + CRS Bundle Patch #2</p>
<p>To summarize, the following should be satisfied:</p>
<p>IOT &gt; max((O2CB_HEARTBEAT_THRESHOLD - 1) * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)</p>
<p>e.g. for CSS 10.2.0.2: </p>
<p>disktimeout &gt; max((O2CB_HEARTBEAT_THRESHOLD - 1) * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)</p>
<p>For the default values of the settings, please see Note 294430.1. For how to set and check the values, please see Note 284752.1.</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: roger				</title>
				<link>http://www.killdb.com/2011/09/25/%e5%85%b3%e4%ba%8eoracle-rac%e8%8a%82%e7%82%b9%e7%9a%84%e9%a9%b1%e9%80%90/#comment-154</link>
		<dc:creator><![CDATA[roger]]></dc:creator>
		<pubDate>Sun, 25 Sep 2011 09:08:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.killdb.com/?p=511#comment-154</guid>
					<description><![CDATA[CSS Timeout Computation in Oracle Clusterware [ID 294430.1] 

--------------------------------------------------------------------------------
 
  Modified: 26-OCT-2010     Type: BULLETIN     Status: PUBLISHED   

In this Document
  Purpose
  Scope and Application
  CSS Timeout Computation in Oracle Clusterware
  References



--------------------------------------------------------------------------------



Applies to: 
Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.1.0.6 - Release: 10.1 to 11.1
Oracle Server - Standard Edition - Version: 10.1.0.2 to 11.1.0.6   [Release: 10.1 to 11.1]
Information in this document applies to any platform.
Oracle Clusterware 
Purpose
The purpose of this Note is to document the default CSS misscount timeout calculations in 10g Release 1, 10g Release 2, 11g, and higher versions. 
Scope and Application
Define misscount parameter 
Define the default calculations for the misscount parameter 
Describe Cluster Synchronization Service (CSS) heartbeats and their interrelationship 
Describe the cases where the default calculation may be too sensitive 
CSS Timeout Computation in Oracle Clusterware
MISSCOUNT DEFINITION AND DEFAULT VALUES
The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node. The following are the default values for the misscount parameter and their respective versions when using Oracle Clusterware* in seconds:

OS        10g (R1 &#038; R2)   11g
Linux     60                 30
Unix      30                 30
VMS       30                 30
Windows   30                 30

*CSS misscount default value when using vendor (non-Oracle) clusterware is 600 seconds. This is to allow the vendor clusterware ample time to resolve any possible split brain scenarios.

On AIX platforms with HACMP, starting with 10.2.0.3 BP#1, the misscount is 30. This is documented in Note 551658.1.
CSS HEARTBEAT MECHANISMS AND THEIR INTERRELATIONSHIP
The Cluster Synchronization Services (CSS) component of the Oracle Clusterware maintains two heartbeat mechanisms: 1) the disk heartbeat to the voting device and 2) the network heartbeat across the interconnect, which establish and confirm valid node membership in the cluster. Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal i/o timeout interval (DTO, Disk TimeOut), in seconds, within which an i/o to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat i/o timeout interval is directly related to the misscount parameter setting. There has been some variation in this relationship between versions, as described below:
Version                                      Disk i/o timeout
9.x.x.x                                      (misscount was a different entity in this release)
10.1.0.2                                     No one should be on this version
10.1.0.3                                     DTO = MC - 15 seconds
10.1.0.4                                     DTO = MC - 15 seconds
10.1.0.4 + unpublished Bug 3306964           DTO = MC - 3 seconds
10.1.0.4 with CRS II merge patch             DTO = disktimeout (defaults to 200 seconds) normally, OR misscount seconds only during initial cluster formation or slightly before reconfiguration
10.1.0.5                                     IOT = MC - 3 seconds
10.2.0.1 + fix for unpublished Bug 4896338   IOT = disktimeout (defaults to 200 seconds) normally, OR misscount seconds only during initial cluster formation or slightly before reconfiguration
10.2.0.2                                     Same as above (10.2.0.1 with the fix for Bug 4896338)
10.1 - 11.1                                  During node join and leave (reconfiguration), the Short Disk TimeOut (SDTO) applies; in all versions SDTO = MC - reboottime (usually 3 seconds)
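
The version-to-timeout relationship above can be sketched as a small helper (the misscount values passed in are examples, e.g. 60 is the Linux 10g default from this note; the mapping follows the table):

```shell
# Sketch of the version/timeout relationship tabulated above.
dto_for() {   # usage: dto_for <version> <misscount>; prints the disk i/o timeout in seconds
  local v=$1 mc=$2
  case $v in
    10.1.0.3|10.1.0.4) echo $((mc - 15)) ;;        # DTO = MC - 15 seconds
    10.1.0.5)          echo $((mc - 3))  ;;        # IOT = MC - 3 seconds (also 10.1.0.4 + Bug 3306964)
    10.2.0.2)          echo 200          ;;        # disktimeout default, outside formation/reconfiguration
    *)                 echo "see Note 294430.1" ;; # other versions: consult the note
  esac
}

dto_for 10.1.0.3 60   # Linux 10g default misscount
```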


Misscount drives cluster membership reconfigurations and directly affects the availability of the cluster. In most cases, the default settings for MC should be acceptable. Modifying the default value of misscount not only influences the timeout interval for the i/o to the voting disk, but also influences the tolerance for missed network heartbeats across the interconnect.

LONG LATENCIES TO THE VOTING DISKS
If I/O latencies to the voting disk are greater than the default DTO calculations noted above, the cluster may experience CSS node evictions depending on (a) the Oracle Clusterware (CRS) version, (b) whether the merge patch has been applied, and (c) the state of the cluster. More details are covered in the section &quot;Change in Behavior with CRS Merge Patch (4896338 on 10.2.0.1)&quot;. 

These latencies can be attributed to any number of problems in the i/o subsystem or with any component in the i/o path. The following is a non-exhaustive list of reported problems which resulted in CSS node eviction due to latencies to the voting disk longer than the default Oracle Clusterware i/o timeout value (DTO):

QLogic HBA cards with a Link Down Timeout greater than the default misscount 
Bad cables to the SAN/storage array that affect i/o latencies 
SAN switch (like Brocade) failover latency greater than the default misscount 
EMC Clariion array failover latency, when trespassing the SP to the backup SP, greater than the default misscount 
EMC PowerPath path error detection and I/O repost and redirect greater than the default misscount 
NetApp Cluster (CFO) failover latency greater than the default misscount 
Sustained high CPU load which affects the CSSD disk ping monitoring thread 
Poor SAN network configuration that creates latencies in the I/O path 

The most common problems relate to multi-path IO software drivers and the reconfiguration times resulting from a failure in the IO path. Hardware and (re)configuration issues that introduce these latencies should be corrected. Incompatible failover times with the underlying OS, network, or storage hardware or software may be addressed given a complete understanding of the considerations listed below. 

Misscount should NOT be modified to work around the above-mentioned issues. Oracle Support recommends that you apply the latest patchset, which changes the CSS behaviour. More details are covered in the next section.


Change in Behavior with Bug:4896338 applied on top of 10.2.0.1
Starting with 10.2.0.1 + Bug:4896338, CSS will not evict the node from the cluster due to the I/O to the voting disk (DTO) taking more than misscount seconds, unless it is during the initial cluster formation or slightly before reconfiguration. 
So if we have N nodes in a cluster and one of the nodes takes more than misscount seconds to access the voting disk, the node will not be evicted as long as the access to the voting disk completes within disktimeout seconds. Consequently, with this patch there is no need to increase the misscount at all.

Additionally, this merge patch introduces disktimeout, which is the amount of time that a lack of disk pings to the voting disk(s) will be tolerated.

Note: applying the patch will not change your value for misscount.


The table below explains the conditions under which an eviction will occur:


Network Ping                         Disk Ping                                                      Reboot
Completes within misscount seconds   Completes within misscount seconds                             N
Completes within misscount seconds   Takes more than misscount but less than disktimeout seconds    N
Completes within misscount seconds   Takes more than disktimeout seconds                            Y
Takes more than misscount seconds    Completes within misscount seconds                             Y

* By default Misscount is less than Disktimeout seconds
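
The eviction table can be sketched as a predicate (misscount and disktimeout are set to the defaults quoted in this note; ping times are example inputs):

```shell
# Sketch of the eviction conditions tabulated above; times in seconds.
MISSCOUNT=30      # default network heartbeat tolerance (non-Linux 10g/11g per this note)
DISKTIMEOUT=200   # default disk ping tolerance per this note

evict() {   # usage: evict <network_ping_secs> <disk_ping_secs>; prints Y (reboot) or N
  local net=$1 disk=$2
  if [ "$net" -gt "$MISSCOUNT" ]; then
    echo Y                                  # network heartbeat missed for more than misscount
  elif [ "$disk" -gt "$DISKTIMEOUT" ]; then
    echo Y                                  # disk ping exceeded disktimeout
  else
    echo N                                  # disk ping under disktimeout: tolerated
  fi
}

evict 10 150   # network OK; disk slower than misscount but under disktimeout
```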

CONSIDERATIONS WHEN CHANGING MISSCOUNT FROM THE DEFAULT VALUE


- Customers drive SLAs and cluster availability. The customer ultimately defines the service levels and availability for the cluster. Before recommending any change to misscount, the full impact of that change should be described and the impact on cluster availability measured.
- Customers may have timeout and retry logic in their applications. Delaying reconfiguration may cause 'artificial' application timeouts, reconnect failures and subsequent logon storms.
- Misscount timeout values are version dependent and subject to change. As we have seen, misscount calculations vary between releases and between versions within a release. Creating a false dependency on the misscount calculation in one version may not be appropriate for later versions.
- Internal I/O timeout interval (DTO) algorithms may change in later releases. As stated above, there is a direct relationship between the internal I/O timeout interval and misscount; this relationship is subject to change in later releases.
- An increase in misscount to compensate for I/O latencies directly affects reconfiguration times for network failures. The network heartbeat is the primary indicator of connectivity within the cluster, and misscount is the tolerance level of missed 'check-ins' that triggers cluster reconfiguration. Increasing misscount prolongs the time to take corrective action in the event of a network failure or other anomaly affecting the availability of a node, which directly affects cluster availability.
- Changing misscount to work around voting disk latencies must be reverted when the underlying disk latency is corrected. The customer needs to document the change and set the parameter back to the default once the underlying storage I/O latency is resolved.
- Do not change the default misscount values if you are running vendor clusterware along with Oracle Clusterware. Modifying misscount in this environment may cause clusterwide outages and potential corruptions.
- Changing the misscount parameter incurs a clusterwide outage. As noted below, the customer will need to schedule a clusterwide outage to make this change.
- Changing misscount should not be used to compensate for poor configurations or faulty hardware.
- Cluster and RDBMS availability are directly affected by high misscount settings.
- With stretched clusters and stretched storage systems, a site failure in which we lose one storage system and some number of nodes triggers a reconfiguration, and the internal I/O timeout for the voting disks reverts to the ShortDiskTimeOut (SDTO) value. Several cases are known with stretched clusters where, when a site failure happens, the storage failover cannot complete within SDTO. If I/O to the voting disks is blocked for more than SDTO, the result is node evictions on the surviving side.


To change MISSCOUNT back to the default, please refer to Note 284752.1.
THIS IS THE ONLY SUPPORTED METHOD. NOT FOLLOWING THIS METHOD RISKS EVICTIONS AND/OR CORRUPTING THE OCR.

10g Release 2 MIRRORED VOTING DISKS AND VENDOR MULTIPATHING SOLUTIONS 
Oracle RAC 10g Release 2 allows for multiple voting disks so that the customer does not have to rely on a multipathing solution from a storage vendor. You can have n voting disks (up to 31), where n = 2m + 1 and m is the number of disk failures you want to survive. Oracle recommends placing each voting disk on a separate physical disk.]]></description>
		<content:encoded><![CDATA[<p>CSS Timeout Computation in Oracle Clusterware [ID 294430.1] </p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>Modified: 26-OCT-2010 | Type: BULLETIN | Status: PUBLISHED</p>
<p>In this Document<br />
  Purpose<br />
  Scope and Application<br />
  CSS Timeout Computation in Oracle Clusterware<br />
  References</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>Applies to:<br />
Oracle Server &#8211; Enterprise Edition &#8211; Version: 10.1.0.2 to 11.1.0.6 &#8211; Release: 10.1 to 11.1<br />
Oracle Server &#8211; Standard Edition &#8211; Version: 10.1.0.2 to 11.1.0.6   [Release: 10.1 to 11.1]<br />
Information in this document applies to any platform.<br />
Oracle Clusterware<br />
Purpose<br />
The purpose of this Note is to document default CSS misscount timeout calculations in 10g  Release 1,  10g Release 2 , 11g and higher versions.<br />
Scope and Application<br />
Define misscount parameter<br />
Define the default calculations for the misscount parameter<br />
Describe Cluster Synchronization Service (CSS) heartbeats and their interrelationship<br />
Describe the cases where the default calculation may be too sensitive<br />
CSS Timeout Computation in Oracle Clusterware<br />
MISSCOUNT DEFINITION AND DEFAULT VALUES<br />
The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node. The following are the default values for the misscount parameter and their respective versions when using Oracle Clusterware* in seconds:</p>
<table>
<tr><th>OS</th><th>10g (R1 &amp; R2)</th><th>11g</th></tr>
<tr><td>Linux</td><td>60</td><td>30</td></tr>
<tr><td>Unix</td><td>30</td><td>30</td></tr>
<tr><td>VMS</td><td>30</td><td>30</td></tr>
<tr><td>Windows</td><td>30</td><td>30</td></tr>
</table>
<p>*CSS misscount default value when using vendor (non-Oracle) clusterware is 600 seconds. This is to allow the vendor clusterware ample time to resolve any possible split brain scenarios.</p>
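
As a quick sanity check, the defaults above (plus the 600-second vendor-clusterware case) can be captured in a small lookup. This is an illustrative sketch only, not Oracle code; always confirm the live value with `crsctl get css misscount` on your cluster:

```python
# Default CSS misscount values (seconds), per the table above.
# Illustrative only; confirm actual values with `crsctl get css misscount`.

DEFAULT_MISSCOUNT = {
    ("linux", "10g"): 60,
    ("linux", "11g"): 30,
    ("unix", "10g"): 30,
    ("unix", "11g"): 30,
    ("vms", "10g"): 30,
    ("vms", "11g"): 30,
    ("windows", "10g"): 30,
    ("windows", "11g"): 30,
}

# With vendor (non-Oracle) clusterware the default is 600 seconds, giving the
# vendor clusterware ample time to resolve split-brain scenarios first.
VENDOR_CLUSTERWARE_MISSCOUNT = 600

def default_misscount(os_name, release, vendor_clusterware=False):
    if vendor_clusterware:
        return VENDOR_CLUSTERWARE_MISSCOUNT
    return DEFAULT_MISSCOUNT[(os_name.lower(), release)]

print(default_misscount("Linux", "10g"))                           # 60
print(default_misscount("Linux", "10g", vendor_clusterware=True))  # 600
```
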
<p>On AIX platforms with HACMP starting with 10.2.0.3 BP#1, the misscount is 30. This is documented in Note 551658.1<br />
CSS HEARTBEAT MECHANISMS AND THEIR INTERRELATIONSHIP<br />
The synchronization services component (CSS) of Oracle Clusterware maintains two heartbeat mechanisms: 1) the disk heartbeat to the voting device and 2) the network heartbeat across the interconnect, which together establish and confirm valid node membership in the cluster. Both heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal I/O timeout interval (DTO, Disk TimeOut), in seconds, within which an I/O to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat I/O timeout interval is directly related to the misscount parameter setting; this relationship has varied between versions as described below:</p>
<table>
<tr><th>Version</th><th>Internal I/O timeout (DTO)</th></tr>
<tr><td>9.x.x.x</td><td>Note: misscount was a different entity in this release</td></tr>
<tr><td>10.1.0.2</td><td>No one should be on this version</td></tr>
<tr><td>10.1.0.3</td><td>DTO = MC - 15 seconds</td></tr>
<tr><td>10.1.0.4</td><td>DTO = MC - 15 seconds</td></tr>
<tr><td>10.1.0.4 + unpublished Bug 3306964</td><td>DTO = MC - 3 seconds</td></tr>
<tr><td>10.1.0.4 with CRS II merge patch</td><td>DTO = Disktimeout (defaults to 200 seconds) normally, or Misscount seconds only during initial cluster formation or slightly before reconfiguration</td></tr>
<tr><td>10.1.0.5</td><td>DTO = MC - 3 seconds</td></tr>
<tr><td>10.2.0.1 + fix for unpublished Bug 4896338</td><td>DTO = Disktimeout (defaults to 200 seconds) normally, or Misscount seconds only during initial cluster formation or slightly before reconfiguration</td></tr>
<tr><td>10.2.0.2</td><td>Same as 10.2.0.1 with the fix for Bug 4896338</td></tr>
</table>
<p>10.1 &#8211; 11.1: during node join and leave (reconfiguration) the cluster must reconfigure, and in that particular case a Short Disk TimeOut (SDTO) is used instead; in all versions SDTO = MC - reboottime (reboottime is usually 3 seconds).</p>
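
The per-version DTO calculations listed above, together with the SDTO formula, can be sketched as follows. This is an illustrative model only, not Oracle code; version labels are shorthand for the rows of the list above, and times are in seconds:

```python
# Sketch of the internal I/O timeout (DTO) calculations listed above, plus the
# short disk timeout (SDTO) used during node join/leave. Illustrative only.

def dto(version, misscount, disktimeout=200, during_reconfig=False):
    """Internal I/O timeout for the voting disk, per the version list above."""
    if version in ("10.1.0.3", "10.1.0.4"):
        return misscount - 15
    if version in ("10.1.0.4+bug3306964", "10.1.0.5"):
        return misscount - 3
    # 10.1.0.4 with the CRS II merge patch, 10.2.0.1+bug4896338, 10.2.0.2:
    # disktimeout normally, misscount only around (re)configuration.
    return misscount if during_reconfig else disktimeout

def sdto(misscount, reboottime=3):
    """Short disk timeout during reconfiguration: SDTO = MC - reboottime."""
    return misscount - reboottime

print(dto("10.1.0.3", misscount=60))                         # 45
print(dto("10.2.0.2", misscount=30))                         # 200
print(dto("10.2.0.2", misscount=30, during_reconfig=True))   # 30
print(sdto(30))                                              # 27
```
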
<p>Misscount drives cluster membership reconfigurations and directly affects the availability of the cluster. In most cases, the default settings for MC should be acceptable. Modifying the default value of misscount influences not only the timeout interval for I/O to the voting disk, but also the tolerance for missed network heartbeats across the interconnect.</p>
<p>LONG LATENCIES TO THE VOTING DISKS<br />
If I/O latencies to the voting disk are greater than the default DTO calculations noted above, the cluster may experience CSS node evictions depending on (a) the Oracle Clusterware (CRS) version, (b) whether the merge patch has been applied and (c) the state of the cluster. More details on this are covered in the section &#8220;Change in Behavior with CRS Merge Patch (4896338 on 10.2.0.1)&#8221;. </p>
<p>These latencies can be attributed to any number of problems in the I/O subsystem or with any component in the I/O path. The following is a non-exhaustive list of reported problems which resulted in CSS node eviction due to latencies to the voting disk longer than the default Oracle Clusterware I/O timeout value (DTO):</p>
<p>QLogic HBA cards with a Link Down Timeout greater than the default misscount<br />
Bad cables to the SAN/storage array that affect I/O latencies<br />
SAN switch (such as Brocade) failover latency greater than the default misscount<br />
EMC Clariion array trespassing the SP to the backup SP taking longer than the default misscount<br />
EMC PowerPath path error detection and I/O repost and redirect taking longer than the default misscount<br />
NetApp cluster (CFO) failover latency greater than the default misscount<br />
Sustained high CPU load that affects the CSSD disk ping monitoring thread<br />
Poor SAN network configuration that creates latencies in the I/O path<br />
The most common problems relate to multipath I/O software drivers and the reconfiguration times resulting from a failure in the I/O path. Hardware and (re)configuration issues that introduce these latencies should be corrected. Incompatible failover times with the underlying OS, network or storage hardware or software may be addressed given a complete understanding of the considerations listed below. </p>
<p>Misscount should NOT be modified to work around the above-mentioned issues. Oracle Support recommends that you apply the latest patchset, which changes the CSS behaviour. More details are covered in the next section.</p>
<p>Change in Behavior with Bug:4896338 applied on top of 10.2.0.1<br />
Starting with 10.2.0.1 plus the fix for Bug:4896338, CSS will not evict a node from the cluster because an I/O to the voting disk exceeds its internal timeout (DTO) of misscount seconds, unless it happens during the initial cluster formation or slightly before a reconfiguration.<br />
So if we have N nodes in a cluster and one of them takes more than misscount seconds to access the voting disk, that node will not be evicted as long as the access to the voting disk completes within disktimeout seconds. Consequently, with this patch there is no need to increase misscount at all.</p>
<p>Additionally, this merge patch introduces Disktimeout, the maximum amount of time that missed disk pings to the voting disk(s) will be tolerated.</p>
<p>Note:  applying the patch will not change your value for Misscount.  </p>
<p>The table below explains the conditions under which an eviction will occur:</p>
<table>
<tr><th>Network Ping</th><th>Disk Ping</th><th>Reboot</th></tr>
<tr><td>Completes within Misscount seconds</td><td>Completes within Misscount seconds</td><td>N</td></tr>
<tr><td>Completes within Misscount seconds</td><td>Takes more than Misscount but less than Disktimeout seconds</td><td>N</td></tr>
<tr><td>Completes within Misscount seconds</td><td>Takes more than Disktimeout seconds</td><td>Y</td></tr>
<tr><td>Takes more than Misscount seconds</td><td>Completes within Misscount seconds</td><td>Y</td></tr>
</table>
<p>* By default Misscount is less than Disktimeout seconds</p>
<p>CONSIDERATIONS WHEN CHANGING MISSCOUNT FROM THE DEFAULT VALUE</p>
<ul>
<li>Customers drive SLAs and cluster availability. The customer ultimately defines the service levels and availability for the cluster. Before recommending any change to misscount, the full impact of that change should be described and the impact on cluster availability measured.</li>
<li>Customers may have timeout and retry logic in their applications. Delaying reconfiguration may cause 'artificial' application timeouts, reconnect failures and subsequent logon storms.</li>
<li>Misscount timeout values are version dependent and subject to change. As we have seen, misscount calculations vary between releases and between versions within a release. Creating a false dependency on the misscount calculation in one version may not be appropriate for later versions.</li>
<li>Internal I/O timeout interval (DTO) algorithms may change in later releases. As stated above, there is a direct relationship between the internal I/O timeout interval and misscount; this relationship is subject to change in later releases.</li>
<li>An increase in misscount to compensate for I/O latencies directly affects reconfiguration times for network failures. The network heartbeat is the primary indicator of connectivity within the cluster, and misscount is the tolerance level of missed 'check-ins' that triggers cluster reconfiguration. Increasing misscount prolongs the time to take corrective action in the event of a network failure or other anomaly affecting the availability of a node, which directly affects cluster availability.</li>
<li>Changing misscount to work around voting disk latencies must be reverted when the underlying disk latency is corrected. The customer needs to document the change and set the parameter back to the default once the underlying storage I/O latency is resolved.</li>
<li>Do not change the default misscount values if you are running vendor clusterware along with Oracle Clusterware. Modifying misscount in this environment may cause clusterwide outages and potential corruptions.</li>
<li>Changing the misscount parameter incurs a clusterwide outage. As noted below, the customer will need to schedule a clusterwide outage to make this change.</li>
<li>Changing misscount should not be used to compensate for poor configurations or faulty hardware.</li>
<li>Cluster and RDBMS availability are directly affected by high misscount settings.</li>
<li>With stretched clusters and stretched storage systems, a site failure in which we lose one storage system and some number of nodes triggers a reconfiguration, and the internal I/O timeout for the voting disks reverts to the ShortDiskTimeOut (SDTO) value. Several cases are known with stretched clusters where, when a site failure happens, the storage failover cannot complete within SDTO. If I/O to the voting disks is blocked for more than SDTO, the result is node evictions on the surviving side.</li>
</ul>
<p>To change MISSCOUNT back to the default, please refer to Note 284752.1.<br />
THIS IS THE ONLY SUPPORTED METHOD. NOT FOLLOWING THIS METHOD RISKS EVICTIONS AND/OR CORRUPTING THE OCR.</p>
<p>10g Release 2 MIRRORED VOTING DISKS AND VENDOR MULTIPATHING SOLUTIONS<br />
Oracle RAC 10g Release 2 allows for multiple voting disks so that the customer does not have to rely on a multipathing solution from a storage vendor. You can have n voting disks (up to 31), where n = 2m + 1 and m is the number of disk failures you want to survive. Oracle recommends placing each voting disk on a separate physical disk.</p>
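
The voting disk sizing rule above (n = 2m + 1, so that a strict majority of disks survives m failures) can be checked with a tiny helper. An illustrative sketch only; the 31-disk cap is the limit stated in the paragraph above:

```python
# Voting disk sizing per the n = 2m + 1 rule: to survive m disk failures,
# a strict majority of voting disks must remain accessible. Illustrative only.

def voting_disks_needed(failures_to_survive):
    n = 2 * failures_to_survive + 1
    if n > 31:
        raise ValueError("Oracle 10g Release 2 supports at most 31 voting disks")
    return n

print(voting_disks_needed(1))   # 3
print(voting_disks_needed(2))   # 5
print(voting_disks_needed(15))  # 31 (the maximum)
```
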
]]></content:encoded>
						</item>
			</channel>
</rss>
