Wednesday, October 13, 2021

How to resolve MRP stuck issues on a physical standby database? (Doc ID 1221163.1)

 In this Document

Goal
Solution
 Cause 1 : Log transport issues.
 Solution 1: Please use cksum command to verify the size of password file on the primary and the standby sites, and make sure SEC_CASE_SENSITIVE_LOGON  is set to false for 11g or above databases.
 Cause 2 : Firewall caused partial archive log transferred.
 
Solution 2: Please make sure the following firewall features are disabled.
 Cause 3 : ARC process on the primary that is responsible for the heartbeat sticks on the network due to bug 6113783
 Solution 3: The bug 6113783 was fixed in 11.2.0.2. You could workaround the issue by killing the arch processes on the primary database.
 Cause 4. The standby recovery asked for old sequence # that had been applied to the standby.
 Solution 4 : Apply the fix of bug 6029179 or kill the heartbeat arch process of each primary instance if the primary is a RAC database.
 Cause 5 Recover from the wrong location. 
 Solution 5 Specify standby_archive_dest the same location as log_archive_dest_1 on the physical standby. Use from 'location' attribute in the manual recovery clause. 
 Cause 6: All standby redo log files are active on the standby database.
 
Solution 6: Please make sure you have enough space in the archive location.
log_archive_dest_1 is defined with proper valid_for values and db_unique_name.
standby_archive_dest is specified properly.
 
Cause 7 : Partial archive log file is applied on the standby database.
 Solution 7 Use RMAN incremental backup method to roll forward your physical standby database.
 
Cause 8 : The archive log files were transferred to the standby manually, not through data guard log transport service.
 
Solution 8 Register those archive log files or use manual recovery.
 Cause 9 : The archive log files are deleted from the primary before they are shipped and applied to the standby.
 Solution 9 Restore archive log files from backup, register them and use managed recovery to recover them; or use manual recovery without registering them.
 Cause 10 : New datafiles are added to the primary, but they are not added to the standby automatically.
 Solution 10 Add the new datafiles to the standby database manually.
 Cause 11 : MRP can stall after control file contention at the standby.
 Solution 11 This is fixed in 10.2.0.4. The workaround is to restart the standby.
 Cause 12 : Cancelling managed recovery hangs.
 Solution 12 Let's shutdown abort the standby, startup mount, then have a clean shutdown, check if there are server processes left over for the standby instance.
References

APPLIES TO:

Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 12.2.0.1 [Release 10.1 to 12.2]
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Information in this document applies to any platform.

GOAL

When you find the applied sequence# doesn't increase from grid control for a physical standby database, or MRP (Managed Recovery Process ) is stuck and doesn't apply more logs, what do you do to find the cause of the issue and resolve the issue to let your physical standby database in-sync with your primary database?

In general, you could perform the following steps to start with:

  1. Please use one of the following method to find the archive log file that MRP sticks at:

a. Please query v$managed_standby from the standby database if MRP is started.

 

% ps -ef |grep -i mrp 
SQL>select process, thread#, sequence#, status from v$managed_standby where process='MRP0';
PROCESS  THREAD#   SEQUENCE#  STATUS
-------- --------- ---------- ------------

MRP0     1         548        WAIT_FOR_GAP

Note: Starting from 12.2 use V$DATAGUARD_PROCESS view instead of v$managed_standby

 

b. Stop the managed recovery and start the manual recovery.

SQL>recover managed standby database cancel;
SQL>recover automatic standby database;

If you use data guard broker, you need to do these from either DGMGRL or grid control.

For 11g data guard broker,

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-OFF';
DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON';

 


If your standby is a RAC database, then
DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON' WITH APPLY INSTANCE = <instance-name>;

 
<instance-name> is the name of the instance you want to become the apply instance.

For 10g Data Guard broker,

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='LOG-APPLY-OFF'; DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='ONLINE';


If your standby is a RAC database, then

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='ONLINE' WITH APPLY INSTANCE = <instance-name>;

To change the state of the standby database from grid control for 10g and 11g standby databases, follow these steps:

1. From the Data Guard Overview page, select the standby database you want to change.
2. Click Edit to go to the Edit Properties page.
3. Select Log Apply Off (or Online).
4. Click Apply.
5. When the process completes, a message indicating success is returned.


For 9i Data Guard Broker,

DGMGRL> ALTER RESOURCE '<resource name>' ON SITE '<site name>' SET STATE='PHYSICAL-APPLY-READY';
DGMGRL> ALTER RESOURCE '<resource name>' ON SITE '<site name>' SET STATE='PHYSICAL-APPLY-ON';

To change the state of the standby database from grid control for 9i standby database, follow these steps:

1. In the navigator tree, select the standby database resource.
2. In the right-hand property sheet, click Set State.
3. Click Apply Off (or Online).
4. Click OK.


c. Use the lowest checkpoint_change# in the data file header to find the archive log that the recovery needs to recover from:

SQL>select min(fhscn),fhrba_Seq SEQUENCE from x$kcvfh group by fhrba_Seq;


  2. Check whether the archive log file you find from step 1 exists on the standby site in the location defined by the standby init parameter standby_archive_dest or log_archive_dest_1 if standby_archive_dest is not defined on the standby.

If so, then please compare whether it has the same size on the standby as it is on the primary by cksum command. For example

% cksum <archive log file full path/file name>
  

Note: if cksum is not available on your unix platform please see the document below which provides more commands, such as, SHA-1 and MD5 checksum utilities.

Note 549617.1 How To Verify The Integrity Of A Patch/Software Download?

  

You could also verify whether the archive log file is corrupted on either standby or primary site by

SQL>alter system dump logfile '<full path/archive log file name>' validate;


If you get the SQL prompt back without error, then the archive log file is not corrupted.

  3. If either the archive log file from step 1 doesn't exist on the standby or it is corrupted, then you would need to get a new copy from the primary site. Please refer to the document below on how to resolve a gap manually.

 Note 1537316.1 Data Guard Gap Detection and Resolution Possibilities


  4. If the archive log file exists on the standby site and is not corrupted, then please check whether it is registered to the standby controlfile by querying v$archived_log. For example,

SQL>select name from v$archived_log where (thread#=1 and sequence#=192917) or (thread#=2 and sequence#=26903);


If the last applied sequence# for thread 1 is 192916 and the last applied sequence# for thread 2 is 26902. Or the thread# and the sequence# are what you find from step 1 which the MRP sticks at.
If the archive log file names come back from the query above, then they are registered to the standby controlfile. Otherwise, the standby controlfile doesn't aware of them.

 

2. Please use the queries below to identify the current sequence# on the primary for each thread, the last received sequence# on the standby and the last applied sequence# on the standby for

each thread so that you will know whether the standby is in sync with the primary, whether there is a transport lag or an apply lag.

Primary: SQL > select thread#, max(sequence#) "Last Primary Seq Generated"
from v$archived_log val, v$database vdb
where val.resetlogs_change# = vdb.resetlogs_change#
group by thread# order by 1;

  


<standby>: SQL > select thread#, max(sequence#) "Last Standby Seq Received"
from v$archived_log val, v$database vdb
where val.resetlogs_change# = vdb.resetlogs_change#
group by thread# order by 1;

  

<standby>: SQL > select thread#, max(sequence#) "Last Standby Seq Applied"
from v$archived_log val, v$database vdb
where val.resetlogs_change# = vdb.resetlogs_change#
and applied='YES'
group by thread# order by 1;

 
=======================================================================

Please see the following Causes and Solutions for the MRP stuck issues on a physical standby database. If the MRP stucking issues still couldn't be resolved, then you could use RMAN incremental backup method to roll forward your physical standby database.

Note: You couldn't use the RMAN incremental backup method to roll forward a logical standby database.


Note 290817.1 Rolling a Standby Forward using an RMAN Incremental Backup in 9i
Note 836986.1 Steps to perform for Rolling forward a standby database using RMAN incremental backup when primary and standby are in ASM filesystem

 

SOLUTION

Cause 1 : Log transport issues.

MRP sticks on the standby because the primary has trouble to ship redo to the standby. You could check the status of remote archive destination on the primary database. For example,

SQL> SELECT DESTINATION, STATUS, ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID=2;

It could give you various errors and you would need to resolve the log transport errors.

I would only illustrate very common errors related with password file issues here, such as ora-1031, ora-1017, ora-16191, ora-12154 etc.

Solution 1: Please use cksum command to verify the size of password file on the primary and the standby sites, and make sure SEC_CASE_SENSITIVE_LOGON  is set to false for 11g or above databases.

- use cksum command to verify the size of password file on the primary and the standby sites  

   % cd $ORACLE_HOME/dbs

   % cksum <password file name>               /* cksum command is only available on the UNIX platform  */

  

Note: if cksum is not available on your unix platform please see the document below which provides more commands, such as, SHA-1 and MD5 checksum utilities.

Note 549617.1 How To Verify The Integrity Of A Patch/Software Download?

 

If the sizes are different, you could create a new password file on one primary node and then copy it to other primary nodes and standby nodes.

UNIX
orapwd file=$ORACLE_HOME/dbs/orapw<local ORACLE_SID> password=<sys password> entries=5

Windows
orapwd file=$ORACLE_HOME/database/PWD<local ORACLE_SID>.ora password=<sys password> entries=5

 

Note: if the instance name on the standby database is different from the primary instance name, then their password file names are different.

  

You could check your local database instance name by

SQL>show parameter instance_name;

  

- Please make sure SEC_CASE_SENSITIVE_LOGON init parameter is set to false on both the primary and the standby databases for 11g or above databases.

If you have to set SEC_CASE_SENSITIVE_LOGON to true for business security reason, then you must create your password file with ignorecase=y attribute. For example,

orapwd file=$ORACLE_HOME/dbs/orapw<ORACLE_SID> password=<sys password> ignorecase=y entries=5

  

Please see Note 799353.1 - How to Resolve Error in Remote Archiving for more about log shipping issues.

Cause 2 : Firewall caused partial archive log transferred.

The contents of the mrp trace file show:

Media Recovery Log /<path>/<datafile_name>.dbf
ORA-00332: archived log is too small - may be incompletely archived
ORA-00334: archived log: '/<path>/<datafile_name>.dbf'
Background Media Recovery terminated with error 332
ORA-00332: archived log is too small - may be incompletely archived
ORA-00334: archived log: '/<path>/<datafile_name>.dbf'
----- Redo read statistics for thread 1 -----

 

The other common error is ORA-3135 and recommend you to check this cause and solution.

ORA-03135: connection lost contact when shipping redo log to standby database

ORA-03135: connection lost contact

ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)


Solution 2: Please make sure the following firewall features are disabled.

- SQLNet fixup protocol
- Deep Packet Inspection (DPI)
- SQLNet packet inspection
- SQL Fixup
- SQL ALG (Juniper firewall)


The wording and features can vary by vendor but all the above have some impact on some packets (not all packets are affected). Some firewalls can have an adverse effect on certain SQL packets transported across them (again, some not all).

You could then resolve the gap by following  Note 1537316.1 Data Guard Gap Detection and Resolution Possibilities.

Please see the following articles related to ora-3135.

Note 404724.1 ORA-03135 when connecting to Database
Note 739522.1 ORA - 03135 connection lost contact while shipping from Primary Server to Standby server
Note 730066.1 Diagnosis of ORA-3135/ORA-3136 Connection Timeouts when the Fault is in the Database
Note 787354.1 Troubleshooting ORA - 3135 Connection Lost Contact

Cause 3 : ARC process on the primary that is responsible for the heartbeat sticks on the network due to bug 6113783

You could see different symptoms. For example, you get FAL archive error as the primary arch process sticks and it doesn't ship the missing archive log files to the standby. The worst case you see is all arch processes stick and no one does local archiving and all online redo log files are full and that causes the primary database hangs. Please see the following errors related to this cause.

FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 2262-2266
DBID 2591313681 branch 686760021
FAL[client]: All defined FAL servers have been attempted.

SQL> alter session set nls_date_format='YYYY-MON-DD HH24:MI:SS';
SQL> select message, timestamp 
from v$dataguard_status 
where severity in ('Error','Fatal') 
order by timestamp;

MESSAGE
-------------------------------------------------------------------------------
TIMESTAMP
--------------------------
Error 12154 received logging on to the standby
OCT-12-2010 15:55:45

PING[ARC1]: Heartbeat failed to connect to standby '<standby_name>'. Error is 12154.
OCT-12-2010 15:55:45

Solution 3: The bug 6113783 was fixed in 11.2.0.2. You could workaround the issue by killing the arch processes on the primary database.

This won't harm your primary database at all as arch processes will be respawned automatically immediately by Oracle.

    % ps -ef |grep -i arc
    % kill -9 <ospid of arc process> <ospid of arc process> <ospid of arc process>...

NOTE: If you are on Windows Platform you can use either 'orakill'- command or any OS-Tool to kill the Windows Processes/Threads

Cause 4. The standby recovery asked for old sequence # that had been applied to the standby.

    a. This is caused by bug 6029179 that was fixed in 10.2.0.4.1. 
        Bug 6029179 Abstract: MRP WAITING FOR LOGS THAT ARE ALREADY APPLIED (with RAC primary)

    b. This could be caused by ARC process that is responsible for standby heartbeat sticking on
network as well due to Bug 6113783

Solution 4 : Apply the fix of bug 6029179 or kill the heartbeat arch process of each primary instance if the primary is a RAC database.

   a. Apply the fix of Bug 6029179 that was fixed in 10.2.0.4.1.

    b. You could identify the heartbeat arch process from the primary alert log file. Look for
the last start up information in the alert log file. For example,

Starting ORACLE instance (normal)
...
processes = 1800
.....
ARC4 started with pid=27, OS id=8805
ARC2: Archival started
ARC3: Archival started
ARC3: Becoming the 'no FAL' ARCH
ARC3: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
...........
Completed: ALTER DATABASE OPEN


So ARC2 is the heartbeat arch process for this startup. If you checked the previous startup information,
it might be a different arch process. Then you could find the ospid of arc2 process and kill it.

% ps -ef |grep -i arc2
% kill -9 <ospid of arc2> 

NOTE: If you are on Windows Platform you can use either 'orakill'- command or any OS-Tool to kill the Windows Processes/Threads
Please do this for each primary instance if the primary database is a RAC database.

    c. The worst case would be to use RMAN incremental backup method to roll forward your standby database. 
       If your primary database is small, you could simply take a full database backup from the primary and restore it to the standby server to refresh or recreate your physical standby database.

Cause 5 Recover from the wrong location. 

    a. The init parameter standby_archive_dest is not defined on the standby database for 10g or lower version. The default location is $ORACLE_HOME/dbs.

    b. You use RMAN or manual method to restore or transfer archive log files to a different location
than the local archive location log_archive_dest_1 or standby_archive_dest defined on the standby.

Solution 5 Specify standby_archive_dest the same location as log_archive_dest_1 on the physical standby. Use from 'location' attribute in the manual recovery clause. 

    a. Always specify standby_archive_dest the same location as the local archive destination log_archive_dest_1 for 10g or lower version physical standby database.

It is obsoleted in 11g.

    b. When archive log files are located in a different location than the recovery expects, then you could use from 'location' attribute in the manual recovery clause. For example,

SQL>alter database recover managed standby database cancel;
SQL>alter database recover automatic from '/<different_location>/' standby database;

<different_location> is where your archive log files are located.

Cause 6: All standby redo log files are active on the standby database.

So that the standby redo log files couldn't receive new redo from the primary database.

SQL> select group#,thread#,bytes/1024/1024 as mbytes,used,archived,status
from v$standby_log

GROUP# THREAD# MBYTES USED ARC STATUS
---------- ---------- ---------- ---------- --- ----------
15 2 100 33292288 NO ACTIVE
16 2 100 3323904 NO ACTIVE
17 1 100 98733056 NO ACTIVE
18 2 100 3813376 NO ACTIVE
19 2 100 4763648 NO ACTIVE
...........

38 1 100 98072576 NO ACTIVE
39 2 100 7388672 NO ACTIVE
40 2 100 4899328 NO ACTIVE


Standby alert log file showed:

ORA-16014: log 15 sequence# 13234 not archived, no available destinations
ORA-00312: online log 15 thread 2: '+<diskgroup1>/<db_name>/onlinelog/group_15.457.
682446507'
ORA-00312: online log 15 thread 2: '+D<diskgroup2>/<db_name>/onlinelog/group_15.456.
682446509'
ORA-00312: online log 15 thread 2: '+<diskgroup3>/<db_name>/onlinelog/group_15.490.
682446509'


Solution 6: Please make sure you have enough space in the archive location.
log_archive_dest_1 is defined with proper valid_for values and db_unique_name.
standby_archive_dest is specified properly.

    a. Please make sure you have enough space in the local archive destination
specified by log_archive_dest_1 so that the standby redo log files could be
archived successfully.
Please also make sure you have enough space in the location specified by
standby_archive_dest so that the primary arch process could ship archive log files
to it successfully.
When log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST...', you would
need to check the location specified by init parameter db_recovery_file_dest.
For example:

log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST valid_for=(ALL_LOGFILE,ALL_ROLES)
db_unique_name=<standby_name>'

db_recovery_file_dest=+<diskgroup1> 
db_recovery_file_dest_size=214748364800


    b. Please make sure the local archive destination log_archive_dest_1 is defined
with VALID_FOR=(ALL_LOGFILES,ALL_ROLES) and DB_UNIQUE_NAME=<local db DB_UNIQUE_NAME>.
For example:

Oracle® Data Guard Concepts and Administration
3 Creating a Physical Standby Database

LOG_ARCHIVE_DEST_1='LOCATION=/<path>/<standby_name>/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=<local database DB_UNIQUE_NAME>'


    c. Please specify standby_archive_dest properly on the standby database. For example,

STANDBY_ARCHIVE_DEST=/<path>/<db_name>/

 

When LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
DB_UNIQUE_NAME=<local database DB_UNIQUE_NAME>', you specify STANDBY_ARCHIVE_DEST='LOCATION=USE_DB_RECOVERY_FILE_DEST'.

The init parameter standby_archive_dest is deprecated in 11g.


You could use alter system command to modify or define LOG_ARCHIVE_DEST_1 or STANDBY_ARCHIVE_DEST.
If your database is a RAC database, please use attribute sid='*' so that the change will be
made to all RAC instances for the database. For example,

SQL>alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST
valid_for=(ALL_LOGFILES,ALL_ROLES) db_unique_name=<standby_name>' scope=both sid='*';

 

If you use data guard broker, then you need to shutdown the broker first before you issue the command above from sqlplus. For example,

SQL>alter system set dg_Broker_start=false sid='*' scope=both;

Then after the change, you could start the broker by

SQL>alter system set dg_Broker_start=true sid='*' scope=both;
SQL>alter system set standby_archive_dest = 'LOCATION=USE_DB_RECOVERY_FILE_DEST' scope=both sid='*';


When 'scope' is not specified, the default value is 'both'. 

If you use broker, you could modify the StandbyArchiveLocation property as below:

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET PROPERTY 'StandbyArchiveLocation'='LOCATION=USE_DB_RECOVERY_FILE_DEST';


You would need to issue the command on all standby instances if the standby database is a RAC database. 

    d. Please make sure the standby redo log file size on the standby database
are the same size as the primary online redo log file size and you have one more
standby redo log group on the standby than the primary online redo log group for each thread.
For example, if you have 2 threads on the primary database and three online redo log groups
for each thread on the primary, then you would need 4 standby redo log groups for thread 1
and 4 standby redo log groups for thread 2 on the standby database no matter how many instance
you have on the standby database.

In case of switchover and the database roles change, the same principle applies. For example,
if your current standby database has only one instance, and three online redo log groups for it,
then you would need only 4 standby redo log groups for thread 1 on the current primary database
no matter how many instances/threads it has.

Reference:

MAA - Creating a RAC Physical Standby for a RAC Primary (Doc ID 380449.1)

    e. Please clear all standby redo log groups by

SQL> alter database clear standby logfile group <#>;


Cause 7 : Partial archive log file is applied on the standby database.

The standby alert log file showed:

Fri Jan 9 20:11:18 2009
MRP0: Background Managed Standby Recovery process started (<standby_name>)
Managed Standby Recovery not using Real Time Apply
parallel recovery started with 7 processes
Media Recovery Log /<path>/<db_name>-656901558185964.arc
Fri Jan 9 20:11:24 2009
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM
SESSION
Fri Jan 9 20:11:24 2009
Incomplete Recovery applied until change 14025537844
Fri Jan 9 20:11:24 2009
MRP0: Media Recovery Complete (<standby_name>)
Fri Jan 9 20:11:26 2009
MRP0: Background Media Recovery process shutdown (<standby_name>)


You could find which archive log file contains the change# 14025537844 by querying v$archived_log.

SQL>select thread#, sequence#, name, first_change#, next_Change#, deleted, status from v$archived_log where 14025537844 between first_change# and next_Change#;

 

Solution 7 Use RMAN incremental backup method to roll forward your physical standby database.

If your primary database is small, you could simply take a full database backup from the primary
and restore it to the standby server to refresh or recreate your physical standby database.

To prevent this happening again, please perform the following before you cancel the managed recovery
on the physical standby database or shutdown the physical standby database.

a. Defer the remote log transport on the primary database. For example, if log_archive_dest_2 is the remote archive destination,

SQL>alter system set log_archive_dest_state_2=defer; (for single instance primary database)
SQL>alter system set log_archive_dest_state_2=defer sid='*'; (for RAC primary database)


b. Please make sure the standby recovers the last received archive log file on the standby.
You can go to the location defined by log_archive_dest_1 or standby_archive_dest to see the most recent received archive log file.
Then check the latest applied sequence on the standby by the query below:

SQL>select thread#, max(sequence#) from v$archived_log where applied='YES' group by thread#;


Cause 8 : The archive log files were transferred to the standby manually, not through data guard log transport service.

So that they are not registered to the standby control file and MRP (Managed Recovery Process) couldn't recognize them.
For example, they are transferred from the primary by ftp, or scp, etc. or rman backup them and restores them to the standby.



The standby alert log file showed:

Media Recovery Waiting for thread 1 sequence 13615 (in transit)


Solution 8 Register those archive log files or use manual recovery.

    a. Register those manually transferred archive log files and then use managed recovery to recover them.

SQL> alter database recover managed standby database cancel;
SQL> alter database register logfile '<full path/archive log file name>';
or you could use one rman command to catalog all the archive logs that were manually transferred and needed to be added to the controlfile at the standby site.

rman> catalog start with 'PATH_TO_ARCHIVELOGS/'; 
SQL> alter database recover managed standby database disconnect;


    b. If you have lots of archive log files to register, then you could simply use manual recovery without registering them.
Make sure the archive log files are located in the location specified by log_archive_dest_1 or standby_archive_dest.
If not, then you would need to use from 'location' attribute in the manual recovery clause.

SQL> alter database recover managed standby database cancel;
SQL> alter database recover automatic standby database;
or
SQL> alter database recover automatic from '/<path>/' standby database;

 
        /<path>/ is the location where your archive log files are located, which is different from the local archive destination or the standby archive destination.

Cause 9 : The archive log files are deleted from the primary before they are shipped and applied to the standby.

You use RMAN to do backup and archive log files are deleted after backup before they are shipped and applied to the standby.

Solution 9 Restore archive log files from backup, register them and use managed recovery to recover them; or use manual recovery without registering them.

Please refer to the Solution 8 for details.

If you couldn't find the archive log files anywhere, then you have to use RMAN incremental backup to roll forward your physical standby database.

If your primary database is small, you could simply take a full database backup from the primary
and restore it to the standby server to refresh or recreate your physical standby database.

To prevent this happens again, please reconsider your RMAN deletion policy.

If your archive log files are configured to be in a flash recovery area, then you could configure RMAN deletion policy to APPLIED ON STANDBY so that the archive log files are deleted after they are applied to the standby database.

Please refer to the document below for details:

Configure RMAN to purge archivelogs after applied on standby (Doc ID 728053.1)

http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/rman.htm#SBYDB00750

Oracle® Data Guard Concepts and Administration
10g Release 2 (10.2)
10.3.4 Deletion Policy for Archived Redo Log Files In Flash Recovery Areas

http://download.oracle.com/docs/cd/B28359_01/server.111/b28294/rman.htm#BAJHHAEB

Oracle® Data Guard Concepts and Administration
11g Release 1 (11.1)
11.3.2 RMAN Configurations at the Primary Database

11.3.3 RMAN Configurations at a Standby Database Where Backups are Performed

11.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed

http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/rman.htm#SBYDB4854

Oracle® Data Guard Concepts and Administration
11g Release 2 (11.2)
11.3.2 RMAN Configurations at the Primary Database

11.3.3 RMAN Configurations at a Standby Database Where Backups are Performed

11.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed

Cause 10 : New datafiles are added to the primary, but they are not added to the standby automatically.

This could be caused by standby_file_managment=manual on the standby database.
Sometimes the new datafiles failed to be added to the standby database automatically even though
standby_file_managment was set to AUTO.

You could see various errors. Please refer to the documents below:

Note 743101.1 ORA-01110 ORA-1122 ORA-01251 When Restoring With a Backup from a Standby DB
Note 304488.1 Using standby_file_management with Raw Devices
Note 759181.1 Dataguard Physical Standby occurs ORA-600 [2130] after added new node to primary RAC database.

Note 549078.1 ORA-600 [25016] Errors Trying To Recover Managed Standby Database

Bug 3596262 Add DATAFILE on primary can cause standby to raise ORA-1274

ORA-1274 Encountered on Physical Standby After Adding Datafile to Primary (Doc ID 388659.1)

ORA-01110: data file 7: '+<diskgroup>/<path>/<datafile_name>.300.740842175'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 142995
ORA-00600: internal error code, arguments: [6101], [0], [40], [128], [], [], [], [], [], [], [], []
Slave exiting with ORA-10562 exception
ORA-10562: Error occurred while applying redo to data block (file# 7, block# 237743)
ORA-10564: tablespace <tablespace_name>
ORA-01110: data file 7: '+<diskgroup>/<path>/<datafile_name>.300.740842175

Please work with Oracle Corruption team on the root cause of datafile corruptions. Please refer to the

following documents:

Prevention, Detection and Repair of Database Corruption (Doc ID 76375.1)

Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration (Doc ID 1302539.1)

Solution 10 Add the new datafiles to the standby database manually.

1) Please take a hot backup of new datafiles from the primary database.

2) Create a new standby controlfile from the primary database by

SQL>alter database create standby controlfile as '/tmp/controlf.ctl';


If datafiles are on ASM, please follow the note below and you could ignore the rest of steps:

Note 734862.1 Step By Step Guide On How To Recreate Standby Control File
When Datafiles Are On ASM And Using Oracle Managed Files

Or you could modify the wrong datafile name in the standby controlfile by alter database rename command. For example,

SQL> ALTER DATABASE RENAME FILE '<path/UNNAMED00003>' to '<path/real datafile name>';


3) If the new datafile location on the primary is different from the standby, please make sure
db_file_name_convert init parameter is set on the standby database.

Note 47325.1 Init.ora Parameter "DB_FILE_NAME_CONVERT" Reference Note

If db_file_name_convert init parameter has already been set, then you could ignore this step.


4) Cancel the managed recovery

 

SQL>alter database recover managed standby database cancel; 

5) set standby_file_management=manual on the standby database and shutdown the standby database.

SQL>alter system set standby_file_management=manual sid='*';
SQL>shutdown immediate; 


 
6) Copy the hot backup of the new datafiles and the new standby controlfile to the standby.
Please make sure the controlfiles are located in the right location with right names
according to the init parameter control_files. Please make sure the copied datafiles are
located in the right location as well according to name from v$datafile.

7) startup the standby database in mount mode and set standby_file_management=auto.

SQL>startup mount;
SQL>alter system set standby_file_management=auto sid='*';

 
8) Start the managed recovery.

SQL>alter database recover managed standby database disconnect;

 

Cause 11 : MRP can stall after control file contention at the standby.

MRP is waiting for a log that has already been shipped but does not appear in v$archived_logs and there are ORA-16146 errors reported in the standby alert log.

Solution 11 This is fixed in 10.2.0.4. The workaround is to restart the standby.

Reference: Note 6004936.8 Bug 6004936  Standby apply stops after ORA-16146


You can directly participate in the Discussion about this Note below. The Frame is the interactive live Discussion - not a Screenshot ;-)

 

Cause 12 : Cancelling managed recovery hangs.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

DGMGRL>EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-OFF';

The above commands hang.

 

Solution 12 Let's shutdown abort the standby, startup mount, then have a clean shutdown, check if there are server processes left over for the standby instance.

 

- Let's shutdown abort the standby database.

 SQL>shutdown abort;
 SQL>startup mount;

  
- Then have a clean shutdown of the standby database.

 SQL>shutdown immediate;

  
- Then check if there is any oracle process left over for the standby instance.

 ps -ef |grep -i <standby instance_name>

- If so, then kill the left over process(es).

kill -9 <ospid>

  
- If not, then startup the standby and recovery.

SQL>startup mount;
SQL>recover automatic standby database;

  
After recovering the available archive log files on the standby, you could enter CANCEL at the prompt, and then start managed recovery.

If you use broker, then you issue the commands from dgmgrl. For example,

% dgmgrl            /* from the standby server  */

DGMGRL> startup mount;

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON';

No comments:

Post a Comment