Saturday, April 18, 2020

How to Patch Exadata / Upgrade Exadata to 18c and 19c -- Part 1 -- Introduction and Prerequisites



Once you have installed your new Exadata, there will come a time when you'll be asked:

"Shouldn't we patch the Exadata ?"
And the answer is "Yes, definitely".

Indeed, Oracle releases 10 ~ 15 GB "Quarterly Full Stack" patches (aka Bundles) every quarter (for example: Patch 28689205: QUARTERLY FULL STACK DOWNLOAD PATCH FOR EXADATA (OCT2018 - 12.2.0.1)); these Bundles contain all the patches for all the components that make up an Exadata. You will need (almost) nothing else to be able to patch your whole Exadata.

Based on dozens of successful Exadata patching sessions, I will clearly describe in this blog every step of the procedure to patch a 12.1 or 12.2 Exadata as well as to upgrade an Exadata to 18c.
This procedure has been successfully applied dozens of times (hard to say exactly how many, but more than 50 for sure) on pretty much every possible Exadata combination and model.

Let's start with a preview of this patching with the order and the tools we will be using:




0/ A piece of advice

First of all, please keep this piece of advice firmly in mind:
Do NOT continue to the next step before a failed step is properly resolved.
Indeed, everything that needs to be redundant is redundant and it is supported to run different versions between servers. In the MOS note "Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)", we can read that :
It is supported to run different Exadata versions between servers. For example, some storage servers may run 11.2.2.4.2 while others run 11.2.3.1.1, or all storage servers may run 11.2.3.1.1 while database servers run 11.2.2.4.2. However, it is highly recommended that this be only a temporary configuration that exists for the purpose and duration of rolling upgrade.
So if, when patching your cells for example, one cell does not reboot: stop here, do not continue and do not force-patch the next one. Everything will still work fine and in a supported manner with one cell down (I have done it in production and no user noticed anything), but it will most likely not be the case with 2 cells down. If this kind of issue happens, have a look at the troubleshooting section of this blog and open a MOS Sev 1.
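For example, before moving on to the next cell during a rolling cell patching, a quick sanity check like the one below helps confirm that all the grid disks are back ONLINE and that ASM would tolerate taking the next cell offline (a simple sketch using cellcli through dcli and the cell_group file described in the prerequisites; the rac-status.sh and cell-status.sh scripts mentioned below do a nicer job):
-- Any line returned here points to a grid disk that is not back to normal yet
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root "cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome" | grep -vi "active.*ONLINE.*Yes"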

1/ General Information

Please find some information you need to know before starting to patch your Exadata :
  • There is no difference in the procedure whether you patch a 12.1 or 12.2 Exadata or upgrade an Exadata to 18c (18c is a patchset of 12.2); the examples in this blog are from a recent maintenance upgrading an Exadata to 18c
  • It is better to have a basic understanding of what an Exadata is before jumping into this patching procedure
  • This procedure does not apply to an ODA (Oracle Database Appliance)
  • I will use the /Oct2018_Bundle FS to save the Bundle in the examples of this blog
  • I use the "DB node" term here, it means "database node", aka "Compute node"; the nodes where the Grid Infrastructure and the database are running, I will also use the db01 term for the database node number 1, usually named "cluster_name"db01
  • I use the "cell" word aka "storage servers", the servers that manage your storage. I will also use cel01 for the storage server number 1, usually named "cluster_name"cel01
  • It is good to have the screen utility installed; if not, use nohup
  • Almost all the procedure will be executed as root
  • I will be patching the IB Switches from the DB node 1 server
  • I will be patching the cells from the DB node 1 server
  • I will be patching the DB nodes from the cel01 server
  • I will not cover the database Homes as there is nothing specific to Exadata here
  • I will be using the rac-status.sh script to easily check the status of the resources of the Exadata as well as to easily follow the patch progress
  • I will be using the exa-versions.sh script to easily check the versions of the Exadata components (a rough dcli-based equivalent is shown right after this list)
  • I will be using the cell-status.sh script to easily check the status of the cell and grid disks of the storage servers
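If you do not have these scripts at hand, a rough equivalent of the versions check can be done with imageinfo and dcli (a minimal sketch, assuming the dbs_group / cell_group / ib_group files described in the prerequisites below):
-- Exadata software version of the DB nodes and the cells
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "imageinfo -ver"
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root "imageinfo -ver"
-- Firmware version of the IB Switches
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root version | grep -i version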

2/ Some prerequisites worth doing before the maintenance

I highly recommend executing these prerequisites as early as possible. The sooner you discover an issue in these prerequisites, the better.

2.1/ Download and unzip the Bundle

Review the Exadata general note (Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)) to find the latest Bundle, download it and unzip it; be sure that every directory is owned by oracle:dba to avoid any issue in the future (a chown example is shown after the listing below):
[oracle@myclusterdb01]$ cd /Oct2018_Bundle
[oracle@myclusterdb01]$ ls -ltr
total 9609228
-rw-r--r-- 1 oracle oinstall 560430690 Nov 16 18:24 p28689205_121020_Linux-x86-64_10of10.zip
-rw-r--r-- 1 oracle oinstall 1030496554 Nov 16 18:26 p28689205_121020_Linux-x86-64_1of10.zip
-rw-r--r-- 1 oracle oinstall 1032681260 Nov 16 18:27 p28689205_121020_Linux-x86-64_2of10.zip
-rw-r--r-- 1 oracle oinstall 1037111138 Nov 16 18:29 p28689205_121020_Linux-x86-64_3of10.zip
-rw-r--r-- 1 oracle oinstall 1037009057 Nov 16 18:31 p28689205_121020_Linux-x86-64_4of10.zip
-rw-r--r-- 1 oracle oinstall 1037185003 Nov 16 18:33 p28689205_121020_Linux-x86-64_5of10.zip
-rw-r--r-- 1 oracle oinstall 1026218494 Nov 16 18:35 p28689205_121020_Linux-x86-64_6of10.zip
-rw-r--r-- 1 oracle oinstall 1026514887 Nov 16 18:36 p28689205_121020_Linux-x86-64_7of10.zip
-rw-r--r-- 1 oracle oinstall 1026523343 Nov 16 18:39 p28689205_121020_Linux-x86-64_8of10.zip
-rw-r--r-- 1 oracle oinstall 1025677014 Nov 16 18:41 p28689205_121020_Linux-x86-64_9of10.zip
[oracle@myclusterdb01]$ for I in `ls p28689205_121020_Linux-x86-64*f10.zip`
do
unzip $I
done
Archive: p28689205_121020_Linux-x86-64_10of10.zip
 inflating: 28689205.tar.splitaj
...
Archive: p28689205_121020_Linux-x86-64_9of10.zip
 inflating: 28689205.tar.splitai
[oracle@myclusterdb01]$ cat *.tar.* | tar -xvf -
28689205/
28689205/automation/
28689205/automation/bp1-out-of-place-switchback.xml
28689205/automation/bp1-auto-inplace-rolling-automation.xml

...
Note : you can use unzip -q to make unzip silent
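If the ownership is wrong, a simple way to put everything back in order is a recursive chown as root (a sketch, assuming oracle:oinstall as in the listing above; adapt to oracle:dba or whatever your site uses):
-- Fix the ownership of the whole Bundle directory (as root)
[root@myclusterdb01 ~]# chown -R oracle:oinstall /Oct2018_Bundle
-- Verify that nothing is left owned by another user
[root@myclusterdb01 ~]# find /Oct2018_Bundle ! -user oracle | head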

2.2/ Download the latest patchmgr

patchmgr is the orchestration tool that will perform most of the job here. Its version changes quite often, so I recommend double checking that the version shipped with the Bundle is the latest one:
-- patchmgr is in the below directory :
/FileSystem/Bundle_patch_number/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/Version
Example for the October 2018 PSU :
/Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_181800_Linux-x86-64.zip

-- Patchmgr is delivered on Metalink through this patch:
Patch 21634633: DBSERVER.PATCH.ZIP ORCHESTRATOR PLUS DBNU - ARU PLACEHOLDER

-- Download the latest patchmgr version and replace it in the Bundle directory
[oracle@myclusterdb01]$ cp /tmp/p21634633_191000_Linux-x86-64.zip /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/.
[oracle@myclusterdb01]$ ls -ltr /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/
total 563944
-rw-r--r-- 1 oracle oinstall       581 Oct 16 09:51 README.txt
-rw-r--r-- 1 oracle oinstall 173620439 Oct 16 09:51 p21634633_181800_Linux-x86-64.zip
-rw-r--r-- 1 oracle oinstall 403531297 Nov 21 19:45 p21634633_191000_Linux-x86-64.zip
[oracle@myclusterdb01]$ mv /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_181800_Linux-x86-64.zip /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/old_p21634633_181800_Linux-x86-64.zip
[oracle@myclusterdb01]$
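If you want to double check which patchmgr version a given zip ships (a quick sketch; the name of the dbserver_patch_* directory it contains reflects the version), you can unzip it into a temporary directory:
[oracle@myclusterdb01]$ unzip -q /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_191000_Linux-x86-64.zip -d /tmp/check_patchmgr
[oracle@myclusterdb01]$ ls -d /tmp/check_patchmgr/dbserver_patch_*
[oracle@myclusterdb01]$ rm -rf /tmp/check_patchmgr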

2.3/ SSH keys

For this step, if you are not confident with the dbs_group, cell_group, etc. files, here is how to create them, as I have described in this post (look for "dbs_group" in the post):
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group ~/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group
[root@myclusterdb01 ~]#
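A quick sanity check of these files (just a sketch) is to verify that the line counts match the number of DB nodes, cells and IB Switches of your rack:
[root@myclusterdb01 ~]# wc -l /root/dbs_group /root/cell_group /root/ib_group /root/all_group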

We need a few SSH keys deployed in order to ease the application of the patches (a quick verification is shown at the end of this section):
  • root ssh keys deployed from the db01 server to the IB Switches (you will have to enter the root password once for each IB Switch)
[root@myclusterdb01 ~]# cat ~/ib_group
myclustersw-ib2
myclustersw-ib3
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustersw-ib3's password:
root@myclustersw-ib2's password:
myclustersw-ib2: ssh key added
myclustersw-ib3: ssh key added
[root@myclusterdb01 ~]#
  • root ssh keys deployed from the cel01 server to all the database nodes (you will have to enter the root password once for each database server)
[root@myclustercel01 ~]# cat ~/dbs_group
myclusterdb01
myclusterdb02
myclusterdb03
myclusterdb04
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:
root@myclusterdb04's password:
root@myclusterdb02's password:
myclusterdb01: ssh key added
myclusterdb02: ssh key added
myclusterdb03: ssh key added
myclusterdb04: ssh key added
[root@myclustercel01 ~]#
  • root ssh keys deployed from the db01 server to all the cells (you will have to enter the root password once for each cell)
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
myclustercel01: myclustercel01.mydomain.com
myclustercel02: myclustercel02.mydomain.com
myclustercel03: myclustercel03.mydomain.com
myclustercel04: myclustercel04.mydomain.com
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustercel04's password:
...
root@myclustercel03's password:
myclustercel01: ssh key added
...
myclustercel06: ssh key added
[root@myclusterdb01 ~]#
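Once the keys are deployed, a quick way to verify that everything is passwordless (a simple sketch, assuming the hostname command is available on the IB Switches) is to run a dcli against each group:
-- From db01: cells and IB Switches
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root hostname
-- From cel01: database nodes
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root hostname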

2.4/ Upgrade opatch

It is highly recommended to upgrade opatch before any patching activity, and this Bundle is no exception. Please find a detailed procedure to quickly upgrade opatch with dcli in this post.
Please note that upgrading opatch will also allow you to be ocm.rsp-free !
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid /u01/app/12.1.0.2/grid/OPatch/opatch version | grep Version
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid -f /Oct2018_Bundle/28689205/Database/OPatch/12.2/12.2.0.1.*/p6880880_12*_Linux-x86-64.zip -d /tmp
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid "unzip -o /tmp/p6880880_12*_Linux-x86-64.zip -d /u01/app/12.1.0.2/grid; /u01/app/12.1.0.2/grid/OPatch/opatch version; rm /tmp/p6880880_12*_Linux-x86-64.zip" | grep Version

2.5/ Run the prechecks

It is very important to run these prechecks and take good care of their outputs. They have to be 100% successful to ensure a smooth application of the patches.

2.5.1/ Cell patching prechecks

First of all, you'll have to unzip the patch:
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch
[root@myclusterdb01 ~]# unzip -q p28633752_*_Linux-x86-64.zip
-- This should create a patch_18.1.9.0.0.181006 directory with the cell patch
And start the prerequisites from database node 1:
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/patch_18.1.9.0.0.181006
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling

Check disk_repair_time

You have to be aware of and understand this parameter. disk_repair_time specifies the amount of time ASM waits before dropping a disk after it has been taken offline -- the default is 3.6h.
Oracle recommends setting this parameter to 8h when patching a cell. But as we will see in the cell patching logs, patchmgr's timeout for this operation is 600 minutes (that is, 10 hours), and as I have had issues in the past with very long cell patching, I now set this parameter to 24h, as Oracle recommended when I faced those very long cell patching sessions. I would then recommend setting it to 24h when patching -- this is what I will describe in the cell patching procedure. Here, we just have a look at the value of the parameter for awareness; a sketch of the 24h change is shown after the query below.
Please note that this prerequisite is only needed for a rolling patch application.
SQL> select dg.name as diskgroup, a.name as attribute, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and (a.name like '%repair_time' or a.name = 'compatible.asm');

DISKGROUP          ATTRIBUTE              VALUE
---------------- ----------------------- ----------------------------------------
DATA             disk_repair_time         3.6h
DATA             compatible.asm           11.2.0.2.0
DBFS_DG          disk_repair_time         3.6h
DBFS_DG          compatible.asm           11.2.0.2.0
RECO             disk_repair_time         3.6h
RECO             compatible.asm           11.2.0.2.0

6 rows selected.
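For reference, here is a minimal sketch of how the attribute could be bumped to 24h before a rolling cell patching (this is only a sketch: the actual change, and setting it back after the maintenance, is described in the cell patching part; adapt the diskgroup names to yours):
[grid@myclusterdb01 ~]$ sqlplus -s / as sysasm <<EOF
alter diskgroup DATA    set attribute 'disk_repair_time' = '24h';
alter diskgroup RECO    set attribute 'disk_repair_time' = '24h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time' = '24h';
EOF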

2.5.2/ DB nodes prechecks

As we cannot patch a node we are connected to, we will start the patch from a cell server (myclustercel01). To be able to do that, we first need to copy patchmgr and the ISO file to this cell server. Do NOT unzip the ISO file, patchmgr will take care of it.
I create a /tmp/SAVE directory on the cell to hold these files. Keep in mind that automatic maintenance jobs purge /tmp every day (directories > 5 MB and older than 1 day); if they kick in, they will delete the dbnodeupdate.zip file that is mandatory to apply the patch, so do this copy shortly before the maintenance -- and /tmp won't survive a reboot either.
[root@myclusterdb01 ~]# ssh root@myclustercel01 rm -fr /tmp/SAVE
[root@myclusterdb01 ~]# ssh root@myclustercel01 mkdir /tmp/SAVE
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataDatabaseServer_OL6/p28666206_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp ~/dbs_group root@myclustercel01:~/.
[root@myclusterdb01 ~]# ssh root@myclustercel01
[root@myclustercel01 ~]# cd /tmp/SAVE
[root@myclustercel01 ~]# unzip -q p21634633_*_Linux-x86-64.zip
This should create a dbserver_patch_5.180720 directory (the name may be slightly different if you use a different patchmgr than the one shipped with the Bundle)
And start the prerequisites:
[root@myclustercel01 ~]# cd /tmp/SAVE/dbserver_patch_*
[root@myclustercel01 ~]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 -allow_active_network_mounts
-- You can safely ignore the below warning (it has been a patchmgr bug for a while) if your GI version is > 11.2.0.2 -- which is most likely the case
(*) - Yum rolling update requires fix for 11768055 when Grid Infrastructure is below 11.2.0.2 BP12
Note : if your source version is > 12.1.2.1.1, you can use the -allow_active_network_mounts parameter to patch all the DB nodes without taking care of the NFS shares first. Conversely, if you have some NFS mounted and cannot use this parameter, you will get some error messages; you can ignore them at this stage, we will unmount the NFS shares manually before patching the DB nodes
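If you want to know upfront which NFS shares are mounted on the DB nodes (a quick sketch; we will unmount them manually later anyway if needed), something like this does the job from cel01:
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root "grep nfs /proc/mounts"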

Dependency issues

You may have dependency issues reported by the database server prerequisites. I have documented the two cases you can be in and the two ways to fix them:
  • When there is no OS upgrade, follow this blog.
  • When there is an OS upgrade (from 12c or 18c to 19c or above), please have a look at this blog.

2.5.3/ IB Switches prechecks

- To avoid issues with NFS/ZFS when rebooting the IB Switches (I have had a lot of them in the past; I am not sure whether it came from the client configuration, but it is always unpleasant), I recommend copying the patch outside of any NFS/ZFS share
- This patch is ~ 2.5 GB, so be careful not to fill / if you copy it into /tmp (a quick space check is shown below); if space is tight, choose another local FS
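Before copying, a trivial space check (shown only for completeness) avoids bad surprises:
[root@myclusterdb01 ~]# df -h / /tmp
Then prepare a local directory, unzip the patch there and run the precheck: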
[root@myclusterdb01 ~]# du -sh /tmp/IB_PATCHING
[root@myclusterdb01 ~]# rm -fr /tmp/IB_PATCHING
[root@myclusterdb01 ~]# mkdir /tmp/IB_PATCHING
[root@myclusterdb01 ~]# unzip -q /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/p28633752_*_Linux-x86-64.zip -d /tmp/IB_PATCHING
[root@myclusterdb01 ~]# cd /tmp/IB_PATCHING/patch_18.1.9.0.0.181006
[root@myclusterdb01 ~]# ./patchmgr -ibswitches ~/ib_group -ibswitch_precheck -upgrade
Note : despite what the patchmgr documentation says, you have to specify an ib_group configuration file containing the list of your IB Switches
If the prerequisites show some conflicts to be resolved, please have a look at this blog where I explain how to manage the OS dependency issues, but do NOT use the -modify_at_prereq option straight away.

2.5.4/ Grid Infrastructure prechecks

To start with, be sure that the patch has been unzipped (as the GI owner user to avoid any further permission issue):
[grid@myclusterdb01 ~]$ cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU
[grid@myclusterdb01 ~]$ unzip -q p28714316*_Linux-x86-64.zip
-- This should create a 28714316 directory.
And start the prerequisites on each node:
[root@myclusterdb01 ~]# . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU/28714316
[root@myclusterdb01 ~]# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid -analyze
Alternatively, you can start the GI prerequisites on all nodes in parallel in one command:
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU/28714316; /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid -analyze"
Note : you will most likely see some warnings here; check the logfiles -- they will probably be due to some one-off patches that will be rolled back because they are not needed any more.
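To see which interim patches are currently installed in the GI home (typically the ones the analyze step reports as going to be rolled back), a quick look like the below can help (a sketch, assuming the grid user has its own dbs_group file as in the opatch upgrade step above):
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid /u01/app/12.1.0.2/grid/OPatch/opatch lspatches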
Now that everything is downloaded, unzipped and updated, and every prerequisite is successful, we can safely jump to the patching procedure in Part 2!
              Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 (coming soon) / Part 6

Wednesday, April 15, 2020

EM 13c: Target DB Home Page Shows 'Failed To Connect To The Target Database, Regions That Pull Real-time Data' (Doc ID 2474441.1)

In this Document
Symptoms
Cause
Solution
References


APPLIES TO:

Enterprise Manager for Oracle Database - Version 13.2.0.0.0 and later
Information in this document applies to any platform.

SYMPTOMS

The below error message appears on the 12.2 target Database Home Page / target Cluster Database Home Page in the OEM Console:

Failed to connect to the target database. Regions that pull real-time data from the database will not be displayed.
From emoms.trc:
2018-11-09 08:37:00,738 [MetricCollector:SITEMAP_THREAD1513:15] ERROR rt.DbMetricCollectorTarget logp.251 - Exception  in getConnection()<Database_name>:oracle_databasejava.sql.SQLException: Connection Cache with this Cache Name does not exist
2018-11-09 08:37:00,738 [MetricCollector:SITEMAP_THREAD1513:15] ERROR perf.sitemap logp.251 - java.sql.SQLException: Connection Cache with this Cache Name does not exist
java.sql.SQLException: Connection Cache with this Cache Name does not exist
at oracle.jdbc.pool.OracleConnectionCacheManager.purgeCache(OracleConnectionCacheManager.java:947)
at oracle.sysman.emSDK.core.util.jdbc.ConnectionCache.close(ConnectionCache.java:345)

CAUSE

Issue is due to the following bug:
Bug 28513706 - 13c2EM: DB home page shows Connection Cache with this Cache Name does not exist

SOLUTION

Perform the following steps on the OMS server as a workaround until Bug 28513706 is fixed:

1. <OMS_HOME>/bin/emctl set property -name use_pooled_target_connections -value false
2. Restart the OMS
  <OMS_HOME>/bin/emctl stop oms -all
  <OMS_HOME>/bin/emctl start oms
Note: The issue occurs when connection pooling is used. Disabling the use_pooled_target_connections parameter simply makes the OMS use the traditional way of connecting to the database. Connection pooling is a JDBC feature where connections are created as a pool and reused.
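To double check the value once the OMS is back up, something like the below should work (a sketch; the emctl get property syntax may vary slightly between EM releases):
<OMS_HOME>/bin/emctl get property -name use_pooled_target_connections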


REFERENCES

BUG:28513706 - 13C2EM: DB HOME PAGE SHOWS CONNECTION CACHE WITH THIS CACHE NAME DOES NOT EXIST

NOTE:2308038.1 - 13c EM: Target db home page shows Failed to connect to the target database, Regions that pull real-time data from the database will not be displayed
NOTE:2338876.1 - EM13c : Exception while loading RAC Database Home Page. Connection Cache with this Cache Name does not exist