How to Patch Exadata / Upgrade Exadata to 18c and 19c -- Part 1 -- Introduction and Prerequisites
Once you have installed your new Exadata, there will come a time when you'll be asked:
"Shouldn't we patch the Exadata ?"
And the answer is "Yes, definitely".
Indeed, Oracle releases 10 to 15 GB "Quarterly Full Stack" patches (aka Bundles) every quarter (for example: Patch 28689205: QUARTERLY FULL STACK DOWNLOAD PATCH FOR EXADATA (OCT2018 - 12.2.0.1)); these Bundles contain all the patches for all the components that make up an Exadata. You will need (almost) nothing else to be able to patch your whole Exadata.
Based on dozens of successful Exadata patching sessions, I will clearly describe in this blog every step of the procedure to patch a 12.1 or a 12.2 Exadata, as well as to upgrade an Exadata to 18c.
This procedure has been successfully applied dozens of times (hard to say exactly how many, but more than 50 for sure) on pretty much every possible Exadata combination and model.
Let's start with an overview of this patching: the order in which the components will be patched and the tools we will be using.
As it is quite a long odyssey, I will split this blog into different parts, which also follow the logical order in which to patch all the components:
0/ A word of advice
1/ General Information
2/ Some prerequisites it is worth doing before the maintenance
3/ The patching procedure
3.1/ Patching the cells (aka Storage servers)
3.2/ Patching the IB switches
3.3/ Patching the Database servers (aka Compute Nodes)
3.4/ Patching the Grid Infrastructure
3.5/ Upgrading the Grid Infrastructure:
3.5.1/ Upgrade Grid Infrastructure to 12.2
3.5.2/ Upgrade Grid Infrastructure to 18c
3.6/ Upgrading the Cisco Switch / enabling SSH access to the Cisco Switch
4/ The Rollback procedure
4.1/ Cell Rollback
4.2/ DB nodes Rollback
4.3/ IB Switches Rollback
5/ Troubleshooting
How to take an ILOM snapshot with the command line
How to reboot a database server using its ILOM (same procedure applies for a storage server)
How to manually reboot an Infiniband Switch
Restart SSH on a storage cell with no SSH access
How to re-image an Exadata database server
How to re-image an Exadata cell storage server
. . . more to come . . .
6/ Timing
0/ A word of advice
First of all, please keep this piece of advice firmly in mind:
Do NOT continue to the next step before a failed step is properly resolved.
Indeed, everything that needs to be redundant is redundant, and it is supported to run different versions between servers. In the MOS note "Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)", we can read that:
"It is supported to run different Exadata versions between servers. For example, some storage servers may run 11.2.2.4.2 while others run 11.2.3.1.1, or all storage servers may run 11.2.3.1.1 while database servers run 11.2.2.4.2. However, it is highly recommended that this be only a temporary configuration that exists for the purpose and duration of rolling upgrade."
Then if, when patching your cells for example, one cell does not reboot: stop here, do not continue and do not force patch the next one. Everything will still be working fine and in a supported manner with one cell down (I have done it in production and no user noticed anything), but it will most likely not be the case with two cells down. If this kind of issue happens, have a look at the troubleshooting section of this blog and open a Sev 1 SR on MOS.
1/ General Information
Please find below some information you need to know before starting to patch your Exadata:
- There is no difference in the procedure whether you patch a 12.1 Exadata, a 12.2 Exadata or you upgrade an Exadata to 18c (18c is a patchset of 12.2); the examples in this blog come from a recent maintenance upgrading an Exadata to 18c
- It is better to have a basic understanding of what an Exadata is before jumping into this patching procedure
- This procedure does not apply to an ODA (Oracle Database Appliance)
- I will use the /Oct2018_Bundle FS to save the Bundle in the examples of this blog
- I use the "DB node" term here, it means "database node", aka "Compute node"; the nodes where the Grid Infrastructure and the database are running, I will also use the db01 term for the database node number 1, usually named "cluster_name"db01
- I use the "cell" word aka "storage servers", the servers that manage your storage. I will also use cel01 for the storage server number 1, usually named "cluster_name"cel01
- It is good to have the screen utility installed; if not, use nohup
- Almost all the procedure will be executed as root
- I will be patching the IB Switches from the DB node 1 server
- I will be patching the cells from the DB node 1 server
- I will be patching the DB nodes from the cel01 server
- I will not cover the database Homes as there is nothing specific to Exadata here
- I will be using the rac-status.sh script to easily check the status of the resources of the Exadata as well as easily follow the patch progress
- I will be using the exa-versions.sh script to easily check the versions of the Exadata components
- I will be using the cell-status.sh script to easily check the status of the cell and grid disks of the storage servers
2/ Some prerequisites it is worth doing before the maintenance
I highly recommend executing these prerequisites as early as possible: the sooner you discover an issue in them, the better.
2.1/ Download and unzip the Bundle
Review the Exadata general note (Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)) to find the latest Bundle, then download and unzip it; make sure that every directory is owned by oracle:dba to avoid any issue in the future:
[oracle@myclusterdb01]$ cd /Oct2018_Bundle
[oracle@myclusterdb01]$ ls -ltr
total 9609228
-rw-r--r-- 1 oracle oinstall  560430690 Nov 16 18:24 p28689205_121020_Linux-x86-64_10of10.zip
-rw-r--r-- 1 oracle oinstall 1030496554 Nov 16 18:26 p28689205_121020_Linux-x86-64_1of10.zip
-rw-r--r-- 1 oracle oinstall 1032681260 Nov 16 18:27 p28689205_121020_Linux-x86-64_2of10.zip
-rw-r--r-- 1 oracle oinstall 1037111138 Nov 16 18:29 p28689205_121020_Linux-x86-64_3of10.zip
-rw-r--r-- 1 oracle oinstall 1037009057 Nov 16 18:31 p28689205_121020_Linux-x86-64_4of10.zip
-rw-r--r-- 1 oracle oinstall 1037185003 Nov 16 18:33 p28689205_121020_Linux-x86-64_5of10.zip
-rw-r--r-- 1 oracle oinstall 1026218494 Nov 16 18:35 p28689205_121020_Linux-x86-64_6of10.zip
-rw-r--r-- 1 oracle oinstall 1026514887 Nov 16 18:36 p28689205_121020_Linux-x86-64_7of10.zip
-rw-r--r-- 1 oracle oinstall 1026523343 Nov 16 18:39 p28689205_121020_Linux-x86-64_8of10.zip
-rw-r--r-- 1 oracle oinstall 1025677014 Nov 16 18:41 p28689205_121020_Linux-x86-64_9of10.zip
[oracle@myclusterdb01]$ for I in `ls p28689205_121020_Linux-x86-64*f10.zip`
do
  unzip $I
done
Archive:  p28689205_121020_Linux-x86-64_10of10.zip
 inflating: 28689205.tar.splitaj
...
Archive:  p28689205_121020_Linux-x86-64_9of10.zip
 inflating: 28689205.tar.splitai
[oracle@myclusterdb01]$ cat *.tar.* | tar -xvf -
28689205/
28689205/automation/
28689205/automation/bp1-out-of-place-switchback.xml
28689205/automation/bp1-auto-inplace-rolling-automation.xml
...
Note: you can use unzip -q to make unzip silent.
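If you want to double-check the ownership afterwards, here is a minimal sketch, assuming the Bundle was unzipped under /Oct2018_Bundle/28689205 as above and that oracle:dba is the ownership you want (adapt the group to oinstall if that is what your system uses):
-- List anything under the Bundle directory that is not owned by the oracle user
[root@myclusterdb01 ~]# find /Oct2018_Bundle/28689205 ! -user oracle -ls
-- Fix the ownership if needed
[root@myclusterdb01 ~]# chown -R oracle:dba /Oct2018_Bundle/28689205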
2.2/ Download the latest patchmgr
patchmgr is the orchestration tool that will perform most of the job here. Its version changes quite often, so I recommend double-checking that the version shipped with the Bundle is the latest one:
-- patchmgr is in the below directory:
--   /FileSystem/Bundle_patch_number/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/Version
-- Example for the October 2018 PSU:
--   /Oct2018_Bundle/27475857/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_181800_Linux-x86-64.zip
-- patchmgr is delivered on Metalink through this patch:
--   Patch 21634633: DBSERVER.PATCH.ZIP ORCHESTRATOR PLUS DBNU - ARU PLACEHOLDER
-- Download the latest patchmgr version and replace it in the Bundle directory
[oracle@myclusterdb01]$ cp /tmp/p21634633_191000_Linux-x86-64.zip /Oct2018_Bundle/27475857/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/.
[oracle@myclusterdb01]$ ls -ltr /Oct2018_Bundle/27475857/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/
total 563944
-rw-r--r-- 1 oracle oinstall       581 Oct 16 09:51 README.txt
-rw-r--r-- 1 oracle oinstall 173620439 Oct 16 09:51 p21634633_181800_Linux-x86-64.zip
-rw-r--r-- 1 oracle oinstall 403531297 Nov 21 19:45 p21634633_191000_Linux-x86-64.zip
[oracle@myclusterdb01]$ mv /Oct2018_Bundle/27475857/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_181800_Linux-x86-64.zip /Oct2018_Bundle/27475857/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/old_p21634633_181800_Linux-x86-64.zip
[oracle@myclusterdb01]$
2.3/ SSH keys
For this step, if you are not familiar with the dbs_group, cell_group, etc. files, here is how to create them, as I have described in this post (look for "dbs_group" in the post):
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group ~/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group
[root@myclusterdb01 ~]#
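A quick sanity check of the generated files never hurts (just an example; the number of entries obviously depends on your own Exadata configuration):
[root@myclusterdb01 ~]# wc -l /root/dbs_group /root/cell_group /root/ib_group /root/all_group
[root@myclusterdb01 ~]# cat /root/all_group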
We will need a few SSH keys deployed in order to ease the application of the patches:
- root ssh keys deployed from the db01 server to the IB Switches (you will have to enter the root password once for each IB Switch)
[root@myclusterdb01 ~]# cat ~/ib_group
myclustersw-ib2
myclustersw-ib3
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustersw-ib3's password:
root@myclustersw-ib2's password:
myclustersw-ib2: ssh key added
myclustersw-ib3: ssh key added
[root@myclusterdb01 ~]#
- root ssh keys deployed from the cel01 server to all the database nodes (you will have to enter the root password once for each database server)
[root@myclustercel01 ~]# cat ~/dbs_group
myclusterdb01
myclusterdb02
myclusterdb03
myclusterdb04
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:
root@myclusterdb04's password:
root@myclusterdb02's password:
myclusterdb01: ssh key added
myclusterdb02: ssh key added
myclusterdb03: ssh key added
myclusterdb04: ssh key added
[root@myclustercel01 ~]#
- root ssh keys deployed from the db01 server to all the cells (you will have to enter the root password once for each cell)
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
myclustercel01: myclustercel01.mydomain.com
myclustercel02: myclustercel02.mydomain.com
myclustercel03: myclustercel03.mydomain.com
myclustercel04: myclustercel04.mydomain.com
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustercel04's password:
...
root@myclustercel03's password:
myclustercel01: ssh key added
...
myclustercel06: ssh key added
[root@myclusterdb01 ~]#
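Once the keys are deployed, a quick way to verify that passwordless root SSH works everywhere is to run a harmless command through dcli (this is just an example; adapt the group files to where you deployed the keys):
-- From the db01 server
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root hostname
-- From the cel01 server
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root hostname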
2.4/ Upgrade opatch
It is highly recommended to upgrade opatch before any patching activity, and this Bundle is no exception. Please find a detailed procedure to quickly upgrade opatch with dcli in this post.
Please note that upgrading opatch will also allow you to be ocm.rsp-free!
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid /u01/app/12.1.0.2/grid/OPatch/opatch version | grep Version
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid -f /Oct2018_Bundle/28183368/Database/OPatch/12.2/12.2.0.1.*/p6880880_12*_Linux-x86-64.zip -d /tmp
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid "unzip -o /tmp/p6880880_12*_Linux-x86-64.zip -d /u01/app/12.1.0.2/grid; /u01/app/12.1.0.2/grid/OPatch/opatch version; rm /tmp/p6880880_12*_Linux-x86-64.zip" | grep Version
2.5/ Run the prechecks
It is very important to run these prechecks and to pay close attention to their outputs: they have to be 100% successful to ensure a smooth application of the patches.
2.5.1/ Cell patching prechecks
First of all, you'll have to unzip the patch:
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch
[root@myclusterdb01 ~]# unzip -q p28633752_*_Linux-x86-64.zip
-- This should create a patch_18.1.9.0.0.181006 directory with the cell patch
And start the prerequisites from database node 1:
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/patch_18.1.9.0.0.181006
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling
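In addition to patchmgr's own checks, an extra verification I like to do (not a mandatory step) is to confirm that every grid disk can safely be taken offline before a rolling cell patching; every disk should report asmdeactivationoutcome = Yes:
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome"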
Check disk_repair_time
You have to be aware of this parameter and understand it: disk_repair_time specifies how long ASM waits before dropping a disk after it has been taken offline; the default is 3.6h.
Oracle recommends setting this parameter to 8h when patching a cell. But as we will see in the cell patching logs, patchmgr's timeout for this operation is 600 minutes (10 hours), and as I have had issues in the past with a very long cell patching, I now set this parameter to 24h, as Oracle recommended to me back then. I would therefore recommend setting it to 24h when patching; this is what I will describe in the cell patching procedure. For now, we will just have a look at the value of the parameter for awareness.
Please note that this prerequisite is only needed for a rolling patch application.
SQL> select dg.name as diskgroup, a.name as attribute, a.value
       from v$asm_diskgroup dg, v$asm_attribute a
      where dg.group_number = a.group_number
        and (a.name like '%repair_time' or a.name = 'compatible.asm');

DISKGROUP        ATTRIBUTE               VALUE
---------------- ----------------------- ----------------------------------------
DATA             disk_repair_time        3.6h
DATA             compatible.asm          11.2.0.2.0
DBFS_DG          disk_repair_time        3.6h
DBFS_DG          compatible.asm          11.2.0.2.0
RECO             disk_repair_time        3.6h
RECO             compatible.asm          11.2.0.2.0

6 rows selected.
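For reference, here is a minimal sketch of what bumping disk_repair_time to 24h could look like, run as the grid user against the ASM instance (the disk group names are the ones from the example above; the actual change and the revert to the original value are covered in the cell patching part):
[grid@myclusterdb01 ~]$ . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[grid@myclusterdb01 ~]$ sqlplus -s / as sysasm <<EOF
alter diskgroup DATA    set attribute 'disk_repair_time' = '24h';
alter diskgroup RECO    set attribute 'disk_repair_time' = '24h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time' = '24h';
EOF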
2.5.2/ DB nodes prechecks
As we cannot patch the node we are connected to, we will start the patching from a cell server (myclustercel01). To be able to do that, we first need to copy patchmgr and the ISO file to this cell server. Do NOT unzip the ISO file; patchmgr will take care of it.
I create a /tmp/SAVE directory on the cell to stage what is needed to patch the database servers. Using a SAVE directory in /tmp is a good idea to avoid the automatic maintenance jobs that purge /tmp every day (directories larger than 5 MB and older than 1 day); without it, these maintenance jobs would delete the dbnodeupdate.zip file that is mandatory to apply the patch. Note that this directory will not survive a reboot though.
[root@myclusterdb01 ~]# ssh root@myclustercel01 rm -fr /tmp/SAVE
[root@myclusterdb01 ~]# ssh root@myclustercel01 mkdir /tmp/SAVE
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataDatabaseServer_OL6/p28666206_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp ~/dbs_group root@myclustercel01:~/.
[root@myclusterdb01 ~]# ssh root@myclustercel01
[root@myclustercel01 ~]# cd /tmp/SAVE
[root@myclustercel01 ~]# unzip -q p21634633_*_Linux-x86-64.zip
This should create a dbserver_patch_5.180720 directory (the name may be slightly different if you use a different patchmgr than the one shipped with the Bundle)
And start the prerequisites:
[root@myclustercel01 ~]# cd /tmp/SAVE/dbserver_patch_*
[root@myclustercel01 ~]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 -allow_active_network_mounts
-- You can safely ignore the below warning (it has been a patchmgr bug for a while) if the GI version is > 11.2.0.2, which is most likely the case
(*) - Yum rolling update requires fix for 11768055 when Grid Infrastructure is below 11.2.0.2 BP12
Note: if your source version is > 12.1.2.1.1, you can use the -allow_active_network_mounts parameter to patch all the DB nodes without having to worry about the NFS mounts. If your version is older and you have some NFS mounted, you will get some error messages; you can ignore them at this stage as we will umount the NFS manually before patching the DB nodes.
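To know where you stand before the maintenance, here is one simple way (just an example; any method you like works) to list the NFS shares currently mounted on every DB node:
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "grep nfs /proc/mounts"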
Dependency issues
You may face dependency issues reported by the database server prerequisites. I have documented the two cases you can be in and the two ways to fix them:
- When there is no OS upgrade, follow this blog.
- When there is an OS upgrade (from 12c or 18c to 19c or above), please have a look at this blog.
2.5.3/ IB Switches prechecks
- To avoid issues with NFS/ZFS when rebooting the IB Switches (I have had many in the past; I am not sure whether they came from the client configuration, but they are always unpleasant), I recommend copying the patch outside of any NFS/ZFS
- This patch is ~2.5 GB, so be careful not to fill / if you copy it into /tmp; if space is tight, choose another local FS
[root@myclusterdb01 ~]# du -sh /tmp/IB_PATCHING
[root@myclusterdb01 ~]# rm -fr /tmp/IB_PATCHING
[root@myclusterdb01 ~]# mkdir /tmp/IB_PATCHING
[root@myclusterdb01 ~]# unzip -q /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/p28633752_*_Linux-x86-64.zip -d /tmp/IB_PATCHING
[root@myclusterdb01 ~]# cd /tmp/IB_PATCHING/patch_18.1.9.0.0.181006
[root@myclusterdb01 ~]# ./patchmgr -ibswitches ~/ib_group -ibswitch_precheck -upgrade
Note: despite what the patchmgr documentation says, you have to specify an ib_group configuration file containing the list of your IB Switches.
If the prerequisites show some conflicts to be resolved, please have a look at this blog where I explain how to manage the OS dependency issues; do NOT use the -modify_at_prereq option straight away.
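Before launching the upgrade itself, it can also be handy to note the current firmware version of each switch; a quick, purely informational check (assuming root SSH keys have been deployed to the switches as described above):
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root version | grep -i version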
2.5.4/ Grid Infrastructure prechecks
To start with, be sure that the patch has been unzipped (as the GI owner to avoid any permission issue later):
[grid@myclusterdb01 ~]$ cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU
[grid@myclusterdb01 ~]$ unzip -q p28714316*_Linux-x86-64.zip
-- This should create a 28714316 directory.
And start the prerequisites on each node:
[root@myclusterdb01 ~]# . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU/28714316
[root@myclusterdb01 ~]# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid -analyze
Alternatively, you can start the GI prerequisites on all nodes in parallel with one command:
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU/28714316; /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid -analyze"
Note: you will most likely see some warnings here; check the logfiles, as these warnings are usually due to patches that will be rolled back because they are no longer needed.
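It can also help to keep a record of what is currently installed in the GI home on every node before patching (an optional snapshot to compare against after the maintenance; adjust the GI home path to your environment):
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid /u01/app/12.1.0.2/grid/OPatch/opatch lspatches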
Now that everything is downloaded, unzipped and updated, and every prerequisite is successful, we can safely jump to the patching procedure in part 2!
Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 coming soon / Part 6