Tuesday, July 27, 2021

DNS and NTP changes on Exadata

DNS and NTP changes on Exadata involve four components:

 - DB nodes

 - Cell nodes

 - IB Switches

 - Ethernet Switches

--- IB Switches DNS ---


If your switch is using firmware 2.0.4 or later:


ssh ilom-admin@99.99.99.12/13

-> set /SP/clients/dns nameserver=99.99.99.10,99.99.99.11
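To confirm the new setting, display the DNS client configuration afterwards (the same ILOM command is used on the database nodes later in this post):

-> show /SP/clients/dns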

--- IB Switches NTP ---

ssh ilom-admin@99.99.99.12/13

-> show -d properties /SP/clock

  /SP/clock

Properties:

datetime = Mon Jul 26 21:44:16 2021

timezone = +04 (Asia/Baku)

uptime = 464 days, 06:34:49

usentpserver = enabled

-> show -d properties /SP/clients/ntp/server/1

  /SP/clients/ntp/server/1

Properties:

address = 88.88.88.250

-> show -d properties /SP/clients/ntp/server/2

  /SP/clients/ntp/server/2

Properties:

address = 0.0.0.0

-> set  /SP/clients/ntp/server/1 address=99.99.99.250

Set 'address' to '99.99.99.250'

-> set  /SP/clients/ntp/server/2 address=99.99.99.251

Set 'address' to '99.99.99.251'

-> show -d properties /SP/clock

  /SP/clock

Properties:

datetime = Mon Jul 26 21:50:48 2021

timezone = +04 (Asia/Baku)

uptime = 464 days, 06:41:20

usentpserver = enabled


->  show -d properties /SP/clients/ntp/server/1

  /SP/clients/ntp/server/1

Properties:

address = 99.99.99.250

->  show -d properties /SP/clients/ntp/server/2

  /SP/clients/ntp/server/2

Properties:

address = 99.99.99.251


--- Cisco (Ethernet) switch DNS ---

ssh admin@99.99.99.11  

dm1sw-adm0# show running-config

!Command: show running-config

!Running configuration last done at: Mon Jul 26 21:00:03 2021

!Time: Mon Jul 26 21:01:42 2021

version 7.0(3)I7(6) Bios:version 05.34

......

no password strength-check

username admin password 5 $5$g6qgjbgv$hhEgdfgdfgdfgdfGhDIIc678GrOa82L6Lj.  role network-admin

username ciscosnmp password 5 $5$ZWkcwADg$JgUzO9DCsdvsdvf65yMIJCcdD  role network-operator

ip domain-lookup

ip domain-name mlspp.gov.az

ip name-server 88.88.88.10 88.88.88.11

system default switchport

copp profile lenient

.....

dm1sw-adm0# configure terminal

Enter configuration commands, one per line. End with CNTL/Z.

dm1sw-adm0(config)# no ip name-server 88.88.88.10

dm1sw-adm0(config)# no ip name-server 88.88.88.11

dm1sw-adm0(config)# end

dm1sw-adm0# configure terminal

Enter configuration commands, one per line. End with CNTL/Z.

dm1sw-adm0(config)# ip name-server 99.99.99.10

dm1sw-adm0(config)# ip name-server 99.99.99.11

dm1sw-adm0(config)# end

Finally, verify the changes:

dm1sw-adm0# show running-config
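To see just the DNS lines rather than the whole configuration, the output can also be filtered (optional):

dm1sw-adm0# show running-config | include name-server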

Save the configuration.

dm1sw-adm0# copy running-config startup-config

[########################################] 100%

Copy complete, now saving to disk (please wait)...

Copy complete.

Exit the session

dm1sw-adm0# exit


--- Cisco (Ethernet) switch NTP ---

ssh admin@99.99.99.11 

dm1sw-adm0# configure terminal

Enter configuration commands, one per line. End with CNTL/Z.

dm1sw-adm0(config)# no ntp server 88.88.88.250 

dm1sw-adm0(config)# end

dm1sw-adm0# configure terminal

Enter configuration commands, one per line. End with CNTL/Z.

dm1sw-adm0(config)# ntp server 99.99.99.250 prefer

dm1sw-adm0(config)# ntp server 99.99.99.251

dm1sw-adm0(config)# end

Finally, verify the changes:

dm1sw-adm0# show running-config
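The NTP associations can also be checked directly (optional):

dm1sw-adm0# show ntp peers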

Save the configuration.

dm1sw-adm0# copy running-config startup-config

[########################################] 100%

Copy complete, now saving to disk (please wait)...

Copy complete.

Exit the session

dm1sw-adm0# exit

--- DB nodes DNS ---

1. Log in to the database server as the root user.

Edit the /etc/resolv.conf file.


2. Set the DNS server and domain name using an editor such as vi. There should be a nameserver line for each DNS server.

search mlspp.gov.az

nameserver 99.99.99.10

nameserver 99.99.99.11


3. Set the DNS server in the server ILOM.

  ssh root@99.99.99.6/7

  -> show /SP/clients/dns

 /SP/clients/dns

    Targets:

    Properties:

        auto_dns = enabled

        nameserver = 88.88.88.10

        retries = 1

        searchpath = mlspp.gov.az

        timeout = 5

    Commands:

        cd

        set

        show

-> set /SP/clients/dns nameserver=99.99.99.10,99.99.99.11

Set 'nameserver' to '99.99.99.10,99.99.99.11' [99.99.99.10, 99.99.99.11]

--- DB nodes NTP ---

1. Stop the NTP/Chrony services on each database server.

[root@dm1db1 ~]# systemctl stop chronyd.service


2. Update the ntp.conf/chrony.conf file with the IP addresses of the new NTP servers.

Start the NTP/Chrony services on the database server.

[root@dm1db1 ~]# vi /etc/chrony.conf

[root@dm1db1 ~]# systemctl start chronyd.service
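After editing, /etc/chrony.conf should simply point at the new servers; a minimal sketch (the prefer/iburst options are illustrative, not required):

server 99.99.99.250 prefer iburst

server 99.99.99.251 iburst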


3. Set the NTP server in the server ILOM for each node.

  ssh root@99.99.99.6/7

-> set /SP/clients/ntp/server/1 address=99.99.99.250

Set 'address' to '99.99.99.250'

-> set /SP/clients/ntp/server/2 address=99.99.99.251

Set 'address' to '99.99.99.251'

-> show /SP/clients/ntp/server/1

/SP/clients/ntp/server/1

Targets:

Properties:

address = 99.99.99.250

->  show /SP/clock

/SP/clock

Targets:

Properties:

datetime = Mon Jul 26 23:03:43 2021

timezone = +04 (Asia/Baku)

uptime = 15 days, 06:27:05

usentpserver = enabled


  

--- Cell nodes DNS and NTP ---

1. Log in to the Oracle Exadata Storage Server as the root user (each cell is done separately, one at a time).

2. Check, and adjust if necessary, how long Oracle ASM waits before dropping an offline disk (the DISK_REPAIR_TIME attribute).

   The default DISK_REPAIR_TIME attribute value of 3.6 hours should be long enough for most environments.

    a. Check the repair time for all mounted disk groups.

       Log in to the Oracle ASM instance and run the following query:

SQL> SELECT dg.name,a.value FROM v$asm_diskgroup dg, v$asm_attribute a WHERE dg.group_number=a.group_number AND a.name='disk_repair_time';

    b. Adjust the DISK_REPAIR_TIME parameter, if needed.

   In the following command, h.n is the amount of time in hours, such as 4.6.

SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='h.nH';

3. Check that putting the grid disks offline will not cause a problem for Oracle ASM.

   [root@db1cell2 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

DATAC1_CD_00_db1cell2 ONLINE Yes

DATAC1_CD_01_db1cell2 ONLINE Yes

DATAC1_CD_02_db1cell2 ONLINE Yes

DATAC1_CD_03_db1cell2 ONLINE Yes

DATAC1_CD_04_db1cell2 ONLINE Yes

DATAC1_CD_05_db1cell2 ONLINE Yes

RECOC1_CD_00_db1cell2 ONLINE Yes

RECOC1_CD_01_db1cell2 ONLINE Yes

RECOC1_CD_02_db1cell2 ONLINE Yes

RECOC1_CD_03_db1cell2 ONLINE Yes

RECOC1_CD_04_db1cell2 ONLINE Yes

RECOC1_CD_05_db1cell2 ONLINE Yes


   The value Yes should be returned for all grid disks. If one or more disks do not return Yes, restore data redundancy for the affected disk group and repeat the command until all grid disks return Yes.
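   If a disk group needs a closer look before continuing, a quick check from the Oracle ASM instance (a sketch, not part of the original procedure) is to look for offline disks and any running rebalance operations:

SQL> SELECT name, type, offline_disks FROM v$asm_diskgroup;

SQL> SELECT * FROM v$asm_operation;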

4. Inactivate all grid disks on the cell.

   [root@db1cell2 ~]# cellcli -e alter griddisk all inactive

GridDisk DATAC1_CD_00_db1cell2 successfully altered

GridDisk DATAC1_CD_01_db1cell2 successfully altered

GridDisk DATAC1_CD_02_db1cell2 successfully altered

GridDisk DATAC1_CD_03_db1cell2 successfully altered

GridDisk DATAC1_CD_04_db1cell2 successfully altered

GridDisk DATAC1_CD_05_db1cell2 successfully altered

GridDisk RECOC1_CD_00_db1cell2 successfully altered

GridDisk RECOC1_CD_01_db1cell2 successfully altered

GridDisk RECOC1_CD_02_db1cell2 successfully altered

GridDisk RECOC1_CD_03_db1cell2 successfully altered

GridDisk RECOC1_CD_04_db1cell2 successfully altered

GridDisk RECOC1_CD_05_db1cell2 successfully altered

   This command may take more than 10 minutes to complete. Inactivating the grid disks automatically takes the corresponding disks offline in the Oracle ASM instance.


5. Confirm the grid disks are offline.

   a. Check the status of the grid disks.

       [root@db1cell2 ~]# cellcli -e list griddisk attributes name, asmmodestatus,asmdeactivationoutcome

DATAC1_CD_00_db1cell2 OFFLINE Yes

DATAC1_CD_01_db1cell2 OFFLINE Yes

DATAC1_CD_02_db1cell2 OFFLINE Yes

DATAC1_CD_03_db1cell2 OFFLINE Yes

DATAC1_CD_04_db1cell2 OFFLINE Yes

DATAC1_CD_05_db1cell2 OFFLINE Yes

RECOC1_CD_00_db1cell2 OFFLINE Yes

RECOC1_CD_01_db1cell2 OFFLINE Yes

RECOC1_CD_02_db1cell2 OFFLINE Yes

RECOC1_CD_03_db1cell2 OFFLINE Yes

RECOC1_CD_04_db1cell2 OFFLINE Yes

RECOC1_CD_05_db1cell2 OFFLINE Yes

      The output should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED, and asmdeactivationoutcome=Yes for all grid disks.

   b. List the grid disks to confirm that they are inactive.

      [root@db1cell2 ~]# cellcli -e list griddisk

DATAC1_CD_00_db1cell2 inactive

DATAC1_CD_01_db1cell2 inactive

DATAC1_CD_02_db1cell2 inactive

DATAC1_CD_03_db1cell2 inactive

DATAC1_CD_04_db1cell2 inactive

DATAC1_CD_05_db1cell2 inactive

RECOC1_CD_00_db1cell2 inactive

RECOC1_CD_01_db1cell2 inactive

RECOC1_CD_02_db1cell2 inactive

RECOC1_CD_03_db1cell2 inactive

RECOC1_CD_04_db1cell2 inactive

RECOC1_CD_05_db1cell2 inactive


6. Shut down the cell services and ocrvottargetd service.

    [root@db1cell2 ~]# cellcli -e alter cell shutdown services all

Stopping the RS, CELLSRV, and MS services...

The SHUTDOWN of services was successful.

Note: The ocrvottargetd service is not included in some releases. If it exists on your system, stop it with: service ocrvottargetd stop

7. Use the ipconf utility to change the DNS and NTP settings.

[root@db1cell2 ~]# /usr/local/bin/ipconf

[Info]: ipconf command line: /opt/oracle.cellos/ipconf.pl -nocodes

Logging started to /var/log/cellos/ipconf.log

Interface ib0   is                      Linked.    hca: mlx4_0

Interface ib1   is                      Linked.    hca: mlx4_0

Interface eth0  is                      Linked.    driver/mac: igb/00:10:e0:eb:28:6e


The current nameserver(s): 88.88.88.10 88.88.88.11

Do you want to change it (y/n) [n]: y

Nameserver: 99.99.99.10

Add more nameservers (y/n) [n]: y

Nameserver: 99.99.99.11

Add more nameservers (y/n) [n]: n

The current timezone: Asia/Baku

Do you want to change it (y/n) [n]: n

The current NTP server(s): 88.88.88.250

Do you want to change it (y/n) [n]: y

Fully qualified hostname or ip address for NTP server. Press enter if none: 99.99.99.250

Continue adding more ntp servers (y/n) [n]: y

Fully qualified hostname or ip address for NTP server. Press enter if none: 99.99.99.251

Continue adding more ntp servers (y/n) [n]: n


Network interfaces

Name  State  Speed    Status  IP address   Netmask       Gateway       Net type   Hostname

ib0   Linked          UP      77.77.77.7 255.255.252.0               Private    db1cell2-priv1.mlspp.gov.az

ib1   Linked          UP      77.77.77.8 255.255.252.0               Private    db1cell2-priv2.mlspp.gov.az

eth0  Linked default  UP      99.99.99.4  255.255.255.0 99.99.99.253 Management db1cell2.mlspp.gov.az

Select interface name to configure or press Enter to continue: 


Select canonical hostname from the list below

1: db1cell2-priv1.mlspp.gov.az

2: db1cell2-priv2.mlspp.gov.az

3: db1cell2.mlspp.gov.az

Canonical fully qualified domain name [3]: 


Select default gateway interface from the list below

1: eth0

Default gateway interface [1]: 


Canonical hostname: db1cell2.mlspp.gov.az

Nameservers: 99.99.99.10 99.99.99.11

Timezone: Asia/Baku

NTP servers: 99.99.99.250 99.99.99.251

Default gateway device: eth0

Network interfaces

Name  State  Speed    Status  IP address   Netmask       Gateway       Net type   Hostname

ib0   Linked          UP      77.77.77.7 255.255.252.0               Private    db1cell2-priv1.mlspp.gov.az

ib1   Linked          UP      77.77.77.8 255.255.252.0               Private    db1cell2-priv2.mlspp.gov.az

eth0  Linked default  UP      99.99.99.4  255.255.255.0 99.99.99.253 Management db1cell2.mlspp.gov.az

Is this correct (y/n) [y]: y


Do you want to configure basic ILOM settings (y/n) [y]: y

Loading basic configuration settings from ILOM ...

ILOM Fully qualified hostname [db1cell2-ilom.mlspp.gov.az]: 

Inet protocol (IPv4,IPv6) [IPv4]: 

ILOM IP address [99.99.99.9]: 

ILOM Netmask [255.255.255.0]: 

ILOM Gateway or none [99.99.99.253]: 

ILOM Nameserver (multiple IPs separated by a comma) or none [88.88.88.10]: 99.99.99.10

ILOM Use NTP Servers (enabled/disabled) [enabled]: 

ILOM First NTP server. Fully qualified hostname or ip address or none [88.88.88.250]: 99.99.99.250

ILOM Second NTP server. Fully qualified hostname or ip address or none [none]: 99.99.99.251

ILOM Vlan id or zero for non-tagged VLAN (0-4079) [0]: 


Basic ILOM configuration settings:

Hostname             : db1cell2-ilom.mlspp.gov.az

IP Address           : 99.99.99.9

Netmask              : 255.255.255.0

Gateway              : 99.99.99.253

DNS servers          : 99.99.99.10

Use NTP servers      : enabled

First NTP server     : 99.99.99.250

Second NTP server    : 99.99.99.251

Timezone (read-only) : Asia/Baku

VLAN id              : 0

Is this correct (y/n) [y]: n

ILOM Fully qualified hostname [db1cell2-ilom.mlspp.gov.az]: 

Inet protocol (IPv4,IPv6) [IPv4]: 

ILOM IP address [99.99.99.9]: 

ILOM Netmask [255.255.255.0]: 

ILOM Gateway or none [99.99.99.253]: 

ILOM Nameserver (multiple IPs separated by a comma) or none [88.88.88.10]: 99.99.99.10,99.99.99.11

ILOM Use NTP Servers (enabled/disabled) [enabled]: 

ILOM First NTP server. Fully qualified hostname or ip address or none [88.88.88.250]: 99.99.99.250

ILOM Second NTP server. Fully qualified hostname or ip address or none [none]: 99.99.99.251

ILOM Vlan id or zero for non-tagged VLAN (0-4079) [0]: 


Basic ILOM configuration settings:

Hostname             : db1cell2-ilom.mlspp.gov.az

IP Address           : 99.99.99.9

Netmask              : 255.255.255.0

Gateway              : 99.99.99.253

DNS servers          : 99.99.99.10,99.99.99.11

Use NTP servers      : enabled

First NTP server     : 99.99.99.250

Second NTP server    : 99.99.99.251

Timezone (read-only) : Asia/Baku

VLAN id              : 0

Is this correct (y/n) [y]: y


[Info]: Run /opt/oracle.cellos/validations/init.d/saveconfig

[Info]: Custom changes have been detected in /etc/resolv.conf

[Info]: Original file /etc/resolv.conf will be saved in /etc/resolv.conf.backupbyExadata

[Info]: Stopping cellwall service ...

[Info]: cellwall service stopped

[Info]: Restart chronyd service

[Info]: Starting cellwall service ...

[Info]: cellwall service started

[Info]: Save /etc/sysctl.conf in /etc/sysctl.conf.backupbyExadata

[Info]: Adjust settings for IB interfaces in /etc/sysctl.conf

[Info]: Retarting cellwall service ...

active

[Info]: cellwall service restarted

Re-login using new IP address 99.99.99.4 if you were disconnected after following commands

ip addr show eth0

sleep 4


[Warning]: You modified NTP server.

Ensure you also update the Infiniband Switch NTP server

if the same NTP server was also used by the Infiniband switch.



[Warning]: You modified DNS name server.

Ensure you also update the Infiniband Switch DNS server

if the same DNS server was also used by the Infiniband switch.


8. Restart the cell services and ocrvottargetd service.

   [root@db1cell2 ~]# cellcli -e alter cell startup services all

Starting the RS, CELLSRV, and MS services...

Getting the state of RS services...  running

Starting CELLSRV services...

The STARTUP of CELLSRV services was successful.

Starting MS services...

The STARTUP of MS services was successful.


   The server does not need to reboot.


9. Activate the grid disks when the cell comes online.

[root@db1cell2 ~]# cellcli -e alter griddisk all active

GridDisk DATAC1_CD_00_db1cell2 successfully altered

GridDisk DATAC1_CD_01_db1cell2 successfully altered

GridDisk DATAC1_CD_02_db1cell2 successfully altered

GridDisk DATAC1_CD_03_db1cell2 successfully altered

GridDisk DATAC1_CD_04_db1cell2 successfully altered

GridDisk DATAC1_CD_05_db1cell2 successfully altered

GridDisk RECOC1_CD_00_db1cell2 successfully altered

GridDisk RECOC1_CD_01_db1cell2 successfully altered

GridDisk RECOC1_CD_02_db1cell2 successfully altered

GridDisk RECOC1_CD_03_db1cell2 successfully altered

GridDisk RECOC1_CD_04_db1cell2 successfully altered

GridDisk RECOC1_CD_05_db1cell2 successfully altered

10. Verify the disks are active.

[root@db1cell2 ~]# cellcli -e list griddisk

DATAC1_CD_00_db1cell2 active

DATAC1_CD_01_db1cell2 active

DATAC1_CD_02_db1cell2 active

DATAC1_CD_03_db1cell2 active

DATAC1_CD_04_db1cell2 active

DATAC1_CD_05_db1cell2 active

RECOC1_CD_00_db1cell2 active

RECOC1_CD_01_db1cell2 active

RECOC1_CD_02_db1cell2 active

RECOC1_CD_03_db1cell2 active

RECOC1_CD_04_db1cell2 active

RECOC1_CD_05_db1cell2 active

The output should show active.

11. Verify the grid disk status.

    a. Check that all grid disks are online.

   cellcli -e list griddisk attributes name, asmmodestatus

b. Wait for Oracle ASM synchronization to complete for all grid disks. Each disk first goes to a SYNCING state and then to ONLINE.

   [root@db1cell2 ~]# cellcli -e list griddisk attributes name, asmmodestatus

DATAC1_CD_00_db1cell2 ONLINE

DATAC1_CD_01_db1cell2 SYNCING

DATAC1_CD_02_db1cell2 SYNCING

DATAC1_CD_03_db1cell2 SYNCING

DATAC1_CD_04_db1cell2 SYNCING

DATAC1_CD_05_db1cell2 SYNCING

RECOC1_CD_00_db1cell2 ONLINE

RECOC1_CD_01_db1cell2 ONLINE

RECOC1_CD_02_db1cell2 SYNCING

RECOC1_CD_03_db1cell2 SYNCING

RECOC1_CD_04_db1cell2 SYNCING

RECOC1_CD_05_db1cell2 SYNCING


   Oracle ASM synchronization is complete when all grid disks show asmmodestatus=ONLINE.
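   Rather than re-running the command by hand, a simple polling loop on the cell can wait until no grid disk reports SYNCING (a sketch, assuming a bash shell):

[root@db1cell2 ~]# while cellcli -e "list griddisk attributes name, asmmodestatus" | grep -q SYNCING; do sleep 60; done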

   

12. Repeat this procedure for each Oracle Exadata Storage Server.



References:

https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-components.htm#DBMMN22932

Changing IP addresses on Exadata Database Machine (Doc ID 1317159.1)

Changing IP addresses on Exadata Database Machine (Doc ID 1317159.1)

In this Document

Goal
Solution
 Assumptions
 Pre-Move procedures
 Post-Move Startup Procedures
 If the default gateway is to be changed:
 If the NTP, DNS, or Time Zone is to be changed
 Management Network Change
 Gather information for new addresses
 Database node OS Reconfiguration
 Oracle Grid Infrastructure Reconfiguration
 Storage Cell Reconfiguration
 InfiniBand Switch reconfiguration
 Cisco Ethernet switch reconfiguration
 Power Distribution Unit (PDU) reconfiguration
 KVM switch reconfiguration
 Client Access Network Change
 Gather information for new addresses
 Procedure on Database nodes
 Private Network Change
References

APPLIES TO:

Oracle Exadata Hardware - Version 11.2.0.1 and later
Information in this document applies to any platform.

GOAL

The procedure in this document supports the changing of IP addresses on an Oracle Exadata Database Machine system. The most common use case for this procedure is when a system is moved, so this document was written with that case in mind. Importantly, system hostnames and domain name changes are not handled by this procedure.

It will be more reliable to make these changes using the ipconf and ipconf.pl utilities.  For Compute nodes use:

/opt/oracle.cellos/ipconf.pl

For Storage Nodes use:

/opt/oracle.cellos/ipconf

These utilities make modifications to all pertinent system and Exadata configuration files. Though making these changes manually is supported, it is highly recommended to use these utilities instead; they will always be up to date with respect to which modifications need to be made. For usage of these utilities, refer to:

System Software User's Guide for Exadata Database Machine
Chapter 5 Maintaining Oracle Exadata System Software
5.2 Using the ipconf Utility
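Before changing anything, the current storage cell configuration can also be checked non-destructively with the verify option that appears later in this note (a sketch; this path applies to storage cells):

(root)# /opt/oracle.cellos/ipconf -verify -conf /opt/oracle.cellos/cell.conf -verbose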

SOLUTION

This document is divided into the tasks of changing the IP addresses of each individual network. These sections can be combined to change all of the IP addresses at once, which is the most common use case, since readdressing usually happens when an Exadata system is moved.
Importantly, keep in mind that changing the system hostnames, domain name, or cluster name is not covered in this document. Those changes can be quite extensive and some will require a complete reinstall.
The procedures in this document primarily describe how to change the IP addresses for the management and client access networks. In the spirit of not duplicating procedures, the primary source for changing the IP addresses of the private InfiniBand network can be found in:
Oracle Database Machine Owner's Guide
Chapter 7 - Maintaining Oracle Exadata Database Machine
Section "Changing InfiniBand IP Addresses and Host Names."

Assumptions

The following assumptions apply to the procedures outlined here.  These steps are tested with Oracle Grid Infrastructure 11.2.0.2 to 12.1.0.2. Using version 11.2.0.1 should not require any modifications, but should be tested with care.

  • The grid infrastructure is installed in ORACLE_HOME=/u01/app/12.1.0.2/grid
  • The eth0 interface on all systems is used for the management network.
  • The Client Access network on the database nodes uses a bonded interface named bondeth0 (older systems may call this bond1).
  • The grid infrastructure owner and the database owner are the same users, oracle.
  • The following documentation IP addresses are used (see RFC 5737): 192.0.2.0/23 for the Management network and 198.51.100.0/23 for the Client network.
  • That user equivalence is properly configured for both the root and oracle users.
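A quick way to confirm root user equivalence across the database nodes (a sketch, assuming the standard dbs_group host file used elsewhere in this note) is to run a trivial command on every node:

(root)# dcli -g dbs_group -l root hostname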

Pre-Move procedures

In most cases, address changes occur because the Exadata system is being physically moved.  This covers the initial shutdown of the system as all reconfiguration will be completed at the destination site.

  1. As oracle, stop all databases on the cluster by running the following command once for each database.
    (oracle)$ srvctl stop database -d <dbname>
  2. If DBFS is in use, as oracle, stop the DBFS clusterware resource(s):
    (oracle)$ crsctl stop resource dbfs_mount
  3. As root, stop clusterware on all nodes:
    (root)# crsctl stop cluster -all
  4. As root, check to see if clusterware startup is set for auto-start:
    (root)# dcli -g dbs_ib_group -l root /u01/app/12.1.0.2/grid/bin/crsctl config crs
    dm01db01-priv: CRS-4622: Oracle High Availability Services autostart is enabled.
    dm01db02-priv: CRS-4622: Oracle High Availability Services autostart is enabled.
  5. As root, disable clusterware startup if it is set for auto start:
    (root)# dcli -g dbs_ib_group -l root /u01/app/12.1.0.2/grid/bin/crsctl disable crs
  6. Power can be shut down on all nodes in preparation for the rack move.  See the manual:
    Oracle Exadata Database Machine Maintenance Guide
    Chapter 1 General Maintenance Information
    Powering On and Off Oracle Exadata Rack
  7. Once these nodes are shut down, the DNS entries should be updated to reflect the new IP addresses. This should be done as soon as possible in the process so that DNS caches will be allowed time to be refreshed.

Post-Move Startup Procedures

  1. Verify the DNS entries have been updated. Ensure that all management network IPs, Client Access network IPs, Virtual IPs (VIPs), and SCAN IPs have been updated in DNS and are resolving properly in both forward (query by name) and reverse (query by IP address) DNS queries.
  2. Connect the new network connections on the system.
  3. All systems should be powered up. Cluster should not start automatically at boot since it was disabled before shutdown. Ensure that the clusterware is not running on any of the database nodes.
  4. If the system has been physically moved, verification of the hardware may be required and is not covered in this document.

If the default gateway is to be changed:

This is a fairly simple procedure that can be performed independently of the change in either the Public or Management network.

  1. Shut down the network stack:
    (root)# service network stop

  2. Edit the /etc/sysconfig/network file to update the GATEWAY parameter.  Depending on the chosen network design, this could be either a router on the Public Access network, or the Management network.
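For reference, a minimal /etc/sysconfig/network might then look like this (hostname and gateway values are illustrative only):

NETWORKING=yes
HOSTNAME=dm01db01.example.com
GATEWAY=192.0.2.1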

If the NTP, DNS, or Time Zone is to be changed

If the only changes are to specify updated NTP or DNS servers, please follow the proper sections in the:
Database Machine Maintenance Guide
Chapter 4 Maintaining Other Components of Oracle Exadata Racks

Management Network Change

Gather information for new addresses

Using the new IP address and netmask, compute the IP range’s broadcast and network addresses using a utility like ipcalc. The ipcalc utility doesn't make any changes on the system and can be run at any time; it just helps with computing the values for various fields in the subsequent updates.  Make note of these values as they will be used later.

(root)# ipcalc -bnm 192.0.2.66 255.255.254.0
NETMASK=255.255.254.0
BROADCAST=192.0.3.255
NETWORK=192.0.2.0

Alternatively, one can use CIDR notation instead:

(root)# ipcalc -bnm 192.0.2.66/23
NETMASK=255.255.254.0
BROADCAST=192.0.3.255
NETWORK=192.0.2.0

Validate the choice of the default gateway for the system.  Depending on the network design choice, it can either be a router on either the Public Access network or the Management network.  If in doubt, use the same choice of network as the original configuration but use the new address for the new subnet.
Set $ORACLE_HOME to the grid home:

/u01/app/12.1.0.2/grid

Then prepend $ORACLE_HOME/bin to the PATH.
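In a bash session, that amounts to:

(root)# export ORACLE_HOME=/u01/app/12.1.0.2/grid
(root)# export PATH=$ORACLE_HOME/bin:$PATH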

It might be advantageous to set the environment for root and the oracle software owner in separate terminal sessions.  Various commands below have to be run by either root or oracle.  They will be differentiated by notation of either “(root)#” or “(oracle)$” as part of the command prompt.

Database node OS Reconfiguration

Note that each database node will need to be modified manually and individually. It may be helpful to utilize the private network to connect from one system to another. For example, if one database node is reconfigured using the console, it may be possible to then leave the server room and connect remotely to the newly reconfigured database node. From that database node, it will be possible to use the private network to connect to the remaining nodes to reconfigure them.  This assumes the Private network has not changed.

  1. Shut down the management interface, if it is not already down:
    (root)# ifdown eth0

  2. On each Database node, modify the file:
    /etc/sysconfig/network-scripts/ifcfg-eth0
    with the new IPADDR, NETMASK, BROADCAST, GATEWAY, and NETWORK parameters (a sample fragment is shown after this list).

    If a backup of the original file is made, use a prefix to name the file like “backup-ifcfg-eth0”. If a suffix is used, like ".orig" the OS will interpret the ifcfg-eth0.orig file as a configuration file for the eth0.orig device and will cause network configuration problems at boot time.
  3. Modify the routing rules if the database node has them. In some recent image versions, the default deployment may include routing rules. If these files exist:
    /etc/sysconfig/network-scripts/route-eth0
    /etc/sysconfig/network-scripts/rule-eth0
    they require modification. The rule-eth0 file contains the management IP address and the route-eth0 file contains the management network number and netmask (in CIDR notation), like these:
    (root)# cat rule-eth0
    from 192.0.2.8 table 220
    to 192.0.2.8 table 220
    (root)# cat route-eth0
    192.0.2.0/23 dev eth0 table 220
    default via 192.0.2.1 dev eth0 table 220
      
    These files, if present, require modification to the new IP address and new network numbers and netmask in CIDR notation - see reference:
    http://en.wikipedia.org/wiki/CIDR#Prefix_aggregation

    for translation between the quad-dotted notation and CIDR notation. Also see Document 1306154.1 for more information regarding routing configurations on Exadata servers.

  4. Update the /etc/hosts file for the existing entries with the new management network addresses.
  5. If the network service was stopped, check your work now and restart it:
    (root)# service network start
    If just the eth0 interface was changed, restart it:
    (root)# ifup eth0
    This will validate that the files above were modified properly. Continue with the remaining steps as applicable, and reboot the database node only when you reach the reboot step below.
  6. Review the /etc/ssh/sshd_config to find any uncommented ListenAddress lines. If there are uncommented ListenAddress entries, update the entry corresponding to the old management network IP addresses with the new management network IP address.
  7. If the addresses of the name servers are changing, follow the section above for changing DNS.
  8. If the addresses of the time servers are changing, follow the section above for changing NTP.
  9. If the time zone is changing follow the section above for changing Time Zone.
  10. To change the ILOM network settings, run these commands from the command line of the Database node using the proper new values for that node's ILOM. Substitute your proper addresses for NTP servers, DNS (only one DNS entry is allowed), IP address, gateway, and netmask.
    (root)# ipmitool sunoem cli "set /SP/clients/ntp/server/1 address=203.0.113.140"
    (root)# ipmitool sunoem cli "set /SP/clients/ntp/server/2 address=203.0.113.141"
    (root)# ipmitool sunoem cli "set /SP/clients/dns nameserver=203.0.113.52"
    (root)# ipmitool sunoem cli "set /SP/network \
    pendingipaddress=192.0.2.29 \
    pendingipgateway=192.0.2.1 \
    pendingipnetmask=255.255.252.0 \
    commitpending=true"
      
  11. Verify the settings are in place:
    (root)# ipmitool sunoem cli "show /SP/clients/ntp/server/1"
    (root)# ipmitool sunoem cli "show /SP/clients/ntp/server/2"
    (root)# ipmitool sunoem cli "show /SP/clients/dns"
    (root)# ipmitool sunoem cli "show /SP/network"
      
  12. Reboot the Database node once these changes are made so that the new updates can take effect.
  13. Verify that ILOM is accessible. The ILOM network changes should be immediate (once commitpending=true is sent through).
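As a reference for step 2 above, an ifcfg-eth0 fragment with the new management addresses might look like the following sketch (values taken from the ipcalc example; the DEVICE/BOOTPROTO/ONBOOT lines are shown only for context):

DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.0.2.66
NETMASK=255.255.254.0
BROADCAST=192.0.3.255
NETWORK=192.0.2.0
GATEWAY=192.0.2.1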

Oracle Grid Infrastructure Reconfiguration

The Oracle Grid Infrastructure does not use the Management Network.  No changes need to be made to the Oracle Grid Infrastructure for this section.

Storage Cell Reconfiguration

This change will need to be completed locally on each storage cell.

  1. If the SMTP server or SNMP server used at the new location or in the new network configuration has changed, those settings need to be updated on each storage cell. For example, if the storage cell's SMTP server requires an update, it may be set using a command like this:
    (root)# dcli -g cell_group -l root "cellcli -e alter cell \
    smtpserver=\'new.smtp.server.com\'"
  2.  From the management node (often the first database node), stop the cell services on all storage nodes by running:
    (root)# dcli -g cell_ib_group -l root cellcli -e alter cell shutdown services all
  3. From the management node, back up the configuration file:
    (root)# dcli -g cell_ib_group -l root cp /opt/oracle.cellos/cell.conf /root/new.cell.conf
  4. Reconfigure each cell using ipconf. It needs to be run interactively.  First shut down all the cell services, and then run ipconf.
    (root)# cellcli -e alter cell shutdown services all
    (root)# /opt/oracle.cellos/ipconf
    (root)# cellcli -e alter cell startup services all

    An alternate method is to change settings using this "bulk" method.  It is faster, but it results in a configuration that *adds* the NTP server(s) and DNS server(s) to the existing ones instead of replacing the old ones with the new ones.
    1. On each cell, edit /root/new.cell.conf to make the changes to only the following fields. Other settings in this file should not be modified.
      <Interface>
      <Name>eth0</Name>
      <Net_Type>Management</Net_Type>
      <Gateway>{new gateway}</Gateway>
      <IP_address>{new address}</IP_address>
      <Netmask>{new netmask}</Netmask>
      </Interface>
      <Ntp_servers>{first new time server}</Ntp_servers>
      <Ntp_servers>{second new time server}</Ntp_servers>
      <Nameservers>{first new name server}</Nameservers>
      <Nameservers>{second new name server}</Nameservers>
      <Timezone>{new timezone}</Timezone>
      <ilom>
      <ILOM_Gateway>{new gateway}</ILOM_Gateway>
      <ILOM_IP_address>{new address}</ILOM_IP_address>
      <ILOM_Nameserver>
      {new name servers comma delimited}</ILOM_Nameserver>
      <ILOM_Netmask>{new netmask}</ILOM_Netmask>
      <ILOM_First_NTP_server>{first new time server}</ILOM_First_NTP_server>
      <ILOM_Second_NTP_server>{second new time server}</ILOM_Second_NTP_server>
      <ILOM_Timezone>{new timezone}</ILOM_Timezone>
      </ilom>
        
    2. On each cell, commit the changes to the cell's config (the cell will reboot):
      (root)# /opt/oracle.cellos/ipconf -force -newconf /root/new.cell.conf -reboot
    3. After reboot, on each cell, verify the modified configuration file:
      (root)# /opt/oracle.cellos/ipconf -verify -conf /root/new.cell.conf -verbose

      If the above procedure does not work properly or does not produce the desired results, you should run the ipconf utility interactively instead.

  5. Reboot the cell and, once it comes back online, verify as a final check that the cell services are running:
    (root)# dcli -g cell_ib_group -l root cellcli -e list cell detail

InfiniBand Switch reconfiguration

To reconfigure the management network in the InfiniBand switches, follow the procedures in the Exadata Database Machine Installation and Configuration Guide, Chapter 5 Configuring Oracle Exadata Database Machine, in the section labeled "Configuring Sun Datacenter InfiniBand Switch 36 Switch."
For procedures to change the DNS, NTP addresses, or Time Zone, see the Oracle Exadata Database Machine Maintenance Guide, Chapter 4 Maintaining Other Components of Oracle Exadata Racks, in the respective sections labeled either "Changing the DNS Servers" or “Changing the NTP Servers”.

Cisco Ethernet switch reconfiguration

Note that before making any Cisco Ethernet switch changes, the network administrator should be consulted to verify that the switch's configuration may be modified using the default deployment procedures and assumptions in this section.
To reconfigure the Cisco Ethernet switch, follow the procedures in the Oracle Exadata Database Machine Installation and Configuration Guide, Chapter 5 Configuring Oracle Exadata Database Machine, in the section labeled “Configuring the Cisco Ethernet Switch".

For procedures to change the DNS, NTP addresses, or Time Zone, see the Oracle Exadata Database Machine Maintenance Guide, Chapter 4 Maintaining Other Components of Oracle Exadata Racks, in the respective sections labeled either "Changing the DNS Servers" or “Changing the NTP Servers”.

Power Distribution Unit (PDU) reconfiguration

If the PDUs are connected to the network, they require reconfiguration. If not, this section may be skipped.
To reconfigure the PDUs, follow the procedures in the Oracle Exadata Database Machine Installation and Configuration Guide, Chapter 5 Configuring Oracle Exadata Database Machine, in the section labeled "Configuring the Power Distribution Units."

KVM switch reconfiguration

To reconfigure the KVM switch, see the Oracle Exadata Database Machine Maintenance Guide, Chapter 4 Maintaining Other Components of Oracle Exadata Racks, in the section labeled "Configuring the KVM Switch."  This section also includes changing the DNS and NTP servers.

Client Access Network Change

The interrelationship of the network stack and the Clusterware requires changes to occur in order.  In essence, the applicable components of the Clusterware need to be stopped, the network changed, and then the Cluster reconfigured to accommodate the network changes.

This document assumes the names for the SCAN, VIP and hosts do not change.  The only changes made are to the IP addresses.  If the various names change, there will be more steps necessary.  These steps are not covered in this document.

The Storage Cells do not have a public facing interface.  No changes are required for them.

Once all nodes are reconfigured with proper IP addresses and all nodes are running, grid infrastructure may be reconfigured.  This procedure cannot be done in a rolling fashion, so a database outage must be taken.

Note that each database node will need to be modified separately and manually. It may be helpful to utilize the private network to connect from one system to another. For example, if one database node is reconfigured using the console, it may be possible to then leave the server room and connect remotely to the newly reconfigured database node. From that database node, it will be possible to use the private network to connect to the remaining nodes to reconfigure them.

Gather information for new addresses

Using the new IP address and netmask, compute that IP range’s broadcast and network addresses using ipcalc. The ipcalc utility doesn't make any changes on the system and can be run at any time; it just helps with computing the values for various fields in the subsequent updates. Make note of these values as they will be used later.

(root)# ipcalc -bnm 198.51.100.66 255.255.254.0
NETMASK=255.255.254.0
BROADCAST=198.51.101.255
NETWORK=198.51.100.0

 Alternatively, one can use CIDR notation instead:

(root)# ipcalc -bnm 198.51.100.66/23
NETMASK=255.255.254.0
BROADCAST=198.51.101.255
NETWORK=198.51.100.0

Validate the choice of the default gateway for the system.  Depending on the network design choice, it can either be a router on either the Public Access network or the Management network.  If in doubt, use the same choice of network as the original configuration but use the new address for the new subnet.

Set $ORACLE_HOME to the grid home:

/u01/app/12.1.0.2/grid

Then prepend $ORACLE_HOME/bin to the PATH.

It might be advantageous to set the environment for root and the oracle software owner in separate terminal sessions.  Various commands below have to be run by either root or oracle.  They will be differentiated by notation of either “(root)#” or “(oracle)$” as part of the command prompt.

Procedure on Database nodes

  1. If not already started, start the cluster on all nodes:
    (root)# dcli -g dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl start crs
    And verify:
    (root)# crsctl stat res -t
  2. Verify the current network configuration. This command should be run as oracle on one node in the cluster.
    (oracle)$ oifcfg getif
  3. Check the existing cluster and OS configuration
    (oracle)$ srvctl config  scan
    (oracle)$ srvctl config nodeapps
    (oracle)$ oifcfg getif
    (root)# ifconfig
  4. Stop the Clusterware components
    (oracle)$ srvctl stop listener -node {each node}
    (oracle)$ srvctl stop scan_listener
    In 12.2 and later, the Cluster Health Analyzer was introduced. It uses the new Management database, and that database's listener uses a VIP. Before stopping all the VIPs, this service needs to be stopped.
    (oracle)$ srvctl stop cha
    (oracle)$ srvctl stop mgmtdb
    (oracle)$ srvctl stop mgmtlsnr
    (oracle)$ srvctl stop vip -n {each node}
    (oracle)$ srvctl stop scan
    And validate the state of the various components
    (oracle)$ srvctl status scan
    (oracle)$ srvctl status nodeapps
    (root)# ifconfig
  5. Delete the current public network interface from the configuration. This command should be run as oracle on only one node in the cluster.
    (oracle)$ oifcfg delif -global bondeth0
  6. On each Database node change the IP address of bondeth0.
    1. Shut down the Public interface, if it is not already down
      (root)# ifdown bondeth0
      If the network stack was shut down in Step 2, this step can be skipped, or the error ignored.
    2. Modify the file:
      /etc/sysconfig/network-scripts/ifcfg-bondeth0
      In this file, change the IPADDR, NETMASK, BROADCAST, GATEWAY, and NETWORK parameters to the new values.
      If a backup of the original file is made, use a prefix to name the file like “backup-ifcfg-bondeth0”. If a suffix is used, like ".orig" the OS will interpret the ifcfg-bondeth0.orig file as a device’s configuration file and will cause network configuration problems at boot time.
    3. Modify the routing rules if the database node has them. In some recent image versions, the default deployment may include routing rules. If the files exist, they require modification:
      (root)# ls /etc/sysconfig/network-scripts/route-bondeth0
      (root)# ls /etc/sysconfig/network-scripts/rule-bondeth0
      The rule-bondeth0 file contains the Public IP address and the route-bondeth0 file contains the Public network number and netmask (in CIDR notation), like these:
      (root)# cat rule-bondeth0
      from 198.51.100.8 table 220
      to 198.51.100.8 table 220
      (root)# cat route-bondeth0
      198.51.100.0/23 dev bondeth0 table 220
      default via 198.51.100.1 dev bondeth0 table 220
        
      These files, if present, require modification to the new IP address and new network numbers with the netmask in CIDR notation - see reference:
      http://en.wikipedia.org/wiki/CIDR#Prefix_aggregation
      for translation between the quad-dotted notation and CIDR notation. Also see Document 1306154.1 for more information regarding routing configurations on Exadata servers.
      If a backup of the original files is made, use a prefix to name the file, like "backup-rule-bondeth0". If a suffix is used, like ".orig", the OS will interpret the rule-bondeth0.orig file as a device's rule file, which will cause network configuration problems at boot time.
    4. If the default gateway is the bondeth0 interface, modify the /etc/sysconfig/network file to change the GATEWAY parameter.
    5. Update the /etc/hosts file for the existing entries with the new Public network addresses.  Also, the DNS should be updated with the new addresses for SCAN/VIP/and host names.
    6. If the network service was stopped, check your work now and restart it:
      (root)# service network start
      If just the bondeth0 interface was changed, restart it:
      (root)# ifup bondeth0
      This will validate the above files were properly modified.
  7. Add the public interface back using the new NETWORK number computed above for the bondeth0 interface. This command should be run as oracle on only one node in the cluster.
    (oracle)$ oifcfg setif -global bondeth0/198.51.100.0:public
  8. Verify the new network configuration.
    (oracle)$ oifcfg getif
  9. Modify the network resource to apply the new network configuration. This command should be run as root on only one node in the cluster, using the NETWORK number for the Client Access network computed above.
    (root)# srvctl modify network -netnum 1 -subnet 198.51.100.0/255.255.254.0/bondeth0 -pingtarget 198.51.100.1
  10. Modify the SCAN to update its IP addresses, as the oracle user on only one node in the cluster. Use the fully qualified domain name for the SCAN name in the following command.
    (oracle)$ srvctl modify scan -netnum 1 -scanname scan.mycluster.example.com
  11. Verify the current SCAN configuration. This command should be run as oracle on only one node in the cluster.
    (oracle)$ srvctl config scan
  12. Depending on the system's original configuration, more advanced features, such as Valid Node Checking for Registration of services with the Listener may need to be reconfigured.  For VNCR changes see note 1914282.1.  For other features, please check their reference notes or documentation.
  13. Restart the various Clusterware components:
    (oracle)$ srvctl start vip -node {each node}
    (oracle)$ srvctl start listener
    (oracle)$ srvctl start scan
    (oracle)$ srvctl start scan_listener
    In 12.2 and later, starting the Cluster Health Analyzer will start the Management database and its listener.
    (oracle)$ srvctl start cha
  14. Validate the changes made:
    (oracle)$ srvctl status nodeapps
    (oracle)$ srvctl status scan_listener
    (oracle)$ srvctl status cha
  15. For the last check, restart clusterware on all nodes. These commands should be run as root on only one node in the cluster.
    (root)# /u01/app/12.1.0.2/grid/bin/crsctl stop cluster -all
    (root)# /u01/app/12.1.0.2/grid/bin/crsctl start cluster -all
  16. If Clusterware was set for auto start previously, re-enable it to auto start. This command should be run as root on only one node in the cluster.
    (root)# dcli -g dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl enable crs
  17. Restart the database(s). This command should be run as oracle on only one node in the cluster.
    (oracle)$ srvctl start database -d <dbname>
  18. Start dbfs resource(s). This command should be run as oracle on only one node in the cluster.
    (oracle)$ crsctl start res dbfs_mount
  19. The LOCAL_LISTENER parameter should not be set in the spfile and will automatically be set by the Clusterware. The SCAN name didn't change, so the REMOTE_LISTENER parameter in the database(s) shouldn't need to change. If the SCAN name changes, you'll also need to update the instance initialization parameters for your database instance(s) as appropriate. Some instances might be using the LISTENER_NETWORKS parameter, which may require separate updates as well if there are IP addresses embedded in it (a quick way to check these parameters is shown after this list).
  20. If you plan on using EM Cloud Control to monitor this cluster, please ensure that you re-generate onecommand configuration files with the new IP addresses. To re-generate these configuration files, use the Oracle Exadata Deployment Assistant as outlined in the Oracle Exadata Database Machine Owner's Guide.
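To double-check the listener-related parameters mentioned in step 19, a quick look from SQL*Plus on one instance (a sketch) is:

SQL> show parameter local_listener
SQL> show parameter remote_listener
SQL> show parameter listener_networks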

Private Network Change

For changing the Private Infiniband network IP addresses please refer to the Oracle Exadata Database Machine Maintenance Guide, Chapter 4 “Maintaining Other Components of Oracle Exadata Racks” in the section titled “Changing InfiniBand IP Addresses and Host Names” 

Thursday, July 15, 2021

Top 14 Commands for Exadata Health Check

There are multiple ways, utilities, and commands to monitor an Exadata machine. Here is a list of dcli commands that help you retrieve cell alerts, resource consumption details, and configuration details in the shortest time.

Exadata alert

dcli -l root -g ~/cell_group "cellcli -e list metriccurrent where alertState!=\'Normal\'"

Exadata cell CPU utilization

dcli -l root -g ~/cell_group "cellcli -e list metriccurrent CL_CPUT"

Exadata cells flashdisk with status NOT present

dcli -l root -g ~/cell_group "cellcli -e list physicaldisk attributes name, id, slotnumber where disktype=\"flashdisk\" and status=\'not present\'"

Exadata cell current temperature

dcli -l root -g ~/cell_group 'cellcli -e list cell detail' | egrep temperature 

Exadata alert history

[root@dm1db1 ~]# dcli -l root -g ~/cell_group "cellcli -e list alerthistory" 

Exadata cells battery replacement checks

dcli -l root -g ~/cell_group '/opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -aALL' |grep replaced

Exadata cells harddisk with status NOT present

dcli -l root -g ~/cell_group "cellcli -e list physicaldisk attributes name, id, slotnumber where disktype=\"harddisk\" and status=\'not present\'" 

Exadata cells services checks

dcli -l root -g ~/cell_group 'cellcli -e list cell detail' | egrep '(cellsrvStatus)|(msStatus)|(rsStatus)'

Exadata cells memory checks

dcli -l root -g ~/cell_group --vmstat="-a 3 2"

Exadata physical memory checks

[root@dm1db1 ~]# dcli -g ~/all_group -l root "cat /proc/meminfo | egrep 'MemTotal:|MemFree:|Cached'"

Exadata physical disk checks

dcli -g ~/all_group -l root /opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8

Exadata cell fan status

dcli -l root -g ~/cell_group 'cellcli -e list cell detail' | egrep fan 

Exadata storage cell model detail

dcli -l root -g ~/cell_group 'cellcli -e list cell detail' | egrep makeModel -- For cell

dcli -l root -g ~/dbs_group 'dmidecode -s system-product-name'  -- For DB node

Exadata cells power status

dcli -l root -g ~/cell_group 'cellcli -e list cell detail' | egrep power