Categories: cloud, Exadata, OCI, Oracle

Exadata Cloud Service scaling (adding nodes) hangs

This will be a short post, mainly just to show where to look if you ever encounter this issue.

Recently I was tasked with adding a database node to an Exadata Cloud Service X8M. The new X8M has dynamic deployment options, so if you need more storage or compute power for your database nodes, you can just add them.

Sve has written good posts on this earlier, on provisioning X8M and also on scaling X8M – read the details there!

My issue with scaling was that after the scaling event started, it basically hung! Nothing happened and the work requests seemed to be stalling. I had done this same action on another X8M just a few days prior, so I thought there was something odd.

Since debugging options are fairly limited (and at that point in time I wasn’t sure which log file to look at), I created an SR to look into this. Around the same time we received an email from OCI pointing towards the security lists.

But it can’t be the security lists, since I had looked them through multiple times! Or can it? Looking through them one more time, I noticed a typo in the CIDR block, which I then corrected. The network requirements are defined in the OCI documentation, which I always use as a reference.
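If you want to sanity-check a rule like that, here is a minimal sketch using Python's standard ipaddress module. The subnet and CIDR values are made-up placeholders; the point is just that a one-character typo can leave a rule not covering the client subnet at all.

```python
import ipaddress

# Hypothetical values: the Exadata client subnet, and the CIDR that was
# actually typed into the security list rule (note the .20. vs .2. typo).
client_subnet = ipaddress.ip_network("10.0.2.0/24")
rule_cidr = ipaddress.ip_network("10.0.20.0/24")

# The rule only allows the traffic if its CIDR covers the whole client subnet.
if client_subnet.subnet_of(rule_cidr):
    print(f"OK: {rule_cidr} covers {client_subnet}")
else:
    print(f"MISMATCH: {rule_cidr} does not cover {client_subnet} - check for typos")
```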

In general there are rules for ICMP, the Service Gateway, and ports 22, 6200 and 1521. But there’s also a note referencing X8M and scaling:

For X8M systems, Oracle recommends that all ports on the client subnet need to be open for ingress and egress traffic. This is a requirement for adding additional database servers to the system.

This is in general what I’ve seen done in many implementations anyway: size the subnet for Exadata only, as per the network requirements, and then open all traffic within the subnet. But due to the typo this had failed. The only problem was that nothing happened after opening the ports, and the work request continued to hang!
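For illustration, here is a rough sketch of what opening all traffic within the client subnet could look like with the OCI Python SDK. The security list OCID and CIDR are placeholders I made up, and in practice you may prefer to manage this via the console or Terraform; treat it as an outline under those assumptions rather than a drop-in script.

```python
import oci

# Placeholders - substitute your own values.
SECURITY_LIST_OCID = "ocid1.securitylist.oc1..example"
CLIENT_SUBNET_CIDR = "10.0.2.0/24"

config = oci.config.from_file()          # reads ~/.oci/config by default
vcn_client = oci.core.VirtualNetworkClient(config)

# Fetch the current rules so we append instead of overwriting them.
sec_list = vcn_client.get_security_list(SECURITY_LIST_OCID).data
ingress = list(sec_list.ingress_security_rules)
egress = list(sec_list.egress_security_rules)

# protocol="all" means all protocols and ports, limited to the client subnet itself.
ingress.append(oci.core.models.IngressSecurityRule(
    protocol="all", source=CLIENT_SUBNET_CIDR))
egress.append(oci.core.models.EgressSecurityRule(
    protocol="all", destination=CLIENT_SUBNET_CIDR))

vcn_client.update_security_list(
    SECURITY_LIST_OCID,
    oci.core.models.UpdateSecurityListDetails(
        ingress_security_rules=ingress,
        egress_security_rules=egress))
```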

We had a rather long conversation with support, as they didn’t believe me. Luckily you can check the log file addNodeActions*.log under /u01/app/oraInventory/logs, which showed the error AND also showed that nothing was running at that moment.
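If you want to pull the same information yourself, a small sketch like the following (run on node 1 as a user that can read the oraInventory logs) picks the newest addNodeActions*.log and prints its last lines. The directory comes from the post; everything else is just an assumption about how you might want to read it.

```python
import glob
import os

LOG_DIR = "/u01/app/oraInventory/logs"

# Pick the most recently modified addNodeActions*.log on node 1.
logs = glob.glob(os.path.join(LOG_DIR, "addNodeActions*.log"))
if not logs:
    raise SystemExit(f"No addNodeActions*.log found under {LOG_DIR}")

latest = max(logs, key=os.path.getmtime)
print(f"Newest log: {latest}")

# Print the last 40 lines - enough to see the error and whether anything is still running.
with open(latest, errors="replace") as f:
    for line in f.readlines()[-40:]:
        print(line.rstrip())
```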

Once that was confirmed, support restarted the workflow and everything completed smoothly within the normal timeframe.

Summary

A small mistake, but it took some time to resolve. Always double-check the network rules before scaling! Also, additional log files for the scaling event itself can be found on node 1 under /u01/app/oraInventory/logs.

Apart from that, the scaling was a positive experience: the email we received helped, and adding a node is fast overall, as it takes only 4-5 hours!

Simo
