Categories: cloud, Exadata, OCI, Oracle

Exadata Cloud Service scaling (adding nodes) hangs

This will be a short post, mainly to show where to look if you ever encounter this issue.

Recently I was tasked with adding a database node to an Exadata Cloud Service X8M. The new X8M has dynamic deployment options, so if you need more storage or compute power for your database nodes, you can simply add them.

Sve has written good posts on this earlier, on provisioning X8M and also on scaling X8M – read the details there!

My issue with scaling was that after the scaling event started, it basically hung! Nothing happened and the work request seemed to be stalling. I had done this same action on another X8M just a few days prior, so I thought something was odd.

Since debugging options are fairly limited (and at that point in time I wasn't sure which log file to look at), I created an SR to look into this. Around the same time we received an email from OCI:

But it can't be the security lists, since I had looked through them multiple times! Or can it? Looking through them one more time, I noticed a typo in the CIDR block, which I then corrected. The network requirements are defined in the OCI documentation, which I always use as a reference.
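Since the culprit turned out to be a single typo in a CIDR block, a quick sanity check before pasting a value into a security list rule can save hours. A minimal POSIX shell sketch – this is my own helper for illustration, not part of any OCI tooling:

```shell
#!/bin/sh
# Sanity-check a string as IPv4 CIDR notation (each octet 0-255,
# prefix length 0-32) before using it in a security list rule.
valid_cidr() {
  echo "$1" | grep -Eq \
    '^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])/([0-9]|[12][0-9]|3[0-2])$'
}

valid_cidr "10.0.2.0/24"  && echo "ok: 10.0.2.0/24"
valid_cidr "10.0.2.0/33"  || echo "rejected: 10.0.2.0/33"
valid_cidr "10.0,2.0/24"  || echo "rejected: 10.0,2.0/24"
```

A check like this won't tell you whether the CIDR is the *right* subnet, but it catches the kind of fat-finger typo that silently stalled my scaling event.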

In general there are rules for ICMP, the Service Gateway, and ports 22, 6200 and 1521. But there's also a note referencing X8M and scaling:

For X8M systems, Oracle recommends that all ports on the client subnet need to be open for ingress and egress traffic. This is a requirement for adding additional database servers to the system.

This is in general what I've seen done in many implementations anyway: size the subnet for Exadata only, as per the network requirements, and then open all traffic within the subnet. But due to the typo this had failed; the only problem was that nothing happened after opening the ports, and the work request continued to hang!
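As an illustration, opening all protocols within the client subnet can be sketched with the OCI CLI: write the rule JSON and update the security list. The subnet CIDR and the security list OCID below are placeholders – substitute your own values, and treat this as a sketch of the rule shape rather than the exact rules your deployment needs:

```shell
#!/bin/sh
# Placeholder: replace with your Exadata client subnet CIDR.
SUBNET_CIDR="10.0.2.0/24"

# Allow all ingress traffic from, and all egress traffic to,
# the client subnet itself (the X8M recommendation above).
cat > ingress.json <<EOF
[{"protocol": "all", "source": "${SUBNET_CIDR}", "isStateless": false}]
EOF
cat > egress.json <<EOF
[{"protocol": "all", "destination": "${SUBNET_CIDR}", "isStateless": false}]
EOF

# Apply with the OCI CLI (requires the security list OCID).
# Note: this REPLACES the existing rule lists, so merge in your
# other rules (22, 1521, 6200, ICMP, Service Gateway) as well.
# oci network security-list update \
#   --security-list-id ocid1.securitylist.oc1...placeholder \
#   --ingress-security-rules file://ingress.json \
#   --egress-security-rules file://egress.json
```

The same rules can of course be set in the console; the JSON just makes the intra-subnet "allow all" intent explicit and reviewable.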

We had a rather long conversation with support, as they didn't believe me. Luckily you can find the log file addNodeActions*.log under /u01/app/oraInventory/logs, which showed the error AND also showed that nothing was running at the moment.
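Being able to point at that log was key to moving the SR forward. A small sketch of how the check can be scripted – the helper function names are my own; only the log directory and filename pattern come from this incident:

```shell
#!/bin/sh
# On Exadata node 1 the installer logs live under
# /u01/app/oraInventory/logs; LOG_DIR is parameterized here so the
# snippet is easy to try elsewhere.
LOG_DIR="${LOG_DIR:-/u01/app/oraInventory/logs}"

# Most recently modified addNodeActions log, if any.
latest_addnode_log() {
  ls -t "${LOG_DIR}"/addNodeActions*.log 2>/dev/null | head -1
}

# Print the latest log name and any error/failure lines in it.
check_addnode_errors() {
  log="$(latest_addnode_log)"
  if [ -z "$log" ]; then
    echo "no addNodeActions log found in ${LOG_DIR}"
    return 1
  fi
  echo "latest log: $log"
  grep -iE 'error|fail' "$log" || echo "no errors logged"
}

# Usage: check_addnode_errors
```

If the latest log shows an error and then no further activity, that matches the "work request hangs but nothing is actually running" state described above.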

Once that was confirmed, support restarted the workflow and everything completed smoothly within the normal timeframe.

Summary

A small mistake, but it took some time to resolve – always double-check the network rules before scaling! Additional log files for the scaling event itself can be found on node 1 under /u01/app/oraInventory/logs.

Apart from that, it was a positive experience: the scaling email notifications were useful, and adding a node is fast overall, taking only 4-5 hours!

Simo
