Skip to content

Conversation

gvnc
Copy link
Contributor

@gvnc gvnc commented Aug 29, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR is to enhance the way node-group-auto-discovery parameter works.

node-group-auto-discovery parameter already accepts min/max values from the user but all the nodepools captured by discovery options will inherit the same min/max values and there is no way to set nodepool specific values.

With this change, users will be able to set nodepool specific min/max values by setting nodepool tags minSize and maxSize.

Lets suppose the cluster has 2 nodepools with the nodepool tags;
nodepool1 -> autoDiscover=true, minSize=2, maxSize=5
nodepool2 -> autoDiscover=true

And the node-group-auto-discovery parameter has values min:1, max:3. Then the nodepool sizes will be determined as;
nodepool1 -> min:2, max:5
nodepool2 -> min:1, max:3

Which issue(s) this PR fixes:

Special notes for your reviewer:

It was tested on OCI. The v=4 logs are as below.

I0829 15:22:27.610593       1 oci_manager.go:360] node group auto discovery spec constructed: &{manager:<nil> kubeClient:<nil> clusterId:ocid1.clusterinteg.oc1.phx.aaaaaaaanfwwmxts7otqy5adgpggkgzycfoikxympk6woplbxc7cb22hjkdq compartmentId:ocid1.compartment.oc1..aaaaaaaa4jbgtpkasgxxx7gyirg7xby227rblq7dsduopn7neeca2yennfma tags:map[autoDiscover:true] minSize:1 maxSize:3}

I0829 15:58:39.694616       1 oci_manager.go:229] auto discovered nodepool in compartment : ocid1.compartment.oc1..aaaaaaaa4jbgtpkasgxxx7gyirg7xby227rblq7dsduopn7neeca2yennfma , nodepoolid: ocid1.nodepoolinteg.oc1.phx.aaaaaaaase6yj4ukfjrhq5utxtlb6ofnlzdolplcnlkgdhqsxnyiyirlyyxq ,minSize: 2, maxSize:5

I0829 15:58:39.694630       1 oci_manager.go:229] auto discovered nodepool in compartment : ocid1.compartment.oc1..aaaaaaaa4jbgtpkasgxxx7gyirg7xby227rblq7dsduopn7neeca2yennfma , nodepoolid: ocid1.nodepoolinteg.oc1.phx.aaaaaaaa6sn34zt42e2ebxfqelyvxynafbgyjcmdio3illy7yn4x7pxsu3hq ,minSize: 1, maxSize:3

Does this PR introduce a user-facing change?

OCI: enable nodepool min and max values with node-group-auto-discovery

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Aug 29, 2025
Copy link

linux-foundation-easycla bot commented Aug 29, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/needs-area area/cluster-autoscaler needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/provider/oci Issues or PRs related to oci provider labels Aug 29, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @gvnc. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 29, 2025
@jlamillan
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 29, 2025
@gvnc gvnc force-pushed the oci-auto-discovery-enhancement branch 2 times, most recently from 1f1b469 to f8667aa Compare September 5, 2025 17:15
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 5, 2025
nodepool.minSize = nodeGroup.minSize
nodepool.maxSize = nodeGroup.maxSize
// set minSize-maxSize from nodepool free form tags, or else use nodeGroupAutoDiscovery configuration
nodepool.minSize = getIntFromMap(nodePoolSummary.FreeformTags, "minSize", nodeGroup.minSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we declare "minSize" and "maxSize" as const variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


nodepool.manager = nodeGroup.manager
nodepool.kubeClient = nodeGroup.kubeClient

m.staticNodePools[nodepool.id] = nodepool
klog.V(5).Infof("auto discovered nodepool in compartment : %s , nodepoolid: %s", nodeGroup.compartmentId, nodepool.id)
klog.V(4).Infof("auto discovered nodepool in compartment : %s , nodepoolid: %s ,minSize: %d, maxSize:%d", nodeGroup.compartmentId, nodepool.id, nodepool.minSize, nodepool.maxSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an important log, should we set this to default verbosity rather than 4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the log is important, the only thing holding me back is that refresh cache mechanism is triggered too often and this log will become very noisy if we enable it by default.
@trungng92 Could you provide your feedback on this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a log this with default verbosity inside the the CreateNodePoolManager function after line 175, this way it isn't called after every referesh.
Something similar is done for the 'nodes' argument in line 284.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the log line within this function can remain with a lower verbosity.

Copy link
Contributor Author

@gvnc gvnc Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something similar is done for the 'nodes' argument in line 284.

We already have an equivalent log for nodeGroupAutoDiscovery.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/oci/nodepools/oci_manager.go#L343

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move the log line to CreateNodepoolManager function as you suggested, it will be only printed once when the CA pod gets started, but it wouldn't show the latest list of nodepools. When customer updates the nodepool tags or adds a new nodepool that satisfies the discovery rules, we wouldn't see the updates.

Copy link
Contributor

@trungng92 trungng92 Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with having it log in the refresh loop as long as we are only logging newly discovered node pools (e.g. if we discover node pool 1 on iteration 1, and node pool 2 on iteration 2, only log node pool 2).

It would be nice to avoid logging all auto discovered node pools every time (although technically it may be useful in the edge case where the user removes an auto discovered tag from a node pool).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code and added comparison logic to log the updates. It makes a copy of already discovered nodepools and runs a fresh auto-discovery, then it compares 2 lists and logs if there are any New/Updated/Deleted nodepools. Otherwise, it logs nothing.
I have also added a test method to validate the changes. See the logs from the test output below.

=== RUN   TestNodeGroupAutoDiscovery
I0911 18:23:19.562355   71028 oci_manager.go:361] node group auto discovery spec constructed: &{manager:<nil> kubeClient:<nil> clusterId:ocid1.cluster.oc1.test-region.test compartmentId:ocid1.compartment.oc1.test-region.test tags:map[ca-managed:true namespace.foo:bar] minSize:1 maxSize:5}
I0911 18:23:19.564702   71028 oci_manager.go:434] Nodepool min/max sizes are updated. [id: node-pool-2 ,minSize: 4, maxSize:10]
I0911 18:23:19.564710   71028 oci_manager.go:432] New nodepool discovered. [id: node-pool-1 ,minSize: 1, maxSize:5]
I0911 18:23:19.564714   71028 oci_manager.go:441] Previously auto-discovered nodepool removed from the managed nodepool list. nodepoolid: node-pool-3
I0911 18:23:19.564717   71028 cache.go:44] rebuilding cache
I0911 18:23:19.564721   71028 oci_manager.go:452] Refreshed NodePool list, next refresh after 2025-09-11 18:28:19.56472 +0100 BST m=+300.032385626
I0911 18:23:19.564759   71028 cache.go:44] rebuilding cache
I0911 18:23:19.564761   71028 oci_manager.go:452] Refreshed NodePool list, next refresh after 2025-09-11 18:28:19.564761 +0100 BST m=+300.032426626
--- PASS: TestNodeGroupAutoDiscovery (0.00s)
PASS

@vbhargav875
Copy link
Contributor

LGTM

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 11, 2025
@gvnc gvnc force-pushed the oci-auto-discovery-enhancement branch from 933ada9 to 4dd21c3 Compare September 11, 2025 17:53
@gvnc gvnc force-pushed the oci-auto-discovery-enhancement branch from 3343152 to e77e29c Compare September 11, 2025 20:28
@jackfrancis
Copy link
Contributor

/release-note-edit

OCI: enable nodepool min and max values with node-group-auto-discovery 

@k8s-ci-robot
Copy link
Contributor

@jackfrancis: /release-note-edit must be used with a release note block.

In response to this:

/release-note-edit

OCI: enable nodepool min and max values with node-group-auto-discovery 

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jackfrancis
Copy link
Contributor

@jlamillan I'll happily merge this once you give an lgtm

@jlamillan
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Sep 12, 2025
@jackfrancis
Copy link
Contributor

/release-note-edit

OCI: enable nodepool min and max values with node-group-auto-discovery 

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Sep 12, 2025
@jackfrancis
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gvnc, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2025
@k8s-ci-robot k8s-ci-robot merged commit bf86702 into kubernetes:master Sep 12, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/oci Issues or PRs related to oci provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants