liulehui
Contributor

@liulehui liulehui commented Sep 2, 2025

Why are these changes needed?

  1. g3 instances are relatively old and have low availability on AWS.
  2. Upgrade to g4dn machines, which have better availability, to reduce test flakiness.
  3. One successful release test run for golden_notebook_torch_tune_serve_test.aws

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Lehui Liu <lehui@anyscale.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR upgrades AWS instance types for release tests from g3 to g4dn to improve stability, which is a good initiative. Most changes are consistent with this goal. However, one change in release/golden_notebook_tests/gpu_tpl_aws.yaml introduces a g5g (ARM-based) instance, creating a mixed-architecture cluster that could cause compatibility issues. I've added a comment with a suggestion to align this with the other changes.
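
The mixed-architecture concern above can be checked mechanically. The following is a minimal sketch, not the project's tooling. It assumes the cluster config has already been loaded into a dict, and the key names `head_node_type`, `worker_node_types`, and `instance_type` are assumptions for illustration. The family-to-architecture table is hand-maintained and covers only the families named in this PR, and the instance sizes in the usage lines are hypothetical, not taken from the actual YAML.

```python
# Hand-maintained table covering only the instance families named in this PR.
ARCH_BY_FAMILY = {
    "g3": "x86_64",
    "g4dn": "x86_64",
    "g5g": "arm64",  # ARM-based, as noted in the review
}


def instance_arch(instance_type):
    """Return the CPU architecture for a type such as 'g4dn.4xlarge'."""
    family = instance_type.split(".", 1)[0]
    if family not in ARCH_BY_FAMILY:
        raise ValueError(f"unknown instance family: {family!r}")
    return ARCH_BY_FAMILY[family]


def cluster_architectures(config):
    """Collect the architectures used by the head node and all worker nodes.

    The key names used here are assumptions made for this sketch.
    """
    types = [config["head_node_type"]["instance_type"]]
    types += [w["instance_type"] for w in config.get("worker_node_types", [])]
    return {instance_arch(t) for t in types}


def is_mixed_architecture(config):
    """True when head and workers do not all share one CPU architecture."""
    return len(cluster_architectures(config)) > 1


# Hypothetical configs; the sizes are illustrative only.
uniform = {
    "head_node_type": {"instance_type": "g4dn.4xlarge"},
    "worker_node_types": [{"instance_type": "g4dn.4xlarge"}],
}
mixed = {
    "head_node_type": {"instance_type": "g4dn.4xlarge"},
    "worker_node_types": [{"instance_type": "g5g.4xlarge"}],
}

print(is_mixed_architecture(uniform))  # False
print(is_mixed_architecture(mixed))    # True
```

An explicit table is used instead of parsing the family name so that an unrecognized family raises an error rather than being silently misclassified.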

Signed-off-by: Lehui Liu <lehui@anyscale.com>
@liulehui
Contributor Author

liulehui commented Sep 3, 2025

One sample successful release test run for golden_notebook_torch_tune_serve_test.aws

@liulehui liulehui requested a review from matthewdeng September 3, 2025 01:16
@ray-gardener ray-gardener bot added the train (Ray Train Related Issue), devprod, and release-test (release test) labels Sep 3, 2025
@matthewdeng matthewdeng enabled auto-merge (squash) September 3, 2025 17:54
@github-actions github-actions bot added the go (add ONLY when ready to merge, run all tests) label Sep 3, 2025
@matthewdeng matthewdeng merged commit dbdfc77 into ray-project:master Sep 3, 2025
7 checks passed
Labels
devprod
go (add ONLY when ready to merge, run all tests)
release-test (release test)
train (Ray Train Related Issue)
2 participants