
Commit 9ab5da8

shayts7 and Matan Perlmutter authored
Limit ray actors (#220)
Limit ray actors to be 4, to avoid cases were app crashes due to out of memory. When tested OSS installation on M1 with 10 cores - we got an out of memory error: `raise value\nray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.\nMemory on the node (IP: 10.5.0.6, ID: ccaf0ebfab0bcfe9ace62f58b7b188bd70cec9f6e62154b6ab30751a) where the task (actor ID: 49f3ed706a5e9912f2268b5501000000, name=CheckExecutor-3:CheckPerWindowExecutor.__init__, pid=4143, memory used=0.36GB) was running was 7.30GB / 7.67GB (0.95197), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 9eb84eac980461996027638fce5a80848572761d7b55504ca96e4568) because it was the most recently scheduled task; to see more information about memory usage on this node, use ray logs raylet.out -ip 10.5.0.6. To see the logs of the worker, use ray logs worker-9eb84eac980461996027638fce5a80848572761d7b55504ca96e4568*out -ip 10.5.0.6. Top 10 memory users:\nPID\tMEM(GB)\tCOMMAND\n277\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...\n279\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...\n278\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...\n276\t0.39\t/usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...\n4143\t0.36\tray::CheckPerWindowExecutor\n361\t0.36\tray::CheckPerWindowExecutor\n356\t0.36\tray::CheckPerWindowExecutor\n357\t0.36\tray::CheckPerWindowExecutor\n358\t0.36\tray::CheckPerWindowExecutor\n101\t0.02\t/usr/local/bin/python /usr/local/lib/python3.11/site-packages/ray/dashboard/dashboard.py --host=loca...\nRefer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. 
Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable RAY_memory_usage_threshold when starting Ray. To disable worker killing, set the environment variable RAY_memory_monitor_refresh_ms to zero."}` After consulting Yurii and Matan, Yurii suggested to lower the number of actors - that seems to solve the problem. Adding it to the default setting is in order to lower the cases of out of memory errors on default installations. --------- Co-authored-by: Matan Perlmutter <matan@deepchecks.com>
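The fix caps the number of check-executor actors instead of letting the count scale with CPU cores (e.g. 10 actors on a 10-core M1). A minimal sketch of that capping idea, where the helper name and structure are illustrative assumptions, not the actual Deepchecks code:

```python
import os

# Default cap matching this commit's new setting. Illustrative only: the real
# application reads the limit from TOTAL_NUMBER_OF_CHECK_EXECUTOR_ACTORS.
DEFAULT_ACTOR_LIMIT = 4

def executor_actor_count(limit: int = DEFAULT_ACTOR_LIMIT) -> int:
    """Number of actors to launch: the CPU count, bounded above by `limit`."""
    return min(os.cpu_count() or 1, limit)

print(executor_actor_count())
```

With the cap in place, memory use grows with `min(cores, 4)` actors rather than with every core on the machine.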
1 parent bec8e82 commit 9ab5da8

File tree

1 file changed

+1
-0
lines changed


deploy/oss-conf.env

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,4 +1,5 @@
 INIT_LOCAL_RAY_INSTANCE=True
+TOTAL_NUMBER_OF_CHECK_EXECUTOR_ACTORS=4
 DEPLOYMENT_URL=https://$DOMAIN
 oauth_url=https://$DOMAIN:8443
 oauth_client_id=ba6ce982162fb0d58e0e
```
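The diff only sets the environment variable; how the application consumes it is not shown here. A plausible sketch of reading it, where the fallback default of 4 and the reading code itself are assumptions rather than the actual application source:

```python
import os

# Sketch only: the variable name comes from deploy/oss-conf.env, but this
# parsing code is an assumption, not the Deepchecks implementation. Falling
# back to 4 when the variable is unset matches the commit's new default.
num_actors = int(os.environ.get("TOTAL_NUMBER_OF_CHECK_EXECUTOR_ACTORS", "4"))
print(num_actors)
```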

0 commit comments
