Limit Ray actors to 4, to avoid cases where the app crashes due to running out of memory.

When testing the OSS installation on an M1 with 10 cores, we got an out-of-memory error:
```
    raise value
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: 10.5.0.6, ID: ccaf0ebfab0bcfe9ace62f58b7b188bd70cec9f6e62154b6ab30751a) where the task (actor ID: 49f3ed706a5e9912f2268b5501000000, name=CheckExecutor-3:CheckPerWindowExecutor.__init__, pid=4143, memory used=0.36GB) was running was 7.30GB / 7.67GB (0.95197), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 9eb84eac980461996027638fce5a80848572761d7b55504ca96e4568) because it was the most recently scheduled task; to see more information about memory usage on this node, use ray logs raylet.out -ip 10.5.0.6. To see the logs of the worker, use ray logs worker-9eb84eac980461996027638fce5a80848572761d7b55504ca96e4568*out -ip 10.5.0.6.
Top 10 memory users:
PID     MEM(GB)  COMMAND
277     0.39     /usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...
279     0.39     /usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...
278     0.39     /usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...
276     0.39     /usr/local/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe...
4143    0.36     ray::CheckPerWindowExecutor
361     0.36     ray::CheckPerWindowExecutor
356     0.36     ray::CheckPerWindowExecutor
357     0.36     ray::CheckPerWindowExecutor
358     0.36     ray::CheckPerWindowExecutor
101     0.02     /usr/local/bin/python /usr/local/lib/python3.11/site-packages/ray/dashboard/dashboard.py --host=loca...
Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable RAY_memory_usage_threshold when starting Ray. To disable worker killing, set the environment variable RAY_memory_monitor_refresh_ms to zero.
```
After consulting Yurii and Matan, Yurii suggested lowering the number of actors, which seems to solve the problem.

Making 4 actors the default setting should reduce out-of-memory errors on default installations. A rough illustration of the idea follows below.
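The actual change lives in the diff; the sketch below only illustrates the idea of capping the number of actors instead of scaling with CPU count. The constant name, the stand-in `CheckPerWindowExecutor` class, and the pool helper are assumptions for illustration, not the real code:

```python
import ray
from ray.util import ActorPool

# Hypothetical default: cap the number of Ray actors at 4 so that a default
# OSS installation does not exhaust node memory on machines with many CPU
# cores (e.g. an M1 with 10 cores would otherwise spawn ~10 actors).
MAX_RAY_ACTORS = 4


@ray.remote
class CheckPerWindowExecutor:
    """Stand-in for the real actor class; the actual one lives in the codebase."""

    def run(self, window):
        # ... run the monitoring checks for a single window ...
        return window


def build_executor_pool(num_actors: int = MAX_RAY_ACTORS) -> ActorPool:
    """Create a bounded pool of actors instead of one actor per CPU core."""
    actors = [CheckPerWindowExecutor.remote() for _ in range(num_actors)]
    return ActorPool(actors)


if __name__ == "__main__":
    ray.init()
    pool = build_executor_pool()
    # map() schedules work across at most MAX_RAY_ACTORS actors at a time.
    results = list(pool.map(lambda actor, w: actor.run.remote(w), range(20)))
    print(results)
```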
---------
Co-authored-by: Matan Perlmutter <matan@deepchecks.com>