[Bug]: Path [/usr/bin] for [ge.debugDir] is invalid.Result: access real path failed. Reason: Permissi #2748

@wuyangjian1115

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.7.1+cpu
Is debug build: False

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.31.2
Libc version: glibc-2.31

Python version: 3.9.19 (main, Aug 29 2025, 23:48:01)  [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.10.135.bsk.6-amd64-x86_64-with-glibc2.31

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 57 bits virtual
CPU(s):                          228
On-line CPU(s) list:             0-227
Thread(s) per core:              2
Core(s) per socket:              57
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           207
Model name:                      INTEL(R) XEON(R) PLATINUM 8582C
Stepping:                        2
CPU MHz:                         2600.000
BogoMIPS:                        5200.00
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       5.3 MiB
L1i cache:                       3.6 MiB
L2 cache:                        228 MiB
L3 cache:                        600 MiB
NUMA node0 CPU(s):               0-113
NUMA node1 CPU(s):               114-227
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk arch_lbr amx_bf16 avx512_fp16 amx_tile amx_int8 arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.0.2
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1+git217cd40
[pip3] torchvision==0.22.1+cpu
[pip3] transformers==4.53.3
[conda] Could not collect
vLLM Version: 0.10.0
vLLM Ascend Version: 0.10.0rc2.dev0+g4604882a3.d20250901 (git sha: 4604882a3, date: 20250901)

ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ASCEND_LOG_LEVEL=DEBUG
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ASCEND_VISIBLE_DEVICES=4,6
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
VLLM_TORCH_PROFILER_DIR=/opt/tiger/torchrec_npu/prof
ASCEND_RUNTIME_OPTIONS=
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
TORCH_DEVICE_BACKEND_AUTOLOAD=0
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ASCEND_PROCESS_LOG_PATH=/var/log/tiger/ascend_diag_logs/run_0/process_log
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
VLLM_TARGET_DEVICE=npu
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/x86_64:/opt/tiger/native_libhdfs/lib/native:/opt/tiger/jdk/jdk8u265-b01/jre/lib/amd64/server:/opt/tiger/yarn_deploy/hadoop_current/lib/native:/opt/tiger/yarn_deploy/hadoop_current/lib/native/ufs:/opt/tiger/yarn_deploy/hadoop/lib/native:/opt/tiger/yarn_deploy/hadoop_current/lib/native:/opt/tiger/yarn_deploy/hadoop_current/lzo/lib:/opt/tiger/workspace/Python-3.9.19/output/lib:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/$(arch)::/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/toolbox/latest/Ascend-DMI/lib64
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2.8               Version: 24.1.rc2.8                                           |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 4     910B2C              | OK            | 96.5        43                0    / 0             |
| 0                         | 0000:6B:02.0  | 0           0    / 0          3413 / 65536         |
+===========================+===============+====================================================+
| 6     910B2C              | OK            | 89.4        43                0    / 0             |
| 0                         | 0000:73:02.0  | 0           0    / 0          3416 / 65536         |


CANN:
package_name=Ascend-cann-toolkit
version=8.2.RC1
innerversion=V100R001C22SPC001B231
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21],[V100R001C23]
arch=x86_64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.2.RC1/x86_64-linux

🐛 Describe the bug

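In a 32-card multi-node environment, something attempts to write to /usr/bin. Write permission on /usr/bin cannot be granted in this environment. Why is there a write to /usr/bin in the first place?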
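One way to narrow this down is to check which directory each worker process starts in and whether that process can write there. The sketch below assumes, without confirmation from this report, that `ge.debugDir` falls back to the process working directory when it is not set explicitly. `describe_dir` is a hypothetical helper and is not part of vLLM, vLLM Ascend, or torch_npu.

```python
import os
import tempfile


def describe_dir(path):
    """Return basic access facts about `path` for the current process."""
    real = os.path.realpath(path)
    return {
        "path": path,
        "realpath": real,
        "exists": os.path.isdir(real),
        # Directories need both read and execute (search) permission.
        "readable": os.access(real, os.R_OK | os.X_OK),
        "writable": os.access(real, os.W_OK),
    }


if __name__ == "__main__":
    # Assumption: the failing path is derived from the working directory.
    print("cwd:", describe_dir(os.getcwd()))
    print("tmp:", describe_dir(tempfile.gettempdir()))
```

Running this inside a Ray worker before `torch_npu` is initialized would show whether the workers inherit `/usr/bin` as their working directory, which would be consistent with the E40023 message.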


```
[Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] Error executing method 'init_worker'. This might cause deadlock in distributed execution
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] Traceback (most recent call last):
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 611, in execute_method
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     return run_method(self, method, args, kwargs)
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 2985, in run_method
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     return func(*args, **kwargs)
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/home/.local/lib/python3.11/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     return method(self, *_args, **_kwargs)
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 592, in init_worker
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     self.worker = worker_class(**kwargs)
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/vllm_ascend/worker/worker_v1.py", line 77, in __init__
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     init_ascend_soc_version()
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/vllm_ascend/utils.py", line 494, in init_ascend_soc_version
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     soc_version = torch_npu.npu.get_soc_version()
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/torch_npu/npu/_backends.py", line 97, in get_soc_version
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     torch_npu.npu._lazy_init()
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]   File "/usr/local/lib/python3.11/site-packages/torch_npu/npu/__init__.py", line 242, in _lazy_init
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619]     torch_npu._C._npu_init()
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] RuntimeError: SetPrecisionMode:torch_npu/csrc/framework/LazyInitAclops.cpp:155 NPU function error: at_npu::native::AclSetCompileopt(aclCompileOpt::ACL_PRECISION_MODE, precision_mode), error code is 500001
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [ERROR] 2025-09-01-20:16:24 (PID:169127, Device:0, RankID:-1) ERR00100 PTA call acl api failed
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [Error]: The internal ACL of the system is incorrect.
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] Rectify the fault based on the error information in the ascend log
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] E40023: [PID: 169127] 2025-09-01-20:16:24.365.068 Path [/usr/bin] for [ge.debugDir] is invalid.Result: access real path failed. Reason: Permission denied.
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] Possible Cause: The path does not exist.
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] Solution: Change the path to the effective value.
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] TraceBack (most recent call last):
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] Failed to initialize TeConfigInfo.
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [GraphOpt][InitializeInner][InitTbeFunc] Failed to init tbe.[FUNC:InitializeTeFusion][FILE:tbe_op_store_adapter.cc][LINE:1921]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [GraphOpt][InitializeInner][InitTeFusion]: Failed to initialize TeFusion.[FUNC:InitializeInner][FILE:tbe_op_store_adapter.cc][LINE:1888]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [SubGraphOpt][PreCompileOp][InitAdapter] InitializeAdapter adapter [tbe_op_adapter] failed! Ret [4294967295][FUNC:InitializeAdapter][FILE:op_store_adapter_manager.cc][LINE:79]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [SubGraphOpt][PreCompileOp][Init] Initialize op store adapter failed, OpsStoreName[tbe-custom].[FUNC:Initialize][FILE:op_store_adapter_manager.cc][LINE:120]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [FusionMngr][Init] Op store adapter manager init failed.[FUNC:Initialize][FILE:fusion_manager.cc][LINE:115]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] PluginManager InvokeAll failed.[FUNC:Initialize][FILE:ops_kernel_manager.cc][LINE:83]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] OpsManager initialize failed.[FUNC:InnerInitialize][FILE:gelib.cc][LINE:239]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] GELib::InnerInitialize failed.[FUNC:Initialize][FILE:gelib.cc][LINE:164]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] GEInitialize failed.[FUNC:GEInitialize][FILE:ge_api.cc][LINE:384]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [Initialize][Ge]GEInitialize failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [Init][Compiler]Init compiler failed[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(RayWorkerWrapper pid=169127, ip=[2605:340:cd51:4900:d515:c204:16f5:ed52]) ERROR 09-01 20:16:24 [worker_base.py:619] [Set][Options]OpCompileProcessor init failed![FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
```
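Since the same failure is repeated by every worker in a multi-node run, it can help to pull out only the lines that name the offending path and option. The following is a minimal sketch based solely on the single E40023 line in this report. `find_path_errors` is a hypothetical helper, and the assumed error-code shape (`E` followed by five digits) is inferred from that one line rather than from CANN documentation.

```python
import re

# Matches lines shaped like the one in this report:
#   E40023: ... Path [/usr/bin] for [ge.debugDir] is invalid.
PATH_ERROR = re.compile(
    r"(?P<code>E\d{5}):.*?"
    r"Path \[(?P<path>[^\]]+)\] for \[(?P<option>[^\]]+)\] is invalid"
)


def find_path_errors(log_text):
    """Return unique (code, path, option) tuples in first-seen order."""
    seen = []
    for match in PATH_ERROR.finditer(log_text):
        item = (match["code"], match["path"], match["option"])
        if item not in seen:
            seen.append(item)
    return seen
```

Applied to the log above, this would reduce the output to the error code, the path `/usr/bin`, and the option `ge.debugDir`, regardless of how many workers repeat the message.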
