
Why did it run into a 'loss not found' error when executing 'python main.py --dataset ispd2005 --design_name adaptec1 --load_from_raw True --detail_placement True' #9

Closed
huichengyu opened this issue May 8, 2024 · 13 comments

Comments

@huichengyu

Traceback (most recent call last):
  File "/home/yuhuicheng/Xplace/main.py", line 104, in <module>
    main()
  File "/home/yuhuicheng/Xplace/main.py", line 100, in main
    run_placement_main(args, logger)
  File "/home/yuhuicheng/Xplace/src/run_placement.py", line 41, in run_placement_main
    run_placement_all(args, logger)
  File "/home/yuhuicheng/Xplace/src/run_placement.py", line 26, in run_placement_all
    place_metrics, route_metrics = run_placement_single(cur_args, logger)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuhuicheng/Xplace/src/run_placement.py", line 12, in run_placement_single
    res = run_placement_main_nesterov(args, logger)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuhuicheng/Xplace/src/run_placement_nesterov.py", line 163, in run_placement_main_nesterov
    obj = optimizer.step(obj_and_grad_fn)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/share/process/anaconda3/envs/xplace/lib/python3.11/site-packages/torch/optim/optimizer.py", line 391, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yuhuicheng/Xplace/src/nesterov_optimizer.py", line 128, in step
    return obj
           ^^^
UnboundLocalError: cannot access local variable 'obj' where it is not associated with a value
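The last frame says step() in nesterov_optimizer.py reached return obj without ever assigning obj. The nn-branch source is not shown in this thread, so the following is only a hypothetical, torch-free sketch of one common way this error arises: obj is bound only inside a loop whose body never executes. The function names and the param_groups layout are illustrative assumptions, not Xplace code.

```python
def step_buggy(param_groups, closure):
    """Hypothetical step(): 'obj' is only bound inside the loop body."""
    for group in param_groups:
        for _param in group["params"]:
            obj = closure()  # never executes if no parameters are visited
    return obj  # raises UnboundLocalError when the loop body never ran


def step_guarded(param_groups, closure):
    """Same loop, but initialise 'obj' and fail with an explicit message."""
    obj = None
    for group in param_groups:
        for _param in group["params"]:
            obj = closure()
    if obj is None:
        raise RuntimeError("step() visited no parameters; check param_groups")
    return obj


if __name__ == "__main__":
    def closure():
        return 1.5  # stand-in for the objective returned by obj_and_grad_fn

    print(step_buggy([{"params": ["x"]}], closure))  # 1.5
    for step in (step_buggy, step_guarded):
        try:
            step([{"params": []}], closure)
        except (UnboundLocalError, RuntimeError) as exc:
            print(f"{step.__name__}: {type(exc).__name__}: {exc}")
```

Initialising obj and raising an explicit error does not fix whatever skipped the assignment, but it replaces the confusing UnboundLocalError with a message that points at that condition.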

@liulixinkerry
Member

liulixinkerry commented May 8, 2024

Could you provide your OS, CUDA, and PyTorch versions?
Thanks.
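A small script can gather that information in one go. This is a sketch with hypothetical helper names, using only the standard library plus an optional import of torch: it reads the compiler from gcc --version and, for PyTorch, torch.__version__, torch.version.cuda (the CUDA version the wheel was built against) and torch.cuda.is_available().

```python
import platform
import shutil
import subprocess


def gcc_version_line():
    """First line of `gcc --version`, or None if gcc is not on PATH."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    out = subprocess.run([gcc, "--version"], capture_output=True, text=True)
    lines = out.stdout.splitlines()
    return lines[0] if lines else None


def collect_env():
    """Gather the versions asked for above: OS, compiler, PyTorch, CUDA."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "gcc": gcc_version_line(),
    }
    try:
        import torch
    except ImportError:
        info.update(torch=None, torch_cuda=None, cuda_available=None)
    else:
        info.update(
            torch=torch.__version__,
            torch_cuda=torch.version.cuda,  # build-time CUDA version; None on CPU-only builds
            cuda_available=torch.cuda.is_available(),
        )
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

PyTorch also ships python -m torch.utils.collect_env, which prints a more complete report and is usually the better thing to paste into an issue.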

@huichengyu
Author

Thank you very much for your reply.
Linux version 3.10.0-1160.105.1.el7.x86_64 (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44
CUDA Version: 12.0
PyTorch 2.3.0+cu121

@liulixinkerry
Member

liulixinkerry commented May 8, 2024

As I recall, the C++ API of recent PyTorch releases (>= 2.2) requires GCC >= 9.4. Did you encounter any errors when compiling the program?

BTW, GCC 9.4 + CUDA 12.1 + PyTorch 2.3.0+cu121 works for me.
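The GCC requirement above can be checked mechanically. A minimal sketch, assuming the 9.4 minimum quoted from memory in this comment (not an official figure); the helper names are hypothetical. It extracts the first dotted version from a gcc --version line and compares it as a tuple.

```python
import re
import shutil
import subprocess

# Minimum quoted from memory in this thread for torch >= 2.2; not an official figure.
MIN_GCC = (9, 4)


def parse_version(text):
    """Return the first dotted version (e.g. '11.4.0') in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no dotted version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def gcc_is_new_enough(version_text, minimum=MIN_GCC):
    """Compare only as many components as the minimum has: (11, 4) >= (9, 4)."""
    return parse_version(version_text)[: len(minimum)] >= minimum


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        lines = subprocess.run([gcc, "--version"], capture_output=True,
                               text=True).stdout.splitlines()
        first = lines[0] if lines else ""
        try:
            verdict = "OK" if gcc_is_new_enough(first) else "too old"
        except ValueError:
            verdict = "unrecognised version string"
        print(f"{first!r}: {verdict} (minimum {'.'.join(map(str, MIN_GCC))})")
```

What ultimately matters is the compiler CMake uses for the extension, reported on the "The CXX compiler identification is ..." line of the configure log; it can differ from the first gcc on PATH.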

@liulixinkerry
Member

liulixinkerry commented May 9, 2024

Your CUDA version (12.0) is lower than the CUDA version your PyTorch build targets (12.1). I am not sure whether this causes any problems.
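The comparison in this comment can be scripted. A sketch, assuming the "CUDA Version: 12.0" line was copied from the nvidia-smi header (which reports the highest CUDA version the installed driver supports) and that the PyTorch side comes from torch.version.cuda; the function names are hypothetical.

```python
import re


def driver_cuda_from_smi(smi_text):
    """Parse the 'CUDA Version: X.Y' field that nvidia-smi prints in its header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' field found")
    return int(match.group(1)), int(match.group(2))


def build_cuda_exceeds_driver(torch_cuda, driver_cuda):
    """True if torch.version.cuda (e.g. '12.1') is newer than the driver's maximum.

    torch_cuda is None for CPU-only builds, in which case there is nothing to compare.
    """
    if torch_cuda is None:
        return False
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    return (major, minor) > driver_cuda


if __name__ == "__main__":
    # Values from this thread: the reported CUDA version is 12.0, the wheel is +cu121.
    driver = driver_cuda_from_smi("CUDA Version: 12.0")
    print(driver, build_cuda_exceeds_driver("12.1", driver))  # (12, 0) True
```

The check only flags the mismatch pointed out here; it does not tell you whether the mismatch actually breaks anything.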

@huichengyu
Author

My system supports up to CUDA 12.0.

@huichengyu
Author

Even after switching to PyTorch 2.4.0 with CUDA 11.8, it still doesn't work.

@liulixinkerry
Member

What is your GCC version? Is it still 4.8.5?

@huichengyu
Author

My GCC version is 11.4.0.

@liulixinkerry
Member

liulixinkerry commented May 9, 2024

File "/home/yuhuicheng/Xplace/src/run_placement_nesterov.py", line 163, in run_placement_main_nesterov
obj = optimizer.step(obj_and_grad_fn)

Why is obj = optimizer.step(obj_and_grad_fn) on line 163 in your copy? On the main branch it should be at line 145: https://github.com/cuhk-eda/Xplace/blob/main/src/run_placement_nesterov.py#L145

I guess you are using the nn branch, which I have not maintained for about a year.

I suggest using commit 0aa58ed as a reference point for modifying your code on the nn branch.
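Both checks made here by eye can be scripted. The sketch below uses hypothetical helper names and is meant to be run from the root of an Xplace checkout: it finds the line on which the optimizer.step call sits and uses standard git commands (git rev-parse --abbrev-ref HEAD, git merge-base --is-ancestor) to report the current branch and whether commit 0aa58ed is already in its history.

```python
import subprocess
from pathlib import Path

STATEMENT = "obj = optimizer.step(obj_and_grad_fn)"
TARGET = Path("src/run_placement_nesterov.py")  # relative to the Xplace checkout root


def find_statement_lines(path, statement=STATEMENT):
    """1-based line numbers whose stripped text equals `statement`."""
    lines = Path(path).read_text().splitlines()
    return [n for n, line in enumerate(lines, start=1) if line.strip() == statement]


def git(*args, repo="."):
    """Run a git command in `repo`; return (exit code, stripped stdout)."""
    out = subprocess.run(["git", "-C", repo, *args], capture_output=True, text=True)
    return out.returncode, out.stdout.strip()


def has_commit(commit, repo="."):
    """True if `commit` is an ancestor of HEAD (exit code 0 of merge-base --is-ancestor)."""
    code, _ = git("merge-base", "--is-ancestor", commit, "HEAD", repo=repo)
    return code == 0


if __name__ == "__main__":
    if TARGET.exists():
        # On main this was line 145 at the time of this thread; it may have moved since.
        print("statement at line(s):", find_statement_lines(TARGET))
        print("branch:", git("rev-parse", "--abbrev-ref", "HEAD")[1])
        print("HEAD contains 0aa58ed:", has_commit("0aa58ed"))
    else:
        print(f"{TARGET} not found; run this from the Xplace repository root")
```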

@huichengyu
Author

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_BUILD_TYPE: Release
-- PROJECT_SOURCE_DIR=/home/yuhuicheng/Xplace
-- CMAKE_CXX_ABI: _GLIBCXX_USE_CXX11_ABI=0
-- pybind11 v2.13.0 dev1
-- Found PythonInterp: /home/share/process/anaconda3/envs/xplace/bin/python (found suitable version "3.11.8", minimum required is "3.6")
-- Found PythonLibs: /home/share/process/anaconda3/envs/xplace/lib/libpython3.11.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- PYTHON_INCLUDE_DIRS: /home/share/process/anaconda3/envs/xplace/include/python3.11
CMake Deprecation Warning at thirdparty/flute/CMakeLists.txt:32 (CMAKE_MINIMUM_REQUIRED):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- FLUTE_INCLUDE_DIR: /home/yuhuicheng/Xplace/thirdparty/flute
-- LEMON_INCLUDE_DIRS: /home/yuhuicheng/Xplace/thirdparty/lemon/include
-- LEMON_LIBRARIES: /home/yuhuicheng/Xplace/thirdparty/lemon/lib/libemon.a
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1")
-- Checking for module 'cairo'
-- No package 'cairo' found
-- Found Cairo: /home/yuhuicheng/cairo16-linux/include/cairo
-- CAIRO_INCLUDE_DIRS: /home/yuhuicheng/cairo16-linux/include/cairo
-- CAIRO_LIBRARIES: /home/yuhuicheng/cairo16-linux/lib/libcairo.so
-- TORCH_INSTALL_PREFIX=/home/share/process/anaconda3/envs/xplace/lib/python3.11/site-packages/torch
-- TORCH_VERSION=2.3
CMake Warning (dev) at CMakeLists.txt:69 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found suitable version "11.8", minimum required is "11.8")
-- TORCH_ENABLE_CUDA=1
-- CUDA_ARCH_FLAGS: -gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;--compiler-options;-fPIC;-std=c++17;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;--extended-lambda;--expt-relaxed-constexpr
-- TORCH_INCLUDE_DIRS=/home/share/process/anaconda3/envs/xplace/include/python3.11/home/share/process/anaconda3/envs/xplace/lib/python3.11/site-packages/torch/include/home/share/process/anaconda3/envs/xplace/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Configuring done (3.3s)
-- Generating done (0.2s)
-- Build files have been written to: /home/yuhuicheng/Xplace/build
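This configure log contains several of the toolchain facts compared by hand in this thread: the compiler CMake picked (GNU 11.4.0), the C++ ABI flag, TORCH_VERSION=2.3, and a CUDA toolkit found at /usr/local/cuda (11.8). The sketch below, with hypothetical names and regexes fitted only to the lines shown here, pulls those fields out so they can be set side by side with what the installed torch reports at runtime (torch.__version__, torch.version.cuda, torch.compiled_with_cxx11_abi()).

```python
import re

# Patterns written against the configure lines in this log; other versions may differ.
PATTERNS = {
    "cxx_compiler": r"The CXX compiler identification is (.+)",
    "cxx11_abi": r"_GLIBCXX_USE_CXX11_ABI=(\d)",
    "torch_version": r"TORCH_VERSION=([\d.]+)",
    "cuda_toolkit": r'Found CUDA: \S+ \(found suitable version "([\d.]+)"',
}

# Four lines copied from the log above; paste the full log here instead.
SAMPLE_LOG = """\
-- The CXX compiler identification is GNU 11.4.0
-- CMAKE_CXX_ABI: _GLIBCXX_USE_CXX11_ABI=0
-- TORCH_VERSION=2.3
-- Found CUDA: /usr/local/cuda (found suitable version "11.8", minimum required is "11.8")
"""


def summarize_cmake_log(log_text):
    """Extract compiler, ABI flag, torch version and CUDA toolkit version (None if absent)."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        summary[key] = match.group(1).strip() if match else None
    return summary


if __name__ == "__main__":
    print("configure:", summarize_cmake_log(SAMPLE_LOG))
    try:
        import torch
    except ImportError:
        print("runtime: torch is not importable from this interpreter")
    else:
        print("runtime:", {
            "torch_version": torch.__version__,
            "torch_cuda": torch.version.cuda,
            "cxx11_abi": str(int(torch.compiled_with_cxx11_abi())),
        })
```

A mismatch between the two lines of output is worth a closer look, but by itself this comparison does not identify the cause of the error in this issue.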

@liulixinkerry
Member

@huichengyu Please try this.

@huichengyu
Author

This does not contain the nn branch.

@huichengyu
Author

Problem solved, thank you very much!
