EGL error when starting to training gdrnpp #20

Open
yinguoxiangyi opened this issue Dec 20, 2022 · 5 comments

Comments

@yinguoxiangyi

Training command

(base) root@a3c636c20700:/workspace/gdrnpp_bop2022# CUDA_VISIBLE_DEVICES=0 python ./core/gdrn_modeling/main_gdrn.py     --config-file configs/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless.py --num-gpus 1 --opts MODEL.WEIGHTS=output/gdrn/tless/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_tless/model_final_wo_optim.pth --resume

Error log

20221220_030409|d2.utils.env@41: Using a generated random seed 10091570
20221220_030409|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 556faa62b6a0 at 0x7f9a566023d0>
20221220_030409|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 556fab0e1ee0 at 0x7f9a566de0d0>
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=5, microseconds=660273), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_MATCH (12297), baseOperation = eglCreateContext ), traceback=<traceback object at 0x7f9b05980550>), 'extra': {}, 'file': (name='main_gdrn.py', path='./core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (132473), thread 'MainThread' (140308821803200):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=132473, name='MainProcess'), 'thread': (id=140308821803200, name='MainThread'), 'time': datetime(2022, 12, 20, 3, 4, 9, 967196, tzinfo=datetime.timezone(datetime.timedelta(0), 'UTC'))}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
    return function(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 205, in main
    ).run(args, cfg)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 402, in _run_impl
    return run_method(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/lite/lite.py", line 409, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
  File "./core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 290, in get_renderer
    use_cache=True,
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/workspace/gdrnpp_bop2022/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 218, in init_context
    self._egl_context = eglCreateContext(self._egl_display, configs[0], EGL_NO_CONTEXT, context_attributes)
  File "/opt/conda/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_MATCH,
        baseOperation = eglCreateContext,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f9a55df9680>,
                <OpenGL._opaque.EGLConfig_pointer object at 0x7f9a55e599e0>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f9a5a804b00>,
                <OpenGL.arrays.lists.c_int_Array_7 object at 0x7f9a55e59cb0>,
        ),
        result = <OpenGL._opaque.EGLContext_pointer object at 0x7f9b05ae2680>
)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/loguru/_handler.py", line 175, in emit
    self._queue.put(str_record)
  File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 358, in put
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/conda/lib/python3.7/site-packages/loguru/_recattrs.py", line 73, in __reduce__
    pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---
--- Logging error in Loguru Handler #3 ---
[Identical record and traceback to Handler #2 above, ending in the same ValueError: ctypes objects containing pointers cannot be pickled]
--- End of logging error ---

I noticed that @wangg12 solved this error in a comment on DLR-RM/AugmentedAutoencoder#19. Would you mind sharing the solution?
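A side note on the log above: the trailing `ValueError: ctypes objects containing pointers cannot be pickled` is a secondary failure, not the root cause. The traceback shows Loguru's handler putting the record on a multiprocessing queue (which Loguru does for sinks added with `enqueue=True`), and pickling that record fails because the `EGLError` carries ctypes pointer arguments (`cArguments`). The sketch below reproduces only that pickling step with the standard library; it is not Loguru's code, and `survives_pickling` and the `RuntimeError` stand-in for PyOpenGL's `EGLError` are hypothetical.

```python
import ctypes
import pickle


def survives_pickling(exc: BaseException) -> bool:
    """Return True if `exc` can be pickled, as a record put on a multiprocessing queue must be."""
    try:
        pickle.dumps(exc)
    except ValueError as err:
        # ctypes refuses to pickle any object that holds a raw pointer.
        print(f"pickling failed: {err}")
        return False
    return True


# Hypothetical stand-in for PyOpenGL's EGLError: a built-in exception whose
# args carry a ctypes pointer, much like the cArguments shown in the log.
fake_pointer = ctypes.pointer(ctypes.c_int(0))
egl_like_error = RuntimeError("EGL_BAD_MATCH", fake_pointer)
plain_error = RuntimeError("EGL_BAD_MATCH")

print(survives_pickling(egl_like_error))  # prints the ValueError message, then False
print(survives_pickling(plain_error))     # True
```

Because this ValueError only appears while the EGL exception is being logged, fixing the underlying EGL failure should make it disappear as well.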

@Basilel7

I have a similar issue when running GDRN training in Docker, which I don't encounter when running on my local Ubuntu machine:

20221220_113920|core.utils.default_args_setup@144: Full config saved to output/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv.py
20221220_113920|d2.utils.env@41: Using a generated random seed 22601290
20221220_113920|core.utils.default_args_setup@162: Used mmcv backend: cv2
20221220_113920|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so.1 <CDLL 'libEGL.so.1', handle 6774f80 at 0x7efd3cf29220>
20221220_113920|DBG|OpenGL.platform.ctypesloader@65: Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 71febf0 at 0x7efd3cdb1640>
ane tro tro
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=16, microseconds=268156), 'exception': (type=<class 'OpenGL.raw.EGL._errors.EGLError'>, value=EGLError( err=EGL_BAD_DISPLAY (EGL_BAD_DISPLAY), baseOperation = eglInitialize ), traceback=<traceback object at 0x7efd3cc015c0>), 'extra': {}, 'file': (name='main_gdrn.py', path='core/gdrn_modeling/main_gdrn.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 233, 'message': "An error has been caught in function '<module>', process 'MainProcess' (1769891), thread 'MainThread' (139632321048896):", 'module': 'main_gdrn', 'name': '__main__', 'process': (id=1769891, name='MainProcess'), 'thread': (id=139632321048896, name='MainThread'), 'time': datetime(2022, 12, 20, 11, 39, 20, 857957, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET'))}
Traceback (most recent call last):
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_logger.py", line 1226, in catch_wrapper
    return function(*args, **kwargs)
  File "core/gdrn_modeling/main_gdrn.py", line 199, in main
    Lite(
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 405, in _run_impl
    return run_method(*args, **kwargs)
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 412, in _run_with_strategy_setup
    return run_method(*args, **kwargs)
  File "core/gdrn_modeling/main_gdrn.py", line 155, in run
    renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 279, in get_renderer
    ren = EGLRenderer(
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 83, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/home2/blongo/gdrn1/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 208, in init_context
    assert eglInitialize(self._egl_display, major, minor)
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/OpenGL/error.py", line 230, in glCheckError
    raise self._errorClass(
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_DISPLAY,
        baseOperation = eglInitialize,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7efd3cea13c0>,
                c_long(0),
                c_long(0),
        ),
        result = 0
)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_handler.py", line 175, in emit
    self._queue.put(str_record)
  File "/opt/conda/envs/gdrn_/lib/python3.8/multiprocessing/queues.py", line 362, in put
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/envs/gdrn_/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/conda/envs/gdrn_/lib/python3.8/site-packages/loguru/_recattrs.py", line 73, in __reduce__
    pickle.dumps(self.value)
ValueError: ctypes objects containing pointers cannot be pickled
--- End of logging error ---
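The logs in this thread fail at different EGL calls: @yinguoxiangyi's gets past `eglInitialize` and fails in `eglCreateContext` with `EGL_BAD_MATCH (12297)`, while this one (and, later in the thread, @FedericoVasile1's) fails earlier, inside `eglInitialize` itself. A small lookup table makes the numeric codes readable. The names and values follow the Khronos `egl.h` header, the one-line meanings paraphrase the `eglGetError` reference page, and `describe` is a hypothetical helper, not part of gdrnpp or PyOpenGL.

```python
# EGL error codes start at 0x3000 and run in this order in the Khronos <EGL/egl.h>.
EGL_ERROR_NAMES = [
    "EGL_SUCCESS", "EGL_NOT_INITIALIZED", "EGL_BAD_ACCESS", "EGL_BAD_ALLOC",
    "EGL_BAD_ATTRIBUTE", "EGL_BAD_CONFIG", "EGL_BAD_CONTEXT",
    "EGL_BAD_CURRENT_SURFACE", "EGL_BAD_DISPLAY", "EGL_BAD_MATCH",
    "EGL_BAD_NATIVE_PIXMAP", "EGL_BAD_NATIVE_WINDOW", "EGL_BAD_PARAMETER",
    "EGL_BAD_SURFACE", "EGL_CONTEXT_LOST",
]
EGL_ERRORS = {0x3000 + offset: name for offset, name in enumerate(EGL_ERROR_NAMES)}

# Paraphrased meanings (from the eglGetError reference) of the errors seen in this thread.
MEANINGS = {
    "EGL_NOT_INITIALIZED": "EGL is not, or could not be, initialized for the display",
    "EGL_BAD_DISPLAY": "the EGLDisplay is not a valid EGL display connection",
    "EGL_BAD_MATCH": "the arguments are inconsistent with each other",
}


def describe(code: int) -> str:
    """Turn a numeric EGL error code into its name, plus a meaning when known."""
    name = EGL_ERRORS.get(code)
    if name is None:
        return f"unknown EGL error 0x{code:04X}"
    meaning = MEANINGS.get(name)
    label = f"{name} (0x{code:04X})"
    return f"{label}: {meaning}" if meaning else label


print(describe(12297))  # EGL_BAD_MATCH (0x3009): the arguments are inconsistent with each other
```

A failure in `eglInitialize` points at the display/driver setup, whereas a failure in `eglCreateContext` means a display was obtained but the requested context could not be created with the chosen config and attributes.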

@shanice-l (Owner)

Maybe you should build the EGL renderer inside the Docker environment.

@FedericoVasile1

FedericoVasile1 commented Mar 27, 2023

I encountered a similar error to @yinguoxiangyi's. I built a Docker image with Ubuntu 18.04 and CUDA 11.3 (nvidia/cuda:11.3.0-cudnn8-devel-ubuntu18.04). I successfully installed both the dependencies (sh scripts/install_deps.sh) and the EGL renderer (sh compile_cpp_egl_renderer.sh).
However, when I run

cd gdrnpp_bop2022
python -m lib.egl_renderer.egl_renderer_v3

I get the following error:

libEGL warning: DRI2: failed to create dri screen
libEGL warning: Not allowed to force software rendering when API explicitly selects a hardware device.
libEGL warning: DRI2: failed to create dri screen
Traceback (most recent call last):
  File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 1422, in <module>
    use_cache=True,
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
    self._context = OffscreenContext(gpu_id=cuda_device_idx)
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
    self.init_context()
  File "/root/repos/pose-estimation/gdrnpp_bop2022/lib/egl_renderer/glutils/egl_offscreen_context.py", line 208, in init_context
    assert eglInitialize(self._egl_display, major, minor)
  File "/root/miniconda3/envs/gdrnpp_bop2022/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
	err = EGL_NOT_INITIALIZED,
	baseOperation = eglInitialize,
	cArguments = (
		<OpenGL._opaque.EGLDisplay_pointer object at 0x7ff44f8fa440>,
		c_long(0),
		c_long(0),
	),
	result = 0
)

Here is the Docker image I'm using:

docker pull federicovasile/ubuntu18-cuda11.3-gdrnpp

You can start a container with:

docker run -it --name gdrnpp-workspace -p 6080:6080 --shm-size=8gb --gpus all --privileged -v /dev:/dev -v /YOUR/PATH/HERE/datasets:/root/datasets:ro federicovasile/ubuntu18-cuda11.3-gdrnpp bash

WARNING: please review the docker run command above before running it. For instance:

  • avoid overusing --privileged and -v /dev:/dev
  • a read-only volume is mounted for the datasets folder; insert your own path, e.g. -v /home/federicovasile/datasets:/root/datasets:ro
  • -p 6080:6080 is for the VNC client (more info below)

When inside the container:

cd /root/pose-estimation/gdrnpp_bop2022
conda activate gdrnpp_bop2022
python -m lib.egl_renderer.egl_renderer_v3

The image also comes with a VNC client that provides a desktop GUI. Inside the container, run start-vnc-session.sh, then visit localhost:6080 in your browser.

@wangg12 @shanice-l thank you for the nice work; I look forward to your help.

PS: given the requests and common interest in #14 and #12, I'm planning to release the fully working Docker image and the complete inference pipeline (YOLOX + GDR-Net) to the community.
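To check whether EGL works in the container independently of gdrnpp's own smoke test (python -m lib.egl_renderer.egl_renderer_v3), the failing eglGetDisplay/eglInitialize step from the traceback above can be reproduced in a few lines of PyOpenGL. This is a minimal sketch under assumptions: PyOpenGL is installed, and it probes only the default display, whereas gdrnpp's OffscreenContext takes a gpu_id and may select a specific device. `probe_egl` is a hypothetical helper.

```python
import ctypes
import os

# Select PyOpenGL's EGL platform; this must happen before OpenGL is imported.
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")


def probe_egl() -> str:
    """Hypothetical helper: initialize EGL on the default display and report the result."""
    try:
        from OpenGL import EGL  # needs PyOpenGL plus a loadable libEGL
    except Exception as err:  # ImportError, or a loader error if libEGL is missing
        return f"PyOpenGL EGL unavailable: {err!r}"

    try:
        display = EGL.eglGetDisplay(EGL.EGL_DEFAULT_DISPLAY)
        # Same argument types as in the traceback: two c_long out-parameters.
        major, minor = ctypes.c_long(), ctypes.c_long()
        if not EGL.eglInitialize(display, major, minor):
            return "eglInitialize returned false"
        vendor = EGL.eglQueryString(display, EGL.EGL_VENDOR)
        version = EGL.eglQueryString(display, EGL.EGL_VERSION)
        EGL.eglTerminate(display)
    except Exception as err:  # PyOpenGL's error checker raises EGLError on failure
        return f"EGL initialization failed: {err!r}"

    return f"EGL initialized; vendor={vendor!r}, version={version!r}"


if __name__ == "__main__":
    print(probe_egl())
```

If initialization succeeds but the vendor string names Mesa rather than NVIDIA, the container is probably not picking up NVIDIA's EGL implementation, which would fit the libEGL DRI2 warnings in the log above.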

@hoenigpeter

Hi @FedericoVasile1, have you tried it with a cudagl base image, such as:
FROM nvidia/cudagl:11.3.0-devel-ubuntu20.04
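One plausible reason a cudagl base image helps, offered as an assumption rather than something confirmed in this thread: cudagl images build on NVIDIA's OpenGL images, which request the NVIDIA Container Toolkit's `graphics` driver capability and ship libglvnd's NVIDIA EGL vendor file, while plain nvidia/cuda images do not. Without them, libEGL can fall back to Mesa. The sketch below checks both conditions from inside a running container; the vendor-file path is the usual libglvnd location and may differ on other setups, and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

# Usual libglvnd location of NVIDIA's EGL vendor file (an assumption; may vary by distro).
NVIDIA_EGL_VENDOR_FILE = Path("/usr/share/glvnd/egl_vendor.d/10_nvidia.json")


def graphics_capability_enabled(capabilities: Optional[str]) -> bool:
    """True if an NVIDIA_DRIVER_CAPABILITIES value requests the graphics (OpenGL/EGL) libraries."""
    if not capabilities:
        # Unset or empty: the toolkit falls back to its default, which excludes graphics.
        return False
    requested = {item.strip().lower() for item in capabilities.split(",")}
    return "all" in requested or "graphics" in requested


def check_container_for_egl() -> List[str]:
    """Hypothetical helper: list likely reasons NVIDIA EGL is unavailable in this container."""
    problems = []
    caps = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
    if not graphics_capability_enabled(caps):
        problems.append(f"NVIDIA_DRIVER_CAPABILITIES={caps!r} does not include 'graphics'")
    if not NVIDIA_EGL_VENDOR_FILE.is_file():
        problems.append(f"{NVIDIA_EGL_VENDOR_FILE} not found; libglvnd may fall back to Mesa")
    return problems


if __name__ == "__main__":
    for problem in check_container_for_egl() or ["no obvious configuration problem found"]:
        print(problem)
```

An empty result does not guarantee that rendering will work, but a reported problem is a cheap thing to rule out before rebuilding the image.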

@wangg12 (Collaborator)

wangg12 commented Sep 4, 2024

cudagl Docker images should work, at least on my side.
