Segfault when reading parquet files if pytorch is imported before pyarrow (github issue, ARROW-3346).
A workaround: always import pyarrow before import torch.
torch/_C.so is loaded using the RTLD_GLOBAL flag. As a result, the dynamic linker places all the symbols exported by _C.so into the global scope. When the pyarrow shared libraries are loaded afterwards, their symbol references are resolved against _C.so's symbols. _C.so exports some of the standard C++ library symbols, so a crash may occur if the versions of the standard C++ libraries are incompatible.
Loading in reverse order is fine, since the pyarrow libraries do not expose their symbols in the linker's global namespace.
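The import-order rule above can be guarded with a small check. This is a minimal sketch, not part of pyarrow or torch: the helper names torch_loaded_before_pyarrow and warn_on_risky_import_order are hypothetical. It only detects the case where torch is already loaded and pyarrow is not; once both are loaded, it cannot tell which came first.

```python
import sys
import warnings


def torch_loaded_before_pyarrow(loaded_modules=None):
    """Return True if torch is already imported but pyarrow is not.

    `loaded_modules` defaults to sys.modules; a plain dict can be passed
    instead (hypothetical hook, useful for testing).
    """
    modules = sys.modules if loaded_modules is None else loaded_modules
    return "torch" in modules and "pyarrow" not in modules


def warn_on_risky_import_order(loaded_modules=None):
    """Warn when the risky order (torch first, pyarrow later) is detected."""
    if torch_loaded_before_pyarrow(loaded_modules):
        warnings.warn(
            "torch was imported before pyarrow; import pyarrow first "
            "to avoid the segfault described in ARROW-3346.",
            RuntimeWarning,
        )
        return True
    return False
```

One way to use it is to call warn_on_risky_import_order() at the top of a module, immediately before its own import pyarrow statement.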
sudo apt-get install libtcmalloc-minimal4
LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4" python examples/mnist/pytorch_example.py
If you see the following error while trying to run the pytorch example, you are in luck:
File "/usr/local/lib/python2.7/dist-packages/torch/__init__.py", line 80, in <module>
from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS
This problem stems from a known defect in the glibc dlopen logic, which made conservative assumptions about static thread-local storage (TLS), specifically with respect to surplus DTV slots. Those slots no longer suffice for modern compute needs.
Solutions, in increasing order of effort involved:
- Do import torch as early as possible (tensorflow models issue 523)
- Install libgomp1 and export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libgomp.so.1
- Build pytorch from source (see below, also pytorch issue 643)
- Patch glibc to increase surplus DTV slots from 14 to 32 or 64.
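Because LD_PRELOAD has to be set before the Python process starts, the libgomp workaround can also be applied from a small launcher script. The following is a minimal sketch under that assumption: env_with_preload and run_with_preload are hypothetical helper names, and the library path is the one quoted in the list above, which may differ on your distribution.

```python
import os
import re
import subprocess
import sys

# Path taken from the workaround above; adjust it for your distribution.
LIBGOMP_PATH = "/usr/lib/x86_64-linux-gnu/libgomp.so.1"


def env_with_preload(library_path, base_env=None):
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD entries may be separated by colons or whitespace.
    entries = [entry for entry in re.split(r"[:\s]+", existing) if entry]
    if library_path not in entries:
        entries.insert(0, library_path)
    env["LD_PRELOAD"] = ":".join(entries)
    return env


def run_with_preload(script_args, library_path=LIBGOMP_PATH):
    """Run `python <script_args>` in a child process with the library preloaded."""
    command = [sys.executable] + list(script_args)
    return subprocess.call(command, env=env_with_preload(library_path))
```

For example, run_with_preload(["examples/mnist/pytorch_example.py"]) starts the example in a child interpreter whose dynamic linker sees the preloaded library from the very beginning.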
For background, this issue was reported back in 2013, and with Matlab since 2012.
For additional references, see the glibc bug report and fix, and the accompanying Debian glibc bug report 793689. According to 793641, some variant of the static TLS fix was included in glibc-2.22.
The OpenMP library libgomp.so.1 has had this fix in place since circa 2015.
Ubuntu Xenial and above also contain this fix, but just updating the operating system may not be sufficient if torch links against its own version of glibc that still uses static TLS. An ldd analysis (per 793689 comment 20) can reveal whether libraries like torch are actually still using static TLS.
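As a complementary heuristic, a shared library's dynamic section can be inspected for the STATIC_TLS flag. The sketch below is an assumption of mine rather than the procedure from comment 20: it shells out to readelf -d (from binutils, which must be installed) and looks for STATIC_TLS on the FLAGS entry. The helper names are hypothetical, and a missing flag should be read as a hint, not as proof.

```python
import subprocess


def flags_mark_static_tls(readelf_dynamic_output):
    """Return True if a (FLAGS) entry in `readelf -d` output lists STATIC_TLS."""
    for line in readelf_dynamic_output.splitlines():
        # "(FLAGS)" does not match the separate "(FLAGS_1)" entry.
        if "(FLAGS)" in line and "STATIC_TLS" in line.split():
            return True
    return False


def library_uses_static_tls(library_path):
    """Run `readelf -d` on a shared library and check for the STATIC_TLS flag."""
    output = subprocess.check_output(
        ["readelf", "-d", library_path], universal_newlines=True
    )
    return flags_mark_static_tls(output)
```

Calling library_uses_static_tls() on the path of the torch/_C.so file in your installation, and on each library that ldd lists for it, gives a rough picture of which objects still request static TLS.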
If you choose to build pytorch from source, you can do so using the pytorch Dockerfile as follows:
- Clone the pytorch repo and
  docker build -t pytorch -f docker/pytorch/Dockerfile --build-arg PYTHON_VERSION=2.7.6 .
  * Set PYTHON_VERSION to your version of choice, or leave out for pytorch Dockerfile default
- Build the custom pytorch docker:
  docker build -t petastorm_torch -f examples/mnist/pytorch/Dockerfile .
- Run the container and work with your code:
  docker run -it --rm petastorm_torch:latest /bin/bash