I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
# set a breakpoint in ...\modin\core\execution\ray\common\utils.py line 138
import modin.pandas as pd
df = pd.DataFrame()
# check the contents of ray_init_kwargs
# or really just look at the code there:
# object_store_memory = _get_object_store_memory()
# ray_init_kwargs = {
#     "num_cpus": CpuCount.get(),
#     "num_gpus": GpuCount.get(),
#     "include_dashboard": False,
#     "ignore_reinit_error": True,
#     "object_store_memory": object_store_memory,
#     "_redis_password": redis_password,
#     "_memory": object_store_memory,
#     "resources": RayInitCustomResources.get(),
#     **extra_init_kw,
# }
Issue Description
Modin sets _memory and object_store_memory to the same value. This not only leads to instability and crashes, it also reduces flexibility, because _memory can be set to a value higher than the shared memory while object_store_memory cannot.
Many of the issues I faced over the last few days with read_parquet() (although this one still fills up RAM until my PC crashes), to_parquet(), concat(), etc. stemmed from the same cause: when the object store was full and a spill was attempted, a write violation happened and a raylet died.
I noticed that Modin runs a lot more stably when ray.init() is called manually. This is because the two values are then not set to the same value by default.
Also, it would be great if the Ray dashboard were not disabled by default with no way to enable it when initializing through Modin. But I digress.
Expected Behavior
If no manual configuration was done and no env variables were set, the default ray init should be used.
And if not the default, then not something this debilitating.
After initializing Ray manually and just setting _memory to something much larger, things just started working.
Setting MODIN_MEMORY to something higher while using Modin's initialization did not work, because it led to a value error from Ray stating that object_store_memory can't be set that high (even though I never cared about object_store_memory).
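For anyone hitting the same problem, the manual workaround described above can be sketched roughly as follows. This is only an illustration, not Modin's own code: the helper names and the byte sizes are made up for the example, and it assumes that Modin picks up a Ray instance that is already running instead of starting its own. Note that _memory is a private ray.init() parameter and may change between Ray releases.

```python
GIB = 1024 ** 3


def manual_ray_kwargs(object_store_bytes, heap_bytes, dashboard=None):
    """Build ray.init() keyword arguments with the two memory limits decoupled.

    Hypothetical helper for illustration; the sizes passed in are up to the caller.
    """
    return {
        # Size of the shared-memory object store.
        "object_store_memory": object_store_bytes,
        # Private Ray parameter; unlike Modin's default it is not tied
        # to the object store size here.
        "_memory": heap_bytes,
        # None lets Ray start the dashboard only if its dependencies are installed.
        "include_dashboard": dashboard,
        "ignore_reinit_error": True,
    }


def start_ray_then_modin(object_store_bytes=2 * GIB, heap_bytes=8 * GIB):
    """Initialize Ray first, then import Modin (assumed to reuse the running Ray)."""
    import ray  # imported lazily so the helper above works without Ray installed

    ray.init(**manual_ray_kwargs(object_store_bytes, heap_bytes))

    import modin.pandas as pd

    return pd.DataFrame({"a": [1, 2, 3]})
```

Calling start_ray_then_modin() once at the top of a script, before any other Modin import, is the intended usage. The 2 GiB and 8 GiB defaults are placeholders and should be adapted to the machine.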
Error Logs
Installed Versions
INSTALLED VERSIONS
commit : c8bbca8
python : 3.11.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Austria.1252
Modin dependencies
Modin has these default values because it helps to achieve good performance in general.
If you have a specific case and Modin's configuration variables don't help you, you can initialize ray yourself.
I see.
I understand my experience does not by any means stand for everyone's. But with these defaults I had numerous bluescreens, freezes and crashes, all of which made debugging and figuring this out a lot more troublesome than necessary.
I did not want to initialize Ray myself for the very reason that I thought Modin would know best, but it gave me no option to adapt just the two values that caused issues for me (_memory and include_dashboard).
If you think the current defaults work fine most of the time and my situation is an outlier, fair enough!
I still think introducing config params or env vars that give the option to set _memory, object_store_memory and include_dashboard manually while still relying on Modin's Ray initialization would be good.
As I understand it, it's a relatively new feature of Modin that it initializes Ray itself, so maybe there will be some changes along the way anyway. For now, since I understand that, it's fine to initialize Ray manually.
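To make the suggestion about config params or env vars a bit more concrete, here is a rough sketch of how such overrides could be layered on top of the existing defaults. Everything in it is hypothetical: the environment variable names (MODIN_RAY_MEMORY and so on) and the helper functions do not exist in Modin, they only show the shape of the idea.

```python
import os


def _parse_bool(text):
    """Interpret common truthy strings; anything else counts as False."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical variable names for illustration only; Modin does not define these.
ENV_TO_KWARG = {
    "MODIN_RAY_MEMORY": ("_memory", int),
    "MODIN_RAY_OBJECT_STORE_MEMORY": ("object_store_memory", int),
    "MODIN_RAY_INCLUDE_DASHBOARD": ("include_dashboard", _parse_bool),
}


def ray_overrides_from_env(environ=None):
    """Collect only the ray.init() kwargs the user explicitly set."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, (kwarg, convert) in ENV_TO_KWARG.items():
        raw = environ.get(name)
        if raw:  # skip unset and empty values
            overrides[kwarg] = convert(raw)
    return overrides


def merge_ray_init_kwargs(defaults, environ=None):
    """Return a copy of the defaults with user overrides applied on top."""
    merged = dict(defaults)
    merged.update(ray_overrides_from_env(environ))
    return merged
```

With something like this, the current defaults stay untouched unless a variable is set, and each of the three values can be changed independently of the others.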
Modin dependencies
modin : 0.31.0
ray : 2.34.0
dask : 2024.7.1
distributed : 2024.7.1
pandas dependencies
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 68.2.2
pip : 24.1.2
Cython : 0.29.37
pytest : 8.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.2
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None