Severity
P0 - Critical breaking issue or missing functionality
Current Behavior
Accessing deeplake dataset rows from worker processes, e.g. with `concurrent.futures.ProcessPoolExecutor`, fails with `AttributeError: ... object has no attribute 'index_params'`.
Consider the following script, which creates a dummy deeplake dataset and then reads a row from it in a worker process:
```python
# "import concurrent" alone does not load the futures submodule.
import concurrent.futures
from functools import partial

import deeplake
from deeplake import Dataset

DS_PATH = "/tmp/test_deeplake"


def create_deeplake_ds():
    ds = deeplake.empty(DS_PATH, overwrite=True)
    with ds:
        ds.create_tensor("dummy", htype="text")
        ds.dummy.append("dummy_test")


def worker(idx: int, ds: Dataset) -> None:
    print("Row", ds[idx])


if __name__ == "__main__":
    use_multi = True
    create_deeplake_ds()
    ds = deeplake.load(DS_PATH, read_only=True)
    if use_multi:
        # The partial (including ds) is pickled and sent to the worker process.
        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = list(executor.map(partial(worker, ds=ds), [0]))
    else:
        # Single-process path: works as expected.
        results = worker(0, ds=ds)
```
With deeplake 3.9.26, this gives the following error:
```
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/site-packages/deeplake/core/dataset/dataset.py", line 1380, in __getattr__
    return self.__getitem__(key)
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/site-packages/deeplake/core/dataset/dataset.py", line 582, in __getitem__
    raise TensorDoesNotExistError(item)
deeplake.util.exceptions.TensorDoesNotExistError: "Tensor 'index_params' does not exist."

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/process.py", line 202, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/process.py", line 202, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/Users/abhay/deep_test/deep.py", line 19, in worker
    print("Row", ds[idx])
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/site-packages/deeplake/core/dataset/dataset.py", line 653, in __getitem__
    index_params=self.index_params,
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/site-packages/deeplake/core/dataset/dataset.py", line 1382, in __getattr__
    raise AttributeError(
AttributeError: '<class 'deeplake.core.dataset.dataset.Dataset'>' object has no attribute 'index_params'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/abhay/deep_test/deep.py", line 29, in <module>
    results = list(executor.map(partial(worker, ds=ds), [0]))
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/process.py", line 567, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/_base.py", line 608, in result_iterator
    yield fs.pop().result()
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/Users/abhay/miniconda3/envs/test/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
AttributeError: '<class 'deeplake.core.dataset.dataset.Dataset'>' object has no attribute 'index_params'
```
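Reading the traceback: inside the worker, `self.index_params` is not found on the dataset object, so `Dataset.__getattr__` falls back to `__getitem__("index_params")`, treats it as a tensor name, and fails. My guess is that `index_params` does not survive pickling when `ProcessPoolExecutor` sends `partial(worker, ds=ds)` to the child process. The toy class below is a hypothetical, simplified model of that guess (`ToyDataset` and its methods are my own illustration, not deeplake's actual code), with a `pickle` round trip standing in for the transfer to the worker:

```python
import pickle


class ToyDataset:
    """Hypothetical, simplified model of the failure; NOT deeplake's Dataset code."""

    def __init__(self):
        self.tensors = {"dummy": ["dummy_test"]}
        self.index_params = None  # present in the parent process

    def __getitem__(self, item):
        if isinstance(item, str):
            # String keys are tensor names; unknown names are an error.
            if item not in self.tensors:
                raise KeyError(f"Tensor '{item}' does not exist.")
            return self.tensors[item]
        # Row access reads index_params, mirroring the traceback.
        _ = self.index_params
        return {name: values[item] for name, values in self.tensors.items()}

    def __getattr__(self, key):
        # Called only when normal lookup fails. Private/dunder names are
        # rejected so pickle's own probes (e.g. __setstate__) don't recurse.
        if key.startswith("_"):
            raise AttributeError(key)
        # Anything else is treated as a tensor name, like ds.dummy.
        try:
            return self.__getitem__(key)
        except KeyError as err:
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{key}'"
            ) from err

    def __getstate__(self):
        # Assumed bug: index_params is left out of the pickled state.
        return {"tensors": self.tensors}


original = ToyDataset()
print(original[0])  # {'dummy': 'dummy_test'}

# ProcessPoolExecutor pickles partial(worker, ds=ds); simulate that round trip.
clone = pickle.loads(pickle.dumps(original))
try:
    clone[0]
except AttributeError as err:
    print("AttributeError:", err)  # ... object has no attribute 'index_params'
```

The toy reproduces the same chain as the real traceback: a "tensor does not exist" error re-raised as `AttributeError` for `index_params`, and only after the object has crossed a pickle boundary.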
Steps to Reproduce
See the script under Current Behavior.
Expected/Desired Behavior
Either the documentation should state that accessing a dataset from multiple processes is not supported, or row access in a worker process should not raise the error above.
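The traceback shows `self.index_params` missing on the dataset object inside the worker process. If that attribute is being dropped when the dataset is pickled, one possible library-side remedy is to restore missing attributes on unpickling via `__setstate__`. This is a sketch under that assumption only: `PicklableToyDataset` is a hypothetical stand-in, and I have not checked how deeplake's `Dataset` actually implements `__getstate__`/`__setstate__`.

```python
import pickle


class PicklableToyDataset:
    """Hypothetical stand-in showing one possible fix; NOT deeplake's actual code."""

    def __init__(self):
        self.tensors = {"dummy": ["dummy_test"]}
        self.index_params = None

    def __getitem__(self, item):
        if isinstance(item, str):
            if item not in self.tensors:
                raise KeyError(f"Tensor '{item}' does not exist.")
            return self.tensors[item]
        _ = self.index_params  # row access needs index_params
        return {name: values[item] for name, values in self.tensors.items()}

    def __getattr__(self, key):
        # Reject private/dunder names so they are never mistaken for tensors.
        if key.startswith("_"):
            raise AttributeError(key)
        try:
            return self.__getitem__(key)
        except KeyError as err:
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{key}'"
            ) from err

    def __getstate__(self):
        # Assumed omission: index_params is still not part of the pickled state.
        return {"tensors": self.tensors}

    def __setstate__(self, state):
        # Fix: rebuild the instance dict, then restore defaults for any
        # attribute the pickled state did not carry.
        self.__dict__.update(state)
        self.__dict__.setdefault("index_params", None)


clone = pickle.loads(pickle.dumps(PicklableToyDataset()))
print(clone[0])     # {'dummy': 'dummy_test'}
print(clone.dummy)  # ['dummy_test']
```

Defaulting in `__setstate__` also keeps objects pickled by older versions loadable; the simpler alternative is to include `index_params` in the state returned by `__getstate__`.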
Python Version
Python 3.10.0
OS
MacOS Ventura 13.5
IDE
Terminal
Packages
deeplake==3.9.26
Additional Context
No response
Possible Solution
No response
Are you willing to submit a PR?
I'm willing to submit a PR (Thank you!)