Error when adding new images to run #718

Open
ddobie opened this issue Jun 26, 2024 · 4 comments
Labels: bug, high priority

Comments


ddobie commented Jun 26, 2024

The run is https://dev.pipeline.vast-survey.org/piperuns/77/, with log file 2024-06-26-05-36-59_log.txt.

The traceback is:

2024-06-26 07:01:03,006 runpipeline ERROR Processing error:
No objects to concatenate
Traceback (most recent call last):
  File "/usr/src/vast-pipeline/vast-pipeline-add-further-debug-logging/vast_pipeline/management/commands/runpipeline.py", line 340, in run_pipe
    pipeline.process_pipeline(p_run)
  File "/usr/src/vast-pipeline/vast-pipeline-add-further-debug-logging/vast_pipeline/pipeline/main.py", line 196, in process_pipeline
    sources_df = parallel_association(
  File "/usr/src/vast-pipeline/vast-pipeline-add-further-debug-logging/vast_pipeline/pipeline/association.py", line 1604, in parallel_association
    dd.from_pandas(images_df.set_index('skyreg_group'), npartitions=npartitions)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/base.py", line 292, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/base.py", line 575, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/multiprocessing.py", line 220, in get
    result = get_async(
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/local.py", line 508, in get_async
    raise_exception(exc, tb)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/local.py", line 221, in execute_task
    result = _execute_task(task, data)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
    return func(*args, **kwargs)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/dataframe/core.py", line 6350, in apply_and_enforce
    df = func(*args, **kwargs)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/dask/dataframe/groupby.py", line 170, in _groupby_slice_apply
    return g.apply(func, *args, **kwargs)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1414, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1455, in _python_apply_general
    values, mutated = self.grouper.apply(f, data, self.axis)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 761, in apply
    res = f(group)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1388, in f
    return func(g, *args, **kwargs)
  File "/usr/src/vast-pipeline/vast-pipeline-add-further-debug-logging/vast_pipeline/pipeline/association.py", line 1143, in association
    sources_df, skyc1_srcs = reconstruct_associtaion_dfs(
  File "/usr/src/vast-pipeline/vast-pipeline-add-further-debug-logging/vast_pipeline/pipeline/utils.py", line 1549, in reconstruct_associtaion_dfs
    measurements = pd.concat(
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 347, in concat
    op = _Concatenator(
  File "/usr/src/vast-pipeline/.venv/vast-pipeline-B4hVHlAK-py3.8/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
ddobie added the bug label on Jun 26, 2024

ddobie commented Jul 26, 2024

I've just tried to reproduce this in a local environment by running with data of the form

/import/ruby1/askap/PILOT/release/EPOCH0*/TILES/STOKESI_IMAGES/image.i.*VAST_0530-68*.fits

and then updating the config file to also add data of the form

/import/ruby1/askap/PILOT/release/EPOCH1*/TILES/STOKESI_IMAGES/image.i.*VAST_0530-68*.fits

I'm using fairly standard settings, with forced fitting turned off (which I thought might be the root cause of the issue, given I haven't seen it occur for runs with forced fits).

I'm unable to reproduce the issue, so it may be something that only occurs with larger runs. I have added some more debug statements in #725 to hopefully make fixing this easier if it occurs again.
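
For anyone attempting the same reproduction, a quick way to sanity-check what each pattern actually matches before updating the run config. This is a minimal sketch using the paths quoted above; the use of Python's glob module here is purely illustrative and not part of the pipeline:

from glob import glob

# Images used for the initial run (EPOCH0* only).
initial = sorted(glob(
    "/import/ruby1/askap/PILOT/release/EPOCH0*/TILES/STOKESI_IMAGES/"
    "image.i.*VAST_0530-68*.fits"
))

# Additional images picked up once the config is updated (EPOCH1*).
added = sorted(glob(
    "/import/ruby1/askap/PILOT/release/EPOCH1*/TILES/STOKESI_IMAGES/"
    "image.i.*VAST_0530-68*.fits"
))

print(f"{len(initial)} initial images, {len(added)} newly added images")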

ddobie added a commit that referenced this issue Aug 9, 2024
* Add get_memory_usage function

* Rewrite memory usage functions

* Added memory usage logging to finalise.py

* Added logging to forced_extraction.py

* Added debug logging to loading.py

* Added debug logging to model_generator.py

* Added debug logging to main.py

* Added debug logging to new_sources.py

* Import psutil

* Fix truncated variable name

* Fixed logging in calculate_n_partitions

* Add some logging to partially address #718

* ?

* PEP8

* PEP8

* More PEP8

* PEP8 hell

* Updated changelog

ddobie commented Sep 18, 2024

Complete speculation, but I think this might be occurring when new images are added to the run but not every field receives a new image. Will need to investigate further.

ddobie added the high priority label on Sep 18, 2024

ddobie commented Sep 18, 2024

I've investigated further and the above doesn't seem to be the explanation.


ajstewart commented Sep 19, 2024

This error only occurs when this list is empty or full of non-dataframe objects:

measurements = pd.concat(
    [pd.read_parquet(f, columns=cols) for f in img_meas_paths]
)
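
For context, pandas raises exactly the error seen in the traceback whenever concat is handed an empty iterable, so an empty img_meas_paths would reproduce it on its own. A minimal standalone illustration, independent of the pipeline code:

import pandas as pd

# If no "done" images are found, the list comprehension yields nothing
# and pd.concat refuses to concatenate an empty sequence.
img_meas_paths = []  # hypothetical: empty list of parquet paths
try:
    measurements = pd.concat(
        [pd.read_parquet(f) for f in img_meas_paths]
    )
except ValueError as exc:
    print(exc)  # prints: No objects to concatenate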

So my guess would be that img_meas_paths is empty. Were these log messages in place when this error happened?

logger.debug(images_df_done)
logger.debug(images_df_done['image_dj'])
# Get the parquet paths from the image objects
img_meas_paths = (
    images_df_done['image_dj'].apply(lambda x: x.measurements_path)
    .to_list()
)
logger.debug(img_meas_paths)

I'd possibly be suspicious of the images_df_done that is being passed in here:

image_mask = images_df['image_name'].isin(done_images_df['name'])
images_df_done = images_df[image_mask].copy()

Is there something corrupted or strange with the images in the large run that would cause that not to be generated properly? Possibly none of the images are considered "done", so that list would be empty.
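
If that is the cause, one way to confirm it the next time the error appears would be to fail early with an explicit message before the concat. This is a sketch only, reusing the dataframe names from the snippets above; the wrapper function and its name are hypothetical, not existing pipeline code:

import pandas as pd


def measurement_paths_for_done_images(
    images_df: pd.DataFrame, done_images_df: pd.DataFrame
) -> list:
    """Return measurement parquet paths, failing loudly if nothing matches."""
    image_mask = images_df['image_name'].isin(done_images_df['name'])
    images_df_done = images_df[image_mask].copy()

    if images_df_done.empty:
        # The suspected failure mode: none of the images are considered
        # "done", so the later pd.concat would have nothing to concatenate.
        raise ValueError(
            f"No 'done' images matched: {len(images_df)} rows in images_df, "
            f"{len(done_images_df)} names in done_images_df."
        )

    return (
        images_df_done['image_dj']
        .apply(lambda x: x.measurements_path)
        .to_list()
    )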
