Investigate CSV write error on rapids branch blocking job completion #478

Open
johnhbenetech opened this issue Mar 30, 2022 · 1 comment

johnhbenetech commented Mar 30, 2022

When testing the rapids branch from both the UI and the terminal, I hit the following error, which prevents matches from being generated and the semantic search index from being prepared.

[2022-03-30 20:03:39,020: INFO] [luigi-interface] Running Worker with 1 processes
[2022-03-30 20:03:39,020: INFO] [luigi-interface] [pid 10041] Worker Worker(salt=146995060, workers=1, host=f90e167d6ad7, username=root, pid=10041) running   CondenseFingerprintsTask(config=Config(sources=SourcesConfig(root='data/', extensions=['mp4', 'ogv', 'webm', 'avi', 'flv', 'mkv'], hash_mode='file', hash_cache='data/representations/hashes'), repr=RepresentationConfig(directory='data/representations', storage_type=<StorageType.DETECT: 'detect'>), database=DatabaseConfig(use=True, uri='postgresql://postgres:admin@postgres:5432/videodeduplicationdb'), processing=ProcessingConfig(video_list_filename='video_dataset_list.txt', match_distance=0.75, filter_dark_videos=True, filter_dark_videos_thr=2, min_video_duration_seconds=3, detect_scenes=True, minimum_scene_duration=2, pretrained_model_local_path=None, frame_sampling=1, save_frames=False, keep_fileoutput=True), templates=TemplatesConfig(source_path='data/templates/', distance=0.07, distance_min=0.05, override=False, extensions=('png', 'jpg', 'jpeg')), security=SecurityConfig(master_key_path=None), file_storage=FileStorageConfig(directory='file-storage'), logging=LoggingConfig(file_path='./processing_error.log', file_format='[%(asctime)s: %(levelname)s] [%(name)s] %(message)s', file_level=<LogLevel.ERROR: 40>, console_format='[%(asctime)s: %(levelname)s] %(message)s', console_level=<LogLevel.INFO: 20>)), prefix=., fingerprint_size=500)
[2022-03-30 20:03:39,021: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frames
[2022-03-30 20:03:39,022: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frame_level
[2022-03-30 20:03:39,023: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_level
[2022-03-30 20:03:39,023: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_signatures
[2022-03-30 20:03:39,184: INFO] [task.CondenseFingerprintsTask] Reading existing condensed fingerprints
[2022-03-30 20:03:39,184: INFO] [task.CondenseFingerprintsTask] Loaded 0 previously condensed fingerprints
[2022-03-30 20:03:39,262: INFO] [task.CondenseFingerprintsTask] Collecting file-keys since the very beginning
[2022-03-30 20:03:39,415: INFO] [task.CondenseFingerprintsTask] Collected 444 file keys
[2022-03-30 20:03:39,415: INFO] [task.CondenseFingerprintsTask] Reading fingerprints
[2022-03-30 20:03:39,416: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frames
[2022-03-30 20:03:39,416: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frame_level
[2022-03-30 20:03:39,417: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_level
[2022-03-30 20:03:39,418: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_signatures
[2022-03-30 20:03:39,687: INFO] [task.CondenseFingerprintsTask] Creating ndarray with fingerprints
[2022-03-30 20:03:39,687: INFO] [task.CondenseFingerprintsTask] Creating file-keys DataFrame
[2022-03-30 20:03:39,688: INFO] [task.CondenseFingerprintsTask] Loaded 444 new fingerprints.
[2022-03-30 20:03:39,688: INFO] [task.CondenseFingerprintsTask] Writing 444 fingerprints to ['data/representations/condensed_fingerprints/condensed_fingerprints__2022_03_25_192103493531.npy', 'data/representations/condensed_fingerprints/condensed_fingerprints__2022_03_25_192103493531.files.csv']
[2022-03-30 20:03:41,690: ERROR] [luigi-interface] [pid 10041] Worker Worker(salt=146995060, workers=1, host=f90e167d6ad7, username=root, pid=10041) failed    CondenseFingerprintsTask(config=Config(sources=SourcesConfig(root='data/', extensions=['mp4', 'ogv', 'webm', 'avi', 'flv', 'mkv'], hash_mode='file', hash_cache='data/representations/hashes'), repr=RepresentationConfig(directory='data/representations', storage_type=<StorageType.DETECT: 'detect'>), database=DatabaseConfig(use=True, uri='postgresql://postgres:admin@postgres:5432/videodeduplicationdb'), processing=ProcessingConfig(video_list_filename='video_dataset_list.txt', match_distance=0.75, filter_dark_videos=True, filter_dark_videos_thr=2, min_video_duration_seconds=3, detect_scenes=True, minimum_scene_duration=2, pretrained_model_local_path=None, frame_sampling=1, save_frames=False, keep_fileoutput=True), templates=TemplatesConfig(source_path='data/templates/', distance=0.07, distance_min=0.05, override=False, extensions=('png', 'jpg', 'jpeg')), security=SecurityConfig(master_key_path=None), file_storage=FileStorageConfig(directory='file-storage'), logging=LoggingConfig(file_path='./processing_error.log', file_format='[%(asctime)s: %(levelname)s] [%(name)s] %(message)s', file_level=<LogLevel.ERROR: 40>, console_format='[%(asctime)s: %(levelname)s] %(message)s', console_level=<LogLevel.INFO: 20>)), prefix=., fingerprint_size=500)
Traceback (most recent call last):
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/project/winnow/pipeline/luigi/condense.py", line 263, in run
    target.write(condensed, new_results_time)
  File "/project/winnow/pipeline/luigi/condense.py", line 206, in write
    condensed.file_keys_df.to_csv(keys_out)
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/core/generic.py", line 3466, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1105, in to_csv
    csv_formatter.save()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 257, in save
    self._save()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 262, in _save
    self._save_body()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 300, in _save_body
    self._save_chunk(start_i, end_i)
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 311, in _save_chunk
    libwriters.write_csv_rows(
  File "pandas/_libs/writers.pyx", line 55, in pandas._libs.writers.write_csv_rows
TypeError: write() argument must be str, not bytes
[2022-03-30 20:03:41,698: ERROR] [task_queue.tasks] Error occurred while executing luigi tasks: CondenseFingerprintsTask(config=Config(sources=SourcesConfig(root='data/', extensions=['mp4', 'ogv', 'webm', 'avi', 'flv', 'mkv'], hash_mode='file', hash_cache='data/representations/hashes'), repr=RepresentationConfig(directory='data/representations', storage_type=<StorageType.DETECT: 'detect'>), database=DatabaseConfig(use=True, uri='postgresql://postgres:admin@postgres:5432/videodeduplicationdb'), processing=ProcessingConfig(video_list_filename='video_dataset_list.txt', match_distance=0.75, filter_dark_videos=True, filter_dark_videos_thr=2, min_video_duration_seconds=3, detect_scenes=True, minimum_scene_duration=2, pretrained_model_local_path=None, frame_sampling=1, save_frames=False, keep_fileoutput=True), templates=TemplatesConfig(source_path='data/templates/', distance=0.07, distance_min=0.05, override=False, extensions=('png', 'jpg', 'jpeg')), security=SecurityConfig(master_key_path=None), file_storage=FileStorageConfig(directory='file-storage'), logging=LoggingConfig(file_path='./processing_error.log', file_format='[%(asctime)s: %(levelname)s] [%(name)s] %(message)s', file_level=<LogLevel.ERROR: 40>, console_format='[%(asctime)s: %(levelname)s] %(message)s', console_level=<LogLevel.INFO: 20>)), prefix=., fingerprint_size=500), write() argument must be str, not bytes
[2022-03-30 20:03:41,705: INFO] [luigi-interface] Informed scheduler that task   CondenseFingerprintsTask_Config_sources_S_500___8510f14eba   has status   FAILED
[2022-03-30 20:03:41,709: INFO] [luigi-interface] 
===== Luigi Execution Summary =====

Scheduled 5 tasks of which:
* 2 complete ones were encountered:
    - 1 ExifTask(...)
    - 1 SignaturesTask(...)
* 1 failed:
    - 1 CondenseFingerprintsTask(...)
* 2 were left pending, among these:
    * 2 had failed dependencies:
        - 1 AnnoyIndexTask(...)
        - 1 DBMatchesTask(...)
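
The final `TypeError` in the traceback is the message Python's text-mode file objects raise when they are handed `bytes`. The sketch below reproduces only that message with the standard library. It does not establish why pandas ends up in that state on the rapids branch. The names `reproduce_write_error` and `write_text` are hypothetical and are not part of winnow or pandas.

```python
import io


def reproduce_write_error() -> str:
    """Write bytes to a text-mode handle and return the resulting error message."""
    text_handle = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    try:
        text_handle.write(b"path,hash\n")  # bytes passed to a str-only handle
    except TypeError as exc:
        return str(exc)
    return ""


def write_text(handle, text: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: write text to a handle whether it expects str or bytes."""
    try:
        handle.write(text)
    except TypeError:
        # The handle rejected str, so assume it is binary and encode first.
        handle.write(text.encode(encoding))


# On CPython this prints the same message as the last line of the traceback.
print(reproduce_write_error())
```

If the cause does turn out to be a mismatch between the handle's mode and what pandas writes to it, one possible workaround is to render the CSV with `file_keys_df.to_csv()` (which returns a `str` when no target is given) and pass the result to a helper like `write_text`.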

johnhbenetech commented

@fsbatista confirmed this is not happening on the development branch.
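
Since the failure is reported only on the rapids branch, one way to start narrowing it down is to compare the package versions installed in each branch's environment. The following is a minimal sketch using only the standard library. The function name `installed_versions` and the choice of packages are assumptions (pandas and luigi are picked because they appear in the traceback).

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package list is an assumption; adjust it to whatever differs between branches.
for name, version in installed_versions(["pandas", "luigi"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both the rapids and development environments and diffing the output would show whether the two branches resolve different versions of the libraries involved in the failing `to_csv` call.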
