Reason to set min-scene-len == TimeCodeValue("0.6s") as a default value #477

Open
awkrail opened this issue Jan 30, 2025 · 4 comments

@awkrail
Collaborator

awkrail commented Jan 30, 2025

Description:

Thank you for your work! I downloaded a video from here and applied HistogramDetector to it via both the Python API and the command line. I found that they output different scene cuts, and that the min-scene-len parameter differs: the Python API uses a default of 15 (frames), while the command line uses TimeCodeValue("0.6s") in the code.
Could you tell me why min-scene-len defaults to TimeCodeValue("0.6s")?
Example:

Python API

from scenedetect import detect, HistogramDetector
scene_list = detect('RoripwjYFp8_60.0_210.0.mp4', HistogramDetector())

Command line

scenedetect -i RoripwjYFp8_60.0_210.0.mp4 list-scenes detect-hist

Environment:
PySceneDetect: 0.6.5
Ubuntu

Media/Files:
Input video file:

RoripwjYFp8_60.0_210.0.mp4
@Breakthrough
Owner

Breakthrough commented Jan 30, 2025

I should start this answer off by pointing out that the current default values were both chosen very empirically by myself or another developer. Unfortunately, I haven't had the time to find a good dataset to use to find an optimal value, but I am more than open to any suggestions for how we should approach that. Feedback on the default values is always welcome too.

As for why the values are different, right now the SceneDetector interface assumes frame numbers are ints. This is something that I think could and should be fixed so that it takes a FrameTimecode instead. However, in order to create a FrameTimecode you must have the video framerate. This means we cannot set the default value for min_scene_len in the API in seconds, as we do not know the FPS of the video we are going to process statically.
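
To illustrate why a fixed frame count cannot stand in for a duration, here is a minimal sketch in plain Python. It is not PySceneDetect code, and `frames_to_seconds` is a hypothetical helper introduced only for this example. It prints how long 15 frames last at a few common framerates:

```python
def frames_to_seconds(num_frames: int, fps: float) -> float:
    """Duration in seconds covered by `num_frames` at `fps` frames per second.

    Hypothetical helper for illustration; not part of the scenedetect API.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# A default of 15 frames maps to a different minimum scene duration
# depending on the framerate of the video being processed.
for fps in (23.976, 25.0, 30.0, 60.0):
    print(f"{fps:>6} fps: 15 frames = {frames_to_seconds(15, fps):.3f} s")
```

At 60 FPS the same 15 frames cover only 0.25 s, so a frame-based default and a 0.6 s default can disagree noticeably on high-framerate video.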

The config file/command line parser handles this by storing the value as a string until after the video is loaded. As for why the values are the way they are, I recall finding that 0.6 seconds seemed like a reasonable default relative to what one would consider strobing/flashing images (e.g. as in #35), and I was using a 23.976 FPS video to test. I rounded the frame count up from 14.3 to 15 and set that as the default for the API. To be honest, I can't recall if it has ever been changed since.
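
The "store it as a string until the framerate is known" idea can be sketched as follows. This is only an illustration under stated assumptions: `DeferredDuration` and `to_frames` are hypothetical names, not the actual config parser, and rounding up is assumed because that is what the comment above describes (the library's own conversion may round differently).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredDuration:
    """Hypothetical holder for a duration string such as '0.6s'."""

    text: str

    def to_frames(self, fps: float) -> int:
        """Resolve to a frame count once the framerate is known, rounding up."""
        if fps <= 0:
            raise ValueError("fps must be positive")
        if not self.text.endswith("s"):
            raise ValueError(f"expected a value like '0.6s', got {self.text!r}")
        seconds = float(self.text[:-1])
        # Round away floating-point noise before taking the ceiling, so that
        # e.g. 0.1 s at 30 fps resolves to 3 frames rather than 4.
        return math.ceil(round(seconds * fps, 9))


# The default can be declared before any video is opened...
default_min_scene_len = DeferredDuration("0.6s")

# ...and resolved later, when the framerate of the loaded video is available.
print(0.6 * 23.976)                             # fractional frame count
print(default_min_scene_len.to_frames(23.976))  # whole frames, rounded up
```

With a 23.976 FPS video, 0.6 s is a little over 14 frames, which rounds up to the 15 used as the API default.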


These are things I think should be fixed, namely:

  • SceneDetector classes should take FrameTimecode objects instead of integer frame numbers. The unit of time between frames is also potentially useful information for a detector.
  • FrameTimecode objects should be able to be created in the absence of a framerate. In the event an operation can't be calculated due to missing information (e.g. you want to add frames with seconds, but neither have a defined framerate), an error can be thrown.
  • Eventually, a better default for min_scene_len should be chosen more systematically, even if the end result is still somewhat empirical (e.g. a survey).
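
The second item above could look roughly like the sketch below. It is one possible design, not the existing FrameTimecode class: `LazyTimecode` and all of its fields and methods are hypothetical. Values in the same unit can be added without a framerate, and an error is raised only when frames and seconds are mixed and neither operand knows the framerate.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LazyTimecode:
    """Hypothetical timecode: holds frames OR seconds; the framerate is optional."""

    frames: Optional[int] = None
    seconds: Optional[float] = None
    fps: Optional[float] = None

    def __post_init__(self) -> None:
        if (self.frames is None) == (self.seconds is None):
            raise ValueError("specify exactly one of frames or seconds")

    def as_seconds(self) -> float:
        """Return the value in seconds; frames require a known framerate."""
        if self.seconds is not None:
            return self.seconds
        if self.fps is None:
            raise ValueError("cannot convert frames to seconds without a framerate")
        return self.frames / self.fps

    def __add__(self, other: "LazyTimecode") -> "LazyTimecode":
        if self.fps is not None and other.fps is not None and self.fps != other.fps:
            raise ValueError("operands have different framerates")
        fps = self.fps if self.fps is not None else other.fps
        # Same unit on both sides: no framerate is needed.
        if self.frames is not None and other.frames is not None:
            return LazyTimecode(frames=self.frames + other.frames, fps=fps)
        if self.seconds is not None and other.seconds is not None:
            return LazyTimecode(seconds=self.seconds + other.seconds, fps=fps)
        # Mixed units: at least one operand must supply the framerate.
        if fps is None:
            raise ValueError("cannot add frames and seconds without a framerate")
        total = replace(self, fps=fps).as_seconds() + replace(other, fps=fps).as_seconds()
        return LazyTimecode(seconds=total, fps=fps)


# Frames + frames works with no framerate at all.
print((LazyTimecode(frames=10) + LazyTimecode(frames=5)).frames)

# Frames + seconds works once either operand carries a framerate.
mixed = LazyTimecode(frames=15, fps=30.0) + LazyTimecode(seconds=0.5)
print(mixed.as_seconds())
```

Under this design a detector could declare a default such as `LazyTimecode(seconds=0.6)` statically and leave the conversion to frames until a video is opened.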

I'll make sure the above items at least have a GitHub issue filed against them before closing this out.


Lastly, how do you find the default values for your purpose? Do they work well or are there any changes you would make?

Thanks for the question, hope this helps!

@awkrail
Collaborator Author

awkrail commented Jan 30, 2025

@Breakthrough
Thank you for the quick and detailed response! I understand how the values are determined.

> The config file/command line parser handles this by storing the value as a string until after the video is loaded. As for why the values are the way they are, I recall finding that 0.6 seconds seemed like a reasonable default relative to what one would consider strobing/flashing images (e.g. as in #35), and I was using a 23.976 FPS video to test. I rounded the frame count up from 14.3 to 15 and set that as the default for the API. To be honest, I can't recall if it has ever been changed since.

I applied different detectors to several videos, and most of the cuts are accurate, so min_scene_len=15 seems to be a reasonable value. But as you pointed out, these values, including threshold, should probably be determined with a more statistical approach. I will investigate whether suitable training datasets exist.

>   • SceneDetector classes should take FrameTimecode objects instead of integer frame numbers. The unit of time between frames is also potentially useful information for a detector.
>   • FrameTimecode objects should be able to be created in the absence of a framerate. In the event an operation can't be calculated due to missing information (e.g. you want to add frames with seconds, but neither have a defined framerate), an error can be thrown.
>   • Eventually, a better default for min_scene_len should be chosen more systematically, even if the end result is still somewhat empirical (e.g. a survey).

These seem interesting. I will think about how to address these issues.

> Lastly, how do you find the default values for your purpose? Do they work well or are there any changes you would make?

Inspired by PySceneDetect, I am developing Shutoh, which is yet another scene detector, written in C++20. This is my personal project, and I thought that implementing PySceneDetect in C++ would be a good starting point for learning C++. In addition, PySceneDetect is a great project, so I am thinking about how to contribute to it.

@Breakthrough
Owner

> Inspired by PySceneDetect, I am developing Shutoh, which is yet another scene detector, written in C++20. This is my personal project, and I thought that implementing PySceneDetect in C++ would be a good starting point for learning C++. In addition, PySceneDetect is a great project, so I am thinking about how to contribute to it.

Very cool - I've wanted to do something similar for a while, but haven't had time to pursue it. It looks like you have made a lot of progress, that is great to see!

There are many API changes I wish I could make, but can't anymore (at least not without causing significant churn) due to how many packages depend on the scenedetect API. If you are curious, I can discuss with you further some of those changes if you think it would help your project. If you are only making a CLI tool though, the API is less important than the user interface :)

@awkrail
Collaborator Author

awkrail commented Feb 1, 2025

> Very cool - I've wanted to do something similar for a while, but haven't had time to pursue it. It looks like you have made a lot of progress, that is great to see!

Thank you very much! As discussed in SceneStats, my major concern is speed. PySceneDetect runs at a reasonable speed on 2-3 minute videos (2.0 seconds via the API), but it takes much longer on videos in the 10-minute range. This motivated me to develop Shutoh.

> There are many API changes I wish I could make, but can't anymore (at least not without causing significant churn) due to how many packages depend on the scenedetect API. If you are curious, I can discuss with you further some of those changes if you think it would help your project. If you are only making a CLI tool though, the API is less important than the user interface :)

Thank you! I am very interested in that. Currently I am focusing on developing the CLI tool, but I am going to develop APIs in the future. I will join the PySceneDetect Discord channel, and I would be glad to talk with you there.
