Consider video length #68

Open
DuendeInexistente opened this issue Sep 3, 2024 · 2 comments

Comments

@DuendeInexistente

I find it odd that video length isn't taken into consideration. At the very least, videos with different lengths should be given a greater dupe distance.

@ianwal
Collaborator

ianwal commented Sep 28, 2024

I thought about treating videos of different lengths as non-duplicates, but some people requested that a small clip contained in a larger video still be included, so I just left it as is.

Is there a specific reason that you would prefer to not consider them? Are you getting lots of small clips that are included in larger videos marked as duplicates?

@DuendeInexistente
Author

It's more that it creates a lot of false positives. I imagine computing the distance from the actual differences between videos would be a nightmare, both in processing power and in programming effort, so making length affect the distance should be a good compromise.

IMO the best implementation, if we want to base it on available data alone, would be: only videos with the same number of frames and the same length are distance 0; videos with only one of the two matching (i.e. a gif with the same frames but played slower, so a different length, or the same length but twice the frames) are distance 2; and anything that matches neither is distance 4. This keeps clips of larger videos as positives but gives you degrees of closeness for filtering out the same video from different sources. Resolution could be counted as a data point too, although one of the main purposes of deduplicating is picking the highest resolution. Maybe make resolution-frames-time distance 0 and frames-time distance 2, with only-one as 4 and neither as 6?
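
A rough sketch of the two tier schemes described above, in Python. Everything here is hypothetical and not taken from the project's code: the `VideoMeta` container, both function names, and the `tol_s` tolerance for comparing lengths are assumptions made for illustration. The second function assumes "only-one" means only one of frame count or length matches, with resolution acting as a tie-breaker only when both match. How such a value would be combined with the existing dupe distance is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMeta:
    # Hypothetical metadata container; field names are illustrative only.
    frame_count: int
    duration_s: float
    width: int
    height: int


def _matches(a: VideoMeta, b: VideoMeta, tol_s: float) -> tuple[bool, bool, bool]:
    """Return (same_frames, same_length, same_resolution) for two videos."""
    same_frames = a.frame_count == b.frame_count
    # tol_s is an assumed tolerance, since container durations rarely match exactly.
    same_length = abs(a.duration_s - b.duration_s) <= tol_s
    same_res = (a.width, a.height) == (b.width, b.height)
    return same_frames, same_length, same_res


def metadata_distance(a: VideoMeta, b: VideoMeta, tol_s: float = 0.05) -> int:
    """First scheme: both match -> 0, only one matches -> 2, neither -> 4."""
    same_frames, same_length, _ = _matches(a, b, tol_s)
    if same_frames and same_length:
        return 0
    if same_frames or same_length:
        return 2
    return 4


def metadata_distance_with_resolution(
    a: VideoMeta, b: VideoMeta, tol_s: float = 0.05
) -> int:
    """Second scheme: resolution+frames+time -> 0, frames+time -> 2,
    only one of frames/time -> 4, neither -> 6."""
    same_frames, same_length, same_res = _matches(a, b, tol_s)
    if same_frames and same_length:
        return 0 if same_res else 2
    if same_frames or same_length:
        return 4
    return 6


if __name__ == "__main__":
    original = VideoMeta(frame_count=300, duration_s=10.0, width=1920, height=1080)
    downscaled = VideoMeta(frame_count=300, duration_s=10.0, width=1280, height=720)
    slowed = VideoMeta(frame_count=300, duration_s=20.0, width=1920, height=1080)
    clip = VideoMeta(frame_count=90, duration_s=3.0, width=1920, height=1080)

    for name, other in [("downscaled", downscaled), ("slowed", slowed), ("clip", clip)]:
        print(
            name,
            metadata_distance(original, other),
            metadata_distance_with_resolution(original, other),
        )
```

With either scheme, a clip cut from a larger video lands in the farthest tier rather than being excluded outright, so it can still be reported while being filtered separately from re-encodes of the same video.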
