I thought about treating videos of different lengths as non-duplicates, but some people requested that a short clip contained in a larger video still be matched, so I left it as is.
Is there a specific reason that you would prefer to not consider them? Are you getting lots of small clips that are included in larger videos marked as duplicates?
It's more that it creates a lot of false positives, and I imagine computing a distance from the actual differences would be a nightmare in terms of both processing power and programming. So making length affect the distance should be a good compromise.
IMO the best implementation, if we want to base it on available data alone, would be: only videos with the same number of frames and the same length are distance 0; videos where only one of the two matches (i.e. a gif with the same frames but slower, so a different length, or the same length but twice the frames) are distance 2; and anything that matches neither is distance 4. This keeps clips of larger videos as positives but gives you degrees of closeness to filter out the same video from different sources. Resolution could be counted as a data point too, but one of the main purposes of deduplicating is picking the highest resolution. Maybe make a match on resolution, frames and time distance 0, a match on frames and time distance 2, only one matching distance 4, and neither distance 6?
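To make the proposed scoring concrete, here is a minimal Python sketch of the tiers described above. It is only an illustration: `VideoMeta`, its field names, and `metadata_distance` are hypothetical and not part of the project's code. It also assumes durations are compared for exact equality, where a real implementation would probably want a tolerance.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VideoMeta:
    # Hypothetical container; field names are assumptions for illustration.
    frame_count: int
    duration_ms: int
    resolution: Tuple[int, int]  # (width, height)


def metadata_distance(a: VideoMeta, b: VideoMeta, use_resolution: bool = False) -> int:
    """Score two videos using only their metadata, per the tiers proposed above."""
    frames_match = a.frame_count == b.frame_count
    duration_match = a.duration_ms == b.duration_ms
    matches = int(frames_match) + int(duration_match)

    # Base scheme: both match -> 0, exactly one matches -> 2, neither -> 4.
    distance = (2 - matches) * 2

    if use_resolution:
        # Variant: only a full resolution + frames + duration match stays at 0;
        # every other case moves up one tier (2, 4, 6).
        if not (matches == 2 and a.resolution == b.resolution):
            distance += 2

    return distance
```

For example, a gif and a slowed-down copy with the same frame count would score 2 in the base scheme, while a short clip cut from a longer video would score 4 and could still be reported as a looser match.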
I find it odd that video length isn't taken into consideration. At the very least, videos with different lengths should be given a greater dupe distance.