ignore_name, skip_empty
e3rd committed Mar 13, 2024
1 parent d31fd3e commit 9b3249c
Showing 9 changed files with 409 additions and 280 deletions.
35 changes: 21 additions & 14 deletions README.md
@@ -19,6 +19,8 @@ Works great when the files keep more or less the same name. (Photos downloaded f

You can impose the same file *mtime*, tolerate a few hours (to correct timezone confusion) or ignore the date altogether.

Note: differences smaller than a second are ignored.

* The file size, the image hash or the video frame count.

The file must have the same size. Or take advantage of the media magic under the hood which ignores the file size but compares the image or the video inside. It is great whenever you end up with some files converted to a different format.
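
A minimal sketch of the sub-second rule noted above, assuming it amounts to comparing the absolute *mtime* difference against one second. `same_mtime` is a hypothetical helper for illustration, not part of deduplidog, and the library may implement the check differently.

```python
def same_mtime(work_mtime: float, original_mtime: float) -> bool:
    """Treat two modification times (seconds since the epoch) as equal
    when they differ by less than one second. Hypothetical helper."""
    return abs(work_mtime - original_mtime) < 1


# In practice the values would come from Path(...).stat().st_mtime.
print(same_mtime(1_700_000_000.2, 1_700_000_000.9))
print(same_mtime(1_700_000_000.0, 1_700_000_001.5))
```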
@@ -66,7 +68,7 @@ Warnings: 1
We found out that all the files in the *duplicates* folder but one seem to be useless. Its date is earlier than the original one. See with full log.

```python3
Deduplidog("/home/user/duplicates", "/media/disk/origs", ignore_date=True, rename=True, set_both_to_older_date=True, logging_level=logging.INFO)
Deduplidog("/home/user/duplicates", "/media/disk/origs", ignore_date=True, rename=True, set_both_to_older_date=True, log_level=logging.INFO)
```

```
@@ -116,27 +118,30 @@ from deduplidog import Deduplidog

Or change these parameters from the CLI or TUI by launching `deduplidog`.

Find the duplicates. Normally, the file must have the same size, date and name. (Name might be just similar if parameters like strip_end_counter are set.) If media_magic=True, media files receive different rules: Neither the size nor the date are compared. See its help.
Find the duplicates. Normally, the file must have the same size, date and name. (Name might be just similar if parameters like strip_end_counter are set.) If `media_magic=True`, media files receive different rules: Neither the size nor the date are compared. See its help.
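
To illustrate what "just similar" names can mean, here is a rough sketch of name normalization built from the examples given for `casefold`, `space2char`, `strip_end_counter` and `strip_suffix` in the parameter table. `comparable_name` is a hypothetical helper, and the choice of `_` as the replacement character for `space2char=True` is an assumption taken from the table's example; the real implementation may differ.

```python
import re
from pathlib import Path


def comparable_name(path, *, casefold=False, space2char=False,
                    strip_end_counter=False, strip_suffix=None):
    """Return the file name in the form used for comparison (hypothetical helper)."""
    p = Path(path)
    stem, suffix = p.stem, p.suffix
    if space2char:
        # Assumption: True means "_"; a string supplies the character itself.
        stem = stem.replace(" ", space2char if isinstance(space2char, str) else "_")
    if strip_end_counter:
        # Drop a trailing counter such as "(3)".
        stem = re.sub(r"\(\d+\)$", "", stem)
    if strip_suffix:
        # strip_suffix is treated as a regular expression anchored at the stem end.
        stem = re.sub(rf"(?:{strip_suffix})$", "", stem)
    name = stem + suffix
    return name.casefold() if casefold else name


print(comparable_name("file 012.jpg", space2char=True))        # file_012.jpg
print(comparable_name("00034(3).MTS", strip_end_counter=True))  # 00034.MTS
print(comparable_name("001-edited.jpg", strip_suffix="-edited"))  # 001.jpg
```

Two files would then be name-matched when their normalized names are equal.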

| parameter | type | default | description |
|-----------|------|---------|-------------|
| work_dir | str \| Path | - | Folder of the files suspected to be duplicates. |
| original_dir | str \| Path | - | Folder of the original files. Normally, these files will not be affected.<br> (However, they might get affected by treat_bigger_as_original or set_both_to_older_date). |
| original_dir | str \| Path | - | Folder of the original files. Normally, these files will not be affected.<br> (However, they might get affected by `treat_bigger_as_original` or `set_both_to_older_date`). |
| **Actions** |
| execute | bool | False | If False, nothing happens, just a safe run is performed. |
| bashify | bool | False | Print bash commands that correspond to the actions that would have been executed if execute were True.<br> You can check and run them yourself. |
| affect_only_if_smaller | bool | False | If media_magic=True, all writing actions like rename, replace_with_original, set_both_to_older_date and treat_bigger_as_original<br> are executed only if the affectable file is smaller than the other. |
| rename | bool | False | If execute=True, prepend ✓ to the duplicated work file name (or possibly to the original file name if treat_bigger_as_original).<br> Mutually exclusive with replace_with_original and delete. |
| delete | bool | False | If execute=True, delete the duplicated work file (or possibly the original file if treat_bigger_as_original).<br>Mutually exclusive with replace_with_original and rename. |
| replace_with_original | bool | False | If execute=True, replace duplicated work file with the original (or possibly vice versa if treat_bigger_as_original).<br>Mutually exclusive with rename and delete. |
| set_both_to_older_date | bool | False | If execute=True, media_magic=True or (media_magic=False and ignore_date=True), both files are set to the older date. Ex: work file gets the original file's date or vice versa. |
| treat_bigger_as_original | bool | False | If execute=True and rename=True and media_magic=True, the original file might be affected (by renaming) if smaller than the work file. |
| rename | bool | False | If `execute=True`, prepend ✓ to the duplicated work file name (or possibly to the original file name if treat_bigger_as_original).<br>Mutually exclusive with `replace_with_original` and `delete`. |
| delete | bool | False | If `execute=True`, delete the duplicated work file (or possibly the original file if treat_bigger_as_original).<br>Mutually exclusive with replace_with_original and rename. |
| replace_with_original | bool | False | If `execute=True`, replace duplicated work file with the original (or possibly vice versa if treat_bigger_as_original).<br>Mutually exclusive with rename and delete. |
| set_both_to_older_date | bool | False | If `execute=True`, `media_magic=True` or (media_magic=False and `ignore_date=True`), both files are set to the older date. Ex: work file gets the original file's date or vice versa. |
| treat_bigger_as_original | bool | False | If `execute=True` and `rename=True` and `media_magic=True`, the original file might be affected (by renaming) if smaller than the work file. |
| skip_bigger | bool | False | If `media_magic=True`, all writing actions, such as `rename`, `replace_with_original`, `set_both_to_older_date` and `treat_bigger_as_original` are executed only if the affectable file is smaller (or the same size) than the other. |
| skip_empty | bool | False | Skip files with zero size. |
| neglect_warning | bool | False | By default, when a file with bigger size or older date should be affected, only a warning is generated. Turn this on to suppress it.|
| **Matching** |
| casefold | bool | False | Case insensitive file name comparing. |
| checksum | bool | False | If media_magic=False and ignore_size=False, files will be compared by CRC32 checksum. <br> (This mode is considerably slower.) |
| tolerate_hour | int \| tuple[int, int] \| bool | False | When comparing files in work_dir and media_magic=False, tolerate hour difference.<br> Sometimes when dealing with FS changes, files might get shifted by a few hours.<br> * bool → -1 .. +1<br> * int → -int .. +int<br> * tuple → int1 .. int2<br> Ex: tolerate_hour=2 → work_file.st_mtime -7200 ... + 7200 is compared to the original_file.st_mtime |
| ignore_date | bool | False | If media_magic=False, files will not be compared by date. |
| ignore_size | bool | False | If media_magic=False, files will not be compared by size. |
| checksum | bool | False | If `media_magic=False` and `ignore_size=False`, files will be compared by CRC32 checksum. <br> (This mode is considerably slower.) |
| tolerate_hour | int \| tuple[int, int] \| bool | False | When comparing files in work_dir and `media_magic=False`, tolerate hour difference.<br> Sometimes when dealing with FS changes, files might get shifted by a few hours.<br> * bool → -1 .. +1<br> * int → -int .. +int<br> * tuple → int1 .. int2<br> Ex: tolerate_hour=2 → work_file.st_mtime -7200 ... + 7200 is compared to the original_file.st_mtime |
| ignore_name | bool | False | Files will not be compared by stem nor suffix. |
| ignore_date | bool | False | If `media_magic=False`, files will not be compared by date. |
| ignore_size | bool | False | If `media_magic=False`, files will not be compared by size. |
| space2char | bool \| str | False | When comparing files in work_dir, consider space as another char. Ex: "file 012.jpg" is compared as "file_012.jpg" |
| strip_end_counter | bool | False | When comparing files in work_dir, strip the counter. Ex: "00034(3).MTS" is compared as "00034.MTS" |
| strip_suffix | str | False | When comparing files in work_dir, strip the file name end matched by a regular expression. Ex: "001-edited.jpg" is compared as "001.jpg" |
@@ -145,7 +150,9 @@ Find the duplicates. Normally, the file must have the same size, date and name.
| media_magic | bool | False | Neither the size nor the date is compared for files with media suffixes.<br>A video is considered a duplicate if it has the same name and a similar number of frames, even if it has a different extension.<br>An image is considered a duplicate if it has the same name and a similar image hash, even if the files are of different sizes.<br>(This mode is considerably slower.) |
| accepted_frame_delta | int | 1 | Used only when media_magic is True |
| accepted_img_hash_diff | int | 1 | Used only when media_magic is True |
| img_compare_date | bool | False | If True and media_magic=True, the file date or the EXIF date must match. |
| img_compare_date | bool | False | If True and `media_magic=True`, the work file date or the work file EXIF date must match the original file date (within one hour of it). |
| **Helper** |
| log_level | int | 30 (warning) | 10 debug .. 50 critical |
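
The `tolerate_hour` row above describes three accepted shapes (bool, int, tuple). The sketch below shows one way they could be normalized into an hour range and applied to two *mtime* values. `hour_range` and `mtime_within` are hypothetical names, and treating `False` as "no tolerance" is an assumption, not something the table states.

```python
def hour_range(tolerate_hour):
    """Normalize tolerate_hour into a (low, high) pair of hours (hypothetical helper)."""
    if tolerate_hour is False:
        return (0, 0)   # assumption: no tolerance at all
    if tolerate_hour is True:
        return (-1, 1)  # bool -> -1 .. +1
    if isinstance(tolerate_hour, int):
        return (-tolerate_hour, tolerate_hour)  # int -> -int .. +int
    low, high = tolerate_hour                   # tuple -> int1 .. int2
    return (low, high)


def mtime_within(work_mtime, original_mtime, tolerate_hour):
    """Check that the original mtime falls into the tolerated window
    around the work file mtime (both in seconds)."""
    low, high = hour_range(tolerate_hour)
    return work_mtime + low * 3600 <= original_mtime <= work_mtime + high * 3600


# tolerate_hour=2 -> work mtime -7200 .. +7200 is compared to the original mtime
print(mtime_within(1_000_000, 1_000_000 + 7200, 2))
print(mtime_within(1_000_000, 1_000_000 + 7201, 2))
```

Checking `True`/`False` before `int` matters here because `bool` is a subclass of `int` in Python.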

## Utils
In the `deduplidog.utils` package, you'll find several handy tools to help you. You can discover their parameters through your IDE hints.
159 changes: 53 additions & 106 deletions deduplidog/__main__.py
@@ -2,118 +2,65 @@
from dataclasses import fields
from typing import get_args

import click
from dataclass_click import dataclass_click
from textual import events
from textual.app import App, ComposeResult
from textual.containers import VerticalScroll
from textual.widgets import Checkbox, Footer, Input, Label
from click import MissingParameter

from .interface_utils import Field
from .deduplidog import Deduplidog


class CheckboxApp(App[None]):
    CSS_PATH = "form.tcss"

    BINDINGS = [
        ("up", "go_up", "Go up"),
        ("down", "go_up", "Go down"),
        ("ctrl+s", "confirm", "Run"), # ctrl/alt+enter does not work; enter does not work with checkboxes
        ("escape", "exit", "Exit"),
    ]

    def compose(self) -> ComposeResult:
        yield Footer()
        self.inputs = INPUTS
        with VerticalScroll():
            for input in self.inputs:
                if isinstance(input, Input):
                    yield Label(input.placeholder)
                yield input
                yield Label(input._link.help)
                yield Label("")

    def on_mount(self):
        self.inputs[0].focus()

    def action_confirm(self):
        self.exit(True)

    def action_exit(self):
        self.exit()

    def on_key(self, event: events.Key) -> None:
        try:
            index = self.inputs.index(self.focused)
        except ValueError: # probably some other element was focused
            return
        match event.key:
            case "down":
                self.inputs[(index + 1) % len(self.inputs)].focus()
            case "up":
                self.inputs[(index - 1) % len(self.inputs)].focus()
            case letter if len(letter) == 1: # navigate by letters
                for inp_ in self.inputs[index+1:] + self.inputs[:index]:
                    label = inp_.label if isinstance(inp_, Checkbox) else inp_.placeholder
                    if str(label).casefold().startswith(letter):
                        inp_.focus()
                        break


class RaiseOnMissingParam(click.Command):
    def __call__(self, *args, **kwargs):
        return super(RaiseOnMissingParam, self).__call__(*args, standalone_mode=False, **kwargs)
from .tui import CheckboxApp, tui_state
from .cli import cli


@click.command(cls=RaiseOnMissingParam)
@dataclass_click(Deduplidog)
def cli(dd: Deduplidog):
    return dd
from .helpers import Field
from .deduplidog import Deduplidog


def main():
global INPUTS

# CLI
try:
dd = cli()
if not dd: # maybe just --help
return
if input("See more options? [Y/n] ").casefold() not in ("", "y"):
sys.exit()
except click.MissingParameter:
# User launched the program without parameters.
# This is not a problem, we have TUI instead.
dd = None

# TUI
dog_fields: list[Field] = []
for f in fields(Deduplidog):
# CLI
try:
dog_fields.append(Field(f.name,
getattr(dd, f.name, f.default),
get_args(f.type)[0],
get_args(f.type)[1].kwargs["help"]))
except Exception as e:
# we want only documented fields; in case of an incorrectly defined field, we do not let the user edit it
continue
while True:
print("")
INPUTS = [f.get_widgets() for f in dog_fields]
if not CheckboxApp().run():
break
for form, field in zip(INPUTS, dog_fields):
field.value = form.value
try:
Deduplidog(**{f.name: f.convert() for f in dog_fields})
except Exception as e:
print("-"*100)
print(e)
input()
continue
if input("See more options? [Y/n] ").casefold() not in ("y", ""):
break
deduplidog = cli()
if not deduplidog: # maybe just --help
return
if input("See more options? [Y/n] ").casefold() not in ("", "y"):
sys.exit()
except MissingParameter:
# User launched the program without parameters.
# This is not a problem, we have TUI instead.
deduplidog = None

# TUI
dog_fields: list[Field] = []
for f in fields(Deduplidog):
try:
dog_fields.append(Field(f.name,
getattr(deduplidog, f.name, f.default),
get_args(f.type)[0],
get_args(f.type)[1].kwargs["help"]))
except Exception as e:
# we want only documented fields; in case of an incorrectly defined field, we do not let the user edit it
continue
tui_state.FOCUSED_I = 0
while True:
print("")
tui_state.INPUTS = [f.get_widgets() for f in dog_fields]
if not CheckboxApp().run():
break
for form, field in zip(tui_state.INPUTS, dog_fields):
field.value = form.value
try:
# if deduplidog:
# # To prevent full initialization with the slow metadata refresh, we re-use the same object.
# [setattr(deduplidog, f.name, f.convert()) for f in dog_fields]
# deduplidog.perform()
# else:
deduplidog = Deduplidog(**{f.name: f.convert() for f in dog_fields})
except Exception as e:
print("-"*100)
print(e)
input()
continue
if input("See more options? [Y/n] ").casefold() not in ("y", ""):
break
except KeyboardInterrupt:
sys.exit()


if __name__ == "__main__":
main()
main()
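
The TUI part of `main()` above builds its form from `fields(Deduplidog)` and `get_args(f.type)`, silently skipping any field that lacks help metadata. The sketch below reproduces that pattern with the standard library only, assuming the fields are declared as `typing.Annotated[type, metadata]`. `Meta`, `Settings` and `documented_fields` are hypothetical stand-ins (the real metadata object comes from `dataclass_click`), and the sketch catches narrower exceptions than the `except Exception` used in the diff.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_args


class Meta:
    """Hypothetical stand-in for option metadata exposing a `kwargs` dict."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


@dataclass
class Settings:
    execute: Annotated[bool, Meta(help="Run the actions.")] = False
    work_dir: Annotated[str, Meta(help="Folder to scan.")] = "."
    undocumented: int = 0  # no Annotated metadata, so it is skipped below


def documented_fields(cls):
    """Collect (name, default, type, help) for fields carrying help metadata."""
    found = []
    for f in fields(cls):
        try:
            # get_args(Annotated[T, meta]) returns (T, meta); a plain type returns ().
            type_, meta = get_args(f.type)[:2]
            found.append((f.name, f.default, type_, meta.kwargs["help"]))
        except (ValueError, AttributeError, KeyError):
            continue  # undocumented or malformed field: not offered for editing
    return found


for name, default, type_, help_text in documented_fields(Settings):
    print(name, default, type_.__name__, help_text)
```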
15 changes: 15 additions & 0 deletions deduplidog/cli.py
@@ -0,0 +1,15 @@
import click
from dataclass_click import dataclass_click

from .deduplidog import Deduplidog


class RaiseOnMissingParam(click.Command):
    def __call__(self, *args, **kwargs):
        return super(RaiseOnMissingParam, self).__call__(*args, standalone_mode=False, **kwargs)


@click.command(cls=RaiseOnMissingParam)
@dataclass_click(Deduplidog)
def cli(dd: Deduplidog):
    return dd