Tools for quickly normalizing image datasets for machine learning. I maintain a series of video tutorials on normalizing image datasets—many utilizing this set of scripts—on my YouTube page.
Note: If you’re installing this on a Mac, I highly recommend installing it alongside Anaconda. A video tutorial is available here.
git clone https://github.com/dvschultz/dataset-tools.git
cd dataset-tools
pip install -r requirements.txt
python dataset-tools.py --input_folder path/to/input/ --output_folder path/to/output/
You can view auto generated documentation in docs.md
--verbose
: Print progress to console.--input_folder
: Directory path to the inputs folder. Default:./input/
--output_folder
: Directory path to the outputs folder. Default:./output/
--process_type
: Process to use. Options:resize
,square
,crop
,crop_to_square
,canny
,canny-pix2pix
,scale
,crop_square_patch
,many_squares
Default:resize
--blur_type
: Blur process to use. Use with--process_type canny
. Options:none
,gaussian
,median
. Default:none
--blur_amount
: Amount of blur to apply (use odd integers only). Use with--blur_type
. Default:1
--max_size
: Maximum width or height of the output images. Default:512
--force_max
: forces the resize to the max size (by default--max_size
only scales down)--direction
: Paired Direction. For use with pix2pix process. Options:AtoB
,BtoA
. Default:AtoB
--mirror
: Adds mirror augmentation.--rotate
: Adds 90 degree rotation augmentation.--border_type
: Border style to use when using thesquare
process type Options:stretch
,reflect
,solid
(solid
requires--border-color
) Default:stretch
--border_color
: border color to use with thesolid
border type; use BGR values from 0 to 255 Example:255,0,0
is blue--height
: height of crop in pixels; use with--process_type crop
or--process_type resize
(when used withresize
it will distort the aspect ratio)--width
: width of crop in pixels; use with--process_type crop
or--process_type resize
(when used withresize
it will distort the aspect ratio)--shift_y
: y (Top to bottom) amount to shift in pixels; negative values will move it up, positive will move it down; use with--process_type crop
--shift_x
: x (Left to right) amount to shift in pixels; negative values will move it left, positive will move it right; use with--process_type crop
--file_extension
: file format to output Options:jpg
,png
Default:png
Remove duplicate images from your dataset
--absolute
: Use absolute matching. Default--avg_match
: average pixel difference between images (use with--relative
) Default:1.0
--file_extension
: file format to output Options:jpg
,png
Default:png
--input_folder
: Directory path to the inputs folder. Default:./input/
--output_folder
: Directory path to the outputs folder. Default:./output/
--relative
: Use relative matching.--verbose
: Print progress to console.
python dedupe.py --input_folder path/to/input/ --output_folder path/to/output/
python dedupe.py --input_folder path/to/input/ --output_folder path/to/output/ --relative
This tool produces randomized multi-scale crops. A video tutorial is here
--input_folder
: Directory path to the inputs folder. Default:./input/
--output_folder
: Directory path to the outputs folder. Default:./output/
--file_extension
: file format to output Options:jpg
,png
Default:png
--max_size
: Maximum width and height of the crop. Default:None
--min_size
: Minimum width and height of the crop. Default:1024
--resize
: size to resize crops to (ifNone
it will default tomin_size
). Default:None
--no_resize
: If set the crops will not be resized. Default:False
--verbose
: Print progress to console.
--file_extension
: file format to output Options:jpg
,png
Default:png
--verbose
: Print progress to console.--input_folder
: Directory path to the inputs folder. Default:./input/
--output_folder
: Directory path to the outputs folder. Default:./output/
--process_type
: Process to use. Options:sort
,exclude
Default:exclude
--max_size
: Maximum width or height of the output images. Default:2048
--min_size
: Minimum width or height of the output images. Default:1024
--min_ratio
: Ratio of image (height/width). Default:1.0
--exact
: Match to exact specs. Use--min_size
for shorter dimension,--max_size
for longer dimension
Sorts a folder of images into separate folders based on dominant color. It will check the image's dominant color against all colors passed in, so depending on your threshold level and specified colors, you may end up with duplicate images across folders (e.g. the same sunset image in the red folder, orange folder, and yellow folder).
-v, --verbose
: Print progress to console.-i, --input_folder
: Directory path to the inputs folder. Default:./input/
-o, --output_folder
: Directory path to the outputs folder. Default:./output/
-t, --threshold
: Threshold for color matching, lower values are more exact. Default:40
-c, --colors
: Comma-separated list of W3C color names to check against . Default:red,orange,yellow,green,blue,purple,black,white
--rgb
A single color of comma-separated RGB values (e.g. 128,255,30): . Default:None