Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added progress callback when save_all is used #7435

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions Tests/test_file_apng.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
from io import BytesIO

import pytest

from PIL import Image, ImageSequence, PngImagePlugin
Expand Down Expand Up @@ -663,6 +665,36 @@ def test_apng_save_blend(tmp_path):
assert im.getpixel((0, 0)) == (0, 255, 0, 255)


def test_save_all_progress():
out = BytesIO()
progress = []

def callback(filename, frame_number, n_frames):
progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "PNG", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe indicate index here (instead of None)? 1 for the 1st input image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be open to the idea of having an index all of the time? Or, we could pass back the image instance, and let the user figure out the index or filename as they see fit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having index always is probably simpler. As for passing the image, it wouldn't be much help with identifying the index if an image is reused in the input (like in that one test… come to think of it, having explicit index there would make adding a comment somewhat redundant).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to switch to an index.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Why did you remove the filename though? I thought you were talking about adding an index, not removing the filename (as my own suggestion was to provide index when the filename isn't available)

Copy link

@LeXofLeviafan LeXofLeviafan Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that (index, filename, frame_number, frame_total) crosses the line from 'neat and tidy' to 'passing back everything to the user just in case something is useful'.

You can use a namedtuple instead (i.e. define Process = namedtuple('Process', "index filename frame n_frames") somewhere global, and pass its value to the progress callback).

And I'd say that the name of the input file being currently processed is a quite appropriate piece of information to be included, as it's the most user-friendly way of showing, well, which file is being processed (whether you're logging your progress to a terminal or displaying it in GUI on top of the progressbar).

Why make the most likely usecase inconvenient? The name of the game shouldn't be "provide as little information as possible and force the calling code to figure out the rest of it every single time", it should aim at the most likely general purpose case (which I imagine more often than not would be logging/displaying something in the lines of Converting: input/image2.png [#2] -> output.png (frame 12 of 84); and the only thing here that would be instantly available for the calling code without complication in any situation would be the output file name, as it's passed in as a raw string and never changes throughout the entire process).

I would have thought it was simple, given that the user just provided these images as the base image and append_images?

Suppose the code that wants to log progress isn't concerned with input at all – it only has a generator that supplies these images; why would it need to implement workarounds like instantiating all values from that generator at once, or wrapping it into a generator object that mucks around with the data being yielded, when all it wants to do is display progress on screen? And like I said, filename is the clearest way possible of identifying an image that was loaded from a file.

append_images are arbitrary PIL.Image.Image

I'm pretty sure these can be obtained with Image.open(), just like the first image.

why not pass just saved im object itself?

I suggested that, but it was pointed out that the image might be provided as an input more than once.

…which is not the case with filenames, I suppose :-)

The index can be necessary in case of duplicate files in the input, but the filename is likely to be necessary for the purpose of displaying the progress in user-friendly manner. (Now passing the entire image object, that would actually be "passing back everything to the user just in case something is useful" 😆)

I don't like passing indices since it's hard to figure out the difference between index and frame_number and it's very simple to write the code which uses index incorrectly.

Incidentally, passing progress status as a namedtuple (or the like) would help with that.

I'm not opposed to dropping index altogether - I think the key benefit of this feature is being able to track how many frames have been completed out of the total number.

Filename and/or index are necessary to show which image this current frame comes from, in case when more than one is on the input – it shouldn't be a stretch to expect it to be necessary information for the vast majority of usecases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced that a filename or image index is necessary, but I've pushed a commit to restore the filename.

>>> from collections import namedtuple
>>> State = namedtuple("State", "a b")
>>> State(2, 4)[0]
2

Because namedtuples still have order, and I'm concerned that there could be different views about what the best order is, or that we might want to add more items in the future, I've used a dictionary instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keys are a bit overly verbose, but that should do the job at least.

Still, I believe having a single fixed class with predefined attributes would make for better API than using free-form dicts in every implementation (not to mention avoiding potential issues if there's a need to make changes at some point in the future). If you're concerned with possibility of using indices to access a namedtuple field (which I honestly find rather unlikely as attribute access is much clearer), you can use a @dataclass instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it would be a good idea to have this getattr(im, "filename", None) thing in one place in the codebase (and if you keep the dict, it would still be a good idea to have one piece of code defining its creation, at least for sake of ensuring there won't be any discrepancies in key names).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to move the creation of the dictionary into a common method.


out = BytesIO()
progress = []

with Image.open("Tests/images/apng/single_frame.png") as im:
with Image.open("Tests/images/apng/delay.png") as im2:
im.save(
out, "PNG", save_all=True, append_images=[im, im2], progress=callback
)

assert progress == [
("Tests/images/apng/single_frame.png", 1, 7),
("Tests/images/apng/single_frame.png", 2, 7),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line could use a comment that single_frame.png gets appended to itself in the save() call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

come to think of it, having explicit index there would make adding a comment somewhat redundant

You no longer think this is necessary?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You no longer think this is necessary?

The comment was meant to clarify that the image is repeated because it occurs twice in the input; providing explicit indices makes pointing that out unnecessary.

("Tests/images/apng/delay.png", 3, 7),
("Tests/images/apng/delay.png", 4, 7),
("Tests/images/apng/delay.png", 5, 7),
("Tests/images/apng/delay.png", 6, 7),
("Tests/images/apng/delay.png", 7, 7),
]


def test_seek_after_close():
im = Image.open("Tests/images/apng/delay.png")
im.seek(1)
Expand Down
23 changes: 23 additions & 0 deletions Tests/test_file_gif.py
Original file line number Diff line number Diff line change
Expand Up @@ -265,6 +265,29 @@ def test_roundtrip_save_all_1(tmp_path):
assert reloaded.getpixel((0, 0)) == 255


def test_save_all_progress():
out = BytesIO()
progress = []

def callback(filename, frame_number, n_frames):
progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "GIF", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

out = BytesIO()
progress = []

with Image.open("Tests/images/hopper.gif") as im:
with Image.open("Tests/images/chi.gif") as im2:
im.save(out, "GIF", save_all=True, append_images=[im2], progress=callback)

expected = [("Tests/images/hopper.gif", 1, 32)]
for i in range(31):
expected.append(("Tests/images/chi.gif", i + 2, 32))
assert progress == expected


@pytest.mark.parametrize(
"path, mode",
(
Expand Down
25 changes: 25 additions & 0 deletions Tests/test_file_mpo.py
Original file line number Diff line number Diff line change
Expand Up @@ -278,3 +278,28 @@ def test_save_all():
# Test that a single frame image will not be saved as an MPO
jpg = roundtrip(im, save_all=True)
assert "mp" not in jpg.info


def test_save_all_progress():
out = BytesIO()
progress = []

def callback(filename, frame_number, n_frames):
progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "MPO", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

out = BytesIO()
progress = []

with Image.open("Tests/images/sugarshack.mpo") as im:
with Image.open("Tests/images/frozenpond.mpo") as im2:
im.save(out, "MPO", save_all=True, append_images=[im2], progress=callback)

assert progress == [
("Tests/images/sugarshack.mpo", 1, 4),
("Tests/images/sugarshack.mpo", 2, 4),
("Tests/images/frozenpond.mpo", 3, 4),
("Tests/images/frozenpond.mpo", 4, 4),
]
31 changes: 28 additions & 3 deletions Tests/test_file_pdf.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
import io
import os
import os.path
import tempfile
import time
from io import BytesIO

import pytest

Expand Down Expand Up @@ -169,6 +169,31 @@ def im_generator(ims):
assert os.path.getsize(outfile) > 0


def test_save_all_progress():
out = BytesIO()
progress = []

def callback(filename, frame_number, n_frames):
progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "PDF", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

out = BytesIO()
progress = []

with Image.open("Tests/images/sugarshack.mpo") as im:
with Image.open("Tests/images/frozenpond.mpo") as im2:
im.save(out, "PDF", save_all=True, append_images=[im2], progress=callback)

assert progress == [
("Tests/images/sugarshack.mpo", 1, 4),
("Tests/images/sugarshack.mpo", 2, 4),
("Tests/images/frozenpond.mpo", 3, 4),
("Tests/images/frozenpond.mpo", 4, 4),
]


def test_multiframe_normal_save(tmp_path):
# Test saving a multiframe image without save_all
with Image.open("Tests/images/dispose_bgnd.gif") as im:
Expand Down Expand Up @@ -323,12 +348,12 @@ def test_pdf_info(tmp_path):

def test_pdf_append_to_bytesio():
im = hopper("RGB")
f = io.BytesIO()
f = BytesIO()
im.save(f, format="PDF")
initial_size = len(f.getvalue())
assert initial_size > 0
im = hopper("P")
f = io.BytesIO(f.getvalue())
f = BytesIO(f.getvalue())
im.save(f, format="PDF", append=True)
assert len(f.getvalue()) > initial_size

Expand Down
28 changes: 27 additions & 1 deletion Tests/test_file_tiff.py
Original file line number Diff line number Diff line change
Expand Up @@ -658,7 +658,7 @@ def test_palette(self, mode, tmp_path):
with Image.open(outfile) as reloaded:
assert_image_equal(im.convert("RGB"), reloaded.convert("RGB"))

def test_tiff_save_all(self):
def test_save_all(self):
mp = BytesIO()
with Image.open("Tests/images/multipage.tiff") as im:
im.save(mp, format="tiff", save_all=True)
Expand Down Expand Up @@ -688,6 +688,32 @@ def im_generator(ims):
with Image.open(mp) as reread:
assert reread.n_frames == 3

def test_save_all_progress(self):
out = BytesIO()
progress = []

def callback(filename, frame_number, n_frames):
progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "TIFF", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

out = BytesIO()
progress = []

with Image.open("Tests/images/hopper.tif") as im:
with Image.open("Tests/images/multipage.tiff") as im2:
im.save(
out, "TIFF", save_all=True, append_images=[im2], progress=callback
)

assert progress == [
("Tests/images/hopper.tif", 1, 4),
("Tests/images/multipage.tiff", 2, 4),
("Tests/images/multipage.tiff", 3, 4),
("Tests/images/multipage.tiff", 4, 4),
]

def test_saving_icc_profile(self, tmp_path):
# Tests saving TIFF with icc_profile set.
# At the time of writing this will only work for non-compressed tiffs
Expand Down
30 changes: 27 additions & 3 deletions Tests/test_file_webp.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import io
import re
import sys
import warnings
from io import BytesIO

import pytest

Expand Down Expand Up @@ -102,10 +102,10 @@ def test_write_rgb(self, tmp_path):
def test_write_method(self, tmp_path):
self._roundtrip(tmp_path, self.rgb_mode, 12.0, {"method": 6})

buffer_no_args = io.BytesIO()
buffer_no_args = BytesIO()
hopper().save(buffer_no_args, format="WEBP")

buffer_method = io.BytesIO()
buffer_method = BytesIO()
hopper().save(buffer_method, format="WEBP", method=6)
assert buffer_no_args.getbuffer() != buffer_method.getbuffer()

Expand All @@ -122,6 +122,30 @@ def test_save_all(self, tmp_path):
reloaded.seek(1)
assert_image_similar(im2, reloaded, 1)

@skip_unless_feature("webp_anim")
def test_save_all_progress(self):
out = BytesIO()
progress = []

def callback(filename, frame_number, n_frames):
progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "WEBP", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

out = BytesIO()
progress = []

with Image.open("Tests/images/iss634.webp") as im:
im2 = Image.new("RGB", im.size)
im.save(out, "WEBP", save_all=True, append_images=[im2], progress=callback)

expected = []
for i in range(42):
expected.append(("Tests/images/iss634.webp", i + 1, 43))
expected.append((None, 43, 43))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, saying that the last frame comes from image #​2 would be more informative than "don't know" – especially when there's multiple images involved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to switch to an index.

assert progress == expected

def test_icc_profile(self, tmp_path):
self._roundtrip(tmp_path, self.rgb_mode, 12.5, {"icc_profile": None})
if _webp.HAVE_WEBPANIM:
Expand Down
16 changes: 14 additions & 2 deletions src/PIL/GifImagePlugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,6 @@
# See the README file for information on usage and redistribution.
#

import itertools
import math
import os
import subprocess
Expand Down Expand Up @@ -578,10 +577,17 @@ def _write_multiple_frames(im, fp, palette):
duration = im.encoderinfo.get("duration")
disposal = im.encoderinfo.get("disposal", im.info.get("disposal"))

progress = im.encoderinfo.get("progress")
imSequences = [im] + list(im.encoderinfo.get("append_images", []))
if progress:
n_frames = 0
for imSequence in imSequences:
n_frames += getattr(imSequence, "n_frames", 1)

im_frames = []
frame_count = 0
background_im = None
for imSequence in itertools.chain([im], im.encoderinfo.get("append_images", [])):
for imSequence in imSequences:
for im_frame in ImageSequence.Iterator(imSequence):
# a copy is required here since seek can still mutate the image
im_frame = _normalize_mode(im_frame.copy())
Expand Down Expand Up @@ -611,6 +617,10 @@ def _write_multiple_frames(im, fp, palette):
# This frame is identical to the previous frame
if encoderinfo.get("duration"):
previous["encoderinfo"]["duration"] += encoderinfo["duration"]
if progress:
progress(
getattr(imSequence, "filename", None), frame_count, n_frames
)
continue
if encoderinfo.get("disposal") == 2:
if background_im is None:
Expand All @@ -624,6 +634,8 @@ def _write_multiple_frames(im, fp, palette):
else:
bbox = None
im_frames.append({"im": im_frame, "bbox": bbox, "encoderinfo": encoderinfo})
if progress:
progress(getattr(imSequence, "filename", None), frame_count, n_frames)

if len(im_frames) > 1:
for frame_data in im_frames:
Expand Down
15 changes: 13 additions & 2 deletions src/PIL/MpoImagePlugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
# See the README file for information on usage and redistribution.
#

import itertools
import os
import struct

Expand All @@ -42,6 +41,7 @@ def _save(im, fp, filename):


def _save_all(im, fp, filename):
progress = im.encoderinfo.get("progress")
append_images = im.encoderinfo.get("append_images", [])
if not append_images:
try:
Expand All @@ -50,11 +50,19 @@ def _save_all(im, fp, filename):
animated = False
if not animated:
_save(im, fp, filename)
if progress:
progress(getattr(im, "filename", None), 1, 1)
return

mpf_offset = 28
offsets = []
for imSequence in itertools.chain([im], append_images):
imSequences = [im] + list(append_images)
if progress:
frame_number = 0
n_frames = 0
for imSequence in imSequences:
n_frames += getattr(imSequence, "n_frames", 1)
for imSequence in imSequences:
for im_frame in ImageSequence.Iterator(imSequence):
if not offsets:
# APP2 marker
Expand All @@ -73,6 +81,9 @@ def _save_all(im, fp, filename):
else:
im_frame.save(fp, "JPEG")
offsets.append(fp.tell() - offsets[-1])
if progress:
frame_number += 1
progress(getattr(imSequence, "filename", None), frame_number, n_frames)

ifd = TiffImagePlugin.ImageFileDirectory_v2()
ifd[0xB000] = b"0100"
Expand Down
5 changes: 5 additions & 0 deletions src/PIL/PdfImagePlugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,7 @@ def _save(im, fp, filename, save_all=False):
# catalog and list of pages
existing_pdf.write_catalog()

progress = im.encoderinfo.get("progress")
page_number = 0
for im_sequence in ims:
im_pages = ImageSequence.Iterator(im_sequence) if save_all else [im_sequence]
Expand Down Expand Up @@ -281,6 +282,10 @@ def _save(im, fp, filename, save_all=False):
existing_pdf.write_obj(contents_refs[page_number], stream=page_contents)

page_number += 1
if progress:
progress(
getattr(im_sequence, "filename", None), page_number, number_of_pages
)

#
# trailer
Expand Down
Loading