Get Available Levels Fails with Large SVS Images: Not a JPEG file: starts with 0xff 0x11 #1759

Open
amrosado opened this issue Jan 7, 2025 · 5 comments

amrosado commented Jan 7, 2025

Opening a large SVS file in large_image with the openslide tile source fails while getting the available levels, with the error:
'Not a JPEG file: starts with 0xff 0x11'

    def _getAvailableLevels(self, path):
        """
        Some SVS files (notably some NDPI variants) have levels that cannot be
        read.  Get a list of levels, check that each is at least potentially
        readable, and return a list of these sorted highest-resolution first.

        :param path: the path of the SVS file.  After a failure, the file is
            reopened to reset the error state.
        :returns: levels.  A list of valid levels, each of which is a
            dictionary of level (the internal 0-based level number), width, and
            height.
        """
        levels = []
        svsLevelDimensions = self._openslide.level_dimensions
        for svslevel in range(len(svsLevelDimensions)):
            try:
                width, height = svsLevelDimensions[svslevel]
                self._openslide.read_region((0, 0), svslevel, (1, 1))

                level = {
                    'level': svslevel,
                    'width': svsLevelDimensions[svslevel][0],
                    'height': svsLevelDimensions[svslevel][1],
                }
                if level['width'] > 0 and level['height'] > 0:
                    # add to the list so that we can sort by resolution and
                    # then by earlier entries
                    levels.append((level['width'] * level['height'], -len(levels), level))
            except openslide.lowlevel.OpenSlideError as e:
                # excep = json.dumps({"exception": str(e)})
                # print(excep)
                # The level could not be read; reopen the file to reset the
                # error state and skip this level.
                self._openslide = openslide.OpenSlide(path)
        # sort highest resolution first.
        levels = [entry[-1] for entry in sorted(levels, reverse=True, key=lambda x: x[:-1])]
        # Discard levels that are not a power-of-two compared to the highest
        # resolution level.
        levels = [entry for entry in levels if
                  nearPowerOfTwo(levels[0]['width'], entry['width']) and
                  nearPowerOfTwo(levels[0]['height'], entry['height'])]
        return levels

If I comment out the read_region call, the tile source works without issue. Does the read_region call serve a specific purpose here, given that openslide has already released a patch for this error? I suggest either commenting out this line or rewriting the code so that it works with files above 4 GB. This is a blocking issue for my current project: I have to change this line manually whenever I update large_image.
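To see which levels actually trigger the error, the probe in the quoted method can be reproduced on its own. This is only a rough sketch, not large_image code: `probe_levels` and its `open_slide` argument are hypothetical names, and it catches a broad `Exception` so that it does not depend on importing openslide.

```python
def probe_levels(open_slide):
    """Try a 1x1 read at the origin of every level and report the outcome.

    open_slide is a hypothetical zero-argument callable that returns an
    object exposing OpenSlide's level_dimensions and
    read_region(location, level, size).
    Returns a list of (level, width, height, error_message_or_None).
    """
    slide = open_slide()
    results = []
    for level, (width, height) in enumerate(slide.level_dimensions):
        try:
            slide.read_region((0, 0), level, (1, 1))
            results.append((level, width, height, None))
        except Exception as exc:  # with openslide this would be OpenSlideError
            results.append((level, width, height, str(exc)))
            # Reopen, as the quoted method does, to reset the error state.
            slide = open_slide()
    return results
```

With openslide installed, it could be called as `probe_levels(lambda: openslide.OpenSlide(path))`, where `path` is the failing file.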

@manthey
Member

manthey commented Jan 7, 2025

Do you have a sample file you can share that throws an error? We test and use this with files over 4 GB. What OS are you using and what version of openslide (are you using the wheels supplied by openslide or by large_image_wheels or by a library installed on your system)?

There are svs files that don't populate all levels (for instance, some of the TCGA images only have 20x, 5x, and 1.25x magnifications), and testing for this allows us to read these files more efficiently, so disabling this check would degrade or break read performance on those files.
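As a rough illustration of how sparse levels such as 20x, 5x, and 1.25x survive the power-of-two filter at the end of the quoted method, here is a standalone sketch. The helper names, the tolerance value, and the example dimensions are assumptions made for illustration; the actual `nearPowerOfTwo` in large_image may differ.

```python
import math


def near_power_of_two(a, b, tolerance=0.02):
    """Return True if a / b is within tolerance (in log2 units) of a power of two.

    The tolerance default is an assumption chosen for this sketch.
    """
    ratio_log2 = math.log2(a / b)
    return abs(ratio_log2 - round(ratio_log2)) < tolerance


def usable_levels(dimensions):
    """Keep the (width, height) pairs that are a power-of-two reduction of the largest."""
    if not dimensions:
        return []
    # Sort highest resolution first, as the quoted method does.
    ordered = sorted(dimensions, key=lambda wh: wh[0] * wh[1], reverse=True)
    base_w, base_h = ordered[0]
    return [(w, h) for w, h in ordered
            if near_power_of_two(base_w, w) and near_power_of_two(base_h, h)]
```

For made-up dimensions `[(80000, 60000), (20000, 15000), (5000, 3750)]`, the scale factors are 4 and 16, so all three levels are kept even though the intermediate 2x and 8x levels are absent.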

@amrosado
Author

amrosado commented Jan 7, 2025

I'm not sure of the specific version, since I built it on another computer that I no longer use. I downloaded the 4.0.0.6 binary from the openslide website (https://openslide.org/download/#binaries) and ran the code with this binary. It seems to be working now with the updated DLL. I'm going to close this issue since I was able to get it working, but if I am able to reproduce it I will reopen.

If you look at this, you can see that the code handles the case where a large Aperio file can't be read at (0, 0) specifically, because that region is partially overwritten, which clobbers the JPEG marker.

@amrosado amrosado closed this as completed Jan 7, 2025
@manthey
Member

manthey commented Jan 7, 2025

Thanks. We run our CI tests against the latest version of openslide, so in this case it returns a tile (just the missing-tile tile, but it is still a tile that validates reading the level). It probably means we should check the version of openslide and, if it is too old, issue some sort of warning.
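A version check along those lines might look like the sketch below. It is not existing large_image code: the function names are hypothetical, and the `MINIMUM_OPENSLIDE` threshold is a placeholder assumption (chosen only because a 4.0.0 binary worked in this thread), not a documented requirement.

```python
import re
import warnings

# Assumption: placeholder threshold for illustration only.
MINIMUM_OPENSLIDE = (4, 0, 0)


def parse_version(text):
    """Turn a string such as '4.0.0' into a tuple of ints, ignoring any suffix."""
    match = re.match(r'\d+(?:\.\d+)*', text.strip())
    if not match:
        raise ValueError(f'unrecognized version string: {text!r}')
    return tuple(int(part) for part in match.group(0).split('.'))


def warn_if_old(library_version, minimum=MINIMUM_OPENSLIDE):
    """Emit a warning and return True if library_version is older than minimum."""
    parts = parse_version(library_version)
    # Pad short versions ('4.0') with zeros so they compare as '4.0.0'.
    parts += (0,) * max(0, len(minimum) - len(parts))
    if parts < minimum:
        wanted = '.'.join(str(n) for n in minimum)
        warnings.warn(
            f'OpenSlide {library_version} is older than {wanted}; '
            'some levels of large files may fail to read.',
            stacklevel=2)
        return True
    return False
```

With openslide-python installed, the string to pass in would be `openslide.__library_version__`.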

@amrosado
Author

@manthey @dgutman I'm seeing the same problem when using large_image in the Deid application that we've been working with. In this environment we compile an executable for a Python engine that runs as the application's backend and should open the file with the openslide tile source. Could you test a file above 4 GB after building something with pyinstaller? Could a test case be added that checks that this kind of file still works after being packaged into an executable with pyinstaller?

@amrosado amrosado reopened this Jan 14, 2025
@manthey
Member

manthey commented Jan 15, 2025

The tile source might be using your system openslide in preference to the bundled openslide, depending on how paths resolve within the layout that pyinstaller produces. We don't actually bundle large_image with pyinstaller in any test, so this would be some work -- but we'd be happy to add such a test if you have a bare-bones starting point for it.
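One possible starting point for checking this from inside the frozen application is sketched below. PyInstaller sets `sys.frozen` and `sys._MEIPASS` in the bootloader; everything else here (the function names, and the idea of comparing against a library path you have obtained some other way) is a hypothetical illustration rather than part of large_image or openslide.

```python
import os
import sys


def describe_runtime():
    """Report whether the process looks like a PyInstaller bundle."""
    frozen = bool(getattr(sys, 'frozen', False))
    # sys._MEIPASS is the directory where PyInstaller unpacks bundled files.
    bundle_dir = getattr(sys, '_MEIPASS', None) if frozen else None
    return {'frozen': frozen, 'bundle_dir': bundle_dir,
            'executable': sys.executable}


def is_inside_bundle(library_path, bundle_dir):
    """Return True if library_path lives under bundle_dir.

    library_path is assumed to be the location of the loaded openslide
    shared library, found by whatever means is available on the platform.
    """
    library_path = os.path.abspath(library_path)
    bundle_dir = os.path.abspath(bundle_dir)
    try:
        return os.path.commonpath([library_path, bundle_dir]) == bundle_dir
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

If `describe_runtime()` reports a bundle directory but the loaded library is not inside it, that would be consistent with the system openslide being picked up instead of the bundled one.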
