The Scroll routine appears to return incorrect data #4
Comments
The Zarr volume was normalized and quantized (16 bits to 8 bits) to reduce its size on disk. The zeros could be empty air that was clipped for falling outside the mean range that represents other, denser materials like papyrus. To my knowledge the precision loss hasn't affected ink detection or other downstream applications; however, retraining on the new dataset was required. (I'm paraphrasing from the Discord channel here.) So it's not an issue in how the API accesses the data. There are zeros in the original dataset too, but they are less frequent.
Comparison
Here's a quick comparison between the two data sources in the region you highlighted.
@bostelk - What version of Python are you using? On Ubuntu 24.10, python2 would not completely install. I was able to do the install with python3, but only after putting in some fake links to python2 stuff. The data returned had the same holes as I found in the macOS 12.7.3 version. Your histogram of the returned data looks very much like mine, except that the holes are filled in and it has too many points.
I'm sorry, my earlier comparison was a bad illustration. I had copied the image into an editor and the size was larger, hence the wrong number of points and the interpolated graph. I'm using Python 3. I opened the volume in a different viewer (https://dl.ash2txt.org/view/Scroll1) and the issue (???) is apparent there too. So I don't think it's caused by the reader but rather by how the volume was created/converted from a higher-precision dataset. I don't have a concrete explanation, only speculation.
New Comparison
Good catch! It seems that for some reason the quantization process ended up setting (only?) bit 3 to zero, so that all resulting numbers look like
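The bit-3 observation can be checked directly on any byte dump of the volume. The following is a minimal sketch, not part of either library: `bits_never_set` is a hypothetical helper name, and the sample data is synthetic (every byte value with bit 3 cleared), standing in for real voxel data.

```python
def bits_never_set(data: bytes) -> list[int]:
    """Return the bit positions (0 = least significant) that are 0 in every byte."""
    seen = 0
    for value in data:
        seen |= value  # accumulate every bit that is set at least once
    return [bit for bit in range(8) if not seen & (1 << bit)]


# Synthetic stand-in for voxel data: all byte values with bit 3 (value 8) cleared.
sample = bytes(v & 0xF7 for v in range(256))
print(bits_never_set(sample))  # → [3]
```

On real data, the same helper could be applied to the bytes of a file such as the "file1p.raw" dump from the original report.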
I have been complaining about this problem for a month, and I see in the general discussion that people are going to fix the problem and redo some work. Will the fix get into the C and Python routines?
Yes, and thanks for bringing attention to it. Those libraries pull from the same data source, so as soon as the volumes are updated on the server, both libraries will have the revised data. You may wish to clear the local cache, if you have one, to make sure you get the new versions.
I ran a simple example:
import vesuvius
# Load the volume and take a 256 x 256 crop from slice 1000
scroll = vesuvius.Volume("Scroll1")
img = scroll[1000,5000:5256,5000:5256]
# Dump the raw bytes of the crop to disk
with open("file1p.raw", "wb") as binary_file:
    binary_file.write(img)
and I did a histogram of "file1p.raw":
4704 58 41 46 56 37 57 53 0 0 0 0 0 0 0 0
57 45 47 58 53 52 60 63 0 0 0 0 0 0 0 0
66 72 45 73 58 75 68 75 0 0 0 0 0 0 0 0
95 79 84 102 83 112 103 106 0 0 0 0 0 0 0 0
137 141 148 162 181 176 216 202 0 0 0 0 0 0 0 0
379 328 341 418 405 435 478 526 0 0 0 0 0 0 0 0
891 908 964 1006 1077 1134 1190 1247 0 0 0 0 0 0 0 0
1783 1716 1741 1722 1774 1857 1917 1919 0 0 0 0 0 0 0 0
1852 1982 1911 1806 1816 1811 1709 1614 0 0 0 0 0 0 0 0
1336 1248 1219 1072 1131 1067 987 969 0 0 0 0 0 0 0 0
608 577 530 532 490 510 459 440 0 0 0 0 0 0 0 0
272 246 250 230 225 207 192 170 0 0 0 0 0 0 0 0
113 134 96 109 97 99 96 81 0 0 0 0 0 0 0 0
64 50 47 43 41 33 47 38 0 0 0 0 0 0 0 0
26 27 29 31 28 22 28 28 0 0 0 0 0 0 0 0
12 14 26 18 14 19 13 243 0 0 0 0 0 0 0 0
Note that only 128 of the 256 histogram bins are nonzero, and they come in runs of 8 separated by gaps of 8 zeros.
The vesuvius-c routines show the same problem.
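For anyone who wants to reproduce the 16 x 16 histogram layout above, here is a minimal sketch using only the standard library. The helper names (`byte_histogram`, `format_grid`, `empty_bins`) are made up for illustration, and the demo uses synthetic bytes with bit 3 cleared rather than the real "file1p.raw".

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[int]:
    """Count how often each of the 256 possible byte values occurs."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def format_grid(hist: list[int], width: int = 16) -> str:
    """Lay the histogram out as rows of `width` counts, like the table above."""
    rows = (hist[i:i + width] for i in range(0, len(hist), width))
    return "\n".join(" ".join(str(count) for count in row) for row in rows)


def empty_bins(hist: list[int]) -> list[int]:
    """Return the byte values that never occur."""
    return [value for value, count in enumerate(hist) if count == 0]


# Synthetic stand-in data: every byte value with bit 3 cleared.
synthetic = bytes(v & 0xF7 for v in range(256))
hist = byte_histogram(synthetic)
print(format_grid(hist))
print(len(empty_bins(hist)), "empty bins")

# For the real dump, read the file instead:
# with open("file1p.raw", "rb") as f:
#     hist = byte_histogram(f.read())
```

With the synthetic data, each row shows 8 populated bins followed by 8 empty ones, the same pattern as in the reported histogram.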