idr-testing May 2024 #696
Using today's build (23rd May) of OMEZarrReader:
Install mkngff on omeroreadwrite...
|
Setup screen and delete duplicate plate for idr0015
mkngff...
Fix idr0004 bfoptions (remove quick_read)
Repeat for all other studies... |
idr0010 (15:43...) |
BF cache memo generation...
Delete (move) cache
Generate target IDs...
Back on proxy server...
|
Unfortunately it seems that the cache generation is silently failing. Checked for
Ran again..
EDIT: (27th May)
Run again...
28th May: ...after a day -
Check a random output...
Not sure what's going on here. On completion (after OME 2024 meeting):
Only see "Permission denied" with
On the 27th (previous run) we see "Permission denied" for all servers:
|
@sbesson Any ideas what's causing those
Running again now...
EDIT:
But checking these in the webclient either viewed fine or showed spinner (no memo file)..? Except for idr0064... |
Another attempt (5th)...
EDIT: checking back a few days later - looks better: NB use
But actually, most of these are Dataset images. If we exclude those, find only
Also lots of
Many images with ResourceError are viewable in webclient.
Finds entries in logs on omeroreadwrite, including
But No Errors with this search. Also found 14
Found 708 errors like:
|
@sbesson - updated the previous comment above with more details of errors on the last attempt at memo file generation for NGFF data (1599 Plate Images and 44 Datasets). Summary:
|
From Seb:
Fixed by unmounting and re-mounting goofys:
Also check this is mounted on all servers:
|
6th attempt:
Immediately there are a bunch of
A bunch of
And some "communicators not destroyed"
No other errors initially...
Both of these are from idr0013 and are NOT NGFF data! Checking progress (12th June)...
Previous errors in stdout:
But lots more in stderr - errors (with counts)
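A rough sketch of how error types could be tallied from stderr lines. The sample lines and the regex are assumptions made up for illustration, not the real log format; only the error names (ResourceError, DatabaseBusy) come from this thread.

```python
import re
from collections import Counter

# Assumption: an "error type" is any (optionally dotted) name ending in
# "Error" or "Exception". Only the final class name is counted.
ERROR_NAME = re.compile(r"\b(?:[A-Za-z_]\w*\.)*([A-Z]\w*(?:Error|Exception))\b")


def count_errors(lines):
    """Return a Counter of error class names found in the given lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "fail: Pixels:1 Image:2 1.5 exception ome.conditions.ResourceError",
    "fail: Pixels:3 Image:4 0.9 exception ome.conditions.ResourceError",
    "ome.conditions.DatabaseBusyException: cannot create transaction",
    "ok: Pixels:5 Image:6 2.0",
]

for name, n in count_errors(sample).most_common():
    print(n, name)
```

Counting only the first match per line keeps nested "Caused by" names on the same line from inflating the totals.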
|
session timeout
Let's try to log in and set the session timeout before running parallels
Hmmm - oh well.... Give another try anyway...
After a few hours overnight - no errors except the known 2 above - no
Some timeouts have changed to
Some "ok:", similar number as before (and NONE of these are Dataset Images). Still small portion of 1599 total Plates!
Lots of these - equally distributed between all 5 servers:
|
Looking at the logs for the source of these
But I haven't yet found a good explanation for what could cause these timeouts breaking the SSH connection |
As omero-server, updated OMEZarrReader on all 5 idr-testing servers:
However, trying to view images in web (connected to omeroreadwrite:80) gave errors suggesting we're missing dependencies:
Reverting back to latest daily build...
|
Yesterday: ssh to each server in turn and ran
Later yesterday:
Today:
|
Found Created
|
Also did the same for idr0013 last 3 plates of idr0013.csv not converted (see above #696 (comment))
Checked all these Images - All are viewable in webclient except for idr0015 and idr0013 Plates fixed above - viewed to trigger memo file... |
Progress: current state...
Checking progress in webclient:
|
Need better way to monitor completed memo filesets. e.g.
On proxy server, make a list of all "ok" logs:
E.g. idr0010, check which Images are in ok logs - 80 / 148 are ok:
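One possible way to do this check, as a sketch only. It assumes each "ok" log line starts with `ok:` and contains an `Image:<id>` token; that line format and the IDs below are hypothetical, so adjust the regex to whatever the real logs look like.

```python
import re

# Assumption: successful entries look like "ok: ... Image:<id> ...".
OK_IMAGE = re.compile(r"^ok:.*\bImage:(\d+)")


def ok_image_ids(log_lines):
    """Collect Image IDs from lines that report 'ok'."""
    ids = set()
    for line in log_lines:
        match = OK_IMAGE.match(line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def coverage(expected_ids, log_lines):
    """Return (IDs that are ok, sorted IDs still missing) for one study."""
    expected = set(expected_ids)
    done = ok_image_ids(log_lines) & expected
    return done, sorted(expected - done)


# Hypothetical data, for illustration only.
expected = [101, 102, 103]
sample_log = [
    "ok: Pixels:11 Image:101 3.2",
    "fail: Pixels:12 Image:102 1.0 exception ome.conditions.ResourceError",
    "ok: Pixels:13 Image:103 2.5",
    "ok: Pixels:99 Image:999 1.1",  # belongs to another study, ignored
]

done, missing = coverage(expected, sample_log)
print(f"{len(done)} / {len(expected)} are ok; still to do: {missing}")
```

The `missing` list is what would go into a new ids file for a re-run.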
|
Updating iviewer to 0.14.0:
|
Team testing on idr-testing with microservices today was "slower than expected" with a few odd image rendering failures. Now, stopped memo generation and will target the completion of individual studies... Running individual Screens on different servers like this:
After a couple of minutes we have:
But also lots of
On Error counts are now:
9-fewer errors for omeroreadwrite and 9 more oks!
Also cancelled, delete logs and restarted other 4 readonly servers too... (12:35) |
15:39...
on omeroreadwrite, rename previous log dir so it doesn't show up in grep, start idr0125... - DONE for the 2 not ok images from idr0011a, these both had DatabaseBusy exceptions at 10:40
Both viewed OK in webclient - idr0011A DONE.
|
Overnight...
After few mins...
|
Start ALL plates memo file generation...
Start all fresh... in batches of 1500 rows with:
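For reference, a minimal sketch of splitting a list of rows into batches of 1500. The helper name `batches` and the row count are illustrative assumptions, not the command actually used here.

```python
from itertools import islice


def batches(rows, size=1500):
    """Yield consecutive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Illustrative input: 6762 fake "Image:<id>" rows.
rows = [f"Image:{i}" for i in range(6762)]
sizes = [len(chunk) for chunk in batches(rows)]
print(sizes)
```

Each chunk could then be written to its own ids file and handed to a different server.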
|
Cancel all memo generation (terminate screens) in prep for testing later this morning... Checking idr0012 logs on omeroreadonly-3..
Manually checked each in webclient. All viewed OK except for Checking idr0015 logs on omeroreadonly-4..
Took the Image:IDs not found in logs and put them in a file to run
Stuck on
Cancelled after ~ an hour (8:09) in prep for testing... |
Current status Summary:
|
Testing delayed until 1:30 today, so let's restart idr0015 last few on omeroreadonly-4...
Also run idr0011C and idr0011D on omeroreadonly-2 (4 plates, 8 plates).
All completed with "ok" (and checked in webclient). |
Restarting with e.g.:
|
idr-testing omeroreadonly-3. Testing started at 13:30 (12:30 GMT on logs):
First ERROR after that (using grep)...
with no other errors. Then... found a bunch of similar errors due to Database query times, starting at
|
Restart ALL memo file generation (includes idr0013A and idr0016 not completed)..
EDIT (2 days later) - I realise I should have used e.g. |
Later - still can't ssh to omeroreadonly-4
|
Briefly discussed with @khaledk2 @dominikl @jburel @pwalczysko: to reduce the number of possible collisions during the (heavy) memo file regeneration process, one option would be to stop all services that might be concurrently accessing the database, i.e. |
Cancelled all memo generation.
[EDIT: all nodes ran the same ids.txt with 6762 plates - so, ~ 2624/6762 less than half complete] All of the readonly servers have the same ResourceErrors:
The file at Actually, for those 2 plates (5th and 7th of idr008), there are NO Seb: likely due to ome/bioformats#3806 |
idr0070 delete & reimport
Following steps at #691 (comment) to fix idr0070...
Removed header and footers (4 rows) then...
Created
Done: logs checked as described
Re-annotate as at #691 (comment)
As omero-server
Needed to cache-bust with |
To check that rendering via s3 is working (not relying on goofys mount), tested on omeroreadwrite, unmounting and check that we can view image...
Able to view NGFF images (from idr0004) 👍
|
Checking available disk space on all servers ahead of today's testing...
|
idr0009 bfoptions update
Followed instructions at #684 (comment) |
Following successful testing of completed NGFF studies this morning, let's resume memo file generation for ALL data...
Running (stopping Prometheus first) e.g.
Datasets... Split
After about 1 hour...
Check directories (useful for counting
|
idr0043 ResourceErrors:
Affecting Datasets from e.g. 9101 -> 14734 - All seem to be idr0043:
Stack trace
```
2024-06-26 13:20:46,228 DEBUG [ loci.formats.Memoizer] (.Server-15) start[1719408046201] time[26] tag[loci.formats.Memoizer.setId]
2024-06-26 13:20:46,228 ERROR [ ome.io.bioformats.BfPixelBuffer] (.Server-15) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2021-06/05/12-54-52.237/141893_A_1_1.tif
2024-06-26 13:20:46,228 ERROR [ ome.io.nio.PixelsService] (.Server-15) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2021-06/05/12-54-52.237/141893_A_1_1.tif
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
    at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
    at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
    at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)
    at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:653)
    at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:571)
    at ome.services.RenderingBean$12.doWork(RenderingBean.java:2205)
    at jdk.internal.reflect.GeneratedMethodAccessor319.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at ome.services.util.Executor$Impl$Interceptor.invoke(Executor.java:568)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at ome.security.basic.EventHandler.invoke(EventHandler.java:154)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:119)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:282)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:249)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy101.doWork(Unknown Source)
    at ome.services.util.Executor$Impl.execute(Executor.java:447)
    at ome.services.util.Executor$Impl.execute(Executor.java:392)
    at ome.services.RenderingBean.getPixelBuffer(RenderingBean.java:2202)
    at ome.services.RenderingBean.load(RenderingBean.java:417)
    at jdk.internal.reflect.GeneratedMethodAccessor1342.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy122.load(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor1342.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:93)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy122.load(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor1417.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:172)
    at ome.services.throttling.Callback.run(Callback.java:56)
    at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
    at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:140)
    at ome.services.blitz.impl.RenderingEngineI.load_async(RenderingEngineI.java:316)
    at jdk.internal.reflect.GeneratedMethodAccessor1416.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at omero.cmd.CallContext.invoke(CallContext.java:85)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy124.load_async(Unknown Source)
    at omero.api._RenderingEngineTie.load_async(_RenderingEngineTie.java:248)
    at omero.api._RenderingEngineDisp.___load(_RenderingEngineDisp.java:1223)
    at omero.api._RenderingEngineDisp.__dispatch(_RenderingEngineDisp.java:2405)
    at IceInternal.Incoming.invoke(Incoming.java:221)
    at Ice.ConnectionI.invokeAll(ConnectionI.java:2536)
    at Ice.ConnectionI.dispatch(ConnectionI.java:1145)
    at Ice.ConnectionI.message(ConnectionI.java:1056)
    at IceInternal.ThreadPool.run(ThreadPool.java:395)
    at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
    at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
    at java.base/java.util.Objects.checkIndex(Objects.java:374)
    at java.base/java.util.ArrayList.get(ArrayList.java:459)
    at loci.formats.MetadataList.get(MetadataList.java:121)
    at loci.formats.SubResolutionFormatReader.getCurrentCore(SubResolutionFormatReader.java:238)
    at loci.formats.FormatReader.getPixelType(FormatReader.java:735)
    at loci.formats.MetadataTools.populatePixels(MetadataTools.java:149)
    at loci.formats.MetadataTools.populatePixels(MetadataTools.java:116)
    at loci.formats.in.BaseTiffReader.initMetadataStore(BaseTiffReader.java:426)
    at loci.formats.in.SVSReader.initMetadataStore(SVSReader.java:669)
    at loci.formats.in.BaseTiffReader.initMetadata(BaseTiffReader.java:99)
    at loci.formats.in.BaseTiffReader.initFile(BaseTiffReader.java:610)
    at loci.formats.FormatReader.setId(FormatReader.java:1480)
    at loci.formats.ImageReader.setId(ImageReader.java:865)
    at ome.io.nio.PixelsService$3.setId(PixelsService.java:869)
    at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
    at loci.formats.ChannelFiller.setId(ChannelFiller.java:258)
    at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
    at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:317)
    at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
    at loci.formats.Memoizer.setId(Memoizer.java:726)
    at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
    at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)
    ... 82 common frames omitted
2024-06-26 13:20:46,229 INFO [ org.perf4j.TimingLogger] (.Server-15) start[1719408046193] time[35] tag[omero.call.exception]
2024-06-26 13:20:46,229 INFO [ ome.services.util.ServiceHandler] (.Server-15) Excp: ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-13/2021-06/05/12-54-52.237/141893_A_1_1.tif
2024-06-26 13:20:46,229 INFO [ org.perf4j.TimingLogger] (.Server-15) start[1719408046181] time[47] tag[omero.call.exception]
```
|
Current status: ok:
ResourceError
3 hours later:
|
6:25 am
10:32 am
No progress on
Last Image:9538241 to be processed took half an hour (1542 secs) to process:
Checking logs for one of those 10 images: https://idr.openmicroscopy.org/webclient/?show=image-9544982 finds only:
Blitz log is still active (server not dead)! Checking logs against input ids... e.g.
Directory counts:
|
|
Looks like
Took the next 10 Image IDs from above:
And viewed them all in webclient (connected to omeroreadwrite), e.g. - took about a minute for each to render (memo) Now let's run same again on
Seems stuck again.... but not quite |
Nearly 2 hours later...
Only a few more plates processed. We need ~100 - ~300 more processed on each of Need to stop processing of Datasets on Last 100 rows
rows 6350...
|
Restart
|
Overnight: looking good... All
|
Yesterday, started deleting the contents of
created above (old memo files). Ran in a screen, deleting parts of
|
Testing today across ALL data in idr-testing... Eventually we ran into slow-down, and finally
Initially, some images took a long time to load, possibly due to lack of memo files (didn't complete for Dataset Images): e.g. Seeing lots of these blocks repeating just now (omeroreadonly-4):
Looking at memo files created during testing window - > 9:30 BST is > 8:30 GMT found 56 entries in
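As a sketch of the BST to GMT arithmetic and of filtering log lines to the testing window: this assumes the logs use the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen in the Blitz stack trace earlier in this thread, that they are written in GMT, and that BST is a fixed UTC+1 offset. The date and sample lines are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: British Summer Time as a fixed UTC+1 offset.
BST = timezone(timedelta(hours=1), "BST")


def to_log_time(local_naive, tz=BST):
    """Convert a local wall-clock time to the naive GMT/UTC time used in logs."""
    aware = local_naive.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def lines_since(lines, start_utc):
    """Yield lines whose leading timestamp is at or after start_utc."""
    for line in lines:
        try:
            stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # continuation lines (e.g. stack frames) have no timestamp
        if stamp >= start_utc:
            yield line


# Illustrative date; 9:30 BST becomes 8:30 GMT.
start = to_log_time(datetime(2024, 6, 28, 9, 30))

sample_lines = [
    "2024-06-28 08:15:00,000 INFO before the window",
    "2024-06-28 08:45:12,345 INFO inside the window",
    "    at some.stack.Frame(Frame.java:1)",
]
print(start, list(lines_since(sample_lines, start)))
```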
Similar story on
e.g.
and a few other images too:
|
Many error messages like this appear in the NGINX error log file on different nodes:
2024/06/28 11:08:26 [error] 1183#1183: *107253 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.2.190, server: test122-omeroreadonly-2.novalocal, request: "GET /webgateway/get_thumbnails/?id=8849964&id=8849508&id=8848993&id=8848857&id=8849556&id=8849157&id=8849484&id=8849596&id=8848937&id=8850072&id=8849153&id=8848901&id=8849029&id=8848737&id=8849716&id=8850000&id=8848881&id=8850104&id=8849141&id=8850040&id=8850236&id=8849640&id=8849365&id=8849856&id=8849868&id=8848849&id=8849213&id=8849800&id=8849676&id=8849297&id=8849728&id=8848797&id=8850208&id=8849193&callback=jQuery36201755339981256374_1719572434117&_=1719572434118 HTTP/1.0", upstream: "http://127.0.0.1:4080/webgateway/get_thumbnails/?id=8849964&id=8849508&id=8848993&id=8848857&id=8849556&id=8849157&id=8849484&id=8849596&id=8848937&id=8850072&id=8849153&id=8848901&id=8849029&id=8848737&id=8849716&id=8850000&id=8848881&id=8850104&id=8849141&id=8850040&id=8850236&id=8849640&id=8849365&id=8849856&id=8849868&id=8848849&id=8849213&id=8849800&id=8849676&id=8849297&id=8849728&id=8848797&id=8850208&id=8849193&callback=jQuery36201755339981256374_1719572434117&_=1719572434118", host: "idr-testing.openmicroscopy.org", referrer: "http://idr-testing.openmicroscopy.org/webclient/?show=project-1701"
|
Need to generate memo files for Dataset Images, excluding idr0043...
After ~15 mins.. most images have memos so far....
|
Also need to regenerate memo files for idr0009 - reported slow (missing) during testing and seen as
|
Missed memo files for idr0009-ScreenB Now on idr-next... omeroreadwrite:
20 mins later...
70 Plates - 1 ResourceError checked OK in webclient. |
Steps needed on idr-next for NGFF upgrade.
NB: current checklist is for actions on idr-testing (newly redeployed on 21st May 2024)
Detailed workflow is at https://github.com/IDR/mkngff_upgrade_scripts but this is an outline, also includes study-specific jobs:
Manual Software updates (should be part of the original deployment for idr-next):
NGFF and other updates:
$SECRET, update all sql commands with it and run them (see https://github.com/IDR/mkngff_upgrade_scripts)
mkngff symlink on all studies, including bfoptions creation
Regenerate thumbnails for idr0015 plate https://idr.openmicroscopy.org/webclient/?show=plate-4653 Plate 130-16, by going to an Image (e.g. http://localhost:1080/webclient/?show=image-3063425) and "Save To All"
idr00XX.csv and sql files from Filesets to swap idr-utils#56 into separate repos, then review/merge the PR.