Folder and disk allocation issues around priming #199
We should really update the docs for this; sorry you are stuck. This is an unfortunate side-effect of us trying to make a standalone deploy simple. For a production install, you really need to attach volumes that are used exclusively for disk allocations. Putting data on the system disk has some issues, as you have found; dedicating volumes to allocations keeps the free-space accounting described below accurate.
The "free space" on a volume is calculated as its real size minus the size of all allocations. Many of those allocations may be larger than the space actually consumed, since pipeline steps allocate before adding data, so simply looking at space consumed doesn't work. (A rough sketch of this accounting is at the end of this comment.)

It would seem possible to allow non-allocated data on the disk and to track it dynamically. In practice, measuring disk consumption by directory is slow enough that it would slow the system considerably, it would still be prone to race conditions, and it would still leave you exposed to the problems above.

If you have just one large, expensive disk holding the software/OS and you really want to use it for your data too, we recommend making two partitions; this lets the OS enforce a hard boundary. If you need a fast fix, or really want to track non-GMS space dynamically instead of partitioning, that can be done, but it is a band-aid: not recommended long-term, though it will let you control things until you can re-partition.
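To make the accounting concrete, here is a minimal Perl sketch of what the allocator treats as "free". The function and variable names are illustrative only; they are not the actual Genome::Disk API, which only surfaces in this thread through pieces like Volume.pm's sync_total_kb().

```perl
use strict;
use warnings;

# Illustrative only: allocation-based "free space" versus what df reports.
# Allocations reserve space up front, before any data is written into them.
sub free_kb_for_allocator {
    my ($volume_total_kb, @allocation_sizes_kb) = @_;
    my $reserved_kb = 0;
    $reserved_kb += $_ for @allocation_sizes_kb;
    return $volume_total_kb - $reserved_kb;
}

# Example: a 4 TB volume with a single 3 TB allocation outstanding.
# df may still show the disk nearly empty, but the allocator will refuse
# any request larger than the remaining 1 TB.
my $free_kb = free_kb_for_allocator(4_000_000_000, 3_000_000_000);
print "allocator sees $free_kb kB free\n";
```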
Pasting from #162 to keep this in the same thread:

"""
While I have done as fresh an install as possible, with new drives, and managed to circumvent this, it's the other issues we have reported recently that have been the real show-stoppers, particularly the fact that without us making direct changes to the "Creator.pm" code to invoke Volume.pm's "sync_total_kb()" method, our disk is considered to have run out of space long before the drive actually fills up.

It was this issue that caused us to have to rebuild our GMS setup altogether, along with issues around re-priming and an unfortunate problem where GMS started writing its temporary files to "/home/ubuntu" rather than /tmp (due to an accessibility problem caused by replacing /tmp with a symlink, which we then reverted). Despite re-creating the original setup as best we could, and despite various attempts using environment variables and poking around in the code, we were unable to get GMS to write to /tmp any more, and every job would fail due to lack of space in /home/ubuntu.

While I think we can consider this issue closed, I would very much appreciate your assistance with these other two, particularly as your colleague appears to have misunderstood our setup: we've been using an 80 GB root drive and two additional drives of 4 TB and 7 TB, mounted at /tmp and /opt respectively, which we understand to be your recommended setup.

Also, the replacement of /tmp with a symlink was done because we kept getting disk space errors, and believed that consolidating onto a single larger /opt disk and symlinking /tmp onto it would ensure that we never run out of temporary space due to a mismatch in size between the two volumes. While this was perhaps ill-advised, given how GMS handles storage, it did bring to light the rather show-stopping issues that occur when /tmp becomes unavailable.

Would very much appreciate your assistance with the /tmp and Volume.pm issues any time you can manage.
"""
We do have an instance of the SGMS here where /tmp is a symlink; specifically, I checked it with ls -lhd on /tmp and on /opt/gms/tmp. This seems to work fine as far as I can tell from workflow-server.err in a build directory, for example.

Could you check and let us know what the permissions for your symlink and the destination folder under /opt/ are?
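If it is easier to capture in one go, a small Perl script along these lines prints the same information as ls -lhd; the two paths below are from our box, so substitute your own symlink and target:

```perl
#!/usr/bin/env perl
# Prints mode, owner, and symlink target for each path, similar to `ls -lhd`.
# The paths below are from our setup; replace them with your own.
use strict;
use warnings;
use Fcntl ':mode';

for my $path ('/tmp', '/opt/gms/tmp') {
    my @st = lstat($path);
    unless (@st) {
        warn "cannot lstat $path: $!\n";
        next;
    }
    my $target = -l $path ? ' -> ' . readlink($path) : '';
    printf "%-16s mode %04o  uid %d  gid %d%s\n",
        $path, S_IMODE($st[2]), $st[4], $st[5], $target;
}
```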
Not certain what they started as. During troubleshooting I did a chmod a+rwx on the symlink, but perhaps it was too late. Once it had started writing to /home/ubuntu, it never stopped doing so, even when we restored the original /tmp partition (with full permissions for the ubuntu user, naturally). That's why we had to reinstall the whole machine: there seemed to be no way to get it to use /tmp again. The destination folder under /opt was /opt/tmp-store, and similarly world-writeable.

-- Liviu
I wonder if something like
Hi there,
We've had a great deal of trouble with GMS not re-assessing available folders and free space in the process of starting new runs. The issues we've discovered are twofold:
a) When the amount of free space available on a disk changes, GMS does not update its information.
We have noticed that GMS will doggedly abide by the "remaining space" measurement stored in its "Volume" object, as set when the machine was primed, even if the actual space on the device is greater than this.

In order to run anything, we have had to manually edit Creator.pm's _get_allocation_without_lock_impl() method to invoke the Volume object's sync_total_kb() method inside the for-loop, prior to checking for free space on candidate volumes (roughly as sketched below).
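The shape of our local patch, paraphrased from memory rather than copied verbatim: only _get_allocation_without_lock_impl() and sync_total_kb() are real names from Creator.pm and Volume.pm; everything else in this self-contained mock is a stand-in.

```perl
#!/usr/bin/env perl
# Mocked illustration of the change; not actual GMS code.
use strict;
use warnings;

package MockVolume;
sub new            { my ($class, %args) = @_; return bless {%args}, $class }
sub sync_total_kb  { $_[0]->{total_kb} = $_[0]->{real_kb_on_disk} }   # re-measure from the filesystem
sub unallocated_kb { $_[0]->{total_kb} - $_[0]->{allocated_kb} }

package main;

# A volume whose cached size (from priming time) is far smaller than its real size.
my @candidate_volumes = (
    MockVolume->new(total_kb => 100_000, allocated_kb => 90_000, real_kb_on_disk => 500_000),
);
my $kilobytes_requested = 200_000;

# Analogue of the for-loop in _get_allocation_without_lock_impl():
for my $candidate_volume (@candidate_volumes) {
    $candidate_volume->sync_total_kb();   # our addition: refresh the stale size before checking
    if ($candidate_volume->unallocated_kb >= $kilobytes_requested) {
        print "volume accepted\n";        # with the refresh, the real free space is visible
    } else {
        print "volume rejected\n";        # without the refresh, this branch was always taken
    }
}
```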
b) When the /tmp folder is replaced with a symlink, GMS begins to write to the home directory instead.
We replaced the /tmp folder with a symlink to /opt/tmp-store, because our tmp disk seemed insufficient for the large amount of scratch space GMS appears to require. However, doing this caused GMS to switch its tmp directory to the user's home directory, leading to poor performance and a filling root drive.
We reverted our change, and the /tmp directory is back to normal, but the issue with /home/user being used in place of /tmp persists, and appears to be a configuration that cannot be changed by the end-user.
The temp directory appears to be set when the system is primed, and it seems that re-priming or editing this configuration is impossible... but it has changed on us anyway, so we would like to know how to get into the database or other configuration location to revert this change. We've checked /etc/genome.conf, and the path to the temp directory is not stored there... we would appreciate a solution.
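For what it's worth, standard Perl resolves its temp directory from the TMPDIR environment variable before falling back to /tmp. This is generic Perl behavior, not necessarily how GMS picks its location, but it would explain temporary files landing in a home directory if TMPDIR pointed there when the services were started. A small check using only core modules:

```perl
#!/usr/bin/env perl
# Generic Perl temp-directory resolution (core modules only; not GMS-specific).
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

print "TMPDIR env var:              ", (defined $ENV{TMPDIR} ? $ENV{TMPDIR} : '(unset)'), "\n";
print "File::Spec->tmpdir resolves to: ", File::Spec->tmpdir, "\n";

# File::Temp creates its scratch directories under the same resolved location.
my $scratch = tempdir(CLEANUP => 1);
print "File::Temp scratch lives in:  $scratch\n";
```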
Regards,
-- Liviu & Shu at Garvan