# Run, and document, the CDMLite marine workflow #68

Test run, and document, the CDMLite marine workflow to:

The r3.0 marine files are located in:
Here is a dump of what we know so far...

### Re-structuring the input files

The first stage involves restructuring the input data so that the files are in a structure that can be directly loaded into PostgreSQL database partitions (using the

These tasks are run on the high-memory nodes (as defined in the script) and can take up to 24 hours to complete. The restructuring task also includes various functions, checks, and fixes such as:
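As a minimal sketch of submitting this step, assuming LOTUS accepts SLURM jobs and provides a high-memory partition (the partition name, resource limits, and script name below are illustrative, not taken from the actual workflow):

```bash
# Hypothetical SLURM submission of the restructuring step (all names are
# assumptions): request a high-memory node, allow a full day of runtime,
# and capture output per job ID.
sbatch --partition=high-mem --mem=64G --time=24:00:00 \
       --output=logs/restructure-marine-%j.out \
       --wrap="python restructure_marine.py"
```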
### Generating SQL scripts

After restructuring the files, a separate script is run as a set of LOTUS jobs to generate SQL scripts that will load the restructured pipe-separated (PSV) files:
This writes a set of commands to load entire PSV files directly into database partitions, e.g.:
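As a sketch of what such a generated command might look like, assuming PostgreSQL's `\copy` meta-command is used and with an illustrative schema, partition, and file naming scheme (none of these names are from the actual scripts):

```bash
# Hypothetical generated load command (schema, table, and file names are
# illustrative). \copy streams the whole PSV file into the target partition.
psql "$DB_CONN" -c "\copy cdmlite.observations_2010_marine FROM '/work/psv/2010/marine-2010-01.psv' WITH (FORMAT csv, DELIMITER '|', HEADER)"
```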
### Loading the data into the database partitions

The loader script is run as a single process for marine data. It is spawned under `nohup` so that it will complete even if the SSH connection to the server is interrupted:
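A minimal sketch of such an invocation (the script name and log path are hypothetical):

```bash
# Run the loader detached from the controlling terminal: nohup ignores the
# hangup signal, output is redirected to a log, and & backgrounds the process.
nohup python load_marine_data.py > logs/load_marine.log 2>&1 &
```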
Log files are written to:
The logs can be analysed for any errors. A successful process will report the number of records copied into the database partition per PSV file, e.g.:
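PostgreSQL acknowledges each successful bulk load with a `COPY <row-count>` command tag, so a healthy log would typically contain one such line per PSV file loaded, e.g. (row counts illustrative):

```
COPY 184532
COPY 192077
```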