Add docs #25

Merged
merged 11 commits into from
Jun 10, 2024
21 changes: 21 additions & 0 deletions INSTALL/install.sh
@@ -124,3 +124,24 @@ mkdir -p /var/log/reform
cp ./conf/supervisor*ini /etc/supervisord.d/
systemctl start supervisord
systemctl enable supervisord


# Create the following cron job:
crontab -e
# Add the following line to the crontab and save it (runs cleanup.sh daily at 1:00 am):
# 0 1 * * * cd /home/reform/reformWeb && /bin/bash cleanup.sh

# Create a local folder for the test-site data files
mkdir -p /home/reform/reformWeb/staticData
Collaborator review comment: Should we add this folder to .gitignore? The static files are large, probably too large for GitHub. We could investigate using the test data from the reform repo: https://github.com/gencorefacility/reform/tree/master/test_data.

cd /home/reform/reformWeb/staticData
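# Create a directory for the reference sequences and download the example reference sequences,
# for example:
mkdir ref
cd ref
wget --no-check-certificate -nv ftp://ftp.ensembl.org/pub/release-88/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz
wget --no-check-certificate -nv ftp://ftp.ensembl.org/pub/release-88/gff3/mus_musculus/Mus_musculus.GRCm38.88.gff3.gz
# Decompress them: DEFAULT_FILES in app.py points to the uncompressed .fa and .gff3 files
gunzip Mus_musculus.GRCm38.dna.toplevel.fa.gz Mus_musculus.GRCm38.88.gff3.gz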
# Create directories for the inserted sequences and the upstream/downstream sequences
mkdir ../inserted
mkdir ../up-down-seq
# Upload the test files to inserted/ and up-down-seq/.
# Then update the local file paths in DEFAULT_FILES (app.py, submit_test()) to match.
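A test job can fail with a `FileNotFoundError` in `worker.err.log` if one of these local files is missing or was left compressed. The following minimal Python sketch is not part of this PR. It checks a `DEFAULT_FILES`-style mapping before you start a test job. The function name `missing_test_files` is hypothetical, and the sketch assumes that relative paths are resolved against the `reformWeb` directory.

```python
from pathlib import Path


def missing_test_files(default_files, base_dir="."):
    """Return (key, path, hint) for each local test file that does not exist.

    default_files maps form-field names to paths, like DEFAULT_FILES in app.py;
    relative paths are resolved against base_dir (assumed: the reformWeb directory).
    """
    problems = []
    for key, rel_path in default_files.items():
        path = Path(base_dir) / rel_path
        if path.is_file():
            continue
        # A leftover .gz next to the expected name usually means it was not decompressed.
        gz_copy = path.with_name(path.name + ".gz")
        hint = "found .gz copy; decompress it" if gz_copy.is_file() else "not found"
        problems.append((key, str(path), hint))
    return problems
```

For example, call `missing_test_files(DEFAULT_FILES, "/home/reform/reformWeb")` with the mapping from `submit_test()`. An empty list means every listed file is in place.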
38 changes: 38 additions & 0 deletions README.md
@@ -27,6 +27,20 @@ Installation script found at [`./INSTALL/install.sh`](https://github.com/gencore
- After submission, the data and files will be gathered and submitted to a message queue to run [*ref*orm](https://github.com/gencorefacility/reform)
- Whether the job succeeds or fails, an e-mail will be sent to the e-mail address provided

## Using the Test Site
The test site helps developers and researchers test updates quickly by using local files and pre-populated default values. Upload the files used for testing in advance, as described in `INSTALL/install.sh`.

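- Fill in the form with the required parameters:
    * `email` Use the default test e-mail address.
    * `chrom` Defaults to 1.
    * `position` Provided by the user. If it is left empty, the upstream/downstream FASTA files are used instead.
    * `upstream_fasta` Used only when `position` is empty. If no file is uploaded, the local `upstream_fasta` file is used.
    * `downstream_fasta` Used only when `position` is empty. If no file is uploaded, the local `downstream_fasta` file is used.
    * `in_fasta` If no file is uploaded, the local `in_fasta` file is used.
    * `in_gff` If no file is uploaded, the local `in_gff` file is used.
    * `ref_fasta` If you enter the example Ensembl FTP link, it is replaced with the local `ref_fasta` file; any other value is used as given.
    * `ref_gff` If you enter the example Ensembl FTP link, it is replaced with the local `ref_gff` file; any other value is used as given.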
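The rules above can be modeled as a small Python sketch of the input selection in `submit_test()`. This is a simplified model, not the app's code. The real `upload_test()` helper is not shown in this PR, so `pick()` below is a hypothetical stand-in that prefers an uploaded filename and otherwise returns the local default. The `ref_gff` example link is assumed to be the Ensembl GFF3 URL from `INSTALL/install.sh`, because that comparison is collapsed in the diff.

```python
EXAMPLE_LINKS = {
    "ref_fasta": "ftp://ftp.ensembl.org/pub/release-88/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz",
    # Assumed: the ref_gff comparison is collapsed in the diff; this is the URL from install.sh.
    "ref_gff": "ftp://ftp.ensembl.org/pub/release-88/gff3/mus_musculus/Mus_musculus.GRCm38.88.gff3.gz",
}


def pick(uploaded, key, default_files):
    """Hypothetical stand-in for upload_test(): the user's upload wins, else the local default."""
    return uploaded.get(key) or default_files[key]


def resolve_test_inputs(form, uploaded, default_files, example_links=EXAMPLE_LINKS):
    """Return the file or URL chosen for each input, following the rules above."""
    files = {key: pick(uploaded, key, default_files) for key in ("in_fasta", "in_gff")}
    # Upstream/downstream FASTA are only used when no position is given.
    files["upstream_fasta"] = None
    files["downstream_fasta"] = None
    if not form.get("position"):
        for key in ("upstream_fasta", "downstream_fasta"):
            files[key] = pick(uploaded, key, default_files)
    # The example FTP link is swapped for the local copy; anything else passes through.
    for key in ("ref_fasta", "ref_gff"):
        value = form.get(key, "")
        files[key] = default_files[key] if value == example_links[key] else value
    return files
```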

## Troubleshooting

### Error or Unexpected behavior when submitting form
@@ -49,6 +63,30 @@ reform: started

### How to monitor logs
Logs are written to `/var/log/reform`. Most of the echo output is produced by `run.sh`; edit it as needed for debugging.
* `reform.err.log` Contains errors from Flask (e.g., errors in `app.py`) and ERROR and INFO logs from Gunicorn.
* `reform.out.log` Contains standard output from Gunicorn.
* `worker.out.log` Contains echo outputs from run.sh, indicating the current task of the worker.
* `worker.err.log` Contains errors from reform.py and FTP download records.
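On the server, running `tail -n` on each file is usually enough. The Python sketch below does the same for all four logs at once, which can be handy inside a script. It assumes you have read access to `/var/log/reform`. `tail` and `show_logs` are hypothetical helpers, not part of reformWeb.

```python
from collections import deque
from pathlib import Path

# The four log files described above.
LOG_FILES = ["reform.err.log", "reform.out.log", "worker.out.log", "worker.err.log"]


def tail(path, n=20):
    """Return the last n lines of a text file (like `tail -n`), newlines included."""
    with open(path, errors="replace") as fh:
        return list(deque(fh, maxlen=n))


def show_logs(log_dir="/var/log/reform", n=20):
    """Print the end of each reform log, noting any file that does not exist."""
    for name in LOG_FILES:
        path = Path(log_dir) / name
        print(f"==> {path} <==")
        if path.is_file():
            print("".join(tail(path, n)).rstrip("\n"))
        else:
            print("(missing)")
```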

### Receive an Email with a Zip of an Empty Download Folder
An empty download folder means `reform` did not finish, so the `result` and `download` folders are empty. To debug this, check the logs in `/var/log/reform`. Common issues:

**In `worker.err.log`:**

1. **`OSError: [Errno 28] No space left on device`**:
    - The server's disk is full. Free up space by cleaning up files in `/data/downloads/`, `/data/results/`, and `/data/uploads/`.
2. **`FileNotFoundError: [Errno 2] No such file or directory`**:
    - An input file is missing. Check the folder named in the error, especially when using the test site.
3. **`./run.sh: line XX: syntax error near unexpected token`**:
    - An invalid input has been passed to `run.sh`, usually an invalid FTP link.
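The messages above can be matched automatically. This Python sketch scans log lines for the three messages listed and returns the suggested fix for each match. The pattern list is copied from this section and is not exhaustive. `diagnose` is a hypothetical helper.

```python
import re

# (pattern, suggested fix) pairs taken from the list above.
KNOWN_ERRORS = [
    (re.compile(r"\[Errno 28\] No space left on device"),
     "Disk full: clean up /data/downloads/, /data/results/ and /data/uploads/."),
    (re.compile(r"\[Errno 2\] No such file or directory"),
     "Missing input file: check the folder named in the error (on the test site, staticData/)."),
    (re.compile(r"run\.sh: line \d+: syntax error near unexpected token"),
     "Invalid input passed to run.sh, usually a bad FTP link."),
]


def diagnose(lines):
    """Return (line_number, hint) for every line matching a known error."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for pattern, hint in KNOWN_ERRORS:
            if pattern.search(line):
                hits.append((lineno, hint))
    return hits
```

For example: `with open("/var/log/reform/worker.err.log") as fh: print(diagnose(fh))`.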

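### Get an Error Status Code When Accessing reform Web
An error status code usually means the web service is not running as expected.
1. `502 Bad Gateway`: A server in front of the app (such as a reverse proxy) did not get a valid response from the application server. For example, the reform service may be down, or a Gunicorn worker may have crashed or timed out. Check that the service is running (for example, with `supervisorctl status`) and review `app.py` for routes that may fail or hang.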
2. `500 Internal Server Error`: This error indicates that the server encountered an unexpected condition that prevented it from fulfilling the request. Check `/var/log/reform/reform.err.log` for detailed information.
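To check the status code from the server itself, you can use something like the following Python sketch. The URL you pass depends on your deployment (for instance, the local Gunicorn or proxy address). The hint texts only restate this section. The sketch deliberately bypasses any HTTP proxy configured in the environment, so that it connects to the given host directly.

```python
import urllib.error
import urllib.request

# Hints restating the two cases described above.
HINTS = {
    502: "Bad Gateway: check that the reform service is running and review app.py.",
    500: "Internal Server Error: see /var/log/reform/reform.err.log.",
}

# An opener with an empty ProxyHandler ignores http_proxy/https_proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_site(url, timeout=10):
    """Return (status, hint) for a GET request to url; status is None if unreachable."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status, "OK"
    except urllib.error.HTTPError as err:
        return err.code, HINTS.get(err.code, "Unexpected status; check /var/log/reform.")
    except OSError as err:  # URLError, connection refused, timeouts
        return None, f"Unreachable: {err}"
```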

### Cannot SSH
SSH is only accessible over the NYU VPN.
42 changes: 22 additions & 20 deletions app.py
@@ -14,13 +14,14 @@
app.secret_key = 'development key'

UPLOAD_FOLDER = './uploads'
# Form fields whose files are uploaded to the server by the user.
UPLOAD_FILES = ['in_fasta', 'in_gff']
DOWNLOAD_FILES = ['ref_fasta', 'ref_gff']


# Route for submitting data on the production site.
@app.route('/', methods=['GET', 'POST'])
def submit():
form = SubmitJob(request.form)
# Validate user input according to the validation rules defined in forms.py.
if request.method == 'POST' and form.validate():
if (request.files['downstream_fasta'].filename or request.files['upstream_fasta'].filename) and request.form[
'position']:
@@ -62,9 +63,12 @@ def submit():
upload(target_dir, 'upstream_fasta')
upload(target_dir, 'downstream_fasta')

# (4) Send the job to the backend
# Connect to the Redis server and initialize a queue
redis_conn = Redis()
q = Queue(connection=redis_conn, default_timeout=3000)

# Push job function and parameters into RQ
job = q.enqueue(redisjob, args=(target_dir,
timestamp,
request.form['email'],
@@ -79,16 +83,17 @@ def submit():
result_ttl=-1,
job_timeout=3000
)
# (5) Update record in the database and flash message on the user front-end
db_update(timestamp, "jobID", job.get_id())
flash(Markup('JOB ID: ' + job.get_id() + '<br>' +
"You'll receive an e-mail when job is done with download link"), 'info')
return render_template('form.html', form=form)

# test site
# Route for submitting data on the test site
@app.route('/test', methods=['GET', 'POST'])
def submit_test():

# Default in_fasta and in_gff
# Path for local files
DEFAULT_FILES = {
'ref_fasta': './staticData/ref/Mus_musculus.GRCm38.dna.toplevel.fa',
'ref_gff': './staticData/ref/Mus_musculus.GRCm38.88.gff3',
@@ -97,8 +102,8 @@ def submit_test():
'upstream_fasta': './staticData/up-down-seq/test-up.fa',
'downstream_fasta': './staticData/up-down-seq/test-down.fa'
}

form = Testjob(request.form) # test job
# Validate user input based on test site rule
form = Testjob(request.form)
if request.method == 'POST' and form.validate():
if (request.files['downstream_fasta'].filename or request.files['upstream_fasta'].filename) and request.form[
'position']:
@@ -108,11 +113,6 @@ def submit_test():
if not (request.files['downstream_fasta'].filename and request.files['upstream_fasta'].filename):
flash("Error: Must enter both upstream and downstream", 'error')
return redirect(url_for('submit_test'))
# # comment out the condition check, since we allowed default up/down stream fasta
# if not (request.files['downstream_fasta'].filename or request.files['upstream_fasta'].filename) and not \
# request.form['position']:
# flash("Error: You must provide either the position, or the upstream and downstream sequences.", 'error')
# return redirect(url_for('submit'))
else:
# User Submits Job #
# (1) Create unique ID for each submission
@@ -131,23 +131,22 @@ def submit_test():
if not verified:
return redirect(url_for('submit_test'))

# Upload Files to UPLOAD_DIR/timestamp/ and save the name into uploaded_files or use local files
# Choose to upload new files or use local files
if verified:
# Storing all files that will be passed to run.sh
uploaded_files = {}
for file_key in UPLOAD_FILES: # upload inserted files
for file_key in UPLOAD_FILES:
uploaded_files[file_key] = upload_test(target_dir, file_key, DEFAULT_FILES)
# set default None to up/down stream fasta
# Set default None for upstream/downstream fasta
uploaded_files['upstream_fasta'] = None
uploaded_files['downstream_fasta'] = None


# Upload upstream/downstream files when position is not provided
if not request.form['position']:
# Handle case where position is not provided and upstream/downstream files are required
for file_key in ['upstream_fasta', 'downstream_fasta']:
uploaded_files[file_key] = upload_test(target_dir, file_key, DEFAULT_FILES)

# Replace Ref Sequence files with local file realpath
# Replace Ref Sequence with local path if example ftp detected
if request.form['ref_fasta'] == 'ftp://ftp.ensembl.org/pub/release-88/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz':
uploaded_files['ref_fasta'] = DEFAULT_FILES['ref_fasta']
else:
@@ -157,8 +156,9 @@ def submit_test():
else:
uploaded_files['ref_gff'] = request.form['ref_gff']

# Use same Redis for production site and test site
redis_conn = Redis() # initializes a connection to the default Redis server running on localhost
# (4) Send job to the backend
# Use the same Redis queue as the production site
redis_conn = Redis()
q = Queue(connection=redis_conn, default_timeout=3000)

job = q.enqueue(redisjob, args=(target_dir,
@@ -176,11 +176,13 @@ def submit_test():
result_ttl=-1,
job_timeout=3000
)
# (5) Update record in the database and flash message on the user front-end
db_update(timestamp, "jobID", job.get_id())
flash(Markup('JOB ID: ' + job.get_id() + '<br>' +
"You'll receive an e-mail when job is done with download link"), 'info')
return render_template('form.html', form=form)

# Route for downloading result
@app.route('/download/<timestamp>')
def downloadFile(timestamp):
try:
68 changes: 0 additions & 68 deletions cleanup.py

This file was deleted.

30 changes: 30 additions & 0 deletions cleanup.sh
@@ -0,0 +1,30 @@
#!/bin/bash

# This script cleans up old directories in the downloads folder (./downloads) to free up disk space.
# It finds every subdirectory of the downloads folder whose modification time is more than 169 hours
# (one week plus a one-hour margin) in the past and removes it together with all of its contents.
# Only directories are matched (find -type d); loose files directly under the downloads folder are not deleted.

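# Define the path to the downloads directory. Resolve it relative to this script so that the
# script also works under cron, which runs jobs from the user's home directory.
script_dir="$(cd "$(dirname "$0")" && pwd)"
download_folder="$script_dir/downloads/"
# cutoffMins = 1 week (168 hours) + 1 hour, in minutes (the unit that find -mmin expects)
cutoffMins=$((169 * 60)) # the extra hour handles the edge case of a job that finished at 1:00 am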

# Traverse the download folder and remove directories older than cutoff time
# -mindepth 1: Excludes the top-level directory, includes only subdirectories
find "$download_folder" -mindepth 1 -type d -mmin +$((cutoffMins)) | while read -r dirpath; do
# echo "$dirpath"
if [ -d "$dirpath" ]; then
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
rm -rf "$dirpath"
# Captures the success of the rm command
result=$?
if [ $result -eq 0 ]; then
echo "Removed old directory: $dirpath - $timestamp"
else
echo "Error removing directory: $dirpath - $timestamp"
fi
fi
done
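To test the age cutoff without deleting anything, the same idea can be sketched in Python. This is an illustration, not a replacement for `cleanup.sh` (this PR deletes `cleanup.py` in favor of the shell script). `old_dirs` and `cleanup` are hypothetical names. As with `find -mmin`, ages come from directory modification times. Unlike the `find` pipeline, the sketch does not descend into a directory it has already marked for removal. With `dry_run=True`, it only prints what would be removed.

```python
import os
import shutil
import time
from pathlib import Path

# Same cutoff as cleanup.sh: 168 hours (one week) plus a one-hour margin, in minutes.
CUTOFF_MINUTES = 169 * 60


def old_dirs(download_folder, cutoff_minutes=CUTOFF_MINUTES, now=None):
    """List subdirectories whose modification time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - cutoff_minutes * 60  # convert minutes to seconds
    found = []
    for root, dirs, _files in os.walk(download_folder):
        keep = []
        for name in dirs:
            path = Path(root) / name
            if path.stat().st_mtime < cutoff:
                found.append(path)
            else:
                keep.append(name)
        dirs[:] = keep  # do not descend into directories already marked for removal
    return found


def cleanup(download_folder, cutoff_minutes=CUTOFF_MINUTES, dry_run=True):
    """Remove (or, with dry_run=True, only report) directories older than the cutoff."""
    for path in old_dirs(download_folder, cutoff_minutes):
        if not path.exists():
            continue
        if dry_run:
            print(f"Would remove: {path}")
        else:
            shutil.rmtree(path)
            print(f"Removed old directory: {path}")
```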

4 changes: 2 additions & 2 deletions forms.py
@@ -4,7 +4,7 @@

ALLOWED_EXTENSIONS = {'fa', 'gff', 'gff3', 'gtf', 'fasta', 'fna', 'tar', 'gz'}


# Form used on the production site
class SubmitJob(Form):
email = StringField('Email Address',
description="When job is complete this e-mail will receive the download links",
@@ -74,7 +74,7 @@ class SubmitJob(Form):
InputRequired()
])

# form for test job
# Form used on the test site
class Testjob(Form):
email = StringField('Email Address',
description="When job is complete this e-mail will receive the download links",
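The diff shows `ALLOWED_EXTENSIONS` in `forms.py` but not where the set is checked. The sketch below shows a common Flask-style check. It is an assumption, not reformWeb's actual validator. It looks only at the text after the last dot, so a compressed file such as `genome.fa.gz` is accepted through `gz`.

```python
# Copied from forms.py.
ALLOWED_EXTENSIONS = {'fa', 'gff', 'gff3', 'gtf', 'fasta', 'fna', 'tar', 'gz'}


def allowed_file(filename):
    """Hypothetical helper: True if the last extension is in ALLOWED_EXTENSIONS."""
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
```

One consequence of this design is that any `.gz` file passes, whatever it contains.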