Changes to wrapper to integrate with Bibiserv #62
Why would you replace sort_values with sort_index?
DataFrame.sort_values() seems to be available only from Pandas 0.17.0 onwards, and
the version in Ubuntu Trusty is 0.13.0. No backports are available either.
We could build the Docker image with a later Ubuntu release, though.
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#changes-to-sorting-api
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.sort_values.html
https://stackoverflow.com/questions/19332171/difference-between-sort-values-and-sort-index
On 2 August 2017 at 13:58, aweimann wrote:
> Why would you replace sort_values with sort_index?
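The version constraint discussed above can also be handled in code rather than by changing the base image. The following is a minimal sketch, not the approach taken in this pull request: the helper name `sort_by_columns` and the sample rows are hypothetical, and it assumes that older pandas releases accept a `by` argument to `DataFrame.sort_index`, as the diff under review does.

```python
import pandas as pd


def sort_by_columns(df, columns):
    """Return a sorted copy of df, using whichever sorting API is present."""
    if hasattr(df, "sort_values"):
        # pandas 0.17.0 and later
        return df.sort_values(by=columns)
    # Older pandas: sort_index accepted a `by` argument for column sorting
    return df.sort_index(by=columns)


# Hypothetical hits table using the column names from the diff
hits = pd.DataFrame({
    "target name": ["gene2", "gene1", "gene1"],
    "query name": ["PF00002", "PF00003", "PF00001"],
})
sorted_hits = sort_by_columns(hits, ["target name", "query name"])
```

Returning a sorted copy instead of sorting in place keeps the helper free of side effects; the caller decides whether to rebind the name.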
Oh I see. The solution is to use bioconda / conda for installing the dependencies. Actually, I'm currently working on making a conda recipe for Traitar itself. Could you use that as well? https://conda.io/docs/install/quick.html
- sample_table = pd.DataFrame([sample_file_names, sample_cat.loc[sample_file_names,]])
+ sample_table = pd.DataFrame(sample_file_names)
+ categories = pd.Series(sample_cat.loc[sample_file_names, ]['category'].tolist())
+ sample_table['category'] = categories
  sample_table.columns = ["sample_file_name", "category"]
Thanks for the changes; I don't think I had properly tested this one!
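For illustration, the same two-column table can be built in a single constructor call. This is only a sketch under assumptions: the function name `build_sample_table` and the sample data are hypothetical, and `sample_cat` is assumed to be a DataFrame indexed by the cleaned file names with a `category` column, as the diff suggests.

```python
import pandas as pd


def build_sample_table(sample_file_names, sample_cat):
    """Pair each sample file name with its category from sample_cat."""
    # Look the categories up in file-name order and drop the index,
    # so the values align by position rather than by label.
    categories = sample_cat.loc[sample_file_names, "category"].tolist()
    return pd.DataFrame(
        {"sample_file_name": sample_file_names, "category": categories},
        columns=["sample_file_name", "category"],  # fix the column order
    )


# Hypothetical inputs; the index order deliberately differs from the name list
sample_cat = pd.DataFrame({"category": ["marine", "soil"]}, index=["s2", "s1"])
sample_table = build_sample_table(["s1", "s2"], sample_cat)
```

Passing plain lists avoids the original problem of handing the constructor a list whose second element is itself a DataFrame.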
traitar/hmmer2filtered_best.py
Outdated
@@ -49,7 +49,7 @@ def aggregate_domain_hits(filtered_df, out_f):
  #sort by gene identifier and Pfam
  with open(out_f, 'w') as out_fo:
  ps.DataFrame(filtered_df.columns).T.to_csv(out_f, sep = "\t", index = False, header = False, mode = 'a')
- filtered_df.sort_values(by = ["target name", "query name"], inplace = True)
+ filtered_df.sort_index(by = ["target name", "query name"], inplace = True)
Why would you do this?
Bioconda sounds good. So we can just change all the dependencies in the
Dockerfile (including Traitar).
In traitar/traitar_from_archive.py:
The second item passed to pd.DataFrame was itself a DataFrame rather than a list.
In traitar/hmmer2filtered_best.py:
As discussed, sort_values is only available from Pandas 0.17.0.
On 2 August 2017 at 14:14, aweimann wrote:
> Oh I see. The solution is to use bioconda / conda for installing the
> dependencies. Actually, I'm currently working on making a conda recipe for
> Traitar itself. Could you use that as well?
>
> In traitar/traitar_from_archive.py:
> > #replace index with cleaned file names
> > sample_cat.index.rename(str, dict([(tf, sfn) for sfn, tf in zip(sample_file_names, namelist)]))
> > - sample_table = pd.DataFrame([sample_file_names, sample_cat.loc[sample_file_names,]])
> > + sample_table = pd.DataFrame(sample_file_names)
> > + categories = pd.Series(sample_cat.loc[sample_file_names, ]['category'].tolist())
> > + sample_table['category'] = categories
> > sample_table.columns = ["sample_file_name", "category"]
> thanks for the changes; I think I haven't really properly tested this one!
>
> In traitar/hmmer2filtered_best.py:
> > @@ -49,7 +49,7 @@ def aggregate_domain_hits(filtered_df, out_f):
> > #sort by gene identifier and Pfam
> > with open(out_f, 'w') as out_fo:
> > ps.DataFrame(filtered_df.columns).T.to_csv(out_f, sep = "\t", index = False, header = False, mode = 'a')
> > - filtered_df.sort_values(by = ["target name", "query name"], inplace = True)
> > + filtered_df.sort_index(by = ["target name", "query name"], inplace = True)
> Why would you do this?