Debugging ML #143

Merged: agitter merged 13 commits into Reed-CompBio:master on Jun 14, 2024

ntalluri
Collaborator

No description provided.

@ntalluri
Collaborator Author

I was seeing these warnings when running the ML post-processing on the PathLinker output:

Running Docker containers
/opt/anaconda3/envs/spras/lib/python3.8/site-packages/sklearn/decomposition/pca.py:527: RuntimeWarning: invalid value encountered in divide
  explained_variance_ratio_ = explained_variance_ / total_var
/opt/anaconda3/envs/spras/lib/python3.8/site-packages/seaborn/matrix.py:615: UserWarning: Attempting to set identical left == right == 0 results in singular transformations; automatically expanding.
  ax.set_xlim(0, max_dependent_coord * 1.05)
/opt/anaconda3/envs/spras/lib/python3.8/site-packages/scipy/cluster/hierarchy.py:2845: UserWarning: Attempting to set identical bottom == top == 0 results in singular transformations; automatically expanding.
  ax.set_ylim([0, dvw])

However, these warnings match what I would expect to see in the output files, and the files do look that way, so I think the code is working.
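The sklearn warning is consistent with a 0 / 0 division: if the input matrix has zero total variance (for example, assuming every pathway in the summary contains exactly the same edges), every explained variance is 0, so dividing by their sum yields NaN. The sketch below is a simplified, hypothetical reproduction of that arithmetic with plain NumPy (centering, SVD, variance from singular values); it is not the SPRAS code and not sklearn's exact implementation, and the toy matrix is made up for illustration.

```python
import warnings

import numpy as np

# Hypothetical binary edge matrix: 3 pathways (rows) x 4 edges (columns),
# where every pathway contains exactly the same edges.
X = np.array([[1, 0, 1, 1],
              [1, 0, 1, 1],
              [1, 0, 1, 1]], dtype=float)

# Center each column, as PCA does before the decomposition.
X_centered = X - X.mean(axis=0)

# Identical rows center to all zeros, so every singular value is 0.
singular_values = np.linalg.svd(X_centered, compute_uv=False)
explained_variance = singular_values ** 2 / (X.shape[0] - 1)
total_var = explained_variance.sum()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # 0 / 0 produces NaN and a RuntimeWarning ("invalid value encountered ...").
    explained_variance_ratio = explained_variance / total_var

print(total_var)                 # 0.0
print(explained_variance_ratio)  # [nan nan nan]
```

If this is the cause, the NaN ratios are expected for identical inputs, which fits the observation that the output files still look right.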

Collaborator

@agitter agitter left a comment

This general strategy for fixing the errors looks good.

spras/analysis/ml.py: two review threads (outdated, resolved)
@ntalluri
Collaborator Author

I've finished addressing the review comments on the ML bug pull request.

@ntalluri
Collaborator Author

In VS Code, I am getting a strange highlighting issue: after the DataFrame is concatenated in the ML summarize_networks function, all of the following lines are shown as unreachable. However, when I tested the code, those lines do seem to be reachable, and the function returns the DataFrame. When you review again, could you check whether anything similar happens on your end?

    concated_df = pd.concat(edge_dataframes, axis=1, join='outer')  # all lines of code after this line are "unreachable"
    concated_df = concated_df.fillna(0)
    concated_df = concated_df.astype('int64')

    # don't do ml post processing if there is an empty dataframe or the number of samples is <= 1
    if concated_df.empty:
        raise OSError("ML post-processing cannot proceed because the summarize network dataFrame is empty.\nWe suggest setting ml's include: true in the configuration file to false to avoid this error.")
    if min(concated_df.shape) <= 1:
        raise OSError(f"ML post-processing cannot proceed because the available number of pathways is insufficient. The ml post processing requires more than one pathway, but currently, there are only {min(concated_df.shape)} pathways.")

    return concated_df
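To see the concat-and-guard steps in isolation, here is a minimal sketch on made-up data. The per-pathway frames, their edge labels, and the helper name `combine_and_check` are all hypothetical, and the error messages are shortened; it only assumes, as the snippet suggests, that each input frame is indexed by edge with one 0/1 column per pathway.

```python
import pandas as pd

# Hypothetical per-pathway edge tables: index = edge, one column per pathway,
# value 1 meaning the edge appears in that pathway.
pathway_a = pd.DataFrame({"pathway_a": [1, 1]}, index=["A-B", "B-C"])
pathway_b = pd.DataFrame({"pathway_b": [1, 1]}, index=["B-C", "C-D"])


def combine_and_check(edge_dataframes):
    """Hypothetical helper mirroring the concat + guard steps quoted above."""
    # Outer join on the edge index keeps every edge seen in any pathway;
    # edges missing from a pathway become NaN.
    concated_df = pd.concat(edge_dataframes, axis=1, join="outer")
    # NaN means "edge absent", so fill with 0 and restore an integer dtype.
    concated_df = concated_df.fillna(0).astype("int64")

    if concated_df.empty:
        raise OSError("summary dataframe is empty")
    if min(concated_df.shape) <= 1:
        raise OSError(f"need more than one pathway, found {min(concated_df.shape)}")
    return concated_df


combined = combine_and_check([pathway_a, pathway_b])
print(combined)

try:
    combine_and_check([pathway_a])  # a single pathway gives shape (2, 1)
except OSError as err:
    print("rejected:", err)
```

Because `min(concated_df.shape)` takes the smaller of the row and column counts, the second guard also trips when only one edge remains across several pathways, not only when there is a single pathway.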

@ntalluri
Collaborator Author

Ready for code review

@agitter agitter mentioned this pull request Jun 14, 2024
@agitter
Collaborator

agitter commented Jun 14, 2024

I don't see this unreachable-code highlighting in PyCharm.

@agitter agitter merged commit 1f59293 into Reed-CompBio:master Jun 14, 2024
5 checks passed

2 participants