Google Season of Docs
NumFOCUS will apply as an umbrella org for Google Season of Docs. MDAnalysis is interested in participating in the program. As part of the NumFOCUS application we need to contribute the following (by April 14).
- NumFOCUS: April 14
- https://developers.google.com/season-of-docs/docs/timeline
List previous experience with technical writers or documentation: If you or any of your mentors have worked with technical writers before, or have developed documentation, please mention this. Describe the documentation that you produced and the ways in which you worked with the technical writer. For example, describe any review processes that you used, or how the technical writer's skills were useful to your project. Explain how this previous experience may help you to work with a technical writer in Season of Docs.
- We have not had any technical writers work on the MDAnalysis docs.
Summarize your project's approach to documentation work: How does new documentation get written? How does existing documentation get reviewed and/or updated? How are documentation tasks shared among contributors? Is there anyone in the project in charge of documentation?
- Developers write Python doc strings (API docs) and, where applicable, module-level overview and example docs.
- During code review on a pull request, docs are also reviewed.
- Docs are updated when code is updated or in response to user comments. Missing or outdated docs are logged on the issue tracker and tagged with the Component-Docs label.
- API docs are reasonably complete because developers have to write them for all public-facing code according to our documentation guidelines. PRs are not merged without docs.
- Documentation writers should follow the Writing Documentation style guide and rules.
- Core developers share the responsibility for the documentation.
Let us know whether you have participated in GSoC before. If yes, describe your achievements in that program and explain how this experience may influence the way you work in Season of Docs.
- MDAnalysis has participated in GSoC every year since 2016 (first under the PSF and then under the NumFOCUS umbrella). So far we have mentored 5 students, all of whom successfully completed the program and contributed useful features to the project (2018: 2 students, see Release 0.19.x; 2017: 1 student, see Release 0.17.0; 2016: 2 students, see Release 0.16.0).
- Our mentors have experience in introducing students to the code base and know how to explain the scientific motivation for the project to students who are not involved in molecular simulations. The latter is important for GSoD because a technical writer needs to understand what motivates our users and what they are looking for. We will teach the writer how to use MDAnalysis (with the help of our basic tutorial, our Workshop materials, and example Jupyter notebooks, which are also available as live binder notebooks) so that they are in a better position to evaluate existing examples and to collaborate with the developers on better ones. We know how important it is to maintain ongoing communication and will be responsive via the issue tracker, email, and Zoom teleconferences.
- We know that new developers need guidelines on the rules that everyone else follows. We have therefore created developer documentation on the wiki, including the Code Style Guide and documentation guidelines, which also describe how to build and preview the documentation.
Please fill out your ideas list describing in some detail the project(s) that you envision a technical writer could work on during the program: Similar to GSoC, the quality of the ideas list will matter most in terms of whether we are accepted. We recommend you focus on one or two ideas and give detailed information on them. Empty .md docs have been created already; find your project's here: https://github.com/numfocus/gsod/tree/master/2019 Remember: most likely, the writer you get will initially not be familiar with your project. You will need to teach them the basics to use it and start the documentation work. Keep this in mind when composing your ideas list. Google provides examples of ideas in the instructions: https://developers.google.com/season-of-docs/docs/project-ideas
The list of ideas for GSoD is maintained at https://github.com/numfocus/gsod/blob/master/2019/MDAnalysis_ideas_list.md (submit a PR to change it).
Name: MDAnalysis
Description: Scientific Python library for the analysis of molecular dynamics simulations in biophysics, chemistry, and materials science.
website: https://www.mdanalysis.org/
repository: https://github.com/MDAnalysis/mdanalysis
email: [email protected] (all developers)
mentors: Oliver Beckstein [email protected] (primary contact) and Richard Gowers ([email protected])
Project name: Make it easy for new users to analyze their data
Description: MDAnalysis is a Python library that provides an abstract and object-oriented interface to data from particle-based simulations (primarily molecular dynamics simulations), which are widely used for simulating diverse systems such as the interaction of drugs with biomolecules or new materials. MDAnalysis is widely used in the scientific community and is written by scientists for scientists. Feedback from our users indicates that they like using MDAnalysis but wish that the documentation were easier to read and had more examples. The docs for scikit-learn and the PyTorch tutorials are generally cited as excellent documentation. Taking these as models, we would like to make our documentation more accessible and more immediately useful for new users.
At the moment, the primary sources of information for users are
- the package documentation https://www.mdanalysis.org/docs
- the basic tutorial
- the most recent scientific article on MDAnalysis: R. J. Gowers et al. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In S. Benthall and S. Rostrup, editors, Proceedings of the 15th Python in Science Conference, pages 98-105, Austin, TX, 2016. SciPy, doi:10.25080/majora-629e541a-00e.
- example Jupyter notebooks (also available as live binder notebooks)
- Workshop materials
- two videos from conference presentations
We identified two areas for improvement (in rank order of priority):
We want to restructure our docs for user-friendliness (issue #1175) and reorganize them around how users interact with the code rather than how the source code is organized (started in PR #1827). We envision a split into three major blocks:
- introduction with examples (more like a tutorial) and explanation of the underlying principles and guiding concepts (see the 2016 MDAnalysis paper (doi:10.25080/majora-629e541a-00e), SciPy 2016 talk and the slides in the presentation scipy-MDAnalysis-Beckstein.pdf, which all outline the fundamentals)
- API docs (similar to the majority of the current docs at https://www.mdanalysis.org/docs/)
- developer docs (notes for developers, can be technical/arcane – e.g., some material from the wiki, details of the fundamental data structures, notes on file formats)
The current documentation is part of the code base and consists of:
- Python doc strings that are directly embedded in the code and associated with functions, classes, methods, attributes, and constants. Many modules also directly contain overviews and examples.
- Pages in the `doc/sphinx/source` directory, which combine multiple modules or give more general overviews. The documentation is written in reStructuredText and automatically processed with Sphinx. The continuous integration process checks that these docs build correctly. Docs from the latest build are automatically and immediately published in HTML format as the "development docs" at https://www.mdanalysis.org/mdanalysis/.
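To make the first kind of documentation concrete for a writer who has not seen it before, the following is a minimal sketch of a docstring embedded in code. The function `mean_position` is hypothetical and not part of MDAnalysis, and the NumPy-style section headings are an assumption about the layout; the project's documentation guidelines remain the authority on the actual conventions.

```python
import doctest


def mean_position(positions):
    """Return the unweighted mean position of a set of particles.

    Hypothetical helper used only to illustrate docstring layout.

    Parameters
    ----------
    positions : sequence of (x, y, z) tuples
        Cartesian coordinates, one tuple per particle.

    Returns
    -------
    tuple of float
        The arithmetic mean of the coordinates along each axis.

    Raises
    ------
    ValueError
        If `positions` is empty.

    Examples
    --------
    >>> mean_position([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
    (1.0, 2.0, 3.0)
    """
    if not positions:
        raise ValueError("positions must contain at least one coordinate")
    n = len(positions)
    # zip(*positions) regroups the data per axis: all x, all y, all z.
    return tuple(sum(axis) / n for axis in zip(*positions))


if __name__ == "__main__":
    # Run the example embedded in the docstring as a test.
    doctest.testmod()
```

Because the example lives in the docstring, it appears in the generated API page and can also be checked automatically, which helps keep examples from going stale.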
We would like to maintain the ability to automatically build the docs and continue working in the sphinx framework outlined above. A technical writer would be trained in working with our current development process where changes to this documentation would be handled like other changes to the code base. This means that the writer would use git for version control and submit pull requests to the GitHub repository. As part of the standard review process, the mentors (and other developers and community members) would give rapid feedback on the contribution of the writer. Once a PR is approved, it will be merged and the docs will be autogenerated and immediately available.
We have one "official" introductory tutorial and various other tutorials, but new users initially find it confusing which ones they should look at, and the official tutorial is too long. We need to provide a better "road map" for new users and clearly lay out tutorials for different levels and with clear learning goals.
We need to split the current MDAnalysis Tutorial into multiple self-contained tutorials and sort them by level (introductory, intermediate, advanced). The tutorials can and should build on each other. There should be a top-level entry point that gives an overview of the tutorials. An initial outline would contain the following (not all content exists yet, especially at intermediate/advanced level):
- Introductory level
- Installation: installing MDAnalysis and testing trajectories (MDAnalysisTests for simple examples, MDAnalysisData for advanced examples)
- Basic trajectory analysis: Loading data into a `Universe`, selecting atoms with `Universe.select_atoms()` as an `AtomGroup`, iterating through a trajectory, getting positions from `AtomGroup.positions`, and using numpy operations to calculate observables of interest from the positions.
- Using analysis tools in `MDAnalysis.analysis`: Performing common analysis tasks such as RMSD calculation and fitting, hydrogen bond analysis, or dihedral analysis using the common analysis classes.
- Working with AtomGroups: introduction to some often used methods of `AtomGroup` and how to work with multiple AtomGroups; slicing and fancy indexing of `AtomGroup`.
- Writing trajectories: difference between "trajectories" and "single frame" file formats; standard code pattern for writing trajectories or single frames; writing single frames directly with `AtomGroup.write()`
- Intermediate level
- Selections (requires Basic trajectory analysis and Working with AtomGroups): in-depth tutorial on the selection language; dynamically updating selections
- Working with Groups (requires Working with AtomGroups): The "container" hierarchy (`Universe` > `Segment` > `Residue` > `Atom`) and the corresponding groups `SegmentGroup`, `ResidueGroup`, `AtomGroup`: commonalities and differences, aggregating methods. How to work with `fragments` or `molecules`.
- Writing selections: outputting selections for other codes
- Working with topology information: introduction to the topology system; how to work with bonds; identify bonded atoms; working with angles and dihedrals; selections by type
- Applying on-the-fly transformations: A unique capability of MDAnalysis is its trajectory transformations, which change the trajectory while it is being read and so avoid generating the intermediate files that other analysis packages need. This tutorial would be based on the notebook on-the-fly-transformations.ipynb.
- In-memory trajectories: how to use the MemoryReader to speed up analysis or generate temporary reduced system trajectories for analysis (see, e.g., Workshop notebook trajectory_magic.ipynb)
- Visualization in notebooks with NGLView: how to use nglview with MDAnalysis (see Workshop notebook Visualisation_with_NGLView.ipynb and binder notebook nglview_drawframes.ipynb)
- Advanced level
- System building (requires Working with topology information): how to add atoms or bonds or create simple topologies from scratch; generating initial coordinates
- Extending file reading with own code (requires System building): write a Reader for one's own custom file format and dynamically add it to MDAnalysis
- Write your own analysis class: shows how to leverage the MDAnalysis.analysis.AnalysisBase class to create full-featured custom analysis tools.
For this and other documents we want to start adding example Jupyter notebooks (such as the first few example notebooks) to our sphinx-based restructured text documentation via the nbsphinx extension.
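A minimal sketch of what enabling nbsphinx could look like in the Sphinx configuration is shown below. It assumes that the configuration file is `conf.py` in `doc/sphinx/source` and that the nbsphinx package is installed; the choice not to re-execute notebooks during the build is an illustrative assumption, not a project decision.

```python
# Fragment of a Sphinx conf.py (assumed location: doc/sphinx/source/conf.py).

extensions = [
    "sphinx.ext.autodoc",  # API docs generated from docstrings
    "nbsphinx",            # render .ipynb files as documentation pages
]

# Keep Jupyter's autosave copies out of the build.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]

# "never" uses the outputs stored in each notebook; "always" re-runs every
# notebook at build time, and "auto" runs only notebooks without outputs.
nbsphinx_execute = "never"
```

With such a configuration, a notebook placed in the source directory can be listed in a `toctree` like any reStructuredText page.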
We also want to include more diagrams, pictures, and graphs to clarify how the different parts of the code relate to each other and what output might look like.
The project suggestions below will not be included in the first proposal so that we can focus on our two initial priorities and present a stronger proposal.
We want to create a library of short how-tos (mini-tutorials) that quickly demonstrate how to obtain a specific result. The ideal format is a Jupyter notebook with runnable code, but reading notebooks can be cumbersome on the web. Therefore, the notebooks will be integrated with normal documentation in restructured text.
We already have a few example notebooks that can run on Binder but we are not covering a sufficient range of examples, and examples might be outdated. Ideally we would regularly test if the notebooks still work.
- Developers need to identify a small number of FAQ cases that would make for good small self-contained how-tos
- Initial selection of examples
- B-factor coloring
- PDB file manipulation (e.g., adding ChainIDs)
- selections with updating AtomGroups
- trajectory conversion
- RMSD analysis
- RMSF analysis
- dihedral analysis
- Jupyter notebooks need to be written that solve these problems. These notebooks should use example data and should be executable on binder. They can use either data files from MDAnalysisTests or MDAnalysisData.
- For each case, a short how-to document needs to be written (in restructured text and processed with sphinx) that
- links to installation instructions and introductory tutorials (basically: all the things one needs to know)
- integrates the notebook in the documentation via the nbsphinx extension so that there's no need to leave the docs and switch to a notebook viewer.
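To indicate the size of a single how-to, the core of the "RMSD analysis" example from the list above can be sketched in a few lines of NumPy. This is a simplified illustration with made-up coordinates: the function `rmsd` below is hypothetical, performs no superposition (fitting) of the structures, and is not the implementation used in MDAnalysis.

```python
import numpy as np


def rmsd(a, b):
    """Root mean square deviation between two (n_atoms, 3) coordinate sets.

    Hypothetical helper for illustration; no optimal superposition is done.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Squared displacement per atom, averaged over atoms, then square root.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


# Made-up coordinates: three atoms, then the same atoms shifted by 2 along z.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shifted = reference + np.array([0.0, 0.0, 2.0])

print(rmsd(reference, shifted))  # every atom moved by 2, so the RMSD is 2.0
```

In an actual how-to notebook, the coordinates would come from example data in MDAnalysisTests or MDAnalysisData rather than from hand-written arrays.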
Loosely connected to the introductory section of the tutorial, we would like to compile a more extensive and in-depth document on the algorithms and data structures. This would be more scholarly and could conceivably be expanded to a paper on fundamentals of analyzing molecular dynamics simulations. It would require more diagrams and graphs and more emphasis on citations. Topics to be included:
- algorithms for distance calculations, including treatment of periodic boundaries: difference between `distance_array` and `self_distance_array` as well as `capped_distance_array`; how `calc_bonds` works.
- Treatment of periodic boundaries in different contexts; wrapping/unwrapping of molecules
- algorithms for commonly performed analyses
- hydrogen bonds and hydrogen bond correlations
- RMSD
- RMSF
- contacts
- dihedrals and Ramachandran plots
- PCA
- radial distribution functions and volumetric densities
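As an example of the level of detail such a document could reach, the minimum image convention behind distance calculations with periodic boundaries can be sketched as follows. The sketch assumes an orthorhombic (rectangular) box and uses a hypothetical function name; it is a plain NumPy illustration of the idea, not the MDAnalysis implementation, which also has to handle triclinic boxes.

```python
import numpy as np


def minimum_image_distances(reference, configuration, box):
    """All pairwise distances between two coordinate sets in a periodic box.

    Hypothetical helper; assumes an orthorhombic box given by its three
    edge lengths. Returns an array of shape (n_reference, n_configuration).
    """
    reference = np.asarray(reference, dtype=float)
    configuration = np.asarray(configuration, dtype=float)
    box = np.asarray(box, dtype=float)

    # Displacement vectors for every pair, via broadcasting: (n, m, 3).
    delta = reference[:, None, :] - configuration[None, :, :]
    # Minimum image convention: shift each component by a whole number of
    # box lengths so that it refers to the nearest periodic image.
    delta -= box * np.round(delta / box)
    return np.sqrt((delta ** 2).sum(axis=-1))


box = [10.0, 10.0, 10.0]
a = [[0.5, 5.0, 5.0]]
b = [[9.5, 5.0, 5.0], [3.5, 5.0, 5.0]]

# The first pair is 9.0 apart inside the box but only 1.0 apart across the
# periodic boundary; the second pair is unaffected (3.0).
print(minimum_image_distances(a, b, box))
```

Diagrams of the periodic images would complement such a snippet well in the scholarly document described above.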