
Google Season of Docs

Oliver Beckstein edited this page Apr 15, 2019 · 35 revisions

NumFOCUS will apply as an umbrella org for Google Season of Docs. MDAnalysis is interested in the program. As part of the NumFOCUS application we need to contribute the following (by April 14).

Deadlines

MDAnalysis contribs to NumFOCUS proposal

  1. List previous experience with technical writers or documentation: If you or any of your mentors have worked with technical writers before, or have developed documentation, please mention this. Describe the documentation that you produced and the ways in which you worked with the technical writer. For example, describe any review processes that you used, or how the technical writer's skills were useful to your project. Explain how this previous experience may help you to work with a technical writer in Season of Docs.

    • We have not had any technical writers work on the MDAnalysis docs.
  2. Summarize your project's approach to documentation work: How does new documentation get written? How does existing documentation get reviewed and/or updated? How are documentation tasks shared among contributors? Is there anyone in the project in charge of documentation?

    • Developers write Python doc strings (API docs) and, where applicable, module-level overview and example docs.
    • During code review on a pull request, docs are also reviewed.
    • Docs are updated when code is updated or in response to user comments. Missing or outdated docs are logged on the issue tracker and tagged with the Component-Docs label.
    • API docs are reasonably complete because developers have to write them for all public-facing code according to our documentation guidelines. PRs are not merged without docs.
    • Documentation writers should follow the Writing Documentation style guide and rules.
    • Core developers share the responsibility for the documentation.
  3. Let us know whether you have participated in GSoC before. If yes, describe your achievements in that program and explain how this experience may influence the way you work in Season of Docs.

    • MDAnalysis has participated in GSoC every year since 2016 (first under the PSF and then under the NumFOCUS umbrella). So far we have mentored 5 students, all of whom successfully completed the program and contributed useful features to the project (2018: 2 students, see Release 0.19.x; 2017: 1 student, see Release 0.17.0; 2016: 2 students, see Release 0.16.0).
    • Our mentors have experience in introducing students to the code base and know how to explain the scientific motivation for the project to students who are not involved in molecular simulations. The latter is important for GSoD because a technical writer needs to understand what motivates our users and what they are looking for. We will teach the writer how to use MDAnalysis (with the help of our basic tutorial, our Workshop materials, and example Jupyter notebooks, which are also available as live binder notebooks) so that she or he is in a better position to evaluate existing examples and collaborate with the developers to develop better examples. We know how important it is to maintain ongoing communication and will be responsive via issue tracker/email/zoom teleconference.
    • We know that new developers need guidelines on the rules that everyone else follows. We have therefore created developer documentation on the wiki, including the Code Style Guide and documentation guidelines, which also describe how to build and preview the documentation.

Project ideas

Please fill out your ideas list describing in some detail the project(s) that you envision a technical writer could work on during the program: Similar to GSoC, the quality of the ideas list will matter most in terms of whether we are accepted. We recommend you focus on one or two ideas and give detailed information on them. Empty .md docs have been created already; find your project's here: https://github.com/numfocus/gsod/tree/master/2019 Remember: most likely, the writer you get will initially not be familiar with your project. You will need to teach them the basics to use it and start the documentation work. Keep this in mind when composing your ideas list. Google provides examples of ideas in the instructions: https://developers.google.com/season-of-docs/docs/project-ideas

The list of ideas for GSoD is maintained at https://github.com/numfocus/gsod/blob/master/2019/MDAnalysis_ideas_list.md (perform a PR).


Name: MDAnalysis

Description: Scientific Python library for the analysis of molecular dynamics simulations in biophysics, chemistry, and materials science.

website: https://www.mdanalysis.org/

repository: https://github.com/MDAnalysis/mdanalysis

email: [email protected] (all developers)

mentors: Oliver Beckstein [email protected] (primary contact) and Richard Gowers ([email protected])

Project name: Make it easy for new users to analyze their data

Description: MDAnalysis is a Python library that provides an abstract and object-oriented interface to data from particle-based simulations (primarily molecular dynamics simulations), which are widely used for simulating diverse systems such as the interaction of drugs with biomolecules or new materials. MDAnalysis is widely used in the scientific community and is written by scientists for scientists. Feedback from our users indicates that they like using MDAnalysis but wish that the documentation were easier to read and had more examples. The docs for scikit-learn and the PyTorch tutorials are generally cited as excellent examples of documentation. Taking these as models, we would like to improve our documentation to make it more accessible and more immediately useful for new users.

At the moment, the primary sources of information for users are

We identified two areas for improvement (in rank order of priority):

Restructure docs

We want to restructure our docs for user-friendliness (issue #1175) and reorganize them around how the user interacts with the code rather than around how the source code is organized (started in PR #1827). We envision a split into three major blocks:

  1. introduction with examples (more like a tutorial) and explanation of the underlying principles and guiding concepts (see the 2016 MDAnalysis paper (doi:10.25080/majora-629e541a-00e), SciPy 2016 talk and the slides in the presentation scipy-MDAnalysis-Beckstein.pdf, which all outline the fundamentals)
  2. API docs (similar to the majority of the current docs at https://www.mdanalysis.org/docs/)
  3. developer docs (notes for developers, can be technical/arcane – e.g., some material from the wiki, details of the fundamental data structures, notes on file formats)

The current documentation is part of the code base and consists of:

  • Python doc strings that are directly embedded in the code and associated with functions, classes, methods, attributes, and constants. Many modules also directly contain overviews and examples.
  • Pages in the doc/sphinx/source directory, which contain documents that combine multiple modules or give more general overviews. The documentation is written in reStructuredText and automatically processed with Sphinx. The continuous integration process checks that these docs build correctly. Docs from the latest build are automatically and immediately published in HTML format as the "development docs" at https://www.mdanalysis.org/mdanalysis/.

We would like to maintain the ability to automatically build the docs and continue working in the sphinx framework outlined above. A technical writer would be trained in working with our current development process where changes to this documentation would be handled like other changes to the code base. This means that the writer would use git for version control and submit pull requests to the GitHub repository. As part of the standard review process, the mentors (and other developers and community members) would give rapid feedback on the contribution of the writer. Once a PR is approved, it will be merged and the docs will be autogenerated and immediately available.

Improve and expand tutorials

We have one "official" introductory tutorial and various other tutorials, but new users initially find it confusing to decide which ones to look at, and the introductory tutorial is too long. We need to provide a better "road map" for new users and clearly lay out tutorials for different levels and with clear learning goals.

We need to split the current MDAnalysis Tutorial into multiple self-contained tutorials and sort them by level (introductory, intermediate, advanced). The tutorials can and should build on each other. There should be a top-level entry point that gives an overview of the tutorials. An initial outline would contain the following (not all content exists yet, especially at intermediate/advanced level):

  1. Introductory level
    1. Installation: installing MDAnalysis and testing trajectories (MDAnalysisTests for simple examples, MDAnalysisData for advanced examples)
    2. Basic trajectory analysis: Loading data into a Universe, selecting atoms with Universe.select_atoms() as an AtomGroup, iterating through a trajectory, getting positions from AtomGroup.positions, and using numpy operations to calculate observables of interest from the positions.
    3. Using analysis tools in MDAnalysis.analysis: Performing common analysis tasks such as RMSD calculation and fitting, hydrogen bond analysis, or dihedral analysis using the common analysis classes.
    4. Working with AtomGroups: introduction to some often used methods of AtomGroup and how to work with multiple AtomGroups; slicing and fancy indexing of AtomGroup.
    5. Writing trajectories: difference between "trajectories" and "single frame" file formats; standard code pattern for writing trajectories or single frames; writing single frames directly with AtomGroup.write()
  2. Intermediate level
    1. Selections (requires Basic trajectory analysis and Working with AtomGroups): in-depth tutorial on the selection language; dynamically updating selections
    2. Working with Groups (requires Working with AtomGroups): The "container" hierarchy (Universe > Segment > Residue > Atom) and the corresponding groups SegmentGroup, ResidueGroup, AtomGroup: commonalities and differences, aggregating methods. How to work with fragments or molecules.
    3. Writing selections: outputting selections for other codes
    4. Working with topology information: introduction to the topology system; how to work with bonds; identify bonded atoms; working with angles and dihedrals; selections by type
    5. Applying on-the-fly transformations: Trajectory transformations are a unique capability of MDAnalysis: they change the trajectory while it is being read and so avoid generating the intermediate files that other analysis packages need. This tutorial would be based on the notebook on-the-fly-transformations.ipynb.
    6. In-memory trajectories: how to use the MemoryReader to speed up analysis or generate temporary reduced system trajectories for analysis (see, e.g., Workshop notebook trajectory_magic.ipynb)
    7. Visualization in notebooks with NGLView: how to use nglview with MDAnalysis (see Workshop notebook Visualisation_with_NGLView.ipynb and binder notebook nglview_drawframes.ipynb)
  3. Advanced level
    1. System building (requires Working with topology information): how to add atoms or bonds or create simple topologies from scratch; generating initial coordinates
    2. Extending file reading with own code (requires System building): write a Reader for one's own custom file format and dynamically add it to MDAnalysis
    3. Write your own analysis class: shows how to leverage the MDAnalysis.analysis.AnalysisBase class to create full-featured custom analysis tools.
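As an illustration of what the introductory "Basic trajectory analysis" tutorial could teach, here is a minimal sketch of the pattern "load a Universe, select an AtomGroup, iterate over the trajectory, compute an observable from the positions". The function names, the default selection string, and the choice of the radius of gyration as the observable are assumptions made for this example. The helper is written in plain Python so that it runs without dependencies; a real tutorial would most likely use vectorized numpy operations instead.

```python
import math


def radius_of_gyration(positions, masses):
    """Mass-weighted radius of gyration of a set of 3D coordinates."""
    total_mass = sum(masses)
    # Center of mass, one Cartesian component at a time.
    com = [
        sum(m * p[k] for p, m in zip(positions, masses)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for p, m in zip(positions, masses)
    ) / total_mass
    return math.sqrt(msd)


def rgyr_timeseries(topology, trajectory, selection="protein"):
    """Return [(time, Rg), ...] for one selection over a whole trajectory.

    Hypothetical helper for illustration; it needs MDAnalysis installed.
    """
    import MDAnalysis as mda  # imported lazily so the helper above stays standalone

    u = mda.Universe(topology, trajectory)  # load data into a Universe
    group = u.select_atoms(selection)       # selection returned as an AtomGroup
    results = []
    for ts in u.trajectory:                 # iterate through the frames
        # group.positions always reflects the frame that is currently loaded
        rg = radius_of_gyration(group.positions, group.masses)
        results.append((ts.time, rg))
    return results
```

Calling rgyr_timeseries() with a topology file and a trajectory file would return one (time, value) pair per frame. AtomGroup also provides a built-in radius_of_gyration() method; the explicit helper is spelled out here only to show how an observable follows from the positions.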

For this and other documents we want to start adding example Jupyter notebooks (such as the first few example notebooks) to our sphinx-based restructured text documentation via the nbsphinx extension.
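A sketch of how the nbsphinx extension could be enabled in the Sphinx configuration is given below. The conf.py location and the other entry in the extensions list are assumptions (the project's real configuration will contain more), and the choice to never execute notebooks during the build is only one possible setting.

```python
# Fragment of a Sphinx conf.py (assumed to live in doc/sphinx/source).
# Only the notebook-related lines are shown; existing settings would stay in place.

extensions = [
    "sphinx.ext.autodoc",  # API docs generated from the Python doc strings
    "nbsphinx",            # render .ipynb files as regular documentation pages
]

# Use the outputs stored in the notebooks instead of running them at build time.
# Other accepted values are "always" and "auto".
nbsphinx_execute = "never"

# Do not treat Jupyter's checkpoint copies as source documents.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With this in place, a notebook can be listed in a toctree like any reStructuredText file. Storing outputs in the notebooks keeps the docs build fast but means the notebooks have to be re-run separately when the code changes.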

We also want to include more diagrams, pictures, and graphs to make clearer what the relationships between different parts of the code are and what output might look like.


The project suggestions below will not be included in the first proposal so that we can focus on our two initial priorities and present a stronger proposal.

How-to Library

We want to create a library of short how-tos (mini-tutorials) that quickly demonstrate how to obtain a specific result. The ideal format is Jupyter notebooks with runnable code, but reading notebooks can be cumbersome on the web. Therefore, the notebooks will be integrated with normal documentation in reStructuredText.

We already have a few example notebooks that can run on Binder but we are not covering a sufficient range of examples, and examples might be outdated. Ideally we would regularly test if the notebooks still work.
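One way to test regularly whether the notebooks still work is to execute each of them from top to bottom with nbconvert and report the ones that fail. The sketch below assumes that the jupyter command-line tool with nbconvert is installed; the function names and the directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def find_notebooks(root):
    """Return all notebooks under root, skipping Jupyter checkpoint copies."""
    return sorted(
        path for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def build_command(notebook, output_dir):
    """Command line that executes one notebook and writes the executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def failing_notebooks(root, output_dir):
    """Execute every notebook under root and return those that did not finish."""
    failed = []
    for notebook in find_notebooks(root):
        result = subprocess.run(build_command(notebook, output_dir))
        if result.returncode != 0:  # nbconvert exits non-zero when a cell raises
            failed.append(notebook)
    return failed
```

A scheduled continuous integration job could call failing_notebooks() and fail the build if the returned list is not empty.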

  1. Developers need to identify a small number of FAQ cases that would make for good small self-contained how-tos
  2. Initial selection of examples
    • B-factor coloring
    • PDB file manipulation (e.g., adding ChainIDs)
    • selections with updating AtomGroups
    • trajectory conversion
    • RMSD analysis
    • RMSF analysis
    • dihedral analysis
  3. Jupyter notebooks need to be written that solve these problems. These notebooks should use example data and should be executable on binder. They can use either data files from MDAnalysisTests or MDAnalysisData.
  4. For each case, a short how-to document needs to be written (in restructured text and processed with sphinx) that
    • links to installation instructions and introductory tutorials (basically: all the things one needs to know)
    • integrates the notebook in the documentation via the nbsphinx extension so that there's no need to leave the docs and switch to a notebook viewer.

Background and Algorithms

Loosely connected to the introductory section of the tutorial, we would like to compile a more extensive and in-depth document on the algorithms and data structures. This would be more scholarly and could conceivably be expanded to a paper on fundamentals of analyzing molecular dynamics simulations. It would require more diagrams and graphs and more emphasis on citations. Topics to be included:

  1. algorithms for distance calculations, including treatment of periodic boundaries: difference between distance_array and self_distance_array as well as capped_distance_array; how calc_bonds works.
  2. Treatment of periodic boundaries in different contexts; wrapping/unwrapping of molecules
  3. algorithms for commonly performed analyses
    • hydrogen bonds and hydrogen bond correlations
    • RMSD
    • RMSF
    • contacts
    • dihedrals and Ramachandran plots
    • PCA
    • radial distribution functions and volumetric densities
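To make the first topic concrete, the following sketch shows the minimum image convention for distances under periodic boundaries. It is an illustration in plain Python and not the MDAnalysis implementation: it assumes an orthorhombic box given as three edge lengths, and the helper names all_pairs and unique_pairs are hypothetical. They only mirror the difference in output shape between distance_array (every point of one set against every point of another) and self_distance_array (each unique pair within one set).

```python
import math


def minimum_image(delta, box_length):
    """Shift one displacement component to its nearest periodic image."""
    return delta - box_length * round(delta / box_length)


def distance(a, b, box=None):
    """Distance between points a and b; box is (lx, ly, lz) or None."""
    total = 0.0
    for k in range(3):
        d = a[k] - b[k]
        if box is not None:
            d = minimum_image(d, box[k])  # orthorhombic boxes only
        total += d * d
    return math.sqrt(total)


def all_pairs(reference, configuration, box=None):
    """n x m table: each reference point against each configuration point."""
    return [[distance(r, c, box) for c in configuration] for r in reference]


def unique_pairs(points, box=None):
    """Flat list of the n(n-1)/2 distances with i < j within one set."""
    n = len(points)
    return [
        distance(points[i], points[j], box)
        for i in range(n)
        for j in range(i + 1, n)
    ]
```

For two points at x = 0.5 and x = 9.5 in a box of edge length 10, distance() gives 9 without a box but 1 with the box, because the nearest periodic image lies across the boundary. Triclinic boxes need a more general treatment that this sketch does not cover.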