The repository is named pma2020-analytics2 because this is the second pass at writing analytics tools. The first pass was done in R. The Python package is simply called analytics. The functionality is provided as a command-line tool.
Note: the repository name and the package name are different!
Some of the data that is extracted:
- Specific XML tags from the submitted instance
- Total active screen time during the whole survey
- Short break time during the whole survey
- File sizes of photos, submission, and log
- Total swiping events
Also, five data points are recorded for each known prompt in the log. Each column name suffix and its description is below.
- _c for the number of times a constraint/required was invoked on the prompt.
- _t for the active screen time spent on the prompt.
- _v for the total number of visits to the prompt.
- _d for the total number of times the answer changed on the prompt.
- _b for short break time associated with the prompt.
All times are in milliseconds.
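The suffix scheme above can be sketched in Python. This is only an illustration: it assumes that each column name is the prompt name followed directly by the suffix, and the one-row CSV is made-up sample data, not real output of the tool.

```python
import csv
import io

# The five suffixes described above, in the order listed.
SUFFIXES = ("_c", "_t", "_v", "_d", "_b")


def prompt_columns(prompt):
    """Return the five column names assumed for one prompt (prompt + suffix)."""
    return [prompt + suffix for suffix in SUFFIXES]


def seconds_on_prompt(row, prompt):
    """Convert a prompt's active screen time from milliseconds to seconds."""
    return int(row[prompt + "_t"]) / 1000


# Hypothetical one-row export, for illustration only.
sample = io.StringIO(
    "consent_c,consent_t,consent_v,consent_d,consent_b\n"
    "0,4500,2,1,300\n"
)
row = next(csv.DictReader(sample))

print(prompt_columns("consent"))
print(seconds_on_prompt(row, "consent"))
```

Because all times are in milliseconds, dividing by 1000 is usually the first step before reporting them.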
PMA Analytics makes use of Python 3.6. Install Python 3.6.
Install via pip with
python3 -m pip install https://github.com/jkpr/pma2020-analytics2/zipball/master
All required arguments are named to follow the same pattern as ODK Briefcase.
Example usage:
python3 -m analytics.condense --storage_directory ~/Documents/odkbriefcase/ --form_id HQ-rjr1-v25 --export_directory . --export_filename hq-out.csv
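When the same export has to be repeated for several forms, the command can be assembled from Python. The sketch below only builds the argument list from the four options shown in the example; the helper name condense_command is hypothetical and not part of the package.

```python
import os


def condense_command(storage_directory, form_id, export_directory, export_filename):
    """Build the argument list for one analytics.condense run."""
    return [
        "python3", "-m", "analytics.condense",
        # Expand "~" here because no shell is involved when using subprocess.
        "--storage_directory", os.path.expanduser(storage_directory),
        "--form_id", form_id,
        "--export_directory", export_directory,
        "--export_filename", export_filename,
    ]


cmd = condense_command("~/Documents/odkbriefcase/", "HQ-rjr1-v25", ".", "hq-out.csv")
print(cmd)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```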
A JSON file can be supplied through the --lookup option of the condense command-line interface. The file should have the proper format: a list of JSON objects with properties
- form_id (string) form id
- form_title (string) form title
- prompts (list of string) prompts in the log.txt files
- tags (list of string) names of XML tags from submission.xml files
List a prompt name in prompts in order to get information about the number of visits and the time spent at that prompt. List a tag name in tags to get the actual value in the submission, e.g. Jane Doe from <your_name>Jane Doe</your_name>.
An example is given below.
[
{
"form_id": "HQ-rjr1-v12",
"form_title": "RJR1-Household-Questionnaire-v12",
"prompts": [
...
"hh_duplicate_check",
"duplicate_warning",
"resubmit_reasons",
"duplicate_warning_hhmember",
"available",
"consent_start",
"consent",
"begin_interview",
...
],
"tags": [
...
"your_name",
"start",
"end",
"deviceid",
"HHQ_result"
...
]
}
]
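A lookup file can also be generated and checked from Python before it is passed to --lookup. The sketch below is not part of the package. It assumes that all four properties are required in every object, which the description above suggests but does not state outright, and the helper name check_lookup is hypothetical.

```python
import json

# Property name -> expected Python type after JSON decoding.
REQUIRED = {"form_id": str, "form_title": str, "prompts": list, "tags": list}


def check_lookup(entries):
    """Raise ValueError if entries do not match the lookup format described above."""
    if not isinstance(entries, list):
        raise ValueError("lookup must be a list of JSON objects")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError("entry %d is not a JSON object" % i)
        for key, expected in REQUIRED.items():
            if not isinstance(entry.get(key), expected):
                raise ValueError("entry %d: bad or missing %r" % (i, key))
        for key in ("prompts", "tags"):
            if not all(isinstance(item, str) for item in entry[key]):
                raise ValueError("entry %d: %r must contain only strings" % (i, key))
    return entries


lookup = [
    {
        "form_id": "HQ-rjr1-v12",
        "form_title": "RJR1-Household-Questionnaire-v12",
        "prompts": ["consent_start", "consent", "begin_interview"],
        "tags": ["your_name", "start", "end", "deviceid", "HHQ_result"],
    }
]

check_lookup(lookup)
text = json.dumps(lookup, indent=2)  # write this string to the file given to --lookup
print(text)
```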
The analytics.formdata subpackage has a command-line interface to display the supported form titles and form ids. Use:
python3 -m analytics.formdata
Then pipe to grep to filter results. For example, in order to see the supported forms for Uganda round 5, use
python3 -m analytics.formdata | grep UGR5
As analytics runs, it emits logging messages. The standard levels, in order of increasing severity, are DEBUG, INFO, WARNING, and ERROR. Analytics uses all of these to convey specific meaning.
DEBUG: Information that is useful for debugging. Analytics uses DEBUG to say which instance (folder) is currently being analyzed.
INFO: High-level information about the running program.
- What form id is being analyzed.
- When the analysis completes.
WARNING: Typically means that something with the potential to be problematic has occurred. Usually, however, this is not a cause for concern.
ERROR: This is used when something happens that prevents the analytics from completing its task.
- The analytics file column headers do not match what data is to be appended.
- Some problem with the file prevents its analysis (e.g. corrupt file).
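How a severity threshold filters these messages can be sketched with Python's standard logging module. This is only an illustration: whether analytics is configured this way is an assumption, and the logger name and message texts below are made up.

```python
import io
import logging

# Capture output in memory so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("analytics_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)  # drop DEBUG, keep INFO and anything more severe

# Hypothetical messages, one per level.
logger.debug("Analyzing instance folder")
logger.info("Analyzing form id HQ-rjr1-v25")
logger.warning("Something potentially problematic occurred")
logger.error("Column headers do not match the data to append")

output = stream.getvalue()
print(output, end="")
```

With the threshold at INFO, the DEBUG message is suppressed and the other three are emitted in order.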
To upgrade to the latest version, run
python3 -m pip install https://github.com/jkpr/pma2020-analytics2/zipball/master --upgrade
Submit bug reports to James Pringle at [email protected] minus the BEAR.