This guide is intended for people who want to modify or extend the REDCap-ETL software. If you just want to use the REDCap-ETL software, please see the installation and configuration guides.
This is a list of the steps for setting up a REDCap-ETL development environment. Example commands are shown for Ubuntu 16.
-
Install PHP
sudo apt install php php-curl php-mbstring php-mysql php-xml sudo phpenmod mysqli # enable mysqli extension sudo phpenmod pdo_mysql # enable PDO extension for PHP sudo apt install php-sqlite3 # add PHP support for SQLite sudo apt install php-xdebug # Install XDebug to be able to see phpunit test code coverage
-
Install Composer (needed to get PHPCap and development dependencies)
sudo apt install composer
-
Install sendmail (needed for logging to e-mail)
sudo apt install sendmail
Note: after REDCap-ETL has been installed, its bin/email_test.php script can be used to test if e-mail logging works.
-
Install MySQL
sudo apt install mysql-server sudo mysql_secure_installation systemctl status mysql.service # check status # Set up SSL sudo mysql_ssl_rsa_setup --uid=mysql sudo systemctl restart mysql
-
Create MySQL Database and User
Create a database and database user that will be used as the place to store the REDCap data, for example, in MySQL use:
CREATE DATABASE `etl_test`; CREATE USER 'etl_user'@'localhost' IDENTIFIED BY 'etlPassword'; GRANT ALL ON `etl_test`.* TO 'etl_user'@'localhost';
-
Install SQLite
sudo apt install sqlite3 sudo apt install sqlitebrowser # optional
-
Install PostgreSQL
To run the PostgreSQL system tests (discussed below), you will need to install
PostgreSQL. If PostgreSQL is not installed, these tests should be automatically skipped.sudo apt install postgresql postgresql-contrib sudo apt install php-pgsql
You can use the following command to test your PostgreSQL installation:
sudo -u postgres psql -c "SELECT version();"
To log in as the root user use:
sudo -u postgres psql
To create the a database, schema and user that match the values in the test configuration file, use:
create database etl_test; \connection etl_test create schema etl_test; create user etl_user with password 'etlPassword'; grant all privileges on database etl_test to etl_user; grant usage on schema etl_test to etl_user; grant create on schema etl_test to etl_user;
To test the account created:
psql -U etl_user -h localhost -d etl_test -W
-
Install SQL Server
To run the SQL Server system tests (discussed below), you will need to install SQL Server. If SQL Server is not installed, these tests should be automatically skipped.
For details see SQL Server.
-
Install Git
sudo apt install git # Add e-mail and name information, for example: git config --global user.email "[email protected]" git config --global user.name "J Smith"
-
Get REDCap-ETL Code
Execute the following command in the directory where you want to put REDCap-ETL:
git clone https://github.com/IUREDCap/redcap-etl
-
Install Composer Dependencies
In the top-level directory where the code was downloaded, run:
composer install
-
Install phpDocumentor
Install phpDocumentor, which is used for generating API documentation from PHPDoc comments in the code. It should be set up so that it can be run from the command line using:
phpdoc
After the steps above have been successfully completed, the REDCap-ETL software directory should have a structure similar to what is listed below. The main directories and files are as follows:
- bin/ - directory containing scripts
- config/ - default configuration directory for REDCap-ETL
- dependencies/ - static dependencies used for production installation
- docs/ - documentation
- src/ - the REDCap-ETL source code directory
- tests/ - automated tests directory
- config/ - directory for test configuration files that have been customized for your system
- config-init/ - initial test configuration files for copying and then customizing for your specific installation
- coverage/ - directory set up for storing testing code coverage reports, where files are ignored by Git
- data/ - test result comparison data
- integration/ - code for integration tests
- logs/ - test log files (ignored by Git)
- output/ - test generate output files (ignored by Git)
- projects/ - test REDCap projects
- system/ - code for system tests
- unit/ - code for unit tests
- vendor/ - directory where the dependencies generated by Composer are installed
- composer.json - JSON configuration file for Composer
- phpunit.xml - configuration file for automated tests run with PHPUnit
- README.md - main README file for REDCap-ETL
There are 3 types of automated tests:
- Unit - each test focuses on a single class
- Integration - tests focus on the integration of multiple classes
- System - tests focus on testing the system as a whole (multiple classes + scripts)
The test types above are listed in order of least to most setup effort.
Unit | Integration | System | |
---|---|---|---|
Configuration file setup required | ✓ | ✓ | |
REDCap and REDCap project setup required | ✓ | ✓ | |
MySQL and SQLite database setup required for loading data | ✓ |
You should be able to run the unit tests at this point if you have completed the previous steps. To run the unit tests, enter the following in a command shell at the top-level directory of the REDCap-ETL installation:
./vendor/bin/phpunit --testsuite unit
If this command runs successfully, you should see an "OK" message that indicates the number of tests and assertions that were successful.
Setting up the integration and system tests requires having access to a REDCap instance, and the ability to get API tokens for projects on that instance. Setting these tests up is not required, but the tests have much better code coverage when they are.
To set up the integration tests, you need to first set up the REDCap projects that have the data for the tests:
-
In REDCap, create one project using the "Upload a REDCap project XML file" option, for each of the following files from REDCap-ETL:
tests/projects/BasicDemography.REDCap.xml tests/projects/MultipleRootInstruments.REDCap.xml tests/projects/RepeatingEvents.REDCap.xml tests/projects/RepeatingForms.REDCap.xml tests/projects/Visits.REDCap.xml
-
Additional steps for the project created using RepeatingForms.REDCap.xml. This project has two Data Access Groups named 'dag1' and 'dag2'. After the project has been created:
- Assign records 1001 and 1002 to dag1.
- Assign records 1049 and 1050 to dag2.
-
Request API tokens for the projects you just created (or create tokens if you are an admin). The tokens needs to have export permission.
The next thing you need to do is to create the configuration files for the tests:
-
Copy the file ./tests/config-example.ini to ./tests/config.ini
-
Edit the ./tests/config.ini file:
-
redcap_api_url - set this to the URL for your REDCap's API. Be sure to set this to the URL for the API, which typically ends with "/api/".
-
data_source_api_token - set these API tokens to correspond to the API tokens for the projects that are referenced.
-
After the above steps have been completed successfully, you need to run the test setup script to set up the individual test configuration files, which are stored in the ./tests/config directory:
php ./bin/test_setup.php
Finally, run the integration tests by executing the following command in the top-level directory of your REDCap-ETL installation:
./vendor/bin/phpunit --testsuite integration
Steps for setting up the REDCap projects:
REDCap Project Setup. If you did not already set up the "Repeating Events" and "Visits" projects as described in the steps for setting up integration tests, then you need to do that now.
SQLite Database Setup. You need to create SQLite test databases. Use the following commands
cd tests/output
sqlite3 sqliteTest.db
When in the sqlite shell from executing the above sqlite3 commands, enter the following:
.databases
.quit
SQL Server Database Setup. If you want to run the SQL Server automated tests, you need to have a SQL Server database. For information on setting one up on Ubuntu 18, see SQL Server.
ETL Configuration File Setup. The next thing you need to do is to create the configuration files for the "Repeating Events" and "Visits" projects:
-
If you did not already do this as part of the integration tests setup, then copy the file ./tests/config-example.ini to ./tests/config.ini
-
Edit the file ./tests/config.ini. Set the db_connection properties for the databases that you have set up. The "with-ssl" databases are databases that support SSL. You can use the same database for the non-SSL and SSL database configurations. If any db_connection properties are not set, then the tests that use those properties will be skipped.
-
To run all of the database SSL tests, you also need to create a file named ca.crt in the tests/config directory that is a valid certificate authority certificate file. If this file is missing, the MySQL SSL tests will be skipped.
After the above steps have been completed successfully, you should be able to run the system tests by executing the following command in the top-level directory of your REDCap-ETL installation:
./vendor/bin/phpunit --testsuite system
To run all of the automated test, in the top-level directory run:
./vendor/bin/phpunit
If the output from the above command indicates that there were tests that were skipped, you can get more information by running the above command with the verbose option:
./vendor/bin/phpunit -v
To see test coverage information, you need to have XDebug installed, and then run the following command from the root directory of the project:
XDEBUG_MODE=coverage
export XDEBUG_MODE
./vendor/bin/phpunit --coverage-html tests/coverage
Then with a browser, open the file:
tests/coverage/index.html
The output could be stored in a different directory, but directory tests/coverage has been set up to be ignored by Git.
ETL configuration files, transformation rules files, and SQL files for tests need to be placed in the following directory:
./tests/config-init
The name of each configuration file needs to include the project name it accesses, e.g., "basic-demography", "repeating-events", or "visits". The test setup script (.bin/tests_setup.php) searches for this to know which API URL and token values to use from the test configuration file ./tests/config.ini. The ETL configuration files are copied by the script to the ./tests/config directory with the API URL, API token, and database connection values set to the values in the ./tests/config.ini file.:
The name of the configuration file also needs to contain the name of the database system used (if any). And "-ssl" is added to the database system name if is is used for tests that require the database to support SSL (secure) connection. Possible values include:
mysql
mysql-ssl
postgresql
sqlite
sqlserver
sqlserver-ssl
If possible, the existing REDCap projects should be used for tests. These projects are in:
tests/projects/
If you do need to add a new test project, then a REDCap XML download of the project should be added to the directory above, and a section should be added to the following file, where the section name corresponds to the project name:
tests/config-example.ini
The following script will need to be modified to handle the new project:
bin/tests_setup.php
Finally, new test setup information for this test project will need to be added to:
docs/DeveloperGuide.md
The API documentation is programmatically generated and is not stored in GitHub.
To generate the API documentation, execute the following command in the top-level REDCap-ETL directory:
phpdoc
To view the API documentation, open the following file (relative to the top-level REDCap-ETL directory) with a web browser:
./docs/api/index.html
The configuration information for the phpdoc command is in the following file in the top-level REDCap-ETL directory:
phpdoc.xml
REDCap-ETL follows these PHP coding standards:
- PSR-1: Basic Coding Standard
- PSR-2: Coding Style Guide
- PSR-4: Autoloader
- Lower camel case variable names, e.g., $primaryKey
From the top-level directory of your REDCap-ETL installation, the following command can be used to check for coding standards compliance:
./vendor/bin/phpcs
The coding standards checks that are done (by default) are configured in the file phpcs.xml in the top-level directory.
To avoid requiring Composer to be run when the REDCap-ETL is installed, the non-development dependencies are copied to the dependencies/ directory, and this directory is committed to Git. To update the contents of this directory, the following commands can be used from the top-level directory:
composer update
composer install --no-dev
rm -rf dependencies
mv vendor dependencies
composer install
To check for out of date dependencies, use:
composer outdated --direct
The "--direct" option above only checks dependencies directly used by REDCap-ETL (i.e., specified in the composer.json file).
For more information, see: REDCap-ETL Software Architecture
If you want to add a new database type (e.g., Oracle) to REDCap-ETL you need to do the following:
- Create a new database connection class for the new database type that:
- extends class PdoDbConnection, if the connection uses PDO (PHP Data Objects)
- extends class DbConnection, if the connection does not use PDO
- Modify class DbConnectionFactory to add your new database type:
- Add a new constant for your database type
- Add a case for your new database type in the constructor
REDCap-ETL supports 3 different ETL configuration formats:
- Array - a PHP array with configuration data is passed to REDCap-ETL. This format is for programmatic access to REDCap-ETL. Specific uses include automated testing and use by the REDCap-ETL External Module to pass data to its embedded REDCap-ETL server. The array format corresponds to the array format generated by calling PHP's parse_ini_file on an INI ETL configuration file.
- INI - the path of a ".ini" file is passed to REDCap-ETL. This format is intended for use by users of REDCap-ETL.
- JSON - the path of a ".json" file is passed to REDCap-ETL. This format is intended for programmatic use, although it could be used by users.
Notes:
- The task_config property can only be represented correctly in the array and JSON formats.
- Only the JSON format can represent task names that are the same as REDCap-ETL configuration property names.
The JSON format supports the complete specification of a configuration, including transformation rules and pre and post-processing SQL, in a single file, which is not possible with the INI format.
Examples of the JSON format can be seen in the .json files in the tests/config-init/ directory. Below is an example of a very simple JSON configuration file for a single task that uses the auto-generation feature for transformation rules generation (transform_rules_source = 3).
{
"redcap_api_url" : "http://localhost/redcap/api/",
"data_source_api_token" : "1235ABCA87862FF0318837CC05721348",
"transform_rules_source": "3",
"db_connection": "MySQL:127.0.0.1:etl_user:etl_password:etl_db"
}
The basic structure of a JSON workflow configuration is shown below. In this example "task1" and "task2" are the task names for the workflow, and they can be set to arbitrary values. The "workflow", "global_properties" and "tasks" labels are in effect keywords that must appear as shown. Each property defined in "global_properties" will apply to all tasks that do not explicitly override the property. The "workflow_name" property can only appear in the "global_properties" and is required.
{
"workflow": {
"global_properties": {
"workflow_name": "workflow1",
"batch_size": 10,
...
},
"tasks": {
"task1": {
"redcap_api_url": "http://localhost/redcap/api/",
"data_source_api_token": "11347CC74A8B98AC31BA9F78215814968",
...
},
"task2": {
...
},
}
}
}