
Script to time how long it takes OpenRefine to carry out data loading/operations/export as data file size increases


ostephens/openrefine-timer

# OpenRefine Timer

## Overview

This is a very basic script to test how well OpenRefine scales. It creates a CSV file, loads it into OpenRefine, carries out a few basic operations, then exports the data as a TSV file. Each step in this process is timed, and the timings are recorded in a CSV file for analysis.
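As a rough illustration of the "time each step, then record the timings" idea, here is a minimal sketch using only the Ruby standard library. It is not the script's actual code: the helper names `time_steps` and `record_timings`, the file name `example_timings.csv`, and the placeholder steps (which just sleep instead of calling OpenRefine) are all assumptions made for the example.

```ruby
require "benchmark"
require "csv"

# Time each named step and return a hash of step name => elapsed seconds.
def time_steps(steps)
  steps.each_with_object({}) do |(name, step), timings|
    timings[name] = Benchmark.realtime { step.call }
  end
end

# Append one row of timings to a CSV file, writing a header row first
# if the file does not exist yet or is empty.
def record_timings(path, rows, timings)
  write_header = !File.exist?(path) || File.zero?(path)
  CSV.open(path, "a") do |csv|
    csv << ["rows", *timings.keys] if write_header
    csv << [rows, *timings.values]
  end
end

if __FILE__ == $PROGRAM_NAME
  # Hypothetical placeholder steps; a real run would call OpenRefine here.
  steps = {
    "load"       => -> { sleep 0.01 },
    "operations" => -> { sleep 0.01 },
    "export"     => -> { sleep 0.01 }
  }
  record_timings("example_timings.csv", 25_000, time_steps(steps))
end
```

Keeping one CSV row per data file size makes it straightforward to plot row count against each step's time afterwards.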

The script uses the google-refine gem by Max Ogden https://github.com/maxogden/refine-ruby, and the faker gem by Benjamin Curtis https://github.com/stympy/faker.

The script will keep going until some part of the process fails.
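The "keep going until something fails" behaviour can be sketched as a loop that grows the row count by a fixed increment and stops at the first exception. This is a sketch under assumptions, not the script's real control flow: the helper name `run_until_failure` is hypothetical, and starting the first run at one increment's worth of rows is a guess.

```ruby
# Run the given block with an ever-larger row count until it raises.
# Returns the row counts that completed successfully.
def run_until_failure(increment)
  completed = []
  rows = increment # assumption: the first run uses one increment of rows
  loop do
    begin
      yield rows
    rescue StandardError => e
      warn "Stopped at #{rows} rows: #{e.class}: #{e.message}"
      break
    end
    completed << rows
    rows += increment
  end
  completed
end

# Usage with a simulated failure standing in for a failed OpenRefine step.
completed = run_until_failure(25_000) do |rows|
  raise "simulated failure" if rows > 75_000
end
p completed # => [25000, 50000, 75000]
```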

## Running the tests

OpenRefine must already be running and available at http://127.0.0.1:3333 for the script to run successfully.

The script runs from the command line with the following flags:

Usage: openrefine_scale_test.rb [options]

* -f, --filename [FILENAME]        Name to use for data file to be used in testing
* -p, --projectname [PROJECTNAME]  Name to use for OpenRefine project to be used for testing
* -i, --increments [INCREMENTS]    Number of lines to increment data file by each run
* -o, --operationsfile [OPERATIONSFILE] Valid JSON file of OpenRefine operations to carry out during testing        
* -t, --timingsfile [TIMINGSFILE]  File to save the timings to
* -r, --repeats [REPEATS]          Number of times to repeat each test to get average
* -h, --help                       Show this message
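The usage text above has the shape produced by Ruby's standard `OptionParser`, so the flags could be handled roughly as follows. This sketch mirrors only the documented flags; the function name `parse_options`, the option hash keys, and the check for missing flags are assumptions, and the real script may parse its arguments differently.

```ruby
require "optparse"

# Parse the documented flags into a hash. Every flag except -o and -h is
# treated as required.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: openrefine_scale_test.rb [options]"
    opts.on("-f", "--filename [FILENAME]",
            "Name to use for data file to be used in testing") { |v| options[:filename] = v }
    opts.on("-p", "--projectname [PROJECTNAME]",
            "Name to use for OpenRefine project to be used for testing") { |v| options[:projectname] = v }
    opts.on("-i", "--increments [INCREMENTS]", Integer,
            "Number of lines to increment data file by each run") { |v| options[:increments] = v }
    opts.on("-o", "--operationsfile [OPERATIONSFILE]",
            "Valid JSON file of OpenRefine operations") { |v| options[:operationsfile] = v }
    opts.on("-t", "--timingsfile [TIMINGSFILE]",
            "File to save the timings to") { |v| options[:timingsfile] = v }
    opts.on("-r", "--repeats [REPEATS]", Integer,
            "Number of times to repeat each test to get average") { |v| options[:repeats] = v }
    opts.on("-h", "--help", "Show this message") do
      puts opts
      exit
    end
  end
  parser.parse(argv)

  required = %i[filename projectname increments timingsfile repeats]
  missing = required.reject { |key| options[key] }
  raise OptionParser::MissingArgument, missing.join(", ") unless missing.empty?

  options
end
```

Calling `parse_options(ARGV)` would then return a hash such as `{ filename: "data.csv", increments: 25000, ... }`, with `:operationsfile` absent when `-o` is not given.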

You'll need to enter correct values for all flags (except -h and -o) to successfully run the script. For example:

./openrefine_scale_test.rb -f data.csv -p timings -i 25000 -o operations.json -t timings.csv -r 1

If you omit the -o flag, no operations will be carried out on the data within OpenRefine: the data will just be loaded and then exported.
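For reference, an operations file is a JSON array of operation objects, in the form OpenRefine produces when you extract a project's operation history. The sketch below writes a one-operation file from Ruby. The column name `name` is an assumption about the test data, the helper `write_operations_file` is hypothetical, and the exact set of fields may vary between OpenRefine versions, so it is safest to extract a real history from your own instance and compare.

```ruby
require "json"

# A single text-transform operation that trims whitespace in one column.
# "name" is an assumed column; change it to match your data file.
EXAMPLE_OPERATIONS = [
  {
    "op" => "core/text-transform",
    "description" => "Text transform on cells in column name using expression value.trim()",
    "engineConfig" => { "facets" => [], "mode" => "row-based" },
    "columnName" => "name",
    "expression" => "value.trim()",
    "onError" => "keep-original",
    "repeat" => false,
    "repeatCount" => 10
  }
].freeze

# Write the operations to a JSON file suitable for passing with -o.
def write_operations_file(path, operations)
  File.write(path, JSON.pretty_generate(operations))
end

write_operations_file("operations.json", EXAMPLE_OPERATIONS) if __FILE__ == $PROGRAM_NAME
```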

## Caveats

This script is meant to give a rough idea of the limits of OpenRefine, not to offer definitive information about its capacity. Many factors may affect the performance of OpenRefine.

The script measures each timing by calling the relevant command via the refine-ruby gem and waiting until the call has completed, so the result may not reflect the time OpenRefine itself actually took to carry out the operation.
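To make the caveat concrete, this is roughly what a caller-side measurement looks like: a clock reading before and after the call, averaged over a number of repeats as the -r flag describes. Everything inside the block counts towards the result, including any client or network overhead. The helper name `average_time` is hypothetical, and the use of the monotonic clock is a choice made for this sketch rather than something taken from the script.

```ruby
# Run the block `repeats` times and return the mean elapsed wall-clock
# time in seconds, measured from the caller's side.
def average_time(repeats)
  raise ArgumentError, "repeats must be positive" unless repeats.positive?

  total = 0.0
  repeats.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    total += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  total / repeats
end

# Usage: the sleep stands in for a call made through the refine-ruby gem.
mean = average_time(3) { sleep 0.01 }
puts format("mean elapsed: %.4f s", mean)
```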

## TO DO

I'd like to:

  • Enable the data file used for testing to have columns added as well as rows
  • Offer conversion of the data file used for testing into other formats (e.g. xls) to test if this makes a difference to performance
  • Build in graphing of the results of the test
