Working with CSV data
This tutorial describes how to import your CSV data into a Phovea application and explains the different types of CSV data.
- Create a new data directory in your Phovea application directory
- Create a new my_phovea_app/data/index.json that will contain an array of metadata for the CSV files
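The two setup steps can also be scripted. The following is a minimal sketch, assuming the application directory is named my_phovea_app as above; the helper name init_data_dir is hypothetical and not part of Phovea.

```python
import json
from pathlib import Path


def init_data_dir(app_dir):
    """Create <app_dir>/data and an index.json holding an empty array, if missing.

    Hypothetical helper for illustration; it is not part of Phovea.
    """
    data_dir = Path(app_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    index_file = data_dir / "index.json"
    # Do not overwrite an existing index.json that may already list datasets.
    if not index_file.exists():
        index_file.write_text("[]\n", encoding="utf-8")
    return index_file


if __name__ == "__main__":
    created = init_data_dir("my_phovea_app")
    print(created, json.loads(created.read_text(encoding="utf-8")))
```

The file starts out as an empty JSON array, so metadata entries for the CSV files can be appended to it later.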
Below we distinguish between different data types: Table, Matrix, Vector, and Stratification.
A table contains multiple columns with different data types in one CSV file (e.g., user name, age, ...).
Place this users.csv in your data directory:
user_id, username, age
user_0, User A, 18
user_1, User B, 54
user_2, User C, 47
user_3, User D, 27
user_4, User E, 58
user_5, User F, 29
user_6, User G, 68
user_7, User H, 34
user_8, User I, 21
user_9, User J, 94
Heads up!
Phovea requires an id column (of type string or int) as the first column for this data type!
Now we have to register this file in the index.json and add some metadata.
[
  {
    "name": "User Data",
    "description": "Some user attributes",
    "path": "users.csv",
    "separator": ",",
    "type": "table",
    "size": [10, 3],
    "idtype": "Users",
    "columns": [
      {
        "name": "username",
        "value": {
          "type": "string"
        }
      },
      {
        "name": "age",
        "value": {
          "type": "int",
          "range": [0, 100]
        }
      }
    ]
  }
]
For this example we assume that the index.json and the users.csv are stored in the same data directory. Otherwise, adapt the path to the CSV file accordingly. Make sure to add an idtype and the size of the table. Each column consists of a name that is used for later reference and a value type (i.e., string, int, or real).
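Writing the metadata by hand gets tedious for wider tables, so an entry can be drafted from the CSV file itself. The sketch below is an illustration only: describe_table and infer_value_type are hypothetical helpers, not part of Phovea. It mirrors the example above in counting the id column in size while leaving it out of columns, and it fills range with the observed minimum and maximum, which you may want to widen by hand (the example above uses [0, 100]).

```python
import csv
import io
import json


def infer_value_type(values):
    """Return "int", "real", or "string" for a list of raw CSV cell strings."""
    for type_name, cast in (("int", int), ("real", float)):
        try:
            for v in values:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def describe_table(csv_text, name, path, idtype):
    """Draft one index.json entry of type "table" from CSV text."""
    # skipinitialspace drops the blank that follows each comma in the sample files.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    rows = [row for row in reader if row]
    columns = []
    # The first column holds the ids, so only the remaining columns are described.
    for i, col_name in enumerate(header[1:], start=1):
        values = [row[i] for row in rows]
        value = {"type": infer_value_type(values)}
        if value["type"] != "string":
            cast = int if value["type"] == "int" else float
            numbers = [cast(v) for v in values]
            value["range"] = [min(numbers), max(numbers)]
        columns.append({"name": col_name, "value": value})
    return {
        "name": name,
        "path": path,
        "separator": ",",
        "type": "table",
        "size": [len(rows), len(header)],
        "idtype": idtype,
        "columns": columns,
    }


if __name__ == "__main__":
    sample = "user_id, username, age\nuser_0, User A, 18\nuser_1, User B, 54\n"
    print(json.dumps([describe_table(sample, "User Data", "users.csv", "Users")], indent=2))
```

The printed array can be pasted into index.json and then reviewed, for example to add a description.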
Heads up!
After changing the source data or the index.json you have to restart the Phovea server using docker-compose restart api from the workspace or project directory.
You can now access the data directly from the Phovea REST API.
Dataset
- /api/dataset/ returns the metadata of all available datasets including an automatically generated id
- /api/dataset/<dataset_id> and /api/dataset/table/<dataset_id>/data return the formatted data for the given dataset id
- /api/dataset/table/<dataset_id> returns the metadata for the given dataset id
- /api/dataset/table/<dataset_id>/rows returns a list of all row ids from the dataset
- /api/dataset/table/<dataset_id>/rowIds returns the ids in the Phovea range format (e.g., (0:10))
- /api/dataset/table/<dataset_id>/raw returns the JSON data for the given dataset id
- /api/dataset/table/<dataset_id>/col/<column_name> returns the data for a column of the given dataset id
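To try the dataset endpoints listed above from a script, the URLs can be composed with a small helper. This is a sketch under assumptions: the server address in BASE_URL is a placeholder for your own setup, and dataset_url and fetch_json are hypothetical names rather than a Phovea client API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: placeholder address; replace it with the address of your Phovea server.
BASE_URL = "http://localhost:9000"


def dataset_url(dataset_id=None, kind="table", resource=None, base_url=BASE_URL):
    """Compose one of the dataset endpoint URLs listed above.

    Without a dataset id the URL of the dataset overview is returned.
    """
    if dataset_id is None:
        return f"{base_url}/api/dataset/"
    url = f"{base_url}/api/dataset/{kind}/{quote(str(dataset_id), safe='')}"
    if resource:
        url += "/" + resource
    return url


def fetch_json(url):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints URLs; call fetch_json(...) on them once the server is running.
    print(dataset_url())
    print(dataset_url("<dataset_id>", resource="rows"))
    print(dataset_url("<dataset_id>", resource="col/age"))
```

A typical session would first call fetch_json(dataset_url()) to look up the automatically generated id and then pass that id to the other endpoints.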
Views
- /table/<dataset_id>/view/<view_name> returns the metadata of the view
- /table/<dataset_id>/view/<view_name>/raw returns the JSON data for the given view of the dataset
- /table/<dataset_id>/view/<view_name>/rows returns a list of all row ids found for the view of the dataset
- /table/<dataset_id>/view/<view_name>/rowIds returns the ids in the Phovea range format (e.g., (0:10)) for the view of the dataset
TODO Explain how to define the view in the index.json.
In contrast to a table, all columns of a matrix have the same data type (e.g., int or real).
Place this time-series.csv in your data directory:
user_id, 2010, 2011, 2012, 2013, 2014, 2015
user_0, 18, 34, 57, 32, 25, 46
user_1, 95, 41, 15, 43, 82, 44
user_2, 57, 46, 37, 54, 25, 86
user_3, 34, 93, 68, 41, 54, 18
user_4, 68, 23, 32, 69, 12, 39
user_5, 34, 12, 49, 80, 11, 58
user_6, 21, 58, 30, 99, 68, 17
user_7, 84, 85, 60, 48, 48, 38
user_8, 71, 17, 48, 20, 60, 39
user_9, 72, 69, 23, 57, 53, 56
Heads up!
Phovea requires an id column (of type string or int) as the first column for this data type!
Now we have to register this file in the index.json and add some metadata.
[
  {
    "name": "Performance Time Series",
    "description": "User performance over time",
    "path": "time-series.csv",
    "separator": ",",
    "type": "matrix",
    "size": [10, 6],
    "rowtype": "Users",
    "coltype": "Years",
    "value": {
      "type": "int",
      "range": [0, 100]
    }
  }
]
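Because a matrix declares a single value type and range for every cell, a quick consistency check before restarting the server can catch typos in the CSV or the metadata. The sketch below is an assumption-laden illustration: check_matrix is a hypothetical helper, not a Phovea function, and it follows the example above in counting only the data columns (without the id column) in size.

```python
import csv
import io


def check_matrix(csv_text, entry):
    """Return a list of problems found when comparing CSV text with its matrix entry."""
    # skipinitialspace drops the blank that follows each comma in the sample files.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    rows = [row for row in reader if row]
    problems = []

    # Compare the declared size with the number of rows and data columns.
    actual_size = [len(rows), len(header) - 1]
    if actual_size != list(entry["size"]):
        problems.append(f"size is {actual_size}, expected {entry['size']}")

    # Every cell must parse as the declared type and lie within the declared range.
    value_type = entry["value"]["type"]
    low, high = entry["value"]["range"]
    cast = int if value_type == "int" else float
    for row in rows:
        for col_name, cell in zip(header[1:], row[1:]):
            try:
                value = cast(cell)
            except ValueError:
                problems.append(f"{row[0]}/{col_name}: {cell!r} is not of type {value_type}")
                continue
            if not low <= value <= high:
                problems.append(f"{row[0]}/{col_name}: {value} is outside {[low, high]}")
    return problems


if __name__ == "__main__":
    entry = {"size": [2, 3], "value": {"type": "int", "range": [0, 100]}}
    sample = "user_id, 2010, 2011, 2012\nuser_0, 18, 34, 57\nuser_1, 95, 41, 15\n"
    print(check_matrix(sample, entry) or "no problems found")
```

An empty list means the CSV text and the metadata entry agree on size, value type, and range.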
You can now access the data directly from the Phovea REST API.
Dataset
- /api/dataset/ returns the metadata of all available datasets including an automatically generated id
- /api/dataset/<dataset_id> and /api/dataset/matrix/<dataset_id>/data return the formatted data for the given dataset id
- /api/dataset/matrix/<dataset_id> returns the metadata for the given dataset id
- /api/dataset/matrix/<dataset_id>/rows returns a list of all row ids from the dataset
- /api/dataset/matrix/<dataset_id>/rowIds returns the ids in the Phovea range format (e.g., (0:10))
- /api/dataset/matrix/<dataset_id>/cols returns a list of all column ids from the dataset
- /api/dataset/matrix/<dataset_id>/colIds returns the ids in the Phovea range format (e.g., (0:10))
- /api/dataset/matrix/<dataset_id>/raw returns the JSON data for the given dataset id
- /api/dataset/matrix/<dataset_id>/hist returns a histogram for the matrix data
- /api/dataset/matrix/<dataset_id>/stats returns statistical values of the matrix data (e.g., q1, q3, min, max, sum, median, mean, skewness)
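To sanity-check what the stats endpoint reports, some of the listed values can be recomputed locally over all cells of the matrix. This is a sketch using Python's standard library, not the server's implementation; matrix_summary is a hypothetical helper, and the quartiles and skewness are left out because their exact definition in Phovea is not stated here.

```python
import statistics


def matrix_summary(rows):
    """Summarize all cells of a matrix given as a list of numeric rows."""
    values = [value for row in rows for value in row]
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }


if __name__ == "__main__":
    # First two data rows of time-series.csv, without the id column.
    print(matrix_summary([[18, 34, 57, 32, 25, 46], [95, 41, 15, 43, 82, 44]]))
```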
TODO
TODO