-
Notifications
You must be signed in to change notification settings - Fork 59
Home
defines a single global object CSV
useful for defining operations on tabular or matrix data
For begin and a tutorial, please see the README.md file on the main page
CSV.begin(datasource)....middleware()....go(finalCallback);
var handler = CSV.begin(datasource)....middleware()....finalize(finalCallback); // returns function
handler(); // same as using ....go(finalCallback)
// or maybe use it with jQuery. multiple calls to "handler()" are automatically debounced
$('#someButton').on('click', handler);
If undefined, then finalCallback = function(e,D){ if(e) console.log(e); }
e
is the error/exception string from CSV operations, or null for no error
D
is an object supplied by CSV containing:
- the row data in
D.rows
- perhaps meta data in
D.meta
D
will be null
on error
saves data in browser local storage or by ajax push if shared.ajaxMapper is defined
reformats data rows to a csv file and pushes file out the browser, as if it were a download.
Works on recent Chrome and Firefox
adds new rows to the internal data in shared.data.rows
filters the dataset to only include rows that pass the filter
the header is preserved as row 0
Filters the rows based on lower and/or upper limits for one or more columns
Currently assumes row 0 is a header row and keeps this header row.
filterObject = {colName1: [low,high], colName2: [low,high], ...}
in the object, range restrictions on zero or more columns are defined.
You may set low or high to 'null' to obtain a one sided test.
or
filterObject = [ [low,high], [low,high], ..., [low,high] ]
where in the array of pair arrays, the number of pairs must match the number of columns
in the header row.
where colName1
is a string column name from the header row.
The default is to apply an inclusive range test, i.e. to test that low <= value && value <= high
If either low or high is null, the test becomes one sided.
If low and high are the same value, then both are tests are performed and the result is an equality test.
appends a new column of data to all rows
colName
: If string, the name of the new column. This name is pushed to the header row.
If colName
is not type string, then colName is ignored and the data is treated as headerless.
colData
: an array of data
If the 3rd parameter is 'strict'
, then the following error is thrown if colData.length
does not equal
to the number of data rows in the data set. (total rows - 1 for the header if present).
throw "CSV: addCol new columnn data length "+colOrFunc.length+" needed "+(rows.length-h)
This error will cause the rest of the middleware to be skipped and the go()
or finalize()
finalCallback
will be called with e set to the error text
If the 3rd parameter is undefined
, then the length check is not performed, and in the event of a length
mismatch then elements of colData
are cyclically reused in order to fill out the number of data rows required.
appends a new column to all rows
colName
: If string, the name of the new column. This name is pushed to the header row.
colDataFunction
: a function(i, row)
that provides the column value for row i
needRowPropsBoolean
:
if true, the row
is supplied to colDataFunction
as an object
with keys equal to the header row names
if false, the row
is supplied to colDataFunction
as an array
containing the values of the existing columns for that row.
creates an HTML table from the row data, first slicing on [beginRow, endRow)
divId is an id of the div where the table is to be created. If the divId is not found in the document, then it will be created and appended to the end of document.body
options is an object defining table options to the internal routine makeTable
Many of these options will take a callback function instead of a string literal.
options.table
---> HTML attributes supplied as <table attributes >
example: options.table = { border: 1 }
sets standard HTML table border
options.caption
---> sets content for caption with <caption>content</caption>
options.header
---> if truthy, or options.thead is present, put row zero in header tags
options.thead
---> sets attributes for <thead>
tag
options.theadtr
---> sets attributes for the single <tr>
tag for the header
options.th
--> sets attributes for the <th>
tag
can be a literal or callback(i,j,v) where i is the row=0, j=col, v=value
options.tbody
---> sets attributes for <tbody>
tag
options.tr
---> sets attributes for the <tr>
tags for the data rows
can be a liteal or callback(i) where i is the row number
options.td
---> sets attributes for the <td>
tags for data cells
can be a literal or callback(i,j,v)
options.cell
--> if defined, overrides cell contents
if defined, must be a callback(i,j,v)
create an editor
the editor is very basic and does not include adding or deleting rows
divID The id of an existing div to be emptied and filled by html5csv with the editor's HTML.
If the div does not exist, it will be created and appended to document.body
headerBool -- optional -- true if data.rows[0] should be treated as a header row
beginRowNumber -- optional -- the beginning row number of the editing window
If beginRowNumber is undefined
then row 0 or 1 will be used depending on whether there is a header.
endRowNumber -- optional -- the end row number of the editing window
precallback -- optional -- function(tableOptions) -- callback with parameter equal to the table Options object. This object may be modified by the user supplied callback to change behaviors. After returning, tableOptions is passed to table
internally for generating HTML to render the table.
onCell -- optional -- function(rowNum,colNum,newVal,shared.data) -- callback with parameters pertinent to a user editing event -- the table row and column number, the value entered by the user, and the previous
shared.data object. This callback overrides the default behavior which would roughly correspond to shared.data.rows[rowNum][colNum] = newVal
, therefore if provided the callback must arrange to update the shared.data.rows if updating the data for subsequent middleware is desirable.
requires loading jqplot -- usually multiple js + css files
create one or more plots. Calls optional afterCallback(plotsObject) after calling jqPlot
var plotSpecArray = [ [name1,pairs1,options1], ... ] // plotSpecArray is an array of triplets
name is a string giving the plot name, and is the key of the plotsObject returned to afterCallback
pairs1 is an Array of Pairs, as in
[['x','y'],['x','z']]
to put y vs x
and z vs x
on the same axes
options1 is an Object of jqplot options, passed directly to jqplot
jsFiddle Example of sine and cosine plot and data table
To learn how to make specific plots, like bar plots vs lines vs points, see jqPlot's documentation and examples.
requires loading the numeric.js file
Performs linear fitting via Ordinary Least Squares
fitSpecArray=[fit1,fit2,fit3]
is an array of fits to perform
Each fit is a 3 element array fit=[fitname, fitdep, fitindep]
where:
fitname
is a string you choose to identify the fit
fitdep
is a string giving the column name (from the header col) for the dependent variable data
The dependent variable is the one you are trying to analyze or predict. This is the "y" data.
fitindep
is an array of strings, functions, or other identifiers for independent variables
The independent variables are the ones you think are useful as explanations.
The array fitindep
is passed to shared.submatrix(null, fitindep)
to create the independent
variable matrix X for the Ordinary Least Squares calculations. If an array element is a string, the corresponding column is taken from the CSV data in shared.data.rows. If an array element is a number, a constant column is created (useful for modelling a constant term or intercept). If an array element is a function F
, then F(i,j,rows[i])
is called, looping i=1...(L-1)
Each fit is evaluated independently, so that ordinary math errors in one fit (such as caused by singular matrices when there is autocollinearity among the independent variables) will not influence the execution of other fits.
Each fit places a fit result object in shared.data.fit
, which can be accessed from call
middleware or in the finalCallback(e,D)
function of go/finalize as D.fit
.
A successful fit result object for an individual fit looks like this:
`{ name: fitname, dep: fitdep, indep: fitindep, beta: BETA}`
where fitname
, fitdep
, and fitindep
are the same as above and copied from the call to ols;
and BETA
is an array with beta.length = fitindep.length
where the j-th element corresponds to the
j-th coefficient of a linear model multiplying the j-th column from fitindep
On error, beta is undefined or null and an error
string is defined. Math errors in fit do not skip subsequent middleware or cause an error invocation of finalCallback. The primary math error is when two columns are identical, or close to being identical, so that the XtX matrix is not invertible. This is also called multicollearity or autocollinearity.
Example: There are examples and unit tests of ols of in the test, near the end of the file qtestcsv.js
Possible Future Enhancements: Residuals e^2 and Goodness of fit R^2 is straightforward to calculate once beta is known, but is not currently performed in the library. Confidence intervals on beta, such as provided by R or other stats software, can be dangerous in the hands of untrained people because the conditions under which they would be valid can be violated by heteroskedasticity or other deviations from the standard normal model of ordinary least squares. There are well known tests for some of these deviations.
Perform Principal Component Analysis of selected columns. Calculation methodology is based on SVD and comparable with that in R's prcomp
function.
Requires numeric.js to be loaded
indep -- an array of column names
options -- optional object
options.center: 1 if the data in each column should be demeaned
options.scale: 1 if the data in each column should be scaled to have unit variance
Note: if options.scale is 1 and options.center is set to 0 or undefined, a column mean of 0 will be assumed and used for calculating column variance irregardless of the empirical column mean
options.newcols: an array of new column names. Columns providing the PCs of the data will be column appended to the CSV data, in new columns using these names. If there are 3 names in options.newcols, the first 3 PCs will be appended. The actual names are up to the user.
side effects:
pca current creates a pca object in data with the following fields:
- data.pca.indep: the arrary indep supplied to pca of input column names
- data.pca.options: the options object supplied to pca
- data.pca.C : calculated column centers
- data.pca.V : calculated column variances
- data.pca.adjX: data matrix after demean/scale adjustment
- data.pca.svd: singular value decomposition of data matrix, from numeric.js
Example: If you generate a CSV file with columns a,b,c you can try reading it into this PCA Example
calls programmer defined customFunction(next) with this=shared and first parameter "next"
- Important style request -- In your function's first line, please use:
var shared = this;
- The data rows are then in shared.data.rows
- The meta data, if any, is in shared.data.meta
- next(), the first parameter supplied to your customFunction, can be called to go to the next task.
- If you forget to call next and just return, thats ok too.
- If you need to defer the next task, e.g., for ajax,
return 'defer';
from customFunction
- To skip future middleware on an error, simply
throw "Oh No! My detailed error message goes here";
If you refine a call customFunction enough for reuse, it can be set up as an extension very easily.
CSV.extend(middlewareExtensions, sharedExtensions);
var extension = {
'diff': function(){
var shared = this; // pick up shared object from this, will be set internally by func.apply
// the rows of data are in shared.data.rows
// this code assumes shared.data.rows[0] is a header row, and later rows are numeric data
var newrows = [shared.data.rows[0].slice(0)]; // copy old header row
var i,l,j,k,newrow,r0,r1;
for(i=2,l=shared.data.rows.length;i<l;++i){
newrow = [];
r0 = shared.data.rows[i-1];
r1 = shared.data.rows[i];
for(j=0,k=r0.length;j<k;++j) newrow[j]=r1[j]-r0[j]
newrows.push(newrow);
}
shared.data.rows = newrows;
}
};
CSV.extend(extension); // attach to list of known middleware
CSV.begin(datasource)...diff()....go(); // diff should now work