Skip to content
Dr Paul Brewer edited this page Sep 2, 2013 · 36 revisions

html5csv.js

defines a single global object CSV

useful for defining operations on tabular or matrix data

CSV.begin(data, options)..middleware...go(finalCallback);

For begin and a tutorial, please see the README.md file on the main page

usage

can chain as many "middleware" calls as you like, go begins execution of the chain

CSV.begin(datasource)....middleware()....go(finalCallback);

can also create functions to call later, that represent the entire chained workflow

var handler = CSV.begin(datasource)....middleware()....finalize(finalCallback); // returns function


handler();  // same as using ....go(finalCallback)
// or maybe use it with jQuery. multiple calls to "handler()" are automatically debounced
$('#someButton').on('click', handler);

optional finalCallback(e,D)

If undefined, then finalCallback = function(e,D){ if(e) console.log(e); }

e is the error/exception string from CSV operations, or null for no error

D is an object supplied by CSV containing:

  • the row data in D.rows
  • perhaps meta data in D.meta

D will be null on error

currently defined middleware

###.save(csvName). saves data in browser local storage or by ajax push if shared.ajaxMapper is defined

###.download(csvFileName). reformats data rows to a csv file and pushes file out the browser, as if it were a download works on recent Chrome and Firefox

###.push(newrow,..., ). adds new rows to the internal data in shared.data.rows

###.hslice(beginRowNumber,endRowNumber). filters the dataset to only include rows that pass the filter the header is preserved as row 0

###.hslice(filterObject). Filters the rows based on lower and/or upper limits for one or more columns

Currently assumes row 0 is a header row and keeps this header row.

filterObject = {colName1: [low,high], colName2: [low,high], ...} in the object, range restrictions on zero or more columns are defined.
You may set low or high to 'null' to obtain a one sided test.

or

filterObject = [ [low,high], [low,high], ..., [low,high] ] where in the array of pair arrays form the number of pairs must match the number of columns in the header row.

where colName1 is a string column name from the header row.

The default is to apply an inclusive range test, i.e. to test that low <= value && value <= high

If either low or high is null, the test becomes one sided. If low and high are the same value, then both are tests are performed and the result is an equality test.

###.appendCol(colName, colData, optional 'strict'). appends a new column of data to all rows

colName: If string, the name of the new column. This name is pushed to the header row.

If colName is not type string, then colName is ignored and the data is treated as headerless.

colData: an array of data

If the 3rd parameter is 'strict', then the following error is thrown if colData.length does not equal to the number of data rows in the data set. (total rows - 1 for the header if present).

throw "CSV: addCol new columnn data length "+colOrFunc.length+" needed "+(rows.length-h)

This error will cause the rest of the middleware to be skipped and the go() or finalize() finalCallback will be called with e set to the error text

If the 3rd parameter is undefined, then the length check is not performed, and in the event of a length mismatch then elements of colData are cyclically reused in order to fill out the number of data rows required.

###.appendCol(colName, colDataFunction, needRowPropsBoolean). appends a new column to all rows

colName: If string, the name of the new column. This name is pushed to the header row.

colDataFunction: a function(i, row) that provides the column value for row i

needRowPropsBoolean:

if true, the row is supplied to colDataFunction as an object with keys equal to the header row names

if false, the row is supplied to colDataFunction as an array containing the values of the existing columns for that row.

###.table(divId,options,beginRowNumber,endRowNumber). creates an HTML table from the row data, first slicing on [beginRow, endRow)

divId is an id of the div where the table is to be created. If the divId is not found in the document, then it will be created and appended to the end of document.body

options is an object defining table options to the internal routine makeTable

Many of these options will take a callback function instead of a string literal.

options.table ---> HTML attributes supplied as <table attributes >

example: options.table = { border: 1 } sets standard HTML table border

options.caption ---> sets content for caption with <caption>content</caption>

options.header ---> if truthy, or options.thead is present, put row zero in header tags

options.thead ---> sets attributes for <thead> tag

options.theadtr ---> sets attributes for the single <tr> tag for the header

options.th --> sets attributes for the <th> tag

can be a literal or callback(i,j,v) where i is the row=0, j=col, v=value

options.tbody ---> sets attributes for <tbody> tag

options.tr ---> sets attributes for the <tr> tags for the data rows

can be a liteal or callback(i) where i is the row number

options.td ---> sets attributes for the <td> tags for data cells

can be a literal or callback(i,j,v)

options.cell --> if defined, overrides cell contents

if defined, must be a callback(i,j,v)

###.editor(divId,headerBool,beginRowNumber,endRowNumber,precallback,onCellCallback). create an editor

the editor is very basic and does not include adding or deleting rows

divID The id of an existing div to be emptied and filled by html5csv with the editor's HTML.

If the div does not exist, it will be created and appended to document.body

headerBool -- optional -- true if data.rows[0] should be treated as a header row

beginRowNumber -- optional -- the beginning row number of the editing window

If beginRowNumber is undefined then row 0 or 1 will be used depending on whether there is a header.

endRowNumber -- optional -- the end row number of the editing window

precallback -- optional -- function(tableOptions) -- callback with parameter equal to the table Options object. This object may be modified by the user supplied callback to change behaviors. After returning, tableOptions is passed to table internally for generating HTML to render the table.

onCell -- optional -- function(rowNum,colNum,newVal,shared.data) -- callback with parameters pertinent to a user editing event -- the table row and column number, the value entered by the user, and the previous shared.data object. This callback overrides the default behavior which would roughly correspond to shared.data.rows[rowNum][colNum] = newVal, therefore if provided the callback must arrange to update the shared.data.rows if updating the data for subsequent middleware is desirable.

###.jqplot(plotSpecArray, afterCallback).

requires loading jqplot -- usually multiple js + css files

create one or more plots. Calls optional afterCallback(plotsObject) after calling jqPlot

var plotSpecArray = [ [name1,pairs1,options1], ... ] // plotSpecArray is an array of triplets

name is a string giving the plot name, and is the key of the plotsObject returned to afterCallback

pairs1 is an Array of Pairs, as in

[['x','y'],['x','z']]

to put y vs x and z vs x on the same axes

options1 is an Object of jqplot options, passed directly to jqplot

jsFiddle Example of sine and cosine plot and data table

To learn how to make specific plots, like bar plots vs lines vs points, see jqPlot's documentation and examples.

###.ols(fitSpecArray).

requires loading the numeric.js file

Performs linear fitting via Ordinary Least Squares

fitSpecArray=[fit1,fit2,fit3] is an array of fits to perform

Each fit is a 3 element array fit=[fitname, fitdep, fitindep] where:

fitname is a string you choose to identify the fit

fitdep is a string giving the column name (from the header col) for the dependent variable data

The dependent variable is the one you are trying to analyze or predict. This is the "y" data.

fitindep is an array of strings, functions, or other identifiers for independent variables

The independent variables are the ones you think are useful as explanations.

The array fitindep is passed to shared.submatrix(null, fitindep) to create the independent variable matrix X for the Ordinary Least Squares calculations. If an array element is a string, the corresponding column is taken from the CSV data in shared.data.rows. If an array element is a number, a constant column is created (useful for modelling a constant term or intercept). If an array element is a function F, then F(i,j,rows[i]) is called, looping i=1...(L-1)

Each fit is evaluated independently, so that ordinary math errors in one fit (such as caused by singular matrices when there is autocollinearity among the independent variables) will not influence the execution of other fits.

Each fit places a fit result object in shared.data.fit, which can be accessed from call middleware or in the finalCallback(e,D) function of go/finalize as D.fit.

A successful fit result object for an individual fit looks like this:

  `{ name: fitname, dep: fitdep, indep: fitindep, beta: BETA}` 

where fitname, fitdep, and fitindep are the same as above and copied from the call to ols;

and BETA is an array with beta.length = fitindep.length where the j-th element corresponds to the j-th coefficient of a linear model multiplying the j-th column from fitindep

On error, beta is undefined or null and an error string is defined. Math errors in fit do not skip subsequent middleware or cause an error invocation of finalCallback. The primary math error is when two columns are identical, or close to being identical, so that the XtX matrix is not invertible. This is also called multicollearity or autocollinearity.

Example: There are examples and unit tests of ols of in the test, near the end of the file qtestcsv.js

Possible Future Enhancements: Residuals e^2 and Goodness of fit R^2 is straightforward to calculate once beta is known, but is not currently performed in the library. Confidence intervals on beta, such as provided by R or other stats software, can be dangerous in the hands of untrained people because the conditions under which they would be valid can be violated by heteroskedasticity or other deviations from the standard normal model of ordinary least squares. There are well known tests for some of these deviations.

###.pca(indep, options).

Perform Principal Component Analysis of selected columns. Calculation methodology is based on SVD and comparable with that in R's prcomp function.

Requires numeric.js to be loaded

indep -- an array of column names

options -- optional object

options.center: 1 if the data in each column should be demeaned

options.scale: 1 if the data in each column should be scaled to have unit variance

Note: if options.scale is 1 and options.center is set to 0 or undefined, a column mean of 0 will be assumed and used for calculating column variance irregardless of the empirical column mean

options.newcols: an array of new column names. Columns providing the PCs of the data will be column appended to the CSV data, in new columns using these names. If there are 3 names in options.newcols, the first 3 PCs will be appended. The actual names are up to the user.

side effects:

pca current creates a pca object in data with the following fields:

  • data.pca.indep: the arrary indep supplied to pca of input column names
  • data.pca.options: the options object supplied to pca
  • data.pca.C : calculated column centers
  • data.pca.V : calculated column variances
  • data.pca.adjX: data matrix after demean/scale adjustment
  • data.pca.svd: singular value decomposition of data matrix, from numeric.js

If you generate a CSV file with columns a,b,c you can try reading it into the PCA Example

###.call(customFunction). calls programmer defined customFunction(next) with this=shared and first parameter "next"

  • Important style request -- In your function's first line, please use: var shared = this;
  • The data rows are then in shared.data.rows
  • The meta data, if any, is in shared.data.meta
  • next(), the first parameter supplied to your customFunction, can be called to go to the next task.
    • If you forget to call next and just return, thats ok too.
    • If you need to defer the next task, e.g., for ajax, return 'defer'; from customFunction
  • To skip future middleware on an error, simply throw "Oh No! My detailed error message goes here";

If you refine a call customFunction enough for reuse, it can be set up as an extension very easily.

defining extensions

CSV.extend(middlewareExtensions, sharedExtensions);

A simple middleware extension to numerically difference rows

var extension = {
    'diff':  function(){
        var shared = this; // pick up shared object from this, will be set internally by func.apply
        // the rows of data are in shared.data.rows
        // this code assumes shared.data.rows[0] is a header row, and later rows are numeric data  
        var newrows = [shared.data.rows[0].slice(0)]; // copy old header row
        var i,l,j,k,newrow,r0,r1;
        for(i=2,l=shared.data.rows.length;i<l;++i){
            newrow = [];
            r0 = shared.data.rows[i-1];
            r1 = shared.data.rows[i];
            for(j=0,k=r0.length;j<k;++j) newrow[j]=r1[j]-r0[j]
            newrows.push(newrow);
        }
        shared.data.rows = newrows;
    }
};

CSV.extend(extension); // attach to list of known middleware

CSV.begin(datasource)...diff()....go();  // diff should now work
Clone this wiki locally