This is a library for parsing simple CSV files.
The library does not fully comply with RFC 4180 because quoted values are not supported.
The main goal is importing CSV data into an SQLite database with a minimal memory footprint.
License: BSD
The library imports CSV files into SQLite tables. We did our best to make it use as little memory as possible.
- Clear Objective-C interface - the user sees only the facade class in Objective-C. All the tricks are
- Low memory consumption - C++ iostreams are used to avoid memory warnings, since datasets may be fairly large
- Multiple line endings support - both Windows (CR LF) and Unix (LF) line endings are processed correctly
- SQL schema validation - CSV column names are parsed and user-specified types are assigned to columns. A column count or name mismatch produces an error
- Time performance optimizations - file IO (FS bound) and parsing operations (CPU bound) are performed on multiple threads and use the producer-consumer model.
- Non-standard comments - the CSV content may be preceded by some
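The line-ending handling and the producer-consumer model from the list above can be illustrated with a minimal C++ sketch. This is not the library's actual code: `LineQueue` and `readDataLines` are hypothetical names, and treating comments as leading lines that start with the comment character is an assumption. A producer thread reads lines and strips a trailing CR, while the calling thread consumes them.

```cpp
#include <condition_variable>
#include <istream>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue of lines shared by the reader and the parser threads.
class LineQueue {
public:
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lines_.push(std::move(line));
        }
        ready_.notify_one();
    }

    // Tells the consumer that no more lines will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a line is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(std::string& line) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !lines_.empty() || closed_; });
        if (lines_.empty()) return false;
        line = std::move(lines_.front());
        lines_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> lines_;
    bool closed_ = false;
};

// Reads the stream on a producer thread and collects the data lines
// on the calling (consumer) thread.
std::vector<std::string> readDataLines(std::istream& input, char commentChar) {
    LineQueue queue;
    std::thread producer([&] {
        std::string line;
        while (std::getline(input, line)) {
            // Turn a Windows CR LF ending into a plain line.
            if (!line.empty() && line.back() == '\r') line.pop_back();
            queue.push(line);
        }
        queue.close();
    });

    std::vector<std::string> result;
    std::string line;
    bool inLeadingComments = true;  // assumption: comments only precede the data
    while (queue.pop(line)) {
        if (inLeadingComments && !line.empty() && line[0] == commentChar) continue;
        inLeadingComments = false;
        result.push_back(line);
    }
    producer.join();
    return result;
}
```

In this sketch the consumer only collects lines. A real importer would parse each line as it is popped, and would probably bound the queue so that the reader cannot run far ahead of the parser.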
The library does not fully comply with RFC 4180, for speed and simplicity.
We do not support quoted values.
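To show what this limitation means in practice, here is a minimal C++ sketch of separator-only splitting. `splitFields` is a hypothetical helper and not part of the library. It splits on every separator character and gives quotes no special meaning, so a quoted value that contains the separator is broken into two fields.

```cpp
#include <string>
#include <vector>

// Splits a line on every occurrence of the separator.
// Quote characters are treated as ordinary data.
std::vector<std::string> splitFields(const std::string& line, char separator) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = line.find(separator, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));  // last field
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}
```

With `;` as the separator, the input `"x;y";z` produces three fields rather than the two that an RFC 4180 parser would return.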
Here is an example of CSV importer usage:
-(void)testImportWithInvalidDefauls
{
    // Locate the CSV fixture bundled with the test target.
    NSString* csvPath_ = [ [ NSBundle bundleForClass: [ self class ] ] pathForResource: @"UnixTest3"
                                                                                ofType: @"csv" ];
    NSString* fullDatabasePath = @"1.sqlite";

    // Map CSV column names to SQLite column types.
    NSDictionary* schema_ = @{
        @"Date"    : @"DATETIME",
        @"Integer" : @"INTEGER",
        @"Name"    : @"VARCHAR",
        @"Id"      : @"VARCHAR",
        @"TypeId"  : @"INTEGER"
    };
    NSOrderedSet* primaryKey_ = [ NSOrderedSet orderedSetWithObjects: @"Date", @"Id", @"TypeId", nil ];

    // Default values for the listed columns.
    CsvDefaultValues* defaults_ = [ CsvDefaultValues new ];
    [ defaults_ addDefaultValue: @""
                      forColumn: @"Name" ];
    [ defaults_ addDefaultValue: @"10"
                      forColumn: @"TypeId" ];

    CsvToSqlite* converter_ = [ [ CsvToSqlite alloc ] initWithDatabaseName: fullDatabasePath
                                                              dataFileName: csvPath_
                                                            databaseSchema: schema_
                                                                primaryKey: primaryKey_
                                                             defaultValues: defaults_
                                                             separatorChar: ';'
                                                               commentChar: '#'
                                                                lineReader: [ UnixLineReader new ]
                                                            dbWrapperClass: [ FMDatabase class ] ];
    converter_.csvDateFormat = @"yyyyMMdd";

    // Run the import into the "Campaigns" table and capture any error.
    NSError* error_ = nil;
    [ converter_ storeDataInTable: @"Campaigns"
                            error: &error_ ];
    XCTAssertNotNil( error_, @"Unexpected error" );
}
Column parsing is the largest bottleneck in this implementation of the CSV importer. For some datasets this step may be skipped.
To implement this, the original CSV data
Date , Id, Visits
2014-01-01, 10, 100500
is converted to the query below:
INSERT INTO [TrafficStats] - SQL Insert statement added
( Date, Id, Visits ) - Brackets added
VALUES - SQL keyword added
( '2014-01-01', '10', '100500' ) - Brackets and quotes added
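The conversion above can be sketched in C++ as plain string manipulation. This is only an illustration under stated assumptions, not the library's implementation: `trim`, `splitAndTrim`, `quoteSqlLiteral` and `buildInsert` are hypothetical helpers, and doubling embedded single quotes is an extra safety step the original description does not mention.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes leading and trailing spaces and tabs.
std::string trim(const std::string& text) {
    const std::string::size_type first = text.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::string::size_type last = text.find_last_not_of(" \t");
    return text.substr(first, last - first + 1);
}

// Splits on the separator (no quote handling) and trims every field.
std::vector<std::string> splitAndTrim(const std::string& line, char separator) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = line.find(separator, start);
        if (end == std::string::npos) {
            fields.push_back(trim(line.substr(start)));
            return fields;
        }
        fields.push_back(trim(line.substr(start, end - start)));
        start = end + 1;
    }
}

// Wraps a value in single quotes, doubling any embedded single quote.
std::string quoteSqlLiteral(const std::string& value) {
    std::string quoted = "'";
    for (char c : value) {
        if (c == '\'') quoted += '\'';
        quoted += c;
    }
    quoted += '\'';
    return quoted;
}

// Builds the INSERT statement from the header line and one data line.
std::string buildInsert(const std::string& table, const std::string& header,
                        const std::string& row, char separator) {
    const std::vector<std::string> columns = splitAndTrim(header, separator);
    const std::vector<std::string> values = splitAndTrim(row, separator);

    std::string query = "INSERT INTO [" + table + "] ( ";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i != 0) query += ", ";
        query += columns[i];
    }
    query += " ) VALUES ( ";
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) query += ", ";
        query += quoteSqlLiteral(values[i]);
    }
    query += " )";
    return query;
}
```

Because every value is passed through as a quoted string, no per-column type conversion is performed, which is the point of skipping the parsing step.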
Column parsing can be skipped in the following cases:
- The dataset does not contain any dates.
- The dataset contains dates in the ANSI format (yyyy-MM-dd) or any other format supported by SQLite: http://www.sqlite.org/lang_datefunc.html
- The dataset contains dates in the yyyyMMdd format.
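A compact yyyyMMdd date can be turned into the ANSI form with simple string operations, without a general-purpose date formatter. The C++ sketch below shows one way to do it. `compactDateToAnsi` is a hypothetical helper, and whether the library performs the conversion exactly like this is an assumption.

```cpp
#include <cctype>
#include <string>

// Converts "yyyyMMdd" to "yyyy-MM-dd".
// Returns an empty string if the input is not exactly eight digits.
// Only the shape is checked; month and day ranges are not validated.
std::string compactDateToAnsi(const std::string& compact) {
    if (compact.size() != 8) return "";
    for (unsigned char c : compact) {
        if (!std::isdigit(c)) return "";
    }
    return compact.substr(0, 4) + "-" + compact.substr(4, 2) + "-" + compact.substr(6, 2);
}
```

For example, `compactDateToAnsi("20140101")` yields `2014-01-01`, a form that SQLite date functions accept.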
We ran the benchmarks on an iPad 2, which was the oldest and slowest model we could test against. We measured the entire import process, including:
- database schema creation
- CSV data parsing
- inserting parsed data into the database
The numbers in the tables below are averages over 10 launches. Feel free to do further research using the dodikk/CsvToSqlite-Profiling repository.
Date Format | Format Comment | Time |
---|---|---|
yyyy-MM-dd | ANSI format | 11 sec |
yyyyMMdd | "Compact" ANSI | 17 sec |
other | Using NSDateFormatter | 28 sec |
Date Format | Format Comment | Time |
---|---|---|
yyyy-MM-dd | ANSI format | 20 sec |
yyyyMMdd | "Compact" ANSI | 32 sec |
other | Using NSDateFormatter | 40 sec |
- dodikk / ObjcScopedGuard https://github.com/dodikk/ObjcScopedGuard.git
- dodikk / ESLocale https://github.com/dodikk/ESLocale.git
- ccgus / fmdb https://github.com/ccgus/fmdb.git
- dodikk / ESDatabaseWrapper https://github.com/dodikk/ESDatabaseWrapper.git
The recommended approach is using sub-projects. However, CocoaPods users are welcome to use the pod install CsvToSqlite command.
Make the library RFC 4180 compliant. Start using davedelong / CHCSVParser for better CSV handling.