tt crud import
Welcome to the tt crud import wiki, which describes the current functionality and limitations of this implementation.
- tt crud: module for interacting with the CRUD module of Tarantool.
- tt crud import: subcommand for importing data into Tarantool via CRUD.
The following field types of the target space are supported:
- boolean (the values [true, t] in input data fields are interpreted as true if the format field requires a boolean; the values [false, f] are interpreted as false)
- string
- integer
- unsigned
- double (unstable, see issue https://github.com/tarantool/crud/issues/298)
- number
- decimal
- any
- scalar
- datetime
- interval
- uuid
- array
- map
Before a record is imported into a space, its field values are converted according to the format of that space.
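As a rough illustration of this conversion step, the sketch below applies the boolean rule from the type list to raw CSV strings, along with a few of the simpler types. It is written in Python for brevity; convert_field and convert_record are hypothetical names, not part of tt, and the exact, case-sensitive matching of true/t/false/f is an assumption.

```python
from typing import Any

# Hypothetical helpers, not part of tt: they mimic the boolean rule listed above.
TRUE_VALUES = {"true", "t"}
FALSE_VALUES = {"false", "f"}


def convert_field(raw: str, field_type: str) -> Any:
    """Convert one raw CSV value according to the space field type."""
    if field_type == "boolean":
        # Assumption: values are matched exactly (case-sensitive).
        if raw in TRUE_VALUES:
            return True
        if raw in FALSE_VALUES:
            return False
        raise ValueError(f"cannot convert {raw!r} to boolean")
    if field_type in ("integer", "unsigned"):
        value = int(raw)
        if field_type == "unsigned" and value < 0:
            raise ValueError(f"{raw!r} is negative, expected unsigned")
        return value
    if field_type == "string":
        return raw
    raise NotImplementedError(f"type {field_type!r} is not covered by this sketch")


def convert_record(values, space_format):
    """space_format is a list of (name, type) pairs in field order."""
    return [convert_field(v, t) for v, (_, t) in zip(values, space_format)]
```

A value that cannot be converted raises an error in this sketch; how the real tool reports such rows is governed by the options described further down.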
tt crud import URI FILE SPACE [flags]
- URI: address of the router.
- FILE: file with input data for import.
- SPACE: name of the target space to import data into.
--batch-size
Crud batch size during import.
--error
Name of the file for rows that were not imported. An existing file is overwritten.
--format
Format of input data. Currently only "csv" is supported.
--header
The first line is a header, not data.
-h, --help
Help for import.
--match
Use correspondence between fields in the input data and fields in the target space. To use the header as the matching scheme, pass --match=header together with --header, as in the example run on this page. If the space format contains fields that are not specified in the header, an attempt will be made to insert null into them. Fields in the header that are not present in the space format are ignored. You can also set a manual match, for example: <spaceFieldFoo=csvFieldFoo:spaceFieldBar=csvFieldBar:...:...>, where FieldFoo can be numeric (a position) or a string (a string is allowed only when the --header option is set).
--null
Sets the value to be interpreted as NULL when importing. By default, an empty value. Example for csv: field1val,,field3val, where the empty second field (field2val) is taken as NULL.
--on-error
If an error occurs, either skips the problematic line and continues, or stops. Accepts one of two values, one per behavior; the example run on this page uses stop. Duplicate primary key errors can be handled separately via the --on-exist option.
--on-exist
Defines the action taken when a duplicate primary key error occurs. Accepts one of three values; the progress file in the example on this page records stop. All other errors are handled separately via the --on-error option.
-p, --password
Connection password.
--progress
Name of progress file. If there is a file with the specified name, it will be taken into account when importing. At each launch, the content of the progress file with specified name is completely overwritten. If the file with the specified name does not exist, a progress file will be created with the results of this run. If the option is not set, then this mechanism is not used.
--success
Name of the file for rows that were imported. An existing file is overwritten.
-u, --username
Connection username.
--th-sep
String of characters that are treated as thousands separators in numeric data.
--dec-sep
String of characters that are treated as decimal separators in numeric data.
--quote
Character that is treated as the quote character in CSV.
--dump
Sets the default names for the error, success, and progress files.
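To make the --match and --null descriptions more concrete, here is a minimal Python sketch of how a manual match string could be parsed and applied to one CSV row. The functions parse_match and build_tuple are hypothetical and do not come from tt. The sketch assumes that numeric references are 1-based positions and only handles them on the CSV side, which is a simplification of the rule stated above.

```python
import csv
import io


def parse_match(spec: str) -> dict:
    """Parse 'spaceField=csvField:spaceField=csvField' into a dict.

    Numeric CSV references are kept as int positions, others as header names.
    """
    mapping = {}
    for pair in spec.split(":"):
        space_field, _, csv_field = pair.partition("=")
        if not space_field or not csv_field:
            raise ValueError(f"bad match pair: {pair!r}")
        mapping[space_field] = int(csv_field) if csv_field.isdigit() else csv_field
    return mapping


def build_tuple(row, header, space_fields, mapping, null_value=""):
    """Build a tuple in space field order; unmatched fields become None."""
    result = []
    for field in space_fields:
        ref = mapping.get(field)
        if ref is None:
            result.append(None)  # field not mentioned in the match: insert null
            continue
        # Assumption: numeric positions are 1-based.
        index = ref - 1 if isinstance(ref, int) else header.index(ref)
        value = row[index]
        # A value equal to the configured null marker is imported as NULL.
        result.append(None if value == null_value else value)
    return result


header = ["id", "name", "surname"]
row = next(csv.reader(io.StringIO("7,,Flanders")))
mapping = parse_match("id=1:name=name:surname=surname")
print(build_tuple(row, header, ["id", "name", "surname", "age"], mapping))
# ['7', None, 'Flanders', None]
```

The empty name value becomes None because it equals the default null marker, and age becomes None because nothing in the match refers to it.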
For a demonstration, you can use the playground.lua file from the crud repository.
Note that crud import requires a crud version with batching support (crud.insert_many(...), crud.replace_many(...), etc.) and vshard (vshard is a requirement of playground.lua).
Start the playground:
$ git clone https://github.com/tarantool/crud.git
$ cd crud
$ tarantoolctl rocks make
$ ./doc/playground.lua
Check the playground via tt connect:
$ tt connect localhost:3301
# localhost:3301> crud.select('developers')
---
- metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
{'name': 'name', 'type': 'string'}, {'type': 'string', 'name': 'surname', 'is_nullable': true},
{'type': 'number', 'name': 'age', 'is_nullable': false}]
rows:
- [1, 477, 'Alexey', 'Adams', 20]
- [2, 401, 'Sergey', 'Allred', 21]
- [3, 2804, 'Pavel', 'Adams', 27]
- [4, 1161, 'Mikhail', 'Liston', 51]
- [5, 1172, 'Dmitry', 'Jacobi', 16]
- [6, 1064, 'Alexey', 'Sidorov', 31]
- null
...
Input file (the 2nd record has a duplicate key problem; the 1st and 3rd are fine):
$ cat developers.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
7,900,Marge,Simpson,33
8,800,Homer,"Simpson",40
Run import:
$ tt crud import localhost:3301 ./developers.csv developers --dump --username=guest --header --match=header --on-error=stop --batch-size=1
• Running crud import...
• PID: [182849]
⨯ timestamp: 2023-09-07 23:04:52.558754084 +0300 MSK m=+0.019168424
index: 3
record: 7,900,Marge,Simpson,33
error: CallError: Failed for 0b29f665-1082-4ad8-bbfd-720968ffa649: Function returned an error: Duplicate key exists in unique index "primary_index" in space "developers" with old tuple - [7, 700, "Ned", "Flanders", 35] and new tuple - [7, 900, "Marge", "Simpson", 33]
• Summary:
total read: 2
skipped: 0
parsed success: 2
parsed error: 0
import success: 1
import error: 1
⨯ CallError: Failed for 0b29f665-1082-4ad8-bbfd-720968ffa649: Function returned an error: Duplicate key exists in unique index "primary_index" in space "developers" with old tuple - [7, 700, "Ned", "Flanders", 35] and new tuple - [7, 900, "Marge", "Simpson", 33]
Success file:
$ cat tt_import.success
7,700,Ned,Flanders,35
Error file:
$ cat tt_import.error
7,900,Marge,Simpson,33
Progress file:
$ cat tt_import_progress.yml
delimiter: ','
quote: '"'
thousand_separators: ' `'
decimal_separators: .,
null_value: ""
match: header
with_header: true
batch_size: 1
on_error: stop
on_exist: stop
rollback_on_error: false
start_time: 2023-09-07T23:04:52.545469378+03:00
position: 3
retry_positions:
- 3
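The thousand_separators and decimal_separators entries recorded in this progress file correspond to the --th-sep and --dec-sep options. One plausible reading, sketched below in Python, is that every character of the first string is dropped and every character of the second is treated as a decimal point before the number is parsed. The function normalize_number is hypothetical, and the real tool may resolve ambiguous inputs differently.

```python
def normalize_number(raw: str, th_sep: str = " `", dec_sep: str = ".,") -> float:
    """Drop thousands separators, unify the decimal separator, then parse.

    Defaults mirror the values shown in the progress file above.
    """
    # Remove every character that is listed as a thousands separator.
    cleaned = "".join(ch for ch in raw if ch not in th_sep)
    # Map every listed decimal separator to the '.' that float() expects.
    for ch in dec_sep:
        cleaned = cleaned.replace(ch, ".")
    # float() raises ValueError if more than one decimal mark remains.
    return float(cleaned)
```

With these defaults, an input such as 1 234,5 parses as 1234.5, while a string containing both a comma and a dot is rejected in this sketch.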
Let's correct the input file so that the 2nd record no longer has a duplicate key:
$ cat developers.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
9,900,Marge,Simpson,33
8,800,Homer,"Simpson",40
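Now let's continue from the stop point, repeating the problematic lines before it. The --progress option is used for this; in the command below, --dump sets the default progress file name, so tt_import_progress.yml is picked up.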
$ tt crud import localhost:3301 ./developers.csv developers --dump --username=guest --header --match=header --on-error=stop --batch-size=1
• Running crud import...
• PID: [184045]
• Progress has been restored from tt_import_progress.yml
• Summary:
total read: 3
skipped: 1
parsed success: 2
parsed error: 0
import success: 2
import error: 0
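The counters in this summary are consistent with a simple resume rule: records at or before the saved position are skipped unless they are listed in retry_positions. The Python sketch below illustrates that interpretation with the values from the progress file shown earlier. plan_resume is a hypothetical helper, and the rule itself is inferred from the output, not taken from the tt source.

```python
def plan_resume(record_indexes, position, retry_positions):
    """Split record indexes into (skipped, to_import) for a resumed run.

    Assumption: records at or before `position` were already handled and are
    skipped unless they appear in `retry_positions`.
    """
    retry = set(retry_positions)
    skipped, to_import = [], []
    for index in record_indexes:
        if index <= position and index not in retry:
            skipped.append(index)
        else:
            to_import.append(index)
    return skipped, to_import


# Line 1 of developers.csv is the header, so the data records are lines 2..4.
skipped, to_import = plan_resume([2, 3, 4], position=3, retry_positions=[3])
print(len(skipped), len(to_import))  # 1 2
```

Line 2 (Ned) is skipped, line 3 (Marge) is retried, and line 4 (Homer) is imported for the first time, which matches skipped: 1 and import success: 2.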
Now you can verify via tt connect that all records (see id = 7, 8, 9) have been imported:
# localhost:3301> crud.select('developers')
---
- metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
{'name': 'name', 'type': 'string'}, {'type': 'string', 'name': 'surname', 'is_nullable': true},
{'type': 'number', 'name': 'age', 'is_nullable': false}]
rows:
- [1, 477, 'Alexey', 'Adams', 20]
- [2, 401, 'Sergey', 'Allred', 21]
- [3, 2804, 'Pavel', 'Adams', 27]
- [4, 1161, 'Mikhail', 'Liston', 51]
- [5, 1172, 'Dmitry', 'Jacobi', 16]
- [6, 1064, 'Alexey', 'Sidorov', 31]
- [7, 700, 'Ned', 'Flanders', 35]
- [8, 800, 'Homer', 'Simpson', 40]
- [9, 900, 'Marge', 'Simpson', 33]
- null
...