
tt crud import


Welcome to the tt crud import wiki page. It describes the current functionality and limitations of this implementation.

  • tt crud - a module for interacting with the CRUD module of Tarantool.
  • tt crud import - a subcommand for importing data into Tarantool via CRUD.

Current type restrictions

The following field types of the target space are supported:

  • boolean (when the format field requires a boolean, the input values [true, t] are interpreted as true and the values [false, f] as false)
  • string
  • integer
  • unsigned
  • double (unstable, see issue https://github.com/tarantool/crud/issues/298)
  • number
  • decimal
  • any
  • scalar
  • datetime
  • interval
  • uuid
  • array
  • map

Before a record is imported into a space, its field values are converted according to the format of that space.
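
For example, given a hypothetical target space with the format [id: unsigned, active: boolean, name: string] (the space and file names here are illustrative, not part of the tool), the input could be converted roughly as follows:

$ cat users.csv
id,active,name
1,t,Alice
2,false,Bob

Here 1 and 2 become unsigned numbers, t and false become the booleans true and false, and the names remain strings.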

CLI description

USAGE

tt crud import URI FILE SPACE [flags]

  • URI - address of router.
  • FILE - file with input data for import.
  • SPACE - target space name to import data.
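
For example, a minimal invocation against a router on localhost could look like this (the file and space names are placeholders):

$ tt crud import localhost:3301 ./data.csv customers --format=csv --header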

FLAGS

  • --batch-size

The CRUD batch size used during import.
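
For example, to send data to CRUD in batches of 100 records (the file and space names are illustrative):

$ tt crud import localhost:3301 ./data.csv customers --header --batch-size=100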

  • --error

Name of the file with rows that were not imported. An existing file is overwritten.

  • --format

Format of input data. Currently only "csv" is supported.

  • --header

Treat the first line as a header, not as data.

  • -h, --help

Help for import.

  • --match

Defines the correspondence between fields in the input data and fields in the target space. Specify this option as --match=header together with the --header option to use the header as the matching scheme during import. If there are fields in the space format that are not present in the header, an attempt will be made to insert null into them. If there are fields in the header that are not present in the space format, they will be ignored. You can also set a manual match, for example: <spaceFieldFoo=csvFieldFoo:spaceFieldBar=csvFieldBar:...:...>, where the CSV field reference can be numeric (a position) or a string (a string only when the --header option is used).
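
For instance, assuming a space with fields id, name, and age and a CSV header with columns uid, fullname, and years (all field, file, and space names here are illustrative), a manual match could look like:

$ tt crud import localhost:3301 ./people.csv people --header --match=id=uid:name=fullname:age=years
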
  • --null

Sets the value to be interpreted as NULL during import. By default, it is an empty value. Example for CSV: field1val,,field3val, where the empty second field will be taken as NULL.
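
For example, to interpret the literal string NULL as a null value instead of an empty field (the file and space names are illustrative):

$ tt crud import localhost:3301 ./data.csv customers --header --null=NULL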

  • --on-error

Defines what happens when an error occurs: the problematic line is either skipped and the work goes on, or the work stops. Allowed values: skip or stop. Errors at the level of a duplicate primary key can be handled separately via the --on-exist option.

  • --on-exist

Defines the action when a duplicate primary key error occurs. Allowed values: stop, skip, or replace. All other errors are handled separately via the --on-error option.
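
For example, to skip lines that fail with ordinary errors but stop on the first duplicate primary key (the file and space names are placeholders):

$ tt crud import localhost:3301 ./data.csv customers --header --on-error=skip --on-exist=stop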

  • -p, --password

Connection password.

  • --progress

Name of the progress file. If a file with the specified name exists, it is taken into account during import. On each launch, the content of the progress file is completely overwritten. If a file with the specified name does not exist, it is created with the results of the current run. If the option is not set, this mechanism is not used.
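
A typical workflow, sketched here with placeholder file and space names, is to re-run the import with the same progress file so that lines already imported are skipped and lines recorded as failed are retried:

$ tt crud import localhost:3301 ./data.csv customers --header --progress=import_progress.yml
$ # fix the problematic rows in ./data.csv, then re-run with the same progress file
$ tt crud import localhost:3301 ./data.csv customers --header --progress=import_progress.yml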

  • --success

Name of the file with rows that were imported. An existing file is overwritten.

  • -u, --username

Connection username.

  • --th-sep

A string of symbols to be treated as thousands separators in numeric data.

  • --dec-sep

A string of symbols to be treated as decimal separators in numeric data.
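
For example, to accept values such as 1 000 000,5 with a space as the thousands separator and a comma as the decimal separator (the file and space names are illustrative):

$ tt crud import localhost:3301 ./prices.csv prices --header --th-sep=" " --dec-sep=","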

  • --quote

The symbol to be treated as the quote character in CSV.
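
For example, if the input file quotes fields with single quotes instead of the default double quotes (the file and space names are illustrative):

$ tt crud import localhost:3301 ./data.csv customers --header --quote="'"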

  • --dump

Sets the default names for the error, success, and progress files.
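
As shown in the demonstration below, with --dump these files get the default names tt_import.success, tt_import.error, and tt_import_progress.yml. A minimal call using it could look like (the file and space names are placeholders):

$ tt crud import localhost:3301 ./data.csv customers --header --dump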

Demonstration

For the demonstration, you can use the file playground.lua from the crud repository. Note that crud import requires a crud version with batching support (crud.insert_many(...), crud.replace_many(...), etc.) and vshard (vshard is a playground.lua requirement).

Start playground:

$ git clone https://github.com/tarantool/crud.git
$ cd crud
$ tarantoolctl rocks make
$ ./doc/playground.lua

Check playground via tt connect:

$ tt connect localhost:3301
# localhost:3301> crud.select('developers')
---
- metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
    {'name': 'name', 'type': 'string'}, {'type': 'string', 'name': 'surname', 'is_nullable': true},
    {'type': 'number', 'name': 'age', 'is_nullable': false}]
  rows:
  - [1, 477, 'Alexey', 'Adams', 20]
  - [2, 401, 'Sergey', 'Allred', 21]
  - [3, 2804, 'Pavel', 'Adams', 27]
  - [4, 1161, 'Mikhail', 'Liston', 51]
  - [5, 1172, 'Dmitry', 'Jacobi', 16]
  - [6, 1064, 'Alexey', 'Sidorov', 31]
- null
...

Input file (the 2nd record has a duplicate key problem, the 1st and 3rd are fine):

$ cat developers.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
7,900,Marge,Simpson,33
8,800,Homer,"Simpson",40

Run import:


$ tt crud import localhost:3301 ./developers.csv developers --dump --username=guest --header --match=header --on-error=stop --batch-size=1

 • Running crud import...
 • PID: [182849]
 ⨯ timestamp: 2023-09-07 23:04:52.558754084 +0300 MSK m=+0.019168424
    index:     3
    record:    7,900,Marge,Simpson,33
    error:     CallError: Failed for 0b29f665-1082-4ad8-bbfd-720968ffa649: Function returned an error: Duplicate key exists in unique index "primary_index" in space "developers" with old tuple - [7, 700, "Ned", "Flanders", 35] and new tuple - [7, 900, "Marge", "Simpson", 33]

 • Summary:
   total read:        2
   skipped:           0
   parsed success:    2
   parsed error:      0
   import success:    1
   import error:      1

  ⨯ CallError: Failed for 0b29f665-1082-4ad8-bbfd-720968ffa649: Function returned an error: Duplicate key exists in unique index "primary_index" in space "developers" with old tuple - [7, 700, "Ned", "Flanders", 35] and new tuple - [7, 900, "Marge", "Simpson", 33]

Success file:

$ cat tt_import.success
7,700,Ned,Flanders,35

Error file:

$ cat tt_import.error
7,900,Marge,Simpson,33

Progress file:

$ cat tt_import_progress.yml
delimiter: ','
quote: '"'
thousand_separators: ' `'
decimal_separators: .,
null_value: ""
match: header
with_header: true
batch_size: 1
on_error: stop
on_exist: stop
rollback_on_error: false
start_time: 2023-09-07T23:04:52.545469378+03:00
position: 3
retry_positions:
  - 3

Let's correct the input file. Now the 2nd record no longer has a duplicate key problem:

$ cat developers.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
9,900,Marge,Simpson,33
8,800,Homer,"Simpson",40

So let's continue working from the stop point, retrying the problem lines recorded before it. The progress file mechanism is used for this (in this example, --dump sets the default progress file name).

$ tt crud import localhost:3301 ./developers.csv developers --dump --username=guest --header --match=header --on-error=stop --batch-size=1

 • Running crud import...
 • PID: [184045]
 • Progress has been restored from tt_import_progress.yml
 • Summary:
   total read:        3
   skipped:           1
   parsed success:    2
   parsed error:      0
   import success:    2
   import error:      0

Now you can make sure via tt connect that all the records (see id = 7, 8, 9) have been imported:

# localhost:3301> crud.select('developers')
---
- metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
    {'name': 'name', 'type': 'string'}, {'type': 'string', 'name': 'surname', 'is_nullable': true},
    {'type': 'number', 'name': 'age', 'is_nullable': false}]
  rows:
  - [1, 477, 'Alexey', 'Adams', 20]
  - [2, 401, 'Sergey', 'Allred', 21]
  - [3, 2804, 'Pavel', 'Adams', 27]
  - [4, 1161, 'Mikhail', 'Liston', 51]
  - [5, 1172, 'Dmitry', 'Jacobi', 16]
  - [6, 1064, 'Alexey', 'Sidorov', 31]
  - [7, 700, 'Ned', 'Flanders', 35]
  - [8, 800, 'Homer', 'Simpson', 40]
  - [9, 900, 'Marge', 'Simpson', 33]
- null
...