Merge pull request #2194 from mneedham/parquet-local-file

explain how to ingest local file

pjhampton authored Apr 3, 2024
2 parents 2553ab0 + 13ea3e7 commit 76cca4d
Showing 1 changed file with 22 additions and 11 deletions: docs/en/integrations/data-ingestion/data-formats/parquet.md
---
slug: /en/integrations/data-formats/parquet
---
# Working with Parquet in ClickHouse

Parquet is an efficient file format to store data in a column-oriented way.
ClickHouse provides support for both reading and writing Parquet files.

:::tip
When you reference a file path in a query, the location ClickHouse reads from depends on the variant of ClickHouse that you're using.

If you're using [`clickhouse-local`](/docs/en/operations/utilities/clickhouse-local.md) it will read from a location relative to where you launched ClickHouse Local.
If you're using ClickHouse Server or ClickHouse Cloud via `clickhouse client`, it will read from a location relative to the `/var/lib/clickhouse/user_files/` directory on the server.
:::

## Importing from Parquet

Before loading data, we can use the [file()](/docs/en/sql-reference/functions/files.md/#file) function to explore the structure of an [example Parquet file](assets/data.parquet):

```sql
DESCRIBE TABLE file('data.parquet', Parquet);
```

:::tip
When using the `file()` function with ClickHouse Cloud, you will need to run the commands in `clickhouse client` on the machine where the file resides. Another option is to use [`clickhouse-local`](/docs/en/operations/utilities/clickhouse-local.md) to explore files locally.
:::

We've passed [Parquet](/docs/en/interfaces/formats.md/#data-format-parquet) as the second argument, so ClickHouse knows the file format. This prints the columns with their types:

```response
┌─name─┬─type─────────────┐
│ path │ Nullable(String) │
│ date │ Nullable(String) │
│ hits │ Nullable(Int64)  │
└──────┴──────────────────┘
```

We can also explore files before actually importing data, using the full power of SQL:

```sql
SELECT *
FROM file('data.parquet', Parquet)
LIMIT 3;
```
```response
┌─path──────────────────────┬─date───────┬─hits─┐
│ Akiba_Hebrew_Academy      │ 2017-08-01 │  241 │
│ Aegithina_tiphia          │ 2018-02-01 │   34 │
│ 1971-72_Utah_Stars_season │ 2016-10-01 │    1 │
└───────────────────────────┴────────────┴──────┘
```

Note that we could also have skipped specifying the format explicitly: `file('data.parquet')`. In that case, ClickHouse will automatically detect the format based on the file extension.
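As a quick sketch of this auto-detection behaviour (assuming the same `data.parquet` file), the format argument can simply be dropped:

```sql
-- The format is inferred from the .parquet file extension
SELECT count()
FROM file('data.parquet');
```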

## Importing to an existing table

Let's create a table into which we'll import Parquet data:

```sql
CREATE TABLE sometable
(
    `path` String,
    `date` Date,
    `hits` UInt32
)
ENGINE = MergeTree
ORDER BY (date, path);
```

Now we can import data using the `FROM INFILE` clause:


```sql
INSERT INTO sometable
FROM INFILE 'data.parquet' FORMAT Parquet;

SELECT *
FROM sometable
LIMIT 5;
```
```response
┌─path──────────────────────────┬───────date─┬─hits─┐
...
└───────────────────────────────┴────────────┴──────┘
```

Note how ClickHouse automatically converted Parquet strings (in the `date` column) to the `Date` type. This is because ClickHouse does a typecast automatically based on the types in the target table.
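When you need more control than this implicit cast, one option (a sketch reusing the same table and column names) is to select from `file()` and convert explicitly while inserting:

```sql
-- Parse the string dates explicitly during the import
INSERT INTO sometable
SELECT path, toDate(date), hits
FROM file('data.parquet', Parquet);
```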

## Inserting a local file to remote server

If you want to insert a local Parquet file into a remote ClickHouse server, you can do this by piping the contents of the file into `clickhouse-client`, as shown below:

```bash
clickhouse client -q "INSERT INTO sometable FORMAT Parquet" < data.parquet
```
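If the server is on a different host, the same pipe works with connection flags — a sketch in which the host name and credentials are placeholders:

```bash
# Hypothetical host; add --password (or --user) as your server requires
clickhouse client --host remote.example.com --secure \
  -q "INSERT INTO sometable FORMAT Parquet" < data.parquet
```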

## Creating new tables from Parquet files

Since ClickHouse reads the schema of a Parquet file, we can create tables on the fly:
