Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Ion Koutsouris <[email protected]>
1 parent db2d1e0, commit b6cf0fe
Showing 2 changed files with 54 additions and 0 deletions.
@@ -0,0 +1,53 @@
# LakeFS
`delta-rs` offers native support for using LakeFS as an object storage backend. Each
deltalake operation is executed in a transaction branch and safely merged into your source branch.

You don’t need to install any extra dependencies to read/write Delta tables to LakeFS with engines that use `delta-rs`. You do need to configure your LakeFS access credentials correctly.

## Passing LakeFS Credentials

You can pass your LakeFS credentials explicitly by using:

- the `storage_options` kwarg
- environment variables

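One way to keep secrets out of source code is to load them from the environment in your application and hand them to the `storage_options` kwarg. The sketch below does that under stated assumptions: the helper `storage_options_from_env` and the variable names (`MY_LAKEFS_ENDPOINT` and so on) are hypothetical, chosen for this example, and are not names that `delta-rs` is documented here to read on its own. Only the option keys (`endpoint`, `access_key_id`, `secret_access_key`) come from this page.

```python
import os


def storage_options_from_env(env=None):
    """Build a storage_options dict from environment variables.

    The environment variable names are hypothetical and specific to this
    sketch; rename them to whatever your deployment uses.
    """
    env = os.environ if env is None else env
    mapping = {
        "endpoint": "MY_LAKEFS_ENDPOINT",
        "access_key_id": "MY_LAKEFS_ACCESS_KEY_ID",
        "secret_access_key": "MY_LAKEFS_SECRET_ACCESS_KEY",
    }
    # Fail early with a clear message instead of sending empty credentials.
    missing = [name for name in mapping.values() if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {option: env[name] for option, name in mapping.items()}
```

The returned dictionary has the same shape as a hand-written `storage_options` dictionary and can be passed wherever that kwarg is accepted.
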
## Example

Let's work through an example with Polars. The same logic applies to other Python engines like Pandas, Daft, Dask, etc.

Follow the steps below to use Delta Lake on LakeFS with Polars:

1. Install Polars and deltalake. For example, using:

`pip install polars deltalake`

2. Import Polars with `import polars as pl` and create a dataframe with some toy data.

`df = pl.DataFrame({'x': [1, 2, 3]})`

3. Set your `storage_options` to your LakeFS endpoint and access credentials.

```python
storage_options = {
    "endpoint": "https://mylakefs.intranet.com",  # LakeFS endpoint
    "access_key_id": "LAKEFSID",
    "secret_access_key": "LAKEFSKEY",
}
```

4. Write the data to a Delta table using the `storage_options` kwarg. The path segment after the bucket is always the branch you want to write into.

```python
df.write_delta(
    "lakefs://bucket/branch/table",
    storage_options=storage_options,
)
```

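Because the branch is encoded in the table URI, it can help to build the URI from its parts and reuse it for reads. The sketch below assumes a reachable LakeFS endpoint and the placeholder names from the steps above; `lakefs_table_uri` and `read_table` are hypothetical helpers written for this example, while `pl.read_delta` is the Polars reader that accepts the same `storage_options` kwarg.

```python
def lakefs_table_uri(bucket, branch, table):
    """Compose lakefs://<bucket>/<branch>/<table> (hypothetical helper)."""
    bucket, branch, table = (part.strip("/") for part in (bucket, branch, table))
    if not (bucket and branch and table):
        raise ValueError("bucket, branch, and table must all be non-empty")
    return f"lakefs://{bucket}/{branch}/{table}"


def read_table(bucket, branch, table, storage_options):
    """Read a Delta table back from a given branch with Polars."""
    # Imported lazily so the URI helper can be used without Polars installed.
    import polars as pl

    return pl.read_delta(
        lakefs_table_uri(bucket, branch, table),
        storage_options=storage_options,
    )
```

For example, `read_table("bucket", "branch", "table", storage_options)` would read the table written in step 4.
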
## Cleaning up failed transaction branches

A deltalake operation can fail midway. When that happens, the LakeFS transaction branch has been created but never destroyed. These branches are hidden in the UI, but each one starts with `delta-tx`.

With the `lakefs` Python library you can list these branches and delete stale ones.

<TODO add example here>
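A minimal sketch of such a cleanup, assuming the high-level `lakefs` Python SDK, in which a `Repository` object exposes `branches(prefix=...)` and each branch has an `id` and a `delete()` method. The helper names, the `dry_run` flag, and the connection parameters are this example's own. It treats every `delta-tx` branch as stale, so it should only run while no deltalake writes are in flight.

```python
def delete_transaction_branches(repo, prefix="delta-tx", dry_run=True):
    """Return the ids of transaction branches, deleting them unless dry_run.

    `repo` is assumed to behave like a lakefs.Repository.
    """
    matched = []
    for branch in repo.branches(prefix=prefix):
        # Double-check the prefix in case the server-side filter is ignored.
        if not branch.id.startswith(prefix):
            continue
        if not dry_run:
            branch.delete()
        matched.append(branch.id)
    return matched


def connect_repository(repository_id, host, username, password):
    """Open a repository handle (assumes the `lakefs` package is installed)."""
    # Imported lazily so the cleanup logic can be tested without the SDK.
    import lakefs
    from lakefs.client import Client

    client = Client(host=host, username=username, password=password)
    return lakefs.Repository(repository_id, client=client)
```

Running with `dry_run=True` first lists what would be removed; calling it again with `dry_run=False` performs the deletion.
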