Distributed analytical query processing on AWS Lambda using DuckDB and Apache Arrow

wssheldon/pond


Pond is a serverless distributed query engine that runs DuckDB on AWS Lambda and spreads analytical query processing across multiple Lambda functions. It is built with Rust and uses Apache Arrow for efficient data interchange, bringing DuckDB's analytical capabilities to serverless architectures.

Motivation

Inspired by projects like MotherDuck, which connects DuckDB to cloud resources, Pond aims to provide a serverless approach to distributed analytical query processing. Running on AWS Lambda lets it scale with varying workloads without always-on infrastructure.

Deployment

cargo lambda build --release
cargo lambda deploy

Development

Local Testing

cargo lambda watch
cargo lambda invoke pond-planner --data-ascii '{"query": "SELECT COUNT(*) FROM read_parquet('\''https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet'\'') WHERE c_name LIKE '\''Customer%'\''"}' --output-format json
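
The `'\''` sequences in the invoke command exist only to get single quotes past the shell, which makes longer queries hard to edit by hand. One way around this is to generate the request body in code. The following is a minimal sketch using only the standard library. It assumes the payload is a JSON object with a single `query` field, as in the command above; `json_escape` and `build_payload` are hypothetical helper names and are not part of Pond. The query is a simplified count over the same Parquet file.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
/// (Hypothetical helper for illustration; not part of Pond.)
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the request body. The single `query` field is an assumption
/// taken from the invoke example, not from a documented schema.
fn build_payload(query: &str) -> String {
    format!("{{\"query\": \"{}\"}}", json_escape(query))
}

fn main() {
    // SQL single quotes need no escaping in JSON, so the query stays readable.
    let query = "SELECT COUNT(*) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet') WHERE c_name LIKE 'Customer%'";
    println!("{}", build_payload(query));
}
```

The printed JSON can be saved to a file and passed to `cargo lambda invoke` with `--data-file` instead of `--data-ascii`, which avoids shell quoting altogether.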
