GitHub - wssheldon/pond: Distributed analytical query processing on AWS Lambda using DuckDB and Apache Arrow

Pond is a serverless distributed query engine that runs DuckDB on AWS Lambda, enabling scalable analytical query processing across multiple Lambda functions. Built with Rust and leveraging Apache Arrow for efficient data interchange, this project brings the power of DuckDB's analytical capabilities to serverless architectures.

Motivation

Inspired by projects like MotherDuck, which connects DuckDB to cloud resources, Pond aims to provide a serverless approach to distributed analytical query processing. By running on AWS Lambda, it can scale with varying workloads without the need for always-on infrastructure.

Deployment

cargo lambda build --release

cargo lambda deploy

Development

Local Testing

cargo lambda watch

cargo lambda invoke pond-planner --data-ascii '{"query": "SELECT COUNT(*) FROM read_parquet('\''https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet'\'') WHERE c_name LIKE '\''Customer%'\'' GROUP BY customer "}' --output-format json
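The nested single-quote escaping in the command above is easy to get wrong. One alternative, sketched here as a suggestion rather than as the project's documented workflow, is to write the payload to a file with a quoted heredoc so that the SQL quotes stay literal. The file name `payload.json` is an arbitrary choice for illustration, and the query is a simplified variant of the one above.

```shell
# Write the request payload to a file. The quoted 'EOF' delimiter stops the
# shell from expanding or unquoting anything inside the heredoc, so the SQL
# single quotes need no escaping.
# payload.json is a hypothetical file name chosen for this example.
cat > payload.json <<'EOF'
{"query": "SELECT COUNT(*) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet') WHERE c_name LIKE 'Customer%'"}
EOF
```

With `cargo lambda watch` running, the file can then be passed to the function using `cargo lambda invoke pond-planner --data-file payload.json --output-format json`.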

