What a great time to work in Distributed Systems!
- The Google File System, SOSP, 2003
- MapReduce: Simplified Data Processing on Large Clusters, OSDI, 2004
- Bigtable: A Distributed Storage System for Structured Data, OSDI, 2006
- The Chubby lock service for loosely-coupled distributed systems, OSDI, 2006
- Dremel: Interactive Analysis of Web-Scale Datasets, VLDB, 2010
- Pregel: a system for large-scale graph processing, SIGMOD, 2010
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services, CIDR, 2011
- F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business, SIGMOD, 2012
- Spanner: Google's Globally-Distributed Database, OSDI, 2012
- F1: A Distributed SQL Database That Scales, VLDB, 2013
- Online, Asynchronous Schema Change in F1, VLDB, 2013
- Large-scale cluster management at Google with Borg, EuroSys, 2015
- Spanner: Becoming a SQL System, SIGMOD, 2017
- F1 Query: Declarative Querying at Scale, VLDB, 2018
- Cassandra - A Decentralized Structured Storage System, ACM SIGOPS, 2009
- Finding a needle in Haystack: Facebook's photo storage, OSDI, 2010
- Scaling Memcache at Facebook, NSDI, 2013
- TAO: Facebook's Distributed Data Store for the Social Graph, USENIX ATC, 2013
- f4: Facebook's Warm BLOB Storage System, OSDI, 2014
- MyRocks: LSM-Tree Database Storage Engine Serving Facebook's Social Graph, VLDB, 2020
- Dynamo: Amazon's Highly Available Key-value Store, SOSP, 2007
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases, SIGMOD, 2017
- Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes, SIGMOD, 2018
- Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency, SOSP, 2011
- Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation, VLDB, 2021
- FoundationDB Record Layer: A Multi-Tenant Structured Datastore, SIGMOD, 2019
- FoundationDB: A Distributed Unbundled Transactional Key Value Store, SIGMOD, 2021
- PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database, VLDB, 2018
- Cloud-Native Database Systems at Alibaba: Opportunities and Challenges, VLDB, 2019
- POLARDB Meets Computational Storage: Efficiently Support Analytical Workloads in Cloud-Native Relational Database, FAST, 2020
- PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers, SIGMOD, 2021
- The Snowflake Elastic Data Warehouse, SIGMOD, 2016
- The Part-Time Parliament, TOCS, 1998
- Paxos Made Simple, SIGACT News, 2001
- Fast Paxos, Distributed Computing, 2006
- Paxos made live - An engineering perspective, PODC, 2007
- There Is More Consensus in Egalitarian Parliaments, SOSP, 2013
- CASPaxos: Replicated State Machines without logs, 2018
- In Search of an Understandable Consensus Algorithm, USENIX ATC, 2014
- CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost, FAST, 2020
- A simple totally ordered broadcast protocol, LADIS, 2008
- ZooKeeper: Wait-free coordination for Internet-scale systems, USENIX ATC, 2010
- Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems, 1988
- Viewstamped Replication Revisited, 2012
- Astrolabe: A Robust and Scalable Technology For Distributed Systems Monitoring, Management, and Data Mining, ACM TOCS, 2003
- SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol, DSN, 2002
- Distributed consensus revised, Heidi Howard, 2019
- Exploiting Commutativity For Practical Fast Replication, NSDI, 2019
- Strong and Efficient Consistency with Consistency-Aware Durability, FAST, 2020
- Time, Clocks, and the Ordering of Events in a Distributed System, 1978
- Large-scale Incremental Processing Using Distributed Transactions and Notifications, OSDI, 2010
- Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases, 2014
- Ceph: A Scalable, High-Performance Distributed File System, OSDI, 2006
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data, ACM SC, 2006
- PacificA: Replication in Log-Based Distributed Storage Systems, 2008
- WiscKey: Separating Keys from Values in SSD-conscious Storage, FAST, 2016
- PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees, SOSP, 2017
- SLM-DB: Single-Level Key-Value Store with Persistent Memory, FAST, 2019
- From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees, OSDI, 2020
- Succinct: Enabling Queries on Compressed Data, NSDI, 2015
- SuRF: Practical Range Query Filtering with Fast Succinct Tries, SIGMOD, 2018
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI, 2012
- S4: Distributed Stream Computing Platform, ICDMW, 2010
- Apache Flink™: Stream and Batch Processing in a Single Engine, 2015
- Scaling Distributed Machine Learning with the Parameter Server, OSDI, 2014
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018
- PyTorch: An Imperative Style, High-Performance Deep Learning Library, NeurIPS, 2019
- The Case for Learned Index Structures, SIGMOD, 2018
- SageDB: A Learned Database System, CIDR, 2019
- F1 Lightning: HTAP as a Service, VLDB, 2020
- TiDB: A Raft-based HTAP Database, VLDB, 2020
- CockroachDB: The Resilient Geo-Distributed SQL Database, SIGMOD, 2020
- Greenplum: A Hybrid Database for Transactional and Analytical Workloads, SIGMOD, 2021
- FaRM: Fast Remote Memory, NSDI, 2014
- Accelerating Relational Databases by Leveraging Remote Memory and RDMA, SIGMOD, 2016
- Revisiting the Design of LSM-tree Based OLTP Storage Engine with Persistent Memory, VLDB, 2021
- Viper: An Efficient Hybrid PMem-DRAM Key-Value Store, VLDB, 2021
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, SIGCOMM, 2001
- Bitcoin: A Peer-to-Peer Electronic Cash System, 2008
- Consistent Hashing and Random Trees
- Dynamic-sized nonblocking hash tables
- Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects
- Correct and Efficient Work-Stealing for Weak Memory Models
- Implementing Lock-Free Queues
- Designing Data-Intensive Applications
- Distributed Systems for Fun and Profit
- Distributed Systems: Principles and Paradigms
- Principles of Distributed Computing
- Scalable Web Architecture and Distributed Systems
- Raft lecture (Raft user study)
- Paxos lecture (Raft user study)
- Designing for Understandability: The Raft Consensus Algorithm
- OSDI12 - Spanner: Google's Globally-Distributed Database
- Tim Kraska, MIT, The Case for Learned Index Structures
- 6.824: Distributed Systems
- 6.852: Distributed Algorithms
- 6.826: Principles of Computer Systems
- 15-712 Advanced Operating Systems and Distributed Systems
- TiDB, an open source distributed HTAP database compatible with the MySQL protocol.
- OceanBase, an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.
- CockroachDB, the open source, cloud-native distributed SQL database.
- YugabyteDB, the high-performance distributed SQL database for global, internet-scale apps.
- VoltDB, a horizontally-scalable, in-memory SQL RDBMS designed for applications that benefit from strong consistency, high throughput and low, predictable latency.
- FoundationDB, the open source, distributed, transactional key-value store.
- Cassandra, a highly-scalable partitioned row store.
- ScyllaDB, a NoSQL data store built on the Seastar framework, compatible with Apache Cassandra.
- ZooKeeper, an open-source server which enables highly reliable distributed coordination.
- etcd, a distributed reliable key-value store for the most critical data of a distributed system.
- PhxPaxos, a Paxos library implemented in C++ that has been used in the WeChat production environment.
- ClickHouse, an open-source column-oriented database management system that allows generating analytical data reports in real time.
- Apache Doris, an MPP-based interactive SQL data warehouse for reporting and analysis. Originally developed at Baidu under the name Palo, it was renamed Doris after being donated to the Apache Software Foundation.
- StarRocks, a next-gen sub-second MPP database for full analysis scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
- TensorFlow, an open source machine learning framework for everyone.
- PyTorch, tensors and dynamic neural networks in Python with strong GPU acceleration.