docs: general clean-up and update
hopeyen committed Mar 22, 2024
1 parent eee61a6 commit 7177e24
Showing 10 changed files with 146 additions and 130 deletions.
82 changes: 44 additions & 38 deletions docs/client_guide.md
Original file line number Diff line number Diff line change
@@ -1,54 +1,67 @@
# File Sharing Client
# File Downloader

Tired of data drudgery? Imagine a world where you skip the tedious task of indexing data and start serving queries in a flash! That's the magic of our file sharing network. Instead of spending hours to months catching up to the head of a chain, you can tap into a vast pool of indexed information, ready and waiting for your queries.

This document provides an overview of the file sharing client.
This document provides an overview of the file download client.

## Functionality

The client facilitates data retrieval from FHS. The client implements:
The client facilitates data retrieval from FHS and handles several functions, such as:

- File retrieval: Download entire files or specific chunks at a time.
- Payment: Pay for file access using tokens deposited in an Escrow account on-chain.
- Free access: Utilize a free query auth token for limited access to particular servers.
- Indexer selection: Choose from multiple indexer endpoints; Currently just for availability, later add optimization for performance and redundancy.
- File retrieval: Download files from file servers and store in local filesystem or remote object storage.
- Payment: Pay for file access using tokens deposited in on-chain Escrow accounts; alternatively, acquire a free query auth token for accessing particular servers.
- Indexer selection: Choose from multiple indexer endpoints based on availability and price, with configurable redundancy.

## Minimal Trust Requirement

To minimize trust requirements, the client employs a chunk-based payment system. Instead of paying for the entire file upfront, users can pay for individual chunks as they download and verify them. This ensures transparency and reduces the risk of losing funds due to server downtime or malicious actors.
To minimize trust requirements, the client employs a chunk-based payment system. Instead of paying for the entire file upfront, users can pay for individual chunks as they download and verify them. This ensures transparency and reduces the risk of losing funds due to server downtime or malicious actors. For a detailed description, refer to the documentation on [manifest](./manifest.md).
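
The chunk-based idea can be sketched roughly as follows. This is a minimal illustration, not the client's actual implementation: the function name, the `fetch_chunk` and `pay_for_chunk` callbacks, the use of SHA-256, and the pay-after-verify ordering are all assumptions made for the example. The point it shows is that a failed verification stops the download before any further payment is made.

```python
import hashlib
from typing import Callable, List


def download_with_chunk_payments(
    chunk_hashes: List[str],
    fetch_chunk: Callable[[int], bytes],
    pay_for_chunk: Callable[[int], None],
) -> bytes:
    """Fetch chunks one at a time, paying only for chunks that verify.

    All names here are hypothetical; SHA-256 is an assumed hash function.
    """
    verified = []
    for index, expected in enumerate(chunk_hashes):
        data = fetch_chunk(index)
        # Compare the chunk against the hash listed in the manifest.
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError(f"chunk {index} failed verification; stopping before payment")
        # Only a verified chunk triggers a payment, so exposure is limited
        # to a single chunk rather than the whole file.
        pay_for_chunk(index)
        verified.append(data)
    return b"".join(verified)
```

With this shape, a server that goes offline or returns bad data midway costs the consumer nothing beyond the chunks already verified.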

## Limitations

## CLI Usage
1. Consumers must gather a set of file service endpoints. At the current stage of the protocol, they will gather the list off-chain, either through private exchanges, forum posts, or public channels. Automatic on-chain discovery of available file services can be accomplished after Horizon's deployment of data service contracts.

The client operates through a command-line interface (CLI) for simplicity and ease of use. The client needs to determine the Bundle that contains the dataset they desire. This may mean looking at Bundle manifests or, in the future, using a tool that matches manifests by the provided criteria.
2. Consumers are responsible for determining which bundle contains the files/data they desire. They will find available bundles from a set of indexer endpoints and read through bundle manifests for descriptions of the files. This requires the consumer to place trust in particular Bundle manifests. Critically, consumers must understand that once a manifest has been picked, the tool guarantees that the transferred and stored data can be verified against the hashes contained in the manifest; this tool **does not** guarantee the correctness of manifest descriptions. In the future, we can provide a separate tool or service that checks for or challenges manifest description correctness.

After determining the Bundle CID, client should supply a local path for writing the Bundle corresponding files, a wallet for payments or a free query auth token, and a list of indexer endpoints (this should be handled by gateway or a scraping client).
### Requirements

### CLI example
To use the client effectively, you will need:

- Bundle Manifest CID: The CID of the Bundle Manifest you want to download.
- Indexer Endpoints: A list of available server addresses.
- Storage options
  - Local Path: A directory where the downloaded file will be stored. (Default: "./example-download")
  - Remote Path: S3 bucket configurations including endpoint, access key id, secret key, bucket name, and region
- Payment options
  - Wallet: A blockchain wallet containing tokens for escrow payments.
  - Free Query Auth Token: For limited access to a particular server.

Download into local file system
## Usage

The client operates through a command-line interface (CLI) for simplicity and ease of use.

After gathering a list of indexer endpoints and determining the Bundle CID (`ipfs-hash`), the client should also supply a local or remote storage path for storing the downloads.

If the client provides a free query auth token, the download will use the free query flow; otherwise, the downloader requires payment configurations, which include a wallet mnemonic, an Escrow verifier, an Eth provider, and optionally a maximum automatic deposit amount.
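
The choice between the two flows can be pictured with a small sketch. The `DownloaderConfig` class and `select_flow` function are hypothetical and exist only for illustration; the field names simply mirror the CLI flags used in this guide, and the real downloader may validate its configuration differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloaderConfig:
    """Hypothetical config holder; fields mirror the CLI flags."""
    free_query_auth_token: Optional[str] = None
    mnemonic: Optional[str] = None
    verifier: Optional[str] = None
    provider: Optional[str] = None
    max_auto_deposit: Optional[int] = None  # optional in the paid flow


def select_flow(config: DownloaderConfig) -> str:
    """Return "free" when a token is present, otherwise require payment settings."""
    if config.free_query_auth_token:
        return "free"
    # The paid flow needs a wallet mnemonic, an Escrow verifier, and an Eth provider.
    missing = [
        name
        for name in ("mnemonic", "verifier", "provider")
        if not getattr(config, name)
    ]
    if missing:
        raise ValueError("paid flow requires: " + ", ".join(missing))
    return "paid"
```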

### Quick Start CLI example

Download into local file system with free query auth token

```
$ file-exchange downloader \
--ipfs-hash QmHash \
--indexer-endpoints http://localhost:5678,http://localhost:5677 \
--free-query-auth-token 'Bearer auth_token' \
--mnemonic "seed phrase" \
--verifier 0xfC24cE7a4428A6B89B52645243662A02BA734ECF \
--provider "arbitrum-sepolia-rpc-endpoint" \
--network-subgraph https://api.thegraph.com/subgraphs/name/graphprotocol/graph-network-arbitrum-sepolia \
--escrow-subgraph https://api.thegraph.com/subgraphs/name/graphprotocol/scalar-tap-arbitrum-sepolia \
--provider-concurrency 2 \
--max-auto-deposit 500 \
local-files --output-dir "../example-download"
local-files --main-dir "../example-download"
```

Download into remote object storage bucket
Download into remote object storage bucket with paid query flow

```
$ file-exchange downloader \
--ipfs-hash QmHash \
--indexer-endpoints http://localhost:5678,http://localhost:5677 \
--free-query-auth-token 'Bearer auth_token' \
--mnemonic "seed phrase" \
--verifier 0xfC24cE7a4428A6B89B52645243662A02BA734ECF \
--provider "arbitrum-sepolia-rpc-endpoint" \
Expand All @@ -63,37 +76,30 @@ $ file-exchange downloader \
--endpoint "https://ams3.digitaloceanspaces.com"
```

### Requirements

To use the client effectively, you will need:

- Bundle Manifest CID: The CID of the Bundle Manifest you want to download.
- Local Path: A directory where the downloaded file will be stored. (Later will be a generic storage path, enabling cloud storage access)
- Wallet: A blockchain wallet containing tokens for escrow payments.
- Indexer Endpoints: A list of available server addresses.
- (Optional) Free Query Auth Token: For limited access to small files.

### Getting Started

1. Download and install the source code.
2. Gather configurations: Identify the CID of the desired Bundle, registered indexer endpoints, a local path for storing the downloaded files, private key (or mnemonics) of a wallet valid for Escrow payments, (optional) Obtain a free query auth token for limited access, the preference to concurrent providers for downloading.
1. You can use the provided binaries, docker image, or download and install the source code.
2. Gather configurations as described in the above Requirements section.
3. Use the CLI commands to download files.
4. Before downloading, the client will check the status and price of the providers. If the download can be achieved given availability and price at the time of initiation, then the download will proceed. If there is no availability, the client will suggest alternative bundles that overlap with the target bundle and the corresponding providers. If there is not enough balance, the client will suggest Escrow top-up amounts for the Escrow accounts. With a configured on-chain deposit, the downloader might send a GraphToken approval transaction to approve Escrow spending and deposit the required amounts to the providers.

Enjoy seamless access to a vast world of data!
Before downloading, the client will check the status and price of the providers. If the download can be achieved given availability and price at the time of initiation, then the download will proceed.
- If there is no availability, the client will suggest alternative bundles that overlap with the target bundle and the corresponding providers.
- If there is not enough balance in the escrow account, the client will suggest Escrow top-up amounts for the Escrow accounts. With a configured on-chain deposit, the downloader might send a GraphToken approval transaction to approve Escrow spending and then deposit the required amounts to the providers.

4. Depending on the log setting, there will be logs on the download progress.
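
The pre-download checks described in step 3 can be summarized in a short sketch. Everything here is an assumption for illustration: the `plan_download` function, its return shape, integer token units, and the simplification that every available provider is priced for the full file. It is not the downloader's actual logic.

```python
from typing import Dict


def plan_download(
    available_prices: Dict[str, int],
    file_size: int,
    balances: Dict[str, int],
) -> Dict[str, object]:
    """Decide whether to proceed, suggest alternatives, or suggest top-ups.

    available_prices maps provider -> price per byte (hypothetical units);
    balances maps provider -> current Escrow balance for that provider.
    """
    if not available_prices:
        # No provider serves the target bundle right now.
        return {"action": "suggest_alternatives"}
    top_ups = {}
    for provider, price in available_prices.items():
        cost = price * file_size
        shortfall = cost - balances.get(provider, 0)
        if shortfall > 0:
            top_ups[provider] = shortfall
    if top_ups:
        # Not enough balance: report how much each Escrow account is short.
        return {"action": "suggest_top_up", "amounts": top_ups}
    return {"action": "proceed"}
```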

### Security Considerations

The client prioritizes user safety and security. It employs secure communication protocols and wallet management practices. However, users should always be mindful of potential risks:

- Choosing manifests: Verify correctness of the Bundle before initiating file requests.
- Securing wallet: Implement strong key protection and other security measures for the wallet.
- Securing wallet: Employ strong key protection and other security measures for the wallet.
- Staying informed: Stay updated on the latest security threats, invalid manifests, and updates for the client software.

### Join the Community

To learn more, share experiences, and contribute to the network's growth.

- Discord channel to be created
- Discord channel (to be created)
- Documentation: Explore detailed guides and technical information.
- GitHub Repository: Contribute to the client's development and propose improvements.
13 changes: 0 additions & 13 deletions docs/contracts.md

This file was deleted.

16 changes: 0 additions & 16 deletions docs/database_migration.md
Original file line number Diff line number Diff line change
@@ -1,16 +0,0 @@
# Database Migration setup

Indexer service binary does _NOT_ run database migrations automatically in the program binary, as it might introduce conflicts with the migrations run by indexer agent. Indexer agent is solely responsible for syncing and migrating the database.

The migration files here are included here for testing only.

### Prerequisite: Install sqlx-cli

Run `cargo install sqlx-cli --no-default-features --features native-tls,postgres`

Simple option: run general installation that supports all databases supported by SQLx
`cargo install sqlx-cli`

### Run the migration

Run `sqlx migrate run --source file-service/migrations`
30 changes: 16 additions & 14 deletions docs/discovery.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,36 @@
# File Discovery

With various packaging methods, types of files, and other types of combinations, we stay focused on discovering and matching file CIDs on/off-chain.
This document describes the expectations for discovering and matching file CIDs on/off-chain between servers and clients.

## Off-chain approach
## Off-chain approach (Current)

Indexers serve `/status` endpoint that provides a list of Manifest IPFS hashes, representing the list of available lists the local indexer is serving. This is sufficient for matching specific manifests for bundles, later on we will allow matching for single file manifests.
Indexers share their server endpoint to potential consumers. This process is currently manual, off-chain, and p2p. With Horizon introducing new contract interfaces, indexers should be able to register their URL on-chain for automatic discovery.

On both **Bundle/File** level, the discovery is relatively straightforward for the client, given that they have chosen a Manifest IPFS hash to download (`target_manifest`).
Indexers serve a `/status` GraphQL endpoint that provides their serving statuses, including Manifest objects, File hashes, descriptions, etc. This is sufficient for a client to match specific manifests to the files they are looking for.

1. Client provides a status list of `indexer_endpoints`. Later on, we can add an automatic mode of grabbing all registered indexer service URLs from the registry contract.
As described in the limitations of the protocol, clients are responsible for choosing Manifest IPFS hash to download (`target_manifest`).

2. Client pings `/operator` and `/status` endpoint for all indexer endpoints. `/operator` will provide the indexer operator and `/status` endpoint will provide the indexer's available manifests.
1. Client is provided with a list of `indexer_endpoints`. Later on, we can add an automatic mode of grabbing all registered indexer service URLs from the registry contract.

2. Client pings `/operator` and `/status` endpoint for all indexer endpoints. `/operator` will provide the indexer operator information and `/status` endpoint will provide the indexer's available manifests.

a. if `target_manifest` is in the available manifests, collect indexer operator and endpoint as an available service

3. Collect a list of available services. Returns early if the list is empty.
3. Collect a list of available services.
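
The three steps above amount to a simple filter. The sketch below assumes the `/operator` and `/status` responses have already been fetched and reduced to an operator string and a list of manifest hashes per endpoint; the function name and data shapes are hypothetical, not the client's real types.

```python
from typing import Dict, List, Tuple


def find_available_services(
    target_manifest: str,
    statuses: Dict[str, Tuple[str, List[str]]],
) -> List[Tuple[str, str]]:
    """Return (operator, endpoint) pairs for indexers serving target_manifest.

    statuses maps indexer endpoint -> (operator, available manifest hashes).
    """
    return [
        (operator, endpoint)
        for endpoint, (operator, manifests) in statuses.items()
        if target_manifest in manifests
    ]
```

An empty result means no indexer in the provided list serves the exact manifest.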

If discovery is matching on a `Bundle` level, we further consider matching files across bundle manifests so that consumers can be prompted with alternatives if the `target_manifest` is unavailable. This increases file availability by decreasing the criteria for matching a bundle.
If discovery is matching on a `Bundle` level and `target_manifest` is unavailable, we further consider matching files across bundle manifests so that consumers can be prompted with alternatives by matching lower-level file manifests. This increases file availability by relaxing the criteria for matching the exact bundle.

Imagine a server serving $bundle_a = \{file_x, file_y, file_z\}$. Client requests $bundle_b = \{file_x\}$. The Bundle IPFS hash check will determine that $bundle_a\neq bundle_b$. We add an additional check to resolve $bundle_a$ and $bundle_b$ to file manifest hashes for matching.
Imagine a server serving $bundle_a = \{file_x, file_y, file_z\}$. Client requests $bundle_b = \{file_x\}$. The Bundle IPFS hash check will determine that $bundle_a\neq bundle_b$. We add an additional check to resolve $bundle_a$ and $bundle_b$ to file manifest hashes.

1. Query the content of `target_bundle` for its vector of file manifest hashes
1. Query the content of `target_manifest` for its vector of file manifest hashes

2. Query the content of bundles served by indexers, create a nested map of indexer to bundles to files: `Map<Indexer, Map<Bundle, Files>>`.

3. For each `target_bundle` file manifest, check if there is an indexer serving a bundle that contains the target file. Record the indexer and bundle hash, indexed by the file hash.
3. Taking `target_manifest`'s vector of file manifest hashes, for each file manifest, check if there is an indexer serving a bundle that contains the file. Record the indexer and bundle hash, keyed by the file hash.

4. If a target file is unavailable from all indexers, immediately return unavailability, as the `target_manifest` cannot be completed.

5. Return the recorded map of file to queryable `indexer_endpoint` and manifest hash for user evaluation
5. Return the recorded map of file to queryable `indexer_endpoint` and manifest hash for user evaluation.
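
The file-level matching steps above can be sketched as follows. The function name and dictionary shapes are assumptions for illustration: `served` stands in for the nested `Map<Indexer, Map<Bundle, Files>>`, and the result is keyed by file hash.

```python
from typing import Dict, List, Optional, Tuple


def match_files(
    target_files: List[str],
    served: Dict[str, Dict[str, List[str]]],
) -> Optional[Dict[str, List[Tuple[str, str]]]]:
    """Map each target file hash to the (indexer, bundle) pairs that serve it.

    Returns None as soon as one target file is unavailable everywhere,
    since the target manifest cannot be completed in that case.
    """
    availability = {}
    for file_hash in target_files:
        sources = [
            (indexer, bundle)
            for indexer, bundles in served.items()
            for bundle, files in bundles.items()
            if file_hash in files
        ]
        if not sources:
            return None
        availability[file_hash] = sources
    return availability
```

Applied to the example above, a request for a bundle containing only `file_x` resolves to the indexer serving `bundle_a`, even though the two bundle hashes differ.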

Later on, we may generate a summary of which manifest has the highest percentage of compatibility. A further automated approach will consist of the client taking the recorded availability map and constructing range download requests based on the corresponding indexer_endpoint, server manifest, and file hash.

Expand Down Expand Up @@ -58,7 +60,7 @@ graph LR
E -->|respond| C
```

## On-chain approach (unrealized)
## On-chain approach (alternative)


**On-chain portion**
Expand Down Expand Up @@ -135,7 +137,7 @@ graph TD
As we keep the diagram simple, it is possible to have indexer serve/host schema files as part of indexer service and become independent of IPFS gateway


## Trade-off
## High level Trade-off

First assume that on-chain allocation does not bring significant economic guarantee to FHS (no rational slashing).

Expand Down
22 changes: 12 additions & 10 deletions docs/feature_checklist.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
- [x] construct and write a file_manifest.yaml (root, nodes)
- [x] Unit tests: same file same hash, different file different hash, big temp file same/modified
- [x] last chunk lengths,
- [ ] Analyze merkle tree vs hash list for memory usage and verification runtime
- [x] Analyze merkle tree vs hash list for memory usage and verification runtime
- [x] Manifest builder / publisher - CLI
- [x] Take a file, use File hasher to get the file_manifest, publish file_manifest to IPFS
- [x] later, take a list of files, use File hasher to hash all files and get root hashes
Expand All @@ -26,7 +26,7 @@
- [x] Deserialize and serialize yaml files
- [ ] Manifest server
- [x] require operator mnemonic
- [ ] Use a generic path
- [x] Use a generic path
- [x] Initialize service; for one Bundle, take (ipfs_hash, local_path)
- [x] Take a Bundle IPFS hash and get the file using IPFS client
- [x] Parse yaml file for all the file_manifest hashes using Yaml parser, construct the Bundle object
Expand All @@ -43,7 +43,7 @@
- [x] Route `/version` for Bundle server version
- [x] Configure and check free query auth token
- [ ] (?) Server Certificate
- [ ] Upon receiving a service request (ipfs_hash, range, receipt)
- [x] Upon receiving a service request (ipfs_hash, range, receipt)
- [x] start off with request as (ipfs_hash, range)
- [x] Check if ipfs_hash is available
- [x] Check if range is valid against the Bundle and the specific file_manifest
Expand All @@ -54,23 +54,25 @@
- [ ] determine if streaming is necessary
- [x] Start with free service and requiring a free query auth token
- [x] default pricing, allow updates for pricing per byte
- [ ] Runs TAP agent for receipt management
- [x] Runs TAP agent for receipt management
- [x] Integration testing
- [ ] File Download Client
- [ ] Take private key/mnemonic for wallet connections
- [x] Take private key/mnemonic for wallet connections
- [x] Request using ipfs_hash
- [ ] take budget for the overall bundle/file
- [ ] construct receipts using budget and chunk sizes
- [ ] add receipt to request
- [x] take budget for the overall bundle/file
- [x] construct receipts using budget and chunk sizes
- [x] add receipt to request
- [x] add free_token to request
- [ ] File discovery and matching (Gateway?)
- [x] Read bundle manifest
- [x] Ping indexer endpoints data availability
- [ ] Pricing and performances, run indexer selection
- [x] Select indexers based on pricing
- [ ] Select indexers based on performances
- [x] Parallel requests
- [x] Use random endpoints
- [x] Construct and send requests to indexer endpoints
- [x] Parallelize requests
- [ ] Multiple connections (HTTPS over HTTP2)
- [x] Multiple connections (HTTPS over HTTP2)
- [x] Wait for the responses (For now, assume that the response chunks correspond with the verifiable chunks)
- [x] Keeps track of the downloaded and missing pieces,
- [x] continually requesting missing pieces until the complete file is obtained
Expand Down
4 changes: 2 additions & 2 deletions docs/manifest.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## Manifest specifications
## Manifest

Structure of Bundle and File Manifests

Expand Down Expand Up @@ -103,7 +103,7 @@ Depending on the package sizes and client requirements, different validation met

![Diagram](./verification-tradeoffs.png)

### Current manifest
### Manifest examples

#### Bundle manifest

Expand Down