Clarify ingest tool/library names for Ingest CLI and Python Ingest library (#131)
Paul-Cornell authored Jul 31, 2024
1 parent d0cc879 commit 4bea6c8
Showing 71 changed files with 170 additions and 126 deletions.
12 changes: 5 additions & 7 deletions api-reference/api-services/free-api.mdx
@@ -36,7 +36,7 @@ import SharedPagesBilling from '/snippets/general-shared-text/pages-billing.mdx'
## Quickstart

Let's say you want to preprocess an `*.eml` file using the free Unstructured API. There are several ways
you can do this, which all lead to the same result, so pick your preferred method: [POST](#post-request), [CLI](#unstructured-cli), [SDK](#unstructured-python-sdk-and-javascript-typescript-sdk), or [open source](#calling-the-unstructured-api-from-the-unstructured-open-source-library).
you can do this, which all lead to the same result, so pick your preferred method: [POST](#post-request), [CLI](#unstructured-ingest-cli), [SDK](#unstructured-python-sdk-and-javascript-typescript-sdk), or [open source](#calling-the-unstructured-api-from-the-unstructured-open-source-library).

### POST request

@@ -63,17 +63,17 @@ After the command successfully runs, see the results in the specified output pat

If you do not have any files available, you can download some from the [example-docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs) folder in the Unstructured repo on GitHub.

<Info>`POST` requests support using only local machine paths as the source (input) for the file to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [CLI](#unstructured-cli), the [Python SDK](#unstructured-python-sdk-and-javascript-typescript-sdk), or the [open source library](#calling-the-unstructured-api-from-the-unstructured-open-source-library) instead.</Info>
<Info>`POST` requests support using only local machine paths as the source (input) for the file to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) or the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) instead.</Info>

import SharedPOSTSingleFile from '/snippets/general-shared-text/post-api-single-file.mdx';

<SharedPOSTSingleFile />

[Learn more about how to use POST requests](/api-reference/api-services/post-requests).

### Unstructured CLI
### Unstructured Ingest CLI

To work with the Free Unstructured API by using the Unstructured CLI, you will need to:
To work with the Free Unstructured API by using the Unstructured Ingest CLI, you will need to:

- Install Python, and then install the CLI package:

@@ -113,9 +113,7 @@ To work with the Free Unstructured API in Python or JavaScript, use the
Unstructured [Python SDK](https://github.com/Unstructured-IO/unstructured-python-client), or
[JavaScript SDK](https://github.com/Unstructured-IO/unstructured-js-client).

<Info>The JavaScript/TypeScript SDK supports using only local machine paths as the source (input) for the files to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [CLI](#unstructured-cli), the Python SDK, or the [open source library](#calling-the-unstructured-api-from-the-unstructured-open-source-library) instead.</Info>

Install your preferred SDK:
<Info>The JavaScript/TypeScript SDK supports using only local machine paths as the source (input) for the files to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) or the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) instead.</Info>

<CodeGroup>
```bash Python
10 changes: 5 additions & 5 deletions api-reference/api-services/saas-api-development-guide.mdx
@@ -43,7 +43,7 @@ import SharedPagesBilling from '/snippets/general-shared-text/pages-billing.mdx'

The following example illustrates how to preprocess an `*.eml` file using the Unstructured Serverless API.

There are several ways to use the Unstructured Serverless API, which all lead to the same result, so pick your preferred method: [POST](#post-request), [CLI](#unstructured-cli), [SDK](#unstructured-python-sdk-and-javascript-typescript-sdk), or [open source](#calling-the-unstructured-api-from-the-unstructured-open-source-library).
There are several ways to use the Unstructured Serverless API, which all lead to the same result, so pick your preferred method: [POST](#post-request), [CLI](#unstructured-ingest-cli), [SDK](#unstructured-python-sdk-and-javascript-typescript-sdk), or [open source](#calling-the-unstructured-api-from-the-unstructured-open-source-library).

### POST request

@@ -72,17 +72,17 @@ After the command successfully runs, see the results in the specified output pat

If you do not have any files available, you can download some from the [example-docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs) folder in the Unstructured repo on GitHub.

<Info>`POST` requests support using only local machine paths as the source (input) for the files to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the CLI, the Python SDK, or the open source library instead.</Info>
<Info>`POST` requests support using only local machine paths as the source (input) for the files to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) or the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) instead.</Info>

import SharedPOSTSingleFile from '/snippets/general-shared-text/post-api-single-file.mdx';

<SharedPOSTSingleFile />

[Learn more about how to use POST requests](/api-reference/api-services/post-requests).

### Unstructured CLI
### Unstructured Ingest CLI

To work with the Unstructured Serverless API by using the Unstructured CLI, you will need to:
To work with the Unstructured Serverless API by using the Unstructured Ingest CLI, you will need to:

- Install Python, and then install the CLI package:

@@ -126,7 +126,7 @@ To work with the Unstructured Serverless API in Python, JavaScript, or TypeScrip
Unstructured [Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) or
[JavaScript/TypeScript SDK](https://github.com/Unstructured-IO/unstructured-js-client).

<Info>The JavaScript/TypeScript SDK supports using only local machine paths as the source (input) for the files to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [CLI](#unstructured-cli), the Python SDK, or the [open source library](#calling-the-unstructured-api-from-the-unstructured-open-source-library) instead.</Info>
<Info>The JavaScript/TypeScript SDK supports using only local machine paths as the source (input) for the files to preprocess and as the destination (output) that Unstructured sends the processed data to. To specify a source or destination other than a local machine, use the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) or the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) instead.</Info>

First, install your preferred SDK:
<CodeGroup>
2 changes: 1 addition & 1 deletion api-reference/ingest/destination-connector/azure.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentAzure/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

import AzureAPISh from '/snippets/destination_connectors/azure.sh.mdx';
import AzureAPIPyV2 from '/snippets/destination_connectors/azure.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/destination-connector/local.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentLocal/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

import AzureAPISh from '/snippets/destination_connectors/azure.sh.mdx';
import AzureAPIPyV2 from '/snippets/destination_connectors/azure.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/destination-connector/s3.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentS3/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

import S3APISh from '/snippets/destination_connectors/s3.sh.mdx';
import S3APIPyV2 from '/snippets/destination_connectors/s3.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/destination-connector/singlestore.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedSingleStore />
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector:

import SingleStoreAPISh from '/snippets/destination_connectors/singlestore.sh.mdx';
import SingleStoreAPIPyV2 from '/snippets/destination_connectors/singlestore.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/azure.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentAzure/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

import AzureAPISh from '/snippets/source_connectors/azure.sh.mdx';
import AzureAPIPyV2 from '/snippets/source_connectors/azure.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/google-drive.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentGoogleDrive/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

import GoogleDriveAPISh from '/snippets/source_connectors/google_drive.sh.mdx';
import GoogleDriveAPIPyV2 from '/snippets/source_connectors/google_drive.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/local.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentLocal/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

import LocalAPISh from '/snippets/source_connectors/local.sh.mdx';
import LocalAPIPyV2 from '/snippets/source_connectors/local.v2.py.mdx';
2 changes: 1 addition & 1 deletion api-reference/ingest/source-connectors/s3.mdx
@@ -12,7 +12,7 @@ import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
<SharedContentS3/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. The destination connector can be any of the ones supported. This example uses the local destination connector:
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector:

import S3APISh from '/snippets/source_connectors/s3.sh.mdx';
import S3APIPyV2 from '/snippets/source_connectors/s3.v2.py.mdx';
4 changes: 2 additions & 2 deletions faq/faq.mdx
@@ -43,8 +43,8 @@ Yes, you can still use your old API keys. We will migrate all the user keys to t
### How can I generate and use a new API Key to process my documents?

When you log in to the Serverless API dashboard, you can access your API keys by clicking the `API Keys` link in the side navigation.
Under the `Actions` column, click the `Copy` icon to copy the key or the boilerplate codes to process the documents
using the Unstructured REST API POST with `curl`, or the Unstructured CLI, or the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) or [Unstructured JavaScript/Typescript SDK](https://github.com/Unstructured-IO/unstructured-js-client).
Under the `Actions` column, click the `Copy` icon to copy the key or an example code snippet to process the documents
using the Unstructured REST API, or the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli), or the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) or [Unstructured JavaScript/Typescript SDK](https://github.com/Unstructured-IO/unstructured-js-client).

### What is the new Unstructured API pricing structure?

24 changes: 12 additions & 12 deletions ingestion/overview.mdx
@@ -10,7 +10,7 @@ You can perform ingestion with the following tools:

- The [Unstructured Platform](/platform/overview), a no-code user interface, unlimited pay-as-you-go platform to get all of your data ready for Retrieval Augmented Generation (RAG) and model fine-tuning.
- The [Unstructured Ingest CLI](#unstructured-ingest-cli), with unlimited pay-as-you-go and limited free options, that enable you to use command-line scripts to get all of your data ready for RAG and model fine-tuning.
- The [Unstructured Ingest Python](#unstructured-ingest-python) library and connectors, with unlimited pay-as-you-go and limited free options, that enable you to use Python code to get all of your data ready for RAG and model fine-tuning.
- The [Unstructured Ingest Python library](#unstructured-ingest-python-library), with unlimited pay-as-you-go and limited free options, that enable you to use Python code to get all of your data ready for RAG and model fine-tuning.

<Info>
The [Unstructured Python SDK](/api-reference/api-services/sdk-python) and [Unstructured JavaScript/TypeScript SDK](/api-reference/api-services/sdk-jsts) can process only one file at a time.
@@ -32,7 +32,7 @@ flowchart LR

The Unstructured Platform enables you to connect to many kinds of [sources](/platform/platform-source-connectors/overview) and [destinations](/platform/platform-destination-connectors/overview).

If you use the Unstructured Ingest CLI or Unstructured Ingest Python, the source or destination can be a cloud storage location or a local location. For example:
If you use the Unstructured Ingest CLI or the Unstructured Ingest Python library, the source or destination can be a cloud storage location or a local location. For example:

```mermaid
flowchart LR
@@ -66,10 +66,10 @@ flowchart LR
```

- This flow always happens for the Unstructured Platform. The Platform only allows sending files from cloud storage and sending processed data to cloud storage.
- For the Unstructured CLI or Unstructured Ingest Python, to use this flow:
- For the Unstructured Ingest CLI or the Unstructured Ingest Python library, to use this flow:

- When using the Unstructured CLI, include the `--partition-by-api` option and set `--api-key` and `--partition-endpoint` to a valid, matching Unstructured API key and API URL, respectively.
- When using Unstructured Ingest Python, set `partition_by_api=True` and `api_key` and set `partition_endpoint` to a valid, matching Unstructured API key and API URL, respectively.
- When using the Unstructured Ingest CLI, include the `--partition-by-api` option and set `--api-key` and `--partition-endpoint` to a valid, matching Unstructured API key and API URL, respectively.
- When using the Unstructured Ingest Python library, set `partition_by_api=True` and `api_key` and set `partition_endpoint` to a valid, matching Unstructured API key and API URL, respectively.
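
As a rough illustration of the options above, the following Python sketch assembles a command line for the API-based flow. The `--partition-by-api`, `--api-key`, and `--partition-endpoint` options come from the text above. The `unstructured-ingest local` invocation, the `--input-path` and `--output-dir` options, and the `build_ingest_command` helper itself are assumptions made for illustration, so check them against the CLI's own help output before relying on them.

```python
import shlex


def build_ingest_command(input_path, output_dir, api_key=None, partition_endpoint=None):
    """Assemble an argv list for an ingest run (hypothetical helper, not part of any library)."""
    # Assumed for illustration: the executable name, the `local` connector,
    # and the --input-path / --output-dir options.
    argv = [
        "unstructured-ingest", "local",
        "--input-path", input_path,
        "--output-dir", output_dir,
    ]

    if api_key is None and partition_endpoint is None:
        # No API details given: leave out all three API-related options.
        return argv

    if not api_key or not partition_endpoint:
        # The key and the URL must match each other, so require both.
        raise ValueError("api_key and partition_endpoint must be provided together")

    argv += [
        "--partition-by-api",
        "--api-key", api_key,
        "--partition-endpoint", partition_endpoint,
    ]
    return argv


if __name__ == "__main__":
    command = build_ingest_command(
        "./input",
        "./output",
        api_key="<your-api-key>",
        partition_endpoint="<your-api-url>",
    )
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(command))
```

Keeping the arguments as a list (for example, to pass to `subprocess.run`) avoids shell-quoting problems with keys or URLs that contain special characters.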

## Local ingestion options

@@ -81,10 +81,10 @@ flowchart LR
```

- This flow never happens for the Unstructured Platform. The Platform does not allow sending files from a local destination to Unstructured or Unstructured sending processed data to a local destination.
- For the Unstructured CLI or Unstructured Ingest Python, to use this flow:
- For the Unstructured Ingest CLI or the Unstructured Ingest Python library, to use this flow:

- When using the Unstructured CLI, omit the `--partition-by-api`, `--api-key`, and `--partition-endpoint` options.
- When using the Unstructured Ingest Python, omit `partition_by_api` or explicitly set `partition_by_api=False`. Also omit `api_key` and `partition_endpoint`.
- When using the Unstructured Ingest CLI, omit the `--partition-by-api`, `--api-key`, and `--partition-endpoint` options.
- When using the Unstructured Ingest Python library, omit `partition_by_api` or explicitly set `partition_by_api=False`. Also omit `api_key` and `partition_endpoint`.

## Unstructured Ingest CLI

@@ -133,11 +133,11 @@ To begin using the CLI, see the quickstarts for the:
- [Unstructured Serverless API](/api-reference/api-services/saas-api-development-guide#unstructured-cli)
- [Free Unstructured API](/api-reference/api-services/free-api#unstructured-cli)

## Unstructured Ingest Python
## Unstructured Ingest Python library

The Unstructured Ingest Python library and connectors enable you to use Python code to get all of your data ready for RAG and model fine-tuning.
The Unstructured Ingest Python library enables you to use Python code to get all of your data ready for RAG and model fine-tuning.

One approach to using Unstructured Ingest Python is installing Python and then running the following command to install the library and the default connectors:
One approach to using the Unstructured Ingest Python library is installing Python and then running the following command to install the library and the default connectors:

```bash
pip install unstructured
@@ -159,4 +159,4 @@ Some source and destination connectors provide newer v2 and older v1 implementat
- [v1 fsspec connectors](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest/connector/fsspec)
- [v1 Notion connector](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest/connector/notion)

To begin using Unstructured Ingest Python, see the code examples for the [source](/api-reference/ingest/source-connectors/overview) and [destination](/api-reference/ingest/destination-connector/overview) connectors.
To begin using the Unstructured Ingest Python library, see the code examples for the [source](/api-reference/ingest/source-connectors/overview) and [destination](/api-reference/ingest/destination-connector/overview) connectors.
2 changes: 1 addition & 1 deletion open-source/ingest/destination-connectors/azure.mdx
@@ -10,7 +10,7 @@ import SharedAzure from '/snippets/dc-shared-text/azure.mdx';

<SharedAzure />

Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector:
Now call the Unstructured Ingest CLI or Unstructured Ingest Python. The source connector can be any of the ones supported. This example uses the local source connector:

import AzureAPISh from '/snippets/destination_connectors/azure.sh.mdx';
import AzureAPIPyV2 from '/snippets/destination_connectors/azure.v2.py.mdx';