Update Gremlin API docs (#533)
* update docs to include api page

* add timeout option docs

* Apply suggestions from code review

Co-authored-by: cn337131 <[email protected]>
Co-authored-by: p29876 <[email protected]>

* add gafferpy example

---------

Co-authored-by: cn337131 <[email protected]>
Co-authored-by: p29876 <[email protected]>
3 people authored Oct 10, 2024
1 parent 2e95fc9 commit 3033e47
Showing 6 changed files with 121 additions and 55 deletions.
1 change: 1 addition & 0 deletions docs/administration-guide/gaffer-deployment/gremlin.md
@@ -85,6 +85,7 @@ A full breakdown of the available properties is as follows:
| `gaffer.schemas` | The path to the directory containing the graph schema files. | No |
| `gaffer.userId` | The default user ID for the Tinkerpop graph. | No (User is always set via the [`UserFactory`](../security/user-control.md).) |
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed | No |
| `gaffer.rest.timeout` | The timeout, in milliseconds, for Gremlin queries submitted to the REST API. Defaults to 2 minutes if not specified. | Yes |
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (this can be overridden per query; see [here](../../user-guide/query/gremlin/custom-features.md)) | Yes |
| `gaffer.elements.getalllimit` | The default limit for unseeded queries e.g. `g.V()`. | Yes |
| `gaffer.elements.hasstepfilterstage` | The default stage to apply any `has()` steps e.g. `PRE_AGGREGATION` | Yes |
106 changes: 106 additions & 0 deletions docs/user-guide/apis/gremlin-api.md
@@ -0,0 +1,106 @@
# Gremlin API

!!! warning
    The Gremlin API is still under development and has some [limitations](../query/gremlin/gremlin-limits.md).
    The implementation may not support some advanced features of Gremlin, and its
    performance relative to standard Gaffer `OperationChains` is unknown.

## What is Gremlin?

[Gremlin](https://tinkerpop.apache.org/gremlin.html) is a query language for
traversing graphs. It is a core component of the Apache Tinkerpop library and
allows users to easily express more complex graph queries.

GafferPop is a lightweight Gaffer implementation of the [TinkerPop framework](https://tinkerpop.apache.org/),
where TinkerPop methods are delegated to Gaffer graph operations.

The addition of Gremlin as a query language in Gaffer allows users to represent
complex graph queries in a simpler language, akin to other querying languages
used in traditional and NoSQL databases. It also has wide support for various
languages; for example, you can write queries in Python via the [`gremlinpython`](https://pypi.org/project/gremlinpython/)
library.

!!! tip
    In-depth tutorials on Gremlin as a query language and its associated libraries
    can be found in the [Apache Tinkerpop Gremlin docs](https://tinkerpop.apache.org/gremlin.html).

## How to Query a Graph

There are two main ways of using Gremlin in Gaffer: via a websocket, similar to
a typical [Gremlin Server](https://tinkerpop.apache.org/docs/current/reference/#connecting-gremlin-server),
or by submitting queries via the REST endpoints like standard Gaffer Operations.
Once connected, the [Gremlin in Gaffer](../query/gremlin/gremlin.md) page
provides a simple comparison of Gremlin and Gaffer Operations.

!!! note
    Both methods require a running [Gaffer REST API](./rest-api.md) instance.

### Websocket API

The websocket provides the most _standard_ way to use the Gremlin API. The
Gaffer REST API provides a Gremlin server-like experience via a websocket at
`/gremlin`. Connecting to it provides a graph traversal source for
spawning queries.

The websocket should support all standard Gremlin tooling and uses GraphSONv3
serialisation for communication. To connect a tool like [`gremlinpython`](https://pypi.org/project/gremlinpython/)
we can do something similar to [`gafferpy`](./python-api.md). First import the
required libraries (many of these will be needed later for queries):

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.serializer import GraphSONSerializersV3d0
from gremlin_python.process.graph_traversal import __
```

We can then establish a connection to the Gremlin server and save the reference
(typically called `g`):

```python
# Set up a connection to the REST API running on localhost
g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8080/gremlin', 'g', message_serializer=GraphSONSerializersV3d0()))
```

Now that we have the traversal reference, it can be used to spawn graph traversals
and get results back.
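
As a minimal sketch of spawning a traversal from this reference, the helper below
fetches a handful of vertices by label. It assumes `g` is the traversal source
created above and uses the snake_case step names found in recent `gremlinpython`
releases. The function name, the label and the limit are illustrative only and
are not part of Gaffer or GafferPop.

```python
def vertices_with_label(g, label, limit=10):
    """Return up to `limit` vertices carrying `label`.

    `g` is assumed to be the traversal source created above; this helper
    is hypothetical and only illustrates how a traversal is spawned.
    """
    # Start from all vertices, filter by label, cap the result size,
    # then execute the traversal and collect the results into a list.
    return g.V().has_label(label).limit(limit).to_list()
```

For example, `vertices_with_label(g, 'something')` would return a list of vertex
objects, where `'something'` stands in for a label from your own schema.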

### REST API Endpoints

The Gremlin endpoints provide a similar interface to running Gaffer Operations.
They accept a plaintext Gremlin Groovy or OpenCypher query and will return
the results in [GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
format.

The two endpoints are:

- `/rest/gremlin/execute` - Runs a Gremlin Groovy script and outputs the result
  as GraphSONv3 JSON.
- `/rest/gremlin/cypher/execute` - Translates a Cypher query to Gremlin and
  executes it, returning a GraphSONv3 JSON result. Note that a `.toList()` is
  always appended to the translation.

A query can be submitted via the Swagger UI or a simple POST request such as:

```bash
curl -X 'POST' \
'http://localhost:8080/rest/gremlin/execute' \
-H 'accept: application/x-ndjson' \
-H 'Content-Type: text/plain' \
-d 'g.V().hasLabel('\''something'\'').toList()'
```
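
The same request can be sketched in Python using only the standard library. The
URL path and headers mirror the `curl` call above; the function names, the
default base URL and the client-side timeout value are assumptions made for
this example.

```python
import urllib.request


def build_gremlin_request(query, base_url="http://localhost:8080/rest"):
    """Build (but do not send) a POST request for the Gremlin execute endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/gremlin/execute",
        data=query.encode("utf-8"),  # the query is sent as the plain-text body
        headers={
            "accept": "application/x-ndjson",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


def run_gremlin(query, base_url="http://localhost:8080/rest", timeout=120):
    """Send the query and return the raw response body as text.

    `timeout` is a client-side limit in seconds, chosen arbitrarily here.
    """
    request = build_gremlin_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a REST API running locally, `run_gremlin("g.V().hasLabel('something').toList()")`
would return the raw response text for further parsing.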

You can also utilise [Gafferpy](./python-api.md) to connect and run queries
using the endpoints.

```python
from gafferpy import gaffer_connector

gc = gaffer_connector.GafferConnector("http://localhost:8080/rest")

# Execute a Gremlin query and return the result
gremlin_result = gc.execute_gremlin("g.V('1').toList()")

# Execute a Cypher query and return the result
cypher_result = gc.execute_cypher("MATCH (n) WHERE ID(n) = '1' RETURN n")
```
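
GraphSONv3 wraps typed values in objects with `@type` and `@value` keys, which
can be awkward to work with directly. The sketch below assumes you have already
parsed a GraphSONv3 JSON document into Python objects and strips those wrappers
recursively. The function name and the sample payload are hand-written for
illustration and are not real Gaffer output.

```python
import json


def unwrap_graphson(node):
    """Recursively strip GraphSONv3 type wrappers from parsed JSON."""
    if isinstance(node, list):
        return [unwrap_graphson(item) for item in node]
    if isinstance(node, dict):
        if "@type" in node and "@value" in node:
            type_name, value = node["@type"], node["@value"]
            if type_name == "g:Map":
                # g:Map is encoded as a flat [key1, value1, key2, value2, ...] list.
                # This sketch assumes the keys are hashable (e.g. strings).
                items = [unwrap_graphson(item) for item in value]
                return dict(zip(items[0::2], items[1::2]))
            return unwrap_graphson(value)
        return {key: unwrap_graphson(value) for key, value in node.items()}
    return node


# Hand-written sample payload for illustration only
sample = json.loads(
    '{"@type": "g:List", "@value": ['
    '{"@type": "g:Int64", "@value": 1}, '
    '{"@type": "g:Map", "@value": ["name", "marko"]}]}'
)
print(unwrap_graphson(sample))  # prints [1, {'name': 'marko'}]
```

Dropping the wrappers loses the type information (for example the difference
between `g:Int32` and `g:Int64`), so this approach only suits cases where plain
Python values are enough.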
3 changes: 3 additions & 0 deletions docs/user-guide/query/gremlin/custom-features.md
@@ -229,3 +229,6 @@ data in that reaches the last step then that step will be missing from the expla
in the Graph schema.
- All submitted Cypher explains will be translated to Gremlin first and have a `.toList()`
appended to the translation so it is actually executed.
- An explanation of a Gremlin `project()` step will not include all the Operations called.
As a Gremlin `project` is essentially a for-each loop, the explain will only include the
last iteration of the loop.
8 changes: 5 additions & 3 deletions docs/user-guide/query/gremlin/gremlin-limits.md
@@ -22,14 +22,16 @@ Current known limitations or bugs:
- Edge IDs in GafferPop are not the same as in standard Gremlin. Instead of `g.E(11)`
edge IDs take the format `g.E("[source, dest]")` or `g.E("[source, label, dest]")`.
- The entity group `id` is reserved for an empty group containing only the
vertex ID, this is currently used as a workaround for other limitations.
vertex ID; this is currently used as a workaround for other limitations. One such
use is for holding 'orphaned' vertexes, which are vertexes on an edge that do not
have a Gaffer entity associated with them.
- Chaining `hasLabel()` calls together like `hasLabel("label1").hasLabel("label2")`
will act like an OR rather than the AND used in standard Gremlin. This means you
may get results back when you realistically shouldn't.
- Input seeds to Gaffer operations are deduplicated.
Therefore, the results of a query against a GafferPop graph may be different than a standard Gremlin graph.
Therefore, the results of a query against a GafferPop graph may be different than a standard Gremlin graph.
For example, for the Tinkerpop Modern graph:
```
```text
(Gremlin) g.V().out() = [v2, v3, v3, v3, v4, v5]
(GafferPop) g.V().out() = [v2, v3, v4, v5]
```
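
The GafferPop edge ID formats listed above, `"[source, dest]"` and
`"[source, label, dest]"`, can be built with a small helper. This is a sketch
only: the function name is hypothetical, and it does not cover how vertex IDs
that themselves contain commas or brackets should be handled.

```python
def gafferpop_edge_id(source, dest, label=None):
    """Format an edge ID string as '[source, dest]' or '[source, label, dest]'."""
    # Include the label between source and destination only when it is given
    parts = [source, dest] if label is None else [source, label, dest]
    return "[" + ", ".join(str(part) for part in parts) + "]"
```

With a traversal source `g`, a lookup might then read
`g.E(gafferpop_edge_id('1', '2', 'knows'))`, where the IDs and label are
placeholders.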
57 changes: 5 additions & 52 deletions docs/user-guide/query/gremlin/gremlin.md
@@ -1,27 +1,4 @@
# Gremlin in Gaffer

!!! warning
GafferPop is still under development and has some [limitations](gremlin-limits.md).
The implementation may not allow some advanced features of Gremlin and it's
performance is unknown in comparison to standard Gaffer `OperationChains`.

[Gremlin](https://tinkerpop.apache.org/gremlin.html) is a query language for
traversing graphs. It is a core component of the Apache Tinkerpop library and
allows users to easily express more complex graph queries.

GafferPop is a lightweight Gaffer implementation of the [TinkerPop framework](https://tinkerpop.apache.org/),
where TinkerPop methods are delegated to Gaffer graph operations.

The addition of Gremlin as query language in Gaffer allows users to represent
complex graph queries in a simpler language akin to other querying languages
used in traditional and NoSQL databases. It also has wide support for various
languages so for example, you can write queries in Python via the [`gremlinpython` library](https://pypi.org/project/gremlinpython/)

!!! tip
In-depth tutorials on Gremlin as a query language and its associated libraries
can be found in the [Apache Tinkerpop Gremlin docs](https://tinkerpop.apache.org/gremlin.html).

## Using Gremlin Queries in Gaffer
# Using Gremlin Queries in Gaffer

Gremlin was added to Gaffer in version 2.1 as a new graph query language and since
version 2.3 it has been added as standard into the Gaffer REST API. A full tutorial
@@ -31,35 +8,11 @@ on the configuration of Gremlin in Gaffer is provided in the
This guide will use the [Python API for Gremlin](https://pypi.org/project/gremlinpython/)
to demonstrate some basic capabilities and how they compare to standard Gaffer syntax.

To start querying in Gremlin we first need a reference to what is known as the
Graph Traversal. To obtain this we need to connect to the Gremlin websocket provided
by the Gaffer REST API (if you have used [`gafferpy`](../../apis/python-api.md)
before this will be quite similar). We can do this by first importing the required
libraries like so (many of these will be needed later for queries):

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.serializer import GraphSONSerializersV3d0
from gremlin_python.process.graph_traversal import __
```

We can then establish a connection to the Gremlin server and save a reference to
this (typically called `g`):

```python
# Setup a connection with the REST API running on localhost
g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8080/gremlin', 'g', message_serializer=GraphSONSerializersV3d0()))
```

Now that we have the traversal reference this can be used to spawn graph traversals
and get results back.

!!! note
Its important to use the GraphSON v3 serialiser if connecting to the Gaffer
REST API.
!!! tip
    For information on how to set up a Gremlin connection, please see the
    [API guide](../../apis/gremlin-api.md).

### Basic Gremlin Queries
## Basic Gremlin Queries

Gremlin queries (similar to Gaffer queries) usually require a starting set of
entities to query from. Commonly Gremlin queries will be left without any IDs in
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -90,6 +90,7 @@ nav:
- 'Spring REST': 'user-guide/apis/rest-api.md'
- 'Python (gafferpy)': 'user-guide/apis/python-api.md'
- 'Java': 'user-guide/apis/java-api.md'
- 'Gremlin (GafferPop)': 'user-guide/apis/gremlin-api.md'
- Querying:
- Gaffer Query Syntax:
- 'Operations': 'user-guide/query/gaffer-syntax/operations.md'
