Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openapi: Extend the /v1/compile API #108

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

philipaconrad
Copy link
Member

What changed?

This PR tracks a planned extension to the Enterprise OPA /v1/compile API, as part of our intended support for UCAST and data filtering experiments.

Currently, my approach is to add a single additional string option under "options" ➡️ "targetFormat". Allowed values for the field include: json_ast (the default), sql, and ucast.

Here's what an example /v1/compile input might look like:

POST /v1/compile HTTP/1.1
Content-Type: application/json
{
  "query": "data.example.allow == true",
  "input": {
    "subject": {
      "clearance_level": 4
    }
  },
  "options": {
    "disableInlining": [],
    "targetFormat": "sql"
  },
  "unknowns": [
    "data.reports"
  ]
}

The response would then resemble:

{
  "result": {
    "queries": [
      [
        {
          "index": 0,
          "sql": "WHERE ...\n-- not actually sure what this SQL query would look like for the reference query."
        }
      ]
    ]
  }
}

@philipaconrad philipaconrad added the enhancement New feature or request label Jan 9, 2025
@philipaconrad philipaconrad requested a review from chendrix January 9, 2025 08:32
@philipaconrad philipaconrad self-assigned this Jan 9, 2025
type: array
targetFormat:
type: string
enum: [json_ast, sql, ucast]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we may need further per-targetFormat options?

For example "ucast" would have different valid inputs (PE fragment) depending on whether it's meant for conversion with ucast.as_sql, or ucast-linq or ucast-prisma. Could be a "flavour", akin to database dialects? 🤔

Same for SQL, if we want to expand the set of builtins that we can translate, this could depend on the target dialect, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's a good point. 🤔 I suppose we could switch this to be an object type or something that allows us to nest target-specific options.

@chendrix
Copy link
Collaborator

chendrix commented Jan 9, 2025

I was interested in seeing if we could actually separate out the response types for different targets into different application/json+<target> MIME types.

I don't want to overload the response to just generic arbitrary JSON

@philipaconrad
Copy link
Member Author

philipaconrad commented Jan 22, 2025

One concern I've seen come up in discussions around the new Compile targets is about handling multiple output targets in a single request.

The argument is that firing up the PE machinery for a single target vs several targets is roughly the same amount of work, so we might as well be able to handle N-many targets in our responses.

I propose the following:

  • If the Accept header is provided on its own, we should use that to generate a single target in the response.
  • If a targetDialects array field is provided in the options part of the input, we should use that to return the multiple output types together.
    • If both Accept header and targetDialects are present, targetDialects overrides. (This is an arbitrary choice, fwiw.)

EDIT: A third option might be to provide a application/vnd.styra.compile+json, or mixed+json header variation. That would make some validation checks a little easier, and would force API clients to opt-in to multi-target behavior. 🤔

@philipaconrad philipaconrad force-pushed the philip/extend-compile-api branch from d4c89b0 to e42fdd5 Compare January 22, 2025 23:03
This commit adds a response type for the /v1/compile API, and begins
extending it for supporting more output formats than just the JSON AST
format.

Signed-off-by: Philip Conrad <[email protected]>
This commit also includes a small change to include table mapping info
for the SQL targets.

Signed-off-by: Philip Conrad <[email protected]>
@philipaconrad philipaconrad force-pushed the philip/extend-compile-api branch from e42fdd5 to 0bf704a Compare January 22, 2025 23:05
This commit makes the following changes:
 - A few of the new array types have `items` type constraints now.
 - Some descriptions have been added and updated.
 - Proposed initial target dialects include Prisma, LINQ, MySQL,
   and Postgres.

Signed-off-by: Philip Conrad <[email protected]>
openapi/openapi.yaml Outdated Show resolved Hide resolved
Copy link
Member

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this forward!

Comment on lines +465 to +466
queries:
type: array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For back-compatibility with OPA, this could also contain a "support" field, see here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've correctly added an Object field that handles the support field. 😅

Comment on lines +476 to +478
queries:
type: array
description: Array of UCAST JSON objects describing each query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an array? I think we're constructing a single condition, so if "queries" from PE are translated, they'll become and OR-of-AND condition object, a single one. So let's do

Suggested change
queries:
type: array
description: Array of UCAST JSON objects describing each query.
ucast:
type: object
description: UCAST JSON objects describing the conditions under which the query is true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array was a holdover from the original PE output format, which might contain results for several queries, as I understood it. I'll refactor this tomorrow.

Comment on lines +488 to +490
queries:
type: array
description: Array of strings representing the SQL equivalent of each query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
queries:
type: array
description: Array of strings representing the SQL equivalent of each query.
sql:
type: string
description: A string representing the SQL WHERE clauses for the conditions under which the query is true.

If we have a response like

{
  result: {
    ucast: { ... }
  }
}

for UCAST and

{
  result: {
    sql: { ... }
  }
}

for SQL targets, the mixed response could be

{
  result: {
    ucast: { ... },
    sql: "..."
  }
}

OTOH if we want to support multiple SQL dialects in a single "mixed" request, then we might need to return something like

{
  result: {
    ucast+prisma: { ... },
    ucast+linq: { ... },
    sql+postgres: "...",
    sql+mysql: "..."
  }
}

For UCAST, it might not be necessary, because the encoding of the it is the same for different dialects -- only the set of builtins varies, but if your data policy is valid for LINQ, it's valid for Prisma, too, the same UCAST IR.

But for SQL, we'll have to differentiate...

@philipaconrad
Copy link
Member Author

philipaconrad commented Jan 23, 2025

ℹ️ After thinking on it a bit, I find the "third option" (explicit header variant for mixed/multi-target use case) is appealing because it locks down the different things the implementation might have to do, and it prevents the whole "if options.targetDialects is present, change our behavior" issue. Validation becomes easier, and more stuff can be spelled out directly in the OpenAPI spec then.

I'll push up the change to the spec shortly...

Copy link
Member

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this forward. I think the end is near ✨

Comment on lines +178 to +180
application/vnd.styra.ucast.+json:
schema:
$ref: "#/components/schemas/CompileResultUCAST"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c/p left over?

application/vnd.styra.ucast.+json:
schema:
$ref: "#/components/schemas/CompileResultUCAST"
application/vnd.styra.sql.mssql+json:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure about what to prefer for MS SQL Server. Asking a gigaton of floats about it, I got

Let me help you search through code to see which term is more commonly used when referring to Microsoft SQL Server.

Let me analyze the usage patterns of "mssql" versus "sqlserver" in code repositories. Based on the search results, here are the key observations:

  1. "sqlserver" appears to be more commonly used in official contexts and Microsoft's own documentation:

    • Microsoft's ODBC driver is referred to as "msodbcsql18"
    • The official driver name contains "sqlserver": "SQLServerDriver"
    • In connection strings, the format is typically "sqlserver://"
  2. "mssql" tends to be used more in:

    • Package names and informal references
    • Configuration files
    • Quick references or shorthand naming

The evidence suggests that "sqlserver" is the more formal and official term, while "mssql" is commonly used as a shorter, convenient alternative in code and configurations.

For completeness, you can view more results in GitHub's code search: https://github.com/search?q=content%3A%22mssql%22+OR+content%3A%22sqlserver%22+language%3Asql+OR+language%3Ayaml+OR+language%3Ajson+OR+language%3Adockerfile&type=code

So. We're none the wiser. I'm fine with either, if you prefer mssql, let's keep it.

Comment on lines +503 to +504
support:
type: object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was misleading you here. "support" only is a thing for (unconstrained, backwards-compatible) PE. When we generate SQL or UCAST, any PE (intermediate) result that has support modules is a fragment violation because as of now, we cannot translate that into clauses (UCAST/SQL).

ucastAll:
type: object
properties:
queries:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be singular, query. We'll return a single DNF expression (ORs-of-ANDs).

Comment on lines +531 to +554
ucastAll:
type: object
properties:
queries:
type: array
description: Array of UCAST JSON objects describing each query.
ucastMinimal:
type: object
properties:
queries:
type: array
description: Array of UCAST JSON objects describing each query.
ucastPrisma:
type: object
properties:
queries:
type: array
description: Array of UCAST JSON objects describing each query.
ucastLINQ:
type: object
properties:
queries:
type: array
description: Array of UCAST JSON objects describing each query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, my understanding (and the way the code works right now) is a bit different: the UCAST format is always the same, so there's no difference in form depending on the dialects used.

The supported builtins are different, and this is what I thought we'd want to support when querying for multiple dialects: if you want UCAST for Prisma and LINQ, the set of supported builtins and features is the intersection of the builtins and features supported by each of them. But the outcome will be a single UCAST object.

For the SQL-target dialects, it's similar, but their out form can also be different, so we need to differentiate them.

Does that make sense? I think I'd expect an output for mixed targets like this:

{
  result: {
    sqlMSSQL: "...",
    sqlPostgres: " ...", // if these two have been requested for target SQL
    ucast: { ... } // one field only, for all the requested dialects.
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants