

Merge pull request #582 from Worklytics/rc-v0.4.41
v0.4.41
eschultink authored Nov 24, 2023
2 parents 767a8a4 + 3702adf commit bd7791d
Showing 220 changed files with 2,579 additions and 1,791 deletions.
8 changes: 1 addition & 7 deletions .github/workflows/build-java.yaml
@@ -29,20 +29,14 @@ jobs:
java-version: ${{ inputs.java-version }}
# https://github.com/actions/setup-java#supported-distributions
distribution: zulu
- name: Cache Maven packages
uses: actions/cache@v3
with:
path: ~/.m2
key: ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}
restore-keys: ${{ runner.os }}-m2-v1-
- name: Clear our artifacts from Maven cache # q: does this work!?!?!
run: |
rm -rf ~/.m2/repository/co/worklytics/
rm -rf ~/.m2/repository/com/avaulta/
- name: Compile
working-directory: java/
run: |
mvn clean compile -T 2C -Dversions.logOutput=false -DprocessDependencies=false -DprocessDependencyManagement=false
mvn clean compile -T 2C -Dversions.logOutput=false
- name: Test
working-directory: java/
run: |
15 changes: 11 additions & 4 deletions .github/workflows/ci-java-all.yaml
@@ -11,10 +11,10 @@ on:
branches:
- 'main'
- 'rc-*'
- 's152-java-20-test'
- 's162-prep-release'

jobs:
# Java 11 - supported until 30 Sept 2023
# Java 11 - Oracle support ended 30 Sept 2023 ... but still what ships with GCP cloud shell!!!
ci_java11:
uses: ./.github/workflows/build-java.yaml
with:
@@ -26,11 +26,18 @@ jobs:
with:
java-version: '17'

# NOTE: java 19 support ended 21 Mar 2023; it's known to be incompatible with our code
# Java 20 - support ended 19 Sept 2023
# NOTE: psoxy versions 0.4.40 supported this; if you need it, option to downgrade to that.
# although beyond me why 17 and 21 both work, but 20 doesn't; best guess is Mockito 5 degrading
# behavior in some way for 20 that isn't needed for 21 and doesn't matter for 17?

# Java 20 - released 21 Mar 2023, supported until 19 Sept 2023
ci_java20:
uses: ./.github/workflows/build-java.yaml
with:
java-version: '20'

# Java 21 - released 19 Sept 2023, supported until Sept 2028 (LTS)
ci_java21:
uses: ./.github/workflows/build-java.yaml
with:
java-version: '21'
2 changes: 1 addition & 1 deletion .github/workflows/ci-java.yaml
@@ -11,4 +11,4 @@ jobs:
ci_java:
uses: ./.github/workflows/build-java.yaml
with:
java-version: '17'
java-version: '17' # 21 is LTS, so fair that this should be our default
27 changes: 11 additions & 16 deletions .github/workflows/ci-java8-core.yaml
@@ -6,16 +6,19 @@ name: CI - java8 core
#
# see https://help.github.com/en/actions/language-and-framework-guides/building-and-testing-java-with-maven

# Disable for now - hope we can dispense with this soon
on:
push:
branches:
- '**' # should match all branches
workflow_dispatch: # allow manual triggering

# push:
# branches:
# - '**' # should match all branches

jobs:
ci_java8_core:
env:
compile-profile: '-P java8 ' # NOTE: trailing space is important
java-version: '17'
java-version: '17' # build w java 17, but pom configured to still build java 8 byte code
runs-on: ubuntu-latest
steps:
- name: Check out code
@@ -26,18 +29,6 @@ jobs:
java-version: ${{ env.java-version }}
# https://github.com/actions/setup-java#supported-distributions
distribution: zulu
- name: Cache Maven packages
uses: actions/cache@v3
with:
path: ~/.m2
key: ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}
restore-keys: ${{ runner.os }}-m2-v1-
- name: Clear our artifacts from Maven cache # q: does this work!?!?!
working-directory: java/
run: |
rm -rf ~/.m2/repository/co/worklytics/
rm -rf ~/.m2/repository/com/avaulta/
mvn clean
- name: Compile gateway-core
working-directory: java/gateway-core
run: |
@@ -54,6 +45,10 @@ jobs:
run: |
mvn compile ${{ env.compile-profile }}-T 2C -Dversions.logOutput=false \
-DprocessDependencies=false -DprocessDependencyManagement=false
# JDK-8 core tests failing after fixes to support JDK-21 (see https://github.com/Worklytics/psoxy/pull/572)
# not bothering to fix for now, as only used as a library in linked builds - deploying to jre-8 is not supported
# (it does compile; it's just the tests that fail)
- name: Test core
working-directory: java/core
run: |
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,11 @@ Working tracking of changes, updated as work done prior to release. Please revi
then wildcard policy to read shared also grants read of secrets across all connectors)
- keys/salts per value kind (PII, item id, etc)

## [0.4.41](https://github.com/Worklytics/psoxy/release/tag/v0.4.41)
* GCP only: Compute Engine API will be enabled in the project. Newer versions of the GCP Terraform
provider seem to require this. You may see this in your next `terraform plan`, although it may
also be a no-op if you already have the API enabled.

## [0.4.36](https://github.com/Worklytics/psoxy/release/tag/v0.4.36)
* Microsoft 365 - Azure AD Directory - default rules change to return `proxyAddresses` field for
users, pseudonymized; needed to match user's past email addresses against other data sources
34 changes: 22 additions & 12 deletions README.md
@@ -10,16 +10,22 @@ source.
Psoxy replaces PII in your organization's data with hash tokens to enable Worklytics's analysis to
be performed on anonymized data which we cannot map back to any identifiable individual.

It is intended to be a simple, serverless, transparent solution to provide more granular access to
data source APIs.
Psoxy is a pseudonymization service that acts as a Security / Compliance layer, which you can deploy
between your data sources (SaaS tool APIs, Cloud storage buckets, etc) and the tools that need to
access those sources.

Psoxy ensures more secure, granular data access than direct connections between your tools will
offer - and enforces access rules to fulfill your Compliance requirements.

Objectives:
- **serverless** - we strive to minimize the moving pieces required to run psoxy at scale, keeping
your attack surface small and operational complexity low. Furthermore, we define
infrastructure-as-code to ease setup.
- **transparent** - psoxy's source code is available to customers, to facilitate code review
and white box penetration testing.
- **simple** - psoxy's functionality will focus on performing secure authentication with the 3rd
party API and then perform minimal transformation on the response (pseudonymization, field
redcation). to ease code review and auditing of its behavior.
redaction) to ease code review and auditing of its behavior.

Psoxy may be hosted in [Google Cloud](docs/gcp/development.md) or [AWS](docs/aws/getting-started.md).

@@ -30,7 +36,7 @@ Worklytics and the data source you wish to connect. In this role, the proxy per
authentication necessary to connect to the data source's API and then any required transformation
(such as pseudonymization or redaction) on the response.

Orchestration continues to be performed on the Worklytics-side.
Orchestration continues to be performed on the Worklytics side.

![proxy illustration](docs/proxy-illustration.jpg)

@@ -135,7 +141,7 @@ The API key/secret will be used to authenticate with the source's REST API and a
| Source | Details + Examples | API Permissions / Scopes |
|---------------------------|----------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| Asana | [docs/sources/asana](docs/sources/asana/README.md) | a [Service Account](https://asana.com/guide/help/premium/service-accounts) (provides full access to Workspace) |
| Github | [docs/sources/github](docs/sources/github/README.md) | **Read Only** permissions for: <br/>Repository: Contents, Issues, Metadata, Pull requests<br/>Organization: Administration, Members |
| GitHub | [docs/sources/github](docs/sources/github/README.md) | **Read Only** permissions for: <br/>Repository: Contents, Issues, Metadata, Pull requests<br/>Organization: Administration, Members |
| Jira Cloud | [docs/sources/atlassian/jira-cloud](docs/sources/atlassian/jira/README.md) | "Classic Scopes": `read:jira-user` `read:jira-work` "Granular Scopes": `read:group:jira` `read:user:jira` "User Identity API" `read:account` |
| Jira Server / Data Center | [docs/sources/atlassian/jira-server](docs/sources/atlassian/jira/jira-server.md) | Personal Access Token on behalf of user with access to equivalent of above scopes for entire instance |
| Salesforce | [docs/sources/salesforce](docs/sources/salesforce/README.md) | `api` `chatter_api` `refresh_token` `offline_access` `openid` `lightning` `content` `cdp_query_api` |
@@ -195,13 +201,17 @@ command line tools.

You will need all of the following in your deployment environment (eg, your laptop):

| Tool | Version | Test Command |
|----------------------------------------------|---------------|---------------------------|
| [git](https://git-scm.com/) | 2.17+ | `git --version` |
| [Maven](https://maven.apache.org/) | 3.6+ | `mvn -v` |
| [Java 11+ JDK](https://openjdk.org/install/) | 11+, <=20 | `mvn -v &#124; grep Java` |
| [Terraform](https://www.terraform.io/) | 1.3.x, <= 1.5 | `terraform version` |

| Tool | Version | Test Command |
|--------------------------------------------------|------------------------|---------------------------|
| [git](https://git-scm.com/) | 2.17+ | `git --version` |
| [Maven](https://maven.apache.org/) | 3.6+ | `mvn -v` |
| [Java JDK 11+ LTS](https://openjdk.org/install/) | 11, 17, 21 (see notes) | `mvn -v &#124; grep Java` |
| [Terraform](https://www.terraform.io/) | 1.3.x, <= 1.5 | `terraform version` |

NOTE: we will support Java versions for the duration of their official support windows, in particular
the LTS versions. As of Nov 2023, we still support Java 11 but may end this at any time. Non-LTS
versions, such as 12-16 and 18-20, which are out of official support, may work but are not
routinely tested.
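A minimal sketch for checking the JDK that Maven will use against the routinely tested LTS versions (11, 17, 21). `java_major` and `is_tested_lts` are hypothetical helpers, not part of psoxy's tooling; the parsing assumes the `Java version: ...` line that `mvn -v` prints, including the legacy `1.8.x` numbering scheme.

```shell
# Hypothetical helpers (not part of psoxy): classify the JDK that Maven uses.

# Print the major version from a line such as
# "Java version: 17.0.9, vendor: Eclipse Adoptium, runtime: ..."
java_major() {
  v=$(printf '%s\n' "$1" | sed -n 's/^Java version: \([0-9][0-9.]*\).*/\1/p')
  case "$v" in
    1.*) v=${v#1.} ;;  # legacy scheme, eg 1.8.0 -> 8.0
  esac
  printf '%s\n' "${v%%.*}"
}

# Succeed only for the LTS versions that are routinely tested.
is_tested_lts() {
  case "$1" in
    11|17|21) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_tested_lts "$(java_major "$(mvn -v | grep 'Java version')")"` succeeds for JDK 11, 17 or 21 and fails for any other version.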

NOTE: Using `terraform` is not strictly necessary, but it is the only supported method. You may
provision your infrastructure via your host's CLI, web console, or another infrastructure provisioning
49 changes: 49 additions & 0 deletions docs/aws/authentication-authorization.md
@@ -0,0 +1,49 @@
# Authentication and Authorization in AWS Deployments of Psoxy

This page provides an overview of how the proxy authenticates and confirms authorization of clients
(Worklytics tenants).

## Authentication

Each Worklytics tenant operates as a unique GCP service account within Google Cloud. GCP issues
an identity token for this service account to processes running in the tenant, which the tenant then
uses to authenticate against AWS.

This is [OIDC](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_oidc.html) based
identity federation (aka "web identity federation" or "workload identity federation").

No secrets or keys need to be exchanged between Worklytics and your AWS instance. The integrity of
the authentication is provided by the signature of the identity token provided by GCP, which AWS
verifies against Google's public certificates.

## Authorization

Within your AWS account, you create an IAM role, with a role assumption policy that allows your
Worklytics tenant's GCP Service Account (identified by a numeric ID you obtain from the Worklytics
portal) to assume the role.

This assumption policy will have a statement similar to the following, where the value of the `aud`
claim is the numeric ID of your Worklytics tenant's GCP Service Account:
```json
{
"Effect": "Allow",
"Principal": {
"Federated": "accounts.google.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"accounts.google.com:aud": "12345678901234567890123456789"
}
}
}
```

Colloquially, this allows a web identity federated from `accounts.google.com` where Google has
asserted the claim that `aud` == `12345678901234567890123456789` to assume the role.
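To see which claims (such as `aud`) a Google-issued identity token carries, you can decode its payload segment locally. This is a minimal, hypothetical diagnostic helper, not part of psoxy; it assumes a `base64` that accepts `-d` (eg, GNU coreutils), and it does **not** verify the token's signature, which AWS checks against Google's public certificates.

```shell
# Hypothetical diagnostic helper: print the JSON claims in a JWT's payload.
# Decodes only; it does NOT verify the signature.
jwt_payload() {
  # A JWT is header.payload.signature, each part base64url-encoded.
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#p} % 4 )) in
    2) p="$p==" ;;
    3) p="$p=" ;;
  esac
  printf '%s' "$p" | base64 -d
}
```

Eg, `jwt_payload "$TOKEN"` prints the token's claims as JSON, so you can compare them with the condition in your role assumption policy.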

Then you use this AWS IAM role as the principal in the AWS IAM policies you define to authorize it to
invoke your proxy instances via their function URLs (API connectors) or to read from their sanitized
output buckets (bulk data connectors).

See: https://github.com/Worklytics/psoxy/blob/v0.4.40/infra/modules/aws/main.tf#L81-L102
46 changes: 46 additions & 0 deletions docs/development/releases.md
@@ -0,0 +1,46 @@
# Releases


## Prepare Release Candidate

From `main`:

```shell
./tools/release/prep.sh v0.4.15 rc-v0.4.16
```

- follow the steps output by that tool
- if you need interim testing, create a "branch" of the release (eg, branch `v0.4.16` instead of a
tag), and trigger `gh workflow run ci-terraform-examples-release.yaml`

## Release

On the `rc-` branch:

```shell
./tools/release/prep.sh rc-v0.4.16 v0.4.16
```

QA the AWS and GCP dev examples by running `terraform apply` for each and testing various connectors.

```shell
./tools/release/rc-to-release.sh v0.4.16
```

After the release is merged to `main`:
```shell
./tools/release/publish.sh v0.4.16
```

## Java 8 Library Binaries

```shell
cd java/gateway-core
mvn clean install -P java8
cd ../core
mvn clean install -P java8
```
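To double-check that the `java8` profile emitted Java 8 bytecode even when built with a newer JDK, you can read a compiled class file's major version (52 for Java 8, 55 for 11, 61 for 17, 65 for 21). A minimal sketch using POSIX `od`; the helper name is hypothetical.

```shell
# Hypothetical helper: print the major version of a compiled .class file.
# Layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version (big-endian),
# so the major version is the two bytes at offset 6.
class_major_version() {
  od -An -j6 -N2 -tu1 "$1" | awk '{ print $1 * 256 + $2 }'
}
```

Eg, after the `mvn clean install -P java8` above, `class_major_version "$(find target/classes -name '*.class' | head -n 1)"` should print `52`.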




55 changes: 55 additions & 0 deletions docs/faq-security.md
@@ -0,0 +1,55 @@
# FAQ - Security


## Can Psoxy invocation be locked to a set of known IP addresses?

No, but this is not necessary, as requests from your Worklytics tenant to your Psoxy instances are
authenticated (via OIDC identity federation in AWS, natively in GCP) and authorized by your cloud
provider's IAM policies.

Your Worklytics tenant is a process running in GCP, represented by a unique GCP service account. You
simply use your cloud's IAM to grant that service account access to your psoxy instance.

This is functionally equivalent to how access is authenticated and authorized within and between
resources in any public cloud. Eg, access to your S3 buckets is authorized via a policy you specify
in AWS IAM.

Remember that Psoxy is, in effect, a drop-in replacement for a data source's API; in general, these
APIs, such as for Google Workspace, Slack, Zoom, and Microsoft 365, are already accessible from
anywhere on the internet without IP restriction. Psoxy exposes only a more restricted view of the
source API - a subset of its endpoints, HTTP methods (read-only), and fields - with field values that
contain PII redacted or pseudonymized.

See [AWS Authentication and Authorization](aws/authentication-authorization.md) for more details.

See [GCP Authentication and Authorization](gcp/authentication-authorization.md) for more details.

## Can Psoxy instances be deployed behind an AWS API Gateway?

Yes - and prior to March 2022 this was necessary. But AWS has since released [Lambda function URLs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-urls.html),
which provide a simpler and more direct way to securely invoke lambdas via HTTP. As such, the
Worklytics-provided Terraform modules use function URLs rather than API gateways.

API gateways provide a layer of indirection that can be useful in certain cases, but is overkill for
psoxy deployments - which do little more than provide a transformed, read-only view of a subset of
endpoints within a data source API. The indirection provides flexibility and control, but at the
cost of complexity in infrastructure and management - as you must provision a gateway, route, stage,
and extra IAM policies to make that all work, compared to a function URL.

That said, the payload a lambda receives when invoked via a function URL is equivalent to the API
Gateway v2 payload (format version 2.0), so the proxy itself is compatible with either API Gateway v2
or function URLs.

## Can I deploy a WAF in front of my Psoxy instances?

Sure, but why? Psoxy is itself a rules-based layer that validates requests, authorizes them, and
then sanitizes the response. It is a drop-in replacement for the API of your data source, which in
many cases are publicly exposed to the internet and likely implement their own WAF.

Psoxy never exposes *more* data than is in the source API itself, and in the usual case it provides
read-only access to a small subset of API endpoints and fields within those endpoints.

Psoxy is stateless, so all requests must go to the source API. Psoxy does not cache or store any
data. There is no database to be vulnerable to SQL injections.

A WAF could make sense if you are using Psoxy to expose an on-prem, in-house built tool to
Worklytics that is otherwise not exposed to the internet.

27 changes: 27 additions & 0 deletions docs/gcp/authentication-authorization.md
@@ -0,0 +1,27 @@
# Authentication and Authorization in GCP Deployments of Psoxy

This page provides an overview of how psoxy authenticates and confirms authorization of clients
(Worklytics tenants) to access data for GCP-hosted deployments.

## Authentication

As Worklytics tenants run inside GCP, they are implicitly authenticated by GCP. No secrets or keys
need to be exchanged between your Worklytics tenant and your Psoxy instance. GCP can verify the
identity of requests from Worklytics to your instance, just as it does between any process and
resource within GCP.


## Authorization

Invocations of your proxy instances are authorized by the IAM policies you define in GCP. For API
connectors, you grant the Cloud Function Invoker role to your Worklytics tenant's GCP service account
on the Cloud Function for your instance.

For the bulk data case, you grant the Storage Object Viewer role to your Worklytics tenant's GCP
service account on the sanitized output bucket for your connector.

You can obtain the identity of your Worklytics tenant's GCP service account from the Worklytics
portal.
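As a sketch, the two grants above correspond roughly to the `gcloud` commands below. The helper name, function name, region, bucket and service account email are hypothetical placeholders; Terraform remains the supported way to provision these, and the invoker binding shown assumes a 1st gen Cloud Function. To stay side-effect free, the helper only prints the commands for you to review.

```shell
# Hypothetical helper: print (not run) the gcloud commands that grant a
# Worklytics tenant's service account access to a psoxy instance.
print_psoxy_grants() {
  sa_email=$1  # tenant service account email (placeholder)
  fn=$2        # Cloud Function of an API connector
  region=$3    # region of that function
  bucket=$4    # sanitized output bucket of a bulk data connector

  # API connector: allow the tenant SA to invoke the Cloud Function
  printf 'gcloud functions add-iam-policy-binding %s --region=%s --member=serviceAccount:%s --role=roles/cloudfunctions.invoker\n' \
    "$fn" "$region" "$sa_email"

  # Bulk data connector: allow the tenant SA to read the sanitized output bucket
  printf 'gcloud storage buckets add-iam-policy-binding gs://%s --member=serviceAccount:%s --role=roles/storage.objectViewer\n' \
    "$bucket" "$sa_email"
}
```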



1 change: 1 addition & 0 deletions docs/gcp/getting-started.md
@@ -38,6 +38,7 @@ Service Account Keys and activate Google Workspace APIs.
*attempt* to enable these, but as there is sometimes a delay of a few minutes in activation and in
some cases they are required to read your existing infra prior to apply, you may experience
errors. To pre-empt those, we suggest ensuring the following are enabled:
- [Compute Engine API](https://console.cloud.google.com/apis/library/compute.googleapis.com) (`compute.googleapis.com`)
- [Cloud Build API](https://console.cloud.google.com/apis/library/cloudbuild.googleapis.com) (`cloudbuild.googleapis.com`)
- [Cloud Functions API](https://console.cloud.google.com/apis/library/cloudfunctions.googleapis.com) (`cloudfunctions.googleapis.com`)
- [Cloud Resource Manager API](https://console.cloud.google.com/apis/library/cloudresourcemanager.googleapis.com) (`cloudresourcemanager.googleapis.com`)
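A minimal sketch for finding which of the APIs listed above are not yet enabled in your project. `missing_apis` is a hypothetical helper; `required_apis` holds only the four APIs listed here, so add any others your deployment needs.

```shell
# Hypothetical helper: report required APIs missing from an "enabled" list.
required_apis="compute.googleapis.com cloudbuild.googleapis.com
cloudfunctions.googleapis.com cloudresourcemanager.googleapis.com"

# stdin: enabled service names, one per line; prints each missing API
missing_apis() {
  enabled=$(cat)
  for api in $required_apis; do
    printf '%s\n' "$enabled" | grep -qxF "$api" || printf '%s\n' "$api"
  done
}
```

Eg, `gcloud services list --enabled --project=YOUR_PROJECT --format='value(config.name)' | missing_apis`; anything it prints can then be enabled with `gcloud services enable API_NAME --project=YOUR_PROJECT`.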