Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial implementation #703

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

initial implementation #703

wants to merge 7 commits into from

Conversation

BTreston
Copy link
Contributor

@BTreston BTreston commented Dec 27, 2024

🎟️ Tracking

https://bitwarden.atlassian.net/browse/PM-14360

📔 Objective

This PR implements services that allow the directory connector to batch large import requests into multiple small requests to avoid potential timeouts due to packet size exceeding default size limits for some servers. This implementation splits the requests into batches of 2000 users/groups and POSTs them sequentially.

This change also refactors some of the sync service to be more testable and removes some directory related code that does not belong. Also included are some new unit tests to the new functionality and the sync service.

📸 Screenshots

⏰ Reminders before review

  • Contributor guidelines followed
  • All formatters and local linters executed and passed
  • Written new unit and / or integration tests where applicable
  • Used internationalization (i18n) for all UI strings
  • CI builds passed
  • Communicated to DevOps any deployment requirements
  • Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

  • 👍 (:+1:) or similar for great changes
  • 📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
  • ❓ (:question:) for questions
  • 🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
  • 🎨 (:art:) for suggestions / improvements
  • ❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
  • 🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
  • ⛏ (:pick:) for minor or nitpick changes

@BTreston BTreston requested a review from eliykat December 27, 2024 15:19
Copy link
Contributor

github-actions bot commented Dec 27, 2024

Logo
Checkmarx One – Scan Summary & Detailsc7d13443-c968-4a9a-8d36-877e9acec29f

New Issues

Severity Issue Source File / Package Checkmarx Insight
LOW Use_Of_Hardcoded_Password /src/services/sync.service.spec.ts: 151 Attack Vector

@BTreston
Copy link
Contributor Author

BTreston commented Dec 27, 2024

@eliykat Whenever you're back and able to give this first pass a look over, I just had a few questions:

  • it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.
  • stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?
  • in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?

I also still have a few todo's:

  • More unit tests for the sync service (ie. for generateNewHash, buildRequests)
  • Refactor the sync call into more testable pieces, potentially move some code that might not belong in this service (looking at some of the directory stuff in this service)

Copy link
Member

@eliykat eliykat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, here are a few comments but I'll take a closer look once you've finished your draft.

To answer your questions:

it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.

Looking at the jest documentation, it doesn't have great support for ECMAScript. I think the easiest solution is to decouple the GsuiteDirectoryService from SyncService, thus avoiding importing the google apis into jest at all. e.g. move getDirectoryService to its own class with its own abstraction, and have SyncService only inject the abstraction.

This will become a problem again if/when we get a Gsuite testbed set up, but I think that solves it for now, and is a better design.

stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?

We have not run into this before, but spending $1 million USD in a single provisioning sync seems unlikely. I think we can leave it unhandled.

in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?

The goal here is to avoid timeouts at the network edge, so there isn't any hard and fast rule as to how we should split our requests. What do you think?

I can see an argument for splitting based on the number of users in each group - that's the property that scales as you add more users. But my assumption was that these timeouts are actually fairly generous (this request data is very small and this is the first time I've heard of requests timing out), so the simplest approach of batching by n users and n groups would be enough to resolve the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both strategies should implement this interface. For example, a RequestBuilder interface which has BatchRequestBuilder and SingleRequestBuilder implementations for the new and existing behaviour respectively.

Comment on lines 241 to 260
return [
new OrganizationImportRequest({
groups: (groups ?? []).map((g) => {
return {
name: g.name,
externalId: g.externalId,
memberExternalIds: Array.from(g.userMemberExternalIds),
};
}),
users: (users ?? []).map((u) => {
return {
email: u.email,
externalId: u.externalId,
deleted: u.deleted || (removeDisabled && u.disabled),
};
}),
overwriteExisting: overwriteExisting,
largeImport: largeImport,
}),
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following on from my comment above - this would be moved into the SingleRequestBuilder implementation.

@@ -133,6 +116,36 @@ export class SyncService {
}
}

async generateNewHash(reqJson: string): Promise<string> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is doing too much - it's called generateNewHash, but it's both generating a new hash and comparing it to the existing hash from state. And its return value is not very clear - a new hash if it's different, or an empty string if it's not.

I suggest that generateNewHash only generates a hash (generateHash really), and the comparison is done elsewhere (either a separate method or inline).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory is used for integration tests. Because your new service is more or less functional - i.e. it's just processing a set of inputs to a set of outputs - I'm not sure we need dedicated integration tests on it.

If you disagree and want to integration test it, I don't think we should maintain a set of 11k users and 11k groups. Ideally you'd specify a lower batchSize and use a more maintanable set of test data (e.g. 250 users with a batchSize of 45). I recall we discussed trying to replicate real world conditions as much as possible, but this is a lot of fixture data to maintain forever. And would be comparatively slow to run in the pipeline.

Comment on lines 41 to 44
expect(requests.length).toEqual(
Math.ceil(mockGroups.length / batchingService.batchSize) +
Math.ceil(mockUsers.length / batchingService.batchSize),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost always prefer constants in assertions, otherwise you risk rewriting the logic of the SUT.

For example - you know that you've generated 11k users and you have batch sizes of 2k, so you know you're going to have 6 user requests. Assert that your request has a length of 6. (example only, obviously you have groups as well.)

I think it would help with this to specify the batchSize in the constructor of the service. That way, the test can specify a fixed batch size and base its assertions on that, without affecting the batch size used by the production code.

Comment on lines 14 to 31
userSimulator = (userCount: number) => {
const simulatedArray: UserEntry[] = [];
for (let i = 0; i < userCount; i += batchingService.batchSize) {
for (let j = 0; j <= batchingService.batchSize; j++) {
simulatedArray.push(new UserEntry());
}
}
return simulatedArray;
};

groupSimulator = (groupCount: number) => {
const simulatedArray: GroupEntry[] = [];
for (let i = 0; i < groupCount; i += batchingService.batchSize) {
for (let j = 0; j <= batchingService.batchSize; j++) {
simulatedArray.push(new GroupEntry());
}
}
return simulatedArray;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this needs to know about batchSize - it should just generate the number of mock objects requested. Particularly because this is generating mock input data, which is usually supplied by a directory service which knows nothing about batching.

@BTreston BTreston marked this pull request as ready for review January 7, 2025 15:30
@BTreston BTreston requested a review from a team as a code owner January 7, 2025 15:30
@BTreston BTreston requested review from JimmyVo16 and eliykat January 7, 2025 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants