initial implementation #703

BTreston · 2024-12-27T15:19:10Z

🎟️ Tracking

https://bitwarden.atlassian.net/browse/PM-14360

📔 Objective

This PR implements services that allow the directory connector to batch large import requests into multiple small requests to avoid potential timeouts due to packet size exceeding default size limits for some servers. This implementation splits the requests into batches of 2000 users/groups and POSTs them sequentially.

This change also refactors some of the sync service to be more testable and removes some directory related code that does not belong. Also included are some new unit tests to the new functionality and the sync service.

📸 Screenshots

⏰ Reminders before review

Contributor guidelines followed
All formatters and local linters executed and passed
Written new unit and / or integration tests where applicable
Used internationalization (i18n) for all UI strings
CI builds passed
Communicated to DevOps any deployment requirements
Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

👍 (:+1:) or similar for great changes
📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
❓ (:question:) for questions
🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
🎨 (:art:) for suggestions / improvements
❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
⛏ (:pick:) for minor or nitpick changes

github-actions · 2024-12-27T15:22:45Z

Checkmarx One – Scan Summary & Details – c7d13443-c968-4a9a-8d36-877e9acec29f

New Issues

Severity	Issue	Source File / Package	Checkmarx Insight
	Use_Of_Hardcoded_Password	/src/services/sync.service.spec.ts: 151	Attack Vector

BTreston · 2024-12-27T15:44:51Z

@eliykat Whenever you're back and able to give this first pass a look over, I just had a few questions:

it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.
stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?
in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?

I also still have a few todo's:

More unit tests for the sync service (ie. for generateNewHash, buildRequests)
Refactor the sync call into more testable pieces, potentially move some code that might not belong in this service (looking at some of the directory stuff in this service)

eliykat

Looking good, here are a few comments but I'll take a closer look once you've finished your draft.

To answer your questions:

it seems like jest doesnt like ECMAScript Modules (which are used in the googleapis) and plugging it into the existing transformer in the jest config didn't seem to fix it. Not sure how to handle the resulting error.

Looking at the jest documentation, it doesn't have great support for ECMAScript. I think the easiest solution is to decouple the GsuiteDirectoryService from SyncService, thus avoiding importing the google apis into jest at all. e.g. move getDirectoryService to its own class with its own abstraction, and have SyncService only inject the abstraction.

This will become a problem again if/when we get a Gsuite testbed set up, but I think that solves it for now, and is a better design.

stripe has a cap on how large a transaction can be (999,999.99 USD) so there's a soft limit to how many users you can import and provision seats for at one time. Have we run into this before?

We have not run into this before, but spending $1 million USD in a single provisioning sync seems unlikely. I think we can leave it unhandled.

in terms of batching groups, do we want to batch based on the number of groups, or the number of users within the groups?

The goal here is to avoid timeouts at the network edge, so there isn't any hard and fast rule as to how we should split our requests. What do you think?

I can see an argument for splitting based on the number of users in each group - that's the property that scales as you add more users. But my assumption was that these timeouts are actually fairly generous (this request data is very small and this is the first time I've heard of requests timing out), so the simplest approach of batching by n users and n groups would be enough to resolve the issue.

eliykat · 2025-01-06T04:45:41Z

jslib/common/src/abstractions/batching.service.ts

I think both strategies should implement this interface. For example, a RequestBuilder interface which has BatchRequestBuilder and SingleRequestBuilder implementations for the new and existing behaviour respectively.

eliykat · 2025-01-06T04:48:05Z

src/services/sync.service.ts

+      return [
+        new OrganizationImportRequest({
+          groups: (groups ?? []).map((g) => {
+            return {
+              name: g.name,
+              externalId: g.externalId,
+              memberExternalIds: Array.from(g.userMemberExternalIds),
+            };
+          }),
+          users: (users ?? []).map((u) => {
+            return {
+              email: u.email,
+              externalId: u.externalId,
+              deleted: u.deleted || (removeDisabled && u.disabled),
+            };
+          }),
+          overwriteExisting: overwriteExisting,
+          largeImport: largeImport,
+        }),
+      ];


Following on from my comment above - this would be moved into the SingleRequestBuilder implementation.

eliykat · 2025-01-06T04:59:19Z

src/services/sync.service.ts

@@ -133,6 +116,36 @@ export class SyncService {
    }
  }

+  async generateNewHash(reqJson: string): Promise<string> {


This method is doing too much - it's called generateNewHash, but it's both generating a new hash and comparing it to the existing hash from state. And its return value is not very clear - a new hash if it's different, or an empty string if it's not.

I suggest that generateNewHash only generates a hash (generateHash really), and the comparison is done elsewhere (either a separate method or inline).

eliykat · 2025-01-06T05:44:06Z

openldap/ldifs/directory-11000.ldif

This directory is used for integration tests. Because your new service is more or less functional - i.e. it's just processing a set of inputs to a set of outputs - I'm not sure we need dedicated integration tests on it.

If you disagree and want to integration test it, I don't think we should maintain a set of 11k users and 11k groups. Ideally you'd specify a lower batchSize and use a more maintanable set of test data (e.g. 250 users with a batchSize of 45). I recall we discussed trying to replicate real world conditions as much as possible, but this is a lot of fixture data to maintain forever. And would be comparatively slow to run in the pipeline.

eliykat · 2025-01-06T05:50:34Z

jslib/common/src/services/batching.service.spec.ts

+    expect(requests.length).toEqual(
+      Math.ceil(mockGroups.length / batchingService.batchSize) +
+        Math.ceil(mockUsers.length / batchingService.batchSize),
+    );


I almost always prefer constants in assertions, otherwise you risk rewriting the logic of the SUT.

For example - you know that you've generated 11k users and you have batch sizes of 2k, so you know you're going to have 6 user requests. Assert that your request has a length of 6. (example only, obviously you have groups as well.)

I think it would help with this to specify the batchSize in the constructor of the service. That way, the test can specify a fixed batch size and base its assertions on that, without affecting the batch size used by the production code.

eliykat · 2025-01-06T05:53:19Z

jslib/common/src/services/batching.service.spec.ts

+    userSimulator = (userCount: number) => {
+      const simulatedArray: UserEntry[] = [];
+      for (let i = 0; i < userCount; i += batchingService.batchSize) {
+        for (let j = 0; j <= batchingService.batchSize; j++) {
+          simulatedArray.push(new UserEntry());
+        }
+      }
+      return simulatedArray;
+    };
+
+    groupSimulator = (groupCount: number) => {
+      const simulatedArray: GroupEntry[] = [];
+      for (let i = 0; i < groupCount; i += batchingService.batchSize) {
+        for (let j = 0; j <= batchingService.batchSize; j++) {
+          simulatedArray.push(new GroupEntry());
+        }
+      }
+      return simulatedArray;


I don't understand why this needs to know about batchSize - it should just generate the number of mock objects requested. Particularly because this is generating mock input data, which is usually supplied by a directory service which knows nothing about batching.

initial implementation

1249419

BTreston requested a review from eliykat December 27, 2024 15:19

eliykat requested changes Jan 6, 2025

View reviewed changes

BTreston added 6 commits January 6, 2025 14:52

next iteration

80b863d

clean up

c25d4be

Merge branch 'main' into ac/pm-14360-import-batching

e08ef3d

cleanup

c7cde29

fix test

d0f260d

cleanup

3b0d3c1

BTreston marked this pull request as ready for review January 7, 2025 15:30

BTreston requested a review from a team as a code owner January 7, 2025 15:30

BTreston requested review from JimmyVo16 and eliykat January 7, 2025 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

initial implementation #703

initial implementation #703

BTreston commented Dec 27, 2024 •

edited

Loading

github-actions bot commented Dec 27, 2024 •

edited

Loading

BTreston commented Dec 27, 2024 •

edited

Loading

eliykat left a comment

eliykat Jan 6, 2025

eliykat Jan 6, 2025

eliykat Jan 6, 2025

eliykat Jan 6, 2025

eliykat Jan 6, 2025

eliykat Jan 6, 2025

initial implementation #703

Are you sure you want to change the base?

initial implementation #703

Conversation

BTreston commented Dec 27, 2024 • edited Loading

🎟️ Tracking

📔 Objective

📸 Screenshots

⏰ Reminders before review

🦮 Reviewer guidelines

github-actions bot commented Dec 27, 2024 • edited Loading

New Issues

BTreston commented Dec 27, 2024 • edited Loading

eliykat left a comment

Choose a reason for hiding this comment

eliykat Jan 6, 2025

Choose a reason for hiding this comment

eliykat Jan 6, 2025

Choose a reason for hiding this comment

eliykat Jan 6, 2025

Choose a reason for hiding this comment

eliykat Jan 6, 2025

Choose a reason for hiding this comment

eliykat Jan 6, 2025

Choose a reason for hiding this comment

eliykat Jan 6, 2025

Choose a reason for hiding this comment

BTreston commented Dec 27, 2024 •

edited

Loading

github-actions bot commented Dec 27, 2024 •

edited

Loading

BTreston commented Dec 27, 2024 •

edited

Loading