-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(googledrive): add the tasks for google drive #725
Conversation
b72493a
to
3e38e55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Curious about why you extracted this client outside of the googledrive
package. Are other packages meant to use this?
Also, if we're going to have several common clients perhaps I'd remove some nesting in the directories and I'd have
pkg/component/internal
└──util
└──client
├──googledrive.go
├──grpc.go
└──http.go
or
/pkg/component/internal
└──util
└──client
├──googledrive
│ └──googledrive.go
├──grpc
│ └──grpc.go
└──http
└──http.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Curious about why you extracted this client outside of the googledrive package. Are other packages meant to use this?
Originally, I put Interface in googledrive
package.
However, when we import mock service, it will cause import cycle.
And, it also makes me think if my original design of the interface is good.
I realised that when I put this interface in googledrive
package, I'd think it is ok that client can know the knowledge in googledrive
package. It means the output can be googledrive.file
.
But, to make the test easier and have higher coverage, we should isolate the external part as much as possible.
So, in the current design, the external service doesn't have to know the knowledge in googledrive
package.
And, with the context above, I decide to move out the service interacting with google drive API to googledriveclient
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/pkg/component/internal
└──util
└──client
├──googledrive
│ └──googledrive.go
├──grpc
│ └──grpc.go
└──http
└──http.go
I will do so! Thanks for advice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, when we import mock service, it will cause import cycle.
But, to make the test easier and have higher coverage, we should isolate the external part as much as possible.
So, in the current design, the external service doesn't have to know the knowledge in googledrive package.
I do not agree with this. Both util/googledriveclient
and /data/googledrive/v0are tied to the Google Drive SDK. And it is highly unlikely that any other package will use the Google Drive client so, if the purpose is just breaking a dependency cycle, I think I'd go for
/pkg/component/data/googledrive[/v0]/client/client.go`.
But that wouldn't be my first option, though. I get your point about dependency cycles but it only happens in tests and the main cause is that you're defining an interface at the receiver package, which is a smell that tends to cause dependency cycle issues. In general, try to use interfaces when you want to communicate through packages and define them in the outer layer. Here you're using an interface to "increase the coverage" but in fact you're moving logic to another package that isn't covered, so that argument is a fallacy (better than no coverage but this strategy still leaves some core logic outside of the tests).
I'd try to avoid mocks here and inject the Google Drive API URL into the component (you may use a default URL) and use the WithEndpoint
(or WithHTTPClient
? I don't know if that would work) option to make requests against a local fake http server instead of the public GDrive API. Such strategy can be found in e.g. the openai
package. It takes more effort as you need to manually define the HTTP handler(s) but the coverage is total. It this is not possible due to time constraints, I'd fall back to the option proposed above, keeping the interface isolated but within the googledrive
component instead of under util
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/pkg/component/data/googledrive[/v0]/client/client.go`
Oh, I think it is better.
Here you're using an interface to "increase the coverage" but in fact you're moving logic to another package that isn't covered, so that argument is a fallacy (better than no coverage but this strategy still leaves some core logic outside of the tests).
I get your point. My elaboration is not good. I think the discussion points are not these points.
But, actually, I haven't had a clear belief/principle now. So, I just write down what I think and feel recently.
I'd try to avoid mocks here and inject the Google Drive API URL into the component (you may use a default URL) and use the WithEndpoint (or WithHTTPClient? I don't know if that would work) option to make requests against a local fake http server instead of the public GDrive API. Such strategy can be found in e.g. the openai package. It takes more effort as you need to manually define the HTTP handler(s) but the coverage is total.
I also thought about mock server solution recently. But, I actually don't like this way. My "feelings" are like -
- It can increase the coverage, but are those parts covered by this solution meaningful?
- In terms of integration test, it "could" be meaningful. But, for unit testing for business logic (product logic), is it meaningful?
- I took a look on some samples like
openai
, somehow it makes the test code look messy.- When we read the test code, we cannot easily check the external I/O, which helps the developer quickly pick up the product knowledge in test code. With the solution like mocking interface, we can easily know how internal component interact with external service.
- Ideally, I hope test code can make the developer know the structure of the code. So, it means it'd be better to make test cases MECE.
In terms of integration test, it "could" be meaningful.
So, by this feeling, it means I also agree with adding the test code to googledriveclient
, which I will move it under the googledrive
.
Ideally, I should have added this client test to increase the test coverage.
But, in schedule wise, I usually don't add the client test code when there are other tasks in the team. The reasons are
- The client code is usually covered by end-to-end test.
- Though end-to-end test is hard to maintain in general. However, usually the stability of external parts are not controllable. So, even we add the mock part of the client code, it also cannot "protect" the product.
So, I just feel it has a very little value to add the test code to the client part.
In general, I think my point is what coverage is the real important parts we want to focus on.
In my previous company, we gradually move our focus from integration test to business logic testing, which is unit test.
Here, I am still thinking which tests are more important in terms of the project like VDP/ open-source / ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we read the test code, we cannot easily check the external I/O, which helps the developer quickly pick up the product knowledge in test code. With the solution like mocking interface, we can easily know how internal component interact with external service.
I don't think that's true, at least for components like openai
, which interact with the external service directly through an HTTP client. That is what we fake, so the expected outcome of a component execution is a call to the OpenAI API with the provided characteristics (URL, headers, body, etc.). We could work on making the code more readable but if we introduce an interface in the middle, we're just creating an artificial boudary that only means "this is the furthest we'll test".
I get your argument for components like this or Slack, where use use an SDK. The outcome in the execution here can be seen as a call to the SDK, we don't need to worry about how that package works and we can even trust it's tested properly. Therefore, we may set a boundary there and consider our business logic "everything that happens before we call the SDK". I understand that point but I don't share that view, to me the outcome is still the API call. In the openapi
example, if tomorrow we adopt an OpenAI Go SDK, we don't need to change our tests. If we use an SDK through an interface and we test it with a mock, if we switch SDKs tomorrow we'll need to change our tests. Also, faking the API we're targeting might be useful in the future to implement end-to-end tests, making requests to the public Instill AI API and checking the calls we make to faked external services (OpenAI, GitHub, Slack...).
In general, I think my point is what coverage is the real important parts we want to focus on.
In my previous company, we gradually move our focus from integration test to business logic testing, which is unit test.
I'm a bit skeptical about mocks as I've seen many times how they made it easy to codify a wrong assumption into the expectations and then have a green CI but business logic that is flawed in production. That's why in the repository
package the tests run against a test database instead of a SQL mock. My experience is the opposite, investing more in service / integration tests to cover the business logic, while keeping unit tests for more fine-grained cases like error handling or how input values are transformed to calls to the dependencies injected to a package. I don't mean that in my experience I haven't cared about unit tests, they've always been the part with the most weight in the testing codebase, but when it comes to increasing confidence (not coverage) they were not enough.
In short, I don't love mocking the SDK packages but I won't consider it a blocker for component development, as long as the code structure and the dependencies are clean. In the future I'd like to unify our testing approaches (and, if the approach is faking HTTP servers, move towards a component testing framework that's easy to use and understand by devs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if we introduce an interface in the middle, we're just creating an artificial boudary that only means "this is the furthest we'll test".
In the openapi example, if tomorrow we adopt an OpenAI Go SDK, we don't need to change our tests. If we use an SDK through an interface and we test it with a mock, if we switch SDKs tomorrow we'll need to change our tests.
Yeah, I think it is a good discussion point. I also have been thinking about it for a long time.
Previously, I did think that when we change SDK, we should also change our tests.
But, I am also doubt about if it is meaningful.
It is just like we test because we want to pass the test not because we want to protect the product.
However, on the other hand, in my previous POV, it somehow helps developers (at least me) to understand the code more. (but, not critical part actually. Mainly, I still read the production code to understand the structure.)
I don't have an answer now.
Also, faking the API we're targeting might be useful in the future to implement end-to-end tests, making requests to the public Instill AI API and checking the calls we make to faked external services (OpenAI, GitHub, Slack...).
Yeah, it is a good point. When it really happens, we should care about the time length of running CI.
Making real request is just much more time-consuming. I don't think we want to wait for a long time to pass CI.
I'm a bit skeptical about mocks as I've seen many times how they made it easy to codify a wrong assumption into the expectations and then have a green CI but business logic that is flawed in production.
That's why in the repository package the tests run against a test database instead of a SQL mock.
Yeah, I understand it. I also care about it.
But, mainly, I think the way to solve it is to check the input of mock service.
In the future I'd like to unify our testing approaches (and, if the approach is faking HTTP servers, move towards a component testing framework that's easy to use and understand by devs).
I can follow faking HTTP servers in the future development.
And, I will still keep this topic in my mind. For me, I don't have a strong belief now.
Really appreciate you share your view with me! Learn a lot! ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the contribution guideline for testing part.
077ed8b
to
bd427e4
Compare
Will do end-to-end test again when other services are ready. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #725 +/- ##
==========================================
- Coverage 19.70% 19.47% -0.23%
==========================================
Files 321 334 +13
Lines 64847 66219 +1372
==========================================
+ Hits 12776 12897 +121
- Misses 50162 51389 +1227
- Partials 1909 1933 +24
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
chore: update contribution guideline for unit test
7ad113b
to
2ebb997
Compare
🤖 I have created a release *beep* *boop* --- ## [0.46.0-beta](v0.45.2-beta...v0.46.0-beta) (2024-11-05) ### Features * add `condition` field support for iterator ([#803](#803)) ([04b1252](04b1252)) * add markdowns per pages ([#792](#792)) ([3ee428e](3ee428e)) * add resolution field ([#808](#808)) ([f15f6bf](f15f6bf)) * add task sync ([#793](#793)) ([41a1eeb](41a1eeb)) * **component,audio:** add TASK_DETECT_ACTIVITY and TASK_SEGMENT ([#762](#762)) ([9e92a31](9e92a31)) * **component,http:** refactor `restapi` component to `http` component ([#797](#797)) ([c2b1862](c2b1862)) * **component:** add error handling for missing conversation ([#806](#806)) ([54cc616](54cc616)) * **component:** inject global secrets as environment variables ([#786](#786)) ([8d842a6](8d842a6)) * convert time type to string ([#809](#809)) ([7de8465](7de8465)) * **googledrive:** add the tasks for google drive ([#725](#725)) ([b6fe968](b6fe968)) * **integration:** identify supported OAuth integrations through global secrets ([#791](#791)) ([5a96453](5a96453)) * **minio:** import updated minio package and add tag on file upload ([#779](#779)) ([ef86318](ef86318)) * revamp Instill Format ([#774](#774)) ([24153e2](24153e2)) * support `length` attribute for array data ([#810](#810)) ([fb4f4f7](fb4f4f7)) * **web:** refactor the web operator ([#772](#772)) ([ae4e3c2](ae4e3c2)) ### Bug Fixes * **component,image:** fix missing show score draw ([#801](#801)) ([a405bf7](a405bf7)) * fix bug not to return error if there is no app or conversation ([#816](#816)) ([a946cfd](a946cfd)) * fix iterator upstream check ([#794](#794)) ([671971f](671971f)) * **run:** add metadata retention handler ([#800](#800)) ([25ec0c2](25ec0c2)) * **run:** add namespace id in response ([#811](#811)) ([8d29ffb](8d29ffb)) * **run:** rename pipeline run columns and fix tests ([#776](#776)) ([98f1e00](98f1e00)) * **slack:** correct link to OAuth config in documentation ([#805](#805)) ([aa0752d](aa0752d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Because
This commit