-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-1869] Create instrumented orc writer #3732
[GOBBLIN-1869] Create instrumented orc writer #3732
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3732 +/- ##
============================================
+ Coverage 46.95% 47.07% +0.11%
- Complexity 10811 10865 +54
============================================
Files 2143 2147 +4
Lines 84557 84819 +262
Branches 9390 9410 +20
============================================
+ Hits 39704 39928 +224
- Misses 41235 41269 +34
- Partials 3618 3622 +4
... and 38 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
} | ||
|
||
GobblinTrackingEvent createOrcWriterMetadataEvent() throws IOException { | ||
GobblinEventBuilder builder = new GobblinEventBuilder(ORC_WRITER_METRICS_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to include the name space in this event? Also any specific reason that we don't use eventSubmitter but directly call metricContext to submit event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace helps with filtering the events since we have so many GobblinTrackingEvent types now.
Uh MetricContext not particularly a strong reason but I used it for uniformity, the InstrumentedDataWriter
classes use the metric Context (albeit it is recording metrics not submitting events). But it also helps that it adds metadata tags from the metric context at times compared to the event submitter, not that I think they're being used meaningfully right now. It might also come in handy if we want to record metrics/timers using this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to this line and think that you should set namespace. https://github.com/apache/gobblin/blob/master/gobblin-metrics-libs/gobblin-metrics-base/src/main/java/org/apache/gobblin/metrics/event/EventSubmitter.java#L97
And evenSubmitter should be even easier for you to set metadata for only the event (not other metrics). But I'm fine if you want to use metric context for consistency, just make sure we have namespace set there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
* Creates an instrumented ORC Writer that emits events on commit and close * Adds ORC writer metrics to track high record conversion metrics * Fix add final to config strings * Add namespace and use event submitter to send events
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Currently the metrics for streaming mainly relate to the Kafka consumption side. We want to also capture metrics through events from the ORCWriter to capture the memory usage and internal performance of the writer.
This PR captures the following information:
Logs of the fields looks like the below:
Tests
Commits