
added support for deaggregation for aggregated records #62

Open · wants to merge 2 commits into master

Conversation


@psanzay commented Feb 18, 2021

This PR adds support for deaggregating records when the stream contains aggregated records.

```diff
@@ -63,7 +64,10 @@ func getRecords(k kinesisiface.KinesisAPI, iterator string) (records []*kinesis.
 		return nil, "", 0, err
 	}

-	records = output.Records
+	records, err = deaggregator.DeaggregateRecords(output.Records)
+	if err != nil {
```
@garethlewin (Contributor) commented on this diff:
Not sure about swallowing this error. What are the possible errors from deaggregator?

@psanzay (Author) replied Feb 19, 2021:

@garethlewin deaggregator returns an error if it fails to unmarshal an aggregated record:

```go
err := proto.Unmarshal(messageData, aggRecord)
if err != nil {
	return nil, err
}
```

Should we instead return the actual payload without deaggregation on such an error? The records may have been aggregated with custom logic rather than Amazon's aggregation format, and in those scenarios we should return the records as pushed; it should be up to the user to deaggregate them.
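For context, a minimal sketch of that proposed fallback, assuming the deaggregator package from github.com/awslabs/kinesis-aggregation/go; the helper name deaggregateOrPassThrough is hypothetical and not part of this PR:

```go
package example

import (
	"github.com/aws/aws-sdk-go/service/kinesis"
	"github.com/awslabs/kinesis-aggregation/go/deaggregator"
)

// deaggregateOrPassThrough illustrates the proposed fallback: if
// deaggregation fails (e.g. the records were aggregated with custom
// logic rather than the KPL format), return the records exactly as
// they came off the shard and leave deaggregation to the caller.
func deaggregateOrPassThrough(raw []*kinesis.Record) []*kinesis.Record {
	deagg, err := deaggregator.DeaggregateRecords(raw)
	if err != nil {
		// Not KPL-aggregated (or corrupt): hand back the original records.
		return raw
	}
	return deagg
}
```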

@psanzay requested a review from garethlewin on February 26, 2021
@garethlewin (Contributor) commented:

Hi, sorry, I haven't been ignoring this; I'm just in a bit of analysis paralysis here.

This change would make #49 very difficult (or, more accurately, #49 makes this change more difficult). I am also really not sure how to handle error cases.

As I see it there are 3 options, and I dislike all 3:

A) On error, just send in the entire blob. This means clients now have to anticipate this happening and deal with the situation, which means they have to be aware of deaggregation.

B) On error, swallow the record. This means data will be dropped, which seems very bad.

C) On error, return an error from kinsumer. The problem with this is that a checkpoint won't be created (or we are basically back to option B), and thus kinsumer will never be able to handle that shard again until the record expires off it. A sketch of this option follows below.
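For reference, a minimal sketch of option C inside getRecords, reusing the error return shown in the diff above (an illustration of the trade-off, not a proposed fix):

```go
// Deaggregate and propagate any failure to the caller. Because the
// error aborts getRecords before a checkpoint is written, the shard
// stays stuck until the offending record expires.
records, err = deaggregator.DeaggregateRecords(output.Records)
if err != nil {
	return nil, "", 0, err
}
```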

I am wondering what the benefits of implicit deaggregation are here vs. having clients do it on their side (which is what we do at Twitch, though we use our own aggregation method and not the one that KCL supplies).

Created pull request template to comply with SOC2.