Handling column not present in data #22
Comments
Maybe we have to handle that null in that method. Do you want me to make that fix, or will you be able to send a PR?
Hi @sanjuthomas Thanks for your quick reply. Unfortunately, I am not a Java guy (:python:), so I would ask you to make the change if you can. In the meantime, I am testing the latest tag to see how it behaves. I would be very grateful if you could help out here (assuming this takes insignificant time on your end?). Kind regards
OK, I am working on it: https://github.com/sanjuthomas/kafka-connect-gcp-bigtable/pulls I will release the next version with that fix. Thanks!
Hi @sanjuthomas Thanks for this. While testing (not 1.0.10 but 1.0.7), I have noticed some weird behaviour and wonder if you have come across this. So, using the same example above, where my column family is
randomly, the data that gets written to Bigtable is missing attributes in some cases. So for example
after a while, another message comes with the same key
and in this case, randomly, the age in the second record is missing. Have you come across any such issue? It feels to me that something is going wrong in the transform, maybe 🤔?
The behaviour is random in nature: for the same messages, when I wrote them into another table, a different row_key had missing data.
I have not seen any issues like that, nor has anyone reported it. It's a simple transformation from a SinkRecord to a Bigtable row. Bigtable maintains all versions of data, as far as I can remember. Can you check and confirm that the data elements are produced correctly by the source?
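For readers following along: the transform step being discussed roughly amounts to walking the fields of the record value and turning each one into a Bigtable cell. The sketch below is purely illustrative, not the connector's actual code; the "data" column family name and the use of a Kafka Connect Struct value are assumptions.

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;

public class TransformSketch {

  // Illustrative only: map every field of the record's Struct value to a cell
  // in a single, assumed column family called "data".
  static RowMutationEntry toEntry(String rowKey, Struct value) {
    RowMutationEntry entry = RowMutationEntry.create(rowKey);
    for (Field field : value.schema().fields()) {
      // A field that is absent in the data comes back as null here, which is
      // exactly the "column not present" case this issue is about.
      Object cell = value.get(field);
      entry.setCell("data", field.name(), String.valueOf(cell));
    }
    return entry;
  }
}
```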
Yes, I can confirm that all elements of the data are produced correctly: I can see them in the topic, but by the time they reach Bigtable, some are randomly missing. Just FYI, my message is a JSON with 18 attributes, each attribute being either a string or something like a nested JSON. Apart from the randomly missing attributes, I haven't seen any other issue. I wonder whether the transform is somehow returning the row a bit too quickly, or... 🤔 is there some multithreading in place where the transform acts on a message more than once? Sorry, just throwing out my thoughts here; maybe this is the wrong direction.
First, I suggest you upgrade to the latest version. It's a shared-nothing design, so I don't see how threads could overwrite data.
Cool, I will give this a go... just to confirm, the latest tag is
Yes.
Please let me know. If there is a bug, I am happy to fix it.
Thanks @sanjuthomas
Did you find any pattern about the missing data element? Any specific data type?
I have added some more debug logging and released a new version: https://github.com/sanjuthomas/kafka-connect-gcp-bigtable/releases/tag/kafka-connect-gcp-bigtable-1.0.12 Can you deploy that version and see what is logged here: https://github.com/sanjuthomas/kafka-connect-gcp-bigtable/blob/master/src/main/java/com/sanjuthomas/gcp/bigtable/writer/BigtableWriter.java#L146 I will test it myself tomorrow if we can't find the reason. I would need your table definition, though.
The only pattern I have found so far is that the issue does not happen when each record has a unique key. However, I am just getting on to looking into the ☝️ suggestion and trying the new version out. Once I have tested it, I'll update you. Thanks for the quick turnaround here.
If you are building locally, you need to use Java 11.
"The only pattern I have found so far is that the issue does not happen when each record has a unique key." -- Does that mean that when you send the first version of the record, every cell is saved, but when you send the second version of the same record, one or more cells are not saved?
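One way to answer that question concretely is to read the suspect row back with the Bigtable Java client and list every stored cell, including older versions. Below is a minimal sketch along those lines; the project, instance, table, and row key are placeholders, and the null check is there because readRow returns null when the row does not exist.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Row;
import com.google.cloud.bigtable.data.v2.models.RowCell;

public class ReadVersionsSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder project/instance/table/row-key values.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      Row row = client.readRow("my-table", "row-key-1");
      if (row == null) {
        System.out.println("row not found");
        return;
      }
      // Print every cell Bigtable has kept for this row, across all versions.
      for (RowCell cell : row.getCells()) {
        System.out.printf("%s:%s @%d = %s%n",
            cell.getFamily(), cell.getQualifier().toStringUtf8(),
            cell.getTimestamp(), cell.getValue().toStringUtf8());
      }
    }
  }
}
```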
I can't see the offset info in the metadata, and the record is fairly big, so I can't confirm that, but I will check. Given that each unique key works fine, my guess is that the problem occurs when a new record is written to the same row-key.
Should I use the jar or the shaded jar?
shaded.jar (kafka-connect-gcp-bigtable-1.0.12-shaded.jar)
Let me take a look. It looks like some library upgrade caused this. Let me also test the use case you mentioned.
I have fixed the jar signing issue. The next version will show up in Maven in a couple of hours. I have tested the use case that you mentioned, same row key but with different values for the cells, and I have not seen any issues. The only thing I can think of is the Java version you are using for the connector; I have tested it with Java 11. Can you confirm the Java version that you are using for the connector? By the way, I have a utility program that you can use to test the same use case you mentioned here - You can run it again and again against the instance -> table that you have and see if you can reproduce the issue. I suggest you use Java 11.
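The utility program mentioned above is not linked in this thread, so as a stand-in, this is roughly what such a test could look like with the plain Bigtable Java client: write two versions of the same row key one after the other, then check the table. All names below are placeholders; this is not the author's utility.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;

public class SameRowKeySketch {
  public static void main(String[] args) throws Exception {
    // Placeholder project/instance/table/family names.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // First version of the record.
      client.mutateRow(RowMutation.create("my-table", "user-123")
          .setCell("data", "name", "alice")
          .setCell("data", "age", "30"));
      // Second version with the same row key but a changed cell value.
      client.mutateRow(RowMutation.create("my-table", "user-123")
          .setCell("data", "name", "alice")
          .setCell("data", "age", "31"));
    }
  }
}
```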
@ee07dazn If you are in a hurry, you can pick it up from https://tmpfiles.org/dl/313349/kafka-connect-gcp-bigtable-1.0.13-shaded.jar. Please note that it will be available only for an hour.
Thanks for this @sanjuthomas
I will need to do a few checks, but I will test it out tomorrow morning and update you.
I tested the same use case using the connector and could not reproduce the issue. If you suspect that is the issue, you could use the configuration bulkMutateRowsMaxSize: 1 to write one record at a time and see. What is your bulkMutateRowsMaxSize now?
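For reference, the setting would sit in the sink's per-topic configuration roughly as sketched below. Only the key name bulkMutateRowsMaxSize and the value 1 come from this thread; the placement and the comments are assumptions, and the rest of your configuration stays as it is.

```yaml
# Assumed placement in the sink's per-topic config; only the key and value
# come from this thread.
bulkMutateRowsMaxSize: 1   # write one record per mutate call instead of batching (default mentioned above: 1024)
```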
Apologies for the slight delay. Initially my bulkMutateRowsMaxSize was the default of 1024; changing this to 1 actually seems to have sorted the issue, and I think that logic makes sense, doesn't it? I do think I need to test a bit more to be 100% sure, but the initial results suggest the same. Did you check the same thing on your end? For example, two records with the same row-key and the same value as part of the same batch? In regards to the new 1.0.13, it also had some issues, basically crashing the pod. I will paste the output here soon. But I could test the above on 1.0.7
"For example, two records with the same row-key and the same value as part of the same batch?" - I tested this last week. I can test it once again. "In regards to the new 1.0.13, it also had some issues, basically crashing the pod." - I tested the connectors in stand-alone mode using Java 11 before I published the .13 version. Can you tell me your Java version and other configs? Do you have a stack trace or heap dump for the crash?
|
Looking at the error message, I believe you are using OpenJDK 11.0.15+10 (https://mail.openjdk.org/pipermail/jdk-updates-dev/2022-April/014104.html). It appears to me that some C libraries are crashing.
|
Any thoughts?
I haven't had enough time to test the JVM that you are using. I will keep you posted.
Are there any side effects of using bulkMutateRowsMaxSize = 1, especially when the rate of messages coming into the topic is not that high?
I don't see any side effects due to that. I will get some time this weekend to test out the JVM version and the issue you mentioned.
Do you have a public Docker image of the OS and JVM?
https://hub.docker.com/r/ee07dazn/kafkaconnect-connectors has the 1.0.10 jar installed in
Hi @sanjuthomas
Does the transform handle a column that is not present in the map?
Say, for example, my config looks like
And the test data looks like
Will the transform you have baked in automatically set it to null, or will it throw an error? I guess I will find out pretty soon, but if it throws an error, is there a good way to handle it?
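For what it's worth, the "handle that null in that method" fix discussed at the top of the thread could look roughly like the sketch below: skip any configured qualifier that is missing or null in the incoming record instead of throwing. This is a hypothetical illustration, not the connector's actual code; the "data" family name and the qualifier list are assumed.

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;
import java.util.List;
import java.util.Map;

public class MissingColumnSketch {

  // Illustrative only: build cells for the configured qualifiers, skipping
  // any qualifier that is absent (or null) in the record instead of failing.
  static RowMutationEntry toEntry(String rowKey, Map<String, Object> record,
                                  List<String> configuredQualifiers) {
    RowMutationEntry entry = RowMutationEntry.create(rowKey);
    for (String qualifier : configuredQualifiers) {
      Object value = record.get(qualifier);
      if (value == null) {
        continue; // column not present in the data: skip rather than throw
      }
      entry.setCell("data", qualifier, value.toString());
    }
    return entry;
  }
}
```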