Import data from 2.x kafka topic into 3.x apicurio. #5669
It's true that the upgrade path requires export/import, although notably it's just a one-time operation. It can also be automated via the REST API if that's helpful. It would certainly be more convenient for a V3 server to simply automatically upgrade data from a V2 data source. While that is certainly possible to do, it is not trivial. The structure of the messages on the Kafka topic for Registry v2 is quite different from what we have in v3. In v3 we have completely rewritten our KafkaSQL storage implementation to make it more robust and easier to maintain. I don't have a great sense for exactly how much work this would be. Maybe someone else from @Apicurio/developers has thought about it already.
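For reference, the REST API automation mentioned above could look something like the minimal Java sketch below: it streams a v2 export straight into a v3 import over HTTP. The endpoint paths and host names are assumptions, so verify them against the admin API docs for your exact versions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExportImportMigration {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Export all data from the v2 instance as a .zip (path assumed).
        HttpRequest export = HttpRequest.newBuilder()
                .uri(URI.create("http://registry-v2:8080/apis/registry/v2/admin/export"))
                .GET()
                .build();
        HttpResponse<byte[]> exported =
                client.send(export, HttpResponse.BodyHandlers.ofByteArray());

        // 2. Import the .zip into the v3 instance, which upgrades the v2
        //    data as part of the import (path assumed).
        HttpRequest importReq = HttpRequest.newBuilder()
                .uri(URI.create("http://registry-v3:8080/apis/registry/v3/admin/import"))
                .header("Content-Type", "application/zip")
                .POST(HttpRequest.BodyPublishers.ofByteArray(exported.body()))
                .build();
        HttpResponse<String> imported =
                client.send(importReq, HttpResponse.BodyHandlers.ofString());
        System.out.println("Import responded with status " + imported.statusCode());
    }
}
```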
@EricWittmann thanks for the quick reply! It'd be great if we could get some input from someone with more experience in that area. Apicurio v3 is already converting from v2 to v3 data for the /import operation; I was thinking there could be a way of extending some of that code. From what I can see, SqlDataUpgrader has a few…
That's true; the problem is that the structure of the messages on the Kafka topic is not that of v2 "entities", so there's no 1-1 mapping between the messages on the topic and what the v3 data importer/upgrader needs. The work that would need to be done is something that could connect to the v2 topic and convert what it finds there into a stream of v2 entities.

The messages on the topic really follow a journal pattern (sort of). What's in there more closely resembles a stream of temporally sequential changes to a database than the results of those changes. As a result, what would need to happen is that we would probably need to read in all of the journal messages from the topic, use those messages to materialize the v2 database in memory, and then export the information from that database into a stream of entities (essentially the contents of the .zip file). We might be able to skip the actual writing of the content to a .zip file, but only if we are careful about how we stream the data from the in-memory representation to the stream of v2 entities.

This is all possible, but non-trivial (which is why we went with the export/import migration path). To be clear, I would be very happy to have this functionality! I just want to clarify the scope. :) I know it's a bit disappointing, since previous (non-major) version changes in Registry have been automatic. We've considered writing a CLI to help with the migration, but all it would do is use the export/import REST API operations for you. After that we could build the functionality into the Operator, which is obviously only useful for those deploying on k8s/openshift.
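To make the "materialize then export" idea concrete, here is a minimal sketch: it replays a journal topic into an in-memory map and then walks that map as if it were the v2 database. The topic name and the simple upsert/tombstone semantics are simplifying assumptions; the real v2 KafkaSQL messages encode richer operations than plain key/value writes.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class V2JournalMaterializer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "v2-migration-" + System.nanoTime());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Materialized view of the journal: last write per key wins.
        Map<String, String> materialized = new LinkedHashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("kafkasql-journal")); // assumed v2 topic name
            int idlePolls = 0;
            while (idlePolls < 5) { // stop once the topic has gone quiet
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    idlePolls++;
                    continue;
                }
                idlePolls = 0;
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) {
                        materialized.remove(record.key()); // tombstone = delete
                    } else {
                        materialized.put(record.key(), record.value());
                    }
                }
            }
        }

        // At this point the map plays the role of the in-memory v2 database.
        // A real implementation would now emit v2 entities from it and feed
        // them through the v3 upgrade path instead of printing.
        materialized.forEach((key, value) ->
                System.out.println("entity " + key + " -> " + value.length() + " chars"));
    }
}
```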
Note: in v3 we are still using a journalling approach, it's just a differently formatted journal. And unfortunately it's not really feasible to simply convert from the old format to the new. Now that we're thinking about this, maybe the most feasible solution for automation would be to have the v3 KafkaSQL storage implementation know how to consume v2 journal messages. I don't know offhand how possible that is, but it might be....
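If that route were explored, the decoding layer might branch on the message format, along these lines. Everything in this sketch is hypothetical: the envelope check, the JournalAction type, and the class names do not correspond to actual Registry code.

```java
public class DualFormatJournalDecoder {

    // Hypothetical placeholder for "something the storage layer can apply".
    interface JournalAction {
        void applyTo(Object storage);
    }

    JournalAction decode(byte[] key, byte[] value) {
        if (looksLikeV2Envelope(value)) {
            // Translate the legacy v2 message into an equivalent v3 action.
            return decodeV2(key, value);
        }
        return decodeV3(key, value); // normal v3 path
    }

    private boolean looksLikeV2Envelope(byte[] value) {
        // A real implementation would need a reliable discriminator (a magic
        // byte, a record header, or a schema id); this first-byte check is
        // only illustrative.
        return value != null && value.length > 0 && value[0] == (byte) 0x02;
    }

    private JournalAction decodeV2(byte[] key, byte[] value) {
        throw new UnsupportedOperationException("sketch only");
    }

    private JournalAction decodeV3(byte[] key, byte[] value) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```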
And I suppose if we did that, the next thing someone will ask is if we can have DB patch scripts to automatically upgrade a v2 SQL db to a v3 SQL db. :) Another thing that's probably possible, but maybe not trivial.
I didn't suggest this approach because I thought it would be a bigger change than a simple one-off translation of the data. I also thought supporting v2 data in 3.x would negate the advantages of the new v3 structure? About the import/export without the .zip, I'm happy to try making a small proof of concept. I haven't dug into the codebase enough yet, but maybe there's a chance of pulling the exporter from v2 into v3 and plugging it into the v3 import, skipping the .zip step? Not sure how ingrained the v2 exporter is in the codebase, but if pulling it out is feasible that could be an option too.
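One way to skip the intermediate .zip file, assuming the v2 exporter and v3 importer could be adapted to plain streams, is to connect them with an in-memory pipe. The Exporter/Importer interfaces below are hypothetical stand-ins for the real classes, and the bytes flowing through the pipe would still need to be in whatever format the importer expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedMigration {

    // Hypothetical adapters around the real v2 exporter / v3 importer.
    interface Exporter { void exportTo(OutputStream out) throws IOException; }
    interface Importer { void importFrom(InputStream in) throws IOException; }

    static void migrate(Exporter exporter, Importer importer) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        // The exporter writes on its own thread so the importer can read
        // concurrently; nothing ever touches the filesystem.
        Thread exportThread = new Thread(() -> {
            try (out) {
                exporter.exportTo(out);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        exportThread.start();

        try (in) {
            importer.importFrom(in);
        }
        exportThread.join();
    }
}
```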
I hate to discourage contributions, because if you were to come up with a good approach we'd be thrilled to have it. I suspect that it will be difficult, but I'm happy to provide support if you're game to try a POC. :)
Note: if you do want to proceed with a POC, I recommend hitting me up on our Zulip channel when asking questions. It will cut down on GitHub noise and, more importantly, you are much more likely to get a timely response. (not guaranteed a timely response, mind you... :) :) )
Feature or Problem Description
The current upgrade path from Apicurio v2 to v3 requires exporting/importing using a .zip file, which involves either manual handling or additional moving parts.
Proposed Solution
Add a way to convert from v2 to v3 data directly from Apicurio v3, without the need to export/import. I'm interested in the KafkaSQL storage: would it be possible to get an operation that converts data from an input _v2 topic to a _v3 output topic?
Looking at the code, it should be possible to extend the import API to read from a topic instead of a .zip file. This would then save the data in v3 format in another Kafka topic, or in whatever storage has been configured for the instance.
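A rough sketch of what such an operation could look like as a JAX-RS resource follows. The path, class, and parameter names are all illustrative assumptions, not an existing Registry API.

```java
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.Response;

@Path("/admin/import")
public class TopicImportResource {

    @POST
    @Path("/fromTopic")
    public Response importFromTopic(@QueryParam("bootstrapServers") String bootstrapServers,
                                    @QueryParam("topic") String v2Topic) {
        // 1. Replay the v2 journal topic and materialize it in memory.
        // 2. Convert the materialized state into a stream of v2 entities.
        // 3. Feed those entities through the existing v2->v3 upgrade path
        //    (the same code the .zip import uses), writing to whatever
        //    storage this instance is configured with.
        return Response.status(Response.Status.NOT_IMPLEMENTED)
                .entity("sketch only -- see discussion above")
                .build();
    }
}
```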
Originally posted by @Pablosko98 in #5624