Import data from 2.x kafka topic into 3.x apicurio. #5669
It's true that the upgrade path requires export/import, although notably it's just a one-time operation. It can also be automated via the REST API if that's helpful. It would certainly be more convenient for a V3 server to simply automatically upgrade data from a V2 data source. While that is certainly possible to do, it is not trivial. The structure of the messages on the Kafka topic for Registry v2 is quite different from what we have in v3. In v3 we have completely rewritten our KafkaSQL storage implementation to make it more robust and easier to maintain. I don't have a great sense for exactly how much work this would be. Maybe someone else from @Apicurio/developers has thought about it already.
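For reference, the REST API automation mentioned above could look something like the minimal Java sketch below: it streams a v2 export straight into a v3 import over HTTP. The endpoint paths and host names are assumptions, so verify them against the admin API docs for your exact versions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExportImportMigration {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Export all data from the v2 instance as a .zip (path assumed).
        HttpRequest export = HttpRequest.newBuilder()
                .uri(URI.create("http://registry-v2:8080/apis/registry/v2/admin/export"))
                .GET()
                .build();
        HttpResponse<byte[]> exported =
                client.send(export, HttpResponse.BodyHandlers.ofByteArray());

        // 2. Import the .zip into the v3 instance, which upgrades the v2
        //    data as part of the import (path assumed).
        HttpRequest importReq = HttpRequest.newBuilder()
                .uri(URI.create("http://registry-v3:8080/apis/registry/v3/admin/import"))
                .header("Content-Type", "application/zip")
                .POST(HttpRequest.BodyPublishers.ofByteArray(exported.body()))
                .build();
        HttpResponse<String> imported =
                client.send(importReq, HttpResponse.BodyHandlers.ofString());
        System.out.println("Import responded with status " + imported.statusCode());
    }
}
```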
@EricWittmann thanks for the quick reply! It'd be great if we could get some input from someone with more experience in that area. Apicurio v3 is already converting from v2 to v3 data for the /import operation; I was thinking there could be a way of extending some of that code. From what I can see, SqlDataUpgrader has a few…
That's true; the problem is that the structure of the messages on the Kafka topic is not that of v2 "entities", so there's no 1-1 mapping between the messages on the topic and what the v3 data importer/upgrader needs. The work that would need to be done is something that could connect to the v2 topic and convert what it finds there into a stream of v2 entities.

The messages on the topic really follow a journal pattern (sort of). What's in there more closely resembles a stream of temporally sequential changes to a database than the results of those changes. As a result, what would need to happen is that we would probably need to read in all of the journal messages from the topic, use those messages to materialize the v2 database in memory, and then export the information from that database into a stream of entities (essentially the contents of the .zip file). We might be able to skip the actual writing of the content to a .zip file, but only if we are careful about how we stream the data from the in-memory representation to the stream of v2 entities.

This is all possible, but non-trivial (which is why we went with the export/import migration path). To be clear, I would be very happy to have this functionality! I just want to clarify the scope. :) I know it's a bit disappointing, since previous (non-major) version changes in Registry have been automatic. We've considered writing a CLI to help with the migration, but all it would do is use the export/import REST API operations for you. After that we could build the functionality into the Operator, which is obviously only useful for those deploying on k8s/openshift.
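To make the "materialize then export" idea concrete, here is a minimal sketch: it replays a journal topic into an in-memory map and then walks that map as if it were the v2 database. The topic name and the simple upsert/tombstone semantics are simplifying assumptions; the real v2 KafkaSQL messages encode richer operations than plain key/value writes.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class V2JournalMaterializer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "v2-migration-" + System.nanoTime());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Materialized view of the journal: last write per key wins.
        Map<String, String> materialized = new LinkedHashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("kafkasql-journal")); // assumed v2 topic name
            int idlePolls = 0;
            while (idlePolls < 5) { // stop once the topic has gone quiet
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    idlePolls++;
                    continue;
                }
                idlePolls = 0;
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) {
                        materialized.remove(record.key()); // tombstone = delete
                    } else {
                        materialized.put(record.key(), record.value());
                    }
                }
            }
        }

        // At this point the map plays the role of the in-memory v2 database.
        // A real implementation would now emit v2 entities from it and feed
        // them through the v3 upgrade path instead of printing.
        materialized.forEach((key, value) ->
                System.out.println("entity " + key + " -> " + value.length() + " chars"));
    }
}
```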
Note: in v3 we are still using a journalling approach, it's just a differently formatted journal. And unfortunately it's not really feasible to simply convert from the old format to the new. Now that we're thinking about this, maybe the most feasible solution for automation would be to have the v3 KafkaSQL storage implementation know how to consume v2 journal messages. I don't know offhand how possible that is, but it might be....
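If that route were explored, the decoding layer might branch on the message format, along these lines. Everything in this sketch is hypothetical: the envelope check, the JournalAction type, and the class names do not correspond to actual Registry code.

```java
public class DualFormatJournalDecoder {

    // Hypothetical placeholder for "something the storage layer can apply".
    interface JournalAction {
        void applyTo(Object storage);
    }

    JournalAction decode(byte[] key, byte[] value) {
        if (looksLikeV2Envelope(value)) {
            // Translate the legacy v2 message into an equivalent v3 action.
            return decodeV2(key, value);
        }
        return decodeV3(key, value); // normal v3 path
    }

    private boolean looksLikeV2Envelope(byte[] value) {
        // A real implementation would need a reliable discriminator (a magic
        // byte, a record header, or a schema id); this first-byte check is
        // only illustrative.
        return value != null && value.length > 0 && value[0] == (byte) 0x02;
    }

    private JournalAction decodeV2(byte[] key, byte[] value) {
        throw new UnsupportedOperationException("sketch only");
    }

    private JournalAction decodeV3(byte[] key, byte[] value) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```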
And I suppose if we did that, the next thing someone will ask is if we can have DB patch scripts to automatically upgrade a v2 SQL db to a v3 SQL db. :) Another thing that's probably possible, but maybe not trivial.
I didn't suggest this approach because I thought it would be a bigger change than a simple one-off translation of the data. I also thought supporting v2 data in 3.x would negate the advantages of the new v3 structure? About the import/export without the .zip, I'm happy to try making a small proof of concept. I haven't dug into the codebase enough yet, but maybe there's a chance of pulling the exporter from v2 into v3 and plugging it into the v3 import, skipping the .zip step? Not sure how ingrained the v2 exporter is in the codebase, but if pulling it out is feasible that could be an option too.
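One way to skip the intermediate .zip file, assuming the v2 exporter and v3 importer could be adapted to plain streams, is to connect them with an in-memory pipe. The Exporter/Importer interfaces below are hypothetical stand-ins for the real classes, and the bytes flowing through the pipe would still need to be in whatever format the importer expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedMigration {

    // Hypothetical adapters around the real v2 exporter / v3 importer.
    interface Exporter { void exportTo(OutputStream out) throws IOException; }
    interface Importer { void importFrom(InputStream in) throws IOException; }

    static void migrate(Exporter exporter, Importer importer) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        // The exporter writes on its own thread so the importer can read
        // concurrently; nothing ever touches the filesystem.
        Thread exportThread = new Thread(() -> {
            try (out) {
                exporter.exportTo(out);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        exportThread.start();

        try (in) {
            importer.importFrom(in);
        }
        exportThread.join();
    }
}
```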
I hate to discourage contributions, because if you were to come up with a good approach we'd be thrilled to have it. I suspect that it will be difficult, but I'm happy to provide support if you're game to try a POC. :)
Note: if you do want to proceed with a POC, I recommend hitting me up on our Zulip channel when asking questions. It will cut down on GitHub noise and, more importantly, you are much more likely to get a timely response. (not guaranteed a timely response, mind you... :) :) )
Feature or Problem Description
The current upgrade path from Apicurio v2 to v3 requires exporting/importing using a .zip file, which involves either manual handling or additional moving parts.
Proposed Solution
Add a way to convert from v2 to v3 data directly from Apicurio v3, without the need to export/import. I'm interested in the KafkaSQL storage: would it be possible to get an operation that converts data from an input _v2 topic to a _v3 output topic?
Looking at the code, it should be possible to extend the import API to read from a topic instead of a .zip file. This would then save the data in v3 format in another Kafka topic, or in whatever storage has been configured for the instance.
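A rough sketch of what such an operation could look like as a JAX-RS resource follows. The path, class, and parameter names are all illustrative assumptions, not an existing Registry API.

```java
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.Response;

@Path("/admin/import")
public class TopicImportResource {

    @POST
    @Path("/fromTopic")
    public Response importFromTopic(@QueryParam("bootstrapServers") String bootstrapServers,
                                    @QueryParam("topic") String v2Topic) {
        // 1. Replay the v2 journal topic and materialize it in memory.
        // 2. Convert the materialized state into a stream of v2 entities.
        // 3. Feed those entities through the existing v2->v3 upgrade path
        //    (the same code the .zip import uses), writing to whatever
        //    storage this instance is configured with.
        return Response.status(Response.Status.NOT_IMPLEMENTED)
                .entity("sketch only -- see discussion above")
                .build();
    }
}
```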
Originally posted by @Pablosko98 in #5624