Feature request: Mechanism to init keyspaces and pre-populate test data on startup #280

Open
tadhgpearson opened this issue Mar 29, 2024 · 4 comments

@tadhgpearson

Thanks for building and supporting this image - it's really nice to have a dockerized Cassandra to run that just works 👍

We're using this in combination with Fabric8's Docker Maven plugin to run integration tests backed by Docker containers as part of our build. To run those tests effectively, we need to create keyspaces and pre-populate test data. This seems to be a common request: see #31, #65, #104 and plenty of others...

We'd rather do this as part of the image, because then developers can also start the containers using docker-compose if they want to run or debug tests in their IDEs. To date, we've been butchering the entrypoint script to achieve this, but it's hard to read, easy to screw up, and doesn't survive version upgrades well 😢

Could you consider a more elegant way to support this use case? For example, here's how it works in the Oracle Docker image: https://github.com/oracle/docker-images/blob/main/OracleDatabase/SingleInstance/README.md#running-scripts-after-setup-and-on-startup
This approach even allows users to start the container and run scripts simply by passing them as arguments to docker run. I think it would greatly improve the usability of this image.

@tianon
Member

tianon commented Mar 29, 2024

#122 (comment) is really relevant here 😅

In short, we're really unhappy with how fragile and "not upstream" our existing script is, so the likelihood of us adding more behavior to it is very low. 🙇 ❤️

@tadhgpearson
Author

That's a pity because it's pretty config-heavy.

If I understand it correctly, the solutions you propose either
(a) require a lot of boilerplate from the end user (me and others) for a common use case (integration tests). Writing a new Dockerfile to build a new image for each script set means considerably more steps and more execution time, and it generally becomes copy-pasta across multiple projects, OR
(b) require me to update the docs to send each user into the Docker config to set it up for every project they run, which breaks the "just clone and build" paradigm that makes it easy for developers to start using new projects.

I know what you mean by that script being fragile - having added script execution to it for our project, every time I upgrade the Cassandra image version I need to rewrite it again, and every time I'm scratching my head! But it's definitely possible...

For end users, being able to add a volume of scripts to run after startup would be quick and easy out-of-the-box.
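To make the request concrete, here is a minimal Python sketch of what such a hook might do: collect `.cql` files from a mounted directory and build one `cqlsh` invocation per file, in file-name order. The directory name `/docker-entrypoint-initdb.d` and both function names are assumptions for illustration only; the image does not provide this today.

```python
from pathlib import Path


def collect_init_scripts(script_dir: str) -> list[Path]:
    """Return the .cql files directly inside script_dir, sorted by name.

    Sorting by name lets users control execution order with prefixes
    such as 01_schema.cql and 02_data.cql. A missing directory yields
    an empty list, so the hook is a no-op when nothing is mounted.
    """
    root = Path(script_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".cql")


def build_cqlsh_command(script: Path, host: str = "localhost", port: int = 9042) -> list[str]:
    """Build the argument list that runs one script through cqlsh."""
    return ["cqlsh", host, str(port), "-f", str(script)]


if __name__ == "__main__":
    # Hypothetical mount point, e.g. `-v ./cql:/docker-entrypoint-initdb.d`.
    for script in collect_init_scripts("/docker-entrypoint-initdb.d"):
        print(" ".join(build_cqlsh_command(script)))
```

This only plans the commands; actually running them would still have to wait until Cassandra accepts client connections.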

@LaurentGoderre
Member

You could use an init container to achieve this step. You could have the default Cassandra container running as the db and another Cassandra container with a custom entrypoint that runs the scripts against the first container as a remote host. This is a pattern I have used many times.
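A rough sketch of what the second container's entrypoint could do, written in Python for illustration: wait until the database container accepts TCP connections on the CQL port, then run each script against it with `cqlsh`. The function names, the timeout values, and the use of an open port as a readiness signal are all assumptions, not something taken from an existing setup.

```python
import socket
import subprocess
import time
from pathlib import Path


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll until host:port accepts a TCP connection or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not reachable yet (refused, timed out, or name not resolvable).
            time.sleep(interval_s)
    return False


def run_scripts(host: str, port: int, script_dir: str) -> int:
    """Run every .cql file in script_dir in name order; stop at the first failure."""
    for script in sorted(Path(script_dir).glob("*.cql")):
        result = subprocess.run(["cqlsh", host, str(port), "-f", str(script)])
        if result.returncode != 0:
            return result.returncode
    return 0


def init_cassandra(host: str, script_dir: str, port: int = 9042) -> int:
    """Entry point for the init container; returns a process exit code."""
    if not wait_for_port(host, port):
        print(f"{host}:{port} did not become reachable in time")
        return 1
    return run_scripts(host, port, script_dir)
```

In a compose file, `host` would be the service name of the database container, and the init container would exit with the value returned by `init_cassandra`.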

@tadhgpearson
Author

You certainly can... and as a user I think it's a pretty clumsy solution.

  • it's not obvious. You'd have to write some significant documentation explaining why you have two Cassandra docker images to avoid being questioned every time a new developer opens the docker config
  • in our case, this would need to be duplicated in the Fabric8 Maven Plugin (which we use at build time) and in our docker-compose (which we use when testing in the IDE.)

Honestly, what our setup does at the moment is have a test singleton in Spring Data Cassandra that loads the schema and required data before running the first test using Spring's CQL Script Runner. This is kinda OK: it separates concerns correctly, etc. But it still requires quite a bit of boilerplate and super-classing, and it's not obvious what's going on when there's an error in the startup CQL.

Compared to, for example, what Oracle does in their SQL image, all of these are pretty poor solutions. I think we can do better, hence this issue. I'm saying "we" because I use this docker image multiple times a day. It's the best one out there, and I'm invested: I want to help make it better!
