migrating from SVN to GitHub #40
Yes, see my comments in the gist.
hiya! importing the ERG from SVN into GitHub is no small project, i imagine. ERG history goes back to around 1994, and there has been a long tradition of storing large binary files interspersed with the source files (owing in part to its centralized design, SVN works fairly well on binary files). i imagine some repository surgery and retroactive refactoring may be called for. if it helps, i could probably make available an SVN dump file (filtered to just include everything below the ERG directory). but for that to make sense, i think we would first have to declare the ERG in SVN read-only, i.e. establish agreement with dan that there are no pending commits and that all future development will be against GitHub.
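For reference, a filtered dump like the one offered above is usually produced by piping `svnadmin dump` into `svndumpfilter include`. The Python sketch below only assembles and runs that pipeline; the repository path `/srv/svn/repos`, the prefix `erg`, and the helper names are placeholders for illustration, not the actual server layout.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def dump_commands(repo_path: str, prefix: str) -> tuple[list[str], list[str]]:
    """Return argv lists for `svnadmin dump` and `svndumpfilter include`.

    repo_path is the local filesystem path of the SVN repository (not a URL);
    prefix is the top-level directory to keep, e.g. "erg" (hypothetical).
    """
    dump = ["svnadmin", "dump", "--quiet", repo_path]
    filt = [
        "svndumpfilter", "include",
        "--drop-empty-revs",  # drop revisions that touched nothing under prefix
        "--renumber-revs",    # close the numbering gaps left by dropped revisions
        prefix,
    ]
    return dump, filt


def write_filtered_dump(repo_path: str, prefix: str, out_file: Path) -> None:
    """Run `svnadmin dump REPO | svndumpfilter include ... PREFIX > out_file`."""
    dump_argv, filter_argv = dump_commands(repo_path, prefix)
    with out_file.open("wb") as out:
        dump = subprocess.Popen(dump_argv, stdout=subprocess.PIPE)
        filt = subprocess.Popen(filter_argv, stdin=dump.stdout, stdout=out)
        dump.stdout.close()  # so svnadmin gets SIGPIPE if the filter exits early
        if filt.wait() != 0 or dump.wait() != 0:
            raise RuntimeError("svnadmin dump | svndumpfilter failed")


if __name__ == "__main__":
    d, f = dump_commands("/srv/svn/repos", "erg")
    # Print the equivalent shell pipeline instead of running it.
    print(" | ".join(shlex.join(c) for c in (d, f)) + " > erg.dump")
```

One caveat: `svndumpfilter` aborts on copies whose source lies outside the included prefix, which is one concrete place where the repository surgery mentioned above may be needed.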
I have finished the first part of the migration.
I am attaching the script I used, the logs with the steps, commands, and outputs, the old README file, and the references I followed. In README.org I listed the next steps.
Thanks, @arademaker. Rather than pruning out just the larger
@arademaker, do we have the [incr tsdb()] profiles available anywhere?
At the beginning of the year, @danflick and I discussed the issue with the profiles. I do not remember now what his final decision was, but one approach I suggested was to have a separate repo for them.
Sorry, what I wrote above is precisely what I said in the previous comment. I don't know the current status; @danflick left Brazil with a complete step-by-step plan to finish the migration, but he needs time to revise the data before the final migration.
I have the [incr tsdb()] profiles for each of the releases for the past 15 years, and would appreciate guidance on how best to organize those files on GitHub to enable convenient packaging of the releases, including the most recent 2023 version.
@danflick, some suggestions:
For this
Thanks, @goodmami, for the offer to help in using CI scripts to attach assets to releases. For the 2023 release that I just put together, I have kept the gold profiles in compressed form, still in erg/tsdb/gold, but those profiles are once again complete. There are so many changes in those profiles from release to release that using commit deltas on the uncompressed files doesn't seem useful, and it's much more convenient for my workflow to keep them where they are, compressed. I hope they won't be too annoying. I have stored the large redwoods.mem file using LFS, and it is now included in the 2023 release. I don't yet see how to package the .dat or .grm files, since they are larger than 100 MB, but once someone has obtained the source, producing those two files takes just a one-line command. I would also rather present the .dat and .grm files (and eventually a combined LKB-FOS+ERG binary) as separate downloads, so that people who just want to use the compiled grammar can get it in one of those three forms without also getting the source, if that's possible. I'd be glad for advice on how to do that.
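Because GitHub rejects pushes that contain any file larger than 100 MiB, it can help to check a working tree before committing. A minimal Python sketch, assuming it is run from the root of an ERG checkout; the function and constant names are hypothetical, and the limit is GitHub's documented per-file limit for regular Git pushes.

```python
import os
from pathlib import Path

# GitHub blocks pushes containing files larger than 100 MiB.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024


def oversized_files(root: Path, limit: int = GITHUB_FILE_LIMIT) -> list[tuple[Path, int]]:
    """Return (path relative to root, size in bytes) for every file under
    root larger than limit, largest first, skipping the .git directory."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            size = path.lstat().st_size  # lstat: do not follow symlinks
            if size > limit:
                found.append((path.relative_to(root), size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rel, size in oversized_files(Path(".")):
        print(f"{rel}: {size / 2**20:.1f} MiB (candidate for Git LFS or a release asset)")
```

Files it reports are candidates for Git LFS, as was done for redwoods.mem, or for distribution as release assets rather than as tracked files.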
@danflick That's ok for the compressed gold profiles. As long as you only update them in GitHub for releases and not regularly in between releases, it probably won't bloat the repository size too much.
OK, this is good. It might be good to provide some documentation (maybe in a revised README?) on how to retrieve this.
I've submitted #51 to add a GitHub Actions CI script that will compile the grammar from a tagged release and upload the
IMHO, the way to work with the profiles may need more thought. Of course, @danflick is the ultimate person to decide which workflow works best for him. But I would encourage a setup with frequent commits, which also helps preserve data if @danflick's laptop has any issues. Regarding the compiled files, adding them to the releases definitely makes much more sense, and @goodmami's solution was certainly excellent.
I don't think there was any misunderstanding, but to be clear I would also encourage frequent, atomic commits for changes to TDL files, configs, etc. I was only suggesting that changes to large binary files, such as gold profiles, be saved for release commits.
I got it, and I agree. The question is how easy and safe it is for @danflick to use this workflow. He will have to carefully avoid including changes to the big files in every commit, but if he changes these big files between commits and something happens to his machine, he will lose data, right? I am unsure of the best solution; I am just trying to get us to think about possible problems.
That's up to Dan, but I'm not sure it's very different from how he usually does it. And it's not the case that all profile changes have to be done in the same commit; each profile could be done in its own commit. It's just that any change to the binary file causes a copy of the whole thing to be stored in the history, so frequent, small changes to the same file could cause a problem. Anyway I don't want to over-complicate things. I'm just trying to avoid hitting a repository size limit.
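The point about binary files can be made concrete with a toy model of Git's object store. Git names each file version by the SHA-1 of a `blob <size>\0` header followed by the contents, so every distinct version becomes its own object. Git only tries delta compression later, when it packs objects, and compressed profiles tend to change throughout after even a small edit, which limits what deltas can save. The class below is a deliberate simplification for illustration (no zlib, no packfiles), not how Git is actually implemented.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 object id that Git assigns to a blob with these contents."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ToyObjectStore:
    """Keeps one full copy per distinct file content, roughly like Git's
    loose objects before any packing or delta compression."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def commit(self, data: bytes) -> str:
        oid = git_blob_id(data)
        self.blobs.setdefault(oid, data)  # identical content is stored only once
        return oid

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    store = ToyObjectStore()
    profile = bytearray(b"x" * 1_000_000)  # stand-in for a 1 MB binary profile
    for i in range(5):
        profile[i] = ord("y")              # a one-byte edit before each commit
        store.commit(bytes(profile))
    store.commit(bytes(profile))           # re-committing unchanged content adds nothing
    print(store.stored_bytes())            # 5000000: five full copies
```

Five one-byte edits cost five full copies, while committing an unchanged file costs nothing, which is why saving profile updates for release commits keeps the history smaller.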
@danflick Sure enough, there was a bug in the script in #51 which wasn't caught until I merged it into main. I committed a fix and pushed it directly to main (aside: we can turn on branch protection rules if you want to force all commits to go through a PR before merging into the main branch). The script ran and uploaded the compressed The screenshot below also shows how to run the script with the order of clicks numbered in purple.
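To illustrate what attaching a compiled grammar to a release involves at the REST API level, here is a Python sketch that gzips a file and builds the upload request. It is not the mechanism #51 uses (that script isn't reproduced in this thread); the file name `erg.dat` and the helper names are hypothetical. It relies on the `upload_url` field of GitHub's release JSON, a URI template such as `.../releases/1/assets{?name,label}`. GitHub documents a per-asset limit of 2 GiB, well above the 100 MiB per-file limit for pushes, which is why release assets suit the .dat and .grm files.

```python
import gzip
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def asset_upload_url(upload_url_template: str, name: str) -> str:
    """Expand a release's `upload_url` template for a single asset name."""
    base = upload_url_template.split("{", 1)[0]  # drop the "{?name,label}" part
    return f"{base}?{urllib.parse.urlencode({'name': name})}"


def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return its path."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def upload_request(upload_url_template: str, asset: Path, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that attaches `asset` to a release."""
    return urllib.request.Request(
        asset_upload_url(upload_url_template, asset.name),
        data=asset.read_bytes(),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/octet-stream",
        },
    )
```

Sending the request would be `urllib.request.urlopen(req)` with a token that has write access to the repository; in CI, a ready-made action or the `gh` CLI is usually simpler than raw HTTP.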
The result looks good, and the workflow screenshot will be helpful for doing the next (2024) release.
This issue concerns difficulties in importing the ERG from SVN to GitHub. See comments on this gist for some context. One question from that thread:
I can now answer that:
This might also be an issue with a manually converted repository. If so, we might need to consider storing big things like profiles and compiled `.grm` or `.dat` files in a separate repo.