Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REMERGE the sector-coupled model keeping the commit history #1133

Open
hazemakhalek opened this issue Oct 9, 2024 · 25 comments
Open

REMERGE the sector-coupled model keeping the commit history #1133

hazemakhalek opened this issue Oct 9, 2024 · 25 comments
Assignees

Comments

@hazemakhalek
Copy link
Collaborator

Preserve PyPSA-Earth-Sec history

The MERGE PR has been mistakenly squashed and merged and now the entire history of the pypsa-earth-sec repo is not shown in the main pypsa-earth repo.

Suggested solution:

  1. Revert the merge

  2. Remerge with preserving history

  3. Reapply the following commits.

Important: Do not delete the merge branch

@hazemakhalek hazemakhalek changed the title REMERGE the sector-coupled model keeping the history of commits REMERGE the sector-coupled model keeping the commit history Oct 9, 2024
@davide-f
Copy link
Member

davide-f commented Oct 9, 2024

Revert and then applying back is an option to test. It is advisable to check the size of the whole repo in doing so, to make sure the repo does not explode in size.
Alternatively, we can do some force push and emerge the PRs we added.
This second option shall be avoided if possible as it is higher risk and is likely to be more difficult as PRs after the merge do have a different history.
The path stated above is the first good attempt.

I'd recommend to test it in a clean fork and test the change in size of the whole repo.
As the merge contains binaries, it is good to check its impact.

@ekatef also mentioned her availability.
I'm a bit stuck till the weekend unfortunately, happy to support after in case

@FabianHofmann
Copy link
Collaborator

Hey! You can also keep @siddharth-krishna in the loop. He is our git pro ;)
Reverting the merge commit can still be quite challenging in case other commits have been added. Hard pushing to the main branch should definitely be avoided imho.

@davide-f
Copy link
Member

davide-f commented Oct 9, 2024

Hey! You can also keep @siddharth-krishna in the loop. He is our git pro ;)
Reverting the merge commit can still be quite challenging in case other commits have been added. Hard pushing to the main branch should definitely be avoided imho.

Great @siddharth-krishna , any feedback would be welcome. Comments above. As a summary, we would like to replace a past squash and merge into a simple merge.

For now options are:

  • revert commit of squash-merge, and merge the merge again
  • (not recommended) some magic with force-pushing, resets etc.

As mentioned above, I wouldn't recommend force push either, but worth sharing all the options probably.

@siddharth-krishna , what do you think?

@siddharth-krishna
Copy link
Contributor

siddharth-krishna commented Oct 9, 2024

Thanks for tagging me, I'm happy to take a look and see if I can help. :)

Thanks for the summary, Davide, I agree with your (and others') recommendation against force push, as this makes life difficult for the many currently open PRs and anyone's unpushed local branches.

Reverting the squashed commit a8987468 is not so straightforward right now, since there have been commits on the main branch since a8987468. But it's not impossible. Davide is right about the size though, the revert commit will be as big as the original commit, and if we then do a merge commit of the merge-pyspa-earth-sec branch we'll effectively be adding all that code a 3rd time, so the size of the repo does go up by 2X the size of the pypsa-earth-sec code! (I'm not sure how big all this is, but we can try to calculate this.)

However, perhaps there's a 3rd option:

We can add a merge commit to the main branch that merges merge-pypsa-earth-sec branch into main. This would mostly be a redundant commit, as the code changes have already been included in main via a8987468. However, this would have the benefit of not re-adding all that code, since merge commits are very small (they only contain info on how to resolve any conflicts). It also brings the entire commit history of pypsa-earth-sec back into the main branch:
image

Caveat: unfortunately, since we don't remove the squashed merge, if we ask for the commit history of any of the files from pypsa-earth-sec, we only see the squashed merge commit:

% git log --oneline Makefile
a8987468 Merge pull request #1086 from merge-pyspa-earth-sec

But we can add a line to the README that if your history only shows that commit, you can use the following command to instead show the history of the file from the merge-pyspa-earth-sec branch:

% git log --oneline origin/merge-pyspa-earth-sec -- Makefile
1480d7a4 Merge remote-tracking branch 'pypsa-earth-sec/snakemake-module' into merge-pyspa-earth-sec
94ba3703 test: use more core
f229301e ci: use one core to better track log
...
dc4f4a2d Test routine: add Makefile CI: consolidate ci.yaml files

To summarize, Option 3: add a second (redundant) merge commit to main.
Advantages:

  1. No force pushing, so existing PRs and local branches are OK
  2. Do not need to re-apply the commits on main since a8987468
  3. Does not add extra redundant size to the repo

Disadvantages:

  1. File's commit history will only show the squashed merge, but a modified history command can be used to show the history from pypsa-earth-sec

@hazemakhalek
Copy link
Collaborator Author

Thank you @siddharth-krishna for your suggestions. I'll review them tomorrow or Friday.

In the meantime, it's important to highlight that the implemented solution must meet two critical requirements:

  1. Preserve the full commit history of the PyPSA-Earth-Sec repository: This reflects nearly three years of development and is crucial for transparency and future reference.

  2. Ensure proper credit to all contributors of the PyPSA-Earth-Sec model: Every team member's work over the past three years must be accurately represented.

This is especially important because we will soon abandon the PyPSA-Earth-Sec repository (hopefully for good). So let's take the necessary time to solve this issue carefully ;)

@siddharth-krishna
Copy link
Contributor

Sounds good, thanks.

Here is a slight variant, Option 3a, that removes the disadvantage of Option 3 and shows file histories as expected:

Idea: first merge main into the branch merge-pypsa-earth-sec, in order to make that branch have the same contents as main does currently. Then make a new PR from merge-pypsa-earth-sec and rebase merge it in (fast-forward). This can be simulated in your local checkout by the following commands:

alias go='git checkout '
go merge-pyspa-earth-sec
git merge main -s ort -Xtheirs  # Resolve conflicts using main's version, as main has new commits since a8987468 
git diff main HEAD
# Empty output here shows that the contents are the same as in main
go main
git merge merge-pypsa-earth-sec
git diff origin/main HEAD
# Empty output here shows that the contents of main with Option 3a implemented are same as origin/main

The commit graph of main now looks like:
image

And inspecting the history of files that came from pypsa-earth-sec shows their entire history as expected:

alias gl='git log -n 5 --pretty=format:"%Cred%h %Cgreen%an, %ar%Creset %s"'
gl Makefile
1480d7a4 Fabian, 8 weeks ago Merge remote-tracking branch 'pypsa-earth-sec/snakemake-module' into merge-pyspa-earth-sec
94ba3703 Fabian, 8 weeks ago test: use more core
f229301e Fabian, 8 weeks ago ci: use one core to better track log
23bc59fe Fabian, 9 weeks ago revert changes to config.default and config.tutorial priority
69817205 Fabian, 9 weeks ago workflow: use global root directory to avoid recursive upwards chdir

Advantages:

  1. No force pushing, so existing PRs and local branches are OK
  2. Do not need to re-apply the commits on main since a8987468
  3. Does not add extra size to the repo

Disadvantages:

  1. I can't think of any at the moment, but let's think about it carefully and discuss if there are any downsides!

@hazemakhalek
Copy link
Collaborator Author

Hi Sid,

This last suggestion actually sounds good. Very simple yet very smart.
I believe this should cover the 2 main requirements: the first one is clear but I think the second one too.

@davide-f, @ekatef, @FabianHofmann, any downsides you can foresee?
Also, @energyLS, please scan this and let us know if you have any suggestions.

Do you have suggestions for the way forward? @siddharth-krishna

@davide-f
Copy link
Member

davide-f commented Oct 10, 2024

I've been experimenting on the git reset usage, mainly out of curiosity.
Potentially it is not that bad, see https://github.com/pypsa-meets-earth/pypsa-earth/compare/main..davide-f:pypsa-earth:test_reset_merge2
Note that despite it looks "nice" it shall be improved.

What I did was:
git reset 3d6365a (commit before the merge) (*** comment below)
git merge merge-pypsa-earth-sec {Merged the branch of the merge}
git cherry-pick {various commits after the merge event; note that I didn't merge the merge commits as they require some additional git stuff; to make things nice, we shall account for that and also preserve the order}

The result (luckily/surprisingly) does not look that bad actually. What do you think?

*** if we decide to go for this, here we should add the --hard option. I didn't do that here, otherwise main and this were not easily comparable with the link above

@siddharth-krishna
Copy link
Contributor

@davide-f I presume the option you are investigating requires us to force-push origin/main to the reset-and-cherry-picked branch? That would work for sure, and results in a cleaner main branch, but the downside is that it makes any open PR branches, and any local branches people may have, "out of sync".

This is not a hard problem. As long as we force-push main to a new state that has the exact same file contents, it should be possible for people to rebase their branches back onto the force-pushed main. But if many people in the community are not so familiar with git, or are not aware that we have done this, it could lead to some confusion, and would require some time from maintainers to guide everyone through the steps. It's an option worth considering, though.

@energyLS
Copy link
Collaborator

Hi Sid,

This last suggestion actually sounds good. Very simple yet very smart. I believe this should cover the 2 main requirements: the first one is clear but I think the second one too.

@davide-f, @ekatef, @FabianHofmann, any downsides you can foresee? Also, @energyLS, please scan this and let us know if you have any suggestions.

Do you have suggestions for the way forward? @siddharth-krishna

@hazemakhalek thanks for keeping me in the loop. I have no further suggestions and would like to leave this to the git pros. I totally support the motivation behind it though.

@ekatef
Copy link
Member

ekatef commented Oct 14, 2024

To get a complete picture, have tested the initially suggested solution with reverting the merge commit and re-applying the merge properly. This exercise doesn't inflate size of the repo too much: disk usage by .git increases by only ~0.1 MB, with the overall .git size of about 30MB. Most likely, this small increase results from the fact that the merge commits themselfes are quite tiny, as @siddharth-krishna has explained. However, the solution Option 3b by @siddharth-krishna does basically the same with cleaner history.

Have tested Option 3b, and it looks like an excellent solution:

  1. The history seems to restore metadata of all the commits in pypsa-sec-merge branch [1668 commits (= 1669 - 1 auto-update of README) vs 1667 commits of the merge PR Merge the Merge #1086 and 1589 commits of PyPSA-Earth-Sec]
  2. History of PyPSA-Earth-Sec is available and can be seen without any issues by all git tools I have on hand (an example with Sublime):
image

Given that GitHub itself seems not to have any issues with displaying contributions if Option 3b is used, contributions PyPSA-Earth-Sec should be represented in an accurate way.

@hazemakhalek may you have any ideas on which other metrics would be helpful to check?

@siddharth-krishna thank you so much for working on this! A great solution and really amazing insights into git power 🙂

@ekatef
Copy link
Member

ekatef commented Oct 14, 2024

I've been experimenting on the git reset usage, mainly out of curiosity. Potentially it is not that bad, see https://github.com/pypsa-meets-earth/pypsa-earth/compare/main..davide-f:pypsa-earth:test_reset_merge2 Note that despite it looks "nice" it shall be improved.

What I did was: git reset 3d6365a (commit before the merge) (*** comment below) git merge merge-pypsa-earth-sec {Merged the branch of the merge} git cherry-pick {various commits after the merge event; note that I didn't merge the merge commits as they require some additional git stuff; to make things nice, we shall account for that and also preserve the order}

The result (luckily/surprisingly) does not look that bad actually. What do you think?

*** if we decide to go for this, here we should add the --hard option. I didn't do that here, otherwise main and this were not easily comparable with the link above

@davide-f thank you so much for finding time to experiment with the fix. Agree that the result looks very nice, though cherry-picking definitely requires a lot of concentration, given the need to the force-push. In any case, that is fantastic to see that the problem definitely can be resolved! ❤️

It looks like all the solutions have git merge merge-pypsa-earth-sec as the core incantation, so we seems to be converging 😄

@davide-f
Copy link
Member

davide-f commented Oct 15, 2024

Just looking there is only one possible downside of options 1 and 3*: line contributions do not match the past repo.
As we squash meshed the commit anyone who contributed to the -sec now has obtained the same line contributions as the rest and this may not be well fixed by the additional merge PR. Needs to be checked.

It is easy to check the current status in https://github.com/pypsa-meets-earth/pypsa-earth/graphs/contributors
When placing the contributions by additions, there is a huge spike due to the merge for many.
Re-merging the whole commits may further lead to weird counting potentially.
Fairness here is a point for both communities. Do we count commits only or addition/deletion?

How does it appear with the other options?

Do we want that? It seems conflicting to point 2 by hazem and may be conflicting also to -earth developer.

The force option may be tested in a parallel branch first.
The risk of breaking existing PRs may not be big problem as except for the most recent ones, they have been forked before the merge, so no conflict is expected even in the event of force pushing.
I believe the force option is unsafer, but we can keep copies to avoid damages

Regardless of the method, I'd propose to solve this by the end of the week to normalize the approach.

Current options are:

  1. revert commit of squash-merge, and merge the merge again
  2. some magic with force-pushing, cherry-picks etc.
  3. add a redundant normal merge of the merge
    3b. add a redundant merge of the merge with the procedure by @siddharth-krishna

Can you express preferences/recommendations?
@hazemakhalek @ekatef @FabianHofmann @siddharth-krishna

@siddharth-krishna
Copy link
Contributor

As we squash meshed the commit anyone who contributed to the -sec now has obtained the same line contributions as the rest

Yes, it looks like this is true. The squash merged commit a898746 has

119 files changed
+16423 -1817 lines changed

and git show on it shows

commit a8987468ceda152ed1152f6c7bfa2ffb79da0837
Author: Fabian Hofmann <[email protected]>
Date:   Thu Sep 19 23:17:23 2024 +0200

    Merge pull request #1086 from merge-pyspa-earth-sec

So I think GitHub attributed that entire commit to the author of the PR, Fabian. This can be confirmed in https://github.com/pypsa-meets-earth/pypsa-earth/graphs/contributors
image

Testing the effect of Option 3b on GitHub's contributors page

I created a test repo https://github.com/siddharth-krishna/test-contributions to test how GitHub's contributions are counted when PRs are squash merged vs merge merged. I tried to simulate what happened here: I made some dummy commits on main (I hope you'll excuse me, Fabian, for using your name for this test):

28c9bc9 Fabian Hofmann, 5 minutes ago A
0cfaa2e Siddharth Krishna, 8 minutes ago Initial commit

And this resulted in the following contributors graph:
image

I then made a branch sec-branch that had some commits from both of us, but Fabian had 3 commits and 3 lines added, while I had 2 and 2:

6277065 Fabian Hofmann, 21 minutes ago F
9021486 Fabian Hofmann, 22 minutes ago E
166a50f Fabian Hofmann, 22 minutes ago D
a8a6f05 Siddharth Krishna, 22 minutes ago C
bbe1f9d Siddharth Krishna, 22 minutes ago B
28c9bc9 Fabian Hofmann, 27 minutes ago A
0cfaa2e Siddharth Krishna, 31 minutes ago Initial commit

I then opened a PR siddharth-krishna/test-contributions#1, and did a squash merge. After the squashing, the contributors graph looked like:
image

I then implemented Option 3b by making a redundant merge commit on the sec-branch and opening a new PR from the same branch siddharth-krishna/test-contributions#2. I couldn't rebase merge this PR, not sure why, but I merge merged it in, resulting in the following git commit graph on the main branch:
image

And the contributors graph looked like this:
image

So it looks like GitHub attributes the squash merge as +1 commit to all authors involved, and also +L additions to all authors involved (where L is the number of lines added in the PR). If we then do Option 3b and make a redundant merge that brings in all the commits from the sec-branch, GitHub adds the correct number of commits and additions to each author.

On the plus side, this option brings the number of commits of all contributors back to almost what it should be (pypsa-earth-sec authors will have +1 extra commit, but perhaps that's an acceptable error). It also preserves the relative ranking between pypsa-earth-sec authors in terms of additions. But on the minus side, pyspa-earth-sec authors now have double counted their additions from the sec-branch, especially compared to pypsa-earth contributors who have not contributed to pypsa-earth-sec.

I suppose it all depends on which metric the contributors to both repos care about? And whether people would rather maintain an accurate "additions" metric, or force-push the main branch and potentially inconvenience people with open PRs and local branches. I am happy to support, no matter which option everyone ends up choosing. :)

(If I have time tomorrow I could also investigate the effect of Option 1 on the contributors graph, but my guess would be that a revert would not result in a decrease in "additions".)

@hazemakhalek
Copy link
Collaborator Author

I only now noticed the spike in line additions now. Thanks for pointing this out @davide-f

@siddharth-krishna, great and insightful work!

I personally only look at the number of commits. but I understand that this can be seen as unfair to the contributors of PyPSA-Earth.

As a side note: The additions counter for the main pypsa-earth-sec contributors (Leon and myself) is already inflated because we had a few csvs in the repo back in the very early stages of the repo in Q1 2022.
If the additions count matter, maybe we can use something like git-filter-repo to remove the history of those csv files?

Lastly, I agree with Sid, I don't think this will automatically reduce the additions..

@davide-f
Copy link
Member

Great exploration @siddharth-krishna !!! <3 I believe we have all we need.

The summary is: options that involve re-merging the merge basically bring the number of commits to the number that should be but they don't fix the line changes (additions/deletions), possibly leading to (partial) double counting of some line of code.

I'd love to be able to see if with cherry-picking we could provide a better and cleaner approach without disrupting everything.
Personally, I hope to find some time in the weekend for the test. Support is welcome.

I'd like to have an exchange with @hazem about it

By next week, let's fix it :D

@siddharth-krishna
Copy link
Contributor

I agree with everything above, but I had a few minutes this morning so for completeness (and my curiosity) I tested out what happens to contributions if we revert the squash merge:

Testing the effect of Option 1 on the contributors graph

Before the squash merge:
image

After squash merging siddharth-krishna/test-contributions#3:
image

After Option 1 (revert squash merge and merge merge the branch again):
image
image

It looks like the revert commit counted as 1 commit and 3 deletions for me (Sid), and then the merge merge did the expected thing and got counted as 3 commits (2 on the branch and 1 merge commit) and 2 additions for me, and 1 commit and 1 addition for Fabian.

Conclusion: I think Option 1 does not help in terms of fixing commit or additions count, and might muddy the metrics further since it also adds a bunch of deletions to the person who makes the revert commit. So, as mentioned by Hazem and Davide our options remain:

  • Option 2: rewrite history by resetting to before the squash merge, merge merging, cherry picking newer commits, and force pushing main
  • Option 3b: adding a redundant merge merge

@davide-f
Copy link
Member

davide-f commented Oct 20, 2024

I tested Option 2 in this PR: https://github.com/davide-f/pypsa-earth/tree/main
that in terms of changes it is basically perfect:
https://github.com/pypsa-meets-earth/pypsa-earth/compare/main..davide-f:pypsa-earth:main

Using the procedure as follows:

git reset 3d6365ac6af69b86fcadb1de9809ed09986e654f
git push --force
git clean -f
git fetch upstream
git status
git merge upstream/merge-pyspa-earth-sec
git cherry-pick 1777cb665ddb9d56d23aeb3f867a41ff1244742e
git cherry-pick 2350d55209bbeae09c6d019cfbf72d3e7989e80b
git cherry-pick d0fa6d6f3bccb18208c002a5d1d2353673e6a298
git cherry-pick 7b1249a20e6b56815329cc05ebf781e90b99d01f
git cherry-pick 7ef77254244e22982f4c7ba57b0fd63c7220101f
git push
git cherry-pick -m 1 ea99d748c1e7b7e6af9fd4ebad0554df70687e55
git commit --allow-empty
git cherry-pick 3142b1f71a1f73bfe753fd0107aa03c47e6740af
git cherry-pick ae4479fe3cb2405908d0962ff44de80e872873af
git cherry-pick 2cda6d768a28e79c8573be316a88e9a93c990242
git cherry-pick 3b8b952f978cf8f6c327284fc9f8e0f09c4e7d48
git cherry-pick 12edb982ef070470c08115a6ed752c0602dfa951
git cherry-pick 87d7f08f46faf9254ca8a7e31a01802a175a8a80
git cherry-pick 8fab463b47700523deefb679cca7eca75540e461
git cherry-pick a21ff8d77f2158503242c23a4cc803f8f30c046e
git push
git cherry-pick -m 1 328a32e7269853ed48c911a76e3282d5a76ed24d
git commit --allow-empty
git cherry-pick 356fb71eab053d62e9771a634c542391f5143524
git cherry-pick b64623d8d3d8127e186dbe3b9a899e326e1bcb72
git cherry-pick 75cb757f2a44464d94830812010dbd4e2c821e8c
git cherry-pick 2824af4fceff9a745e9d3ff1d02233422cd6b896
git cherry-pick 0a6f98c0104c859ba898f68a6230cf46ee401690
git cherry-pick 489ff8f9fb8b35cd4730910a23411da57ebd28e6

However, as a downside, the cherry-picks add the picker [me in this case] as co-author of each of the latest commits, though minimal.

@hazemakhalek
Copy link
Collaborator Author

I'll schedule a call to finalize this issue. If you're interested to join please react to this.

@davide-f
Copy link
Member

davide-f commented Oct 21, 2024

Today we met with @ekatef and @hazemakhalek and we are of common agreement that option 3b shall be the way to go.
The reason being that it is extremely easier for the users and developers.
Having better contribution list (additions/deletions) has not been considered a sufficient reason to compromise usability of the users.

Therefore, we shall go for option 3b. Katia has given availability to do so, as she also tested it locally.
As a curiosity, have you pushed the result in a branch? It would be nice to see the result and then do it on main. If you need support, feel free to ask [no pressures]
I'm wondering that it may be easy to create a new branch of the PR "merge" and remerge it here in github?

@ekatef
Copy link
Member

ekatef commented Oct 21, 2024

Today we met with @ekatef and @hazemakhalek and we are of common agreement that option 3b shall be the way to go. The reason being that it is extremely easier for the users and developers. Having better contribution list (additions/deletions) has not been considered a sufficient reason to compromise usability of the users.

Therefore, we shall go for option 3b. Katia has given availability to do so, as she also tested it locally. As a curiosity, have you pushed the result in a branch? It would be nice to see the result and then do it on main. If you need support, feel free to ask [no pressures] I'm wondering that it may be easy to create a new branch of the PR "merge" and remerge it here in github?

The result of application Option 3a are here in this comparison.

The new branch fix_squash_merge3 is a clone of main to which I have applied the list of git commands proposed by @siddharth-krishna. A little additional adjustment will be needed to use main version of add_extra_components.py instead of merge-pypsa-earth-sec one.

@siddharth-krishna
Copy link
Contributor

@ekatef thanks for trying it out! I took a look at your branch, and there might be 2 minor issues:

  1. As you point out, for some reason despite using git merge main -s ort -Xtheirs, there's a difference between main and your branch on add_extra_components.py... not sure why this is happening, but it's not a big deal, we can always do git merge main and resolve the conflicts manually while ensuring that we always pick main's version of each conflict.
  2. If we apply Option 3a to a branch such as ekatef:fix_squash_merge3 and then create a PR to main that is then merged with a merge commit, this doesn't fix the issue with file histories. Unfortunately, the merge commit created by GitHub when merging in a PR has the "main parent" as the main branch (which is normally what you want), so a git log query after merging will only report the commits on the main branch (i.e. the squashed merge commit, and not the full history from the merge-pyspa-earth-sec branch).

This can be seen in my test repository https://github.com/siddharth-krishna/test-contributions, where I implemented Option 3b on a branch called sec-branch and then merged it in with a merge commit:

*   1d4ab0c Siddharth Krishna, 5 days ago Merge pull request #2 from siddharth-krishna/sec-branch
|\  
| *   872ff0a (origin/sec-branch, sec-branch) Siddharth Krishna, 5 days ago Merge branch 'main' into sec-branch
| |\  
| |/  
|/|   
* | 7e1d4d9 Somebody Else, 5 days ago X
* | c006a44 Siddharth Krishna, 5 days ago Merge the sec branch (#1)
| * 6277065 Fabian Hofmann, 5 days ago F
| * 9021486 Fabian Hofmann, 5 days ago E
| * 166a50f Fabian Hofmann, 5 days ago D
| * a8a6f05 Siddharth Krishna, 5 days ago C
| * bbe1f9d Siddharth Krishna, 5 days ago B
|/  
* 28c9bc9 Fabian Hofmann, 5 days ago A
* 0cfaa2e Siddharth Krishna, 5 days ago Initial commit

Unfortunately, if you are at commit 1d4ab0c and you ask for the history of the README file (that all commits have modified), you see:

gl README.md 
7e1d4d9 Somebody Else, 5 days ago X
c006a44 Siddharth Krishna, 5 days ago Merge the sec branch (#1)
28c9bc9 Fabian Hofmann, 5 days ago A
0cfaa2e Siddharth Krishna, 5 days ago Initial commit

Instead of the desired full history:

7e1d4d9 Somebody Else, 5 days ago X
c006a44 Siddharth Krishna, 5 days ago Merge the sec branch (#1)
6277065 Fabian Hofmann, 5 days ago F
9021486 Fabian Hofmann, 5 days ago E
166a50f Fabian Hofmann, 5 days ago D
a8a6f05 Siddharth Krishna, 5 days ago C
bbe1f9d Siddharth Krishna, 5 days ago B
28c9bc9 Fabian Hofmann, 5 days ago A
0cfaa2e Siddharth Krishna, 5 days ago Initial commit

On the other hand, if we checkout ekatef:fix_squash_merge3 and ask for the history of a file we do see the detailed changes from both main and merge-pyspa-earth-sec:

git checkout ekatef/fix_squash_merge3
git log --oneline .gitignore
7aded83c Fabian, 8 weeks ago solve_networks: fix network reference for sector coupled case
1480d7a4 Fabian, 10 weeks ago Merge remote-tracking branch 'pypsa-earth-sec/snakemake-module' into merge-pyspa-earth-sec
031ad55d Fabian, 3 months ago replace snakemake subworkflow by submodule; fix path relations
d6299207 Hazem-IEG, 11 months ago industrial database v1
9e3d7ca8 energyls, 1 year ago feat: gitignore cost data
9566c0da carlosfv, 1 year, 1 month ago Changes made: *git-ignore file has been updated to avoid all .geojson files *details of new parameters have been added to the configtables (clean_osm_data_options.csv) *the config.default and config.tutorial files have been renamed and comments have been formated to fit PEP 8 recommendations *clean_osm_data script has been renamed and now the description of the added function follows PEP8 format and matches other functions *the proposed function in clean_osm_data has now been tested for substations also and is now being used in substations and cables
97c2f2f0 Hazem-IEG, 1 year, 1 month ago add default values for EG and ignore slurm folder
174ae272 Hazem-IEG, 1 year, 2 months ago resolving PR comments
6d54321a carlosfv, 1 year, 2 months ago Changes made: *The gitignore file added the custom_line path in the data folder to be included recognized (custom _lines.geojson has some predefined values as an example) *the config.default file has been modified (lines 104) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True) *the clean_osm_script has been adapted (line 855) to include a conditional (if), linked to the flag in the config file in order to use either the default osm data or the custom data file to create the base dataframe of lines (df_lines) *the config.tutorial file has been modified (lines 118) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True) and run the tests for the PR *To run the model with custom data, a file has to be created and added in the data folder named "custom_lines.geojson"
86898d07 Hazem-IEG, 1 year, 2 months ago adding the excel file with all the paths
56c97afe Hazem-IEG, 1 year, 2 months ago remove unnecessay commands and files
7b44631e Davide Fioriti, 1 year, 3 months ago bugfix: Fix admin needs for windows
d0d709d5 Davide Fioriti, 1 year, 4 months ago Fix pre-commit and revise style (#779)
3ac1efdd Davide Fioriti, 1 year, 4 months ago Merge branch 'main' into fix_empty_data_countries
b015ab28 Davide Fioriti, 1 year, 4 months ago Fix summary (#764)
09fbece4 Max Parzen, 1 year, 5 months ago fix typo
71dd6006 Max Parzen, 1 year, 5 months ago add dask scheduler

Perhaps the way to implement Option 3b is to apply the commands on a branch like ekatef:fix_squash_merge3, use the Compare page/git diff to ensure that there are no differences to main and then force push that branch to main.

@ekatef
Copy link
Member

ekatef commented Oct 22, 2024

@ekatef thanks for trying it out! I took a look at your branch, and there might be 2 minor issues:

  1. As you point out, for some reason despite using git merge main -s ort -Xtheirs, there's a difference between main and your branch on add_extra_components.py... not sure why this is happening, but it's not a big deal, we can always do git merge main and resolve the conflicts manually while ensuring that we always pick main's version of each conflict.
  2. If we apply Option 3a to a branch such as ekatef:fix_squash_merge3 and then create a PR to main that is then merged with a merge commit, this doesn't fix the issue with file histories. Unfortunately, the merge commit created by GitHub when merging in a PR has the "main parent" as the main branch (which is normally what you want), so a git log query after merging will only report the commits on the main branch (i.e. the squashed merge commit, and not the full history from the merge-pyspa-earth-sec branch).

This can be seen in my test repository https://github.com/siddharth-krishna/test-contributions, where I implemented Option 3b on a branch called sec-branch and then merged it in with a merge commit:

*   1d4ab0c Siddharth Krishna, 5 days ago Merge pull request #2 from siddharth-krishna/sec-branch
|\  
| *   872ff0a (origin/sec-branch, sec-branch) Siddharth Krishna, 5 days ago Merge branch 'main' into sec-branch
| |\  
| |/  
|/|   
* | 7e1d4d9 Somebody Else, 5 days ago X
* | c006a44 Siddharth Krishna, 5 days ago Merge the sec branch (#1)
| * 6277065 Fabian Hofmann, 5 days ago F
| * 9021486 Fabian Hofmann, 5 days ago E
| * 166a50f Fabian Hofmann, 5 days ago D
| * a8a6f05 Siddharth Krishna, 5 days ago C
| * bbe1f9d Siddharth Krishna, 5 days ago B
|/  
* 28c9bc9 Fabian Hofmann, 5 days ago A
* 0cfaa2e Siddharth Krishna, 5 days ago Initial commit

Unfortunately, if you are at commit 1d4ab0c and you ask for the history of the README file (that all commits have modified), you see:

gl README.md 
7e1d4d9 Somebody Else, 5 days ago X
c006a44 Siddharth Krishna, 5 days ago Merge the sec branch (#1)
28c9bc9 Fabian Hofmann, 5 days ago A
0cfaa2e Siddharth Krishna, 5 days ago Initial commit

Instead of the desired full history:

7e1d4d9 Somebody Else, 5 days ago X
c006a44 Siddharth Krishna, 5 days ago Merge the sec branch (#1)
6277065 Fabian Hofmann, 5 days ago F
9021486 Fabian Hofmann, 5 days ago E
166a50f Fabian Hofmann, 5 days ago D
a8a6f05 Siddharth Krishna, 5 days ago C
bbe1f9d Siddharth Krishna, 5 days ago B
28c9bc9 Fabian Hofmann, 5 days ago A
0cfaa2e Siddharth Krishna, 5 days ago Initial commit

On the other hand, if we checkout ekatef:fix_squash_merge3 and ask for the history of a file we do see the detailed changes from both main and merge-pyspa-earth-sec:

git checkout ekatef/fix_squash_merge3
git log --oneline .gitignore
7aded83c Fabian, 8 weeks ago solve_networks: fix network reference for sector coupled case
1480d7a4 Fabian, 10 weeks ago Merge remote-tracking branch 'pypsa-earth-sec/snakemake-module' into merge-pyspa-earth-sec
031ad55d Fabian, 3 months ago replace snakemake subworkflow by submodule; fix path relations
d6299207 Hazem-IEG, 11 months ago industrial database v1
9e3d7ca8 energyls, 1 year ago feat: gitignore cost data
9566c0da carlosfv, 1 year, 1 month ago Changes made: *git-ignore file has been updated to avoid all .geojson files *details of new parameters have been added to the configtables (clean_osm_data_options.csv) *the config.default and config.tutorial files have been renamed and comments have been formated to fit PEP 8 recommendations *clean_osm_data script has been renamed and now the description of the added function follows PEP8 format and matches other functions *the proposed function in clean_osm_data has now been tested for substations also and is now being used in substations and cables
97c2f2f0 Hazem-IEG, 1 year, 1 month ago add default values for EG and ignore slurm folder
174ae272 Hazem-IEG, 1 year, 2 months ago resolving PR comments
6d54321a carlosfv, 1 year, 2 months ago Changes made: *The gitignore file added the custom_line path in the data folder to be included recognized (custom _lines.geojson has some predefined values as an example) *the config.default file has been modified (lines 104) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True) *the clean_osm_script has been adapted (line 855) to include a conditional (if), linked to the flag in the config file in order to use either the default osm data or the custom data file to create the base dataframe of lines (df_lines) *the config.tutorial file has been modified (lines 118) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True) and run the tests for the PR *To run the model with custom data, a file has to be created and added in the data folder named "custom_lines.geojson"
86898d07 Hazem-IEG, 1 year, 2 months ago adding the excel file with all the paths
56c97afe Hazem-IEG, 1 year, 2 months ago remove unnecessay commands and files
7b44631e Davide Fioriti, 1 year, 3 months ago bugfix: Fix admin needs for windows
d0d709d5 Davide Fioriti, 1 year, 4 months ago Fix pre-commit and revise style (#779)
3ac1efdd Davide Fioriti, 1 year, 4 months ago Merge branch 'main' into fix_empty_data_countries
b015ab28 Davide Fioriti, 1 year, 4 months ago Fix summary (#764)
09fbece4 Max Parzen, 1 year, 5 months ago fix typo
71dd6006 Max Parzen, 1 year, 5 months ago add dask scheduler

Perhaps the way to implement Option 3b is to apply the commands on a branch like ekatef:fix_squash_merge3, use the Compare page/git diff to ensure that there are no differences to main and then force push that branch to main.

Thanks a lot @siddharth-krishna!

Regarding the difference for add_extra_components.py, I suspect that git doesn't qualify the change as a conflict and doesn't see a need to apply -Xtheirs. Agree though that it's rather a minor issue.

I have experimented with Option3a and can confirm that you are completely right on using GitHub pull requests. A pull request to upstream/main looks promising (here, but testing in a fork reveals the pitfalls you have identified.

Using GitHub PRs

If I'm merging the identical pull request in my fork (branch merge-backup-isolated), it seems to bring back the commits to the project history but not the history of the files:
image

Using the local merge

When I'm applying Option 3b locally and push the result, the file history is brought back:
image

Not sure if we could find a way to change parameters of Github merge...

Agree that pushing local changes sound more realistically. I wonder if it could be a safer option, as compared with force-pushing the fix_merge branch, to make changes directly in the local version of upstream/main, and push only the merge commit which has been previously done locally.

@ekatef
Copy link
Member

ekatef commented Oct 24, 2024

Have restored the history in my origin/main. The preparation steps mainly followed the algorithm of Option 3a:

  1. content of merge-pypsa-earth-sec copied into a merge-pypsa-earth-sec-backup;
  2. main merged into merge-pypsa-earth-sec-backup;
  3. a missed commit 75cb757 cherry-picked;
  4. modified main pushed to my origin.

The result seems to be alright:

  1. no differences with main in terms of files content (an empty output of git diff upstream/main HEAD)
  2. the history of commits is preserved, e.g. this commit from pypsa-earth-sec is searchable by git log
  3. history of files is alright (e.g. build_base_energy_totals.py which has been added by the merge).

Due to the issue with GitHub merge commit we discussed previously, the next step must be push of the fixed main to upstream. @davide-f is would be great to have your double-checking for that.

@siddharth-krishna
Copy link
Contributor

@ekatef thanks for the dry run, and checking that the histories are preserved. I took a look at your fork and the branch ekatef/main looks good to me. I agree that the way to go would be to force push the commits on ekatef/main to origin/main. Also sounds like a good idea for someone else to double check everything. :)

🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants