get_new_commits crashes #2
The problem is caused by incorrectly freeing a null pointer on line 102. I'll fix the bug and commit a new version.
Thanks.
Btw, the current fetch does a full clone of the repo. In other words, it does not save any network traffic at all. What would help would be to fetch only the objects associated with new commits.
Hi, Professor! I ran a test of the fetch operation, adding some code to count how many objects were fetched. First, I created an empty directory and used git_remote_fetch to fetch from the remote; I received 21 objects. Then I modified a file on the remote and used git_remote_fetch again; I received 3 objects. So it looks like the fetch operation fetches only the objects associated with new commits.
Great, that's how fetch works with a populated git database. But we have an
I'm sorry, Professor, I don't quite get the point. To get new commit objects, we have to fetch from the remote repository. So what does "looks for objects that do not need to be fetched in an external database" mean? Could you please describe it in more detail? I'm sorry...
Let's consider the example you have above:
Hi, Professor!
The use case is that I have a database of over 11B git objects and would like to update it. In the considered example, all of the 21 objects obtained in the first step are
Oh, I see the key point. You have collected the commits but have no local repository, so when I fetch, there is no local repo to compare against; I need to compare with the commits in the database. I thought you had a local repo... Then I'll need to modify the fetch logic. I'll try!
Thank you. You can assume that any of the existing object hashes and content can be obtained from the database.
I traced the execution of fetch and found that the git_smart__negotiate_fetch function in smart_protocol.c tells the remote what we want and what we have. I modified it to tell the remote that we have the commits we got last time, but later, when execution reaches the git_smart__download_pack function in smart_protocol.c, the fetch operation raises an error! I checked the output and found that the remote did know that we have those commits locally. Then I checked the function and found that, when downloading from the remote, it first creates a writepack from the repo's odb. The odb is the repo's object database, which means that to download the new commits correctly, we must have a local repo...
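A minimal sketch of the request that negotiation produces may help here. The smart protocol frames each line as a pkt-line: four hex digits giving the total length (including those four bytes), then the payload; `0000` is a flush-pkt. The code below builds a simplified single-round request in the v0/v1 upload-pack shape (wants, flush, haves, `done`), assuming 40-character SHA-1 IDs and omitting the capability list that real clients append to the first `want`. `pkt_line`, `append_cmd` and `build_request` are hypothetical helpers, not libgit2 functions. Claiming a `have` only tells the server what it may leave out of the pack, so the client must still be able to read those objects locally when it indexes the pack, which matches the failure described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode one pkt-line: four lowercase hex digits giving the total length
 * (payload plus the 4 length bytes), then the payload itself.
 * Returns the encoded length, or -1 if it does not fit. */
static int pkt_line(char *out, size_t cap, const char *payload)
{
    size_t len = strlen(payload) + 4;
    if (len > 65520 || len + 1 > cap)   /* 65520 = max pkt-line size */
        return -1;
    snprintf(out, cap, "%04zx%s", len, payload);
    return (int)len;
}

/* Append s to out (which already holds *used bytes), keeping it NUL-terminated. */
static int append(char *out, size_t cap, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (*used + n + 1 > cap)
        return -1;
    memcpy(out + *used, s, n + 1);
    *used += n;
    return 0;
}

/* Emit one "<verb> <oid>\n" pkt-line, e.g. "want ..." or "have ...". */
static int append_cmd(char *out, size_t cap, size_t *used,
                      const char *verb, const char *oid)
{
    char payload[64], pkt[72];
    snprintf(payload, sizeof payload, "%s %s\n", verb, oid);
    if (pkt_line(pkt, sizeof pkt, payload) < 0)
        return -1;
    return append(out, cap, used, pkt);
}

/* Simplified single-round request: wants, flush-pkt, haves, "done".
 * Returns the total length, or -1 if cap is too small. */
static int build_request(char *out, size_t cap,
                         const char *const *wants, size_t nwants,
                         const char *const *haves, size_t nhaves)
{
    size_t used = 0, i;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < nwants; i++)
        if (append_cmd(out, cap, &used, "want", wants[i]) < 0)
            return -1;
    if (append(out, cap, &used, "0000") < 0)        /* flush-pkt ends the wants */
        return -1;
    for (i = 0; i < nhaves; i++)
        if (append_cmd(out, cap, &used, "have", haves[i]) < 0)
            return -1;
    if (append(out, cap, &used, "0009done\n") < 0)  /* pkt-line for "done\n" */
        return -1;
    return (int)used;
}
```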
Yes, we need to have a local repo. The question is whether we just need to prepopulate it with the objects we have, or whether we can replace it with an alternative database, e.g., https://github.com/libgit2/libgit2-backends
OK. By the way, what type of database is used: MySQL, Redis or SQLite?
It's a custom database based on tokyocabinet, since none of the above work at that size. You can assume an arbitrary interface to it. The difference from git's native object store is that you need to pass the repo URL (to get the heads associated with a specific repo).
OK. Thank you, Professor!
Also, the type of object needs to be passed.
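The object type matters because git frames every object as a header `<type> <size>` plus a NUL byte, followed by the content, and the object ID is the SHA-1 of that whole byte sequence. The content alone is therefore not enough to re-create or re-hash an object. A small sketch of the header construction (`object_header` and `type_name` are illustrative names; hashing is left to whatever SHA-1 implementation the database already uses):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Git's four object types as named in the object header. */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

static const char *type_name(enum obj_type t)
{
    switch (t) {
    case OBJ_COMMIT: return "commit";
    case OBJ_TREE:   return "tree";
    case OBJ_BLOB:   return "blob";
    case OBJ_TAG:    return "tag";
    }
    return NULL;
}

/* Write "<type> <size>\0" into out. Returns the header length INCLUDING
 * the trailing NUL (that NUL is part of the hashed bytes), or -1. */
static int object_header(char *out, size_t cap, enum obj_type t, size_t size)
{
    const char *name = type_name(t);
    int n;
    assert(out != NULL);
    if (name == NULL)
        return -1;
    n = snprintf(out, cap, "%s %zu", name, size);
    if (n < 0 || (size_t)n + 1 > cap)
        return -1;
    /* snprintf already stored the NUL terminator at out[n]. */
    return n + 1;
}
```

The bytes to hash would then be the header followed immediately by the `size` bytes of content.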
Hi, Professor! Over the past few days I wrote 4 source files to populate blob, tree, commit and tag objects into the odb individually. Then I fetched from the remote. The result is still that it fetches all objects, including those we populated into the odb... It seems that populating the odb doesn't work.
Have you checked them in? So what is the difference between a fetch done on a naturally created repo and on a prepopulated one? Have you set the heads properly? For example, when prepopulating the object store, heads would not necessarily be set. It would be best to compare the exact differences between the two repos; that will give hints as to why fetch behaves differently.
I figured out the cause: when creating the first commit, libgit2 won't create a branch, so I need to create the master branch manually.
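In libgit2 the usual way to do this is `git_branch_create()` or `git_reference_create()`. For comparison, a loose branch ref on disk is simply a file such as `.git/refs/heads/master` containing the 40-character hex commit ID and a newline. The sketch below writes such a file directly; `write_loose_ref` is a hypothetical helper, it assumes the parent directory already exists, and unlike git it does not go through a `<ref>.lock` file and rename.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if s is exactly 40 lowercase hex digits (a SHA-1 object ID). */
static int is_hex_oid(const char *s)
{
    size_t i;
    if (strlen(s) != 40)
        return 0;
    for (i = 0; i < 40; i++)
        if (!isxdigit((unsigned char)s[i]) || isupper((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Write a loose ref file (e.g. <gitdir>/refs/heads/master) holding the
 * hex ID plus a newline. Returns 0 on success, -1 on failure. */
static int write_loose_ref(const char *path, const char *hex_oid)
{
    FILE *f;
    assert(path != NULL && hex_oid != NULL);
    if (!is_hex_oid(hex_oid))
        return -1;               /* refuse to write raw or truncated IDs */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", hex_oid) != 41) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```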
Great! It checks .git/refs/heads/* for the latest commit on each branch and compares them to what is on the server. That way only the differences need to be sent back. Can you prepopulate the repo only with commits, or, preferably, only with the last commit?
Indeed, populating all objects is time-consuming. I'll give it a try.
Hi, Professor! Over the past few days I tried to populate only commit objects. I found that the function git_commit_create_with_signature() could help, so I modified its logic to populate only the commit object. It works: it can populate commit objects, and it can also populate only the last commit. However, I got an error similar to the one I hit earlier, when I modified the fetch logic to look up objects that do not need to be fetched in an external database.
Great. Can you track the attempts to access local objects during fetch, e.g., using trace or adding prints to the relevant functions?
Oh, yes. I'll try to find the corresponding logic. Thank you for the hint!
Hi, Professor!
The external database contains the SHAs of all objects. Can you stop the fetch once it downloads the pack file? This talks a bit about the pack file format. In other words, can you write a function that... I think the packfile may contain everything that is needed.
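As a starting point for inspecting a saved pack: its first 12 bytes are the signature `PACK`, a 4-byte version number, and a 4-byte object count, all in network byte order, and a 20-byte SHA-1 trailer closes the file. The sketch below reads that header, e.g. to check how many objects a fetch actually received; `parse_pack_header` is an illustrative name, not a libgit2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 4-byte big-endian (network order) unsigned integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the 12-byte packfile header: "PACK", version, object count.
 * Returns 0 and fills *version / *nobjects on success, -1 otherwise. */
static int parse_pack_header(const unsigned char *buf, size_t len,
                             uint32_t *version, uint32_t *nobjects)
{
    assert(version != NULL && nobjects != NULL);
    if (len < 12 || memcmp(buf, "PACK", 4) != 0)
        return -1;
    *version = be32(buf + 4);
    if (*version != 2 && *version != 3)   /* versions git accepts */
        return -1;
    *nobjects = be32(buf + 8);
    return 0;
}
```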
I get the point. I need to capture what the fetch operation receives from the remote. Maybe I can write the data to a file.
Try to save it first (it might actually save it on its own). I can help you with
I checked the library code but didn't find any code that saves it. When fetching, libgit2 receives data from the remote and stores it in a struct git_pkt_data; it keeps receiving until all the data has been downloaded. Then libgit2 decompresses the data. The missing delta bases error occurred while decompressing it.
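One plausible reading of the missing delta bases error, assuming the server sent a thin pack (which it may do once the client advertises haves): each pack entry starts with a variable-length header whose first byte holds a continuation bit, a 3-bit type and the low 4 bits of the size, and a REF_DELTA entry (type 7) is followed by the 20-byte ID of its base object. If that base is neither in the pack nor in the local odb, the delta cannot be applied. The sketch decodes an entry header and extracts a REF_DELTA base ID; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack entry type codes (5 is reserved). */
enum pack_obj {
    PACK_COMMIT = 1, PACK_TREE = 2, PACK_BLOB = 3, PACK_TAG = 4,
    PACK_OFS_DELTA = 6, PACK_REF_DELTA = 7
};

/* Decode an entry header: bit 7 = "more bytes follow", bits 6-4 = type,
 * bits 3-0 = low size bits; each further byte adds 7 size bits.
 * Returns bytes consumed, or 0 on truncated/oversized input. */
static size_t parse_entry_header(const unsigned char *p, size_t len,
                                 int *type, uint64_t *size)
{
    size_t i = 0;
    unsigned shift = 4;
    unsigned char c;
    assert(type != NULL && size != NULL);
    if (len == 0)
        return 0;
    c = p[i++];
    *type = (c >> 4) & 7;
    *size = c & 0x0f;
    while (c & 0x80) {
        if (i >= len || shift > 57)
            return 0;
        c = p[i++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return i;
}

/* For a REF_DELTA entry, copy the 20-byte base object ID that follows the
 * header. Returns 1 for REF_DELTA, 0 for other types, -1 on bad input. */
static int ref_delta_base(const unsigned char *p, size_t len,
                          unsigned char base_out[20])
{
    int type;
    uint64_t size;
    size_t n = parse_entry_header(p, len, &type, &size);
    if (n == 0)
        return -1;
    if (type != PACK_REF_DELTA)
        return 0;
    if (len - n < 20)
        return -1;
    memcpy(base_out, p + n, 20);
    return 1;
}
```

Collecting the base IDs returned here and checking them against the external database would show which bases the pack expects the client to already have.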
I found the bug. It was caused by the input; the input should follow this format:
I created a realistic test case of updating one Linux kernel repo: Btw, not all commits in tst/linux-stable.heads.1555007357 are in the cloned repo.
|
According to the output, I think the problem is the ls.bin file. I downloaded it but cannot open it locally. I converted it to UTF-8, but it still shows garbled text (cat shows the same thing). So I am wondering whether it was produced correctly. The earlier .bin file (linux-stable.bin) opens correctly.
|
Here are the results on the uncompressed version:
|
OK, I'll try to find the error.
Here is an update (nothing should be retrieved as of now) as the
|
|
No seg fault, but it still does not work:
however, ff7e9697c5c9c4d8b6521e3b6a18669fbecdba7f is in tst/1556204601.bin (btw, tst/1556204601.bin is uncompressed; should it be compressed now?)
|
I committed the fixed 1556204601.idx and tst.idx.
Glad to hear that it works. Next, I will change batch_fetch to read compressed data, so could you upload a test case: a file containing compressed data and a file containing the indexes?
Another issue: does nothing, even though the upstream repo has new commits, e.g., |
@KayGau Here is a test case that crashes. Can you take a look at what is going on here?
OK. Sorry, I was busy with my graduation project over the last few weeks. I will fix it as soon as possible!
Hi, Audris.
The error was caused by using the wrong method to write the sha1 value into the .git/refs/heads directory. I have fixed the bug.
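The comment does not say what the wrong method was. One pitfall worth guarding against is that a ref file must hold the 40-character hex form of the ID, while libgit2's `git_oid` stores the raw 20 bytes (libgit2 itself formats with `git_oid_fmt()` or `git_oid_tostr()`). A standalone sketch of both conversions; `oid_to_hex` and `hex_to_oid` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a raw 20-byte SHA-1 into 40 lowercase hex chars plus NUL,
 * the form a loose ref file stores. */
static void oid_to_hex(const unsigned char raw[20], char hex[41])
{
    static const char digits[] = "0123456789abcdef";
    size_t i;
    assert(raw != NULL && hex != NULL);
    for (i = 0; i < 20; i++) {
        hex[2 * i]     = digits[raw[i] >> 4];
        hex[2 * i + 1] = digits[raw[i] & 0x0f];
    }
    hex[40] = '\0';
}

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse exactly 40 hex chars into 20 raw bytes. Returns 0, or -1 if the
 * string is too short, too long, or contains a non-hex character. */
static int hex_to_oid(const char *hex, unsigned char raw[20])
{
    size_t i;
    for (i = 0; i < 20; i++) {
        int hi = hexval(hex[2 * i]), lo;
        if (hi < 0)
            return -1;
        lo = hexval(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        raw[i] = (unsigned char)(hi << 4 | lo);
    }
    return hex[40] == '\0' ? 0 : -1;
}
```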
I also checked this issue. Using the latest batch_fetch, it correctly populates all objects provided by ls.1556989710.bin into an empty repository. But an error occurred:
I found that it was not a head. As I remember, all these 'heads' are obtained using the git_get_last function. I tested this function and found that it gets all of the remote's refs, not only heads. In general, it returns the following:
I reread libgit2's fetch implementation. It first receives all remote refs and checks whether each of them is in the local repo. So I think I should modify batch_fetch:
|
Wonderful investigation! I am currently selecting heads via git ls-remote and no longer using git_get_last.
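For selecting heads from `git ls-remote` output: each line is `<sha>`, a TAB, then `<refname>`, and without `--heads` the listing also includes tags (plus peeled `^{}` entries), remotes, notes and, on GitHub, `refs/pull/...`. The sketch splits one line in place and classifies the ref by its namespace prefix; `parse_ls_remote_line` and `classify_ref` are illustrative names.

```c
#include <assert.h>
#include <string.h>

enum ref_kind { REF_HEAD, REF_TAG, REF_REMOTE, REF_NOTE, REF_OTHER };

/* Classify a ref name by its namespace prefix. */
static enum ref_kind classify_ref(const char *name)
{
    if (strncmp(name, "refs/heads/", 11) == 0)   return REF_HEAD;
    if (strncmp(name, "refs/tags/", 10) == 0)    return REF_TAG;
    if (strncmp(name, "refs/remotes/", 13) == 0) return REF_REMOTE;
    if (strncmp(name, "refs/notes/", 11) == 0)   return REF_NOTE;
    return REF_OTHER;   /* HEAD, refs/pull/..., etc. */
}

/* Split one ls-remote line "<40-hex sha>\t<refname>\n" in place.
 * Returns 0 and sets *sha / *name on success, -1 if malformed. */
static int parse_ls_remote_line(char *line, char **sha, char **name)
{
    char *tab;
    size_t n;
    assert(line != NULL && sha != NULL && name != NULL);
    tab = strchr(line, '\t');
    if (tab == NULL || tab - line != 40)
        return -1;
    *tab = '\0';
    *sha = line;
    *name = tab + 1;
    n = strlen(*name);
    if (n > 0 && (*name)[n - 1] == '\n')
        (*name)[n - 1] = '\0';      /* strip the trailing newline */
    return **name != '\0' ? 0 : -1;
}
```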
To test the first approach, I can produce ls.heads.1556989710 containing not just commits but also all the tags.
For approach 1, if we exclude them, it will try to get them again.
So you don't think you can change git fetch to send back the tags/remotes/notes it receives from the remote together with the local heads? Otherwise, what needs to be stored for the tags/remotes/notes: just the sha1 or the content itself? |
Libgit2 will send back the tags/remotes/notes it receives from the remote together with the local heads after checking. |
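A simplified model of the checking step described here: for each ref the remote advertises, look its target ID up in the set of IDs known locally, and request (want) only those that are missing. The sketch uses a sorted array and `bsearch` as a stand-in for the lookup; with the external database, that lookup would instead be a query against it (a hypothetical interface). This is an illustration, not libgit2's actual code, and `struct remote_ref` / `mark_wants` are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct remote_ref {
    const char *name;   /* e.g. "refs/heads/master" */
    const char *oid;    /* 40-char hex target */
    int want;           /* set to 1 if we must fetch it */
};

/* qsort/bsearch comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Mark every advertised ref whose target is not already known locally.
 * `known` must be sorted with qsort(..., cmp_str). Returns the number of wants. */
static size_t mark_wants(struct remote_ref *refs, size_t nrefs,
                         const char **known, size_t nknown)
{
    size_t i, nwant = 0;
    assert(refs != NULL || nrefs == 0);
    for (i = 0; i < nrefs; i++) {
        const char *key = refs[i].oid;
        refs[i].want = bsearch(&key, known, nknown, sizeof *known, cmp_str) == NULL;
        nwant += (size_t)refs[i].want;
    }
    return nwant;
}
```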
Would git ls-remote give at least the sha1s of all the needed pieces?
OK, I found a description of git notes: I don't think there is any need to store them, as they can be obtained from scratch during a fetch. In other words, prepopulate the repo only with commits and, perhaps, tags.
According to libgit2's implementation, it first checks all the refs the remote sends back and marks the ones the local repo doesn't have. Then it sends all the checked remote refs, together with the local refs, to the remote. I tried prepopulating only commit objects, and it works most of the time. But there are some cases where it behaves incorrectly (I constructed these cases manually):
|
Does that mean that if both commits and tags are prepopulated, it would work fine? I am not sure how to prepopulate notes and remotes.
I will try to prepopulate commits and tags to see if it works. Prepopulating notes and remotes is the same as prepopulating commits and tags.
Audris, Libgit2's fetch is more complicated than I thought. Below is what I found:
|
Thank you for clarifying. I am storing the entire packed-refs file since only that part of the git repo is retrieved via
In that case, the following command is used to update:
I am not sure if a similar command exists in libgit2 or if it works differently from fetch.
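For reference, a packed-refs file is plain text: an optional `# pack-refs with: ...` header, then one `<sha> <refname>` line per ref, and, for an annotated tag, a following `^<sha>` line giving the peeled commit. Git gives a loose file under `refs/` precedence over a packed-refs entry of the same name. A minimal parser sketch with fixed-size buffers; `struct packed_ref` and `parse_packed_refs` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

struct packed_ref {
    char oid[41];
    char peeled[41];    /* empty unless a "^<oid>" line followed */
    char name[256];
};

/* Parse packed-refs text into at most `cap` entries. Lines starting with
 * '#' are headers; "^<oid>" peels the tag on the previous line.
 * Returns the number of refs, or -1 on a malformed line or overflow. */
static int parse_packed_refs(const char *text, struct packed_ref *out, int cap)
{
    int n = 0;
    const char *p = text;
    assert(text != NULL && out != NULL);
    while (*p) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        if (len == 0 || p[0] == '#') {
            /* blank or header line: skip */
        } else if (p[0] == '^') {
            if (n == 0 || len != 41)
                return -1;
            memcpy(out[n - 1].peeled, p + 1, 40);
            out[n - 1].peeled[40] = '\0';
        } else {
            size_t name_len;
            if (n == cap || len < 42 || p[40] != ' ')
                return -1;
            name_len = len - 41;
            if (name_len >= sizeof out[n].name)
                return -1;
            memcpy(out[n].oid, p, 40);
            out[n].oid[40] = '\0';
            memcpy(out[n].name, p + 41, name_len);
            out[n].name[name_len] = '\0';
            out[n].peeled[0] = '\0';
            n++;
        }
        p += len;
        if (*p == '\n')
            p++;
    }
    return n;
}
```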
|
I have figured out how to populate refs/notes/ and refs/remotes/. And I will complete a new batch_fetch.c soon.
|
echo tst,https://github.com/ssc-oscar/tst,968cdcf2e6b22fd5f8f95f2c8666f1a976fac0c7,968cdcf2e6b22fd5f8f95f2c8666f1a976fac0c7 | /usr/bin/get_new_commits
path: tst
url: https://github.com/ssc-oscar/tst
new head: 968cdcf2e6b22fd5f8f95f2c8666f1a976fac0c7
old head: 968cdcf2e6b22fd5f8f95f2c8666f1a976fac0c7
no update!
Segmentation fault
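The thread does not show the fix; it only attributes the crash to an incorrect free (line 102). Note that `free(NULL)` is a no-op in standard C, so crashes of this kind often come from freeing a pointer that was never initialized or was already freed on the "no update!" path. A common defensive pattern is to start every pointer at NULL and release everything on a single exit path. The sketch below parses one input line and decides whether an update is needed; it assumes the field order path,url,new_head,old_head as printed above (the example cannot confirm the order, since both heads are equal), and `parse_req` / `check_update` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: field order path,url,new_head,old_head. */
struct update_req {
    char *path, *url, *new_head, *old_head;
};

/* Split a comma-separated line in place. Returns 0, or -1 if fewer than
 * four fields are present. */
static int parse_req(char *line, struct update_req *r)
{
    char *f[4];
    int i;
    line[strcspn(line, "\r\n")] = '\0';   /* drop the trailing newline */
    f[0] = line;
    for (i = 1; i < 4; i++) {
        char *comma = strchr(f[i - 1], ',');
        if (comma == NULL)
            return -1;
        *comma = '\0';
        f[i] = comma + 1;
    }
    r->path = f[0];
    r->url = f[1];
    r->new_head = f[2];
    r->old_head = f[3];
    return 0;
}

/* Returns 1 if an update is needed, 0 if the heads match, -1 on error.
 * Every pointer starts as NULL and is released on one exit path. */
static int check_update(const char *input)
{
    char *copy = NULL;
    struct update_req r;
    int rc = -1;
    assert(input != NULL);
    copy = malloc(strlen(input) + 1);
    if (copy == NULL)
        goto out;
    strcpy(copy, input);
    if (parse_req(copy, &r) != 0)
        goto out;
    rc = strcmp(r.new_head, r.old_head) != 0;   /* 0 means "no update!" */
out:
    free(copy);                                 /* safe even when NULL */
    return rc;
}
```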