tarball: Streaming image parser PoC #1429
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1429 +/- ##
=======================================
Coverage 73.40% 73.40%
=======================================
Files 115 115
Lines 8757 8804 +47
=======================================
+ Hits 6428 6463 +35
- Misses 1688 1696 +8
- Partials 641 645 +4
pkg/v1/tarball/image.go
Outdated
@@ -26,8 +26,10 @@ import (
	"path"
	"path/filepath"
	"sync"
	"unsafe"
Is there a library to track the size of all data referenced by this struct? unsafe.Sizeof returns the value 40 regardless of how many megabytes are already stored in the content and headers maps.
type TarBuffered struct {
tf *tar.Reader
pos int
headers map[int]*tar.Header
content map[int][]byte
EOF bool
}
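That 40 bytes is just the fixed struct header (pointer, int, two map headers, padded bool); unsafe.Sizeof never follows references. One way to answer the question is to count the buffered payloads manually. A minimal sketch, assuming a hypothetical BufferedBytes helper that is not part of the PR:

```go
package main

import "fmt"

// TarBuffered mirrors the struct from the PR; only the fields needed
// for size accounting are shown here.
type TarBuffered struct {
	pos     int
	content map[int][]byte
	EOF     bool
}

// BufferedBytes is a hypothetical helper (not in the PR): it walks the
// content map and sums the payload lengths, which is exactly the part
// unsafe.Sizeof cannot see (Sizeof only measures the map headers).
func (t *TarBuffered) BufferedBytes() int {
	total := 0
	for _, b := range t.content {
		total += len(b)
	}
	return total
}

func main() {
	t := &TarBuffered{content: map[int][]byte{
		0: make([]byte, 1<<20), // a 1 MiB buffered section
		1: make([]byte, 512),
	}}
	fmt.Println(t.BufferedBytes()) // prints 1049088 (1048576 + 512)
}
```

A precise deep-size library would also have to account for the tar.Header values; for a soft limit, summing the content slices is usually the dominant term.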
Given this code structure, I don't see an efficient way to avoid high memory usage without restructuring the image format itself; see opencontainers/image-spec#936.
This doesn't pass export from a stdin stream on my machine. Need to write that test.
Added #1436 that tests
TarBuffered scans the stream (`io.Reader`) once for a filename and saves unused sections in memory for later access. This should speed up parsing a bit, because right now the tarball is scanned several times, and it should save resources and time when parsing well-formed images from the network. See google#1339.
Go subclassing with partials is rather hardcore here
@@ -220,6 +221,7 @@
// If a caller doesn't read the full contents, they should Close it to free up
// resources used during extraction.
func Extract(img v1.Image) io.ReadCloser {
	logs.Debug.Printf("mutate: extract %T", img)
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information
Not sure how dumping the type of v1.Image gives access to AuthConfig.
I'm not sure how that could happen either, but I can tell you this debug logging will probably be very noisy and likely not very useful. If it's helpful while working on this change that's fine, but we should remove it before reviewing and merging.
Do you have any tools to replace Debug prints when figuring out how Go code works? I am doing a "pen and paper" reconstruction of the calls, which is not very efficient.
I typically just rely on logging and pen and paper debugging, but I've heard good things about delve for interactive debugging, which is also integrated into things like VSCode.
Yeah, this may just be a limitation of the format, and that format is unfortunately unlikely to change any time soon. Attempts have been made; the most successful is estargz, which abuses the tar format to allow random-access reads. That wouldn't help in your case, though, since you can't guarantee the input will be in the estargz format. Would it be reasonable to enforce some soft memory limit and fail if asked to buffer more than that during extraction? This could be configured with an env var, maybe.
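The suggested soft limit could look something like the sketch below. The env var name, the default cap, and the bufferSection helper are all assumptions for illustration; none of this exists in go-containerregistry:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// maxBufferBytes reads a soft cap from a hypothetical env var
// (the variable name is an assumption, not an existing knob).
func maxBufferBytes() int64 {
	const defaultCap = 64 << 20 // 64 MiB; arbitrary default for the sketch
	v := os.Getenv("GGCR_TARBALL_BUFFER_LIMIT")
	if v == "" {
		return defaultCap
	}
	n, err := strconv.ParseInt(v, 10, 64)
	if err != nil || n <= 0 {
		return defaultCap
	}
	return n
}

// bufferSection accounts for one buffered tar section and fails fast
// instead of growing past the cap, as the comment above suggests.
func bufferSection(buffered int64, section []byte, limit int64) (int64, error) {
	if buffered+int64(len(section)) > limit {
		return buffered, fmt.Errorf("tarball buffer limit of %d bytes exceeded", limit)
	}
	return buffered + int64(len(section)), nil
}

func main() {
	limit := maxBufferBytes()
	used, err := bufferSection(0, make([]byte, 1024), limit)
	fmt.Println(used, err) // prints: 1024 <nil>
}
```

Failing fast keeps the streaming path predictable: callers hit a clear error instead of unbounded memory growth on adversarial or badly ordered tarballs.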
This Pull Request is stale because it has been open for 90 days with |
The thought that I badly need the money that comes with a job keeps me from completing this PR. It's somewhat hard to process the code in a stressed state of mind, but I will try to finish this to put on a resume or something. In the meantime I found another contender for a static network-sync file format, https://github.com/zchunk/zchunk, which unlike