On-disk cache would be nice, to avoid excessive memory use #9

Open
GoogleCodeExporter opened this issue Nov 1, 2015 · 3 comments

@GoogleCodeExporter

- What steps will reproduce the problem?
1. Run against file tree containing millions of files
2. Watch memory use eventually grow to several hundred MB

- What is the expected output? What do you see instead?
I would like hardlink.py's memory use to remain moderate.  Instead it will
eventually use all the RAM of the small virtual machines I use for backing
things up.

- What version of the product are you using? On what operating system?
hardlink.py: 0.05 - 2010-01-07 (07-Jan-2010), Debian Lenny

- Please provide any additional information below.
hardlink.py keeps its cache of file data in memory, but for directory
trees containing tens of millions of files (such as those I have from
backing up other machines) that cache does not fit in a moderate amount of RAM.

It would be nice to optionally be able to use an on-disk cache file of some
sort so that it doesn't need to keep it all in RAM.  This should be
optional and probably non-default because it will surely be slower.

Cheers,
Andy

Original issue reported on code.google.com by [email protected] on 31 Jan 2010 at 4:06

@GoogleCodeExporter

Currently having a look at this. I initially did a small patch to convert the
file_hashes dictionary into a shelve "dictionary"-like object, but it choked on
long integers as keys [0]. I'm unsure whether to make my patch [1] bigger, or
file a bug against shelve to make it deal with long keys properly.

[0] http://pastebin.com/m5f13fa62
[1] http://pastebin.com/f5b529dfb

Original comment by jshholland on 4 Feb 2010 at 10:15
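
For readers following along: keys in a shelve database must be strings, so integer hash keys need converting before they are stored. The following is a minimal sketch, assuming Python 3 and a store keyed by integer hash; the helper names `shelf_key` and `demo_shelve_cache` are hypothetical and are not taken from the patch in [1].

```python
import os
import shelve
import tempfile


def shelf_key(file_hash):
    """Convert an integer hash to the string key that shelve requires."""
    return str(file_hash)


def demo_shelve_cache():
    """Store and read back one entry under a very large integer hash."""
    # Hypothetical cache location; a real tool would take this from an option.
    path = os.path.join(tempfile.mkdtemp(), "file_hashes")
    big_hash = 2**70 + 3  # deliberately larger than a 64-bit integer
    with shelve.open(path, flag="n") as file_hashes:
        file_hashes[shelf_key(big_hash)] = ["a/file1", "b/file2"]
        return file_hashes[shelf_key(big_hash)]
```

Converting keys at the call site keeps the change small, at the cost of touching every place that indexes the dictionary.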

@GoogleCodeExporter

I have now written a new patch that wraps the anydbm module (working at a lower
level than shelve). Limited testing appears to show that it works, but it has
yet to be used on a large tree. It should be self-contained.

Original comment by jshholland on 16 May 2010 at 12:20
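
The attached patch is not reproduced here, so the following is only a rough sketch of what a dict-like wrapper over a dbm file could look like. It assumes Python 3, where anydbm was renamed to `dbm`, and it assumes values are lists of paths that can be serialized as JSON. The class name `DiskDict` and its methods are hypothetical.

```python
import dbm
import json
from collections.abc import MutableMapping


class DiskDict(MutableMapping):
    """Hypothetical on-disk mapping from integer keys to JSON-serializable values."""

    def __init__(self, name):
        self.name = name
        # 'n' always creates a new, empty database at this path.
        self.db = dbm.open(self.name, "n")

    @staticmethod
    def _encode_key(key):
        # dbm stores bytes, so integer keys are written as ASCII digits.
        return str(key).encode("ascii")

    def __getitem__(self, key):
        return json.loads(self.db[self._encode_key(key)].decode("utf-8"))

    def __setitem__(self, key, value):
        self.db[self._encode_key(key)] = json.dumps(value).encode("utf-8")

    def __delitem__(self, key):
        del self.db[self._encode_key(key)]

    def __iter__(self):
        return (int(k.decode("ascii")) for k in self.db.keys())

    def __len__(self):
        return len(self.db.keys())

    def close(self):
        self.db.close()
```

Because the class implements the `MutableMapping` interface, code written against a plain dictionary (`in`, indexing, iteration) should work unchanged, though every lookup now goes to disk and will be slower.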


@GoogleCodeExporter

jshholland, I applied your patch and it appears to be working well on the first try.

I had to modify it slightly to get it working.

I changed

self.db = anydbm.open(name, 'n')

to

self.db = anydbm.open(self.name, 'n')

Original comment by [email protected] on 10 Nov 2010 at 12:05
