- What steps will reproduce the problem?
1. Run against file tree containing millions of files
2. Watch memory use eventually grow to several hundred MB
- What is the expected output? What do you see instead?
I would like hardlink.py's memory use to remain moderate. Instead it will
eventually use all the RAM of the small virtual machines I use for backing
things up.
- What version of the product are you using? On what operating system?
hardlink.py: 0.05 - 2010-01-07 (07-Jan-2010), Debian Lenny
- Please provide any additional information below.
hardlink.py tries to keep its cache of file data in memory, but on
directory trees containing 10s of millions of files (such as those I have
from backing up other machines) this is difficult to fit in moderate RAM.
It would be nice to optionally be able to use an on-disk cache file of some
sort so that it doesn't need to keep it all in RAM. This should be
optional and probably non-default because it will surely be slower.
Cheers,
Andy
Original issue reported on code.google.com by [email protected] on 31 Jan 2010 at 4:06
Currently having a look at this. I initially did a small patch to convert the
file_hashes dictionary into a shelve dictionary-like object, but it choked on
long integers as keys [0]. I'm unsure whether to make my patch [1] bigger, or
file a bug against shelve to make it deal with long keys properly.
[0] http://pastebin.com/m5f13fa62
[1] http://pastebin.com/f5b529dfb
Original comment by jshholland on 4 Feb 2010 at 10:15
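For anyone hitting the same problem: shelve expects string keys, so one possible workaround is to convert each integer hash to a string before using it as a key. The following is only a minimal sketch of that idea, not the patch linked above. It is written for Python 3, and the names hash_key and demo, the file name "file_hashes", and the example paths are made up for illustration.

```python
import os
import shelve
import shutil
import tempfile


def hash_key(file_hash):
    """Convert an integer hash (hypothetical helper) into a string key for shelve."""
    return str(file_hash)


def demo():
    """Store and read back a list of paths under a very large integer hash."""
    tmpdir = tempfile.mkdtemp()
    try:
        # "file_hashes" is an assumed file name, chosen to match the dictionary
        # mentioned in the comment above.
        db = shelve.open(os.path.join(tmpdir, "file_hashes"))
        try:
            big_hash = 2 ** 70 + 12345  # far larger than a machine-sized int
            db[hash_key(big_hash)] = ["/backup/a", "/backup/b"]
            return db[hash_key(big_hash)]
        finally:
            db.close()
    finally:
        shutil.rmtree(tmpdir)
```

The cost of this approach is that every lookup has to go through the same conversion, so it only works if all accesses to the dictionary are routed through the helper.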
I have now written a new patch that wraps the anydbm module (working at a lower
level than shelve). Limited testing appears to show that it works, but it has
yet to be used on a large tree. It should be self-contained.
Original comment by jshholland on 16 May 2010 at 12:20
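To illustrate the general shape of such a wrapper, here is a rough sketch of a dictionary-like class on top of a dbm database. It is not the patch itself, which is not reproduced in this thread. It assumes Python 3, where the anydbm module is named dbm, and the class name DiskCache, the helper _key, and the example path are hypothetical. Keys are stringified and values are pickled, because dbm databases only store bytes.

```python
import dbm
import os
import pickle
import shutil
import tempfile


class DiskCache(object):
    """Hypothetical dict-like wrapper that keeps its entries in a dbm file."""

    def __init__(self, name):
        self.name = name
        # 'n' always creates a new, empty database at this path.
        self.db = dbm.open(self.name, "n")

    @staticmethod
    def _key(key):
        # dbm stores bytes, so integer hashes are stringified and encoded.
        return str(key).encode("ascii")

    def __setitem__(self, key, value):
        self.db[self._key(key)] = pickle.dumps(value)

    def __getitem__(self, key):
        return pickle.loads(self.db[self._key(key)])

    def __contains__(self, key):
        return self._key(key) in self.db

    def close(self):
        self.db.close()


def demo():
    """Store one entry under a large integer key and read it back."""
    tmpdir = tempfile.mkdtemp()
    try:
        cache = DiskCache(os.path.join(tmpdir, "hardlink_cache"))
        key = 2 ** 70 + 1
        cache[key] = ["/backup/host1/etc/passwd"]  # made-up example path
        found = key in cache
        value = cache[key]
        cache.close()
        return found, value
    finally:
        shutil.rmtree(tmpdir)
```

Because every read and write goes through pickle and the disk, this will be slower than a plain in-memory dictionary, which matches the original request that the on-disk cache be optional.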
jshholland, I applied your patch and it appears to be working well on first
try, though I had to modify it slightly to get it working. I changed
    self.db = anydbm.open(name, 'n')
to
    self.db = anydbm.open(self.name, 'n')