On-disk cache would be nice, to avoid excessive memory use #9

GoogleCodeExporter · 2015-11-01T04:58:15Z

- What steps will reproduce the problem?
1. Run against file tree containing millions of files
2. Watch memory use eventually grow to several hundred MB

- What is the expected output? What do you see instead?
I would like hardlink.py's memory use to remain moderate.  Instead it will
eventually use all the RAM of the small virtual machines I use for backing
things up.

- What version of the product are you using? On what operating system?
hardlink.py: 0.05 - 2010-01-07 (07-Jan-2010), Debian Lenny

- Please provide any additional information below.
hardlink.py tries to keep its cache of file data in memory, but on
directory trees containing tens of millions of files (such as those I have
from backing up other machines) the cache does not fit in a moderate amount of RAM.

It would be nice to optionally be able to use an on-disk cache file of some
sort so that it doesn't need to keep it all in RAM.  This should be
optional and probably non-default because it will surely be slower.
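
A minimal sketch of what "optional and non-default" could look like. The `--cache-file` flag and the `open_cache` helper are hypothetical names chosen for illustration and are not part of hardlink.py; the sketch only assumes the cache is accessed through the ordinary dictionary interface.

```python
import argparse
import shelve


def open_cache(cache_path=None):
    """Return a plain dict by default, or a disk-backed shelf if a path is given.

    Hypothetical helper: hardlink.py does not define this function.
    Note that a shelf only accepts string keys, unlike a dict.
    """
    if cache_path is None:
        return {}  # default: fast, but everything stays in RAM
    return shelve.open(cache_path, flag="n")  # "n": always start with an empty cache


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="on-disk cache option (sketch)")
    # Hypothetical flag name; off unless the user asks for it.
    parser.add_argument("--cache-file", default=None,
                        help="keep the file cache on disk at this path (slower)")
    parser.add_argument("directories", nargs="*")
    return parser.parse_args(argv)
```

With no flag the behaviour stays as it is today; passing a path trades speed for a smaller memory footprint.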

Cheers,
Andy

Original issue reported on code.google.com by [email protected] on 31 Jan 2010 at 4:06


GoogleCodeExporter · 2015-11-01T04:58:15Z

Currently having a look at this. I initially wrote a small patch to convert the
file_hashes dictionary into a dictionary-like shelve object, but it choked on
long integers as keys [0]. I'm unsure whether to make my patch [1] bigger or
to file a bug against shelve so that it deals with long keys properly.

[0] http://pastebin.com/m5f13fa62
[1] http://pastebin.com/f5b529dfb
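
For context, shelve documents its keys as ordinary strings, which is presumably why integer keys fail; the pastebin output is not reproduced here, so that is an assumption. A minimal workaround sketch, written for Python 3, converts each integer key to a string before it touches the shelf. The helper names `int_key`, `store`, and `load` are made up for this example.

```python
import shelve


def int_key(n):
    """Encode an integer (of any size) as the string key shelve expects."""
    return str(n)


def store(shelf, n, value):
    """Save value under integer key n."""
    shelf[int_key(n)] = value


def load(shelf, n):
    """Fetch the value saved under integer key n (raises KeyError if absent)."""
    return shelf[int_key(n)]


def demo(path):
    """Round-trip one entry with a key too large for a machine word."""
    with shelve.open(path, flag="n") as file_hashes:
        store(file_hashes, 2 ** 70, ["a/file", "b/file"])
        return load(file_hashes, 2 ** 70)
```

This keeps the patch small, at the cost of a string conversion on every lookup.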

Original comment by jshholland on 4 Feb 2010 at 10:15

GoogleCodeExporter · 2015-11-01T04:58:15Z

I have now written a new patch that wraps the anydbm module (working at a lower
level than shelve). Limited testing suggests that it works, but it has yet to be
used on a large tree. It should be self-contained.
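
The attached patch is not reproduced in this thread, so the following is only a sketch of the general idea and not the patch itself. It assumes integer keys and picklable values, and it uses Python 3, where anydbm was renamed to dbm. The class name `DiskCache` is hypothetical; the attributes `self.name` and `self.db` follow the line quoted later in this thread.

```python
import dbm
import pickle
from collections.abc import MutableMapping


class DiskCache(MutableMapping):
    """Dictionary-like wrapper that keeps its entries in a dbm file."""

    def __init__(self, name):
        self.name = name
        self.db = dbm.open(self.name, "n")  # "n": create a new, empty database

    @staticmethod
    def _encode_key(key):
        # dbm stores bytes, so integer keys are written as ASCII digits.
        return str(key).encode("ascii")

    def __getitem__(self, key):
        try:
            return pickle.loads(self.db[self._encode_key(key)])
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        self.db[self._encode_key(key)] = pickle.dumps(value)

    def __delitem__(self, key):
        del self.db[self._encode_key(key)]

    def __iter__(self):
        for raw in self.db.keys():
            yield int(raw.decode("ascii"))

    def __len__(self):
        return len(self.db)

    def close(self):
        self.db.close()
```

One caveat of this design: values are copied out on every read, so mutating a fetched list in place does not change the stored entry. The caller has to assign the modified value back, for example `entry = cache[k]; entry.append(x); cache[k] = entry`.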

Original comment by jshholland on 16 May 2010 at 12:20

Attachments:

cache.patch

GoogleCodeExporter · 2015-11-01T04:58:15Z

jshholland, I applied your patch and it appears to be working well on the first try.

I had to modify it slightly to get it working.

I changed

self.db = anydbm.open(name, 'n')

to

self.db = anydbm.open(self.name, 'n')

Original comment by [email protected] on 10 Nov 2010 at 12:05

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Nov 1, 2015
