Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add bitset.size() and version to index file header #69

Merged
merged 4 commits into from
Dec 15, 2020
Merged

Conversation

staltz
Copy link
Member

@staltz staltz commented Dec 14, 2020

For issues #8 and #65, this adds two new fields to the file header: version and bitsetSize. Note it's a breaking change for any system currently deploying jitdb indexes.

So far there is no bitset.size() optimization (that old idea of not storing indexes that are too small), but if we ever want to implement that idea, this new header would make it easier.

@staltz staltz requested a review from arj03 December 14, 2020 13:07
files.js Show resolved Hide resolved
index.js Show resolved Hide resolved
@arj03
Copy link
Member

arj03 commented Dec 14, 2020

Had a crash when running example with this PR

@staltz
Copy link
Member Author

staltz commented Dec 14, 2020

What was the crash? And did you delete all index files so they get recreated with the new headers?

@arj03
Copy link
Member

arj03 commented Dec 14, 2020

Somewhere in slowEqual. There is something there if you specify a path it can't find I think. This is unrelated to this PR, but something that would be nice to fix.

@github-actions
Copy link

  • Load core indexes: 6ms
  • Query one huge index (first run): 838ms
  • Query one huge index (second run): 302ms
  • Query three indexes (first run): 523ms
  • Query three indexes (second run): 250ms
  • Paginate one huge index: 255ms

@arj03
Copy link
Member

arj03 commented Dec 15, 2020

To reproduce the problem comment out the first example in example.js, the one with the ship. Then make sure indexes are deleted and run node example. Then second run will fail with:

internal/validators.js:92
        throw new ERR_OUT_OF_RANGE(name, 'an integer', value);
        ^

RangeError [ERR_OUT_OF_RANGE]: The value of "sourceStart" is out of range. It must be an integer. Received NaN
    at Buffer.compare (buffer.js:877:5)
    at Object.compareString (/home/arj/dev/jitdb/node_modules/bipf/index.js:326:17)
    at checkEqual (/home/arj/dev/jitdb/index.js:246:12)
    at updateIndexValue (/home/arj/dev/jitdb/index.js:302:32)
    at Object.write (/home/arj/dev/jitdb/index.js:362:16)
    at Stream._writeToSink (/home/arj/dev/jitdb/node_modules/async-flumelog/stream.js:57:32)
    at Stream._handleBlock (/home/arj/dev/jitdb/node_modules/async-flumelog/stream.js:91:12)
    at /home/arj/dev/jitdb/node_modules/async-flumelog/stream.js:133:14
    at Request._callback (/home/arj/dev/jitdb/node_modules/async-flumelog/index.js:101:9)
    at Request.callback (/home/arj/dev/jitdb/node_modules/random-access-storage/index.js:173:8) {
  code: 'ERR_OUT_OF_RANGE'
}

@staltz
Copy link
Member Author

staltz commented Dec 15, 2020

Updated this PR to not save bitsetSize, but note that this PR now fails tests.

There's one big change I made, which was the header fields version, seq, and count are now all UInt32LE. They used to be Int32LE. The tests are failing because some indexes are being saved with seq=-1. On the other hand, we want to not save empty indexes! See #71.

So that's what I'm gonna do in this PR, tackle also #71 in a new commit soon. The reason why I changed to unsigned 32: we don't need negative numbers for these fields, and we gain a lot more range with unsigned. It seems we will need that larger range especially for seq (which is "size of the log file this index corresponds to"), and Int32LE would support only log files up to 2 GB big.

@staltz
Copy link
Member Author

staltz commented Dec 15, 2020

Ready for review

@staltz staltz requested a review from arj03 December 15, 2020 11:58
@github-actions
Copy link

  • Load core indexes: 6ms
  • Query one huge index (first run): 823ms
  • Query one huge index (second run): 311ms
  • Query three indexes (first run): 531ms
  • Query three indexes (second run): 217ms
  • Paginate one huge index: 244ms

* ## File format for tarr files
*
* Each header field is 4 bytes in size.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@arj03
Copy link
Member

arj03 commented Dec 15, 2020

I tried this branch using the recipe here and still have the same problem. Its really wierd, the diff looks good. Have to double check it

@arj03
Copy link
Member

arj03 commented Dec 15, 2020

Ah, version and seq is switched for some reason

@staltz
Copy link
Member Author

staltz commented Dec 15, 2020

I'll take a look at that problem, I'll try to fix it in this PR

@arj03
Copy link
Member

arj03 commented Dec 15, 2020

In save it seems:

jitdb saving core index: offset +1ms
saving version 1
saving seq 983320378
  jitdb saving core index: timestamp +2ms
saving version 1
saving seq 983320378
  jitdb saving core index: sequence +2ms
saving version 1
saving seq 983320378
  jitdb saving index: value_content_type_vote +3ms
saving version 983320378
saving seq 1
  jitdb saving index: value_content_vote_expression_u26F5 +2ms
saving version 983320378
saving seq 1
  jitdb saving index: value_author_@+UMKhpbzXAII+2x2F7ZlsgkJwIsxdfeFi36Z5Rk1gCfY0=x2Eed25519 +2ms
saving version 983320378
saving seq 1

files.js Outdated Show resolved Hide resolved
@staltz
Copy link
Member Author

staltz commented Dec 15, 2020

Added test that fails because of swapped version and seq.

@github-actions
Copy link

  • Load core indexes: 8ms
  • Query one huge index (first run): 1042ms
  • Query one huge index (second run): 354ms
  • Query three indexes (first run): 635ms
  • Query three indexes (second run): 270ms
  • Paginate one huge index: 284ms

@staltz staltz requested a review from arj03 December 15, 2020 13:03
@arj03
Copy link
Member

arj03 commented Dec 15, 2020

Working now :)

@arj03 arj03 merged commit 9cec31e into master Dec 15, 2020
@staltz staltz deleted the new-header branch December 15, 2020 13:18
@github-actions
Copy link

  • Load core indexes: 7ms
  • Query one huge index (first run): 995ms
  • Query one huge index (second run): 371ms
  • Query three indexes (first run): 634ms
  • Query three indexes (second run): 264ms
  • Paginate one huge index: 296ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants