Read performance #25
First issue I see is that you're creating a BlinkDB index on . But I'll take a look at the query performance without indexes; BlinkDB shouldn't be 18x as slow as Loki when you search for a single key.
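For context, a minimal sketch of how an index is declared in BlinkDB at table creation, using the same createTable call that appears later in this thread (the User shape here is just a placeholder):
import { createDB, createTable } from "blinkdb";

interface User {
  id: number;
  name: string;
}

const db = createDB();
const userTable = createTable<User>(db, "users")({
  primary: "id",     // primary key field
  indexes: ["name"]  // secondary index on a non-primary field
});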
Oopsie. Both indexed:
Both unindexed:
I'm also not seeing the performance in the sample from the front page; unless I mucked it up again, I'm getting this (script below):
script:
And if I'm looking for names that are not in the dataset:
but not only is loki faster in both cases, a table scan is also faster.
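For reference, the table scan being compared here is just filtering the raw array; roughly (assuming names holds the values being searched for):
// no library involved: scan every record and keep the matches
const hits = users.filter(u => names.includes(u.name));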
And for 100,000 items (as per the case on the front page):
Changed to tinybench like the blinkdb tests:
Thanks for taking a look at this :) If you check the official benchmarks for BlinkDB, do you get similar results there (with BlinkDB consistently being slower than the alternatives)? I'll try implementing your test case within the BlinkDB benchmarks from scratch; then we can see if there are any issues on either side.
Not really, I either did that manually a few versions ago or deleted it from the benchmarks in the last refactor. In either case, I should add it back in and update the values on the front page.
I added a benchmark myself, available on the . My results look like this for 100,000 items in the DB:
Comparing our respective implementations, a few things I noticed:
Maybe you can gain other insights or discover issues looking through the code of my benchmark?
Testing with your dataset gets me a benchmark more comparable to yours, but BlinkDB is somehow still faster on my side.
My code:
import { createDB, createTable, insertMany, many } from "blinkdb";
import loki from "lokijs";
import { Bench } from "tinybench";
// Same content as https://gist.github.com/retorquere/c71d72f63a6a79ad3d0f6cd217e71b03
import userJson from "./users.json";
interface User {
id: number;
name: string;
// made age non-optional here
age: number;
}
const blinkdb = createDB({ clone: false });
let blinkUserTable = createTable<User>(blinkdb, "users")({
primary: "id",
indexes: ["name", "age"] // added age index
});
const lokidb = new loki("users.db");
let lokiUserTable = lokidb.addCollection<User>("users", {
unique: ["id"],
indices: ["name", "age"] // added age index
});
let users: User[] = userJson;
let userMap: Map<string, User[]> = new Map();
for(const user of users) {
userMap.set(user.name, [...(userMap.get(user.name) ?? []), user]);
}
lokiUserTable.insert(users);
insertMany(blinkUserTable, users);
const random = () => users[Math.floor(Math.random() * users.length)]; // floor, not round: round could index one past the end of the array
const names = [random().name, random().name];
const age = random().age;
export const bench = new Bench()
.add("scan", () => {
users.filter(u => names.includes(u.name) && u.age && u.age > age);
})
.add("map", () => {
let u: User[] = [];
for(const name of names) {
u = u.concat((userMap.get(name) ?? []).filter(u => u.age && u.age > age));
}
})
.add("lokijs", () => {
lokiUserTable.find({ name: { $in: names }, age: { $gt: age } });
})
.add("blinkdb", async () => {
await many(blinkUserTable, { where: {
name: { in: names },
age: { gt: age }
}});
});
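The exported bench above is never actually started in this snippet; a minimal runner, mirroring the bench.run()/bench.table() calls used in the later script:
// run all registered tasks and print the results as a table
(async () => {
  await bench.run();
  console.table(bench.table());
})();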
How do you run this? I'm getting
when I start it with
I haven't been able to get them to work yet.
That gets me
That is true. But when I turn clone off for blinkdb, my tests continue to show everything else being faster than blinkdb.
It's supposed to be a benchmark for the worst case. Nothing should be slower than a full scan.
JavaScript Map is just ridiculously fast, and after selection of the partition it's all integer comparisons. It's doing half the work, and the fast half at that. I was working on a lokijs
Well now... I've gotten this script to run and I'm now getting
And this is the same script with
How does cloning make such a big difference when a search returns so few records? There is no cloning during the search, right? Just for the hits that are returned from the search?
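For reference, the cloning being discussed is a setup-time option in both libraries, as used in the scripts in this thread (table and variable names reused from the earlier snippet):
const blinkdb = createDB({ clone: false }); // BlinkDB: hand back references instead of copies
const lokiUserTable = lokidb.addCollection<User>("users", {
  unique: ["id"],
  clone: false // LokiJS: likewise, skip cloning of documents
});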
I would imagine that partitions could be implemented as hooks on insert/update/create/delete that would maintain
I can just run "npm run test" at the top level -- that works.
This is approximately what I meant by adding partition support to blinkdb:
edit: I had previously overlooked that I still needed to clone. With the clone overhead, partitioning still comes out ahead, but the performance boost varies between 85% and 1300% depending on the size of the result set (smaller being faster regardless of the size of the DB being searched) rather than the consistent several orders of magnitude I was seeing. I'm not sure how I would make dynamic partition maintenance work without cloning. Loki seems to take a really hard hit from cloning. As a matter of fact, I can substantially improve lokijs performance just by replacing its clone with rfdc. Still not as fast as blinkdb, but usually twice as fast as with lokijs's own clone. Is there a general hook for "each deleted(/inserted/updated) entry" regardless of whether it was called through removeMany/remove/removeWhere?
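For reference, rfdc ("really fast deep clone") is a factory that returns a plain deep-clone function, so it can be dropped in wherever the scripts here use a clone function; how it gets wired into lokijs's own internals is not shown:
import rfdc from "rfdc";

const clone = rfdc(); // clone(obj) returns a deep copy
// e.g. clone each hit before handing it out (hits is a placeholder for any query result)
const copies = hits.map(clone);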
I must admit your use of TypeScript is a lot more advanced than mine; what I had envisioned was something along the lines of
but I've run aground in adding this to the query evaluation. Is it correct that
If you need an array of items from the DB that is regenerated whenever items are inserted/updated/deleted, you don't need to implement partitions; you can just use `watch`.
Your benchmark with `watch`:
const { createDB, createTable, insertMany, many, one, use, key, watch } = require("blinkdb")
const { clone } = require('blinkdb/dist/core/clone')
const loki = require("lokijs")
const { Bench } = require("tinybench")
// Same content as https://gist.github.com/retorquere/c71d72f63a6a79ad3d0f6cd217e71b03
const users = require("./users.json")
const blinkdb = createDB({ clone: true })
const blinkPartition = new Map
let blinkUserTable = createTable(
blinkdb,
"users"
)({
primary: "id",
indexes: ["name", "age"] // added age index
})
const lokidb = new loki("users.db")
let lokiUserTable = lokidb.addCollection("users", {
unique: ["id"],
indices: ["name", "age"], // added age index
clone: true
})
function mapRemove(entry, partition, pk) {
if (blinkPartition.has(entry[partition])) {
blinkPartition.get(entry[partition]).delete(entry[pk])
if (blinkPartition.get(entry[partition]).size === 0) blinkPartition.delete(entry[partition])
}
}
function mapAdd(entry, partition, pk) {
if (!blinkPartition.has(entry[partition])) blinkPartition.set(entry[partition], new Map)
blinkPartition.get(entry[partition]).set(entry[pk], entry)
}
use(blinkUserTable, async (ctx) => {
switch (ctx.action) {
case 'update':
case 'delete':
var pk = key(ctx.params[0])
mapRemove(await one(ctx.params[0], ctx.params[1][pk]), 'name', pk) // read the current entry before the write goes through
break
case 'updateMany':
case 'deleteMany':
var pk = key(ctx.params[0])
for (const entry of ctx.params[1]) {
mapRemove(await one(ctx.params[0], entry[pk]), 'name', pk)
}
}
const result = await ctx.next(...ctx.params);
switch (ctx.action) {
case 'update':
case 'insert':
var pk = key(ctx.params[0])
mapAdd(ctx.params[1], 'name', pk)
break
case 'updateMany':
case 'insertMany':
var pk = key(ctx.params[0])
for (const entry of ctx.params[1]) {
mapAdd(entry, 'name', pk)
}
}
return result
});
const random = () => users[Math.floor(Math.random() * users.length)] // floor, not round: round could index one past the end of the array
const names = [random().name, random().name]
const age = random().age
let watchUsers = [];
watch(blinkUserTable, {
where: {
name: { in: names },
age: { gt: age }
}
}, users => watchUsers = users)
lokiUserTable.insert(users)
insertMany(blinkUserTable, users)
async function main() {
const bench = new Bench()
.add("scan", () => {
const u = users.filter(u => names.includes(u.name) && u.age && u.age > age);
})
.add("partition", () => {
let found = []
for (const name of names) {
const partition = blinkPartition.get(name)
if (partition) found = found.concat([...partition.values()].filter(user => user.age > age).map(clone)) // clone only the new hits, not the whole accumulated array
}
})
.add("watch", () => {
watchUsers
})
.add("lokijs", () => {
const l = lokiUserTable.find({ name: { $in: names }, age: { $gt: age } })
})
.add("blinkdb", async () => {
const b = await many(blinkUserTable, {
where: {
name: { in: names },
age: { gt: age }
}
})
})
await bench.run();
console.table(bench.table());
}
main()
Can't speak for LokiJS, but BlinkDB only clones items returned from a call to
Yes. The entire query engine is synchronous. BlinkDB is just async in order to support async middleware.
That's not entirely the same though. With the partition solution, you get the speedup without knowing the names in advance. The
I do need the changed/deleted/added items though, as I use them in my own event handling. How would I implement my
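To illustrate the difference (a sketch reusing names from the script above): the watch() query is fixed when the watcher is registered, while the partition map can be asked about any name after the fact:
// watch(): the filter (here, `names`) has to be known when the watcher is set up
watch(blinkUserTable, { where: { name: { in: names } } }, results => { watchUsers = results; });

// partition map: any name can be looked up later, without re-registering anything
const hitsFor = (name: string) => [...(blinkPartition.get(name)?.values() ?? [])];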
Gotcha -- that is not a problem for me, I just need read/search to be sync, so I can start working on blink integration.
I'm trying to understand how the query infrastructure works; is there any documentation on the design of it?
I think it best to implement this in . If you want to implement this using
That would work.
Sort of -- I can fetch it using
I'd much prefer the former option :)
Looks like in the interim I could just subscribe to the dispatcher using
correct?
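A sketch of what such a subscription might look like with the use() middleware from the script above (whether that is the dispatcher hook meant here is an assumption, and the exact action names should be checked against the middleware docs):
use(blinkUserTable, async (ctx) => {
  const result = await ctx.next(...ctx.params);
  // after the write has gone through, hand the affected entities to my own event handling
  // (for deletes, the entities would have to be read before ctx.next, as in the script above)
  switch (ctx.action) {
    case "insert":
    case "update":
      handleChange(ctx.action, ctx.params[1]); // handleChange is a placeholder
      break;
    case "insertMany":
    case "updateMany":
      for (const entity of ctx.params[1]) handleChange(ctx.action, entity);
      break;
  }
  return result;
});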
Me again -- I took your last script and modified it (to avoid screwups like before) for a new case I'm working on; script + data are here, but I'm getting this for single-field indexed lookup:
For
I've managed to get the benchmarks to run; I've taken the front-page.ts benchmark and modified it to this, so I'm pretty sure I got it right this time:
and there lokijs gets over twice the performance of blinkdb:
with
with blinkdb getting almost twice the performance of lokijs.
I'm looking to switch to blinkdb for its TypeScript support, but when I benchmark indexed search, blinkdb comes out substantially slower than lokijs:
(where mock.json has this content)
returns
and even when I turn off indexes in loki, it still comes out ahead:
I haven't tested more complex searches, but search by a single indexed field is something I'd be doing a fair bit of. Is this just not a good use case for blinkdb?
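Not the original script, but a minimal sketch of the kind of single-indexed-field lookup being compared, built from the same APIs used elsewhere in the thread; the shape of mock.json is an assumption:
import { createDB, createTable, insertMany, many } from "blinkdb";
import loki from "lokijs";
import { Bench } from "tinybench";
// assumed to be an array of { id, name } records
import mockJson from "./mock.json";

interface User {
  id: number;
  name: string;
}
const users: User[] = mockJson;

const blinkUserTable = createTable<User>(createDB(), "users")({
  primary: "id",
  indexes: ["name"] // the single indexed field being searched
});
const lokiUserTable = new loki("users.db").addCollection<User>("users", {
  unique: ["id"],
  indices: ["name"]
});

async function main() {
  await insertMany(blinkUserTable, users);
  lokiUserTable.insert(users);
  const name = users[0].name; // look up one known name

  const bench = new Bench()
    .add("lokijs", () => { lokiUserTable.find({ name }); })
    .add("blinkdb", async () => { await many(blinkUserTable, { where: { name: { in: [name] } } }); });
  await bench.run();
  console.table(bench.table());
}
main();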