Ideas from Pourover
So the release of the pourover library by NYTimes has got me thinking about the lazy version of underscore query that I've started working on.
I like the power of pourover - but I dislike its API. There are quite a few similarities in the principles of implementation, but I hadn't tackled sorts yet, which they do in a really nice way.
Proposed API (will use Backbone Collections)
collection.addSort(name, sortFn, options)
# options can contain a list of attrs to speed up sorting on changes
collection.sort(name) # if name is left out will revert to generic sort
sortedCollection = collection.sortedCollection(name, sortFn, options)
# if sortFn is left out will look for previously registered sort
# sorted collection will stay sorted
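A rough sketch of how that API might be used - the `transactions` collection, the attribute names and the sort functions are all made up for illustration, nothing here is final:

# hypothetical usage of the proposed sort API
transactions.addSort 'byDate', ((m) -> m.get('date')), attrs: ['date']
transactions.sort 'byDate'    # switch to the named sort
transactions.sort()           # no name: revert to the generic sort

# derived collection that stays sorted as models are added or changed
byAmount = transactions.sortedCollection 'byAmount', (m) -> m.get('amount')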
Will probably need to think a bit more about the interaction between sortedCollections, pagedCollections & filteredCollections.
One idea for indexing filters is to use a hash map of booleans with the cid as the key. That way it's a simple lookup for each filter. We can also have change listeners for when affected attributes change.
budgetType =
  c23: true
  c45: true
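A minimal sketch of how such a boolean map could be built and kept current via change listeners - the `budgetType` attribute, the `'essential'` value and the helper names are assumptions, and `collection` is any Backbone collection:

# sketch: boolean cid map for one filter, maintained by change listeners
budgetTypeIndex = {}

updateEntry = (model) ->
  if model.get('budgetType') is 'essential'
    budgetTypeIndex[model.cid] = true
  else
    delete budgetTypeIndex[model.cid]

collection.each updateEntry
collection.on 'add change:budgetType', updateEntry
collection.on 'remove', (model) -> delete budgetTypeIndex[model.cid]

# each filter check is now a constant-time lookup
matching = collection.filter (model) -> budgetTypeIndex[model.cid]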
The harder one is arbitrary queries. This is where my previous work on indexing can be useful. The problem with indexing previously was that the unions took a fair chunk of time. A potentially faster solution is to either:
- have indexes as sparse arrays - still have a union problem
- have a lookup table similar to above
  - this would be added to the query chain as a new query primitive - cid lookup: lookupHash[this[cidAttribute]] (see the sketch after this list)
  - this way we can still keep our not / and / nor / or semantics
  - there is no traversing of multiple arrays
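As a sketch, the cid lookup primitive is just a predicate, so $not stays a plain negation - `cidLookup` and the default `'cid'` attribute are assumptions:

cidLookup = (lookupHash, cidAttribute = 'cid') ->
  (model) -> lookupHash[model[cidAttribute]] is true

isBudgetType = cidLookup budgetType             # the boolean hash above
collection.filter isBudgetType                  # indexed $equals
collection.filter (m) -> not isBudgetType(m)    # indexed $not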
So ideally the implementation has to be transparent to the user.
When parsing the query we can only handle $equals and $not queries - those we can translate to the new cid lookup. For $in queries we would need to do a lookup for each option for each item - not fast.
Ideally we want to efficiently reduce the array of items before going into the other queries. For a single index this is fine, but for multiple indexes this is tricky. Perhaps we optimise for the most common use case: when some of the queries are indexed, we pick the smallest index and perform the rest of the queries in the normal manner on that subset. This is probably the most efficient we can get.
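A rough sketch of that idea, assuming a registry of exact-match indexes shaped like indexes[attr][value] = {cid: true, ...} - the registry and the runQuery / matches helpers are all assumptions:

_ = require 'underscore'

# naive fallback predicate for non-indexed clauses (exact match only)
matches = (query) -> (model) ->
  _.every query, (value, attr) -> model.get(attr) is value

runQuery = (collection, query, indexes) ->
  indexedAttrs = (attr for attr of query when indexes[attr]? and indexes[attr][query[attr]]?)
  return collection.filter matches(query) unless indexedAttrs.length

  # the smallest index leaves the fewest candidates for the remaining clauses
  smallest = _.min indexedAttrs, (attr) -> _.size indexes[attr][query[attr]]
  candidates = (collection.get(cid) for cid of indexes[smallest][query[smallest]])

  candidates.filter matches(_.omit(query, smallest))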
For more advanced usage we can allow compound indexes.
So now we can have some recursion, as an index is itself a filteredCollection. We should be able to make change handlers for filtered collections a lot more efficient - although it's not really needed, as it's only batch changes that cause a problem.
For a filtered collection to use an index, the problem comes when a new model is added: it needs to be added to the index's filtered collection as well. Actually the problem is fairly easy if indexes are always intermediate filtered collections.
We still need the lookup table.
Ignore the above brain dump. We should instead implement a simple groupBy indexing method that can be bound into the change events for each model. This works for exact match queries. For time-based queries, the previous indexing that we had by month is probably the way forward. It would be good to implement this in a data-type-agnostic way if possible. For numeric properties this is not too difficult; for the ISO string dates that we use it's a bit harder. We can have a special case for dates.
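A minimal sketch of such a groupBy index, bound to per-model change events - buildGroupByIndex and its wiring are assumptions, not a settled API:

buildGroupByIndex = (collection, attr) ->
  index = {}

  add = (model) ->
    key = model.get attr
    (index[key] ?= []).push model

  removeFrom = (key, model) ->
    index[key] = (index[key] ? []).filter (m) -> m isnt model

  collection.each add
  collection.on 'add', add
  collection.on 'remove', (model) -> removeFrom model.get(attr), model
  collection.on "change:#{attr}", (model) ->
    removeFrom model.previous(attr), model
    add model

  index

For exact match queries on, say, a category attribute this gives constant-time bucket lookups; for the ISO date strings a special-cased key such as the "YYYY-MM" prefix could group models by month.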
Compound exact match keys are ok. Range queries are harder. Let's look at how pourover implements range based lookups.
So pourover doesn't have the capability to support arbitrary lookups. Its range filter is, however, similar to the planned date filter - it works on individual months.
For ranges, we need a query translation method that takes a date range as follows:
dayLookup =
  "2014-01-01": {index: 1, start: true}
  "2014-01-02": {index: 1}
  # the last day of each month would carry end: true in the same way

monthArrays = [
  [{model},{model},{model}],
  [{model},{model},{model}]
]
translate = (key, from, to) ->
  start = dayLookup[from]
  end = dayLookup[to]
  # handle the case of either no start or no end
  if start.start
    # `from` is the first day of its month, so include that month in full
    startIndex = start.index
    first = []
  else
    # partial month at the start: filter just that month's models
    startIndex = start.index + 1
    first = monthArrays[start.index].filter (a) -> a.date > from
  if end.end
    endIndex = end.index
    last = []
  else
    # partial month at the end
    endIndex = end.index - 1
    last = monthArrays[end.index].filter (a) -> a.date < to
  # inclusive slice of the fully covered months, flattened into one array
  [].concat first, monthArrays[startIndex..endIndex]..., last
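A quick hypothetical call, assuming dayLookup has an entry for every day covered by monthArrays:

# all transactions between 2014-01-15 and 2014-03-10
inRange = translate 'date', '2014-01-15', '2014-03-10'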
This code should make date-based lookups really fast. The only iteration will be for partial-month queries, and even those are optimised to only iterate through the transactions in that month.
Date-based searches will be the most common. Exact-match queries for categories will also be fast.