In winter 2015, we held a programming contest. Thanks to all who participated!
The problem statement and detailed rules are available on the contest page linked above. Now that the contest is over, we're publishing all the solutions we received, and announcing the winners.
We received a total of 408 submissions from 237 different participants. Only the last submission from each participant, received within the deadline, is published here in the `submissions` directory.
An additional 9 solutions were submitted either after the deadline or by Hola employees; we considered them hors concours (off-contest). These are found in the `extra` directory.
64 solutions, or 16% of the total, were submitted within the last 24 hours before the deadline; 15 of these came in the last hour, and the last one arrived only 34 seconds before the deadline.
92 programs, or 39% of all participating solutions, passed our correctness tests, as did 5 of the solutions considered hors concours.
The shortest correct submission is exactly 666 bytes long, while the longest one is 90274 bytes long.
The correctness and performance tests were automated: correctness is tested by `tests/correctness.js` and performance by `tests/performance.js`.
Our reference implementation, the same one that was set up on our website, is included here as `tests/reference.js`. The reference implementation includes strict input validation and an optional input size limit (to restrict the load on the website); neither of these features was expected of contest submissions.
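The authoritative checks are in `tests/reference.js`; as a rough illustration only, a validation layer of this kind might look like the sketch below. The message and rule field names are our reading of the problem statement's format, and `MAX_MESSAGES` is a made-up constant, not a value we actually used:

```js
// Hypothetical sketch of strict input validation with an optional size limit.
// The real checks live in tests/reference.js; MAX_MESSAGES is illustrative.
var MAX_MESSAGES = 100000;

function validate_input(messages, rules){
    if (!messages || typeof messages!='object')
        throw new Error('messages must be an object');
    var ids = Object.keys(messages);
    if (ids.length>MAX_MESSAGES)
        throw new Error('too many messages');
    ids.forEach(function(id){
        var m = messages[id];
        if (!m || typeof m.from!='string' || typeof m.to!='string')
            throw new Error('invalid message: '+id);
    });
    if (!Array.isArray(rules))
        throw new Error('rules must be an array');
    rules.forEach(function(r, i){
        if (!r || typeof r.action!='string')
            throw new Error('invalid rule at index '+i);
        if (r.from!==undefined && typeof r.from!='string')
            throw new Error('invalid "from" pattern in rule '+i);
        if (r.to!==undefined && typeof r.to!='string')
            throw new Error('invalid "to" pattern in rule '+i);
    });
}
```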
The correctness tests can be found in the body of `tests/correctness.js`.
Unfortunately, many of the submissions failed on one corner case or another. Syntax errors, use of `require` (forbidden by the rules), and failure to export a `filter` function were all counted as correctness failures. If you are wondering why a particular solution failed, you can run the test program on the solution file and examine its output.
Note that a few solutions passed the correctness tests but still produced wrong results in the performance tests; these were treated as having failed correctness.
Performance testing consisted of one large test with 100,000 messages and 100 rules. We also generated an even larger test set with 800,000 messages and 100 rules.
The inputs for the performance tests were generated by `tests/generate_large_test.js`; the expected outputs were produced with the reference implementation. We didn't include the test data files because of their size, but you can run the generating script to recreate exactly the same content that we used. The generating script uses a random number generator with a fixed seed, so it produces the same output on every run. The statistical characteristics we chose for the performance test are somewhat plausible for a typical user's real e-mail database. Please refer to the script's source code for further details.
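The real generator is `tests/generate_large_test.js`; as a rough illustration of the fixed-seed idea only, a deterministic generator can be sketched like this (the PRNG constants, address pools, and sizes below are placeholders, not the ones we used):

```js
// Illustrative only: deterministic data generation from a fixed seed.
// The real script is tests/generate_large_test.js; everything below
// (PRNG, address pools, sizes) is a placeholder.
var seed = 12345;
function random(){ // Park-Miller LCG: a fixed seed yields the same sequence every run
    seed = seed*16807%2147483647;
    return (seed-1)/2147483646;
}

function random_addr(){
    var users = ['alice', 'bob', 'carol', 'dave'];
    var hosts = ['example.com', 'mail.test', 'corp.local'];
    return users[Math.floor(random()*users.length)]+'@'+
        hosts[Math.floor(random()*hosts.length)];
}

function generate_messages(n){
    var messages = {};
    for (var i = 0; i<n; i++)
        messages['msg'+i] = {from: random_addr(), to: random_addr()};
    return messages;
}

console.log(JSON.stringify(generate_messages(10), null, 2));
```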
The performance tests were run on a 3 GHz Intel Core i7 machine running 64-bit Debian GNU/Linux (testing). Each submission was run 10 times (in 10 separate runs of `tests/performance.js`), and the best time was selected. Timing was taken from the real-time clock and included module parsing and initialization time.
After we published the contest rules, we realized that one of the requirements was ambiguous: “Your task is to write a Node.js module exporting one single function `filter(messages, rules)`”. We originally intended it to mean a named export:
exports.filter = function(messages, rules){ ... };
However, many participants interpreted it differently:
module.exports = function(messages, rules){ ... };
Both seem to be coherent interpretations of the rules, and we decided to accept either way of exporting.
Yet other submissions did not export the function at all, but merely defined it in their source files:
function filter(messages, rules){ ... }
Because the rules clearly say “exporting”, we counted such solutions as failing. Nevertheless, we tried fixing each of them to see whether it would then pass. Only one did, and it was considered hors concours.
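The fix in question was conceptually trivial (the actual patches may have differed in detail): we added the missing export ourselves, roughly like this:

```js
// A solution that only defines the function, without exporting it:
function filter(messages, rules){ /* ... */ }

// The missing line that turns it into a valid module; either form was
// acceptable under our relaxed interpretation of the rules:
exports.filter = filter;
// or: module.exports = filter;
```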
When we published the original version of the final standings, several participants pointed out a number of defects. First, we received two contributions to the correctness test suite. Thanks to the contributors! The additional tests helped detect several solutions that produce wrong results in some corner cases.
More troubling was a defect in our performance-testing methodology. The Node.js `vm` module that we were originally using for performance testing turned out to distort the measurements significantly: access to globals within the virtual environment was disproportionately slow. In particular, solutions whose authors chose to include all their helper functions in the body of the `filter` function performed faster than those where helper functions were defined outside `filter`. We believe that this kind of stylistic choice should not have a dramatic effect on the results. Most participants seem to have been testing their solutions in simple `require`-based harnesses, without the use of `vm`, and weren't optimizing for the peculiarities of the Node.js virtualization technology.
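For context, here is a minimal sketch (not our exact harness) of how a solution can be loaded through the `vm` module. Top-level declarations in the loaded code become properties of the sandbox's global object, so a call to a helper defined outside `filter` is a global lookup that crosses the context boundary, whereas helpers defined inside `filter` are ordinary locals; that asymmetry is what skewed the measurements:

```js
// Minimal sketch of vm-based loading (not the exact harness we used).
var fs = require('fs');
var vm = require('vm');

function load_with_vm(file){
    // The solution sees these objects as its globals inside the sandbox.
    var sandbox = {console: console, exports: {}, module: {exports: {}}};
    vm.createContext(sandbox);
    vm.runInContext(fs.readFileSync(file, 'utf8'), sandbox, {filename: file});
    // Accept either exports.filter or module.exports = function(...){...}.
    return sandbox.exports.filter||sandbox.module.exports;
}
```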
At Hola, we take pride in admitting our mistakes. In light of this serious flaw, we decided to revise the final standings. Our new test program, `tests/performance.js`, uses `require` to load the participant's module and runs the `filter` function once. We ran it 10 times for every solution in a kernel-level VM and chose the best time.
We also received criticism of some other aspects of our performance testing, which we'll try to address here:
- The performance test does not include `?` characters in any rules. The rationale is that in real life, users rarely use `?` in their rules; it's hard to imagine why anyone would want it in a real rule. Just like in a real filtering system, some features go unused or almost unused. In contrast to correctness tests, performance tests should resemble actual patterns of usage. Moreover, most of our patterns did not even include a `*`: most of the time, a user only wants to match a specific e-mail address. Where we did use `*`, it covered the whole part of the address before or after the `@` character (see the illustrative sketch after this list).
- Performance should be tested with more rules. When a user's database grows, the number of rules doesn't normally grow proportionally. The number of rules we chose (100) is more than enough even for advanced e-mail users, so we didn't test for scalability along this dimension.
- Performance should be tested with more messages. We chose 100,000 messages because it is on the order of the yearly message count in a typical heavy e-mail user's database. Nevertheless, responding to the critique, we tried increasing this number. About 800,000 turned out to be roughly the highest number of messages that didn't cause any of the correct submissions to run out of memory with the default Node.js engine settings, so we went with that number in our extra-large test (see the `xlarge` option in `tests/generate_large_test.js`). Using the extra-large test, we also ran every correct solution 10 times. From what we saw, the extra-large test would change the final standings significantly, and so would certain in-between dataset sizes. Both 100,000 and 800,000 messages are valid choices, and we could pick one or the other, or combine them in some weighted average. However, we decided to keep the changes to a minimum and only fix what is obviously a defect (the Node.js `vm` issue), while otherwise sticking to our original test material.
- The performance tests should have been published in advance. The reason we didn't do that is that we didn't want participants to optimize for a specific test (or even try to manipulate it). We wanted programs that are correct on all valid inputs and fast on typical use cases. Still, the lesson we learned from this is that at least the size of the test inputs should probably be published in advance.
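For illustration only, the rule shapes described in the first item look roughly like this; the addresses and action strings are made up and are not taken from the actual test data:

```js
// Illustrative rule shapes only; not actual generated test data.
var rules = [
    {from: 'alice@example.com', action: 'tag friends'}, // exact address: the common case
    {from: '*@example.com', action: 'tag work'},        // '*' for the whole part before '@'
    {to: 'bob@*', action: 'folder outbox'},             // '*' for the whole part after '@'
];
```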
The fixes to the performance testing system are going to lead to dramatic changes in the final standings. We apologize to those who will lose their previously announced prize-winning places!
The final results table will be published here in two days. Please tell us if you suspect any more flaws in the test system, or if you have critical comments beyond those addressed above.