Performance benchmarks #33
Join
In [1]: from cytoolz.curried import *
In [2]: from cytoolz.itertoolz import _consume, join
In [3]: data = [(i, i % 10) for i in range(1000000)]
In [4]: names = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'), (6, 'six'), (7, 'seven'), (8, 'eight'), (9, 'nine'), (0, 'zero')]
In [5]: timeit _consume(join(0, names, 1, data))
10 loops, best of 3: 168 ms per loop
In [6]: import pandas
In [7]: df = pandas.DataFrame(data, columns=['a', 'b'])
In [8]: names_df = pandas.DataFrame(names, columns=['b', 'name'])
In [9]: timeit pandas.merge(names_df, df, left_on='b', right_on='b')
10 loops, best of 3: 86.8 ms per loop
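For anyone reading the join timing without cytoolz at hand, here is a minimal pure-Python sketch of the kind of hash join being measured. It assumes the semantics described in the toolz docs (index the left sequence fully, stream the right one, yield `(left, right)` pairs); `hash_join` is a hypothetical name for illustration, not the cytoolz implementation, and it will be much slower than the Cython version.

```python
from collections import defaultdict


def hash_join(leftkey, leftseq, rightkey, rightseq):
    """Yield (left_item, right_item) pairs whose key fields are equal.

    Hypothetical stand-in for join(leftkey, leftseq, rightkey, rightseq):
    the left sequence is indexed in memory, the right one is streamed.
    """
    # Build an index from key value to all left items carrying that key.
    index = defaultdict(list)
    for item in leftseq:
        index[item[leftkey]].append(item)

    # Stream the right sequence and emit one pair per matching left item.
    for item in rightseq:
        for match in index.get(item[rightkey], ()):
            yield (match, item)


# Scaled-down version of the data used in the session above.
names = [(1, 'one'), (2, 'two'), (0, 'zero')]
data = [(i, i % 3) for i in range(6)]
pairs = list(hash_join(0, names, 1, data))
```

Putting the small `names` table on the left keeps the in-memory index tiny while the large `data` sequence is only iterated once, which is the same argument order used in the timing above.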
Getter function in
Looks like profile doesn't work well with these sorts of tasks.
In [11]: timeit groupby(lambda x: x[1], data)
10 loops, best of 3: 141 ms per loop
In [12]: timeit groupby(get(1), data)
10 loops, best of 3: 135 ms per loop
In [13]: timeit groupby(itemgetter(1), data)
10 loops, best of 3: 70.8 ms per loop
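The key-function comparison above can be reproduced outside IPython with only the standard library. This is a rough sketch: the `groupby` below is a hypothetical pure-Python stand-in (not cytoolz's), so absolute numbers will differ from the ones quoted, and the `get(1)` variant is left out because it needs cytoolz. The point is only that the grouping result is identical and the cost difference comes from calling the key function once per item.

```python
from collections import defaultdict
from operator import itemgetter
from timeit import timeit


def groupby(key, seq):
    """Group items of seq into lists keyed by key(item) (illustrative only)."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


# Smaller than the 1,000,000 rows used above so the script finishes quickly.
data = [(i, i % 10) for i in range(100000)]

by_lambda = groupby(lambda x: x[1], data)
by_itemgetter = groupby(itemgetter(1), data)

# Time each key function over a few repetitions; results are machine dependent.
t_lambda = timeit(lambda: groupby(lambda x: x[1], data), number=5)
t_itemgetter = timeit(lambda: groupby(itemgetter(1), data), number=5)

print("lambda key:     %.3f s for 5 runs" % t_lambda)
print("itemgetter key: %.3f s for 5 runs" % t_itemgetter)
```

`itemgetter(1)` is implemented in C, so each key call avoids setting up a Python-level function frame, which is a plausible explanation for the gap seen in the timings above.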
I ran the
In [12]: timeit consume(join(0, names, 1, data))
10 loops, best of 3: 163 ms per loop
In [13]: timeit pandas.merge(names_df, df, left_on='b', right_on='b')
1 loops, best of 3: 197 ms per loop
Performance difference is even more exaggerated on my machine. I wasn't able to get pandas to go remarkably faster by playing with indices at all. Nice work.
I really like functionality comparisons between
I've been putting performance comparisons in #31. I felt they should go somewhere more permanent.