Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython runtime and compile target #311

Closed
arekbulski opened this issue Jan 18, 2018 · 18 comments
Closed

Cython runtime and compile target #311

arekbulski opened this issue Jan 18, 2018 · 18 comments

Comments

@arekbulski
Copy link
Member

I have been looking at Python runtime and examples for a while, and came to conclusion: probably best way to improve its performance by a significant factor is to cythonize it.

Drawbacks: pypi manifest would need to include cython as dependency. This is a simple and quite reasonable tradeoff, little code added and much performance gained.

Construct is currently preparing to get into 2.9 version and cythonize itself. You might want to wait until then, to see benchmark comparisons and so I can learn/refresh how to code it. Then it will be easier for me to implement it in Kaitai as well. Kaitai benchmarks also need complete overhaul before this.

@KOLANICH
Copy link

KOLANICH commented Jan 18, 2018

The drawbacks is that cython requires a c/c++ compiler in the system. I guess that the package should work both with a c++ compiler and without it. When reading files the speedup may be not big. This all needs to be benchmarked.

See also another possible ways to make the things faster #277 #65 #133

@arekbulski
Copy link
Member Author

Yes, that would mean Kaitai/Python would require cython, and indirectly gcc. But these are not outragous dependencies. To run Kaitai Python compiler, user already needs what, scala and whatnot? Cython is autoinstalled by pip, and gcc is on pretty much every machine.

I agree that benchmarks are important.

@KOLANICH
Copy link

KOLANICH commented Jan 18, 2018

To run ksc a one only needs jre, ksc itself and probably a library of formats.

@arekbulski
Copy link
Member Author

So end user already needs a java compiler, kaitai compiler, and some libraries, not to mention python and pip itself? But you complain about adding c++ compiler to the list?

@KOLANICH
Copy link

KOLANICH commented Jan 18, 2018

java compiler is not needed, only runtime, on the machines are going to run KSC
KSC and java may be not needed if you ship precompiled python source
pip is not strictly needed, you can use easy_install

but kaitai runtime library is needed and cython will be the dependency of it and/or the ksc-compiled python code

@arekbulski
Copy link
Member Author

So those are dependencies for developer's machine, not end user's. Fair point.

I just stumbled upon the ticket where XOR processing is benchmark, and the guy essentially points out that to make it faster we need Cython/CFFI or Numpy.
kaitai-io/kaitai_struct_python_runtime#8
This is another example of why Cython really is needed.

@koczkatamas
Copy link
Member

Depends on the way of distribution. If you want to compile Python from ksy all the time, then you need those indeed.

Otherwise if you want to use the generated Python code directly (which can be generated with a browser alone) and then you need basically zero dependency. You still need the Kaitai Python Runtime, but that is enough small to include into your source instead of using the pip version (if you cannot use pip somehow - eg. if in an Enterprise you have to get approval for every dependency).

I am not familiar with cython, but if using it would change this zero-dependency setup, I would recommend creating a new target separately from Python (called eg. Cython).

@GreyCat
Copy link
Member

GreyCat commented Jan 18, 2018

Adding Cython support is a great idea, but I agree that it shouldn't be a nuisance for existing users, extending existing dependencies a lot.

I wonder what mitmproxy devs would think. Pinging @s4chin, @Kriechi?

@arekbulski
Copy link
Member Author

If you look at mitmproxy setup.py, they just pip install kaitai. They would not even notice the difference, except in benchmarks that is.
https://github.com/mitmproxy/mitmproxy/blob/master/setup.py#L72

@Kriechi
Copy link

Kriechi commented Jan 18, 2018

Distribution is a major concern for @mitmproxy, we are shipping pre-compiled packages for Windows, Linux, and macOS. We also have wheels that should work out-of-the-box.
I remember that we had major issues with https://github.com/pyca/cryptography in the past, before they started to ship proper pre-compiled wheels for all platforms.

Maybe @mhils could offer some more details on how we currently build & manage our binaries and dependencies.

@mhils
Copy link

mhils commented Jan 18, 2018

Thanks for the ping, @GreyCat! I agree that Cython is definitely interesting from a performance perspective, with the disadvantage that users need a compiler and Cython. From our @mitmproxy perspective, things look as follows:

  1. We're happy with the current performance of Kaitai.
  2. Distribution of binary components is a major pain in the ass on our end. For example, Windows users tend to not have gcc installed. It's really not just Windows though:
    https://github.com/mitmproxy/mitmproxy/issues?q=gcc https://github.com/mitmproxy/mitmproxy/issues?q=failed+with+exit+status+1

Given 1 and 2, we'd definitely prefer if Kaitai stayed a pure Python dependency. That should not stop you guys from adopting Cython (other Kaitai users may have a need for performance), but frankly speaking I haven't seen anyone complain about Kaitai's Python performance yet. Not sure if you guys want to deal with the pain that is shipping wheels for a whole bunch of platforms (see e.g. https://pypi.python.org/pypi/Pillow/5.0.0) :-). For mitmproxy, we'd probably vendorize a pure-Python version if we get user complaints.

@arekbulski
Copy link
Member Author

Construct is going to fork versions, so to speak. 2.8 will remain pure Python, 2.9 will use Cython.
I tend to agree with @koczkatamas idea, to have a separate targets/compilers for Python and Cython.
We would not need to ship any wheels, see how pyximport works.
http://docs.cython.org/en/latest/src/reference/compilation.html#compiling-with-pyximport

@mhils
Copy link

mhils commented Jan 18, 2018

@arekbulski, from the pyximport page you linked:

Note that it is not recommended to let pyximport build code on end user side as it hooks into their import system. The best way to cater for end users is to provide pre-built binary packages in the wheel packaging format.

@arekbulski
Copy link
Member Author

pyximport works by hacking python, in that sense, they dont recommend it. Ehh.

@GreyCat
Copy link
Member

GreyCat commented Jan 18, 2018

From what I'm understanding, there's no universal agreement and there's demand for both versions — both pure Python and Cython. What would it take to support two of them? Would we just need 2 different runtimes, or we'll need 2 targets (i.e. different code to be generated)?

@arekbulski
Copy link
Member Author

I think its what you mean by runtime. It would require different KaitaiStream.py. It could use (not need) a different generated code, that would affect performance (not correctness).

@arekbulski arekbulski changed the title Python compiler: cython dependency Cython runtime and compile target Feb 3, 2018
@arekbulski
Copy link
Member Author

arekbulski commented Feb 27, 2018

Construct just finished its compiler implementation, using Cython for speedup. There are 2 separate conclusions to be made:

@arekbulski
Copy link
Member Author

New benchmarks prove that pypy does much more speedup than cython, therefore I withdraw the proposal.
https://construct.readthedocs.io/en/latest/compilation.html#comparison-with-kaitai-struct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants