Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring the Codebase #35

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Conversation

Bhargavasomu
Copy link
Contributor

What was wrong?

Fixes Issue: #31

Cute Animal Picture

put a cute animal picture link inside the parentheses

@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Dec 10, 2018

@pipermerriam I have not yet refactored the pairing part. But I have refactored the field_elements(so that all the curves can use the same code) and the curve implementation with the BaseCurve class implementation.

  • Please let me know if any changes are needed as far as design is concerned.
  • The other main thing that depresses me is that, I haven't changed the logic anywhere, but just because of the usage of classes and inheritance, the performance got worser for a test case. (Tested in ipython). Attaching the test case below.
In [2]: from py_ecc import bn128

In [3]: time bn128.multiply(bn128.G12, 100)
CPU times: user 68.3 ms, sys: 0 ns, total: 68.3 ms
Wall time: 67.5 ms
Out[3]: 
((0, 0, 6289622313332368275808678712703019680903267734760705664249053540194934405010, 0, 0, 0, 0, 0, 19765057837394621075835443081152314037672821821523780638388756299265215370750, 0, 0, 0),
 (0, 0, 0, 15824998746886404237735115559171157131044885936835913698791405128162671676610, 0, 0, 0, 0, 0, 10168740664245831497642710459131680020400674013216613216484698513127357137715, 0, 0))

In [4]: a1 = BN128_Curve()

In [5]: time a1.multiply(a1.G12, 100)
CPU times: user 5.27 s, sys: 7.1 ms, total: 5.27 s
Wall time: 5.27 s
Out[5]: 
((0, 0, 6289622313332368275808678712703019680903267734760705664249053540194934405010, 0, 0, 0, 0, 0, 19765057837394621075835443081152314037672821821523780638388756299265215370750, 0, 0, 0),
 (0, 0, 0, 15824998746886404237735115559171157131044885936835913698791405128162671676610, 0, 0, 0, 0, 0, 10168740664245831497642710459131680020400674013216613216484698513127357137715, 0, 0))

I could really use your help and thoughts on making this better. (Been stuck on this for the whole day)

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating all of the BaseCurve methods to be either

  • classmethod if it needs to access any other curve methods or properties of the curve
  • staticmethod if it is a pure function.

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved
py_ecc/BaseCurve.py Outdated Show resolved Hide resolved
py_ecc/BaseCurve.py Outdated Show resolved Hide resolved
py_ecc/BaseCurve.py Outdated Show resolved Hide resolved
py_ecc/BaseCurve.py Outdated Show resolved Hide resolved
Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the performance issues. You should just do some profiling to see where all that extra time is being spent.

https://docs.python.org/3/library/profile.html

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved
py_ecc/curve_properties.py Outdated Show resolved Hide resolved
py_ecc/field_elements.py Outdated Show resolved Hide resolved
@pipermerriam
Copy link
Member

@Bhargavasomu some preliminary review.

I don't think that the performance hit you're seeing is something we'll be able to merge so that will need to be addressed. It doesn't have to be equally fast, but it needs to be in the same order of magnitude.

@Bhargavasomu Bhargavasomu changed the title Refactoring in Naive Form [WIP] Refactoring in Naive Form Dec 11, 2018
@Bhargavasomu
Copy link
Contributor Author

@pipermerriam found out the devil causing more time and have fixed that. Just waiting for a reply from vitalik as part of #24 (comment). Will make the next PR after that.

But one more thing that I noticed is that, after we refactor this, we would need to change the code in
py-evm as well. For example, we have many places where bn128.FQ2 is used. But as part of this refactor FQ2 or any FQ* for that matter will not be binded to the curve anymore(like bn128 or bls12_381 etc.). So such changes have to be made in py-evm as well. My doubt was that if we merged this without changing the
py-evm code which uses this library, then the tests would crash( if at all we test the precompiles as of now). So how do we go about this?

@pipermerriam
Copy link
Member

@Bhargavasomu I think we'll need to cut a v2 release to account for the breaking API change and more formally, we need to establish some documentation and boundaries for public/private API since currently everything in the library is effectively public API.

@Bhargavasomu Bhargavasomu changed the title [WIP] Refactoring in Naive Form [WIP] Refactoring the Codebase Dec 13, 2018
@Bhargavasomu Bhargavasomu force-pushed the refactor branch 2 times, most recently from 6db742d to 9694535 Compare December 14, 2018 04:38
@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Dec 14, 2018

@pipermerriam the following issues still have not been addressed till this point and I have some questions.

  • Instead of using classmethod or staticmethod decorators, I was thinking that maybe we could instantiate an object in _init__.py like bn128 = BN128_Curve(), and the users could use this object itself directly. Need your suggestion on this.
  • I still haven't received a response to the comments raised by me here, and hence I am reverting back to the old phase itself (where instead of division we would be doing multiplication). I will open an issue for this bug and leave the code and tests unchanged in this issue.
  • I still haven't done the type hinting completely and hence didn't include it in tox.
  • Still have to replace asserts with Exceptions.
  • circleci was annoying with the installation of mypy-extensions, and hence I have flooded tox.ini with the installation of mypy-extensions. It's working for now, but might not be suboptimal.
  • Last but not least, we need a mechanism to validate the constants of the curves. For example, we need to be sure that the field_modulus is a prime number. I feel that these shouldn't be part of the tests and maybe we have to test the validity of these constants everytime we import the whole module or it's submodule. I was thinking maybe we can validate these constants in __init__.py. What do you say?

Apart from the above issues, I have fixed the following

  • Running of any part of the module takes the same time as the current live version.
  • All the original test cases have been used and modified so as to suit this refactored code and preserving the correctness.

Please review this. Thanks!

@Bhargavasomu
Copy link
Contributor Author

Type Hinting is done now.

@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Dec 16, 2018

@pipermerriam I have addressed all the above mentioned issues (except the bug of multiplication instead of division). Please take a look and we can maybe wrap this up.

@vbuterin
Copy link
Contributor

Responded to the issue in the other thread; basically, the code is correct and the comment is wrong, and it needs to use * w**2 in the BLS curve and / w**2 in the bn128 curve is because the relationship between b and b2 in the two curves is different.

@pipermerriam
Copy link
Member

pipermerriam commented Dec 17, 2018

Instead of using classmethod or staticmethod decorators, I was thinking that maybe we could instantiate an object in _init__.py like bn128 = BN128_Curve(), and the users could use this object itself directly. Need your suggestion on this.

These two things are not mutually exclusive. Even in the bn128 = BN128_Curve() model, I think the code benefits from marking the methods as staticmethod/classmethod appropriately if they don't use any self properties.

edit I realize that this fails for any that do rely on self based constants. In this model I think we fix that with having specific classes with those constants defined.

@pipermerriam
Copy link
Member

Last but not least, we need a mechanism to validate the constants of the curves. For example, we need to be sure that the field_modulus is a prime number. I feel that these shouldn't be part of the tests and maybe we have to test the validity of these constants everytime we import the whole module or it's submodule. I was thinking maybe we can validate these constants in __init__.py. What do you say?

Whatever the solution, it sounds like something that cannot be a runtime check. If it is fast an import time check seems fine. Having a test which checks this also seems fine.

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam yes I have understood that, and I have added staticmethod and classmethod decorators as needed. All the issues that I have mentioned in the above comment have been resolved. Need you to review it. Thankyou!

@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Dec 17, 2018

@pipermerriam the validation checks that we are doing are fast enough for an import, and once the module is imported, it won't be called multiple times. But just to be sure I have cached that validation function. So I guess it's fine.

py_ecc/bn128_curve.py Outdated Show resolved Hide resolved
py_ecc/bn128_curve.py Outdated Show resolved Hide resolved
py_ecc/field_elements.py Outdated Show resolved Hide resolved
This is needed to obtain field_modulus, FQ2_MODULUS_COEFFS
and FQ12_MODULUS_COEFFS from the curve properties
"""
self.curve_name = curve_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut is telling me that this curve_name based approach could be improved. The primary downside that I want to get rid of is how much we seem to have to pass this parameter around. If this was instead a property of the class....

class FQ(ABC):
    field_modulus = None

    def __init__(self, val):
        if self.field_modulus is None:
           raise SomeError("...")

This sets a simple baseline for ensuring that FQ classes and their derivatives don't get instantiated without their field_modulus

class FQ_BN128(FQ):
    field_modulus = field_properties["bn128"]["field_modulus"]

Now we have an FQ_BN128 which represents the FQ for the bn128 curve.

class FQ:
    ...

    def __add__(self, other):
        on = other.n if isinstance(other, FQ) else other
        return type(self)(self.n + on) % self.field_modulus)

    @classmethod
    def one(cls):
        return cls(1)

This allows us to remove the passing around of the self.curve_name within the FQ object methods quite easily.

I did a quick search and this seems to cleanly remove the need to pass the curve name around. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pipermerriam I agree with this, but I was left with no other option. I have thought of this approach, but it would involve a lot of classes in the code. One class should be declared for each of the curves bn128, bls12_381, optimized_bn128, optimized_bls12_381 curve ; that too for only FQ. For FQP, FQ2 and FQ12, the same process needs to be repeated, and we would be ending up with 16 classes in total. Hence I just resented to this approach.

I know this design of using curve_name leads to a lot parameter passing, but I felt it was the best choice, given the above reasons. Please correct me if wrong.

for curve_name in field_properties:
# Check if field_modulus is prime
field_modulus = field_properties[curve_name]["field_modulus"]
if pow(2, field_modulus, field_modulus) != 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel dense. How does this check for primeness?

Copy link
Contributor Author

@Bhargavasomu Bhargavasomu Dec 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pipermerriam I believe that is the result of the Fermat's Little Theorem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, the test correctly detects some, but not all cases where field_modulus is not prime. Maybe the comment could be updated since this is kind of a "smoke test" rather than a proof of primeness.

(For example, see 341 which is not prime, but would not raise this exception)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carver nice catch. Didn't realize that. Will fallback to the traditional O(sqrt(n)) way of checking a prime number. Thankyou!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the field_modulus and the other numbers that we are checking for primes, are quite large, the O(sqrt(n)) method is taking a lot of time. We might as well stay with the present smoke test.

setup.py Outdated Show resolved Hide resolved
@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Dec 18, 2018

@pipermerriam I have modified the code based on the above review. Please check again now and let me know if anything should be done before merging. Thankyou!

@Bhargavasomu Bhargavasomu changed the title [WIP] Refactoring the Codebase Refactoring the Codebase Dec 19, 2018
@Bhargavasomu
Copy link
Contributor Author

Assertions have now been replaced with suitable Exceptions, and additional Exceptions have been added wherever necessary. Please have a look.

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam sorry for the multiple pings, but the WIP status has been removed, and is read for hopefully a final round of review. Thankyou.

@Bhargavasomu
Copy link
Contributor Author

Bhargavasomu commented Jan 8, 2019

@pipermerriam could you please update me on this?

@pipermerriam
Copy link
Member

Taking a look now

@pipermerriam
Copy link
Member

I'm unsure how to proceed right now. This codebase implements cryptographic primitives and thus is extremely sensitive and needs to be correct.

This pull request basically touches everything and does so in such a way that it's non-trivial to compare the before/after implementations to ensure that none of the functionality has changed. Similarly with the constants, there are lots of very large integer numbers and it's non-trivial to confirm that none of those numbers have changed.

Also, I recall the performance of this version being significantly worse than the current implementation. Has that been addressed?

Maybe there is a way to do this in a more iterative manner that touches a smaller chunks at a time... As it stands I can't justify merging this and I don't have an immediate suggestion for how to proceed.

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam the performance issue has been resolved. Regarding the Iterative Procedure, since we have different curves, and we want to do it in a generic way, we would need to change all of them once I guess.

Similarly with the constants, there are lots of very large integer numbers and it's non-trivial to confirm that none of those numbers have changed.

We have a smoke test in place so as to ensure that the numbers are primes (with a general wave of hand though.) Further I have mimicked all the tests wrt to the new refactored way. But anyways I am ready to redo the work in a Iterative manner if you propose it.

@carver
Copy link
Collaborator

carver commented Jan 9, 2019

I am ready to redo the work in a Iterative manner if you propose it.

I agree that a series of smaller PRs would be the best way to get high quality code reviews.

I roughly aim to limit to low-100's of lines of code changed.

There are times when exceptions are reasonable, and we can reasonably review larger PRs. One example is when there is clearly no change in functionality, like:

  • moving files from one module to another
  • squashing a bunch of flake8 warnings after updating the config

Those large PRs should make exactly that one change, though.

@Bhargavasomu
Copy link
Contributor Author

@pipermerriam @carver I understand the reasoning for making the PR shorter. If I were in your places for reviewing, it would have been very confusing for me too. But I have no clue of how to make progress in this iteratively (sorry for being naive).

The main reason why I am saying that I don't know how to do this iteratively is, here we are aiming to remove redundant code by separating out the properties of the curve and passing it to the functions wherever redundant code is being used. With this process, I don't see how I could make it iterative. But if you guys can give me a plan to do this iteratively, please let me know so that I can start the work immediately.

@carver
Copy link
Collaborator

carver commented Jan 12, 2019

One option would be to leave the old curves implementations in, and add the new curve implementations one at a time. (yes, I know the first PR will still be big, but hopefully still significantly smaller) After the new ones are all added in each of their PRs, we can delete the old ones.

But this is just a guess. It will likely take some digging and playing on your part to figure out how to get it into smaller chunks. One thing I want to be able to easily do is verify that a single new API works the same way as the old API. Ideally, that would mean roughly one new API added (xor the refactor of one component) in each PR.


Also, it looks like a few basics can be split out, like the tox.ini or setup.py changes. Feel free to ignore whole files in flake8 if we know they will be eventually deleted (like some of the tests/).


If you really can't find any way to slice it up, then we will get to it eventually, but I can't make any promises about when. It's too sensitive to do a "merge and fix-forward" kind of approach.


Another approach to consider: split any API changes from major refactors. A major refactor shouldn't require any test changes, if the tests are written well.

Adding new APIs as an augmentation to old APIs can happen in their own PR, and you can write tests that compare the new and old APIs.

Then eventually the old APIs can be deleted at a major version bump. This is a common approach we take for other libraries.

@ghost ghost mentioned this pull request Feb 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants