Refactoring the Codebase #35

Bhargavasomu · 2018-12-10T19:17:54Z

What was wrong?

Fixes Issue: #31

Cute Animal Picture

Bhargavasomu · 2018-12-10T19:25:05Z

@pipermerriam I have not yet refactored the pairing part. But I have refactored the field_elements(so that all the curves can use the same code) and the curve implementation with the BaseCurve class implementation.

Please let me know if any changes are needed as far as design is concerned.
The other main thing that depresses me is that, I haven't changed the logic anywhere, but just because of the usage of classes and inheritance, the performance got worser for a test case. (Tested in ipython). Attaching the test case below.

In [2]: from py_ecc import bn128

In [3]: time bn128.multiply(bn128.G12, 100)
CPU times: user 68.3 ms, sys: 0 ns, total: 68.3 ms
Wall time: 67.5 ms
Out[3]: 
((0, 0, 6289622313332368275808678712703019680903267734760705664249053540194934405010, 0, 0, 0, 0, 0, 19765057837394621075835443081152314037672821821523780638388756299265215370750, 0, 0, 0),
 (0, 0, 0, 15824998746886404237735115559171157131044885936835913698791405128162671676610, 0, 0, 0, 0, 0, 10168740664245831497642710459131680020400674013216613216484698513127357137715, 0, 0))

In [4]: a1 = BN128_Curve()

In [5]: time a1.multiply(a1.G12, 100)
CPU times: user 5.27 s, sys: 7.1 ms, total: 5.27 s
Wall time: 5.27 s
Out[5]: 
((0, 0, 6289622313332368275808678712703019680903267734760705664249053540194934405010, 0, 0, 0, 0, 0, 19765057837394621075835443081152314037672821821523780638388756299265215370750, 0, 0, 0),
 (0, 0, 0, 15824998746886404237735115559171157131044885936835913698791405128162671676610, 0, 0, 0, 0, 0, 10168740664245831497642710459131680020400674013216613216484698513127357137715, 0, 0))

I could really use your help and thoughts on making this better. (Been stuck on this for the whole day)

pipermerriam

Updating all of the BaseCurve methods to be either

classmethod if it needs to access any other curve methods or properties of the curve
staticmethod if it is a pure function.

py_ecc/BaseCurve.py

pipermerriam

Regarding the performance issues. You should just do some profiling to see where all that extra time is being spent.

https://docs.python.org/3/library/profile.html

py_ecc/BaseCurve.py

py_ecc/curve_properties.py

py_ecc/field_elements.py

pipermerriam · 2018-12-10T19:50:11Z

@Bhargavasomu some preliminary review.

I don't think that the performance hit you're seeing is something we'll be able to merge so that will need to be addressed. It doesn't have to be equally fast, but it needs to be in the same order of magnitude.

Bhargavasomu · 2018-12-12T12:47:44Z

@pipermerriam found out the devil causing more time and have fixed that. Just waiting for a reply from vitalik as part of #24 (comment). Will make the next PR after that.

But one more thing that I noticed is that, after we refactor this, we would need to change the code in
py-evm as well. For example, we have many places where bn128.FQ2 is used. But as part of this refactor FQ2 or any FQ* for that matter will not be binded to the curve anymore(like bn128 or bls12_381 etc.). So such changes have to be made in py-evm as well. My doubt was that if we merged this without changing the
py-evm code which uses this library, then the tests would crash( if at all we test the precompiles as of now). So how do we go about this?

pipermerriam · 2018-12-12T19:41:47Z

@Bhargavasomu I think we'll need to cut a v2 release to account for the breaking API change and more formally, we need to establish some documentation and boundaries for public/private API since currently everything in the library is effectively public API.

Bhargavasomu · 2018-12-14T05:07:22Z

@pipermerriam the following issues still have not been addressed till this point and I have some questions.

Instead of using classmethod or staticmethod decorators, I was thinking that maybe we could instantiate an object in _init__.py like bn128 = BN128_Curve(), and the users could use this object itself directly. Need your suggestion on this.
I still haven't received a response to the comments raised by me here, and hence I am reverting back to the old phase itself (where instead of division we would be doing multiplication). I will open an issue for this bug and leave the code and tests unchanged in this issue.
~~I still haven't done the type hinting completely and hence didn't include it in tox.~~
Still have to replace asserts with Exceptions.
circleci was annoying with the installation of mypy-extensions, and hence I have flooded tox.ini with the installation of mypy-extensions. It's working for now, but might not be suboptimal.
Last but not least, we need a mechanism to validate the constants of the curves. For example, we need to be sure that the field_modulus is a prime number. I feel that these shouldn't be part of the tests and maybe we have to test the validity of these constants everytime we import the whole module or it's submodule. I was thinking maybe we can validate these constants in __init__.py. What do you say?

Apart from the above issues, I have fixed the following

Running of any part of the module takes the same time as the current live version.
All the original test cases have been used and modified so as to suit this refactored code and preserving the correctness.

Please review this. Thanks!

Bhargavasomu · 2018-12-14T08:05:16Z

Type Hinting is done now.

Bhargavasomu · 2018-12-16T09:33:20Z

@pipermerriam I have addressed all the above mentioned issues (~~except the bug of multiplication instead of division~~). Please take a look and we can maybe wrap this up.

vbuterin · 2018-12-16T10:11:01Z

Responded to the issue in the other thread; basically, the code is correct and the comment is wrong, and it needs to use * w**2 in the BLS curve and / w**2 in the bn128 curve is because the relationship between b and b2 in the two curves is different.

pipermerriam · 2018-12-17T16:12:21Z

Instead of using classmethod or staticmethod decorators, I was thinking that maybe we could instantiate an object in _init__.py like bn128 = BN128_Curve(), and the users could use this object itself directly. Need your suggestion on this.

These two things are not mutually exclusive. Even in the bn128 = BN128_Curve() model, I think the code benefits from marking the methods as staticmethod/classmethod appropriately if they don't use any self properties.

edit I realize that this fails for any that do rely on self based constants. In this model I think we fix that with having specific classes with those constants defined.

pipermerriam · 2018-12-17T16:15:46Z

Last but not least, we need a mechanism to validate the constants of the curves. For example, we need to be sure that the field_modulus is a prime number. I feel that these shouldn't be part of the tests and maybe we have to test the validity of these constants everytime we import the whole module or it's submodule. I was thinking maybe we can validate these constants in __init__.py. What do you say?

Whatever the solution, it sounds like something that cannot be a runtime check. If it is fast an import time check seems fine. Having a test which checks this also seems fine.

Bhargavasomu · 2018-12-17T16:15:47Z

@pipermerriam yes I have understood that, and I have added staticmethod and classmethod decorators as needed. All the issues that I have mentioned in the above comment have been resolved. Need you to review it. Thankyou!

Bhargavasomu · 2018-12-17T16:17:53Z

@pipermerriam the validation checks that we are doing are fast enough for an import, and once the module is imported, it won't be called multiple times. But just to be sure I have cached that validation function. So I guess it's fine.

py_ecc/bn128_curve.py

py_ecc/field_elements.py

pipermerriam · 2018-12-17T16:53:34Z

py_ecc/field_elements.py

+        This is needed to obtain field_modulus, FQ2_MODULUS_COEFFS
+        and FQ12_MODULUS_COEFFS from the curve properties
+        """
+        self.curve_name = curve_name


My gut is telling me that this curve_name based approach could be improved. The primary downside that I want to get rid of is how much we seem to have to pass this parameter around. If this was instead a property of the class....

class FQ(ABC): field_modulus = None def __init__(self, val): if self.field_modulus is None: raise SomeError("...")

This sets a simple baseline for ensuring that FQ classes and their derivatives don't get instantiated without their field_modulus

class FQ_BN128(FQ): field_modulus = field_properties["bn128"]["field_modulus"]

Now we have an FQ_BN128 which represents the FQ for the bn128 curve.

class FQ: ... def __add__(self, other): on = other.n if isinstance(other, FQ) else other return type(self)(self.n + on) % self.field_modulus) @classmethod def one(cls): return cls(1)

This allows us to remove the passing around of the self.curve_name within the FQ object methods quite easily.

I did a quick search and this seems to cleanly remove the need to pass the curve name around. What do you think?

@pipermerriam I agree with this, but I was left with no other option. I have thought of this approach, but it would involve a lot of classes in the code. One class should be declared for each of the curves bn128, bls12_381, optimized_bn128, optimized_bls12_381 curve ; that too for only FQ. For FQP, FQ2 and FQ12, the same process needs to be repeated, and we would be ending up with 16 classes in total. Hence I just resented to this approach.

I know this design of using curve_name leads to a lot parameter passing, but I felt it was the best choice, given the above reasons. Please correct me if wrong.

pipermerriam · 2018-12-17T16:55:02Z

py_ecc/validate_constants.py

+    for curve_name in field_properties:
+        # Check if field_modulus is prime
+        field_modulus = field_properties[curve_name]["field_modulus"]
+        if pow(2, field_modulus, field_modulus) != 2:


I feel dense. How does this check for primeness?

@pipermerriam I believe that is the result of the Fermat's Little Theorem.

If I'm reading this correctly, the test correctly detects some, but not all cases where field_modulus is not prime. Maybe the comment could be updated since this is kind of a "smoke test" rather than a proof of primeness.

(For example, see 341 which is not prime, but would not raise this exception)

@carver nice catch. Didn't realize that. Will fallback to the traditional O(sqrt(n)) way of checking a prime number. Thankyou!

Since the field_modulus and the other numbers that we are checking for primes, are quite large, the O(sqrt(n)) method is taking a lot of time. We might as well stay with the present smoke test.

setup.py

Bhargavasomu · 2018-12-18T11:58:55Z

@pipermerriam I have modified the code based on the above review. Please check again now and let me know if anything should be done before merging. Thankyou!

Bhargavasomu · 2018-12-19T13:56:36Z

Assertions have now been replaced with suitable Exceptions, and additional Exceptions have been added wherever necessary. Please have a look.

Bhargavasomu · 2018-12-20T04:12:30Z

@pipermerriam sorry for the multiple pings, but the WIP status has been removed, and is read for hopefully a final round of review. Thankyou.

Bhargavasomu · 2019-01-08T09:58:28Z

@pipermerriam could you please update me on this?

pipermerriam · 2019-01-08T20:55:12Z

Taking a look now

pipermerriam · 2019-01-08T21:14:41Z

I'm unsure how to proceed right now. This codebase implements cryptographic primitives and thus is extremely sensitive and needs to be correct.

This pull request basically touches everything and does so in such a way that it's non-trivial to compare the before/after implementations to ensure that none of the functionality has changed. Similarly with the constants, there are lots of very large integer numbers and it's non-trivial to confirm that none of those numbers have changed.

Also, I recall the performance of this version being significantly worse than the current implementation. Has that been addressed?

Maybe there is a way to do this in a more iterative manner that touches a smaller chunks at a time... As it stands I can't justify merging this and I don't have an immediate suggestion for how to proceed.

Bhargavasomu · 2019-01-09T10:09:17Z

@pipermerriam the performance issue has been resolved. Regarding the Iterative Procedure, since we have different curves, and we want to do it in a generic way, we would need to change all of them once I guess.

Similarly with the constants, there are lots of very large integer numbers and it's non-trivial to confirm that none of those numbers have changed.

We have a smoke test in place so as to ensure that the numbers are primes (with a general wave of hand though.) Further I have mimicked all the tests wrt to the new refactored way. But anyways I am ready to redo the work in a Iterative manner if you propose it.

carver · 2019-01-09T23:07:17Z

I am ready to redo the work in a Iterative manner if you propose it.

I agree that a series of smaller PRs would be the best way to get high quality code reviews.

I roughly aim to limit to low-100's of lines of code changed.

There are times when exceptions are reasonable, and we can reasonably review larger PRs. One example is when there is clearly no change in functionality, like:

moving files from one module to another
squashing a bunch of flake8 warnings after updating the config

Those large PRs should make exactly that one change, though.

Bhargavasomu · 2019-01-11T15:24:02Z

@pipermerriam @carver I understand the reasoning for making the PR shorter. If I were in your places for reviewing, it would have been very confusing for me too. But I have no clue of how to make progress in this iteratively (sorry for being naive).

The main reason why I am saying that I don't know how to do this iteratively is, here we are aiming to remove redundant code by separating out the properties of the curve and passing it to the functions wherever redundant code is being used. With this process, I don't see how I could make it iterative. But if you guys can give me a plan to do this iteratively, please let me know so that I can start the work immediately.

carver · 2019-01-12T00:58:56Z

One option would be to leave the old curves implementations in, and add the new curve implementations one at a time. (yes, I know the first PR will still be big, but hopefully still significantly smaller) After the new ones are all added in each of their PRs, we can delete the old ones.

But this is just a guess. It will likely take some digging and playing on your part to figure out how to get it into smaller chunks. One thing I want to be able to easily do is verify that a single new API works the same way as the old API. Ideally, that would mean roughly one new API added (xor the refactor of one component) in each PR.

Also, it looks like a few basics can be split out, like the tox.ini or setup.py changes. Feel free to ignore whole files in flake8 if we know they will be eventually deleted (like some of the tests/).

If you really can't find any way to slice it up, then we will get to it eventually, but I can't make any promises about when. It's too sensitive to do a "merge and fix-forward" kind of approach.

Another approach to consider: split any API changes from major refactors. A major refactor shouldn't require any test changes, if the tests are written well.

Adding new APIs as an augmentation to old APIs can happen in their own PR, and you can write tests that compare the new and old APIs.

Then eventually the old APIs can be deleted at a major version bump. This is a common approach we take for other libraries.

pipermerriam reviewed Dec 10, 2018

View reviewed changes

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved

pipermerriam reviewed Dec 10, 2018

View reviewed changes

py_ecc/BaseCurve.py Outdated Show resolved Hide resolved

py_ecc/curve_properties.py Outdated Show resolved Hide resolved

py_ecc/field_elements.py Outdated Show resolved Hide resolved

Bhargavasomu changed the title ~~Refactoring in Naive Form~~ [WIP] Refactoring in Naive Form Dec 11, 2018

pipermerriam force-pushed the master branch from 3e76595 to 3bf3965 Compare December 11, 2018 18:51

Bhargavasomu changed the title ~~[WIP] Refactoring in Naive Form~~ [WIP] Refactoring the Codebase Dec 13, 2018

Bhargavasomu force-pushed the refactor branch from a005da2 to 24e6ba9 Compare December 13, 2018 19:21

Bhargavasomu added 2 commits December 14, 2018 09:52

Refactor codebase with BaseCurve and Properties Dict

1a66554

Modify tests as per the refactored API

1e077ce

Bhargavasomu force-pushed the refactor branch 2 times, most recently from 6db742d to 9694535 Compare December 14, 2018 04:38

Update pytest version in setup

b3f0192

Bhargavasomu force-pushed the refactor branch from 9694535 to b3f0192 Compare December 14, 2018 04:51

Add type hinting for entire module

d89bee8

Bhargavasomu added 4 commits December 16, 2018 12:01

Update all the BaseCurve methods to either staticmethod or classmethod

c4707c9

Add instance objects of curves as part of API usage

b1330b7

Replace some asserts with exceptions

e502901

Add validation of constants at the time of import

00e65a8

Bhargavasomu force-pushed the refactor branch from 187d125 to 00e65a8 Compare December 16, 2018 09:26

pipermerriam reviewed Dec 17, 2018

View reviewed changes

Bhargavasomu added 2 commits December 17, 2018 23:27

Add TypedDict to install_requires

10e89d8

Further replace all the assertions with exceptions

12dfa2f

Bhargavasomu force-pushed the refactor branch from 7c08c2e to 12dfa2f Compare December 18, 2018 10:22

Make curve_properties of type NamedTuple instead of TypedDict

9a17072

Bhargavasomu changed the title ~~[WIP] Refactoring the Codebase~~ Refactoring the Codebase Dec 19, 2018

hwwhww mentioned this pull request Dec 19, 2018

Routine for processing deposits ethereum/py-evm#1640

Merged

ghost mentioned this pull request Feb 18, 2019

void gitcoinco/web#3765

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring the Codebase #35

Refactoring the Codebase #35

Bhargavasomu commented Dec 10, 2018

Bhargavasomu commented Dec 10, 2018 •

edited

Loading

pipermerriam left a comment

pipermerriam left a comment

pipermerriam commented Dec 10, 2018

Bhargavasomu commented Dec 12, 2018

pipermerriam commented Dec 12, 2018

Bhargavasomu commented Dec 14, 2018 •

edited

Loading

Bhargavasomu commented Dec 14, 2018

Bhargavasomu commented Dec 16, 2018 •

edited

Loading

vbuterin commented Dec 16, 2018

pipermerriam commented Dec 17, 2018 •

edited

Loading

pipermerriam commented Dec 17, 2018

Bhargavasomu commented Dec 17, 2018

Bhargavasomu commented Dec 17, 2018 •

edited

Loading

pipermerriam Dec 17, 2018

Bhargavasomu Dec 17, 2018

pipermerriam Dec 17, 2018

Bhargavasomu Dec 17, 2018 •

edited

Loading

carver Dec 17, 2018

Bhargavasomu Dec 18, 2018

Bhargavasomu Dec 18, 2018

Bhargavasomu commented Dec 18, 2018 •

edited

Loading

Bhargavasomu commented Dec 19, 2018

Bhargavasomu commented Dec 20, 2018

Bhargavasomu commented Jan 8, 2019 •

edited

Loading

pipermerriam commented Jan 8, 2019

pipermerriam commented Jan 8, 2019

Bhargavasomu commented Jan 9, 2019

carver commented Jan 9, 2019 •

edited

Loading

Bhargavasomu commented Jan 11, 2019

carver commented Jan 12, 2019

Refactoring the Codebase #35

Are you sure you want to change the base?

Refactoring the Codebase #35

Conversation

Bhargavasomu commented Dec 10, 2018

What was wrong?

Cute Animal Picture

Bhargavasomu commented Dec 10, 2018 • edited Loading

pipermerriam left a comment

Choose a reason for hiding this comment

pipermerriam left a comment

Choose a reason for hiding this comment

pipermerriam commented Dec 10, 2018

Bhargavasomu commented Dec 12, 2018

pipermerriam commented Dec 12, 2018

Bhargavasomu commented Dec 14, 2018 • edited Loading

Bhargavasomu commented Dec 14, 2018

Bhargavasomu commented Dec 16, 2018 • edited Loading

vbuterin commented Dec 16, 2018

pipermerriam commented Dec 17, 2018 • edited Loading

pipermerriam commented Dec 17, 2018

Bhargavasomu commented Dec 17, 2018

Bhargavasomu commented Dec 17, 2018 • edited Loading

pipermerriam Dec 17, 2018

Choose a reason for hiding this comment

Bhargavasomu Dec 17, 2018

Choose a reason for hiding this comment

pipermerriam Dec 17, 2018

Choose a reason for hiding this comment

Bhargavasomu Dec 17, 2018 • edited Loading

Choose a reason for hiding this comment

carver Dec 17, 2018

Choose a reason for hiding this comment

Bhargavasomu Dec 18, 2018

Choose a reason for hiding this comment

Bhargavasomu Dec 18, 2018

Choose a reason for hiding this comment

Bhargavasomu commented Dec 18, 2018 • edited Loading

Bhargavasomu commented Dec 19, 2018

Bhargavasomu commented Dec 20, 2018

Bhargavasomu commented Jan 8, 2019 • edited Loading

pipermerriam commented Jan 8, 2019

pipermerriam commented Jan 8, 2019

Bhargavasomu commented Jan 9, 2019

carver commented Jan 9, 2019 • edited Loading

Bhargavasomu commented Jan 11, 2019

carver commented Jan 12, 2019

Bhargavasomu commented Dec 10, 2018 •

edited

Loading

Bhargavasomu commented Dec 14, 2018 •

edited

Loading

Bhargavasomu commented Dec 16, 2018 •

edited

Loading

pipermerriam commented Dec 17, 2018 •

edited

Loading

Bhargavasomu commented Dec 17, 2018 •

edited

Loading

Bhargavasomu Dec 17, 2018 •

edited

Loading

Bhargavasomu commented Dec 18, 2018 •

edited

Loading

Bhargavasomu commented Jan 8, 2019 •

edited

Loading

carver commented Jan 9, 2019 •

edited

Loading