-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring the Codebase #35
base: main
Are you sure you want to change the base?
Conversation
@pipermerriam I have not yet refactored the
I could really use your help and thoughts on making this better. (Been stuck on this for the whole day) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating all of the BaseCurve
methods to be either
classmethod
if it needs to access any other curve methods or properties of the curvestaticmethod
if it is a pure function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the performance issues. You should just do some profiling to see where all that extra time is being spent.
@Bhargavasomu some preliminary review. I don't think that the performance hit you're seeing is something we'll be able to merge so that will need to be addressed. It doesn't have to be equally fast, but it needs to be in the same order of magnitude. |
@pipermerriam found out the devil causing more time and have fixed that. Just waiting for a reply from vitalik as part of #24 (comment). Will make the next PR after that. But one more thing that I noticed is that, after we refactor this, we would need to change the code in |
@Bhargavasomu I think we'll need to cut a v2 release to account for the breaking API change and more formally, we need to establish some documentation and boundaries for public/private API since currently everything in the library is effectively public API. |
a005da2
to
24e6ba9
Compare
6db742d
to
9694535
Compare
9694535
to
b3f0192
Compare
@pipermerriam the following issues still have not been addressed till this point and I have some questions.
Apart from the above issues, I have fixed the following
Please review this. Thanks! |
|
187d125
to
00e65a8
Compare
@pipermerriam I have addressed all the above mentioned issues ( |
Responded to the issue in the other thread; basically, the code is correct and the comment is wrong, and it needs to use |
These two things are not mutually exclusive. Even in the edit I realize that this fails for any that do rely on |
Whatever the solution, it sounds like something that cannot be a runtime check. If it is fast an import time check seems fine. Having a test which checks this also seems fine. |
@pipermerriam yes I have understood that, and I have added |
@pipermerriam the validation checks that we are doing are fast enough for an |
This is needed to obtain field_modulus, FQ2_MODULUS_COEFFS | ||
and FQ12_MODULUS_COEFFS from the curve properties | ||
""" | ||
self.curve_name = curve_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut is telling me that this curve_name
based approach could be improved. The primary downside that I want to get rid of is how much we seem to have to pass this parameter around. If this was instead a property of the class....
class FQ(ABC):
field_modulus = None
def __init__(self, val):
if self.field_modulus is None:
raise SomeError("...")
This sets a simple baseline for ensuring that FQ
classes and their derivatives don't get instantiated without their field_modulus
class FQ_BN128(FQ):
field_modulus = field_properties["bn128"]["field_modulus"]
Now we have an FQ_BN128
which represents the FQ
for the bn128 curve.
class FQ:
...
def __add__(self, other):
on = other.n if isinstance(other, FQ) else other
return type(self)(self.n + on) % self.field_modulus)
@classmethod
def one(cls):
return cls(1)
This allows us to remove the passing around of the self.curve_name
within the FQ
object methods quite easily.
I did a quick search and this seems to cleanly remove the need to pass the curve name around. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pipermerriam I agree with this, but I was left with no other option. I have thought of this approach, but it would involve a lot of classes in the code. One class should be declared for each of the curves bn128
, bls12_381
, optimized_bn128
, optimized_bls12_381
curve ; that too for only FQ
. For FQP
, FQ2
and FQ12
, the same process needs to be repeated, and we would be ending up with 16 classes
in total. Hence I just resented to this approach.
I know this design of using curve_name
leads to a lot parameter passing, but I felt it was the best choice, given the above reasons. Please correct me if wrong.
for curve_name in field_properties: | ||
# Check if field_modulus is prime | ||
field_modulus = field_properties[curve_name]["field_modulus"] | ||
if pow(2, field_modulus, field_modulus) != 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel dense. How does this check for primeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pipermerriam I believe that is the result of the Fermat's Little Theorem
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this correctly, the test correctly detects some, but not all cases where field_modulus
is not prime. Maybe the comment could be updated since this is kind of a "smoke test" rather than a proof of primeness.
(For example, see 341
which is not prime, but would not raise this exception)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carver nice catch. Didn't realize that. Will fallback to the traditional O(sqrt(n))
way of checking a prime number. Thankyou!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the field_modulus
and the other numbers that we are checking for primes, are quite large, the O(sqrt(n))
method is taking a lot of time. We might as well stay with the present smoke test
.
7c08c2e
to
12dfa2f
Compare
@pipermerriam I have modified the code based on the above review. Please check again now and let me know if anything should be done before merging. Thankyou! |
|
@pipermerriam sorry for the multiple pings, but the |
@pipermerriam could you please update me on this? |
Taking a look now |
I'm unsure how to proceed right now. This codebase implements cryptographic primitives and thus is extremely sensitive and needs to be correct. This pull request basically touches everything and does so in such a way that it's non-trivial to compare the before/after implementations to ensure that none of the functionality has changed. Similarly with the constants, there are lots of very large integer numbers and it's non-trivial to confirm that none of those numbers have changed. Also, I recall the performance of this version being significantly worse than the current implementation. Has that been addressed? Maybe there is a way to do this in a more iterative manner that touches a smaller chunks at a time... As it stands I can't justify merging this and I don't have an immediate suggestion for how to proceed. |
@pipermerriam the
We have a smoke test in place so as to ensure that the numbers are primes (with a general wave of hand though.) Further I have mimicked all the tests wrt to the new refactored way. But anyways I am ready to redo the work in a Iterative manner if you propose it. |
I agree that a series of smaller PRs would be the best way to get high quality code reviews. I roughly aim to limit to low-100's of lines of code changed. There are times when exceptions are reasonable, and we can reasonably review larger PRs. One example is when there is clearly no change in functionality, like:
Those large PRs should make exactly that one change, though. |
@pipermerriam @carver I understand the reasoning for making the PR shorter. If I were in your places for reviewing, it would have been very confusing for me too. But I have no clue of how to make progress in this iteratively (sorry for being naive). The main reason why I am saying that I don't know how to do this iteratively is, here we are aiming to remove redundant code by separating out the properties of the curve and passing it to the functions wherever redundant code is being used. With this process, I don't see how I could make it iterative. But if you guys can give me a plan to do this iteratively, please let me know so that I can start the work immediately. |
One option would be to leave the old curves implementations in, and add the new curve implementations one at a time. (yes, I know the first PR will still be big, but hopefully still significantly smaller) After the new ones are all added in each of their PRs, we can delete the old ones. But this is just a guess. It will likely take some digging and playing on your part to figure out how to get it into smaller chunks. One thing I want to be able to easily do is verify that a single new API works the same way as the old API. Ideally, that would mean roughly one new API added (xor the refactor of one component) in each PR. Also, it looks like a few basics can be split out, like the tox.ini or setup.py changes. Feel free to ignore whole files in flake8 if we know they will be eventually deleted (like some of the If you really can't find any way to slice it up, then we will get to it eventually, but I can't make any promises about when. It's too sensitive to do a "merge and fix-forward" kind of approach. Another approach to consider: split any API changes from major refactors. A major refactor shouldn't require any test changes, if the tests are written well. Adding new APIs as an augmentation to old APIs can happen in their own PR, and you can write tests that compare the new and old APIs. Then eventually the old APIs can be deleted at a major version bump. This is a common approach we take for other libraries. |
What was wrong?
Fixes Issue: #31
Cute Animal Picture