Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored and Optimized Logic:: Parser Logic & Shared Modules #283

Merged
merged 14 commits into from
Oct 27, 2024

Conversation

IIITM-Jay
Copy link
Member

@IIITM-Jay IIITM-Jay commented Sep 16, 2024

This PR intends to refactor the parse.py to achieve maintainability and scalability.

Scopes Covered

  1. Modularization: Breaking the codes in small and maintainable helper functions containing understandable lines of code
  2. Optimization: Reducing number of lines of codes using techniques such as list comprehensions etc.
  3. Removing Redundancy: Removing duplicacy and utilizing reusablity of logics
  4. Refactorization: Refactor a long script into dedicated scripts for scalability

Approach Followed

  1. The parse.py file now contains only the main function where it calls respective scripts for generating the output based on extensions selected.
  2. Common methods/ functions are moved to shared_utils.py so that other scripts can re-use them efficiently. These functions are shared between various extension scripts
  3. The output for Latex is refactored and optimized in larex_utils.py
  4. Scripts: parse.py and shared_utils.py are refactored, modularized and optimized as well
  5. Separate dedicated scripts being created for each extension

Needs to be done

  • The codes for other extension outputs are only moved to their respective scripts. They require improvements and enhancements. They are just simply taken from parse.py to their modules like c_utils.py for generating c based output
  • In the shared_utils.py, the method create_inst_dict() is yet to be refactored.

@IIITM-Jay
Copy link
Member Author

IIITM-Jay commented Sep 16, 2024

Hi @aswaterman and @rpsene, I have made an attempt to refactor the parse.py. For now, only latex_utils.py, shared_utils.py and parse.py and refactored, optimized and modularized.

The test cases are failing as we need to modify the tests.py as well based on the accepted refactored code.
P.S. Note: For the time being, I have imported the shared module methods for running the test cases, now all checks passed

Requesting feedback on the modified code and suggestions on what best we can do to achieve maintainability and scalability.

@IIITM-Jay
Copy link
Member Author

Hi @aswaterman, in addition to the written explanation of the parsing logic inside "Flow of parse.py" section of README file, I believe including flowcharts would greatly enhance the clarity and readability of the process. By visualizing the flow, readers can get a clearer understanding of the key steps involved in parsing instruction encodings.

Like, we have three main steps with each having sequence of procedures:
1. The first pass, we cover only the regular expression and follow the below steps:

flowchart TD
    A[Start: parse.py] --> B[Create list of all rv* files]
    B --> C{File contains regular instructions?}
    C -->|Yes| D[Parse file line by line]
    D --> E[Perform checks on regular instructions]
    
    E --> F[Check 1: msb > lsb in range assignment]
    E --> G[Check 2: Value representable in range]
    E --> H[Check 3: No multiple assignments to same bit]
    E --> I[Check 4: All bit positions must be accounted for]
    
    F --> J[Pass checks?]
    G --> J
    H --> J
    I --> J
    
    J -->|Yes| K[Create dictionary for regular instruction]
    K --> L[Add encoding, extension, mask, match, variable_fields]
    L --> M[Add to instr_dict]
    
    M --> N[Process next regular instruction]
    N -->|All regular instructions processed| O[End of Regular Instruction Parsing]
Loading

@IIITM-Jay
Copy link
Member Author

2. In the second pass, we do the checks for pseudo_instr carrying out similar procedure.

3. In the last step, the output generation,

flowchart TD
    A[Start Output Generation] --> B[Generate LaTeX tables]
    B --> C[Generate encoding.h file]
    C --> D[Generate other artifacts]

    D --> E[Output files generated]
    E --> F[End]

Loading

@IIITM-Jay
Copy link
Member Author

IIITM-Jay commented Sep 21, 2024

@aswaterman and @rpsene , These flowchart creation process are markdown friendly. I think it will also help to give a nutshell view of what we are doing. Let me know the feedback and suggestions, if these will add up to an enhancement for the existing repository.

@IIITM-Jay
Copy link
Member Author

IIITM-Jay commented Sep 25, 2024

@aswaterman and @rpsene , Refactored and Optimized the method used for creating instruction dictionary.

P.S. The second point from Needs to be Done header in PR description is ticked(marked as completed) as of now

@IIITM-Jay
Copy link
Member Author

IIITM-Jay commented Oct 2, 2024

The conflicts arises as after the commit w.r.t walrus operator made in parse.py.

@aswaterman, the walrus operator (:=) is compatible with Python 3.8 and later versions, as it was introduced in Python 3.8, and it allows assignment within an expression, making it useful for situations where we want to assign and evaluate a variable in the same line.
As far as I know, if I may not be wrong, python 3.6 is already having security vulnerabilities and reached its end in 2021.

So, wanted to know, whether we need to keep in mind with older versions compatibility of Python while doing refactorization and optimization.
Thanks!

@aswaterman
Copy link
Member

I'll try to provide feedback on this PR soon. To answer the immediate question: if we have good reason to start making extensive use of newer Python features, I have no objections to upgrading, but I did not think that a single use of the walrus operator was a sufficient justification for requiring a newer Python version.

@IIITM-Jay
Copy link
Member Author

but I did not think that a single use of the walrus operator was a sufficient justification for requiring a newer Python version.

Yes, I do agree with this. While going through the files in this repository, I found that the walrus operator is the only python 3.8 version feature that is being used. Although there are lot of other features in later versions such as Positional-Only Parameters, TypeDict etc., but those need not be used extensively here as they are not required and they do not add any additional benefits

@aswaterman
Copy link
Member

You've convinced me that other new features are a good enough justification to move to 3.8.

Copy link
Member

@aswaterman aswaterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very reasonable refactoring. I presume at this point there will be some merge conflicts with respect to other recent changes to the master branch, but feel free to merge once you've resolved them.

@riscv riscv deleted a comment from codecov-commenter Oct 20, 2024
@IIITM-Jay IIITM-Jay changed the title Refactored and Optimized Logic:: Parser Logic, Latex Based Output & Shared Modules Refactored and Optimized Logic:: Parser Logic & Shared Modules Oct 24, 2024
@IIITM-Jay
Copy link
Member Author

IIITM-Jay commented Oct 24, 2024

@aswaterman and @rpsene , Cleaned up the codes for parsing logic. I am only keeping the shared methods/ functions in the shared_utils and moved all the related scripts for generating respective artifacts to their own scripts. I have renamed the title so that to be clear that this PR intends only for refactoring the main parsing logic and all the shared codes that are exchanged between different scripts in order to enhance the maintainability and readability using best optimization techniques and best coding standards & practices.

Will raise separate PR(s) for supporting refactorization and modularization of each dedicated scripts in very near time, as this will not only help in review smaller PRs but will also help to debug later in the future.

These would be as VIZ:

  1. Latex Based Outputs - [instr-table.tex, priv-instr-table.tex]
  2. C Output - [encoding.out.h]
  3. Chisel
  4. Sverilog
  5. Rust
  6. GO

Just giving a final check before merging, matching the generated outputs with this approach and those that were generated earlier with existing codes.

I have also verified this logic and modular approach with optimized techniques, on applying generating asciidoc outputs that is in the formats html and pdf. And that worked well. I will raise a PR for that too as it can also be considered as widely used formats by users

@IIITM-Jay
Copy link
Member Author

IIITM-Jay commented Oct 27, 2024

@aswaterman revised the PR with updated code regarding to the newer latest PR #303.

Merging it now.

@IIITM-Jay IIITM-Jay merged commit 25c09e6 into master Oct 27, 2024
2 checks passed
@IIITM-Jay IIITM-Jay deleted the latex-based-output-refactor branch October 27, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants