Groenenboomj/fixes causal #575

Open
wants to merge 12 commits into
base: main_perf

Commits on Jul 16, 2024

  1. Add Perf Kernels

    Add Perf Kernels
    
    This is a combination of 2 commits.
    
    Add Perf Kernels
    
    Add Perf Kernels
    
    This is a combination of 6 commits.
    
    add perf-kernels
    
    fix formatting issues
    
    fix unused variables and other bugs
    
    fix other issues
    
    remove scripts
    
    save
    
    check changes
    
    format
    
    save
    
    save
    
    try
    
    pre-commit check
    
    save
    micmelesse committed Jul 16, 2024
    2d2dbe1
  2. skip backward (#586)

    micmelesse committed Jul 16, 2024
    17575ea
  3. Change all block pointers to tensor pointers (#585)

    Change all block pointers to tensor pointers
    
    Block pointers are intended for NVIDIA TMAs. They are also useful for regular loads, but are not well supported.
    
    Also cleaned up some code I came across along the way and updated comment at the top.
    vgokhale authored and micmelesse committed Jul 16, 2024
    a3d784a
  4. Add support for bshd layout (#587)

    Add support for layouts commonly used by users.
    
    Add an option for the varlen / thd layout to specify equal context lengths for all batches, which users also commonly need.
    vgokhale authored and micmelesse committed Jul 16, 2024
    aa6685a
  5. Post-Merge CI (#612)

    * remove on push for Integration Tests
    
    * rename
    
    * add post merge test
    
    * save
    
    * dtype params
    
    * skip bad config
    
    * fix more stuff
    micmelesse committed Jul 16, 2024
    dbe1173
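
To illustrate the bshd layout support added in #587: the sketch below assumes the usual flash-attention naming, where bhsd means (batch, heads, seqlen, head_dim) and bshd means (batch, seqlen, heads, head_dim). It is not taken from the kernel in this PR; the helper names `contiguous_strides` and `strides_as_bhsd` are hypothetical. The idea is that a kernel can keep indexing logically as (batch, head, seq, dim) and only the strides change with the layout.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


def strides_as_bhsd(layout, batch, heads, seqlen, head_dim):
    """Return strides ordered as (batch, head, seq, dim) for a tensor
    stored contiguously in the given layout (hypothetical helper)."""
    if layout == "bhsd":
        sb, sh, ss, sd = contiguous_strides((batch, heads, seqlen, head_dim))
    elif layout == "bshd":
        # Memory order is (batch, seq, head, dim); reorder the strides.
        sb, ss, sh, sd = contiguous_strides((batch, seqlen, heads, head_dim))
    else:
        raise ValueError(f"unsupported layout: {layout}")
    return sb, sh, ss, sd


def flat_offset(strides, b, h, s, d):
    """Element offset of logical index (b, h, s, d) given bhsd-ordered strides."""
    sb, sh, ss, sd = strides
    return b * sb + h * sh + s * ss + d * sd
```

With this arrangement the same indexing code serves both layouts; only the stride tuple passed in differs.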

Commits on Jul 18, 2024

  1. Increase CI timeout (#615)

    Increase CI timeout
    vgokhale authored Jul 18, 2024
    23ba546

Commits on Jul 19, 2024

  1. Couple of FA optimizations (#608)

    Couple of FA optimizations
    
    Set SM scale multiplication to a constexpr. Minor asm improvement.
    
    Changed the acc scaling that applies the softmax normalization from a
    division to a multiplication by the reciprocal. ~10% perf improvement.
    
    ---------
    
    Co-authored-by: Michael Melesse <[email protected]>
    vgokhale and micmelesse authored Jul 19, 2024
    df4c4d3
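
The reciprocal change described in #608 can be sketched in plain Python. This is only an illustration of the arithmetic, not the Triton kernel itself: the function names are hypothetical, and a single query row is assumed. Instead of dividing every accumulator element by the softmax denominator, the reciprocal is computed once and each element is multiplied by it.

```python
import math


def attention_row_div(scores, values):
    """Softmax-weighted sum of value rows, normalizing by division."""
    m = max(scores)                           # subtract the max for stability
    p = [math.exp(s - m) for s in scores]     # unnormalized probabilities
    l = sum(p)                                # softmax denominator
    dim = len(values[0])
    acc = [sum(pi * v[j] for pi, v in zip(p, values)) for j in range(dim)]
    return [a / l for a in acc]               # one division per element


def attention_row_recip(scores, values):
    """Same computation, normalizing by one reciprocal and multiplications."""
    m = max(scores)
    p = [math.exp(s - m) for s in scores]
    l_recip = 1.0 / sum(p)                    # single division
    dim = len(values[0])
    acc = [sum(pi * v[j] for pi, v in zip(p, values)) for j in range(dim)]
    return [a * l_recip for a in acc]         # multiplications only
```

The two variants agree up to floating-point rounding; the second trades many divisions for one division plus cheaper multiplications.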

Commits on Jul 31, 2024

  1. streamk v0.1 (#619)

    * streamk v0.1
    
    * remove unused variable
    
    * fix format issues
    
    * add README
    
    * fix format issue
    
    * change num_sms to num_cus
    xiaohuguo2023 authored Jul 31, 2024
    52a908f

Commits on Aug 6, 2024

  1. Add explicit multiply-reduce GEMM kernel (#621)

    * Add explicit multiply-reduce GEMM kernel
    
    * Remove `SPLIT_K` argument from kernel
    
    * Remove `GROUP_SIZE_M` argument from kernel
    
    * Remove conditional call to `tl.dot` from kernel
    
    * Remove table with performance data from README
    brunomazzottiamd authored Aug 6, 2024
    1d2e066
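
As a rough picture of what "explicit multiply-reduce" means in #621, the sketch below computes a matrix product without a fused dot primitive: it first forms every elementwise product and then reduces over the shared dimension. It uses plain Python lists and a hypothetical function name, and makes no claim about how the actual Triton kernel tiles or schedules the work.

```python
def matmul_multiply_reduce(a, b):
    """Compute a @ b as an explicit multiply followed by a reduction over k."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    # Step 1 (multiply): prod[i][j][kk] = a[i][kk] * b[kk][j]
    prod = [
        [[a[i][kk] * b[kk][j] for kk in range(k)] for j in range(n)]
        for i in range(m)
    ]
    # Step 2 (reduce): sum each product vector over the k dimension
    return [[sum(prod[i][j]) for j in range(n)] for i in range(m)]
```

Materializing the full product costs m * n * k temporaries, which is why this form is mostly of interest when the tiles are small.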

Commits on Aug 12, 2024

  1. 51d0d92
  2. ae4633c
  3. 550f395