add bibtex & replace text with logo
yzh119 committed Feb 25, 2024
1 parent b558807 commit 8e7a2f6
Showing 4 changed files with 30 additions and 16 deletions.
15 changes: 2 additions & 13 deletions _includes/footer.html
@@ -3,22 +3,11 @@

<div class="wrapper">

<h2 class="footer-heading">{{ site.title | escape }}</h2>
<h2 class="footer-heading"><img src="/assets/imgs/FlashInfer-white-background.png" alt="FlashInfer" width="120"></h2>

<div class="footer-col-wrapper">
<div class="footer-col footer-col-1">
<ul class="contact-list">
<li class="p-name">
{%- if site.author -%}
{{ site.author | escape }}
{%- else -%}
{{ site.title | escape }}
{%- endif -%}
</li>
{%- if site.email -%}
<li><a class="u-email" href="mailto:{{ site.email }}">{{ site.email }}</a></li>
{%- endif -%}
</ul>
<p> Copyright © 2023-2024, FlashInfer team</p>
</div>

<div class="footer-col footer-col-2">
4 changes: 2 additions & 2 deletions _includes/header.html
@@ -3,7 +3,7 @@
<div class="wrapper">
{%- assign default_paths = site.pages | map: "path" -%}
{%- assign page_paths = site.header_pages | default: default_paths -%}
<a class="site-title" rel="author" href="{{ "/" | relative_url }}">{{ site.title | escape }}</a>
<a class="site-title" rel="author" href="{{ "/" | relative_url }}"><img src="/assets/imgs/FlashInfer-white-background.png" alt="FlashInfer" width="120"></a>

{%- if page_paths -%}
<nav class="site-nav">
@@ -30,4 +30,4 @@
</nav>
{%- endif -%}
</div>
-</header>
\ No newline at end of file
+</header>
14 changes: 13 additions & 1 deletion _posts/2024-01-03-introduce-flashinfer.md
@@ -242,5 +242,17 @@ This blog post is written by [Zihao Ye](https://homes.cs.washington.edu/~zhye/).

We also thank Masahiro Masuda (OctoAI), Yixin Dong (UW & SJTU), Roy Lu (UW), Chien-Yu Lin (UW), Ying Sheng (Stanford & LMSys) and Lianmin Zheng (Berkeley & LMSys) for their valuable feedback and discussions.

+## Citation
+
+```bibtex
+@misc{flashinfer,
+  title = {Accelerating Self-Attentions for LLM Serving with FlashInfer},
+  url = {https://flashinfer.ai/2024/02/02/introduce-flashinfer.html},
+  author = {Ye, Zihao and Chen, Lequn and Lai, Ruihang and Zhao, Yilong and Zheng, Size and Shao, Junru and Hou, Bohan and Jin, Hongyi and Zuo, Yifei and Yin, Liangsheng and Chen, Tianqi and Ceze, Luis},
+  month = {February},
+  year = {2024}
+}
+```

## Footnotes
-[^1]: [Dissecting Batching Effects in GPT Inference](https://le.qun.ch/en/blog/2023/05/13/transformer-batching/) by Lequn Chen
\ No newline at end of file
+[^1]: [Dissecting Batching Effects in GPT Inference](https://le.qun.ch/en/blog/2023/05/13/transformer-batching/) by Lequn Chen
13 changes: 13 additions & 0 deletions _posts/2024-01-08-cascade-inference.md
@@ -114,7 +114,20 @@ The idea of Cascade Inference can be generalized to multiple levels (we only sho

Recently, [SGLang](https://arxiv.org/abs/2312.07104) (a domain-specific language for programming LLMs) proposed RadixAttention, where the KV-Cache is organized as a radix tree and attention can be further accelerated with multi-level Cascade Inference. We are collaborating with the SGLang team to get this feature landed.

+## Citation
+
+```bibtex
+@misc{cascade-inference,
+  title = {Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding},
+  url = {https://flashinfer.ai/2024/02/02/cascade-inference.html},
+  author = {Ye, Zihao and Lai, Ruihang and Lu, Bo-Ru and Lin, Chien-Yu and Zheng, Size and Chen, Lequn and Chen, Tianqi and Ceze, Luis},
+  month = {February},
+  year = {2024}
+}
+```

## Footnotes & References

[^1]: thread block: the programming abstraction that represents a group of cooperative threads; one SM can execute multiple thread blocks, but one thread block cannot span multiple SMs.
[^2]: [Hopper architecture](https://resources.nvidia.com/en-us-tensor-core) introduces a new abstraction called Thread Block Clusters, which enables a thread block to access the shared memory of other thread blocks within the same cluster. Hopper also supports direct SM-to-SM communication without accessing global memory (a.k.a. Distributed Shared Memory), which can greatly accelerate cross-SM communication. However, these features are not available on pre-Hopper architectures such as A100 GPUs.
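
An illustrative aside on footnote [^2], not part of the commit above: a minimal CUDA sketch of Hopper's distributed shared memory via the cooperative groups cluster API. It assumes CUDA 12+ compiled for sm_90 (e.g. `nvcc -arch=sm_90`); the kernel name `exchange_kernel` and the 2-block cluster shape are hypothetical choices for the example.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Group every 2 thread blocks into one cluster (compile-time attribute).
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(int *out) {
  __shared__ int smem[32];
  cg::cluster_group cluster = cg::this_cluster();
  unsigned int rank = cluster.block_rank();  // this block's rank in the cluster

  smem[threadIdx.x] = (int)(rank * 1000 + threadIdx.x);
  // Make this block's shared memory visible to the rest of the cluster.
  cluster.sync();

  // Map a pointer into the peer block's shared memory: the read goes over
  // the SM-to-SM fabric instead of a round trip through global memory.
  int *peer = cluster.map_shared_rank(smem, (rank + 1) % cluster.num_blocks());
  out[blockIdx.x * blockDim.x + threadIdx.x] = peer[threadIdx.x];

  // Keep shared memory alive until all cluster-side reads have completed.
  cluster.sync();
}

int main() {
  int *d_out;
  cudaMalloc(&d_out, 2 * 32 * sizeof(int));
  exchange_kernel<<<2, 32>>>(d_out);  // grid of 2 blocks = 1 cluster
  cudaDeviceSynchronize();
  int h_out[64];
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("block 0, thread 0 read %d from block 1\n", h_out[0]);  // prints 1000
  cudaFree(d_out);
  return 0;
}
```

Pre-Hopper GPUs such as the A100 have no equivalent: blocks there can only exchange data through global memory, which is why footnote [^2] matters for the cross-SM communication patterns discussed in the post.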
