add bibtex & replace text with logo
yzh119 committed Feb 25, 2024
1 parent b558807 commit 8e7a2f6
Showing 4 changed files with 30 additions and 16 deletions.
15 changes: 2 additions & 13 deletions _includes/footer.html
@@ -3,22 +3,11 @@

<div class="wrapper">

<h2 class="footer-heading">{{ site.title | escape }}</h2>
<h2 class="footer-heading"><img src="/assets/imgs/FlashInfer-white-background.png" alt="FlashInfer" width="120"></h2>

<div class="footer-col-wrapper">
<div class="footer-col footer-col-1">
<ul class="contact-list">
<li class="p-name">
{%- if site.author -%}
{{ site.author | escape }}
{%- else -%}
{{ site.title | escape }}
{%- endif -%}
</li>
{%- if site.email -%}
<li><a class="u-email" href="mailto:{{ site.email }}">{{ site.email }}</a></li>
{%- endif -%}
</ul>
<p> Copyright © 2023-2024, FlashInfer team</p>
</div>

<div class="footer-col footer-col-2">
4 changes: 2 additions & 2 deletions _includes/header.html
@@ -3,7 +3,7 @@
<div class="wrapper">
{%- assign default_paths = site.pages | map: "path" -%}
{%- assign page_paths = site.header_pages | default: default_paths -%}
<a class="site-title" rel="author" href="{{ "/" | relative_url }}">{{ site.title | escape }}</a>
<a class="site-title" rel="author" href="{{ "/" | relative_url }}"><img src="/assets/imgs/FlashInfer-white-background.png" alt="FlashInfer" width="120"></a>

{%- if page_paths -%}
<nav class="site-nav">
@@ -30,4 +30,4 @@
</nav>
{%- endif -%}
</div>
-</header>
\ No newline at end of file
+</header>
14 changes: 13 additions & 1 deletion _posts/2024-01-03-introduce-flashinfer.md
@@ -242,5 +242,17 @@ This blog post is written by [Zihao Ye](https://homes.cs.washington.edu/~zhye/).

We also thank Masahiro Masuda (OctoAI), Yixin Dong (UW & SJTU), Roy Lu (UW), Chien-Yu Lin (UW), Ying Sheng (Stanford & LMSys) and Lianmin Zheng (Berkeley & LMSys) for their valuable feedback and discussions.

+## Citation
+
+```bibtex
+@misc{flashinfer,
+  title = {Accelerating Self-Attentions for LLM Serving with FlashInfer},
+  url = {https://flashinfer.ai/2024/02/02/introduce-flashinfer.html},
+  author = {Ye, Zihao and Chen, Lequn and Lai, Ruihang and Zhao, Yilong and Zheng, Size and Shao, Junru and Hou, Bohan and Jin, Hongyi and Zuo, Yifei and Yin, Liangsheng and Chen, Tianqi and Ceze, Luis},
+  month = {February},
+  year = {2024}
+}
+```

## Footnotes
-[^1]: [Dissecting Batching Effects in GPT Inference](https://le.qun.ch/en/blog/2023/05/13/transformer-batching/) by Lequn Chen
\ No newline at end of file
+[^1]: [Dissecting Batching Effects in GPT Inference](https://le.qun.ch/en/blog/2023/05/13/transformer-batching/) by Lequn Chen
13 changes: 13 additions & 0 deletions _posts/2024-01-08-cascade-inference.md
@@ -114,7 +114,20 @@ The idea of Cascade Inference can be generalized to multiple levels (we only sho

Recently, [SGLang](https://arxiv.org/abs/2312.07104) (a domain-specific language for programming LLMs) proposed RadixAttention, where the KV-Cache is organized as a radix tree and attention can be further accelerated with multi-level Cascade Inference. We are collaborating with the SGLang team to get this feature landed.

+## Citation
+
+```bibtex
+@misc{cascade-inference,
+  title = {Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding},
+  url = {https://flashinfer.ai/2024/02/02/cascade-inference.html},
+  author = {Ye, Zihao and Lai, Ruihang and Lu, Bo-Ru and Lin, Chien-Yu and Zheng, Size and Chen, Lequn and Chen, Tianqi and Ceze, Luis},
+  month = {February},
+  year = {2024}
+}
+```

## Footnotes & References

[^1]: thread block: the programming abstraction that represents a group of cooperative threads; one SM can execute multiple thread blocks, but one thread block cannot span multiple SMs.
[^2]: [Hopper architecture](https://resources.nvidia.com/en-us-tensor-core) introduces a new abstraction called Thread Block Clusters, which enables a thread block to access the shared memory of other thread blocks within the same cluster. Hopper also supports direct SM-to-SM communication without accessing global memory (a.k.a. Distributed Shared Memory), which can greatly accelerate cross-SM communication. However, these features are not available on pre-Hopper architectures such as A100 GPUs.
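
An illustrative aside on footnote [^2], not part of the commit above: a minimal CUDA sketch of Hopper's distributed shared memory via the cooperative groups cluster API. It assumes CUDA 12+ compiled for sm_90 (e.g. `nvcc -arch=sm_90`); the kernel name `exchange_kernel` and the 2-block cluster shape are hypothetical choices for the example.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Group every 2 thread blocks into one cluster (compile-time attribute).
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(int *out) {
  __shared__ int smem[32];
  cg::cluster_group cluster = cg::this_cluster();
  unsigned int rank = cluster.block_rank();  // this block's rank in the cluster

  smem[threadIdx.x] = (int)(rank * 1000 + threadIdx.x);
  // Make this block's shared memory visible to the rest of the cluster.
  cluster.sync();

  // Map a pointer into the peer block's shared memory: the read goes over
  // the SM-to-SM fabric instead of a round trip through global memory.
  int *peer = cluster.map_shared_rank(smem, (rank + 1) % cluster.num_blocks());
  out[blockIdx.x * blockDim.x + threadIdx.x] = peer[threadIdx.x];

  // Keep shared memory alive until all cluster-side reads have completed.
  cluster.sync();
}

int main() {
  int *d_out;
  cudaMalloc(&d_out, 2 * 32 * sizeof(int));
  exchange_kernel<<<2, 32>>>(d_out);  // grid of 2 blocks = 1 cluster
  cudaDeviceSynchronize();
  int h_out[64];
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("block 0, thread 0 read %d from block 1\n", h_out[0]);  // prints 1000
  cudaFree(d_out);
  return 0;
}
```

Pre-Hopper GPUs such as the A100 have no equivalent: blocks there can only exchange data through global memory, which is why footnote [^2] matters for the cross-SM communication patterns discussed in the post.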
