Commit

update
Hanyuezhuohua committed Dec 7, 2024
1 parent f585a3f commit cdc6db5
Showing 1 changed file with 28 additions and 28 deletions.
56 changes: 28 additions & 28 deletions index.html
@@ -292,15 +292,15 @@ <h2 class="title is-3">Observation</h2>
</p>
</div>
<table style="border-collapse: collapse; width: 100%; text-align: center;">
<tr>
<tr style="font-weight: bold;">
<th style="padding: 8px; text-align: left; border-bottom: 2px solid black; border-top: 2px solid black;" rowspan="2">Task</th>
<th style="padding: 8px; border-bottom: 1px solid black; border-top: 2px solid black;">S²FT-R</th>
<th style="padding: 8px; border-bottom: 1px solid black; border-top: 2px solid black;" colspan="2">S²FT-W</th>
<th style="padding: 8px; border-bottom: 1px solid black; border-top: 2px solid black;" colspan="2">S²FT-A</th>
<th style="padding: 8px; border-bottom: 1px solid black; border-top: 2px solid black;" colspan="2">S²FT-S</th>
<th style="padding: 8px; border-bottom: 1px solid black; border-top: 2px solid black;" colspan="2">S²FT-G</th>
</tr>
<tr>
<tr style="font-weight: bold;">
<th style="padding: 8px; border-bottom: 2px solid black;"></th>
<th style="padding: 8px; border-bottom: 2px solid black;">Large</th>
<th style="padding: 8px; border-bottom: 2px solid black;">Small</th>
@@ -311,7 +311,7 @@ <h2 class="title is-3">Observation</h2>
<th style="padding: 8px; border-bottom: 2px solid black;">Large</th>
<th style="padding: 8px; border-bottom: 2px solid black;">Small</th>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee; width: 120px;">Knowledge</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; min-width: 80px;">86.6</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">85.9<sub style="color: red;">(-0.7)</sub></td>
@@ -323,7 +323,7 @@ <h2 class="title is-3">Observation</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">85.4<sub style="color: red;">(-1.2)</sub></td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">86.2<sub style="color: red;">(-0.4)</sub></td>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 2px solid black; width: 120px;">Arithmetic</td>
<td style="padding: 8px; border-bottom: 2px solid black; min-width: 80px;">79.6</td>
<td style="padding: 8px; border-bottom: 2px solid black;">78.4<sub style="color: red;">(-1.2)</sub></td>
@@ -370,7 +370,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<tr>
<td colspan="11" style="padding: 12px; text-align: center; border-bottom: 1px solid #eee; font-weight: bold;">Fine-tuning LLaMA-3-8B on Commonsense Reasoning Tasks</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<th style="padding: 8px; text-align: left; border-bottom: 2px solid black; width: 11%;">Method</th>
<th style="padding: 8px; border-bottom: 2px solid black; width: 9%;">#Param</th>
<th style="padding: 8px; border-bottom: 2px solid black; width: 8%;">BoolQ</th>
@@ -384,7 +384,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<th style="padding: 8px; border-bottom: 2px solid black; width: 8%;">Avg. ↑</th>
</tr>
<!-- First table rows... -->
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee;">Full FT</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">100</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">73.9</td>
@@ -397,7 +397,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">84.0</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">83.6</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee;">LoRA</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">0.70</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">70.8</td>
@@ -410,7 +410,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">84.4</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">82.5</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee;">DoRA</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">0.71</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">74.6</td>
@@ -423,7 +423,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">85.8</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">85.2</td>
</tr>
<tr style="background-color: #e6f3ff;">
<tr style="background-color: #e6f3ff; font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee; font-weight: bold;">S<sup>2</sup>FT</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">0.70</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">75.0</td>
@@ -442,7 +442,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<tr>
<td colspan="10" style="padding: 12px; text-align: center; border-bottom: 1px solid #eee; font-weight: bold;">Fine-tuning LLaMA-3-8B on Arithmetic Reasoning Tasks</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<th style="padding: 8px; text-align: left; border-bottom: 2px solid black; width: 11%;">Method</th>
<th style="padding: 8px; border-bottom: 2px solid black; width: 9%;">#Param</th>
<th style="padding: 8px; border-bottom: 2px solid black; width: 16%;">MultiArith</th>
@@ -454,7 +454,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<th style="padding: 8px; border-bottom: 2px solid black; width: 8%;">MAWPS</th>
<th style="padding: 8px; border-bottom: 2px solid black; width: 8%;">Avg. ↑</th>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee;">Full FT</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">100</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">99.2</td>
@@ -466,7 +466,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">91.2</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">77.7</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee;">LoRA</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">0.70</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">99.5</td>
@@ -478,7 +478,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">90.8</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">77.2</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee;">DoRA</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">0.71</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">98.8</td>
@@ -490,17 +490,17 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 1px solid #eee;">91.2</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">77.5</td>
</tr>
<tr style="background-color: #e6f3ff;">
<tr style="background-color: #e6f3ff; font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 1px solid #eee; font-weight: bold;">S<sup>2</sup>FT</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">0.70</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">99.7</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">65.8</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">93.7</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">31.5</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">97.8</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">76.0</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">92.4</td>
<td style="padding: 8px; border-bottom: 1px solid #eee;">79.6</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">0.70</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">99.7</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">65.8</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">93.7</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">31.5</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">97.8</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">76.0</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; font-weight: bold;">92.4</td>
<td style="padding: 8px; border-bottom: 1px solid #eee; ">79.6</td>
</tr>
</table>
</div>
@@ -528,12 +528,12 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<tr>
<td colspan="7" style="padding: 12px; text-align: center; border-bottom: 1px solid #eee; font-weight: bold;">Fusing Commonsense and Arithmetic Adapters for LLaMA-3-8B</td>
</tr>
<tr>
<tr style="font-weight: bold;">
<th style="padding: 8px; text-align: left; border-top: 2px solid black; border-bottom: 2px solid black;" rowspan="2">Task</th>
<th style="padding: 8px; border-top: 2px solid black; border-bottom: 1px solid black;" colspan="3">LoRA</th>
<th style="padding: 8px; border-top: 2px solid black; border-bottom: 1px solid black;" colspan="3">S²FT</th>
</tr>
<tr>
<tr style="font-weight: bold;">
<th style="padding: 8px; border-bottom: 2px solid black;">Adapter 1</th>
<th style="padding: 8px; border-bottom: 2px solid black;">Adapter 2</th>
<th style="padding: 8px; border-bottom: 2px solid black;">Fused</th>
@@ -543,7 +543,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
</tr>
</thead>
<tbody>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 2px solid black;">Commonsense</td>
<td style="padding: 8px; border-bottom: 2px solid black;">83.1</td>
<td style="padding: 8px; border-bottom: 2px solid black; color: #999;">32.1</td>
@@ -552,7 +552,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<td style="padding: 8px; border-bottom: 2px solid black; color: #999;">42.3</td>
<td style="padding: 8px; border-bottom: 2px solid black;">84.0<sub style="color: red;">(-2.6)</sub></td>
</tr>
<tr>
<tr style="font-weight: bold;">
<td style="padding: 8px; text-align: left; border-bottom: 2px solid black;">Arithmetic</td>
<td style="padding: 8px; border-bottom: 2px solid black; color: #999;">12.0</td>
<td style="padding: 8px; border-bottom: 2px solid black;">77.2</td>
@@ -584,7 +584,7 @@ <h2 class="title is-3">S<sup>2</sup>FT and Results</h2>
<h2 class="title is-3">Conclusion and Future Work</h2>
<div class="content has-text-justified">
<p>
This work introduces S<sup>2</sup>FT, a novel PEFT family that simultaneously achieves high quality, efficient training, and scalable serving for LLM fine-tuning. Leveraging S<sup>2</sup>FT, we demonstrate a 10% reduction in training costs while surpassing LoRA in performance. This improvement is particularly impactful for resource savings in large-scale, real-world deployments. Moreover, S<sup>2</sup>FT enhances serving scalability, offering promising potentials for serving thousands of adapters in future applications.
This work introduces S<sup>2</sup>FT, a novel PEFT family that is generalizable, efficient, and scalable. Compared to LoRA, S<sup>2</sup>FT improves generalization on downstream tasks while reducing training time and memory by 10%. Furthermore, S<sup>2</sup>FT enables scalable serving of thousands of adapters simultaneously. These comprehensive improvements in quality, efficiency, and scalability make S<sup>2</sup>FT particularly valuable for the large-scale, real-world deployment of foundation models in various domains. Future research directions include exploring the controllability of S<sup>2</sup>FT, which enables the separation of domain-specific knowledge into distinct parameters. This capability could significantly advance sparse training and inference techniques for LLMs, particularly in MoE architecture design.
</p>
</div>
</div>