Commit

Update
guokan-shang committed Jan 24, 2025
1 parent a310713 commit d815967
Showing 3 changed files with 15 additions and 15 deletions.
8 changes: 4 additions & 4 deletions index.html
@@ -89,10 +89,10 @@ <h2>Program Highlights & Poster Registration</h2>
<h2>Organizers</h2>
<p>
<b>Organizing committee:</b><br>
- <a href="https://mbzuai.ac.ae/study/faculty/professor-eric-moulines/">Eric Moulines</a>, Ecole Polytechnique & MBZUAI<br>
- Guokan Shang, MBZUAI France Lab<br>
- <a href="https://mbzuai.ac.ae/study/faculty/michalis-vazirgiannis/">Michalis Vazirgiannis</a>, Ecole Polytechnique & MBZUAI<br>
- <a href="https://mbzuai.ac.ae/study/faculty/kun-zhang/">Kun Zhang</a>, MBZUAI
+ <a target="_blank" href="https://mbzuai.ac.ae/study/faculty/professor-eric-moulines/">Eric Moulines</a>, Ecole Polytechnique & MBZUAI<br>
+ <a target="_blank" href="https://www.linkedin.com/in/guokan-shang">Guokan Shang</a>, MBZUAI France Lab<br>
+ <a target="_blank" href="https://mbzuai.ac.ae/study/faculty/michalis-vazirgiannis/">Michalis Vazirgiannis</a>, Ecole Polytechnique & MBZUAI<br>
+ <a target="_blank" href="https://mbzuai.ac.ae/study/faculty/kun-zhang/">Kun Zhang</a>, MBZUAI
</p>
<p>
<b>Logistics support:</b><br>
16 changes: 8 additions & 8 deletions program-day-1/index.html
@@ -57,7 +57,7 @@ <h2>Program on Wednesday, February 12</h2>
<table>
<tr>
<td class="date" rowspan="2">
- 9:00am
+ 09:00am
</td>
<td class="title-special">
Registration and Coffee &amp; Tea!
@@ -173,7 +173,7 @@ <h2>Program on Wednesday, February 12</h2>
<table id="PUTSPEAKERNAMEHERE">
<tr>
<td class="date" rowspan="3">
- 12:20am
+ 12:20pm
</td>
<td class="title">
Exploiting Knowledge for Model-based Deep Music Generation
@@ -194,7 +194,7 @@ <h2>Program on Wednesday, February 12</h2>
<table>
<tr>
<td class="date" rowspan="2">
- 12:20pm
+ 13:00pm
</td>
<td class="title-special">
Lunch
@@ -212,7 +212,7 @@ <h2>Program on Wednesday, February 12</h2>
<table id="PUTSPEAKERNAMEHERE">
<tr>
<td class="date" rowspan="3">
- 14:00am
+ 14:00pm
</td>
<td class="title">
TBD
@@ -237,7 +237,7 @@ <h2>Program on Wednesday, February 12</h2>
14:40pm
</td>
<td class="title">
- TBD
+ Intricacies of Game-theoretical LLM Alignment
</td>
</tr>
<tr>
@@ -247,7 +247,7 @@ <h2>Program on Wednesday, February 12</h2>
</tr>
<tr>
<td class="abstract">
- TBD
+ Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. @This equivalence may seem surprising at first sight, since IPO is an offline method whereas Nash-MD is an online method using a preference model. However, this equivalence can be proven when we consider the online version of IPO, that is when both generations are sampled by the online policy and annotated by a trained preference model. Optimising the IPO loss with such a stream of data becomes equivalent to finding the Nash equilibrium of the preference model through self-play. Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly as the general Nash-MD algorithm. We compare online-IPO and IPO-MD to different online versions of existing losses on preference data such as DPO and SLiC on a summarisation task.
</td>
</tr>
</table>
@@ -257,7 +257,7 @@ <h2>Program on Wednesday, February 12</h2>
<table>
<tr>
<td class="date" rowspan="2">
- 14:50pm
+ 15:20pm
</td>
<td class="title-special">
Coffee &amp; Tea Break
@@ -339,7 +339,7 @@ <h2>Program on Wednesday, February 12</h2>
18:00pm
</td>
<td class="title">
- Poster Session at MBZUAI France Lab
+ Poster Session with Buffet at MBZUAI France Lab
</td>
</tr>
<tr>
6 changes: 3 additions & 3 deletions program-day-2/index.html
@@ -57,7 +57,7 @@ <h2>Program on Thursday, February 13</h2>
<table>
<tr>
<td class="date" rowspan="2">
- 9:00am
+ 09:00am
</td>
<td class="title-special">
Registration and Coffee &amp; Tea!
@@ -173,7 +173,7 @@ <h2>Program on Thursday, February 13</h2>
<table id="PUTSPEAKERNAMEHERE">
<tr>
<td class="date" rowspan="3">
- 12:20am
+ 12:20pm
</td>
<td class="title">
GFlowNets: A Novel Framework for Diverse Generation in Combinatorial and Continuous Spaces
@@ -194,7 +194,7 @@ <h2>Program on Thursday, February 13</h2>
<table>
<tr>
<td class="date" rowspan="2">
- 13:10pm
+ 13:00pm
</td>
<td class="title-special">
Lunch