From 3b439967f46293510f323d19e05c33d5a8016360 Mon Sep 17 00:00:00 2001
From: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon, 4 Nov 2024 16:00:27 +0100
Subject: [PATCH] 📰 Update blog posts in documentation (#2319)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Bump dev version to `0.13.0.dev0`
* Update version number to 0.12 in CITATION.cff
* Add publication date to blog post
* 🧽 Fix judge documentation (#2318)
* Update judge examples and documentation
* without ':'
* Clean doc
* Fix typo in example code
* Add space after Attributes
* Update attribute name in judges.py
* Add installation instructions for llm-blender library
* Update PairRMJudge attributes documentation
* Fix return type in PairRMJudge
* Revert "🧽 Fix judge documentation (#2318)"

  This reverts commit 337005d95169371935fb87f1c559c7412f8472a4.
* Update blog post publication dates
* revert to p5
* Update image URLs in index.mdx
* Sort and uniform thumbnail
* Update image alignment in index.mdx
---
 docs/source/index.mdx | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/docs/source/index.mdx b/docs/source/index.mdx
index b1de84afb1..bdddc9b6f2 100644
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
@@ -38,28 +38,39 @@ Check the appropriate sections of the documentation depending on your needs:
 Published on July 10, 2024
 Preference Optimization for Vision Language Models with TRL
-Illustrating Reinforcement Learning from Human Feedback
+Published on June 12, 2024
+Putting RL back in RLHF
-Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
+Published on September 29, 2023
+Finetune Stable Diffusion Models with DDPO via TRL
+Published on August 8, 2023
+Fine-tune Llama 2 with DPO
+Published on April 5, 2023
 StackLLaMA: A hands-on guide to train LLaMA with RLHF
-Fine-tune Llama 2 with DPO
+Published on March 9, 2023
+Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
-Finetune Stable Diffusion Models with DDPO via TRL
+Published on December 9, 2022
+Illustrating Reinforcement Learning from Human Feedback