From 3b439967f46293510f323d19e05c33d5a8016360 Mon Sep 17 00:00:00 2001
From: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon, 4 Nov 2024 16:00:27 +0100
Subject: [PATCH] 📰 Update blog posts in documentation (#2319)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Bump dev version to `0.13.0.dev0`
* Update version number to 0.12 in CITATION.cff
* Add publication date to blog post
* 🧽 Fix judge documentation (#2318)
* Update judge examples and documentation
* without ':'
* Clean doc
* Fix typo in example code
* Add space after Attributes
* Update attribute name in judges.py
* Add installation instructions for llm-blender library
* Update PairRMJudge attributes documentation
* Fix return type in PairRMJudge
* Revert "🧽 Fix judge documentation (#2318)"

  This reverts commit 337005d95169371935fb87f1c559c7412f8472a4.

* Update blog post publication dates
* revert to p5
* Update image URLs in index.mdx
* Sort and uniform thumbnail
* Update image alignment in index.mdx
---
 docs/source/index.mdx | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/docs/source/index.mdx b/docs/source/index.mdx
index b1de84afb1..bdddc9b6f2 100644
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
@@ -38,28 +38,39 @@ Check the appropriate sections of the documentation depending on your needs:
 [Hunk body: the <div>/<a>/<img>/<p> card markup was lost in extraction; only the image alt text ("thumbnail"), the publication dates, and the post titles survive. The recoverable change: the previously undated cards for "Illustrating Reinforcement Learning from Human Feedback", "Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU", "Fine-tune Llama 2 with DPO", and "Finetune Stable Diffusion Models with DDPO via TRL" are removed from their old positions, the thumbnails are made uniform, and the cards are listed newest first, each with its publication date:]

   Published on July 10, 2024 - Preference Optimization for Vision Language Models with TRL
   Published on June 12, 2024 - Putting RL back in RLHF
   Published on September 29, 2023 - Finetune Stable Diffusion Models with DDPO via TRL
   Published on August 8, 2023 - Fine-tune Llama 2 with DPO
   Published on April 5, 2023 - StackLLaMA: A hands-on guide to train LLaMA with RLHF
   Published on March 9, 2023 - Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
   Published on December 9, 2022 - Illustrating Reinforcement Learning from Human Feedback
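
For reference, a minimal sketch of what one resulting card might look like, assuming a plain anchor/img/paragraph layout; the link target, attributes, and thumbnail path are illustrative assumptions, and only the date and title text come from the hunk above:

    <!-- Hypothetical card layout, not the exact markup from index.mdx -->
    <a target="_blank" href="https://huggingface.co/blog/dpo_vlm">
      <!-- uniform thumbnail image (placeholder path) -->
      <img src="thumbnail.png" alt="thumbnail">
      <!-- publication date shown above the post title -->
      <p>Published on July 10, 2024</p>
      <p>Preference Optimization for Vision Language Models with TRL</p>
    </a>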