Merge pull request #63 from maciejchrabaszcz/master

Added presentation
MI2DataLab · Dec 6, 2023 · 2fe11d8
2 parents 8760ad4 + 5130c5e
Showing 3 changed files with 6 additions and 1 deletion.
diff --git a/2023/2023_11_06_Representation_Engineering/README.md b/2023/2023_11_06_Representation_Engineering/README.md
@@ -0,0 +1,5 @@
+# Representation Engineering
+
+AI systems pose many challenges and risks, such as transparency, safety, ethics, and fairness. How can we understand and control the inner workings of AI systems, especially large language models (LLMs) that have remarkable capabilities across various domains? In this seminar, we will explore the emerging area of representation engineering (RepE), a top-down approach to AI transparency that places representations at the center of analysis. We will learn about the methods and applications of RepE, such as reading and controlling representations of concepts and functions in LLMs, and how RepE can address various safety-relevant problems, such as honesty, hallucination, utility, power-seeking, emotion, harmlessness, bias, and more. We will also discuss the limitations and future directions of RepE and how it can contribute to developing more transparent and trustworthy AI systems.
+
+Paper: https://arxiv.org/abs/2310.01405
diff --git a/2023/2023_11_06_Representation_Engineering/Representation Engineering.pdf b/2023/2023_11_06_Representation_Engineering/Representation Engineering.pdf
diff --git a/README.md b/README.md
@@ -14,7 +14,7 @@ Join us at https://meet.drwhy.ai.
 * 16.10.2023 - [On Minimizing the Impact of Dataset Shifts on Actionable Explanations](https://github.com/MI2DataLab/MI2DataLab_Seminarium/blob/master/2023/2023_10_16_impact_of_dataset_shifts_on_actionable_eplanations.txt) - Hubert Baniecki
 * 23.10.2023 - [On the Robustness of Removal-Based Feature Attributions](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_10_23_removal_based_attributions_robustness) - Mateusz Krzyziński
 * 30.10.2023 - Discussion - AdvXAI: Robustness of explanations
-* 06.11.2023 - Introduction to RedTeaming - Maciej Chrabąszcz
+* 06.11.2023 - [Representation Engineering](https://github.com/maciejchrabaszcz/MI2DataLab_Seminarium/tree/master/2023/2023_11_06_Representation_Engineering) - Maciej Chrabąszcz
 * 13.11.2023 - [Adaptive Testing of Computer Vision Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_11_13_Adaptive_Testing_of_Computer_Vision_Models)	 - Mikołaj Spytek
 * 20.11.2023 - [Red Teaming Language Models with Language Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_11_20_Red_Teaming_Language_Models_with_Language_Models) - Piotr Wilczyński
 * 27.11.2023 - [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_11_27_Red_Teaming_Language_Models_to_Reduce_Harms) - Vladimir Zaigrajew