Commit

Merge pull request #63 from maciejchrabaszcz/master
Added presentation
sobieskibj authored Dec 6, 2023
2 parents 8760ad4 + 5130c5e commit 2fe11d8
Showing 3 changed files with 6 additions and 1 deletion.
5 changes: 5 additions & 0 deletions 2023/2023_11_06_Representation_Engineering/README.md
@@ -0,0 +1,5 @@
+# Representation Engineering
+
+AI systems pose many challenges and risks, such as transparency, safety, ethics, and fairness. How can we understand and control the inner workings of AI systems, especially large language models (LLMs) that have remarkable capabilities across various domains? In this seminar, we will explore the emerging area of representation engineering (RepE), a top-down approach to AI transparency that places representations at the center of analysis. We will learn about the methods and applications of RepE, such as reading and controlling representations of concepts and functions in LLMs, and how RepE can address various safety-relevant problems, such as honesty, hallucination, utility, power-seeking, emotion, harmlessness, bias, and more. We will also discuss the limitations and future directions of RepE and how it can contribute to developing more transparent and trustworthy AI systems.
+
+Paper: https://arxiv.org/abs/2310.01405
Binary file not shown.
2 changes: 1 addition & 1 deletion README.md
@@ -14,7 +14,7 @@ Join us at https://meet.drwhy.ai.
 * 16.10.2023 - [On Minimizing the Impact of Dataset Shifts on Actionable Explanations](https://github.com/MI2DataLab/MI2DataLab_Seminarium/blob/master/2023/2023_10_16_impact_of_dataset_shifts_on_actionable_eplanations.txt) - Hubert Baniecki
 * 23.10.2023 - [On the Robustness of Removal-Based Feature Attributions](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_10_23_removal_based_attributions_robustness) - Mateusz Krzyziński
 * 30.10.2023 - Discussion - AdvXAI: Robustness of explanations
-* 06.11.2023 - Introduction to RedTeaming - Maciej Chrabąszcz
+* 06.11.2023 - [Representation Engineering](https://github.com/maciejchrabaszcz/MI2DataLab_Seminarium/tree/master/2023/2023_11_06_Representation_Engineering) - Maciej Chrabąszcz
 * 13.11.2023 - [Adaptive Testing of Computer Vision Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_11_13_Adaptive_Testing_of_Computer_Vision_Models) - Mikołaj Spytek
 * 20.11.2023 - [Red Teaming Language Models with Language Models](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_11_20_Red_Teaming_Language_Models_with_Language_Models) - Piotr Wilczyński
 * 27.11.2023 - [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2023/2023_11_27_Red_Teaming_Language_Models_to_Reduce_Harms) - Vladimir Zaigrajew