Commit
Merge pull request #63 from maciejchrabaszcz/master
Added presentation
Showing 3 changed files with 6 additions and 1 deletion.
@@ -0,0 +1,5 @@
# Representation Engineering

AI systems raise many challenges and risks in areas such as transparency, safety, ethics, and fairness. How can we understand and control the inner workings of AI systems, especially large language models (LLMs), which show remarkable capabilities across various domains? In this seminar, we will explore the emerging area of representation engineering (RepE), a top-down approach to AI transparency that places representations at the center of analysis. We will learn about the methods and applications of RepE, such as reading and controlling representations of concepts and functions in LLMs, and how RepE can address safety-relevant problems including honesty, hallucination, utility, power-seeking, emotion, harmlessness, and bias. We will also discuss the limitations and future directions of RepE and how it can contribute to developing more transparent and trustworthy AI systems.

Paper: https://arxiv.org/abs/2310.01405
Binary file added (+2.39 MB): 2023/2023_11_06_Representation_Engineering/Representation Engineering.pdf
Binary file not shown.