Some Telegram chats are way too active for me to keep up with. Let's pull the last N days of messages from these chats, summarize them, and send a digest instead.
Set your keys for anything that does not have a default value in `AppConfig` (`config.py`):
- TELEGRAM_BOT_TOKEN: str
- TELEGRAM_API_HASH: str
- TELEGRAM_API_ID: str
- TELEGRAM_SESSION_STRING: str
- POE_PB_TOKEN: str
- POE_CHAT_CODE: str
These can be placed in any of the following places:
- as environment variables (eg via `export`), including as secrets that then get exposed as environment variables
- in a `conf.env` file
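
For example, a `conf.env` file could look like this (all values below are placeholders):

```
TELEGRAM_BOT_TOKEN=123456:replace-with-your-bot-token
TELEGRAM_API_HASH=replace-with-your-api-hash
TELEGRAM_API_ID=1234567
TELEGRAM_SESSION_STRING=replace-with-your-session-string
POE_PB_TOKEN=replace-with-your-poe-token
POE_CHAT_CODE=replace-with-your-chat-code
```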
Then do:

```
$ pip install -r requirements.txt
$ python telegram_digest/main.py
```
V1 can take arbitrary-length input and uses a refine-summary strategy to summarize.
- Telegram setup: use individual credentials (not a bot), so we can get the full history
- llm: leverage Poe (so we can try different llms quickly)
- summarization: implemented a `refine` strategy (see the sketch after this list)
  - splits input into batches, each having at most `max_token` tokens
  - iteratively generates a summary (refine-style)
- config loading: use `pydantic_settings.BaseSettings` to import either from environment variables (eg github secrets) or from file
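
To make the refine flow concrete, here is a minimal sketch of the loop; `ask_poe` and the prompts are hypothetical stand-ins, not the project's actual prompts or Poe client:

```python
from typing import Callable, Iterable


def refine_summarize(batches: Iterable[str], ask_poe: Callable[[str], str]) -> str:
    """Fold each batch of chat history into a running summary (refine strategy)."""
    summary = ""
    for batch in batches:
        if not summary:
            # First batch: plain summarization prompt.
            prompt = f"Summarize the following chat messages:\n\n{batch}"
        else:
            # Later batches: ask the bot to refine the existing summary.
            prompt = (
                f"Current summary:\n{summary}\n\n"
                "Refine the summary so it also covers these additional messages:\n\n"
                f"{batch}"
            )
        summary = ask_poe(prompt)
    return summary
```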
- summary quality: 2 issues
  - no metric to measure the quality of one summary vs another
  - no stability: even the same input against the same Poe bot will give different summaries
- experiment with different bots
- experiment with different thread representations
  - add "reply to ..." markers to identify replies
  - represent the reply-chains in a more structured form (eg all replies in the same chain are collected and represented together, instead of interleaved in the main thread)
- interactive: host the bot on heroku / fly.io, so I can interact with it via Telegram
- `main.py` is the entry point.
- `telegram_bot.py` handles creation of a Telegram client (`TelegramBotBuilder`), pulling history and sending messages (`TelegramBot`), and message-data munging (`TelegramMessagesParsing`).
- `llm.py` handles interfacing with Poe (sending messages, defining prompts) and has helpers for splitting the text into batches that fit into the context (`TextBatcher`).
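
The batching helper can be as simple as greedily packing message texts until a token budget is reached. The following is a simplified sketch with a crude word-count "tokenizer", not the actual `TextBatcher` implementation:

```python
from typing import Iterable, List


def naive_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def batch_texts(pieces: Iterable[str], max_tokens: int) -> List[List[str]]:
    """Greedily pack pieces into batches of at most ~max_tokens each."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for piece in pieces:
        cost = naive_token_count(piece)
        if current and current_tokens + cost > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(piece)
        current_tokens += cost
    if current:
        batches.append(current)
    return batches
```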
- Telegram interface
  - `telethon` is what you want to use
  - you can interface as your own user or as a bot:
    - my account --> bot: I thought I wanted to do it as myself, then I discovered the bots, which have a simpler api
    - bot --> myself: then I discovered bots can only see the conversation once they are added to a thread, and even then they can see only the messages sent after they were added
    - [?] myself --> bot: having a bot is nice because you can interact with it (eg passing different config arguments) and it is clearer who is doing what, see
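
As a minimal example of the user-credentials route, this is roughly how `telethon` can pull the last few days of a chat using a string session (the identifiers and chat name below are placeholders):

```python
from datetime import datetime, timedelta, timezone

from telethon.sessions import StringSession
from telethon.sync import TelegramClient

API_ID = 1234567                  # placeholder
API_HASH = "your_api_hash"        # placeholder
SESSION = "your_session_string"   # placeholder

with TelegramClient(StringSession(SESSION), API_ID, API_HASH) as client:
    cutoff = datetime.now(timezone.utc) - timedelta(days=3)
    recent = []
    # iter_messages yields newest messages first, so stop once we pass the cutoff.
    for message in client.iter_messages("some_chat_name"):
        if message.date < cutoff:
            break
        recent.append(message)
```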
- Summarization
  - strategies: langchain details a few summarization strategies: stuff it all in the prompt, map-reduce, or refine.
  - metrics: it's unclear how to measure quality: if you have a reference summary you can measure similarity to the reference, but if you don't have one, automatic metrics might not be very reliable: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00417/107833/A-Statistical-Analysis-of-Summarization-Evaluation
`pydantic_settings.BaseSettings` is very useful for loading config from environment variables and files.
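
A minimal sketch of how such a settings class can be declared (the real `AppConfig` in `config.py` may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class AppConfig(BaseSettings):
    # Values are read from the environment first, falling back to conf.env.
    model_config = SettingsConfigDict(env_file="conf.env", env_file_encoding="utf-8")

    TELEGRAM_BOT_TOKEN: str
    TELEGRAM_API_HASH: str
    TELEGRAM_API_ID: str
    TELEGRAM_SESSION_STRING: str
    POE_PB_TOKEN: str
    POE_CHAT_CODE: str


config = AppConfig()  # raises a validation error if a required value is missing
```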