DRAFT: Feat/contact details #50
Great stuff, I'm curious to see how it will work. Surely the LLM will have some advantage over regexps, no?
I don't think so. All social handles have pretty clear regexes, and emails should also be fine. Phone numbers can be tricky, but I don't expect GPT to do a better job than regex, and it will also hallucinate more. GPT is for things that require an understanding of a broader context. Let's continue in the Issue.
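To make the regex point concrete, here is a minimal sketch in Python. The two patterns and the `extract_contacts` helper are illustrative assumptions, not the ones used in this PR or in any existing scraper, and they are deliberately simpler than what production extraction would need.

```python
import re

# Illustrative patterns only (assumed for this sketch, not taken from the PR).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TWITTER_RE = re.compile(r"https?://(?:www\.)?twitter\.com/([A-Za-z0-9_]{1,15})\b")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return de-duplicated emails and Twitter handles in order of first appearance."""
    # dict.fromkeys keeps insertion order while dropping repeats.
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    handles = list(dict.fromkeys(TWITTER_RE.findall(text)))
    return {"emails": emails, "twitter": handles}


sample = "Contact jane@example.com or https://twitter.com/jane_doe. Sales: jane@example.com"
print(extract_contacts(sample))
```

Nothing here depends on broader context, which is why a model is unlikely to beat it on these fields. Phone numbers would need a more careful pattern or a dedicated parsing library.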
Yeah, I realised a bit later that this offers no real advantage over regexes. I guess we could set up some metamorph chain between the contact-details scraper and GPT so that GPT gets the page's text and the regex-extracted contacts and just matches them.
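A rough sketch of what the second step of such a chain might hand to the model, in Python. The `build_matching_prompt` helper, the prompt wording, and the `max_chars` cut-off are all assumptions made for illustration; the metamorph step and the actual model call are left out.

```python
import json


def build_matching_prompt(
    page_text: str, contacts: dict[str, list[str]], max_chars: int = 4000
) -> str:
    """Build a prompt asking a model to group already-extracted contacts by owner."""
    # Contacts come from the regex step; the model only matches them, it does not extract.
    payload = json.dumps(contacts, ensure_ascii=False, sort_keys=True)
    return (
        "Below is the text of a web page and a list of contacts already extracted "
        "from it by regex.\n"
        "Group the contacts by the person or organisation they belong to. "
        "Use only the contacts listed; do not add new ones.\n\n"
        f"CONTACTS:\n{payload}\n\n"
        f"PAGE TEXT:\n{page_text[:max_chars]}"  # truncate long pages (assumed limit)
    )


prompt = build_matching_prompt(
    "Jane Doe, CEO - jane@example.com. John Roe, CTO - john@example.com",
    {"emails": ["jane@example.com", "john@example.com"]},
)
print(prompt)
```

Restricting the model to the listed contacts is one way to limit hallucinated values, since anything it returns can be checked against the regex output.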
I would try the name + contacts grouping. That was the student project. The tricky part was actually finding enough testing websites because it is not as common.
Yes, for extracting emails, URLs, etc. it has no benefit, but the whole point of using LLMs was to:
@foxt451 Creating a draft PR for better review