Experiment with contact details extraction and, if good, make a miniactor integration #43
Comments
Just a note, if you guys do a miniactor for these sub-usecases, please let's use metamorph and make it open source, so that we can use it for marketing and show others how easy it is to do this.
I guess I'll try to look into it a bit
Sounds good, let's not write any code yet, just play with it. We need to collect some pages for testing. The ideal case for GPT would be to match emails, phones, etc. with names. We had a student project that tried to do this based on HTML element proximity but that's super tricky.
Ouch :D
Wow, maybe we'll need to use another model :)
So, I haven't yet started tinkering with the model settings, but I've set up basic boilerplate, which includes the miniactor code with metamorph and tests. The tests basically list a set of URLs with expected contacts to be found on them and check whether they get returned when using model settings, schema and prompt exported from
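A minimal sketch of what such a test check could look like, assuming each test case is a URL plus a list of expected contact strings. The names `ContactTestCase`, `normalizeContact` and `findMissingContacts` are hypothetical and not taken from the PR; fetching the page and calling the model are left out.

```typescript
// Hypothetical shape of one test case: a page URL plus the contacts expected on it.
interface ContactTestCase {
    url: string;
    expected: string[];
}

// Normalize values so that formatting differences do not cause false failures.
// Assumption: anything containing "@" is an email, everything else is a phone number.
function normalizeContact(value: string): string {
    const trimmed = value.trim().toLowerCase();
    if (trimmed.includes('@')) return trimmed;
    // For phones, keep only digits and the "+" sign.
    return trimmed.replace(/[^\d+]/g, '');
}

// Return the expected contacts that are absent from the scraper output.
function findMissingContacts(expected: string[], returned: string[]): string[] {
    const got = new Set(returned.map(normalizeContact));
    return expected.filter((contact) => !got.has(normalizeContact(contact)));
}

// Example test case with a placeholder URL and made-up contacts.
const exampleCase: ContactTestCase = {
    url: 'https://example.com/contact',
    expected: ['Info@Example.com', '+1 (555) 010-0199'],
};
```

A test would then run the scraper on `exampleCase.url` and assert that `findMissingContacts(exampleCase.expected, output)` is empty.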
I created a PR here https://github.com/apify-projects/store-gpt-scraper/pull/50/files. I don't think this implementation offers any advantage over the current Contact Details Scraper since all of these things can be regexed. The next step that we are missing, and that GPT could manage (although I don't believe it is powerful enough), is to also link the contacts with names. This is something we tried as a student project based on HTML proximity, but we didn't have a good generic name regex. I would just play with it a bit, and we can release it even if it is not great.
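To illustrate the "can be regexed" point, here is a rough sketch of plain regex extraction of emails and phone numbers from page text. The patterns are deliberately naive assumptions for illustration, not the ones used by Contact Details Scraper, and `extractContacts` is a hypothetical name.

```typescript
// Naive patterns for illustration only; real-world phone formats vary a lot.
const EMAIL_REGEX = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
const PHONE_REGEX = /\+?\d[\d\s().-]{6,}\d/g;

interface ExtractedContacts {
    emails: string[];
    phones: string[];
}

// Collect unique emails (lowercased) and phone-like strings from plain text.
function extractContacts(text: string): ExtractedContacts {
    const emails = (text.match(EMAIL_REGEX) || []).map((e) => e.toLowerCase());
    const phones = (text.match(PHONE_REGEX) || []).map((p) => p.trim());
    return {
        emails: Array.from(new Set(emails)),
        phones: Array.from(new Set(phones)),
    };
}
```

Note that this returns flat lists only. It says nothing about which name an email or phone belongs to, which is the part the thread hopes GPT could add.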
So, I've closed that old PR as it seemed to go in the wrong direction, and opened another one, #51. What do you think @jancurn @metalwarrior665?
Compare whether we could improve on https://apify.com/vdrmota/contact-info-scraper
Basically, we need to figure out the best prompts, and if it does more than the existing contact scraper, we can release it as a miniactor