Crawls the ID, user info, content, date, comments, and replies of posts on a Facebook page

gonzalo-gongora/facebook-crawling

Facebook crawling with Python

Demo: https://www.youtube.com/watch?v=Fx0UWOzYsig

Features:

  • Retrieve post information
  • Filter comments
  • No sign-in required

Data Fields:

[
    {
        "url": "",
        "id": "",
        "utime": "",
        "text": "",
        "total_shares": "",
        "total_cmts": "",
        "reactions": ["reactions displayed below post content"],
        "crawled_cmts": [
            {
                "id": "",
                "utime": "",
                "user_url": "",
                "user_id": "",
                "user_name": "",
                "text": "",
                "replies": [
                    { "id": "", "utime": "", "user_id": "", "user_name": "", "text": "" },
                    { "id": "", "utime": "", "user_id": "", "user_name": "", "text": "" },
                    { "id": "", "utime": "", "user_id": "", "user_name": "", "text": "" }
                ]
            }
        ]
    }
]
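A minimal sketch of consuming this structure in Python, assuming the crawler's output has been saved as a JSON list with exactly the fields shown above. The `summarize` helper and the inline sample are hypothetical, not part of crawler.py. It counts only what was actually crawled, so the comment total comes from the length of `crawled_cmts`, not from the `total_cmts` field.

```python
import json
from typing import Any


def summarize(posts: list[dict[str, Any]]) -> dict[str, int]:
    """Count posts, crawled comments, and crawled replies in the crawler's output."""
    n_comments = 0
    n_replies = 0
    for post in posts:
        comments = post.get("crawled_cmts", [])
        n_comments += len(comments)
        for cmt in comments:
            n_replies += len(cmt.get("replies", []))
    return {"posts": len(posts), "comments": n_comments, "replies": n_replies}


if __name__ == "__main__":
    # Hypothetical sample with the same shape as the schema above.
    sample = json.loads("""
    [
      {"url": "", "id": "1", "utime": "", "text": "", "total_shares": "",
       "total_cmts": "", "reactions": [],
       "crawled_cmts": [
         {"id": "c1", "utime": "", "user_url": "", "user_id": "", "user_name": "",
          "text": "", "replies": [{"id": "r1", "utime": "", "user_id": "",
                                   "user_name": "", "text": ""}]}
       ]}
    ]
    """)
    print(summarize(sample))  # {'posts': 1, 'comments': 1, 'replies': 1}
```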

Usage:

  1. Install Helium: pip install helium
  2. Customize the settings in crawler.py:
    • PAGE_URL: URL of the Facebook page to crawl
    • SCROLL_DOWN: number of times to scroll down to load more posts
    • FILTER_CMTS_BY: comment filter to apply (MOST_RELEVANT / NEWEST / ALL_COMMENTS)
    • VIEW_MORE_CMTS: number of times to load more comments
    • VIEW_MORE_REPLIES: number of times to load more replies
  3. Start crawling:
    • Sign out of Facebook (some CSS selectors differ when you are signed in)
    • Run python crawler.py
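The settings from step 2 might look something like the following. This is a sketch with placeholder values: the variable names come from the list above, but the `CommentFilter` enum and the `validate_settings` helper are hypothetical and may not match how crawler.py actually represents these options.

```python
from enum import Enum
from urllib.parse import urlparse


class CommentFilter(Enum):
    """Hypothetical enum for the three FILTER_CMTS_BY options listed above."""
    MOST_RELEVANT = "MOST_RELEVANT"
    NEWEST = "NEWEST"
    ALL_COMMENTS = "ALL_COMMENTS"


# Placeholder values only, not defaults taken from crawler.py.
PAGE_URL = "https://www.facebook.com/your-page-name"
SCROLL_DOWN = 5          # scroll 5 times to load more posts
FILTER_CMTS_BY = CommentFilter.MOST_RELEVANT
VIEW_MORE_CMTS = 3       # load more comments up to 3 times
VIEW_MORE_REPLIES = 2    # load more replies up to 2 times


def validate_settings(page_url: str, scroll_down: int, filter_by: CommentFilter,
                      view_more_cmts: int, view_more_replies: int) -> list[str]:
    """Return a list of problems with the settings (empty if they look usable)."""
    problems = []
    host = urlparse(page_url).hostname or ""
    if not (host == "facebook.com" or host.endswith(".facebook.com")):
        problems.append("PAGE_URL should point to a facebook.com page")
    if not isinstance(filter_by, CommentFilter):
        problems.append("FILTER_CMTS_BY must be one of MOST_RELEVANT, NEWEST, ALL_COMMENTS")
    for name, value in (("SCROLL_DOWN", scroll_down),
                        ("VIEW_MORE_CMTS", view_more_cmts),
                        ("VIEW_MORE_REPLIES", view_more_replies)):
        if not isinstance(value, int) or value < 0:
            problems.append(f"{name} must be a non-negative integer")
    return problems


if __name__ == "__main__":
    print(validate_settings(PAGE_URL, SCROLL_DOWN, FILTER_CMTS_BY,
                            VIEW_MORE_CMTS, VIEW_MORE_REPLIES))  # []
```

Checking the settings up front means a typo in the URL or a negative count fails fast instead of partway through a long browser session.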

Reference: https://github.com/mherrmann/selenium-python-helium
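SCROLL_DOWN, VIEW_MORE_CMTS, and VIEW_MORE_REPLIES are all repeat counts. One plausible way to drive such counts with Helium is a bounded loop that stops early once nothing more loads. This is a sketch, not crawler.py's actual code: `load_more` is a hypothetical helper, and the pixel count and delay in the commented Helium usage are assumptions.

```python
import time
from typing import Callable


def load_more(action: Callable[[], bool], times: int, delay_s: float = 0.0) -> int:
    """Run `action` up to `times` times, stopping early once it returns False.

    Returns how many times the action succeeded.
    """
    done = 0
    for _ in range(times):
        if not action():
            break  # nothing left to load
        done += 1
        if delay_s:
            time.sleep(delay_s)  # give the page time to render new content
    return done


# With Helium, `action` could wrap a real browser call, for example:
#   from helium import scroll_down
#   def scroll_once() -> bool:
#       scroll_down(num_pixels=2000)
#       return True
#   load_more(scroll_once, SCROLL_DOWN, delay_s=2)

if __name__ == "__main__":
    remaining = [2]  # pretend only two more batches exist

    def fake_load() -> bool:
        if remaining[0] == 0:
            return False
        remaining[0] -= 1
        return True

    print(load_more(fake_load, times=5))  # 2
```

Keeping the browser call behind a plain callable lets the same loop serve posts, comments, and replies, and makes it testable without launching Chrome.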


Languages

  • Python 100.0%