How would I make this web scraper for a Facebook page?

Discussion in 'Programming General' started by kmjt, Apr 1, 2015.

  1. Unread #1 - Apr 1, 2015 at 8:44 AM
  2. kmjt

    I think this would be pretty straightforward. I want to write a scraper for Facebook pages (a page can be accessed even when you are not logged in). Basically it will iterate through every post in the page's lifespan and gather the users who have liked each post. The first problem I see is pagination. When a page is scrolled to reveal more posts, the URL does not change. How can I get around this? Not really sure where to start.
     
  3. Unread #2 - Apr 2, 2015 at 12:26 AM
  4. Covey

    I haven't programmed for a long time, but follow these rough steps and it should get you the info you're after... but you'll have to do a lot of parsing.

    1) Load the Facebook page you're scraping.
    2) Retrieve the raw HTML from the browser and use some parsing functions to scrape what you need.
    3) Force the scroll bar to the bottom so it loads the next section.
    4) Repeat steps 2 and 3 (you'll probably have to write a function that detects when the page has finished loading, so you know when to pull the raw HTML).

    Depending on what language you're using, the browser should have some form of built-in function that tells you when it has finished loading.
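
The four steps above can be sketched as a loop in Java. This is only a rough outline: `ScrollablePage` is a hypothetical interface standing in for whatever browser control ends up being used, and "finished loading" is approximated by polling until the HTML changes or a timeout passes, which is an assumption rather than a real browser callback.

```java
/** Hypothetical stand-in for the browser control; not a real library type. */
interface ScrollablePage {
    String html();          // step 2: the raw HTML currently in the browser
    void scrollToBottom();  // step 3: force the scroll bar to the bottom
}

final class ScrollLoop {
    /**
     * Repeats steps 2 and 3 until a scroll adds no new HTML or maxScrolls
     * is reached, then returns the last HTML snapshot for parsing.
     */
    static String loadAll(ScrollablePage page, int maxScrolls, long timeoutMs) {
        String html = page.html();
        for (int i = 0; i < maxScrolls; i++) {
            page.scrollToBottom();
            String next = waitForChange(page, html, timeoutMs);
            if (next.equals(html)) {
                break; // nothing new appeared: assume the end of the page
            }
            html = next;
        }
        return html;
    }

    /** Polls every 50 ms until the HTML differs from 'previous' or the timeout expires. */
    static String waitForChange(ScrollablePage page, String previous, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        String current = page.html();
        while (current.equals(previous) && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                return current;
            }
            current = page.html();
        }
        return current;
    }
}
```

Comparing whole HTML snapshots is the simplest possible check. A real scraper would more likely count post elements, since unrelated parts of the page can change between polls.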
     
  5. Unread #3 - Apr 2, 2015 at 2:12 AM
  6. kmjt

    Thanks man! I would like to do this in Java. So for step 3 I can probably just use the Robot class. I am not really sure how to detect that a new section of the page has loaded, though. I could probably detect screen colour, but I'm sure there is a much more efficient way... Anyone have an idea what the Java "browser function" is?
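
For reference, scrolling with `java.awt.Robot` could look roughly like the sketch below. The class name `RobotScroller` and the notch and pause parameters are made up for illustration. Note that `Robot` only sends mouse-wheel events to whatever window is under the pointer, so the browser has to be open and in position, and it cannot run on a machine without a display.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Robot;

final class RobotScroller {
    /**
     * Turns the mouse wheel 'notches' times toward the user (scrolling down),
     * pausing 'pauseMs' milliseconds after each notch.
     * Returns false if there is no display or the Robot cannot be created.
     */
    static boolean scrollDown(int notches, int pauseMs) {
        if (GraphicsEnvironment.isHeadless()) {
            return false; // Robot needs a real screen to send events to
        }
        try {
            Robot robot = new Robot();
            for (int i = 0; i < notches; i++) {
                robot.mouseWheel(1);   // positive values scroll down
                robot.delay(pauseMs);  // give the page time to react
            }
            return true;
        } catch (AWTException e) {
            return false; // platform does not allow low-level input control
        }
    }
}
```

This covers only the scrolling in step 3. It says nothing about whether the new section has loaded, so that check still has to come from reading the page itself.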
     
  7. Unread #4 - Apr 2, 2015 at 4:11 PM
  8. 70i

    I've never done something like this before, but couldn't you just keep selecting the "See More" link?
     
  9. Unread #5 - Apr 2, 2015 at 5:14 PM
  10. CompileTime

    That would be really inefficient. You can load web pages without actually starting a browser like Chrome or Mozilla: just make some HTTP requests, parse the results, and look through them. Of course I'm oversimplifying here, but using the Robot class just seems silly.
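
A minimal sketch of the "make an HTTP request and parse the result" idea, using the `java.net.http` client that ships with Java 11 and later. The class name `PlainHttpFetch`, the `User-Agent` value, and the choice to pull out `href` attributes with a regular expression are all illustrative assumptions. A regex is a crude way to read HTML, and anything the page builds with JavaScript after loading will not be in the raw response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlainHttpFetch {
    // Matches href="..." attributes; good enough for a sketch, not a real HTML parser.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Downloads the raw response body for a URL without starting a browser. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0") // assumed value; some sites reject requests without one
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /** Returns every href value found in the HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

Typical use would be `PlainHttpFetch.extractLinks(PlainHttpFetch.fetch(url))`, with the pattern swapped for whatever marks the data you are after.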
     
  11. Unread #6 - Apr 2, 2015 at 6:31 PM
  12. Virtual

    Could you post or PM me the actual link? I might be able to help.
     
  13. Unread #7 - Apr 2, 2015 at 8:13 PM
  14. kmjt

    Where on the page is that? I don't see it :p

    That is my main problem. I don't understand how the pagination works. I don't see how the parameters in the URL change when a new section is loaded (if you just scroll down, there are actually no parameters and the URL stays the same). I wish it was as simple as https://www.facebook.com/PageName?post=1 lol

    PMing you a link now. Although I want my scraper to be generic.
     
  15. Unread #8 - Apr 2, 2015 at 8:52 PM
  16. SuF

    This is going to be a lot harder than you think. Facebook's HTML is essentially entirely generated on the client side with JavaScript, so you will need to execute the JavaScript to get the DOM that you need to parse. That means the only realistic way of doing this is within a browser. The paging works by using JavaScript to detect where the browser's scroll bar is and making a request back to the server for more posts, so you will need to use JavaScript to move the scroll bar yourself.
     
  17. Unread #9 - Apr 2, 2015 at 9:02 PM
  18. kmjt

    Hmm. Can't I just do the scrolling with something like the Robot class from Java? It has a scroll method, although I'm not sure how reliable it is. I haven't really played around with it much, but I could also add colour detection on the Facebook page to make sure it scrolled to the correct position. Thanks for the input.
     
  19. Unread #10 - Apr 2, 2015 at 11:09 PM
  20. SuF

    So you are going to have Facebook open and use the mouse to move stuff? That seems like a lot of effort.
     
  21. Unread #11 - Apr 3, 2015 at 1:54 AM
  22. kmjt

    I think it would be pretty straightforward. The Robot class has a pretty decent scrolling method, so I can just scroll until I reach the bottom of the page (where it needs to load more posts), wait for the loading to happen, and then parse. I will put a video together tonight to try to demonstrate what I mean.

    To be honest it would probably just be better if I learned JavaScript, eh? lol
     
  23. Unread #12 - Apr 3, 2015 at 6:16 AM
  24. Virtual

    After analyzing the page you sent me, I found out a few things. When you scroll down, the next batch of posts is requested from a URL like this:


    https://www.facebook.com/ajax/pagelet/generic.php/PagePostsSectionPagelet?data={"segment_index":**,"page_index":0,"page":***,"column":"main","post_section":{"profile_id":****,"start":1420099200,"end":1451635199,"query_type":8,"filter":1,"filter_after_timestamp":1427470882},"section_index":2,"hidden":false,"posts_loaded":0,"show_all_posts":false}&__user=0&__a=1&__dyn=*****&__req=5&__rev=1673637

    ** = the segment index (0 when the page first loads, 1 the first time you scroll down, 2 the second time, and so on)
    *** = the page ID (you should be able to find that)
    **** = the user ID (if you're viewing it as a guest, the user ID = the page ID)
    ***** = the hashed page ID or something like it (I found that it doesn't change per page, so that's my assumption)

    P.S. I'll PM you an example for your page.
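
As a rough illustration, the URL above could be assembled in Java like this, with the segment index bumped once per "scroll". The class and method names (`PageletUrl`, `dataJson`, `build`) are made up for this sketch. The fixed values (timestamps, `query_type`, `__req`, `__rev` and so on) are copied verbatim from the captured URL and may be specific to that one capture. The endpoint is as observed in this thread and may no longer exist or behave this way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PageletUrl {
    // Endpoint exactly as it appears in the captured URL above.
    static final String ENDPOINT =
            "https://www.facebook.com/ajax/pagelet/generic.php/PagePostsSectionPagelet";

    /** Builds the JSON for the 'data' parameter; only the starred fields vary. */
    static String dataJson(int segmentIndex, String pageId, String profileId) {
        return "{\"segment_index\":" + segmentIndex
                + ",\"page_index\":0,\"page\":" + pageId
                + ",\"column\":\"main\",\"post_section\":{\"profile_id\":" + profileId
                + ",\"start\":1420099200,\"end\":1451635199,\"query_type\":8,\"filter\":1,"
                + "\"filter_after_timestamp\":1427470882},\"section_index\":2,"
                + "\"hidden\":false,\"posts_loaded\":0,\"show_all_posts\":false}";
    }

    /** Returns the full request URL with the JSON percent-encoded. */
    static String build(int segmentIndex, String pageId, String profileId, String dyn) {
        String data = URLEncoder.encode(
                dataJson(segmentIndex, pageId, profileId), StandardCharsets.UTF_8);
        return ENDPOINT + "?data=" + data
                + "&__user=0&__a=1&__dyn=" + URLEncoder.encode(dyn, StandardCharsets.UTF_8)
                + "&__req=5&__rev=1673637";
    }
}
```

Calling `build(0, ...)`, `build(1, ...)`, `build(2, ...)` and so on would then stand in for each scroll, stopping when a response comes back with no posts. How the end of the feed is actually signalled is not covered here and would need checking against real responses.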
     
  25. Unread #13 - Apr 3, 2015 at 12:13 PM
  26. 70i