Using Python with Selenium to scrape dynamic web pages

On the site, there are a few links at the top labeled 1, 2, 3, and next. Clicking a numbered link dynamically loads some data into a content div. Clicking next goes to a page with links labeled 4, 5, 6, and next, and shows the data for page 4.

I want to scrape the data from the content div for every numbered link. I don't know how many there are; the page shows only three at a time, plus next.

Please give an example of how to do this. For instance, consider the site http://www.cnet.com.

Please show me how to download the series of pages using Selenium; I will parse them with Beautiful Soup on my own.

General layout (not tested):

#!/usr/bin/env python
from contextlib import closing

from selenium.webdriver import Firefox  # pip install selenium
from selenium.webdriver.common.by import By

url = "http://example.com"


def find_link(browser, text):
    # find_elements() returns an empty list instead of raising when nothing matches
    links = browser.find_elements(By.LINK_TEXT, text)
    return links[0] if links else None


# use Firefox to get the page with JavaScript-generated content
with closing(Firefox()) as browser:
    n = 1
    while n < 10:  # arbitrary upper bound on the number of pages
        browser.get(url)  # load page
        link = find_link(browser, str(n))
        while link:
            browser.get(link.get_attribute("href"))  # get individual 1,2,3,4 pages
            #### save(browser.page_source)
            browser.back()  # return to page that has 1,2,3,next -like links
            n += 1
            link = find_link(browser, str(n))
        link = find_link(browser, "next")
        if not link:
            break
        url = link.get_attribute("href")
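
The save() call commented out above is left undefined. A minimal sketch of one way to fill it in, using only the standard library: the directory name "pages", the page_NNN.html file naming, and the saved_pages() helper are assumptions for illustration, not part of the original answer.

```python
from pathlib import Path


def save(page_source, n, out_dir="pages"):
    """Write one page's HTML to out_dir/page_NNN.html and return the path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory on first use
    path = out / f"page_{n:03d}.html"       # zero-padded so files sort in page order
    path.write_text(page_source, encoding="utf-8")
    return path


def saved_pages(out_dir="pages"):
    """Yield (path, html) for every saved page, in page order."""
    for path in sorted(Path(out_dir).glob("page_*.html")):
        yield path, path.read_text(encoding="utf-8")
```

With this in place, the commented-out line would become save(browser.page_source, n), called before n is incremented. Each html string yielded by saved_pages() can then be handed to Beautiful Soup, for example BeautifulSoup(html, "html.parser"), to pull out the content div.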