I'm navigating a webpage that is basically fully loaded by an ajax call, and I've included the corresponding wait:
self._wait = WebDriverWait(driver, 15)
The page consists only of some tables and buttons, but I can't find any of them with Selenium. I've tried every find_element() combination but nothing works.
I tried getting the html source from the page:
html = self.driver.page_source
but the only thing I get is:
<html><head></head><frameset cols="*" frameborder="0" framespacing="0" border="0">
<frame name="MAIN" src="main.jsp">
</frameset></html>
though when I inspect it, there's a lot contained inside <ajax:page>
Any ideas?
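The tables and buttons are not in the top-level document: they live inside the frame named MAIN, which is why page_source shows only the frameset. Selenium searches only the document it is currently focused on, so you need to switch into that frame (driver.switch_to.frame) before calling find_element().
The linked answers on switching to an iframe explain why and how.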
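Switching into the frame can be sketched as follows. This is a minimal illustration that assumes a Selenium-style driver object and the frame name MAIN from the page source in the question; the helper name texts_in_frame is hypothetical and not part of Selenium. The texts are read while the frame is still focused, because element handles found inside a frame generally cannot be used after switching back out.

```python
def texts_in_frame(driver, frame_name, by, value):
    """Return the text of every element matching (by, value) inside a frame.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing
    the same switch_to / find_elements interface). This helper is
    illustrative and not part of Selenium itself.
    """
    # Move the driver's focus from the outer <frameset> document into the frame.
    driver.switch_to.frame(frame_name)
    try:
        # Read the text now: handles found inside the frame are only usable
        # while the driver is focused on that frame.
        return [element.text for element in driver.find_elements(by, value)]
    finally:
        # Always return focus to the top-level document.
        driver.switch_to.default_content()
```

With a real driver this might be called as texts_in_frame(self.driver, "MAIN", "tag name", "table"), where "tag name" is the string behind Selenium's By.TAG_NAME. If the frame itself loads late, the existing WebDriverWait can be combined with expected_conditions.frame_to_be_available_and_switch_to_it.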
I feel really stupid: I just read this question How to get html elements with multiple css classes and the answer was very clear and straightforward but when I try to apply it on this HTML
<div class="header group">
I am completely unable to make it work.
Here are some of the variations I have tried
//*[contains (@class, ’header’) and contains (@class, ‘group’)]
//div[contains (@class, ’header’) and contains (@class, ‘group’)]
//div[contains (@class, ’header’)]
What am I missing here? Should be straightforward, shouldn't it?
Testing in Chrome Canary.
Updates
The invalid typographical apostrophes above happened when I copy-pasted into this form. I was using ' in the actual console.
I was playing around on an archived version of the page at the Wayback Machine (WM) and nothing worked. However, when trying on the live version, everything worked as expected (the problem with the live version was that the final element I was "aiming" at currently is missing but will return later on, therefore I used WM). Any ideas why // seems broken on WM? Even if WM adds a few levels of divs, // should be transparent about that, shouldn't it?
Just use this xpath.
//div[@class='header group']
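Note that an exact match on @class breaks if the classes appear in a different order or with extra whitespace, while contains(@class, 'header') also matches longer names such as subheader. A common workaround in plain XPath 1.0 is to pad the normalized attribute with spaces and test each class as a whole token. The sketch below only builds that expression as a string; the helper name xpath_with_classes is hypothetical.

```python
def xpath_with_classes(tag, *class_names):
    """Build an XPath 1.0 expression matching `tag` elements whose class
    attribute contains every name in `class_names` as a whole token."""
    if not class_names:
        # No class constraint requested: match the tag anywhere.
        return f"//{tag}"
    # Pad the normalized attribute with spaces so each class name can be
    # matched as " name ", never as a substring of a longer class.
    padded = "concat(' ', normalize-space(@class), ' ')"
    tests = [f"contains({padded}, ' {name} ')" for name in class_names]
    return f"//{tag}[{' and '.join(tests)}]"


print(xpath_with_classes("div", "header", "group"))
```

The resulting string can be passed to any XPath 1.0 engine, for example through $x("...") in the browser console.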
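You should use something like this (note that hasclass() is not a standard XPath function, so it only works where the host framework provides it):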
<xpath expr="//div[hasclass('header') and hasclass('group')]" position="replace">
<t>
xxxxxx
</t>
</xpath>
I want to scrape all the names of the users who commented below a youtube video.
I'm using ruby and nokogiri.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "https://www.youtube.com/watch?v=tntOCGkgt98"
# URI.open is needed on Ruby 3+, where a bare open(url) no longer fetches URLs
doc = Nokogiri::HTML(URI.open(url))
doc.css(".comment-thread-renderer > .comment-renderer").each do |comment|
  name = comment.css("#comment-section-renderer-items .g-hovercard").text
  puts name
end
But it's not working, I'm not getting any output, no error either.
I won't be able to give you a solution, but at least I can give you a couple of hints that may help you to move forward.
The code you have is not working because the comments section is loaded via an AJAX call after the page is loaded. If you do a hard reload in your browser, you will see a spinner icon and a "Loading..." text in the comments section while the content is being fetched. When Nokogiri gets the page via the HTTP request, it receives the HTML as it is before the comments are loaded. In fact, the place where the comments will later be added looks like this:
<div id="watch-discussion" class="branded-page-box yt-card">
<div id="comment-section-renderer"
class="comment-section-renderer vve-check"
data-visibility-tracking="CCsQuy8iEwjr3P3u1uzNAhXIepAKHRV9D8Ao-B0=">
<div class="action-panel-loading">
<p class="yt-spinner ">
<span class="yt-spinner-img yt-sprite" title="Loading icon">
</span>
<span class="yt-spinner-message">Loading...</span>
</p>
</div>
</div>
</div>
That is the reason why you won't find the divs you are looking for, because they aren't part of the html you have.
Looking at the network console in the browser, it seems that the ajax request to get the comments data is being sent to https://www.youtube.com/watch_fragments_ajax?v=tntOCGkgt98&tr=time&distiller=1&ctoken=EhYSC3RudE9DR2tndDk4wAEAyAEA4AEBGAY%253D&frags=comments&spf=load. As you can see the v parameter is the video id, however there are a couple of caveats:
There is a ctoken param, which you can get by scraping the original page contents. It is inside a <script> tag, in the form of
'COMMENTS_TOKEN': "<token>".
However, you still need to send a session_token as form data in the body of the AJAX request (which is a POST). I don't know where that token comes from :(.
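The first caveat can be sketched as a regular-expression search over the downloaded page. This is a minimal illustration in Python, assuming the token appears exactly in the 'COMMENTS_TOKEN': "<token>" form quoted above; the function name, the sample HTML, and the token value are made up for the example, and the real page layout may have changed since.

```python
import re

# Matches  'COMMENTS_TOKEN': "<token>"  and captures the token itself.
# The exact quoting is an assumption taken from the form quoted above.
COMMENTS_TOKEN_RE = re.compile(r"'COMMENTS_TOKEN'\s*:\s*" + r'"([^"]+)"')


def extract_comments_token(page_html):
    """Return the comments token embedded in the page, or None if absent."""
    match = COMMENTS_TOKEN_RE.search(page_html)
    return match.group(1) if match else None


# Made-up fragment standing in for the <script> content of the real page.
sample = """<script>var cfg = {'COMMENTS_TOKEN': "EXAMPLE_TOKEN"};</script>"""
print(extract_comments_token(sample))
```

The session_token from the second caveat is deliberately left out, since its origin is unknown.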
I think you will be pushing the limits of Nokogiri here, as AFAIK it is not intended to follow AJAX requests or handle JavaScript. Maybe the Ruby Selenium driver is better suited for this.
HTH
I think you need name.css("#comment-section..."
The each statement will iterate over the elements, using the variable name.
You may want to use node instead of name:
doc.css(".comment-thread-renderer > .comment-renderer").each do |node|
  name = node.css("#comment-section-renderer-items .g-hovercard").text
  puts name
end
I wrote this rails app using nokogiri to see all the tags that a page has before any javascript is run in the browser. The source code is here, so you can adjust it if you need to add more info about the node in the view.
That can easily tell you if the particular tag element that you are looking for is something you can retrieve without having to do some JS eval.
Most web crawlers don't support client-side rendering, which gives you an idea that it's not a trivial task to execute JS when scraping content.
YouTube is a dynamically rendered JavaScript website, though it can still be parsed with Nokogiri without Selenium or another package. Try opening the Network tab in dev tools, scrolling to the comment section, and seeing which request is being sent.
You need to make a POST request in order to fetch the comments data. You can preview the output in the "Preview" tab.
Note: Since this comment brings very little value, this answer will be updated with the attached code once a solution is available.
I want to create an HTML page with a lot of questions and answers.
The page must be displayed like this:
+ question one
+ question two
+ question three
Clicking the + next to question one should reveal answer one, and so on.
Can I also do this inside WordPress content?
You can use bullet points in HTML.
<body>
<ul>
<li>Your text</li>
</ul>
</body>
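If you want to make a quiz you will need to use JavaScript, though. HTML on its own only describes structure and content; for a simple click-to-reveal list, the built-in <details> and <summary> elements are about as far as it goes without scripting.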
This is my script written to scrape data inside the <iframe> tag using Nokogiri:
require 'nokogiri'
require 'restclient'
doc = Nokogiri::HTML(RestClient.get("http://www.sample_site.com/"))
doc.xpath('//iframe[@width="1001" and @height="973"]').children
This is what I get:
=> [#<Nokogiri::XML::Text:0x1913970 "\r\nYour browser does not support inline frames\r\n">]
Can anyone tell me why?
An iframe is used to embed another document within the current HTML document. This means the iframe loads its content from an external source, which is specified in the src attribute.
So, if you want to scrape the content of an iframe, you should send a request to the external source it loads that content from.
# The iframe (notice the 'src' attribute)
<iframe src="iframe_source_url" height="973" width="1001">
# iframe content
</iframe>
# Code to do the scraping
doc = RestClient.get('iframe_source_url')
parsed_doc = Nokogiri::HTML(doc)
parsed_doc.css('#yourSelectorHere') # or parsed_doc.xpath('...')
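Note (about the error)
When you scrape, the HTTP client you use (RestClient, in your case) stands in for the browser. The text "Your browser does not support inline frames" is not an error from your code: it is the fallback content placed inside the <iframe> tag for clients that do not render frames. RestClient only downloads the parent document and never loads the frame's src, which is why that fallback text is all you get.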
The issue lies with RestClient, not with Nokogiri.
RestClient does not retrieve the content of iframes. If you examine the body returned by RestClient.get("http://www.sample_site.com/"), you will find a string like:
<iframe src="page-1.htm" name="test" height="120" width="600">
You need a Frames Capable browser to view this content.
</iframe>
Nokogiri handles this correctly: it returns the children of the iframe node, which is a single text node containing the string you got as a result.
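Both answers come down to the same two steps: read the src of the iframe from the parent page, then request that URL separately. The first step can be sketched with Python's standard library as below; the class name, function name, and sample markup are illustrative assumptions, and fetching the resolved URL is left to whichever HTTP client is in use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceCollector(HTMLParser):
    """Collect the src attribute of every <iframe> and <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in ("iframe", "frame"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(parent_html, parent_url):
    """Return absolute URLs for every frame embedded in parent_html."""
    collector = FrameSourceCollector()
    collector.feed(parent_html)
    # src is often relative, so resolve it against the parent page's URL.
    return [urljoin(parent_url, src) for src in collector.sources]


parent = '<iframe src="page-1.htm" name="test" height="120" width="600">fallback</iframe>'
print(frame_urls(parent, "http://www.sample_site.com/"))
```

Each returned URL is a separate document, so it has to be downloaded and parsed on its own before any selector can match the content shown inside the frame.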
@For Each myItem In Data
@<option style="display: none; " value="@myItem.dataCode" child="@myItem.dataCodeChild" >
@myItem.dataCode
</option>
Next myItem
All the option tags are still shown in IE, even though each element still has style="display: none; ".
It works in Chrome, though. How can I fix it to work in IE?
<option> doesn't officially support the style="display: none" attribute
See this other question: "style.display='none' doesn't work on option tags in chrome, but it does in firefox".
You can resolve your issue with the approach in this answer: https://stackoverflow.com/a/878331/1643075 (the original post also linked a demo).