I tried to use a command like this in Vimperator:
echo document.getElementsByTagName("p");
I wanted to view the nodes whose tag name is <p> in Vimperator. However, the result is like this:
I also tried the same command in Firebug. Following is the result:
Vimperator's result is empty, while Firebug's is not. Does anyone know why Vimperator echoes a collection whose length is zero?
See the related question "How do I getElementByID in Vimperator?". The short answer is to use echo window.content.window.document.getElementsByTagName("p") instead.
I'm trying to match a value that I don't always know in full, because part of it is randomly generated. Is there a way to search for a value when part of it changes dynamically?
Please see my example of value I'm trying to find and my attempted xPath:
<div class="target" testid="target">
<h2>Hi, random user</h2>
<p>To get the xpath <b>target</b> of <b>[text I don’t know]</b> in <b>[text I don’t know]</b>, you need to do the following</p>
</div>
I've tried the following XPath, which I picked up from another question, but it doesn't match:
//p[matches(.,'^To get the xpath <b>target</b> of <b>.*</b> in <b>.*</b>, you need to do the following$')]
I've tried different combinations with and without the bold tag but can't seem to get it to match. Truthfully, I'm not sure I've got the right syntax...
Try plain text in the second argument of matches(). The string value of the p element is its text content only, so the <b> tags never appear in it, e.g.
//p[matches(., '^To get the xpath target of .*? in .*?, you need to do the following$')]
Online sample here.
Why not use the contains() function with the fixed part of the text?
Something like:
//p[contains(.,'you need to do the following')]
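Both answers rely on the fact that the string value of an element is the concatenation of its descendant text, with the markup removed. The following is a minimal sketch of that idea using only Python's standard library rather than an XPath 2.0 engine; the values "alpha" and "beta" are placeholders standing in for the unknown text, and the regular expression is applied with Python's re module instead of matches().

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the question; "alpha" and "beta" stand in
# for the two values that are not known in advance.
html = (
    '<div class="target" testid="target">'
    "<h2>Hi, random user</h2>"
    "<p>To get the xpath <b>target</b> of <b>alpha</b> in <b>beta</b>, "
    "you need to do the following</p>"
    "</div>"
)

p = ET.fromstring(html).find("p")

# String value of <p>: all descendant text joined, so the <b> tags are gone.
text = "".join(p.itertext())

# Same pattern as the matches() answer, applied to the string value.
pattern = r"^To get the xpath target of .*? in .*?, you need to do the following$"
regex_hit = re.match(pattern, text) is not None

# Same idea as the contains() answer: test only the fixed part.
substring_hit = "you need to do the following" in text

# A pattern that still contains <b> tags cannot match the string value.
tagged_hit = re.match(r"^To get the xpath <b>target</b> of <b>.*</b>", text) is not None

print(text)
print(regex_hit, substring_hit, tagged_hit)
```

The regular expression is stricter than the substring test, so it is the safer choice when other paragraphs on the page could contain the same fixed phrase.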
I want to extract all the functions listed in the table at the link below: python functions list
I have tried using the Chrome developer console to get the exact XPath to use in the file spider.py, as below:
$x('//*[@id="built-in-functions"]/table[1]/tbody//a/@href')
but this returns a list of all the hrefs (which I think is what the XPath expression refers to).
I believe I need to extract the text from here, but appending /text() to the above XPath returns nothing. Can someone please help me extract the function names from the table?
I think this should do the trick
response.css('.docutils .reference .pre::text').extract()
A non-exact XPath equivalent of it (which also works in this case) would be:
response.xpath('//table[contains(@class, "docutils")]//*[contains(@class, "reference")]//*[contains(@class, "pre")]/text()').extract()
Try this:
for td in response.css("#built-in-functions > table:nth-child(4) td"):
    td.css("span.pre::text").extract_first()
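A likely reason the original attempt returned nothing is that an attribute node has no child nodes, so /text() after /@href selects nothing, and the link elements themselves hold no direct text because the name sits in a nested span. The sketch below illustrates this with Python's standard library instead of Scrapy; the fragment is a hypothetical, simplified stand-in shaped like the selectors in the answers above, not a copy of the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the visible name lives in a span with class "pre"
# nested inside each link, as the selectors in the answers suggest.
fragment = """
<div id="built-in-functions">
  <table class="docutils">
    <tr>
      <td><a class="reference internal" href="#abs"><span class="pre">abs()</span></a></td>
      <td><a class="reference internal" href="#all"><span class="pre">all()</span></a></td>
    </tr>
  </table>
</div>
"""

root = ET.fromstring(fragment)
links = root.findall(".//table//a")

# The attribute values, i.e. what an expression ending in /@href selects.
hrefs = [a.get("href") for a in links]

# Text directly inside each <a>: there is none, it is all inside the span.
own_text = [a.text for a in links]

# The function names come from the nested span instead.
names = [a.find("span[@class='pre']").text for a in links]

print(hrefs)
print(own_text)
print(names)
```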
I have code like this:
doc = Nokogiri::HTML("<a href='foo.html'>foo</a><a href='bar.html'>bar</a>")
doc.xpath('//a/@href').map(&:value) # => ["foo.html", "bar.html"]
It works as I expected.
But just out of curiosity, can I also get the values of the href attributes using XPath alone?
Locate the element first, then select its attribute.
For example, take the site https://www.easymobilerecharge.com/ and suppose we want to locate the "MTS" link.
To locate this element, we can use an XPath like:
//a[contains(text(),'MTS')]
Now to get href attribute, use:
//a[contains(text(),'MTS')]/@href
Judging from the first answer to this question the answer seems to be yes and no. It offers
xml.xpath("//Placement").attr("messageId")
which is quite close to "only XPath", but not entirely. Up to you to judge if that is enough for you.
I am trying to get these two attributes separately. When I try to get the version class the duration also gets lumped in with it as the tag is not closed. Also if there happens to be no version then I'd just get the duration returned. How do I ensure I grab this data separately and correctly?
Here's the html:
<span class="version">Original Version <span class="duration">(6:20)</span></span>
This is my current code and also the results I get now:
.//span[@class='duration'] Result: "(6:20)" CORRECT
.//span[@class='version'] Result: "Original Version (6:20)" INCORRECT!
I tried playing around with the 'not contains' operator but still cannot figure it out. Thanks for any help in advance.
This might be one of the few valid use cases for text():
.//span[@class='version']/text()
would give you just text nodes that are direct children of the version span, and not the text contained in any child elements.
In your example you'd get one text node whose value is "Original Version " (including a trailing space).
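The difference between an element's own text and its full string value can be sketched with Python's standard library. ElementTree does not support text() in its path language, so the sketch uses the .text attribute, which holds only the text before the first child element and therefore stands in for the first text() node. The helper name split_fields is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_fields(markup):
    """Return (version, duration) from a snippet; either may be None."""
    # Wrap the snippet so that it always has a single root element.
    root = ET.fromstring(f"<div>{markup}</div>")
    version = root.find(".//span[@class='version']")
    duration = root.find(".//span[@class='duration']")

    # .text is only the text before the first child, so the nested
    # duration span is not included in the version value.
    version_text = None
    if version is not None and version.text:
        version_text = version.text.strip()

    duration_text = duration.text if duration is not None else None
    return version_text, duration_text


with_version = split_fields(
    '<span class="version">Original Version <span class="duration">(6:20)</span></span>'
)
without_version = split_fields('<span class="duration">(6:20)</span>')

print(with_version)
print(without_version)
```

Looking up the two spans separately also covers the case where there is no version span: the version comes back as None rather than being confused with the duration.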
I am coding in Groovy; however, I don't believe it's a language-specific set of questions.
I actually have two questions.
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
//coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling JavaScript creates a mess in my cmd prompt.
Second Question
I also want to get the image, but I am having trouble because when I attempt to get the XPath (via Firebug) it shows up as: //*[@id="gmi-ResViewSizer_img"]
How do I handle that?
First Answer:
/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a
Your XPATH was off by one in the predicate filter on the body's div: you used the 4th div, but it should be the 3rd. It appears the site's HTML can change, and has changed since you originally grabbed the XPATH using Firebug. You may need to adjust your XPATH to tolerate such changes and be less sensitive to differences in document structure.
Maybe something like this:
/html/body//div/h1/a
Second Answer: The XPATH that you listed will work. It may look odd or short (and may not be the most efficient), but // starts at the root node and searches every node in the tree, * matches any element (including the img), and the [] predicate filter restricts the match to elements that have an id attribute whose value equals "gmi-ResViewSizer_img".
There are many other options for XPATHs that could work as well. It will also depend on how often the HTML structure changes. This is one that also works for the page referenced to select that img:
/html/body/div/div/div/div/img[1]
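To illustrate why a positional path breaks when the layout shifts while a looser path or an id lookup keeps working, here is a small sketch using Python's standard library rather than HtmlUnit. The page() helper and the two tiny documents are hypothetical stand-ins for the real site, and the paths are written in ElementTree's limited XPath subset, relative to the html root.

```python
import xml.etree.ElementTree as ET


def page(leading_divs):
    """Build a hypothetical page with some empty divs before the target div."""
    return ET.fromstring(
        "<html><body>"
        + "<div/>" * leading_divs
        + '<div><h1><a>Brush pack</a></h1>'
        + '<img id="gmi-ResViewSizer_img" src="x.png"/></div>'
        + "</body></html>"
    )


old_layout = page(3)  # target is the 4th div under body
new_layout = page(2)  # one div was removed, target is now the 3rd

positional = "./body/div[4]/h1/a"          # tied to an exact position
relaxed = "./body//div/h1/a"               # any div under body
by_id = ".//*[@id='gmi-ResViewSizer_img']" # any element with that id

for name, doc in (("old", old_layout), ("new", new_layout)):
    print(
        name,
        doc.find(positional) is not None,
        doc.find(relaxed) is not None,
        doc.find(by_id) is not None,
    )
```

Only the positional path stops matching when the number of leading divs changes, which is the off-by-one situation described in the first answer.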
I had the same problem. I solved it when I realized there were iframe tags on the page. Try calling
((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(...
where n is the position of the frame in the iframe collection. It works for me.
Thanks a lot.