Extract first element with XPath and scrapy

I use .extract() to get data from an XPath expression, like this:
response.xpath('//*[@id="bakery"]/span[2]/text()').extract()
The issue is that I always get a list back. For example:
['23']
I only want the number, so I tried:
response.xpath('//*[@id="bakery"]/span[2]/text()').extract()[0]
But this fails if the list is empty. I could catch the exception to handle that case, but I guess there is a better way to do it.

.extract_first() to the rescue:
response.xpath('//*[@id="bakery"]/span[2]/text()').extract_first()
Instead of raising an exception, it returns None if no elements were matched.

Scrapy also has a newer built-in method, get(), which can be used instead of extract_first(). It returns the first match as a string, or None if no element matches.
response.xpath('//*[@id="bakery"]/span[2]/text()').get()
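
The difference between indexing with [0] and calling extract_first() or get() can be pictured with a plain-Python stand-in. The helper name first_or_default below is hypothetical and is not part of Scrapy; it only imitates the "first match or a fallback" behaviour on an ordinary list, as a sketch rather than Scrapy's actual implementation.

```python
def first_or_default(values, default=None):
    """Return the first item of `values`, or `default` when it is empty.

    Hypothetical helper: it only imitates what extract_first()/get() do
    with the list that extract() would have returned.
    """
    for value in values:
        return value
    return default


matched = ['23']   # what extract() returned in the question
unmatched = []     # what extract() returns when nothing matches

print(first_or_default(matched))
print(first_or_default(unmatched))
print(first_or_default(unmatched, default='0'))
```

In Scrapy itself, get() and extract_first() also accept a default argument, so .get(default='0') would give '0' instead of None when nothing matches.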

Related

Response Assertion JMeter on Array

I'm trying to assert on the response body for a list of objects that were deleted with the AWS SDK. The scenario is this:
I send an HTTP DELETE request to an endpoint, passing an array of object names. AWS should then return the list of objects that were deleted, but not in the same order as the list I passed, so I'm using a Contains rule to check that the objects are in the response body.
Can someone help me? I think it is a problem with regular expressions in JMeter, but I'm stuck on this.

You're using the Contains pattern matching rule, which expects the pattern to be a Perl5-compatible regular expression.
In this case you need to properly escape all the meta characters, either manually or by using the __groovy() function to call the Pattern.quote() method instead:
${__groovy(java.util.regex.Pattern.quote(vars.get('listOfObjects')),)}
If you only want to check whether the response contains the value of your ${listOfObjects} variable, switch to the Substring pattern matching rule.
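
Why escaping matters can be illustrated outside JMeter. The sketch below uses Python's re module as a stand-in for the regex matching, with re.escape playing the role that Pattern.quote() plays in Java/Groovy. The sample response body and the expected value are made up for illustration.

```python
import re

# Made-up response body and expected value, for illustration only.
response_body = '{"deleted": ["b.txt", "a[1].txt"]}'
expected = 'a[1].txt'

# Used as a raw regex, "[1]" is a character class and "." matches any
# character, so the pattern does not match the literal text.
raw_match = re.search(expected, response_body) is not None

# Escaping the meta characters makes the pattern match literally.
escaped_match = re.search(re.escape(expected), response_body) is not None

# A plain substring test needs no escaping at all.
substring_match = expected in response_body

print(raw_match, escaped_match, substring_match)
```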

How Use Xpath resolve-uri

I want to get a URL from an HTML page with XPath.
I used //*[@id="main"]/table/tr[2]/td[3]/a/@href
It returns a URL like this: /nevesta/yulia
I want to add the base URI to the URL, like this: http://mydomain.ru/nevesta/yulia
After searching, I found out that resolve-uri does that, but unfortunately I can't find any example of it.

concat(base-uri(.), data(//*[@id="main"]/table/tr[2]/td[3]/a/@href))

base-uri() returns the base URI of the document/node as defined in XML Base. There are some examples of using it on XQueryFunctions.com. Quoting from that page:
If $arg is an element, the function returns the value of its xml:base attribute, if any, or the xml:base attribute of its nearest ancestor. If no xml:base attributes appear among its ancestors, it defaults to the base URI of the document node.
In other words: this function returns the base URI of the nearest ancestor that defines one; if none is defined, it returns that of the document (which you seem to be after).
But please be aware that this is an XPath 2.0 only function (and thus also XQuery, of course) and is not available in XPath 1.0!
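
If the XPath engine in use only supports XPath 1.0, the relative href can be resolved outside XPath instead. Here is a minimal sketch in Python using the standard library's urllib.parse.urljoin. The page URL below is an assumed example built from the domain in the question.

```python
from urllib.parse import urljoin

# Assumed URL of the page the href was extracted from.
page_url = 'http://mydomain.ru/catalog/index.html'
# Relative URL as returned by the XPath query in the question.
href = '/nevesta/yulia'

# urljoin resolves the reference against the base URL.
absolute_url = urljoin(page_url, href)
print(absolute_url)
```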

How can I use a list function in CouchDB to generate a valid (/normal) ViewResults object?

I have a simple problem I need to solve, and list functions are my current attempt to do so. I have a view that generates almost what I need, but in certain cases there are duplicate entries that make it through when I send in edge-case parameters.
Therefore, I am looking to filter these extra results out. I have found examples of filtering, which I am using (see this SO post). However, rather than generate HTML or XML or what-have-you, I just want a regular ol' view result. That is, the same kind of object that I would get if I queried CouchDB without a list function. It should have JSON data as normal and be the same in every way, except that it is missing duplicate results.
Any help on this would be appreciated! I have tried to send() data in quite a few different ways, but I usually get that "No JSON object could be decoded", or that indices need to be integers and not strings. I even tried to use the list to store every row until the end and send the entire list object back at once.
Example code (this uses an example from this page to send data):
function(head, req) {
  var row; var dupes = [];
  while (row = getRow()) {
    if (dupes.indexOf(row.key) == -1) {
      dupes.push(row.key);
      send(row.value);
    }
  }
}
Lastly, I'm using Flask with Flask-CouchDB, and I'm seeing the aforementioned errors in the flask development server that I'm running.
Thanks! I can try to supply more details if need be.

Don't you need to prepend a [, send a , after each row value except the last, and end with ]? To actually mimic a view result, you'd need to wrap that in a JSON structure:
{"total_rows":0,"offset":0,"rows":[<your stuff here>]}
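
To make the target shape concrete, here is a sketch in Python (the language of the Flask client) that drops rows with duplicate keys and serialises the result in the view-result layout shown above. The sample rows and the function name dedupe_rows are made up for illustration; a list function would have to send() the same text piece by piece.

```python
import json


def dedupe_rows(rows):
    """Keep only the first row seen for each key (hypothetical helper)."""
    seen = set()
    unique = []
    for row in rows:
        # Serialise the key so list or dict keys can be tracked in a set.
        marker = json.dumps(row['key'], sort_keys=True)
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique


# Made-up rows in the shape CouchDB returns from a view.
rows = [
    {'id': 'doc1', 'key': 'a', 'value': 1},
    {'id': 'doc2', 'key': 'a', 'value': 2},
    {'id': 'doc3', 'key': 'b', 'value': 3},
]

unique_rows = dedupe_rows(rows)
# Simplification: total_rows is set to the number of rows kept here.
view_result = {'total_rows': len(unique_rows), 'offset': 0, 'rows': unique_rows}
body = json.dumps(view_result)
print(body)
```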

Parse JSON from Jenkins, once hash, then nil

Jenkins gives me JSON from http://jenkins.net/jobs/MyJob/lastBuild/api/json
Then I use HTTParty to get it like so:
response = self.get( url, options )
change = response['changeSet']['items'][0]
This gives me the content of the last changes. change.class returns "Hash".
If I try this:
change = response['changeSet']['items'][0]['revision']
as looking at the JSON suggests, I get "Undefined method '[]' on NilObject".
What am I doing wrong?
EDIT3:
Of course, the problem lies between User and keyboard. The method was first called on another JSON, because it's polling the changes for more than one project, and one of the returned JSON objects didn't contain those keys. D'oh!
Sorry.

If you get that kind of error you're hitting an empty key and then trying to use it as if it's populated. Without seeing what your JSON is, it's hard to say, but one of those is failing. You'll want to inspect these:
response['changeSet']
response['changeSet']['items']
response['changeSet']['items'][0]
If any of those end up being nil then you can pin-point the problem. JSON comes back as an arbitrary structure so chaining a bunch of calls together without any sort of testing can lead to trouble.
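
The same step-by-step inspection can be sketched in Python (the question itself uses Ruby and HTTParty). The helper dig and the two sample payloads are hypothetical; the second payload imitates a project whose JSON lacks the expected entries.

```python
def dig(data, *path):
    """Follow `path` through nested dicts/lists; return None on a gap."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# Made-up payloads: one with a change set, one without any items.
with_changes = {'changeSet': {'items': [{'revision': 42}]}}
without_changes = {'changeSet': {'items': []}}

print(dig(with_changes, 'changeSet', 'items', 0, 'revision'))
print(dig(without_changes, 'changeSet', 'items', 0, 'revision'))
```

Returning None on the first missing level avoids the error from chaining lookups, at the cost of not saying which level was missing.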

php DomDocument xpath: how can i print the all html content of an element resulted from an xpath query?

I'm using DomDocument to query for HTML elements.
When I use $obj->textContent or $obj->nodeValue, it returns only the text contained in the element, not the HTML representation of the object.
That is, if the object contains
<div>test</div>
the return value in both cases will be test.
How do I fetch the HTML elements as well?
I know there are other solutions besides DomDocument, like DomHTMlDocument and others, but I'd prefer a solution that works with DomDocument.
Thanks!

Have you tried a solution such as http://refactormycode.com/codes/708-innerhtml-of-a-domelement ?
