XPath for pagination on a site without real links - xpath

How can I extract pagination URLs with XPath on a site whose pager items are not real links?
Code:
<a class="pager_item" href="javascript:" data-page="2">2</a>
The pagination link I want to get is:
https://www.sitename.com/search/phone/?pageno=2

The expression
//a/@data-page
should resolve to 2.
Then, depending on language/library you are using, you declare the base url (https://www.sitename.com/search/phone/?pageno=) as one variable, the output of that xpath expression as another variable, and then concatenate the two into the final url.
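As a rough illustration of that concatenation step, here is a minimal Python sketch using only the standard library. It assumes the markup is well-formed enough for xml.etree.ElementTree to parse; since ElementTree supports only a subset of XPath, the attribute is read with .get() rather than with //a/@data-page. The function name pagination_urls and the wrapping <div> are made up for the example.

```python
import xml.etree.ElementTree as ET

# Base URL taken from the question; the page number is appended to it.
BASE_URL = "https://www.sitename.com/search/phone/?pageno="


def pagination_urls(markup, base_url=BASE_URL):
    """Return one URL per pager item that carries a data-page attribute."""
    root = ET.fromstring(markup)
    urls = []
    # ElementTree cannot select attributes directly, so match the elements
    # and read data-page from each of them.
    for a in root.iterfind(".//a[@class='pager_item']"):
        page = a.get("data-page")
        if page:
            urls.append(base_url + page)
    return urls


# Hypothetical wrapper element around the anchor from the question.
sample = '<div><a class="pager_item" href="javascript:" data-page="2">2</a></div>'
print(pagination_urls(sample))
```

With a full XPath engine (lxml, Scrapy selectors and so on) the //a/@data-page expression can be used as written and only the concatenation remains.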

Related

Extracting a link with jmeter

I need to handle a dynamic "onclick" delete link using JMeter.
Here is the sample of one of the links:
"Delete"
What I need is to extract the number and post it in order to perform the delete action. Every link is the same except for the number.
I have tried to implement some of the solutions I found on this site, but they didn't work.
Thanks in advance
Peace
If you need to do it with XPath, you could try the substring-after function, for example:
substring-after(//a[text()='Delete']/@href,'param=')
The above expression returns everything that comes after the text param= in the href attribute of the <a> tag whose text is Delete.
You can test your XPath expressions against actual server response using XPath Tester tab of the View Results Tree listener.
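To make the behaviour of substring-after concrete, here is a small Python sketch that mimics it outside JMeter. The original link markup is not shown in the question, so the href below (including the param= name and the number) is entirely hypothetical, and substring_after is a helper written for this example, not a JMeter or library function.

```python
import xml.etree.ElementTree as ET


def substring_after(value, marker):
    """Mimic XPath 1.0 substring-after().

    Returns the text after the first occurrence of marker,
    or an empty string when marker does not occur in value.
    """
    if marker == "":
        return value  # XPath returns the whole string for an empty marker
    _, found, rest = value.partition(marker)
    return rest if found else ""


# Hypothetical markup: the real link from the question is not available.
sample = '<div><a href="/items/delete?param=1234">Delete</a></div>'
root = ET.fromstring(sample)

# Equivalent of //a[text()='Delete']/@href
link = next(a for a in root.iter("a") if a.text == "Delete")
print(substring_after(link.get("href"), "param="))
```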
References:
substring-after Function Reference
XPath 1.0 Language Reference
Using the XPath Extractor in JMeter
XPath Tutorial

Assign a variable to xpath scrapy

I'm using Scrapy to crawl a web page that has 10+ links to follow with LinkExtractor. Everything works fine, but while crawling the extracted links I need to get the page URL. I have no other way to get the URL than to use
response.request.url
How do I assign that value to
il.add_xpath('url', response.request.url)
If I do it like this, I get an error:
File "C:\Python27\lib\site-packages\scrapy\selector\unified.py", line 100, in xpath
raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))
exceptions.ValueError: Invalid XPath: http://www.someurl.com/news/45539/title-of-the-news
And for the description it is like this (just for reference):
il.add_xpath('descrip', './/div[@class="main_text"]/p/text()')
Thanks
The loader offers two ways of adding values to the item: add_xpath, which expects an XPath expression, and add_value, which takes a literal value. Since the URL is a value and not an XPath expression, you should use something like:
...
il.add_value('url', response.url) # yes, response also has the url attribute

Scrapy xpath returns an empty list although tag and syntax are correct

In my parse function, here is the code I have written:
hs = Selector(response)
links = hs.xpath(".//*[@id='requisitionListInterface.listRequisition']")
items = []
for x in links:
    item = CrawlsiteItem()
    item["title"] = x.xpath('.//*[contains(@title, "View this job description")]/text()').extract()
    items.append(item)
return items
and title comes back as an empty list.
I select the element by its id into links, and then, within each of those elements, I want to get the text of every element whose title contains "View this job description".
Please help me fix the error in the code.
If you request the URL you provided with curl "https://cognizant.taleo.net/careersection/indapac_itbpo_ext_career/moresearch.ftl?lang=en" you get back a page quite different from the one you see in your browser. Your search matches the following <a> element, which has no text node for text() to select:
<a id="requisitionListInterface.reqTitleLinkAction"
title="View this job description"
href="#"
onclick="javascript:setEvent(event);requisition_openRequisitionDescription('requisitionListInterface','actOpenRequisitionDescription',_ftl_api.lstVal('requisitionListInterface', 'requisitionListInterface.listRequisition', 'requisitionListInterface.ID5645', this),_ftl_api.intVal('requisitionListInterface', 'requisitionListInterface.ID5649', this));return ftlUtil_followLink(this);">
</a>
This is because the site loads the displayed information with an XHR request (you can see it in the Chrome developer tools, for example) and then updates the page dynamically with the returned data.
For the information you want to extract you should find this XHR request (it is not hard because this is the only one) and call it from your scraper. Then from the resulting dataset you can extract the required data -- you just have to create a parsing algorithm which goes through this pipe separated format and splits it up into job postings and then extracts the information you need like position, id, date and location.
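A parsing step of that kind could look roughly like the sketch below. The real layout of the XHR response is not described here, so everything about the format is an assumption: the field order in FIELDS, the fixed number of fields per posting and the sample payload are all hypothetical and would have to be adapted after inspecting the actual response.

```python
# Hypothetical field order; the real response layout must be checked first.
FIELDS = ("position", "id", "date", "location")


def parse_postings(payload, fields=FIELDS):
    """Split a pipe-separated payload into fixed-size job posting records."""
    tokens = [token.strip() for token in payload.split("|")]
    size = len(fields)
    postings = []
    # Walk the tokens in chunks of len(fields); a trailing incomplete
    # chunk is ignored.
    for start in range(0, len(tokens) - size + 1, size):
        postings.append(dict(zip(fields, tokens[start:start + size])))
    return postings


# Made-up sample data, only to show the chunking.
sample = "Java Developer|00012|Jan 5, 2015|Chennai|QA Analyst|00034|Jan 7, 2015|Pune"
for posting in parse_postings(sample):
    print(posting)
```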

GSA get latest results in a collection without q param

I'm trying to show the latest results inserted into a collection (ordered by date) on the homepage. I don't have a 'q' parameter because the user hasn't made a search yet on the homepage. Is there a way to do this? Maybe with a special character? I didn't find anything in the documentation.
You could use the site: query to get all content from your site, like:
q=site%3Ahttp%3A%2F%2Fwww.yoururl.com&sort=date%3AD%3AS%3Ad1
(site:http://www.yoururl.com URL encoded)
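The URL-encoded query string above does not have to be written by hand. As a small sketch, Python's standard urllib.parse.urlencode produces it from the plain values; build_query is a name chosen for this example.

```python
from urllib.parse import urlencode


def build_query(site, sort="date:D:S:d1"):
    """Build the q/sort query string with a site: restriction, URL-encoded."""
    return urlencode({"q": "site:" + site, "sort": sort})


print(build_query("http://www.yoururl.com"))
```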
In the end I found this way: I used the requiredfields parameter and tagged all the results that I want to show with the corresponding meta tag. For example:
www.gsa.it/search?q=&sort=date:D:S:d1&requiredfields=client
This will return any results that have a meta tag with this name:
<meta name="client" content="lorem ipsum">
Reference: Restricts the search results to documents that contain the exact meta tag names or name-value pairs.

How to use XPath resolve-uri

I want to get a URL from an HTML page with XPath.
I used //*[@id="main"]/table/tr[2]/td[3]/a/@href
It returns a URL like this: /nevesta/yulia
I want to add the base URI to the URL, like this: http://mydomain.ru/nevesta/yulia
After searching I found out that resolve-uri does that, but unfortunately I can't find any example of it.
concat(base-uri(.), data(//*[@id="main"]/table/tr[2]/td[3]/a/@href))
base-uri() returns the base URI of the document/node as defined in XML Base. There are some examples of using it on XQueryFunctions.com. Quoting from that page:
If $arg is an element, the function returns the value of its xml:base attribute, if any, or the xml:base attribute of its nearest ancestor. If no xml:base attributes appear among its ancestors, it defaults to the base URI of the document node.
In other words: this function returns the base URI defined on the node or its nearest ancestor; if none is defined, it returns that of the document (which you seem to be after).
But please be aware that this is an XPath 2.0 only function (and thus also available in XQuery, of course) and is not available in XPath 1.0!
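Where only XPath 1.0 is available, one option is to extract the relative href with XPath and resolve it in the host language instead. The following is a minimal Python sketch of that idea using the standard urllib.parse.urljoin; the page URL below is a hypothetical stand-in for wherever the document was fetched from.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page the href was extracted from.
page_url = "http://mydomain.ru/catalog/index.html"

# Value returned by the XPath expression in the question.
relative_href = "/nevesta/yulia"

# urljoin resolves the reference against the base, much like resolve-uri.
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)
```

Unlike plain string concatenation, urljoin also copes with hrefs that are already absolute or that are relative to the current path.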
