How to use XPath resolve-uri - xpath

I want to get a URL from an HTML page with XPath.
I used //*[@id="main"]/table/tr[2]/td[3]/a/@href
It returns a relative URL like /nevesta/yulia
I want to add the base URI to the URL, like http://mydomain.ru/nevesta/yulia
After searching I found out that resolve-uri does that, but unfortunately I can't find any example of it.

concat(base-uri(.), data(//*[@id="main"]/table/tr[2]/td[3]/a/@href))

It returns the Base URI of the document/node as defined in XML Base. There are some examples for using it on XQueryFunctions.com. Quoting from the linked page above:
If $arg is an element, the function returns the value of its xml:base attribute, if any, or the xml:base attribute of its nearest ancestor. If no xml:base attributes appear among its ancestors, it defaults to the base URI of the document node.
In other words: this function returns the base URI of the nearest ancestor that defines one; if none is defined, it returns the document's base URI (which is what you seem to be after).
But please be aware that this is an XPath 2.0-only function (and thus also available in XQuery, of course); it is not available in XPath 1.0!
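If you are stuck with an XPath 1.0 engine (for example lxml in Python), resolve-uri() is not available at all. A common workaround, sketched here under the assumption that you fetched the page yourself and therefore know its URL, is to join the relative href with the page URL using urllib.parse.urljoin (the URLs below are placeholder values taken from the question):

```python
from urllib.parse import urljoin

# Page URL and the relative href extracted with an XPath like
# //*[@id="main"]/table/tr[2]/td[3]/a/@href (hypothetical values)
page_url = "http://mydomain.ru/some/page"
relative_href = "/nevesta/yulia"

# urljoin resolves the relative reference against the page URL,
# following the same rules a browser uses for links
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)  # http://mydomain.ru/nevesta/yulia
```

Note that urljoin also handles relative paths without a leading slash and already-absolute hrefs correctly, which a naive string concatenation would not.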

Related

Extract first element with XPath and scrapy

I use .extract() to get the data from an XPath, like:
response.xpath('//*[@id="bakery"]/span[2]/text()').extract()
The issue with this is that I always get a list as the response,
for example:
['23']
I only want the number, so I try with:
response.xpath('//*[@id="bakery"]/span[2]/text()').extract()[0]
but this is a problem if the list is empty. Although I could use an exception to handle that scenario, I guess there is a better way to do it.
.extract_first() to the rescue:
response.xpath('//*[@id="bakery"]/span[2]/text()').extract_first()
Instead of an exception, it would return None if no elements were matched.
There is a newer Scrapy built-in method, get(), that can be used instead of extract_first(); it returns a string, or None if no element exists.
response.xpath('//*[@id="bakery"]/span[2]/text()').get()
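The behaviour of extract_first() and get() boils down to "first element, or a default". A minimal pure-Python sketch of that pattern (this is an illustration of the semantics, not Scrapy's actual implementation):

```python
def first_or_default(results, default=None):
    """Return the first item of a result list, or a default if it is empty,
    mirroring the semantics of Scrapy's .extract_first() / .get()."""
    return results[0] if results else default

print(first_or_default(['23']))  # 23
print(first_or_default([]))      # None
```

This is exactly why it is safer than indexing with [0]: an empty result yields None instead of raising IndexError.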

GSA get latest results in a collection without q param

I'm trying to show the latest results inserted into a collection (ordered by date) on the homepage. I don't have a 'q' parameter because the user hasn't made a search yet on the homepage. So, is there a way to do this? Maybe a special character? I didn't find anything in the documentation.
You could utilize the site: query to get all content from your site like
q=site%3Ahttp%3A%2F%2Fwww.yoururl.com&sort=date%3AD%3AS%3Ad1
(site:http://www.yoururl.com URL encoded)
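The URL encoding in that query string can be reproduced programmatically with urllib.parse.urlencode; a small sketch (the domain is the placeholder from the answer above):

```python
from urllib.parse import urlencode

params = {
    "q": "site:http://www.yoururl.com",  # site: restricts results to your domain
    "sort": "date:D:S:d1",               # descending date sort
}
# urlencode percent-encodes the reserved characters (: and /) in each value
query = urlencode(params)
print(query)  # q=site%3Ahttp%3A%2F%2Fwww.yoururl.com&sort=date%3AD%3AS%3Ad1
```

The output matches the hand-encoded query string shown above, so you can build variants (different sort orders, different sites) without encoding them by hand.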
Finally I found this way: I used the requiredfields parameter and tied all the results that I want to show to it. For example:
www.gsa.it/search?q=&sort=date:D:S:d1&requiredfields=client
This will return any results that have a meta tag with this name:
<meta name="client" content="lorem ipsum">
Reference: Restricts the search results to documents that contain the exact meta tag names or name-value pairs.

IBM SBT SDK: How can I limit search results of CommunityService.getPublicCommunities(params)?

When I call communityService.getMyCommunities(params), communityService.getPublicCommunities(params), or communityService.getSubCommunities(parentCommunity, params), I would expect that filling params with e.g. tags=[mytag,yourtag] would make the call look up only communities having at least one of these tags (or both).
But to me it looks like this param ("tags") is simply ignored, and I always receive all communities of the given category (my / public / sub).
When there are lots of communities in the requested category, this massively slows down performance if I only want to retrieve communities with, e.g., one certain tag: I receive all the data over the net and must filter the received object list locally.
What am I doing wrong?
Is there something missing in the SDK implementation?
With the communities/my API you cannot do any filtering; you need to use the Search APIs.
In order to get a filtered list of communities based on the tags, you need to make a request to the following URL.
https://apps.na.collabserv.com/search/atom/mysearch?scope=personalOnly&scope=communities&query=&constraint={%22type%22%3A%22category%22%2C%22values%22%3A[%22Tag%2Fprb%22]}&page=1&pageSize=10
Yes, it is URL encoded. You can change prb to match your tag, and you can repeat the constraint for each tag.
You can also reference Link to Search API Constraints
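The constraint parameter in that URL is just URL-encoded JSON. A sketch of how the request URL could be assembled in Python (the host is the one from the answer; the tag is a placeholder you would replace):

```python
import json
from urllib.parse import urlencode

tag = "prb"  # placeholder: replace with your community tag
# Compact JSON, no spaces, matching the constraint shown in the answer
constraint = json.dumps(
    {"type": "category", "values": [f"Tag/{tag}"]},
    separators=(",", ":"),
)

# A list of pairs is used because "scope" appears twice in the query string
params = [
    ("scope", "personalOnly"),
    ("scope", "communities"),
    ("query", ""),
    ("constraint", constraint),
    ("page", "1"),
    ("pageSize", "10"),
]
url = "https://apps.na.collabserv.com/search/atom/mysearch?" + urlencode(params)
print(url)
```

urlencode takes care of percent-encoding the braces, quotes, and slashes in the JSON constraint, so repeating or changing constraints is just a matter of editing the params list.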

does sitecore query support current() function?

I get an error if I try to use the xpath 'current()' function in a sitecore query inside a nested expression, something like this:
/sitecore/content/Home/Topics/*[contains(current()/@MainTopics, @@id)]
What I am trying to do is, use this query as a source for my DropLink field to only list 'Topic' items that are already selected in another 'MainTopics' field in the same item.
But this gives me an error, something like: ")" expected at position 50.
So it looks like the current() function cannot be used inside a nested expression, or perhaps at all. If not, is there any way to reference the current item, rather than the context node, from within a nested expression?
Any ideas?
does sitecore query support current() function?
current() is a function defined only in XSLT.
By definition, current() produces the node matched by the current xsl:template or the node selected in an xsl:for-each.
As Sitecore query is not an XSLT implementation, the answer must be negative.
I don't think there's any simple way to do this in Sitecore query.
But to place a data source on a template field that references items chosen in another field (if I've understood correctly), one possible solution is to build a custom field. In the code for the custom field you would do some string handling to extend or override Sitecore query, allowing you to add a query type that specifies a particular field ID. Essentially you'd write your own very basic query language that you could use within your C# code.
You can derive your field from ValueLookupEx and override the GetItems(Item current) method with your custom query handling. As you can see, it quite handily comes with the current item provided!
We had a similar requirement which we solved this way.

How can I get Mechanize objects from Mechanize::Page's search method?

I'm trying to scrape a site where I can only rely on classes and element hierarchy to find the right nodes. But Mechanize::Page#search returns Nokogiri::XML::Elements, which I can't use to fill and submit forms, etc.
I'd really like to use pure CSS selectors. Matching classes seems to be pretty straightforward with the various _with methods too, but matching things like :not(.class) is verbose compared to simply using CSS selectors, and I have no idea how to match element hierarchy with them.
Is there a way to convert Nokogiri elements back to Mechanize objects or even better get them straight from the search method?
As stated in this answer, you can simply construct a new Mechanize::Form object from the Nokogiri::XML::Element retrieved via Mechanize::Page#search or Mechanize::Page#at:
a = Mechanize.new
page = a.get 'https://stackoverflow.com/'
# Get the search form via ID as a Nokogiri::XML::Element
form = page.at '#search'
# Convert it back to a Mechanize::Form object
form = Mechanize::Form.new form, a, page
# Use it!
form.q = 'Foobar'
result = form.submit
Note: You have to provide the Mechanize object and the Mechanize::Page object to the constructor to be able to submit the form. Otherwise it would just be a Mechanize::Form object without context.
There seems to be no central utility function to convert Nokogiri::XML::Elements to Mechanize elements; rather, the conversions are implemented where they are needed. Consequently, writing a method that searches the document by CSS or XPath and returns Mechanize elements where applicable would require a pretty big switch-case on the node type. Not exactly what I imagined.