Good day, colleagues!
Tell me please, how to make a dynamic xpath-parsing:
for example, instead of writing
$domXPath->query('//*[(#id = "article-id-18")]');
-> write something like that
$domXPath->query('//*[(#id = "article-id-*")]');
, because in my case, the site's script generate (every time) a new id for block, that contains article's text?
So question, is above.
Try
$domXPath->query('//*[starts-with(#id, "article-id-")]');
Related
My current issue is to find HTML-Tags inside of property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by #jcr:score
It looks like there is a problem with the chars < and > because this query finds everything which has strong in it's property. It finds <strong>Some Text</strong> but also This is a strong man.
Also the Query Builder API didn't helped me.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have an specific jcr:primaryType and the targeted properties you can do something like this
select * from nt:unstructured where text like '%<strong>%'
I tested it , but you need to know the properties you are intererested in.
This is jcr-sql syntax
Start using predicates like a champ this way all of this will make sense to you!
HTML Encode <strong>
HTML Decimal <strong>
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(#text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with a OSGI service, that can accurately save these properties for you. Every time the HTML is changed the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.
This may be very simple question but I am very new to ruby or any programming language. I want to trim some text and use it as parameter for next step. Can any one please write me code for doing this. I am testing a web application which is used in financial domain. I need to use the cvv2 and expiry date of card number which is generated in next step as parameter. The text which gets displayed on html is
CVV2 - 657 Expiry - 05/12 (mm/yy)
Now from the above text I should some how get only '657' and '0512' as value to use is in next step.
Request for urgent assistance.
If all your strings will be formatted like this, I suggest using regexp, using String#gsub. There's lots of places to learn about regexp if you don't already, and rubular allows you to test it in your browser, with a short cheat sheet.
To do it quickly you could use
card_details = "CVV2 - 657 Expiry - 05/12 (mm/yy)"
card_details = card_details.scan(/\d{2,}/)
cvv2 = card_details[0]
expiry = card_details[1] + card_details[2]
Probably better ways of doing it as I'm no expert, but you said urgent, so.
For getting the text out of the cell you could try (I don't use the original watir anymore, so I might not be able to remember this):
card_details = browser.td(:text => /CVV2/).text
If that doesn't work give this a try (actually on second thought TRY THIS ONE FIRST)
card_details = browser.cell(:text => /CVV2/).text
For these examples I'm assuming your browser object is called "browser".
We can use regular expression to achieve the same,
> "CVV2 - 657 Expiry - 05/12 (mm/yy)".match(/\d{3}/)
=> "657"
>"CVV2 - 657 Expiry - 05/12 (mm/yy)".match(/\d+\/\d+/)
=> "05/12"
Let me start with I'm not a programmer by trade, but I'm learning the best I can. I'm trying to build a template to take the result of one FreeMarker interpolation result and use that as a variable for another. I hope I'm using the terms correctly.
For example, I want the result of (entity.customer.organization.name) to be used in:
${blurb["organizationXXXAttire"]!}
Where XXX is the result of (entity.customer.organization.name)
If it was just a blurb with out a variable company name it would look like:
${blurb["organizationCompanyAttire"]!}
I thought the following would work but it did not:
<#assign organization = (entity.customer.organization.name)>
${blurb["organization<#organization?interpret>Attire"]!}
Thanks in advance for any suggestions.
It's simply ${blurb["organization${entity.customer.organization.name}Attire"]!}.
?interpret is only needed if you have a string that contains a piece of template. Besides you can't call directives (<#...>, <#...>) inside an expression.
This may be a silly question, but is it possible to make a query using XPath without specifying the element name?
Normally I would write something like
//ElementName[#id = "some_id"]
But the thing is I have many (about 40) different element types with an id attribute and I want to be able to return any of them if the id fits. But I don't want to make this call for each type individually. Is it possible to search all of them at once, regardless of the name?
I am using this in an XQuery script, if that offers any help.
use * instead of name //*[#id = "some_id"]
It might be more efficient to look directly at the #id elements - //* will work, but will initially return every node in the document and then filter!
That may not matter in a small document, of course. but here's an alternative:
//#id[.="some_id"]/..
I am coding with Groovy, however, I don't believe its a language specific set of questions.
I actually have two questions
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
//coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling javascript creates a mess in my cmd prompt.
Second Question
I am wanting to also get the image but am having trouble because when I attempt to get the XPath (via firebug) it shows up as: //*[#id="gmi-ResViewSizer_img"]
How do I handle that?
First Answer:
/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a
Your XPATH was off by one in the predicate filter for the 4th div of the body, it should be the 3rd div. It appears the HTML for the site can/does change from when you had origionally snagged the XPATH using Firebug. You may need to adjust your XPATH to accommodate for potential change and be less sensitive to some differences in document structure.
Maybe something like this:
/html/body//div/h1/a
Second Answer: The XPATH that you listed will work. It may look odd/short(and may not be the most efficient), but // starts at the root node and looks throughout every node in the tree, * matches on any element(to include the img) and the [] predicate filter restricts it to those that have an id attribute who's value equals "gmi-ResViewSizer_img".
There are many other options for XPATHs that could work as well. It will also depend on how often the HTML structure changes. This is one that also works for the page referenced to select that img:
/html/body/div/div/div/div/img[1]
I had the same problem, I solved when I realize iframe tags on page, try call
((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(...
where n is the position in frame in iframe collection. It's work for me !!!
Thanks a lot.