How can I extract certain strings with XPath?

<span>1 Bedroom, 1 Bath</span>
Hi,
I am new to XPath, and my English is not very good, but I'll try. I need to extract only '1 Bedroom' and '1 Bath' with XPath. How can I do this?
Thank you very much.

You can use substring-after() and substring-before():
substring-before(span, ',')
substring-after(span, ', ')
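If the text has already been pulled out of the span and only the splitting remains, the same before/after logic can be sketched in a host language. This is a minimal Ruby sketch, not part of the original answer; the method name split_pair is hypothetical, and it assumes the two parts are always separated by a comma followed by a space.

```ruby
# Hypothetical helper: mirrors substring-before()/substring-after() by
# splitting at the first occurrence of the separator.
def split_pair(text, separator = ', ')
  before, _sep, after = text.partition(separator)
  [before, after]
end

# Sample text taken from the question's <span> element.
bedrooms, baths = split_pair('1 Bedroom, 1 Bath')
puts bedrooms
puts baths
```

String#partition returns an empty second part when the separator is missing, which matches how substring-after() returns an empty string in that case.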

Related

How to build XPath with [0-9] regex expression

Hi, can anyone help me build a single XPath that matches the two different XPath values below?
//*[@id="EditorPanel"]/div[3]/div/table[1]/tbody/tr/td/table[1]
//*[@id="EditorPanel"]/div[4]/div/table[1]/tbody/tr/td/table[1]
I'm thinking of something like
//*[@id="EditorPanel"]/div[3-4]/div/table[1]/tbody/tr/td/table[1]
div[3] is just an abbreviation of div[position() = 3], so you can use
//*[@id="EditorPanel"]/div[position() = 3 or position() = 4]/div/table[1]/tbody/tr/td/table[1]

How to join the results of two XPath expressions using the concat function?

I have following XML:
<root>
<chp id='1'>
<sent id='1'>hello</sent>
<sent id='2'>world</sent>
</chp>
<chp id='2'>
<sent id='1'>the</sent>
<sent id='2'>best</sent>
<sent id='3'>world</sent>
</chp>
</root>
Using the XPath expression
root/chp/sent[contains(.,'world')]/@id
I get the result 2 3 (which is exactly what I want), but when I run
concat('sentence ', /root/chp/sent[contains(.,'world')]/@id, ' - chap ' , /root/chp/sent[contains(.,'world')]/../@id )
the result breaks at the first result:
sentence 2 - chap 1
The node-set arguments do not contain a single value, but a sequence, and XPath 1.0's concat() uses only the first node of each. You cannot use XPath 1.0 to join such a sequence into a single string. If you're able to use XPath 2.0, use string-join($sequence, $separator):
concat(
'sentence ',
string-join(/root/chp/sent[contains(.,'world')]/@id, ' '),
' - chap ',
string-join(/root/chp/sent[contains(.,'world')]/../@id, ' ')
)
which will return
sentence 2 3 - chap 1 2
Most probably you want to loop over the result set yourself (also requires XPath 2.0):
for $sentence in /root/chp/sent[contains(.,'world')]
return concat('sentence ', $sentence/@id, ' - chap ', $sentence/../@id)
which will return
sentence 2 - chap 1
sentence 3 - chap 2
If you cannot use XPath 2.0 but can post-process the results in a host programming language (Java, PHP, ...) using DOM: use /root/chp/sent[contains(.,'world')] to find the sentence nodes, loop over them, then read each node's @id and its parent (chapter) @id through DOM and construct the result.
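The host-language loop described above might look like the following. This is only a sketch in Ruby using the REXML library (an assumption; the answer names Java and PHP as examples), and the method name sentence_labels and the constant SAMPLE_XML are hypothetical. The search word is interpolated into the XPath without escaping, so it must not contain a single quote.

```ruby
require 'rexml/document'

# The XML document from the question.
SAMPLE_XML = <<~DOC
  <root>
    <chp id='1'>
      <sent id='1'>hello</sent>
      <sent id='2'>world</sent>
    </chp>
    <chp id='2'>
      <sent id='1'>the</sent>
      <sent id='2'>best</sent>
      <sent id='3'>world</sent>
    </chp>
  </root>
DOC

# Hypothetical helper: selects the matching <sent> nodes with XPath 1.0,
# then reads each node's id and its parent chapter's id through the DOM.
def sentence_labels(xml, word)
  doc = REXML::Document.new(xml)
  nodes = REXML::XPath.match(doc, "/root/chp/sent[contains(., '#{word}')]")
  nodes.map do |sent|
    "sentence #{sent.attributes['id']} - chap #{sent.parent.attributes['id']}"
  end
end

puts sentence_labels(SAMPLE_XML, 'world')
```

Only the node selection is left to XPath; pairing each sentence with its chapter happens in the loop, which is the part XPath 1.0 cannot express in a single string result.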

How to convert the int number into money format in X-path expression?

I want to convert a number (e.g. 1000) directly into money format (e.g. 1,000). How should I do that?
In XSLT there is format-number(), but in pure XPath you'll have to do it the hard way. Perhaps you should do the formatting in the host language that you call XPath from?
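As an illustration of formatting in the host language, here is a minimal Ruby sketch. It assumes the XPath result has already been read into an integer, and the method name with_thousands_separator is hypothetical.

```ruby
# Hypothetical helper: inserts a comma after every digit that is followed
# by a whole number of three-digit groups up to the end of the string.
def with_thousands_separator(number)
  number.to_s.gsub(/(\d)(?=(\d{3})+\z)/, '\1,')
end

puts with_thousands_separator(1000)
puts with_thousands_separator(1234567)
```

The lookahead leaves the digits themselves untouched, so numbers shorter than four digits come back unchanged.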
In XSLT you can use
<xsl:value-of select="format-number($wartosc, '###,###,###,###,##0.00')" />
With XPath 2.0 you can use the regex-based replace() function.
If $number is a string containing your integer number, then you can use:
replace(replace(
if (string-length($number) mod 3 eq 2) then concat("0", $number)
else if (string-length($number) mod 3 eq 1) then concat("00", $number)
else $number,
"([0-9]{3})", ",$1"),
"^[,0]+", "")

In xpath how you compare text() with \r\n (line break)?

I want to get the node:
//script[starts-with(text(), '\r\nvar name')]
but it seems XPath does not recognize the \r\n escape characters. Any ideas how to match them?
Note: I am using HTML Agility Pack.
Use:
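//script[starts-with(., '&#xD;&#xA;var name')]
XPath itself has no escape sequences; &#xD; and &#xA; are XML character references for CR and LF, so they are resolved only when the expression sits inside an XML document such as a stylesheet. Most often XML is normalized by the XML parser and there is only a single NL character left -- therefore, if the above expression doesn't select the wanted script elements, try with:
//script[starts-with(., '&#xA;var name')]
Or, this would work in both cases:
//script
[(starts-with(., '&#xD;&#xA;') or starts-with(., '&#xA;'))
and
starts-with(substring-after(., '&#xA;'), 'var name')
]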

Regular expression help

I am currently doing a bunch of processing on a string using regular expressions with gsub() but I'm chaining them quite heavily which is starting to get messy. Can you help me construct a single regex for the following:
string.gsub(/\.com/,'').gsub(/\./,'').gsub(/&/,'and').gsub(' ','-').gsub("'",'').gsub(",",'').gsub(":",'').gsub("#39;",'').gsub("*",'').gsub("amp;",'')
Basically the above removes the following:
.com
.
,
:
*
switches '&' for 'and'
switches ' ' for '-'
switches ' for ''
Is there an easier way to do this?
You can combine the ones that remove characters:
string.gsub(/\.com|[.,:*]/,'')
The pipe | means "or". The right side of the or is a character class; it means "one of these characters".
A translation table is more scalable as you add more options:
translations = Hash.new
translations['.com'] = ''
translations['&'] = 'and'
...
translations.each { |from, to| string = string.gsub(from, to) }
Building on Tim's answer:
You can pass a block to String#gsub, so you could combine them all, if you wanted:
string.gsub(/\.com|[.,:*& ']/) do |sub|
case(sub)
when '&'
'and'
when ' '
'-'
else
''
end
end
Or, building off echoback's answer, you could use a translation hash in the block (you may need to call translations.default = '' to get this working):
string.gsub(/\.com|[.,:*& ']/) {|sub| translations[sub]}
The biggest perk of using a block is only having one call to gsub (not the fastest function ever).
Hope this helps!
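Putting the block and the translation hash together, a self-contained version could look like the sketch below. The method name slugify and the constant TRANSLATIONS are hypothetical, and the sample input is made up for illustration; Hash#fetch with a default of '' stands in for setting translations.default.

```ruby
# Hypothetical replacement table: characters not listed here are removed.
TRANSLATIONS = {
  '&' => 'and',
  ' ' => '-'
}.freeze

# Hypothetical helper: a single gsub pass over the string. ".com" is listed
# first in the alternation so it is matched as a whole before a lone ".".
def slugify(string)
  string.gsub(/\.com|[.,:*& ']/) { |match| TRANSLATIONS.fetch(match, '') }
end

puts slugify("Tom & Jerry's site.com: news, etc.")
```

Because every rule is applied in one pass, a replacement such as 'and' is never re-scanned by a later rule, which is a behavioral difference from the original chain of gsub calls.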
