I am trying to match the value of the following HTML snippet:
<input name="example" type="hidden" value="matchTextHere" />
with the following:
x = response.match(/<input name="example" type="hidden" value="^.+$" \/>/)[0]
Why is this not working? It doesn't match 'matchTextHere'.
Edit:
When I use:
x = response.match(/<input name="example" type="hidden" value="(.+)" \/>/)[0]
it matches the whole html element, and not just the value 'matchTextHere'
^ matches the start of a line and $ matches the end of a line, so ^.+$ can never match in the middle of the tag. Change ^.+$ to \w+ and it will work for values that don't contain any symbols. Wrap it in parentheses to capture the value: (\w+)
Update: to match anything between the quotes (assuming there are no quotes in the value), use [^"]+. If the value contains escaped quotes, it is a different ballgame. .+ will also work in this case, but it will be slower due to backtracking: .+ first matches up to the end of the string (because . matches even a "), then looks for a " and fails. It then steps back one position, looks for a " and fails again, and so on until it finds a ". If there were one more attribute after value, you would get matchTextHere" nextAttr="something as the match.
x = response.match(/<input name="example" type="hidden" value="([^"]+)" \/>/)[1]
That being said, the regex will fail if there is an extra space between any of the attributes. Parsing HTML with regex is not a good idea, but if you must use regex, you can allow extra spaces using \s+
/<input\s+name="example"\s+type="hidden"\s+value="([^"]+)"\s*\/>/
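The backtracking behaviour described above can be reproduced in a few lines. This is only a minimal sketch, written with Python's re module rather than the Ruby of the question; the sample strings are assumptions, and nextAttr is the hypothetical extra attribute from the explanation.

```python
import re

# Sample markup (assumed). The second string adds the hypothetical extra
# attribute from the explanation to show how a greedy ".+" over-matches.
simple = '<input name="example" type="hidden" value="matchTextHere" />'
extra = '<input name="example" type="hidden" value="matchTextHere" nextAttr="something" />'

greedy = re.compile(r'value="(.+)"')
negated = re.compile(r'value="([^"]+)"')

# Group 0 is the whole match; group 1 is the first parenthesised capture.
whole = negated.search(simple).group(0)
value = negated.search(simple).group(1)

# Greedy ".+" backtracks from the end of the string to the LAST quote,
# while [^"]+ cannot cross a quote and so stops at the first one.
greedy_value = greedy.search(extra).group(1)
negated_value = negated.search(extra).group(1)

print(whole)
print(value)
print(greedy_value)
print(negated_value)
```

The difference between group 0 and group 1 here is the same as the difference between [0] and [1] on the match result in the question.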
Because you have a start-of-line token (^) and an end-of-line token ($) in your regular expression. I think you meant to capture the value; this might solve your problem: value="(.+?)".
Beware, though, that processing HTML with regular expressions is not a good idea; it can even drive you crazy. Better to use an HTML parser instead.
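As a rough illustration of the parser route, here is a sketch using the html.parser module from Python's standard library (not the Ruby of the question). The class name HiddenInputParser and the extra spaces in the sample string are assumptions made for the example.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Hypothetical helper: collects name -> value for hidden inputs."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            self.hidden[attributes["name"]] = attributes.get("value")


# Extra spaces between attributes do not matter to a parser.
response = '<input   name="example"  type="hidden" value="matchTextHere" />'
parser = HiddenInputParser()
parser.feed(response)
x = parser.hidden["example"]
print(x)
```

Unlike the regular expressions above, this does not depend on attribute order or spacing.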
You don't need the ^ and $:
x = response.match(/<input name="example" type="hidden" value=".+" \/>/)[0]
You just need to change [0] to [1], since index 0 is the whole match and index 1 is the first capture group:
response='<input name="example" type="hidden" value="matchTextHere" />'
puts response.match(/<input name="example" type="hidden" value="(.*?)" \/>/)[1]
matchTextHere
Related
I have some HTML from which I am trying to extract the value of the hidden field, 320365:
<fieldset class="inputs"><ol></ol></fieldset><input id="activity_id" name="activity[approval_processor][approvals_attributes][0][id]" type="hidden" value="320365" />
and I tried -
[approvals_attributes][0][id]" type="hidden" value="(.+?)"
but even the Regex Tester is not showing the number 320365. What am I doing wrong?
Almost correct; you just need to escape [ and ], as they have a special meaning in regex:
\[approvals_attributes\]\[0\]\[id\]" type="hidden" value="(.+?)"
Also if you know that the value is supposed to be a number, it might be better to limit it to numbers only:
\[approvals_attributes\]\[0\]\[id\]" type="hidden" value="([0-9]+)"
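To see why the escaping matters, the following sketch (Python's re module, with the markup from the question shortened to the input element) compares the unescaped and escaped patterns. Using re.escape() for the literal part is my own choice here, not something the answer prescribes.

```python
import re

html = ('<input id="activity_id" '
        'name="activity[approval_processor][approvals_attributes][0][id]" '
        'type="hidden" value="320365" />')

# Unescaped, "[approvals_attributes]" is a character class that matches ONE
# character from the set, so the pattern does not match the literal text.
unescaped = r'[approvals_attributes][0][id]" type="hidden" value="(.+?)"'
unescaped_match = re.search(unescaped, html)

# re.escape() backslash-escapes the metacharacters in the literal part;
# the capture group is then appended unescaped.
literal = '[approvals_attributes][0][id]" type="hidden" value="'
escaped = re.escape(literal) + r'([0-9]+)"'
value = re.search(escaped, html).group(1)

print(unescaped_match)
print(value)
```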
\[approvals_attributes\]\[0\]\[id\]" type="hidden" value="(.+?)"
or you can also use the simpler expression
type="hidden" value="(.+?)"
You can also use https://regex101.com/ to write and test regular expressions.
I can match sometext and othertext in
<br>
sometext
<br>
othertext
using xpath selector '//br/following-sibling::text()'
but if there is only whitespace after the <br> element
<br>
<br>
othertext
only the second match occurs. Is it possible to match whitespace as well?
I tried
//br/following-sibling::matches(., "\s+")
to attempt to match whitespace without success.
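matches() tests a string against a regular expression; it does not select nodes, so it can't be used directly after an axis specifier. You can use it as a condition instead. Note that matches() succeeds if any substring matches, so anchor the pattern to select whitespace-only text nodes:
//br/following-sibling::text()[matches(., "^\s+$")]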
Or without regexes (might be faster depending on the implementation), checking that it is all whitespace and not the empty string:
//br/following-sibling::text()[(normalize-space(.) = "") and (. != "")]
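The same whitespace-only test can be written outside XPath. The sketch below uses xml.etree.ElementTree from Python's standard library, which exposes the text immediately after an element as its .tail rather than through a following-sibling axis, so it only approximates the expressions above. The wrapping div and the self-closing br tags are assumptions needed to make the sample well-formed XML.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the markup in the question: <br/> instead of <br>,
# plus a hypothetical wrapping <div> so there is a single root element.
doc = ET.fromstring("<div><br/>\n<br/>\nothertext\n</div>")


def is_whitespace_only(text):
    # Mirrors (normalize-space(.) = "") and (. != ""), using the four
    # characters XPath treats as whitespace.
    return text is not None and text != "" and text.strip(" \t\r\n") == ""


# .tail is the text node that directly follows each <br/>.
tails = [br.tail for br in doc.iter("br")]
whitespace_tails = [t for t in tails if is_whitespace_only(t)]

print(tails)
print(whitespace_tails)
```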
I have this HTML:
<tr class="even expanded first">
<td class="score-time status">
<a href="/matches/2012/08/02/europe/uefa-cup/">
16 : 00
</a>
</td>
</tr>
I want to extract the (16 : 00) string without the extra whitespace. Is this possible?
I. Use this single XPath expression:
translate(normalize-space(/tr/td/a), ' ', '')
Explanation:
normalize-space() produces a new string from its argument, in which any leading or trailing white-space (space, tab, NL or CR characters) is deleted and any intermediary white-space is replaced by a single space character.
translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.
II. Alternatively:
translate(/tr/td/a, '

', '')
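For readers who want to check what the two XPath functions do to the sample text, here is a small Python approximation. The helper names normalize_space and remove_chars are hypothetical, and the amount of indentation in raw is assumed.

```python
import re


def normalize_space(text):
    """Approximation of XPath normalize-space(): collapse runs of
    space, tab, CR and LF to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def remove_chars(text, chars):
    """Approximation of translate(text, chars, ''): delete every
    character that appears in chars."""
    return text.translate({ord(c): None for c in chars})


raw = "\n        16 : 00\n    "
trimmed = normalize_space(raw)
compact = remove_chars(trimmed, " ")

print(repr(trimmed))
print(repr(compact))
```

Note that normalize_space alone keeps the single spaces around the colon, while the additional translate step removes them as well, so which one to use depends on whether "16 : 00" or "16:00" is wanted.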
Please try the below xpath expression :
//td[@class='score-time status']/a[normalize-space() = '16 : 00']
You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]
I came across this thread when I was having my own issue similar to above.
HTML
<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
<a href="/nsomar/OAStackView/releases/tag/1.0.1">
1.0.1
</a>
XPath start command
tree.xpath('//div[@class="d-flex"]/h4/a/text()')
However this grabbed random whitespace and gave me the output of:
['\n ', '\n 1.0.1\n ']
Adding a normalize-space() predicate removed the whitespace-only text node and left me with just what I wanted:
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')
['\n 1.0.1\n ']
I could then grab the first element of the list, and use strip() to remove any further whitespace
XPath final command
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()
Which left me with exactly what I required:
1.0.1
You can check whether text() nodes are empty:
/path/text()[not(.='')]
This may be useful with axes like following-sibling:: when there are no container elements, or with child::.
You can use string() or the regular-expression functions of XPath 2.0, such as matches() and replace().
NOTE: some comments say that XPath cannot do string manipulation. Even if it is not really designed for that, you can do basic things: contains(), starts-with(), replace().
Checking for whitespace-only nodes is much harder, as you will generally have a node list as the result set, and most XPath functions, like matches() or replace(), only operate on one node.
You can also separate node selection from string manipulation:
use XPath to retrieve a container, or a list of text nodes, and then process it with another language (Java, PHP, Python, or Perl, for instance).
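A minimal sketch of that split, assuming Python with the standard-library xml.etree.ElementTree module (which supports only a small XPath subset): the path expression just locates the container, and plain string methods do the whitespace handling. The fragment is a closed, simplified version of the HTML from the earlier answer.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the earlier fragment (assumed).
fragment = """<div class="d-flex">
  <h4 class="commit-title">
    <a href="/nsomar/OAStackView/releases/tag/1.0.1">
      1.0.1
    </a>
  </h4>
</div>"""

root = ET.fromstring(fragment)

# Step 1: node selection. The path only finds the container element.
link = root.find("./h4/a")

# Step 2: string manipulation in the host language. itertext() yields
# every text node under the element, including whitespace-only ones.
all_texts = list(root.itertext())
non_blank = [t.strip() for t in all_texts if t.strip()]
version = link.text.strip()

print(non_blank)
print(version)
```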
The scenario is I would like to write a hidden field with a guid value generated by the server.
Why does
<input type="hidden" id="sampleGuid" value="@{Guid.NewGuid().ToString()};" />
yield 'value=""' while
@{
string token = Guid.NewGuid().ToString();
<input type="hidden" id="sampleGuid" value="@token" />
}
properly fill in 'value' with a guid?
You need parentheses instead of braces.
@{ ... } will execute ordinary statements, but won't print anything.
@(...) will print the value of an expression (and will HTML-encode it).
You've wrapped Guid.NewGuid().ToString() in curly braces.
That just means you want to execute the code, not output it.
If you're trying to output a value, wrap the code in parentheses.
consider both types:
<select name="garden">
<option>Flowers</option>
<option selected="selected">Shrubs</option>
<option>Trees</option>
<option selected="selected">Bushes</option>
<option>Grass</option>
<option>Dirt</option>
</select>
Is @val for actually indicating the value="" attribute ?
Is @value for indicating the innerText value ?
for example what happens if <option> doesn't contain any value="" property. how would you select it then ?
select/option[@value = "Grass"]
Does Xpath automatically ignore white spaces for the case above? Should it be trimmed?
EDIT:
for selecting multiple options would this suffice ?
select/option[normalize-space(text())="Grass" or normalize-space(text())="Trees"]
To select by text value you can use text() function. And normalize spaces is required, because they are not removed by default. Here is an example:
select/option[normalize-space(text())="Grass"]
@value - value of "value" attribute
@val - value of "val" attribute
normalize-space() - function returns the argument string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space
Well, if whitespace isn't an issue:
/select/option[.='Grass']
I'd need to check re whitespace, though. You could always normalize:
/select/option[normalize-space(.)='Grass']
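The difference between exact and normalized comparison can be tried out with Python's standard-library xml.etree.ElementTree, which supports the [.='text'] and [@attr='value'] predicates (the former from Python 3.7) but has no normalize-space(), so the normalization is done in Python here. The padded Trees option and its value="t" attribute are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Shortened select list; the whitespace around "Trees" and the value
# attribute are hypothetical additions for illustration.
select = ET.fromstring("""<select name="garden">
  <option>Flowers</option>
  <option value="t">  Trees
  </option>
  <option>Grass</option>
</select>""")

# Exact comparison: whitespace is NOT ignored.
exact = select.findall("option[.='Grass']")
padded = select.findall("option[.='Trees']")

# Attribute comparison, the counterpart of option[@value = "..."].
by_value = select.findall("option[@value='t']")


def norm(text):
    # Rough stand-in for normalize-space(): trim and collapse whitespace.
    return " ".join((text or "").split())


# Counterpart of the "Grass" or "Trees" expression in the question.
wanted = {"Grass", "Trees"}
matches = [norm(o.text) for o in select.iter("option") if norm(o.text) in wanted]

print(len(exact), len(padded), len(by_value))
print(matches)
```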