meaning of xpath Notations - xpath

I want to know the meaning of the symbols below when writing an XPath expression:
., @, |, [], /, //, //*
For example, if the XPath is .//*[@id='Table2']/tbody/tr[1]/td[1]
what does each part of it mean?

Related

Double escape characters in elisp regex patterns

(regexp-opt '("this" "that"))
returns,
"\\(?:th\\(?:at\\|is\\)\\)"
Why are there double backslashes everywhere in this elisp regex? Doesn't elisp regex use a single backslash?
Also, ? is a postfix operator in regex patterns, which means it acts on the expression that precedes it (http://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Special.html#Regexp-Special). But here there is no expression before the ? operator. So what does the
(?:th\\
part of this regex mean?
The backslash is part of the regexp syntax. But to keep it inside a Lisp string literal, you need to escape it with another backslash, as described in the documentation on string syntax:
'Likewise, you can include a backslash by preceding it with another backslash, like this: "this \\ is a single embedded backslash".'
As for the ?: construct, it's how you specify a non-capturing or "shy" group:
"A shy group serves the first two purposes of an ordinary group (controlling the nesting of other operators), but it does not get a number, so you cannot refer back to its value with ‘\digit’. Shy groups are particularly useful for mechanically-constructed regular expressions, because they can be added automatically without altering the numbering of ordinary, non-shy groups."
It's documented together with the other backslash constructs in regexps. As the passage quoted above explains, it's useful in functions like regexp-opt for grouping patterns without creating capture groups.
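The same two ideas can be seen in other languages. As a rough analogue in Ruby (not elisp: Ruby's regex syntax uses unescaped parentheses for groups, and the constant names here are made up for illustration), a backslash meant for the regex engine has to be doubled inside a double-quoted string, and (?:...) groups without capturing:

```ruby
# A double-quoted string needs "\\" to hold one literal backslash.
PATTERN_SOURCE = "\\d+" # three characters: \ d +

# Building a regex from that string hands the engine a single backslash.
DIGITS = Regexp.new(PATTERN_SOURCE)

# Capturing groups get numbers; non-capturing ("shy") groups do not.
CAPTURING = /(th(at|is))/
SHY = /(?:th(?:at|is))/

puts PATTERN_SOURCE.length
puts DIGITS.match("abc 123")[0]
p CAPTURING.match("this").captures
p SHY.match("this").captures
```

Both patterns match the same strings; only the capturing one records the groups.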

Precedence of Ruby regular expressions?

I am reviewing regular expressions and cannot understand why a regular expression won't match a given string, specifically:
regex = /(ab*)+(bc)?/
mystring = "abbc"
The match matches "abb" but leaves the c off. I tested this using Rubular and in IRB and don't understand why the regex doesn't match the entire string. I thought that (ab*)+ would match "ab" and then (bc)? would match "bc".
Am I missing something in terms of precedence for regular expression operations?
By default, quantifiers are greedy: each part of the regular expression matches as much as it can, and the engine backtracks only when the rest of the pattern would otherwise fail. Since (bc) is optional, (ab*) can consume "abb" (the + after it has nothing more to repeat), (bc)? then matches nothing, and the overall match succeeds without any backtracking.
If you want the whole string to be matched (which forces some backtracking in this case), anchor both ends of the pattern. Note that in Ruby ^ and $ match at line boundaries; for a single-line string like this one they behave the same as \A and \z:
regex = /^(ab*)+(bc)?$/
The regex contains two groups in parentheses.
The first group, (ab*), means an a followed by zero or more b. Your string has two b's, so this group matches abb. That leaves only c, which does not match the second group, bc. Because that group is optional, the overall match is abb.
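To see what each group ends up holding, here is a small sketch that runs the unanchored regex from the question and the anchored one from the answer against "abbc". The helper name show_match and the constant names are hypothetical, chosen only for this example:

```ruby
# Unanchored: (ab*)+ greedily takes "abb" and (bc)? matches nothing.
UNANCHORED = /(ab*)+(bc)?/
# Anchored: $ fails after "abb", so the engine backtracks to "ab" + "bc".
ANCHORED = /^(ab*)+(bc)?$/

# Returns [whole match, group 1, group 2], or nil if there is no match.
# A group that did not take part in the match is reported as nil.
def show_match(regex, string)
  m = regex.match(string)
  m && [m[0], m[1], m[2]]
end

p show_match(UNANCHORED, "abbc") # ["abb", "abb", nil]
p show_match(ANCHORED, "abbc")   # ["abbc", "ab", "bc"]
```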

meaning of a `+` following a `*`, when the latter is used as a quantifier in a regular expression

Today I came across the following regular expression and wanted to know what Ruby would do with it:
> "#a" =~ /^[\W].*+$/
=> 0
> "1a" =~ /^[\W].*+$/
=> nil
In this instance, Ruby seems to be ignoring the + character. If that is incorrect, I'm not sure what it is doing with it. I'm guessing it's not being interpreted as a quantifier, since the * is not escaped and is being used as a quantifier. In Perl/Ruby regexes, sometimes when a character (e.g., -) is used in a context in which it cannot be interpreted as a special character, it is treated as a literal. But if that was happening in this case, I would expect the first match to fail, since there is no + in the lvalue string.
Is this a subtly correct use of the + character? Is the above behavior a bug? Am I missing something obvious?
Well, you can certainly use a + after a *. You can read a bit about it on this site. A + placed after the * turns it into a possessive quantifier.
What does it do? It prevents * from backtracking.
Ordinarily, when you use something like .*c to match abcde, the .* first matches the whole string (abcde). Since the regex cannot then match c after the .*, the engine gives back one character at a time and checks again for a match (this is backtracking).
Once it has backtracked to c, you get the match abc from abcde.
Now imagine that the engine has to backtrack over a few hundred characters. With nested groups and multiple * (or + or the {m,n} form), the number of backtracking steps can quickly grow into the thousands or millions; this is called catastrophic backtracking.
This is where possessive quantifiers come in handy: they never give back what they have matched. With the regex I mentioned above, abcde is not matched by .*+c. Once .*+ has consumed the whole string, it cannot backtrack, and since nothing is left for c to match, the match fails.
So another possible use of possessive quantifiers is to improve the performance of some regexes, provided the engine supports them.
For your regex /^[\W].*+$/, though, I don't think the possessive quantifier provides any improvement (maybe a tiny one). Lastly, the character class is unnecessary, so it can be rewritten as /^\W.*+$/.
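The difference is easy to check in Ruby. The sketch below compares the greedy and possessive forms on abcde, and adds an atomic group, (?>...), which is not mentioned above but has the same no-backtracking effect here. The constant names are made up for illustration:

```ruby
GREEDY = /.*c/       # .* may give characters back
POSSESSIVE = /.*+c/  # .*+ keeps everything it matched
ATOMIC = /(?>.*)c/   # atomic group: same effect as the possessive form

p GREEDY.match("abcde")     # matches "abc" after backtracking
p POSSESSIVE.match("abcde") # nil: nothing is left for c to match
p ATOMIC.match("abcde")     # nil, for the same reason

# The regexes from the question and the simplified form behave alike.
p "#a" =~ /^[\W].*+$/
p "1a" =~ /^\W.*+$/
```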

"R".match(%r[^R]) finds a match?

"R".match(%r[^R]) finds a match.
I don't know much about regex but I thought ^ after a [ negates the character class, the characters between the brackets.
What am I missing?
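In your case the brackets are not part of the regex; they are the delimiters of the %r literal. Your regex is equivalent to %r|^R| or %r'^R', or to the same pattern with any other delimiter character placed after %r and after R.
What you want is %r|[^R]| or /[^R]/.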
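A quick way to confirm this is to look at the source of each regex. This is only a sketch, and the constant names are hypothetical:

```ruby
# %r[...] uses [ and ] as delimiters, so the pattern inside is just ^R.
ANCHORED_R = %r[^R]
# With a different delimiter, the brackets become part of the pattern.
NOT_R = %r|[^R]|

puts ANCHORED_R.source       # ^R
puts NOT_R.source            # [^R]
p ANCHORED_R == /^R/         # true: same pattern, same options
p "R".match?(ANCHORED_R)     # true: "R" starts with R
p "R".match?(NOT_R)          # false: no character other than R
```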

Ruby regex, is there a way to only match literal matches?

I'm trying to parse using a case/when statement with regex in it. I'm having some trouble with the match as it will give me a match even if it's not a literal match.
Example:
if I input ($45, x), I get back: "address mode: indirect, x -> value: 45" from this regex:
/[(][$][1-9a-fA-F]{1,2}\s*,\s*[xX]\s*[)]/
Now, if I input ($45, p), I get a match for this regex:
/[$][1-9a-fA-F]{2,4}/
Which is understandable, but I would like the match to look only for literal matches. If there are extra characters that do not exactly match the regex, I want the match function to return false.
Are there other functions like match(), or extra arguments that can be given to match(), to get this behavior?
From your question, it is a little unclear what you are after. Your second regex is matching on the substring
$45
If you want to avoid this, use the anchors ^ and $ to ensure the entire string is matched. Something like:
^\(\$[1-9a-fA-F]{1,2}\s*,\s*[xX]\s*\)$
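Here is a minimal sketch of how anchoring could fit into a case/when, reusing the character classes from the question. Several things here are assumptions: the helper address_mode, the constant names and the returned symbols are made up, and \A and \z are used instead of ^ and $ because in Ruby they anchor to the whole string rather than to each line:

```ruby
# \A and \z anchor to the start and end of the whole string
# (a choice made for this sketch; the answer above uses ^ and $).
INDIRECT_X = /\A\(\$[1-9a-fA-F]{1,2}\s*,\s*[xX]\s*\)\z/
BARE_ADDRESS = /\A\$[1-9a-fA-F]{2,4}\z/

# Hypothetical helper: classify an operand, or return nil when no
# pattern matches the entire input.
def address_mode(input)
  case input
  when INDIRECT_X then :indirect_x
  when BARE_ADDRESS then :bare_address
  end
end

p address_mode("($45, x)") # :indirect_x
p address_mode("$45")      # :bare_address
p address_mode("($45, p)") # nil: "$45" is only a substring
```

With the anchors in place, ($45, p) no longer falls through to the second pattern, because that pattern must now account for every character of the input.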
