latex formatting negative number with brackets/parenthesis - format

In expressions, I use LaTeX macros such as \a that expand to '23' or '-23'.
The problem is that, when \a is negative, \a-\a produces '23--23' instead of '23-(-23)'.
I tried the numprint package, which allows changing the minus sign to red, but I couldn't adapt that trick to make \np\a produce '(-23)' when \a is negative.

Related

Regex match 1024x768 but not 1024x768x16

I'm trying to match 1024x768 but not 1024x768x16. This is the code I'm using:
if @config[:resolution] !~ /[0-9]{1,4}x[0-9]{1,4}/
  raise "Invalid resolution format: #{@config[:resolution]}"
end
I know I'm missing something around greediness but can't find a solution
If this is the entire string, you can add ^ $ anchors to point to the start and end:
/^[0-9]{1,4}x[0-9]{1,4}$/
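If that is not the case, you can use negative lookarounds. Guard against adjacent digits as well; otherwise backtracking lets the pattern match 1024x76 inside 1024x768x16:
/(?<![0-9]|[0-9]x)[0-9]{1,4}x[0-9]{1,4}(?![0-9]|x[0-9])/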
Also, use \d instead of [0-9].
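A minimal Ruby sketch of the embedded case follows. The constant and helper names are hypothetical, and the digit guards inside the lookarounds are an assumption of this sketch: without them, backtracking can still find 1024x76 inside 1024x768x16.

```ruby
# Lookaround-guarded pattern for a resolution embedded in a longer string.
# The \d alternatives are an assumed addition: they stop the engine from
# settling for a shorter run of digits such as "1024x76".
EMBEDDED_RESOLUTION = /(?<!\d|\dx)\d{1,4}x\d{1,4}(?!\d|x\d)/

# Hypothetical helper: returns the first stand-alone resolution, or nil.
def find_resolution(text)
  text[EMBEDDED_RESOLUTION]
end

puts find_resolution("mode=1024x768 depth=16").inspect
puts find_resolution("mode=1024x768x16").inspect
```

The lookbehind uses two top-level alternatives of different lengths, which Ruby allows.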
If you succeed in matching 1024x768 with a regex, then put \A in the front and \z in the end of that regex.
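The difference between the two anchor styles shows up on multi-line input, because in Ruby ^ and $ anchor to line boundaries while \A and \z anchor to the whole string. A short sketch, with hypothetical constant names and a made-up input:

```ruby
LINE_ANCHORED   = /^\d{1,4}x\d{1,4}$/
STRING_ANCHORED = /\A\d{1,4}x\d{1,4}\z/

# Illustrative input: a valid first line followed by unrelated text.
input = "1024x768\nextra line"

puts LINE_ANCHORED.match?(input)        # the first line alone satisfies ^...$
puts STRING_ANCHORED.match?(input)      # the whole string must be the resolution
puts STRING_ANCHORED.match?("1024x768")
```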
You might be able to use word boundaries, if this string is part of a larger string
/\b[0-9]{1,4}x[0-9]{1,4}\b/
Some commenters seem to think that the word boundary token will not prevent the match.
A word boundary is defined as being between a word and a non-word character. In Ruby, word characters include Unicode letters, Unicode digits and the underscore, so this should work.
See the example at rubular.com
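For readers without access to the linked example, here is a small sketch of the word-boundary behaviour. The constant name and sample strings are made up for illustration.

```ruby
BOUNDED_RESOLUTION = /\b[0-9]{1,4}x[0-9]{1,4}\b/

# "8" and "x" are both word characters, so there is no boundary between
# them and the pattern cannot stop after 768 in "1024x768x16".
puts "mode 1024x768 ok"[BOUNDED_RESOLUTION].inspect
puts "1024x768x16"[BOUNDED_RESOLUTION].inspect
```

Note that a separator such as a hyphen is a non-word character, so a string like 1024x768-16 would still produce a match with this approach.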

Syntax Highlighting in Notepad++: how to highlight timestamps in log files

I am using Notepad++ to check logs. I want to define custom syntax highlighting for timestamps and log levels. Highlighting log levels works fine (they are defined as keywords). However, I am still struggling with highlighting timestamps of the form
06 Mar 2014 08:40:30,193
Any idea how to do that?
If you just want simple highlighting, you can use Notepad++'s regex search mode. Open the Find dialog, switch to the Mark tab, and make sure Regular Expression is set as the search mode. Assuming the timestamp is at the start of the line, this Regex should work for you:
^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+
Breaking it down bit by bit:
^ means the following Regex should be anchored to the start of the line. If your timestamp appears anywhere but the start of a line, delete this.
\d means match any digit (0-9). {n} is a quantifier that means to match the preceding bit of Regex exactly n times, so \d{2} means match exactly two digits.
\s means match any whitespace character.
[A-Za-z] means match any character in the set A-Z or the set a-z, and the + is a quantifier that means match the preceding bit of Regex 1 or more times. So we're looking for a sequence of one or more alphabetic characters.
\s means match any whitespace character.
\d{4} is just like \d{2} earlier, only now we're matching exactly 4 digits.
\s means match any whitespace character.
\d{2} means match exactly two digits.
: matches a colon.
\d{2} matches exactly two digits.
: matches another colon.
\d{2} matches another two digits.
, matches a comma.
[\d]+ works similarly to the alphabetic search sequence we set up earlier, only this one's for digits. This finds one or more digits.
When you run this Regex on your document, the Mark feature will highlight anything that matches it. Unlike the temporary highlighting the "Find All in Document" search type can give you, Mark highlighting lasts even after you click somewhere else in the document.
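The pattern can also be tried outside Notepad++. The sketch below runs it in Ruby, on the assumption that this particular pattern behaves the same in Ruby's engine as in Notepad++'s. The log lines other than the timestamp format are invented for illustration.

```ruby
TIMESTAMP = /^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+/

# Made-up log excerpt; only the timestamp format comes from the question.
log = <<~LOG
  06 Mar 2014 08:40:30,193 INFO  Server started
  no timestamp on this line
  07 Mar 2014 09:01:02,7 WARN  Disk almost full
LOG

# scan returns every substring that Mark would highlight.
marked = log.scan(TIMESTAMP)
puts marked.inspect
```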

Issue with a Look-behind Regular expression (Ruby)

I wrote this regex to match all href and src links in an HTML page (I know I should be using a parser; this is just an experiment):
/((href|src)\=\").*?\"/ # Without look-behind
It works fine, but when I try to modify the first portion of the expression as a look-behind pattern:
/(?<=(href|src)\=\").*?\"/ # With look-behind
It throws an error stating 'invalid look-behind pattern'. Any ideas what's going wrong with the look-behind?
Lookbehind has restrictions:
(?<=subexp) look-behind
(?<!subexp) negative look-behind
Subexp of look-behind must be fixed character length.
But different character length is allowed in top level
alternatives only.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed.
You cannot put alternatives in a non-top level within a (negative) lookbehind.
Put them at the top level. You also don't need to escape some characters that you did.
/(?<=href="|src=").*?"/
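A short sketch of the top-level alternation in use. The HTML snippet and constant names are made up, and the second pattern, which swaps the trailing quote for a lookahead so the quote is left out of the match, is an assumption beyond the original answer.

```ruby
html = '<a href="a.html"><img src="b.png"></a>'

# Pattern from the answer: the match includes the closing quote.
WITH_QUOTE = /(?<=href="|src=").*?"/

# Assumed variant: a lookahead keeps the closing quote out of the match.
WITHOUT_QUOTE = /(?<=href="|src=").*?(?=")/

puts html.scan(WITH_QUOTE).inspect
puts html.scan(WITHOUT_QUOTE).inspect
```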

How to check how many variables (masks) declared in Regexp (ruby)?

Let's say I have a regexp with some arbitrary amount of capturing groups:
pattern = /(some)| ..a lot of masks combined.... |(other)/
Is there any way to determine a number of that groups?
If you can always find a string that matches the regex you are given, then it suffices to match it against the regex and look at the match data length. However, determining whether a regexp has a string that it matches is NP-hard[1]. This is only feasible if you know in advance what kind of regexes you'll be getting.
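A sketch of the match-data idea. One way to sidestep the search for a matching string, offered here as an assumption rather than as part of the original answer, is to add an empty alternative with Regexp.union so that the combined regex always matches the empty string. The helper name is hypothetical.

```ruby
# Hypothetical helper: counts capture slots by forcing a match.
def count_groups_by_match(regexp)
  # The empty regex // is added as a second alternative, so the union
  # matches "" even when the original pattern cannot.
  always_matching = Regexp.union(regexp, //)
  # captures has one slot per capturing group (nil for unused groups).
  always_matching.match("").captures.size
end

puts count_groups_by_match(/(a)(b)|(?:c)(d)/)
puts count_groups_by_match(/\((x)\)/)
```

When named groups are present, this reports only the named groups, in line with footnote [2].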
The next best method in the Regexp class is Regexp#source or Regexp#to_s. However, we need to parse the regex if we do this.
I can't speak for the future, but as of Ruby 2.0, there is no better method in the Regexp core class.
A left parenthesis denotes a literal left parenthesis, if preceded by an unescaped backslash. A backslash is unescaped unless an unescaped backslash precedes. So, a character is escaped iff preceded by an odd number of backslashes.
An unescaped left parenthesis denotes a capturing group iff it is not followed by a question mark. With a question mark, it can mean various things: (?'name') and (?<name>) denote named capturing groups. Named and unnamed capturing groups cannot coexist in the same regex, however[2]. (?:) denotes a non-capturing group. This is a special case of (?flags-flags:). (?>) denotes an atomic group. (?=), (?!), (?<=) and (?<!) denote lookaround. (?#) denotes a comment.
Ruby's regexp engine supports comments in regexes. Considering them in the main regex would be very difficult. We can try to strip them if we really want to support these, but supporting them fully will get messy due to the possibility of inline flags turning extended mode (and thus line comments) on and off in ways that a regular expression cannot capture. I will go ahead and not support unescaped parentheses in regex comments[3].
We want to count:
the number of left parentheses \(
that are not escaped by a backslash (?<!(?<!\\)(?:\\\\)*\\) (read: not preceded by an odd number of backslashes that are not preceded by yet another backslash) and
that are not followed by a question mark (?!\?)
Ruby doesn't support unbounded lookbehind, but if we reverse the source first, the first assertion becomes a lookahead: (?!(?:\\\\)*\\(?!\\)) (read: not followed by an odd number of backslashes). The second assertion becomes a lookbehind: (?<!\?).
The whole solution:
def count_groups(regexp)
  # named capture support:
  # named_count = regexp.named_captures.count
  # return named_count if named_count > 0
  # main: scan the reversed source so that both assertions stay bounded
  test = /(?<!\?)\((?!(?:\\\\)*\\(?!\\))/
  regexp.source.reverse.scan(test).count
end
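As a cross-check on a regex-based counter, the same counting rule (an unescaped left parenthesis not followed by a question mark) can be sketched as a plain character loop. The name count_groups_by_scan is hypothetical, and like the approach described above it ignores comments and named groups. It also does not special-case character classes such as [(], which is a limitation of this sketch.

```ruby
# Hypothetical cross-check: walks the source once, tracking escapes.
def count_groups_by_scan(regexp)
  source = regexp.source
  count = 0
  escaped = false
  source.each_char.with_index do |char, index|
    if escaped
      escaped = false            # this character was escaped by a backslash
    elsif char == "\\"
      escaped = true             # the next character is escaped
    elsif char == "(" && source[index + 1] != "?"
      count += 1                 # unescaped "(" that opens a plain group
    end
  end
  count
end

puts count_groups_by_scan(/(a)(?:b)\((c)/)
puts count_groups_by_scan(/\\(a)/)
```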
[1]: we can show the NP-hardness by converting the satisfiability problem to it:
AND: xy (x must be an assertion)
OR: x|y
NOT: (?!x)
atoms: (?=1), (?=.1), (?=..1), ..., (?!1), (?!.1)...
example(XOR): /^(?:(?=1)(?!.1)|(?!1)(?=.1))..$/
this extends to NP-completeness for any class of regexes that can be tested in polynomial time. This includes any regex with no nested repetition (or repeated backreferences to repetition or recursion) and with bounded nesting depth of optional matches.
[2]: /((?<name>..)..)../.match('abcdef').to_a returns ['abcdef', 'ab'], indicating that unnamed capturing groups are ignored when named capturing groups are present. Tested in Ruby 1.9.3
[3]: Inline comments start with (?# and end with ). They cannot contain an unescaped right parenthesis, but they can contain an unescaped left parenthesis. These can be stripped easily (even though we have to sprinkle the "unescaped" regex everywhere) and are the lesser evil, but they're also less likely to contain an unescaped left parenthesis.
Line comments start with # and end with a newline. These are only treated as comments in the extended mode. Outside the extended mode, they match the literal # and newline. This is still easy, even if we have to consider escaping again. Determining if the regex has the extended flag set is not too difficult, but the flag modifier groups are a different beast entirely.
Even with Ruby's awesome recursive regexes, merely determining if a previously-open group modifying the extended mode is already closed would yield a very nasty regex (even if you replace one by one and don't have to skip comments, you have to account for escaping). It wouldn't be pretty (even with interpolation) and it wouldn't be fast.

Why won't my regex lookback work on a URL using Ruby 1.9?

I would like to have this regex:
.match(/wtflungcancer.com\/\S*(?<!js)/i)
NOT match the following string based on the fact that 'js' is present. However, the following matches the entire URL:
"http://www.wtflungcancer.com/wp-content/plugins/contact-form-7/includes/js/jquery.form.min.js?ver=3.32.0-2013.04.03".match(/wtflungcancer.com\/\S*(?<!js)/i)
This happens because \S* consumes every non-whitespace character, so the lookbehind is only checked at the end of the URL, where the preceding characters are not "js".
Something like this should work:
/wtflungcancer.com(?!\S*\.js)/i
Basically
do not let the * consume all characters
instead of using a lookbehind, use a lookahead
search for strings containing wtflungcancer.com NOT followed by a string containing ".js"
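A quick sketch comparing the two patterns on the URL from the question. The constant names and the second URL are made up for illustration.

```ruby
JS_URL   = "http://www.wtflungcancer.com/wp-content/plugins/contact-form-7/includes/js/jquery.form.min.js?ver=3.32.0-2013.04.03"
PAGE_URL = "http://www.wtflungcancer.com/about/" # hypothetical non-JS URL

LOOKBEHIND = /wtflungcancer.com\/\S*(?<!js)/i   # pattern from the question
LOOKAHEAD  = /wtflungcancer.com(?!\S*\.js)/i    # pattern from the answer

# The lookbehind version still matches: \S* runs to the end of the URL.
puts LOOKBEHIND.match(JS_URL)[0]

# The lookahead version rejects the JS URL and accepts the other one.
puts LOOKAHEAD.match?(JS_URL)
puts LOOKAHEAD.match?(PAGE_URL)
```

Because the lookahead searches for ".js" anywhere in the rest of the URL, it would also reject paths containing, for example, ".json".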
-- EDIT: more explanation added --
What is the difference between
"wtflungcancer.com\S*(?<!\.js)"
and
"wtflungcancer.com(?!\S*\.js)"
They look really similar!
Lookarounds (lookahead and lookbehind) in regular expressions tell the regexp engine when a match is correct or not: they do not consume characters of the string.
Lookbehinds tell the regexp engine to look backwards from the current position. In your case, nothing was anchored to the right of the lookbehind, so "\S*" simply consumed all the non-whitespace characters in the string.
For example, this regexp can work for finding url NOT ending with ".js":
wtflungcancer.com\S+(?<!\.js)$
See? The right side of the lookbehind is anchored using the end of string metacharacter.
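A sketch of the anchored lookbehind with made-up URLs and a hypothetical constant name. Note that it only rejects URLs that end in ".js", so a URL with a query string after ".js" still matches.

```ruby
NOT_ENDING_IN_JS = /wtflungcancer.com\S+(?<!\.js)$/

puts NOT_ENDING_IN_JS.match?("http://www.wtflungcancer.com/js/app.js")
puts NOT_ENDING_IN_JS.match?("http://www.wtflungcancer.com/index.html")
# ".js" is followed by a query string here, so the URL does not end in ".js".
puts NOT_ENDING_IN_JS.match?("http://www.wtflungcancer.com/js/app.js?ver=1")
```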
In our case, though, we couldn't anchor anything to the right side, so I switched from a lookbehind to a lookahead.
So, the real regular expression just matches "wtflungcancer.com": at that point, the lookahead tells the regexp engine: "In order for this match to be correct, this string must not be followed by a sequence of non-whitespace characters followed by '.js'". This works because lookaheads do not consume actual characters, they just move on character by character to see if the match is good or not.
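You can try this pattern:
wtflungcancer.com\/(?>(?>[^\s.]++|\.++(?!js))*)(?!\.)
Explanations:
The goal is to allow all characters that are not whitespace, and to allow a dot only when it is not followed by js:
(?> # open an outer atomic group, so the repetition cannot give back iterations
(?> # open an atomic group
[^\s.]++ # all characters but whitespace and .
| # OR
\.++(?!js) # . not followed by js
)* # close the atomic group, repeat zero or more times
) # close the outer atomic group
To make sure the pattern checks the whole URL, I add a lookahead that verifies that no dot follows. The outer atomic group matters here: without it, the engine could drop the last repetition and satisfy the lookahead earlier in the URL.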
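A sketch for trying this approach in Ruby. It wraps the whole repetition in an outer atomic group, an adjustment assumed here because a plain * around the group can give back its last iteration and satisfy the final lookahead earlier in the URL. The constant names and the second URL are hypothetical.

```ruby
JS_FILE_URL = "http://www.wtflungcancer.com/wp-content/plugins/contact-form-7/includes/js/jquery.form.min.js?ver=3.32.0-2013.04.03"
CSS_URL     = "http://www.wtflungcancer.com/wp-content/themes/style.css?ver=1.0" # hypothetical

# Variant whose outer * can backtrack: it stops just before "min.js".
BACKTRACKING = /wtflungcancer.com\/(?>[^\s.]++|\.++(?!js))*(?!\.)/

# Variant with the repetition locked inside an outer atomic group.
LOCKED = /wtflungcancer.com\/(?>(?>[^\s.]++|\.++(?!js))*)(?!\.)/

puts BACKTRACKING.match(JS_FILE_URL)[0]
puts LOCKED.match(JS_FILE_URL).inspect
puts LOCKED.match(CSS_URL)[0]
```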
