I'm looking to convert Less mixin calls to their equivalents in Scss:
.mixin(); should become #mixin();
.mixin(0); should become #mixin(0);
.mixin(0; 1; 2); should become #mixin(0, 1, 2);
I'm having the most difficulty with the third example, as I essentially need to match n groups separated by semicolons, and replace those with the same groups separated by commas. I suppose this relies on some sort of repeating groups functionality in regexes that I'm not familiar with.
It's not simply enough to simply replace semicolons within paren - I need a regex that will only match the \.[\w\-]+\(.*\) format of mixins, but obviously with some magic in the second match group to handle the 3rd example above.
I'm doing this in Ruby, so if you're able to provide replacement syntax that's compatible with gsub, that would be awesome. I would like a single regex replacement, something that doesn't require multiple passes to clean up the semicolons.
I suggest adding two capturing groups round the subvalues you need and using an additional gsub in the first gsub block to replace the ; with , only in the 2nd group.
See
s = ".mixin(0; 1; 2);"
puts s.gsub(/\.([\w\-]+)(\(.*\))/) { "##{$1}#{$2.gsub(/;/, ',')}" }
# => #mixin(0, 1, 2);
The pattern details:
\. - a literal dot
([\w\-]+) - Group 1 capturing 1 or more word chars ([a-zA-Z0-9_]) or -
(\(.*\)) - Group 2 capturing a (, then any 0+ chars other than linebreak symbols as many as possible up to the last ) and the last ). NOTE: if there are multiple values, use lazy matching - (\(.*?\)) - here.
Here you go:
less_style = ".mixin(0; 1; 2);"
# convert the first period to #
less_style.gsub! /^\./, '#'
# convert the inner semicolons to commas
scss_style = less_style.gsub /(?<=[\(\d]);/, ','
scss_style
# => "#mixin(0, 1, 2);"
The second regex is using positive lookbehinds. You can read about those here: http://www.regular-expressions.info/lookaround.html
I also use this neat web app to play around with regexes: http://rubular.com/
This will get you a single pass through gsub:
".mixin(0; 1; 2);".gsub(/(?<!\));|\./, ";" => ",", "." => "#")
=> "#mixin(0, 1, 2);"
It's an OR regex with a hash for the replacement parameters.
Assuming from your example that you just want to replace semicolons not following close parens(negative lookbehind): (?<!\));
You can modify/build on this with other expressions. Even add more OR conditions to the regex.
Also, you can use the block version of gsub if you need more options.
Related
I have this code running inside a buffer (used to unescape a JS string in Ruby):
elsif hex_substring =~ /^\\u[0-9a-fA-F]{1,4}/
hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/) do |match|
hex_byte = match[0]
buffer << JSON.load(%Q("#{hex_byte}"))
hex_index += hex_byte.length
end
...
I have a concern that the scan() is matching a bit too much:
hex_substring.scan(/^((\\u[\da-fA-F]{4}){1,})/)
# => [["\\ud83c\\udfec", "\\udfec"]]
I am using only "\\ud83c\\udfec", not "\\udfec".
Is there a way in Ruby or in regex to grab only the first part?
You should use a single grouping construct here, the one to match 1 or more occurrences of four hex chars, and omit the inner capturing group that resulted in an extra item in the resulting array:
.scan(/^(?:\\u[\da-fA-F]{4})+/)
Note that + is a simpler and shorter way to write {1,} (one or more occurrences).
Details
^ - start of string
(?: - start of a non-capturing group (what it matches won't be added to the final scan result):
\\u - a \u substring
[\da-fA-F]{4} - four hex chars
)+ - 1 or more occurrences (of the group pattern sequence).
I've following regexes defined for capturing the gem names in a Gemfile.
GEM_NAME = /[a-zA-Z0-9\-_\.]+/
QUOTED_GEM_NAME = /(?:(?<gq>["'])(?<name>#{GEM_NAME})\k<gq>|%q<(?<name>#{GEM_NAME})>)/
I want to convert these into a regex that can be used in python and other languages.
I tried (?:(["'])([a-zA-Z0-9\-_\.]+)\k["']|%q<([a-zA-Z0-9\-_\.]+)>) based on substitution and several similar combinations but none of them worked. Here's the regexr link http://regexr.com/3g527
Can someone please explain what should be correct process for converting these ruby regular expression defintions into a form that can be used by python.
To define a named group, you need to use (?P<name>) and then (?p=name) named
If you can afford a 3rd party library, you may use PyPi regex module and use the approach you had in Ruby (as regex supports multiple identically named capturing groups):
s = """%q<Some-name1> "some-name2" 'some-name3'"""
GEM_NAME = r'[a-zA-Z0-9_.-]+'
QUOTED_GEM_NAME = r'(?:(?P<gq>["\'])(?<name>{0})(?P=gq)|%q<(?P<name>{0})>)'.format(GEM_NAME)
print(QUOTED_GEM_NAME)
# => # (?:(?P<gq>["\'])(?<name>[a-zA-Z0-9_.-]+)(?P=gq)|%q<(?P<name>[a-zA-Z0-9_.-]+)>)
import regex
res = [x.group("name") for x in regex.finditer(QUOTED_GEM_NAME, s)]
print(res)
# => ['Some-name1', 'some-name2', 'some-name3']
backreference in the replacement pattern.
See this Python demo.
If you decide to go with Python re, it can't handle identically named groups in one regex pattern.
You can discard the named groups altogether and use numbered ones, and use re.finditer to iterate over all the matches with comprehension to grab the right capture.
Example Python code:
import re
GEM_NAME = r'[a-zA-Z0-9_.-]+'
QUOTED_GEM_NAME = r"([\"'])({0})\1|%q<({0})>".format(GEM_NAME)
s = """%q<Some-name1> "some-name2" 'some-name3'"""
matches = [x.group(2) if x.group(1) else x.group(3) for x in re.finditer(QUOTED_GEM_NAME, s)]
print(matches)
# => ['Some-name1', 'some-name2', 'some-name3']
So, ([\"'])({0})\1|%q<({0})> has got 3 capturing groups: if Group 1 matches, the first alternative got matched, thus, Group 2 is taken, else, the second alternative matched, and Group 3 value is grabbed in the comprehension.
Pattern details
([\"']) - Group 1: a " or '
({0}) - Group 2: GEM_NAME pattern
\1 - inline backreference to the Group 1 captured value (note that r'...' raw string literal allows using a single backslash to define a backreference in the string literal)
| - or
%q< - a literal substring
({0}) - Group 3: GEM_NAME pattern
> - a literal >.
You can rewrite your pattern like this:
GEM_NAME = r'[a-zA-Z0-9_.-]+'
QUOTED_GEM_NAME = r'''["'%] # first possible character
(?:(?<=%)q<)? # if preceded by a % match "q<"
(?P<name> # the three possibilities excluding the delimiters
(?<=") {0} (?=") |
(?<=') {0} (?=') |
(?<=<) {0} (?=>)
)
["'>] #'"# closing delimiter
(?x) # switch the verbose mode on for all the pattern
'''.format(GEM_NAME)
demo
Advantages:
the pattern doesn't start with an alternation that makes the search slow. (the alternation here is only tested at interesting positions after a quote or a %, when your version tests each branch of the alternation for each position in the string). This optimisation technique is called "the first character discrimination" and consists to quickly discard useless positions in a string.
you need only one capture group occurrence (quotes and angle brackets are excluded from it and only tested with lookarounds). This way you can use re.findall to get a list of gems without further manipulation.
the gq group wasn't useful and was removed (shorten a pattern at the cost of creating a useless capture group isn't a good idea)
Note that you don't need to escape the dot inside a character class.
A simple way is to use a conditional and consolidate the name.
(?:(?:(["'])|%q<)(?P<name>[a-zA-Z0-9\-_\.]+)(?(1)\1|>))
Expanded
(?:
(?: # Delimiters
( ["'] ) # (1), ' or "
| # or,
%q< # %q
)
(?P<name> [a-zA-Z0-9\-_\.]+ ) # (2), Name
(?(1) \1 | > ) # Did group 1 match ? match it here, else >
)
Python
import re
s = ' "asdf" %q<asdfasdf> '
print ( re.findall( r'(?:(?:(["\'])|%q<)(?P<name>[a-zA-Z0-9\-_\.]+)(?(1)\1|>))', s ) )
Output
[('"', 'asdf'), ('', 'asdfasdf')]
I am currently working on a ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter the content or at least, to put the content into an array, but I have tried for an hour to come up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention, that my program will split the term, I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
Expressions = new stack
Add new array on stack
while word in string:
if word is "(": Add new array on stack
Else if word is ")": Remove the last array from the stack and add it to the (next) last array of the stack
Else: Add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
Note: If your ultimate goal is to evaluate the expression, you could save time and parse the string in Postfix aka Reverse-Polish Notation.
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
\(\s* # match a left parenthesis followed by >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
(\+) # match a plus sign in a capture group
\s* # match >= 0 whitespace chars
(\d+) # match one or more digits in a capture group
\s* # match >= 0 whitespace chars
\) # match a right parenthesis
/x
str.scan(r0).first
=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
Incidentally, you received the error message, "Invalid pattern in look-behind" because Ruby's lookarounds cannot contain variable-length matches (i.e., .*). With positive lookbehinds you can get around that by using \K instead. For example,
r = /
\d+ # match one or more digits
\K # forget everything previously matched
[a-z]+ # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"
I don't think I'll even try to explain this, I don't know the words to, but I'd like to achieve the following:
Given a string like this:
+++>><<<--
I'd like a match to give me: +++, but also match if any of the other characters were in the string consecutively like they are. So if the +++ wasn't there, I'd like to match >>.
I tried using the following regular expression:
([><\-\+]+)
However, given the string above, it would match the entire string, and not the first list of consecutive characters.
If it makes a difference, this is in Ruby (1.9.3).
Not sure about the ruby bit, but you can do this with backreferences in the pattern:
(.)\1+
What this does is to use a capturing group () to capture any character . followed by any number + of the same character \1. The \1 is a backreference to the the first captured group; in a pattern with more capturing groups \2 would be the second captured group and so on.
Java Example
Pattern p = Pattern.compile("(.)\\1+");
Matcher m = p.matcher("aaabbccaa");
m.find();
System.out.println(m.group(0)); // prints "aaa"
Ruby Example
# Return an array of matched patterns.
string = '+++>><<<--'
string.scan( /((.)\2+)/ ).collect { |match| match.first }
Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should gives me an array like this:
[910, -6.2580000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me ?
It would be great if the answer could clarify the use of the parenthesis in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non digit character.
On second thought, here's a more generic solution that doesn't need to deal with regular expressions:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The parenthesis have different meanings.
[] defines a character class, that means one character is matched that is part of this class
() is defining a capturing group, the string that is matched by this part in brackets is put into a variable.
You did not define any anchors so your pattern will match your second string
blabla9999 some more text 1.1
^^^^ here ^^^ and here
Maybe this is more what you wanted
^(\s*-?\d+(?:\.\d+)?\s*)+$
See it here on Regexr
^ anchors the pattern to the start of the string and $ to the end.
it allows Whitespace \s before and after the number and an optional fraction part (?:\.\d+)? This kind of pattern will be matched at least once.
maybe /(-?\d+(.\d+)?)+/
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map do |ns|
ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]