Ruby - regex for formatting string representing time left

I have various strings that represent time left coming in from a data feed. The formats look like this:
13:35
01:36
00:34
I want to use regex to change the formats to:
13:35 --> 13:35 (ok as-is)
01:36 --> 1:36 (removing leading 0)
00:34 --> 0:34 (remove first leading 0)
Currently, I'm doing this:
time_left.gsub(/\A0+/, '')
This handles the first two target formats, but not the third, which comes out as:
:34 (should be 0:34)

Your regex /\A0+/ removes all leading 0s, but it sounds like you just want to remove the first one. You just want /\A0/.

A regex is not the best choice for this. I'd split on the colon and re-format the numbers instead:
puts %w[
  13:35
  01:36
  00:34
].map { |s|
  # %d drops the leading zero on the minutes; %02d keeps the seconds two digits wide
  "%d:%02d" % s.split(':').map(&:to_i)
}
Which outputs:
13:35
1:36
0:34
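Wrapped in a method, the same idea might look like the sketch below. The name format_time_left is made up for illustration, and the seconds are padded to two digits on the assumption that a value such as 01:05 should display as 1:05 rather than 1:5:

```ruby
# Hypothetical helper: parse "MM:SS" and re-format it without a leading
# zero on the minutes, keeping the seconds two digits wide.
def format_time_left(time_left)
  minutes, seconds = time_left.split(':').map(&:to_i)
  format('%d:%02d', minutes, seconds)
end

%w[13:35 01:36 00:34 01:05].each do |t|
  puts "#{t} --> #{format_time_left(t)}"
end
```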

Just take out the +, which means "one or more". With it, the pattern consumes both zeros when the string starts with two.
time_left.gsub(/\A0/, '')
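A quick side-by-side check of the two patterns on the three sample values from the question (the variable names are just for illustration):

```ruby
samples = %w[13:35 01:36 00:34]

# /\A0+/ strips every leading zero; /\A0/ strips at most one.
strip_all = samples.map { |t| t.gsub(/\A0+/, '') }
strip_one = samples.map { |t| t.gsub(/\A0/, '') }

p strip_all
p strip_one
```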

If each time is in a separate string and you apply the regex to each one individually, then you should use sub rather than gsub:
time_left.sub(/\A0/, "")
If it is rather the case that all the times are in a single string, then you cannot use \A. Use a negative lookbehind instead:
time_left.gsub(/(?<!\d)0/, "")
The second one will also remove zeros after the colon.
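To make that side effect concrete, here is a small sketch on a made-up feed string holding several times. The stricter pattern (?<![\d:])0 is my own assumption about how one might keep the zero after the colon; it is not part of the answer above:

```ruby
feed = "13:35 01:36 00:34 10:05"

# Zero not preceded by a digit: this also strips the zero after a colon.
loose = feed.gsub(/(?<!\d)0/, '')

# Assumed variant: additionally refuse zeros that directly follow a colon.
strict = feed.gsub(/(?<![\d:])0/, '')

puts loose
puts strict
```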

How about using a negative look-ahead:
/\A0+(?!:)/
( Tested in Perl )
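The same pattern can be tried in Ruby. A minimal sketch using the three sample values from the question; the + backtracks to a single zero when the next character would otherwise be the colon:

```ruby
pattern = /\A0+(?!:)/

# Strip leading zeros, but never the zero that sits right before the colon.
results = %w[13:35 01:36 00:34].map { |t| t.sub(pattern, '') }

p results
```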

Related

Regular Expression replacement to convert Less mixins to Scss

I'm looking to convert Less mixin calls to their equivalents in Scss:
.mixin(); should become #mixin();
.mixin(0); should become #mixin(0);
.mixin(0; 1; 2); should become #mixin(0, 1, 2);
I'm having the most difficulty with the third example, as I essentially need to match n groups separated by semicolons, and replace those with the same groups separated by commas. I suppose this relies on some sort of repeating groups functionality in regexes that I'm not familiar with.
It's not enough to simply replace semicolons within parens. I need a regex that will only match the \.[\w\-]+\(.*\) format of mixins, but obviously with some magic in the second match group to handle the 3rd example above.
I'm doing this in Ruby, so if you're able to provide replacement syntax that's compatible with gsub, that would be awesome. I would like a single regex replacement, something that doesn't require multiple passes to clean up the semicolons.
I suggest adding two capturing groups round the subvalues you need and using an additional gsub in the first gsub block to replace the ; with , only in the 2nd group.
See
s = ".mixin(0; 1; 2);"
puts s.gsub(/\.([\w\-]+)(\(.*\))/) { "##{$1}#{$2.gsub(/;/, ',')}" }
# => #mixin(0, 1, 2);
The pattern details:
\. - a literal dot
([\w\-]+) - Group 1 capturing 1 or more word chars ([a-zA-Z0-9_]) or -
(\(.*\)) - Group 2 capturing a (, then any 0+ chars other than linebreak symbols as many as possible up to the last ) and the last ). NOTE: if there are multiple values, use lazy matching - (\(.*?\)) - here.
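To see why the lazy form matters when one line holds more than one mixin call, here is a small sketch. The method names convert_greedy and convert_lazy and the sample line are made up for illustration, and tr stands in for the inner gsub:

```ruby
# Greedy group 2: runs from the first "(" to the LAST ")" on the line.
def convert_greedy(src)
  src.gsub(/\.([\w\-]+)(\(.*\))/) do
    name, args = $1, $2
    "##{name}#{args.tr(';', ',')}"
  end
end

# Lazy group 2: stops at the first ")" after the opening "(".
def convert_lazy(src)
  src.gsub(/\.([\w\-]+)(\(.*?\))/) do
    name, args = $1, $2
    "##{name}#{args.tr(';', ',')}"
  end
end

line = ".a(0; 1); .b(2; 3);"
puts convert_greedy(line)
puts convert_lazy(line)
```

With the greedy pattern the semicolon between the two calls is swallowed into group 2 and turned into a comma, and the second call is never converted.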
Here you go:
less_style = ".mixin(0; 1; 2);"
# convert the first period to #
less_style.gsub!(/^\./, '#')
# convert the inner semicolons to commas
scss_style = less_style.gsub(/(?<=[\(\d]);/, ',')
scss_style
# => "#mixin(0, 1, 2);"
The second regex is using positive lookbehinds. You can read about those here: http://www.regular-expressions.info/lookaround.html
I also use this neat web app to play around with regexes: http://rubular.com/
This will get you a single pass through gsub:
".mixin(0; 1; 2);".gsub(/(?<!\));|\./, ";" => ",", "." => "#")
# => "#mixin(0, 1, 2);"
It's an OR regex with a hash for the replacement parameters.
Assuming from your example that you just want to replace semicolons that do not follow a closing paren, the negative lookbehind for that is (?<!\));
You can modify/build on this with other expressions. Even add more OR conditions to the regex.
Also, you can use the block version of gsub if you need more options.
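One way of building on it, as a sketch: the bare \. alternative replaces every dot, including a decimal point inside the arguments. Anchoring that alternative with \A is my own assumption about how to limit it to the leading dot of a single mixin call; the sample input is made up:

```ruby
replacements = { ';' => ',', '.' => '#' }
input = ".mixin(0.5; 1);"

# Every dot is looked up in the hash, so the decimal point changes too.
naive = input.gsub(/(?<!\));|\./, replacements)

# Assumed refinement: only a dot at the very start of the string matches.
anchored = input.gsub(/(?<!\));|\A\./, replacements)

puts naive
puts anchored
```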

Removing all whitespace from a string in Ruby

How can I remove all newlines and spaces from a string in Ruby?
For example, if we have a string:
"123\n12312313\n\n123 1231 1231 1"
It should become this:
"12312312313123123112311"
That is, all whitespaces should be removed.
You can use something like:
var_name.gsub!(/\s+/, '')
Or, if you want to return the changed string, instead of modifying the variable,
var_name.gsub(/\s+/, '')
This will also let you chain it with other methods (e.g. something_else = var_name.gsub(...).to_i to strip the whitespace and then convert the result to an integer). gsub! edits the string in place, so you'd have to write var_name.gsub!(...); something_else = var_name.to_i. Strictly speaking, as long as at least one change is made, gsub! returns the new version (i.e. the same thing gsub would return), but if the string happens to contain no whitespace, it returns nil and things will break. Because of that, I'd prefer gsub if you're chaining methods.
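A small sketch of that pitfall, using a made-up string that contains no whitespace. Note that with to_i the failure is silent, because nil.to_i is 0:

```ruby
clean = "12345"

# gsub always returns a string, changed or not.
copy = clean.gsub(/\s+/, '')

# gsub! returns nil when nothing was replaced.
in_place = clean.dup.gsub!(/\s+/, '')

# Chaining on that nil quietly gives the wrong number here.
chained_bang = clean.dup.gsub!(/\s+/, '').to_i
chained_safe = clean.gsub(/\s+/, '').to_i

p copy, in_place, chained_bang, chained_safe
```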
gsub works by replacing any matches of the first argument with the contents of the second argument. In this case, it matches any sequence of consecutive whitespace characters (or just a single one) with the regex /\s+/, then replaces those with an empty string. There's also a block form if you want to do some processing on the matched part, rather than just replacing directly; see String#gsub for more information about that.
The Ruby docs for the class Regexp are a good starting point to learn more about regular expressions -- I've found that they're useful in a wide variety of situations where a couple of milliseconds here or there don't count and you don't need to match things that can be nested arbitrarily deeply.
As Gene suggested in his comment, you could also use tr:
var_name.tr(" \t\r\n", '')
It works in a similar way, but instead of matching a regex, it replaces every instance of the nth character of the first argument in the string it's called on with the nth character of the second argument. If the second argument is shorter, its last character is reused for the remaining characters, and if it is empty, as here, the matching characters are simply deleted. See String#tr for more information.
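A few sample calls may make the character mapping easier to picture. The strings are made up for illustration:

```ruby
# Empty second argument: every listed character is deleted.
stripped = "1 2\t3\n4".tr(" \t\r\n", '')

# Same-length arguments: e -> i, l -> p, one to one.
one_to_one = "hello".tr('el', 'ip')

# Shorter second argument: its last character is reused, so e, l and o
# all become x.
padded = "hello".tr('elo', 'x')

p stripped, one_to_one, padded
```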
You could also use String#delete:
str = "123\n12312313\n\n123 1231 1231 1"
str.delete "\s\n"
#=> "12312312313123123112311"
You could use String#delete! to modify str in place, but note that delete! returns nil if no change is made.
Alternatively you could scan the string for digits /\d+/ and join the result:
string = "123\n\n12312313\n\n123 1231 1231 1\n"
string.scan(/\d+/).join
#=> "12312312313123123112311"
Please note that this would also remove alphabetical characters, dashes, symbols, basically everything that is not a digit.
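A quick comparison on a string that contains more than digits shows the difference. The sample string is made up for illustration:

```ruby
mixed = "ab 12\ncd-34"

# scan(/\d+/) keeps only the digit runs.
digits_only = mixed.scan(/\d+/).join

# gsub(/\s+/, '') removes whitespace and leaves everything else alone.
no_whitespace = mixed.gsub(/\s+/, '')

p digits_only, no_whitespace
```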

Regex for series of four digits each up to 100

I'm trying to write a regex that validates a string and accepts only a series of four comma-separated numbers, each up to 100. Something like this would be valid:
20,30,40,50
and these invalid:
120,0,20,0
20,30,40,ss
invalid_string
Any thoughts?
They're used for CMYK colours. We just need to store them here, not use them.
Number Range and Subroutine
In Ruby 2+, for a compact regex, use this:
^([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}$
Explanation
The ^ anchor asserts that we are at the beginning of a line (in Ruby, \A anchors to the beginning of the whole string)
The parentheses around ([0-9]|[1-9][0-9]|100) match a number from 0 to 100 and define subroutine #1
(?:,\g<1>) matches one comma and the expression defined by subroutine #1
The {3} quantifier repeats that three times
The $ anchor asserts that we are at the end of a line (in Ruby, \z anchors to the end of the whole string)
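Here is a sketch that runs the pattern against the examples from the question. The \A...\z variant and the multi-line input are my own additions, there to show the one case where the choice of anchors changes the result in Ruby:

```ruby
line_anchored   = /^([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}$/
string_anchored = /\A([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}\z/

inputs = ["20,30,40,50", "100,0,0,100", "120,0,20,0", "20,30,40,ss", "invalid_string"]
inputs.each { |s| puts "#{s.inspect}: #{string_anchored.match?(s)}" }

# A trailing second line slips past ^...$ but not past \A...\z.
multiline = "20,30,40,50\nanything"
puts line_anchored.match?(multiline)
puts string_anchored.match?(multiline)
```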
I'd save myself the headache of using regex for a number-related problem. Also, the validation message will look awkward, so it's better to write your own:
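validate :that_string_has_only_4_numbers_upto_100
def that_string_has_only_4_numbers_upto_100
  # Require exactly four fields, each a plain integer from 0 to 100.
  # The range needs its own parentheses: 1..100 === x parses as 1..(100 === x).
  parts = str.split(',', -1)
  valid = parts.size == 4 && parts.all? { |n| n.to_i.to_s == n && (0..100).cover?(n.to_i) }
  errors.add(:str, 'is not valid.') unless valid
end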
Unless you are a regex Jedi guru like @zx81 :p.
Try this (note that \d{1,2} only covers 0 to 99, so 100 itself is rejected):
^(?:\d{1,2},){3}\d{1,2}$

Separate word Regex Ruby

I have a bunch of input files in a loop and I am extracting tags from them. However, I want to separate some of the words. The incoming strings are in the form cs###, where each # is a digit from 0-9. I want the result to be cs ###. The closest answer I found was this: Regex to separate Numeric from Alpha. But I cannot get it to work, as the string there is predefined (static) and mine changes.
Found answer:
Never mind, I found the answer. The following separates alphabetic and numeric characters and removes any unwanted non-alphanumeric characters, so anything like ab5#6$% => ab 56
gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ').gsub(/[^0-9a-z ]/i, '')
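Applied to a couple of sample strings, the two steps look like this. The method name separate is made up, and the sketch assumes the unwanted characters are dropped (replaced with an empty string) rather than turned into spaces:

```ruby
# Hypothetical wrapper around the two gsub calls.
def separate(tag)
  tag
    .gsub(/(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i, ' ') # space at each letter/digit boundary
    .gsub(/[^0-9a-z ]/i, '')                               # drop everything else except spaces
end

puts separate('cs123')
puts separate('ab5#6$%')
```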
If your string is something like
str = "cs3232
cs23
cs423"
Then you can do something like
str.scan(/((cs)(\d{1,10}))/m).collect{|e| e.shift; e }
# [["cs", "3232"], ["cs", "23"], ["cs", "423"]]
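If the goal is just the cs ### form from the question, a substitution with backreferences may be enough. A sketch, assuming each tag is its own string and always starts with cs:

```ruby
tags = %w[cs3232 cs23 cs423]

# \1 is the "cs" prefix, \2 the digits; a single space is put between them.
spaced = tags.map { |t| t.sub(/\A(cs)(\d+)\z/, '\1 \2') }

p spaced
```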

What's wrong with this RegEx?

I'm trying to implement this in a small Ruby script, and tested it on http://www.rubular.com/, where it worked perfectly. Not sure why it's not working in the actual script.
The RegEx: /(motion|links|sound|button|symbol)|(0.\d{8})|(\s\d{1}\s)|(\d{10}\s)/
The Text it's Against:
Trial ID: 1 | Trial Type: motion | Trick? 1
Click Time: 0.87913100 1302969732
Trial ID: 7 | Trial Type: button | Trick? 0
Click Time: 0.19817800 1302987043
etc. etc.
What I am trying to grab: Only the numbers, and the single word after "Trial Type". So for the first line of the example, I would only want " 1 motion 1 0.87913100 1302969732" to be returned. I also want to keep the space before the first number in each trial.
My short ruby script:
File.open('log.txt', 'r') do |file|
  contents = file.readlines.to_s
  regex = Regexp.new(/(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/)
  matchdata = regex.match(contents).to_a
  matchdata.each do |match|
    if match != nil
      puts match
    end
  end
end
It only outputs two "1"s though. Hmm... I know it's reading the file contents right, and when I tried an alternate, simpler regex it worked fine.
Thanks for any help I get here!! : )
You want to use String#scan
matchdata = contents.scan(regex)
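The difference is easy to see on the first record from the question. match stops at the first hit, and to_a returns that hit followed by one slot per capture group, which is where the two 1s come from; scan keeps going through the whole string. A sketch, with variable names of my own choosing:

```ruby
text  = "Trial ID: 1 | Trial Type: motion | Trick? 1\nClick Time: 0.87913100 1302969732\n"
regex = /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/

# match: the whole first match plus its four groups (three of them nil).
first_only = regex.match(text).to_a.compact

# scan: one array of groups per match; flatten and drop the nils.
all_matches = text.scan(regex).flatten.compact.map(&:strip)

p first_only
p all_matches
```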
Also @Mike Penington is correct: you shouldn't need the if match != nil check if you do it right. You have to clean up your regex as well. The pipe character is special in a regex (it means match the left side OR the right side), and your text contains literal pipe characters, which you must escape.
You need to escape the literal pipes inside the regex, fill in other missing literals (like Trick, \?, Click\sTime:, remove some of the spaces, etc...), and insert regex spaces where appropriate... i.e.
regex = Regexp.new(/(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/)
EDIT: fixed parenthesis nesting in the original
If you know that the data follows a particular pattern, you can just follow that pattern in the regex, and pick up the portions you want with ( ).
/Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/
The more you know previously about the data, the more specifically you can make the regex.
If you see some variations in the data, and the regex fails to match, then just relax the pattern:
If the Trial ID may include a decimal point, use [\.\d]+ instead of \d+.
If the space can be more than one, then replace it with [ ]+
If the space can be a tab, or can be absent, use \s* or [ \t]*.
If the Trial ID: part may appear as a different phrase, replace it with .*?,
and so on.
If you are not sure how many spaces/tabs appear, use this:
/Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/
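As a sketch of how that pattern could be used on the whole log at once: the heredoc below stands in for the file contents, with each record split over two lines as in the question, and scan returns one array of captures per record:

```ruby
log = <<~LOG
  Trial ID: 1 | Trial Type: motion | Trick? 1
  Click Time: 0.87913100 1302969732
  Trial ID: 7 | Trial Type: button | Trick? 0
  Click Time: 0.19817800 1302987043
LOG

pattern = /Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/

# \s* also matches the newline between "Trick? 1" and "Click Time:".
rows = log.scan(pattern)
rows.each { |row| puts row.join(' ') }
```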
This is one of those times when trying to do everything in a big regex makes you work too hard. Simplify things:
ary = [
  'Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732',
  'Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043'
]
ary.each do |li|
  numbers = li.scan(/[\d.]+/)
  trial_type = li[/Trial Type: (\w+)/, 1]
  puts "%d %s %d %f %d\n" % [numbers.first, trial_type, *numbers[1 .. -1]]
end
# >> 1 motion 1 0.879131 1302969732
# >> 7 button 0 0.198178 1302987043
Regex patterns are powerful, but people think it's macho to do everything in one big line. You have to weigh doing that with the increased work necessary to put together the regex in the first place, plus maintain it if something changes in the text being parsed later.
