What's different about this ruby regex? - ruby

I was trying to substitute either a comma or a percent sign, and it continually failed, so I opened up IRB and tried some things out. Can anyone explain to me why the first regex (IRB line 13) doesn't work but the flipped version does (IRB line 15)? I've looked it up and down and I don't see any typos, so it must be something to do with the rule but I can't see what.
b.gsub(/[%]*|[,]*/,"")
# => "245,324"
b.gsub(/[,]*/,"")
# => "245324"
b.gsub(/[,]*|[%]*/,"")
# => "245324"
b
# => "245,324"

Because ruby happily finds [%]* zero times throughout your string and does the substitution. Check out this result:
b = '232,000'
puts b.gsub(/[%]*/,"-")
--output:--
-2-3-2-,-0-0-0-
If you put all the characters that you want to erase into the same character class, then you will get the result you want:
b = "%245,324,000%"
puts b.gsub(/[%,]*/, '')
--output:--
245324000
Even then, there are a lot of needless substitutions going on:
b = "%245,324,000%"
puts b.gsub(/[%,]*/, '-')
--output:--
--2-4-5--3-2-4--0-0-0--
It's the zero or more that gets you into trouble because ruby can find lots of places where there are 0 percent signs or 0 commas. You actually don't want to do substitutions where ruby finds zero of your characters, instead you want to do substitutions where at least one of your characters occurs:
b = '%232,000,000%'
puts b.gsub(/%+|,+/,"")
--output:--
232000000
Or, equivalently:
puts b.gsub(/[%,]+/, '')
Also, note that regexes are like double quoted strings, so you can interpolate into them--it's as if the delimiters // are double quotes:
one_or_more_percents = '%+'
one_or_more_commas = ',+'
b = '%232,000,000%'
puts b.gsub(/#{one_or_more_percents}|#{one_or_more_commas}/,"")
--output:--
232000000
But when your regexes consist of single characters, just use a character class: [%,]+

Related

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ‘d’, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now where my output is coming out to me No duplicates 26 times. I think my if statement is wrongly written.
string "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first duplicate character:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
Below method might be useful to find the first word in a string
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by #DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero of more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Use regular expression to fetch 3 groups from string

This is my expected result.
Input a string and get three returned string.
I have no idea how to finish it with Regex in Ruby.
this is my roughly idea.
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2244, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ? right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "00"
puts my_match[4] #=> "2003"
A MatchData object contains an array of each match group starting at index [1]. As you can see, index [0] returns the entire string. If you don't want the capture the "_" you can leave it's parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
this basically says one or more of any single character followed by zero or one of any single character.

How to insert a newline character to an array of characters

I want to insert a newline character into an array of characters which initially is a string. Let's say I have a variable myvar = "Blizzard". A string is formed from an array of characters. How can I insert a newline character inside it? In hope of making an output like this:
"B
lizzard"
I tried this:
myvar[1] = "\n"
but it's not working, and the output is like this:
"B\nlizzard"
My goal is to make the output like this:
B
l
i
z
z
a
r
d
without using puts. I have to do it by inserting newline characters into the array. Can someone point out where my mistake is, and if possible help me with this?
To add \n you can use this:
myvar = "Blizzard"
myvar.chars.map { |c| c + "\n" }.join.strip
Or better #Uri solution:
myvar.chars.join "\n"
But you can puts letters one on the line with next code:
myvar.chars.each { |c| puts c }
or:
myvar.each_char { |c| puts c } # for ruby >= 2.0
by Darek Nędza
'Blizzard'.chars.join("\n")
# => "B\nl\ni\nz\nz\na\nr\nd"
If all you want is to print the characters each in a new row you can do the following:
puts 'Blizzard'.chars
Output:
B
l
i
z
z
a
r
d
You have done myvar[1] = "\n" correctly. Your problem is not how you did it, but what you are expecting.
You seem to be confusing the inspection of a string and the puts output of the string. Inspection is what is displayed as the return value as in irb, and it is a meta-representation of what you have. And as long as it is a string, it will be delimited by double quotes, and all the special characters will be escaped with a backslash \. If you have a new line character, that would be represented as "\n". On the other hand, when you pass the string to puts, you will get the output according to what the special characters represent.
What you displayed as what you want (the one in multiple lines) should be the result of puts. You will never get such thing as inspection of the string.

How to properly join LF character in an array?

Basic question.
Instead of adding '\n' between the elements:
>> puts "#{[1, 2, 3].join('\n')}"
1\n2\n3
I need to actually add the line feed character, so the output I expect when printing it would be:
1
2
3
What's the best way to do that in Ruby?
You need to use double quotes.
puts "#{[1, 2, 3].join("\n")}"
Note that you don't have to escape the double quotes because they're within the {} of a substitution, and thus will not be treated as delimiters for the outer string.
However, you also don't even need the #{} wrapper if that's all your doing - the following will work fine:
puts [1,2,3].join("\n")
Escaped characters can only be used in double quoted strings:
puts "#{[1, 2, 3].join("\n")}"
But since all you output is this one statement, I wouldn't quote the join statement:
puts [1, 2, 3].join("\n")
Note that join only adds line feeds between the lines. It will not add a line-feed to the end of the last line. If you need every line to end with a line-feed, then:
#!/usr/bin/ruby1.8
lines = %w(one two three)
s = lines.collect do |line|
line + "\n"
end.join
p s # => "one\ntwo\nthree\n"
print s # => one
# => two
# => three
Ruby doesn't interpret escape sequences in single-quoted strings.
You want to use double-quotes:
puts "#{[1, 2, 3].join(\"\n\")}"
NB: My syntax might be bad - I'm not a Ruby programmer.

Resources