I want to apply a sequence of gsubs to one string, so I used the fact that gsub can take a hash as its second argument. One of the substitutions should convert a run of one or more spaces/tabs into a single space, so I have something essentially as follows:
gsub(/[ \t]+/, {/[ \t]+/ => ' '})
In my actual code, the first argument is a union of the regexp I gave here, and the second argument includes more key-value pairs.
Now, when I apply this to a string, all of the spaces/tabs are deleted. I suppose this is because the text matched by the first argument is not regarded as matching the key /[ \t]+/ in the second argument (the hash). Does the hash lookup only look for an exact string match, not a regexp match? If so, is there any way to get around it?
This is a related question. If you need to use the hash because many things have to be substituted, this might work:
list = Hash.new{|h,k|if /\s+/ =~ k then ' ' else k end}
list['foo'] = 'bar'
list['apple'] = 'banana'
p "appleabc\t \tabc apple foo".gsub(/\w+|\W+/,list)
#=> "appleabc abc banana bar"
p list
#=>{"foo"=>"bar", "apple"=>"banana"} no garbage
According to the docs, gsub with a hash as the second parameter only matches against literal strings:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
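To see why the regexp key is never hit, here is a minimal sketch (the sample string is made up for illustration). gsub looks up the matched text in the hash as an ordinary String key, and a missing key yields an empty replacement:

```ruby
text = "a \t b"

# The matched text is " \t " (a String). A Regexp key never equals it,
# so the lookup returns nil and the match is replaced by "".
with_regexp_key = text.gsub(/[ \t]+/, { /[ \t]+/ => ' ' })

# A String key that equals the matched text exactly is found.
with_string_key = text.gsub(/[ \t]+/, { " \t " => ' ' })

p with_regexp_key  # "ab"
p with_string_key  # "a b"
```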
If you need regexp keys, you can work around this by building a hash whose key/value pairs are the search => replacement pairs, iterating over the hash, and passing each pair to gsub. Because Ruby 1.9+ maintains the insertion order of the hash, the substitutions are guaranteed to run in the order you define them.
search_hash = {
'1' => 'one',
'too' => 'two',
/[\t ]+/ => ' '
}
str = "1, too,\t3 , four"
search_hash.each { |n,v| str.gsub!(n, v) }
str #=> "one, two, 3 , four"
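One thing to keep in mind with this loop: each gsub! sees the output of the previous one, so a later pair can rewrite text that an earlier pair produced. A small sketch of the difference from a single-pass substitution (the pairs are invented for illustration):

```ruby
pairs = { 'a' => 'b', 'b' => 'c' }

# Sequential passes: "ab" -> "bb" (first pair) -> "cc" (second pair).
sequential = 'ab'.dup
pairs.each { |from, to| sequential.gsub!(from, to) }

# Single pass: every match is looked up once in the hash.
# Regexp.union builds /a|b/ from the literal keys.
single_pass = 'ab'.gsub(Regexp.union(pairs.keys), pairs)

p sequential   # "cc"
p single_pass  # "bc"
```

If the pairs never overlap, as in the example above, both approaches give the same result.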
If you just want the spaces/tabs replaced with one space, why not just specify that as the replacement, and omit the whole hash?
gsub(/[ \t]+/, ' ')
UPDATE: based on your comment, you can use the block syntax of gsub; the block's return value becomes the replacement for each match:
gsub(/[ \t]+/) { |match| ' ' } # compute the replacement from match here
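As a rough sketch of how the block form could combine a literal lookup table with the whitespace rule (the table contents and the sample string are assumptions for illustration, not taken from your code):

```ruby
replacements = { 'foo' => 'bar', 'apple' => 'banana' }
text = "apple\t \tfoo  pear"

result = text.gsub(/[ \t]+|\w+/) do |match|
  if match.match?(/\A[ \t]+\z/)
    ' '                                # any run of spaces/tabs becomes one space
  else
    replacements.fetch(match, match)   # unknown words are left unchanged
  end
end

p result  # "banana bar pear"
```

String#match? requires Ruby 2.4 or later; on older versions `match =~ /\A[ \t]+\z/` does the same job.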
I have a string like "a_b_c" or "a_b_c_d" or "a_b_c_d_e". I want to split the string at the last underscore.
**input**
'a_b_c'
**output**
a_b
c
**input**
'a_b_c_d'
**output**
a_b_c
d
I have done the following:
a='a_b_c'
a=a.split('_')
last=a.pop
a.delete(last)
p a.join("_")
p last
and achieved the result, but I don't think this should be done this way. I hope there is some regular expression to achieve this. Is there anyone who can help me with this?
You can use String#rpartition, which searches for a given pattern from the right end of the string and splits at the first occurrence it finds.
'a_b_c_d_e'.rpartition(/_/)
=> ["a_b_c_d", "_", "e"]
s = 'a_b_c_d_e'
parts = s.rpartition(/_/)
[parts.first, parts.last]
=> ["a_b_c_d", "e"]
EDIT: applying advice from the comments:
'a_b_c_d_e'.rpartition('_').values_at(0,2)
=> ["a_b_c_d", "e"]
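When the separator is missing, rpartition returns two empty strings followed by the whole string, so values_at(0, 2) would give ["", "whole string"]. A small sketch of a wrapper that handles this case (the method name split_at_last is made up for illustration):

```ruby
# Hypothetical helper: split at the last occurrence of sep.
# Returns [head, tail], or [str] when sep does not occur.
def split_at_last(str, sep = '_')
  head, found, tail = str.rpartition(sep)
  found.empty? ? [str] : [head, tail]
end

p split_at_last('a_b_c')      # ["a_b", "c"]
p split_at_last('a_b_c_d_e')  # ["a_b_c_d", "e"]
p split_at_last('abc')        # ["abc"]
```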
Do you really need to split? How about just replacing the _ with a space? e.g. using rindex and []=
a[a.rindex('_')] = ' '
I didn't do a benchmark, but split creates a new array, which typically requires more resources, at least in other languages.
EDIT: as the question was edited, it's now clear the OP is asking for a list instead of a string output.
You can also get values as below,
> a = a.split('_')
> a[0..-2].join('_')
# => "a_b_c_d"
> a[-1]
# => "e"
'a_b_c_d_e'.split /_(?!.*_)/
#=> ["a_b_c_d", "e"]
The negative lookahead (?!.*_) requires that following the match of the underscore there is no other underscore in the string.
Split it with regex:
a.split(/_(?=[^_]+$)/)
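Explanation:
_ matches a literal underscore, but only where the positive lookahead (?=[^_]+$) succeeds.
[^_]+ matches one or more characters that are not underscores, and
$ asserts the position at the end of the string, or before the line terminator right at the end of the string (if any). Together they mean "this underscore is followed only by non-underscore characters up to the end", i.e. it is the last underscore.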
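A quick side-by-side sketch of the two lookahead patterns on the inputs from the question, plus one string without any underscore (that last input is an added assumption to show the fallback):

```ruby
negative = /_(?!.*_)/     # underscore with no later underscore
positive = /_(?=[^_]+$)/  # underscore followed only by non-underscores

samples = ['a_b_c', 'a_b_c_d', 'a_b_c_d_e', 'abc']
results = samples.map { |s| [s.split(negative), s.split(positive)] }

results.each { |pair| p pair }
# Both patterns give ["a_b", "c"], ["a_b_c", "d"], ["a_b_c_d", "e"] and ["abc"].
```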
Assuming you know the string always follows this format, with a single character after the last underscore:
str = 'a_b_c_d_e'
# Remainder
str[0...-2] # -> 'a_b_c_d'
# Last symbol
str[-1] # -> 'e'
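If the last part can be longer than one character, the same slicing idea still works once the position of the last underscore is looked up with rindex. A minimal sketch (the method name head_and_tail is hypothetical):

```ruby
# Hypothetical helper: slice around the last underscore.
def head_and_tail(str)
  i = str.rindex('_')            # index of the last underscore, nil if none
  return [str, nil] if i.nil?
  [str[0...i], str[(i + 1)..-1]]
end

p head_and_tail('a_b_c_de')  # ["a_b_c", "de"]
p head_and_tail('abc')       # ["abc", nil]
```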
I'm doing a 'morse code' exercise and running into some difficulty. I'll skip posting the hash I created that stores the code and the letter.
The morse code in the method call has 3 spaces between the 'words'. Example:
decodeMorse('.... . -.--   .--- ..- -.. .')
My strategy was to split the words first using split(/\s\s\s/) which gives me separate arrays for each word, but then those arrays need a split(' ') to get to the letters.
This is my code -
sc = str.split(/\s\s\s/)
sc.each do |string|
string.split(' ').map { |key| morsecode[key] }
end
It works okay, but I'm left with two arrays at the end:
=> ["h", "e", "y"]
=> ["j", "u", "d", "e"]
Normally, if I had two or more arrays that had assigned variable names I would know how to concat them but what I've tried and searched on hasn't changed the situation. Obviously all I get from join('') is the two words together with no space between them.
There is no need to convert the string to an array, then convert the elements of the array to arrays, join the latter arrays then join the former array. Instead one can simply use the form of String#gsub that employs a hash to make substitutions.
morsecode = { ".-"=>"a", "-..."=>"b", "-.-."=>"c", "-.."=>"d", "."=>"e", "..-."=>"f",
"--."=>"g", "...."=>"h", ".."=>"i", ".---"=>"j", "-.-"=>"k", ".-.."=>"l",
"--"=>"m", "-."=>"n", "---"=>"o", ".--."=>"p", "--.-"=>"q", ".-."=>"r",
"..."=>"s", "-"=>"t", "..-"=>"u", "...-"=>"v", ".--"=>"w", "-..-"=>"x",
"-.--"=>"y", "--.."=>"z", "   "=>" ", " "=>""}
Notice the last two key-value pairs in morsecode: three spaces map to a single space, and one space maps to the empty string.
'.... . -.--   .--- ..- -.. .'.gsub(/[.-]+|   | /, morsecode)
#=> "hey jude"
The regular expression reads, "match one or more dits and dahs or three spaces or one space". Note that three spaces must precede the single space in the regex.
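A small sketch of why the order of the alternatives matters. The table below is deliberately partial (only the letters needed for this message), and the three-space gap is written as `' ' * 3` and ` {3}` purely so the spaces are easy to count; both spellings are equivalent to the literal form above:

```ruby
# Partial table for illustration only: just the letters in "hey jude".
morse = {
  '....' => 'h', '.' => 'e', '-.--' => 'y',
  '.---' => 'j', '..-' => 'u', '-..' => 'd',
  ' ' * 3 => ' ',  # word gap: three spaces become one space
  ' ' => ''        # letter gap: a single space is dropped
}
code = ['.... . -.--', '.--- ..- -.. .'].join(' ' * 3)

# Three spaces tried first: the word gap is matched as one unit.
three_first = code.gsub(/[.-]+| {3}| /, morse)

# One space tried first: the word gap is consumed as three letter gaps.
one_first = code.gsub(/[.-]+| | {3}/, morse)

p three_first  # "hey jude"
p one_first    # "heyjude"
```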
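Staying closer to your code, use map instead of each so that the decoded words are kept, then join them with a space:
sc = str.split(/\s\s\s/)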
deciphered = sc.map do |string|
string.split(' ').map {|key| morsecode[key]; }.join
end
deciphered.join(' ')
I have a task that is hard for me. I have a string that needs to be parsed into an array and some other elements. I have trouble with regexps, so I'd like to ask for help.
I need to delete all non-digit characters from the string, except commas (,) and dashes (-).
For example:
"!1,2e,3,6..-10" => "1,2,3,6-10"
"ffff5-10...." => "5-10"
"1.2,15" => "12,15"
and so on.
[^0-9,-]+
This should do it for you. Replace the matches with the empty string. See the demo:
https://regex101.com/r/vV1wW6/44
We must have at least one non-regex solution:
def keep_some(str, keepers)
str.delete(str.delete(keepers))
end
keep_some("!1,2e,3,6..-10", "0123456789,-")
#=> "1,2,3,6-10"
keep_some("ffff5-10....", "0123456789,-")
#=> "5-10"
keep_some("1.2,15", "0123456789,-")
#=> "12,15"
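Another non-regex variant, sketched here as an alternative to the double delete: String#delete accepts a negated selector (a leading ^) and ranges, so the characters to keep can be listed directly. The hyphen is backslash-escaped so that it is read as a literal character rather than as a range marker:

```ruby
# '^0-9,\-' means: delete every character that is NOT a digit, comma or hyphen.
inputs = ["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"]
cleaned = inputs.map { |s| s.delete('^0-9,\-') }

p cleaned  # ["1,2,3,6-10", "5-10", "12,15"]
```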
"!1,2e,3,6..-10".gsub(/[^\d,-]+/, '') # => "1,2,3,6-10"
Use String#gsub with a pattern that matches everything except what you want to keep, and replace it with the empty string. In a regular expression, the negated character class [^whatever] matches any character except those in "whatever", so this works:
a_string.gsub /[^0-9,-]/, ''
Note that the hyphen has to come last (or first, or be escaped with a backslash), as otherwise it will be interpreted as a range indicator.
To demonstrate, I put all your "before" strings into an Array and used Enumerable#map to run the above gsub call on all of them, producing an Array of the "after" strings:
["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"].map { |s| s.gsub /[^0-9,-]/, '' }
# => ["1,2,3,6-10", "5-10", "12,15"]
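A short sketch of the hyphen rule in practice. The first three classes treat the hyphen as a literal; the last one is a deliberately wrong pattern, added for illustration, in which the hyphen forms the range from "+" to "9" (which happens to include "," "-" "." and "/"):

```ruby
input = "1.2,15"

literal_last    = input.gsub(/[^0-9,-]/, '')   # hyphen last: literal
literal_first   = input.gsub(/[^-0-9,]/, '')   # hyphen first: literal
literal_escaped = input.gsub(/[^0-9\-,]/, '')  # escaped hyphen: literal anywhere
range_pitfall   = input.gsub(/[^+-9]/, '')     # "+-9" is a range, so "." survives

p literal_last     # "12,15"
p literal_first    # "12,15"
p literal_escaped  # "12,15"
p range_pitfall    # "1.2,15"
```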
I have a string composed of words separated by '#', for instance 'this#is#an#example', and I need to extract the last word or the last two words, depending on the second-to-last word.
If the second to last is 'myword' I need the last two words otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second to last? perhaps using regular expression?
Thanks.
You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
*, a, b = str.split('#')
if a == 'myword'
"#{a}##{b}"
else
b
end
end
str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"
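The two regex answers agree on the examples from the question but differ at the edges. A small sketch comparing them on two extra inputs (both inputs are made up for illustration): a string with a repeated 'myword', and a string with no '#' at all.

```ruby
optional = /(?:#)((myword#)?[^#]+)$/  # needs a leading '#', at most one 'myword#'
starred  = /(myword\#)*\w+$/          # no leading '#', any number of 'myword#'

repeated = 'a#myword#myword#x'
single   = 'solo'

p repeated[optional, 1]  # "myword#x"
p repeated[starred]      # "myword#myword#x"
p single[optional, 1]    # nil
p single[starred]        # "solo"
```

Note also that \w+ only covers word characters, while [^#]+ accepts anything except '#', so the two patterns can differ on a last word that contains, say, a hyphen.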
Let's say I have the following array:
arr = ["", "2121", "8", "myString"]
I want to return false in case the array contains any non-digit symbols.
arr.all? { |s| s =~ /\A\d+\z/ }
This checks that each element consists only of digits (\d), anchored to the whole string with \A and \z. If any element does not, false is returned.
Edit: You didn't specify whether the empty string is valid. If it is, the line has to be rewritten as follows (as per DarkDust):
arr.all? { |s| s =~ /\A\d*\z/ }
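A note on anchors, with a small sketch: in Ruby, ^ and $ match at line boundaries, while \A and \z match only at the start and end of the whole string. A multi-line string (invented here for illustration) can therefore pass a ^...$ check even though it contains non-digits:

```ruby
tricky = "12\nabc"

line_anchored   = !!(tricky =~ /^\d+$/)     # true: the first line "12" matches
string_anchored = !!(tricky =~ /\A\d+\z/)   # false: the whole string must be digits

arr = ["", "2121", "8", "myString"]
strict  = arr.all? { |s| s.match?(/\A\d+\z/) }  # empty string not allowed
lenient = arr.all? { |s| s.match?(/\A\d*\z/) }  # empty string allowed

p line_anchored    # true
p string_anchored  # false
p strict           # false
p lenient          # false ("myString" still fails)
```

String#match? requires Ruby 2.4 or later and returns a plain true/false.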
If empty strings are allowed:
def contains_non_digit(array)
!array.select {|s| s =~ /^.*[^0-9].*$/}.empty?
end
Explanation: this filters the array for all strings that match a regular expression. This regex is true for a string that contains at least one non-digit character. If the resulting array is empty, the array contains no non-digit strings. Finally, we need to negate the result, because we want to know the array does contain non-digit strings.
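The same check can be sketched more directly with any?, which stops at the first offending element instead of building an intermediate array. The method name below is a hypothetical variant of the one above:

```ruby
# Hypothetical variant: true if any element contains a non-digit character.
# Empty strings contain no characters at all, so they count as valid here.
def contains_non_digit?(array)
  array.any? { |s| s.match?(/\D/) }
end

p contains_non_digit?(["", "2121", "8", "myString"])  # true
p contains_non_digit?(["", "2121", "8"])              # false
```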