Regular expression to treat a string as multiple lines in Ruby - ruby

I am new to Ruby. I am stuck at a point where data needs to match a pattern. I was wondering if there is a regular expression option that makes Ruby treat a string as multiple lines.

I think you are looking for the m option. m allows . to match a newline.
a = "this is my
string"
=> "this is my\nstring"
a
=> "this is my\nstring"
a.match /my.string/m
=> #<MatchData "my\nstring">
a.match /not my.string/m
=> nil
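As a minimal sketch of the difference (the variable names are only illustrative): without m the dot stops at the newline, while ^ and $ in Ruby already match at every line boundary without any option.

```ruby
text = "this is my\nstring"

# Without /m the dot does not match "\n", so there is no match.
without_m = text.match(/my.string/)

# With /m the dot also matches "\n".
with_m = text.match(/my.string/m)

# ^ and $ match at the start and end of every line, with or without /m.
line_starts = text.scan(/^\w+/)

p without_m
p with_m[0]
p line_starts
```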

Related

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1], '')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by 'd', instead of a digit.
That means that what you pass in, though it looks like a regex, isn't one. It's a string, because ARGV can only return strings: the command line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. The second can find /o/ though, and returns the result after removing the first "o" (sub replaces only the first match; gsub would remove both).
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
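One way to handle both forms of input is a small helper that drops a surrounding pair of slashes before building the regex. This is only a sketch: pattern_from_arg is a hypothetical name, not part of Ruby, and it assumes the argument carries no trailing flags such as /i.

```ruby
# Hypothetical helper: turn a command-line string into a Regexp,
# accepting either "o" or "/o/" as input.
def pattern_from_arg(arg)
  wrapped = arg.match(%r{\A/(.*)/\z}m)   # is the argument wrapped in slashes?
  Regexp.new(wrapped ? wrapped[1] : arg)
end

p pattern_from_arg('/o/')                 # built from "o"
p pattern_from_arg('o')                   # built from "o" as well
p 'foo'.sub(pattern_from_arg('/o/'), '')  # first "o" removed
```

In script.rb this would be called once, before the loop, with ARGV[1] as the argument.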
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].
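If the argument is meant as literal text rather than as a pattern, as with the [{"src":" fragment from the question, Regexp.escape can neutralize the metacharacters before the regex is built. A small sketch; the sample line below is made up for illustration and the literal stands in for ARGV[1]:

```ruby
literal = '[{"src":"'                       # stands in for ARGV[1]
line    = '[{"src":"http://something.com",' # made-up sample line

# Escape [ and { so they are matched as ordinary characters.
pattern  = Regexp.new(Regexp.escape(literal))
stripped = line.sub(pattern, '')

# Passing the String itself to sub also matches it literally.
same = line.sub(literal, '')

puts stripped
puts same
```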

Check the string with hash key

I am using Ruby 1.9.
I have a hash:
Hash_List={"ruby"=>"fun to learn","the rails"=>"It is a framework"}
I have a string like this:
test_string="I am learning the ruby by myself and also the rails."
I need to check if test_string contains words that match the keys of Hash_List. And if it does, replace the words with the matching hash value.
I used this code to check, but it is returning them empty:
another_hash=Hash_List.select{|key,value| key.include? test_string}
OK, hold onto your hat:
HASH_LIST = {
"ruby" => "fun to learn",
"the rails" => "It is a framework"
}
test_string = "I am learning the ruby by myself and also the rails."
keys_regex = /\b (?:#{Regexp.union(HASH_LIST.keys).source}) \b/x # => /\b (?:ruby|the\ rails) \b/x
test_string.gsub(keys_regex, HASH_LIST) # => "I am learning the fun to learn by myself and also It is a framework."
Ruby's got some great tricks up its sleeve, one of which is how we can throw a regular expression and a hash at gsub, and it'll search for every match of the regular expression, look up the matching "hits" as keys in the hash, and substitute the values back into the string:
gsub(pattern, hash) → new_str
...If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string....
Regexp.union(HASH_LIST.keys) # => /ruby|the\ rails/
Regexp.union(HASH_LIST.keys).source # => "ruby|the\\ rails"
Note that the first returns a regular expression and the second returns a string. This is important when we embed them into another regular expression:
/#{Regexp.union(HASH_LIST.keys)}/ # => /(?-mix:ruby|the\ rails)/
/#{Regexp.union(HASH_LIST.keys).source}/ # => /ruby|the\ rails/
The first can quietly destroy what you think is a simple search, because of the ?-mix: flags, which ends up embedding different flags inside the pattern.
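A short sketch of how that can bite, assuming you want a case-insensitive search (the word list is only an example):

```ruby
inner = Regexp.union(%w[ruby rails])

# Interpolating the Regexp embeds "(?-mix:ruby|rails)", which switches
# the i flag back off inside the group.
embedded = /#{inner}/i

# Interpolating the source string keeps the outer flags in effect.
from_source = /#{inner.source}/i

p embedded.match("RUBY")      # no match
p from_source.match("RUBY")   # matches
```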
The Regexp documentation covers all this well.
This capability is the core to making an extremely high-speed templating routine in Ruby.
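As a rough sketch of such a templating routine (the %name% placeholder syntax and the values are assumptions made up for this example):

```ruby
# Placeholder => replacement; the keys double as the search patterns.
values = {
  "%name%" => "Ruby",
  "%year%" => "1995"
}

template = "Hello %name%, first released in %year%."

# Regexp.union escapes each key and joins them with |.
placeholder_regex = Regexp.union(values.keys)

# Each hit is looked up in the hash and replaced by its value.
result = template.gsub(placeholder_regex, values)

puts result
```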
You could do that as follows:
Hash_List.each_with_object(test_string.dup) { |(k,v),s| s.sub!(/#{k}/, v) }
#=> "I am learning the fun to learn by myself and also It is a framework."
First, follow naming conventions. Variables are snake_case, and names of classes are CamelCase.
hash = {"ruby" => "fun to learn", "rails" => "It is a framework"}
words = test_string.split(' ') # => ["I", "am", "learning", ...]
another_hash = hash.select{|key,value| words.include?(key)}
Answering your question: split your test string into words with #split and then check whether the words include a key.
To check whether a string is a substring of another string, use the String#[] method with a String argument:
another_hash = hash.select{|key, value| test_string[key]}

ruby match and scan not matching a pattern the same way?

I'm trying to parse out some information from multiple records. One of the items I'm interested in can have multiple entries in a string. My thought was just to return an array of all the matching values, but I'm having trouble with the results. For example:
> s = '>ctg7180000000043_1204 selected_feature: CDS loc=299156..299605;/db_xref="GO:0007155";/db_xref="GO:0009289";'
=> ">ctg7180000000043_1204 selected_feature: CDS loc=299156..299605;/db_xref=\"GO:0007155\";/db_xref=\"GO:0009289\";"
> s.match('db_xref="[^"]+')
=> #<MatchData "db_xref=\"GO:0007155">
> s.scan('db_xref="[^"]+')
=> []
Anyway, why does match, er, match and scan doesn't?
String#match converts its argument to a Regexp, whereas String#scan searches for a literal string if that's what you give it. Giving #scan a Regexp makes it find the same matches. See the ri docs for String#match and String#scan. Try the following in irb:
regex = /db_xref="[^"]+/
s.match(regex)
=> #<MatchData "db_xref=\"GO:0007155">
s.scan(regex)
=> ["db_xref=\"GO:0007155", "db_xref=\"GO:0009289"]
scan will also continue to match over the entire string, while match stops at the first match (you can then give it a start offset to continue if you need to).
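To illustrate the start offset, here is a minimal sketch that walks a shortened version of the string with match and collects the same hits that scan returns in one call:

```ruby
s = '/db_xref="GO:0007155";/db_xref="GO:0009289";'
regex = /db_xref="[^"]+/

found = []
pos = 0
# match accepts a start position as its second argument.
while (m = s.match(regex, pos))
  found << m[0]
  pos = m.end(0)   # resume right after the previous match
end

p found
p s.scan(regex)          # same list in a single call
p s.scan('db_xref="')    # a String argument is matched literally
```

The loop relies on the pattern never matching an empty string; a pattern that can would need pos advanced by hand.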

Regex replace pattern with first char of match & second char in caps

Let's say I have the following string:
"a test-eh'l"
I want to capitalize the start of each word. A word can be separated by a space, apostrophe, hyphen, a forward slash, a period, etc. So I want the string to turn out like this:
"A Test-Eh'L"
I'm not too worried about getting the first character capitalized from the gsub call, as that's easy to do after the fact. However, when I've been using IRB and the match method, I only seem to be getting one result. When I use scan, it collects the matches, but the problem is I cannot really do much with them, as I need to replace the contents of the original string.
Here's what I have so far:
"a test-eh'a".scan(/[\s|\-|\'][a-z]/)
=> [" t", "-e", "'a"]
"a test-eh'a".match(/[\s|\-|\'][a-z]/)
=> #<MatchData " t">
Then if I try the pattern using gsub:
"a test-eh'a".gsub(/[\s|\-|\'][a-z]/, $1)
TypeError: can't convert nil into String
In JavaScript, I would normally use parentheses instead of square brackets on the front section. However, I wasn't getting correct results in the scan call when doing so.
"a test-eh'a".scan(/(\s|\-|\')[a-z]/)
=> [[" "], ["-"], ["'"]]
"a test-eh'a".gsub(/(\s|\-|\')[a-z]/, $1)
=> "a'est'h'"
Any help would be appreciated.
Try this:
"a test-eh'a".gsub(/(?:^|\s|-|')[a-z]/) { |r| r.upcase }
# => "A Test-Eh'A"
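The attempts with $1 as the second argument fail because Ruby evaluates $1 before gsub runs, so it holds whatever the previous match left behind (or nil). Inside a block, $1 and $2 refer to the current match. A sketch using the original example string, with the separator set widened to include the slash and period mentioned in the question:

```ruby
input = "a test-eh'l"

# Block form: the whole match (separator plus letter) is upcased.
capitalized = input.gsub(/(?:\A|[\s\-'\/.])[a-z]/) { |hit| hit.upcase }

# Capture groups also work, as long as $1 and $2 are read inside the block.
with_groups = input.gsub(/(\s|-|')([a-z])/) { $1 + $2.upcase }

puts capitalized
puts with_groups
```

The second form leaves the first letter of the string alone, since nothing precedes it.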

Split specific string by regular expression

I am trying to get an array that contains aaaaa, bbbbb, and ccccc as the split output below.
a_string = "aaaaa[x]bbbbb,ccccc";
split_output = a_string.split(%r{[,|........]+})
What should I put in place of ........?
No need for a regex when it's just a literal:
irb(main):001:0> a_string = "aaaaa[x]bbbbb"
irb(main):002:0> a_string.split "[x]"
=> ["aaaaa", "bbbbb"]
If you want to split by "open bracket...anything...close bracket" then:
irb(main):003:0> a_string.split /\[.+?\]/
=> ["aaaaa", "bbbbb"]
Edit: I'm still not sure what your criteria is, but let's guess that what you are really doing is looking for runs of 2-or-more of the same character:
irb(main):001:0> a_string = "aaaaa[x]bbbbb,ccccc"
=> "aaaaa[x]bbbbb,ccccc"
irb(main):002:0> a_string.scan(/((.)\2+)/).map(&:first)
=> ["aaaaa", "bbbbb", "ccccc"]
Edit 2: If you want to split by either of the literal strings "," or "[x]" then:
irb(main):003:0> a_string.split /,|\[x\]/
=> ["aaaaa", "bbbbb", "ccccc"]
The | part of the regular expression allows expressions on either side to match, and the backslashes are needed since otherwise the characters [ and ] have special meaning. (If you tried to split by /,|[x]/ then it would split on either a comma or an x character.)
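If the delimiters are known literal strings, Regexp.union can build the same alternation and take care of the escaping. A small sketch (the delimiter list is taken from the example above):

```ruby
a_string = "aaaaa[x]bbbbb,ccccc"

# Regexp.union escapes "[" and "]" and joins the alternatives with |.
delimiters = Regexp.union(",", "[x]")

pieces = a_string.split(delimiters)

p pieces
```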
no regex needed, just use "[x]"
