I have the following regular expression in Ruby:
\<name\>(.+)\<\/name\>
Within an if statement, like so:
if line =~ /\<name\>(.+)\<\/name\>/
Is there any way to get the value of the group (.+)?
Thanks in advance!
It is in the variable $1
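A minimal sketch of reading the group, assuming a made-up sample `line` (the question does not show one). After a successful `=~`, `$1` holds the first group; `Regexp.last_match(1)` and `String#[]` with a group index are other ways to reach the same value.

```ruby
# Hypothetical sample input; the original question does not include one.
line = '<name>blah</name>'

if line =~ /<name>(.+)<\/name>/
  from_global = $1                         # first capture group of the last match
  from_last_match = Regexp.last_match(1)   # same value without the $1 global
end

# String#[] with a regex and a group index returns that group, or nil if nothing matches.
from_index = line[/<name>(.+)<\/name>/, 1]

puts from_global
```

Note that `<` and `>` do not need escaping in a Ruby regex, so the backslashes in front of them can be dropped.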
Rather than use regex to parse XML or HTML, use a real parser. I like Nokogiri:
require 'nokogiri'
doc = Nokogiri::XML('<somecontainingtags><name>blah</name></somecontainingtags>')
# find all occurrences
doc.search('//name').map {|n| n.inner_text } # => ["blah"]
# find the first occurrence
doc.at('//name').inner_text # => "blah"
Related
How can I get the username without the # symbol?
That's everything between # and any non-word character.
message = <<-MESSAGE
From #victor with love,
To #andrea,
and CC goes to #ghost
MESSAGE
Using a Ruby regular expression, I tried
username_pattern = /#\w+/
I would like to get the following output
message.scan(username_pattern)
#=> ["victor", "andrea", "ghost"]
Use a look-behind:
(?<=#)\w+
This requires a # before the word but leaves the # symbol out of the match.
I would go with:
message.scan(/(?<=#)\w+/)
#=> ["victor","andrea","ghost"]
You might want to read about look-behind regexp.
You could match the # and then capture one or more word characters in a capturing group:
#(\w+)
username_pattern = /#(\w+)/
Try this
irb(main):010:0> message.scan(/#(\w+)/m).flatten
=> ["victor", "andrea", "ghost"]
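For comparison, a small sketch of the two approaches side by side, using the message from the question. With a capture group, `scan` returns one array per match, which is why `flatten` is needed; with the look-behind there is no group, so `scan` returns the matched strings directly.

```ruby
message = <<-MESSAGE
  From #victor with love,
  To #andrea,
  and CC goes to #ghost
MESSAGE

# No capture group: scan returns the matched text itself.
with_lookbehind = message.scan(/(?<=#)\w+/)

# One capture group: scan returns an array of one-element arrays.
with_group = message.scan(/#(\w+)/)

# flatten turns the nested arrays into a plain list of names.
flattened = with_group.flatten

puts with_lookbehind.inspect
puts with_group.inspect
```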
I am new to Ruby and I want to do the following to remove the last "_val3" in Ruby:
$ val="val1_val2_val3"
$ echo ${val%_*}
val1_val2
I used to use echo ${val%_*} to get "val1_val2", but I do not know how to do this in Ruby.
Also, how to get "val1"?
Is there a good way to do them?
Not a ruby expert but I'll get the ball rolling with a regular expression:
a.sub /_[^_]*$/, ''
Match an underscore followed by any number of non-underscores at the end of the string. Replace with nothing.
You can use a single gsub to get your expected output:
a = "a-b_c_d"
# => "a-b_c_d"
a.gsub /_[a-z]*$/, ''
# => "a-b_c"
Or, you can use Ruby's split and join:
a.split("_")[0..-2].join("_")
# => "a-b_c"
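One caveat with the `[a-z]` class: it does not match digits, so on the string from the question the pattern finds nothing to remove. A quick check, using the question's value:

```ruby
val = 'val1_val2_val3'

# [a-z]* stops at the digit, so "_val" is never followed by the end of the line.
lowercase_only = val.gsub(/_[a-z]*$/, '')

# [^_]* accepts any character except an underscore, digits included.
any_non_underscore = val.gsub(/_[^_]*$/, '')

# split/join does not depend on which characters the fields contain.
split_join = val.split('_')[0..-2].join('_')

puts lowercase_only
puts any_non_underscore
puts split_join
```

Here `lowercase_only` is still the full string, while the other two give "val1_val2".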
String#rpartition would probably work:
'a-b_c_d'.rpartition('_') #=> ["a-b_c", "_", "d"]
rpartition looks for the last '_' and returns an array containing the part before it, the separator itself and the part after it.
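A short sketch applying this to the string from the question, with `String#partition` (the left-to-right counterpart) covering the "val1" part. One difference from the shell's `${val%_*}`: when there is no underscore at all, `rpartition('_').first` is an empty string, whereas the shell expansion leaves the value unchanged.

```ruby
val = 'val1_val2_val3'

# Everything before the last underscore, like ${val%_*}.
without_last = val.rpartition('_').first

# Everything before the first underscore.
first_field = val.partition('_').first

# No underscore: rpartition puts the whole string in the last slot.
no_separator = 'plain'.rpartition('_')

puts without_last
puts first_field
puts no_separator.inspect
```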
I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1], '')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by 'd', instead of a digit.
That means the argument, though it looks like a regular expression, is actually a string: ARGV can only return strings because the command line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo", so nothing changes. The second can find /o/, though, and returns the result after replacing the first "o" (sub replaces only the first match; gsub would replace both).
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
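One way to be forgiving about the slashes is to strip them before building the regular expression. A minimal sketch; `pattern_from_arg` is a hypothetical helper name, not something from the original script, and it assumes Ruby 2.4 or later for `match?`:

```ruby
# Hypothetical helper: build a Regexp from a command-line string,
# dropping one leading and one trailing "/" when both are present.
def pattern_from_arg(arg)
  source = arg.match?(%r{\A/.*/\z}m) ? arg[1..-2] : arg
  Regexp.new(source)
end

with_slashes = pattern_from_arg('/o/')
without_slashes = pattern_from_arg('o')

# Both forms now behave like the /o/ literal.
puts 'foo'.sub(with_slashes, '')
puts 'foo'.sub(without_slashes, '')
```

Regexp.new raises RegexpError when the string is not a valid pattern, so a real script would probably want to rescue that and print a usage message.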
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].
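If the argument is meant as literal text rather than as a pattern, `Regexp.escape` can be applied before building the regex. A sketch with the text from the question held in a plain Ruby string, and a hypothetical single-line input (in the sample file the `[{` and `"src":"` sit on separate lines, so a line-by-line loop would not see them together):

```ruby
literal = '[{"src":"'   # the text from the question, unescaped

# Used as-is, the unclosed "[" makes this an invalid pattern.
begin
  Regexp.new(literal)
  compiled_raw = true
rescue RegexpError
  compiled_raw = false
end

# Escaping first turns every metacharacter into a literal.
pattern = Regexp.new(Regexp.escape(literal))

# Hypothetical one-line input for illustration.
result = '[{"src":"http://something.com",'.sub(pattern, '')
puts result
```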
I'm parsing a file with Ruby to change the data formatting. I created a regex which has three match groups that I want to temporarily store in variables. I'm having trouble getting the matches stored, as everything is nil.
Here is what I have so far from what I've read.
regex = '^"(\bhttps?://[-\w+&##/%?=~_|$!:,.;]*[\w+&##/%=~_|$])","(\w+|[\w._%+-]+#[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})'
begin
file = File.new("testfile.csv", "r")
while (line = file.gets)
puts line
match_array = line.scan(/regex/)
puts $&
end
file.close
end
Here is some sample data that I'm using for testing.
"https://mail.google.com","Master","password1","","https://mail.google.com","",""
"https://login.sf.org","monster#gmail.com","password2","https://login.sf.org","","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxUsername","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxPassword"
"http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
"http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
Thank you,
LF4
This won't work:
match_array = line.scan(/regex/)
That's just using a literal "regex" string as your regular expression, not what's in your regex variable. You can either put the big ugly regex right into your scan or create a Regexp instance:
regex = Regexp.new('^"(\bhttps?://[-\w+&##/%?=~_|$!:,.;]*[\w+&##/%=~_|$])","(\w+|[\w._%+-]+#[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})')
# ...
match_array = line.scan(regex)
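To keep the three groups in variables, `MatchData#captures` can be destructured. A minimal sketch, using a simplified stand-in pattern (an assumption, not the regex from the question) and one line of the sample data:

```ruby
# Simplified stand-in for the question's regex: three quoted fields at the start.
pattern = /\A"([^"]*)","([^"]*)","([^"]*)"/

line = '"http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"'

match = line.match(pattern)

# match is nil when the line does not fit, so guard before destructuring.
url, user, password = match.captures if match

puts url
puts user
puts password
```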
And you should probably use a CSV library (one comes with Ruby: 1.8.7 or 1.9) for parsing CSV files, then apply a regular expression to each column from the CSV. You'll run into fewer quoting and escaping issues that way.
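A rough sketch of the CSV approach, using two rows from the sample data. It assumes the `csv` library is available (it ships with Ruby, though recent versions package it as a bundled gem) and Ruby 2.4 or later for `match?`; the `http_url` check is only an illustration of applying a regex per column.

```ruby
require 'csv'

# Two rows copied from the question's sample data.
data = <<~DATA
  "http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
  "http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
DATA

# CSV.parse handles the quoting; each row arrives as an array of fields.
records = CSV.parse(data).map do |url, user, password, *_rest|
  { url: url, user: user, password: password,
    http_url: url.match?(%r{\Ahttps?://}) }
end

records.each { |record| puts record.inspect }
```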
I have the following line
'passenger (2.2.5, 2.0.6)'.match(//)[0]
which obviously doesn't match anything yet
I want to return just 2.2.5, that is, everything after the open parenthesis and before the comma.
How would I do this?
Beanish's solution fails on more than 2 version numbers; you should use something like:
>> 'passenger (2.2.5, 2.0.6, 1.8.6)'.match(/\((.*?),/)[1] # => "2.2.5"
'passenger (2.2.5, 2.0.6)'.match(/\((.*),/)[1]
Element [1] of the match (also available as $1) is the group captured within the ( ).
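The difference between the greedy `.*` and the lazy `.*?` only shows up once there is more than one comma. A small sketch on a three-version string, with a negated character class as a third option:

```ruby
s = 'passenger (2.2.5, 2.0.6, 1.8.6)'

# Greedy: .* runs to the end and backtracks to the LAST comma.
greedy = s.match(/\((.*),/)[1]

# Lazy: .*? stops at the FIRST comma.
lazy = s.match(/\((.*?),/)[1]

# Negated class: [^,]* cannot cross a comma, so no backtracking is involved.
negated = s[/\(([^,]*),/, 1]

puts greedy
puts lazy
puts negated
```

With only two versions in the string, all three return "2.2.5".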
#!/usr/bin/env ruby
s = 'passenger (2.2.5, 2.0.6)'
p s.scan(/(?:\(|, *)([^,)]*)/).flatten # => ["2.2.5", "2.0.6"]