Pass regular expression as script argument Ruby - ruby

I am trying to pass a regular expression to process a file line by line. The regular expression works fine if I hard-code it, like this.
File.foreach(filename).with_index do |line, line_num|
md5 = line.scan(/[0-9a-f]{32}/i)
puts md5
end
This works wonderfully and I can see every line that has an MD5 hash on it printed. Now, the problem comes when I try to pass the regular expression that matches MD5 hashes as a script argument, like:
ruby md5.rb -h "/[0-9a-f]{32}/i"
options = {}
OptionParser.new do |opts|
opts.on('-h', '--hash "<hash regex>"', 'Hash Regex') { |v| options[:hash] = v }
end.parse!
hash = options[:hash]
File.foreach(filename).with_index do |line, line_num|
md5 = line.scan(hash)
puts md5
end

You can pass just the body of the regex (without the surrounding slashes and the trailing flag) as a string, then convert it to a regex later, e.g.:
ruby md5.rb -h "[0-9a-f]{32}"
To convert a string into a regex, just use interpolation:
regex = /#{regex_string}/i
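As a minimal sketch of how this could fit together with OptionParser: the helper name hash_regex_from and the sample line are assumptions made for illustration, not part of the original script. It uses Regexp.new with Regexp::IGNORECASE, which should be equivalent to the interpolated /.../i form.

```ruby
require 'optparse'

# Hypothetical helper: reads the pattern body from the arguments and
# compiles it into a case-insensitive Regexp.
def hash_regex_from(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('-h', '--hash REGEX', 'Hash regex, without surrounding slashes') do |v|
      options[:hash] = v
    end
  end.parse!(argv.dup) # work on a copy so the caller's array is untouched

  Regexp.new(options.fetch(:hash), Regexp::IGNORECASE)
end

# Simulates: ruby md5.rb --hash "[0-9a-f]{32}"
regex = hash_regex_from(['--hash', '[0-9a-f]{32}'])

# Hypothetical input line containing one MD5-like value.
line = 'file.txt D41D8CD98F00B204E9800998ECF8427E'
p line.scan(regex)
```

In the real script you would pass ARGV instead of the literal array and call scan inside the File.foreach loop.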

Related

Reading specific line into an array - ruby

Have a txt file with the following:
Anders Hansen;87442355;11;87
Jens Hansen;22338843;23;11
Nanna Kvist;25233255;24;84
I would like to search the file for a specific name taken from user input, then save that line into an array, split on ";". I can't get it to work, though. This is my code:
user1 = []
puts "Start by entering the full name of user 1: "
input = gets.chomp
File.open("userregister.txt") do |f|
f.each_line { |line|
if line =~ input then do |line|
user1 << line.split(';').map
=~ in Ruby tries to match a string against a regex (or vice versa). Here, you use it with two strings, which raises an error:
'foo' =~ 'bar' # => TypeError: type mismatch: String given
There are more appropriate String methods to use instead. In your case, #start_with? does the job. If you wanted to check whether the input is contained somewhere in the line as a substring (but not necessarily at the beginning), you can use #include?.
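A small sketch of both methods, using the three lines from the question as an in-memory array instead of reading userregister.txt. The hard-coded input value stands in for gets.chomp and is an assumption for illustration.

```ruby
lines = [
  "Anders Hansen;87442355;11;87",
  "Jens Hansen;22338843;23;11",
  "Nanna Kvist;25233255;24;84"
]
input = "Jens Hansen" # stands in for gets.chomp

# Keep the fields of the line whose beginning equals the entered name.
user1 = []
lines.each do |line|
  user1 = line.chomp.split(';') if line.start_with?(input)
end
p user1

# include? matches anywhere in the line, not only at the start.
hansens = lines.select { |line| line.include?('Hansen') }
p hansens.size
```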
In case you actually wanted to take a regex as a user input (generally a bad idea), you can convert it from string to regex:
line =~ /#{input}/
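One reason this is risky is that metacharacters in the input change what is matched. If the goal is a literal match, Regexp.escape can neutralise them first. A brief sketch, where the input value is a made-up example:

```ruby
input = 'a.c' # hypothetical user input containing a metacharacter

raw  = /#{input}/                 # "." matches any character
safe = /#{Regexp.escape(input)}/  # "." matches only a literal dot

p raw.match?('abc')   # true, probably not what the user meant
p safe.match?('abc')  # false
p safe.match?('a.c')  # true
```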
Looking at the file format, I would actually use Ruby's CSV class. By setting the column separator to ;, you will get an array for each row.
require 'csv'
input = gets.chomp
CSV.foreach('userregister.txt', col_sep: ';') do |row|
if row[0].downcase == input.downcase
# Do something with row[1..-1]
end
end

Print Unicode escape codes from variable

I have a list of Unicode character codes that I would like to output with rumoji. Here's the code I'm using to iterate over my data.
require "rumoji"
# this works
puts Rumoji.decode("\u{1F600}")
# feed some data
data = [
"1F600",
"1F476",
"1F474"
]
data.each do |line|
# this doesn't work
puts Rumoji.decode("\u{#{line}}")
puts Rumoji.decode("\u{" + line + "}")
end
I'm not sure how I can use variable names inside the escaped string.
You cannot combine \u with string interpolation, because the \u escape is resolved when the string literal is parsed, before any interpolation happens. What you can do instead is convert the hex codes to integers and Array#pack them:
▶ data.map { |e| e.to_i(16) }.pack 'U*'
#⇒ "😀👶👴"
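To get one string per code point (for example, to pass each to Rumoji.decode in the loop), the same idea can be applied element by element. This is a sketch that leaves rumoji out so it runs on its own; Integer#chr with an explicit encoding is shown as an alternative to pack.

```ruby
data = %w[1F600 1F476 1F474]

# Convert each hex string to an integer, then to a one-character UTF-8 string.
chars     = data.map { |hex| [hex.to_i(16)].pack('U') }
chars_alt = data.map { |hex| hex.hex.chr(Encoding::UTF_8) }

chars.each { |c| puts c }
```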

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1], '')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by ‘d’, instead of a digit.
That means that the argument, though it looks like a regex, isn't one. It's a string, because ARGV can only return strings: the command line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo", so nothing changes. The second can find /o/, and returns the result after removing the first "o" (sub replaces only the first match; gsub would replace both).
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
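If you would rather tolerate both forms on the command line, one option is to strip a surrounding pair of slashes before compiling. This is only a sketch: pattern_from_arg is a hypothetical helper name, it ignores trailing flags such as /i, and Regexp.new will still raise RegexpError for an invalid pattern.

```ruby
# Hypothetical helper: accepts "o" or "/o/" and returns a Regexp.
def pattern_from_arg(arg)
  # Take the text between a leading and a trailing slash if both are
  # present; otherwise use the argument unchanged.
  source = arg[%r{\A/(.*)/\z}m, 1] || arg
  Regexp.new(source)
end

p pattern_from_arg('/o/')
p pattern_from_arg('o')
p 'foo'.sub(pattern_from_arg('/o/'), '')
```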
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

How to access the various occurrences of the same match group in Ruby regular expressions?

I have a regular expression which has multiple matches. I figured out that $1, $2, etc. can be used to access the matched groups. But how do I access the multiple occurrences of the same matched group?
Please take a look at the rubular page below.
http://rubular.com/r/nqHP1qAqRY
So now $1 gives 916 and $2 gives nil. How can I access the 229885? Is there something similar to $1[1] or so?
Firstly, it is not a good idea to parse XML-based data with regular expressions alone.
Use a library for parsing XML files instead, such as nokogiri.
But if you are sure that you want to use this approach, you need to know the following.
A single call to match returns only the first match it finds, so you cannot expect to get all matches in a string from one call. You need to iterate through the string, applying the regex again after each match already found. You could do it like this:
# ruby 1.9.x version
regex = /<DATA size="(\d+)"/
str = your_string # Your string to be parsed
position = 0
matches = []
while(match = regex.match(str,position)) do # Until there are no matches anymore
position = match.end 0 # set position to the end of the last match
matches << match[1] # add the matched number to the matches-array
end
After this all your parsed numbers should be in matches.
But since your comment suggests that you are using Ruby 1.8.x, I will post another version here that works in 1.8.x (the method signatures differ between these versions).
# ruby 1.8.x version
regex = /<DATA size="(\d+)"/
str = your_string # Your string to be parsed
matches = []
while(match = regex.match(str)) do # Until there are no matches anymore
str = match.post_match # set str to the part which is after the match.
matches << match[1] # add the matched number to the matches-array
end
To expand on my comment and respond to your question:
If you want to store the values in an array, modify the block and collect instead of iterate:
> arr = xml.grep(/<DATA size="(\d+)"/).collect { |d| d.match /\d+/ }
> arr.each { |a| puts "==> #{a}" }
==> 916
==> 229885
The |d| is normal Ruby block parameter syntax; each d is the matching string, from which the number is extracted. It's not the cleanest Ruby, although it's functional.
I still recommend using a parser; note that the rexml version would be this (more or less):
require 'rexml/document'
include REXML
doc = Document.new xml
arr = doc.elements.collect("//DATA") { |d| d.attributes["size"] }
arr.each { |a| puts "==> #{a}" }
Once your "XML" is converted to actual XML you can get even more useful data:
doc = Document.new xml
arr = doc.elements.collect("//file") do |f|
name = f.elements["FILENAME"].attributes["path"]
size = f.elements["DATA"].attributes["size"]
[name, size]
end
arr.each { |a| puts "#{a[0]}\t#{a[1]}" }
~/Users/1.txt 916
~/Users/2.txt 229885
This is not possible in most implementations of regex. (AFAIK only .NET can do this.)
You will have to use an alternative solution, e.g. using scan() (see "Equivalent to Python’s findall() method in Ruby?").
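A short sketch of the scan approach. The xml string below is a hypothetical stand-in for the question's input (which is only available through the rubular link); the regex is the one used in the other answers.

```ruby
# Hypothetical sample input with two DATA elements.
xml = '<file><FILENAME path="~/Users/1.txt"/><DATA size="916"/></file>' \
      '<file><FILENAME path="~/Users/2.txt"/><DATA size="229885"/></file>'

# With one capture group, scan returns an array of one-element arrays,
# so flatten yields the captured numbers directly.
sizes = xml.scan(/<DATA size="(\d+)"/).flatten
p sizes
```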

Store regex matches in ruby?

I'm parsing a file with Ruby to change the data formatting. I created a regex which has three match groups that I want to temporarily store in variables. I'm having trouble getting the matches to be stored, as everything is nil.
Here is what I have so far from what I've read.
regex = '^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})'
begin
file = File.new("testfile.csv", "r")
while (line = file.gets)
puts line
match_array = line.scan(/regex/)
puts $&
end
file.close
end
Here is some sample data that I'm using for testing.
"https://mail.google.com","Master","password1","","https://mail.google.com","",""
"https://login.sf.org","monster@gmail.com","password2","https://login.sf.org","","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxUsername","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxPassword"
"http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
"http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
Thank you,
LF4
This won't work:
match_array = line.scan(/regex/)
That's just using the literal string "regex" as your regular expression, not what's in your regex variable. You can either put the big ugly regex right into your scan or create a Regexp instance:
regex = Regexp.new('^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})')
# ...
match_array = line.scan(regex)
And you should probably use a CSV library (one comes with Ruby: 1.8.7 or 1.9) for parsing CSV files, then apply a regular expression to each column from the CSV. You'll run into fewer quoting and escaping issues that way.
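A rough sketch of that approach on the first sample line. The url_regex here is a deliberately simplified, hypothetical check rather than the full pattern from the question, and the variable names are only illustrative.

```ruby
require 'csv'

line = '"https://mail.google.com","Master","password1","","https://mail.google.com","",""'

# Simplified, hypothetical URL check applied to a single column.
url_regex = %r{\Ahttps?://\S+\z}

# The CSV parser handles the quoting, so each field arrives unquoted.
row = CSV.parse_line(line)
url, user, password = row.first(3)

puts "#{url}\t#{user}\t#{password}" if url =~ url_regex
```

For a whole file, CSV.foreach('testfile.csv') yields one such row array per line.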

Resources