Ruby regexp to parse command line - ruby

How can I parse strings in Ruby the way many command-line utilities do? I have strings like "command [--opt1=...] [--enable-opt2] --opt3=... arg1" and methods like command(opt1,opt2,opt3,arg1...). I want to allow the arguments to come in any order, and some of them may be optional.
At the moment I write a new regexp every time I need to parse a new command. For example,
to parse "lastpost --chan=your_CHANNEL /section/"
I have this regular expression:
text = "lastpost --chan=0chan.ru /s/"
command = (text.match /^\w+/)[0]
args = text.gsub(/^\w+/,'')
if args =~ /[[:blank:]]*(--chan\=([[:graph:]]+)[[:blank:]]+)*\/?(\w+)\/?/
  chan = $2
  section = $3
  do_command(chan,section)
else
  puts "wrong args"
end
I wish I had a create_regexp(opts,args) method that would produce the regular expression for me.

OK, I found that optparse can do this for me.
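For anyone landing here, a minimal sketch of what that could look like with the standard library's OptionParser. The helper name parse_command and the option names are only assumptions taken from the example string in the question; Shellwords.split is used so quoted arguments survive the split.

```ruby
require 'optparse'
require 'shellwords'

# parse_command is a hypothetical helper, not part of optparse.
# It splits a command string and returns [command, options, positional args].
def parse_command(text)
  command, *argv = Shellwords.split(text)
  opts = {}
  parser = OptionParser.new do |o|
    o.on('--chan=CHANNEL') { |v| opts[:chan] = v }   # option with a required value
    o.on('--enable-opt2')  { opts[:opt2] = true }    # plain boolean switch
  end
  rest = parser.parse(argv)  # returns whatever is left after the options are consumed
  [command, opts, rest]
end

command, opts, rest = parse_command('lastpost --chan=0chan.ru /s/')
puts command      # lastpost
puts opts[:chan]  # 0chan.ru
puts rest.first   # /s/
```

An option that was never declared raises OptionParser::InvalidOption, which can be rescued to print the "wrong args" message.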

Related

Ruby command line script: Trying to pass variable in switch case

I'm creating a ruby command line tool which has a switch case statement, I'd like to pass through variables on this switch case statement for example:
input = gets.chomp
case input
when 'help'
  display_help
when 'locate x, y' # this is the bit i'm stuck on
  find_location(x, y)
when 'disappear s'
  disappear_timer(s)
when 'exit'
  exit
else
  puts "incorrect input"
end
Essentially I want the user to be able to type locate 54, 30 or disappear 5000 and have the tool call a function that handles the numbers they passed. How can I pass arguments from the user through a case statement like this in my command line tool?
Use a Regexp matcher inside when:
when /locate \d+, \d+/
  find_location *input.scan(/\d+/).map(&:to_i)
Here we match the word locate followed by digits, a comma, a space, and more digits. If it matches, we extract the digit runs from the string with String#scan, convert them to Integers, and pass them as arguments to the find_location method.
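A self-contained sketch of the same idea, using capture groups instead of a second scan. The dispatch method and the strings it returns are made up for illustration; in the real tool each branch would call find_location, disappear_timer, and so on.

```ruby
# dispatch is a hypothetical stand-in for the case statement in the question.
# It returns a description of what would be called, so it can be run on its own.
def dispatch(input)
  case input
  when 'help'
    'help'
  when /\Alocate (\d+), (\d+)\z/
    # A regexp in `when` sets $1, $2, ... for the branch that matched.
    x, y = $1.to_i, $2.to_i
    "locate #{x} #{y}"
  when /\Adisappear (\d+)\z/
    "disappear #{$1.to_i}"
  when 'exit'
    'exit'
  else
    'incorrect input'
  end
end

puts dispatch('locate 54, 30')   # locate 54 30
puts dispatch('disappear 5000')  # disappear 5000
puts dispatch('locate x, y')     # incorrect input
```

The \A and \z anchors make sure the whole input has the expected shape, so something like "please locate 1, 2 now" falls through to the else branch.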

How to get access to command-line arguments in Nim?

How can I access command line arguments in Nim?
The documentation shows only how to run the compiled Nim code with command line arguments
nim compile --run greetings.nim arg1 arg2
but I didn't find how to use their values in code.
Here's an example that prints the number of arguments and the first argument:
import os
echo paramCount(), " ", paramStr(1)
Personally I find paramCount and paramStr a bit confusing to work with, because the paramCount value differs from the C convention (see the documentation links).
Fortunately, there are additional convenience functions that do not require you to be aware of these conventions:
commandLineParams returns a seq of only command line params.
getAppFilename returns the executable file name (what is argv[0] in the C world).
I have not checked when it was added, but parseopt seems to me to be the default and best way to do this.
commandLineParams isn't available on Posix when the program is built as a dynamic library (--app:lib).
os.commandLineParams() returns a sequence of command-line arguments provided to the program.
os.quoteShellCommand(<openArray[string]>) takes a sequence of command-line arguments and turns it into a single string with quotes around items containing spaces, so the string can be parsed properly.
parseopt.initOptParser(<string>) takes a full command-line string and parses it, returning an OptParser object.
parseopt.getopt(<OptParser>) is an iterator that yields parsed argument info.
You can combine them to parse a program's input arguments:
import std/[os, parseopt]
proc writeHelp() = discard
proc writeVersion() = discard
var positionalArgs = newSeq[string]()
var directories = newSeq[string]()
var optparser = initOptParser(quoteShellCommand(commandLineParams()))
for kind, key, val in optparser.getopt():
  case kind
  of cmdArgument:
    positionalArgs.add(key)
  of cmdLongOption, cmdShortOption:
    case key
    of "help", "h": writeHelp()
    of "version", "v": writeVersion()
    of "dir", "d":
      directories.add(val)
  of cmdEnd: assert(false) # cannot happen
echo "positionalArgs: ", positionalArgs
echo "directories: ", directories
Running this with nim c -r main.nim -d:foo --dir:bar dir1 dir2 dir3 prints:
positionalArgs: @["dir1", "dir2", "dir3"]
directories: @["foo", "bar"]

Read string as variable RUBY

I am pulling the following string from a CSV file, from cell A1, and storing it as a variable:
#{collector_id}
So, cell A1 reads #{collector_id}, and my code essentially does this:
test = @excel_cell_A1
However, if I then do this:
puts test
I get this:
#{collector_id}
I need #{collector_id} to read as the actual variable collector_id, not the code that I am using to call the variable. Is that possible?
Thanks for the help. I am using ruby 1.9.3.
You can use sub or gsub to replace expected input values:
collector_id = "foo"
test = '#{collector_id}'
test.sub("\#{collector_id}", "#{collector_id}") #=> "foo"
I would avoid the use of eval (or at least sanity check what you are running) to reduce the risk of running arbitrary code you receive from the CSV file.
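If there are several placeholders, the same sub/gsub idea can be generalized with a block and a lookup table. This is only a sketch: fill_template and the values hash are hypothetical names, and the assumption is that every placeholder in the CSV has the form #{name}.

```ruby
# fill_template is a hypothetical helper. It replaces each #{name} placeholder
# with the matching entry from `values` and never evaluates the CSV text as code.
def fill_template(template, values)
  template.gsub(/#\{(\w+)\}/) do
    name = Regexp.last_match(1)
    # Unknown placeholders are left untouched rather than raising.
    values.fetch(name, Regexp.last_match(0))
  end
end

cell = 'id=#{collector_id}'   # single quotes: no interpolation, as if read from the CSV
puts fill_template(cell, 'collector_id' => 'foo')  # id=foo
```

Because only names present in the hash are substituted, nothing in the file can run arbitrary Ruby.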
Try this:
test_to_s = eval("\"#{ test }\"")
puts test_to_s
The argument "\"#{ test }\"" interpolates test and builds the string "#{collector_id}" (the double quotes are part of the string, so its length is 17), which eval then evaluates as Ruby code.

Why doesn't this Ruby replace regex work as expected?

Consider the following string which is a C fragment in a file:
strcat(errbuf,errbuftemp);
I want to replace errbuf (but not errbuftemp) with the prefix G-> plus errbuf. To do that successfully, I check the character after and the character before errbuf to see if it's in a list of approved characters and then I perform the replace.
I created the following Ruby file:
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
puts line.gsub(/([ \t\n\r(),\[\]]{1})#{item}([ \t\n\r(),\[\]]{1})/, "#{$1}G\->#{item}#{$2}")
Expected result:
strcat(G->errbuf,errbuftemp);
Actual result
strcatG->errbuferrbuftemp);
Basically, the matched characters before and after errbuf are not reinserted back with the replace expression.
Can anyone point out what I'm doing wrong?
Because you must use the syntax gsub(/.../){"...#{$1}...#{$2}..."} or gsub(/.../,'...\1...\2...').
Here was the same problem: werid, same expression yield different value when excuting two times in irb
The problem is that the variable $1 is interpolated into the argument string before gsub is run, meaning that the previous value of $1 is what the symbol gets replaced with. You can replace the second argument with '\1 ?' to get the intended effect. (Chuck)
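Both fixes applied to the original line, as a runnable sketch. The method names and the DELIM constant are made up here; Regexp.escape is added on the assumption that item is meant as literal text.

```ruby
# Characters allowed directly before and after the identifier (same set as the question).
DELIM = /[ \t\n\r(),\[\]]/

# Block form: the block runs after each match, so $1 and $2 refer to that match.
def prefix_with_block(line, item)
  line.gsub(/(#{DELIM})#{Regexp.escape(item)}(#{DELIM})/) { "#{$1}G->#{item}#{$2}" }
end

# Backreference form: \1 and \2 are expanded by gsub itself, not by string interpolation.
def prefix_with_backrefs(line, item)
  line.gsub(/(#{DELIM})#{Regexp.escape(item)}(#{DELIM})/, "\\1G->#{item}\\2")
end

line = ' strcat(errbuf,errbuftemp);'
p prefix_with_block(line, 'errbuf')     # " strcat(G->errbuf,errbuftemp);"
p prefix_with_backrefs(line, 'errbuf')  # " strcat(G->errbuf,errbuftemp);"
```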
I think part of the problem is the use of gsub() instead of sub().
Here are two alternatives:
str = 'strcat(errbuf,errbuftemp);'
str.sub(/\w+,/) { |s| 'G->' + s } # => "strcat(G->errbuf,errbuftemp);"
str.sub(/\((\w+)\b/, '(G->\1') # => "strcat(G->errbuf,errbuftemp);"

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses a bunch of Java-based HTML template files and replaces each period (.) with an underscore when it appears inside a specific tag. I use Ruby all the time for these kinds of utility apps and thought it would be no problem to whip something up using Ruby's regex support. So I create a Regexp.new... object, open a file, and read it line by line, matching each line against the pattern. If I get a match, I create a new string using replaceString = currentMatch.gsub(/\./, '_'), build a literal pattern for the whole matched string with newReplaceRegex = Regexp.escape(currentMatch), and finally replace it back into the current line with line.gsub(newReplaceRegex, replaceString). Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second substring it should otherwise be finding. Stranger still, when I test the same pattern and the same test text on rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Test text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  tagMatch = tagRegex.match(htmlLine)
  if(tagMatch)
    matchesArray = tagMatch.to_a
    firstMatch = matchesArray[0]
    secondMatch = matchesArray[1]
    puts "First match: #{firstMatch} and second match #{secondMatch}"
    tagMatch.captures.each {|lineMatchCapture|
      puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
      #create a new regex using the match results; make sure to use auto escape method
      originalPatternString = Regexp.escape(lineMatchCapture)
      replacementRegex = Regexp.new(originalPatternString)
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    }
  end
}
I would think that I should get the first tag in matchData[0] and the second tag in matchData[1]; or rather, since I don't know how many matches I'll get within any given line, what I'm really doing is matchData.to_a.each. In this case, matchData has two entries, but they're both the first tag match,
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use String#scan instead of Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for your next Ruby questions:
newlines and spaces are your friends, you don't lose points for using more lines in your code ;-)
use do-end on blocks instead of {}, improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
  if(oldMatches.size > 0)
    oldMatches.each_index do |index|
      arrayMatch = oldMatches[index]
      aMatch = arrayMatch[0]
      #create a new regex using the match results; make sure to use auto escape method
      replacementRegex = Regexp.new(Regexp.escape(aMatch))
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line, matching against the new escaped literal regex
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    end # I kind of still prefer the brackets...;-)
  end
end
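On the array-of-arrays point: String#scan returns nested arrays only when the pattern contains capture groups. Without groups it returns a flat array of strings. A shorter variant of the loop above is sketched here using gsub with a block, which rewrites every tag in one pass. The underscore_tags name and the sample line are invented for illustration, and the pattern is the trimmed one from this post with the outer group removed.

```ruby
GROUPED   = /(<webobject name=(?:\w+\.)+\w+>)/i  # outer capture group, as in the post
UNGROUPED = /<webobject name=(?:\w+\.)+\w+>/i    # same pattern, no capture group

# underscore_tags is a hypothetical helper: every matching tag is passed to the
# block, and the block's return value replaces it in the result.
def underscore_tags(line)
  line.gsub(UNGROUPED) { |tag| tag.tr('.', '_') }
end

line = '<WEBOBJECT NAME=admin.normalMode.more>text<WEBOBJECT NAME=admin.SecondLineMatch>tail'

p line.scan(GROUPED).first.class    # Array  (one inner array per match)
p line.scan(UNGROUPED).first.class  # String (no groups, so plain strings)
puts underscore_tags(line)
# <WEBOBJECT NAME=admin_normalMode_more>text<WEBOBJECT NAME=admin_SecondLineMatch>tail
```

Text outside the tags is left alone because the block only ever sees the matched tag.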
Now, why does MatchData work the way it does? It seems like its behavior is a bug really, and certainly not very useful in general if you can't get it to provide a simple means of accessing all the matches. Just my $.02
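As far as I can tell this is the documented behavior rather than a bug. Regexp#match stops at the first match, index 0 of the MatchData is the whole matched text, and index 1 is the first capture group. When the group wraps the entire pattern, the two are identical, which would explain the "repeated" result. A small sketch with a shortened, made-up test string:

```ruby
text = '<WEBOBJECT NAME=admin.normalMode>skip<WEBOBJECT NAME=admin.SecondLineMatch>tail'
tag  = /(<webobject name=(?:\w+\.)+\w+>)/i

m = tag.match(text)
puts m[0]         # <WEBOBJECT NAME=admin.normalMode>   (entire first match)
puts m[1]         # <WEBOBJECT NAME=admin.normalMode>   (capture group 1, same text)
puts m.to_a.size  # 2

# scan keeps going after the first match, so it finds both tags.
all_tags = text.scan(tag).flatten
puts all_tags.size  # 2
puts all_tags.last  # <WEBOBJECT NAME=admin.SecondLineMatch>
```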
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexps but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').
