Parsing command-line arguments as wildcards - ruby

I wrote a simple script that writes all given arguments to a single text file, separated by newline. I'd like to pass a list of files to it using OptionParser. I would like to add a couple of files using wildcards like /dir/*.
I tried this:
opts = OptionParser.new
opts.on('-a', '--add FILE') do |s|
  puts "DEBUG: before #{s}"
  @options.add = s
  puts "DEBUG: after #{@options.add}"
end
...
def process_arguments
  @lines_to_add = Dir.glob @options.add
end
But when I add files like this:
./script.rb -a /path/*
I always get only the first file in the directory. All the debug output shows only the first file of the directory, and it seems as if OptionParser does some magic interpretation.
Does anyone know how to handle this?

You didn't mention which operating system you are using (it matters).
On Windows, whatever you type on the command line gets passed to the program without modification. So if you type
./script.rb -a /path/*
then the arguments to the program contain "-a" and "/path/*".
On Unix and other systems with similar shells, the shell performs argument expansion, which automatically expands wildcards in the command line. So when you type the same command above, the shell looks for the files matching /path/* (the entries of the /path directory) and expands the command-line arguments before your program runs. So the arguments to your program might be "-a", "/path/file1", and "/path/file2".
An important point is that the script cannot find out whether argument expansion happened, or whether the user actually typed all those filenames out on the command line.

As mentioned above, the command line is expanded by the shell before the command is handed off to Ruby. The wildcard is replaced by a list of space-delimited filenames.
You can see this for yourself in bash: type something like echo * at the command line and then, instead of hitting Return, hit Esc then *. You should see the * expanded into the list of matching files.
After hitting Return those names will be added to the ARGV array. OptionParser will walk through ARGV and find the flags you defined, grab the following elements if necessary, then remove them from ARGV. When OptionParser is finished any ARGV elements that didn't fit into the options will remain in the ARGV array where you can get at them.
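To watch this happen without a shell in the way, the sketch below hands OptionParser an argument array built by hand, mimicking what the shell would produce for -a /path/*. The file names are made up for illustration, and parse! is given the array explicitly instead of reading ARGV.

```ruby
require 'optparse'

# What the shell would pass after expanding `-a /path/*` (hypothetical names).
argv = ['-a', '/path/file1', '/path/file2', '/path/file3']

added = nil
parser = OptionParser.new
parser.on('-a', '--add FILE') { |f| added = f }

# parse! consumes '-a' plus exactly one argument and deletes them from argv.
parser.parse!(argv)

puts "added => #{added}"            # only the first expanded name
puts "left  => #{argv.join(', ')}"  # the other names are still in argv
```

This is exactly the "only the first file" symptom from the question: the option takes one value, and the rest of the expansion stays behind.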
In your code, you are looking for a single parameter for the '-a' or '--add FILE' option. OptionParser has an Array type that grabs comma-separated elements from the command line, but it will not grab subsequent space-delimited ones.
require 'optparse'
options = []
opts = OptionParser.new
opts.on('-a', '--add FILE', Array) do |s|
options << s
end.parse!
print "options => ", options.join(', '), "\n"
print "ARGV => ", ARGV.join(', '), "\n"
Save that to a file and try your command line with -a one two three, then with -a one,two,three. You'll see how the Array option grabs the elements differently depending on whether there are commas or spaces between the parameters.
Because the * wildcard gets replaced with space-delimited filenames, you'll have to post-process ARGV after OptionParser has run against it, or programmatically glob the directory and build the list that way. ARGV holds all the files except the one picked up by the -a option, so, personally, I'd drop the -a option and let ARGV contain all the files.
You will have to glob the directory yourself if * expands to too many files and the command line exceeds the system's argument-length limit. You'll know if that happens because the OS will complain (typically with "Argument list too long").
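A minimal sketch of the "let ARGV contain all the files" approach. The -o/--output option and its default file name are hypothetical, included only to show that real flags are still consumed while the expanded filenames are left behind. The argument array is hard-coded to stand in for ARGV.

```ruby
require 'optparse'

# Stand-in for ARGV after the shell has expanded the wildcard (hypothetical).
argv = ['-o', 'list.txt', '/path/file1', '/path/file2']

output = 'out.txt' # hypothetical default output file
OptionParser.new do |opts|
  # Hypothetical flag, just to show that real options are still parsed.
  opts.on('-o', '--output FILE') { |f| output = f }
end.parse!(argv)

# Everything OptionParser did not consume is the list of files.
files = argv
contents = files.join("\n") + "\n" # one name per line, as the script writes them
puts "#{output}:"
print contents
```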

The shell is expanding the argument before it gets passed to your program. Either keep consuming filenames until you reach another option, or have the user escape the wildcards (e.g. ./script.rb -a '/path/*') and glob them yourself.
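The first suggestion, consuming filenames until the next option, could look like the sketch below, run on ARGV before (or instead of) OptionParser. take_files_after is a hypothetical helper, and it assumes no filename starts with '-', since such a name would end the scan early.

```ruby
# Hypothetical helper: collect every word after `flag` up to the next word
# that looks like an option (starts with '-'). Returns [files, remaining_argv].
def take_files_after(flag, argv)
  idx = argv.index(flag)
  return [[], argv.dup] unless idx

  rest = argv[(idx + 1)..-1]
  files = rest.take_while { |word| !word.start_with?('-') }
  remaining = argv[0...idx] + rest.drop(files.size)
  [files, remaining]
end

files, remaining = take_files_after('-a', ['-a', '/path/file1', '/path/file2', '-v'])
p files      # => ["/path/file1", "/path/file2"]
p remaining  # => ["-v"]
```

The remaining array can then be passed to OptionParser#parse! as usual.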

What's happening is the shell is expanding the wildcard before Ruby gets to it. So really you are processing:
./script.rb -a /path/file1 /path/file2 ......
Put quotes around /path/* to avoid the shell expansion and pass the wildcard to Ruby:
./script.rb -a '/path/*'
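With the pattern quoted, Ruby receives the literal /path/* and can expand it itself with Dir.glob. A minimal sketch, assuming -a may be repeated and each value is a glob pattern; collect_files is a hypothetical name, and the demo builds a throwaway directory so it runs anywhere.

```ruby
require 'optparse'
require 'tmpdir'

# Each -a value is treated as a glob pattern and expanded by Ruby itself.
def collect_files(argv)
  files = []
  OptionParser.new do |opts|
    opts.on('-a', '--add PATTERN') { |pattern| files.concat(Dir.glob(pattern).sort) }
  end.parse!(argv)
  files
end

# Demo against a temporary directory; the file names are arbitrary.
found = Dir.mktmpdir do |dir|
  %w[a.txt b.txt].each { |name| File.write(File.join(dir, name), '') }
  # Equivalent to running: ./script.rb -a '<dir>/*'
  collect_files(['-a', File.join(dir, '*')]).map { |path| File.basename(path) }
end
p found # => ["a.txt", "b.txt"]
```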

Related

How to separate out command with its arguments coming in format of string using ruby

I have to pass a command with its arguments in a scheduled task, while separating the arguments from the command. I used:
split(/(?=\s-)/)
to do this, but it doesn't work when the argument is not in -arg format.
Commands can be passed in formats such as:
"ping http://www.google.com" (here the URL is the argument)
"abc-abc -V"
"abc-abc -L c:\\folder name\\test.log"
'"C:\\Program Files\\example\\program.exe" -arg1 -arg2'
"C:\\Program Files\\example\\program.exe"
To be clear, these commands are not passed as command-line arguments, so they are not available in ARGV.
The command is set through a command property that accepts a string:
command '"C:\\Program Files\\example\\program.exe" -arg1 -arg2'
Use Shellwords.split, from the standard library (require 'shellwords'):
Shellwords.split("ping http:\\www.google.com here url is argument")
#=> ["ping", "http:www.google.com", "here", "url", "is", "argument"]
Shellwords.split("abc-abc -V")
#=> ["abc-abc", "-V"]
Shellwords.split("abc-abc -L c:\\folder name\\test.log")
#=> ["abc-abc", "-L", "c:folder", "nametest.log"]
Shellwords.split('"C:\\Program Files\\example\\program.exe" -arg1 -arg2')
#=> ["C:\\Program Files\\example\\program.exe", "-arg1", "-arg2"]
Shellwords.split('"C:\\Program Files\\example\\program.exe"')
#=> ["C:\\Program Files\\example\\program.exe"]
No need to reinvent the wheel with a custom regex/splitter, or an external system call.
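Shellwords follows POSIX shell rules, which is why backslashes outside quotes disappear in the results above (c:folder, nametest.log). The sketch below splits the program from its arguments and shows that a quoted Windows path keeps its backslashes. The command strings are adapted from the question.

```ruby
require 'shellwords'

# Quoted program path: inside double quotes Shellwords only unescapes
# \$, \`, \", \\ and \newline, so the Windows separators survive.
program, *args = Shellwords.split('"C:\\Program Files\\example\\program.exe" -arg1 -arg2')
puts program # C:\Program Files\example\program.exe
p args       # => ["-arg1", "-arg2"]

# Unquoted Windows path: each backslash escapes the next character and is lost.
p Shellwords.split('abc-abc -L c:\\folder\\test.log')
# => ["abc-abc", "-L", "c:foldertest.log"]

# Quoting the path preserves it, spaces included.
p Shellwords.split('abc-abc -L "c:\\folder name\\test.log"')
# => ["abc-abc", "-L", "c:\\folder name\\test.log"]
```

So for Windows-style commands, Shellwords works well as long as paths containing backslashes are quoted.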
It seems to me that if there's no consistent pattern to your command syntax, then any regex based approach will inevitably fail. It seems better instead to solve this problem the way a human would, i.e. with some knowledge of context.
In a *nix terminal, you can use the compgen command to list available commands. This Ruby script invokes that command to print the first 5 options from that list:
list = `cd ~ && compgen -c`
list_arr = list.split("\n")
list_arr.first(5).each { |x| puts x }
(The cd in the first line seems to be needed because of the context in which my Ruby is running with rvm.) For Windows, you may find this thread a useful starting point.
I'd match against the elements of this list to identify my commands, and take it from there.
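A minimal sketch of that matching step, assuming the command names have already been collected into an array (hard-coded here instead of read from compgen). split_known_command is a made-up helper; it picks the longest known name the string begins with and treats the rest as the arguments. It cannot recognize a quoted program path such as "C:\Program Files\example\program.exe", which would not appear in compgen's list anyway.

```ruby
# Hypothetical helper: find the longest known command name that the string
# starts with, and return [command, rest_of_string]; nil if none matches.
def split_known_command(str, known_commands)
  cmd = known_commands
        .select { |c| str == c || str.start_with?("#{c} ") }
        .max_by(&:length)
  return nil unless cmd

  [cmd, str[cmd.length..-1].strip]
end

# In practice the list could come from `compgen -c`; here it is hard-coded.
known = %w[ping abc-abc]
p split_known_command('ping http://www.google.com', known)
# => ["ping", "http://www.google.com"]
p split_known_command('abc-abc -V', known)
# => ["abc-abc", "-V"]
```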
Tom Lord's answer is far better than this one.
You probably want to look at OptionParser or GetoptLong if you need to parse command-line arguments provided to a Ruby program.
If you are interested in parsing strings that may or may not be commands with arguments, here's a quick-and-dirty approach:
I'd use scan instead of split with the following regex: /(".*"|[\w\:\:\.\-\\]+)/.
Best results come from: 'some string'.scan(/(".*"|[\w\:\:\.\-\\]+)/).flatten:
["ping", "http:\\www.google.com"]
["abc-abc", "-V"]
["abc-abc", "-L", "c:\\folder\\", "name\\test.log"]
# Technically, this is wrong, but so is the non-escaped whitespace.
["\"C:\\Program Files\\example\\program.exe\"", "-arg1", "-arg2"]
["\"C:\\Program Files\\example\\program.exe\""]

Bash command line parsing containing whitespace

I have to parse a command-line argument in a shell script as follows:
cmd --a=hello world good bye --b=this is bash script
I need to parse the arguments of "a", i.e. "hello world ...", which are separated by whitespace, into an array.
That is, the a_input array should contain "hello", "world", "good" and "bye".
Similarly for "b" arguments as well.
I tried it as follows:
--a=*)
a_input={1:4}
a_input=$#
for var in $a_input
#keep parsing until next --b or other argument is seen
done
But the above method is crude. Is there any other workaround? I cannot use getopts.
The simplest solution is to get your users to quote the arguments correctly in the first place.
Barring that, you can manually loop until you reach the end of the arguments or hit the next --argument. That means you can't include a word that starts with -- in an argument value, unless you also check each such word against your list of valid options, which then only excludes the -- words that actually match an option.
Adding to Etan Reisner's answer, which is absolutely correct:
I personally find bash a bit cumbersome when array and string processing gets more complex. If you really have the unusual requirement that the caller must not be required to use quotes, I would write an intermediate script in, say, Ruby or Perl that collects the parameters properly, wraps quoting around them, and passes them on to the script that was originally supposed to be called, even if this costs an additional process.
For example, a Ruby one-liner such as
system('your_bash_script_here.sh', *ARGV.slice_before { |w| w.start_with?('--') }.map { |g| g.join(' ') })
would do this sanitizing and then invoke your script.
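The regrouping itself is easy to express in Ruby. The sketch below, with a hypothetical group_options helper, collects the words that follow each --name=value option into one array per option, which is the a_input/b_input structure the question asks for. It assumes every option uses the --name=value form and ignores words that appear before the first option.

```ruby
# Collect the words that follow each --name=value option, keyed by name.
def group_options(argv)
  groups = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  argv.each do |word|
    if (m = word.match(/\A--([^=]+)=(.*)\z/))
      current = m[1]
      groups[current] << m[2] unless m[2].empty?
    elsif current
      groups[current] << word
    end
  end
  groups
end

args = %w[--a=hello world good bye --b=this is bash script]
groups = group_options(args)
p groups['a'] # => ["hello", "world", "good", "bye"]
p groups['b'] # => ["this", "is", "bash", "script"]
```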

jamplus: link command line too long for osx

I'm using jamplus to build a vendor's cross-platform project. On osx, the C tool's command line (fed via clang to ld) is too long.
Response files are the classic answer to command lines that are too long: jamplus states in the manual that one can generate them on the fly.
The example in the manual looks like this:
actions response C++
{
$(C++) @@(-filelist @($(2)))
}
Almost there! If I explicitly override the C.Link action, like this:
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,@($(2:TC)) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
in my jamfile, I get the command line I need passed through to the linker, but the entries in the response file aren't newline-separated, so the link fails (osx ld requires newline-separated entries).
Is there a way to expand a jamplus list joined with newlines? I've tried using the join expansion $(LIST:TCJ=\n) without luck. $(LIST:TCJ=#(\n)) doesn't work either. If I can do this, the generated file would hopefully be correct.
If not, what jamplus code can I use to override the link command for clang, and generate the contents on the fly from a list? I'm looking for the least invasive way of handling this - ideally, modifying/overriding the tool directly, instead of adding new indirect targets wherever a link is required - since it's our vendor's codebase, as little edit as possible is desired.
The syntax you are looking for is:
newLine = "
" ;
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,@($(2:TCJ=$(newLine))) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
To be clear (I'm not sure how StackOverflow will format the above), the newLine variable should be defined by typing:
newLine = "" ;
And then place the caret between the two quotes and hit Enter. You can use this same technique for certain other characters, e.g.
tab = " " ;
Again, start with tab = "" and then place the caret between the quotes and hit Tab. In the listing above the quotes contain spaces rather than a real tab character, which is wrong, but hopefully you get the idea. Another useful one to have is:
dollar = "$" ;
The last one is useful because $ normally introduces a variable reference, so a dollar variable comes in handy when you actually want a literal dollar sign. For what it is worth, the Jambase I am using (the one that ships with my JamPlus) has this:
SPACE = " " ;
TAB = " " ;
NEWLINE = "
" ;
Around line 28...
I gave up on trying to use escaped newlines and other language-specific characters within string joins. Maybe there's an elegant way to do that, but it was too thorny to discover.
Use a multi-step shell command with multiple temp files.
For jamplus (and maybe other jam variants), the body of an actions response {} block, between the curly braces, becomes an inline shell script. The response-file syntax @(<value>) writes <value> to a temporary file and expands to that file's path, which can be assigned to a variable within the shell script.
Thus, code like:
actions response C.Link
{
_RESP1=@($(2:TCJ=#)#$(NEEDLIBS:TCJ=#)#$(LINKLIBS:TCJ=#))
_RESP2=@()
perl -pe "s/[#]/\n/g" < $_RESP1 > $_RESP2
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,$_RESP2
}
creates a pair of temp files whose paths are assigned to the shell variables _RESP1 and _RESP2. The file at $_RESP1 receives the expanded lists joined with a # character. A Perl one-liner replaces each # with a newline and writes the result to $_RESP2. The link then proceeds as planned, and jamplus cleans up the intermediate files.
I wasn't able to do this with characters like :;\n, but # worked as long as it had no adjacent whitespace. Not completely satisfied, but moving on.

How to bring system grep results into ruby

I'm currently grep-ing the system and returning the results into ruby to manipulate.
def grep_system(search_str, dir, filename)
  cmd_str = "grep -R '#{search_str}' #{dir} > #{filename}"
  system(cmd_str)
  lines_array = File.read(filename).split("\n")
end
As you can see, I'm just writing the results from the grep into a temp file, and then re-opening that file with "File.open".
Is there a better way to do this?
Never ever do anything like this:
cmd_str ="grep -R '#{search_str}' #{dir}"
Don't even think about it. Sooner or later search_str or dir will contain something that the shell will interpret in unexpected ways. There's no need to invoke a shell at all, you can use Open3.capture3 thusly:
require 'open3'
lines = Open3.capture3('grep', '-R', search_str, dir).first
# or, destructuring the [stdout, stderr, status] result:
lines, _ = Open3.capture3('grep', '-R', search_str, dir)
That will leave you with a newline-delimited list in lines, and from there it should be easy.
That will invoke grep directly without using a shell at all. capture3 also nicely lets you ignore (or capture) the command's stderr rather than leaving it be printed wherever your stderr goes by default.
If you use this form of capture3, you don't have to worry about shell metacharacters or quoting or unsanitary inputs.
Similarly for system, if you want to use system with arguments you'd use the multi-argument version:
system('ls', some_var)
instead of the potentially dangerous:
system("ls #{some_var}")
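Putting that together, a shell-free version of the question's grep_system might look like the sketch below. The -e and -- arguments are an addition, not from the answer above: they stop grep from treating a pattern or directory that begins with - as an option. It assumes grep's usual exit codes (0 for matches, 1 for no matches, 2 or more on errors).

```ruby
require 'open3'
require 'tmpdir'

def grep_system(search_str, dir)
  # No shell is involved: each array element reaches grep as one argument.
  out, err, status = Open3.capture3('grep', '-R', '-e', search_str, '--', dir)
  # grep exits 0 when lines matched and 1 when none did; anything else is an error.
  raise "grep failed: #{err}" unless [0, 1].include?(status.exitstatus)

  out.lines(chomp: true)
end

# Demo: quotes and semicolons in the pattern are harmless here.
matches = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'notes.txt'), "it's a match; really\nnothing here\n")
  grep_system("it's a match;", dir)
end
p matches
```

Each returned line has grep's usual path:line form, because -R prefixes matches with the file name.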
You shouldn't need to pass an argument for the temporary filename. After all, writing to and reading from a temporary file is something you should avoid if possible.
require "open3"
def grep_system(search_str, dir)
  Open3.capture2('grep', '-R', search_str, dir).first.each_line.to_a
end
Instead of using system(cmd_str), you could use:
results = `#{cmd_str}`
Yes, there are a few better ways. The easiest is just to assign the result of invoking the command with backticks to a variable:
def grep_system(search_str, dir, filename)
cmd_str ="grep -R '#{search_str}' #{dir}"
results = `#{cmd_str}`
  lines_array = results.split("\n")
end

Convert Hex STDIN / ARGV / gets to ASCII in ruby

My question is how I can convert input from ARGV or gets (STDIN) from hex to ASCII.
I know that if I assign a hex-escaped string to a variable, it shows up converted once I print it.
For example:
hex_var = "\x41\x41\x41\x41"
puts hex_var
The result will be
AAAA
but I need to get the value from command line by (ARGV or gets)
Say I have these lines:
s = ARGV
puts s
# another idea
puts s[0].gsub('x' , '\x')
then I ran
ruby gett.rb \x41\x41\x41\x41
I got
\x41\x41\x41\x41
Is there a way to get this to work?
There are a couple of problems you're dealing with here. The first you've already tried to address, but I don't think your solution is ideal. The backslashes you pass in the command-line argument are consumed by the shell and never reach the Ruby script. If you're simply going to do a gsub in the script, there's no reason to pass them in at all. Also, doing it your way means every 'x' in the arguments gets replaced, even ones that aren't part of a hex escape. It would be better to double-escape the \ in the argument if possible. Without knowing where the values come from, it's hard to say which way would actually be better.
ruby gett.rb \\x41\\x41
That way ARGV will actually get '\x41\x41', which is closer to what you want.
It's still not exactly what you want, though, because ARGV arguments are taken literally, without escape processing (as though they were in single quotes). So the string holds a literal backslash followed by x41, not the character A. Essentially you need to take that and re-evaluate it as though it were in double quotes.
eval('"%s"' % s)
where s is the string.
So to put it all together, you could end up with either of these:
# ruby gett.rb \x41\x41
ARGV.each do |s|
s = s.gsub('x' , '\x')
p eval('"%s"' % s)
end
# => "AA"
# ruby gett.rb \\x41\\x41
ARGV.each do |s|
p eval('"%s"' % s)
end
# => "AA"
Backslashes entered on the command line are interpreted by the shell and do not make it into your Ruby script, unless you enter two backslashes in a row. In that case your script gets a literal backslash, with no automatic conversion of the hexadecimal character codes following it.
You can convert these escaped codes to characters manually if you replace the last line of your script with this:
puts s[0].gsub(/\\x([[:xdigit:]]{1,2})/) { $1.hex.chr }
Then run it with double backlashed input:
$ ruby gett.rb \\x41\\x42\\x43
ABC
When fetching user input through gets or similar, the user only needs to type a single backslash for each character escape, since it will indeed be passed to your script as a literal backslash and is therefore handled correctly by the above gsub call.
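The same substitution can be wrapped in a small function and applied to every argument, or to each line read with gets. Unlike the eval approach, it only turns \xHH sequences into characters and never executes the input. The function name decode_hex_escapes is made up for this sketch.

```ruby
# Hypothetical helper: replace each literal \xHH sequence with its character.
# \h matches one hexadecimal digit; nothing else in the string is touched.
def decode_hex_escapes(str)
  str.gsub(/\\x(\h{1,2})/) { Regexp.last_match(1).hex.chr }
end

# Decode each command-line argument, falling back to a sample when none is given.
inputs = ARGV.empty? ? ['\x41\x42\x43'] : ARGV
inputs.each { |arg| puts decode_hex_escapes(arg) }

# For interactive input the user types single backslashes:
#   line = gets
#   puts decode_hex_escapes(line) if line
```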
An alternative when parsing command-line arguments is to let the shell interpret the character escapes for you. How to do this depends on the shell you are using. In bash, it can be done like this:
$ echo $'\x41\x42\x43'
ABC
$ ruby -e 'puts ARGV' $'\x41\x42\x43'
ABC

Resources