In Ruby, how can I escape a comma in an argument parameter with OptionParser? - ruby

Given the following code:
options = {}
optparse = OptionParser.new do |opts|
opts.on('-t', '--thing [THING1,THING2]', Array, 'Set THING1, THING2') do |t|
options[:things] = t
end
end
If THING1 has a comma in it, how can I prevent OptionParser from splitting on it?
Sample case: ./scrit.rb -t 'foo,bar',baz. In this case I want options[:things] to be ['foo,bar', 'baz'].
Is this even possible?

If you run:
./scrit.rb -t 'foo,bar',baz
the shell passes this ARGV to the script:
["-t", "foo,bar,baz"]
The shell removes the quotes, converting 'foo,bar',baz to foo,bar,baz, before Ruby ever sees the argument:
$ strace -e trace=execve ./scrit.rb -t 'foo,bar',baz
execve("./scrit.rb", ["./scrit.rb", "-t", "foo,bar,baz"], [/* 52 vars */]) = 0
execve("/home/scuawn/bin/ruby", ["ruby", "./scrit.rb", "-t", "foo,bar,baz"], [/* 52 vars */]) = 0
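The same point can be checked from Ruby without strace. The following is a minimal sketch that hands OptionParser the ARGV the shell would produce; the helper name parse_things is made up for illustration and is not part of the original script.

```ruby
require 'optparse'

# Hypothetical helper: parse an explicit argv array instead of the real ARGV,
# so the effect of shell quote removal can be simulated.
def parse_things(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('-t', '--thing [THING1,THING2]', Array, 'Set THING1, THING2') do |t|
      options[:things] = t
    end
  end.parse(argv)
  options
end

# This is what the script receives for: ./scrit.rb -t 'foo,bar',baz
p parse_things(['-t', 'foo,bar,baz'])[:things]
```

By the time OptionParser runs, the quotes are gone, so the Array conversion has no way to tell which comma was quoted.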
You can use another delimiter:
opts.on('-t', '--thing [THING1,THING2]', Array, 'Set THING1, THING2') do |t|
options[:things] = t
options[:things][0] = options[:things][0].split(":")
end
$ ./scrit.rb -t foo:bar,baz
[["foo", "bar"], "baz"]
Or:
opts.on('-t', '--thing [THING1,THING2]', Array, 'Set THING1, THING2') do |t|
options[:things] = t
options[:things] = options[:things].length == 3 ? [[options[:things][0],options[:things][1]],options[:things][2]] : options[:things]
end
$ ./scrit.rb -t foo,bar,baz
[["foo", "bar"], "baz"]

First of all, the shell [1] yields the same final value for all of the following quoting variations:
./scrit.rb -t 'foo,bar',baz
./scrit.rb -t foo,'bar,baz'
./scrit.rb -t 'foo,bar,baz'
./scrit.rb -t foo,bar,baz
./scrit.rb -t fo"o,b"ar,baz
./scrit.rb -t foo,b\ar,baz
# obviously many more variations are possible
You can verify this like so:
ruby -e 'f=ARGV[0];ARGV.each_with_index{|a,i|puts "%u: %s <%s>\n" % [i,a==f,a]}'\
'foo,bar',baz foo,'bar,baz' 'foo,bar,baz' foo,bar,baz fo"o,b"ar,baz foo,b\ar,baz
[1] I am assuming a Bourne-like shell (some sh-variant like zsh, bash, ksh, dash, et cetera).
If you want to switch to some other separator, you might do it like this:
split_on_semicolons = Object.new
OptionParser.accept split_on_semicolons do |s,|
s.split ';'
end
⋮
opts.on('-t', '--thing [THING1;THING2]', split_on_semicolons, 'Set THING1, THING2 (semicolon must be quoted to protect it from the shell)') do |t|
options[:things] = t
end
The shell gives special meaning to the semicolon, so it must be escaped or quoted (otherwise it serves as an unconditional command separator, e.g. echo foo; sleep 2; echo bar):
./scrit.rb -t foo,bar\;baz
./scrit.rb -t foo,bar';'baz
./scrit.rb -t 'foo,bar;baz'
# et cetera
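For reference, here is a self-contained sketch of the semicolon approach that can be run without a shell. The constant SPLIT_ON_SEMICOLONS and the helper parse_things are illustrative names, and the nil guard is an addition for the case where the optional argument is omitted.

```ruby
require 'optparse'

# Marker object used only as the lookup key for the custom conversion.
SPLIT_ON_SEMICOLONS = Object.new
OptionParser.accept(SPLIT_ON_SEMICOLONS) { |s,| s && s.split(';') }

# Hypothetical helper: parse an explicit argv array and return the list.
def parse_things(argv)
  things = nil
  OptionParser.new do |opts|
    opts.on('-t', '--thing [THING1;THING2]', SPLIT_ON_SEMICOLONS) { |t| things = t }
  end.parse(argv)
  things
end

# Equivalent to running: ./scrit.rb -t 'foo,bar;baz'
p parse_things(['-t', 'foo,bar;baz'])
```

Because the split happens on semicolons only, the comma inside the first value survives untouched.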
The “parsing” done when you specify Array is almost exactly a basic str.split(',') (it also drops empty string values), so there is no way to directly specify an escape character.
If you want to stick with commas but introduce an “escape character”, then you could post-process the values a bit in your OptionParser#on block to stitch certain values back together:
# use backslash as an after-the-fact escape character
# in a sequence of string values,
# if a value ends with an odd number of backslashes, then
# the last backslash should be replaced with
# a comma concatenated with the next value
# a backslash before any other single character is removed
#
# basic unsplit: (note doubled backslashes due to writing these as Ruby values)
# %w[foo\\ bar baz] => %w[foo,bar baz]
#
# escaped, trailing backslash is not an unsplit:
# %w[foo\\\\ bar baz] => %w[foo\\ bar baz]
#
# escaping [other, backslash, split], also consecutive unsplits
# %w[f\\o\\\\o\\ \\\\\\bar\\\\\\ baz] => %w[fo\\o,\\bar\\,baz]
def unsplit_and_unescape(orig_values)
values = []
incompleteValue = nil
orig_values.each do |val|
incomplete = /\\*$/.match(val)[0].length.odd?
val.gsub! /\\(.)/, '\1'
val = incompleteValue + ',' + val if incompleteValue
if incomplete
incompleteValue = val[0..-2]
else
values << val
incompleteValue = nil
end
end
if incompleteValue
raise ArgumentError, 'Incomplete final value'
end
values
end
⋮
opts.on('-t', '--thing [THING1,THING2]', Array, 'Set THING1, THING2 (use \\, to include a comma)') do |t|
options[:things] = unsplit_and_unescape(t)
end
You could then run it from the shell like this (the backslash is also special to the shell, so it must be escaped or quoted [2]):
./scrit.rb -t foo\\,bar,baz
./scrit.rb -t 'foo\,bar,baz'
./scrit.rb -t foo'\,'bar,baz
./scrit.rb -t "foo\\,bar,baz"
./scrit.rb -t fo"o\\,ba"r,baz
# et cetera
[2] Unlike in Ruby, the shell’s single quote is completely literal (e.g. no backslashes are interpreted), so it is often a good choice when you need to embed any other shell-special characters (like backslashes and double quotes).
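As a variation on the backslash-escape idea, the raw argument can be split on unescaped commas in one pass, rather than letting Array split first and stitching values back together. This is a different technique from the unsplit_and_unescape function above, sketched under simplifying assumptions: the helper split_escaped and the ESCAPED_LIST marker are made-up names, empty values are skipped, and a lone trailing backslash is silently dropped instead of raising.

```ruby
require 'optparse'

# Hypothetical helper: split on commas that are not escaped by a backslash,
# then remove the escaping backslashes from each value.
def split_escaped(str)
  # Each token is a run of "backslash + any char" pairs or plain characters
  # that are neither a comma nor a backslash.
  str.scan(/(?:\\.|[^,\\])+/).map { |v| v.gsub(/\\(.)/, '\1') }
end

# Register the helper as a custom OptionParser conversion.
ESCAPED_LIST = Object.new
OptionParser.accept(ESCAPED_LIST) { |s,| s && split_escaped(s) }

things = nil
OptionParser.new do |opts|
  opts.on('-t', '--thing [THING1,THING2]', ESCAPED_LIST) { |t| things = t }
end.parse(['-t', 'foo\,bar,baz']) # what the script sees for: -t 'foo\,bar,baz'

p things
```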

Related

Introducing blank lines separating matched sections

I have this bash function that prints lines between Mode: org and # End of org. I would like for matched sections to be separated by a blank line.
capture ()
{
local efile="$1"
local charcl begorg endorg
charcl_ere='^[[:space:]]*([#;!]+|#c|\/\/)[[:space:]]*'
charcl_bre='^[[:space:]]*\([#;!]\+\|#c\|\/\/\)[[:space:]]*'
begorg="${charcl_bre}"'Mode: org$'
endorg="${charcl_bre}"'# End of org$'
mdr='^Mode: org$' ; edr='^# End of org$'
sed -n "/$begorg/,/$endorg/ s/$charcl_bre//p" "$efile" |
sed "/$mdr\|$edr/d"
}
This is the input
cat /home/flora/docs/recnotes.txt
## Mode: org
# Assigns shell positional parameters or changes the values of shell
# options. The -- option assigns the positional parameters to the
# arguments of {set}, even when some of them start with an option
# prefix `-'.
## # End of org
;; Mode: org
; Assigns shell positional parameters or changes the values of shell
; options. The -- option assigns the positional parameters to the
; arguments of {set}, even when some of them start with an option
; prefix `-'.
;; # End of org
#c Mode: org
#c Assigns shell positional parameters or changes the values of shell
#c options. The -- option assigns the positional parameters to the
#c arguments of {set}, even when some of them start with an option
#c prefix `-'.
#c # End of org
I assumed by lines between that you didn't mean including the lines with Mode: org and # End of org. You didn't specify that. Try:
perl -nlE'$p=/Mode: org/?1:/# End of org/?0*say$":$p&&$p+say' recnotes.txt
If it's important to avoid the empty separation line at the end, try:
perl -nlE'$p=/Mode: org/?1:/# End of org/?0*do{eof||say("")}:$p&&$p+say' r.txt
I have come up with a straightforward way using
sed -n "/$begorg/,/$endorg/ s/$charcl_bre//p" "$efile" |
sed "/$mdr/d" | sed "s/$edr//"

Insert multiple lines and keep their indentation with sed

I have some Python functions that I want to insert in a file. Inserting multiple lines in itself works well using a variable and some \n, but the indentation isn't kept. Because it's Python code, that's a big issue: the code can't work as it is.
Here is what I tried:
cat sed-insertlines.sh
#!/bin/bash
read -r -d '' lines_to_insert << 'EOF'
def string_cleanup(x, notwanted):\n
for item in notwanted:\n
x = re.sub(item, '', x)\n
return x\n
EOF
lines_to_insert=$(echo ${lines_to_insert} )
sed -i "/import re # Regular Expression library/a $lines_to_insert" sed-insertlines.txt
But here is what I get in the end when I cat sed-insertlines.txt:
#!/bin/python
import re # Regular Expression library
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
def string_replace(i_string, pattern, newpattern):
string_corrected = re.sub(pattern, newpattern, i_string)
return string_corrected
Lines are there but the indentation is gone.
First, let's get the data cleanly into a shell variable. Here's one way:
lines_to_insert=$(cat<<'EOF'
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
EOF
)
Note that there are no \n added; you can just use the text you want to insert unmodified with the sole restriction that it can't contain a line consisting of exactly EOF (and if it does, you can change the here-doc delimiter.) Unfortunately, the later use of sed will modify the text by interpreting some backslash-sequences.
The correct syntax for the sed a command would be the following:
sed -i '/^import re/a \
def string_cleanup(x, notwanted):\
for item in notwanted:\
x = re.sub(item, '', x)\
return x
'
(The commonly seen sed 'a line to insert' is not POSIX standard, and does not allow you to put leading spaces on the line. The correct syntax is as shown above: an a followed by whitespace, followed by a continuation marker and a newline.)
Note that every line except the last ends with a continuation marker (a trailing backslash). We could have put those in the text above, but that would defeat the goal of allowing you to use precisely the text you want inserted.
Instead, when we interpolate the shell variable into the sed command, we'll insert the backslashes using the global search-and-replace syntax:
# The following works with bash 4.3 and up
sed -i.bak "/^import re/a \
${lines_to_insert//$'\n'/$'\\\n'}
" sed-insertlines.txt
# Prior to v4.3, quoting worked differently in replacement
# patterns, and there was a bug with `$'...'` quoting. The
# following will work with all bashes I tested (starting with v3.2):
nl=$'\n' bsnl=$'\\\n'
sed -i.bak "/^import re/a \
${lines_to_insert//$nl/$bsnl}
" sed-insertlines.txt
Another solution is to use the mapfile command to read the lines into an array:
mapfile -t lines_to_insert <<'EOF'
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
EOF
Now we can add the backslashes using printf:
sed -i.bak "/^import re/a \
$(printf '%s\\\n' "${lines_to_insert[@]}")
" sed-insertlines.txt
(The search-and-replace syntax would work on the array as well, but I think the printf command is more readable.)
Unfortunately, that adds an extra newline after the text because all of the lines in the original text were continued. If that's undesired, it could easily be removed in the second solution by inserting the backslash and newline at the beginning of the printf instead of the end, making a slightly less-readable command:
sed -i.bak "/^import re/a $(printf '\\\n%s' "${lines_to_insert[@]}")
" sed-insertlines.txt
Finally, based on a nice answer by Benjamin W, here's a version which uses the sed r command and process substitution (to avoid a temporary file):
sed '/^import re/r '<(cat<<'EOF'
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
EOF
) sed-insertlines.txt
I would use the sed r command, which inserts the contents of a file after the current cycle:
#!/bin/bash
# Write code to be inserted into 'insertfile' with proper indentation
cat <<'EOF' > insertfile
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
EOF
# Sed with r command
sed -i '/import re # Regular Expression library/r insertfile' sed-insertlines.txt
# Remove temp file
rm -f insertfile
resulting in
import re # Regular Expression library
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
def string_replace(i_string, pattern, newpattern):
string_corrected = re.sub(pattern, newpattern, i_string)
return string_corrected
Here is an awk solution, in case you're interested:
python_file:
#!/bin/python
import re # Regular Expression library
def string_replace(i_string, pattern, newpattern):
string_corrected = re.sub(pattern, newpattern, i_string)
return string_corrected
Our Script
#!/bin/bash
read -rd '' lines_to_insert << 'EOF'
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
EOF
awk -v from_shell="$lines_to_insert" '
{
if ($0 ~ /import re # Regular Expression library/){
printf "%s\n%s\n",$0,from_shell
}
else{
print $0
}
}' python_file
Output:
#!/bin/python
import re # Regular Expression library
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
def string_replace(i_string, pattern, newpattern):
string_corrected = re.sub(pattern, newpattern, i_string)
return string_corrected
Note :
I have removed the \ns from the $lines_to_insert.

Evaluate a string with indexed array as values

I would like to take a string that contains positional argument markers (not named), supply it with an array (not hash) of values, and have it evaluated.
The use case as an example would be somewhat like ARGV.
For example,
# given:
string = "echo $1 ; echo $#"
values = ["hello", "world"]
# expected result:
"echo hello ; echo hello world"
The below function is the best I could come up with:
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
# Create a variable that can replace $# with all arguments, and quote
# arguments that had "more than one word" originally
all_arguments = arguments.map{|a| a =~ /\s/ ? "\"#{a}\"" : a}.join ' '
# Replace all $1 - $9 with their respective argument ($1 ==> arguments[0])
string.gsub!(/\$(\d)/) { arguments[$1.to_i - 1] }
# Replace $# or $* with all arguments
string.gsub!(/\$[*|#]/, all_arguments)
return string
end
And it seems to me like it can and should be simpler.
I was hoping to find something that is closer to the Kernel.sprintf method of doing things - like "string with %{marker}" % {marker: 'value'}
So, although this issue is almost solved for me (I think), I would love to know if there is something I missed that can make it more elegant.
It seems like you're trying to reproduce Bash-style variable expansion, which is an extremely complex problem. At the very least, though, you can simplify your code in two ways:
Use Kernel.sprintf's built in positional argument feature. The below code does this by substituting e.g. $1 with the sprintf equivalent %1$s.
Use Shellwords from the standard library to escape arguments with spaces etc.
require 'shellwords'
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
tmpl = string.gsub(/\$(\d+)/, '%\1$s')
(tmpl % arguments).gsub(/\$[*#]/, arguments.shelljoin)
end
string = "echo $1 ; echo $#"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
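To see the two building blocks in isolation, here is a small sketch: sprintf-style positional references (%1$s, %2$s) pick arguments by index, and Shellwords' Array#shelljoin escapes values that contain spaces. The sample values are chosen for illustration only.

```ruby
require 'shellwords'

args = ['hello', 'World War II']

# Turn shell-style $N markers into sprintf positional references.
template = 'echo $2 ; echo $1'.gsub(/\$(\d+)/, '%\1$s')
puts template               # echo %2$s ; echo %1$s

# Positional references select arguments by index, in any order.
puts format(template, *args)

# shelljoin escapes shell-special characters such as spaces.
puts args.shelljoin
```

Note that shelljoin escapes with backslashes rather than wrapping the value in double quotes, so its output differs from the hand-rolled quoting in the question while still being safe to pass to a shell.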
If you didn't have the $* requirement I'd suggest just dropping the Bash-like format and just using sprintf, since it covers everything else you mentioned. Even so, you could further simplify things by using sprintf formatting for everything else:
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
string.gsub('%#', arguments.shelljoin) % arguments
end
string = "echo %1$s ; echo %#"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
Edit
If you want to use %{1} with sprintf you could turn the input array into a hash where the integer indexes are turned into symbol keys, e.g. ["hello", "world"] becomes { :"1" => "hello", :"2" => "world" }:
require "shellwords"
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
string % {
:* => arguments.shelljoin,
**arguments.map.with_index {|val,idx| [ :"#{idx + 1}", val ] }.to_h
}
end
string = "echo %{1} ; echo %{*}"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
string = "echo $1 ; echo $# ; echo $2 ; echo $cat"
values = ["hello", "World War II"]
vals = values.map { |s| s.include?(' ') ? "\"#{s}\"" : s }
#=> ["hello", "\"World War II\""]
all_vals = vals.join(' ')
#=> "hello \"World War II\""
string.gsub(/\$\d+|\$[#*]/) { |s| s[/\$\d/] ? vals[s[1..-1].to_i-1] : all_vals }
#=> "echo hello ; echo hello \"World War II\" ; echo \"World War II\" ; echo $cat"

Ruby is returning actual newlines instead of \n

I have a file "Foo.md", which contains three lines:
Foo
Bar
I want File.read("Foo.md") to return "Foo\n\nBar". It does this when I run it from irb in the Terminal, but when I run it from a script such as
content = File.read('Foo.md')
puts content
it returns with the lines converted to actual returns. I need that variable as a single line for what comes next in my script.
To be clear: I'm not interested in changing the number of lines, just debugging to make sure the variable content is being passed as a single line.
You are still reading "Foo\n\n\nBar". However, puts interprets the special characters.
You can use String#inspect:
puts content.inspect # => "Foo\n\n\nBar"
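A short sketch may make the difference concrete. It assumes the three-line content described in the question (Foo, an empty line, Bar) and uses a string literal in place of File.read so that it runs without the file.

```ruby
# Stand-in for File.read('Foo.md'); the value is assumed from the question.
content = "Foo\n\nBar"

# The variable is one String object containing two newline characters.
puts content.count("\n")   # number of newline characters
puts content.lines.length  # number of lines the string splits into

# puts writes the newline characters as real line breaks...
puts content

# ...while inspect (and p) show them as escape sequences on a single line.
puts content.inspect
```

So the value is the same in irb and in a script; only the way it is displayed differs.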
str =
"Foo
Bar"
#=> "Foo\n\n\nBar"
You could also do this:
str.gsub(/\n{3,}/,"\n\n")
#=> "Foo\n\nBar"
It might help you visualize what is happening by meditating on this:
str = <<EOT
Foo
Bar
EOT
str's contents look like:
str # => "Foo\n\n\nBar\n"
Inspecting it escapes the backslashes for a visual representation, but it's NOT what you'd want to use when creating a string with embedded line-ends. You'd want to define it as it's shown above.
str.inspect # => "\"Foo\\n\\n\\nBar\\n\""
inspect output varies when viewed in IRB vs. the console. Here's what you'd see in the console:
ruby -e 'puts "\n"; puts "\n".inspect'
"\n"
Printing str to the console:
puts str
# >> Foo
# >>
# >>
# >> Bar

Substituting the value of variable inside backtick operator

How can I substitute the value of a variable inside the backtick operator?
script_dir = File.expand_path File.dirname(__FILE__)
p `ruby -e p "$script_dir"` # this does not work
In Ruby, unlike Perl, the dollar sign indicates a global variable, not a plain regular variable to expand in a string. In a string, you need to use the #{} construct:
p `ruby -e "p #{script_dir}"`
An example:
irb(main):011:0> str = '\'howdy\''
=> "'howdy'"
irb(main):012:0> `ruby -e "p #{str}"`
=> "\"howdy\"\n"
Ruby string interpolation works with backtick operator:
p `ruby -e 'p "#{script_dir}"'`
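If the value may contain spaces or quotes, interpolating it into Ruby source inside the backticks becomes fragile. One possible alternative, sketched here, is to pass the value as a command-line argument escaped with Shellwords; the helper name echo_via_child is made up for illustration, and RbConfig.ruby is used so the child runs under the same interpreter.

```ruby
require 'rbconfig'
require 'shellwords'

# Hypothetical helper: start a child Ruby that prints its first argument.
# String#shellescape (from shellwords) protects the value from the shell.
def echo_via_child(value)
  `#{RbConfig.ruby.shellescape} -e 'print ARGV[0]' #{value.shellescape}`
end

script_dir = File.expand_path(File.dirname(__FILE__))
p echo_via_child(script_dir)
```

The child script stays a fixed string, so nothing from the variable is ever parsed as Ruby code or as shell syntax.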
