Why doesn't File.exist find this file? - ruby

I have a variable like book_file_name which stores a filename with path like this:
book_file_name
=> "./download/Access\\ Database\\ Design\\ \\&\\ Programming,\\ 3rd\\ Edition.PDF"
puts book_file_name
./download/Access\ Database\ Design\ \&\ Programming,\ 3rd\ Edition.PDF
=> nil
book_file_name.length
=> 71
When I use File.exists? to check the file, something is wrong.
This is how I use the string:
File.exists?("./download/Access\ Database\ Design\ \&\ Programming,\ 3rd\ Edition.PDF")
=> true
This is how I use the variable:
File.exists?(book_file_name)
=> false
What's wrong with the variable?

The string
"./download/Access\ Database\ Design\ \&\ Programming,\ 3rd\ Edition.PDF"
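is in double quotes, so Ruby replaces each backslash+space with a plain space when it parses the literal.
This only happens to double-quoted literals in source code: it won't happen to a string already stored in a variable like book_file_name, and it won't happen in a string enclosed in single quotes.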
I can see the actual book name with path is
'./download/Access Database Design & Programming, 3rd Edition.PDF'
so
File.exists?('./download/Access Database Design & Programming, 3rd Edition.PDF')
File.exists?("./download/Access Database Design & Programming, 3rd Edition.PDF")
book_file_name = './download/Access Database Design & Programming, 3rd Edition.PDF'
File.exists?(book_file_name)
book_file_name = "./download/Access Database Design & Programming, 3rd Edition.PDF"
File.exists?(book_file_name)
will all work just fine... so you're better off not using backslashes.

As you have shown in your code snippets, the string contained in your variable has backslashes in it. You don't need to escape the spaces, but if you do, you only need to escape them with one backslash. As it stands, you are using double backslashes; the first backslash escapes the second, and has no impact on the space.
puts "file name with spaces"
# => file name with spaces
puts "file\ name\ with\ spaces"
# => file name with spaces
puts "file\\ name\\ with\\ spaces"
# => file\ name\ with\ spaces
This explains why your string literal succeeds where your variable fails: the two strings are not equivalent. So just store the same string literal that succeeded (the one with single backslashes) or else the string literal without any backslashes and you should be good to go.
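The difference between the two strings can be checked directly. The following is a minimal sketch: the variable names and the gsub cleanup step are illustrative assumptions, not part of the original code, and the cleanup only makes sense if the backslashes are known to be shell-style escapes rather than part of the real file name.

```ruby
# The value stored in the variable: every backslash is a real character.
escaped = "./download/Access\\ Database\\ Design\\ \\&\\ Programming,\\ 3rd\\ Edition.PDF"

# The literal that worked: Ruby drops the backslash in "\ " and "\&" while parsing.
literal = "./download/Access\ Database\ Design\ \&\ Programming,\ 3rd\ Edition.PDF"

# The name as it exists on disk.
plain = './download/Access Database Design & Programming, 3rd Edition.PDF'

puts literal == plain   # true
puts escaped == plain   # false
puts escaped.length     # 71 (7 extra backslashes)
puts plain.length       # 64

# Hypothetical cleanup: drop each backslash and keep the character it escaped.
unescaped = escaped.gsub(/\\(.)/) { $1 }
puts unescaped == plain # true
```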

Related

Regexp.escape adds weird escapes to a plain space

I stumbled over this problem using the following simplified example:
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards. Indeed, this is the case for many strings, but not for this one:
searchstring = "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line
It turns out that line is printed as "D " afterwards, i.e. no replacement was performed.
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string, in the following way:
REPLACEMENTS.each do
|from, to|
line.chomp!
line.gsub!(Regexp.escape(from)) { to }
end
I'm using Regexp.escape just as a safety measure in case the string being replaced contains a regex metacharacter.
I'm using the Cygwin port of MRI Ruby 2.6.4.
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards.
Your understanding is incorrect. The guarantee in the docs is
For any string, Regexp.new(Regexp.escape(str))=~str will be true.
This does hold for your example
Regexp.new(Regexp.escape("D "))=~"D " # => 0
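The guarantee is about a Regexp built from the escaped string. gsub! treats a String pattern as literal text rather than as a regular expression, so the escaped string has to be wrapped in a Regexp first. This is what your code should look like: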
line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
As for why the space gets escaped at all, there used to be a bug where Regexp.escape would incorrectly handle space characters:
# in Ruby 1.8.4
Regexp.escape("D ") # => "D\\s"
My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
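The difference between the two calls can be seen side by side. This is a minimal sketch built on the question's own example; the variable names escaped and result are added here for illustration.

```ruby
searchstring = "D "
escaped = Regexp.escape(searchstring)
p escaped                      # "D\\ " (D, backslash, space)

# String pattern: gsub! looks for the literal text D-backslash-space.
line = searchstring.dup
result = line.gsub!(escaped) { '' }
p result                       # nil, because nothing matched
p line                         # "D "

# Regexp pattern: the backslash now only escapes the space.
line = searchstring.dup
line.gsub!(Regexp.new(escaped)) { '' }
p line                         # ""
```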
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
This looks like a bug to me. In my opinion, whitespace is not a Regexp metacharacter, so there is no need to escape it.
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string […]
If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:
line.gsub!(from, to)
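As a rough illustration of the literal approach, the sketch below runs a replacement loop with String patterns. The contents of REPLACEMENTS and line are made up for the example, and the block form is used (an assumption beyond the one-liner above) so that backslashes in the replacement text are also left alone.

```ruby
# Hypothetical replacement table; the keys contain regex metacharacters on purpose.
REPLACEMENTS = {
  "a.b" => "x",
  "(c)" => "\\0" # backslash-zero, to show that the block form leaves it alone
}

line = "a.b aXb (c)\n".dup
line.chomp!
REPLACEMENTS.each do |from, to|
  # A String pattern is matched literally, so "a.b" does not match "aXb".
  line.gsub!(from) { to }
end
puts line # x aXb \0
```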

Unable to substitute escaped characters in string

I have this string:
str = "no,\"contact_last_name\",\"token\""
=> "no,\"contact_last_name\",\"token\""
I want to remove the escaped double-quote characters (\"). I use gsub:
result = str.gsub('\\"','')
=> "no,\"contact_last_name\",\"token\""
It appears that gsub has not removed the escaped double quotes from the string.
Why am I trying to do this? I have this csv file:
no,"contact_last_name","token",company,urbanization,sec-"property_address","property_address",city-state-zip,ase,oel,presorttrayid,presortdate,imbno,encodedimbno,fca,"property_city","property_state","property_zip"
1,MARIE A JEANTY,1083123,,,,17 SW 6TH AVE,DANIA BEACH FL 33004-3260,Electronic Service Requested,,T00215,12/14/2016,00-314-901373799-105112-33004-3260-17,TATTTADTATTDDDTTFDDFATFTDDDTTFADTTDFAAADDATDAATTFDTDFTTAFFTTATFFF,017,DANIA BEACH,FL, 33004-3260
When I try to open it with CSV, I get the following error:
CSV.foreach(path, headers: true) do |row|
end
CSV::MalformedCSVError: Illegal quoting in line 1.
Once I removed those double quoted strings in the first row (the header), the error went away. So I am trying to remove those double quoted strings before I run it through CSV:
file = File.open "file.csv"
contents = file.read
"no,\"contact_last_name\",\"token\" ... "
contents.gsub!('\\"','')
So again my question is: why is gsub not removing the specified characters? Note that this actually does work:
contents.gsub /"/, ""
as if the string is ignoring the \ character.
There is no escaped double quote in this string:
"no,\"contact_last_name\",\"token\""
The interpreter recognizes the text above as a string because it is enclosed in double quotes. And because of the same reason, the double quotes embedded in the string must be escaped; otherwise they signal the end of the string.
The enclosing double quote characters are part of the language, not part of the string. The backslash (\) escape is likewise the language's way of putting characters that otherwise have special meaning (double quotes, for example) inside a string.
The actual string stored in the str variable is:
no,"contact_last_name","token"
You can check this for yourself if you tell the interpreter to put the string on screen (puts str).
To answer the question in the title: your attempts to substitute the escaped characters failed because the string doesn't contain the character sequence you were trying to find and replace.
And the actual problem is that the CSV file is malformed. The 6th value on the first row (sec-"property_address") doesn't follow the format of a correctly encoded CSV file.
It should read either sec-property_address or "sec-property_address"; i.e. the value should be either not enclosed in quotes at all or completely enclosed in quotes. Having it partially enclosed in quotes confuses Ruby's CSV parser.
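The parser's behaviour can be reproduced on the header alone. This is a minimal sketch that assumes the csv library is available; the header is a shortened version of the one in the question, and stripping every double quote is just one possible repair, chosen here because none of the header names contain commas.

```ruby
require 'csv'

# Shortened header from the question, including the partially quoted field.
header = 'no,"contact_last_name","token",company,sec-"property_address",city'

rejected = false
begin
  CSV.parse_line(header)
rescue CSV::MalformedCSVError => e
  rejected = true
  puts "rejected: #{e.message}"
end

# Assumed repair: drop every double quote from the header line only.
repaired = header.delete('"')
fields = CSV.parse_line(repaired)
p fields
# ["no", "contact_last_name", "token", "company", "sec-property_address", "city"]
```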
The string looks fine; you're misreading what you're seeing. Meditate on this:
"no,\"contact_last_name\",\"token\"" # => "no,\"contact_last_name\",\"token\""
'no,"contact_last_name","token"' # => "no,\"contact_last_name\",\"token\""
%q[no,"contact_last_name","token"] # => "no,\"contact_last_name\",\"token\""
%Q#no,"contact_last_name","token"# # => "no,\"contact_last_name\",\"token\""
When looking at a string that is delimited by double-quotes, it's necessary to escape certain characters, such as embedded double-quotes. Ruby, along with many other languages, has multiple ways of defining a string to remove that need.
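A quick way to convince yourself is to compare the forms and look for a backslash. The sketch below only uses the string from the question; the variable names are illustrative.

```ruby
a = "no,\"contact_last_name\",\"token\""
b = 'no,"contact_last_name","token"'
c = %q[no,"contact_last_name","token"]

puts a == b && b == c         # true: three spellings of the same string
puts a                        # no,"contact_last_name","token"
puts a.include?("\\")         # false: there is no backslash to remove
puts a.gsub('\\"', '') == a   # true: backslash+quote never occurs, so nothing changes

stripped = a.delete('"')
puts stripped                 # no,contact_last_name,token
```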

Make: how to replace character within a make variable?

I have a variable such as:
export ITEM={countryname}
which can be:
"Albania",
"United States" // with space
"Fs. Artic Land" // dot
"Korea (Rep. Of)" // bracket
"Cote d'Ivoire" // '
This variable $(ITEM) is passed to other commands. Some need it as is (fine, I will use $(ITEM)); others MUST HAVE characters replaced, for example to work with mkdir -p ../folder/{countryname}:
"Albania" // => Albania
"United States" // => United_States
"Fs. Artic Land" // => Fs\._Artic_Land
"Korea (Rep. Of)" // => Korea_\(Rep\._Of\)
"Cote d'Ivoire" // => Cote_d\'Ivoire
So I need a new make variable, such as:
export ITEM={countryname}
export escaped_ITEM=$(ITEM).processed_to_be_fine
How should I do these character replacements within my makefile (to keep things simple and avoid an external script)? I was thinking of using tr or something similar.
Note: working on Ubuntu.
You can use the subst function in GNU Make to perform substitutions.
escaped_ITEM := $(subst $e ,_,$(ITEM))
(where $e is an undefined or empty variable; thanks to @EtanReisner for pointing it out).
You will need one subst for each separate substitution, though.
If at all possible, I would advise against this, however -- use single, machine-readable tokens for file names, and map them to human readable only as the very last step. That's also much easier in your makefile:
human_readable_us=United States
human_readable_kr=Korea (Rep. of)
human_readable_ci=Côte d'Ivoire
human_readable_tf=FS. Antarctic Lands
stuff:
	echo "$(human_readable_$(ITEM))"
Given this input, simply "quoting" the country "names" when using them in the shell will work fine (for the few shown here). Double quoting arbitrary strings is not safe, though, because a number of things are still evaluated inside double quotes (and, given the way make operates, even double quotes in the string itself will cause problems).
If you need to pass "random" strings to the shell there is only one safe way to do that: replace every instance of ' (a single quote) in the string with '\'' and then wrap the string in ' (single quotes). (Depending on the consumer of the string, replacing each ' with \047 can also work.)
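The quoting rule itself is independent of make. As a rough sketch, here it is in Ruby, the language used elsewhere on this page; shell_single_quote is a hypothetical helper name, not an existing API, and the country names are taken from the question.

```ruby
# Hypothetical helper: wrap a string in single quotes for a POSIX shell.
def shell_single_quote(text)
  # Each ' becomes '\'' (close quote, escaped quote, reopen quote).
  # The block form keeps gsub from treating the backslash specially.
  "'" + text.gsub("'") { "'\\''" } + "'"
end

names = ["Albania", "United States", "Korea (Rep. Of)", "Cote d'Ivoire"]
names.each { |name| puts shell_single_quote(name) }
# 'Albania'
# 'United States'
# 'Korea (Rep. Of)'
# 'Cote d'\''Ivoire'
```

Ruby's standard library also provides Shellwords.escape, which takes the backslash-escaping approach instead of wrapping in single quotes.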

Ruby: unexplained behaviour of String#sub in the presence of "\\'"

I can't understand why this happens:
irb(main):015:0> s = "Hello\\'World"
=> "Hello\\'World"
irb(main):016:0> "#X#".sub("X",s)
=> "#Hello#World#"
I would have thought the output would be "#Hello\'World#", and I certainly can't understand where the extra # came from.
I guess I'm unfamiliar with something to do with the internals of String#sub and the "\'" sequence.
It's due to the use of backslash in a sub replacement string.
Your replacement string contains \' which is expanded to the global variable $' which is otherwise known as POSTMATCH. For a string replacement, it contains everything in the string which exists following the matched text. So because your X that you replaced is followed by a #, that's what gets inserted.
Compare:
"#X$".sub("X",s)
=> "#Hello$World$"
Note that the documentation for sub refers to the backreferences \0 through \9. \0 stands for the whole match (like $&), and \1 to \9 correspond to the global variables $1 to $9; the same backslash notation also covers some of the other match-related globals, as with \' here.
For reference, the other global variables set by regular expression matching are:
$~ is equivalent to ::last_match;
$& contains the complete matched text;
$` contains string before match;
$' contains string after match;
$1, $2 and so on contain text matching first, second, etc capture group;
$+ contains last capture group.
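To see these sequences at work, and two ways around them, here is a minimal sketch based on the question's example. The variable doubled and the extra replacement strings are added for illustration.

```ruby
s = "Hello\\'World"                 # the characters Hello\'World

p "#X#".sub("X", s)                 # "#Hello#World#"  (\' became the post-match "#")
p "#X#".sub("X", "<\\`>")           # "#<#>#"          (\` is the pre-match "#")
p "#X#".sub("X", "<\\0>")           # "#<X>#"          (\0 is the whole match)

# Block form: the block's return value is inserted as-is.
p "#X#".sub("X") { s }              # "#Hello\\'World#"

# Or double each backslash before passing a replacement string.
doubled = s.gsub("\\") { "\\\\" }
p "#X#".sub("X", doubled)           # "#Hello\\'World#"
```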

Ruby string sub without regex back references

I'm trying to do a simple string sub in Ruby.
The second argument to sub() is a long piece of minified JavaScript which has regular expressions contained in it. Back references in the regex in this string seem to be affecting the result of sub, because the replaced string (i.e., the first argument) is appearing in the output string.
Example:
input = "string <!--tooreplace--> is here"
output = input.sub("<!--tooreplace-->", "\&")
I want the output to be:
"string \& is here"
Not:
"string & is here"
or if escaping the regex
"string <!--tooreplace--> is here"
Basically, I want some way of doing a string sub that has no regex consequences at all - just a simple string replace.
To avoid having to figure out how to escape the replacement string, use Regexp.escape. It's handy when replacements are complicated, or dealing with it is an unnecessary pain. A little helper on String is nice too.
input.sub("<!--tooreplace-->", Regexp.escape('\&'))
You can also use block notation to make it simpler (as opposed to Regexp.escape):
puts input.sub("<!--tooreplace-->") { '\&' }
string \& is here
Use single quotes and escape the backslash:
output = input.sub("<!--tooreplace-->", '\\\&') #=> "string \\& is here"
Well, since '\\&' (that is, \ followed by &) is interpreted as a special sequence in the replacement string, it stands to reason that you need to escape the backslash. In fact, this works:
>> puts 'abc'.sub 'b', '\\\\&'
a\&c
Note that \\\\& represents the literal string \\&.
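For a replacement as long as minified JavaScript, the block form is probably the least error-prone of the options above, since nothing in the replacement needs escaping. Regexp.escape is designed for patterns and also escapes characters such as spaces and dots, which a replacement string does not need. The sketch below wraps the block form in a small helper; literal_sub and the js string are hypothetical stand-ins, not part of the original code.

```ruby
# Hypothetical helper: replace the first occurrence of target with replacement,
# treating both as plain text.
def literal_sub(text, target, replacement)
  text.sub(target) { replacement }
end

input = "string <!--tooreplace--> is here"
# Stand-in for the minified JavaScript; single quotes keep the backslashes.
js = 's.replace(/(\d+)/, "\1\&")'

result = literal_sub(input, "<!--tooreplace-->", js)
puts result
# string s.replace(/(\d+)/, "\1\&") is here

# For contrast, the two-argument form expands \1 and \& inside js.
mangled = input.sub("<!--tooreplace-->", js)
puts mangled == result # false
```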
