Why, in Ruby, does Array("foo\nbar") == ["foo\n", "bar"]?

In Ruby 1.8.7, Array("hello\nhello") gives you ["hello\n", "hello"]. This does two things that I don't expect:
It splits the string on newlines. I'd expect it simply to give me an array with the string I pass in as its single element without modifying the data I pass in.
Even if you accept that it's reasonable to split a string when passing it to Array, why does it retain the newline character when "foo\nbar".split does not?
Additionally:
>> Array.[] "foo\nbar"
=> ["foo\nbar"]
>> Array.[] *"foo\nbar"
=> ["foo\n", "bar"]

It splits the string on newlines. I'd expect it simply to give me an array with the string I pass in as its single element without modifying the data I pass in.
That's a convention as good as any other. For example, the list constructor in Python does something entirely different:
>>> list("foo")
['f', 'o', 'o']
So long as it's consistent I don't see the problem.
Even if you accept that it's reasonable to split a string when passing it to Array, why does it retain the newline character when "foo\nbar".split does not?
My wild guess here (supported by quick googling and TryRuby) is that the .split method for strings does so to make it the "inverse" operation of the .join method for arrays.
>> "foospambar".split("spam").join("spam")
=> "foospambar"
By the way, I cannot replicate your behaviour on TryRuby:
>> x = Array("foo\nbar")
=> ["foo\nbar"]
>> Array.[] *"foo\nbar"
=> ["foo\nbar"]
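The mismatch with TryRuby is most likely a version difference. As far as I can tell, 1.8 strings had a line-based to_a that Array() and the splat operator picked up, and 1.9 removed it. A minimal sketch of what to expect on Ruby 1.9 or later (the variable names are only for illustration):

```ruby
s = "foo\nbar"

wrapped  = Array(s)          # no String#to_a on 1.9+, so the string is wrapped untouched
by_line  = s.each_line.to_a  # keeps each line's trailing "\n", like Array(s) did on 1.8
by_split = s.split("\n")     # drops the separator, so join("\n") restores the string

p wrapped   # the whole string as a single element
p by_line   # one element per line, newlines kept
p by_split  # one element per line, newlines removed
```

If you want the 1.8-style result on a newer Ruby, each_line (or lines) is the explicit way to ask for it.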

If you replace the double-quotes with single-quotes it works as expected:
>> Array.[] "foo\nbar"
=> ["foo\nbar"]
>> Array.[] 'foo\nbar'
=> ["foo\\nbar"]

You may try:
"foo\nbar".split(/w/)
"foo\nbar".split(/^/)
"foo\nbar".split(/$/)
and other regular expressions.

Related

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1], '')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by ‘d’, instead of a digit.
That means that what you pass in, though it looks like a regex, isn't one. It's a string, because ARGV can only return strings: the command line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo", so nothing changes. The second can find /o/, and returns the result after replacing the first "o" (sub replaces only the first match).
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
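Putting those two points together, here is a rough sketch of how a script might turn a command-line argument into a pattern. Both helpers (pattern_from_arg and literal_from_arg) are hypothetical names invented for this example, not part of Ruby:

```ruby
# Hypothetical helper: turn a command-line argument into a Regexp.
# Accepts "o" as well as "/o/"; a leading and a trailing slash are dropped.
def pattern_from_arg(arg)
  source = arg.sub(%r{\A/}, "").sub(%r{/\z}, "")
  Regexp.new(source)
rescue RegexpError => e
  abort "Invalid pattern #{arg.inspect}: #{e.message}"
end

# Hypothetical helper for literal text: metacharacters are escaped first,
# so characters such as "[" and "{" match themselves.
def literal_from_arg(arg)
  Regexp.new(Regexp.escape(arg))
end

puts "foo".sub(pattern_from_arg("/o/"), "")
puts '[{"src":"x"}]'.sub(literal_from_arg('[{"src":"'), "")
```

In the script from the question, the call would then look like x.sub(pattern_from_arg(ARGV[1]), ''), assuming the argument is meant as a regular expression rather than literal text.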
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

Replacing regex capture with the same capture and an extra string

I am trying to escape certain characters in a string. In particular, I want to turn
abc/def.ghi into abc\/def\.ghi
I tried to use the following syntax:
1.9.3p125 :076 > "abc/def.ghi".gsub(/([\/.])/, '\\\1')
=> "abc\\1def\\1ghi"
Hmm. This behaves as if capture replacements didn't work. Yet, when I tried this:
1.9.3p125 :075 > "abc/def.ghi".gsub(/([\/.])/, '\1')
=> "abc/def.ghi"
... I got the replacement to work, but, of course, my prefixes weren't part of it.
What is the correct syntax to do something like this?
This should be easier
gsub(/(?=[.\/])/, "\\")
If you are trying to prepare a string to be used as a regex pattern, use the right tool:
Regexp.escape('abc/def.ghi')
=> "abc/def\\.ghi"
You can then use the resulting string to create a regex:
/#{ Regexp.escape('abc/def.ghi') }/
=> /abc\/def\.ghi/
or:
Regexp.new(Regexp.escape('abc/def.ghi'))
=> /abc\/def\.ghi/
From the docs:
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
Regexp.escape('\*?{}.') #=> \\\*\?\{\}\.
You can pass a block to gsub:
>> "abc/def.ghi".gsub(/([\/.])/) {|m| "\\#{m}"}
=> "abc\\/def\\.ghi"
Not nearly as elegant as #sawa's answer, but it was the only way I could find to get it to work if you need the replacing string to contain the captured group/backreference (rather than inserting the replacement before the look-ahead).
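For completeness, the replacement-string form from the question can also be made to work; it just needs more backslashes than one would guess. The replacement is unescaped twice, once by the string literal and once by gsub, so a literal backslash followed by the capture takes five backslashes inside single quotes. A small sketch comparing the three approaches (variable names are illustrative):

```ruby
input = "abc/def.ghi"

# '\\\\\1' is the four characters \\\1 after literal parsing;
# gsub then reads \\ as one backslash and \1 as the first capture.
with_backref = input.gsub(/([\/.])/, '\\\\\1')

# Block form: the block's return value is inserted as-is, with no second unescaping.
with_block = input.gsub(/[\/.]/) { |m| "\\#{m}" }

# Look-ahead form: insert a backslash before each "/" or "." without consuming it.
with_lookahead = input.gsub(/(?=[.\/])/, "\\")

puts with_backref
puts with_block
puts with_lookahead
```

The block form is probably the easiest to read, since it avoids the double unescaping entirely.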

how do I use String.delete to remove '<em>' from a string in Ruby?

I'm sure I can do this with a regex, but I can't find any explanation for this behavior using just normal delete!:
#1.9.2
>> "helllom<em>".delete!"<em>"
=> "hlllo"
The docs don't have anything to say about this. Seems to me that it's treating '<em>' as a set. Where is this documented?
Edit: in my defense I was looking for special treatment of < and > in the docs under delete. Didn't see anything about it and tried google, which also didn't have anything to say about that -- because it doesn't exist.
String#delete is one of those unfortunate methods that is difficult to explain (I have no idea what the use case is). In practice, I've always used gsub with an empty string as the second argument.
'helllom<em>'.gsub '<em>', '' # => "helllom"
Note that String#gsub! has a quirk of its own: it returns nil if it does not alter the string, so you should not depend on its return value. Use gsub if you need the return value; if you want to mutate the string, use gsub! but don't chain anything else onto it on that line.
You cannot use String#delete to remove substrings.
Check the API. It removes all the characters from given parameters from the given string.
In your case it removes all occurrences of e, m, < and >.
Straight from the docs:
delete([other_str]+) → new_str
Returns a copy of str with all characters in the intersection of its
arguments deleted. Uses the same rules for building the set of
characters as String#count.
ex:
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"
So every character in the intersection of the two strings is removed.
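To make the difference concrete, here is a short sketch using the string from the question. It contrasts delete (which treats its argument as a set of characters) with ways of removing the substring itself; delete_suffix assumes Ruby 2.5 or later:

```ruby
s = "helllom<em>"

as_set    = s.delete("<em>")         # removes every "<", "e", "m" and ">" character
as_substr = s.gsub("<em>", "")       # removes each occurrence of the substring "<em>"
as_suffix = s.delete_suffix("<em>")  # Ruby 2.5+: removes "<em>" only at the end

# gsub! returns nil when nothing was replaced, so don't chain on its result.
unchanged = "hello".dup.gsub!("<em>", "")

p as_set, as_substr, as_suffix, unchanged
```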

How to parse a string representation of a hash

I have this string and I'm wondering how to convert it to a Hash.
"{:account_id=>4444, :deposit_id=>3333}"
The way suggested in miku's answer is indeed easiest and unsafest.
# DO NOT RUN IT
eval '{:surprise => "#{system \"rm -rf / \"}"}'
# SERIOUSLY, DON'T
Consider using a different string representation of your hashes, e.g. JSON or YAML. It's way more secure and at least equally robust.
With a little replacement, you may use YAML:
require 'yaml'
p YAML.load(
"{:account_id=>4444, :deposit_id=>3333}".gsub(/=>/, ': ')
)
But this works only for this specific, simple string. Depending on your real data you may get problems.
The easiest and unsafest would be to just evaluate the string:
>> s = "{:account_id=>4444, :deposit_id=>3333}"
>> h = eval(s)
=> {:account_id=>4444, :deposit_id=>3333}
>> h.class
=> Hash
If your string looks something like this (it can be a nested or a plain hash):
stringify_hash = "{'account_id'=>4444, 'deposit_id'=>3333, 'nested_key'=>{'key1' => val1, 'key2' => val2, 'key3' => nil}}"
you can convert it into a hash like this, without using eval, which is dangerous:
desired_hash = JSON.parse(stringify_hash.gsub("'",'"').gsub('=>',':').gsub('nil','null'))
For the one you posted, where the keys are symbols, you can use this:
JSON.parse(string_hash.gsub(':','"').gsub('=>','":'))
Guess I never posted my workaround for this... Here it goes,
# strip the hash down
stringy_hash = "account_id=>4444, deposit_id=>3333"
# turn string into hash
Hash[stringy_hash.split(",").collect{|x| x.strip.split("=>")}]
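As a sketch of an eval-free route for the exact string in the question, the following compares a small regex-based parser with the JSON rewrite suggested above. parse_flat_hash is a hypothetical helper written for this example, and it assumes a flat hash with symbol keys and integer values only:

```ruby
require "json"

input = "{:account_id=>4444, :deposit_id=>3333}"

# Hypothetical helper: handles only flat hashes of symbol keys and integer values.
def parse_flat_hash(str)
  str.scan(/:(\w+)\s*=>\s*(-?\d+)/).each_with_object({}) do |(key, value), hash|
    hash[key.to_sym] = Integer(value)
  end
end

via_scan = parse_flat_hash(input)

# The JSON rewrite from the answer above; symbolize_names turns the keys back into symbols.
via_json = JSON.parse(input.gsub(":", '"').gsub("=>", '":'), symbolize_names: true)

p via_scan
p via_json
```

Both approaches break down as soon as values contain colons, arrows or nested quotes, which is another reason to store the data as JSON or YAML in the first place.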

How does Ruby's replace work?

I'm looking at ruby's replace: http://www.ruby-doc.org/core/classes/String.html#M001144
It doesn't seem to make sense to me, you call replace and it replaces the entire string.
I was expecting:
replace(old_value, new_value)
Is what I am looking for gsub then?
replace seems to be different than in most other languages.
I agree that replace is generally used as some sort of pattern replace in other languages, but Ruby is different :)
Yes, you are thinking of gsub:
ruby-1.9.2-p136 :001 > "Hello World!".gsub("World", "Earth")
=> "Hello Earth!"
One thing to note is that String#replace may seem pointless; however, it also replaces the string's taintedness with that of its argument, so replacing with an untainted string removes the taint. You can read more about tainted objects here.
I suppose replace does not seem to make sense to you because the assignment operator = already exists (this has little to do with gsub).
The important point is that String instances are mutable objects. By using replace, you can change the content of the string while retaining its identity as an object. Compare:
a = 'Hello' # => 'Hello'
a.object_id # => 84793190
a.replace('World') # => 'World'
a.object_id # => 84793190
a = 'World' # => 'World'
a.object_id # => 84768100
Notice that replace has not changed the string object's id, whereas simple assignment did change it. This difference has consequences. For example, suppose you set some instance variables on the string instance. With replace, that information is retained, but if you simply assign the variable to a different string, all that information is gone.
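The practical effect is easiest to see when a second variable refers to the same string. A minimal sketch (the variable names are illustrative, and dup is used only so the example also runs when string literals are frozen):

```ruby
a = "Hello".dup   # dup so the example also works with frozen string literals
b = a             # b refers to the same String object as a

a.replace("World")                    # mutates the shared object in place
same_after_replace = a.equal?(b)      # still one object
b_after_replace    = b.dup            # snapshot of what b sees now

a = "Earth"                           # rebinds a to a new object; b is unaffected
same_after_assign = a.equal?(b)

p same_after_replace, b_after_replace, same_after_assign, b
```

So replace is visible through every reference to the string, while assignment only changes what one variable points at.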
Yes, it is gsub and it is taken from awk syntax. I guess replace stands for the internal representation of the string, since, according to documentation, tainted-ness is removed too.
