How to split a string by colons NOT in quotes - ruby

I have a CSV file delimited by colons, but it contains text fields wrapped in quotes, which themselves contain several colons.
I would like a simple solution for getting the data fields, but in Ruby, for example, the split method splits on every colon.
Is there a regex that matches all colons except those wrapped in quotes?

Given:
str = 'foo:bar:"jim:jam":jar'
You can do this:
a = str.scan( /([^":]+)|"([^"]+)"/ ).flatten.compact
p a
#=> ["foo", "bar", "jim:jam", "jar"]
Or you can do this:
a = []
str.scan( /([^":]+)|"([^"]+)"/ ){ a << ($1 || $2) }
p a
#=> ["foo", "bar", "jim:jam", "jar"]
These regexes say to find either:
One or more characters that are neither a quote nor a colon, or
A quote, followed by one or more characters that are not a quote, followed by a quote.
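A quick way to see what these patterns do and do not handle is to wrap the first variant in a small helper (the name split_fields is hypothetical). One caveat worth knowing: a colon outside quotes is simply skipped by scan, so empty fields, such as two colons in a row or a trailing colon, vanish from the result:

```ruby
# Hypothetical helper wrapping the scan approach shown above.
def split_fields(str)
  str.scan(/([^":]+)|"([^"]+)"/).flatten.compact
end

p split_fields('foo:bar:"jim:jam":jar') # => ["foo", "bar", "jim:jam", "jar"]
p split_fields('a::"x:y":')             # => ["a", "x:y"]  (empty fields are lost)
```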

Just use Ruby's standard CSV library: http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
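A minimal sketch of that approach: CSV.parse_line accepts a col_sep option, and the default quote character is already the double quote, so a colon-delimited line parses directly. This assumes one record per line; for a whole file, CSV.foreach(path, col_sep: ":") yields one array per row.

```ruby
require "csv"

# Parse one colon-delimited line; quoted fields may contain colons.
line = 'foo:bar:"jim:jam":jar'
fields = CSV.parse_line(line, col_sep: ":")
p fields # => ["foo", "bar", "jim:jam", "jar"]

# Unlike a scan-based split, empty unquoted fields are kept (as nil).
p CSV.parse_line('a::"x:y":', col_sep: ":") # => ["a", nil, "x:y", nil]
```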

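You can split on double quotes instead of colons. After that split, the even-indexed pieces lie outside quotes and can be split on colons, while the odd-indexed pieces are the quoted fields. Empty strings are rejected so that a colon next to a quote does not print a blank line:
>> str = 'foo:bar:"jim:jam":jar'
=> "foo:bar:\"jim:jam\":jar"
>> str.split('"').each_with_index do |x, i|
?>   puts i.even? ? x.split(":").reject(&:empty?) : x
>> end
foo
bar
jim:jam
jar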
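The same idea can return an array instead of printing. This is a sketch with a hypothetical helper name; note that reject(&:empty?) also drops genuinely empty fields (for example in a::b), so it suits data without empty fields:

```ruby
# Hypothetical helper: split on double quotes first; even-indexed pieces are
# outside quotes (split them on colons), odd-indexed pieces are quoted fields.
def split_outside_quotes(str)
  str.split('"').each_with_index.flat_map do |piece, i|
    i.even? ? piece.split(":").reject(&:empty?) : [piece]
  end
end

p split_outside_quotes('foo:bar:"jim:jam":jar') # => ["foo", "bar", "jim:jam", "jar"]
```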

My first attempt was so bad that I revised the entire thing. This is my regex solution:
GETS LAST delimiter field ':' = :last
Trims: /(?:^\s*:|:|^)\s*(".*?"|.*?)(?=\s*(?:\:|$))/
No-trim: /(?:(?<!^):|^)(\s*".*?"\s*|.*?)(?=\:|$)/
------------------
GETS FIRST AND LAST delimiter fields ':' = first:last
Trims: /(?:^|:)\s*(".*?"|(?<!^).*?|)(?=\s*(?:\:|$))/
No trim: /(?:^|:)(\s*".*?"\s*|\s*(?<!^).*?|)(?=\:|$)/
And yes, it's not as easy as one thinks...
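As a usage sketch (the constant name FIELD_RE is only for illustration), the first "Trims" pattern can be passed to scan. Each match yields a one-element array of captures, so flatten gives the fields; quoted fields come back with their quotes, which can be removed afterwards:

```ruby
# The first "Trims" pattern from above, copied verbatim.
FIELD_RE = /(?:^\s*:|:|^)\s*(".*?"|.*?)(?=\s*(?:\:|$))/

str = 'foo:bar:"jim:jam":jar'
fields = str.scan(FIELD_RE).flatten
p fields # => ["foo", "bar", "\"jim:jam\"", "jar"]

# The quoted field keeps its quotes, so strip them afterwards if needed.
unquoted = fields.map { |f| f.delete_prefix('"').delete_suffix('"') }
p unquoted # => ["foo", "bar", "jim:jam", "jar"]
```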

Related

How to split a string between 2 parameters in Ruby

Hi, I'm trying to separate input like this: <Text1><Text2><Text2>..<TextN>
into an array that holds one text per index. How can I use split with two delimiters?
I tried a double split, but it doesn't work:
request = client.gets.chomp
dev = request.split("<")
split_doble(dev)
dev.each do |devo|
  puts devo
end
def split_doble (str1)
  str1.each do |str2|
    str2.split(">")
  end
end
When you have a string like this
string = "<text1><text2><textN>"
then you can extract the text between the < and > chars like that:
string.scan(/\w+/)
#=> ["text1", "text2", "textN"]
/\w+/ is a regular expression that matches a sequence of word characters (letter, number, underscore) and therefore ignores the < and > characters.
Also see docs about String#scan.
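\w+ also splits on spaces and punctuation, so a text like <text one> would come back as two elements. A sketch that captures everything between each < and > instead, assuming the texts themselves never contain >:

```ruby
string = "<text one><text-2><textN>"

# \w+ breaks "text one" and "text-2" apart at the space and the hyphen.
words = string.scan(/\w+/)
p words # => ["text", "one", "text", "2", "textN"]

# Capture everything between < and > instead (assumes no ">" inside a text).
texts = string.scan(/<([^>]*)>/).flatten
p texts # => ["text one", "text-2", "textN"]
```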
In the string "<text1><text2><textN>" the leading < and trailing > are in the way, so slice them off first. Then just split on "><".
str = "<text1><text2><textN>"
p str[1..-2].split("><") # => ["text1", "text2", "textN"]

How to split a string with "\" using ruby?

Let's say I have a string:
str = "12345\56789"
How do I split the above string into 2 words, like this?
["12345","56789"]
str = "12345/56789"
print str.split('/') # => ["12345", "56789"]
Edit: With the change to a backslash, it should be:
str = '12345\56789'
print str.split('\\') # => ["12345", "56789"]
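You need the double backslash to avoid escaping the closing quote mark. The string itself also has to be single-quoted (or written as "12345\\56789"): in the double-quoted "12345\56789" from the question, \567 is read as an octal escape, so that string never contains a backslash.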
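A small standalone check of the single-quoted form, the split, and the equivalent double-quoted spelling:

```ruby
str = '12345\56789'     # single quotes keep the backslash as-is
p str.length            # => 11
parts = str.split("\\") # "\\" is one backslash, the same string as '\\'
p parts                 # => ["12345", "56789"]

# In double quotes the backslash itself must be escaped to get the same string.
p "12345\\56789" == str # => true
```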
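Regexp.quote returns a string with special characters escaped, which includes doubling the backslash. This returned string can be split with '\\', although the doubled backslash leaves an empty element in the middle (["00050", "", "00050"]), which is why only [0] is taken here.
So the solution is: Regexp.quote('00050\00050').split('\\')[0]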

How do I remove a substring after a certain character in a string using Ruby?

new_str = str.slice(0..(str.index('blah')))  # keeps everything up to and including the first character of 'blah'
I find that "Part1?Part2".split('?')[0] is easier to read.
I'm surprised nobody suggested using gsub:
irb> "truncate".gsub(/a.*/, 'a')
=> "trunca"
The bang version, gsub!, modifies the string in place.
str = "Hello World"
stopchar = 'W'
str.sub /#{stopchar}.+/, stopchar
#=> "Hello W"
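One caution with interpolating the character into a regex: characters such as ?, . or * have special meaning, so a stop character of "?" would produce the invalid pattern /?.+/ and raise a RegexpError. Escaping it with Regexp.escape first avoids that; a sketch:

```ruby
str = "Part1?Part2"
stopchar = "?"

# Regexp.escape turns "?" into "\\?", so the pattern matches a literal "?"
# followed by the rest of the string.
result = str.sub(/#{Regexp.escape(stopchar)}.+/, stopchar)
p result # => "Part1?"
```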
A special case is if you have multiple occurrences of the same character and you want to delete from the last occurrence to the end (not the first one).
Following what Jacob suggested, you just have to use rindex instead of index, since rindex searches for the character starting from the end of the string.
Something like this:
str = '/path/to/some_file'
puts str.slice(0, str.index('/')) # => ""
puts str.slice(0, str.rindex('/')) # => "/path/to"
We can also use partition and rpartition, depending on whether we want to cut at the first or last instance of the specified character:
string = "abc-123-xyz"
last_char = "-"
string.partition(last_char)[0..1].join #=> "abc-"
string.rpartition(last_char)[0..1].join #=> "abc-123-"
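A related detail: if the goal is to drop the character itself as well, partition(...).first does that, and it degrades gracefully when the character is missing, whereas index returns nil and needs a guard. A sketch:

```ruby
# partition returns [before, separator, after]; when the separator is missing
# it returns [whole_string, "", ""], so .first is always a String.
p "Part1?Part2".partition("?").first  # => "Part1"
p "no delimiter".partition("?").first # => "no delimiter"

# index returns nil when the character is missing, so guard before slicing.
str = "no delimiter"
i = str.index("?")
kept = i ? str[0...i] : str
p kept # => "no delimiter"
```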

How to strip leading and trailing quote from string, in Ruby

I want to strip leading and trailing quotes from a string in Ruby. The quote character will occur 0 or 1 time at each end. For example, all of the following should be converted to foo,bar:
"foo,bar"
"foo,bar
foo,bar"
foo,bar
You could also use the chomp function, but unfortunately it only works at the end of the string. Assuming there were a reverse chomp, you could write:
'"foo,bar"'.rchomp('"').chomp('"')
Implementing rchomp is straightforward:
class String
  def rchomp(sep = $/)
    self.start_with?(sep) ? self[sep.size..-1] : self
  end
end
Note that you could also do it inline, with the slightly less efficient version:
'"foo,bar"'.chomp('"').reverse.chomp('"').reverse
EDIT: Since Ruby 2.5, rchomp(x) is available under the name delete_prefix, and chomp(x) is available as delete_suffix, meaning that you can use
'"foo,bar"'.delete_prefix('"').delete_suffix('"')
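Applied to the four sample inputs from the question, this leaves a plain foo,bar in every case, since delete_prefix and delete_suffix remove at most one occurrence and return the string unchanged when there is no match:

```ruby
inputs = ['"foo,bar"', '"foo,bar', 'foo,bar"', 'foo,bar']

# delete_prefix / delete_suffix (Ruby 2.5+) strip one leading / trailing quote
# if present and otherwise leave the string as it is.
result = inputs.map { |s| s.delete_prefix('"').delete_suffix('"') }
p result # => ["foo,bar", "foo,bar", "foo,bar", "foo,bar"]
```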
I can use gsub to search for the leading or trailing quote and replace it with an empty string:
s = "\"foo,bar\""
s.gsub!(/^\"|\"?$/, '')
As suggested by comments below, a better solution is:
s.gsub!(/\A"|"\Z/, '')
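One gotcha with the bang form: gsub! returns nil when no replacement was made, so its return value should not be chained or used as the result. A sketch comparing it with the non-bang gsub:

```ruby
s = "\"foo,bar\""
s.gsub!(/\A"|"\Z/, "") # modifies s in place
p s                    # => "foo,bar"

# gsub! returns nil when nothing matched; gsub always returns a String.
p "foo,bar".gsub!(/\A"|"\Z/, "") # => nil
p "foo,bar".gsub(/\A"|"\Z/, "")  # => "foo,bar"
```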
As usual everyone grabs regex from the toolbox first. :-)
As an alternative, I'll recommend looking into .tr('"', '') (AKA "translate"), which in this use deletes every quote in the string, not just the leading and trailing ones.
Another approach would be:
def remove_quotations(str)
  if str.start_with?('"')
    str = str.slice(1..-1)
  end
  if str.end_with?('"')
    str = str.slice(0..-2)
  end
  str
end

remove_quotations('"foo,bar"') # => "foo,bar"
It avoids regexps, and start_with?/end_with? read nicely.
It frustrates me that strip only works on whitespace. I need to strip all kinds of characters! Here's a String extension that will fix that:
class String
  def trim(sep = /\s/)
    sep_source = sep.is_a?(Regexp) ? sep.source : Regexp.escape(sep)
    pattern = Regexp.new("\\A(#{sep_source})*(.*?)(#{sep_source})*\\z")
    self[pattern, 2]
  end
end
Output
'"foo,bar"'.trim '"' # => "foo,bar"
'"foo,bar'.trim '"' # => "foo,bar"
'foo,bar"'.trim '"' # => "foo,bar"
'foo,bar'.trim '"' # => "foo,bar"
' foo,bar'.trim # => "foo,bar"
'afoo,bare'.trim /[aeiou]/ # => "foo,bar"
Assuming that quotes can only appear at the beginning or end, you could just remove all quotes, without any custom method:
'"foo,bar"'.delete('"')
I wanted the same but for slashes in a URL path, which can be /test/test/test/ (so it also has the stripping character in the middle), and eventually came up with something like this to avoid regexps:
'/test/test/test/'.split('/').reject { |i| i.empty? }.join('/')
For quotes, this translates to:
'"foo,bar"'.split('"').select{|i| i != ""}.join('"')
or
'"foo,bar"'.split('"').reject{|i| i.empty?}.join('"')
Regexes can be heavy and can lead to subtle errors. If you are not dealing with massive strings and the data is pretty uniform, you can use a simpler approach.
If you know the strings have leading and trailing quotes, you can slice the entire string:
string = "'This has quotes!'"
trimmed = string[1..-2]
puts trimmed # "This has quotes!"
This can also be turned into a simple function:
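# In this case, 34 is \" and 39 is ', you can add other codes etc.
# string[0] is a one-character String in Ruby 1.9+, so compare its .ord;
# &. yields nil for an empty string instead of raising.
def trim_chars(string, char_codes = [34, 39])
  if char_codes.include?(string[0]&.ord) && char_codes.include?(string[-1]&.ord)
    string[1..-2]
  else
    string
  end
end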
You can strip non-optional quotes with scan:
'"foo"bar"'.scan(/"(.*)"/)[0][0]
# => "foo\"bar"

How to properly join LF character in an array?

Basic question.
Instead of adding '\n' between the elements:
>> puts "#{[1, 2, 3].join('\n')}"
1\n2\n3
I need to actually add the line feed character, so the output I expect when printing it would be:
1
2
3
What's the best way to do that in Ruby?
You need to use double quotes.
puts "#{[1, 2, 3].join("\n")}"
Note that you don't have to escape the double quotes because they're within the {} of a substitution, and thus will not be treated as delimiters for the outer string.
However, you don't even need the #{} wrapper if that's all you're doing; the following will work fine:
puts [1,2,3].join("\n")
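The difference comes down to how the two literals are read, which a few p calls make visible:

```ruby
# '\n' in single quotes is two characters (a backslash and an n);
# "\n" in double quotes is a single line-feed character.
p '\n'.length # => 2
p "\n".length # => 1

p [1, 2, 3].join('\n') # => "1\\n2\\n3"
p [1, 2, 3].join("\n") # => "1\n2\n3"
```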
Escaped characters can only be used in double quoted strings:
puts "#{[1, 2, 3].join("\n")}"
But since all you output is this one statement, I wouldn't quote the join statement:
puts [1, 2, 3].join("\n")
Note that join only adds line feeds between the lines. It will not add a line-feed to the end of the last line. If you need every line to end with a line-feed, then:
#!/usr/bin/ruby1.8
lines = %w(one two three)
s = lines.collect do |line|
  line + "\n"
end.join
p s # => "one\ntwo\nthree\n"
print s # => one
# => two
# => three
Ruby doesn't interpret escape sequences in single-quoted strings.
You want to use double-quotes:
puts "#{[1, 2, 3].join("\n")}"
