Ruby - How to get rid of escape character and "\n" while converting string to hash value? - ruby

In Ruby, I'm trying to convert a string to a hash value. It shows up with escape characters and "\n" in the string.
Eg:
hashex = { keyex: 'example "test" line 1
line 2 "test2"'}
puts hashex
It is printing the result as
{:keyex=>"example \"test\" line 1\n line 2 \"test2\""}
I need to get the result as
{ keyex: 'example "test" line 1
line 2 "test2"'}
preserving the newline (not '\n') and the "". Kindly help.

Note
{:keyex=>"example \"test\" line 1\n line 2 \"test2\""}
is just the way Ruby represents the hash. It is 100% the same object as:
{ keyex: 'example "test" line 1
line 2 "test2"'}
even though it might look different.
Code
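You could take the inspect output and replace the two-character sequence \n with a real newline, then swap the quotes: each " becomes ', and each \' left over from an escaped quote becomes ":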
hashex = { keyex: 'example "test" line 1
line 2 "test2"'}
puts hashex.inspect.gsub("\\n", "\n").gsub('"', "'").gsub("\\'",'"')
# {:keyex=>'example "test" line 1
# line 2 "test2"'}
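To see that the escapes belong only to the inspect representation, it may help to print the string value itself rather than the whole hash. A minimal sketch, reusing the hash from the question (the variable name value is mine):

```ruby
hashex = { keyex: 'example "test" line 1
line 2 "test2"' }

value = hashex[:keyex]

# puts on the String writes its characters as they are:
# real double quotes and a real line break.
puts value

# inspect produces the escaped, single-line form that shows up
# when the whole hash is printed.
puts value.inspect
```

The first puts prints two lines with plain quotes; the second prints one line with \" and \n escapes.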

Related

How do I write a regex that eliminates the space between a number and a colon?

I want to replace a space between one or two numbers and a colon followed by a space, a number, or the end of the line. If I have a string like,
line = " 0 : 28 : 37.02"
the result should be:
" 0: 28: 37.02"
I tried as below:
line.gsub!(/(\A|[ \u00A0|\r|\n|\v|\f])(\d?\d)[ \u00A0|\r|\n|\v|\f]:(\d|[ \u00A0|\r|\n|\v|\f]|\z)/, '\2:\3')
# => " 0: 28 : 37.02"
It seems to match the first ":", but the second ":" is not matched. I can't figure out why.
The problem
I'll define your regex with comments (in free-spacing mode) to show what it is doing.
r =
/
( # begin capture group 1
\A # match beginning of string (a zero-width anchor)
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
) # end capture group 1
(\d?\d) # match one or two digits in capture group 2
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
: # match ":"
( # begin capture group 3
\d # match a digit
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
| # or
\z # match the end of the string
) # end capture group 3
/x # free-spacing regex definition mode
Note that '|' is not a special character ("or") within a character class. It's treated as an ordinary character. (Even if '|' were treated as "or" within a character class, that would serve no purpose because character classes are used to force any one character within it to be matched.)
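A small check of that point, as a sketch (the variable names are only for illustration):

```ruby
pipe_in_class = /[a|b]/  # character class: matches "a", "|" or "b"
alternation   = /a|b/    # alternation: matches "a" or "b"

in_class = "|".match?(pipe_in_class)  # the pipe is an ordinary member of the class
as_alt   = "|".match?(alternation)    # the pipe is an operator here, not a literal

p [in_class, as_alt]
# => [true, false]
```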
Suppose
line = "   0 : 28 : 37.02"
Then
line.gsub(r, '\2:\3')
#=> "  0: 28 : 37.02"
$1 #=> " "
$2 #=> "0"
$3 #=> " "
In capture group 1 the beginning-of-string anchor (\A) is zero-width: it can succeed only at index 0, and there it is followed by a space rather than a digit, so that alternative fails. The alternation ('|' outside a character class) therefore falls through to the character class, which matches one character of the string " \u00A0|\r\n\v\f", here one of the three spaces at the beginning of the string line.
Next capture group 2 captures "0". For it to do that, capture group 1 must have captured the space at index 2 of line. Then one more space and a colon are matched, and lastly, capture group 3 takes the space after the colon. Because gsub finds non-overlapping matches, that space has been consumed and cannot serve as capture group 1 for "28 :", which is why the second colon is not matched.
The substring ' 0 : ' is therefore replaced with '\2:\3' #=> '0: ', so gsub returns "  0: 28 : 37.02". Notice that one space before '0' was removed (but should have been retained).
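One way to keep the shape of the original pattern is to check the character after the colon with a lookahead instead of capturing it, so it stays available for the next match, and to put capture group 1 back in the replacement. This is only a sketch: it assumes the input has three leading spaces, and [[:space:]] is my stand-in for the original character class.

```ruby
line = "   0 : 28 : 37.02"  # three leading spaces (assumed input)

# Group 1 keeps the character before the digits. The lookahead (?=...)
# tests what follows the colon without consuming it, so the space after
# the first colon can still start the second match.
fixed = line.gsub(/(\A|[[:space:]])(\d?\d)[[:space:]]:(?=\d|[[:space:]]|\z)/, '\1\2:')

p fixed
# => "   0: 28: 37.02"
```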
A solution
Here's how you can remove the last of one or more Unicode whitespace characters that are preceded by one or two digits (and not more) and are followed by a colon at the end of the string or a colon followed by a whitespace or digit. (Whew!)
def trim(str)
str.gsub(/\d+[[:space:]]+:(?![^[:space:]\d])/) do |s|
s[/\d+/].size > 2 ? s : s[0,s.size-2] << ':'
end
end
The regular expression reads, "match one or more digits followed by one or more whitespace characters, followed by a colon (all these characters are matched), not followed (negative lookahead) by a character other than a unicode whitespace or digit". If there is a match, we check to see how many digits there are at the beginning. If there are more than two the match is returned (no change), else the whitespace character before the colon is removed from the match and the modified match is returned.
trim " 0 : 28 : 37.02"
#=> " 0: 28: 37.02"
trim " 0\v: 28 :37.02"
#=> " 0: 28:37.02"
trim " 0\u00A0: 28\n:37.02"
#=> " 0: 28:37.02"
trim " 123 : 28 : 37.02"
#=> " 123 : 28: 37.02"
trim " A12 : 28 :37.02"
#=> " A12: 28:37.02"
trim " 0 : 28 :"
#=> " 0: 28:"
trim " 0 : 28 :A"
#=> " 0: 28 :A"
If, as in the example, the only characters in the string are digits, whitespaces and colons, the lookahead is not needed.
You can use Ruby's \p{} construct, \p{Space}, in place of the POSIX expression [[:space:]]. Both match a class of Unicode whitespace characters, including those shown in the examples.
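A quick way to compare the two, sketched with a no-break space from the examples above. The \s check is my addition, included only for contrast, since \s is limited to ASCII whitespace in Ruby:

```ruby
nbsp = "\u00A0"  # no-break space, used in one of the trim examples

posix    = nbsp.match?(/[[:space:]]/)  # POSIX bracket expression
property = nbsp.match?(/\p{Space}/)    # Unicode property
ascii    = nbsp.match?(/\s/)           # ASCII-only whitespace class

p [posix, property, ascii]
# => [true, true, false]
```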
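Excluding a third digit can be done with a negative lookbehind, but since the other one or two digits are of variable length, you cannot use a positive lookbehind for that part. Note that $ inside a character class is a literal dollar sign, so the end of the line is written as an alternative outside the class:
line.gsub(/(?<!\d)(\d{1,2}) (?=:(?:[ \d]|$))/, '\1')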
# => " 0: 28: 37.02"
" 0 : 28 : 37.02".gsub!(/(\d)(\s)(:)/,'\1\3')
# => " 0: 28: 37.02"

Ruby one liner to replace only lines that match, discard others

I'm looking for a Ruby one-liner that prints a line, with the substitution applied, only if the line matches the regular expression:
echo -e "Line 1\nLine 2\nLine 3" | perl -ne "print if s/Line 2/Line 2 replaced, others discarded/g"
Input:
Line 1
Line 2
Line 3
Output:
Line 2 replaced, others discarded
Ruby accepts the -n and -e switches as well, but here is a slightly longer version that reads all of the input through $<:
echo -e "Line 1\nLine 2\nLine 3" | ruby -e 'puts $<.read.lines.map {|l| l =~ /Line 2/ ? l.gsub(/Line 2/, "Line 2 replaced, others discarded") : nil }.compact'
Where:
$< (also available as ARGF) is the stream of the file arguments, or standard input when none are given
$<.read reads all of it into a string
$<.read.lines splits that string into an array of lines
map {|l| ... } collects the block's result for each line into a new array
l =~ /Line 2/ checks whether the line matches the regex
l.gsub(/Line 2/, "Line 2 replaced") replaces every "Line 2" with "Line 2 replaced"
.compact removes the nil values (it returns a new array without them)
puts [] prints each element of the array on its own line
Ruby is probably not the best choice for this task; I would use sed or do it in a text editor. Most text editors can find and replace by regex nowadays.
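For what it's worth, Ruby's -n switch wraps the script in a read loop and puts each line in $_, so a closer counterpart of the Perl command would probably be ruby -ne 'print if $_.gsub!(/Line 2/, "Line 2 replaced, others discarded")'. The same filter-and-replace idea can be sketched as an ordinary script, which is easier to run and check (replace_matching is a hypothetical helper name, not a library method):

```ruby
# Hypothetical helper: keep only the lines that match `pattern`
# and return them with every occurrence replaced.
def replace_matching(lines, pattern, replacement)
  # Enumerable#grep with a block selects the matching elements
  # and maps them through the block in one pass.
  lines.grep(pattern) { |line| line.gsub(pattern, replacement) }
end

input  = "Line 1\nLine 2\nLine 3\n"
result = replace_matching(input.lines, /Line 2/, "Line 2 replaced, others discarded")

puts result
```

Using grep with a block avoids the map, nil and compact steps of the longer one-liner.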

Read files line by line with \r, \n or \r\n as line separator

I want to process files line by line. However, these files have different line separators: "\r", "\n" or "\r\n". I don't know which one they use or which kind of OS they come from.
I have two solutions:
using bash command to translate these separators to "\n".
cat file |
tr '\r\n' '\n' |
tr '\r' '\n' |
ruby process.rb
read the whole file and gsub these separators
text = File.read('xxx.txt')
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  # do something with line
end
But the second solution is not good when the file is huge. See reference. Is there any other idiomatic and efficient Ruby solution?
I suggest you first determine the line separator. I've assumed that you can do that by reading characters until you encounter "\n" or "\r" (or reach the end of the file, in which case we can regard "\n" as the line separator). If the character "\n" is found, I assume that to be the separator; if "\r" is found I attempt to read the next character. If I can do so and it is "\n", I return "\r\n" as the separator. If "\r" is the last character in the file or is followed by a character other than "\n", I return "\r" as the separator.
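def separator(fname)
  File.open(fname) do |f|             # the block form closes the file
    enum = f.each_char
    c = nil
    loop do                           # loop rescues StopIteration at end of file
      c = enum.next
      case c
      when "\n" then break
      when "\r"
        c << "\n" if enum.peek == "\n" # "\r\n" if a line feed follows
        break
      end
    end
    # fall back to "\n" when the file is empty or has no separator
    c && c[0][/\r|\n/] ? c : "\n"
  end
end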
Then process the file line-by-line
def process(fname)
sep = separator(fname)
IO.foreach(fname, sep) { |line| puts line }
end
I haven't converted "\r" or "\r\n" to "\n", but of course you could do that easily. Just open a file for writing and in process read each line and write it to the output file with the default line separator.
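That conversion step could be sketched roughly as follows. The helper name normalize_newlines and the temporary file names are mine, and sep is assumed to be known already (for example, detected beforehand):

```ruby
require "tmpdir"

# Hypothetical helper: copy in_path to out_path line by line,
# writing "\n" in place of the separator `sep`.
def normalize_newlines(in_path, out_path, sep)
  File.open(out_path, "w") do |out|
    File.foreach(in_path, sep) do |line|
      # chomp(sep) strips the old separator; puts appends "\n".
      # A final line without a separator also gets a "\n".
      out.puts(line.chomp(sep))
    end
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "cr.txt")
  dst = File.join(dir, "lf.txt")
  File.binwrite(src, "line 1\rline 2\r")
  normalize_newlines(src, dst, "\r")
  p File.binread(dst)
end
```

Only one line is held in memory at a time, which is the point of avoiding the read-everything-then-gsub approach for huge files.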
Let's try it (for clarity I show the value returned by separator):
fname = "temp"
IO.write(fname, "slash n line 1\nslash n line 2\n")
#=> 30
separator(fname)
#=> "\n"
process(fname)
# slash n line 1
# slash n line 2
IO.write(fname, "slash r line 1\rslash r line 2\r")
#=> 30
separator(fname)
#=> "\r"
process(fname)
# slash r line 1
# slash r line 2
IO.write(fname, "slash r slash n line 1\r\nslash r slash n line 2\r\n")
#=> 48
separator(fname)
#=> "\r\n"
process(fname)
# slash r slash n line 1
# slash r slash n line 2

Escape hash sign in Yaml multiline text

Is it possible to escape a hash sign (#) from a multiline text?
...
-
  my_story: |
    Line 1
    Line 2
    # Hash line
What I was hoping to get is:
array {
'my_story' => 'Line 1
Line 2
# Hash line'
}
If I wrap the hash line with quotes I get them in the text:
'Line 1
Line 2
"# Hash line"'
Any ideas..?
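What you wrote is perfectly fine: inside a literal block scalar (|), '#' does not start a comment, so it is processed as ordinary text. The following code works in Python 3 (PyYAML):
import yaml

data = """
-
  my_story: |
    Line 1
    Line 2
    # Hash line
"""
# safe_load is used because recent PyYAML versions require an explicit Loader for yaml.load
deserializedData = yaml.safe_load(data)
print(deserializedData[0]['my_story'])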
The above line prints
Line 1
Line 2
# Hash line
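The same check can be sketched in Ruby with the standard yaml library. The indentation of the document is my assumption about the original layout, since a block scalar needs its lines indented under the key:

```ruby
require "yaml"

# The squiggly heredoc strips the common leading indentation,
# leaving a one-element sequence that holds a mapping.
doc = <<~YAML
  -
    my_story: |
      Line 1
      Line 2
      # Hash line
YAML

data  = YAML.safe_load(doc)
story = data[0]["my_story"]

puts story
```

The hash line comes back as part of the text because everything inside the block scalar is content.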

Reading from stdin and printing to stdout in Ruby

This question is kinda simple (don't be so harsh with me), but I can't get a code-beautiful solution. I have the following code:
ARGF.each_line do |line|
arguments = line.split(',')
arguments.each do |task|
puts "#{task} result"
end
end
It simply reads numbers from standard input. I use it this way:
echo "1,2,3" | ruby prog.rb
The output desired is
1 result
2 result
3 result
But the actual output is
1 result
2 result
3
result
It seems like a newline character is being introduced. Am I missing something?
Each line ends in a newline character, so splitting on commas in your example means that the last token is 3\n. Printing this prints 3 and then a newline.
Try using
arguments = line.chomp.split(',')
to remove the trailing newline before splitting.
Your stdin input includes a trailing newline character. Try calling line.chomp! as the first instruction in your each_line block.
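The difference is easy to see on a literal line. A small sketch, with the string standing in for what each_line yields from echo "1,2,3":

```ruby
line = "1,2,3\n"  # each_line keeps the trailing newline

without_chomp = line.split(",")        # last token still ends in "\n"
with_chomp    = line.chomp.split(",")  # newline removed before splitting

p without_chomp
# => ["1", "2", "3\n"]
p with_chomp
# => ["1", "2", "3"]

with_chomp.each { |task| puts "#{task} result" }
```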
