Replace "OS agnostic" newlines - ruby

I have several different document formats coming in. I'd like to strip out all the newlines and replace them with a " ". How can I account for newlines other than "\n"?
Something like s.gsub("\n", " ")

Most operating systems use \n or \r (or a combination) for newlines.
s.gsub(/[\n\r]+/, " ") should do the trick.
/[\n\r]+/ is known as a regular expression. It matches \n, \r and any combination of the two.
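A quick sketch of how that pattern behaves on mixed line endings, next to a variant that replaces each line break separately. The sample string and variable names are made up for illustration:

```ruby
# Sample text mixing Unix (\n), Windows (\r\n) and old Mac (\r) line endings.
text = "one\ntwo\r\nthree\rfour\n\nfive"

# Collapses any run of \r and \n characters into a single space.
collapsed = text.gsub(/[\n\r]+/, " ")

# Replaces each individual line break (\r\n, lone \r, or lone \n) with one
# space, so a blank line leaves two spaces behind.
one_each = text.gsub(/\r\n?|\n/, " ")

puts collapsed  # one two three four five
puts one_each   # one two three four  five
```

Which one you want depends on whether blank lines should survive as extra spaces.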

To make your code more readable, you could however use my gem.
You can install it this way:
gem install linebreak
You can use it this way:
require 'aef/linebreak/string_extension'
"Something\n".linebreak_encode(" ")
# => "Something "
Other examples:
"Something\n".linebreak_encode(:windows)
# => "Something\r\n"
"Something\r\n".linebreak_encode(:unix)
# => "Something\n"
It additionally comes with a command-line tool. Documentation can be found here.

Related

bash, getting ' characters inside printf statements

I find it hard to nest printf statements inside aliases. I have a number of help topics that I want to have available (just little collections of helpful tips for when I forget syntax). I have found out that printf requires \ to be escaped as \\ and % to be escaped as %%. However, my problem is more to do with ' and "
alias helpx='printf "A note about 'vim'.\n"'
=> A note about vim. # The ' are ignored.
alias helpx="printf 'A note about 'vim'.\n'"
=> A note about vim. # The ' are ignored.
alias helpx='printf "A note about \'vim\'.\n"' # Invalid syntax
alias helpx='printf "A note about \"vim\".\n"'
=> A note about "vim". # Some progress, I can now get " here
How can I get ' characters inside my notes in the above?
Would you please try:
alias helpx='printf "A note about '\''vim'\''.\n"'
or:
alias helpx="printf \"A note about 'vim'.\\n\""
You can do the following to escape the character when using the printf command in a Unix terminal:
printf "A note about \'vim\'.\n"
Since you want to assign this command to an alias (helpx), there are at least two ways to do it:
If you are comfortable with ASCII code points for punctuation and symbols, you can avoid most of the escaping by writing those characters as Unicode escapes:
alias helpx='printf "A note about \u0027vim\u0027.\u000A"'
If you are not comfortable with ASCII code points,
use the answer proposed by @tshiono.
Hopefully, this will help also in the future when dealing with such problems.

Ruby \& in a String

I need to include \& in a ZPL string to break up a long line on a label.
There is a stackoverflow post that suggests a lot of methods but does not seem to answer my question:
Here is my problem:
>>"asdf & asdf".gsub("&", "\\\\&")
=>"asdf \\& asdf"
Yes, if I puts the string it will return what I want:
>>puts "asdf & asdf".gsub("&", "\\\\&")
=>asdf \& asdf
But I need the actual string to equal asdf \& asdf
I've tried inspect:
>>"asdf & asdf".gsub("&", "\\\\&").inspect
=>"\"asdf \\\\& asdf\""
>>"asdf & asdf".gsub("&", "\&").inspect
=>"\"asdf & asdf\""
But that also does not return what I need. Maybe there is some combination that I'm missing that will return a \& in the string?
Thanks
When you see Ruby showing something like:
"asdf \\& asdf"
That's broken up into the tokens \\ (backslash) and & (ampersand) where the first backslash is special, the second is an actual character. You can read this as "literal backslash".
When printed you're seeing the actual string. It's correct.
Internally the double backslashes aren't actually there, it's just a consequence of how double-quoted strings must be handled.
This is to ensure that \n and \r and other control characters work, but you're also able to put in actual backslashes. There's a big difference between \n (newline) and \\n (literal backslash letter n).
You'll see this often when dealing with data formats that have special characters. For example, in printf style formatters % is significant, so to print an actual % you need to double it up:
'%.1f%%' % 10.3
#=> "10.3%"
Each format will have its own quirks and concerns. HTML doesn't treat backslash as special, but it does treat <, > and & as special, so you'll see &amp; instead of a bare ampersand.
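If you want to convince yourself that only one backslash is really in the string, a small check like the following may help. It is just a sketch; the block form of gsub is used here so the replacement is taken verbatim, and the variable names are only illustrative:

```ruby
# Block form: the block's return value is inserted as-is, with no
# backreference processing, so "\\&" means backslash + ampersand.
s = "asdf & asdf".gsub("&") { "\\&" }

puts s                     # asdf \& asdf     (the actual characters)
p s                        # "asdf \\& asdf"  (inspect re-escapes the backslash)
puts s.length              # 12 (the 11 original characters plus one backslash)
puts s.chars.count("\\")   # 1
puts s == 'asdf \& asdf'   # true

# The four-backslash string form from the question builds the same string.
puts s == "asdf & asdf".gsub("&", "\\\\&")  # true
```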

Why does File.dirname return a period when I expect a path?

I am trying to get the directory of a file on a Windows box using File.dirname. I get the file ("file1" below) from the Windows box and return it to the Mac OS X box that the script is run on.
The hard-coded file1 = "C:\Administrator\proj1\testFile.txt" below is only there to simplify my example; in reality I am getting this value from a remote box and returning it to my development box:
file1 = "C:\Administrator\proj1\testFile.txt"
path = "#{File.dirname(file1)}"
puts "#{path}"
>> .
I am confused about why it would return '.'. I saw on ruby-doc.org that File.dirname says the following:
"Returns all components of the filename given in file_name except the last one. The filename can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as the separator when File::ALT_SEPARATOR is not nil."
I did a puts on File::SEPARATOR and File::ALT_SEPARATOR and got the following:
File::SEPARATOR >> /
File::ALT_SEPARATOR >>
I assumed it was because "\" wasn't a valid file separator. So I set File::ALT_SEPARATOR to "\". However, even after that, I still got the same value when I puts path.
I tried using File.realdirpath and this was the result:
file1 = "C:\Administrator\proj1\testFile.txt"
path = "#{File.realdirpath(file1)}"
puts "#{path}"
>> /Users/me/myProject/C:\Administrator\proj1\testFile.txt
It seemed to add the path from where I called the Ruby script and appended the full path (including the file name). Seems to be odd behavior.
Any ideas, comments or suggestions would be great.
The problem is that when you declare file1, those backslashes define escape characters. Notice the return:
file1 = "C:\Administrator\proj1\testFile.txt"
=> "C:Administratorproj1\testFile.txt"
If you want to store a filepath in a string, you either need to use forward slashes or double backslashes (to escape the escape character):
file1 = "C:\\Administrator\\proj1\\testFile.txt"
file1 = "C:/Administrator/proj1/testFile.txt"
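To see what actually ends up in the string, you can compare the double-quoted literal with a single-quoted one. This is only an illustrative sketch (the variable names are made up), and the last step swaps the backslashes for forward slashes before calling File.dirname:

```ruby
# In double quotes, \A and \p are unknown escapes (the backslash is dropped)
# and \t becomes a tab character.
double = "C:\Administrator\proj1\testFile.txt"
# In single quotes the backslashes are kept as typed.
single = 'C:\Administrator\proj1\testFile.txt'

p double                     # "C:Administratorproj1\testFile.txt"
p single                     # "C:\\Administrator\\proj1\\testFile.txt"
puts double.include?("\t")   # true
puts single.chars.count("\\") # 3

# With forward slashes, File.dirname finds the separators.
puts File.dirname(single.gsub('\\', '/'))  # C:/Administrator/proj1
```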
Okay, I was able to duplicate this problem as well.
As @fbonetti pointed out, you have to enclose your path in single quotes to keep Ruby from interpreting the backslashes as escapes, so start with that...
>> file1='C:\Administrator\proj1\testFile.txt'
=> "C:\\Administrator\\proj1\\testFile.txt"
Then, passing file1 through gsub to 'normalize' the slashes, gives you the results you're expecting.
>> File.dirname(file1.gsub('\\', '/'))
=> "C:/Administrator/proj1"
Of course, you could always reverse the gsub if you needed them to be backslashes again.
>> File.dirname(file1.gsub('\\', '/')).gsub('/', '\\')
=> "C:\\Administrator\\proj1"
I figured it out. It was an issue with the version of Ruby I was using. I was using ruby 1.9.3 and then I switched to jruby 1.7.3 and it works correctly now.
Ruby's IO documentation is of great help when dealing with different OS path separators. From the documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb". When specifying a Windows-style filename in a Ruby string, remember to escape the backslashes:
"c:\\gumby\\ruby\\test.rb"
Our examples here will use the Unix-style forward slashes; File::ALT_SEPARATOR can be used to get the platform-specific separator character.
So, in other words, you don't need to hassle with backslashes, and whether you need to use single or double-quotes. Keep it simple, and use forward-slashes and let Ruby worry about it. That way your code is portable across *nix/Mac OS and Windows.
Beyond that, it looks like there's a real need to learn how character escaping works in double-quoted strings vs. single-quoted strings. This is from "Programming Ruby":
Ruby provides a number of mechanisms for creating literal strings. Each generates objects of type String. The different mechanisms vary in terms of how a string is delimited and how much substitution is done on the literal's content.
Single-quoted string literals (' stuff ' and %q/stuff/) undergo the least substitution. Both convert the sequence \\ into a single backslash, and the form with single quotes converts \' into a single quote.
'hello' » hello
'a backslash \'\\\'' » a backslash '\'
%q/simple string/ » simple string
%q(nesting (really) works) » nesting (really) works
%q no_blanks_here ; » no_blanks_here
Double-quoted strings ("stuff", %Q/stuff/, and %/stuff/) undergo additional substitutions, shown in Table 18.2 on page 203.
Substitutions in double-quoted strings
\a  Bell/alert (0x07)      \nnn     Octal nnn
\b  Backspace (0x08)       \xnn     Hex nn
\e  Escape (0x1b)          \cx      Control-x
\f  Formfeed (0x0c)        \C-x     Control-x
\n  Newline (0x0a)         \M-x     Meta-x
\r  Return (0x0d)          \M-\C-x  Meta-control-x
\s  Space (0x20)           \x       x
\t  Tab (0x09)             #{expr}  Value of expr
\v  Vertical tab (0x0b)
a = 123
"\123mile" » Smile
"Say \"Hello\"" » Say "Hello"
%Q!"I said 'nuts'," I said! » "I said 'nuts'," I said
%Q{Try #{a + 1}, not #{a - 1}} » Try 124, not 122
%<Try #{a + 1}, not #{a - 1}> » Try 124, not 122
"Try #{a + 1}, not #{a - 1}" » Try 124, not 122

quote_char causing fits in ruby CSV import

I have a simple CSV file that uses the | (pipe) as a quote character. After upgrading my Rails app from Ruby 1.9.2 to 1.9.3 I'm getting a "CSV::MalformedCSVError: Missing or stray quote in line 1" error.
If I pop open vim and replace the | with regular quotes, single quotes or even "=", the file works fine, but | and * result in the error. Anyone have any thoughts on what might be causing this? Here's a simple one-liner that can reproduce the error:
@csv = CSV.read("public/sample_file.csv", {quote_char: '|', headers: false})
Also reproduced this in Ruby 2.0 and also in irb w/out loading rails.
Edit: here are some sample lines from the CSV
|076N102 |,|CARD |,| 1|,|NEW|,|PCS |
|07-1801 |,|BASE |,| 18|,|NEW|,|PCS |
I think you've just discovered a bug in Ruby's CSV module.
From csv.rb (line 1587):
@re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
This Regexp is used to escape characters conflicting with special regular expression symbols, including your "pipe" char | .
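To illustrate why a quote character such as | needs escaping before it goes into a pattern, here is a rough sketch using Regexp.escape. This is not how csv.rb parses fields; the quoted_field pattern and variable names are hypothetical, and the sample line is modeled on the data in the question:

```ruby
quote   = "|"
escaped = Regexp.escape(quote)   # backslash + pipe, safe to interpolate

# Unescaped, | would mean alternation; escaped, it matches a literal pipe.
quoted_field = /#{escaped}([^#{escaped}]*)#{escaped}/

line   = "|076N102 |,|CARD |,| 1|,|NEW|,|PCS |"
fields = line.scan(quoted_field).flatten

p fields  # ["076N102 ", "CARD ", " 1", "NEW", "PCS "]
```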
I don't see any reason for the leading [-], and if you remove it, your example starts to work.
Edit: inside a character class (surrounded by brackets []), the hyphen has to be escaped only when it is not the leading character, so I had to update the fixed Regexp:
@re_chars = /#{%"(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
CSV.read('sample.csv', {quote_char: '|'})
# [["076N102 ",
# "CARD ",
# " 1", "NEW", "PCS "],
# ["07-1801 ",
# "BASE ",
# " 18", "NEW", "PCS "]]
Since most languages, Ruby included, do not support lookbehind expressions with quantifiers, I had to write it as a negative lookbehind for the left bracket. It would also match hyphens where the left bracket of a pair is missing. If you find a better solution, please leave a comment.
I'd be glad to hear any comments before I file a bug report at ruby-lang.org.

Rubular/Ruby discrepancy in captured text

I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
only gets me the first line, i.e.
<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
I don't think it's my test data, but that's possible. What am I missing?
(Ruby 1.9 on Ubuntu 10.10)
Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
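If you'd rather check from Ruby than from an editor, something along these lines would show which line-break sequences are in the data. It's a sketch only; line_break_counts is a made-up helper name and the sample string is invented:

```ruby
# Counts each kind of line break (\r\n, lone \r, lone \n) found in text.
def line_break_counts(text)
  counts = Hash.new(0)
  text.scan(/\r\n|\r|\n/) { |br| counts[br] += 1 }
  counts
end

sample = "first\r\nsecond\nthird\r"

p sample                     # inspect makes \r and \n visible
p line_break_counts(sample)  # one entry per kind of line break found
```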
The other check you can do is to change your brackets and print your capturing groups, so that you can see which part of your regex matches what.
/^<DD>(.*)\n?(.*)\n/
Another idea to get this to work is to change the .*. Instead of saying "match any character", say "match anything but \n".
^<DD>([^\n]*\n?[^\n]*)\n
I believe you need the multiline modifier in your code:
/m Multiline mode: dot matches newlines, ^ and $ both match line starts and endings.
The following:
#!/usr/bin/env ruby
desc= '<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
prints
#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
on my system (Linux, Ruby 1.8.7).
Perhaps your line breaks are really \r\n (Windows style)? What if you try:
desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/
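A different approach, if the data does turn out to contain \r\n, would be to normalize the line endings first and keep the original pattern unchanged. A minimal sketch with an invented sample string:

```ruby
desc_pattern = /^<DD>(.*\n?.*)\n/

# Hypothetical input with Windows-style line endings.
crlf = "<DD>first line<br />\r\nsecond line\r\n<DT>not matched"

# Turn \r\n and lone \r into \n before matching.
normalized  = crlf.gsub(/\r\n?/, "\n")
description = normalized[desc_pattern, 1]  # first capture group, or nil

p description  # "first line<br />\nsecond line"
```

That way the captured text doesn't carry stray \r characters either.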

Resources