Other encodings for line breaks? - ruby

I have some database records in a Rails app that I'm trying to export to CSV, avoiding line breaks in the values. I run something to the effect of this:
File.open(new_file_path, 'w+') do |f|
  bio = c.biography.gsub("\n", "->")
  f.print "\"#{bio}\","
end
And I see results like this:
"Katcho Achadjian was first elected to the Assembly in 2010.
->
->Prior to being elected to the Legislature, Achadjian served as a member of the San Luis Obispo County Board of Supervisors from 1998 to 2010.
->
->Achadjian graduated from Cal Poly San Luis Obispo with a bachelor’s degree in business administration. Achadjian and his wife have two adult children and reside in Arroyo Grande.",
You'll notice that the substituted character sequence appears with line breaks as well. Is there another encoding for line break that I'm somehow missing?

It depends on the platform.
On Windows you'll have \r\n (carriage return, new line).
On Linux, OS X, and other Unix-like systems, it is just \n (new line).
On Classic Mac OS (up to version 9) it is just \r (carriage return).
Your text probably originated on Windows, so each line ends in \r\n; replacing only the \n leaves the \r behind, and many viewers still render a lone \r as a line break. The best way to deal with cross-platform text is to normalize it in two steps, in this order:
substitute \r\n with \r
substitute \r with \n
This way you end up with just \n, whatever platform the text originated from. You also get uniform line endings, which matters because some editors do not touch unmodified lines, so on saving you end up with mixed line endings, which are even worse than this.
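The two substitutions above can be sketched in Ruby as follows. This is a minimal illustration rather than the only way to do it; the method name normalize_newlines is made up for this example.

```ruby
# Hypothetical helper: collapse every line-ending style to a single "\n".
def normalize_newlines(text)
  # Step 1: Windows CRLF becomes a lone CR.
  cr_only = text.gsub("\r\n", "\r")
  # Step 2: every CR (original, or produced by step 1) becomes LF.
  cr_only.gsub("\r", "\n")
end

mixed = "windows\r\nclassic mac\runix\n"
p normalize_newlines(mixed)
```

The order matters: running step 2 first would turn \r\n into \n\n and double every Windows line break.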

Try modifying your gsub to this:
gsub(/\r?\n/, "->")
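To see why the original gsub left line breaks behind, and what the suggested pattern changes, here is a small sketch. The sample string is invented; it assumes the biography was stored with Windows-style \r\n endings, which the question does not confirm.

```ruby
# Assumed sample: two paragraphs separated by a blank line, CRLF endings.
bio = "First paragraph.\r\n\r\nSecond paragraph."

# Replacing only "\n" leaves each "\r" in place, so the text still breaks.
naive = bio.gsub("\n", "->")

# Matching an optional "\r" before "\n" consumes the whole line ending.
fixed = bio.gsub(/\r?\n/, "->")

# A broader pattern (an addition, not part of the answer) also catches a lone "\r".
broad = bio.gsub(/\r\n?|\n/, "->")

p naive
p fixed
p broad
```

The broader pattern is only needed if some records might use classic Mac \r endings.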

Related

Win 10: Desktop.ini infotip/tooltip text formatting - line break

I'm trying to customize some folders in Windows 10 using Desktop.ini text files. One thing I can't solve is how to make a line break in the infotip.
Current text file looks like this:
[.ShellClassInfo]
ConfirmFileOp=0
NoSharing=1
IconFile=$path_to_icon
IconIndex=0
InfoTip=Line1 \n Line2
So the last line of the text document is not working as desired. It just doesn't recognize the \n symbol. I also tried replacing the standard \n new line symbol with unicode characters and some other similar methods and symbols, but it didn't work. It just recognizes it as a string no matter what is written there.
The only way I could achieve a line break was to add so many characters, that Win 10 would automatically start a new line.
Help is much appreciated. Thank you!
You could define a string in a resource-only DLL:
InfoTip=#Your.dll,-12345
The negative number defines the resource ID of the string to use.
String resources in a DLL are not limited in the range of character codes, so this should in principle enable you to use line breaks (ASCII code 10).
To create such a resource-only DLL there are many free tools available, google for "windows resource editor".

How is '\x1A' special in Windows?

Reading some references [1, 2], I learned that the b modifier in the second argument of fopen(3) has no effect on POSIX systems, while it prevents special handling of \n and \x1A on Windows (see below).
I know well how \n (LF) is special on Windows, since text files use CRLF for line breaks (i.e. printf("\n") actually prints \r\n), but how is \x1A (SUB) special?
fopen("D:\\foo.txt", "rb");
                       ^
\x1A is Ctrl+Z, which used to be used as the end-of-file marker in MS-DOS (maybe even as far back as CP/M).
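The Microsoft documentation mentions Ctrl+Z only under the "t" (text) mode, where it is interpreted as an end-of-file character on input; the description of the "b" mode does not mention it. Since "b" turns off text-mode translation, it presumably turns off that Ctrl+Z handling as well, which would be the special treatment your references describe. I don't have a Windows box handy right now, so I can't easily check.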
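As a hedged illustration in Ruby (chosen to match the Ruby used elsewhere on this page; the question itself is about C), the sketch below writes a byte string containing \x1A and reads it back in binary mode. It only demonstrates that binary mode passes the byte through untouched. Whether a text-mode read stops at \x1A depends on the platform and runtime, and that is not tested here. The file name sub_demo.bin is arbitrary.

```ruby
require "tmpdir"

payload = "head\x1Atail\r\n".b # .b marks the string as raw bytes
read_back = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "sub_demo.bin")
  File.binwrite(path, payload)   # binary write: no newline translation
  read_back = File.binread(path) # binary read: \x1A is an ordinary byte
end

p read_back == payload
p read_back.bytes.include?(0x1A)
```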

Unintended double line spacing in CRichEditCtrl

I'm echoing the serial port input to a CRichEditCtrl, one char at a time as it arrives. The problem I've come across is that when I receive '\r' followed by '\n' I end up two lines further down the page, not one. Debugging it a little, I realise that sending "\r\n" results in (what I'd consider to be) the correct, single new line insertion, but sending '\r' and '\n' separately yields two new lines.
Simple example, where m_Output is obviously a rich edit control variable:
m_Output.SetSel(-1, -1);
m_Output.ReplaceSel(_T("X\r\n"));
m_Output.SetSel(-1, -1);
m_Output.ReplaceSel(_T("Y"));
m_Output.SetSel(-1, -1);
m_Output.ReplaceSel(_T("\r"));
m_Output.SetSel(-1, -1);
m_Output.ReplaceSel(_T("\n"));
m_Output.SetSel(-1, -1);
m_Output.ReplaceSel(_T("Z"));
The output from the above is:
X
Y
Z
Why the extra line?!?!
I figure maybe something about the behaviour of Set/ReplaceSel(), but it doesn't insert lines between regular characters in this way, e.g. if I send 'a' followed by 'b' the output is simply "ab" ...
The various versions of the RichEdit control are documented as using different characters for paragraph breaks; RichEdit 1.0 used \r\n, RichEdit 2.0 is documented as using \r and RichEdit 3.0 (and presumably higher) can use both.
What this looks like though is that the control is actually seeing a solitary \n as a break as well (i.e. it sounds like it accepts \r, \n and \r\n as all representing a single break). This doesn't match the documentation but then again it wouldn't be the first time Microsoft documentation was somewhat inaccurate.
Internally the control probably doesn't store the actual break character verbatim, so when you feed it a \r and then separately a \n it isn't able to join them together into a single break.
It sounds like the easiest solution for you would be to filter out \n characters rather than sending them to the control. That way all the control will see are the \r characters and you'll only end up with a single break in the text.
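The filtering idea can be sketched independently of MFC. The Ruby snippet below (Ruby only to match the other Ruby on this page; the same logic should port directly to a C++ receive handler) processes one character at a time, as the serial echo does. It goes slightly beyond the suggestion above: rather than dropping every \n, it drops a \n only when it directly follows a \r and turns a lone \n into \r, so LF-only senders still get a line break. The class name BreakFilter is hypothetical.

```ruby
# Hypothetical per-character filter: forwards exactly one "\r" per line break.
class BreakFilter
  def initialize
    @last_was_cr = false
  end

  # Returns the string to forward to the control ("" means: send nothing).
  def feed(char)
    out =
      if char == "\n"
        @last_was_cr ? "" : "\r" # swallow the LF of a CRLF pair
      else
        char
      end
    @last_was_cr = (char == "\r")
    out
  end
end

filter = BreakFilter.new
forwarded = "X\r\nY\r\nZ".each_char.map { |c| filter.feed(c) }.join
p forwarded
```

Keeping the one-character state in the filter means the control never sees a \n at all, so it does not matter how the control treats a solitary \n.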

Universal newline support in Ruby that includes \r (CR) line endings

In a Rails app, I'm accepting and parsing CSV files that may come formatted with any of three possible line termination characters: \n (LF), \r\n (CR+LF), or \r (CR). Ruby's File and CSV libraries seem to handle the first two cases just fine, but the last case ("Mac classic" \r line endings) isn't handled as a newline. It's important to be able to accept this format as well as the others, since Microsoft Excel for Mac (running on OS X) seems to use it when exporting to "Comma Separated Values" (although exporting to "Windows Comma Separated" produces the easier-to-handle \r\n).
Python has "universal newline support" and will handle any of these three formats without a problem. Is there something similar in Ruby that will accept all three without knowing the format in advance?
You could use :row_sep => :auto:
:row_sep
The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence.
There are some caveats of course, see the manual linked to above for details.
You could also manually clean up the EOLs with a bit of gsubing before handing the data to CSV for parsing. I'd probably take this route and manually convert all \r\ns and \rs to single \ns before attempting to parse the CSV. OTOH, this won't work that well if there is embedded binary data in your CSV where \rs mean something. On the gripping hand, this is CSV we're dealing with so who knows what sort of crazy broken nonsense you'll end up dealing with.
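The manual clean-up route might look like the sketch below. It assumes the whole file fits in memory and that no field legitimately contains a bare \r; the sample data and the method name parse_any_eol are invented for illustration.

```ruby
require "csv"

# Hypothetical helper: normalize CRLF and lone CR to LF, then parse.
def parse_any_eol(raw)
  CSV.parse(raw.gsub(/\r\n?/, "\n"))
end

classic_mac = "name,city\rAda,London\rLinus,Helsinki\r"
p parse_any_eol(classic_mac)
```

Note that the substitution also rewrites line breaks inside quoted fields, which is usually harmless for text but is exactly the binary-data caveat mentioned above.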

What does \n\r mean?

When reading from a pseudo-terminal via Java, I'm seeing "\n\r" in the text. What does that represent? Note it's not "\r\n", which I'm familiar with.
\n is a line feed (ASCII code 10), \r is a carriage return (ASCII code 13).
Different operating systems use different combinations of these characters to represent the end of a line of text. Unix-like operating systems (Linux, Mac OS X) usually use only \n. MS-DOS and Windows use \r\n (carriage return, followed by a line feed).
The code you're using uses \n\r (line feed, carriage return). There are operating systems that use that sequence, but probably it's a mistake and it should have been \r\n.
See Newline on Wikipedia.
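To check which sequences a captured string actually contains, a small sketch like this can help. It is written in Ruby rather than Java, and the method name line_breaks and the sample strings are made up. The pattern lists the two-character sequences first so that \r\n and \n\r are reported as pairs rather than as two separate characters.

```ruby
# Hypothetical helper: count each kind of line-break sequence in a string.
def line_breaks(text)
  counts = Hash.new(0)
  text.scan(/\r\n|\n\r|\r|\n/) { |seq| counts[seq] += 1 }
  counts
end

p "\n\r".bytes                          # LF is 10, CR is 13
p line_breaks("one\n\rtwo\n\rthree")
p line_breaks("one\r\ntwo\nthree")
```

A run such as \r\n\r is ambiguous by nature; this sketch reads it left to right as \r\n followed by \r.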
If you're programming in Java and you want to know what the newline sequence is for the operating system that your program is running on, you can get the system property line.separator:
String newline = System.getProperty("line.separator");
