I just can't find it.
Found you can remove them with chomp, but not how to create them.
There is a global variable, $/, which holds the input record separator (it defaults to newline, \n).
>> $/
=> "\n"
Methods like Kernel#gets use it to determine where each input record ends.
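For the other half of the question (creating a line break rather than removing one), here is a minimal sketch. The variable names are made up for illustration.

```ruby
# "\n" is only an escape sequence inside double quotes; single quotes keep it literal.
two_lines = "first line\nsecond line\n"
not_a_newline = 'first line\nsecond line'

# String#lines splits on the record separator ($/ by default) and keeps it.
lines = two_lines.lines
# String#chomp removes one trailing record separator from each line.
chomped = lines.map(&:chomp)

puts two_lines        # prints two separate lines
puts not_a_newline    # prints a backslash followed by "n" in the middle
p lines               # ["first line\n", "second line\n"]
p chomped             # ["first line", "second line"]
```

puts also appends a newline on its own when the string does not already end with one, which is often all you need.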
As long as you work with files in text mode (the default), Ruby itself does the translation of the operating system's end-of-line character sequences to "\n" in Ruby:
When reading from a file in text mode, all line endings will appear as "\n".
When writing to a file in text mode, all newline characters "\n" will be written as the operating system's end-of-line character sequence.
So for all practical purposes when dealing with files in text mode, you can use "\n" as a constant to mean the OS-specific line ending, like std::endl.
Source: How to make your Ruby code work on Windows PCs, section "Get your file modes right".
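A rough way to see this for yourself is to write in text mode and then compare a text-mode read with a binary read of the same file. The helper name text_mode_round_trip and the file name demo.txt are hypothetical; this is a sketch, not something taken from the linked article.

```ruby
require "tmpdir"

# Hypothetical helper: write `content` in text mode, then read it back
# once in text mode and once in binary mode.
def text_mode_round_trip(content)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo.txt")
    File.open(path, "w") { |f| f.write(content) }  # "w" opens in text mode
    [File.read(path), File.binread(path)]          # text view, raw bytes
  end
end

text, raw = text_mode_round_trip("alpha\nbeta\n")
p text   # line endings appear as "\n", as described above
p raw    # the bytes actually on disk; may contain "\r\n" on Windows
```

On Unix-like systems the two results should be identical, because the platform line ending is already "\n".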
Related
print FILEHANDLE; - when run from a Windows box - always converts a trailing \n into \r\n - resulting in a DOS formatted file. The difference between a DOS and a UNIX file is that in UNIX, the last character of each line is \n, whereas in Windows it is \r\n. I have tried changing the line termination character $\ = "\n"; but the print command still does the conversion to DOS format. This only occurs on Windows boxes.
If you don't like how Perl decides to output your data, you can change it. In the three-argument open, it looks like this:
open my $fh, '>:raw', $filename;
Or, if you already have the filehandle, you can use binmode:
binmode $fh, ':raw';
binmode $fh; # :raw is the default
The output depends on various I/O "layers", each of which gets to stick its dirty fingers into your data before it is output. The perlio docs have the list. There's a :crlf layer that turns Unix line endings into Windows (CRLF) ones on output, and you are probably getting it by default. Note that changing the output record separator is something that happens at the print level, but there are deeper layers that can still do their work.
I'm working on a small text file with a list of words in it that I want to add a new word to, and then sort. The file doesn't have a newline at the end when I start, but does after the sort. Why? Can I avoid this behavior or is there a way to strip the newline back out?
Example:
words.txt looks like
apple
cookie
salmon
I then run printf "\norange" >> words.txt; sort words.txt -o words.txt
I use printf rather than echo figuring that'll avoid the newline, but the file then reads
apple
cookie
orange
salmon
#newline here
If I just run printf "\norange" >> words.txt, orange appears at the bottom of the file with no trailing newline, i.e.:
apple
cookie
salmon
orange
This behavior is explicitly defined in the POSIX specification for sort:
The input files shall be text files, except that the sort utility shall add a newline to the end of a file ending with an incomplete last line.
A UNIX "text file" is only valid if all of its lines end in newlines, as the POSIX standard also defines:
Text file - A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the newline character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Think about what you are asking sort to do.
You are asking it "take all the lines, and sort them in order."
You've given it a file containing four lines, which it splits into the following strings (leaving aside "apple\n", which sorts first either way):
"salmon\n"
"cookie\n"
"orange"
It sorts these for you dutifully:
"cookie\n"
"orange"
"salmon\n"
And it then outputs them as a single string:
"cookie
orangesalmon
"
That is almost certainly exactly what you do not want.
So instead, if your file is missing the terminating newline that it should have had, the sort program understands that, most likely, you still intended that last line to be a line, rather than just a fragment of a line. It appends a \n to the string "orange", making it "orange\n". Then it can be sorted properly, without "orange" getting concatenated with whatever line happens to come immediately after it:
"cookie\n"
"orange\n"
"salmon\n"
So when it then outputs them as a single string, it looks a lot better:
"cookie
orange
salmon
"
You could strip the last character off the file, the one from the end of "salmon\n", using a range of handy tools such as awk, sed, perl, php, or even raw bash. This is covered elsewhere, in places like:
How can I remove the last character of a file in unix?
But please don't do that. You'll just cause problems for all other utilities that have to handle your files, like sort. And if you assume that there is no terminating newline in your files, then you will make your code brittle: any part of the toolchain which "fixes" your error (as sort kinda does here) will "break" your code.
Instead, treat text files the way they are meant to be treated in unix: a sequence of "lines" (strings of zero or more non-newline bytes), each followed by a newline.
So newlines are line-terminators, not line-separators.
There is a coding style where prints and echos are done with the newline leading. This is wrong for many reasons, including creating malformed text files, and causing the output of the program to be concatenated with the command prompt. printf "orange\n" is correct style, and also more readable: at a glance someone maintaining your code can tell you're printing the word "orange" and a newline, whereas printf "\norange" looks at first glance like it's printing a backslash and the phrase "no range" with a missing space.
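The difference between the two outcomes walked through above can be reproduced in a few lines of Ruby. This is only a sketch of the idea, not how sort is implemented; both method names are hypothetical.

```ruby
# Sort lines exactly as given; an unterminated last line stays unterminated.
def naive_sort(text)
  text.lines.sort.join
end

# Terminate an incomplete last line first, as POSIX requires of sort.
def sort_with_terminated_lines(text)
  text.lines.map { |line| line.end_with?("\n") ? line : "#{line}\n" }.sort.join
end

words = "apple\ncookie\nsalmon\norange"   # no newline after "orange"

p naive_sort(words)                   # "apple\ncookie\norangesalmon\n"
p sort_with_terminated_lines(words)   # "apple\ncookie\norange\nsalmon\n"
```

The naive version glues "orange" onto "salmon", which is exactly the failure the added newline prevents.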
The string is originating as a return value from:
> msg = imap.uid_fetch(uid, ["RFC822"])[0].attr["RFC822"]
In the console if I type msg, a long string is displayed with double quotes and \r\n separating each line:
> msg
"Delivered-To: email#test.com\r\nReceived: by xx.xx.xx.xx with SMTP id;\r\n"
If I match part of it with a regex, the return value has \r\n:
> msg[/Delivered-To:.*?\s+Received:/i]
=> "Delivered-To: email#test.com\r\nReceived:"
If I save the string to a file, read it back in and match it with the same regex, I get \n instead of \r\n:
> File.write('test.txt', msg)
> str = File.read('test.txt')
> str[/Delivered-To:.*?\s+Received:/i]
=> "Delivered-To: email#test.com\nReceived:"
Is \r\n being converted to \n when the string is saved to a file?
Is there a way to save the string to a file, read it back in without the line endings being modified?
This is covered in the IO.new documentation:
The following modes must be used separately, and along with one or more of the modes seen above.
"b" Binary file mode
Suppresses EOL <-> CRLF conversion on Windows. And
sets external encoding to ASCII-8BIT unless explicitly
specified.
"t" Text file mode
In other words, Ruby, like many other languages, senses the OS it's on and will automatically translate line-ends between "\r\n" <-> "\n" when reading/writing a file in text mode. Use binary mode to avoid translation.
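File.binwrite('test.txt', msg)   # binary mode: no line-ending translation
str = File.binread('test.txt')   # "\r\n" comes back as "\r\n"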
A better practice would be to read the file using foreach, which yields each line separately, so you rarely need to care about line endings at all. An alternative is readlines; however, it slurps the whole file into memory, which can be very costly on large files.
Also, if you're processing mail files, I'd strongly recommend using something written to do so rather than write your own. The Mail gem is one such package that's pre-built and well tested.
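As a rough sketch of both suggestions, the following writes and reads in binary mode so that "\r\n" survives, then iterates with foreach and chomps each line. The helper name binary_round_trip and the sample message are invented for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: round-trip `message` through a file in binary mode,
# then read the same file line by line.
def binary_round_trip(message)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "test.txt")
    File.binwrite(path, message)              # no EOL translation on write
    raw = File.binread(path)                  # no EOL translation on read
    lines = File.foreach(path).map(&:chomp)   # chomp drops "\n" or "\r\n"
    [raw, lines]
  end
end

msg = "Delivered-To: someone\r\nReceived: by example\r\n"  # made-up sample
raw, lines = binary_round_trip(msg)
p raw == msg   # true: the "\r\n" sequences are preserved
p lines        # ["Delivered-To: someone", "Received: by example"]
```

Note that File.binread returns a string tagged ASCII-8BIT, so call force_encoding on it if you need to treat the content as UTF-8 afterwards.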
I'm rewriting a GoldParser grammar for VBScript. In VBScript, statements are terminated by either a newline or ':'. Therefore I use the following terminal:
NewLine = {All Newline}
| ':'
Because every statement has to end with the NewLine terminal, only programs ending with an empty line are accepted. How can I extend the NewLine terminal to also accept programs not ending with an empty line? I tried the following:
NewLine = {All Newline}
| ':'
| {EOF}
This does not work because the {EOF} (End of File) group does not exist.
EOF is a special token and I'm not aware of any syntax allowing you to use it in a production rule. It is emitted when the tokenizer receives no more data, and as such it is not a control character you could use in a terminal definition either.
That being said, you have different possibilities to parse the (strictly speaking invalid) input. The simplest may be to just append a newline at the end of the string or text being tokenized. While this will not make it parse correctly in the GOLD Builder test window, it will make your code process the data as expected and it will not add complexity to the grammar.
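The "append a newline" workaround is independent of GOLD itself and happens before the text reaches the tokenizer. A minimal sketch in Ruby follows; ensure_trailing_newline is a hypothetical helper and not part of any GOLD engine.

```ruby
# Hypothetical helper, not part of GOLD: make sure the source text ends
# with a newline before it is handed to the tokenizer.
def ensure_trailing_newline(source)
  source.end_with?("\n") ? source : "#{source}\n"
end

p ensure_trailing_newline("x = 1 : y = 2")   # "x = 1 : y = 2\n"
p ensure_trailing_newline("x = 1\n")         # "x = 1\n" (unchanged)
```

Only appending when the newline is missing avoids introducing an extra empty statement into input that was already well formed.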
I am dealing with some multilingual data (English and Arabic) in a JSON file with a weird character I am not able to parse. I am not sure what the character is. I tried getting the ASCII value via vim and this is what I got:
"38 0x26"
This is the status line in vim i used to get the value (http://vim.wikia.com/wiki/Showing_the_ASCII_value_of_the_current_character).
:set statusline=%<%f%h%m%r%=%b\ 0x%B\ \ %l,%c%V\ %P
This is how the character looks in vim -
I tried 'sed' and '.gsub' to replace this character, without success.
Is there a way I can replace this character (preferably with Ruby's .gsub) with '&' or something else?
Thanks
Try something like:
sed 's|[^][:alnum:][:space:][{}().*\\/_(AllAsciiVariationYouWant)]|?|g' YourFile
where (AllAsciiVariationYouWant) stands for all the other characters that you want to keep as is (without the surrounding "()"). Every character outside the bracketed set is replaced with "?".
JSON is encoded in UTF-8 (Unicode). If you're seeing funky-looking characters in your file, it's probably because your editor is not treating Unicode characters properly. That could be caused by the use of a terminal emulator that doesn't support Unicode; an incorrect $LANG setting; vim not being able to correctly determine the encoding of the file; and likely other reasons.
What terminal program are you using? What's your $LANG environment variable set to (echo $LANG)? If you're certain your terminal supports Unicode, try:
LANG=en_US.utf-8 vim your_file_here.json
(The above example assumes that U.S. English is appropriate for the file, which it may not be.)
As for replacing characters in the file, vim's substitution command can be used:
:%s/old text/new text/g
The above command will run the substitute command on all lines in the file (%), replacing every instance of "old text" with "new text". (The g at the end tells vim to replace every instance on a line, not just the first it finds.)
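Since the question asked for Ruby's gsub, here is a minimal sketch of the same substitution done in Ruby on UTF-8 text. The helper name replace_character is hypothetical, and U+00A0 (no-break space) is only a stand-in, because the actual offending character is not known from the question.

```ruby
# Hypothetical helper: replace every occurrence of `target` in UTF-8 text.
def replace_character(text, target, replacement)
  # scrub swaps invalid byte sequences for "?" so later matching
  # does not trip over them
  text.scrub("?").gsub(target, replacement)
end

sample = "Tom\u00A0Jerry"   # U+00A0 used purely as a stand-in character

p replace_character(sample, "\u00A0", "&")            # "Tom&Jerry"
p sample.codepoints.map { |c| format("U+%04X", c) }   # shows what is really there
```

Listing the code points first is a quick way to find out which character you are dealing with before deciding what to replace. When reading from a file, File.read(path, encoding: "UTF-8") makes sure the string is tagged as UTF-8 to begin with.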