Visual Studio will detect inconsistent line endings when opening a file and there is an option to fix it for that specific file. However, if I want to fix line endings for all files in a solution, how do I do that?
Just for a more complete answer, this worked best for me:
Replace
(?<!\r)\n
with
\r\n
in the entire solution, with the regular-expression option enabled.
This will set the correct line ending in all files that didn't have the correct line ending so far. It uses a negative lookbehind to check that no \r precedes the \n.
Be careful with the other solutions: They will either modify all lines in all files (ignoring the original line ending) or delete the last character of each line.
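To try the pattern outside Visual Studio, here is a minimal Python sketch that uses the standard re module. For this fixed-width case, Python's negative-lookbehind syntax is the same as Visual Studio's. The function name to_crlf is made up for illustration and is not part of any Visual Studio feature.

```python
import re

# A "\n" that is NOT immediately preceded by "\r" (negative lookbehind)
BARE_LF = re.compile(r'(?<!\r)\n')


def to_crlf(text):
    """Turn every bare LF into CRLF; existing CRLF pairs are left alone."""
    return BARE_LF.sub('\r\n', text)


print(repr(to_crlf('a\nb\r\nc\n')))  # 'a\r\nb\r\nc\r\n'
```

Existing \r\n pairs never match, so running the replacement a second time does no harm.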
You can use the Replace in Files command with regular expressions enabled. For example, suppose you want to replace end-of-lines that consist of a single linefeed "\n" (as in files downloaded from GitHub) with the standard Windows carriage-return linefeed "\r\n". Search for:
([^\r]|^)\n
This says to create a group (that's why the parentheses are required) that is either one character that is not a carriage return, or the beginning of a line. The beginning-of-line test really only matters at the very beginning of the file, if the file happens to start with a "\n". The group is followed by a newline. So you will match ";\n", which has the wrong end-of-line, but not "\r\n", which is the correct end-of-line.
And replace it with:
$1\r\n
This says to keep the group ($1) and then replace the "\n" with "\r\n".
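Here is a rough Python sketch of the same capture-group idea, using Python's re module. It makes two deliberate changes to the pattern above, both noted in the comments. First, the character class is tightened to [^\r\n]. Second, re.MULTILINE is turned on so that ^ also matches right after each \n. With the original [^\r], the class itself can capture a bare \n, so an input like "ab\n\n\n" keeps one stray LF and comes out as "ab\r\n\n\r\n". The name to_crlf_grouped is hypothetical.

```python
import re

# Group 1: one character that is neither CR nor LF, or the start of a line.
# [^\r\n] (not [^\r]) keeps the class from capturing a bare LF itself, and
# re.MULTILINE lets "^" match right after every "\n", so blank lines are fixed too.
PATTERN = re.compile(r'([^\r\n]|^)\n', re.MULTILINE)


def to_crlf_grouped(text):
    """Keep group 1 ($1 in Visual Studio, \\1 in Python) and turn the LF into CRLF."""
    return PATTERN.sub(r'\1\r\n', text)


print(repr(to_crlf_grouped('a;\nb\r\n')))  # 'a;\r\nb\r\n'
```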
Try doing
Edit > Advanced > Format Document
Then save the document. As long as the file doesn't get modified by another external editor, its line endings should stay consistent. This fixes it for me.
If you have Cygwin with the cygutils package installed, you can use this chain of commands from the Cygwin shell:
unix2dos -idu *.cpp | sed -e 's/ 0 [1-9][0-9]//' -e 's/ [1-9][0-9]* 0 //' | sed '/ [1-9][0-9] / !d' | sed -e 's/ [1-9][0-9 ] //' | xargs unix2dos
(Replace the *.cpp with whatever wildcard you need)
Here is how this works. The unix2dos command converts the files, but only files with inconsistent line endings (i.e., a mixture of UNIX and DOS) need to be converted. The -idu option displays the number of DOS and UNIX line endings in each file. For example:
0 491 Acad2S5kDim.cpp
689 0 Acad2S5kPolyline.cpp
0 120 Acad2S5kRaster.cpp
433 12 Acad2S5kXhat.cpp
0 115 AppAuditInfo.cpp
Here, only the Acad2S5kXhat.cpp file needs to be converted. The sed commands filter the output to produce a list of just the files that need to be converted, and these are then processed via xargs.
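The same filtering can be sketched in Python without parsing unix2dos output at all. The sketch counts CRLF and bare-LF endings in each file and keeps only the files that have both. It is illustrative only, and the helper names are hypothetical. You would still hand the resulting list to unix2dos, or convert the files some other way.

```python
from pathlib import Path


def count_endings(data):
    """Return (dos, unix): how many CRLF and how many bare LF endings data has."""
    dos = data.count(b'\r\n')
    unix = data.count(b'\n') - dos  # every CRLF also contains one LF
    return dos, unix


def is_mixed(data):
    """True when a file has at least one DOS and at least one Unix ending."""
    dos, unix = count_endings(data)
    return dos > 0 and unix > 0


def mixed_files(pattern='*.cpp', root='.'):
    """Paths under root matching pattern whose line endings are mixed."""
    return [str(p) for p in sorted(Path(root).glob(pattern))
            if p.is_file() and is_mixed(p.read_bytes())]


if __name__ == '__main__':
    for name in mixed_files():
        print(name)  # this list could then be passed to: xargs unix2dos
```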
Related
I want to remove the first two characters of a column in a text file.
I am using the command below, but it also truncates the header:
sed -i 's/^..//' file1.txt
Below is my file:
FileName,Age
./Acct_Bal_Tgt.txt,7229
./IDQ_HB1.txt,5367
./IDQ_HB_LOGC.txt,5367
./IDQ_HB.txt,5367
./IGC_IDQ.txt,5448
./JobSchedule.txt,3851
I want the ./ to be removed from the file name on each line.
Transferring comments to an answer, as requested.
Modify your script to:
sed -e '2,$s/^..//' file1.txt
The 2,$ prefix limits the change to lines 2 to the end of the file, leaving line 1 unchanged.
An alternative is to remove . and / as the first two characters on a line:
sed -e 's%^[.]/%%' file1.txt
I tend to use -e to specify that the script option follows; it isn't necessary unless you split the script over several arguments (so it isn't necessary here where there's just one argument for the script). You could use \. instead of [.]; I'm allergic to backslashes (as you would be if you ever spent time working out whether you needed 8 or 16 consecutive backslashes to get the right result in a troff document).
Advice: Don't use the -i option until you've got your script working correctly. It overwrites your file with the incorrect output just as happily as it will with the correct output. Consequently, if you're asking about how to write a sed script on SO, it isn't safe to be using the -i option. Also note that the -i option is non-standard and behaves differently with different versions of sed (when it is supported at all). Specifically, on macOS, the BSD sed requires a suffix specified; if you don't want a backup, you have to use two arguments: -i ''.
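For comparison, here is a Python sketch of both sed approaches, applied to in-memory lines whose newlines have already been stripped. The function names are hypothetical. The range version depends on line 1 being the header. The anchored-pattern version leaves the header alone simply because the header does not start with ./.

```python
import re


def strip_after_header(lines):
    """Like sed '2,$s/^..//': keep line 1, drop the first two characters of the rest."""
    return lines[:1] + [re.sub(r'^..', '', line) for line in lines[1:]]


def strip_dot_slash(lines):
    """Like sed 's%^[.]/%%': remove a leading "./" only where one is present."""
    return [re.sub(r'^[.]/', '', line) for line in lines]


rows = ['FileName,Age', './Acct_Bal_Tgt.txt,7229', './IDQ_HB.txt,5367']
print(strip_dot_slash(rows))  # ['FileName,Age', 'Acct_Bal_Tgt.txt,7229', 'IDQ_HB.txt,5367']
```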
Use this Perl one-liner:
perl -pe 's{^[.]/}{}' file1.txt > output.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s{^[.]/}{} : Replace a literal dot ([.]) followed by a slash ('/'), found at the beginning of the line (^), with nothing (delete them). This does not modify the header since it does not match the regex.
If you prefer to modify the file in-place, you can use this:
perl -i.bak -pe 's{^[.]/}{}' file1.txt
This creates the backup file file1.txt.bak.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
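If Perl is not available, Python's standard fileinput module offers a similar in-place mode with a backup suffix. This is a minimal sketch, not a drop-in replacement for perl -i, and the function name is made up:

```python
import fileinput
import re


def strip_in_place(path):
    """Rough analogue of: perl -i.bak -pe 's{^[.]/}{}' path"""
    # With inplace=True, print() output goes into the file being rewritten,
    # and the original is kept as path + '.bak'.
    with fileinput.input(files=(path,), inplace=True, backup='.bak') as f:
        for line in f:
            print(re.sub(r'^[.]/', '', line), end='')
```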
I have a text file that I am trying to convert to a Latex file for printing. One of the first steps is to go through and change lines that look like:
Book 01 Introduction
To look like:
\chapter{Introduction}
To this end, I have devised a very simple sed script:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\(.*\)/\\chapter{\1}/p'
This does the job, except, the closing curly bracket is placed where the initial backslash should be in the substituted output. Like so:
}chapter{Introduction
Any ideas as to why this is the case?
Your call to sed is fine. The problem is that your file uses DOS line endings (CRLF), and sed does not treat the CR as part of the line ending; it sees it as just another character on the line. The string Introduction\r is captured, and the result \chapter{Introduction\r} is displayed by printing everything up to the carriage return (the ^ represents the cursor position):
\chapter{Introduction
                     ^
then moving the cursor to the beginning of the line
\chapter{Introduction
^
then printing the rest of the result (}) over what has already been printed
}chapter{Introduction
 ^
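The overprinting can be reproduced with a toy model of a terminal. This Python sketch is a simplification that handles one line only, with no tabs or escape sequences. In the model, a \r sends the cursor back to column 0 and later characters overwrite what is already there. The name render is hypothetical.

```python
def render(output):
    """Toy terminal: '\\r' returns the cursor to column 0 and later
    characters overwrite whatever is already on the line."""
    screen = []
    col = 0
    for ch in output:
        if ch == '\r':
            col = 0            # carriage return: back to the start of the line
        elif col < len(screen):
            screen[col] = ch   # overwrite an already-printed character
            col += 1
        else:
            screen.append(ch)  # print past the end of the line so far
            col += 1
    return ''.join(screen)


print(render('\\chapter{Introduction\r}'))  # }chapter{Introduction
```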
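The solution is either to fix the file so it uses standard POSIX line endings (linefeed only), or to make sure the carriage return at the end of the line is never captured. Simply appending \r\? to the pattern would not help, because the greedy \(.*\) would still swallow the CR. Instead, delete the CR before the substitution runs (GNU sed understands \r):
sed -n -e 's/\r$//' -e 's/Book [[:digit:]]\{2\}\s*\(.*\)/\\chapter{\1}/p'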
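Here is the same "strip the CR before matching" fix, sketched in Python for a line-by-line converter. It assumes lines are read with their endings still attached. The name to_chapter is hypothetical, and unlike sed, which searches anywhere in the line, it anchors the match at the start of the line:

```python
import re

BOOK = re.compile(r'Book [0-9]{2}\s*(.*)')


def to_chapter(line):
    """Return '\\chapter{Title}' for a 'Book NN Title' line, else None."""
    # Drop the line ending (LF or CRLF) first, so the CR is never captured
    m = BOOK.match(line.rstrip('\r\n'))
    return '\\chapter{' + m.group(1) + '}' if m else None


print(to_chapter('Book 01 Introduction\r\n'))  # \chapter{Introduction}
```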
As an alternative to sed, awk using gsub might work well in this situation:
awk '{gsub(/Book [0-9]+/,"\\chapter"); print $1"{"$2"}"}'
Result:
\chapter{Introduction}
A solution is to modify the capture group. In this case, since all book chapter names consist only of alphabetic characters I was able to use [[:alpha:]]*. This gave a revised sed script of:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\([[:alpha:]]*\)/\\chapter{\1}/p'
I'm using sed on Windows to delete some superfluous lines from a file in Unix format (\n line endings). Unfortunately, sed changes the line endings to \r\n, even on lines it does not modify. How can I stop sed from doing that?
My sed is a simple sed-for-windows-standalone-exe:
C:\dev>sed --version
super-sed version 3.59
based on GNU sed version 3.02.80
GNU sed (http://gnuwin32.sourceforge.net/packages/sed.htm) has the -b option for "binary mode", i.e., not replacing \n with \r\n.
If you use the sed that comes with Cygwin, it usually uses binary mode even without the -b option. Cygwin commands use the input file path to decide whether to run in text or binary mode, i.e., whether to output \r\n or \n: http://www.cygwin.com/cygwin-ug-net/using-textbinary.html.
As the document says, binary mode is the default for MS-DOS pathnames, and in my experience the filesystems mounted by default are also mounted in binary mode.
If you add the parameter -b you treat the file as a binary file and it won't change your line endings.
The manual states:
-b, --binary
open files in binary mode (CR+LFs are not processed specially)
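Python has the same text-mode translation issue on Windows, and there newline='' plays roughly the role of sed's -b. The following is a minimal sketch, assuming you want to delete lines that match a regular expression while leaving every \n exactly as it was. delete_lines is a hypothetical helper:

```python
import re


def delete_lines(src, dst, pattern):
    """Copy src to dst, dropping lines that match pattern.

    newline='' turns off newline translation on both the input and the
    output side, so "\\n" stays "\\n" even on Windows."""
    regex = re.compile(pattern)
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        for line in fin:
            if not regex.search(line):
                fout.write(line)
```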
I do not know how you can do that with sed on Windows but have you tried:
unix2dos your_file
sed ...
dos2unix your_file
Sorry, but sed doesn't seem to work as you would like, i.e., sed != awk (which is configurable).
I downloaded the ssed executable and the help output did not mention any option for this, as I'm sure you know.
You could try modifying the source code, or contacting the authors.
Reading through the NEWS file in the source code, I found:
* The s/// command now understands the following escape (in both halves):
\a an "alert" (BEL)
\f a form-feed
\n a newline
\r a carriage-return
\t a horizontal tab
\v a vertical tab
\oNNN a character with the octal value NNN
\dNNN a character with the decimal value NNN
\xNN a character with the hexadecimal value NN
Have you tried 's/\r//' as the last command in your script?
I did a quick scan of most of the text files, but didn't find anything that leads me to believe there is a command-line option that will give you what you need.
As you don't want to use unix2dos, as an act of sheerest optimism, I offer the option of using tr to clean out those pesky '\r's:
sed -i -f yourSedScript yourFile
mv yourFile yourFile.wrk
tr -d '\015' yourFile.wrk > yourFile
Finally, if you are editing Unix files on a Windows box, you are presumably transferring them between Unix and Windows via 'ftp' or similar. Why not rely on the ftp options to convert line endings?
I hope this helps.
I've searched for hours looking for the answer to this question which seems frustratingly simple...
I have a bash script which I've simplified to find the line that's stopping it from working and am left with:
#!/bin/bash
#
sed -i -e "s/<link>/\n/g" /usb/lenny/rss/tmp/rss.tmp
If I run this script, nothing happens to the file rss.tmp - but if I call this exact same sed command from the terminal, it makes all the replacements as expected.
Anyone have any idea what I'm doing wrong here?
Based on the discussion, this sounds like a Cygwin shell problem.
The issue is that shell scripts may not have \r\n line terminations - they need \n terminations. Earlier versions of cygwin behaved differently.
The relevant section from a Cygwin FAQ at http://cs.nyu.edu/~yap/prog/cygwin/FAQs.html
Q: Mysterious errors in shell scripts, .bashrc, etc
A: You may get mysterious messages when bash reads
your .bashrc or .bash_profile, such as
"\r command not found"
(or similar). When you get rid of empty lines, the
complaints about "\r" disappears, but probably other
errors remain. What is going on?
The answer may lie in the fact that a text file (also
called ASCII file) can come in two formats:
in DOS format or in UNIX format.
Most editors can automatically detect the formats
and work properly in either format.
In the DOS format, a new line is represented by two characters:
CR (carriage return or ASCII code 13) and LF (line feed or ASCII code 10).
In the UNIX format, a new line is represented by only
one character, LF. When your .bashrc file is read,
bash thinks the extra character is the name of a command,
hence the error message.
In Cygwin or unix, you can convert a file INFILE in DOS format
to a file OUTFILE in Unix format by calling:
> tr -d '\15' < INFILE > OUTFILE
NOTE:
If you now compare the number of characters in INFILE and OUTFILE,
you will see that the latter has lost the correct
number of characters (i.e., the number of lines in INFILE):
> wc INFILE OUTFILE
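The character-count check in the NOTE can be sketched in Python. In tr, '\15' is octal for decimal 13, which is the CR. Deleting it removes exactly one byte per CRLF line, and that is what the wc comparison shows. The function name is hypothetical:

```python
def dos_to_unix(data):
    """Byte-level equivalent of: tr -d '\\15' < INFILE > OUTFILE"""
    return data.replace(b'\r', b'')  # octal 15 == decimal 13 == CR


infile = b'echo hello\r\necho world\r\n'
outfile = dos_to_unix(infile)
# Same number of lines, one byte fewer per line:
print(len(infile) - len(outfile), infile.count(b'\n'))  # 2 2
```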
Try this instead:
sed -i -e "s/\<link\>/\n/g" /usb/lenny/rss/tmp/rss.tmp
You need to give an output file or the result will be only shown on the screen.
sed -e 's/<link>/\n/g' /usb/lenny/rss/tmp/rss.tmp > /usb/lenny/rss/tmp/output.tmp
To feed a file to a command you use "<", and to write the output to a file you use ">". As far as I know, sed here works as a text filter rather than as an editor.
Maybe something like this will work:
cat < /usb/lenny/rss/tmp/rss.tmp | sed -e "s/<link>/\n/g" > /usb/lenny/rss/tmp/rssedit.tmp
cat reads the file, sed edits it, and the output goes to rssedit.tmp.
Then check whether rssedit.tmp has what you wanted.
If it does, and only if it does,
the next line of your script
should be:
mv /usb/lenny/rss/tmp/rssedit.tmp /usb/lenny/rss/tmp/rss.tmp
which replaces the original with the edited copy by renaming it to the original name.
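Here is the write-to-a-temporary-file-then-rename pattern from this answer, sketched in Python as an illustration (replace_via_temp_file is a hypothetical name). One caveat applies to this sketch: tempfile.mkstemp creates the file with owner-only permissions, so the replaced file ends up with those permissions rather than the original ones.

```python
import os
import re
import tempfile


def replace_via_temp_file(path, pattern, repl):
    """Edit `path` by writing the result to a temporary file in the same
    directory and then renaming it over the original."""
    with open(path, encoding='utf-8', newline='') as f:
        text = f.read()
    new_text = re.sub(pattern, repl, text)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8', newline='') as out:
            out.write(new_text)
        # Only reached if writing succeeded, like "check rssedit.tmp, then mv"
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # never leave a half-written copy behind
        raise
```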
I'm a Java developer and I'm using Ubuntu to develop. The project was created in Windows with Eclipse and it's using the Windows-1252 encoding.
To convert to UTF-8 I've used the recode program:
find Web -iname \*.java | xargs recode CP1252...UTF-8
This command gives this error:
recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data'
I searched for it and found the solution in Bash and Windows, Recode: Ambiguous output in step `data..CR-LF', which says:
Convert line endings from CR/LF to a single LF: Edit the file with Vim, give the command :set ff=unix and save the file. Recode now should run without errors.
Nice, but I have many files to remove the CR/LF characters from, and I can't open each one by hand. Vi doesn't provide any command-line option for doing this from Bash.
Can sed be used to do this? How?
There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.
sed cannot match \n because the trailing newline is removed before the line is put into the pattern space, but it can match \r, so you can convert \r\n (DOS) to \n (Unix) by removing \r:
sed -i 's/\r//g' file
Warning: this will change the original file.
However, you cannot convert from Unix EOL to DOS or old Mac (\r) this way. Further reading:
How can I replace a newline (\n) using sed?
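To make the pattern-space point concrete, here is a small Python model of how sed applies s/\r//g. The input is split on \n, the substitution sees each line without its LF, and the LF is put back on output. This models the behaviour described above, not sed's actual implementation, and the function name is hypothetical.

```python
def sed_s_delete_cr(data):
    """Model of sed 's/\\r//g' over a whole file."""
    out = []
    for line in data.split(b'\n'):            # sed strips the LF before the pattern space...
        out.append(line.replace(b'\r', b''))  # ...but a CR before it is still there to match
    return b'\n'.join(out)                    # the LF is added back on output


print(sed_s_delete_cr(b'a\r\nb\r\n'))  # b'a\nb\n'
```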
Actually, Vim does allow what you're looking for. Enter Vim, and type the following commands:
:args **/*.java
:argdo set ff=unix | update
The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second of these commands does the following to each file in the argument list, in turn:
Sets the line endings to Unix style (you already know this)
Writes the file out iff it's been changed
Proceeds to the next file (:argdo does this by itself, so no explicit | next is needed)
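Outside Vim, the same batch conversion can be sketched in Python with pathlib. This is a rough analogue that assumes converting CRLF to LF is all :set ff=unix needs to do for these files. Like :update, it writes a file only if its contents changed. set_ff_unix is a made-up name:

```python
from pathlib import Path


def set_ff_unix(root='.', pattern='**/*.java'):
    """Convert CRLF to LF in every file under root matching pattern,
    writing a file only if its contents actually change (like :update)."""
    changed = []
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        fixed = data.replace(b'\r\n', b'\n')
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(str(path))
    return changed
```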
I'll take a little exception to jichao's answer. You can actually do everything he said can't be done, fairly easily. Instead of looking for a \n, just look for a carriage return at the end of the line.
sed -i 's/\r$//' "${FILE_NAME}"
To change from Unix back to DOS, simply look for the last character on the line and add a carriage return after it. (I'll add -r to make this easier with extended regular expressions.)
sed -ri 's/(.)$/\1\r/' "${FILE_NAME}"
Theoretically, the file could be changed to Mac style by adding code to the last example that also appends the next line of input to the first line until all lines have been processed. I won't try to make that example here, though.
Warning: -i changes the actual file. If you want a backup to be made, add a string of characters after -i. This will move the existing file to a file with the same name with your characters added to the end.
Update: The Unix to DOS conversion can be simplified and made more efficient by not bothering to look for the last character. This also allows us to not require using -r for it to work:
sed -i 's/$/\r/' "${FILE_NAME}"
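Before running the Unix-to-DOS substitution over a whole tree, note that it is not idempotent. The following Python model of s/$/\r/ mimics sed's per-line pattern space (the helper name is hypothetical). It shows that a line that already ends in CRLF comes out with a doubled CR:

```python
def sed_append_cr(data):
    """Model of sed 's/$/\\r/': add a CR at the end of every line's pattern space."""
    *lines, last = data.split(b'\n')  # `last` is whatever follows the final LF
    converted = b''.join(line + b'\r\n' for line in lines)
    # A final line without a trailing LF still gets its CR, but no LF is invented
    return converted + (last + b'\r' if last else b'')


print(sed_append_cr(b'a\nb\n'))  # b'a\r\nb\r\n'
print(sed_append_cr(b'a\r\n'))   # b'a\r\r\n'  (CR doubled on an already-DOS line)
```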
The tr command can also do this:
tr -d '\15\32' < winfile.txt > unixfile.txt
and should be available to you.
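You'll need to run tr in a loop from within a script, because tr reads only standard input and cannot take file names itself. For example, create a file myscript.sh:
#!/bin/bash
# For every .java file below the current directory: delete CR (\15) and
# Ctrl-Z (\32), then convert the encoding from CP1252 to UTF-8.
find . -iname '*.java' -print0 | while IFS= read -r -d '' f; do
  echo "$f"
  tr -d '\15\32' < "$f" > "$f.tr"
  mv "$f.tr" "$f"
  recode CP1252...UTF-8 "$f"
done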
Running myscript.sh would process all the java files in the current directory and its subdirectories.
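The tr step in that loop can also be done in Python. In tr, \15 and \32 are octal escapes. The first is 13, the CR. The second is 26, Ctrl-Z (SUB), which old DOS tools used as an end-of-file marker. Here is a minimal sketch with a hypothetical function name:

```python
# In tr, '\15' and '\32' are octal escapes
CR = 0o15      # 13, carriage return
CTRL_Z = 0o32  # 26, SUB / Ctrl-Z, the old DOS end-of-file marker


def strip_cr_and_ctrl_z(data):
    """Byte-level equivalent of: tr -d '\\15\\32' < winfile.txt > unixfile.txt"""
    return data.translate(None, bytes([CR, CTRL_Z]))
```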
To overcome Ambiguous output in step `CR-LF..data', the simple solution might be to add the -f flag to force the conversion.
Try the Python script by Bryan Maupin found here (I've modified it a little bit to be more generic):
#!/usr/bin/env python3
import sys

input_file_name = sys.argv[1]
output_file_name = sys.argv[2]

# Work on raw bytes so every line can be decoded explicitly
with open(input_file_name, 'rb') as input_file, \
        open(output_file_name, 'wb') as output_file:
    for line_number, input_line in enumerate(input_file, start=1):
        try:  # first try to decode it using cp1252 (Windows, Western Europe)
            output_line = input_line.decode('cp1252').encode('utf-8')
        except UnicodeDecodeError as error:  # if there's an error
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
            # then fall back to latin1 (ISO 8859-1); it maps every byte, so it cannot fail
            output_line = input_line.decode('latin1').encode('utf-8')
        output_file.write(output_line)
You can use that script with
$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql
Go back to Windows, tell Eclipse to change the encoding to UTF-8, then back to Unix and run d2u on the files.