I have a text file that was created by Matlab (I don't have the source code), in this tab-separated form:
a b c d
e f g h
I used
sed -i '' $'s/\t/,/g' filename
to replace all the tabs with commas, and ended up with a file that looks like this:
a,b,c,d
e,f,g,h
Then I tried to remove all the line breaks using
tr '\n' ' ' < filename
It gave me only the last line. But when I manually edited the text file, placing the cursor at the end of each line, pressing "del" and then "enter", and re-ran the command, it worked fine.
So the newline in the text file is probably not represented by \n. What other characters can represent line breaks?
P.S. If I run the tr line on the file before I replace the tabs, I get an empty output.
Thank you.
Sounds like your newlines are \r\n (Windows-style line endings). One option is to turn both characters into spaces with this command:
tr -s '\r\n' ' ' < file
tr maps both \r and \n to a space, and the -s switch then squeezes each run of repeated spaces in the output into a single space, so every \r\n pair becomes one space. Thanks to glenn jackman for pointing this out.
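A minimal sketch for confirming the diagnosis and trying the command above, assuming bash (for the $'\r' quoting) and a hypothetical sample file standing in for the Matlab output. It also fits the "only the last line" symptom: with tr '\n' ' ' alone the \r characters survive, and a terminal moves the cursor back to the start of the line at each \r, so later text overwrites earlier text.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the Matlab file: comma-separated, CRLF line endings.
dir=$(mktemp -d)
printf 'a,b,c,d\r\ne,f,g,h\r\n' > "$dir/sample.txt"

# Count lines that contain a carriage return; non-zero means CRLF endings.
cr_lines=$(grep -c $'\r' "$dir/sample.txt")
echo "lines containing \\r: $cr_lines"

# od -c prints every byte, so the \r \n pair is visible at each line end.
od -c "$dir/sample.txt"

# Both \r and \n become spaces; -s squeezes each resulting pair into one space.
joined=$(tr -s '\r\n' ' ' < "$dir/sample.txt")
printf '%s\n' "$joined"
rm -r "$dir"
```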
Guessing your intention slightly, you may want to use something like this, to replace all spaces including line breaks with commas:
tr -s '[:space:]' ',' < file
You could then pipe this to sed to remove the trailing comma if you wanted.
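A sketch of that comma-joining pipeline with the trailing comma stripped by sed, assuming a hypothetical CRLF sample file. One design caveat: because -s squeezes every run of commas in the output, an empty field in the input (",,") would be collapsed as well.

```shell
#!/usr/bin/env bash
# Hypothetical CRLF input, already comma-separated as in the question.
dir=$(mktemp -d)
printf 'a,b,c,d\r\ne,f,g,h\r\n' > "$dir/sample.txt"

# Every whitespace character (including \r and \n) becomes a comma,
# and -s squeezes each run of commas into a single comma.
tr -s '[:space:]' ',' < "$dir/sample.txt" > "$dir/joined.txt"

# The final \r\n leaves one trailing comma; sed removes it.
result=$(sed 's/,$//' "$dir/joined.txt")
printf '%s\n' "$result"
rm -r "$dir"
```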
I am trying to convert a bash script to Python for an intern project; basically, the script parses a table and prints the information as an HTML document.
This line is confusing me. TMP is a temporary document that is the output of lsload, which outputs a table containing server host info.
# Force header text to lowercase
tr '[:upper:]' '[:lower:]' <${TMP} |head --lines=+1 |sed -e 's/[ \t]\+/ /g' >${H_TMP}
Okay, the tr command converts the header text from uppercase to lowercase. I'm not really sure what the head command does, and I'm also confused about what the sed does. Could anyone clarify what is going on in this line?
As a bonus, does anyone have ideas as to how I can convert this to Python?
EDIT: Okay, I seem to understand what sed is doing; it is converting any amount of spaces or tabs to just a single space. Just confused about head now.
You should be able to find the documentation for any Unix command easily by searching for its man page.
http://man7.org/linux/man-pages/man1/head.1.html
Any basic introduction to the Unix command line will also explain that head prints the first n lines of a text file (here only one, the header line), and tail correspondingly prints the last n lines.
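To see what each stage does, here is a sketch that runs the same pipeline on a made-up header. The sample text is hypothetical (not real lsload output), head -n 1 is used to keep the first line as the script's head --lines=+1 does, and GNU sed is assumed because \+ is a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the lsload output: an upper-case header
# padded with runs of spaces, followed by one data row.
TMP=$(mktemp)
H_TMP=$(mktemp)
printf 'HOST_NAME      STATUS  r15s   r1m\nhostA          ok      0.0    0.1\n' > "$TMP"

# tr lower-cases every line, head keeps only the first (header) line,
# and sed collapses each run of spaces/tabs into a single space.
tr '[:upper:]' '[:lower:]' < "$TMP" | head -n 1 | sed -e 's/[ \t]\+/ /g' > "$H_TMP"

header=$(cat "$H_TMP")
printf '%s\n' "$header"
rm -f "$TMP" "$H_TMP"
```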
The entire snippet roughly corresponds to this Python:
import os
import re
with open(os.environ['TMP']) as inputfile, open(os.environ['H_TMP'], 'w') as outputfile:
    for line in inputfile:
        # sed 's/[ \t]\+/ /g' is re.sub(...); strip the newline first so \s+ does not swallow it
        # tr '[:upper:]' '[:lower:]' is lower()
        line = re.sub(r'\s+', ' ', line.rstrip('\n')).lower()
        outputfile.write(line + '\n')
        # head --lines=1: quit after a single line
        break
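The regex escape \s matches more than the character class [ \t]: it also matches newline, carriage return, form feed and vertical tab, and (for str patterns) Unicode whitespace such as non-breaking spaces. For plain ASCII input that only contains spaces and tabs, the two behave the same. We can only guess whether you need to match strictly those two characters; if so, use [ \t]+ instead, especially if you also handle Unicode.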
For maximum compactness, you could reduce this down to
import os
import re
with open(os.environ['TMP']) as inputfile, open(os.environ['H_TMP'], 'w') as outputfile:
    outputfile.write(re.sub(r'\s+', ' ', inputfile.readline().rstrip('\n')).lower() + '\n')
If you want to read a fixed number of lines where that number is not 1, maybe look at enumerate():
import os
import re
with open(os.environ['TMP']) as inputfile, open(os.environ['H_TMP'], 'w') as outputfile:
    for lineno, line in enumerate(inputfile, 1):
        line = re.sub(r'\s+', ' ', line.rstrip('\n')).lower()
        outputfile.write(line + '\n')
        if lineno == 234:
            break
I want to write a ksh script that deletes all lines of a file that begin with a carriage return (that is, the blank lines). The same script reuses the modified file afterwards, so I need to modify the file in place.
For example, here is my file in Notepad++ (with the line endings shown as CRLF, since it's a Windows-format file):
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CRLF
CE3;CPr3;CRLF
CRLF
CRLF
and I want to obtain:
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CE3;CPr3;CRLF
The script I wrote so far is:
sed -i '/^\n/d' ListeTable.lst
I also tried with \r and \R but nothing is working.
As I mentioned, a later part of the same script reuses the modified file; it looks like this (but there is more):
echo -n "(CE = '$(tail -n 1 ListeTable.lst | cut -d$';' -f1)'and CPr = '$(tail -n 1 ListeTable.lst | cut -d$';' -f2)')"
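OK, I found a regex that works for this problem: '/^\s*$/d'. Here ^ anchors the start of the line, \s matches any whitespace character (spaces, tabs, and also the \r left at the end of each Windows line), * allows any number of them including none, and $ anchors the end of the line. sed removes the trailing \n from each line before matching, which is why '/^\n/d' never matched.
So the working command is: sed -i '/^\s*$/d' ListeTable.lst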
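A sketch that checks the command on a file shaped like the example, assuming GNU sed (both -i without a suffix and \s are GNU extensions) and a hypothetical temp file in place of ListeTable.lst. Note that the pattern also deletes lines holding only spaces or tabs.

```shell
#!/usr/bin/env bash
# Hypothetical copy of ListeTable.lst: CRLF endings plus blank CRLF lines.
f=$(mktemp)
printf 'CE1;CPr1;\r\nCE2;CPr2;\r\n\r\nCE3;CPr3;\r\n\r\n\r\n' > "$f"

# A blank line reaches the regex as just "\r", which \s* matches.
sed -i '/^\s*$/d' "$f"

# Count remaining lines and read the first field of the last one,
# as the follow-up script in the question does.
lines=$(grep -c '' "$f")
last_ce=$(tail -n 1 "$f" | cut -d';' -f1)
echo "$lines lines left; last CE = $last_ce"
rm -f "$f"
```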
I have a tab-separated text file (e.g. test.txt) that contains ^M characters and has no newline at the end:
Samples Factor
1 2
Using tr and the approach from the thread https://unix.stackexchange.com/questions/31947/how-to-add-a-newline-to-the-end-of-a-file, I am able to process this table.
My problem is that in my script the line
tr '\r' '\n' < test.txt | sed -e '$a\' > test_temp
mv test_temp test.txt
makes everything below appear in "red" in vim, and my immediately following code does not run. If I remove the sed -e '$a\' part, everything works.
Do you have an explanation for this?
Thanks for your help.
Vim seems to be incorrectly treating the backslash as special, even though backslashes don't do anything inside single quotes. What does :set syntax show? My Vim colors everything correctly with syntax=sh, so perhaps yours isn't treating your file as a shell script. (Interestingly, you may notice that Stack Overflow's syntax highlighter gets it wrong, too.)
If that's it, try adding a shebang line like #!/bin/bash up top.
Whatever the cause, a simpler way to add a newline at the end is to echo one. Because your input is known to lack a final line terminator, this is safe here, and it avoids having sed scan through the entire input stream to find EOF.
{ tr '\r' '\n' < test.txt; echo; } > test_tmp
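A sketch comparing the two variants on a hypothetical test.txt with \r line endings and no terminator after the last line, assuming GNU sed for the '$a\' idiom. The design difference: sed '$a\' appends a newline only when the last line lacks one, whereas echo always appends one, so if the input already ended in \r the echo version would add an extra empty line.

```shell
#!/usr/bin/env bash
# Hypothetical test.txt: \r line endings, nothing after the last line.
dir=$(mktemp -d)
printf 'Samples\tFactor\r1\t2' > "$dir/test.txt"
printf 'Samples\tFactor\n1\t2\n' > "$dir/expected"

# Variant from the question: sed adds a final newline only if missing.
tr '\r' '\n' < "$dir/test.txt" | sed -e '$a\' > "$dir/via_sed"

# Variant from the answer: echo always adds exactly one newline.
{ tr '\r' '\n' < "$dir/test.txt"; echo; } > "$dir/via_echo"

cmp -s "$dir/expected" "$dir/via_sed" && sed_ok=yes || sed_ok=no
cmp -s "$dir/expected" "$dir/via_echo" && echo_ok=yes || echo_ok=no
echo "sed variant ok: $sed_ok, echo variant ok: $echo_ok"
rm -r "$dir"
```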
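To Vim's shell highlighter, the backslash in '$a\' looks as if it prevents the last quote from ending the string, so everything after it is highlighted as part of that string. (The shell itself treats a backslash inside single quotes literally.) What are you trying to do with that sed command?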
I had a problem with the tr command in Unix. I had an original.csv file saved from Excel on Windows that I wanted to use on Unix. The content of original.csv looks like:
Func,Failing Cycle
rtg_generic_A_A1_N2_cb_TEST_1_sm32,859180
rtg_generic_A_A1_N3_cb_TEST_1_sm32,859180
rtg_generic_A_A3_N5_dw_cb_TEST_1_sm32,788581
I wanted to strip the carriage returns, so I ran:
tr '\r\n' '\n' < original.csv >trimmed.csv
yet the content inside trimmed.csv looks like:
Func,Failing Cycle
rtg_generic_A_A1_N2_cb_TEST_1_sm32,859180
rtg_generic_A_A1_N3_cb_TEST_1_sm32,859180
rtg_generic_A_A3_N5_dw_cb_TEST_1_sm32,788581
It seems an additional empty line is added between the lines. Why does this happen, and how can I get rid of these empty lines?
Thanks
This command:
tr '\r\n' '\n' < file
will translate each instance of \r into \n, because tr works character by character: \r is mapped to \n, and \n stays a newline. Every \r\n pair therefore becomes \n\n, so each line ends with two newlines.
To delete carriage returns, you can use this tr command:
tr -d '\r' < file
To convert line endings from Windows to Unix, you can also use the dos2unix utility, if it is installed.
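A sketch reproducing the problem and the tr -d fix on a hypothetical two-line copy of original.csv with CRLF endings. Counting lines with grep -c '' makes the doubled newlines visible.

```shell
#!/usr/bin/env bash
# Hypothetical original.csv saved with Windows (CRLF) line endings.
dir=$(mktemp -d)
printf 'Func,Failing Cycle\r\nrtg_generic_A_A1_N2_cb_TEST_1_sm32,859180\r\n' > "$dir/original.csv"

# Command from the question: \r -> \n, so every \r\n becomes \n\n.
tr '\r\n' '\n' < "$dir/original.csv" > "$dir/doubled.csv"

# Fix from the answer: delete every \r and keep the \n endings.
tr -d '\r' < "$dir/original.csv" > "$dir/trimmed.csv"

doubled_lines=$(grep -c '' "$dir/doubled.csv")
trimmed_lines=$(grep -c '' "$dir/trimmed.csv")
trimmed=$(cat "$dir/trimmed.csv")
echo "doubled: $doubled_lines lines, trimmed: $trimmed_lines lines"
rm -r "$dir"
```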
I have to process the output of a command, which is just plain text. I want to remove unnecessary spaces using
sed 's/ */ /g'
but it does the wrong thing: it seems to remove newline characters as well...
The other problem is about the $ character representing the 'new line'.
When I write something like this:
sed 's/$/FOO/g'
it really seems to replace all the newline characters with the word FOO.
So it's basically like this:
text
text
text
Is converted to
textFOOtextFOOtext
The problem starts when two newlines occur in a row. The text:
text
text
Is converted to:
textFOOFOOtext
But absolutely no sed command I have tried is able to convert the two newlines into one. I tried everything I found on the web.
How do I remove that additional new line?
Check if this helps:
tr -s ' ' < File
This squeezes runs of multiple spaces into a single space, if that's what you want.
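A sketch with hypothetical sample text showing the answer's tr -s alongside a sed pattern that behaves the same way. The sed in the question misbehaves because ' *' also matches the empty string, so sed inserts a space at every position; '  *' (a space followed by ' *') requires at least one space. It also shows that s/$/FOO/ appends FOO at the end of each line while the newlines stay in place. Squeezing repeated blank lines with cat -s is a swapped-in technique not taken from the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical command output: runs of spaces and two consecutive blank lines.
input=$'a   b  c\n\n\nd    e'

# The answer's approach: squeeze runs of spaces into one space.
squeezed=$(printf '%s\n' "$input" | tr -s ' ')

# A sed equivalent: match one or more spaces, never the empty string.
via_sed=$(printf '%s\n' "$input" | sed 's/  */ /g')

# $ matches the end of each line; sed re-adds the newline on output.
foo=$(printf 'text\ntext\n' | sed 's/$/FOO/')

# cat -s squeezes repeated empty lines into a single empty line.
blank_squeezed=$(printf '%s\n' "$input" | cat -s)

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$squeezed" "$via_sed" "$foo" "$blank_squeezed"
```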