Why is awk overwriting the earlier-printed columns? [duplicate] - bash

I have a text file which shows ^M characters when opened with the less command in the Mac terminal.
I tried using the commands below to remove the ^M characters.
awk '{ gsub("\n", "\r"); print $0;}' input > output
cat input | tr ‘\n’ ‘\r’ > output
But neither of them worked. Could someone help me fix this using some Linux commands?

You can use sed:
sed 's/^M//' filename > newfilename
If you wish to use awk then do:
awk '{sub(/^M/,"")}1' filename > newfilename
To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key then press V and M in succession.
Update
As suggested by @glenn jackman in the comments, it is easier to use \r than to type a literal ^M.
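The \r form can be checked without typing a literal ^M at all. The following is only a minimal sketch, assuming a POSIX shell; the sample text and the names workdir and cr_left are made up for illustration:

```shell
# Build a sample with DOS line endings (CR LF); the content is hypothetical.
workdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$workdir/input"

# Delete every carriage return; tr understands the \r escape directly.
tr -d '\r' < "$workdir/input" > "$workdir/output"

# Count the carriage returns that remain; 0 means the ^M characters are gone.
cr_left=$(tr -cd '\r' < "$workdir/output" | wc -c | tr -d ' ')
echo "carriage returns left: $cr_left"
```

Counting with tr -cd avoids relying on how a pager such as less chooses to display control characters.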

col < input > output
Or:
vim "+set ff=unix" "+saveas output" "+q" input

Use the octal value (\015 for carriage return) from http://www.asciitable.com/
echo "1'2^M34" | awk 'gsub(/\015/,":")'
1'2:34

Related

bash / sed : editing of the file

I use sed to remove all lines starting from "HETATM" from the input file and cat to combine another file with the output received from sed:
sed -i '/^HETATM/ d' file1.pdb
cat file2.pdb file1.pdb > file3.pdb
Is there a way to do it in one line, e.g. using only sed?
If you want to consider awk then it can be done in a single command:
awk 'FNR == NR {print; next} !/^HETATM/' file2.pdb file1.pdb > file3.pdb
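To see what the FNR == NR test does, here is a small sketch on made-up data. The file contents and the workdir and result names are assumptions for illustration, not taken from the question:

```shell
# Hypothetical sample files standing in for file2.pdb and file1.pdb.
workdir=$(mktemp -d)
printf 'HEADER from file2\nHETATM kept, first file\n' > "$workdir/file2.pdb"
printf 'ATOM 1\nHETATM 2\nATOM 3\n' > "$workdir/file1.pdb"

# FNR == NR holds only while awk reads the first file: print it unchanged.
# For the second file, print only the lines that do not start with HETATM.
result=$(awk 'FNR == NR {print; next} !/^HETATM/' \
    "$workdir/file2.pdb" "$workdir/file1.pdb")
printf '%s\n' "$result"
```

One caveat: if the first file is empty, FNR == NR is also true while the second file is read, so nothing would be filtered.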
With a cat + grep combination, please try the following code. cat concatenates the contents of all the files passed to it, and grep -v removes every line starting with HETATM from file1.pdb before its output is handed to cat through process substitution. The output of cat is written to a new file named file3.pdb.
cat file2.pdb <(grep -v '^HETATM' file1.pdb) > file3.pdb
I'm not sure what you mean by "remove all lines starting from 'HETATM'", but if you mean that any line that appears in the file after a line that starts with "HETATM" will not be outputted, then your sed expression won't do it - it will just remove all lines starting with the pattern while leaving all following lines that do not start with the pattern.
There are ways to get the effect I believe you wanted, possibly even with sed - but I don't know sed all that well. In perl I'd use the range operator with a guaranteed non-matching end expression (not sure what will be guaranteed for your input, I used "XXX" in this example):
perl -ne 'unless (/^HETATM/../XXX/) { print; }' file1.pdb
mawk '(FNR == NR) < NF' FS='^HETATM' f1 f2

Why is cat printing only the first and last line of file? [duplicate]

I have this line inside a file:
ULNET-PA,client_sgcib,broker_keplersecurities
,KEPLER
I try to get rid of that ^M (carriage return) character so I used:
sed 's/^M//g'
However this does remove everything after ^M:
[root@localhost tmp]# vi test
ULNET-PA,client_sgcib,broker_keplersecurities^M,KEPLER
[root@localhost tmp]# sed 's/^M//g' test
ULNET-PA,client_sgcib,broker_keplersecurities
What I want to obtain is:
[root@localhost tmp]# vi test
ULNET-PA,client_sgcib,broker_keplersecurities,KEPLER
Use tr:
tr -d '^M' < inputfile
(Note that the ^M character can be input using Ctrl+V Ctrl+M)
EDIT: As suggested by Glenn Jackman, if you're using bash, you could also say:
tr -d $'\r' < inputfile
Still the same line:
sed -i 's/^M//g' file
When you type the command, enter ^M by typing Ctrl+V Ctrl+M.
Actually, if you have already opened the file in vim, you can just do this in vim:
:%s/^M//g
Again, type ^M as Ctrl+V Ctrl+M.
You can simply use dos2unix which is available in most Unix/Linux systems. However I found the following sed command to be better as it removed ^M where dos2unix couldn't:
sed 's/\r//g' < input.txt > output.txt
Hope that helps.
Note: ^M is actually the carriage return character, which is represented in code as \r.
What dos2unix does is most likely equivalent to:
sed 's/\r\n/\n/g' < input.txt > output.txt
It doesn't remove \r when it is not immediately followed by \n and replaces both with just \n. This fails with certain types of files like one I just tested with.
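The difference between stripping every \r and stripping only a \r that directly precedes the newline shows up on a sample with a carriage return in the middle of a line. This is a rough sketch using tr and awk rather than dos2unix itself; the sample data and variable names are hypothetical:

```shell
# Hypothetical sample: a CR in the middle of line 1 and CR LF line endings.
workdir=$(mktemp -d)
printf 'a\rb\r\nc\r\n' > "$workdir/input.txt"

# Strip every CR, wherever it appears.
all_removed=$(tr -d '\r' < "$workdir/input.txt")

# Strip only a CR that sits directly before the newline.
trailing_removed=$(awk '{ sub(/\r$/, "") } 1' "$workdir/input.txt")

# Count the CRs that survive the second approach.
cr_left=$(printf '%s' "$trailing_removed" | tr -cd '\r' | wc -c | tr -d ' ')
echo "CRs left after trailing-only removal: $cr_left"
```

Which behaviour is wanted depends on the file: a stray \r inside a line may be data or may be damage.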
alias dos2unix="sed -i -e 's/'\"\$(printf '\015')\"'//g' "
Usage:
dos2unix file
If Perl is an option:
perl -i -pe 's/\r\n$/\n/g' file
-i edits the file in place (use -i.bak instead to keep a .bak copy of the original)
\r = carriage return
\n = linefeed
$ = end of line
s/foo/bar/g = globally substitute "foo" with "bar"
In awk:
sub(/\r/,"")
If it is in the end of record, sub(/\r/,"",$NF) should suffice. No need to scan the whole record.
This is a better way to achieve it:
tr -d '\015' < inputfile_name > outputfile_name
Later rename the file to original file name.
I agree with @twalberg (see accepted answer comments, above), dos2unix on Mac OSX covers this, quoting man dos2unix:
To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac"
I settled on 'mac2unix', which got rid of my less-cmd-visible '^M' entries, introduced by an Apple 'Messages' transfer of a bash script between 2 Yosemite (OSX 10.10) Macs!
I installed 'dos2unix', trivially, on Mac OSX using the popular Homebrew package installer. I highly recommend it and its companion command, Cask.
This is clean and simple and it works:
sed -i 's/\r//g' file
where \r of course is the equivalent for ^M.
Simply run the following command:
sed -i -e 's/\r$//' input.file
I verified this as valid in Mac OSX Monterey.
remove any \r :
nawk 'NF+=OFS=_' FS='\r'
gawk 3 ORS= RS='\r'
remove end of line \r :
mawk2 8 RS='\r?\n'
mawk -F'\r$' NF=1

Remove first character of a text file from shell

I have a text file and I would like to only delete the first character of the text file, is there a way to do this in shell script?
I'm new to writing scripts so I really don't know where to start. I understand that the main command most people use is "sed" but I can only find how to use that as a find and replace tool.
All help is appreciated.
You can use the tail command, telling it to start from character 2:
tail -c +2 infile > outfile
You can use sed
sed '1s/^.//' startfile > endfile
1s means match line 1, in substitution mode (s)
^. means at the beginning of the line (^), match any character (.)
There's nothing between the last slashes, which means substitute with nothing (remove)
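As a quick check of that explanation, the sketch below runs the sed command and the tail command on the same made-up file. The file content and the names with_sed and with_tail are assumptions for illustration:

```shell
# Hypothetical input whose first character (X) should be dropped.
workdir=$(mktemp -d)
printf 'Xfirst line\nsecond line\n' > "$workdir/startfile"

# Line 1 only: replace the first character with nothing.
with_sed=$(sed '1s/^.//' "$workdir/startfile")

# Same effect by byte offset: start the output at byte 2.
with_tail=$(tail -c +2 "$workdir/startfile")

printf '%s\n' "$with_sed"
```

The two are not identical in every case: tail -c counts bytes, so it would split a multi-byte first character, and if the file begins with an empty line tail removes that newline while the sed command leaves the file unchanged.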
I used to use cut command to do this.
For example:
cat file|cut -c2-80
Will show characters from column 2 to 80 only.
In your case you can use:
cat file|cut -c2-10000 > newfile
I hope this helps.
You can also use the 0,addr2 address-range to limit replacements to the first substitution, e.g.
sed '0,/./s/^.//' file
That will remove the 1st character of the file and the sed expression will be at the end of its range -- effectively replacing only the 1st occurrence.
To edit the file in place, use the -i option, e.g.
sed -i '0,/./s/^.//' file
or simply redirect the output to a new file:
sed '0,/./s/^.//' file > newfile
A few other ideas:
awk '{print (NR == 1 ? substr($0,2) : $0)}' file
perl -0777 -pe 's/.//' file
perl -pe 's/.// unless $done; $done = 1' file
ed file <<END
1s/.//
w
q
END
dd allows you to specify an offset at which to start reading:
dd ibs=1 skip=1 if="$input" of="$output"
(where the variables are set to point to your input and output files, respectively)
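A minimal sketch of the dd approach on a throwaway file follows. The paths and sample text are hypothetical. It uses skip, which counts blocks on the input side, whereas seek counts blocks on the output side:

```shell
# Hypothetical input and output paths in a temporary directory.
workdir=$(mktemp -d)
input="$workdir/in.txt"
output="$workdir/out.txt"
printf 'Xhello\n' > "$input"

# ibs=1 makes each input block one byte; skip=1 skips one input block.
# dd reports its transfer statistics on stderr, so silence them here.
dd ibs=1 skip=1 if="$input" of="$output" 2>/dev/null

trimmed=$(cat "$output")
echo "$trimmed"
```

Reading one byte per block is slow on large files, so tail -c +2 is usually the more practical choice.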

"grep" a csv file including multi-lines fields?

file.csv:
XA90;"standard"
XA100;"this is
the multi-line"
XA110;"other standard"
I want to grep the "XA100" entry like this:
grep XA100 file.csv
to obtain this result:
XA100;"this is
the multi-line"
but grep return only one line:
XA100;"this is
file.csv contains 3 entries.
The "XA100" entry contains a multi-line field.
And grep doesn't seem to be the right tool to "grep" a CSV file that includes multi-line fields.
Do you know a way to do this?
Edit: the real-world file contains many columns. The search term can be in any column (not necessarily at the beginning of a line or of a field). All fields are enclosed in ". Any field can span multiple lines, from one to any number, and this cannot be predicted.
Give this line a try:
awk '/^XA100;/{p=1}p;p&&/"$/{p=0}' file
I extended your example a bit:
kent$ cat f
XA90;"standard"
XA100;"this is
the
multi-
line"
XA110;"other standard"
kent$ awk '/^XA100;/{p=1}p;p&&/"$/{p=0}' f
XA100;"this is
the
multi-
line"
In the comments you mention that in the real-world file each line starts with ". I assume they also end with " and present you this:
Test file:
$ cat file
"single line"
"multi-
lined"
Code and outputs:
$ awk 'BEGIN{RS=ORS="\"\n"} /single/' file
"single line"
$ awk 'BEGIN{RS=ORS="\"\n"} /m/' file
"multi-
lined"
You can also parametrize the search:
$ awk -v s="multi" 'BEGIN{RS=ORS="\"\n"} match($0,s)' file
"multi-
lined"
try:
Solution 1:
awk -v RS="XA" 'NR==3{gsub(/$\n$/,"");print RS $0}' Input_file
This sets the record separator to the string XA, picks the 3rd record, and removes the extra newline at its end by substituting the match of $\n$ with an empty string. It then prints the record separator followed by the current record.
Solution 2:
awk '/XA100/{print;getline;while($0 !~ /^XA/){print;getline}}' Input_file
This looks for the string XA100, prints the current line, and uses getline to move to the next line; a while loop then keeps printing lines until it reaches one that starts with XA.
If this file was exported from MS-Excel or similar then lines end with \r\n while the newlines inside quotes are just \ns so then all you need is:
$ awk -v RS='\r\n' '/XA100/' file
XA100;"this is
the multi-line"
The above uses GNU awk for multi-char RS. On some platforms, e.g. cygwin, you'll have to add -v BINMODE=3 so gawk sees the \rs rather than them getting stripped by underlying C primitives.
Otherwise, it's extremely hard to parse CSV files in general without a real CSV parser (which awk currently doesn't have but is in the works for GNU awk) but you could do this (again with GNU awk for multi-char RS):
$ cat file
XA90;"standard"
XA100;"this is
the multi-line"
XA110;"other standard"
$ awk -v RS="\"[^\"]*\"" -v ORS= '{gsub(/\n/," ",RT); print $0 RT}' file
XA90;"standard"
XA100;"this is the multi-line"
XA110;"other standard"
to replace all newlines within quotes with blank chars and then process it as regular 1-line-per-record file.
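If GNU awk is not available, another option is to join physical lines until the number of double quotes seen so far is even, and only then test the complete record. This is only a sketch under the assumption that embedded quotes are doubled ("") as in common CSV dialects, so the parity check still holds. The variable names term, rec and tmp are made up here, and this is not a full CSV parser:

```shell
# Recreate the sample file from the question in a temporary directory.
workdir=$(mktemp -d)
printf 'XA90;"standard"\nXA100;"this is\nthe multi-line"\nXA110;"other standard"\n' \
    > "$workdir/file.csv"

# Append physical lines to rec until the quotes are balanced (even count),
# then look for the search term anywhere in the complete logical record.
match=$(awk -v term='XA100' '
    { rec = (rec == "" ? $0 : rec "\n" $0) }
    {
        tmp = rec
        if (gsub(/"/, "", tmp) % 2 == 0) {   # gsub returns the number of quotes
            if (index(rec, term)) print rec
            rec = ""
        }
    }
' "$workdir/file.csv")
printf '%s\n' "$match"
```

Because index() does a plain substring search over the whole record, the term is found in any column and on any physical line of a multi-line field.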
Using PS response, this works for the small example:
sed 's/^X/\n&/' file.csv | awk -v RS= '/XA100/ {print}'
For my real-world CSV file (many columns, the search term anywhere, an unknown number of multi-line fields, " characters replaced by "", continuation lines that may begin with ", and all fields enclosed in "), this works. Note the exclusion of a second " character in the sed part:
sed 's/^"[^"]/\n&/' file.csv | awk -v RS= '/RESEARCH_TERM/ {print}'
This is because the first column of an entry cannot start with "". The first column always looks like "XXXXXXXXX", where X is any character but ".
Thank you all for so many responses; other solutions may work too, depending on the CSV file format you use.

sed replace empty line with character

How do you replace a blank line in a file with a certain character using sed?
I have used the following command but it still returns the original input:
sed 's/^$/>/' filename
Original input:
ACTCTATCATC
CTACTATCTATCC
CCATCATCTACTC
...
Desired output:
ACTCTATCATC
>
CTACTATCTATCC
>
CCATCATCTACTC
>
...
Thanks for any help
Here is a way with awk. This wouldn't care if you have spaces or blank lines:
awk '!NF{$0=">"}1' file
NF stands for number of fields. Since blank lines or lines with just spaces have no fields, we use that to insert your text. 1 triggers the condition to be true and prints the line:
Test:
$ cat -vet file
ACTCTATCATC$
$
CTACTATCTATCC$
$
CCATCATCTACTC$
$
$ are end of line markers
$ awk '!NF{$0=">"}1' file
ACTCTATCATC
>
CTACTATCTATCC
>
CCATCATCTACTC
>
You may have tabs or spaces in the file's empty lines; try the following:
sed 's/^\s*$/>/' filename
You may have whitespace in your input. First thing to try is:
sed 's/^[[:blank:]]*$/>/' filename
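The effect of hidden whitespace can be reproduced on a small made-up file. In this sketch the sample data and the names strict and lenient are assumptions; it counts how many lines each expression turns into a lone >:

```shell
workdir=$(mktemp -d)
# Line 2 is truly empty; line 4 holds a space and a tab (hypothetical data).
printf 'ACT\n\nCTA\n \t\nCCA\n' > "$workdir/filename"

# Only matches lines with no characters at all.
strict=$(sed 's/^$/>/' "$workdir/filename" | grep -c '^>$')

# Also matches lines made up solely of spaces and tabs.
lenient=$(sed 's/^[[:blank:]]*$/>/' "$workdir/filename" | grep -c '^>$')

echo "strict: $strict, lenient: $lenient"
```

Running cat -vet on the real file, as shown in the awk answer, reveals which case applies.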
The following code should work:
sed -i 's/^[[:space:]]*$/string/' foo
What's missing here is the escape character. This will work for you.
sed 's/^$/\>/g' filename
And if you need to delete the empty lines and print others, Try
sed '/^$/d' filename
