sed '$' matching start of line instead of end - bash

I am trying to append '.tsv' to the end of a column of text in a file.
Normally you can do this with sed 's|$|.tsv|' myfile.txt.
However, it is not working for this file, and I am trying to figure out why and how to fix it.
The column I want to edit looks like this:
$ cut -f12 chickspress.tsv | sort -u | head
Adipose_proteins
Adrenal_gland
Cerebellum
Cerebrum
Heart
Hypothalamus
Ovary
Sciatic_nerve
Testis
Tissue
But when I try to use sed, the result comes out wrong:
$ cut -f12 chickspress.tsv | sort -u | sed -e 's|$|.tsv|'
.tsvose_proteins
.tsvnal_gland
.tsvbellum
.tsvbrum
.tsvt
.tsvthalamus
.tsvy
.tsvtic_nerve
.tsvis
.tsvue
.tsvey
.tsvr
.tsv
.tsvreas
.tsvoral_muscle
.tsventriculus
The .tsv is supposed to be at the end of each line, not at the front.
I thought there might be some whitespace error, so I tried this (macOS):
$ cut -f12 chickspress.tsv | sort -u | cat -ve
Adipose_proteins^M$
Adrenal_gland^M$
Cerebellum^M$
Cerebrum^M$
Heart^M$
Hypothalamus^M$
Ovary^M$
Sciatic_nerve^M$
Testis^M$
Tissue^M$
kidney^M$
liver^M$
lung^M$
pancreas^M$
pectoral_muscle^M$
proventriculus^M$
This ^M does not look right; it's not present in my other files. I am not sure what it represents here, how to fix it, or how to get the sed command to work around it.
I produced this file using Python's csv.DictWriter, in a script I've used many times in the past without ever noticing this in its output. It was run on macOS in this case.

The ^M is a carriage return (\r): the file has Windows-style \r\n line endings, which is what Python's csv module writes by default. sed's $ matches just before the \n, so .tsv is appended after the \r; when the result is printed, the carriage return moves the cursor back to the start of the line and .tsv overwrites the first four characters.
EDIT: As per Ed's comment, if you want to remove carriage returns at the ends of lines only, the following may help.
awk '{sub(/\r$/,"")} 1' Input_file > temp_file && mv temp_file Input_file
OR
sed -i.bak 's#\r$##' Input_file
Alternatively, remove the control-M characters everywhere by doing the following, then try your command again.
tr -d '\r' < Input_file > temp_file && mv temp_file Input_file
Or, if you have the dos2unix utility on your system, you can use that to remove these characters.
With awk:
awk '{gsub(/\r/,"")} 1' Input_file > temp_file && mv temp_file Input_file
With sed:
sed -i.bak 's#\r##g' Input_file
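Putting the diagnosis and fix together, here is a minimal, self-contained reproduction (the file name tissues.txt is made up for the demo):

```shell
# Build a small file with Windows (CRLF) line endings, like the one
# csv.DictWriter produced.
printf 'Heart\r\nOvary\r\n' > tissues.txt

# Strip the carriage returns first, then the original append works.
tr -d '\r' < tissues.txt | sed 's|$|.tsv|'
# Heart.tsv
# Ovary.tsv
```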

Related

Trimming a textfile

I want to trim a text file by deleting all lines from line n to the end of the file. I tried to use sed for that. For n=26 the sed command should look like this:
sed -i '26,$d' /path/to/textfile
In my text file I don't know n beforehand, but I know that there is a unique text in that line. So I tried it this way:
myvar=`grep -n 'unique text' /path/to/textfile | awk -F":" '{print $1 }'`
sed -i "${myvar}"',$d' /path/to/textfile
That works and deletes all wanted lines but it throws the error message:
sed: -e expression # 1, character 1: unknown command: »,«
So i tried changing my command to:
myvar=`grep -n 'unique text' /path/to/textfile | awk -F":" '{print $1 }'`
sed -i "${myvar},$d" /path/to/textfile
With that I get the same error message, but it doesn't delete the lines.
I tried some variations of ' and " and of how to put the variable in there, but it never works as wanted. Does someone know what I am doing wrong?
I would appreciate other methods for trimming the text file, as long as I can do it in a bash script.
You can replace the fixed line number with a regular expression matching the line to start at. (This also sidesteps the quoting problem: in your second attempt the shell expands $d inside the double quotes to an empty string, leaving sed the incomplete expression 26,.)
sed -i '/unique text/,$d' /path/to/textfile
You can also use ed to edit the file, rather than rely on a non-standard sed extension.
printf '/unique text/,$d\nwq\n' | ed /path/to/textfile
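A quick sanity check of the regex-address form on a throwaway file (the file name and marker text here are illustrative):

```shell
printf 'keep1\nkeep2\nunique text\ntail1\ntail2\n' > demo.txt

# Delete from the first line matching the pattern through end of file.
sed '/unique text/,$d' demo.txt
# keep1
# keep2
```

Note that -i itself is the non-standard part: GNU sed takes an optional backup suffix (sed -i), while BSD/macOS sed requires one, even if empty (sed -i '').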

Is it possible to pipe head output to sed?

Input file
hello how are u
some what doing fine
so
thats all
huh
thats great
cool
gotcha im fine
I want to remove the last 4 lines without redirecting to another file, i.e. an in-place edit.
I used head -n -3 input.txt but it removes only the last 2 lines.
I also wanted to understand whether it is possible to pipe head's output to sed,
like head -n -3 input.txt | sed ...
Yes, I went through sed's options for removing the last n lines, like the one below, but I couldn't understand the nuances of the command, so I went with the head alternative instead.
sed -e :a -e '$d;N;2,5ba' -e 'P;D' file
EDIT: A solution without creating a temp file, using GNU awk's -i inplace:
awk -i inplace -v lines=$(wc -l < Input_file) 'FNR<=(lines-4)' Input_file
Please try the following and let me know if it helps.
tac Input_file | tail -n +5 | tac > temp_file && mv temp_file Input_file
Second solution, using awk:
awk -v lines=$(wc -l < Input_file) 'FNR<=(lines-4)' Input_file > temp_file && mv temp_file Input_file
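Both approaches can be checked on a throwaway eight-line file (tac is GNU coreutils; on macOS, tail -r does the same job):

```shell
seq 8 > demo.txt    # lines 1..8

# Reverse, drop what are now the first 4 lines (the original last 4),
# then reverse back.
tac demo.txt | tail -n +5 | tac
# 1
# 2
# 3
# 4

# Same result with awk: count the lines first, then keep only
# the first total-4 of them.
awk -v lines="$(wc -l < demo.txt)" 'FNR <= (lines - 4)' demo.txt
```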

Excluding '#' comments from a sed selection

I'm trying to get a config value from a .yml file, but one line contains that same key commented out. That is:
...
#database_name: prod
database_name: demo
database_user: root
database_password: password
...
I'm getting all values with this sed/awk command:
DATABASE_NAME=$(sed -n '/database_name/p' "$CONFIG_PATH" | awk -F' ' '{print $2}');
Now, if I do that, I get the right values for the user and password, but I get the name twice.
Question is:
How do I exclude '#' comments from my sed selection?
You might as well use awk for the whole operation:
DATABASE_NAME=$(awk -F' ' '$1!~/^#/ && /database_name/{print $2}' "$CONFIG_PATH")
This will exclude all lines that start with # (comments).
If there is always a character before the d, use /[^#]database_name/p.
If not, you can use /\(^\|[^#]\)database_name/p (note that \| alternation is a GNU sed extension).
The braces in the following are standard POSIX sed, by the way, not a GNU feature:
sed -n '/database_name/ {/^[[:blank:]]*#/!p}'
For lines matching "database_name", if the line does NOT begin with blanks and a hash then print it.
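Running that filter over the sample config from the question shows only the uncommented value surviving:

```shell
# Keep lines mentioning database_name, drop the commented one,
# then take the second whitespace-separated field.
printf '#database_name: prod\ndatabase_name: demo\ndatabase_user: root\n' |
  sed -n '/database_name/ {/^[[:blank:]]*#/!p}' |
  awk '{print $2}'
# demo
```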
If the file has blank spaces at the start of lines:
sed 's/ //g' file.txt | awk '/^(database)/{print}'
I ended up using @etan-reisner's solution.
Here is another solution to my particular problem I found along the way:
DATABASE_NAME=$(cat "$CONFIG_PATH" | grep -v '^[[:space:]]*#' | sed -n '/database_host/p' | awk -F' ' '{print $2}');
This filters out every line that begins with optional whitespace followed by a hash.

sed emulate "tr | grep"

Given the following file
$ cat a.txt
FOO='hhh';BAR='eee';BAZ='ooo'
I can easily parse out one item with tr and grep
$ tr ';' '\n' < a.txt | grep BAR
BAR='eee'
However, if I try the same thing using sed, it just prints everything:
$ sed 's/;/\n/g; /BAR/!d' a.txt
FOO='hhh'
BAR='eee'
BAZ='ooo'
With awk you could do this:
awk '/BAR/' RS=\; file
But in the case of BAZ this would produce an extra newline, because there is no ; after the last word, so the trailing newline becomes part of the record. If you want to remove that newline as well, you would need to do something like:
awk '/BAZ/{sub(/\n/,x); print}' RS=\; file
or with GNU awk or mawk you could use:
awk '/BAZ/' RS='[;\n]'
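For example, on the sample line (the second form needs GNU awk or mawk, since POSIX awk honors only the first character of RS):

```shell
line="FOO='hhh';BAR='eee';BAZ='ooo'"

# ';' as the record separator: print records matching BAR.
printf '%s\n' "$line" | awk '/BAR/' RS=';'
# BAR='eee'

# Treating both ';' and newline as separators avoids the stray
# trailing newline for the last record (GNU awk / mawk only).
printf '%s\n' "$line" | awk '/BAZ/' RS='[;\n]'
# BAZ='ooo'
```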
If your grep has the -o option then you could also try this:
grep -o '[^;]*BAZ[^;]*' file
sed can do it just as you want:
sed -n 's/.*\(BAR[^;]*\).*/\1/gp' <<< "FOO='hhh';BAR='eee';BAZ='ooo'"
The point here is that you must suppress sed's default output -- the whole line -- and print only the substitutions you want to perform.
Noteworthy points:
sed -n suppresses the default output;
s/.../.../g operates in the entire line, even if already matched -- greedy;
s/.1./.2./p prints out the substituted part (.2.);
the tr part is given as the delimiter in the expression \(BAR[^;]*\);
the grep job is represented by the matching of the line itself.
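Concretely, only the anchor name changes to pull out a different variable:

```shell
line="FOO='hhh';BAR='eee';BAZ='ooo'"

# Rewrite the whole line as just the captured field, and print only
# when the substitution succeeded (-n together with the p flag).
printf '%s\n' "$line" | sed -n 's/.*\(BAR[^;]*\).*/\1/p'
# BAR='eee'
printf '%s\n' "$line" | sed -n 's/.*\(BAZ[^;]*\).*/\1/p'
# BAZ='ooo'
```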
awk 'BEGIN {RS=";"} /BAR/' a.txt
The following grep solution might work for you:
grep -o 'BAR=[^;]*' a.txt
$ sed 's/;/\n/g;/^BAR/!D;P;d' a.txt
BAR='eee'
replace all ; with \n
delete until BAR line is at the top
print BAR line
delete pattern space

Display all fields except the last

I have a file as shown below
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line; the delimiter is . and the number of fields is not constant.
Can anybody help me with an awk or sed solution? I can't use perl here.
Both these sed and awk solutions work independent of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexps; in some versions it is -E instead, so check with man sed. If your version of sed has no such flag, just escape the brackets:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution does a greedy match up to the last . and captures everything before it; it replaces the whole line with just the captured part (the first n-1 fields). Use the -i option if you want the changes written back to the file.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution simply prints the first n-1 fields; to store the changes back to the file, use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character which is not dot (or newline). You need to exclude the dot because the repetition operator * is "greedy"; it will select the leftmost, longest possible match.
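The character class is what keeps the match anchored to the last field; compare it with a plain .* on the same input:

```shell
# [^.]* cannot cross a dot, so the match starts at the last dot:
echo '1.2.3.4.ask' | sed 's/\.[^.]*$//'
# 1.2.3.4

# .* is greedy from the first dot onward and eats everything after it:
echo '1.2.3.4.ask' | sed 's/\..*$//'
# 1
```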
With cut on the reversed string:
cat yourFile | rev | cut -d "." -f 2- | rev
If you want to keep the trailing ., use the following:
awk '{gsub(/[^\.]*$/,"");print}' your_file
