Subtract length of elements in two columns - bash

I have a file from which I extract two columns: cut -d $'\t' -f 4,5 file.txt
Now I would like to get the difference in length between the elements of column 1 and column 2.
Input from the cut command:
A T
AA T
AC TC
A CT
What I would expect
0
1
0
-1

Using awk.
awk ' {print length($1) - length($2)} ' cutoutput.txt
Or, with awk on the original file, you can simply do:
awk ' {print length($4) - length($5)} ' file.txt

You can probably do this with awk alone, without cut. But since the sample shown is already the cut output, you can simply pipe your cut command into awk:
cut -d $'\t' -f 4,5 file.txt | \
awk '{print length($1) - length($NF)}'
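A quick check with the sample rows (fed here via printf instead of the real file):
$ printf 'A\tT\nAA\tT\nAC\tTC\nA\tCT\n' | awk -F '\t' '{print length($1) - length($2)}'
0
1
0
-1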

Related

Finding difference of values between corresponding fields in two CSV files

I have been trying to find the difference between the values in corresponding fields of two CSV files.
$ cat f1.csv
A,B,25,35,50
C,D,30,40,36
$
$ cat f2.csv
E,F,20,40,50
G,H,22,40,40
$
Desired output:
5 -5 0
8 0 -4
I was able to achieve it like this:
$ paste -d "," f1.csv f2.csv
A,B,25,35,50,E,F,20,40,50
C,D,30,40,36,G,H,22,40,40
$
$ paste -d "," f1.csv f2.csv | awk -F, '{print $3-$8 " " $4-$9 " " $5-$10 }'
5 -5 0
8 0 -4
$
Is there any better way to achieve it with awk alone without paste command?
As a first step, replace only paste with awk:
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {print file1[FNR] FS $0}' f1.csv f2.csv
Output:
A,B,25,35,50,E,F,20,40,50
C,D,30,40,36,G,H,22,40,40
Then split file1[FNR] FS $0 into an array, using , as the field separator:
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {split(file1[FNR] FS $0, arr, FS); print arr[3]-arr[8], arr[4]-arr[9], arr[5]-arr[10]}' f1.csv f2.csv
Output:
5 -5 0
8 0 -4
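If the number of numeric columns varies, the same idea generalizes with a loop instead of hard-coded field indices (a sketch, assuming the first two fields of every row are non-numeric labels):
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {
    n = split(file1[FNR], a, FS)   # fields from the first file
    split($0, b, FS)               # fields from the second file
    out = ""
    for (i = 3; i <= n; i++)       # skip the two label columns
        out = out (i > 3 ? " " : "") (a[i] - b[i])
    print out
}' f1.csv f2.csv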
From man awk:
FNR: The input record number in the current input file.
NR: The total number of input records seen so far.
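To see the two counters side by side on these files:
$ awk '{print FILENAME, NR, FNR}' f1.csv f2.csv
f1.csv 1 1
f1.csv 2 2
f2.csv 3 1
f2.csv 4 2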
Another way, using nl to number the lines of each file so that sort brings corresponding lines together:
$ (nl f1.csv;nl f2.csv) | sort | awk -F, ' {a1=$3;a2=$4;a3=$5; getline; print a1-$3,a2-$4,a3-$5 } '
5 -5 0
8 0 -4
$

Extracting unique columns from a file into a comma separated list with a particular order

I have a .csv file with these values
product,0 0,no way
brand,0 0 0,detergent
product,0 0 1,sugar
negative,0 0 1, sight
positive, 0 0 1, salt
and I want a file containing the unique first-column values, comma-separated and sorted, except that "negative" always goes at the end.
So I want
["brand","positive","product","negative"]
I was not able to automate this process, so what I did was:
awk -F ',' '{print $1}' file.csv | sort | uniq > file2.txt
awk '{if(NR>1) printf ", ";printf("\"%s\"",$0)} END {print ""}' file2.txt > file3.txt
I get "brand","negative","positive","product"
Then I manually move "negative" to the end and also append [ and ] to front and back to get
["brand","positive","product","negative"]
Is there a way to make it more efficient and automate the process?
Another solution, with easy-to-understand steps:
$ awk -F, '{print ($1=="negative"?1:0) "\t\"" $1 "\""}' file | # mark negatives
sort | cut -f2 | uniq | # sort, cut, uniq
paste -sd, | sed 's/^/[/;s/$/]/' # serialize, add brackets
["brand","positive","product","negative"]
Here is a single gnu awk command to make it work (the ~ prefix sorts after all letters in ASCII, pushing "negative" to the end of asort's ordering; it is stripped again before printing):
awk -F, '{
    a[$1] = ($1 == "negative" ? "~" : "") $1
}
END {
    n = asort(a)
    printf "["
    for (i = 1; i <= n; i++) {
        sub(/^~/, "", a[i])
        printf "\"%s\"%s", a[i], (i < n ? ", " : "]\n")
    }
}' file.csv
["brand", "positive", "product", "negative"]
There are lots of ways to approach this. Do you really want the result as what looks like a JSON array, with square brackets and quotation marks around the column names? If so, then jq is probably a good tool to use to generate it. Something like this will do it all as a single jq program (unique drops the duplicated product entry):
jq -csR '[split("\n")|
map(select(length>0))[]|
split(",")[0]]|
unique|
sort_by(if .=="negative" then "zzzz" else . end)' file.csv
Which outputs this:
["brand","positive","product","negative"]
If you just want the headings separated by commas in a line without the other punctuation, suitable for heading up a CSV file, you can use more traditional text-manipulation commands:
cut -d, -f1 file.csv |
sed 's/negative/zzz&/' |
sort -u |
sed 's/zzz//' |
paste -d, -s -
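Stepping through the first three stages on the sample file shows how the zzz prefix forces the ordering:
$ cut -d, -f1 file.csv | sed 's/negative/zzz&/' | sort -u
brand
positive
product
zzznegative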
Or you can slightly modify the jq command by adding the -r flag and another pipe at the end:
jq -csrR '[split("\n")|
map(select(length>0))[]|
split(",")[0]]|
unique|
sort_by(if .=="negative" then "zzzz" else . end)|
join(",")' file.csv
Either of which outputs this:
brand,positive,product,negative
Using a Perl one-liner:
$ cat unique.txt
product,0 0,no way
brand,0 0 0,detergent
product,0 0 1,sugar
negative,0 0 1, sight
positive, 0 0 1, salt
$ perl -F, -lane ' { $x=$F[0]; $x=~s/^(negative)/z$1/; $rating{$x}++ } END { $q="\x22"; $y=join("$q,$q", sort keys %rating); $y=~s/${q}z/$q/g; print "[$q$y$q]" }' unique.txt
["brand","positive","product","negative"]
$
This worked for me:
cut -d, -f1 file.csv | sort -u | sed "/^negative/d" | tr '\n' ',' | sed -e 's/^/["/' -e 's/,/","/g' -e 's/$/negative"]/'

Adjusting column padding in bash

Any idea how I can format the output as the following?
Input (the number is padded with leading spaces):
      1 GATTT
      2 ATCGT
Desired output (tab-separated, without the leading spaces):
1	GATTT
2	ATCGT
I tried the following and it did not work
cut -c7,1-6,8-
$ awk -v OFS='\t' '{print $1,$2}' input
1 GATTT
2 ATCGT
or
$ awk '{print $1 "\t" $2}' input
sed can also be used:
sed "s/[[:digit:]]* .*/ &/g" input
1 GATTT
2 ATCGT
I'm assuming the original whitespace was 6 spaces, based on your cut command. The easiest way to knock this out with simple bash commands is to use a tab as the separator in the output:
echo "      1 GATTT" | cut -d ' ' -f 7- | tr ' ' '\t'
The cut command sets the delimiter to a space character and takes from field 7 on. Then the tr (translate) command converts the remaining space to a tab.
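If the goal is fixed-width padding rather than tabs, awk's printf can set an explicit column width instead (a sketch, assuming a 6-character right-aligned first column):
$ awk '{printf "%6s %s\n", $1, $2}' input
     1 GATTT
     2 ATCGT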

awk print something if column is empty

I am trying out a script in which a file [ file.txt ] has many pipe-separated columns, like
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha| |325
xyz| |abc|123
I would like to get a column in a bash script using awk; if the column is empty it should print blank, else print the column value.
I have tried the possibilities below, but they are not working:
cat file.txt | awk -F "|" {'print $2'} | sed -e 's/^$/blank/'   # using awk and sed
cat file.txt | awk -F "|" '!$2 {print "blank"} '
cat file.txt | awk -F "|" '{if ($2 =="" ) print "blank" } '
Please let me know how we can do that using awk or any other shell tools.
Thanks
I think what you're looking for is
awk -F '|' '{print match($2, /[^ ]/) ? $2 : "blank"}' file.txt
match(str, regex) returns the position in str of the first match of regex, or 0 if there is no match. So in this case, it will return a non-zero value if there is some non-blank character in field 2. Note that in awk, the index of the first character in a string is 1, not 0.
Here, I'm assuming that you're interested only in a single column.
If you wanted to be able to specify the replacement string from a bash variable, the best solution would be to pass the bash variable into the awk program using the -v switch:
awk -F '|' -v blank="$replacement" \
'{print match($2, /[^ ]/) ? $2 : blank}' file.txt
This mechanism avoids problems with escaping metacharacters.
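For example, with a hypothetical $replacement variable:
$ replacement="blank"
$ awk -F '|' -v blank="$replacement" '{print match($2, /[^ ]/) ? $2 : blank}' file.txt
pqr
xzy
cha
blank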
You can do it using this sed script:
sed -r 's/\| +\|/\|blank\|/g' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
If you don't want the |:
sed -r 's/\| +\|/\|blank\|/g; s/\|/ /g' File
abc pqr lmn 123
pqr xzy 321 azy
lee cha blank 325
xyz blank abc 123
Else with awk:
awk '{gsub(/\| +\|/,"|blank|")}1' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
You can use awk like this:
awk 'BEGIN{FS=OFS="|"} {for (i=1; i<=NF; i++) if ($i ~ /^ *$/) $i="blank"} 1' file
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123

Exclude a column when pasting two data files

I have one file "dat1.txt" which is like:
0 5.71159e-01
1 1.92632e-01
2 -4.73603e-01
and another file "dat2.txt" which is:
0 5.19105e-01
1 2.29702e-01
2 -3.05675e-01
To combine these two files into one, I use
paste dat1.txt dat2.txt > data.txt
But I do not want the first column of the second file in the output file. How do I modify the Unix command?
If your files are in sorted order along column 1, you could try:
join dat[12].txt
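join merges lines that share the same value in the first field, so for these files:
$ join dat1.txt dat2.txt
0 5.71159e-01 5.19105e-01
1 1.92632e-01 2.29702e-01
2 -4.73603e-01 -3.05675e-01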
You could try this in awk itself,
$ awk 'FNR==NR {a[FNR]=$0;next} {print a[FNR],$2}' data1.txt data2.txt
0 5.71159e-01 5.19105e-01
1 1.92632e-01 2.29702e-01
2 -4.73603e-01 -3.05675e-01
Use cut to remove the first column and then pipe to paste.
cut -d' ' -f 1 --complement dat2.txt | paste dat1.txt - > data.txt
Note that the - in the paste command means to read from stdin in place of the second file.
If your cut lacks --complement (it is a GNU extension, missing on OS X), awk can do the same thing:
awk '{for (i=2; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' dat2.txt | paste dat1.txt - > data.txt
paste dat1.txt <(cut -d" " -f2- dat2.txt)
Using cut to remove column 1, and process substitution to feed its output to paste.
Output:
0 5.71159e-01 5.19105e-01
1 1.92632e-01 2.29702e-01
2 -4.73603e-01 -3.05675e-01
