Computing differences between columns of tab delimited file - bash

I have a tab-delimited file with 4 columns and n rows.
I want to compute the difference between the values in columns 3 and 2 and store the results in another file.
This is what I am doing:
cat filename | awk '{print $3 - $2}'>difference
and it is not working. How can I improve the code?
Solution:
I was missing the closing single quote, and after staring at the screen for so long I couldn't spot what was wrong in 35 lines of code. Out of frustration I posted the question on the forum, and, to complete the comedy of errors, the syntax I wrote here in the question is correct (it contains both single quotes).
Thank you all for your help.

Set the field separator if you have other whitespace in the lines.
BEGIN {
FS="\t"
}

Try using -F to force the delimiter as tab and enclose your
cat filename | awk -F"\t" '{print $3 - $2}' > difference

Does anyone test before they give their answers? awk splits on whitespace, not just spaces.
I just did this:
awk '{print $3 - $2}' temp.txt
And it works perfectly.
Here's my file:
1 2 7 4
11 12 13 14
1 12 3 4
1 2 3 4
1 2 3 4
And here's my results:
$ awk '{print $3 - $2}' temp.txt
5
1
-9
1
1
$
In fact, I used your command, and got the same results?
Can you explain what's not working for you? What data are you using, and what results are you getting?

Try this:
cat filename | awk -F '\t' '{print $3 - $2}' > difference
where \t stands for the tab delimiter (to type a literal tab between the quotes instead, press Ctrl+V and then Tab)
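To see when the tab separator actually matters, here is a minimal sketch (the function name and the sample rows are made up for illustration). With the default separator awk splits on any whitespace, so a field that contains a space shifts the column numbers; with -F'\t' only tabs separate fields.

```shell
# diff_cols: print column 3 minus column 2, treating only tabs as separators.
# Reads the files given as arguments, or standard input if there are none.
diff_cols() {
    awk -F'\t' '{ print $3 - $2 }' "$@"
}

# Hypothetical sample: the first field of the first row contains a space.
printf 'a b\t2\t7\t4\n11\t12\t13\t14\n' | diff_cols

# Same input with default whitespace splitting, for comparison.
printf 'a b\t2\t7\t4\n11\t12\t13\t14\n' | awk '{ print $3 - $2 }'
```

On the first row the tab-aware version prints 5, while default splitting treats "b" as column 2 and prints 2. If no field ever contains a space, both behave the same.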

Related

How to extract rows present only once by column via commandline

I have a space separated file as shown below:
D2ABMACXX:5:1101:10000:93632_1:N:0 c111 12462 6
D2ABMACXX:5:1101:10004:54586_1:N:0 c6753 3473 1
D2ABMACXX:5:1101:10004:54586_2:N:0 c7000 5726 1
D2ABMACXX:5:1101:10006:56411_1:N:0 c4282 877 42
D2ABMACXX:5:1101:10006:56411_2:N:0 c5703 240 6
D2ABMACXX:5:1101:10013:29259_2:N:0 c6008 384 11
I would need to extract rows that are present only once based on the text before "_" in column 1. The sample output should look like below:
##required output format###
D2ABMACXX:5:1101:10000:93632_1:N:0 c111 12462 6
D2ABMACXX:5:1101:10013:29259_2:N:0 c6008 384 11
I have a complicated way of doing this, but it loses the original information:
cat file.txt | awk '{print $2,$3,$4,$1}' | sed 's/_1//g; s/_2//g' | uniq -f 3 -u
Could anyone suggest an optimal way of doing this on a huge text file (~10 GB), with the output in the same format as the input, as shown in the required output format?
You can try doing it all with awk, for example:
awk -F'_' '{ uniqs[$1] = $0; count[$1]++ } END { for (uniq in uniqs) if ( count[uniq] == 1 ) print uniqs[uniq] }' file.txt
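The array-based command keeps every row in memory and prints the survivors in arbitrary order, which may be a problem on a ~10 GB file. If rows sharing a key are always adjacent, as they are in the sample (an assumption, the question does not say so), a streaming sketch like the one below only has to remember the previous row. The function name is made up.

```shell
# uniq_by_prefix: print rows whose key (column 1 up to the first "_")
# occurs exactly once. ASSUMPTION: rows with the same key are adjacent.
uniq_by_prefix() {
    awk '{
        key = $1
        sub(/_.*/, "", key)            # keep only the text before the first "_"
        if (NR > 1 && key != prev) {   # key changed: finish the previous group
            if (n == 1) print line
            n = 0
        }
        prev = key; line = $0; n++
    }
    END { if (n == 1) print line }' "$@"
}

# Usage: uniq_by_prefix file.txt > unique_rows.txt
```

Rows come out in input order and unchanged, so the output format matches the input. If the file is not grouped by key, this sketch gives wrong results and the array-based version is the safer choice.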

Remove comma using awk command with multiple record

Let's say I have records like this.
Input
1,1,1,1.213,1,1,1.23
2,2,2,2.345,2,2,2.33
3,3,3,3.456,3,3,3.44
I want them to look like this
Output
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
How can I remove the decimal part (the "comma") only in the 4th column? I don't want to remove it in the last column.
You can use:
awk -F"," '{print $1,$2,$3,int($4),$5,$6,$7}'
The int() is what you are looking for I guess.
Example:
$ cat test
1,1,1,1.213,1,1,1.23
2,2,2,2.345,2,2,2.33
3,3,3,3.456,3,3,3.44
$ awk -F"," '{print $1,$2,$3,int($4),$5,$6,$7}' test
1 1 1 1 1 1 1.23
2 2 2 2 2 2 2.33
3 3 3 3 3 3 3.44
Edit (Good suggestion from ccf):
You could use this instead of the long version of awk command above.
$ awk -F',' '{$4=int($4); print}'
1,1,1,1.213,1,1,1.23
1 1 1 1 1 1 1.23
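The outputs above are space-separated because awk joins fields with OFS, which defaults to a space, whenever it prints comma-separated arguments or rebuilds a record after a field is modified. A minimal sketch that keeps the commas by setting OFS as well as FS (the function name is hypothetical):

```shell
# truncate_col4: drop the fractional part of column 4 in comma-separated input
# and write the record back out with commas.
truncate_col4() {
    awk 'BEGIN { FS = OFS = "," } { $4 = int($4); print }' "$@"
}

printf '1,1,1,1.213,1,1,1.23\n2,2,2,2.345,2,2,2.33\n' | truncate_col4
```

Note that int() truncates toward zero rather than rounding, so 2.999 would become 2.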
If temp.txt has the input, then
$ cat temp.txt | sed 's/\.[0-9]\+//1'
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
The 1 at the end means: replace only the first match.

Linux bash grouping

I have this file:
count,name
1,B1
1,B1
1,B3
1,B3
1,B2
1,B2
1,B2
and I routinely have to get the total count per group. The first number is always one; the only important thing is the group. I wrote a Java program to do it for me. The output would be
B1: 2
B2: 3
B3: 2
The format is not important, just the counters per group name.
I was wondering, can this be done in bash? awk? sed?
Well, it is very simple to solve with sort and uniq:
$ sort file | uniq -c
2 1,B1
3 1,B2
2 1,B3
Then, if you need the proper formatting, you may use cut to strip the first column, and awk to print the result:
$ cut -d ',' -f 2 file | sort | uniq -c | awk '{printf "%s: %d\n", $2, $1}'
B1: 2
B2: 3
B3: 2
With awk, I would write
awk -F, 'NR>1 {n[$2]++} END {OFS=":";for (x in n) print x, n[x]}' file
assuming you actually have a header line in the file.
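Since for (x in n) visits the groups in no particular order, here is a small sketch that skips the header, prints the "name: count" format from the question, and pipes through sort for a stable order. The function name is made up, and it assumes the group name is always the second comma-separated field.

```shell
# count_groups: count rows per group name (column 2), ignoring the header line.
count_groups() {
    awk -F, 'NR > 1 { n[$2]++ }
             END { for (g in n) printf "%s: %d\n", g, n[g] }' "$@" | sort
}

printf 'count,name\n1,B1\n1,B1\n1,B3\n1,B3\n1,B2\n1,B2\n1,B2\n' | count_groups
```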

How to get the second column from command output?

My command's output is something like:
1540 "A B"
6 "C"
119 "D"
The first column is always a number, followed by a space, then a double-quoted string.
My purpose is to get the second column only, like:
"A B"
"C"
"D"
I intended to use <some_command> | awk '{print $2}' to accomplish this. But the question is, some values in the second column contain space(s), which happens to be the default delimiter for awk to separate the fields. Therefore, the output is messed up:
"A
"C"
"D"
How do I get the second column's value (with paired quotes) cleanly?
Use -F [field separator] to split the lines on "s:
awk -F '"' '{print $2}' your_input_file
or for input from pipe
<some_command> | awk -F '"' '{print $2}'
output:
A B
C
D
If you could use something other than 'awk' , then try this instead
echo '1540 "A B"' | cut -d' ' -f2-
-d is a delimiter, -f is the field to cut and with -f2- we intend to cut the 2nd field until end.
This should work to get a specific column out of the command output "docker images":
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 16.04 12543ced0f6f 10 months ago 122 MB
ubuntu latest 12543ced0f6f 10 months ago 122 MB
selenium/standalone-firefox-debug 2.53.0 9f3bab6e046f 12 months ago 613 MB
selenium/node-firefox-debug 2.53.0 d82f2ab74db7 12 months ago 613 MB
docker images | awk '{print $3}'
IMAGE
12543ced0f6f
12543ced0f6f
9f3bab6e046f
d82f2ab74db7
This is going to print the third column
Or use sed & regex.
<some_command> | sed 's/^.* \(".*"$\)/\1/'
You don't need awk for that. Using read in Bash shell should be enough, e.g.
some_command | while read -r c1 c2; do echo "$c2"; done
or:
while read -r c1 c2; do echo "$c2"; done < in.txt
If you have GNU awk this is the solution you want:
$ awk '{print $1}' FPAT='"[^"]+"' file
"A B"
"C"
"D"
awk -F"|" '{gsub(/\"/,"|");print "\""$2"\""}' your_file
#!/usr/bin/python
import sys
col = int(sys.argv[1]) - 1
for line in sys.stdin:
    columns = line.split()
    try:
        print(columns[col])
    except IndexError:
        # ignore
        pass
Then, supposing you name the script co, you can do something like this to get the sizes of files (the example assumes you're using Linux, but the script itself is OS-independent):
ls -lh | co 5
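For the input format in the question (a number, a space, then a double-quoted string), another option is to delete the first column instead of trying to select the second. This is only a sketch under that assumption, and the function name is hypothetical:

```shell
# second_column: strip the leading non-space field and the spaces after it,
# leaving the rest of the line (quotes and inner spaces included) untouched.
second_column() {
    sed 's/^[^ ]* *//' "$@"
}

printf '1540 "A B"\n6 "C"\n119 "D"\n' | second_column
```

Because nothing is split on whitespace, values such as "A B" survive intact with their quotes.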

awk line break with printf

I have a simple shell script, shown below, and I want to put a line break after each line returned by it.
#!/bin/bash
vcount=`db2 connect to db_lexus > /dev/null; db2 list tablespaces | grep -i "Tablespace ID" | wc -l`
db2pd -d db_lexus -tablespaces | grep -i "Tablespace Statistics" -A $vcount | awk '{printf ($2 $7)}'
The output is:
Statistics:IdFreePgs0537610230083224460850d
and I want the output to be something like that:
Statistics:
Id FreePgs
0 5376
1 0
2 3008
3 224
4 608
5 0
Is that possible to do with shell scripting?
Your problem can be reduced to the following:
$ cat infile
11 12
21 22
$ awk '{ printf ($1 $2) }' infile
11122122
printf is for formatted printing. I'm not even sure whether the behaviour of the above usage is defined, but it's not how it's meant to be done. Consider:
$ awk '{ printf ("%d %d\n", $1, $2) }' infile
11 12
21 22
"%d %d\n" is an expression that describes how to format the output: "a decimal integer, a space, a decimal integer and a newline", followed by the numbers that go where the %d are. printf is very flexible, see the manual for what it can do.
In this case, we don't really need the power of printf, we can just use print:
$ awk '{ print $1, $2 }' infile
11 12
21 22
This prints the first and second field, separated by a space [1], and print does add a newline without us telling it to.
[1] More precisely, "separated by the value of the output field separator OFS", which defaults to a space and is printed wherever we use , between two arguments. Forgetting the comma is a popular mistake that leads to no space between the record fields.
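A short sketch of the OFS behaviour described here (the function name is made up; the separator is passed in as the function's first argument):

```shell
# join_fields: print the first two fields of each line, joined by the
# separator passed as the function's first argument (assigned to OFS).
join_fields() {
    awk -v OFS="$1" '{ print $1, $2 }'
}

printf '11 12\n21 22\n' | join_fields ' '    # same as the default OFS
printf '11 12\n21 22\n' | join_fields ':'    # the comma now prints a colon

# Without the comma the two fields are simply concatenated.
printf '11 12\n21 22\n' | awk '{ print $1 $2 }'
```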
It looks like you just want to print columns 2 and 7 of whatever is passed to AWK. Try changing your AWK command to
awk '{print $2, $7}'
This will also add a line break at the end.
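If aligned columns are wanted, printf with an explicit format and a trailing \n is one way to get them. The sketch below uses invented seven-column input lines, not real db2pd output, and the function name and column width are arbitrary choices:

```shell
# cols_2_and_7: print fields 2 and 7, with field 2 left-aligned in 4 characters.
cols_2_and_7() {
    awk '{ printf "%-4s %s\n", $2, $7 }' "$@"
}

# Invented sample lines with seven whitespace-separated fields each.
printf 'a 0 c d e f 5376\nb 1 c d e f 0\n' | cols_2_and_7
```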
I realize you are asking about how to do something in a shell script, but it would certainly be a LOT easier to get this from the database using SQL:
#!/bin/bash
export DB2DBDFT=db_lexus
db2 "select tbsp_id, tbsp_free_pages \
from table(mon_get_tablespace('',-2)) as T \
order by tbsp_id"
