How to get the second column from command output? - shell

My command's output is something like:
1540 "A B"
6 "C"
119 "D"
The first column is always a number, followed by a space, then a double-quoted string.
My purpose is to get the second column only, like:
"A B"
"C"
"D"
I intended to use <some_command> | awk '{print $2}' to accomplish this. But the question is, some values in the second column contain space(s), which happens to be the default delimiter for awk to separate the fields. Therefore, the output is messed up:
"A
"C"
"D"
How do I get the second column's value (with paired quotes) cleanly?

Use -F [field separator] to split the lines on "s:
awk -F '"' '{print $2}' your_input_file
or for input from pipe
<some_command> | awk -F '"' '{print $2}'
output:
A B
C
D

If you could use something other than 'awk' , then try this instead
echo '1540 "A B"' | cut -d' ' -f2-
-d is a delimiter, -f is the field to cut and with -f2- we intend to cut the 2nd field until end.

This should work to get a specific column out of the command output "docker images":
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 16.04 12543ced0f6f 10 months ago 122 MB
ubuntu latest 12543ced0f6f 10 months ago 122 MB
selenium/standalone-firefox-debug 2.53.0 9f3bab6e046f 12 months ago 613 MB
selenium/node-firefox-debug 2.53.0 d82f2ab74db7 12 months ago 613 MB
docker images | awk '{print $3}'
IMAGE
12543ced0f6f
12543ced0f6f
9f3bab6e046f
d82f2ab74db7
This is going to print the third column

Or use sed & regex.
<some_command> | sed 's/^.* \(".*"$\)/\1/'

You don't need awk for that. Using read in Bash shell should be enough, e.g.
some_command | while read c1 c2; do echo $c2; done
or:
while read c1 c2; do echo $c2; done < in.txt

If you have GNU awk this is the solution you want:
$ awk '{print $1}' FPAT='"[^"]+"' file
"A B"
"C"
"D"

awk -F"|" '{gsub(/\"/,"|");print "\""$2"\""}' your_file

#!/usr/bin/python
import sys
col = int(sys.argv[1]) - 1
for line in sys.stdin:
columns = line.split()
try:
print(columns[col])
except IndexError:
# ignore
pass
Then, supposing you name the script as co, say, do something like this to get the sizes of files (the example assumes you're using Linux, but the script itself is OS-independent) :-
ls -lh | co 5

Related

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
Explanation: Making field separator -F as | and then printing the 3rd column by doing $3 only as per your need. Then setting OFS(output field separator) to |. Finally mentioning Input_file name here.
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
Awk may not be the best tool for this. Gnu grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e)
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
if you remove the uniq -c you can see how it's working. You'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ascii characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.

how to find maximum and minimum values of a particular column using AWK [duplicate]

I'm using awk to deal with a simple .dat file, which contains several lines of data and each line has 4 columns separated by a single space.
I want to find the minimum and maximum of the first column.
The data file looks like this:
9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496
The commands I used are as follows.
min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`
However, the output is min=10 and max=9.
(The similar commands can return me the right minimum and maximum of the second column.)
Could someone tell me where I was wrong? Thank you!
Awk guesses the type.
String "10" is less than string "4" because character "1" comes before "4".
Force a type conversion, using addition of zero:
min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
a non-awk answer:
cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
> >(echo "max=$(tail -1)")
That tee command is perhaps a bit much too clever. tee duplicates its stdin stream to the files names as arguments, plus it streams the same data to stdout. I'm using process substitutions to filter the streams.
The same effect can be used (with less flourish) to extract the first and last lines of a stream of data:
cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'
or
cut -d" " -f1 file | sort -n | {
read line
echo "min=$line"
while read line; do max=$line; done
echo "max=$max"
}
Your problem was simply that in your script you had:
if ($1<a) a=$1 fi
and that final fi is not part of awk syntax so it is treated as a variable so a=$1 fi is string concatenation and so you are TELLING awk that a contains a string, not a number and hence the string comparison instead of numeric in the $1<a.
More importantly in general, never start with some guessed value for max/min, just use the first value read as the seed. Here's the correct way to write the script:
$ cat tst.awk
BEGIN { min = max = "NaN" }
{
min = (NR==1 || $1<min ? $1 : min)
max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }
$ awk -f tst.awk file
4 12
$ awk -f tst.awk /dev/null
NaN NaN
$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12
If you don't like NaN pick whatever you'd prefer to print when the input file is empty.
late but a shorter command and with more precision without initial assumption:
awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d ,Max is %d",Min,Max}' FileName.dat
A very straightforward solution (if it's not compulsory to use awk):
Find Min --> sort -n -r numbers.txt | tail -n1
Find Max --> sort -n -r numbers.txt | head -n1
You can use a combination of sort, head, tail to get the desired output as shown above.
(PS: In case if you want to extract the first column/any desired column you can use the cut command i.e. to extract the first column cut -d " " -f 1 sample.dat)
#minimum
cat your_data_file.dat | sort -nk3,3 | head -1
#this fill find minumum of column 3
#maximun
cat your_data_file.dat | sort -nk3,3 | tail -1
#this will find maximum of column 3
#to find in column 2 , use -nk2,2
#assing to a variable and use
min_col=`cat your_data_file.dat | sort -nk3,3 | head -1 | awk '{print $3}'`

awk: find minimum and maximum in column

I'm using awk to deal with a simple .dat file, which contains several lines of data and each line has 4 columns separated by a single space.
I want to find the minimum and maximum of the first column.
The data file looks like this:
9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496
The commands I used are as follows.
min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`
However, the output is min=10 and max=9.
(The similar commands can return me the right minimum and maximum of the second column.)
Could someone tell me where I was wrong? Thank you!
Awk guesses the type.
String "10" is less than string "4" because character "1" comes before "4".
Force a type conversion, using addition of zero:
min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
a non-awk answer:
cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
> >(echo "max=$(tail -1)")
That tee command is perhaps a bit much too clever. tee duplicates its stdin stream to the files names as arguments, plus it streams the same data to stdout. I'm using process substitutions to filter the streams.
The same effect can be used (with less flourish) to extract the first and last lines of a stream of data:
cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'
or
cut -d" " -f1 file | sort -n | {
read line
echo "min=$line"
while read line; do max=$line; done
echo "max=$max"
}
Your problem was simply that in your script you had:
if ($1<a) a=$1 fi
and that final fi is not part of awk syntax so it is treated as a variable so a=$1 fi is string concatenation and so you are TELLING awk that a contains a string, not a number and hence the string comparison instead of numeric in the $1<a.
More importantly in general, never start with some guessed value for max/min, just use the first value read as the seed. Here's the correct way to write the script:
$ cat tst.awk
BEGIN { min = max = "NaN" }
{
min = (NR==1 || $1<min ? $1 : min)
max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }
$ awk -f tst.awk file
4 12
$ awk -f tst.awk /dev/null
NaN NaN
$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12
If you don't like NaN pick whatever you'd prefer to print when the input file is empty.
late but a shorter command and with more precision without initial assumption:
awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d ,Max is %d",Min,Max}' FileName.dat
A very straightforward solution (if it's not compulsory to use awk):
Find Min --> sort -n -r numbers.txt | tail -n1
Find Max --> sort -n -r numbers.txt | head -n1
You can use a combination of sort, head, tail to get the desired output as shown above.
(PS: In case if you want to extract the first column/any desired column you can use the cut command i.e. to extract the first column cut -d " " -f 1 sample.dat)
#minimum
cat your_data_file.dat | sort -nk3,3 | head -1
#this fill find minumum of column 3
#maximun
cat your_data_file.dat | sort -nk3,3 | tail -1
#this will find maximum of column 3
#to find in column 2 , use -nk2,2
#assing to a variable and use
min_col=`cat your_data_file.dat | sort -nk3,3 | head -1 | awk '{print $3}'`

How to print all the columns after a particular number using awk?

On shell, I pipe to awk when I need a particular column.
This prints column 9, for example:
... | awk '{print $9}'
How can I tell awk to print all the columns including and after column 9, not just column 9?
awk '{ s = ""; for (i = 9; i <= NF; i++) s = s $i " "; print s }'
When you want to do a range of fields, awk doesn't really have a straight forward way to do this. I would recommend cut instead:
cut -d' ' -f 9- ./infile
Edit
Added space field delimiter due to default being a tab. Thanks to Glenn for pointing this out
awk '{print substr($0, index($0,$9))}'
Edit:
Note, this doesn't work if any field before the ninth contains the same value as the ninth.
sed -re 's,\s+, ,g' | cut -d ' ' -f 9-
Instead of dealing with variable width whitespace, replace all whitespace as single space. Then use simple cut with the fields of interest.
It doesn't use awk so isn't germane but seemed appropriate given the other answers/comments.
Generally perl replaces awk/sed/grep et. al., and is much more portable (as well as just being a better penknife).
perl -lane 'print "#F[8..$#F]"'
Timtowtdi applies of course.
awk -v m="\x01" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
This chops what is before the given field nr., N, and prints all the rest of the line, including field nr.N and maintaining the original spacing (it does not reformat). It doesn't mater if the string of the field appears also somewhere else in the line, which is the problem with Ascherer's answer.
Define a function:
fromField () {
awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
For N=0 it returns the whole line, as is, and for n>NF the empty string
Here is an example of ls -l output:
-rwxr-----# 1 ricky.john 1493847943 5610048 Apr 16 14:09 00-Welcome.mp4
-rwxr-----# 1 ricky.john 1493847943 27862521 Apr 16 14:09 01-Hello World.mp4
-rwxr-----# 1 ricky.john 1493847943 21262056 Apr 16 14:09 02-Typical Go Directory Structure.mp4
-rwxr-----# 1 ricky.john 1493847943 10627144 Apr 16 14:09 03-Where to Get Help.mp4
My solution to print anything post $9 is awk '{print substr($0, 61, 50)}'
Using cut instead of awk and overcoming issues with figuring out which column to start at by using the -c character cut command.
Here I am saying, give me all but the first 49 characters of the output.
ls -l /some/path/*/* | cut -c 50-
The /*/*/ at the end of the ls command is saying show me what is in subdirectories too.
You can also pull out certain ranges of characters ala (from the cut man page). E.g., show the names and login times of the currently logged in users:
who | cut -c 1-16,26-38
To display the first 3 fields and print the remaining fields you can use:
awk '{s = ""; for (i=4; i<= NF; i++) s= s $i : "; print $1 $2 $3 s}' filename
where $1 $2 $3 are the first 3 fields.
function print_fields(field_num1, field_num2){
input_line = $0
j = 1;
for (i=field_num1; i <= field_num2; i++){
$(j++) = $(i);
}
NF = field_num2 - field_num1 + 1;
print $0
$0 = input_line
}
Usually it is desired to pass the remaining columns unmodified. That is, without collapsing contiguous white space.
Imagine the case of processing the output of ls -l or ps faux (not recommended, just giving examples where the last column may contain sequences of whitespace)). We'd want any contiguous white space in the remaning columns preserved so that a file named my file.txt doesn't become my file.txt.
Preserving white space for the remainder of the line is surprisingly difficult using awk. The accepted awk-based answer does not, even with the suggested refinements.
sed or perl are better suited to this task.
sed
echo '1 2 3 4 5 6 7 8 9 10' | sed -E 's/^([^ \t]*[ \t]*){8}//'
Result:
9 10
The -E option enables modern ERE regex syntax. This saves me the trouble of backslash escaping the parentheses and braces.
The {8} is a quantifier indicating to match the previous item exactly 8 times.
The sed s command replaces 8 occurrences of white space delimited words by an empty string. The remainder of the line is left intact.
perl
Perl regex supports the \h escape for horizontal whitespace.
echo '1 2 3 4 5 6 7 8 9 10' | perl -pe 's/^(\H*\h*){8}//'
Result:
9 10
ruby -lane 'print $F[3..-1].join(" ")' file

awk line break with printf

I have a simple shell script, shown below, and I want to put a line break after each line returned by it.
#!/bin/bash
vcount=`db2 connect to db_lexus > /dev/null; db2 list tablespaces | grep -i "Tablespace ID" | wc -l`
db2pd -d db_lexus -tablespaces | grep -i "Tablespace Statistics" -A $vcount | awk '{printf ($2 $7)}'
The output is:
Statistics:IdFreePgs0537610230083224460850d
and I want the output to be something like that:
Statistics:
Id FreePgs
0 5376
1 0
2 3008
3 224
4 608
5 0
Is that possible to do with shell scripting?
Your problem can be reduced to the following:
$ cat infile
11 12
21 22
$ awk '{ printf ($1 $2) }' infile
11122122
printf is for formatted printing. I'm not even sure if the behaviour of above usage is defined, but it's not how it's meant to be done. Consider:
$ awk '{ printf ("%d %d\n", $1, $2) }' infile
11 12
21 22
"%d %d\n" is an expression that describes how to format the output: "a decimal integer, a space, a decimal integer and a newline", followed by the numbers that go where the %d are. printf is very flexible, see the manual for what it can do.
In this case, we don't really need the power of printf, we can just use print:
$ awk '{ print $1, $2 }' infile
11 12
21 22
This prints the first and second field, separated by a space1 – and print does add a newline without us telling it to.
1More precisely, "separated by the value of the output field separator OFS", which defaults to a space and is printed wherever we use , between two arguments. Forgetting the comma is a popular mistake that leads to no space between the record fields.
It looks like you just want to print columns 2 and 7 of whatever is passed to AWK. Try changing your AWK command to
awk '{print $2, $7}'
This will also add a line break at the end.
I realize you are asking about how to do something in a shell script, but it would certainly be a LOT easier to get this from the database using SQL:
#!/bin/bash
export DB2DBDFT=db_lexus
db2 "select tbsp_id, tbsp_free_pages \
from table(mon_get_tablespace('',-2)) as T \
order by tbsp_id"

Resources