Is there a way to use the cut command in BASH to print specific columns but with characters? - bash

I know I can use -f1 to print a column, but is there a way for cut to look through the columns for a specific string (a header name) and print out that column?

Not entirely clear if this is what you're looking for, but:
$ cat input
Length,Color,Height,Weight,Size
1,2,1,4,5
7,7,1,7,7
$ awk 'NR==1{for(i=1;i<=NF+1;i++) if($i==h) break; next} {print $i}' h=Color FS=, input
2
7
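The same lookup can also be stashed in its own variable instead of relying on i carrying over between lines; a small variation on the same idea (the header name Height is just an example here):
$ awk 'NR==1{for(i=1;i<=NF;i++) if($i==h){c=i}; next} {print $c}' h=Height FS=, input
1
1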

You can figure out the column number with a small function like this:
function select_column() {
    file="$1"
    sep="$2"
    col_name="$3"
    # get the separators that appear before the field:
    separators=$(head -n 1 "${file}" | sed -e"s/\(.*${sep}\|^\)${col_name}\(${sep}.*\|$\)/\1/g" | tr -d -c "${sep}")
    # add one, because before the n-th field there are n-1 separators
    ((field_no=${#separators}+1))
    # now just call cut and skip the header row by using tail -n +2
    cut -d "${sep}" -f ${field_no} "${file}" | tail -n +2
}
When called with:
select_column testfile.csv "," subno
it outputs:
10
76
55
83
30
53
67
25
52
16
57
86
2
75
28
on the following testfile.csv:
rand2,no,subno,rand1
john,8017610,10,96
ringo,5673276,76,42
ringo,9260555,55,19
john,7565683,83,72
ringo,8833230,30,35
paul,1571553,53,55
john,9972467,67,80
ringo,922025,25,88
paul,9908052,52,1
john,6264216,16,19
paul,4350857,57,3
paul,7253386,86,50
john,3426002,2,57
ringo,1437775,75,85
paul,4384228,28,77

Related

Using BASH, selecting row and column [CUT command?]

1 A 18 -180
2 B 19 -180
3 C 20 -150
50 D 21 -100
128 E 22 -130
10 F 23 -0
10 G 23 -0
In the above file, I can easily print out a column using the cut command.
cat /file_directory | cut -d' ' -f3
In that case, the output is the third column.
But what I want to do is something different. I want to pick an element depending on the value in another column.
So if I pick B in the second column, the printout would be [the row whose second column is "B"][column 3] = [2][3], which is only 19, nothing else. How do I do that?
Use awk:
$ awk '$2 == "B" {print $3}' file.txt
19
awk splits each row into fields (by default using runs of whitespace as the field delimiter). Each statement has two parts: a pattern to select a line, and an action to take on a selected line. In the above, we check if the 2nd column ($2) has the value "B"; for each line for which that is true, we print the value in the 3rd column.
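If the row label isn't fixed, the same pattern/action idea works with the label passed in via -v (the variable name key is just a placeholder):
$ awk -v key="B" '$2 == key {print $3}' file.txt
19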
#!/bin/sh
cat sofile+.txt | tr -s ' ' > sofile1+.txt
mv sofile1+.txt sofile+.txt
cat > edcommands+.txt << EOF
/B/
EOF
line=$(ed -s sofile+.txt < edcommands+.txt)
echo ${line} | cut -d' ' -f2,3
rm ./edcommands+.txt
sofile+.txt is what contains your data.
You might also need to install ed for this, since it isn't in most distributions by default any more, sadly.
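If ed isn't an option, a rough sketch of the same lookup with grep and cut instead (assuming the same sofile+.txt, with separators squeezed to single spaces):
grep -w 'B' sofile+.txt | tr -s ' ' | cut -d' ' -f2,3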

Update values in column in a file based on values from an array using bash script

I have a text file with the following details.
#test.txt
team_id team_level team_state
23 2
21 4
45 5
I have an array in my code, teamsstatearr=(12 34 45 ...), and I want to be able to add the values from the array as the third column. The array could have many elements, and the test.txt shown above is just a small portion of the file.
Details of the file contents:
The text file has only three headers. The headers are separated by tabs. The number of rows in the file is the same as the number of items in the array.
Thus my test.txt would look like the following.
team_id team_level team_state
23 2 12
21 4 34
45 5 45
(many more rows are present)
What I have done so far is below, but I don't see the file being updated with the values in the third column.
# Write the issue ids to file
for item in "${teamstatearr[#]}"
do
printf '%s\n' "item id in loop: ${item}"
awk -F, '{$2=($item)}1' OFS='\t', test.txt
done
I would appreciate it if anyone could help me find the easiest and most efficient way to do it.
If you don't mind a slightly different table layout, you could do:
teamsstatearr=(12 34 45)
{
    # print header
    head -n1 test.txt
    # combine the remaining lines of test.txt and the array values
    paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
    # use `column -t` to format the output as a table
} | column -t
Output:
team_id team_level team_state
23 2 12
21 4 34
45 5 45
To write the output to the same file, you can redirect the output to a new file and overwrite the original file with mv:
teamsstatearr=(12 34 45)
{
    head -n1 test.txt
    paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
} | column -t > temp && mv temp test.txt
If you have sponge from the moreutils package installed, you could do this without a temporary file:
teamsstatearr=(12 34 45)
{
    head -n1 test.txt
    paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
} | column -t | sponge test.txt
Or using awk and column (with the same output):
teamsstatearr=(12 34 45)
awk -v str="${teamsstatearr[*]}" '
BEGIN{split(str, a)} # split `str` into array `a`
NR==1{print; next} # print header
{print $0, a[++cnt]} # print current line and next array element
' test.txt | column -t

how to find maximum and minimum values of a particular column using AWK [duplicate]

I'm using awk to deal with a simple .dat file, which contains several lines of data and each line has 4 columns separated by a single space.
I want to find the minimum and maximum of the first column.
The data file looks like this:
9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496
The commands I used are as follows.
min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`
However, the output is min=10 and max=9.
(Similar commands return the correct minimum and maximum of the second column.)
Could someone tell me where I went wrong? Thank you!
Awk guesses the type.
String "10" is less than string "4" because character "1" comes before "4".
Force a type conversion, using addition of zero:
min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
a non-awk answer:
cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
> >(echo "max=$(tail -1)")
That tee command is perhaps a bit too clever. tee duplicates its stdin stream to the files named as its arguments, and also streams the same data to stdout. I'm using process substitutions to filter the streams.
The same effect can be used (with less flourish) to extract the first and last lines of a stream of data:
cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'
or
cut -d" " -f1 file | sort -n | {
read line
echo "min=$line"
while read line; do max=$line; done
echo "max=$max"
}
Your problem was simply that in your script you had:
if ($1<a) a=$1 fi
and that final fi is not part of awk syntax, so it is treated as a variable; a=$1 fi is then string concatenation, so you are TELLING awk that a contains a string, not a number, and hence you get a string comparison instead of a numeric one in $1<a.
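You can see the difference with two constants; a quick illustration, unrelated to the data file:
$ awk 'BEGIN{ if ("10" < "4") print "string comparison: \"10\" < \"4\" is true" }'
string comparison: "10" < "4" is true
$ awk 'BEGIN{ if (10 < 4) print "this never prints" }'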
More importantly, in general never start with some guessed value for max/min; just use the first value read as the seed. Here's the correct way to write the script:
$ cat tst.awk
BEGIN { min = max = "NaN" }
{
    min = (NR==1 || $1<min ? $1 : min)
    max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }
$ awk -f tst.awk file
4 12
$ awk -f tst.awk /dev/null
NaN NaN
$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12
If you don't like NaN pick whatever you'd prefer to print when the input file is empty.
Late, but a shorter command, with more precision and no initial assumption:
awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d ,Max is %d",Min,Max}' FileName.dat
A very straightforward solution (if it's not compulsory to use awk):
Find Min --> sort -n -r numbers.txt | tail -n1
Find Max --> sort -n -r numbers.txt | head -n1
You can use a combination of sort, head, tail to get the desired output as shown above.
(PS: if you want to extract the first column, or any desired column, you can use the cut command, e.g. to extract the first column: cut -d " " -f 1 sample.dat)
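Putting cut and sort together for the first column of the posted data (assuming the file is named mydata.dat):
cut -d' ' -f1 mydata.dat | sort -n | head -n1    # min -> 4
cut -d' ' -f1 mydata.dat | sort -n | tail -n1    # max -> 12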
#minimum
cat your_data_file.dat | sort -nk3,3 | head -1
#this will find the minimum of column 3
#maximum
cat your_data_file.dat | sort -nk3,3 | tail -1
#this will find the maximum of column 3
#to find in column 2, use -nk2,2
#assign to a variable and use
min_col=`cat your_data_file.dat | sort -nk3,3 | head -1 | awk '{print $3}'`

how to append some text before each matched lines

I need to append two words before each matched line -
I have the following text file -
demo.txt -
Good
70 80 75 77 82
Best
Fail
34 32 30 24 29
What I am looking for is: if it finds Good, it should prepend two words to its record line, like below -
Good
(sysdate) Good 70 80 75 77 82
and if no record is found, it should not do anything; for example, Best has no record, so there is no need to prepend (sysdate) Best to its line.
But the trick here is that it should check two conditions: first, the status from the file (Good, Best or Fail), and second, if the associated record is blank, then nothing should be prepended.
Below is a short version of the shell script -
#!/bin/bash
TIME=`date +"%Y-%m-%d %H:%M:%S"`
log="demo.txt"
for line in $log
do
if $line eq 'Good'; then
sed "/$line/!p s/[[:space:]]\+$//g" $log | sed "s/^/$TIME,Good /g" $log | sed 's/ /,/g' $log > demo.csv
elif $line eq 'Best'; then
sed "/$line/!p s/[[:space:]]\+$//g" $log | sed "s/^/$TIME,Best /g" $log | sed 's/ /,/g' $log > demo.csv
else
sed "/$line/!p s/[[:space:]]\+$//g" $log | sed "s/^/$TIME,Fail /g" $log | sed 's/ /,/g' $log > demo.csv
fi
done
Note: I am looking for the below output in a csv file -
demo.csv -
Good
(sysdate),Good,70,80,75,77,82
Best
Fail
(sysdate),Fail,34,32,30,24,29
Input
$ cat demo.txt
Good
70 80 75 77 82
Best
Fail
34 32 30 24 29
Output
$ awk -v OFS="," 'NF==1{ print; s=$0; next}{$1=$1; print "(sysdate)",s,$0}' demo.txt
Good
(sysdate),Good,70,80,75,77,82
Best
Fail
(sysdate),Fail,34,32,30,24,29
With datetime
$ awk -v t="$(date +'%Y-%m-%d %H:%M:%S')" -v OFS="," 'NF==1{ print; s=$0; next}{$1=$1; print t,s,$0}' demo.txt
Good
2017-03-15 17:12:16,Good,70,80,75,77,82
Best
Fail
2017-03-15 17:12:16,Fail,34,32,30,24,29
With gawk
$ awk -v OFS="," 'BEGIN{t=strftime("%Y-%m-%d %H:%M:%S",systime())}NF==1{ print; s=$0; next}{$1=$1; print t,s,$0}' demo.txt
Good
2017-03-15 17:18:50,Good,70,80,75,77,82
Best
Fail
2017-03-15 17:18:50,Fail,34,32,30,24,29
Explanation
awk -v OFS="," ' # call awk set o/p field separator as comma
BEGIN{ # BEGIN block: here we save the system datetime in variable t (strftime/systime are gawk-specific)
t=strftime("%Y-%m-%d %H:%M:%S",systime())
}
NF==1{ # if no of fields/columns is equal to 1 then
print; # print current record/line/row
s=$0; # save current line in variable s
next # stop processing go to next line
}
{
$1=$1; # record recompilation
# since you need comma as separator between fields in o/p,
# you can also do $2=$2
# assigning any value to any field ($1, etc.)
# causes record recompilation
print t,s,$0 # print variable t, s and current line
}' demo.txt
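As a quick illustration of the record recompilation mentioned above (separate from the task): assigning any field back to itself makes awk rebuild $0 using OFS:
$ echo "a b c" | awk -v OFS="," '{print $0; $1=$1; print $0}'
a b c
a,b,c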
