I wrote a program that should print the words from example.txt from the longest to the shortest. I don't know exactly what '^.{$v}$' should look like to make it work.
#!/bin/bash
v=30
while [ $v -gt 0 ] ; do
grep -P '^.{$v}$' example.txt
v=$(($v - 1))
done
I tried:
${v}
$v
"$v"
This is my first question, sorry for any mistakes :)
What you're doing is not how you'd approach this problem in shell. Read "Why is using a shell loop to process text considered bad practice?" to learn about some of the issues, and then here is how you'd really do what you're trying to do in a shell script:
$ cat file
now
is
the
winter
of
our
discontent
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n | cut -f3-
discontent
winter
now
the
our
is
of
To understand what that's doing, look at the awk output:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file
3 1 now
2 2 is
3 3 the
6 4 winter
2 5 of
3 6 our
10 7 discontent
The first number is the length of each line and the second number is the order in which the lines appeared in the input file, so when we come to sort it:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n
10 7 discontent
6 4 winter
3 1 now
3 3 the
3 6 our
2 2 is
2 5 of
We can sort by length (longest first) with -k1rn but retain the input-file order for lines of the same length by adding -k2n. The cut then just removes the 2 leading numbers that awk added for sort to use.
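A sketch of the same decorate-sort-undecorate idea with a different tie-breaker, assuming your sort supports -s (stable sort; GNU and BSD sort do, but POSIX does not require it). A stable sort keeps equal-length lines in input order by itself, so the NR column isn't needed. The function name by_length_desc is hypothetical.

```shell
#!/usr/bin/env bash
# by_length_desc FILE...  (hypothetical helper name)
# 1. awk prefixes each line with its length and a tab.
# 2. sort orders on that first field only, numerically and in reverse;
#    -s stops sort from breaking ties itself, so input order is kept.
# 3. cut drops the length prefix again.
by_length_desc() {
    awk '{ print length($0) "\t" $0 }' "$@" |
        sort -s -t "$(printf '\t')" -k1,1rn |
        cut -f2-
}
```

Running by_length_desc file on the sample file should give the same discontent, winter, now, the, our, is, of order as the pipeline above.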
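Use double quotes so that the shell expands $v inside the pattern (inside single quotes grep receives the literal text {$v}):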
grep -P "^.{$v}$" example.txt
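For completeness, a sketch of the question's loop with that quoting fix applied. It uses grep -E instead of -P (an interval such as .{10} is ordinary ERE, so the Perl engine isn't needed) and assumes no line is longer than the starting length, 30 in the question. The function name words_longest_first is hypothetical. Note that it rereads the file once per length.

```shell
#!/usr/bin/env bash
# words_longest_first FILE [MAX_LEN]  (hypothetical helper name)
# Inside double quotes the shell expands $v, so grep sees e.g. ^.{10}$;
# inside single quotes grep would see the literal text {$v}.
words_longest_first() {
    local file=$1 max_len=${2:-30} v
    for (( v = max_len; v > 0; v-- )); do
        # || true: grep exits 1 when no line has this length
        grep -E "^.{$v}\$" "$file" || true
    done
}
```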
I'm using awk to deal with a simple .dat file, which contains several lines of data and each line has 4 columns separated by a single space.
I want to find the minimum and maximum of the first column.
The data file looks like this:
9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496
The commands I used are as follows.
min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`
However, the output is min=10 and max=9.
(Similar commands do return the correct minimum and maximum of the second column.)
Could someone tell me where I was wrong? Thank you!
Awk guesses the type.
The string "10" is less than the string "4" because the character "1" sorts before "4".
Force a numeric comparison by adding zero (the stray fi is also removed here):
min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a= 0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
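A minimal demonstration of the difference, using string constants so that the types are unambiguous (compare_demo is a hypothetical name): the same two values compare one way as strings and the other way once adding zero turns them into numbers.

```shell
#!/usr/bin/env bash
# compare_demo  (hypothetical helper name)
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "4"                  # both are string constants
        # string comparison: "1" sorts before "4", so a < b is true (1)
        # numeric comparison after adding zero: 10 < 4 is false (0)
        print (a < b), (a + 0 < b + 0)
    }'
}
```

Running compare_demo prints 1 0.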
A non-awk answer:
cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
> >(echo "max=$(tail -1)")
That tee command is perhaps a bit too clever. tee copies its stdin to each file named as an argument, and also to stdout. I'm using process substitutions to filter those streams.
The same effect can be achieved (with less flourish) by extracting the first and last lines of the sorted stream:
cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'
or
cut -d" " -f1 file | sort -n | {
read -r line
echo "min=$line"
max=$line  # in case the input has only one line
while read -r line; do max=$line; done
echo "max=$max"
}
Your problem was simply that in your script you had:
if ($1<a) a=$1 fi
That final fi is not part of awk syntax, so awk treats it as an (unset) variable. a=$1 fi is therefore string concatenation, so you are TELLING awk that a contains a string, not a number, hence the string comparison instead of a numeric one in $1<a.
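A small sketch that reproduces the effect on a single input value (fi_demo is a hypothetical name). Concatenating the unset variable fi yields a plain string, so comparing it with the number 4 is done as strings, while a variable assigned straight from the field still compares numerically.

```shell
#!/usr/bin/env bash
# fi_demo  (hypothetical helper name)
fi_demo() {
    echo 10 | awk '{
        s = $1 fi   # "10" concatenated with "" (fi is unset): a string
        n = $1      # copied from the field: a numeric string
        print (s < 4), (n < 4)   # string "10" < "4" is 1; number 10 < 4 is 0
    }'
}
```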
More importantly, in general never start with a guessed value for max/min; just use the first value read as the seed. Here's the correct way to write the script:
$ cat tst.awk
BEGIN { min = max = "NaN" }
{
min = (NR==1 || $1<min ? $1 : min)
max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }
$ awk -f tst.awk file
4 12
$ awk -f tst.awk /dev/null
NaN NaN
$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12
If you don't like NaN, pick whatever you'd prefer to print when the input file is empty.
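The same seeding approach can be generalised to any column by passing the column number in with -v. This is a sketch: col_min_max and the awk variable c are hypothetical names, and it relies on field values that look numeric being compared as numbers, which is what POSIX awk specifies and what tst.awk above relies on too.

```shell
#!/usr/bin/env bash
# col_min_max COLUMN FILE  (hypothetical helper name)
col_min_max() {
    awk -v c="$1" '
        NR == 1 { min = $c; max = $c; next }   # seed from the first record
        $c < min { min = $c }
        $c > max { max = $c }
        END { if (NR > 0) print min, max; else print "NaN", "NaN" }
    ' "$2"
}
```

Because min and max keep the original field text, col_min_max 3 mydata.dat prints values such as -2.70843 exactly as they appear in the file.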
Late, but here is a shorter command that is more precise because it makes no initial assumption:
awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d, Max is %d\n",Min,Max}' FileName.dat
A very straightforward solution (if it's not compulsory to use awk):
Find Min --> sort -n -r numbers.txt | tail -n1
Find Max --> sort -n -r numbers.txt | head -n1
You can use a combination of sort, head and tail to get the desired output, as shown above.
(PS: If you want to extract the first column, or any other column, you can use the cut command; e.g. to extract the first column: cut -d " " -f 1 sample.dat)
# minimum of column 3
sort -nk3,3 your_data_file.dat | head -1
# maximum of column 3
sort -nk3,3 your_data_file.dat | tail -1
# to use column 2 instead, use -nk2,2
# to assign the value to a variable:
min_col=$(sort -nk3,3 your_data_file.dat | head -1 | awk '{print $3}')
I have a line like this:
3672975 3672978 3672979
awk '{print $1}' will return the first number 3672975
If I still want the first number, but specified as the 3rd one from the end, how should I adjust awk '{print $-3}'?
The reason is that I have hundreds of numbers, and I always want to get the 3rd one from the end.
Can I use awk to obtain the total number of items first, then do the subtraction?
$NF is the last field, $(NF-1) is the one before the last etc., so:
$ awk '{print $(NF-2)}'
for example:
$ echo 3672975 3672978 3672979 | awk '{print $(NF-2)}'
3672975
Edit:
$ echo 1 10 100 | awk '{print $(NF-2)}'
1
or with cut and rev
echo 1 2 3 4 | rev | cut -d' ' -f 3 | rev
2
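One caveat with $(NF-2): on a line with fewer than three fields, NF-2 is 0 (which refers to the whole line) or negative (a fatal error in gawk, for example). A sketch that guards against that by printing an empty line instead; third_from_end is a hypothetical name:

```shell
#!/usr/bin/env bash
# third_from_end  (hypothetical helper name): reads stdin, one result per line
third_from_end() {
    # only index backwards when there are at least three fields
    awk '{ print (NF >= 3 ? $(NF-2) : "") }'
}
```

For example, echo 3672975 3672978 3672979 | third_from_end prints 3672975.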
I am trying to write a script that does the following:
Given a string that looks like this: "There are 5 apples and 3 oranges"
Extract the two integers (5, 3)
Compare them
I got the extract part done.
NUM=echo $String | grep -o "[0-9]\+"
But NUM will be something like this:
5
3
\n
I tried ${NUM[0]} and ${NUM[#]} just to get the first value but it doesn't work.
Any suggestions?
I would do this with process substitution and mapfile:
$ mapfile -t nums < <(grep -Eo '[[:digit:]]+' <<< 'There are 5 apples and 3 oranges')
$ declare -p nums
declare -a nums='([0]="5" [1]="3")'
This makes sure that only newlines separate array elements. This wouldn't be a problem in this case as the search terms are sequences of numbers, but it's a robust approach that would work for any pattern.
Notice that mapfile requires Bash 4.0 or newer.
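To finish the task from the question, the two array elements can then be compared with bash arithmetic. A sketch, assuming Bash 4.0+ for mapfile and at least two numbers in the string; compare_first_two is a hypothetical name. The 10# prefix forces base 10, so a number with a leading zero such as 08 isn't rejected as an invalid octal constant.

```shell
#!/usr/bin/env bash
# compare_first_two STRING  (hypothetical helper name; needs Bash 4.0+)
compare_first_two() {
    local -a nums
    mapfile -t nums < <(grep -Eo '[[:digit:]]+' <<< "$1")
    if (( ${#nums[@]} < 2 )); then
        echo "need at least two numbers" >&2
        return 1
    fi
    local a=$(( 10#${nums[0]} )) b=$(( 10#${nums[1]} ))
    if (( a > b )); then echo greater
    elif (( a < b )); then echo lesser
    else echo equal
    fi
}
```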
The way you assign to NUM is incorrect.
So is the grep pattern in your post.
Write it like this:
input='There are 5 apples and 3 oranges'
nums=($(grep -Eo '[0-9]+' <<< "$input"))
${nums[0]} will contain the first number, ${nums[1]} the 2nd, and so on.
If the input comes from a command:
nums=($(cmd | grep -Eo '[0-9]+'))
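If you'd rather not start grep at all, bash's own =~ operator can capture the first two numbers into BASH_REMATCH. A sketch under the same assumption of at least two numbers in the string; compare_with_regex and its pattern are hypothetical choices, not taken from the answers here.

```shell
#!/usr/bin/env bash
# compare_with_regex STRING  (hypothetical helper name)
compare_with_regex() {
    # keeping the regex in a variable avoids quoting problems with =~
    local re='([0-9]+)[^0-9]+([0-9]+)'
    if [[ $1 =~ $re ]]; then
        # capture groups 1 and 2 hold the first two numbers; 10# forces base 10
        local a=$(( 10#${BASH_REMATCH[1]} )) b=$(( 10#${BASH_REMATCH[2]} ))
        if (( a > b )); then echo greater
        elif (( a < b )); then echo lesser
        else echo equal
        fi
    else
        echo "need at least two numbers" >&2
        return 1
    fi
}
```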
With GNU awk for FPAT:
$ echo 'There are 5 apples and 3 oranges' |
awk -v FPAT='[0-9]+' '{print ($1 > $2 ? "greater" : "lesser")}'
greater
$ echo 'There are 2 apples and 3 oranges' |
awk -v FPAT='[0-9]+' '{print ($1 > $2 ? "greater" : "lesser")}'
lesser
with GNU awk:
gawk '{if($1>$2){print $1">"$2}else if($1<$2){print $1"<"$2} else {print $1"="$2}}' FPAT='[0-9]+' <<<'There are 5 apples and 8 oranges'
The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field.
This question already has an answer here: "Find patterns of a file in another file and print out a corresponding field of the latter maintaining the order". Closed 5 years ago.
I have a file containing the following numbers:
file1.txt
1
5
6
8
14
I have another file named rmsd.txt which contains values like the following
1 2.12
2 3.1243
3 4.156
4 3.22
5 3.882
6 8.638
7 8.838
8 7.5373
9 10.7373
10 8.3527
11 3.822
12 5.672
13 7.23
14 5.9292
I want to get the values of column 2 from rmsd.txt for the numbers present in file1.txt, i.e. something like the following:
1 2.12
5 3.882
6 8.638
8 7.5373
14 5.9292
I can do it by running grep 1 rmsd.txt and so on, but that takes a long time. I tried a for loop, something like:
for a in awk '{print $1}' file.txt; do
grep $a rmsd.txt >result.txt
done
But it didn't work. Maybe it is very simple and I am thinking in a wrong direction. Any help will be highly appreciated.
for WORD in `cat FILE`
do
echo $WORD
command $WORD > $WORD
done
Original source
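EDIT: Here is your code with a few fixes:
for a in $(awk '{print $1}' file.txt)
do
    # anchor the match to the start of the line: plain "grep 1" would also
    # match 10-14 and any line containing a 1, such as "2 3.1243"
    grep "^$a " rmsd.txt >> result.txt
done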
This is a tailor-made job for awk:
awk 'NR==FNR{a[$1]; next} $1 in a' file1.txt rmsd.txt
1 2.12
5 3.882
6 8.638
8 7.5373
14 5.9292
This is most likely a duplicate; as soon as I find a good one, I will mark it as such.
awk to the rescue!
$ awk 'NR==FNR{a[$1];next} $1 in a' file1 file2
1 2.12
5 3.882
6 8.638
8 7.5373
14 5.9292
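The NR==FNR idiom used in these answers, spelled out with comments (lookup_rows is a hypothetical wrapper name). NR counts records across all input files while FNR restarts at 1 for each file, so NR==FNR is only true while the first file is being read. One caveat: if the first file is empty, the condition is also true for the second file, and nothing is printed.

```shell
#!/usr/bin/env bash
# lookup_rows KEYFILE DATAFILE  (hypothetical helper name)
lookup_rows() {
    awk '
        NR == FNR { keys[$1]; next }   # first file: remember each key, print nothing
        $1 in keys                     # second file: print lines whose column 1 is a key
    ' "$1" "$2"
}
```

The output follows the order of the data file, so lookup_rows file1.txt rmsd.txt gives the five rows shown above.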
or with sort/join
$ join <(sort file1) <(sort file2) | sort -n
or grep/sed
$ grep -f <(sed 's/.*/^&\\b/' file1) file2