Extracting minimum and maximum from line number grep - bash

Currently, I have a command in a bash script that greps for a given string in a text file and prints the line numbers only using sed ...
grep -n "<string>" file.txt | sed -n 's/^\([0-9]*\).*/\1/p'
The grep could find multiple matches, and thus, print multiple line numbers. From this command's output, I would like to extract the minimum and maximum values, and assign those to respective bash variables. How could I best modify my existing command or add new commands to accomplish this? If using awk or sed will be necessary, I have a preference of using sed. Thanks!

You can get the minimum and maximum with this:
grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'
You can also read them into an array:
F=($(grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'))
echo ${F[0]} # min
echo ${F[1]} # max
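A minimal end-to-end sketch of the array approach; the file demo.txt and the pattern "foo" are invented here for illustration:

```shell
# Build a small sample file with matches on lines 2, 4 and 6.
printf 'alpha\nfoo\nbeta\nfoo\ngamma\nfoo\n' > demo.txt

# grep -n prefixes each match with its line number; the sed s/// keeps only
# that number, and '1p;$p' prints just the first and last results.
F=($(grep -n "foo" demo.txt | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'))
min=${F[0]}
max=${F[1]}
echo "min=$min max=$max"
```

Note that if the pattern matches only one line, min and max will be the same value, since that line is both the first and last result.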

grep -n "<string>" file.txt | sed -n -e '1s/^\([0-9]*\).*/\1/p' -e '$s/^\([0-9]*\).*/\1/p'

grep .... |awk -F: '!f{print $1;f=1} END{print $1}'

Here's how I'd do it, since grep -n 'pattern' file prints output in the format line number:line contents ...
minval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | head -1)
maxval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | tail -1)
The cut -d':' -f1 command splits the grep output on the colon and keeps only the first field (the line numbers). sort -n sorts those numbers in ascending order (they already are, but it's good practice to ensure it). Then head -1 and tail -1 take the first and last values in the sorted list respectively, i.e. the minimum and maximum, which are assigned to $minval and $maxval.
Hope this helps!
Edit: Turns out you can't do it the way I had it originally, since echoing an unquoted list of newline-separated values collapses them into a single line.

It can be done with one process. Like this:
awk '/expression/{if(!n)print NR;n=NR} END {print n}' file.txt
Then you can assign to an array (as perreal suggested), or you can modify the script and assign to variables using eval:
eval $(awk '/expression/{if(!n)print "A="NR;n=NR} END {print "B="n}' file.txt)
echo $A
echo $B
Output (where file.txt contains three lines matching expression):
1
3
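A runnable sketch of the eval variant; the file demo.txt and the pattern "expr" are made up here (matches fall on lines 1, 3 and 5):

```shell
printf 'expr one\nother\nexpr two\nmiddle\nexpr three\n' > demo.txt

# awk emits "A=<first matching NR>" once and "B=<last matching NR>" at END;
# eval turns that output into shell assignments in the current shell.
eval $(awk '/expr/{if(!n)print "A="NR;n=NR} END {print "B="n}' demo.txt)
echo "$A $B"
```

Be aware that eval executes whatever awk prints, so this is only safe when you control the awk script's output, as here.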

Related

How can I print the first matched line using sed or grep?

I have a config file where each line is in a format say UniqueOption = SomeValue:
$ cat somefile
option1sub1 = yes
option1sub2 = 1234
...
option1subn = xxxx
option2 = 2345
option3 = no
...
I want to process each value of "option1" in a loop, but sed or grep gives me all of the option1 lines at once.
How could I get a single option1 line at a time using sed or grep?
pipe the output of grep to a while loop:
grep 'option1' somefile | while read line
do
echo "single option is in var $line"
done
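One caveat worth sketching: piping grep into while runs the loop in a subshell, so variables set inside it don't survive. Feeding the loop with process substitution (a bash feature) avoids that; the file conf.txt below is an invented sample in the question's format:

```shell
cat > conf.txt <<'EOF'
option1sub1 = yes
option1sub2 = 1234
option2 = 2345
EOF

# Process substitution keeps the loop in the current shell, so $matches
# survives; read -r preserves any backslashes in the line.
matches=0
while read -r line
do
    echo "single option is in var $line"
    matches=$((matches + 1))
done < <(grep 'option1' conf.txt)
echo "$matches option1 lines"
```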
Solution 1: The following awk prints the value (the last field) of every option1 line:
awk -F" = " '/^option1/{print $NF}' Input_file
Solution 2: The above prints all values for option1. If you need only the very first one, use the following, which exits after the first match:
awk -F" = " '/^option1/{print $NF;exit}' Input_file
The following will parse out all sub-options for option1 in the file file.conf and save them in a bash array. The options are then easily accessed from that array.
#!/bin/bash
while IFS= read -r data; do
opt1+=( "$data" )
done < <( awk -F ' *= *' '$1 ~ /^option1/ { print $2 }' file.conf )
printf 'Option 1, sub-option 1 is "%s"\n' "${opt1[0]}"
Output:
Option 1, sub-option 1 is "yes"
The awk script will return everything after the = (and any spaces), which allows you to store data that contains multiple words. Only the lines starting with option1 in the configuration file are processed.
This could be adapted to parse the whole configuration file into a single structure, possibly using an associative array in a sufficiently recent version of bash.
There are already a few awesome answers, but since you asked about grep, you can use one of the following.
For all values
grep option1 m | cut -d "=" -f2 | awk '{$1=$1};1'
For first value
grep option1 m | cut -d "=" -f2 | awk '{$1=$1};1' | head -1
Here cut extracts the second field using the delimiter =, awk trims the surrounding spaces, and head prints only the first occurrence.
With sed
sed '/^option1.* = /!d;s///' somefile
With GNU grep 2.20+ (with PCRE support, for -P):
grep -oP '^option1.* = \K.*' somefile
If you want to get only the first match
sed '/^option1.* = /!d;s///;q' somefile
grep -m1 -oP '^option1.* = \K.*' somefile
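A quick sketch of the sed variants against an invented sample file (the empty s/// pattern reuses the address regex, so the "option1... = " prefix is deleted and only the value remains):

```shell
cat > somefile <<'EOF'
option1sub1 = yes
option1sub2 = 1234
option2 = 2345
EOF

# All option1 values: delete non-matching lines, strip the matched prefix.
all=$(sed '/^option1.* = /!d;s///' somefile)

# First value only: same, but quit after the first matching line.
first=$(sed '/^option1.* = /!d;s///;q' somefile)

echo "$all"
echo "first=$first"
```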

Grep search on ONE LINE only?

I am using Bash to find the dimensions of a matrix. Here is my code to get the number of elements in one row; however, it counts across the whole file. I just need the number of elements in ONE ROW.
grep -oP "\^I" $1 | wc -l
Here is what the $1 is referring to:
1^I2^I3^I4$
5^I6^I7^I8$
For some reason, it is printing out 9 instead of 3.
Thanks in advance!
Use:
cat $1 | head -n 1 | sed 's/\^I/\n/g' | wc -l
I take only the first row using head, replace every column delimiter with a newline using sed, then pipe that to wc.
You can use sed before calling grep to isolate one specific line of your file:
sed -n '1p' file | grep -oP "^I" | wc -l
# sed -n '1p' prints only the 1st line ('2p' would print the second line, etc.)
using awk
$ awk -F'\\^I' 'NR==1{print NF-1}' $1
3
-F'\\^I' use ^I as field separator
NR==1 first line only
print NF-1 since the question is about counting number of ^I, need to print number of fields minus one
Also, if $1 is an argument being passed to a shell script, quoting it as "$1" is good practice.
And a guess: this is likely the actual data the OP is working with:
$ cat ip.txt
1 2 3 4
5 6 7 8
$ cat -A ip.txt
1^I2^I3^I4$
5^I6^I7^I8$
$ # exit to avoid unnecessary processing of other lines
$ awk -F'\t' 'NR==1{print NF-1; exit}' ip.txt
3
sed 's:\^I:\n:g; q' | wc -l
# s:\^I:\n:g  -- change all ^I to \n
# q           -- quit after the first line
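For clarity, the ^I shown by cat -A is a real tab character, so the sample data can be recreated with printf '\t'. This sketch reproduces the awk answer above (the file name ip.txt is from the example):

```shell
# Recreate the sample: two rows of four tab-separated values.
printf '1\t2\t3\t4\n5\t6\t7\t8\n' > ip.txt

# Count the tab delimiters on the first row only, then stop reading the file.
tabs=$(awk -F'\t' 'NR==1{print NF-1; exit}' ip.txt)
echo "$tabs"
```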

Bash : How to check in a file if there are any word duplicates

I have a file with 6 character words in every line and I want to check if there are any duplicate words. I did the following but something isn't right:
#!/bin/bash
while read line
do
name=$line
d=$( grep '$name' chain.txt | wc -w )
if [ $d -gt '1' ]; then
echo $d $name
fi
done <$1
Assuming each word is on a new line, you can achieve this without looping:
$ cat chain.txt | sort | uniq -c | grep -v " 1 " | cut -c9-
You can use awk for that:
awk -F'\n' 'found[$1] {print}; {found[$1]++}' chain.txt
Set the field separator to newline, so that we look at the whole line. Then, if the line already exists in the array found, print the line. Finally, add the line to the found array.
Note: each line is only suppressed once, so if the same line appears, say, 6 times, it will be printed 5 times.
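A small sketch of the awk approach; the contents of chain.txt are invented 6-character words, with one word repeated three times:

```shell
printf 'abcdef\nghijkl\nabcdef\nmnopqr\nabcdef\n' > chain.txt

# A line is printed each time it has already been seen, so a word that
# occurs 3 times is printed twice (on its 2nd and 3rd occurrences).
dups=$(awk -F'\n' 'found[$1] {print}; {found[$1]++}' chain.txt)
echo "$dups"
```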

Print text at specified line number bash

I want to print the text at a specified line number from a file.
Here is my bash script
line=12
sed -n "$line{p;q;}"
My line number comes in a variable. But the above code is not working. What should I do?
Using sed
line=12
sed -n "${line}p" my_file
# Multiple lines
line1=10
line2=15
sed -n "${line1},${line2}p" my_file
In awk:
awk "NR==${line}" my_file
# Multiple lines
awk "NR >= ${line1} && NR <= ${line2}" my_file
Or using head and tail but probably not as efficient:
head -${line} my_file | tail -1
# Multiple lines
head -${line2} my_file | tail -$(($line2-$line1+1))
You have to give the file name as an argument to sed.
line=12
sed -n "$line{p;q;}" filename
If you are passing the filename as an argument to a bash script, you need to use:
line=12
sed -n "$line{p;q;}" "$1"
Fast sed command (useful for bigger files) is:
n=12; sed $n'q;d' file
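Putting it together, a self-contained sketch (my_file is generated here with numbered lines just for the demo):

```shell
# Generate a 20-line sample file: "line 1" .. "line 20".
seq 20 | sed 's/^/line /' > my_file

line=12
# q makes sed quit right after printing, so it doesn't scan the rest of the file.
val=$(sed -n "${line}{p;q;}" my_file)
echo "$val"
```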

results of wc as variables

I would like to use the lines coming from 'wc' as variables. For example:
echo 'foo bar' > file.txt
echo 'blah blah blah' >> file.txt
wc file.txt
2 5 23 file.txt
I would like to have something like $lines, $words and $characters associated to the values 2, 5, and 23. How can I do that in bash?
In pure bash: (no awk)
a=($(wc file.txt))
lines=${a[0]}
words=${a[1]}
chars=${a[2]}
This works by using bash's arrays. a=(1 2 3) creates an array with elements 1, 2 and 3. We can then access individual elements with the ${a[index]} syntax.
Alternative: (based on gonvaled solution)
read lines words chars <<< $(wc x)
Or in sh:
a=$(wc file.txt)
lines=$(echo $a|cut -d' ' -f1)
words=$(echo $a|cut -d' ' -f2)
chars=$(echo $a|cut -d' ' -f3)
There are other solutions but a simple one which I usually use is to put the output of wc in a temporary file, and then read from there:
wc file.txt > xxx
read lines words characters filename < xxx
echo "lines=$lines words=$words characters=$characters filename=$filename"
lines=2 words=5 characters=23 filename=file.txt
The advantage of this method is that you do not need to create several awk processes, one for each variable. The disadvantage is that you need a temporary file, which you should delete afterwards.
Be careful: this does not work:
wc file.txt | read lines words characters filename
The problem is that piping to read creates another process, and the variables are updated there, so they are not accessible in the calling shell.
Edit: adding solution by arnaud576875:
read lines words chars filename <<< $(wc x)
It works without writing to a file (and does not have the pipe problem). It is bash-specific.
From the bash manual:
Here Strings
A variant of here documents, the format is:
<<<word
The word is expanded and supplied to the command on its standard input.
The key is the "word is expanded" bit.
lines=`wc file.txt | awk '{print $1}'`
words=`wc file.txt | awk '{print $2}'`
...
you can also store the wc result somewhere first.. and then parse it.. if you're picky about performance :)
Just to add another variant --
set -- `wc file.txt`
lines=$1
words=$2
chars=$3
This obviously clobbers $* and related variables. Unlike some of the other solutions here, it is portable to other Bourne shells.
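A runnable sketch of this variant, recreating the question's sample file first (remember the positional parameters get clobbered):

```shell
printf 'foo bar\nblah blah blah\n' > file.txt

# wc prints: lines words chars filename, which set -- splits into $1..$4.
set -- $(wc file.txt)
lines=$1
words=$2
chars=$3
echo "lines=$lines words=$words chars=$chars"
```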
I wanted to store the number of csv files in a variable. The following worked for me:
CSV_COUNT=$(ls ./pathToSubdirectory | grep ".csv" | wc -l | xargs)
xargs trims the whitespace from the wc output.
I ran this bash script from a different folder than the csv files, hence the pathToSubdirectory.
You can assign output to a variable by opening a sub shell:
$ x=$(wc some-file)
$ echo $x
1 6 60 some-file
Now, in order to get the separate variables, the simplest option is to use awk:
$ x=$(wc some-file | awk '{print $1}')
$ echo $x
1
declare -a result
result=( $(wc < file.txt) )
lines=${result[0]}
words=${result[1]}
characters=${result[2]}
echo "Lines: $lines, Words: $words, Characters: $characters"
