Grep search on ONE LINE only? - bash

I am using Bash to find the dimensions of a matrix. Here is my code to get the number of elements in one row, however it prints out for the whole file. I just need the number of elements in ONE ROW.
grep -oP "\^I" $1 | wc -l
Here is what the $1 is referring to:
1^I2^I3^I4$
5^I6^I7^I8$
For some reason, it is printing out 9 instead of 3.
Thanks in advance!

Use:
cat $1 | head -n 1 | sed 's/\^I/\n/g' | wc -l
I take only the first row using head, replace every column delimiter with a newline using sed, then pipe that to wc.

You can use sed before calling grep to isolate one specific line of your file:
sed -n '1p' file | grep -oP "\^I" | wc -l
Here sed -n '1p' prints only the 1st line; 2p would print the second line, and so on.
On the sample input from the question it gives 3.

using awk
$ awk -F'\\^I' 'NR==1{print NF-1}' $1
3
-F'\\^I' use ^I as field separator
NR==1 first line only
print NF-1 since the question is about counting number of ^I, need to print number of fields minus one
also, if $1 is argument being passed to shell script, use "$1" as good practice
and a guess, this is actual data OP is working with
$ cat ip.txt
1 2 3 4
5 6 7 8
$ cat -A ip.txt
1^I2^I3^I4$
5^I6^I7^I8$
$ # exit to avoid unnecessary processing of other lines
$ awk -F'\t' 'NR==1{print NF-1; exit}' ip.txt
3
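The awk answer above can be extended to report both dimensions at once. This is only a sketch: the function name matrix_dims is made up for illustration, and it assumes the columns are separated by real tab characters (as the cat -A output suggests) and that every row has as many fields as the first. Note that it prints the number of elements per row, which is the number of delimiters plus one.

```shell
# matrix_dims FILE: print "ROWS COLS" for a tab-separated matrix.
# (hypothetical helper name; COLS is taken from the first line only)
matrix_dims() {
    awk -F'\t' 'NR == 1 { cols = NF } END { print NR, cols + 0 }' "$1"
}

# Demo on a throwaway file holding the 2x4 matrix from the question.
demo=$(mktemp)
printf '1\t2\t3\t4\n5\t6\t7\t8\n' > "$demo"
matrix_dims "$demo"
rm -f "$demo"
```

For the sample matrix this prints 2 4, i.e. two rows of four elements.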

sed 's:\^I:\n:g; q' "$1" | wc -l
s:\^I:\n:g change all ^I to \n
q quit after the first line

Related

Count of matching word, pattern or value from unix korn shell scripting is returning just 1 as count

I'm trying to get the count of a matching pattern from a variable to check the count of it, but it's only returning 1 as the results, here is what I'm trying to do:
x="HELLO|THIS|IS|TEST"
echo $x | grep -c "|"
Expected result: 3
Actual Result: 1
Do you know why it is returning 1 instead of 3?
Thanks.
grep -c counts lines not matches within a line.
You can use awk to get a count:
x="HELLO|THIS|IS|TEST"
echo "$x" | awk -F '|' '{print NF-1}'
3
Alternatively you can use tr and wc:
echo "$x" | tr -dc '|' | wc -c
3
$ echo "$x" | grep -o '|' | grep -c .
3
grep -c does not count the number of matches. It counts the number of lines that match. By using grep -o, we put the matches on separate lines.
This approach works just as well with multiple lines:
$ cat file
hello|this|is
a|test
$ grep -o '|' file | grep -c .
3
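The difference between counting lines and counting matches can be seen side by side in a small sketch. The helper name count_matches is an assumption made for this example; it relies on grep -o, which GNU and BSD grep both provide but POSIX does not require, and uses -F so that | is treated as a literal character.

```shell
# count_matches STRING: count every occurrence of a fixed STRING on stdin.
# (hypothetical helper; tr strips the padding some wc versions add)
count_matches() {
    grep -oF -- "$1" | wc -l | tr -d ' '
}

x="HELLO|THIS|IS|TEST"
printf '%s\n' "$x" | grep -cF -- '|'    # counts matching lines
printf '%s\n' "$x" | count_matches '|'  # counts individual matches
```

The first command prints 1 (one line contains a |), the second prints 3.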
The grep manual says:
grep, egrep, fgrep - print lines matching a pattern
and for the -c flag:
instead print a count of matching lines for each input file
and there is just one line that matches.
You don't need grep for this.
pipe_only=${x//[^|]} # remove everything except | from the value of x
echo "${#pipe_only}" # output the length of pipe_only
Try this :
$ x="HELLO|THIS|IS|TEST"; echo -n "$x" | sed 's/[^|]//g' | wc -c
3
With only one pipe with perl:
echo "$x" |
perl -lne 'print scalar(() = /\|/g)'

sed: interpolating variables in timestamp format

I would like to use sed to extract all the lines between two specific strings from a file.
I need to do this on a script and my two strings are variables.
The strings will be in a sort of time stamp format, which means they can be something like:
2014/01/01 or 2014/01/01 08:01
I was trying with something like:
sed -n '/$1/,/$2/p' $file
or even
sed -n '/"$1"/,/"$2"/p' $file
with no luck, tried also to replace / as delimiter with ;.
I'm pretty sure the problem is due to the / and blank in input variables, but I can't figure out the proper syntax.
The syntax to use alternate regex delimiters is:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
https://www.gnu.org/software/sed/manual/sed.html#Addresses
So, pick one of
sed -n '\#'"$1"'#,\#'"$2"'#p' "$file"
sed -n "\\#$1#,\\#$2#p" "$file"
sed -n "$( printf '\#%s#,\#%s#p' "$1" "$2" )" "$file"
or awk
awk -v start="$1" -v end="$2" '$0 ~ start {p=1}; p; $0 ~ end {p=0}' "$file"
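Since timestamps are plain strings rather than patterns, one possible variation on the awk approach is to compare with index() instead of a regex match, so characters such as / or . need no escaping. This is a sketch, not a drop-in replacement: the function name print_between is invented here, and awk -v still interprets backslash escapes, so it assumes the boundary strings contain no backslashes.

```shell
# print_between START END FILE: print from the first line containing START
# through the next line containing END, treating both as literal text.
# (hypothetical helper name)
print_between() {
    awk -v start="$1" -v end="$2" '
        index($0, start)           { printing = 1 }
        printing                   { print }
        printing && index($0, end) { printing = 0 }
    ' "$3"
}

# Demo on a throwaway log file.
demo=$(mktemp)
printf '%s\n' '2013/12/31 23:59 a' '2014/01/01 08:01 b' \
    '2014/01/01 09:00 c' '2014/01/02 10:00 d' '2014/01/03 11:00 e' > "$demo"
print_between '2014/01/01 08:01' '2014/01/02' "$demo"
rm -f "$demo"
```

On the demo file this prints the three lines ending in b, c and d.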
From the first $1 to the last $2:
sed -n "\\#$1#,\$p" "$file" | tac | sed -n "\\#$2#,\$p" | tac
This prints from the first $1 to the end, reverses the lines, prints from the first $2 to the new end, and reverses the lines again.
An example: from the first "5" to the last "7"
$ set -- 5 7
$ seq 20 | sed -n "\\#$1#,\$p" | tac | sed -n "\\#$2#,\$p" | tac
5
6
7
8
9
10
11
12
13
14
15
16
17
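Try using double quotes instead of single ones, so that the shell expands the variables:
sed -n "/$1/,/$2/p" "$file"
This only works when the values contain no /; for timestamps such as 2014/01/01, pick a different delimiter with the \c syntax described above.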

using cut command in bash [duplicate]

I want to get only the number of lines in a file:
so I do:
$ wc -l countlines.py
9 countlines.py
I do not want the filename, so I tried
$ wc -l countlines.py | cut -d ' ' -f1
but this just echoes an empty line.
I just want number 9 to be printed
Feed the file to wc on stdin and it won't print the file name:
wc -l < countlines.py
You can also use awk to count lines:
awk 'END { print NR }' countlines.py
where countlines.py is the file you want to count
If your file doesn't end with a \n (newline), wc -l gives a wrong result. Try it with the following simulated example:
echo "line1" > testfile #correct line with a \n at the end
echo -n "line2" >> testfile #added another line - but without the \n
Then
$ wc -l < testfile
1
returns 1, because wc -l counts the newline characters (\n) in the file.
Therefore, to count lines (rather than \n characters) in a file, you should use
grep -c '' testfile
i.e., match the empty string (which succeeds on every line) and count the matching lines with -c. For the testfile above it returns the correct 2.
Additionally, if you want to count the non-empty lines, you can do it with
grep -c '.' file
Don't trust wc :)
PS: one of the strangest uses of wc is
grep 'pattern' file | wc -l
instead of
grep -c 'pattern' file
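The two ways of counting can be compared directly on a file whose last line lacks a trailing newline. The helper names count_lines and count_newlines are made up for this sketch; the tr call only strips the padding that some wc implementations add.

```shell
# Hypothetical helpers contrasting the two ways of counting.
count_lines()    { grep -c '' "$1"; }             # lines, terminated or not
count_newlines() { wc -l < "$1" | tr -d ' '; }    # newline characters only

# Demo: two lines, the second without a trailing newline.
demo=$(mktemp)
printf 'line1\nline2' > "$demo"
count_newlines "$demo"
count_lines "$demo"
rm -f "$demo"
```

This prints 1 followed by 2. For files that do end in a newline, both functions agree.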
cut is being confused by the leading whitespace.
I'd use awk to print the 1st field here:
% wc -l countlines.py | awk '{ print $1 }'
As an alternative, wc won't print the file name if it is being piped input from stdin
$ cat countlines.py | wc -l
9
yet another way :
cnt=$(wc -l < countlines.py )
echo "total is $cnt "
Redirecting the file into wc's standard input removes the file name from the output; tr then strips any padding spaces:
wc -l <countlines.py |tr -d ' '
Use awk like this:
wc -l countlines.py | awk '{ print $1 }'

Extracting minimum and maximum from line number grep

Currently, I have a command in a bash script that greps for a given string in a text file and prints the line numbers only using sed ...
grep -n "<string>" file.txt | sed -n 's/^\([0-9]*\).*/\1/p'
The grep could find multiple matches, and thus, print multiple line numbers. From this command's output, I would like to extract the minimum and maximum values, and assign those to respective bash variables. How could I best modify my existing command or add new commands to accomplish this? If using awk or sed will be necessary, I have a preference of using sed. Thanks!
You can get the minimum and maximum with this:
grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'
You can also read them into an array:
F=($(grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'))
echo ${F[0]} # min
echo ${F[1]} # max
grep -n "<string>" file.txt | sed -n -e '1s/^\([0-9]*\).*/\1/p' -e '$s/^\([0-9]*\).*/\1/p'
grep .... |awk -F: '!f{print $1;f=1} END{print $1}'
Here's how I'd do it, since grep -n 'pattern' file prints output in the format line number:line contents ...
minval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | head -1)
maxval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | tail -1)
The cut -d':' -f1 command splits the grep output at the colon and keeps only the first field (the line number). sort -n sorts the line numbers in ascending numeric order (they already are, but it does no harm to make sure). head -1 and tail -1 then pick out the first and last values in the sorted list, i.e. the minimum and maximum, which are assigned to $minval and $maxval respectively.
Hope this helps!
Edit: Turns out you can't do it the way I had it originally, since echoing out a list of newline-separated values apparently concatenates them into one line.
It can be done with one process. Like this:
awk '/expression/{if(!n)print NR;n=NR} END {print n}' file.txt
Then you can assign the output to an array (as perreal suggested), or you can modify this script and assign to variables using eval:
eval $(awk '/expression/{if(!n)print "A="NR;n=NR} END {print "B="n}' file.txt)
echo $A
echo $B
Output (file.txt contains three lines of expression)
1
3
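If eval feels uncomfortable, the same single-pass idea can feed read through a here-document instead. This is a sketch under a few assumptions: the function name match_range is invented, the search term is treated as a fixed string via index() rather than as a regex, and the demo data is made up.

```shell
# match_range STRING FILE: print "FIRST LAST", the line numbers of the first
# and last lines containing STRING; prints nothing when there is no match.
# (hypothetical helper name)
match_range() {
    awk -v needle="$1" '
        index($0, needle) { if (!first) first = NR; last = NR }
        END { if (first) print first, last }
    ' "$2"
}

# Demo: "foo" occurs on lines 2, 4 and 6 of a throwaway file.
demo=$(mktemp)
printf 'a\nfoo\nb\nfoo bar\nc\nfoo\n' > "$demo"
read -r minval maxval <<EOF
$(match_range foo "$demo")
EOF
echo "min=$minval max=$maxval"
rm -f "$demo"
```

The demo prints min=2 max=6; with no match, both variables end up empty.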

Delete first line of file if it's empty

How can I delete the first (!) line of a text file if it's empty, using e.g. sed or other standard UNIX tools. I tried this command:
sed '/^$/d' < somefile
But this will delete the first empty line, not the first line of the file, if it's empty. Can I give sed some condition, concerning the line number?
With Levon's answer I built this small script based on awk:
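#!/bin/bash
# Remove an empty first line from every CSV file under some_directory.
# -print0 / read -d '' keep file names with spaces intact.
find some_directory -name "*.csv" -print0 | while IFS= read -r -d '' FILE
do
    echo "Processing ${FILE}"
    awk '{if (NR==1 && NF==0) next};1' < "${FILE}" > "${FILE}.killfirstline"
    mv "${FILE}.killfirstline" "${FILE}"
done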
The simplest thing in sed is:
sed '1{/^$/d}'
Note that this does not delete a line that contains all blanks, but only a line that contains nothing but a single newline. To get rid of blanks:
sed '1{/^ *$/d}'
and to eliminate all whitespace:
sed '1{/^[[:space:]]*$/d}'
Note that some versions of sed require a terminator inside the block, so you might need to add a semi-colon. eg sed '1{/^$/d;}'
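Wrapped in a function, the whitespace-tolerant variant might look like the sketch below. The name drop_blank_first_line is an assumption for illustration; the semicolon before the closing brace is included so that the command is also accepted by sed versions that insist on it.

```shell
# drop_blank_first_line [FILE...]: copy input to stdout, omitting the first
# line when it is empty or contains only whitespace. (hypothetical helper)
drop_blank_first_line() {
    sed '1{/^[[:space:]]*$/d;}' "$@"
}

# Demo: the leading empty line goes, the later one stays.
printf '\nfirst\n\nlast\n' | drop_blank_first_line
```

Only line 1 is ever tested, so blank lines further down the file are left alone.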
Using sed, try this:
sed -e '2,$b' -e '/^$/d' < somefile
or to make the change in place:
sed -i~ -e '2,$b' -e '/^$/d' somefile
If you don't have to do this in-place, you can use awk and redirect the output into a different file.
awk '{if (NR==1 && NF==0) next};1' somefile
This will print the contents of the file except if it's the first line (NR == 1) and it doesn't contain any data (NF == 0).
NR is the current line number; NF is the number of fields on the current line (fields are separated by blanks/tabs).
E.g.,
$ cat -n data.txt
1
2 this is some text
3 and here
4 too
5
6 blank above
7 the end
$ awk '{if (NR==1 && NF==0) next};1' data.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
and
$ cat -n data2.txt
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
$ awk '{if (NR==1 && NF==0) next};1' data2.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
Update:
This sed solution should also work for in-place replacement:
sed -i.bak '1{/^$/d}' somefile
The original file will be saved with a .bak extension
Delete the first line of every file under the current directory if that line is empty:
find . -type f -exec sed -i -e '2,$b' -e '/^$/d' {} +
This might work for you:
sed '1!b;/^$/d' file

Resources