Grep not working for one file several times - bash

I need to count the opening and closing HTML tags in a file and print how many were found. However, the second grep over the same file does not seem to work: the second block always reports 0 tags. If I move the second block above the first, it reports the right number, but the block that is now in second place reports 0.
./s.sh <my.html
TAG=$(grep -oP "<([^>\/]+)>" $1 | wc -l)
echo "<TAG> -" $TAG
CTAG=$(grep -oP "</([^>\/]+)>" $1 | wc -l)
echo "</TAG> -" $CTAG
I'm getting this output:
<TAG> - 13
</TAG> - 0
But should get something like this:
<TAG> - 13
</TAG> - 11
Input example:
<HTML>
<P>Список сотрудников
<TABLE BORDER=0>
<TR><TH>ФИО</TH><TH>Дата</TH></TR>
<TR><TD>Иванов И.И.</TD><TD>10.12.2019</TD></TR>
<TR><TD>Сидоров А.В.</TD><TD>11.11.1977</TD></TR>
</TABLE>
<P>Всего: 2 чел.
</HTML>

No need to escape the slash in the pattern, and you can omit the capturing group and the -P option:
$ TAG=$(grep -o "<[^>/]*>" "$1" | wc -l)
$ echo "<TAG>:" $TAG
<TAG>: 13
$ CTAG=$(grep -o "</[^/>]*>" "$1" | wc -l)
$ echo "</TAG>:" $CTAG
</TAG>: 11
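A possible explanation for the 0, assuming the script really is started as ./s.sh <my.html: in that case $1 is empty, the unquoted $1 disappears from the command line, and each grep reads standard input. The first grep consumes the whole stream, so the second one finds nothing left to read. Here is a minimal sketch that reads the input once and counts from the saved copy. The function name count_tags and the sample input are illustrative and not from the original post.

```shell
#!/bin/bash
# count_tags [FILE]: count opening and closing tags in FILE, or in standard
# input when no FILE is given. The input is read only once, so both greps
# work on the same saved copy.
count_tags() {
    local html open close
    if [ -n "$1" ]; then
        html=$(cat -- "$1")
    else
        html=$(cat)
    fi
    open=$(printf '%s\n' "$html" | grep -o '<[^>/]*>' | wc -l)
    close=$(printf '%s\n' "$html" | grep -o '</[^/>]*>' | wc -l)
    # $((...)) strips the padding some wc implementations add
    echo "<TAG> - $((open))"
    echo "</TAG> - $((close))"
}

# Small made-up sample fed through stdin, as in ./s.sh <my.html
count_tags <<'EOF'
<HTML>
<P>text
<TR><TD>a</TD></TR>
</HTML>
EOF
```

For this sample it prints "<TAG> - 4" and "</TAG> - 3". Passing the file name as an argument (./s.sh my.html) and quoting "$1" avoids the problem as well.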

Related

Get directory name with grep and remove it

Please, is there a simple way to get only the NAME column from the lines where DATE is more than 5 days ago, and then call another command (rm) with each NAME as its argument?
I have the following output from mega-ls path/ -l (mega.nz) command:
FLAGS VERS SIZE DATE NAME
d--- - - 06Feb2020 05:00:01 bk_20200206050000
d--- - - 07Feb2020 05:00:01 bk_20200207050000
d--- - - 08Feb2020 05:00:01 bk_20200208050000
d--- - - 09Feb2020 05:00:01 bk_20200209050000
d--- - - 10Feb2020 05:00:01 bk_20200210050000
d--- - - 11Feb2020 05:00:01 bk_20200211050000
I tried grep, sort and other ways e.g. mega-ls path/ -l | head -n 5 but I don't know how to search these lines based on the date.
Thank you a lot.
Here is a simple way to do what you asked ;)
mega-ls path/ -l | head -n 5 | tr -s ' ' | cut -d ' ' -f6 | grep -v -e '^$' | grep '^bk_20200206.*' | xargs rm -f
Part 1: your own command, which returns the folder list together with the extra columns:
mega-ls path/ -l | head -n 5
Part 2: squeeze the repeated spaces in the output of part 1:
tr -s ' '
Part 3: use cut to split the output of part 2 on spaces and keep only the NAME column:
cut -d ' ' -f6
Part 4: remove the empty lines from the output of part 3 (this is what the header line produces):
grep -v -e '^$'
Part 5: select the folder names by date in yyyymmdd format, for example 20200206 (replace 20200206 with the date you need):
grep '^bk_20200206.*'
Part 6 (very important!): add this part only if you really want to delete the selected folders:
xargs rm -f
Best Regards
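The pipeline above selects one hard-coded date. To select everything older than a cutoff instead, one option is to compare the date embedded in the folder name. This is only a sketch under two assumptions: every NAME has the form bk_YYYYMMDDhhmmss as in the question, and the function name older_than is made up for illustration. The real cutoff line in the comment relies on GNU date.

```shell
#!/bin/bash
# older_than CUTOFF: read a "mega-ls -l" style listing on stdin and print the
# NAME of every bk_YYYYMMDDhhmmss entry whose date part is before CUTOFF
# (CUTOFF is given as YYYYMMDD).
older_than() {
    awk -v cutoff="$1" '
        $NF ~ /^bk_[0-9]+$/ {
            day = substr($NF, 4, 8)      # YYYYMMDD taken from the name
            if (day + 0 < cutoff + 0) print $NF
        }'
}

# Sample listing copied from the question; in real use this would be the
# output of: mega-ls path/ -l
listing='FLAGS VERS SIZE DATE NAME
d--- - - 06Feb2020 05:00:01 bk_20200206050000
d--- - - 07Feb2020 05:00:01 bk_20200207050000
d--- - - 08Feb2020 05:00:01 bk_20200208050000
d--- - - 09Feb2020 05:00:01 bk_20200209050000
d--- - - 10Feb2020 05:00:01 bk_20200210050000
d--- - - 11Feb2020 05:00:01 bk_20200211050000'

# Fixed cutoff for the demo. With GNU date the real value would be:
#   cutoff=$(date -d "5 days ago" +%Y%m%d)
cutoff=20200209
printf '%s\n' "$listing" | older_than "$cutoff"
```

With this cutoff it prints the three names from 6, 7 and 8 February. The printed names can then be handed to whichever removal command you use, for example through xargs. Print them first and check the list before deleting anything.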

Loop Script from Input File

I have a reference file with device names in it, for example WABEL8499IPM101. I'm using this script to take a base name (without the last 3 digits), look in the reference file, and see which names are already used. If 101 is used and I request 2 names in total, it will create 102 and 103 for me. I'd like to use an input file to run it multiple times. I'm also trying to figure out how to start at 101 if no name is found in the reference file.
I would like to loop this over an input file instead of manually entering bash test.sh WABEL8499IPM 2 each time. I would like to build an input file of all the names that need to be compared and then write the output. It would also be nice if, when there is no match, it started creating names at WABEL8499IPM101 instead of WABEL8499IPM1.
Input file example:
ColumnA (BASE NAME) ColumnB (QUANTITY)
WABEL8499IPM 2
Script:
SRCFILE="~/Desktop/deviceinfo.csv"
LOGDIR="~/Desktop/"
LOGFILE="$LOGDIR/DeviceNames.csv"
# base name, such as "WABEL8499IPM"
device_name=$1
# quantity, such as "2"
quantityNum=$2
# the largest in sequence, such as "WABEL8499IPM108"
max_sequence_name=$(cat $SRCFILE | grep -o -e "$device_name[0-9]*" | sort --reverse | head -n 1)
# extract the last 3digit number (such as "108") from max_sequence_name
max_sequence_num=$(echo $max_sequence_name | rev | cut -c 1-3 | rev)
# create new sequence_name
# such as ["WABEL8499IPM109", "WABEL8499IPM110"]
array_new_sequence_name=()
for i in $(seq 1 $quantityNum);
do
cnum=$((max_sequence_num + i))
array_new_sequence_name+=($(echo $device_name$cnum))
done
#CODE FOR CREATING OUTPUT FILE HERE
#for fn in ${array_new_sequence_name[@]}; do touch $fn; done;
# write log
for sqn in ${array_new_sequence_name[@]};
do
echo $sqn >> $LOGFILE
done
Usage:
bash test.sh WABEL8499IPM 2
Result in the log file:
WABEL8499IPM109
WABEL8499IPM110
Just wrap a loop around your code instead of assuming the args come in on the command line.
SRCFILE="$HOME/Desktop/deviceinfo.csv"   # ~ is not expanded inside double quotes
LOGDIR="$HOME/Desktop"
LOGFILE="$LOGDIR/DeviceNames.csv"
while read -r device_name quantityNum
do
  # largest existing name for this base, such as "WABEL8499IPM108"
  max_sequence_name=$(grep -o -e "$device_name[0-9]*" "$SRCFILE" |
                      sort --reverse | head -n 1)
  # last 3 digits of that name (the space before -3 is required)
  max_sequence_num=${max_sequence_name: -3}
  array_new_sequence_name=()
  for i in $(seq 1 "$quantityNum")
  do cnum=$((max_sequence_num + i))
     array_new_sequence_name+=("$device_name$cnum")
  done
  for sqn in "${array_new_sequence_name[@]}"
  do echo "$sqn" >> "$LOGFILE"
  done
done < input.file
You could now pass the input file as the script parameter instead of hard-coding input.file.
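For the second part of the question (starting at 101 when the base name is not in the reference file yet), one option is to fall back to 100 when grep finds nothing. This is a rough sketch: the function name next_names, the temporary reference file and the second base name are made up for illustration. It also assumes every existing name ends in exactly three digits with no leading zero.

```shell
#!/bin/bash
# next_names SRCFILE BASE QTY: print QTY new names that follow the highest
# existing BASEnnn found in SRCFILE, starting at BASE101 when none exists.
next_names() {
    local src base qty max i
    src=$1
    base=$2
    qty=$3
    # highest existing name for this base, empty if there is none
    max=$(grep -o -e "${base}[0-9][0-9][0-9]" "$src" | sort -r | head -n 1)
    if [ -n "$max" ]; then
        max=${max#"$base"}       # keep only the 3-digit suffix
    else
        max=100                  # nothing found: numbering starts at 101
    fi
    i=1
    while [ "$i" -le "$qty" ]; do
        echo "$base$((max + i))"
        i=$((i + 1))
    done
}

# Demo with a temporary reference file and an inline input list
src=$(mktemp)
printf '%s\n' WABEL8499IPM101 WABEL8499IPM108 > "$src"
while read -r name count; do
    next_names "$src" "$name" "$count"
done <<'EOF'
WABEL8499IPM 2
NEWSITE0001ABC 1
EOF
rm -f "$src"
```

For this demo it prints WABEL8499IPM109, WABEL8499IPM110 and NEWSITE0001ABC101. If the real input file has a header line, skip it before the loop.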

How to properly use the grep command to grab and store integers?

I am currently building a bash script for class, and I am trying to use the grep command to grab the values from a simple calculator program and store them in the variables I assign, but I keep receiving a syntax error message when I try to run the script. Any advice on how to fix it? my script looks like this:
#!/bin/bash
addanwser=$(grep -o "num1 + num2" Lab9 -a 5 2)
echo "addanwser"
subanwser=$(grep -o "num1 - num2" Lab9 -s 10 15)
echo "subanwser"
multianwser=$(grep -o "num1 * num2" Lab9 -m 3 10)
echo "multianwser"
divanwser=$(grep -o "num1 / num2" Lab9 -d 100 4)
echo "divanwser"
modanwser=$(grep -o "num1 % num2" Lab9 -r 300 7)
echo "modawser"
You want to grep the output of a command.
grep searches either a file or standard input, so you can use any of these equivalent forms:
grep X file # 1. from a file
... things ... | grep X # 2. from stdin
grep X <<< "content" # 3. using here-strings
For this case, you want to use the last one, so that you execute the program and its output feeds grep directly:
grep <something> <<< "$(Lab9 -s 10 15)"
Which is the same as saying:
Lab9 -s 10 15 | grep <something>
So that grep will act on the output of your program. Since I don't know how Lab9 works, let's use a simple example with seq, that returns numbers from 5 to 15:
$ grep 5 <<< "$(seq 5 15)"
5
15
grep is usually used for finding matching lines of a text file. To actually grab a part of the matched line other tools such as awk are used.
Assuming the output looks like "num1 + num2 = 54" (i.e. fields are separated by space), this should do your job:
addanwser=$(Lab9 -a 5 2 | awk '{print $NF}')
echo "$addanwser"
Make sure you don't miss the '$' sign before addanwser when echo'ing it.
$NF selects the last field. You may select nth field using $n.
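To try this without the real program, here is a small sketch with a stand-in function. Both lab9_stub and its output format ("num1 + num2 = 7") are assumptions, because the actual output of Lab9 is not shown in the question.

```shell
#!/bin/bash
# Stand-in for the real Lab9 program. The output format is a guess.
lab9_stub() {
    case $1 in
        -a) echo "num1 + num2 = $(( $2 + $3 ))" ;;
        -s) echo "num1 - num2 = $(( $2 - $3 ))" ;;
    esac
}

# Run the command, keep only the last field of its output
addanswer=$(lab9_stub -a 5 2 | awk '{print $NF}')
subanswer=$(lab9_stub -s 10 15 | awk '{print $NF}')

echo "$addanswer"
echo "$subanswer"
```

This prints 7 and -5. With the real program, replace lab9_stub with the actual command and adjust the field if the result is not the last word on the line.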

Get the first real number from a series of files

I am trying to take the first number from each file.dat, which has the form:
5.01 1 56.413481000 -0.00063400 0.00095770
5.01 2 61.193808800 0.00102170 0.00078280
5.01 3 65.974136600 -0.00108170 0.00102620
5.01 4 70.754464300 0.00082490 0.00103630
and then use this number (5.01) as the title of a .png file.
I use a bash script and I know the command line=$(head -n 1 $f), as found in a question here, but this gives me the whole first line of the file $f.
In this case the spaces in the line are kept as well, and the .png file title becomes:
plot 5.01 1 56.413481000 -0.00063400 0.00095770.png
Is there some way to take only 5.01 and get a trimmed title for the plot?
Thanks to all.
I'd probably just do it with perl:
VAL=$( echo "$line" | perl -pe 's/^[^\d]+//g;s/[^\d\.].*$//' )
Something like that anyway.
This should remove:
any characters that are not digits from the start of the line.
Everything from the first character that is neither a digit nor a . to the end of the line.
Or with grep:
grep -o "[0-9]*\.[0-9]*" file.dat | head -1
Edit:
Testing without the head -1 for a oneline input:
echo " 5.01 2 61.193808800 0.00102170 0.00078280" | grep -o "[0-9]*\.[0-9]*"
5.01
61.193808800
0.00102170
0.00078280
Adding head -1 keeps only the first match.
When you know the match will be on the first line, you can ignore files with an incorrect first line and avoid grepping through complete files.
Make a two-headed monster:
head -1 file.dat | grep -o "[0-9]*\.[0-9]*" | head -1
To extract the first field, assuming they are tab separated:
val=$(head -n 1 $f | cut -f 1)
or, if they are space separated instead:
val=$(head -n 1 $f | cut -f 1 -d ' ')
OR you can avoid calling any extra processes and keep all data manipulation in the bash shell with
while read -r realNum restOfLine ; do
break
done < "$f"
echo "$realNum"
This grabs the first "word" and puts the remaining into "restOfLine".
The break ensures that you only read the first line of the file.
IHTH
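Putting the pieces together for the plot title, here is a minimal sketch. The helper name first_number and the temporary sample file are illustrative. It assumes the fields are separated by spaces or tabs.

```shell
#!/bin/bash
# first_number FILE: print the first whitespace-separated field of the
# first line of FILE. read also drops any leading blanks.
first_number() {
    local first rest
    read -r first rest < "$1"
    printf '%s\n' "$first"
}

# Demo with a temporary file that mimics file.dat
f=$(mktemp)
printf '%s\n' ' 5.01 1 56.413481000 -0.00063400 0.00095770' \
              ' 5.01 2 61.193808800 0.00102170 0.00078280' > "$f"

title="plot $(first_number "$f").png"
echo "$title"
rm -f "$f"
```

This prints plot 5.01.png. In the real script the same call would go inside the loop over the .dat files.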

Get variable value by finding keyword in unix environment

In UNIX environment, I have a file.txt that contains following details:
Data recording started:
0001100 Matched at 412090
0001101 Mismatched at 414798
0001102 Matched at 420007
0001103 Mismatched at 420015
Job completed
How can I get the value from the first line containing the word "Matched" (line 2) and the value from the first line containing "Mismatched" (line 3),
find the difference between them, and store it in a variable, "dif"?
The result is Matched minus Mismatched. I cannot pick the values by line number (e.g. by subtracting the last integers of lines 2 and 3), because a Mismatched line may come first, as in the following:
Data recording started:
0001100 Mismatched at 412090
0001101 Matched at 414798
0001102 Mismatched at 420007
0001103 Matched at 420015
Job completed
One way:
echo $((
$(grep Matched input | head -1 | sed 's/.*at //')
- $(grep Mismatched input | head -1 | sed 's/.*at //')
))
or using only sed:
echo $((
$(sed -n 's/.*Matched.*at //p' input | head -1)
- $(sed -n 's/.*Mismatched.*at //p' input | head -1)
))
Output
-2708
We can use grep -m 1 to get rid of head:
dif=$((
$(grep -m 1 'Matched' a.txt | sed 's/.*at \([0-9]*\).*/\1/')
- $(grep -m 1 'Mismatched' a.txt | sed 's/.*at \([0-9]*\).*/\1/')
))
echo $dif
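The same result can also be obtained in a single pass with awk. This is only a sketch: it assumes the second field of each data line is exactly Matched or Mismatched and that the value is the last field, as in the sample. The function name first_diff is made up.

```shell
#!/bin/bash
# first_diff FILE: print (first Matched value) - (first Mismatched value)
first_diff() {
    awk '
        $2 == "Matched"    && m  == "" { m  = $NF }   # first Matched only
        $2 == "Mismatched" && mm == "" { mm = $NF }   # first Mismatched only
        END { print m - mm }
    ' "$1"
}

# Demo with the first sample from the question
f=$(mktemp)
cat > "$f" <<'EOF'
Data recording started:
0001100 Matched at 412090
0001101 Mismatched at 414798
0001102 Matched at 420007
0001103 Mismatched at 420015
Job completed
EOF

dif=$(first_diff "$f")
echo "$dif"
rm -f "$f"
```

This prints -2708 for the first sample. For the second sample, where Mismatched comes first, the same function gives 2708.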

Resources