Bash script that checks, between two CSV files (old and new), that the new file's line count is at least x% of the old file's - bash

As of now, my script counts the number of lines in the two files.
Then I put the counts through a condition that checks whether the new count is greater than the old.
However, I am not sure how to compare based on a percentage of the old file's count.
Is there a better way to design the script?
#!/bin/bash
declare -i new=$(wc -l < "$(ls -t filename*.csv | head -n 1)")
declare -i old=$(wc -l < "$(ls -t filename*.csv | head -n 2 | tail -n 1)")  # second-newest file
echo "$new"
echo "$old"
if [ "$new" -gt "$old" ]
then
    echo "okay"
else
    echo "fail"
fi

If you need to check for an x% maximum difference in lines, you can count the number of '<' lines in the diff output. Recall that the diff output will look like:
+ diff node001.html node002.html
2,3c2,3
< 4
< 7
---
> 2
> 3
So that code will look like:
old=$(wc -l < file1)
diff1=$(diff file1 file2 | grep -c '^<')
pct=$((diff1*100/(old-1)))
# Check Percent
if [ "$pct" -gt 60 ] ; then
...
fi
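Putting the pieces together for the original question: comparing against x% of the old file's count only needs integer arithmetic. A minimal sketch, assuming a 90% threshold; the file names and sample data are hypothetical so the script runs standalone:

```shell
#!/bin/bash
# Hypothetical threshold and file names, for illustration only.
threshold=90
old_file=old.csv
new_file=new.csv

# Generate sample data so the sketch is self-contained:
printf '%s\n' a b c d e f g h i j > "$old_file"   # 10 lines
printf '%s\n' a b c d e f g h i   > "$new_file"   # 9 lines

new=$(wc -l < "$new_file")
old=$(wc -l < "$old_file")

# Multiply before dividing: bash arithmetic is integer-only.
pct=$(( new * 100 / old ))
echo "new file is ${pct}% of the old file"

if (( pct >= threshold )); then
    echo "okay"       # prints "okay": 9 lines is 90% of 10
else
    echo "fail"
fi
```

Note the order of operations: `new * 100 / old` keeps precision, while `new / old * 100` would always truncate to 0 whenever new < old.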


How can I compare grep line numbers in a conditional statement in bash?

I am building a git pre-commit hook to reject commits that contain string 'X'. I have a working version below.
#!/bin/sh
RED='\033[0;31m'
NC='\033[0m'
for FILE in `git diff --cached --name-only --diff-filter=ACM`; do
if grep -q 'X' "$FILE"
then
echo -e "${RED}[REJECTED]${NC}"
exit 1
fi
done
exit
What I would like to do is change the condition to look for string 'X', and if found, look for string 'Y' exactly 3 lines later. For example, X is on line 7 and Y is on line 10. I only want to reject commits with files containing strings 'X' and 'Y' separated by 3 lines. I have tried some funky things like:
if [ "grep -n 'X' $FILE" ] + 3 -eq [ "grep -n 'Y' $FILE" ]
How can I create the conditional I need? How best to generalise this?
I can pull it off, but there might be simpler ways:
for FILE in `git diff --cached --name-only --diff-filter=ACM`; do
nl $FILE | grep 'X' - | while read line blah; do
lines=$( head -n $(( $line + 3 )) $FILE | tail -n 1 | grep 'Y' | wc -l )
if [ $lines -gt 0 ]; then
echo you got it baby, on line $line for file $FILE
fi
done
done
Or something along those lines.
You can use grep -A to extract the three lines following X, and then check the last line for Y:
#!/bin/bash
file=$1
chunk=$(grep -A3 X "$file")
size=$(wc -l <<< "$chunk")
lastline=${chunk##*$'\n'}
if ((size == 4)) && grep -q Y <<< "$lastline" ; then
echo Found both X and Y
fi
The size check is needed for the case when the Y follows X at the end of the file with less than two lines in between.
Use sed to check for a pattern occurring n lines after your first pattern:
sed -n '/<pattern1>/{n;n;n; /<pattern2>/p}' "$FILE"
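To generalise the check further, a single awk pass can record every line number where 'X' matches and, on each 'Y' match, test whether an 'X' occurred exactly three lines earlier. A sketch (the sample file and patterns are hypothetical; the exit status makes it usable from the hook's if):

```shell
#!/bin/bash
# Hypothetical sample: X on line 2, Y on line 5 (exactly 3 lines later).
printf '%s\n' a X b c Y d > sample.txt

# seen[n] marks that line n contained X; exit status is 0 only when
# some Y sits exactly three lines below a matching X.
if awk '/X/ { seen[NR] = 1 }
        /Y/ && seen[NR-3] { print "match at line " NR; found = 1 }
        END { exit !found }' sample.txt; then
    echo "[REJECTED]"
fi
```

Unlike the sed one-liner, this catches every X/Y pair in the file, not just the first, and the offset is easy to parameterise with `awk -v off=3` and `seen[NR-off]`.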

Arithmetic operation fails in Shell script

Basically I'm trying to check if there are any 200 HTTP responses in the last 3 lines of the log, but I'm getting the below error, and because of it the head command is failing. Please help.
LINES=`cat http_access.log |wc -l`
for i in $LINES $LINES-1 $LINES-2
do
echo "VALUE $i"
head -$i http_access.log | tail -1 > holy.txt
temp=`cat holy.txt| awk '{print $9}'`
if [[ $temp == 200 ]]
then
echo "line $i has 200 code at "
cat holy.txt | awk '{print $4}'
fi
done
Output:
VALUE 18
line 18 has 200 code at [21/Jan/2018:15:34:23
VALUE 18-1
head: invalid trailing option -- -
Try `head --help' for more information.
Use $((...)) to perform arithmetic.
for i in $((LINES)) $((LINES-1)) $((LINES-2))
Without it, it's attempting to run the commands:
head -18 http_access.log
head -18-1 http_access.log
head -18-2 http_access.log
The latter two are errors.
A more flexible way to write the for loop would be using C-style syntax:
for ((i = LINES - 2; i <= LINES; ++i)); do
...
done
You got the why from JohnKugelman's answer, I will just propose a simplified code that might work for you:
while read -ra fields; do
[[ ${fields[9]} = 200 ]] && echo "Line ${fields[0]} has 200 code: ${fields[4]}"
done < <(cat -n http_access.log | tail -n 3 | tac)
cat -n: Numbers lines of the file
tail -n 3: Prints 3 last lines. You can just change this number for more lines
tac: Prints the lines outputted by tail in reversed order
read -ra fields: Reads the fields into an array $fields
${fields[0]}: The line number
${fields[num_of_field]}: Individual fields
You can also use wc instead of numbering using cat -n. For larger inputs, this will be slightly faster:
lines=$(wc -l < http_access.log)
while read -ra fields; do
[[ ${fields[8]} = 200 ]] && echo "Line $lines has 200 code: ${fields[3]}"
((lines--))
done < <(tail -n 3 http_access.log | tac)
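For comparison, the whole loop also collapses into tail plus awk, assuming the common log format the question implies ($9 is the status code, $4 the timestamp). A sketch with a hypothetical sample log:

```shell
#!/bin/bash
# Hypothetical sample in common log format.
cat > http_access.log <<'EOF'
host - - [21/Jan/2018:15:30:00 +0000] "GET /a HTTP/1.1" 404 512
host - - [21/Jan/2018:15:32:00 +0000] "GET /b HTTP/1.1" 200 1024
host - - [21/Jan/2018:15:34:23 +0000] "GET /c HTTP/1.1" 200 2048
EOF

# Only the last 3 lines are examined; awk filters on the status field.
tail -n 3 http_access.log | awk '$9 == 200 { print "200 code at " $4 }'
```

If the original line numbers are needed, pass the offset in with something like `awk -v off=$(( $(wc -l < http_access.log) - 3 ))` and print `off + NR`.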

Output a file in two columns in BASH

I'd like to rearrange a file in two columns after the nth line.
For example, say I have a file like this here:
This is a bunch
of text
that I'd like to print
as two
columns starting
at line number 7
and separated by four spaces.
Here are some
more lines so I can
demonstrate
what I'm talking about.
And I'd like to print it out like this:
This is a bunch and separated by four spaces.
of text Here are some
that I'd like to print more lines so I can
as two demonstrate
columns starting what I'm talking about.
at line number 7
How could I do that with a bash command or function?
Actually, pr can do almost exactly this:
pr --output-tabs=' 1' -2 -t tmp1
↓
This is a bunch and separated by four spaces.
of text Here are some
that I'd like to print more lines so I can
as two demonstrate
columns starting what I'm talking about.
at line number 7
-2 for two columns; -t to omit page headers; and without the --output-tabs=' 1', it'll insert a tab for every 8 spaces it added. You can also set the page width and length (if your actual files are much longer than 100 lines); check out man pr for some options.
If you're fixed upon “four spaces more than the longest line on the left,” then you might have to use something a bit more complex. The following works with your test input, but is getting to the point where the correct answer would be, “just use Perl, already”:
#!/bin/sh
infile=${1:-tmp1}
longest=$(longest=0;
head -n $(( $( wc -l $infile | cut -d ' ' -f 1 ) / 2 )) $infile | \
while read line
do
current="$( echo $line | wc -c | cut -d ' ' -f 1 )"
if [ $current -gt $longest ]
then
echo $current
longest=$current
fi
done | tail -n 1 )
pr -t -2 -w$(( $longest * 2 + 6 )) --output-tabs=' 1' $infile
↓
This is a bunch and separated by four spa
of text Here are some
that I'd like to print more lines so I can
as two demonstrate
columns starting what I'm talking about.
at line number 7
… re-reading your question, I wonder if you meant that you were going to literally specify the nth line to the program, in which case, neither of the above will work unless that line happens to be halfway down.
Thank you chatraed and BRPocock (and your colleague). Your answers helped me think up this solution, which answers my need.
function make_cols
{
file=$1 # input file
line=$2 # line to break at
pad=$(($3-1)) # spaces between cols - 1
len=$( wc -l < $file )
max=$(( $( wc -L < <(head -$(( line - 1 )) $file ) ) + $pad ))
SAVEIFS=$IFS;IFS=$(echo -en "\n\b")
paste -d" " <( for l in $( cat <(head -$(( line - 1 )) $file ) )
do
printf "%-""$max""s\n" $l
done ) \
<(tail -$(( len - line + 1 )) $file )
IFS=$SAVEIFS
}
make_cols tmp1 7 4
Could be optimized in many ways, but does its job as requested.
Input data (configurable):
file
num of rows borrowed from file for the first column
num of spaces between columns
format.sh:
#!/bin/bash
file=$1
if [[ ! -f $file ]]; then
echo "File not found!"
exit 1
fi
spaces_col1_col2=4
rows_col1=6
rows_col2=$(($(cat $file | wc -l) - $rows_col1))
IFS=$'\n'
ar1=($(head -$rows_col1 $file))
ar2=($(tail -$rows_col2 $file))
maxlen_col1=0
for i in "${ar1[@]}"; do
if [[ $maxlen_col1 -lt ${#i} ]]; then
maxlen_col1=${#i}
fi
done
maxlen_col1=$(($maxlen_col1+$spaces_col1_col2))
if [[ $rows_col1 -lt $rows_col2 ]]; then
rows=$rows_col2
else
rows=$rows_col1
fi
ar=()
for i in $(seq 0 $(($rows-1))); do
line=$(printf "%-${maxlen_col1}s\n" ${ar1[$i]})
line="$line${ar2[$i]}"
ar+=("$line")
done
printf '%s\n' "${ar[@]}"
Output:
$ bash format.sh myfile
This is a bunch and separated by four spaces.
of text Here are some
that I'd like to print more lines so I can
as two demonstrate
columns starting what I'm talking about.
at line number 7
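A single awk pass can also do the whole job: buffer the lines, measure the longest left-column line itself, and pad with printf's dynamic field width. A sketch assuming the same input and a break before line 7:

```shell
#!/bin/bash
# Recreate the sample input so the sketch runs standalone.
cat > tmp1 <<'EOF'
This is a bunch
of text
that I'd like to print
as two
columns starting
at line number 7
and separated by four spaces.
Here are some
more lines so I can
demonstrate
what I'm talking about.
EOF

awk -v brk=7 '
    { line[NR] = $0 }
    NR < brk && length > max { max = length }    # widest left-column line
    END {
        right = brk                              # first right-column line
        for (i = 1; i < brk; i++)
            if (right <= NR) { printf "%-*s%s\n", max + 4, line[i], line[right]; right++ }
            else               print line[i]     # no partner line: no padding
        for (; right <= NR; right++)             # right column longer than left
            printf "%*s%s\n", max + 4, "", line[right]
    }' tmp1
```

The `%-*s` format takes the field width as an argument (`max + 4`), so the four-space gap tracks the longest left-hand line automatically.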

ksh: shell script to search for a string in all files present in a directory at a regular interval

I have a directory (output) in unix (SUN). There are two types of files created with timestamp prefix to the file name. These file are created on a regular interval of 10 minutes.
e.g.:
1. 20140129_170343_fail.csv (some lines are there)
2. 20140129_170343_success.csv (some lines are there)
Now I have to search for a particular string in all the files present in the output directory, and if the string is found in the fail and success files, I have to count the number of lines present in those files and save the counts in the cnt_succ and cnt_fail variables. If the string is not found, I search the same directory again after a sleep of 20 seconds.
here is my code
#!/usr/bin/ksh
for i in 1 2
do
grep -l 0140127_123933_part_hg_log_status.csv /osp/local/var/log/tool2/final_logs/* >log_t.txt; ### log_t.txt will contain all the matching file list
while read line ### reading the log_t.txt
do
echo "$line has following count"
CNT=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT=`expr $CNT - 1`
echo $CNT
done <log_t.txt
if [ $CNT > 0 ]
then
exit
fi
echo "waiting"
sleep 20
done
The problem I'm facing is that I'm not able to match the _success and _fail files in line and check their counts.
I'm not sure about ksh, but in bash, piping into while ... do ... done is notorious for losing whatever variables you set inside it (the loop runs in a subshell); ksh might be similar.
If I've understood your question right, SunOS has grep, uniq and sort AFAIK, so a possible alternative might be...
First of all:
$ cat fail.txt
W34523TERG
ADFLKJ
W34523TERG
WER
ASDTQ34T
DBVSER6
W34523TERG
ASDTQ34T
DBVSER6
$ cat success.txt
abcde
defgh
234523452
vxczvzxc
jkl
vxczvzxc
asdf
234523452
vxczvzxc
dlkjhgl
jkl
wer
234523452
vxczvzxc
And now:
egrep "W34523TERG|ASDTQ34T" fail.txt | sort | uniq -c
2 ASDTQ34T
3 W34523TERG
egrep "234523452|vxczvzxc|jkl" success.txt | sort | uniq -c
3 234523452
2 jkl
4 vxczvzxc
Depending on the input data, you may want to see what options sort has on your system. Examining uniq's options may prove useful too (it can do more than just count duplicates).
I think you want something like this (it will work in both bash and ksh):
#!/bin/ksh
while read -r file; do
lines=$(wc -l < "$file")
((sum+=$lines))
done < <(grep -Rl --include="[12]*_fail.csv" "somestring" .)
echo "$sum"
Note this will match files starting with 1 or 2 and ending in _fail.csv; it's not exactly clear whether that's what you want.
e.g. let's say I have two files, one starting with 1 (containing 4 lines) and one starting with 2 (containing 3 lines), both ending in _fail.csv, somewhere under my current working directory:
> abovescript
7
It's important to understand the grep options here:
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
and
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
Finally I was able to find the solution. Here is the complete code:
#!/usr/bin/ksh
file_name="0140127_123933.csv"
for i in 1 2
do
grep -l $file_name /osp/local/var/log/tool2/final_logs/* >log_t.txt;
while read line
do
if [ $(echo "$line" |awk '/success/') ] ## will check the success file
then
CNT_SUCC=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT_SUCC=`expr $CNT_SUCC - 1`
fi
if [ $(echo "$line" |awk '/fail/') ] ## will check the fail file
then
CNT_FAIL=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT_FAIL=`expr $CNT_FAIL - 1`
fi
done <log_t.txt
if [ "$CNT_SUCC" -gt 0 ] && [ "$CNT_FAIL" -gt 0 ]
then
echo " Fail count = $CNT_FAIL"
echo " Success count = $CNT_SUCC"
exit
fi
echo "waiting for next search..."
sleep 10
done
Thanks everyone for your help.
I don't think I'm getting it right, but you can't differentiate the files? Maybe try:
#...
CNT=`expr $CNT - 1`
if echo "$line" | grep -q "fail"
then
#do something with fail count
else
#do something with success count
fi
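For what it's worth, the temp file and the pipeline-subshell problem can both be avoided with a plain glob loop and a case statement. A sketch with a hypothetical directory and search string (sample files are created so it runs standalone):

```shell
#!/bin/bash
# Hypothetical directory, file names, and search string.
mkdir -p output
printf '%s\n' needle a b   > output/20140129_170343_fail.csv     # 3 lines
printf '%s\n' needle c d e > output/20140129_170343_success.csv  # 4 lines

cnt_succ=0 cnt_fail=0
for f in output/*.csv; do
    grep -q "needle" "$f" || continue   # skip files without the string
    n=$(wc -l < "$f")
    case $f in
        *_success.csv) cnt_succ=$n ;;
        *_fail.csv)    cnt_fail=$n ;;
    esac
done

echo "Success count = $cnt_succ"   # prints 4
echo "Fail count = $cnt_fail"      # prints 3
```

Because no pipeline is involved, cnt_succ and cnt_fail survive the loop, which was the original sticking point.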

bash gnu parallel help

It's about http://en.wikipedia.org/wiki/Parallel_(software), which has a very rich manpage: http://www.gnu.org/software/parallel/man.html
(for x in `cat list` ; do
do_something $x
done) | process_output
is replaced by this
cat list | parallel do_something | process_output
I am trying to implement that in this:
while [ "$n" -gt 0 ]
do
percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
#get url from line specified by n from file done1
nextUrls=`sed -n "${n}p" < done1`
echo -ne "${percentage}% $n / $end urls saved going to line 1. current: $nextUrls\r"
# function that gets links from the url
getlinks $nextUrls
#save n
echo $n > currentLine
let "n--"
let "end=`cat done1 |wc -l`"
done
While reading the documentation for GNU Parallel I found out that functions are not supported, so getlinks won't be usable in parallel. The best I have found so far is:
seq 30 | parallel -n 4 --colsep ' ' echo {1} {2} {3} {4}
makes output
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30
The while loop mentioned above should go like this, if I am right:
end=`cat done1 |wc -l`
seq $end -1 1 | parallel -j+4 -k
#(all exept getlinks function goes here, but idk how? )|
# everytime it finishes do
getlinks $nextUrls
Thanks in advance for any help.
It seems what you want is a progress meter. Try:
cat done1 | parallel --eta wget
If that is not what you want, look at sem (sem is an alias for parallel --semaphore and is normally installed with GNU Parallel):
for i in `ls *.log` ; do
echo $i
sem -j+0 gzip $i ";" echo done
done
sem --wait
In your case it will be something like:
while [ "$n" -gt 0 ]
do
percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
#get url from line specified by n from file done1
nextUrls=`sed -n "${n}p" < done1`
echo -ne "${percentage}% $n / $end urls saved going to line 1. current: $nextUrls\r"
# function that gets links from the url
THE_URL=`getlinks $nextUrls`
sem -j10 wget $THE_URL
#save n
echo $n > currentLine
let "n--"
let "end=`cat done1 |wc -l`"
done
sem --wait
echo All done
Why does getlinks need to be a function? Take the function and transform it into a shell script (it should be essentially identical, except that you need to export environment variables in, and you of course cannot affect the outside environment without lots of work).
Of course, you cannot save $n into currentLine when you are executing in parallel: all the jobs would overwrite the file at the same time.
I was thinking of making something more like this, if not with parallel or sem then with something else, because parallel does not support functions, aka http://www.gnu.org/software/parallel/man.html#aliases_and_functions_do_not_work
getlinks(){
if [ -n "$1" ]
then
lynx -image_links -dump "$1" > src
grep -i ".jpg" < src > links1
grep -i "http" < links1 >links
sed -e 's/.*\(http\)/http/g' < links >> done1
sort -f done1 > done2
uniq done2 > done1
rm -rf links1 links src done2
fi
}
func(){
    percentage=$(echo "scale=2;(100-(($1 / $end) * 100))" | bc -l)
    #get url from line specified by $1 from file done1
    nextUrls=`sed -n "${1}p" < done1`
    echo -ne "${percentage}% $1 / $end urls saved going to line 1. current: $nextUrls\r"
    # function that gets links from the url
    getlinks $nextUrls
    #save the current line number
    echo $1 > currentLine
    let "end=`cat done1 | wc -l`"
}
while [ "$n" -gt 0 ]
do
    sem -j10 func $n
    let "n--"
done
sem --wait
echo All done
My script has become really complex, and I do not want to make a feature unavailable with something I am not sure can be done. This way I can get links with the full internet bandwidth being used; it should take less time that way.
I tried sem:
#!/bin/bash
func (){
echo 1
echo 2
}
for i in `seq 10`
do
sem -j10 func
done
sem --wait
echo All done
you get these errors:
Can't exec "func": No such file or directory at /usr/share/perl/5.10/IPC/Open3.pm line 168.
open3: exec of func failed at /usr/local/bin/sem line 3168
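One caveat about the "functions are not supported" assumption: bash functions exported with export -f are visible to child shells, and GNU Parallel (and sem) can run them by name when the login shell is bash. A minimal sketch; the parallel invocation is left in a comment in case it is not installed, and bash -c demonstrates the same mechanism:

```shell
#!/bin/bash
# A stand-in for getlinks; the real body would fetch links from "$1".
getlinks() {
    echo "links from $1"
}
export -f getlinks   # make the function visible to child bash processes

# With GNU Parallel installed, this runs the exported function per line:
#   seq 3 | parallel getlinks
# The same mechanism, shown with plain child shells:
seq 3 | while read -r n; do
    bash -c "getlinks $n"   # prints "links from 1" ... "links from 3"
done
```

The sem error above ("Can't exec \"func\"") is exactly what happens when the function was defined but never exported, so the child shell cannot see it.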
It is not quite clear what the end goal of your script is. If you are trying to write a parallel web crawler, you might be able to use the below as a template.
#!/bin/bash
# E.g. http://gatt.org.yeslab.org/
URL=$1
# Stay inside the start dir
BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
URLLIST=$(mktemp urllist.XXXX)
URLLIST2=$(mktemp urllist.XXXX)
SEEN=$(mktemp seen.XXXX)
# Spider to get the URLs
echo $URL >$URLLIST
cp $URLLIST $SEEN
while [ -s $URLLIST ] ; do
cat $URLLIST |
parallel lynx -listonly -image_links -dump {} \; wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and do { $seen{$1}++ or print }' |
grep -F $BASEURL |
grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
mv $URLLIST2 $URLLIST
done
rm -f $URLLIST $URLLIST2 $SEEN