Using wget and text array usage - bash

Quick question. Currently using this to get header expiry info.
wget -O /dev/null -S $1 2>&1 | grep -q -m 1 "Expires: Sun, 19 Nov 1978 05:00:00 GMT" && echo "Yes" || echo "No"
With execution via "./is-drupal www.URL.com"
How could I iterate through an array in a text document that would be like
www.URL1.com
www.URL2.com
www.URL3.com
etc.
Then if the value returned "Yes" it would save the Yes URLs to a new text file.
Your best input is greatly appreciated!

#!/bin/bash
while read -r line; do
wget -O /dev/null -S "$line" 2>&1 | grep -q -m 1 "Expires: Sun, 19 Nov 1978 05:00:00 GMT"
if [ ${PIPESTATUS[1]} -eq 0 ]; then # check grep's return code (second command in the pipe)
echo "Yes"
echo "$line" >> yes_urls.txt
else
echo "No"
fi
done < text.txt
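A variant of the loop above, as a rough sketch: it tests the pipeline's exit status directly in the if and moves the download into its own function so the filter can be tried without network access. The names fetch_headers and filter_drupal_urls are made up for this example; they are not part of the original answer.

```shell
#!/bin/bash
# Header value the question greps for.
drupal_marker='Expires: Sun, 19 Nov 1978 05:00:00 GMT'

# fetch_headers URL (hypothetical helper): print the response headers of URL.
fetch_headers() {
    wget -O /dev/null -S "$1" 2>&1
}

# filter_drupal_urls (hypothetical helper): read URLs on stdin, one per line,
# and print only those whose headers contain the marker.
filter_drupal_urls() {
    local url
    while IFS= read -r url; do
        [ -n "$url" ] || continue          # skip empty lines
        # The exit status of a pipeline is that of its last command (grep here).
        if fetch_headers "$url" | grep -qF -m 1 "$drupal_marker"; then
            printf '%s\n' "$url"
        fi
    done
}
```

It would be used as: filter_drupal_urls < text.txt > yes_urls.txt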

Related

Select directories with date in their names from a date range

I'm creating a list of directories with the requested date range in their name.
Directories are all labeled other_2019-07-18T00-00-00. The T is messing me up!
Copied this loop from somewhere.
#!/bin/bash
curdate=$(date +%Y-%m-%d%H-%M-%S)
#
for o in other_*; do
tmp=${o##other_}
tmp=$(echo "$tmp" | sed 's/T//') # clean up prefixes
fdate=$(date -d "${tmp}")
(( curdate < fdate )) && echo "$o"
done
I expect the final echo to include the path of all dir that match.
Unlike AWK, the < operator inside Bash's (( )) compares integers only, so the dashes and the T have to be removed before the two values can be compared.
Please try instead:
#!/bin/bash
curdate=$(date +%Y%m%d%H%M%S)
for o in other_*; do
tmp=${o##other_}
fdate=$(echo "$tmp" | sed 's/[-T]//g') # strip dashes and the T so only digits remain
(( curdate < fdate )) && echo "$o"
done
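The same digit-only comparison can be done without sed, using parameter expansion only. This is a minimal sketch; newer_than is a made-up name, and it assumes the names always end in other_YYYY-MM-DDTHH-MM-SS as in the question.

```shell
#!/bin/bash
# newer_than LIMIT (hypothetical helper): read directory names on stdin and
# print those whose embedded timestamp is later than LIMIT (14 digits,
# YYYYmmddHHMMSS).
newer_than() {
    local limit=$1 name stamp
    while IFS= read -r name; do
        stamp=${name##*other_}                       # drop everything up to "other_"
        stamp=${stamp//[-T]/}                        # delete every dash and the T
        [[ $stamp =~ ^[0-9]{14}$ ]] || continue      # ignore names that do not fit
        (( stamp > limit )) && printf '%s\n' "$name"
    done
    return 0
}
```

For example: printf '%s\n' other_* | newer_than "$(date +%Y%m%d%H%M%S)"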
As an alternative, you can compare epoch times:
#!/bin/bash
curdate=$(date +%s)
for o in other_*; do
tmp=${o##other_}
tmp=$(echo "$tmp" | sed 's/T/ /' | sed 's/\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\)$/\1:\2:\3/')
fdate=$(date -d "$tmp" +%s)
(( curdate < fdate )) && echo "$o"
done
Hope this helps.
Instead of dropping T...
date -d 2019-03-23T00-06-28
date: invalid date '2019-03-23T00-06-28'
ok, but:
date -d 2019-03-23T00:06:28
Sat Mar 23 00:06:28 UTC 2019
So we have to replace the last two dashes with colons (:).
As your question is tagged bash:
file="somepath/other_2019-07-18T00-00-00.extension"
time=${file#*other_} # suppress from left until 'other_'
time=${time%.*} # suppress extension
time=${time//-/:} # replace every dash with a `:`
time=${time/:/-} # replace the 1st `:` with a dash
time=${time/:/-} # replace the 1st `:` with a dash (again)
date -d "$time"
Thu Jul 18 00:00:00 UTC 2019
This could be written:
printf -v now "%(%s)T" -1 # bashism for current date to variable $now
for file in somepath/other_*.ext ;do
time=${file#*other_} time=${time%.*} time=${time//-/:}
time=${time/:/-} time=${time/:/-}
read fdate < <(date +%s -d $time)
((fdate > now)) && { echo $file: $((fdate - now)) ; }
done
To match your sample, you could replace for file in somepath/other_*.ext ;do with for file in other_*; do. This should work the same way.
Reducing forks (to date) improves speed:
fifo=/tmp/fifo-date-$$
mkfifo $fifo
exec 5> >(exec stdbuf -o0 date -f - +%s >$fifo 2>&1)
echo now 1>&5
exec 6< $fifo
read -t 1 -u 6 now
rm $fifo
for file in otherdates/*.ext ; do
time=${file#*other_} time=${time%.*} time=${time//-/:}
time=${time/:/-} time=${time/:/-}
echo $time 1>&5 && read -t 1 -u 6 fdate
((fdate > now)) && {
echo $file: $((fdate - now))
}
done
exec 6>&-
exec 5>&-
In this version, date +%s runs in the background with the -f argument: date interprets each incoming line and answers with the matching UNIX time. So $now is first read from the date process by:
echo now >&5 && # send the string "now" to date
read -u 6 now # read populates the $now variable
Note: once the fifo has been opened for both input and output, it can be deleted. It remains available to the processes until they close it.
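If the results are not needed one at a time, a simpler way to get down to a single date process is to feed it all the timestamps at once. This is only a sketch under some assumptions: GNU date (for -f), bash 4 or later (for mapfile), names that are all well formed, and timestamps meant as UTC. batch_epochs is a made-up name.

```shell
#!/bin/bash
# batch_epochs (hypothetical helper): read names such as
# other_2019-07-18T00-00-00 on stdin and print "EPOCH NAME" for each,
# using one date process for the whole list.
batch_epochs() {
    local -a names=() stamps=() epochs=()
    local name t i
    while IFS= read -r name; do
        t=${name##*other_}; t=${t%.*}                 # keep only the timestamp part
        t=${t//-/:}; t=${t/:/-}; t=${t/:/-}           # 2019-07-18T00:00:00
        names+=("$name")
        stamps+=("$t")
    done
    (( ${#names[@]} )) || return 0                    # nothing to convert
    # date -f - converts one line per input line, in order.
    mapfile -t epochs < <(printf '%s\n' "${stamps[@]}" | date -u -f - +%s)
    for i in "${!names[@]}"; do
        printf '%s %s\n' "${epochs[i]}" "${names[i]}"
    done
}
```

A malformed name would make date skip a line and shift the pairing, so this is only safe when every name matches the pattern.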
There is no space between the day and the hour, which is why date cannot read the date. Try:
sed 's/T/ /'

Date comparison with EPOCH to find an expiry in bash script

#!/bin/bash
for ADDR in `netstat -plant|grep LISTEN|grep http|awk '{print $4}'|egrep -v ':80$|:5555$'|sort -u`; do
EXPDATE=`openssl s_time 2>/dev/null | openssl s_client -connect $ADDR 2>/dev/null | openssl x509 -dates 2>/dev/null | grep ^notA | cut -f2 -d= | sed -e "s/ GMT//"`
printf "\t\t\t|%s\t|%s\t|\t%s\t|\n" "$ADDR" "$EXPDATE"
done
EXPDATES="$(echo "$EXPDATE" | awk '{print $1,$2,$4,$3}')"
CURREPOCH="$(date +%s)"
for i in "$EXPDATES"; do
CREXPEPOCH="$(date +%s -d "$i")"
if [[ "$CURREPOCH" -gt "$CREXPEPOCH" ]]; then
echo "No Expiry Found."
else
echo "Cert expired"
fi
done
Here, I'm getting dates from EXPDATE which has multiple date values as shown below,
Jul 12 12:00:00 2019
Jun 18 12:00:00 2019
May 8 00:00:00 2018
Nov 14 00:00:00 2017
And, converting to EPOCH time for better comparison with current EPOCH..
If any past date found, script should return "expired", else "no expiry found"..
I tried above script which is not working..
How I can do that? Any help?
The below tracks contents in an array rather than trying to abuse strings as iterable.
#!/usr/bin/env bash
# return all addresses that smell like HTTP
get_addrs() {
netstat -plant \
| awk '/LISTEN/ && /http/ && ! ($4 ~ /:(80|5555)$/) { print $4; }' \
| sort -u
}
# Given a local server address, return a usable client address
# converts wildcard addresses to loopback ones.
clientAddr() {
local addr=$1
case $addr in
0.0.0.0:*) addr=127.0.0.1:${addr#0.0.0.0:} ;;
:::*) addr='localhost:'"${addr#:::}" ;;
esac
printf '%s\n' "$addr"
}
# Given a local address that runs a HTTPS server, return the last valid date for its certificate
endDateForAddr() {
local addr endDate
addr=$(clientAddr "$1") || return
endDate=$(openssl s_client -connect "${addr}" </dev/null 2>/dev/null \
| openssl x509 -dates \
| awk -F= '/^notAfter/ { print $2; exit }')
[[ $endDate ]] && printf '%s\n' "$endDate"
}
# Loop over our local HTTPS services...
expDates=( )
while read -r addr; do
# find an address we can use to refer to each...
addr=$(clientAddr "$addr") || continue
# ...and use that to find its certificate expiry date.
result=$(endDateForAddr "$addr") || continue
# then add that to our array.
expDates+=( "$result" )
done < <(get_addrs)
# in bash 4.3, this is more efficiently written: printf -v curr_epoch '%(%s)T' -1
curr_epoch="$(date +%s)"
for expdate in "${expDates[@]}"; do
exp_epoch=$(date +%s -d "$expdate")
if (( curr_epoch > exp_epoch )); then
echo "$expdate is in the past"
else
echo "$expdate is in the future"
fi
done
...its output (correct as of this writing):
Jul 12 12:00:00 2019 is in the future
Jun 18 12:00:00 2019 is in the future
May 8 00:00:00 2018 is in the future
Nov 14 00:00:00 2017 is in the future
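To get the single verdict the question asks for ("expired" if any date is in the past, otherwise "No Expiry Found."), the comparison can be wrapped like this. A rough sketch: is_expired and report_expiry are made-up names, GNU date -d is assumed as in the answer above, and the dates are read as UTC because openssl prints them in GMT.

```shell
#!/bin/bash
# is_expired NOW_EPOCH DATE_STRING (hypothetical helper):
# succeeds when DATE_STRING lies before NOW_EPOCH; returns 2 if date cannot parse it.
is_expired() {
    local now=$1 exp_epoch
    exp_epoch=$(date -u +%s -d "$2") || return 2
    (( now > exp_epoch ))
}

# report_expiry NOW_EPOCH DATE... (hypothetical helper):
# print one line per expired date, or "No Expiry Found." when there is none.
report_expiry() {
    local now=$1 d found=0
    shift
    for d in "$@"; do
        if is_expired "$now" "$d"; then
            echo "expired: $d"
            found=1
        fi
    done
    if (( ! found )); then
        echo "No Expiry Found."
    fi
}
```

With the array from the answer it would be called as: report_expiry "$(date +%s)" "${expDates[@]}"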

Grep large number of patterns from a huge log file

I have a shell script which is invoked every hour via a cron job to search through the asterisk logs and give me the unique ids of calls which ended with cause 31.
while read ref
do
cat sample.log | grep "$ref" | grep 'got hangup request, cause 31' | grep -o 'C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]' >> cause_temp.log
done < callref.log
The issue is that the while loop is too slow and for accuracy I have included 4 while loops like mentioned above to perform various checks.
The callref.log file consists of call identifier values; every hour it has about 50-90 thousand values, and the script takes about 45-50 minutes to complete and email me the report.
It would be of great help if I would be able to cut down the execution time of the loops. Since the size of sample.log file is about 20 GB and for each loop the file is opened and search is performed, I figured that the while loop is the bottleneck here.
Have done the research and found some useful links like
Link 1 Link 2
But the solutions suggested I cannot implement or do not know how to. Any suggestion would be helpful. Thanks
Since sample.log consists of sensitive information I would not be able to share any logs, but below are some sample logs which I got from the internet.
Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"503"<sip:503@192.168.1.107>' failed for '192.168.1.137' - Wrong password
Dec 16 18:03:13 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"502"<sip:502@192.168.1.107>' failed for '192.168.1.137' - Wrong password
Dec 16 18:04:49 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"1737245082"<sip:1737245082@192.168.1.107>' failed for '192.168.1.137' - Username/auth name mismatch
Dec 16 18:04:49 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"100"<sip:100@192.168.1.107>' failed for '192.168.1.137' - Username/auth name mismatch
Jun 27 18:09:47 host asterisk[31774]: ERROR[27910]: chan_zap.c:10314 setup_zap: Unable to register channel '1-2'
Jun 27 18:09:47 host asterisk[31774]: WARNING[27910]: loader.c:414 __load_resource: chan_zap.so: load_module failed, returning -1
Jun 27 18:09:47 host asterisk[31774]: WARNING[27910]: loader.c:554 load_modules: Loading module chan_zap.so failed!
the file callref.log consists of a list of lines which looks like -
C-001ec22d
C-001ec23d
C-001ec24d
C-001ec31d
C-001ec80d
Also the desired output of the above while loop looks like C-001ec80d
Also my main concern is to make the while loop run faster. Like load all the values of callref.log in an array and search for all the values simultaneously in a single pass of sample.log if possible.
Since you could not produce adequate sample logs to test against even when requested, I whipped up some test material myself:
$ cat callref.log
a
b
$ cat sample.log
a 1
b 2
c 1
Using awk:
$ awk 'NR==FNR { # hash callrefs
a[$1]
next
}
{ # check callrefs from sample records and output when match
for(l in a)
if($0 ~ l && $0 ~ 1) # 1 is the static string you look for along a callref
print l
}' callref.log sample.log
a
HTH
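The awk script above tries every call reference against every log line. A possible single-pass variant pulls the identifier out of each line and looks it up in the hash instead. This is a sketch that assumes every relevant line carries the id as C- followed by eight characters, as in the question's grep pattern, and that callref.log is not empty; find_cause31 is a made-up name.

```shell
#!/bin/bash
# find_cause31 CALLREF_FILE LOG_FILE (hypothetical helper):
# print the call ids from CALLREF_FILE that appear on a "cause 31" hangup line.
find_cause31() {
    awk '
        NR == FNR { refs[$1]; next }          # first file: remember every call reference
        /got hangup request, cause 31/ {
            # locate the first C-xxxxxxxx token on the line
            if (match($0, /C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]/)) {
                id = substr($0, RSTART, RLENGTH)
                if (id in refs) print id      # hash lookup instead of a loop over all refs
            }
        }
    ' "$1" "$2"
}
```

It would be called as find_cause31 callref.log sample.log >> cause_temp.log, so sample.log is read once no matter how many references there are. Only the first id on each line is checked.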
I spent a day building a test framework and testing variations of different commands and I think you already have the fastest one.
Which leads me to think that, to get better performance, you should look into a log-digesting framework such as ossec (where your log samples came from) or perhaps splunk. Those may be too clumsy for your needs. Alternatively, you could consider designing and building something in Java/C/Perl/awk that is better suited to parsing.
Running your existing script more frequently will also help.
Good luck! If you like I can box up the work I did and post it here, but I think it's overkill.
as requested;
CalFuncs.sh: a library I source in most of my scripts
#!/bin/bash
LOGDIR="/tmp"
LOG=$LOGDIR/CalFunc.log
[ ! -d "$LOGDIR" ] && mkdir -p $(dirname $LOG)
SSH_OPTIONS="-o StrictHostKeyChecking=no -q -o ConnectTimeout=15"
SSH="ssh $SSH_OPTIONS -T"
SCP="scp $SSH_OPTIONS"
SI=$(basename $0)
Log() {
echo "`date` [$SI] $@" >> $LOG
}
Run() {
Log "Running '$@' in '`pwd`'"
"$@" 2>&1 | tee -a $LOG
}
RunHide() {
Log "Running '$@' in '`pwd`'"
"$@" >> $LOG 2>&1
}
PrintAndLog() {
Log "$@"
echo "$@"
}
ErrorAndLog() {
Log "[ERROR] $@ "
echo "$@" >&2
}
showMilliseconds(){
date +%s
}
runMethodForDuration(){
local startT=$(showMilliseconds)
$1
local endT=$(showMilliseconds)
local totalT=$((endT-startT))
PrintAndLog "that took $totalT seconds to run $1"
echo $totalT
}
genCallRefLog.sh - generates fictitious callref.log size depending on argument
#!/bin/bash
#Script to make 80000 sequential lines of callref.log this should suffice for a POC
if [ -z "$1" ] ; then
echo "genCallRefLog.sh requires an integer of the number of lines to pump out of callref.log"
exit 1
fi
file="callref.log"
[ -f "$file" ] && rm -f "$file" # del file if exists
i=0 #put start num in here
j="$1" #put end num in here
echo "building $j lines of callref.log"
for (( a=i ; a < j; a++ ))
do
printf 'C-%08x\n' "$a" >> $file
done
genSampleLog.sh generates fictitious sample.log size depending on argument
#!/bin/bash
#Script to make sequential lines of sample.log; this should suffice for a POC
if [ -z "$1" ] ; then
echo "genSampleLog.sh requires an integer of the number of lines to pump out of sample.log"
exit 1
fi
file="sample.log"
[ -f "$file" ] && rm -f "$file" # del file if exists
i=0 #put start num in here
j="$1" #put end num in here
echo "building $j lines of sample.log"
for (( a=i ; a < j; a++ ))
do
printf 'Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: C-%08x got hangup request, cause 31\n' "$a" >> $file
done
and finally the actual test script I used. Often I would comment out the building scripts as they only need to run when changing the log size. I also typically would only run one testing function at a time and record the results.
test.sh
#!/bin/bash
source "./CalFuncs.sh"
targetLogFile="cause_temp.log"
Log "Starting"
checkTargetFileSize(){
expectedS="$1"
hasS=$(cat $targetLogFile | wc -l)
if [ "$expectedS" != "$hasS" ] ; then
ErrorAndLog "Got $hasS but expected $expectedS, when inspecting $targetLogFile"
exit 244
fi
}
standard(){
iter=0
while read ref
do
cat sample.log | grep "$ref" | grep 'got hangup request, cause 31' | grep -o 'C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]' >> $targetLogFile
done < callref.log
}
subStandardVarient(){
iter=0
while read ref
do
cat sample.log | grep 'got hangup request, cause 31' | grep -o "$ref" >> $targetLogFile
done < callref.log
}
newFunction(){
grep -f callref.log sample.log | grep 'got hangup request, cause 31' >> $targetLogFile
}
newFunction4(){
grep 'got hangup request, cause 31' sample.log | grep -of 'callref.log'>> $targetLogFile
}
newFunction5(){
#splitting grep
grep 'got hangup request, cause 31' sample.log > /tmp/somefile
grep -of 'callref.log' /tmp/somefile >> $targetLogFile
}
newFunction2(){
iter=0
while read ref
do
((iter++))
echo "$ref" | grep 'got hangup request, cause 31' | grep -of 'callref.log' >> $targetLogFile
done < sample.log
}
newFunction3(){
iter=0
pat=""
while read ref
do
if [[ "$pat." != "." ]] ; then
pat="$pat|"
fi
pat="$pat$ref"
done < callref.log
# Log "Have pattern $pat"
while read ref
do
((iter++))
echo "$ref" | grep 'got hangup request, cause 31' | grep -oP "$pat" >> $targetLogFile
done < sample.log
#grep: regular expression is too large
}
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
numLines="100000"
Log "testing algorithms with $numLines in each log file."
setupCallRef(){
./genCallRefLog.sh $numLines
}
setupSampleLog(){
./genSampleLog.sh $numLines
}
setupCallRef
setupSampleLog
runMethodForDuration standard > /dev/null
checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration subStandardVarient > /dev/null
checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration newFunction > /dev/null
checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction2 > /dev/null
# checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction3 > /dev/null
# checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction4 > /dev/null
# checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration newFunction5 > /dev/null
checkTargetFileSize "$numLines"
The above shows that the existing method was always faster than anything I came up with. I think someone took care to optimize it.

How to send TCPDUMP output to remote script via CURL

So I have a script here that is taking a TCPDUMP output. We are trying to send (2) variables to a PHP script over the web ($SERVER). The filename header is created and contains both $FILETOSEND which is the filename and filedata. The actual data for the filedata variable is coming from a file called 1 (the data is formatted as you can tell). I am having issues with the section that calls out #send common 10 sec dump.
I am trying to CURL the file 1 and I am doing so by using curl --data "$(cat 1)" $SERVER
The script isn't sending the file 1 at all, mostly just sends the filename and no file data. Is there a problem with the way I am sending the file? Is there a better way to format it?
while true; do
sleep $DATASENDFREQ;
killall -9 tcpdump &> /dev/null
if [ -e $DUMP ]; then
mv $DUMP $DUMP_READY
fi
create_dump
DATE=`date +"%Y-%m-%d_%H-%M-%S"`
FILETOSEND=$MAC-$DATE-$VERSION
# we write fileheader to the file. 2 vars : filename, filedata.
FILEHEADER="filename=$FILETOSEND&filedata="
echo $FILEHEADER > 2
# change all colons to underscores for avoiding Windows filenames issues
sed -i 's/:/_/g' 2
# delete all newlines \n in the file
tr -d '\n' < 2 > 1
# parsing $DUMP_READY to awk.txt (no header in awk.txt)
awk '{ if (NF > 18 && $10 == "signal") {print "{\"mac\": \""$16"\",\"sig\": \""$9"\",\"ver\": \""$8"\",\"ts\": \""$1"\",\"ssid\": \""$19"\"}" }}' $DUMP_READY > awk.txt
sed -i 's/SA://g' awk.txt
sed -i 's/&/%26/g' awk.txt
cat awk.txt >> 1
sync
# send $OFFLINE
if [ -e $OFFLINE ]; then
curl -d $OFFLINE $SERVER
if [ $? -eq "0" ]; then
echo "status:dump sent;msg:offline dump sent"
rm $OFFLINE
else
echo "status:dump not sent;msg:offline dump not sent"
fi
fi
# send common 10 secs dump
curl --data "$(cat 1)" $SERVER
if [ $? -eq "0" ]; then
echo "status:dump sent"
else
cat 1 >> $OFFLINE
echo "status:dump not sent"
fi
if [ -e $DUMP_READY ]; then
rm -f $DUMP_READY 1 2 upload_file*
fi
done
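One possible way to avoid building the POST body by hand is to let curl encode both fields itself. This is only a sketch, not a confirmed fix for the problem described: send_dump is a made-up name, and it assumes the server script reads two form fields called filename and filedata. The --data-urlencode name@file form makes curl read the file, URL-encode its contents and send it as that field.

```shell
#!/bin/bash
# send_dump NAME FILE URL (hypothetical helper):
# POST NAME as the "filename" field and the contents of FILE as "filedata".
send_dump() {
    curl --silent --show-error --fail \
         --data-urlencode "filename=$1" \
         --data-urlencode "filedata@$2" \
         "$3"
}
```

In the loop above this would be send_dump "$FILETOSEND" awk.txt "$SERVER". Since curl does the encoding, replacing & with %26 by hand would no longer be needed, and --fail makes curl return a non-zero status when the server answers with an HTTP error.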

Implementing a datalogger in bash

Hi, I'm a newbie in Bash scripting.
I need to log a data stream from a specific IP address and generate a logfile for each day named "file-$date.log" (i.e. at 00:00:00 UT close the previous day's file and create the one corresponding to the new day).
I need to show the data stream on screen while it is logged to a file.
I tried this solution, but it does not work well because it never closes the initial file.
Apparently the condition check never executes when the first command of the pipe is something other than a constant string like echo "something".
#!/bin/bash
log_data(){
while IFS= read -r line ; do printf '%s %s\n' "$(date -u '+%j %Y-%m-%d %H:%M:%S')" "$line"; done
}
register_data() {
while : ;
do
> stream.txt
DATE=$(date -u "+%j %Y-%m-%d %H:%M")
HOUR=$(date -u "+%H:%M:%S")
file="file-$DATE.log"
while [[ "${HOUR}" != 00:00:00 ]];
do
tail -f stream.txt | tee "${file}"
sleep 1
HOUR=$(date -u "+%H:%M:%S")
done
> stream.txt
done
}
nc -vn $IP $IP_port | log_data >> stream.txt &
register_data
I'll be glad if someone can give me some clues to solve this problem.
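One way to get the daily rollover without tail -f is to choose the file name for every line as it arrives. A minimal sketch, assuming one date call per line is acceptable for the data rate; log_stream is a made-up name.

```shell
#!/bin/bash
# log_stream PREFIX (hypothetical helper): copy stdin to the screen and append
# it to PREFIX-YYYY-MM-DD.log, where the date is the current UTC day.
log_stream() {
    local prefix=${1:-file} line stamp day
    while IFS= read -r line; do
        # One date call per line: the first field names the file,
        # the rest is the timestamp written in front of the line.
        stamp=$(date -u '+%Y-%m-%d %j %Y-%m-%d %H:%M:%S')
        day=${stamp%% *}
        printf '%s %s\n' "${stamp#* }" "$line" | tee -a "$prefix-$day.log"
    done
}
```

It would be used as nc -vn "$IP" "$IP_port" | log_stream file. Because the file name is recomputed for each line, the first line after 00:00:00 UT goes to the new day's file and nothing has to be closed explicitly.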
