Randomly loop over days in bash-script - bash

At the moment, I have a while-loop that takes a starting date, runs a python script with the day as the input, then takes the day + 1 until a certain due date is reached.
day_start=2016-01-01
while [ "$day_start"!=2018-01-01 ] ;
do
day_end=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
python script.py --start="$day_start" --end="$day_end";
day_start=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
done
I would like to do the same thing, but now to pick a random day between 2016-01-01 and 2018-01-01 and repeat until all days have been used once. I think it should be a for-loop instead of this while loop, but I have trouble to specify the for-loop over this date-range in bash. Does anyone have an idea how to formulate this?

Picking dates at random until every date has come up can take quite a long time. This is the coupon collector's problem, a relative of the Birthday Problem: you'll hit most of the dates over and over again, but the last few dates can take a long time to appear.
The best idea I can give you is this:
Create all dates as before in a while loop (only the day_start-line)
Output all dates into a temporary file
Use sort -R on this file ("shuffles" the contents and prints the result)
Loop over the output from sort -R and you'll have dates randomly picked until all were reached.
Here's an example script which incorporates my suggestions:
#!/bin/bash
day_start=2016-01-01
TMPFILE="$(mktemp)"
while [ "$day_start" != "2018-01-01" ] ;
do
# print first, then advance, so 2016-01-01 is included and 2018-01-01 is not
echo "${day_start}"
day_start=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
done > "${TMPFILE}"
sort -R "${TMPFILE}" | while read -r day_start
do
day_end=$(date +"%Y-%m-%d" -d "$day_start + 1 day")
python script.py --start="$day_start" --end="$day_end";
done
rm "${TMPFILE}"
By the way, the spaces around != in while [ "$day_start" != "2018-01-01" ]; matter: without them, [ sees a single non-empty string, the test is always true, and bash never stops your script.
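The same idea also works without a temporary file. The following is only a sketch, assuming GNU date and shuf are available; list_days is a hypothetical helper name, and the echo stands in for the real python call. It uses a one-week range to keep the output short.

```shell
#!/bin/bash
# list_days START END: print every day from START up to, but not including, END.
# (Hypothetical helper name; relies on GNU date.)
list_days() {
    local day=$1 end=$2
    while [ "$day" != "$end" ]; do
        echo "$day"
        day=$(date +%F -d "$day + 1 day")
    done
}

# Shuffle the list once, then visit each day exactly one time.
list_days 2016-01-01 2016-01-08 | shuf | while read -r day_start; do
    day_end=$(date +%F -d "$day_start + 1 day")
    # Replace the echo with the real command.
    echo "would run: python script.py --start=$day_start --end=$day_end"
done
```

Because the list is shuffled rather than sampled, every day is used exactly once and the loop ends after as many iterations as there are days.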

2016 was a leap year, and that is exactly why this works: 2016-01-01 through 2017-12-31 is 366 + 365 = 731 days, i.e. the offsets 0 to 730.
Magic number: 2*365 = 730 is the last offset.
The i % 100 is just there to have less output.
for i in {0..730}; do nd=$(date -d "2016/01/01"+${i}days +%D); if (( i % 100 == 0 || i == 730 )); then echo $nd ; fi; done
01/01/16
04/10/16
07/19/16
10/27/16
02/04/17
05/15/17
08/23/17
12/01/17
12/31/17
With the format instruction (here +%D), you might transform the output to your needs, date --help helps.
In a better readable format, and with +%F:
for i in {0..730}
do
nd=$(date -d "2016/01/01"+${i}days +%F)
echo $nd
done
2016-01-01
2016-01-02
2016-01-03
...
For a random order, use shuf (here, for brevity, with 7 days):
for i in {0..6}; do nd=$(date -d "2016/01/01"+${i}days +%D); echo $nd ;done | shuf
01/04/16
01/07/16
01/05/16
01/01/16
01/03/16
01/06/16
01/02/16
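The upper bound 730 does not have to be a magic number. As a sketch, assuming GNU date, the number of days in a range can be computed from epoch seconds; days_between is a hypothetical helper name, and -u keeps daylight-saving shifts out of the arithmetic.

```shell
#!/bin/bash
# days_between START END: whole days from START to END (END not counted).
# Hypothetical helper; relies on GNU date. UTC avoids 23- or 25-hour days.
days_between() {
    local s e
    s=$(date -u -d "$1" +%s)
    e=$(date -u -d "$2" +%s)
    echo $(( (e - s) / 86400 ))
}

n=$(days_between 2016-01-01 2018-01-01)
echo "$n days, so the offsets run from 0 to $(( n - 1 ))"
```

The loop header then becomes for ((i = 0; i < n; i++)) instead of the hard-coded brace range.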

Related

shell script to modify part of each line to a new value

I have a file containing on each line a date time value
I have a command to change all the values to today's date, but I need to be able to change them not only to today: say, the first 10 lines changed to today, the next 10 lines changed to yesterday's date, and so on.
Could you please help me on this one?
file snippet:
bla|TRANSACTTIME=20181127153310|bla|bla
bla|TRANSACTTIME=20181127153310|bla|bla
bla|TRANSACTTIME=20181127153310|bla|bla
bla|TRANSACTTIME=20181127153310|bla|bla
I think this should work:
#!/bin/bash
set +x
STEP=3 #size of the block you want to modify
DATE_STEP=1 #how many days you want to step
BASEDATE=20181127 #basedate you want to replace
LINES=$(cat $1 | wc -l)
BLOCKS=$((LINES / STEP ))
MODULE=$((LINES % STEP ))
if [ "$MODULE" -ne "0" ];
then
BLOCKS=$((BLOCKS + 1))
fi
START=1
END=$STEP
ADD_DAYS=0
for i in $(seq 1 $BLOCKS);
do
NEWDATE=$(date +'%Y%m%d' -d"today+$ADD_DAYS days")
#sed is used twice, first to get the required lines and then to do the replacement
sed -n ${START},${END}p $1 | sed s/$BASEDATE/$NEWDATE/
START=$((END + 1))
END=$((END + STEP))
ADD_DAYS=$((ADD_DAYS + DATE_STEP))
done
output goes directly to stdout
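A single pass over the file is also possible. This is a rough sketch, not the answer's method: restamp is a hypothetical function name, it assumes GNU date and the TRANSACTTIME=YYYYMMDDhhmmss layout from the question, and it steps backwards one day per block, as the question asks.

```shell
#!/bin/bash
# restamp BLOCK_SIZE [BASE_DAY]: read lines on stdin and replace the date part
# of TRANSACTTIME=YYYYMMDDhhmmss. The first block gets BASE_DAY (default:
# today), the next block the day before, and so on.
restamp() {
    local step=$1 base=${2:-today} n=0 day line
    while IFS= read -r line; do
        if (( n % step == 0 )); then
            # One date call per block, not per line.
            day=$(date -d "$base $(( n / step )) day ago" +%Y%m%d)
        fi
        # The eight ? match the old YYYYMMDD; the time of day is kept.
        printf '%s\n' "${line/TRANSACTTIME=????????/TRANSACTTIME=$day}"
        n=$(( n + 1 ))
    done
}

restamp 2 <<'EOF'
bla|TRANSACTTIME=20181127153310|bla|bla
bla|TRANSACTTIME=20181127153310|bla|bla
bla|TRANSACTTIME=20181127153310|bla|bla
EOF
```

Unlike the sed version, this does not need to know the old date in advance, and it reads the input only once.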

Is it really slow to handle text file(more than 10K lines) with shell script?

I have a file with more than 10K lines of record.
Within each line, there are two date+time info. Below is an example:
"aaa bbb ccc 170915 200801 12;ddd e f; g; hh; 171020 122030 10; ii jj kk;"
I want to filter out the lines where the two dates are less than 30 days apart.
Below is my source code:
#!/bin/bash
filename="$1"
echo $filename
touch filterfile
totalline=`wc -l $filename | awk '{print $1}'`
i=0
j=0
echo $totalline lines
while read -r line
do
i=$[i+1]
if [ $i -gt $[j+9] ]; then
j=$i
echo $i
fi
shortline=`echo $line | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'`
date1=`echo $shortline | awk '{print $1}'`
date2=`echo $shortline | awk '{print $2}'`
if [ $date1 -gt 700000 ]
then
continue
fi
d1=`date -d $date1 +%s`
d2=`date -d $date2 +%s`
diffday=$[(d2-d1)/(24*3600)]
#diffdays=`date -d $date2 +%s` - `date -d $date1 +%s`)/(24*3600)
if [ $diffday -lt 30 ]
then
echo $line >> filterfile
fi
done < "$filename"
I am running it in Cygwin. It took about 10 seconds to handle 10 lines. I use echo $i to show the progress.
Is it because I am doing something wrong in my script?
This answer does not answer your question but gives an alternative method to your shell script. The answer to your question is given by Sundeep's comment :
Why is using a shell loop to process text considered bad practice?
Furthermore, you should be aware that every time you call sed, awk, echo, date, ... you are asking the system to start a new process, which means loading a binary into memory and so on. If you do this inside a loop, once or several times per line, it is very inefficient.
alternative solution
awk programs are commonly used to process log files containing timestamp information, indicating when a particular log record was written. gawk extended the awk standard with time-handling functions. The one you are interested in is :
mktime(datespec [, utc-flag ]) Turn datespec into a timestamp in the
same form as is returned by systime(). It is similar to the function
of the same name in ISO C. The argument, datespec, is a string of the
form "YYYY MM DD HH MM SS [DST]". The string consists of six or seven
numbers representing, respectively, the full year including century,
the month from 1 to 12, the day of the month from 1 to 31, the hour of
the day from 0 to 23, the minute from 0 to 59, the second from 0 to
60, and an optional daylight-savings flag.
The values of these numbers need not be within the ranges specified;
for example, an hour of -1 means 1 hour before midnight. The
origin-zero Gregorian calendar is assumed, with year 0 preceding year
1 and year -1 preceding year 0. If utc-flag is present and is either
nonzero or non-null, the time is assumed to be in the UTC time zone;
otherwise, the time is assumed to be in the local time zone. If the
DST daylight-savings flag is positive, the time is assumed to be
daylight savings time; if zero, the time is assumed to be standard
time; and if negative (the default), mktime() attempts to determine
whether daylight savings time is in effect for the specified time.
If datespec does not contain enough elements or if the resulting time
is out of range, mktime() returns -1.
As your date format is of the form yymmdd HHMMSS, we need to write a parser function convertTime for this. Be aware that we will pass this function times of the form yymmddHHMMSS, without the space. With space-delimited fields, your times are located in fields $4 $5 and $11 $12. As mktime converts the time to seconds since 1970-01-01, all we need to do is check whether the time difference is smaller than 30*24*3600 seconds.

awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ t1=convertTime($4$5); t2=convertTime($11$12)}
(t2-t1 < 30*3600*24) { print }' <file>
If you are not interested in the real time difference (your sed line removes the actual time of the day), then you can adapt it to:
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s "00 00 00"
return mktime(s)
}
{ t1=convertTime($4); t2=convertTime($11)}
(t2-t1 < 30*3600*24) { print }' <file>
If the dates are not in the fields, you can use match to find them :
awk 'function convertTime(t) {
s="20"substr(t,1,2)" "substr(t,3,2)" "substr(t,5,2)" "
s= s substr(t,7,2)" "substr(t,9,2)" "substr(t,11,2)
return mktime(s)
}
{ match($0,/[0-9]{6} [0-9]{6}/);
# drop the space between date and time before converting
t1=convertTime(substr($0,RSTART,6) substr($0,RSTART+7,6));
a=substr($0,RSTART+RLENGTH)
match(a,/[0-9]{6} [0-9]{6}/)
t2=convertTime(substr(a,RSTART,6) substr(a,RSTART+7,6))}
(t2-t1 < 30*3600*24) { print }' <file>
With some modifications, often without speed in mind, I can reduce the processing time by 50% - which is a lot:
#!/bin/bash
filename="$1"
echo "$filename"
# touch filterfile
totalline=$(wc -l < "$filename")
i=0
j=0
echo "$totalline" lines
while read -r line
do
i=$((i+1))
if (( i > ((j+9)) )); then
j=$i
echo $i
fi
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
if (( date1 > 700000 ))
then
continue
fi
d1=$(date -d "$date1" +%s)
d2=$(date -d "$date2" +%s)
diffday=$(((d2-d1)/(24*3600)))
# diffdays=$(date -d $date2 +%s) - $(date -d $date1 +%s))/(24*3600)
if (( diffday < 30 ))
then
echo "$line" >> filterfile
fi
done < "$filename"
Some remarks:
# touch filterfile
Well - the later CMD >> filterfile appends to this file and creates it if it doesn't exist, so the touch is not needed.
totalline=$(wc -l < "$filename")
You don't need awk here. wc does not print a filename when it reads from standard input, because it never sees one.
Capturing the output in an array:
shortline=($(echo "$line" | sed 's/.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*\([0-9]\{6\}\)[ ][0-9]\{6\}.*/\1 \2/'))
date1=${shortline[0]}
date2=${shortline[1]}
allows us array access and saves another call to awk.
On my machine, your code took about 42s for 2880 lines (on your machine 2880 s?) and about 19s for the same file with my code.
So I suspect, unless you are running it on an i486 machine, that Cygwin itself is the slowdown. It's a Unix-like environment for Windows, isn't it? Well, I'm on a native Linux system. Maybe try the GNU utilities for Windows: the last time I looked for them, they were advertised as 32-bit builds, but maybe there is a 64-bit version available by now.
And the next thing I would have a look at, is the date calculation - that might be a slowdown too.
2880 lines isn't that much, so I don't suspect that my SSD plays a huge role here.
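If external processes are the main cost, the whole filter can be kept inside bash. The sketch below is an illustration under stated assumptions, not either answer's method: day_number and filter_close_dates are hypothetical names, every year is assumed to be 20YY, and only the two dates are compared (the time of day is ignored, as in the original sed line). It starts no external command and no subshell per line.

```shell
#!/bin/bash
# day_number YYMMDD: store in DAYS a running day count for the date (assumed
# to be in 20YY), using shell arithmetic only. Only differences between two
# results are meaningful.
day_number() {
    local y=$(( 2000 + 10#${1:0:2} )) m=$(( 10#${1:2:2} )) d=$(( 10#${1:4:2} ))
    # Let the year start in March so that the leap day comes last.
    if (( m <= 2 )); then y=$(( y - 1 )); m=$(( m + 12 )); fi
    DAYS=$(( 365*y + y/4 - y/100 + y/400 + (153*(m - 3) + 2)/5 + d ))
}

# filter_close_dates: copy stdin to stdout, keeping only records whose two
# "YYMMDD HHMMSS" stamps are less than 30 days apart.
filter_close_dates() {
    local re='([0-9]{6}) [0-9]{6}.*([0-9]{6}) [0-9]{6}' line d1
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue
        day_number "${BASH_REMATCH[1]}"; d1=$DAYS
        day_number "${BASH_REMATCH[2]}"
        if (( DAYS - d1 < 30 )); then
            printf '%s\n' "$line"
        fi
    done
}

# Usage: filter_close_dates < "$filename" > filterfile
```

The result is passed back through the global DAYS on purpose: capturing it with $(...) would fork a subshell for every call.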

How could I use bash to work out how many tuesdays there are in a month? [duplicate]

I need to sort data on a weekly basis and all I have are dates in a logfile.
Therefore, to sort the data per week, I would like to create a list with the dates of all Mondays for a given year. I have tried to work something out, and the only idea I currently have is to use ncal with year and month as arguments, looping over all months and extracting all Mondays. Isn't there a more efficient way?
To get all mondays, by getting all dates and filtering by Mondays:
for i in `seq 0 365`
do date -d "+$i day"
done | grep Mon
Of course, you could also take a monday and keep incrementing by 7 days.
I hope that's what you mean. The script below can be changed to vary the output format of the dates.
The date command can be used for that; I don't know whether ncal is any more or less efficient.
I know you went for "binning" now, but here is a more readable version.
$ cat /tmp/1.sh
#!/bin/bash
test -z "$year" && {
echo "I expect you to set \$year environment variable"
echo "In return I will display you the Mondays of this year"
exit 1
}
# change me if you would like the date format to be different
# man date would tell you all the combinations you can use here
DATE_FORMAT="+%Y-%m-%d"
# change me if you change the date format above. I need to be
# able to extract the year from the date I'm showing you
GET_YEAR="s/-.*//"
# this value is a week, in seconds. Changing it would change
# what I'm doing.
WEEK_INC=604800
# Use another 3-digit week day name here, to see dates for other week days
DAY_OF_WEEK=Mon
# stage 1, let's find us the first day of the week in this year
d=1
# is it DAY_OF_WEEK yet?
while test "$(date -d ${year}-1-${d} +%a)" != "$DAY_OF_WEEK"; do
# no, so let's look at the next day
d=$((d+1));
done;
# let's ask for the epoch seconds of that DAY_OF_WEEK that I found above
umon=$(date -d ${year}-1-${d} +%s)
# let's loop until we break from inside
while true; do
# ndate is the date that we are testing right now
ndate=$(date -d "@$umon" "$DATE_FORMAT");
# let's extract year
ny=$(echo $ndate|sed "$GET_YEAR");
# did we go over this year? If yes, then break out
test $ny -ne $year && { break; }
# move on to next week
umon=$((umon+WEEK_INC))
# display the date so far
echo "$ndate"
done
No need to iterate over all 365 or 366 days in the year. The following executes date at most 71 times.
#!/bin/bash
y=2011
for d in {0..6}
do
if (( $(date -d "$y-1-1 + $d day" '+%u') == 1)) # +%w: Mon == 1 also
then
break
fi
done
for ((w = d; w <= $(date -d "$y-12-31" '+%j') - 1; w += 7))
do
date -d "$y-1-1 + $w day" '+%Y-%m-%d'
done
Output:
2011-01-03
2011-01-10
2011-01-17
2011-01-24
2011-01-31
2011-02-07
2011-02-14
2011-02-21
2011-02-28
2011-03-07
. . .
2011-11-28
2011-12-05
2011-12-12
2011-12-19
2011-12-26
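The number of date invocations can be cut further, because GNU date reads one date per line from standard input when given -f -. A minimal sketch along those lines, assuming GNU date (mondays_of_year is a hypothetical function name):

```shell
#!/bin/bash
# mondays_of_year YEAR: print every Monday of YEAR as YYYY-MM-DD.
# Uses three date calls in total, whatever the year.
mondays_of_year() {
    local y=$1 dow offset last w
    dow=$(date -d "$y-01-01" +%u)                     # 1 = Monday ... 7 = Sunday
    offset=$(( (8 - dow) % 7 ))                       # days from Jan 1 to the first Monday
    last=$(( 10#$(date -d "$y-12-31" +%j) - 1 ))      # offset of Dec 31 (364 or 365)
    for (( w = offset; w <= last; w += 7 )); do
        echo "$y-01-01 + $w day"
    done | date -f - +%F                              # one date process formats all lines
}

mondays_of_year 2011
```

For 2011 this prints the same 52 dates as the output listed above, from 2011-01-03 to 2011-12-26.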
Here is another option that I've come up with, based on the above answers. The start and end date can now be specified.
#!/bin/bash
datestart=20110101
dateend=20111231
for tmpd in {0..6}
do
date -d "$datestart $tmpd day" | grep -q Mon
if [ $? = 0 ];
then
break
fi
done
for ((tmpw = $tmpd; $(date -d "$datestart $tmpw day" +%s) <= $(date -d "$dateend" +%s); tmpw += 7))
do
echo `date -d "$datestart $tmpw day" +%d-%b-%Y`
done
You can get the current week number using date. Maybe you can sort on that:
$ date +%W -d '2011-02-18'
07

how do you loop through a certain date range

I have files in a directory that are date based but not obviously date-stamped.
File_yyyymmdd_record.log
These are lying around in a directory for a few years worth of time.
Now if these were simply numbers, all I would need to do is take the difference and increment a counter to step through the values
var1=substring( File_yyyymmdd_record.log ) /* get the yyyymmdd part */
var2=substring( File2_yyyymmdd_record.log ) /* get the yyyymmdd part */
delta=var2-var1
set i=delta and loop through to get the values for all these recordID's ( record ID is the yyyymmdd part )
The problem is if I have 2 different months and also years in the directory say 20131210 and 20140110
the difference not going to gimme all the recordID's in that directory , since, when it spills over to the next month the plain numeric calculation is not applicable- it should be a date based calculation.
what I want to do is use 2 input parameters to the shell
shell.sh recordID1 recordID2
and based on these it will find all records and store them some place and loop through each record as an input like this
find <dir> -iname recordID* ...<some awk and sed here> |
while read recordID ;
do <stuff >
done
Anyway this can be achieved esp in 2 contexts-
First the date calculation part and the other is to store these recordID's so I can cycle through them. Maybe echo them to a tmp file is what comes off the bat.
For the date calculation part - I tried this and it works . But not sure if it will falter some time / situation
echo $((($(date -u -d 2010-04-29 +%s) - $(date -u -d 2010-03-28 +%s)) / 86400))
So given recordID1 as 20100328 I have 32 days recordID's to look for in that directory.
You have to advance dates for 32 days from recordID1 and store them some place.
How best can all this be done.
I get your point: you need to find the log files whose names carry a date between 20131210 and 20140110.
(no need convert to epoch time)
#! /usr/bin/bash
sep=20131210
eep=20140110
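find /DIR -type f -name "*.log" |while read -r file
do
name=${file##*/} # strip the directory part
d=${name:5:8} # yyyymmdd follows the 5-character prefix "File_"
if [ "$d" -ge "$sep" ] && [ "$d" -le "$eep" ]; then
echo "$file" # do <stuff> here
fi
done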
Something like this should do:
s=20130102 # start date
e=20130202 # end date
sep=$(date +"%s" -d"$s") # conv to epoch
eep=$(date +"%s" -d"$e")
for f in *.log; do
d=$(date +"%s" -d$(sed -n 's/^[^_]*_\([^_]*\)_[^_]*.log/\1/p' <<< "$f"))
if [ "$d" -ge "$sep" ] && [ "$d" -le "$eep" ]; then
echo $f
fi
done
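If you would rather generate the record IDs themselves, as the question describes, and then look for a file per ID, something like the following sketch may do. It assumes GNU date; record_ids is a hypothetical helper name, and the file name pattern is taken from the question.

```shell
#!/bin/bash
# record_ids START END: print every yyyymmdd from START to END, inclusive.
# Hypothetical helper; relies on GNU date for the calendar arithmetic.
record_ids() {
    local day=$1 end=$2
    while [ "$day" -le "$end" ]; do
        echo "$day"
        day=$(date -d "$day +1 day" +%Y%m%d)
    done
}

# Visit each record ID in the range and act on the files that exist.
record_ids 20131210 20140110 | while read -r id; do
    f="File_${id}_record.log"
    if [ -e "$f" ]; then
        echo "found $f"     # do <stuff> here
    fi
done
```

The plain numeric comparison is safe here because yyyymmdd values sort in calendar order; only the step from one day to the next needs date.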

bash shell date parsing, start with specific date and loop through each day in month

I need to create a bash shell script starting with a day and then loop through each subsequent day, formatting that output as %Y_%m_%d
I figure I can submit a start day and then another param for the number of days.
My issue/question is how to set a DATE (that is not now) and then add a day.
so my input would be 2010_04_01 6
my output would be
2010_04_01
2010_04_02
2010_04_03
2010_04_04
2010_04_05
2010_04_06
[radical@home ~]$ cat a.sh
#!/bin/bash
START=`echo $1 | tr -d _`;
for (( c=0; c<$2; c++ ))
do
echo -n "`date --date="$START +$c day" +%Y_%m_%d` ";
done
Now if you call this script with your params it will return what you wanted:
[radical@home ~]$ ./a.sh 2010_04_01 6
2010_04_01 2010_04_02 2010_04_03 2010_04_04 2010_04_05 2010_04_06
Very basic bash script should be able to do this:
#!/bin/bash
start_date=20100501
num_days=5
for i in `seq 1 $num_days`
do
date=`date +%Y/%m/%d -d "${start_date}-${i} days"`
echo $date # Use this however you want!
done
Output:
2010/04/30
2010/04/29
2010/04/28
2010/04/27
2010/04/26
Note: NONE of the solutions here will work with OS X. You would need, for example, something like this:
date -v-1d +%Y%m%d
That would print out yesterday for you. Or with underscores of course:
date -v-1d +%Y_%m_%d
So taking that into account, you should be able to adjust some of the loops in these examples with this command instead. -v option will easily allow you to add or subtract days, minutes, seconds, years, months, etc. -v+24d would add 24 days. and so on.
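One way to make a loop work on both systems is to hide the difference in a small function. This is only a sketch: add_days is a hypothetical name, the GNU/BSD detection via date --version is a common heuristic rather than a guarantee, and the BSD branch uses -j (do not set the clock) with -f (input format) alongside -v.

```shell
#!/bin/bash
# add_days YYYY_MM_DD N: print the date N days after the given one,
# in the same underscore format. Hypothetical helper.
add_days() {
    if date --version >/dev/null 2>&1; then
        # GNU date: -d understands relative expressions.
        date -d "${1//_/-} + $2 day" +%Y_%m_%d
    else
        # Assumed BSD/macOS date: parse with -f, adjust with -v.
        date -j -v+"$2"d -f %Y_%m_%d "$1" +%Y_%m_%d
    fi
}

for (( i = 0; i < 6; i++ )); do
    add_days 2010_04_01 "$i"
done
```

Other date implementations (for example BusyBox) fit neither branch and would need their own case.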
#!/bin/bash
inputdate="${1//_/-}" # change underscores into dashes
for ((i=0; i<$2; i++))
do
date -d "$inputdate + $i day" "+%Y_%m_%d"
done
You can also use cal, for example
YYYY=2014; MM=02; for d in $(cal $MM $YYYY | grep "^ *[0-9]"); do DD=$(printf "%02d" $d); echo $YYYY$MM$DD; done
(originally posted here on my commandlinefu account)
You can pass a date via command line option -d to GNU date handling multiple input formats:
http://www.gnu.org/software/coreutils/manual/coreutils.html#Date-input-formats
Pass starting date as command line argument or use current date:
underscore_date=${1:-$(date +%Y_%m_%d)}
date=${underscore_date//_/-}
for days in $(seq 0 6);do
date -d "$date + $days days" +%Y_%m_%d;
done
you can use gawk
#!/bin/bash
DATE=$1
num=$2
awk -vd="$DATE" -vn="$num" 'BEGIN{
m=split(d,D,"_")
# start at noon so that a daylight-saving shift cannot change the date
t=mktime(D[1]" "D[2]" "D[3]" 12 00 00")
print d
# d itself is the first of the n days, so add n-1 more
for(i=1;i<n;i++){
t+=86400
print strftime("%Y_%m_%d",t)
}
}'
output
$ ./shell.sh 2010_04_01 6
2010_04_01
2010_04_02
2010_04_03
2010_04_04
2010_04_05
2010_04_06
