I've got a series of files named HHMMSSxxxxxxxxxxxxxxxx.mp3, where HH, MM, and SS are parts of a timestamp and the x's are unique per file.
The timestamp uses a 24-hour format (where 10am is 100000, 12pm is 120000, 6pm is 180000, 10pm is 220000, etc). I'd like to shift each down by 10 hours, so that 10am is 000000, 12pm is 020000, etc.
I know basic bash commands for renaming and moving files, etc, but I can't figure out how to do the modular arithmetic on the filenames.
Any help would be very much appreciated.
#!/bin/bash
for f in *.mp3
do
printf -v newhour '%02d' $(( ( 10#${f:0:2} + 14 ) % 24 ))
echo mv "$f" "$newhour${f:2}"
done
Remove the echo to make it functional.
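For example, a dry run on a file named 103000xxxxxxxxxxxxxxxx.mp3 (10am) prints:
mv 103000xxxxxxxxxxxxxxxx.mp3 003000xxxxxxxxxxxxxxxx.mp3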
Explanation:
printf -v newhour '%02d' - this works like sprintf(); the formatted value is stored in the named variable
$(( ( 10#${f:0:2} + 14 ) % 24 )) - 10# forces the number to base 10 (e.g. 08 would otherwise be treated as an invalid octal number), ${f:0:2} extracts the first two characters (the hour), and adding 14 modulo 24 is the same as subtracting 10
"$newhour${f:2}" - prepends the new hour to the substring of the original name starting at the third character
The easiest way is probably to extract the timestamp and use date to turn it into a number of seconds, do normal math on the result, then convert it back. date -d datestring +format lets you do these conversions.
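A minimal sketch of that approach (assuming GNU date; keep the echo for a dry run, as above):
for f in *.mp3
do
    secs=$(date -d "${f:0:2}:${f:2:2}:${f:4:2}" +%s)   # today at the HH:MM:SS from the name
    new=$(date -d "@$(( secs - 10*3600 ))" +%H%M%S)    # 10 hours earlier, reformatted
    echo mv "$f" "$new${f:6}"
done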
Related
I want to round down the minutes to the nearest 15 min interval i.e. 00,15,30,45. I'm currently doing the below:
echo $(date +'%Y/%m/%d/%H/')$((($(($(date +'%M') / 15))-0)*15))
But during the first 1-14 minutes of an hour, I get "2021/11/03/21/0" instead of "2021/11/03/21/00".
Also, I'm not sure if this is the best way to do this. Are there any alternatives?
Would you please try the following:
mod=$(( 10#$(date +%M) % 15 ))
date -d "-${mod} minutes" +%Y/%m/%d/%H/%M
The variable mod holds the remainder of the minutes divided by 15.
Subtracting mod minutes then rounds down to the nearest 15-minute interval; run at 21:07, for example, mod is 7 and date prints 2021/11/03/21/00.
[Edit]
The manpage of crontab says:
Percent-signs (%) in the command, unless
escaped with backslash (\), will be changed into newline
characters, and all data after the first % will be sent to
the command as standard input.
If you want to execute the command within crontab, please modify the command as:
mod=$(( 10#$(date +\%M) \% 15 ))
date -d "-${mod} minutes" +\%Y/\%m/\%d/\%H/\%M
[Edit2]
If you want to embed the code in a crontab file, please add a line which looks like:
0 12 * * * username bash -c 'mod=$(( 10#$(date +\%M) \% 15 )); DATEVAR=$(date -d "-${mod} minutes" +\%Y/\%m/\%d/\%H/\%M); write.sh "$DATEVAR"'
Please modify the execution time/date and the username accordingly.
The default shell used to execute crontab commands may be /bin/sh, so you need to invoke /bin/bash explicitly (as above) to run bash-specific syntax.
My apologies that the backslash in front of % 15 (the modulo operation) was missing in my previous post.
Another approach:
min=$(printf "%02d" $(( ($(date +'%M') / 15) * 15 )))
echo "$(date +'%Y/%m/%d/%H/')$min"
date -d "@$((($(date +%s) + 450) / 900 * 900))"
This uses the properties of integer division to “subtract a modulus” and adds half of the desired interval to (almost) mimic a rounding operation.
A bit of extra sub-second rounding precision (for no good reason) can be achieved by also taking %N (nanoseconds) into account. It makes no difference here, because half of the rounding interval (450 seconds) is already a whole multiple of the default epoch resolution (1 second). (If the desired rounding interval were an odd number of seconds, the following would increase the rounding precision.)
date -d "@$((($(date +%s%N) + 45*10**10) / (9*10**11) * 900))"
Pure bash, bash version 4.3 or higher:
printf '%(%Y/%m/%d/%H/%M)T\n' "$(( $(printf '%(%s)T') /(15*60)*(15*60) ))"
Using GNU date (any bash version or POSIX shell):
date -d "@$(( $(date +%s) / (15*60) * (15*60) ))" +%Y/%m/%d/%H/%M
Truncates the current epoch date (seconds since 1970-01-01 00:00:00) to a 15 minute (900 second) interval, then converts to desired format.
Retrieves the current date/time once only.
If you build a date/time from two separate date calls, the result can be wrong when a unit ticks over between them.
The printf date-time format string was added in bash 4.2, and was changed in 4.3 to print the current time if no input date is given.
Note that bash arithmetic treats numbers that start with zero as octals, and numbers like 08 and 09 will cause an error (because they are not octal numbers).
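For example:
$ echo $(( 08 ))
bash: 08: value too great for base (error token is "08")
$ echo $(( 10#08 ))
8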
I am translating a PowerShell script into Bash.
This is how the ticks for current datetime are obtained in PowerShell:
[System.DateTime]::Now.Ticks;
By following the definition of Ticks, this is how I am trying to approximate the same calculation using the date command in bash:
echo $(($(($(date -u '+%s') - $(date -d "0001-01-01T00:00:00.0000000 UTC" '+%s'))) * 10000000 ))
This is what I got the last time I tried:
$ echo $(($(($(date -u '+%s') - $(date -d "0001-01-01T00:00:00.0000000 UTC" '+%s'))) * 10000000 )) ; pwsh -c "[System.DateTime]::Now.Ticks;"
637707117310000000
637707189324310740
In particular, the first 7 digits are identical, but the digits in positions 8 and 9 still differ noticeably between the two values.
I calculated that this corresponds to roughly a 2-hour difference between the two values. But why? It can't be the timezone, since I specified UTC in both date commands, right? What do you think?
Note: my suspicions about the timezone are growing, since I am currently based in UTC+2 (hence a 2-hour offset from UTC), but how is this possible when I explicitly specified UTC as the timezone in the date commands?
Solved it! The problem wasn't in the date commands, it was in the PowerShell command, which was using the +2 timezone (CEST). To fix this, I am now using UtcNow instead of Now.
This is what I am getting now:
$ echo $(($(($(date -u '+%s') - $(date -d "0001-01-01T00:00:00.0000000 UTC" '+%s'))) * 10000000 )) ; pwsh -c "[System.DateTime]::UtcNow.Ticks;"
637707132410000000
637707132415874110
As you can see, the digits are now all identical except the last 7, which are zeros on purpose from converting seconds to ticks, as I am not interested in fractions of a second (for now) and consider them negligible.
Alternative way
Another way to make the two values identical (still ignoring fractions of a second) is to remove the -u option from the first date command so that it uses the current time zone, and to replace UTC with +0200 in the second date command. With this, I can keep Now in the PowerShell command (instead of replacing it with UtcNow).
By doing this, I am getting:
$ echo $(($(($(date '+%s') - $(date -d "0001-01-01T00:00:00.0000000 +0200" '+%s'))) * 10000000)) ; pwsh -c "[System.DateTime]::Now.Ticks;"
637707218060000000
637707218067248090
If you also want fractions of seconds
I just realized that if you also need to consider fractions of a second, you just need to add the result of date '+%N' (nanoseconds) divided by 100 to the calculation, in either of the two approaches shown above.
Since the result of date '+%N' can have leading zeros, Bash may treat it as an octal value. To avoid this, prepend 10# to mark it explicitly as base 10.
For example, taking the second approach shown above (the "alternative way"), now I get:
$ echo $(($(($(date '+%s') - $(date -d "0001-01-01T00:00:00.0000000 +0200" '+%s'))) * 10000000 + $((10#$(date '+%N')/100)) ))
637707225953311420
I'm trying to get the week number of last week. The following command used to work, but now I'm getting an error.
lastweeknumber=$((`date +%V`-1))
bash: 09: value too great for base (error token is "09")
This week's number is 09, so I tried to force decimal by adding 10#, like this: $(10#(date +%V)), but it's not working.
How to fix this?
Consider the following, which uses bash's built-in functionality in place of the external date command, and thus requires a recent shell release but is much faster to run (and behaves consistently regardless of which version of date is installed).
With that done, though, there's still a need to strip the leading 0 -- which a parameter expansion does just fine:
printf -v seconds_now '%(%s)T' -1
printf -v weeknum_lastweek '%(%V)T' "$(( seconds_now - (60 * 60 * 24 * 7) ))"
echo "The index of last week is ${weeknum_lastweek#0}"
It is because date +%V returns 09 and the shell interprets any value starting with 0 as an octal number. Note that 09 is an invalid octal number, hence the error value too great for base.
You can just force base-10 arithmetic in (( ... )):
echo $(( 10#$(date +%V) - 1 ))
8
Another way, which also handles the wrap-around at a year boundary correctly (in ISO week 01, the subtraction above would yield 0, whereas this reports week 52 or 53 of the previous year):
lastweeknumber=$(date -d "1 week ago" +%V)
I have a date that is inputted into a bash script through 3 separate command line arguments. The user could put in either 1 or 2 digits for the month and day and 4 digits for the year (e.g. 2014 01 01 or 2014 1 1). But my script needs two digits to run. I was thinking of using an if statement to handle this: "if the number of digits in the month is less than 2, then put a leading zero in front of it". However, I am unsure how to determine the number of digits in bash. I am rather new to scripting, so any help would be greatly appreciated!
You could solve this using the printf command:
month=5
month2=`printf %02d $month`
The %02d format specifier formats $month as two decimal digits with a leading zero if necessary.
For any bash variable $x (including parameters $1, $2, etc.), ${#x} is the length of $x in characters.
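So the if statement you described could look like this sketch (assuming the year, month, and day arrive as $1, $2, and $3):
year=$1; month=$2; day=$3
if (( ${#month} < 2 )); then month="0$month"; fi    # pad a 1-digit month
if (( ${#day} < 2 )); then day="0$day"; fi          # pad a 1-digit day
echo "$year $month $day"                            # e.g. 2014 1 1 -> 2014 01 01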
I have a set of mail logs: mail.log mail.log.0 mail.log.1.gz mail.log.2.gz
each of these files contain chronologically sorted lines that begin with timestamps like:
May 3 13:21:12 ...
How can I easily grab every log entry after a certain date/time and before another date/time using bash (and related command line tools) without comparing every single line? Keep in mind that my before and after dates may not exactly match any entries in the logfiles.
It seems to me that I need to determine the offset of the first line greater than the starting timestamp, and the offset of the last line less than the ending timestamp, and cut that section out somehow.
Convert your min/max dates into "seconds since epoch",
MIN=`date --date="$1" +%s`
MAX=`date --date="$2" +%s`
Convert the first n words in each log line to the same,
L_DATE=`echo "$LINE" | awk '{print $1, $2, ..., $n}'`
L_DATE=`date --date="$L_DATE" +%s`
Compare and throw away lines until you reach MIN,
if (( $MIN > $L_DATE )) ; then continue ; fi
Compare and print lines until you reach MAX,
if (( $L_DATE <= $MAX )) ; then echo "$LINE" ; fi
Exit when you exceed MAX.
if (( $L_DATE > $MAX )) ; then exit 0 ; fi
The whole script minmaxlog.sh looks like this,
#!/usr/bin/env bash
MIN=`date --date="$1" +%s`
MAX=`date --date="$2" +%s`
while IFS= read -r LINE ; do
    if [ "$LINE" = "" ] ; then break ; fi
    # the first four fields are the timestamp, e.g. "May 5 12:23:45 2009"
    L_DATE=`echo "$LINE" | awk '{print $1 " " $2 " " $3 " " $4}'`
    L_DATE=`date --date="$L_DATE" +%s`
    if (( $MIN > $L_DATE )) ; then continue ; fi
    if (( $L_DATE <= $MAX )) ; then echo "$LINE" ; fi
    if (( $L_DATE > $MAX )) ; then break ; fi
done
I ran it on this file minmaxlog.input,
May 5 12:23:45 2009 first line
May 6 12:23:45 2009 second line
May 7 12:23:45 2009 third line
May 9 12:23:45 2009 fourth line
June 1 12:23:45 2009 fifth line
June 3 12:23:45 2009 sixth line
like this,
./minmaxlog.sh "May 6" "May 8" < minmaxlog.input
Here's one basic idea of how to do it:
Examine the datestamp on the file to see if it is irrelevant
If it could be relevant, unzip if necessary and examine the first and last lines of the file to see if it contains the start or finish time.
If it does, use a recursive function to determine if it contains the start time in the first or second half of the file. Using a recursive function I think you could find any date in a million line logfile with around 20 comparisons.
echo the logfile(s) in order from the offset of the first entry to the offset of the last entry (no more comparisons)
What I don't know is: how to best read the nth line of a file (how efficient is it to use tail -n +N | head -1?)
Any help?
You have to look at every single line in the range you want (to tell if it's in the range you want) so I'm guessing you mean not every line in the file. At a bare minimum, you will have to look at every line in the file up to and including the first one outside your range (I'm assuming the lines are in date/time order).
This is a fairly simple pattern:
state = preprint
for every line in file:
    if line.date >= startdate:
        state = print
    if line.date > enddate:
        exit for loop
    if state == print:
        print line
You can write this in awk, Perl, Python, even COBOL if you must, but the logic is always the same.
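For instance, a sketch of that pattern as an awk one-liner (this assumes a lexically sortable timestamp in the first field, such as 2009-05-06; syslog-style "May 3" stamps would need converting first):
awk -v start="2009-05-06" -v end="2009-05-08" '
    $1 >= start { p = 1 }    # reached the start of the range
    $1 > end    { exit }     # past the end: stop reading the rest of the file
    p           { print }
' logfile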
Locating the line numbers first (with, say, grep) and then blindly printing out that line range won't help, since grep also has to look at all the lines (all of them, not just up to the first outside the range, and most likely twice: once for the first line and once for the last).
If this is something you're going to do quite often, you may want to consider shifting the effort from 'every time you do it' to 'once, when the file is stabilized'. An example would be to load up the log file lines into a database, indexed by the date/time.
That takes a while to get set up but will result in your queries becoming a lot faster. I'm not necessarily advocating a database - you could probably achieve the same effect by splitting the log files into hourly logs thus:
2009/
01/
01/
0000.log
0100.log
: :
2300.log
02/
: :
Then for a given time, you know exactly where to start and stop looking. The range 2009/01/01-15:22 through 2009/01/05-09:07 would result in:
some (the last bit) of the file 2009/01/01/1500.log.
all of the files 2009/01/01/1[6-9]*.log.
all of the files 2009/01/01/2*.log.
all of the files 2009/01/0[2-4]/*.log.
all of the files 2009/01/05/0[0-8]*.log.
some (the first bit) of the file 2009/01/05/0900.log.
Of course, I'd write a script to return those lines rather than trying to do it manually each time.
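A sketch of what such a script could run for the example range (file layout as above; START and END are hypothetical line offsets you would still need to locate within the two partial files):
tail -n +"$START" 2009/01/01/1500.log               # the last bit of the first file
cat 2009/01/01/1[6-9]*.log 2009/01/01/2*.log \
    2009/01/0[2-4]/*.log 2009/01/05/0[0-8]*.log     # the whole files in between
head -n "$END" 2009/01/05/0900.log                  # the first bit of the last file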
Maybe you can try this:
sed -n "/BEGIN_DATE/,/END_DATE/p" logfile
Note that this prints nothing if no line matches BEGIN_DATE, and prints through to the end of the file if no line matches END_DATE; since your endpoints may not match any entry exactly, you may need looser patterns.
It may be possible in a Bash environment, but you should really take advantage of tools that have more built-in support for working with strings and dates. For instance, Ruby seems to have the built-in ability to parse your date format. It can then convert it to an easily comparable Unix timestamp (a positive integer representing the seconds since the epoch).
irb> require 'time'
# => true
irb> Time.parse("May 3 13:21:12").to_i
# => 1241371272
You can then easily write a Ruby script:
Provide a start and end date. Convert those to this Unix Timestamp Number.
Scan the log files line by line, converting the Date into its Unix Timestamp and check if that is in the range of the start and end dates.
Note: Converting to a Unix Timestamp integer first is nice because comparing integers is very easy and efficient to do.
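If you'd rather stay in the shell, GNU date offers the same conversion:
date -d "May 3 13:21:12" +%s    # seconds since the epoch; the value depends on the assumed year and timezone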
You mentioned "without comparing every single line." It's going to be hard to "guess" where in the log file the entries start being too old or too new without checking all the values in between. However, if there is indeed a monotonically increasing trend, you know immediately when to stop parsing lines, because as soon as the next entry is too new (or old, depending on the layout of the data) you can stop searching. Still, there is the problem of finding the first line in your desired range.
I just noticed your edit. Here is what I would say:
If you are really worried about efficiently finding the start and end entries, you could do a binary search for each. Or, if that seems like overkill or too difficult with bash tools, you could use a heuristic of reading only 5% of the lines (1 in every 20) to quickly get a close-to-exact answer, and then refine that if desired. These are just some suggestions for performance improvements.
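For what it's worth, a sketch of the binary-search idea in bash (assuming GNU date, chronologically sorted lines, an uncompressed logfile, and the timestamp format of the sample input above; sed still has to skip to line $mid, but only about 20 date comparisons are needed even for a million-line file):
target=$(date -d "$1" +%s)
lo=1; hi=$(wc -l < logfile)
while (( lo < hi )); do
    mid=$(( (lo + hi) / 2 ))
    line=$(sed -n "${mid}{p;q}" logfile)    # read just line $mid, then quit
    t=$(date -d "$(echo "$line" | awk '{print $1, $2, $3, $4}')" +%s)
    if (( t < target )); then lo=$(( mid + 1 )); else hi=$mid; fi
done
echo "first entry at or after $1 starts on line $lo"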
I know this thread is old, but I just stumbled upon it after recently finding a one line solution for my needs:
awk -v ts_start="2018-11-01" -v ts_end="2018-11-15" -F, '$1>=ts_start && $1<ts_end' myfile
In this case, my file has records with comma-separated values and the timestamp in the first field. You can use any valid timestamp format for the start and end timestamps, and replace these with shell variables if desired.
If you want to write to a new file, just use normal output redirection (> newfile) appended to the end of the command above.