Unique Linux filename, sortable by time

Unique Linux filename, sortable by time - bash

Previously I was using uuidgen to create unique filenames that I then need to iterate over by date/time via a bash script. I've since found that simply looping over said files via 'ls -l' will not suffice because evidently I can only trust the OS to keep timestamp resolution in seconds (nonoseconds is all zero when viewing files via stat on this particular filesystem and kernel)
So I then though maybe I could just use something like date +%s%N for my filename. This will print the seconds since 1970 followed by the current nanoseconds.
I'm possibly over-engineering this at this point, but these are files generated on high-usage enterprise systems so I don't really want to simply trust the nanosecond timestamp on the (admittedly very small) chance two files are generated in the same nanosecond and we get a collision.
I believe the uuidgen script has logic baked in to handle this occurrence so it's still guaranteed to be unique in that case (correct me if I'm wrong there... I read that someplace I think but the googles are failing me right now).
So... I'm considering something like
FILENAME=`date +%s`-`uuidgen -t`
echo $FILENAME
to ensure I create a unique filename that can then be iterated over with a simple 'ls' and who's name can be trusted to both be unique and sequential by time.
Any better ideas or flaws with this direction?

If you order your date format by year, month (zero padded), day (zero padded), hour (zero padded), minute (zero padded), then you can sort by time easily:
FILENAME=`date '+%Y-%m-%d-%H-%M'`-`uuidgen -t`
echo $FILENAME
or
FILENAME=`date '+%Y-%m-%d-%H-%M'`-`uuidgen -t | head -c 5`
echo $FILENAME
Which would give you:
2015-02-23-08-37-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
or
2015-02-23-08-37-xxxxx
# the same as above, but shorter unique string
You can choose other delimiters for the date/time besides - as you wish, as long as they're within the valid characters for Linux file name.

You will need %N for precision (nanoseconds):
filename=$(date +%s.%N)_$(uuidgen -t); echo $filename
1424699882.086602550_fb575f02-bb63-11e4-ac75-8ca982a9f0aa
BTW if you use %N and you're not using multiple threads, it should be unique enough.

You could take what TIAGO said about %N precision, and combine it with taskset
You can find some info here: http://manpages.ubuntu.com/manpages/hardy/man1/taskset.1.html
and then run your script
taskset --cpu-list 1 my_script
Never tested this, but, it should run your script only on the first core of your CPU. I'm thinking that if your script runs on your first CPU core, combined with date %N (nanoseconds) + uuidgen there's no way you can get duplicate filenames.

Related

Bash script that creates files of a set size

I'm trying to set up a script that will create empty .txt files with the size of 24MB in the /tmp/ directory. The idea behind this script is that Zabbix, a monitoring service, will notice that the directory is full and wipe it completely with the usage of a recovery expression.
However, I'm new to Linux and seem to be stuck on the script that generates the files. This is what I've currently written out.
today="$( date +¨%Y%m%d" )"
number=0
while test -e ¨$today$suffix.txt¨; do
(( ++number ))
suffix=¨$( printf -- %02d ¨$number¨ )
done
fname=¨$today$suffix.txt¨
printf ´Will use ¨%s¨ as filename\n´ ¨$fname¨
printf -c 24m /tmp/testf > ¨$fname¨
I'm thinking what I'm doing wrong has to do with the printf command. But some input, advice and/or directions to a guide to scripting are very welcome.
Many thanks,
Melanchole

I guess that it doesn't matter what bytes are actually in that file, as long as it fills up the temp dir. For that reason, the right tool to create the file is dd, which is available in every Linux distribution, often installed by default.
Check the manpage for different options, but the most important ones are
if: the input file, /dev/zero probably which is just an endless stream of bytes with value zero
of: the output file, you can keep the code you have to generate it
count: number of blocks to copy, just use 24 here
bs: size of each block, use 1MB for that

How to get the last time a file was modified in Unix

I am trying to get the last time the date a file was modified. I used a variable for date and a variable for time.
This will get the date and time but I want to use -r using the date command to make a reflection of when the date was last modified. Just not sure how I go about using it in my variables.
How would I go about doing this?
Here are my variables:
DATE="$(date +'%m/%d/%Y')"
TIME="$(date +'%H:%M')"
I tried putting the -r after and before the time and date.

Though people might tell you, you should not parse the output of ls, simply that can easily break if your file name contains tabs, spaces, line breaks, your user decides to simply specify a different set of ls options, the ls version you find is not behaving like you expected...
Use stat instead:
stat -c '%Y'
will give you the seconds since epoch.
Try
date -d "#$(stat -c '%Y' $myfile)" "+%m/%d/%Y"
to get the date, and read through man date to get the time in the format you want to, replacing '%F' in the command line above:
date -d "#$(stat -c '%Y' $myfile)" "+%H:%M"
EDIT: used your formats.
EDIT2: I really don't think your date format is wise, because it's just so ambiguous for anyone not from the US, and also it's not easily sortable. But it's a cultural thing, so this is more of a hint: If you want to make your usable for people from abroad, either use Year-month-day as format, or get the current locale's setting to format dates.

I think you are looking for
ls -lt myfile.txt
Here in 6th column you will see when file was modified.
Or you could use stat myfile.txt to check the modified time of a file.

I know this is a very old question, but, for the sake of completeness, I'm including an additional answer here.
The original question does not specify the specific operating system. stat differs significantly from SysV-inspired Unixes (e.g. Linux) and BSD-inspired ones (e.g. Free/Open/NetBSD, macOS/Darwin).
Under macOS Big Sur (11.5), you can get the date of a file with a single stat command:
stat -t '%m/%d/%Y %H:%M' -f "%Sm" myfile.txt
will output
04/10/2021 23:22
for April 10, 2021.
You can easily put that in two commands, one for the date, another for the time, of course, to comply with the original question as formulated.

Use GNU stat.
mtime=$(stat --format=%y filename)

Create a new sequence of files from an existing sequence, along with numbering

I know this question has been asked, but I can't find more than one solution, and it does not work for me. Essentially, I'm looking for a bash script that will take a file list that looks like this:
image1.jpg
image2.jpg
image3.jpg
And then make a copy of each one, but number it sequentially backwards. So, the sequence would have three new files created, being:
image4.jpg
image5.jpg
image6.jpg
And yet, image4.jpg would have been an untouched copy of image3.jpg, and image5.jpg an untouched copy of image2.jpg, and so on. I have already tried the solution outlined in this stackoverflow question with no luck. I am admittedly not very far down the bash scripting path, and if I take the chunk of code in the first listed answer and make a script, I always get "2: Syntax error: "(" unexpected" over and over. I've tried changing the syntax with the ( around a bit, but no success ever. So, either I am doing something wrong or there's a better script around.
Sorry for not posting this earlier, but the code I'm using is:
image=( image*.jpg )
MAX=${#image[*]}
for i in ${image[*]}
do
num=${i:5:3} # grab the digits
compliment=$(printf '%03d' $(echo $MAX-$num | bc))
ln $i copy_of_image$compliment.jpg
done
And I'm taking this code and pasting it into a file with nano, and adding !#/bin/bash as the first line, then chmod +x script and executing in bash via sh script. Of course, in my test runs, I'm using files appropriately titled image1.jpg - but I was also wondering about a way to apply this script to a directory of jpegs, not necessarily titled image(integer).jpg - in my file keeping structure, most of these are a single word, followed by a number, then .jpg, and it would be nice to not have to rewrite the script for each use.

Perhaps something like this. It will work well for something like script image*.jpg where the wildcard matches a set of files which match a regular pattern with monotonously increasing numbers of the same length, and less ideally with a less regular subset of the files in the current directory. It simply assumes that the last file's digit index plus one through the total number of file names is the range of digits to loop over.
#!/bin/sh
# Extract number from final file name
eval lastidx=\$$#
tmp=${lastidx#*[!0-9][0-9]}
lastidx=${lastidx#${lastidx%[0-9]$tmp}}
tmp=${lastidx%[0-9][!0-9]*}
lastidx=${lastidx%${lastidx#$tmp[0-9]}}
num=$(expr $lastidx + $#)
width=${#lastidx}
for f; do
pref=${f%%[0-9]*}
suff=${f##*[0-9]}
# Maybe show a warning if pref, suff, or width changed since the previous file
printf "cp '$f' '$pref%0${width}i$suff'\\n" $num
num=$(expr $num - 1)
done |
sh
This is sh-compatible; the expr stuff and the substring extraction up front is ugly but Bourne-compatible. If you are fine with the built-in arithmetic and string manipulation constructs of Bash, converting to that form should be trivial.
(To be explicit, ${var%foo} returns the value of $var with foo trimmed off the end, and ${var#foo} does similar trimming from the beginning of the value. Regular shell wildcard matching operators are available in the expression for what to trim. ${#var} returns the length of the value of $var.)

Maybe your real test data runs from 001 to 300, but here you have image1 2 3, and therefore you extract one, not three digits from the filename. num=${i:5:1}
Integer arithmetic can be done in the bash without calling bc
${#image[#]} is more robust than ${#image[*]}, but shouldn't be a difference here.
I didn't consult a dictionary, but isn't compliment something for your girl friend? The opposite is complement, isn't it? :)
the other command made links - to make copies, call cp.
Code:
#!/bin/bash
image=( image*.jpg )
MAX=${#image[#]}
for i in ${image[#]}
do
num=${i:5:1}
complement=$((2*$MAX-$num+1))
cp $i image$complement.jpg
done
Most important: If it is bash, call it with bash. Best: do a shebang (as you did), make it executable and call it by ./name . Calling it with sh name will force the wrong interpreter. If you don't make it executable, call it bash name.

Bash/batch multiple file, single folder, incrimental rename script; user provided filename prefix parameter

I have a folder of files which need to be renamed.
Instead of a simple incrimental numeric rename function I need to first provide a naming convention which will then incriment in order to ensure file name integrity within the folder.
say i have files:
wei12346.txt
wifr5678.txt
dkgj5678.txt
which need to be renamed to:
Eac-345-018.txt
Eac-345-019.txt
Eac-345-020.txt
Each time i run the script the naming could be different and the numeric incriment to go along with it may also be ddifferent:
Ebc-345-010.pdf
Ebc-345-011.pdf
Ebc-345-012.pdf
So i need to ask for a provided parameter from the user, i was thinking this might be useful as the previous file name in the list of files to be indexed eg: Eac-345-017.txt
The other thing I am unsure about with the incriment is how the script would deal with incrimenting 099 to 100 or 999 to 1000 as i am not aware of how this process is carried out.
I have been told that this is an easy script in perl however I am running cygwin on a windows machine in work and have access to only bash and windows shells in order to execute the script.
Any pointers to get me going would be greatly appreciated, i have some experience programming but scripting is almost entirely new.
Thanks,
Craig
(i understand there are allot of posts on this type of thing already but none seem to offer any concise answer, hence my question)

#!/bin/bash
prefix="$1"
shift
base_n="$1"
shift
step="$1"
shift
n=$base_n
for file in "$#" ; do
formatted_n=$(printf "%03d" $n)
# re-use original file extension whilke we're at it.
mv "$file" "${prefix}-${formatted_n}.${file##*.}"
let n=n+$step
done
Save the file, invoke it like this:
bash fancy_rename.sh Ebc-345- 10 1 /path/to/files/*
Note: In your example you "renamed" a .txt to a .pdf, but above I presumed the extension would stay the same. If you really wanted to just change the extension then it would be a trivial change. If you wanted to actually convert the file format then it would be a little more complex.
Note also that I have formatted the incrementing number with %03d. This means that your number sequence will be e.g.
010
011
012
...
099
100
101
...
999
1000
Meaning that it will be zero padded to three places but will automatically overflow if the number is larger. If you prefer consistency (always 4 digits) you should change the padding to %04d.

OK, you can do the following. You can ask the user first the prefix and then the starting sequence number. Then, you can use the built-in printf from bash to do the correct formatting on the numbers, but you may have to decide to provide enough number width to hold all the sequence, because this will result in a more homogeneous names. You can use read to read user input:
echo -n "Insert the prefix: "
read prefix
echo -n "Insert the sequence number: "
read sn
for i in * ; do
fp=`printf %04d $sn`
mv "$i" "$prefix-$fp.txt"
sn=`expr $sn + 1`
done
Note: You can extract the extension also. That wouldn't be a problem. Also, here I selected 4 numbers fot the sequence number, calculated into the variable $fp.

bash script to rename files based on a calculation

I have a file system containing PNG images. The layout of the filesystem is: ZOOM/X/Y.png where ZOOM, X, and Y are all integers.
I need to change the names of the PNG files. Basically, I need to convert Y from its current value to 2^ZOOM-Y-1. I've written a bash script to accomplish this task. However, I suspect it can be optimized substantially. (I also suspect that I may have been better off writing it in Perl, but that is another story.)
Here is the script. Is this about as good as it gets? Or can the performance be optimized? Are there tools I can use that would profile the script for me and tell me where I'm spending all my execution time?
#!/bin/bash
tiles=`ls -d */*/*`
for oldPath in $tiles
do
oldY=`basename -s .png $oldPath`
zoomX=`dirname $oldPath`
zoom=`echo $zoomX | sed 's#\([^\]\)/.*#\1#'`
newY=`echo 2^$zoom-$oldY-1|bc`
mv ${zoomX}/${oldY}.png ${zoomX}/${newY}.png
done

for oldpath in */*/*
do
x=$(basename "$oldpath" .png)
zoom_y=$(dirname "$oldpath")
y=$(basename "$zoom_y")
ozoom=$(dirname "$zoom_y")
nzoom=$(echo "2^$zoom-$y-1" | bc)
mv "$oldpath" $nzoom/$y/$x.png
done
This avoids using sed. I like basename and dirname. However, you can also use bash (and Korn) shell notations such as:
y=${zoom_y#*/}
ozoom=${zoom_y%/*}
You might be able to do it all without invoking basename or dirname at all.

REWRITE due to misunderstanding of the formula and the updated var names. Still no subprocesses apart from mv and ls.
#!/bin/bash
tiles=`ls -d */*/*`
for thisPath in $tiles
do
thisFile=${thisPath#*/*/}
oldY=${thisFile%.png}
zoomX=${thisPath%/*}
zoom=${thisPath%/*/*}
newY=$(((1<<zoom) - oldY - 1))
mv ${zoomX}/${oldY}.png ${zoomX}/${newY}.png
done

It's likely that the overall throughput of your rename is limited by the filesystem. Choosing the right filesystem and tuning it for this sort of operation would speed up the overall job much more than tweaking the script.
If you optimize the script you'll probably see less CPU consumed but the same total duration. Since forking off the various subprocesses (basename, dirname, sed, bc) are probably more significant than the actual work you are probably right that a perl implementation would use less CPU because it can do all of those operations internally (including the mv).

I see 3 improvements I would do, if it was my script. Whether they have an huge impact - I don't think so.
But you should avoid as hell parsing the output of ls. Maybe this directory is very predictable, from the things found inside, but if I read your script correctly, you can use the globbing with for directly:
for thisPath in */*/*
repeatedly, $(cmd) is better than cmd with the deprecated backticks, which aren't nestable.
thisDir=$(dirname $thisPath)
arithmetic in bash directly:
newTile=$((2**$zoom-$thisTile-1))
as long as you don't need floating point, or output is getting too big.
I don't get the sed-part:
zoom=`echo $zoomX | sed 's#\([^\]\)/.*#\1#'`
Is there something missing after the backslash? A second one? You're searching for something which isn't a backslash, followed by a slash-something? Maybe it could be done purely in bash too.

one precept of computing credited to Donald Knuth is, "don't optimize too early." Scripts run pretty fast and 'mv' operations(as long as they're not going across filesystems where you're really copying it to another disk and then deleting the file) are pretty fast as well, as all the filesystem has to do in most cases is just rename the file or change its parentage.
Probably where it's spending most of its time is in that intial 'ls' operation. I suspect you have ALOT of files. There isn't much that can be done there. Doing it another language like perl or python is going to face the same hurdle. However you might be able to get more INTELLIGENCE and not limit yourself to 3 levels(//*).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio