Tool for launching command on files with same date - bash

I have several sets of files (several hundred). In each set, each file is related to a date (year/month/day) that is encoded in the file name. I want to execute a command that takes as input one file from each set for a particular date.
Since files are sometimes missing from some sets, I want to launch the command only when all sets contain a file for a particular date.
I would like to know if there is any existing (command line or other) tool that can do this kind of thing. I searched but could not find anything.
The use of the date as the key for files is not mandatory. I guess that any tool that is generic enough will provide a way to specify the key as a parameter.
Edit:
There are fewer than 10 sets, but each contains several hundred files.
Each set is located in a separate directory.

Since this question is tagged with bash, here is a bash script that checks whether a file containing some date string (provided as the first argument of the script) exists in each of the given sets. If one exists in every set, then some_command is executed:
#!/bin/bash
datestr=$1
all_exist=Y
for set in dir1 dir2 dir3 dir4
do
    [ -f "$set/"*"$datestr"* ] || all_exist=""
done
[ "$all_exist" ] && some_command
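One caveat: `[ -f "$set/"*"$datestr"* ]` only behaves when at most one file matches the glob; with several matches, `[` receives extra arguments and errors out. A sketch of a more robust variant using bash's compgen builtin (the demo/dir1 etc. directories and file names are made up for the demo):

```shell
#!/bin/bash
# Demo setup (hypothetical): three set directories, each with a file for the date
mkdir -p demo/dir1 demo/dir2 demo/dir3
datestr=2014-01-12
touch "demo/dir1/a_$datestr.txt" "demo/dir2/b_$datestr.txt" "demo/dir3/c_$datestr.txt"

all_exist=Y
for set in demo/dir1 demo/dir2 demo/dir3; do
    # compgen -G expands a glob and returns non-zero when nothing matches,
    # so this works even when several files share the same date
    compgen -G "$set/*$datestr*" > /dev/null || all_exist=""
done
[ "$all_exist" ] && echo "all sets have a file for $datestr"
```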

So this can really be divided into two tasks:
Find dates for which a set exists
Launch a command on each set
You are not revealing how your files are organized, but if you have something like boom20140112/a.txt and boom20140112/b.txt forming one set, and foo20140111/a.txt and foo20140111/c.txt another, you can find the dates with
dates () {
    printf "%s\n" *201[0-9][0-9][0-9][0-9][0-9]/. |
    sed -e 's%^[^0-9]*%%' -e 's%/\.$%%' |
    sort -u
}
If your sets look different, you can probably adapt this. The general idea is to obtain a list of pertinent file names, strip out the parts which aren't the date, and remove any duplicates. Now you have a list of dates.
Here is another implementation which assumes that you have files named tags/tags_(date)_a.txt and tags/tags_(date)_b.txt and input/samples_(date).txt forming one set per date, where date is formatted like 2014-01-12.
dates () {
    printf "%s\n" input/* tags/* |
    sed 's/.*_\(201[1-9]-[0-9][0-9]-[0-9][0-9]\)[_.].*/\1/' |
    sort -u
}
Given that, loop over the dates and run your command on each set.
dates | while read -r date; do
    command *$date/*
done
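Putting the two tasks together, here is a hedged end-to-end sketch (sets/set1, sets/set2 and the file layout are made up for the demo): collect every date, count in how many sets it occurs, and keep only the dates present in all of them.

```shell
#!/bin/bash
# Demo data (hypothetical layout): two sets, one date missing from set2
mkdir -p sets/set1 sets/set2
touch sets/set1/20140111.txt sets/set1/20140112.txt sets/set2/20140112.txt

nsets=2
# Print each date once per set it appears in, count occurrences,
# and keep only dates present in all sets.
complete_dates=$(
    printf '%s\n' sets/*/201[0-9][0-9][0-9][0-9][0-9].txt |
    sed 's%.*/\([0-9]*\)\.txt$%\1%' |
    sort | uniq -c |
    awk -v n="$nsets" '$1 == n { print $2 }'
)
for date in $complete_dates; do
    echo "would run: some_command sets/*/${date}.txt"
done
```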

Related

How does one group files with complicated file names using a bash script?

I'm stuck.
Ultimate Goal - Concatenate multiple MP4 files into single MP4 files using FFMPEG to reduce the file count and make it easier to manage in a video editor.
The files that I've collected from my security system have a unique naming convention. In each file name, there are some markers that denote the different camera angles. Here is an example of one of the files:
7845582F4FA9_0_rotating_2022-06-01T17.13.47-07.00.mp4
Prefix    Suffix  Number  Type      Date         Hour  Minute  Second  TZ      Ext
7845582F  4FA9    _0_     rotating  _2022-06-01  T17   13      47      -07.00  .mp4
Is it possible to find all files for a specific date from a single folder and ensure that the list produced is grouped by date, suffix, number, and then sorted by time?
Better yet, a separate list for each grouping?
Here's an example Bash script to accomplish this:
#!/bin/bash
loop_over_suffixes() {
    local i suffixes=()
    for i in *"$1"*; do
        suffixes+=("${i:8:4}")    # the 4-char suffix after the 8-char prefix
    done
    # iterate over each distinct suffix
    local j
    for j in $(printf '%s\n' "${suffixes[@]}" | sort -u); do
        echo "Loop over suffix: $j"
        ls *"$j"*"$1"*
        # Do logic here
    done
}
loop_over_suffixes '2022-06-01'
You would call the function with whatever date you want. If you want to dynamically pass it today's date, you could run:
loop_over_suffixes $(date +"%Y-%m-%d")
Just put your logic where the # Do logic here comment is. The ls command should sort by number and time automatically, since both are embedded in the name.
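To produce "a separate list for each grouping", one possible sketch (the file names below are made up to match the convention above, and it assumes the number field is a single digit) appends each name to a per-(suffix, number) list file; lexical glob order equals time order because the timestamp is part of the name:

```shell
#!/bin/bash
# Demo files (hypothetical) following the naming convention from the question
mkdir -p cams && cd cams
touch 7845582F4FA9_0_rotating_2022-06-01T17.13.47-07.00.mp4 \
      7845582F4FA9_0_rotating_2022-06-01T09.01.02-07.00.mp4 \
      7845582FBBBB_1_rotating_2022-06-01T10.00.00-07.00.mp4

date=2022-06-01
for f in *"$date"*.mp4; do
    suffix=${f:8:4}     # 4 chars after the 8-char prefix
    number=${f:13:1}    # the digit between the underscores (assumed single-digit)
    # Append to one list per (suffix, number) group; glob expansion is
    # lexical, which sorts by time because the timestamp is in the name.
    printf '%s\n' "$f" >> "list_${suffix}_${number}.txt"
done
```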

Merging CSV files based on filename filter

I'm trying to develop a bash script which filters the CSV files (generated every hour) from the day before and merges them into a single CSV file. This script seems to do the job for me, except that I'm struggling to filter files based on their filenames.
There would be 24 files for each day in the directory, and I need to select these files based on their name format:
foofoo_2017052101502.csv
foofoo_2017052104502.csv
foofoo_2017052104503.csv
foofoo_2017052204501.csv
foofoo_2017052204504.csv
Here, I need to filter for May 21, 2017, so my output CSV file must include the first three .csv files listed above.
What should I add in the script for this filter?
The following script will calculate the previous day's yyyymmdd and use that value in grep to select all the file names generated on the previous day.
For MacOS
dt=`date -j -v-1d +%Y%m%d`
echo $dt
OutputFiles=`ls | grep foofoo_${dt}`
For Linux
dt=`date -d "yesterday" +%Y%m%d`
echo $dt
OutputFiles=`ls | grep foofoo_${dt}`
When added to the script mentioned, these commands will select the file names from the previous day based on the current date.
You can let bash do the filtering for you using globbing, for example to list only files with date May 21, 2017 you could use:
for filename in foofoo_20170521*.csv; do...
If you want to be able to call your script with an argument specifying the date, for more flexibility, you can use:
for filename in foofoo_"${1}"*.csv; do...
(note that the glob part must stay outside the quotes, or it will not expand)
And then call your script with the date that you want to filter as an argument:
./your_script 20170521
And as @David C. Rankin mentioned in the comments, a very practical way to do it would be to concatenate all the files from the date you want into one CSV that you would then use in your script:
cat foofoo_20170521*.csv > combined_20170521.csv
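One more wrinkle: if each hourly file starts with the same header row, plain cat repeats that header 24 times in the combined file. A sketch that keeps only the first header (assuming a one-line header; the demo files below are made up):

```shell
#!/bin/bash
# Demo files (hypothetical): two hourly CSVs sharing a header row
printf 'time,value\n01:00,1\n' > foofoo_2017052101502.csv
printf 'time,value\n04:00,2\n' > foofoo_2017052104502.csv

out=combined_20170521.csv
first=1
for f in foofoo_20170521*.csv; do
    if [ "$first" = 1 ]; then
        cat "$f" > "$out"          # keep the header from the first file
        first=0
    else
        tail -n +2 "$f" >> "$out"  # skip the header on the rest
    fi
done
```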

Bash - File name change Date + 1

I have around 500 files that I need to rename with the date the report represents. The filenames are currently:
WUSR1453722998383.csv
WUSR1453723010659.csv
WUSR1453723023497.csv
And so on. The numbers in the filename have nothing to do with the date, so I cannot use the filename as a guide for what the file should be renamed to. The reports start from 02/12/2014 and there is a report for every day of the month up until yesterday (09/04/2016). Luckily as well the filename is sequential - so 04/12/2014 will have a higher number than 03/12/2014 which will have a higher number than 02/12/2014. This means the files are automatically listed in alphabetical order.
There is however a date in the first line of the CSV before the data:
As at Date,2014-12-02
Now I've checked that I have all the files already, and I do, so what's the best way to rename them to the date? I can either set the starting date as 02/12/2014 and rename each file with a +1 date, or the script can read the date on the first line of the file (As at Date,2014-12-02 for example) and use that date to rename the file.
I have no idea how to write either of the method above in bash, so if you could help out with this, that would be really appreciated.
In terms of file output, I was hoping for:
02-12-2014.csv
03-12-2014.csv
And so on
Is this the answer you need? Assume all the files are under the current directory. Do some testing before you do the real operation. The condition is that every date string in your CSV files is unique; otherwise some files will be overwritten.
#!/bin/bash
for f in *.csv
do
    # read the date from the first line, e.g. "As at Date,2014-12-02",
    # and rearrange it as dd-mm-yyyy.csv
    o=$(sed '1q' "$f" | awk -F"[,-]" '{print $NF"-"$(NF-1)"-"$(NF-2)".csv"}')
    # should we back up the file?
    # cp "$f" "$f".bak
    mv "$f" "$o"
done
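The other approach from the question, renaming sequentially starting at 2014-12-02, can be sketched with GNU date (date -d is GNU-specific; the WUSR file names below are made up, and glob order must match chronological order, which holds here because the numbers are sequential):

```shell
#!/bin/bash
# Demo files (hypothetical names mimicking the question)
mkdir -p reports && cd reports
touch WUSR1453722998383.csv WUSR1453723010659.csv WUSR1453723023497.csv

start=2014-12-02
n=0
for f in WUSR*.csv; do
    # GNU date computes start + n days; format the result as dd-mm-yyyy
    mv "$f" "$(date -d "$start + $n days" +%d-%m-%Y).csv"
    n=$((n + 1))
done
```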

Call script on all file names starting with string in folder bash

I have a set of files in a folder that I want to perform an action on, and I'm hoping to write a script for it. Each file starts with mazeFilex, where x can be any number. Is there a quick and easy way to perform an action on each file? e.g. I will be doing
cat mazeFile0.txt | ./maze_ppm 5 | convert - maze0.jpg
how can I select each file knowing the file will always start with mazeFile?
for fname in mazeFile*
do
    base=${fname%.txt}
    base=${base#mazeFile}
    ./maze_ppm 5 <"$fname" | convert - "maze${base}.jpg"
done
Notes
for fname in mazeFile*; do
This code starts the loop. Written this way, it is safe for all filenames, whether they have spaces, tabs or whatever in their names.
base=${fname%.txt}; base=${base#mazeFile}
This removes the mazeFile prefix and .txt suffix to just leave the base name that we will use for the output file.
./maze_ppm 5 <"$fname" | convert - "maze${base}.jpg"
The output filename is constructed using base. Note also that cat was unnecessary and has been removed here.
for i in mazeFile*.txt ; do ./maze_ppm 5 <"$i" | convert - "$(basename "maze${i:8}" .txt).jpg" ; done
You can use a for loop to run through all the filenames.
#!/bin/bash
for fn in mazeFile*; do
    echo "the next file is $fn"
    # do something with file $fn
done
See answer here as well: Bash foreach loop
I see you want a backreference to the number in the mazeFile. Thus I recommend John1024's answer.
Edit: removed the unnecessary ls command, per @guido's comment.

replace $1 variable in file with 1-10000

I want to create thousands of copies of this one file.
All I need to replace in the file is one variable:
kitename = $1
But I want to do that thousands of times to create thousands of different files.
I'm sure it involves a loop.
People answering people is more effective than a Google search!
Thx
I'm not really sure what you are asking here, but the following will create 1000 files named filename.n, each containing the single line "kitename = n", for n = 1 to n = 1000:
for i in {1..1000}
do
    echo "kitename = $i" > filename.$i
done
If you have MySQL installed, it comes with a lovely command line util called "replace" which replaces strings in place across any number of files. Too few people know about this, given that it exists on most Linux boxen everywhere. Syntax is easy:
replace SEARCH_STRING REPLACEMENT -- targetfiles*
If you MUST use sed for this... that's okay too :) The syntax is similar:
sed -i.bak 's/SEARCH_STRING/REPLACEMENT/g' targetfile.txt
So if you're just using numbers, you'd use something like:
for a in {1..1000}
do
    cp inputFile.html outputFile-$a.html
    replace kitename $a -- outputFile-$a.html
done
This will produce a bunch of files "outputFile-1.html" through "outputFile-1000.html", with the word "kitename" replaced by the relevant number, inside the file.
But if you want to read your lines from a file rather than generate them by magic, you might want something more like this (we're not using for a in $(cat file) since that splits on words, and I'm assuming here you'd have maybe multi-word replacement strings that you'd want to put in):
cat kitenames.txt | while read -r a
do
    cp inputFile.html "outputFile-$a.html"
    replace kitename "$a" -- "outputFile-$a.html"
done
This will produce a bunch of files like "outputFile-red kite.html" and "outputFile-kite with no string.html", which have the word "kitename" replaced by the relevant name, inside the file.
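If the replace utility isn't available (newer MySQL versions dropped it), sed alone covers the same loop. A sketch with made-up demo inputs mirroring the answer above (names must not contain / for this sed pattern):

```shell
#!/bin/bash
# Demo inputs (hypothetical, mirroring the answer above)
printf 'red kite\nblue kite\n' > kitenames.txt
printf '<p>kitename</p>\n' > inputFile.html

while IFS= read -r a; do
    # substitute every occurrence of the placeholder into a new file
    sed "s/kitename/$a/g" inputFile.html > "outputFile-$a.html"
done < kitenames.txt
```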