I have a directory where files land every day. I want to zip those files grouped by date. Is there any way to group/list the files which landed on the same date?
Suppose there are the following files in a directory:
-rw-r--r--. 1 anirban anirban 1598 Oct 14 07:19 hello.txt
-rw-r--r--. 1 anirban anirban 1248 Oct 14 07:21 world.txt
-rw-rw-r--. 1 anirban anirban 659758 Oct 14 11:55 a
-rw-rw-r--. 1 anirban anirban 9121 Oct 18 07:37 b.csv
-rw-r--r--. 1 anirban anirban 196 Oct 20 08:46 go.xls
-rw-r--r--. 1 anirban anirban 1698 Oct 20 08:52 purge.sh
-rw-r--r--. 1 anirban anirban 47838 Oct 21 08:05 code.java
-rw-rw-r--. 1 anirban anirban 9446406 Oct 24 05:51 cron
-rw-rw-r--. 1 anirban anirban 532570 Oct 24 05:57 my.txt
drwxrwsr-x. 2 anirban anirban 67 Oct 25 05:05 look_around.py
-rw-rw-r--. 1 anirban anirban 44525 Oct 26 17:23 failed.log
So there is no way to group the files by any suffix/prefix, since all the names are unique. When I run the command I am seeking, I want to get a set of groups like the one below, based on grouping by date.
[ [hello.txt world.txt a] [b.csv] [go.xls purge.sh] [code.java] ... ] and so on.
With that list I will loop through and create the archives:
tar -zvcf Oct_14.tar.gz hello.txt world.txt a
If you have the GNU version of the date command, you can get the date of modification of a file with the -r flag, which can be very useful.
For example, given the file list in your question, date +%b_%d -r hello.txt will output Oct_14.
Using this, you could loop over the files, and build up tar files:
If the tar file doesn't exist, create it with a single file
If the tar file exists, add the file to it
After the loop, zip the tar files
Like this:
#!/usr/bin/env bash
tarfiles=()
for file; do
    # name the archive after the file's modification date, e.g. Oct_14.tar
    tarfile=$(date +%b_%d.tar -r "$file")
    if ! [ -f "$tarfile" ]; then
        tar cf "$tarfile" "$file"
        tarfiles+=("$tarfile")
    else
        tar uf "$tarfile" "$file"
    fi
done
for tarfile in "${tarfiles[@]}"; do
    gzip "$tarfile"
done
Pass the list of files you want to archive as command-line parameters. For example, if /path/to/files is the directory containing the files you want to archive (listed in your question), and you save this script as ~/bin/tar-by-dates.sh, then you can use it like this:
cd /path/to/files
~/bin/tar-by-dates.sh *
Create a NUL-separated list of (MonthDay.tar, filename) pairs and use xargs to add each file to the corresponding archive:
find . -maxdepth 1 -mindepth 1 -type f -printf "%Tb%Td.tar\0%f\0" | xargs -n 2 -0 tar uf
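For the listing in the question, the -printf format emits NUL-separated pairs such as Oct14.tar followed by hello.txt, so xargs -n 2 -0 runs tar uf Oct14.tar hello.txt once per pair. If you also want the archives compressed, a follow-up along these lines should work (assuming there are no unrelated .tar files in the directory):
for t in *.tar; do gzip "$t"; done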
Thank you so much for any advice and feedback on this matter.
This is my situation:
I have a directory with several hundred files, all of which start with foo and end with .txt, but differ in between by a unique identifier of the form "Group#.#", like so:
foo.Group1.1.txt
foo.Group1.2.txt
foo.Group1.4.txt
foo.Group2.45.txt
.
.
.
foo.Group16.9.txt
The files run from Group1 through Group16. They are simple one-column text files; each file has several thousand lines, and each row is a number.
I want to do a series of concatenations with these files, in which I concatenate all but the files with the "Group1" ID, then all except "Group1" and "Group2", then all except "Group1", "Group2", and "Group3", and so on, until I am left with just the last group: "Group16".
In order to do this I use a bash extended globbing expression with a negation syntax to concatenate all files except those that have "Group1" as their ID.
I make a directory "jacks" and output the concatenated file into a txt file within this subdirectory:
cat !(*Group1.*) > jacks/jackknife1.freqs.txt
I can then continue using this command, but adding "Group2" and "Group3" for subsequent concatenations.
cat !(*Group1.*|*Group2.*) > jacks/jackknife2.freqs.txt
cat !(*Group1.*|*Group2.*|*Group3.*) > jacks/jackknife3.freqs.txt
Technically, this works, and 16 groups isn't too terrible to do manually.
But I am wondering if there is a way, perhaps using loops or bash scripting, to automate this process and speed it up?
I would appreciate any advice or leads on this question!
thank you very much,
daniela
Some experiments with bash globbing
Try using echo to preview what a glob matches before running cat!
touch foo.Group{1..3}.{1..5}.txt
ls -l
total 0
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.5.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.5.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.5.txt
Then
echo !(*Group1.*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.3.txt foo.Group2.4.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.2.txt foo.Group3.3.txt foo.Group3.4.txt foo.Group3.5.txt
Ok, and
echo !(*Group[23].*)
foo.Group1.1.txt foo.Group1.2.txt foo.Group1.3.txt foo.Group1.4.txt foo.Group1.5.txt
Or
echo !(*Group*(1|3).*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.3.txt foo.Group2.4.txt foo.Group2.5.txt
Or even
echo !(*Group*(1|*.3).*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.4.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.2.txt foo.Group3.4.txt foo.Group3.5.txt
and
echo !(*Group*(1|*.[2-4]).*)
foo.Group2.1.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.5.txt
I'll let you think about the last two samples! ;-)
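To automate the whole jackknife series from the question, here is a rough sketch that grows the exclusion pattern one group at a time (assuming groups numbered 1 through 16, so the last pass excludes groups 1-15 and leaves only Group16; the jacks output directory is kept out of the glob as well):
#!/usr/bin/env bash
shopt -s extglob
mkdir -p jacks
pattern="jacks"                      # also exclude the output directory itself
for i in {1..15}; do
    pattern+="|*Group$i.*"           # grows to jacks|*Group1.*|*Group2.*|...
    cat !($pattern) > "jacks/jackknife$i.freqs.txt"
done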
Unfortunately I'm quite new to bash, and I want to write a script that starts in a main directory and checks all subdirectories one by one for the presence of certain files; if those files are present, it performs an operation on them. For now, I have written a simplified version to test whether I can do the first part (checking for the files in each directory). This code runs without any errors that I can tell, but it does not echo anything to say that it has successfully found the files, which I know are there.
#!/bin/bash
runlist=(1 2 3 4 5 6 7 8 9)
for f in *; do
    if [[ -d {$f} ]]; then
        # if f is a directory then cd into it
        cd "{$f}"
        for b in $runlist; do
            if [[ -e "{$b}.png" ]]; then
                echo "Found {$b}"
                # if the file exists then say so
            fi
        done
        cd -
    fi
done
Welcome to Stack Overflow.
The following will do the trick (a combination of find, arrays, and if/then/else):
# list of files we are looking for
runlist=(1 2 4 8 16 32 64 128)
# find each of the above in the immediate subdirectories;
# -maxdepth 2 keeps the search one level down, as in your example;
# take it out of the find command to search the whole tree instead
for b in "${runlist[@]}"; do
    PATH_TO_FOUND_FILE=$(find . -maxdepth 2 -name "$b.png")
    if [ -z "$PATH_TO_FOUND_FILE" ]; then
        # nothing found, stay quiet
        :
    else
        # you wanted a positive confirmation, so
        echo "found $b.png"
        # now do something with the found file; let's say ls -l: change that to whatever
        ls -l $PATH_TO_FOUND_FILE
    fi
done
Here is an example run:
mamuns-mac:stack foo$ ls -lR
total 8
drwxr-xr-x 4 foo 1951595366 128 Apr 11 18:03 dir1
drwxr-xr-x 3 foo 1951595366 96 Apr 11 18:03 dir2
-rwxr--r-- 1 foo 1951595366 652 Apr 11 18:15 find_file_and_do_something.sh
./dir1:
total 0
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 1.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 8.png
./dir2:
total 0
-rw-r--r-- 1 foo 1951595366 0 Apr 11 18:03 64.png
mamuns-mac:stack foo$ ./find_file_and_do_something.sh
found 1.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 ./dir1/1.png
found 8.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 ./dir1/8.png
found 64.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 18:03 ./dir2/64.png
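For what it's worth, the loop in the question stays silent because {$f} and {$b} are literal strings (the correct spelling is ${f}), so the -d test never succeeds, and $runlist expands to only the first array element. A minimal corrected version of the original loop, for comparison:
#!/bin/bash
runlist=(1 2 3 4 5 6 7 8 9)
# the trailing slash makes the glob match directories only
for f in */; do
    for b in "${runlist[@]}"; do
        if [[ -e "$f$b.png" ]]; then
            echo "Found $b.png in $f"
        fi
    done
done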
It only prints the "else" statement for everything, but I know for a fact that the files it's looking for exist. I've tried adapting some of the other answers, but I thought this should definitely work.
Does anyone know what's wrong with my syntax?
# Contents of script
for ID_SAMPLE in $(cut -f1 metadata.tsv | tail -n +2); do
    if [ -f ./output/${ID_SAMPLE} ]; then
        echo Skipping ${ID_SAMPLE};
    else
        echo Processing ${ID_SAMPLE};
    fi
done
Additional information
# Output directory
(base) -bash-4.1$ ls -lhS output/
total 170K
drwxr-xr-x 8 jespinoz tigr 185 Jan 3 16:16 ERR1701760
drwxr-xr-x 8 jespinoz tigr 185 Jan 17 18:03 ERR315863
drwxr-xr-x 8 jespinoz tigr 185 Jan 16 23:23 ERR599042
drwxr-xr-x 8 jespinoz tigr 185 Jan 17 00:10 ERR599072
drwxr-xr-x 8 jespinoz tigr 185 Jan 16 13:00 ERR599078
# Example of inputs
(base) -bash-4.1$ cut -f1 metadata.tsv | tail -n +2 | head -n 10
ERR1701760
ERR599078
ERR599079
ERR599070
ERR599071
ERR599072
ERR599073
ERR599074
ERR599075
ERR599076
# Output of script
(base) -bash-4.1$ bash test.sh | head -n 10
Processing ERR1701760
Processing ERR599078
Processing ERR599079
Processing ERR599070
Processing ERR599071
Processing ERR599072
Processing ERR599073
Processing ERR599074
Processing ERR599075
Processing ERR599076
# Checking a directory
(base) -bash-4.1$ ls -l ./output/ERR1701760
total 294
drwxr-xr-x 2 jespinoz tigr 386 Jan 15 21:00 checkpoints
drwxr-xr-x 2 jespinoz tigr 0 Jan 10 01:36 tmp
-f checks whether the name is a regular file, but all your names are directories; use -d to check for that:
if [ -d "./output/$ID_SAMPLE" ]
then
If you want to check whether the name exists with any type, use -e.
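With that one change the loop from the question behaves as intended; a minimal corrected version for reference:
for ID_SAMPLE in $(cut -f1 metadata.tsv | tail -n +2); do
    if [ -d "./output/$ID_SAMPLE" ]; then
        echo "Skipping $ID_SAMPLE"
    else
        echo "Processing $ID_SAMPLE"
    fi
done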
So I tried
ncftpls -l
which gives me a list
-rw-r--r-- 1 100 ftpgroup 3817084 Jan 29 15:50 1548773401.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817089 Jan 29 15:51 1548773461.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817083 Jan 29 15:52 1548773521.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817085 Jan 29 15:53 1548773582.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817090 Jan 29 15:54 1548773642.tar.gz
But all I want is the timestamp (which is the name of the tar.gz, minus the extension).
How do I get only the list of timestamps?
As requested: all I wanted to do is delete old backups, so awk was a good idea (at least it was effective), even if I didn't have the right parameters at first. My method for deleting old backups is probably not the best, but it works:
ncftpls *authParams* | awk '{ match($9, /^[0-9]+/, a); print a[0] }' | while read fileCreationDate; do
    # anything older than 10 minutes (600 seconds) is considered expired
    VALIDITY_LIMIT=$(( $(date +%s) - 600 ))
    if [ "$fileCreationDate" -lt "$VALIDITY_LIMIT" ]; then
        deleteFtpFile "$fileCreationDate"
    fi
done
You can use awk to print just the file names (the ninth whitespace-separated field of the long listing shown above), like so:
ncftpls -l | awk '{ print $9 }'
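If you want the bare epoch timestamps rather than the full file names, you can strip the extension in the same awk call (assuming the listing format shown above):
ncftpls -l | awk '{ sub(/\.tar\.gz$/, "", $9); print $9 }'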
I have to store the last 5 test-result folders. Every time the tests are executed, a folder called "target" is created with all the results. The idea is to move that folder and add a counter (e.g. target1, target2, ...), storing the last 5 executions.
It is important that when 5 folders already exist and a new test execution creates another one, the oldest is removed and all the folders are renamed, so that the new one becomes target1 and the previous target4 becomes target5.
For now, I'm just storing the last one:
rm -rf ${WORKSPACE}/target
if [ -d "${WORKSPACE}/tjba-hmi-toolkit/target" ]; then
# Control will enter here if "target" exists.
cp -r ${WORKSPACE}/tjba-hmi-toolkit/target ${WORKSPACE}/
fi
This will work, with the minor glitch that the first copy will get a '0' suffix:
$ cp -rp tmp cptest/tmp$(find cptest -type d -name 'tmp*' | wc -l)
$ ls -l cptest/
total 0
drwxr-xr-x 4 luis users 273 abr 27 14:42 tmp0
drwxr-xr-x 4 luis users 273 abr 27 14:42 tmp1
drwxr-xr-x 4 luis users 273 abr 27 14:42 tmp2
drwxr-xr-x 4 luis users 273 abr 27 14:42 tmp3
drwxr-xr-x 4 luis users 273 abr 27 14:42 tmp4
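If you want the full rotation (drop the oldest copy, shift the rest up, and store the new results as target1), a rough sketch along the lines of your snippet (the paths are assumed from your question):
# keep at most 5 copies; newest is target1, oldest is target5
rm -rf "${WORKSPACE}/target5"
for i in 4 3 2 1; do
    if [ -d "${WORKSPACE}/target$i" ]; then
        mv "${WORKSPACE}/target$i" "${WORKSPACE}/target$((i+1))"
    fi
done
cp -r "${WORKSPACE}/tjba-hmi-toolkit/target" "${WORKSPACE}/target1"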