Get date from filename and sort into folders? - bash

I'm running wget to get data from an FTP server like this:
wget -r -nH -N --no-parent ftp://username:password@example.com/ -P /home/data/
All of the files are in a format similar to this:
2016_07_10_bob-randomtext.csv.gz
2016_07_11_joe-importantinfo.csv.gz
Right now it's putting all of these files into /home/data/.
What I want to do is get the date from the filename and put each file into its own folder based on that date. For example:
/home/data/2016_07_10/2016_07_10_bob-randomtext.csv.gz
/home/data/2016_07_11/2016_07_11_joe-importantinfo.csv.gz
Based on the answers here, it is possible to get the date from a file name. However, I'm not really sure how to turn that into a folder automatically...
Sorry if this is a bit confusing. Any help or advice would be appreciated.

Keeping the download of all the files into one directory, /home/files:
destination=/home/data
for filename in /home/files/*; do
    if [[ -f "$filename" ]]; then       # ignore it if it's a directory (not a file)
        name=$(basename "$filename")
        datedir=$destination/${name:0:10}   # first 10 characters of the filename
        mkdir -p "$datedir"                 # create the directory if it doesn't exist
        mv "$filename" "$datedir"
    fi
done
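If stray files that don't follow the naming scheme might land in /home/files, a slightly stricter variant (a sketch, not from the original answer) can check for a YYYY_MM_DD prefix before moving, so odd files don't create junk directories:

destination=/home/data
for filename in /home/files/*; do
    [[ -f "$filename" ]] || continue
    name=$(basename "$filename")
    # only move files whose names really start with a YYYY_MM_DD date
    if [[ $name =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2} ]]; then
        datedir="$destination/${name:0:10}"
        mkdir -p "$datedir"
        mv "$filename" "$datedir"
    fi
done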

Related

shell script - Download files with wget only when file name is in my list

I want to download a lot of files from a server with wget, but a file should only be kept when its name is in a given list. Otherwise wget should skip that file and move on to the next one.
I tried the following:
#!/bin/bash
etsienURL="http://www.etsi.org/deliver/etsi_en"
etsitsURL="http://www.etsi.org/deliver/etsi_ts"
listOfStandards=("en_302571" "en_3023630401" "en_3023630501" "en_3023630601" "en_30263702" "en_30263703" "en_302663" "en_302931" "ts_10153901" "ts_10153903" "ts_1026360501" "ts_1027331" "ts_10286801" "ts_10287103" "ts_10289401" "ts_10289402" "ts_102940" "ts_102941" "ts_102942" "ts_102943" "ts_103097" "ts_10324601" "ts_10324603")
wget -r -nd -nc -e robots=off -A.pdf "$etsienURL"
wget -r -nd -nc -e robots=off -A.pdf "$etsitsURL"
for file in *.pdf
do
    relevant=false
    for t in "${listOfStandards[@]}"
    do
        if [[ $(basename "$file" .pdf) == *"$t"* ]]
        then
            relevant=true
            break
        fi
    done
    if [ "$relevant" == false ]
    then
        rm "$file"
    fi
done
With this code, all files are downloaded first. After the download, the script checks whether the filename (or part of it) is in the list, and deletes the file if it isn't. But this costs a lot of disc space. I want to download a file only if its name contains one of the list items.
Perhaps somebody can help me find a solution.
Found the solution: I had forgotten the --no-parent option for wget.
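That said, another way to avoid downloading unwanted files in the first place is to build wget's -A accept list from the array, since -A takes comma-separated file name patterns. A minimal sketch (untested, and assuming the pattern list stays short enough for one command line):

# build patterns like "*en_302571*.pdf" and join them with commas
patterns=("${listOfStandards[@]/#/*}")    # prefix each entry with *
patterns=("${patterns[@]/%/*.pdf}")       # suffix each entry with *.pdf
acceptList=$(IFS=,; printf '%s' "${patterns[*]}")
wget -r -nd -nc --no-parent -e robots=off -A "$acceptList" "$etsienURL"
wget -r -nd -nc --no-parent -e robots=off -A "$acceptList" "$etsitsURL"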

How to rename wget-downloaded files sequentially?

Let's say I am downloading image files from a website with wget.
wget -H -p -w 2 -nd -nc -A jpg,jpeg -R gif "forum.foo.com/showthread.php?t=12345"
There are 20 images on that page. When downloaded, the images are saved under their original file names.
I want to rename the first image downloaded by wget as
001-original_filename.jpg, the second one as 002-original_filename.jpg, and so on..
What to do? Is bash or curl needed for this?
Note: I am on Windows.
If you have bash installed, run this after the files are downloaded:
i=1
ls -crt | while read file; do
    newfile=$(printf "%.3d-%s\n" $i "$file")
    mv "$file" "$newfile"
    i=$((i+1))
done
ls -crt: list files sorted by time stamp, oldest first (-t sorts by time, -c uses the status-change time, -r reverses the order).
%.3d in printf zero-pads the number to 3 digits.
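One caveat: read without -r mangles backslashes, and the pipe runs the loop in a subshell. A hedged variant using process substitution (bash-specific, and still assuming file names without embedded newlines, since it parses ls output):

i=1
while IFS= read -r file; do
    mv -- "$file" "$(printf '%.3d-%s' "$i" "$file")"
    i=$((i+1))
done < <(ls -crt)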

Shell script to move, archive and rename files

I am trying to move a bunch of files into a new directory, "archive", then zip all these files and rename the zip to "DD-MM-YYYY - DD-MM-YYYY", where the second DD-MM-YYYY is 7 days ahead of the first.
This is what I have done so far,
CURRDATEforARCHIVE=`date +%Y-%m-%d`
mv /Source/path /Destination/path/inbound/
mv /Destination/path/inbound /Destination/path/$CURRDATEforARCHIVE
cd /Destination/path/
zip -r $CURRDATEforARCHIVE.zip $CURRDATEforARCHIVE
rm -rf /Destination/path/$CURRDATEforARCHIVE
mkdir /Source/path/inbound
But I think my implementation is rather clunky, not very clean. Is there a more "streamlined" manner to achieve it?
The easiest way to achieve the second date is to use date -d with a relative time offset. Just create a second variable with something like:
#!/bin/sh
CURRDATEforARCHIVE=`date +%Y-%m-%d`
FUTUREforARCHIVE=`date -d "now + 7 days" "+%Y-%m-%d"`
echo "file_${CURRDATEforARCHIVE}-${FUTUREforARCHIVE}.zip"
Output
$ sh sevendays.sh
file_2015-07-12-2015-07-19.zip
You should also verify that you successfully created the zip file before removing the sources:
zip -r "$CURRDATEforARCHIVE.zip" "$CURRDATEforARCHIVE"
if [ -f "$CURRDATEforARCHIVE.zip" ]; then
    rm -rf "/Destination/path/$CURRDATEforARCHIVE"
else
    printf "error: zip file creation failed '%s'\n" "$CURRDATEforARCHIVE.zip"
fi
Note: always "quote" your variables to protect against spaces in file names, etc.
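Alternatively (a sketch, not from the original answer), you can branch on zip's exit status directly, which also catches the case where an old archive exists but the new run failed:

if zip -r "$CURRDATEforARCHIVE.zip" "$CURRDATEforARCHIVE"; then
    rm -rf "/Destination/path/$CURRDATEforARCHIVE"
else
    printf "error: zip failed for '%s'\n" "$CURRDATEforARCHIVE" >&2
fi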

bash - For every file in a directory, copy it into another directory, only if it doesn't exist there already

Thank you very much in advance for helping!
I have a directory with some html files
$ ls template/content/html
devel.html
idex.html
devel_iphone.html
devel_ipad.html
I'd like to write a bash function to copy every file in that folder into a new location (introduction/files/), ONLY if a file with the same name doesn't exist already there.
This is what I have so far:
orig_html="template/content/html";
dest_html="introduction/files/";
function add_html {
    for f in $orig_html"/*";
    do
        if [ ! -f SAME_FILE_IN_$dest_html_DIRECTORY ];
        then
            cp $f $dest_html;
        fi
    done
}
The capital letters are where I was stuck.
Thank you very much.
Would the -n option be enough for your needs?
-n, --no-clobber
do not overwrite an existing file (overrides a previous -i option)
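With the paths from the question, that would be (assuming GNU cp, since -n is not in POSIX):

cp -n template/content/html/* introduction/files/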
use rsync like this:
rsync -c -avz --delete $orig_html $dest_html
which keeps $dest_html identical to $orig_html based on file checksums.
Do you need a bash script? cp supports the -r (recursive) option and the -u (update) option. From the man page:
-u, --update
copy only when the SOURCE file is newer than the destination
file or when the destination file is missing
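For this question that could look like the following (again GNU cp; note that -u overwrites files that are older at the destination, which differs slightly from "never overwrite"):

cp -ru template/content/html/. introduction/files/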
Your $f variable contains the full path, because of the /*.
Try doing something like:
for ff in "$orig_html"/*
do
    thisFile=${ff##*/}
    if [ ! -f "${dest_html}/$thisFile" ]; then
        cp "$ff" "${dest_html}"
    fi
done

Recycle bin in bash problem

I need to make a recycle bin code using bash. Here is what I have done so far. My problem is that when I move a file with the same name into the trash folder it just overwrites the previous file. Can you give me any suggestions on how to approach this problem?
#!/bin/bash
mkdir -p "$HOME/Trash"
if [ "$1" = -restore ]; then
    while read file; do
        mv $HOME/Trash/$2 /$file
    done < try.txt
else
    if [ "$1" = -restoreall ]; then
        mv $HOME/Trash/* /$PWD
    else
        if [ "$1" = -empty ]; then
            rm -rfv /$HOME/Trash/*
        else
            mv $PWD/"$1" /$HOME/Trash
            echo -n "$PWD" >> /$HOME/Bash/try
        fi
    fi
fi
You could append the timestamp of the time of deletion to the filename in your Trash folder. Upon restore, you could strip this off again.
To add a timestamp to your file, use something like this:
DT=$(date +'%Y%m%d-%H%M%S')
mv $PWD/"$1" "/$HOME/Trash/${1}.${DT}"
This will, e.g., create a file like initrd.img-2.6.28-11-generic.20110615-140159 when moving initrd.img-2.6.28-11-generic.
To get the original filename, strip everything starting from the last dot, like with:
NAME_WITHOUT_TIMESTAMP=${file%.*-*}
The pattern is on the right side after the percentage char. (.* would also be enough to match.)
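Putting the two pieces together, a hedged restore sketch (assuming every file in Trash carries the .YYYYMMDD-HHMMSS suffix added above, and restoring into the current directory for simplicity):

for file in "$HOME/Trash"/*; do
    base=$(basename "$file")
    original=${base%.*-*}    # strip the ".YYYYMMDD-HHMMSS" suffix
    mv "$file" "$PWD/$original"
done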
Take a look at how trash-cli does it. It's written in Python and uses the same trash bin as desktop environments. trash-cli is available in at least the big Linux distributions.
http://code.google.com/p/trash-cli/
Probably the easiest thing to do is simply add -i to the invocation of mv. That will prompt the user whether or not to replace. If you happen to have access to GNU cp (e.g., on Linux), you could use cp --backup instead of mv.
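For example, GNU mv can also keep numbered backups instead of prompting (an illustration, assuming GNU coreutils):

mv --backup=numbered "$PWD/$1" "$HOME/Trash/"
# if Trash already holds "report.txt", the old copy is renamed to
# "report.txt.~1~" and the incoming file takes the plain name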
