S3cmd move file and del folder - bash

I'm trying to write a bash script to automate my backup plan. I use a script that creates an S3 folder each day, named after the date, and every hour it uploads a backup into that folder.
Example: /Application1/20130513/dump.01
My backup plan is to keep 2 days of full backups (one per hour) and, for the last 15 days, one backup per day in an S3 folder ("oldbackup").
What is wrong in my script?
#check and clean the S3 bucket
BUCKETNAME='application1';
FOLDERLIST = s3cmd ls s3://$BUCKETNAME
LIMITFOLDER = date --date='1 days ago' +'%Y%m%d'
for f in $FOLDERLIST
do
if [[ ${f} > $LIMITFOLDER && f != "oldbackup" ]]; then
s3cmd sync s3://$BUCKETNAME/$f/dump.rdb.0 s3://$BUCKETNAME/"oldbackup"/dump.rdb.$f
s3cmd del s3://$BUCKETNAME/$f --recursive;
fi
done
OLDBACKUP = s3cmd ls s3://$BUCKETNAME/"oldbackup"
LIMITOLDBACKUP = date --date='14 days ago' +'%Y%m%d'
for dump in $OLDBACKUP
if [${dump} > $LIMITOLDBACKUP]; then
s3cmd del s3://$BUCKETNAME/"oldbackup"/$dump
fi
done
Thanks

First, you will probably want to store FOLDERLIST as an array. You can do so like this: FOLDERLIST=($(command)).
Next, bash does not allow spaces around = in an assignment, and a bare command on the right-hand side is not executed. To capture the output of a command as a string, use command substitution: OUTPUT="$(command)".
So for example your first three lines should look like:
BUCKETNAME="application1"
FOLDERLIST=($(s3cmd ls s3://$BUCKETNAME))
LIMITFOLDER="$(date --date="1 days ago" +"%Y%m%d")"
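Now your first for-loop should work, provided you iterate over the array with for f in "${FOLDERLIST[@]}" (a bare $FOLDERLIST expands to the first element only).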
That's the only thing I can guess is wrong with your script (the second for-loop suffers the same) but you really gave me nothing better to go on.
Your second for-loop, (besides not iterating over a proper array) has no do keyword, so you should do:
for dump in $OLDBACKUP
do
# rest of loop body
done
That could be another issue with your script.
Finally, you're only ever using OLDBACKUP and FOLDERLIST to iterate over. The same can be accomplished just by doing:
for f in $(s3cmd ls s3://$BUCKETNAME)
do
# loop body
done
There's no need to store the output in variables unless you plan to reuse it several times.
As a separate matter, there is no need for all-uppercase variable names. Lowercase names work just as well, and they avoid accidental clashes with environment and shell variables such as PATH, which are uppercase by convention.
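Putting the points above together, here is a rough sketch of the first half of the clean-up. It assumes that each folder line printed by s3cmd ls ends with the folder URL (s3://bucket/NAME/) as its last field and that folder names are YYYYMMDD; folder_names and older_than are made-up helper names. Note that it selects folders that sort before the cutoff, whereas the > in the question selects the newer ones.

```bash
#!/bin/bash
# Print the date-named folders (YYYYMMDD) found in `s3cmd ls` output on stdin.
# Assumption: the last field of each folder line is s3://bucket/NAME/
folder_names() {
    awk '{ print $NF }' | sed -n 's%.*/\([0-9]\{8\}\)/$%\1%p'
}

# Print only the names on stdin that sort before the cutoff date (YYYYMMDD).
older_than() {
    awk -v cutoff="$1" '$0 < cutoff'
}

# Intended use, with the s3cmd calls taken from the question (not run here):
#   bucket=application1
#   limit=$(date --date='1 days ago' +%Y%m%d)
#   s3cmd ls "s3://$bucket" | folder_names | older_than "$limit" |
#   while IFS= read -r f; do
#       s3cmd sync "s3://$bucket/$f/dump.rdb.0" "s3://$bucket/oldbackup/dump.rdb.$f"
#       s3cmd del "s3://$bucket/$f" --recursive
#   done
```

Because the names come out one per line, the "oldbackup" folder is filtered out by the pattern instead of by a string comparison inside the loop.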

Related

Process multiple files one by one dynamically in workflow using indirect file method

My workflow uses 3 indirect files.
The indirect files can have one or more file names.
Let's say all 3 indirect files have 2 file names each.
Indirect_file1 has (file1,file2)
Indirect_file2 has (filea,fileb)
Indirect_file3 has (filex,filey)
My workflow should run in sequence.
First sequence (file1,filea,filex)
Second sequence (file2,fileb,filey)
We are on Linux, so I guess it can be done with a shell script.
Any pointers will be appreciated.
Thanks in Advance.
This should work:
In the Informatica session, change the input type to 'Command'.
In the Informatica session, change the command type to 'Command Generating File List'.
For the first workflow, set the command to cut -d ',' -f1 file (if your delimiter is a comma).
For the second workflow, set the command to cut -d ',' -f2 file (again, if your delimiter is a comma).
You might want to make small work packages first before processing. When the workflow takes a long time it is easier to (re-)start new processes.
You can start with something like this:
# Step 1: move the current set to a temporary folder
combine_dir=/tmp/combine
mkdir -p "${combine_dir}"
mv Indirect_file1 "${combine_dir}"
mv Indirect_file2 "${combine_dir}"
mv Indirect_file3 "${combine_dir}"
# Step 2: construct work packages in another temporary folder
workload_dir=/tmp/workload
mkdir -p "${workload_dir}"
for file in Indirect_file1 Indirect_file2 Indirect_file3; do
    loadnr=1
    # The files were moved in step 1, so read them from ${combine_dir}
    for work in $(grep -Eo '[^(,)]*' "${combine_dir}/${file}"); do
        echo "${work}" >> "${workload_dir}/sequence${loadnr}"
        ((loadnr++))
    done
done
# The sequenceN files now hold one file name per line.
# If you need them as (file1,filea,filex), change the loop above.
# The files are now ready to be processed. Move them to the directory where they will be handled.
# Remember to clean up the temporary files afterwards.
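If the (file1,filea,filex) form is what you need, a shorter route might be paste. This is only a sketch, and it assumes each indirect file holds one file name per line (not the comma-separated form); build_sequences is a made-up name.

```bash
#!/bin/bash
# Print one "(a,b,c)" tuple per run by pairing line N of every list file.
# Assumption: each list file contains exactly one file name per line.
build_sequences() {
    paste -d, "$@" | sed 's/.*/(&)/'
}
```

Called as build_sequences Indirect_file1 Indirect_file2 Indirect_file3, it prints one line per sequence, so the first line is the first run and the second line is the second run.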

How to run a command for folders that start with the word SAM and are inside another folder?

I'm trying to run a command for every folder whose name starts with SAM. These folders sit inside a folder named after a date (the date changes), which sits inside a subject folder such as subject_01 (the subject changes), which in turn sits inside the main folder, root.
Structure:
root/subject/date/SAM_folders
This is the command I want to run; it needs to be executed from the date folder:
dtiConvPrep.sh folder_name
Example:
dtiConvPrep.sh SAM_03_14_25
I created a script:
#!/bin/bash
array=(/root/*/*) #this vector contains all the folders (subject/date)
len=${#array[@]}
for (( q=0; q<$len; q++ ));
do
cd ${array[$q]} #To execute the command from the folder date for each subject
sleep 1
dtiConvPrep.sh SAM*
done
But it only runs for 1 SAM folder in each folder called date for all the subjects.
Any idea how can I solve this problem? Thanks
for dir in /root/*/*/SAM_*; do
(
cd "$(dirname "$dir")"
dtiConvPrep.sh "$(basename "$dir")"
)
done
A for ((i = 0; i < len; ++i)) style loop is a very C-/Java-like thing to do. In Bash it's more idiomatic to iterate over arrays directly. Or in this case, iterate over the glob directly.
I put parentheses around the loop body so the cds run in a subshell and are only temporary. It's not necessary here since you're cding to absolute paths, but it's a good habit to get into. I like to avoid cding in the middle of scripts as it changes global state in an easy to mess up way.
You may find all the double quotes a bit of an eyesore but it's a good habit to always quote variable expansions and $(...) expansions in case they contain whitespace or other special characters. In this script we need nested quotes to be 100% safe.
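For what it's worth, the same loop can be wrapped in a function with two small guards added. This is a sketch, not a drop-in replacement: run_per_folder is a made-up name, the trailing slash in the glob is there so that only directories match, and the -d test skips the case where the glob matches nothing at all.

```bash
#!/bin/bash
# Run a command once per SAM_* folder, from inside that folder's parent.
# Usage: run_per_folder <root> <command>
run_per_folder() {
    local root="$1" cmd="$2" dir
    for dir in "$root"/*/*/SAM_*/; do
        [ -d "$dir" ] || continue       # glob matched nothing
        dir=${dir%/}                    # drop the trailing slash again
        (
            # Subshell: the cd does not leak into the next iteration.
            cd "$(dirname "$dir")" || exit
            "$cmd" "$(basename "$dir")"
        )
    done
}
```

With the layout from the question, run_per_folder /root dtiConvPrep.sh should call dtiConvPrep.sh once for each SAM folder, from inside the matching date folder.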

Bash - File name change Date + 1

I have around 500 files that I need to rename with the date the report represents. The filenames are currently:
WUSR1453722998383.csv
WUSR1453723010659.csv
WUSR1453723023497.csv
And so on. The numbers in the filename have nothing to do with the date, so I cannot use the filename as a guide for what the file should be renamed to. The reports start from 02/12/2014 and there is a report for every day of the month up until yesterday (09/04/2016). Luckily as well the filename is sequential - so 04/12/2014 will have a higher number than 03/12/2014 which will have a higher number than 02/12/2014. This means the files are automatically listed in alphabetical order.
There is however a date in the first line of the CSV before the data:
As at Date,2014-12-02
I've already checked that I have all the files, so what's the best way to rename them to the date? I can either set the starting date as 02/12/2014 and rename each file with the date incremented by one day, or the script can read the date on the first line of the file (As at Date,2014-12-02 for example) and use that date to rename the file.
I have no idea how to write either of the method above in bash, so if you could help out with this, that would be really appreciated.
In terms of file output, I was hoping for:
02-12-2014.csv
03-12-2014.csv
And so on
Is this what you need? It assumes all the files are in the current directory. Test it before running it for real. It also assumes that the date in every CSV file is unique; otherwise some files will be overwritten.
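#!/bin/bash
for f in *.csv
do
    # The first line looks like "As at Date,2014-12-02"; reorder it to 02-12-2014.csv
    o=$(sed '1q' "$f" | awk -F"[,-]" '{print $NF"-"$(NF-1)"-"$(NF-2)".csv"}')
    # should we back up the file first?
    # cp "$f" "${f}.bak"
    mv -- "$f" "$o"
done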
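If you are worried about overwriting, a more defensive variant could look something like the sketch below. It assumes the first line is exactly "As at Date,YYYY-MM-DD" (a trailing carriage return is tolerated) and that the reports are named WUSR*.csv as in the question; target_name and rename_reports are made-up names.

```bash
#!/bin/bash
# Print DD-MM-YYYY.csv for a report, or nothing if the first line
# does not look like "As at Date,YYYY-MM-DD".
target_name() {
    head -n 1 < "$1" | tr -d '\r' |
        sed -n 's/^As at Date,\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)$/\3-\2-\1.csv/p'
}

# Rename every WUSR*.csv in a directory, never overwriting an existing file.
rename_reports() {
    local dir="$1" f new
    for f in "$dir"/WUSR*.csv; do
        [ -f "$f" ] || continue
        new=$(target_name "$f")
        if [ -z "$new" ]; then
            echo "skipping $f: no date on first line" >&2
            continue
        fi
        if [ -e "$dir/$new" ]; then
            echo "skipping $f: $new already exists" >&2
            continue
        fi
        mv -- "$f" "$dir/$new"
    done
}
```

Running rename_reports . in the report directory leaves any file it cannot date, or whose target name is already taken, untouched and reports it on stderr.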

Tool for launching command on files with same date

I have several sets of files (several hundred files each). In each set, every file is related to a date (year/month/day) that is encoded in the file name. I want to execute a command that takes as input one file from each set for a particular date.
Since files are sometimes missing in some sets, I want to launch the command only when all sets contain a file for a particular date.
I would like to know if there is an existing tool (command line or otherwise) that can do this kind of thing. I searched but could not find anything.
The use of date as key for files is not mandatory. I guess that any tool that is generic enough will provide a way to specify the key as a parameter.
Edit:
There are fewer than 10 sets, but each contains several hundred files.
Each set is located in a separate directory.
Since this question is tagged with bash, here is a bash script that checks whether each of the given sets contains a file whose name includes some date string (passed as the first argument of the script). If every set has one, some_command is executed:
#!/bin/bash
datestr=$1
all_exist=Y
for set in dir1 dir2 dir3 dir4
do
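# compgen -G succeeds if the glob matches at least one path; unlike
# [ -f glob ], it does not break when several files match.
compgen -G "$set/*$datestr*" > /dev/null || all_exist=""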
done
[ "$all_exist" ] && some_command
So this can really be divided into two tasks:
Find dates for which a set exists
Launch a command on each set
You are not revealing how your files are organized, but if you have something like boom20140112/a.txt and boom20140112/b.txt forming one set, and foo20140111/a.txt and foo20140111/c.txt another, you can find the dates with
dates () {
    printf "%s\n" *201[0-9][0-9][0-9][0-9][0-9]/. |
    sed -e 's%^[^0-9]*%%' -e 's%/\.$%%' |
    sort -u
}
If your sets look different, you can probably adapt this. The general idea is to obtain a list of pertinent file names, strip out the parts which aren't the date, and remove any duplicates. Now you have a list of dates.
Here is another implementation which assumes that you have files named tags/tags_(date)_a.txt and tags/tags_(date)_b.txt and input/samples_(date).txt forming one set per date, where date is formatted like 2014-01-12.
dates () {
printf "%s\n" input/* tags/* |
sed 's/.*_\(201[1-9]-[0-9][0-9]-[0-9][0-9]\)[_.].*/\1/' |
sort -u
}
Given that, loop over the dates and run your command on each set.
dates | while read -r date; do
    # some_command is a placeholder for your own command
    some_command *"$date"/*
done
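To cover the "only when all sets contain a file" requirement with one directory per set, one possible sketch is to count in how many directories each date occurs. It assumes the date appears in the file names as YYYYMMDD starting with 20, and common_dates is a made-up name.

```bash
#!/bin/bash
# Print the dates (YYYYMMDD) that occur in the file names of every given directory.
# Assumption: dates look like 20YYMMDD somewhere in the file name.
common_dates() {
    local dir
    for dir in "$@"; do
        # One line per distinct date found in this directory.
        ( cd "$dir" && printf '%s\n' * ) | grep -oE '20[0-9]{6}' | sort -u
    done |
    sort | uniq -c |
    # Keep only the dates counted once per directory, i.e. present in all of them.
    awk -v n="$#" '$1 == n { print $2 }'
}
```

The output can then drive the loop, for example common_dates dir1 dir2 dir3 | while read -r d; do some_command dir1/*"$d"* dir2/*"$d"* dir3/*"$d"*; done, where dir1 to dir3 stand for your own set directories.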

bash not adding current date to file name

I have a bash script that backs up my source code every 10 minutes via crontab. The script worked until the end of August but has not worked since September 1st. This is the script:
#!/bin/sh
date=`date +%e-%m-%y`
cd /home/neky/python
tar -zcf lex.tar.gz lex/
echo $date
mv lex.tar.gz lex-$date.tar.gz
mv lex-$date.tar.gz /home/neky/Dropbox/lex/lex-$date.tar.gz
If I execute it manually, it prints the current date, 4-09-12, followed by this error: mv: target ‘4-09-12.tar.gz’ is not a directory
What could be the problem?
Your date contains a space when the day of month is a single digit (which also explains why it only stopped working in the new month). That results in your command being split up, i.e.
# this is what you end up with
mv lex.tar.gz lex- 4-09-12.tar.gz
Use date +%d-%m-%y instead which will give you 04-09-12 (note %d instead of %e).
If you really want a space in the name, you'll need to quote your variables, i.e.:
mv lex.tar.gz "lex-$date.tar.gz"
mv "lex-$date.tar.gz" /home/neky/Dropbox/lex/
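Also note that % (part of your date format) is a special character in crontab entries: cron turns an unescaped % in the command field into a newline. That does not affect a date call inside the script itself, but if you put the date format directly in the crontab line you need to escape it as \%.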
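A small sketch of the naming step with both fixes applied (zero-padded day and quoted expansion). It relies on GNU date for --date, and archive_name is a made-up name.

```bash
#!/bin/sh
# Build the archive name for a given day (defaults to today).
# %d zero-pads the day of month; %e would pad it with a space instead.
archive_name() {
    day="${1:-today}"
    printf 'lex-%s.tar.gz\n' "$(date --date="$day" +%d-%m-%y)"
}
```

In the script this would be used as mv lex.tar.gz "/home/neky/Dropbox/lex/$(archive_name)", with the quotes kept so the name stays a single argument whatever it contains.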
