I want to list specific files (files uploaded yesterday) from an Amazon S3 bucket.
Then I want to loop over this list and, for every single element of the list, unzip the file.
My code is:
for file in s3cmd ls s3://my-bucket/`date +%Y%m%d -d "1 day ago"`*
do s3cmd get $file
arrIN=(${file//my-bucket\//})
gunzip ${arrIN[1]}
done
So basically arrIN=(${file//my-bucket\//}) explodes my string and allows me to retrieve the name of the file I want to unzip.
The thing is, files are downloading but nothing is being unzipped, so I tried:
for file in s3cmd ls s3://my-bucket/`date +%Y%m%d -d "1 day ago"`*
do s3cmd get $file
echo test1
done
Files are being downloaded but nothing is being echoed. The loop only works for the first line...
You need to use command substitution to iterate over the result of the desired s3cmd ls command.
for file in $(s3cmd ls s3://my-bucket/$(date +%Y%m%d -d "1 day ago")*); do
However, this isn't the preferred way to iterate over the output of a command, since in theory the results could contain whitespace. See Bash FAQ 001 for the proper method.
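For reference, a sketch of that more robust pattern, reading the listing line by line; it assumes GNU date and that s3cmd ls prints date, time, size, and the s3:// URI as whitespace-separated columns:
s3cmd ls "s3://my-bucket/$(date +%Y%m%d -d '1 day ago')"* |
while read -r _date _time _size uri; do
    s3cmd get "$uri"
    gunzip "${uri##*/}"    # local file name = URI with everything up to the last / stripped
done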
Related
I have a directory of files that I need to rename to a given string with a timestamp.
I use the following code:
for file in itvl_*
do
mv "$file" "Interval_$(stat -c %Y "$file" | date +%Y%m%d%H%M%S).Interval_001"
done
When I run the script in a directory of files that fit the given mask itvl_*, it removes all but one or two of the files and then successfully renames the last file in the group.
What might be happening here?
It doesn't delete all the files, it simply renames all of them to the same target name.
You can see it by running:
for file in itvl_*
do
echo "Interval_$(stat -c %Y "$file" | date +%Y%m%d%H%M%S).Interval_001"
done
The result is that they overwrite one another and only the last one survives.
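The root cause is that stat -c %Y "$file" | date +%Y%m%d%H%M%S pipes the mtime into date, but date ignores its standard input and just prints the current time, which is the same for every file. A rough fix, assuming GNU stat and date (and that no two files share the same mtime second):
for file in itvl_*
do
    # pass the epoch mtime to date with -d @..., instead of piping it (date ignores stdin)
    ts=$(date -d "@$(stat -c %Y "$file")" +%Y%m%d%H%M%S)
    mv "$file" "Interval_${ts}.Interval_001"
done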
I have files with the following format in a directory:
SLS20160112.001 (20160112 stands for YYYYMMDD)
I wish to archive all of the previous month's files, for example:
SLS20160201.001
SLS20150201.001
SLS20160107.001
SLS20160130.001
For the files listed above, I will archive SLS20160107.001 and SLS20160130.001 because the filename stamps them as January.
SLS20160201.001 still remains, as I only want to archive the previous month's files. I can only extract the date from the filename, not the mdate or adate.
My current logic is to loop through all the files, pick out the previous month's files, then pipe the filenames out and tar them. But I'm not sure how to do that part.
for file in SLS*; do
f="${file%.*}"
GET PREVIOUS MONTH FILES AND THEN ECHO
done | tar cvzf SlSBackup_<PREVIOUS_MONTH>.TAR.GZ -T-
It looks like you want to solve the problem with a shell script. I do a lot of work on a Mac, so I use csh/tcsh (the default shell on older OS X releases), and my answer will be a csh/tcsh script. You can either translate it to bash (your shell) or easily spawn a new shell by just typing $ tcsh.
You can write a small shell script that filters the file list for your desired month.
#!/bin/csh -f
set mon_wanted = '01'
foreach file (`ls -1 SLS*.*`)
  # the month is characters 8-9 of the name (SLSyyyymmdd.001)
  set mon = `echo $file | awk '{print substr($0, 8, 2)}'`
  if ($mon != $mon_wanted) continue
  echo -n $file ' '
end
Let's say the script is saved as foo.csh. Make it executable with:
$ chmod 755 foo.csh
Then,
$ tar cvzf out.tar.gz `./foo.csh`
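Since the question is in bash, here is also a rough bash translation of the same idea, wired into the tar pipeline from the question (it assumes GNU date; the archive name is just illustrative):
#!/bin/bash
# previous month as YYYYMM, e.g. 201601 when run in February 2016
prev=$(date -d "$(date +%Y-%m-01) -1 month" +%Y%m)

for file in SLS*; do
    # characters 4-9 of the name are YYYYMM (SLSyyyymmdd.001)
    [ "${file:3:6}" = "$prev" ] && echo "$file"
done | tar cvzf "SLSBackup_${prev}.tar.gz" -T -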
I have a folder filled with ~300 files, named in the form username#mail.com.pdf. I have a list of usernames (saved in a file called names.txt), one username per line, and I need about 40 of the files. I would like to copy the files I need into a new folder that contains only those.
Each line of names.txt contains the username only (e.g., eternalmothra), and the PDF file I want to copy over is named eternalmothra#mail.com.pdf.
while read p; do
ls | grep $p > file_names.txt
done <names.txt
This seems like it should read from the list and, for each line, turn the username into username#mail.com.pdf. Unfortunately, only the last one is saved to file_names.txt.
The second part of this is to copy all the files over:
while read p; do
mv $p foldername
done <file_names.txt
(I haven't tried that second part yet because the first part isn't working).
I'm doing all this with Cygwin, by the way.
1) What is wrong with the first script that it won't copy everything over?
2) If I get that to work, will the second script correctly copy them over? (Actually, I think it's preferable if they just get copied, not moved over).
Edit:
I would like to add that I figured out how to read lines from a txt file from here: Looping through content of a file in bash
Solution from a comment: Your problem is just that echo a > b overwrites the file, while echo a >> b appends to it, so replace
ls | grep $p > file_names.txt
with
ls | grep $p >> file_names.txt
There might be more efficient solutions if the task runs every day, but for a one-shot of 300 files your script is good.
Assuming you don't have file names with newlines in them (in which case your original approach would not have a chance of working anyway), try this.
printf '%s\n' * | grep -f names.txt | xargs cp -t foldername
The printf is necessary to work around the various issues with ls; passing the list of all the file names to grep in one go produces a list of all the matches, one per line; and passing that to xargs cp performs the copying. (To move instead of copy, use mv instead of cp, obviously; both support the -t option so as to make it convenient to run them under xargs.) The function of xargs is to convert standard input into arguments to the program you run as the argument to xargs.
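Alternatively, since each line of names.txt maps to exactly one file name, you could skip the intermediate file_names.txt entirely; a minimal sketch, assuming the username#mail.com.pdf naming scheme from the question:
mkdir -p foldername
while read -r p; do
    cp "${p}#mail.com.pdf" foldername/    # use mv instead of cp if you really want to move
done < names.txt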
I'm trying to automate some processes by using the Hazel app to move a file to a specific folder, execute a shell script on any CSV in that folder, and then move it to another folder. Right now the part I'm working on is the shell script. I have been testing out the cut command in the terminal on CSVs, but I'm not sure if that's the same thing as a shell script, since it doesn't seem to be working. What I have is:
cut -d',' -f2,12 test.csv > campaigns-12-31-13.csv
It looks for test.csv, but I would like it to work with any CSV. It also exports with the date 12-31-13 hard-coded, but I'm trying to get it to export with whatever yesterday's date was.
How do I convert this to a shell script that will run on any CSV in the folder and add yesterday's date to the end of the filename?
You can try the following script:
#! /bin/bash
saveDir="saveCsv"
dd=$(date +%Y-%m-%d -d "yesterday")
for file in *.csv ; do
bname=$(basename "$file" .csv)
saveName="${saveDir}/${bname}-${dd}.csv"
cut -d',' -f2,12 "$file" > "$saveName"
done
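Two caveats: the script assumes the saveCsv directory already exists, and -d "yesterday" is GNU date syntax. Since Hazel implies macOS, whose BSD date has no -d option, the equivalent setup there would be something like:
mkdir -p saveCsv                # make sure the output directory exists
dd=$(date -v-1d +%Y-%m-%d)      # BSD/macOS date: yesterday's date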
I get 5 files every day (via wget), saved to /tmp, to be loaded into HDFS in a bash script.
donaldDuck-2013-07-20.zip
mickeyMouse-2013-07-20.zip
goofyGoof-2013-07-20.zip
plutoStar-2013-07-20.zip
bigBadWolf-2013-07-20.zip
The date part of the filename is dynamic.
How do I then tell Hadoop to load each of the 5 files? I heard something about a loop.
for file in /tmp/*; do
echo "Running ${file##*/} ...."
done
Do I replace the echo line with the "hadoop fs -put..." statement? What will it look like?
You can do something like:
#!/bin/bash
when=$(date "+%Y-%m-%d") #output like 2013-07-23
names=(donaldDuck mickeyMouse goofyGoof plutoStar bigBadWolf)
for file in "${names[@]}"
do
    ls -l "$file-$when.zip"   # output like donaldDuck-2013-07-23.zip
done
Explanation
The names are stored in an array, names. Hence, we can loop through it with for file in "${names[@]}". Meanwhile, the date is stored in $when, so the filename format is matched with $file-$when.zip.
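To answer the last part of the question directly: yes, the ls -l line is just a placeholder for the upload; a sketch with hadoop fs -put in its place (the HDFS target path here is made up):
#!/bin/bash
when=$(date "+%Y-%m-%d")                    # e.g. 2013-07-23
names=(donaldDuck mickeyMouse goofyGoof plutoStar bigBadWolf)
for file in "${names[@]}"
do
    hadoop fs -put "/tmp/$file-$when.zip" /path/in/hdfs/    # /path/in/hdfs/ is a placeholder
done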
Here is what I would do:
hdfsdir=/path/to/hdfs/output/dir
datethru=`date "+%Y-%m-%d" --date="3 days ago"` # replace by how many days ago you want
for i in `ls /tmp/*-$datethru.zip`; do
hadoop fs -put $i $hdfsdir
done
This will essentially grab all the files in your directory that contain a specific date and end in .zip, and upload each of these files to a specific directory in hdfs.
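If you would rather not parse ls output (see the Bash FAQ note earlier on this page), a glob does the same job; a minimal variant under the same assumptions:
hdfsdir=/path/to/hdfs/output/dir
datethru=$(date "+%Y-%m-%d" --date="3 days ago")   # replace by how many days ago you want
for i in /tmp/*-"$datethru".zip; do
    hadoop fs -put "$i" "$hdfsdir"
done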