Choose folder from tree of folders in S3 using Bash script

I have files which exist under an S3 path like below:
s3://tttt/2018/11/01/02 --> (tttt=S3 bucket, 2018=year, 11=month, 01=day, 02=hour)
New files are inserted into S3 all the time, as in the example below:
s3://tttt/2018/10/01/01/ls.s3.4cede5e7d25c.2018-10-01T01.00.tag_lre.txt.gz
s3://tttt/2018/10/01/01/ls.s3.4cede5e7d25c.2018-10-01T01.00.tag_lre.txt.gz
s3://tttt/2018/10/01/02/ls.s3.4cede5e7d25c.2018-10-01T02.00.tag_lre.txt.gz
I would like to pick up via Bash script:
1. max year
2. max month
3. max day
4. max hour
The script that I built looks like this (but does not work well):
#!/bin/bash
result=`aws s3 ls "s3://tttt/2018/" | awk '{print $2}' |tail -n 1`
result1=`aws s3 ls "s3://tttt/2018/${result:0:2}/" | awk '{print $NF-1}' |tail -n 1`
result2=2018/${result:0:2}/${result1:0:2}/
Any ideas how to write it?

You did not show the output of aws s3 ls "s3://tttt/2018/" | tail -1.
With
result="s3://tttt/2018/10/01/01/ls.s3.4cede5e7d25c.2018-10-01T01.00.tag_lre.txt.gz"
you can do
IFS=/ read -r s3 empty bucket year month day hour file <<< "${result}"
result:
set | egrep "bucket=|year=|month=|day=|hour="
bucket=tttt
day=01
hour=01
month=10
year=2018
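To actually pick the latest year/month/day/hour prefix, a minimal sketch (assuming the bucket contains only the zero-padded year/month/day/hour layout from the question) could descend the tree one level at a time, always keeping the lexicographically last entry:
#!/bin/bash
bucket="s3://tttt"
prefix=""
# At each level `aws s3 ls` prints lines like "PRE 11/"; field 2 is the folder name.
# Because the components are zero-padded, the lexicographically last entry is also the newest.
for level in year month day hour; do
    latest=$(aws s3 ls "${bucket}/${prefix}" | awk '{print $2}' | sort | tail -n 1)
    prefix="${prefix}${latest}"   # e.g. "2018/", then "2018/11/", ...
done
echo "Latest prefix: ${bucket}/${prefix}"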

Related

how to calculate percentage difference with cmp command

I am aware of the cmp command in Linux, which is used to do a byte-by-byte comparison. Could we build upon this to get a percentage difference?
Example: I have two files, a1.jpg and a2.jpg.
When I compare these two files using cmp, could I get the percentage of difference between them?
Example: a1.jpg has 1000 bytes and a2.jpg has 1021 (taking the bigger file as reference).
So I could get the percentage difference between the two files, i.e. number of bytes differing / total bytes in the larger file.
Looking for a shell script snippet. Thanks in advance.
You could create a script file with the following content; let us call this file percmp.sh:
#!/bin/sh
# Number of differing bytes (cmp -l prints one line per differing byte)
DIFF=`cmp -l "$1" "$2" | wc -l`
SIZE_A=`wc -c "$1" | awk '{print $1}'`
SIZE_B=`wc -c "$2" | awk '{print $1}'`
if [ "$SIZE_A" -gt "$SIZE_B" ]
then
  MAX=$SIZE_A
else
  MAX=$SIZE_B
fi
echo "$DIFF/$MAX*100" | bc -l
Be sure that it is saved with Linux encoding (LF line endings).
Then you run it with the two file names as arguments. For example, assuming percmp.sh and the two files are in the same folder, you run the command:
sh percmp.sh FILE1.jpg FILE2.jpg
Otherwise you specify the full path of both the script and the files.
The code does exactly what you need. For reference:
#!/bin/sh tells how to interpret the file
cmp -l lists all the differing bytes
wc -l counts the number of rows (in the code: the length of the list of differing bytes -> the number of differing bytes)
wc -c gives the size of a file
awk does the text parsing (to get ONLY the size of the file)
-gt means Greater Than
bc -l performs the piped-in division
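As a rough worked example with made-up numbers: if cmp -l lists 21 differing byte positions and the larger file is 1021 bytes, the script feeds 21/1021*100 to bc and prints approximately 2.06.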
Hope I helped!

How to list files(with spaces) in s3 bucket using shell script (bash)?

I am listing all the files in an S3 bucket and writing them to a text file. For example, my bucket has the following list of files:
text.zip
fixed.zip
hello.zip
good test.zip
I use the following code:
fileList=$(aws s3 ls s3://$inputBucketName/ | awk '{print $4}')
if [ ! -z "$fileList" ]
then
$AWS_CLI s3 ls s3://$inputBucketName/ | awk '{print $1,$2,$4}' > s3op.txt
sort -k1,1 -k2 s3op.txt > s3op_srt.txt
awk '{print $3}' s3op_srt.txt > filesOrder.txt
fi
cat filesOrder.txt;
After this, I iterate over the files from the file I created (I delete each file in S3 at the end of the loop, so it won't be processed again):
fileName=`head -1 filesOrder.txt`
the files are listed like below:
text.zip
fixed.zip
hello.zip
good
So the problem is that the listing does not handle file names with spaces correctly.
As the file name is returned as "good" and not as "good test.zip", the script is not able to delete the file from S3.
Expected Result is
text.zip
fixed.zip
hello.zip
good test.zip
I used the following command to delete files in S3:
aws s3 rm s3://$inputBucketName/$fileName
Put the full file path in double quotes.
For example:
aws s3 rm "s3://test-bucket/good test.zip"
In your case, it would be:
aws s3 rm "s3://$inputBucketName/$fileName"
Here, even if fileName has spaces, the file will be deleted.
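Note that the quoting alone does not fix the listing step: awk '{print $4}' (and '{print $1,$2,$4}') still drops everything after the first space in the key. A rough sketch, assuming the default date/time/size/key columns of aws s3 ls output, that keeps the whole key and then iterates over it safely:
#!/bin/bash
# Sort by date and time (columns 1 and 2), then strip the first three
# columns so the rest of the line - the object key, spaces included -
# is what ends up in the list.
aws s3 ls "s3://$inputBucketName/" \
  | sort -k1,1 -k2,2 \
  | sed 's/^[^ ]* *[^ ]* *[^ ]* *//' > filesOrder.txt

# Read line by line so names with spaces stay intact.
while IFS= read -r fileName; do
    [ -n "$fileName" ] && aws s3 rm "s3://$inputBucketName/$fileName"
done < filesOrder.txt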

Using the first column of a file as input in a script

I am having some problems with using the first column ${1} as input to a script.
Currently the relevant portion of the script looks like this.
#!/bin/bash
INPUT="${1}"
for NAME in `cat ${INPUT}`
do
SIZE="`du -sm /FAServer/na3250-a/homes/${NAME} | sed 's|/FAServer/na3250-a/homes/||'`"
DATESTAMP=`ls -ld /FAServer/na3250-a/homes/${NAME} | awk '{print $6}'`
echo "${SIZE} ${DATESTAMP}"
done
However, I want to modify INPUT="${1}" so that it takes a specific file instead of the first positional argument. This is so I can run the lines above in another script and use a previously generated file as the input. I would also like the output to go to a new file.
So something like:
INPUT="$location/DisabledActiveHome ${1}" ???
Here's my full script below.
#!/bin/bash
# This script will search through Disabled Users OU and compare that list of
# names against the current active Home directories. This is to find out
# how much space those Home directories take up and which need to be removed.
# MUST BE RUN AS SUDO!
# Setting variables for _adm and storage path.
echo "Please provide your _adm account name:"
read _adm
echo "Please state where you want the files to be generated: (absolute path)"
read location
# String of commands to lookup information using ldapsearch
ldapsearch -x -LLL -h "REDACTED" -D $_adm#"REDACTED" -W -b "OU=Accounts,OU=Disabled_Objects,DC="XX",DC="XX",DC="XX"" "cn=*" | grep 'sAMAccountName'| egrep -v '_adm$' | cut -d' ' -f2 > $location/DisabledHome
# Get a list of all the active Home directories
ls /FAServer/na3250-a/homes > $location/ActiveHome
# Compare the Disabled accounts against Active Home directories
grep -o -f $location/DisabledHome $location/ActiveHome > $location/DisabledActiveHome
# Now get the size and datestamp for the disabled folders
INPUT="${1}"
for NAME in `cat ${INPUT}`
do
SIZE="`du -sm /FAServer/na3250-a/homes/${NAME} | sed 's|/FAServer/na3250-a/homes/||'`"
DATESTAMP=`ls -ld /FAServer/na3250-a/homes/${NAME} | awk '{print $6}'`
echo "${SIZE} ${DATESTAMP}"
done
I'm new to all of this so any help is welcome. I will be happy to clarify any and all questions you might have.
EDIT: A little more explanation because I'm terrible at these things.
The lines of code below came from a previous script and are a FOR loop:
INPUT="${1}"
for NAME in `cat ${INPUT}`
do
SIZE="`du -sm /FAServer/na3250-a/homes/${NAME} | sed 's|/FAServer/na3250-a/homes/||'`"
DATESTAMP=`ls -ld /FAServer/na3250-a/homes/${NAME} | awk '{print $6}'`
echo "${SIZE} ${DATESTAMP}"
done
It is executed by typing:
./Script ./file
The FILE that is being referenced has one column of user names and no other data:
User1
User2
User3
etc.
The script would take the file and look at the first user's name, which is referenced by
INPUT=${1}
then run a du command on that user and find out what the size of their HOME drive is. That is reported by the SIZE variable. It does the same thing with DATESTAMP, with regard to when the HOME drive was created for the user. When it is done with the tasks for that user, it moves on to the next one in the column until it is done.
So following that logic, I want to automate the entire process. Instead of doing this in two steps, I would like to make this all a one step process.
The first process would be to generate the $location/DisabledActiveHome file, which would have all of the disabled users names. Then to run the last portion to get the Size and creation date of each HOME drive for all the users in the DisabledActiveHome file.
So to do that, I need to modify the
INPUT=${1}
line to reflect the previously generated file.
$location/DisabledActiveHome
I don't understand your question really, but I think you want this. Say your file is called file.txt and looks like this:
1 99
2 98
3 97
4 96
You can get the first column like this:
awk '{print $1}' file.txt
1
2
3
4
If you want to use that in your script, do this
while read NAME; do
echo $NAME
done < <(awk '{print $1}' file.txt)
1
2
3
4
Or you may prefer cut like this:
while read NAME; do
echo $NAME
done < <(cut -d" " -f1 file.txt)
1
2
3
4
Or this may suit even better
while read NAME OtherUnwantedJunk; do
echo $NAME
done < file.txt
1
2
3
4
This last, and probably best, solution above uses IFS, which is bash's Input Field Separator, so if your file looked like this
1:99
2:98
3:97
4:96
you would do this
while IFS=":" read NAME OtherUnwantedJunk; do
echo $NAME
done < file.txt
1
2
3
4
INPUT="$location/DisabledActiveHome" worked like a charm. I was confused about the syntax and the proper usage and output

Bash, issue on for loop

I want to list specified files (files uploaded yesterday) from an Amazon S3 bucket.
Then I want to loop on this list, and for every single element of the list I want to unzip the file.
My code is:
for file in s3cmd ls s3://my-bucket/`date +%Y%m%d -d "1 day ago"`*
do s3cmd get $file
arrIN=(${file//my-bucket\//})
gunzip ${arrIN[1]}
done
so basically arrIN=(${file//my-bucket\//}); explodes my string and allows me to retrieve the name of the file I want to unzip.
Thing is, files are downloading but nothing is being unzipped, so I tried:
for file in s3cmd ls s3://my-bucket/`date +%Y%m%d -d "1 day ago"`*
do s3cmd get $file
echo test1
done
Files are being downloaded but nothing is being echoed. The loop is only working for the first line...
You need to use command substitution to iterate over the result of the desired s3cmd ls command.
for file in $(s3cmd ls s3://my-bucket/$(date +%Y%m%d -d "1 day ago")*); do
However, this isn't the preferred way to iterate over the output of a command, since in theory the results could contain whitespace. See Bash FAQ 001 for the proper method.
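A whitespace-safer sketch along those lines (assuming the s3:// URL is the last column of s3cmd ls output and that s3cmd get saves each object under its base name in the current directory):
#!/bin/bash
yesterday=$(date +%Y%m%d -d "1 day ago")

# Read the listing line by line instead of word-splitting it
s3cmd ls "s3://my-bucket/${yesterday}"* | awk '{print $NF}' | while IFS= read -r file; do
    s3cmd get "$file"
    gunzip "$(basename "$file")"
done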

s3cmd count lines with zcat and grep

I need to count the number of entries containing certain characters in zipped (.gz) files from an S3 bucket. How could I do it?
Specifically, my S3 bucket is s3://mys3.com/. Under that, there are thousands of buckets like the following:
s3://mys3.com/bucket1/
s3://mys3.com/bucket2/
s3://mys3.com/bucket3/
...
s3://mys3.com/bucket2000/
In each of the buckets, there are hundreds of zipped (.gz) JSON objects like the following:
s3://mys3.com/bucket1/file1.gz
s3://mys3.com/bucket1/file2.gz
s3://mys3.com/bucket1/file3.gz
...
s3://mys3.com/bucket1/file100.gz
Each of the zipped files contains about 20,000 JSON objects (each JSON object is a line). In each of the JSON objects, there are certain fields containing the word "request". I want to count how many JSON objects there are in bucket1 containing the word "request". I tried this but it did not work:
zcat s3cmd --recursive ls s3://mys3.com/bucket1/ | grep "request" | wc -l
I do not have a lot of shell experience, so could anyone help me with that? Thanks!
In case anyone is interested:
s3cmd ls --recursive s3://mys3.com/bucket1/ | awk '{print $4}' | grep '.gz' | xargs -I# s3cmd get # - | zgrep 'request' | wc -l
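Roughly what each stage of that pipeline does (same idea, commented and split across lines, using {} as the xargs placeholder since a bare # can be parsed as a shell comment):
# 1. Recursively list every object under the prefix
# 2. Keep only the object URL column ($4)
# 3. Keep only the gzipped objects
# 4. Stream each object to stdout ("-" as the local target)
# 5. Decompress on the fly and keep the lines containing "request"
# 6. Count those lines
s3cmd ls --recursive s3://mys3.com/bucket1/ |
  awk '{print $4}' |
  grep '.gz' |
  xargs -I{} s3cmd get {} - |
  zgrep 'request' |
  wc -l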
