Using a Makefile to download a file from AWS S3 to local - bash

I want to set up a target which downloads the latest S3 file containing _id_config within a path. I know I can get the name of the file I'm interested in with:
FILE=$(shell aws s3 ls s3:blah//xyz/mno/here --recursive | sort | tail -n 2 | awk '{print $4}' | grep id_config)
Now, I want to download the file to local with something like
download_stuff:
aws s3 cp s3://prod_an.live.data/$FILE .
But when I run this, my $FILE has some extra stuff like
aws s3 cp s3://blah/2022-02-17 16:02:21 2098880 blah//xyz/mno/here54fa8c68e41_id_config.json .
Unknown options: 2098880,blah/xyz/mno/here54fa8c68e41_id_config.json,.
Can someone please help me understand why the 2098880 and the extra spaces are in the output, and how to resolve this? Thank you in advance.

Suggesting a trick with ls options -1 and -t to get the latest files in a folder:
FILE=$(shell aws s3 ls -1t s3:blah//xyz/mno/here |head -n 2 | grep id_config)
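Another thing worth checking: in a Makefile, $ is expanded by Make before the shell ever runs, so awk's $4 has to be written $$4. As written, Make turns '{print $4}' into '{print }', awk then prints the whole line, and that is exactly where the date, time and the 2098880 size column come from. A minimal sketch with the escaping in place (the bucket and prefix are stand-ins pieced together from the question, and grep is moved before tail so tail -n 1 really returns the newest matching key):
FILE = $(shell aws s3 ls s3://prod_an.live.data/xyz/mno/here --recursive | sort | grep id_config | tail -n 1 | awk '{print $$4}')
download_stuff:
	aws s3 cp "s3://prod_an.live.data/$(FILE)" .
The same rule applies inside the recipe: use $(FILE), since a bare $FILE is read by Make as the (empty) variable F followed by the literal text ILE. The recipe line must be indented with a tab.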

Related

Can't use gsutil cp with gitlab CI

I'm using a GitLab runner on a Mac mini server.
While logged in as the user "runner" I can run this command:
gsutil ls -l gs://tests/ | grep staging | sort -k 2 | tail -n 3 | head -n 2 | awk '{print $3}' | gsutil -m cp -I .
It gets the files, but when I run the same command from gitlab-ci.yml like this:
stages:
  - test

test:
  stage: test
  when: always
  script:
    - gsutil ls -l gs://tests/ | grep staging | sort -k 2 | tail -n 3 | head -n 2 | awk '{print $3}' | gsutil -m cp -I .
I get the error:
bash: line 141: gsutil: command not found
I also checked that the GitLab runner is using the same user.
The GitLab runner is configured with the shell executor.
Changing the command to use the full path of gsutil didn't help either.
I added whoami to the gitlab-ci.yml and it printed the same user, "runner".
I managed to solve this issue with this solution:
gcloud-command-not-found-while-installing-google-cloud-sdk
I included these two lines in my gitlab-ci.yml before using the gsutil command:
source '[path-to-my-home]/google-cloud-sdk/path.bash.inc'
source '[path-to-my-home]/google-cloud-sdk/completion.bash.inc'
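For context, a sketch of where those lines could sit in the job above (the [path-to-my-home] placeholder is kept from the answer; the exact path depends on where the Cloud SDK was installed):
test:
  stage: test
  when: always
  script:
    # put gsutil on PATH for the shell the runner uses
    - source '[path-to-my-home]/google-cloud-sdk/path.bash.inc'
    - source '[path-to-my-home]/google-cloud-sdk/completion.bash.inc'
    - gsutil ls -l gs://tests/ | grep staging | sort -k 2 | tail -n 3 | head -n 2 | awk '{print $3}' | gsutil -m cp -I .
This works presumably because the runner's shell doesn't source the profile file where the Cloud SDK installer added itself to PATH, so the job has to source it explicitly.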

How can I process every file in my S3 bucket with bash?

I have a bunch of large files (100MB - 1GB) in a Bucket. I would like to "map" all those files using a bash script. I am unable to download all files at once because my computer does not have enough storage.
Does anyone have an idea how I can do this? Anything smarter than the following solution?
for file in $(aws s3 ls s3://my-bucket | rev | cut -d' ' -f1 | rev); do
  aws s3 cp s3://my-bucket/$file $file
  ./script
  aws s3 cp $file s3://my-bucket/$file
  rm $file
done
Explanation:
$(aws s3 ls s3://my-bucket | rev | cut -d' ' -f1 | rev) gets the name of the files in the bucket over which we are iterating
aws s3 cp s3://my-bucket/$file $file downloads a single file
./script runs my custom script
aws s3 cp $file s3://my-bucket/$file overwrites the old file with the new one
Interesting approach to reverse and reverse again - I would never have thought of that, but now I might one day.
Taking the context as bash, as per the title: depending on what your script does, you could use other services like Glue for a "smarter"/faster but more complex solution.
I suppose you don't have any "folders" (prefixes) in your bucket. A more configurable and probably more reliable way to get the list to process could be to use the s3api, e.g.:
objects=$(aws s3api list-objects \
  --bucket my-bucket \
  --output json \
  --query "Contents[].Key")

for file in $(echo $objects | jq -r '.[]'); do
  aws s3 cp s3://my-bucket/$file $file
  ./script
  aws s3 cp $file s3://my-bucket/$file
  rm $file
done
This assumes you'd have jq installed.
The above doesn't need handling of "folders". You can also do some nicer things like choose a "subfolder" to process, e.g.:
--prefix path/to/process/ \
or filter on a file extension, e.g.:
--query "Contents[?contains(Key, '.mp4')].Key")
Might be nice to use prefixes to be able to handle restarts of the script or list limits being hit (e.g. default --page-size of 1000 for aws s3 ls).
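Putting those two options together, the listing step might look like this (the bucket, prefix and extension are just the examples used above):
objects=$(aws s3api list-objects \
  --bucket my-bucket \
  --prefix path/to/process/ \
  --output json \
  --query "Contents[?contains(Key, '.mp4')].Key")
The s3api command pages through the results for you; if the bucket is very large you can also pass --max-items to cap a single run.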

aws s3 ls - how to recursively list objects with a bash script and avoid the pagination error

I have on-premise, S3-compatible storage and need to list all files in a specific bucket. When I do it at the top of the bucket I get this error:
Error during pagination: The same next token was received twice:{'ContinuationToken':"file path"}
I think it happens when too many objects need to be listed. Something is wrong on the storage side, but there is no cure for that right now.
As a workaround I run s3 ls in a bash while loop. I managed to prepare a simple loop for a different bucket where I have far fewer objects. That loop operates deep inside the tree, where I know how many directories there are:
./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/ | tr -s ' ' | tr '/' ' ' | awk '{print $2}' | while read line0; do
  ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/${line0}/ | tr -s ' ' | tr '/' ' ' | awk '{print $2}' | while read line1; do
    ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/${line0}/${line1}/ | tr -s ' ' | tr '/' ' ' | awk '{print $2}' | while read line2; do
      ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls --recursive us-bucket/dir1/dir2/dir3/dir4/dir5/dir6/${line0}/${line1}/${line2}/
    done
  done
done > /tmp/us-bucket/us-bucket_dir2_dir3_dir4_dir5_dir6.txt
I would like to write a loop that starts from the top (or the root, whichever you prefer) and lists all the files, no matter how many directories are on the path, recursing only at the last directory level, so that this error does not appear:
Error during pagination: The same next token was received twice:{'ContinuationToken':"file path"}
Any help/clues appreciated. Thanks.
Br,
Jay
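For what it's worth, one way to generalise that workaround to any depth could be a small recursive function that lists one level at a time, so no single call has to page through the whole bucket. A sketch, untested against that storage; the s3ls wrapper and the output filename are made-up names, the profile and endpoint are the ones from the question:
# wrapper so the long option list is written only once
s3ls() {
  ./aws --profile us-bucket --endpoint-url https://endpoint:18082 --no-verify-ssl s3 ls "$@"
}

walk() {
  # $1 is a prefix ending in "/" (or empty for the bucket root)
  local prefix=$1
  s3ls "us-bucket/${prefix}" | while read -r c1 c2 c3 c4; do
    if [ "$c1" = "PRE" ]; then
      walk "${prefix}${c2}"                # descend into the sub-prefix
    else
      echo "$c1 $c2 $c3 ${prefix}${c4}"    # date, time, size, full key
    fi
  done
}

walk "" > /tmp/us-bucket/us-bucket_full_listing.txt
Each listing here only covers one level, which may be enough to stay under whatever limit is tripping the pagination bug on the storage side.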

Delete files after awk command

I want to do an ls in a bucket, print the folder names, strip the /, sort them, and take the last 3, which are the most recent. Then I want to remove all the folders except those 3 most recent ones.
for i in $(aws s3 ls s3://portal-storage-site | awk -F '-' '{print $2}' | sed 's/\///g' | sort -n | tail -3 | xargs | sed 's/ /|/g'); do
  aws s3 ls s3://portal-storage-site | grep -Ev "PRE\s.*\-($i)\/" | awk '{print $2}' | xargs echo "aws s3 ls s3://portal-storage-site/"
done
I expect the output to be:
aws s3 ls s3://portal-storage-site/2e5d0599-120/
aws s3 ls s3://portal-storage-site/6f08a223-118/
aws s3 ls s3://portal-storage-site/ba67667e-121/
aws s3 ls s3://portal-storage-site/ba67667e-122/
but the actual output is:
aws s3 ls s3://portal-storage-site/2e5d0599-119/ 2e5d0599-120/ 6f08a223-118/ ba67667e-121/ ba67667e-122/
Instead of using xargs you can try to compose your second aws s3 ls command in awk and send it to bash:
aws s3 ls s3://portal-storage-site| grep -Ev "PRE\s.*\-($i)\/" | awk '{print "aws s3 ls s3://portal-storage-site/" $2}'| bash
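The reason everything lands on one line is that xargs, by default, packs as many arguments as it can onto a single echo invocation, so every remaining prefix gets appended to one command. Dropped into the original loop, the awk-composed version would look roughly like this (same bucket and filters as above):
for i in $(aws s3 ls s3://portal-storage-site | awk -F '-' '{print $2}' | sed 's/\///g' | sort -n | tail -3 | xargs | sed 's/ /|/g'); do
  # build one "aws s3 ls" command per remaining prefix and hand them to bash
  aws s3 ls s3://portal-storage-site | grep -Ev "PRE\s.*\-($i)\/" | awk '{print "aws s3 ls s3://portal-storage-site/" $2}' | bash
done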

Piping the filename of the last created file into imagej does not show the image

I have a large directory of images and want to access the most recent one from the command line. I want to show it using imagej, but the pipe below opens imagej without opening an image:
ls -Art | tail -n 1 | imagej
is the command I am using. Am I doing something wrong? I am in a Docker image running Xubuntu.
If I only run ls -Art | tail -n 1 I get the image name 1541917543_right.tiff, which is displayed correctly if I call the imagej command with that filename.
It might be a case of needing to use the --open option:
ls -Art | tail -n 1 | imagej --open
Or perhaps try using xargs:
ls -Art | tail -n 1 | xargs imagej --open
There was also a bug report filed on GitHub about opening images from the CLI (legacy version). If the above suggestions don't work, maybe post a response over there.
Put the imagej command at the end; you want to pipe the filename into the command.
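If the launcher ignores stdin, passing the name as an argument via command substitution is another thing worth trying (assuming imagej accepts a filename argument, which the question suggests it does, since the image opens when the name is given directly):
imagej "$(ls -Art | tail -n 1)"
Unlike the pipe, this hands the filename to imagej on its command line rather than on standard input.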
