Bintray docker repository storage - bintray

On Bintray I found out that I have a private Docker repository consuming quite a lot of space:
Account usage by repository
I then did some housekeeping and kept only the last 3 tags of every image. However, that didn't help: the storage usage didn't change at all after deleting all those old tags.
I used the Get Package Files API endpoint, https://bintray.com/docs/api/#_get_package_files, to get an estimate of the package file sizes:
for img in $(cat images); do
    curl -s -XGET -u "user:pass" "https://bintray.com/api/v1/packages/my-org/internal-docker/$img/files" \
        | python -m json.tool | jq '.[] | .size' | awk '{ sum += $1 } END { print sum }'
done
Summing all of those up gets me 63723101568 bytes, roughly 60 GB.
Any idea where the other ~310 GB are?
Notice that, even if the 3 tags were completely different from each other, the worst case would be 3x that figure, so 180 GB. But the 375 GB is still there.

Where are you getting the array 'images'?
You are not curling for the entire list but rather single files.
It's possible your list did not include all the images in this repo.
Check to make sure you have traversed all subfolders for images as well.
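To rule that out, here is a minimal sketch that pulls the package list from the API itself instead of a hand-maintained images file and sums the sizes per package (it assumes the Get Packages endpoint from the same API docs and reuses the user:pass and my-org/internal-docker placeholders from the question):
# List every package in the repo via the API, then sum the reported file sizes for each one
for img in $(curl -s -u "user:pass" "https://bintray.com/api/v1/repos/my-org/internal-docker/packages" | jq -r '.[].name'); do
    size=$(curl -s -u "user:pass" "https://bintray.com/api/v1/packages/my-org/internal-docker/$img/files" | jq '[.[].size] | add')
    echo "$img $size bytes"
done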

After some time, something changed in the backend storage
I asked their support (you have to click on Feedback when logged in to Bintray), and they're checking whether any housekeeping was done automatically, or whether something was only done after I complained to them.
I'll update if I hear more from them.

Related

Disconnecting and reconnecting nvme

Is there capacity within amazon/centos/linux to switch the ordering round of nitro disks?
I have an AMI which consistently has devices in the incorrect order; by this I mean nvme1n1 and nvme2n1 should be switched around. If I run nvme id-ctrl -v /dev/nvme1n1 | grep sn I get a different serial number back following a reboot. I know they're "wrong" because the serial numbers do not reflect their capacities... Hope that makes sense (I appreciate it's a bit confusing). This only ever occurs on servers with two or more disks; upon a reboot the disks are "correct".
My question is, is there a method of forcing the nvme device to disconnect and reconnect (in the hope that the mapping works as expected in the correct order).
Thanks guys
Amazon Linux version 2017.09.01 and later contains scripts and a udev rule that automatically maps NVMe devices to /dev/xvd?. It is very briefly mentioned in the documentation, but there is not much information there.
You can obtain a copy by launching the Amazon Linux AMI, but there are also other places on the web where they have been posted. For example, I found this gist.
Very simple in the end:
# Remove the PCI device backing each NVMe controller, then rescan the PCI bus so they re-enumerate
echo 1 > /sys/bus/pci/devices/$(readlink -f /sys/class/nvme/nvme1 | awk -F "/" '{print $5}')/remove
echo 1 > /sys/bus/pci/devices/$(readlink -f /sys/class/nvme/nvme2 | awk -F "/" '{print $5}')/remove
echo 1 > /sys/bus/pci/rescan
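To confirm that the re-enumeration put the serial numbers where you expect, a small loop (a sketch reusing the nvme id-ctrl command from the question; it assumes the nvme-cli tool is installed) can print each controller with its serial after the rescan:
# Print each NVMe controller together with its serial number
for dev in /dev/nvme[0-9]; do
    echo "$dev: $(nvme id-ctrl -v "$dev" | grep -w sn)"
done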

Merge fastq.gz files with same name in different localizations in Google-Cloud

I would like to merge several fastq.gz files that have the same name but live in different folders in Google Cloud Storage. I have a total of 15 patients. Each patient has paired-end data, "R1" and "R2". Each R1 and R2 is split into 4 files. The size of each file is approximately 28 GB.
My goal is to merge the 4 files to obtain the complete fastq.gz R1 and R2 files for each patient.
I have never worked with Google-Cloud before.
Here is how the folders and the files are in the bucket (example with 2 patients):
gs://bucketID
    /folder1
        /folder001
            Patient1_R1.fastq.gz
            Patient1_R2.fastq.gz
        /folder002
            Patient2_R1.fastq.gz
            Patient2_R2.fastq.gz
        etc.
    /folder2
        /folder003
            Patient1_R1.fastq.gz
            Patient1_R2.fastq.gz
        /folder004
            Patient2_R1.fastq.gz
            Patient2_R2.fastq.gz
        etc.
    /folder3
        /folder005
            Patient1_R1.fastq.gz
            Patient1_R2.fastq.gz
        /folder006
            Patient2_R1.fastq.gz
            Patient2_R2.fastq.gz
        etc.
    /folder4
        /folder007
            Patient1_R1.fastq.gz
            Patient1_R2.fastq.gz
        /folder008
            Patient2_R1.fastq.gz
            Patient2_R2.fastq.gz
        etc.
I want to make a script that targets fastq.gz files with the same name in different folders, then merge them. However, I have no idea how to do this on Google-Cloud.
Here is the same example with colors (I want to concatenate files with the same color):
Example with colors
Here's how I see the bash script:
bucket="bucketID"
dir1=$bucket/"folder1"
dir2=$bucket/"folder2"
dir3=$bucket/"folder3"
dir4=$bucket/"folder4"
destdir=$bucket/"destdir"
participants=(Patient1
Patient2
)
for i in "${participants[@]}"; do
    zcat $dir1/.../${i}_R1.fastq.gz $dir2/.../${i}_R1.fastq.gz $dir3/.../${i}_R1.fastq.gz $dir4/.../${i}_R1.fastq.gz | gzip > $destdir/merged_${i}_R1.fastq.gz
    zcat $dir1/.../${i}_R2.fastq.gz $dir2/.../${i}_R2.fastq.gz $dir3/.../${i}_R2.fastq.gz $dir4/.../${i}_R2.fastq.gz | gzip > $destdir/merged_${i}_R2.fastq.gz
done
Should I use "gsutil compose" instead to merge?
At the end, I would like to have only two files R1 and R2 for each patient: merged_patient#_R1.fastq.gz and merged_patient#_R2.fastq.gz.
In the example I gave above, it would give 4 files:
merged_Patient1_R1.fastq.gz
merged_Patient1_R2.fastq.gz
merged_Patient2_R1.fastq.gz
merged_Patient2_R2.fastq.gz
Thank you!
I would recommend using the following command to concatenate your files:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
You can check the documentation in this link.
I tried a simple bash script using the "gsutil compose" command with fastq.gz files, and it worked fine for me.
The compose command creates a new object whose content is the concatenation of a given sequence of source objects under the same bucket.
Hope this helps!
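For the layout in the question, a single patient's R1 would look something like this (the paths are taken from the example tree above, so treat them as placeholders):
gsutil compose \
    gs://bucketID/folder1/folder001/Patient1_R1.fastq.gz \
    gs://bucketID/folder2/folder003/Patient1_R1.fastq.gz \
    gs://bucketID/folder3/folder005/Patient1_R1.fastq.gz \
    gs://bucketID/folder4/folder007/Patient1_R1.fastq.gz \
    gs://bucketID/destdir/merged_Patient1_R1.fastq.gz
Composing works for this use case because a gzip file made of several concatenated gzip members is still a valid gzip file, so no decompress/recompress step is needed.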
OK, I found the solution with gsutil compose:
declare -a participantsArray=("Patient1"
"Patient2"
)
bucket="gs://bucketID"
dir1=$bucket/"folder1"
dir2=$bucket/"folder2"
dir3=$bucket/"folder3"
dir4=$bucket/"folder4"
destdir=$bucket/"destdir"
for i in "${participantsArray[@]}"; do
    fileR1="${i}_R1.fastq.gz"
    fileR2="${i}_R2.fastq.gz"
    gsutil compose "${dir1}/*/${fileR1}" "${dir2}/*/${fileR1}" "${dir3}/*/${fileR1}" "${dir4}/*/${fileR1}" "${destdir}/merged_${fileR1}"
    gsutil compose "${dir1}/*/${fileR2}" "${dir2}/*/${fileR2}" "${dir3}/*/${fileR2}" "${dir4}/*/${fileR2}" "${destdir}/merged_${fileR2}"
done
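A quick way to sanity-check one merged object without downloading everything is to stream it through zcat (a sketch, reusing the names from the script above):
# The first four lines should form one complete FASTQ record
gsutil cat "${destdir}/merged_Patient1_R1.fastq.gz" | zcat | head -4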
As you said the solution was not difficult to find.
Thank you again!

export github commits/names to CSV with bash & jq

For a project I need to extract data from a lot of different blockchain GitHub profiles to a csv.
After browsing through the GitHub API I was able to get some of the necessary data shown as txt/csv files using bash commands and jq.
Doing all of this manually would probably take 7 days. I have a list of profiles I need to loop through, saved as a CSV.
The list looks like this --> https://docs.google.com/spreadsheets/d/1lFsewAYI7F8zSw7WPhI9E9WwR8f4G1clw1yjxY3wz_4/edit#gid=0
My approach so far to get all the repo names looks like this:
sample='[{"name":"0chain"},{"name":"0stateapp"},{"name":"0xcert"}]'
the csv belongs in here, I didn't know how to redirect it to that variable yet, but for testing purposes this was enough. If somebody knows how to, feel free to give a hint.
for row in $(echo "${sample}" | jq -r '.[] | #base64'); do
_jq()
{
echo ${row} | base64 --decode | jq -r ${1}
}
for GHUSER in $( echo $(_jq '.name')); do
curl -s https://api.github.com/users/$GHUSER/repos?per_page=100 | jq -r '.[]|.full_name'
done
done
The output looks like this:
0chain/0chain-token
0chain/client-sdk
0chain/docs
0chain/gorocksdb
0chain/hostadmin
0chain/rocksdb
0stateapp/ZSCoin
0xcert/0xcert
0xcert/conventions
0xcert/docs
0xcert/erc721-validator
0xcert/erc721-validator-api
0xcert/erc721-validator-ui
0xcert/erc721-website
0xcert/ethereum
0xcert/ethereum-crowdsale
0xcert/ethereum-dex
0xcert/ethereum-erc20
0xcert/ethereum-erc721
0xcert/ethereum-minter
0xcert/ethereum-utils
0xcert/ethereum-xcert
0xcert/ethereum-xcert-builder
0xcert/ethereum-zxc
0xcert/framework
0xcert/framework-cert-test
0xcert/nonfungiblealliance-www
0xcert/solidity-style-guide
0xcert/techpaper
0xcert/truffle
0xcert/web3.js
What I need to do is use all of the above values and generate a file that contains:
GitHub profile (already stored in the attached sheet)
the date when accessing this information
all the repositories belonging to that profile (the code above, but filtered)
Now the interesting part, the commit history:
number of commits (IDs)
date of each commit
description of each commit
person who committed
checks passed
checks failed
Almost the same needs to be done for closed and open pull requests, although I think once the problem above is solved, the pull requests follow the same strategy.
For the commits I'd do something like this:
for commits in "${repoarray[@]}"; do curl -s https://api.github.com/repos/$commits/commits | jq -r '.[]|.author.login' (and whatever else is needed); done
basically this chart here needs to be filled
https://docs.google.com/spreadsheets/d/1mFXiohiWNXNP8CVztFA1PFF41jn3J9sRUhYALZShsPY/edit?usp=sharing
What I need help with:
storing my output from the first loop in an array
looping through that array to get the number of commits
looping through that array to get the data on closed pull requests
looping through that array to get the data on open pull requests
Excuse my "noobish" question.
I'm using bash/jq and the GitHub API for the first time.
I'd appreciate any kind of help.
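As a starting point for the first two bullets, here is a rough sketch rather than a drop-in solution: it reuses the sample variable from above, assumes unauthenticated access to the GitHub v3 REST API (heavily rate-limited, so a token and pagination are needed for real use), and writes to a hypothetical commits.csv:
# Collect all repo full names into a bash array
repos=()
for GHUSER in $(echo "$sample" | jq -r '.[].name'); do
    while IFS= read -r repo; do
        repos+=("$repo")
    done < <(curl -s "https://api.github.com/users/$GHUSER/repos?per_page=100" | jq -r '.[].full_name')
done

# Loop over the array and emit one CSV row per commit
echo "repo,sha,date,author,message" > commits.csv
for repo in "${repos[@]}"; do
    curl -s "https://api.github.com/repos/$repo/commits?per_page=100" \
        | jq -r --arg repo "$repo" \
            '.[] | [$repo, .sha, .commit.author.date, .commit.author.name, (.commit.message | gsub("\r|\n"; " "))] | @csv'
done >> commits.csv
The same pattern with https://api.github.com/repos/$repo/pulls?state=closed (or state=open) and the appropriate jq fields covers the pull request loops.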

How to list the published container images in the Google Container Registry in a CLI in image size order

Using a CLI, I want to list the images in each repository in a Google Container Registry project but with the following conditions:
Lists the images with the latest tag only
Lists the human-readable size of the images
Lists the name of the images
The closest I've managed to get is through gsutil:
gsutil du -h gs://eu.artifacts.my-registry.appspot.com/containers/images
Resulting in:
33.77 MiB gs://eu.artifacts.my-registry.appspot.com/containers/images/sha256:03c1a2387ef6cb30a7428a46821f946d6a2c591a26cb2066891c55b2b6846ae2
1.27 MiB gs://eu.artifacts.my-registry.appspot.com/containers/images/sha256:03c1e7db6bf0140bd5fa34236a35453cb73cef01f6d89b98bc5995ae8ea07aaf
1.32 KiB gs://eu.artifacts.my-registry.appspot.com/containers/images/sha256:03c3c97495d60c68d37d04a7e6c9b3a48bb159ce5dde13d0d81b4e75e2a3f1d4
81.92 KiB gs://eu.artifacts.my-registry.appspot.com/containers/images/sha256:03c5483cb8ac9c9ae498507e15d68d909a11859a8e5238556b7188e0af4d9264
457.43 KiB gs://eu.artifacts.my-registry.appspot.com/containers/images/sha256:03c7f98faa1cfc05264e743e23ca2e118d24c57bfd67d5cb2e2c7a57e8124b6c
7.88 KiB gs://eu.artifacts.my-registry.appspot.com/containers/images/sha256:03c83b13d044844cd3f6b278382e408541f22029acaf55d9e7e5689b8d51eeea
But obviously this does not meet most of my criteria.
The information is available through the GUI like so on a per image basis:
Any ideas?
I'm open to gsutil, gcloud, docker, anything really which can be installed on a docker container.
You can use the Google Cloud UI to accomplish this. There's a column selector right next to the filter bar and it has an option for the image size.
Once the column is displayed, you'll be able to order by size.
It seems you have only one outstanding issue, listing container image sizes, after reading your comment on Jason's answer. That is not possible to retrieve with a gcloud command directly. Here are two workarounds I tested:
You can use the gcloud container images describe command to see the size of an image. Make sure you use the "--log-http" flag with it. The command should look like this:
$ gcloud container images describe gcr.io/myproject/myimage:tag --log-http
Another way to get the size of an image is using the gsutil stat command.
So here's what I did:
a. Running the command below, I listed all my images from the GCS bucket and saved the list to a file called images.txt:
$ gsutil ls "BUCKET URL" > images.txt
b. I ran gsutil stat like below to read the image names from the images.txt file and return the size of each image in turn:
$ for x in $(cat images.txt); do gsutil stat $x | grep Content-Length | awk '{print $2}'; done
You can customize this little script according to your needs.
I understand these are not efficient workarounds, but that seems to be all that's possible right now. However, GCR just implements the Docker registry API, so maybe you can read this document to see if you can find/do something of your own.
Just to share a rudimentary script which takes the first tag of each repository, sums the size of all its layers, and writes the result to a report. It takes ages on a 3 TB registry, but at least I know which repositories are big.
echo "REPO,SIZE" > repository-size-report.csv
for REPO in $(gcloud container images list --repository eu.gcr.io/comerge-comerge01-171833 --format="table[no-heading](NAME)") ; do
for TAGS in $(gcloud container images list-tags $REPO --format="table[no-heading](TAGS)"); do
TAG=$(echo $TAGS | cut -d, -f1)
SUM=0
for SIZE in $(gcloud container images describe $REPO:$TAG --log-http 2>&1 | grep size | grep -o '[0-9][0-9]*') ; do
SUM=$((SUM + SIZE))
done
HSUM=$(echo $SUM | numfmt --to iec --format "%8f")
echo "$REPO:$TAG,$HSUM"
echo "$REPO:$TAG,$HSUM" >> repository-size-report.csv
done
done
You can use the gcloud container images list command to accomplish this task; however, you will need to set the appropriate flags to fulfill your use case. You can read more about the command and the flag options here.
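For example (a sketch, with gcr.io/my-project/my-image standing in for a real image path), list-tags can already narrow the listing down to images carrying the latest tag, even though it does not report sizes:
gcloud container images list-tags gcr.io/my-project/my-image \
    --filter="tags:latest" \
    --format="table(digest, tags)"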

Getting log entry "disk online" from system log

When a disk is inserted into my cluster, I want to know about it.
So I need to watch /var/adm/messages and, when I catch a new "online" line, write it to a different log file.
When a disk goes online I get this kind of log entry:
Dec 8 10:10:46 SMNODE01 genunix: [ID 408114 kern.info] /scsi_vhci/disk#g5000c50095f92a8f (sd69) online
tail works without the -F option, but I need the -F option :/
tail messages | grep 408114 | grep '/scsi_vhci/disk#'| egrep -wi --color 'online'
I have 3 fixed strings for grep:
1- The ID "408114" is unique to the online status.
2- /scsi_vhci/disk#
3- online
P.S: Sorry for my english :)
For grep AND use .*:
$ grep 408114.*/scsi_vhci/disk#.*online test
Dec 8 10:10:46 SMNODE01 genunix: [ID 408114 kern.info] /scsi_vhci/disk#g5000c50095f92a8f (sd69) online
Next time don't edit the question completely but ask another question.
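Putting that together with the -F requirement from the question, something like the line below should work (a sketch: it assumes GNU grep for --line-buffered, which keeps matches flowing when stdout is a pipe, and /var/log/disk-online.log is just an example target file):
tail -F /var/adm/messages | grep --line-buffered '408114.*/scsi_vhci/disk#.*online' >> /var/log/disk-online.log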

Resources