Export GitHub commits/names to CSV with bash & jq

For a project I need to extract data from a lot of different blockchain GitHub profiles into a CSV.
After browsing through the GitHub API I was able to get some of the necessary data out as txt/csv files using bash commands and jq.
Doing all of this manually would probably take seven days. I have a list of profiles I need to loop through, saved as a CSV.
The list looks like this --> https://docs.google.com/spreadsheets/d/1lFsewAYI7F8zSw7WPhI9E9WwR8f4G1clw1yjxY3wz_4/edit#gid=0
My approach so far to get all the repo names looks like this:
sample='[{"name":"0chain"},{"name":"0stateapp"},{"name":"0xcert"}]'
The CSV belongs in here; I didn't know how to load it into that variable yet, but for testing purposes this was enough. If somebody knows how, feel free to give a hint (a sketch of one way to do it follows after the output below).
for row in $(echo "${sample}" | jq -r '.[] | @base64'); do
  _jq() {
    echo "${row}" | base64 --decode | jq -r "${1}"
  }
  for GHUSER in $(_jq '.name'); do
    curl -s "https://api.github.com/users/$GHUSER/repos?per_page=100" | jq -r '.[].full_name'
  done
done
The output looks like this:
0chain/0chain-token
0chain/client-sdk
0chain/docs
0chain/gorocksdb
0chain/hostadmin
0chain/rocksdb
0stateapp/ZSCoin
0xcert/0xcert
0xcert/conventions
0xcert/docs
0xcert/erc721-validator
0xcert/erc721-validator-api
0xcert/erc721-validator-ui
0xcert/erc721-website
0xcert/ethereum
0xcert/ethereum-crowdsale
0xcert/ethereum-dex
0xcert/ethereum-erc20
0xcert/ethereum-erc721
0xcert/ethereum-minter
0xcert/ethereum-utils
0xcert/ethereum-xcert
0xcert/ethereum-xcert-builder
0xcert/ethereum-zxc
0xcert/framework
0xcert/framework-cert-test
0xcert/nonfungiblealliance-www
0xcert/solidity-style-guide
0xcert/techpaper
0xcert/truffle
0xcert/web3.js
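Regarding the hint asked for above: one way to build the sample variable straight from the profile list is to let jq turn the raw lines into a JSON array. A minimal sketch, assuming the sheet has been exported locally as profiles.csv with a header row, plain unquoted values, and the GitHub user name in the first column (all of these are assumptions, not part of the original setup):

# Build the sample JSON array from the first column of profiles.csv (header skipped).
sample=$(tail -n +2 profiles.csv | cut -d',' -f1 \
  | jq -R -s 'split("\n") | map(select(length > 0) | {name: .})')
echo "$sample"    # e.g. [{"name":"0chain"},{"name":"0stateapp"},...]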
What I need to do is use all of the above values and generate a file that contains:
GitHub profile (already stored in the attached sheet)
The date when this information was accessed
All the repositories belonging to that profile (the code above, but filtered)
Now the interesting part, the commit history:
Number of commits / commit ID
Date of commit
Description of commit
Person who committed
Checks passed
Checks failed
Almost the same needs to be done for closed and open pull requests, although I think once the problem above is solved, the pull requests can be handled with the same strategy.
For the commits I'd do something like this:
for commits in "${repoarray[@]}"; do curl -s "https://api.github.com/repos/$commits/commits" | jq -r '.[].author.login'; done   # plus whatever else is needed
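A slightly more complete sketch of that idea (hedged: repos.txt is assumed to hold one owner/repo per line, e.g. the output of the loop above; the checks passed/failed counts come from a different endpoint, GET /repos/{owner}/{repo}/commits/{ref}/check-runs, so they would need a second call per commit):

# Fetch up to 100 commits per repo and emit CSV rows: repo, sha, date, subject line, author.
while IFS= read -r repo; do
  curl -s "https://api.github.com/repos/$repo/commits?per_page=100" \
    | jq -r --arg repo "$repo" \
        '.[] | [$repo, .sha, .commit.author.date, (.commit.message | split("\n")[0]), .commit.author.name] | @csv'
done < repos.txt > commits.csv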
Basically, this chart here needs to be filled:
https://docs.google.com/spreadsheets/d/1mFXiohiWNXNP8CVztFA1PFF41jn3J9sRUhYALZShsPY/edit?usp=sharing
What I need help with (a sketch follows right after this list):
storing my output from the first loop in an array
looping through that array to get the number of commits
looping through that array to get the data for closed pull requests
looping through that array to get the data for open pull requests
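A minimal sketch covering those points (unauthenticated calls, good enough for testing; the names repos, repo and n are illustrative, and results are capped at 100 per page here):

# 1. Store the repo names from the first loop in an array.
mapfile -t repos < <(
  echo "$sample" | jq -r '.[].name' | while IFS= read -r user; do
    curl -s "https://api.github.com/users/$user/repos?per_page=100" | jq -r '.[].full_name'
  done
)

# 2. Loop over the array and count the returned commits per repo.
for repo in "${repos[@]}"; do
  n=$(curl -s "https://api.github.com/repos/$repo/commits?per_page=100" | jq 'length')
  echo "$repo,$n"
done

# 3./4. Same pattern for pull requests; state=closed or state=open selects them.
for repo in "${repos[@]}"; do
  curl -s "https://api.github.com/repos/$repo/pulls?state=closed&per_page=100" \
    | jq -r --arg repo "$repo" '.[] | [$repo, .number, .created_at, .title] | @csv'
done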
Excuse my "noobish" question.
I'm using bash/jq and the GitHub API for the first time.
I'd appreciate any kind of help.

Related

Getting the last 100 commits in repositories of a GitHub user/organisation in Bash?

Context
I wrote the following code to get the last n commits of a repository of a GitHub user/organisation:
# Get commits
commits_json=$(curl -H "Accept: application/vnd.github.v3+json" "https://api.github.com/repos/$github_username/$github_repo_name/commits?per_page=1&page=1")
echo "commits_json=$commits_json"
echo ""
# Get the first commit.
readarray -t branch_commits_arr < <(echo "$commits_json" | jq ".[].sha")
echo "branch_commits_arr=$branch_commits_arr"
Issue
I noticed that I run into the documented rate limit of 60 API calls per hour when I try to do this for all repositories of a GitHub user/organisation.
Attempt I
I tried the more general format to get the commit lists in a single API call:
curl -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/$some_user/commits?per_page=10&page=1
Which returned:
{ "message": "Not Found",
"documentation_url": "https://docs.github.com/rest/reference/repos#get-a-repository"
}
Attempt II
Another approach to get the data without triggering the API rate limit would be to parse the atom feed of each repository; however, that seems like an undesirable hack with more boilerplate code than needed.
Question
Hence, I was wondering: how can one get a list/JSON containing all, or the most recent 100/n, commits across all the repositories of a GitHub user/organisation, using the GitHub API in Bash?
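One hedged sketch of a workaround: as far as I know there is no single REST endpoint that returns commits across all repositories of a user in one call, but authenticating the requests raises the limit from 60 to 5,000 calls per hour, which makes the per-repository loop viable again. $GITHUB_TOKEN is assumed to hold a personal access token and the user name is illustrative:

# Authenticated variant: list repos, then fetch the most recent commits per repo as CSV rows.
github_username=some_user   # illustrative
auth=(-H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3+json")

curl -s "${auth[@]}" "https://api.github.com/users/$github_username/repos?per_page=100" \
  | jq -r '.[].full_name' \
  | while IFS= read -r repo; do
      curl -s "${auth[@]}" "https://api.github.com/repos/$repo/commits?per_page=100&page=1" \
        | jq -r --arg repo "$repo" '.[] | [$repo, .sha, .commit.author.date] | @csv'
    done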

Avoid mass e-mail notification in error analysis bash script

I am selecting error log details from a Docker container and deciding within a shell script how and when to alert about the issue via Discord and/or email.
Because I am receiving the email alerts too often with the same information in the email body, I want to implement the following two adjustments:
Fatal error log selection:
FATS="$(docker logs --since 24h $NODENAME 2>&1 | grep 'FATAL' | grep -v 'INFO')"
The email is sent in case FATS has some content:
swaks --from "$MAILFROM" --to "$MAILTO" --server "$MAILSERVER" --auth LOGIN --auth-user "$MAILUSER" --auth-password "$MAILPASS" --h-Subject "FATAL ERRORS FOUND" --body "$FATS" --silent "1"
How can I send the email only when FATS has different content than on the previous run of the script? I have thought about hashing its content and storing the hash in a text file; if the hash is the same as on the previous run, the email is skipped.
Another option could be a local, temporary variable in the user's global bash profile, so that no file has to be stored on the file system (to avoid reads/writes).
How can I do that?
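The hash idea from the question can be sketched roughly like this (the state-file path is an assumption, pick whatever suits your setup; swaks options taken from the snippet above):

# Compare the hash of the current FATS content with the one stored from the previous run.
STATEFILE=/var/tmp/fats.sha256          # assumed location, adjust as needed
current_hash=$(printf '%s' "$FATS" | sha256sum | cut -d' ' -f1)
previous_hash=$(cat "$STATEFILE" 2>/dev/null)

if [ -n "$FATS" ] && [ "$current_hash" != "$previous_hash" ]; then
  swaks --from "$MAILFROM" --to "$MAILTO" --server "$MAILSERVER" \
        --auth LOGIN --auth-user "$MAILUSER" --auth-password "$MAILPASS" \
        --h-Subject "FATAL ERRORS FOUND" --body "$FATS" --silent "1"
  printf '%s\n' "$current_hash" > "$STATEFILE"
fi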
When you are writing a script for your monitoring, add functions for additional functionality, like:
logging all the alerts that have been sent
making sure you don't send more than one alert each hour (a throttle sketch follows below)
sending warnings only during working hours
escalating a message when it fails N times without intermediate success
possibly sending an alert to different receivers (different email addresses, or also to SMS or Teams)
making an interface for an operator so they can look back at when something went wrong the first time
When you have control over which messages you send, it is easy to filter duplicate messages (after changing --since).
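For instance, the "at most one alert per hour" rule can be sketched with a timestamp file (file name and threshold are illustrative, not part of the original script):

# Skip the alert if the last one was sent less than an hour ago.
LASTALERT=/var/tmp/last_alert_epoch     # illustrative state file
now=$(date +%s)
last=$(cat "$LASTALERT" 2>/dev/null)
last=${last:-0}

if [ $(( now - last )) -ge 3600 ]; then
  # send the alert here (swaks, Discord webhook, ...)
  echo "$now" > "$LASTALERT"
fi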
I've chosen the proposal of @ralf-dreager and reduced the selection to 1d and 1h. Consequently, I've changed my monitoring script to either go through the results of 1d or just 1h, without the need to select again and again each time. Huge performance improvement, and no need to store anything else in a variable or on the file system.
FATS="$(docker logs --since 1h $NODENAME 2>&1 | grep 'FATAL' | grep -v 'INFO')"

Bintray docker repository storage

On Bintray I found out that I have a private docker repository consuming quite a lot of space:
[Screenshot: account usage by repository]
I then did some housekeeping and kept only the last 3 tags of all the images I have. However, that didn't help much: the reported storage didn't change at all after deleting all those old tags.
I used this API endpoint, https://bintray.com/docs/api/#_get_package_files, to get an estimate of the package file sizes:
for img in $(cat images); do curl -s -XGET -u "user:pass" "https://bintray.com/api/v1/packages/my-org/internal-docker/$img/files" | jq '.[] | .size' | awk '{ sum += $1 } END { print sum }'; done
Summing all those up gets me 63723101568 bytes, i.e. roughly 60 GB.
Any idea where the other ~310 GB are?
Note that even if the 3 tags were completely different from each other, I would get at worst 3x that figure, i.e. about 180 GB, yet the account still shows 375 GB.
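For what it's worth, the per-image sums and a grand total can be computed in one pass with jq's add, using the same endpoint, credentials and images file as in the loop above (all taken from the question, not verified here):

# 'images' holds one package name per line; credentials and endpoint as in the question.
total=0
while IFS= read -r img; do
  size=$(curl -s -u "user:pass" \
    "https://bintray.com/api/v1/packages/my-org/internal-docker/$img/files" \
    | jq '[.[].size] | add // 0')
  echo "$img: $size bytes"
  total=$(( total + size ))
done < images
echo "total: $total bytes"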
Where are you getting the array 'images'?
You are not curling for the entire list but rather single files.
It's possible your list did not include all the images in this repo.
Check to make sure you have traversed all subfolders for images as well.
Update: after some time, something changed in the backend storage.
I asked their support (you have to click on Feedback when logged in to Bintray), and they are checking whether any housekeeping was done on their side, or whether something was only done after I complained to them.
I'll update if I hear more from them.

Filter Column in CSV and get the unique value

I have three columns in a CSV: Client Name, Save Set Name and Status. Some clients have two Status values, both Failed and Success. I want to keep only the clients whose status is only Failed; clients that have both a Failed and a Success entry should be omitted.
The command below still gives me clients whose status may also have been Success at some point. I want only the clients that are never Success, only Failed.
cat "$pwd"/Daily-Failed.csv|egrep -i 'failed|Interrupted'|awk -F',' '{print $2,$3,$9}'|sort -u > "$pwd"/Final-Failed/Failed.csv
(edit) Or with newlines:
cat "$pwd"/Daily-Failed.csv|
egrep -i 'failed|Interrupted'|
awk -F',' '{print $2,$3,$9}'|
sort -u > "$pwd"/Final-Failed/Failed.csv
Please find the input and desired output below.
Input:
Client Name,Save Set,Status
Star,D:/,Failed
Star,C:/,Failed
Moon,C:/,Failed
Galaxy,D:/,Failed
Sun,D:/,Failed
Star,C:/,Success
Sun,D:/,Success
Output "Client Name","Save Set",Status
Galaxy,D:/,Failed
Moon,C:/,Failed
Star,D:/,Failed
I want to filter those clients only which have status as only Failed. Clients who are having two entries such as Failed and success also, I want to omit.
Looking at your sample input (which really should be text in your question, not an image), I'm going to assume that both the Client Name and Save Set columns matter: you have (Star, C:/) with both success and failure rows, and (Star, D:/) with just a failure, and the latter shows up in your output; that's the only way that would make sense given your stated goal. On the other hand, you also have two (Sun, D:/) rows, one success and one failure, and that shows up in your output even though it doesn't meet your criteria any way you look at it...
Anyway, this sort of grouping and filtering of tabular data screams database, and I like to script sqlite to make it do all the work in such cases:
#!/bin/sh
filename=Daily-Failed.csv
sqlite3 -batch -csv -header <<EOF
.import '${filename}' tbl
SELECT *
FROM tbl
GROUP BY "Client Name", "Save Set"
HAVING count(*) = 1 AND Status = 'Failed'
EOF
after taking the data in your image and turning it into a CSV file Daily-Failed.csv looking like
Client Name,Save Set,Status
Star,D:/,Failed
Star,C:/,Failed
Moon,C:/,Failed
Galaxy,D:/,Failed
Sun,D:/,Failed
Star,C:/,Success
Sun,D:/,Success
that script will output
"Client Name","Save Set",Status
Galaxy,D:/,Failed
Moon,C:/,Failed
Star,D:/,Failed
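For comparison, the same grouping can be done in plain awk; a sketch that assumes the fields contain no embedded commas or quotes (the sqlite approach handles those more robustly) and that does not guarantee the row order of the result:

awk -F',' '
  NR == 1 { print; next }                          # pass the header through
  { count[$1 FS $2]++; status[$1 FS $2] = $3 }     # group by Client Name + Save Set
  END {
    for (k in count)
      if (count[k] == 1 && status[k] == "Failed")
        print k FS status[k]
  }
' Daily-Failed.csv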

How to use the Web API in bash for SonarQube?

I want to write a shell script to log in and get the bugs for a project. I want the dashboard values like bugs, vulnerabilities, code smells and coverage.
The URL of the dashboard is: http://www.example.com/dashboard?id=example_project_name.
Here is what I tried:
curl -X GET -u username:password 'http://www.example.com/api/issues/search?project=example_project_name&types=BUG'
So this prints all the data. I just need the total count shown on the dashboard.
Basically, what I want to achieve: I'm using a SonarQube plugin in Jenkins and the extended email plugin to send an email on job execution, and in that email I want to include details like the number of bugs in the repository after the build.
Is there any other way?
Finally, after reading the documentation carefully, I got the values. Here is the script that I created:
#!/bin/bash
vul=$(curl -sX GET -u username:password 'http://www.example.com/api/issues/search?projectKeys=example_project_name&types=VULNERABILITY');
bug=$(curl -sX GET -u username:password 'http://www.example.com/api/issues/search?projectKeys=example_project_name&types=BUG');
no_vul=$(echo "$vul" | jq -r '.total');
no_bug=$(echo "$bug" | jq -r '.total');
echo "Total number of VULNERABILITIES are $no_vul"
echo "Total number of BUGS are $no_bug"
Here is the API documentation URL.
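If the other dashboard numbers (code smells, coverage, ...) are needed too, recent SonarQube versions also expose them through the measures Web API; a hedged sketch, assuming the component key and metric names match your instance (they are illustrative here):

# Fetch project-level metrics in one call via api/measures/component.
measures=$(curl -s -u username:password \
  "http://www.example.com/api/measures/component?component=example_project_name&metricKeys=bugs,vulnerabilities,code_smells,coverage")
echo "$measures" | jq -r '.component.measures[] | "\(.metric): \(.value)"'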
