How to export the output of multiple gcloud queries into adjacent sheets within one CSV file, using Bash?

I have the following 3 gcloud queries:
Query 1 - To enumerate users of a project:
gcloud projects get-iam-policy MyProject --format="csv(bindings.members)" >> output1.csv
Query 2 - To enumerate users of a folder:
gcloud resource-manager folders get-iam-policy MyFolder --format="csv(bindings.members)" >> output2.csv
Query 3 - To enumerate users of the organization:
gcloud organizations get-iam-policy MyOrg --format="csv(bindings.members)" >> output3.csv
My goal is to run all 3 queries together and export the output in multiple adjacent sheets within one CSV file, instead of 3 separate CSV files. Is that possible?
Please advise. Thanks.

It is not possible.
Comma-delimited (CSV) files do not support multiple sheets or 'tables' within a single file.
You must create a file per table.
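If a single file is the goal, a common workaround is to tag each row with its source and append everything to one CSV. Here is a sketch, reusing the MyProject/MyFolder/MyOrg placeholders from the question (the "source" column name is illustrative):
#!/usr/bin/env bash
# One combined CSV with a "source" column instead of separate sheets.
out="output.csv"
echo "source,member" > "$out"
gcloud projects get-iam-policy MyProject --format="csv(bindings.members)" | tail -n +2 | sed 's/^/project,/' >> "$out"
gcloud resource-manager folders get-iam-policy MyFolder --format="csv(bindings.members)" | tail -n +2 | sed 's/^/folder,/' >> "$out"
gcloud organizations get-iam-policy MyOrg --format="csv(bindings.members)" | tail -n +2 | sed 's/^/organization,/' >> "$out"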

Related

How to list azure Databricks workspaces along with properties like workspaceId?

My objective is to create a CSV file that lists all Azure Databricks workspaces and, in particular, has the workspace ID.
I have been able to retrieve all details as json using the CLI:
az rest -m get --header "Accept=application/json" -u 'https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Databricks/workspaces?api-version=2018-04-01' > workspaces.json
How can I retrieve the same information using azure resource graph?
If you prefer to work with the workspace list API that returns JSON, here is one approach for post-processing the data (in my case I ran this from a Jupyter notebook):
import json
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
# json from https://learn.microsoft.com/en-us/rest/api/databricks/workspaces/list-by-subscription?tabs=HTTP&tryIt=true&source=docs#code-try-0
# E.g.
# az rest -m get --header "Accept=application/json" -u 'https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Databricks/workspaces?api-version=2018-04-01' > workspaces.json
pdf = pd.read_json('./workspaces.json')
# flatten the nested json
pdf_flat = pd.json_normalize(json.loads(pdf.to_json(orient="records")))
# drop columns with name '*.type'
pdf_flat.drop(pdf_flat.columns[pdf_flat.columns.str.endswith('.type')], axis=1, inplace=True)
# drop rows without a workspaceId
pdf_flat = pdf_flat[ ~pdf_flat['value.properties.workspaceId'].isna() ]
# drop unwanted columns
pdf_flat.drop(columns=[
'value.properties.parameters.enableFedRampCertification.value',
'value.properties.parameters.enableNoPublicIp.value',
'value.properties.parameters.natGatewayName.value',
'value.properties.parameters.prepareEncryption.value',
'value.properties.parameters.publicIpName.value',
'value.properties.parameters.relayNamespaceName.value',
'value.properties.parameters.requireInfrastructureEncryption.value',
'value.properties.parameters.resourceTags.value.databricks-environment',
'value.properties.parameters.storageAccountName.value',
'value.properties.parameters.storageAccountSkuName.value',
'value.properties.parameters.vnetAddressPrefix.value',
], inplace=True)
pdf_flat
I was able to retrieve the information I needed by searching for Databricks resources in the Azure portal. From there I could click Open Query to open the Azure Resource Graph Explorer and write a query to extract the information I needed. I ended up using the following query:
// Run query to see results.
where type == "microsoft.databricks/workspaces"
| project id,properties.workspaceId,name,tenantId,type,resourceGroup,location,subscriptionId,kind,tags
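For completeness, the same query can also be run from the command line rather than the portal. A sketch, assuming the resource-graph extension for the Azure CLI is installed (the projected columns are illustrative):
# Requires: az extension add --name resource-graph
az graph query -q "resources
| where type == 'microsoft.databricks/workspaces'
| project id, workspaceId = properties.workspaceId, name, resourceGroup, location, subscriptionId" \
  --query "data" --output json > workspaces_graph.json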

How can I use different CSVs for my JMeter script on different instances

We have 20 workers on AWS and I want to parameterize the CSV file name for each instance. Please help.
I have divided my CSV by the number of load generator hosts:
$ total=$(wc -l < "youroriginalcsv.csv")    # total number of rows in the CSV
$ split -l $(( total / number_of_hosts )) "youroriginalcsv.csv"    # splits the CSV into files named xaa, xab, ...
Transfer each unique CSV to all available hosts
$ scp xaa host1_user@host1_ip:/csvpath/csvfile.csv
$ scp xab host2_user@host2_ip:/csvpath/csvfile.csv
$ scp xaz hostN_user@hostN_ip:/csvpath/csvfile.csv
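For reference, a sketch of how the split-and-copy step could be scripted end to end (hosts.txt with one user@ip per line and the remote path are assumptions for illustration):
#!/usr/bin/env bash
# Split the big CSV evenly across the workers and copy one chunk to each host.
csv="youroriginalcsv.csv"
mapfile -t hosts < hosts.txt                              # one user@ip per line
total=$(wc -l < "$csv")
lines=$(( (total + ${#hosts[@]} - 1) / ${#hosts[@]} ))    # rows per host, rounded up
split -l "$lines" "$csv" chunk_
chunks=(chunk_*)
for i in "${!hosts[@]}"; do
  scp "${chunks[$i]}" "${hosts[$i]}:/csvpath/csvfile.csv"
done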
Now I want to use specific file name for specific host
What do you mean by "specific file name for specific host"? Your CSV files are all named csvfile.csv so it's sufficient to specify /csvpath/csvfile.csv in the CSV Data Set Config and each JMeter slave will pick up its own file containing partial data from the "big" CSV file.
If you want to use different names for the CSV files depending on the machine IP address or DNS hostname, go for a combination of the If Controller with the __machineName() or __machineIP() function.
Also, if you don't want the same data to be re-used by different JMeter slaves, you can consider using the Redis Data Set Config or the HTTP Simple Table Server; this way you won't have to "split" and "copy" CSV files and will be able to manage your test data centrally from a single location.

gcloud dns managed-zones list along with record-sets count format

In the output of gcloud dns managed-zones list, I want to show the dnsName, creationTime, name, networkName, visibility, and the count of record-sets in each hosted zone.
I used the two commands below to get the two outputs separately:
#get hosted-zone and other values
gcloud dns managed-zones list --format='table(dnsName, creationTime:sort=1, name, privateVisibilityConfig.networks.networkUrl.basename(), visibility)'
#get record-sets for a hostedzone
gcloud dns record-sets list --zone=$zoneName |awk 'NR>1{print}'|wc -l
I think I can get this in a shell script by getting a list of hosted zones and then printing the two outputs together.
But is there a better way to do this in a single gcloud command?
IIRC (!?), you'll need to issue both gcloud commands as each provides distinct data.
To your point, you should be able to easily combine the commands using a shell script, iterating over each zone from managed-zones list to issue record-sets list --zone=${i}.
If you'd like help, please include dummy data from the 2 commands and I'll draft something for you.
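For example, a minimal sketch of that loop (the echo output format is illustrative):
#!/usr/bin/env bash
# Print each managed zone together with its record-set count.
for zone in $(gcloud dns managed-zones list --format='value(name)'); do
  count=$(gcloud dns record-sets list --zone="$zone" --format='value(name)' | wc -l)
  echo "$zone: $count record sets"
done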

Merge fastq.gz files with the same name in different locations in Google Cloud

I would like to merge several fastq.gz files with the same name in different folders in Google Cloud. I have a total of 15 patients. Each patient has paired-end data "R1" and "R2". Each R1 and R2 is divided into 4 files. The size of each file is approximately 28 GB.
My goal is to merge the 4 files to obtain the complete fastq.gz R1 and R2 files for each patient.
I have never worked with Google Cloud before.
Here is how the folders and the files are in the bucket (example with 2 patients):
gs://bucketID
/folder1
/folder001
Patient1_R1.fastq.gz
Patient1_R2.fastq.gz
/folder002
Patient2_R1.fastq.gz
Patient2_R2.fastq.gz
etc.
/folder2
/folder003
Patient1_R1.fastq.gz
Patient1_R2.fastq.gz
/folder004
Patient2_R1.fastq.gz
Patient2_R2.fastq.gz
etc.
/folder3
/folder005
Patient1_R1.fastq.gz
Patient1_R2.fastq.gz
/folder006
Patient2_R1.fastq.gz
Patient2_R2.fastq.gz
etc.
/folder4
/folder007
Patient1_R1.fastq.gz
Patient1_R2.fastq.gz
/folder008
Patient2_R1.fastq.gz
Patient2_R2.fastq.gz
etc.
I want to make a script that targets fastq.gz files with the same name in different folders, then merges them. However, I have no idea how to do this on Google Cloud.
Here is the same example with colors (I want to concatenate files with the same color):
Example with colors
Here's how I see the bash script:
bucket="bucketID"
dir1=$bucket/"folder1"
dir2=$bucket/"folder2"
dir3=$bucket/"folder3"
dir4=$bucket/"folder4"
destdir=$bucket/"destdir"
participants=(Patient1
Patient2
)
for i in "${participants[@]}"
do
zcat $dir1/.../${i}_R1.fastq.gz $dir2/.../${i}_R1.fastq.gz $dir3/.../${i}_R1.fastq.gz $dir4/.../${i}_R1.fastq.gz | gzip > $destdir/merged_${i}_R1.fastq.gz
zcat $dir1/.../${i}_R2.fastq.gz $dir2/.../${i}_R2.fastq.gz $dir3/.../${i}_R2.fastq.gz $dir4/.../${i}_R2.fastq.gz | gzip > $destdir/merged_${i}_R2.fastq.gz
done
Should I use "gsutil compose" instead to merge?
At the end, I would like to have only two files R1 and R2 for each patient: merged_patient#_R1.fastq.gz and merged_patient#_R2.fastq.gz.
In the example I gave above, it would give 4 files:
merged_Patient1_R1.fastq.gz
merged_Patient1_R2.fastq.gz
merged_Patient2_R1.fastq.gz
merged_Patient2_R2.fastq.gz
Thank you!
I would recommend using the following command to concatenate your files:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
You can check the documentation in this link.
I've tried a simple bash script using the "gsutil compose" command with fastq.gz files, and it worked fine for me.
The compose command creates a new object whose content is the concatenation of a given sequence of source objects under the same bucket.
Hope this helps!
OK, I found the solution with gsutil compose:
declare -a participantsArray=("Patient1"
"Patient2"
)
bucket="bucketID"
dir1=$bucket/"folder1"
dir2=$bucket/"folder2"
dir3=$bucket/"folder3"
dir4=$bucket/"folder4"
destdir=$bucket/"destdir"
for i in "${participantsArray[@]}";
do
fileR1="${i}_R1.fastq.gz"
fileR2="${i}_R2.fastq.gz"
gsutil compose "${dir1}/*/${fileR1}" "${dir2}/*/${fileR1}" "${dir3}/*/${fileR1}" "${dir4}/*/${fileR1}" "${destdir}/merged_${fileR1}"
gsutil compose "${dir1}/*/${fileR2}" "${dir2}/*/${fileR2}" "${dir3}/*/${fileR2}" "${dir4}/*/${fileR2}" "${destdir}/merged_${fileR2}"
done
As you said the solution was not difficult to find.
Thank you again!

Bash script to delete all files in an AWS S3 bucket based on the timestamp in their names

I'm trying to remove only the files that are older than 5 days, based on the timestamp in the file name (files starting with "DITN1_" and "DITS1_"), using a bash script against the AWS S3 bucket. The files I'm trying to delete look like this:
DITN1_2016.12.01_373,
DITS1_2012.10.10_141,
DITN1_2016.12.01_3732,
DITS1_2012.10.10_1412
If someone could help me out with the code, that would be great.
Thanks in advance.
You can use the AWS CLI to delete the files from a bash script, as follows:
aws s3 rm s3://mybucket/ --recursive --exclude "*" --include "DITN1_*" --include "DITS1_*"
However, it does not support filtering by timestamp.
For details, see the aws s3 CLI documentation.
Is it important to use the name of the objects instead of metadata? You could get a list of objects in the bucket using the s3api:
aws s3api list-objects --bucket example --no-paginate # this last option will avoid pagination, don't use it if you have thousands of objects
Adding
--query Contents[]
Will give you back the metadata of every object, including a LastModified field, which tells you when the object was last modified, for example "2016-12-16T13:56:23.000Z".
http://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
You could convert this timestamp to epoch seconds using
date "+%s" -d "put the timestamp here"
And compare it with the current time - 5 days.
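For instance, a small sketch of that comparison (GNU date assumed; the timestamp is the example value above with the fractional seconds dropped):
cutoff=$(date -d "5 days ago" +%s)               # now minus 5 days, as epoch seconds
ts=$(date -d "2016-12-16T13:56:23Z" +%s)         # LastModified converted to epoch seconds
[ "$ts" -lt "$cutoff" ] && echo "older than 5 days"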
OR if you really want to delete objects based on name, you could loop over the keys like this:
for key in $(aws s3api list-objects --bucket example --no-paginate --query 'Contents[].Key' --output text)
And add logic to determine the date. Something like this might work, judging by your examples:
key_without_prefix=${key#*_}
key_without_suffix=${key_without_prefix%_*}
Then you have your date, which you can compare with the current time - 5 days.
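Putting the name-based variant together, a rough sketch (the bucket name "example" and GNU date are assumptions; the echo keeps it a dry run):
#!/usr/bin/env bash
# Delete objects whose embedded date (DITN1_YYYY.MM.DD_NNN / DITS1_YYYY.MM.DD_NNN) is older than 5 days.
bucket="example"
cutoff=$(date -d "5 days ago" +%s)
for key in $(aws s3api list-objects --bucket "$bucket" \
    --query 'Contents[?starts_with(Key, `DITN1_`) || starts_with(Key, `DITS1_`)].Key' \
    --output text); do
  datepart=${key#*_}                      # DITN1_2016.12.01_373 -> 2016.12.01_373
  datepart=${datepart%_*}                 # 2016.12.01_373       -> 2016.12.01
  ts=$(date -d "${datepart//./-}" +%s)    # 2016.12.01 -> 2016-12-01 -> epoch seconds
  if [ "$ts" -lt "$cutoff" ]; then
    echo aws s3 rm "s3://$bucket/$key"    # drop 'echo' once the output looks right
  fi
done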
