I am looking for a command line tool to make queries to Amazon Athena.
It works with JDBC, using the driver com.amazonaws.athena.jdbc.AthenaDriver, but I haven't found any command line tool that works with it.
Expanding on the previous answer from @MasonWinsauer. Requires bash and jq.
#!/bin/bash

# Athena queries are fundamentally asynchronous, so we have to:
#   1) Make the query, and tell Athena where in S3 to put the results
#      (tell it the same place as the UI uses).
#   2) Wait for the query to finish.
#   3) Pull down the results and un-wacky-Jsonify them.

# Run the query, use jq to capture the QueryExecutionId, and capture that into a bash variable.
queryExecutionId=$(
    aws athena start-query-execution \
        --query-string "SELECT Count(*) AS numBooks FROM books" \
        --query-execution-context "Database=demo_books" \
        --result-configuration "OutputLocation"="s3://whatever_is_in_the_athena_UI_settings" \
        --region us-east-1 | jq -r ".QueryExecutionId"
)
echo "queryExecutionId was ${queryExecutionId}"

# Wait for the query to finish running.
# This will wait for up to 60 seconds (30 * 2).
for i in $(seq 1 30); do
    queryState=$(
        aws athena get-query-execution --query-execution-id "${queryExecutionId}" --region us-east-1 | jq -r ".QueryExecution.Status.State"
    )
    if [[ "${queryState}" == "SUCCEEDED" ]]; then
        break
    fi
    echo "  Awaiting queryExecutionId ${queryExecutionId} - state was ${queryState}"
    if [[ "${queryState}" == "FAILED" ]]; then
        # Exit with a "bad" error code.
        exit 1
    fi
    sleep 2
done

# Get the results.
aws athena get-query-results \
    --query-execution-id "${queryExecutionId}" \
    --region us-east-1 > numberOfBooks_wacky.json

# TODO: un-wacky the JSON with jq or something, e.g.:
# cat numberOfBooks_wacky.json | jq -r ".ResultSet.Rows[] | .Data[0].VarCharValue"
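One possible way to finish that TODO (a sketch assuming every cell is a VarCharValue; Athena usually returns the column headers as the first row):

# Flatten each row's Data array into one CSV line; the first line is the header row.
jq -r '.ResultSet.Rows[] | [.Data[].VarCharValue] | @csv' numberOfBooks_wacky.json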
As of version 1.11.89, the AWS command line tool supports Amazon Athena operations.
First, you will need to attach the AmazonAthenaFullAccess policy to the IAM role of the calling user.
Then, to get started querying, you will use the start-query-execution command as follows:
aws athena start-query-execution \
    --query-string "SELECT * FROM MyDb.MyTable" \
    --result-configuration "OutputLocation"="s3://MyBucket/logs" [Optional: EncryptionConfiguration] \
    --region <region>
This will return a JSON object containing the QueryExecutionId, which can be used to retrieve the query results using the following command:
aws athena get-query-results \
    --query-execution-id <id> \
    --region <region>
This likewise returns a JSON object containing the results and metadata.
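For example, the two commands can be chained with jq (the bucket, table, and region here are placeholders, and a real script should poll get-query-execution until the state is SUCCEEDED, as the script above does):

# Placeholder values; adjust the table, bucket, and region to your account.
qid=$(aws athena start-query-execution \
    --query-string "SELECT * FROM MyDb.MyTable" \
    --result-configuration OutputLocation=s3://MyBucket/logs \
    --region us-east-1 | jq -r '.QueryExecutionId')
aws athena get-query-results --query-execution-id "$qid" --region us-east-1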
More information can be found in the official AWS Documentation.
Hope this helps!
You can try AthenaCLI, a command line client for the Athena service that supports auto-completion and syntax highlighting, and is a proud member of the dbcli community:
https://github.com/dbcli/athenacli
athena-cli should be a good start.
Sorry, I am still new to bash scripting. I have around 10000 EC2 instances. I have created this bash script to change my EC2 instance types; all instance names and types are stored in a file. The code is working, but it is taking so long to run through instance by instance.
Does anyone know if I can run the AWS CLI command on all EC2 instances in one go? Thanks :)
#!/bin/bash
my_file='test.txt'

declare -a instanceID
declare -a fmo # Future instance size

while IFS=, read -r COL1 COL2; do
    instanceID+=("$COL1")
    fmo+=("$COL2")
done <"$my_file"

len=${#instanceID[@]}

for (( i=0; i < len; i++ )); do
    vm_instance_id="${instanceID[$i]}"
    vm_type="${fmo[$i]}"
    echo "Stopping $vm_instance_id"
    aws ec2 stop-instances --instance-ids "$vm_instance_id"
    echo "Waiting for $vm_instance_id state to be STOPPED"
    aws ec2 wait instance-stopped --instance-ids "$vm_instance_id"
    echo "Resizing $vm_instance_id to $vm_type"
    aws ec2 modify-instance-attribute --instance-id "$vm_instance_id" --instance-type "$vm_type"
    echo "Starting $vm_instance_id"
    aws ec2 start-instances --instance-ids "$vm_instance_id"
done
Refactor your code into a function that is passed one line from the file.
work() {
    IFS=, read -r instanceID fmo <<<"$1"
    stuff "$instanceID" "$fmo"
}
Then run GNU xargs or GNU parallel over each line of the file, calling the exported function. Use the -P option to run the function in parallel; see the documentation.
export -f work
xargs -n1 -P0 -t bash -c 'work "$@"' -- <"$my_file"
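Putting it together with the loop body from the question (a sketch; stuff above stands in for these four CLI calls):

#!/bin/bash
my_file='test.txt'

# Process one "instanceID,type" line from the file.
work() {
    local instanceID fmo
    IFS=, read -r instanceID fmo <<<"$1"
    aws ec2 stop-instances --instance-ids "$instanceID"
    aws ec2 wait instance-stopped --instance-ids "$instanceID"
    aws ec2 modify-instance-attribute --instance-id "$instanceID" --instance-type "$fmo"
    aws ec2 start-instances --instance-ids "$instanceID"
}
export -f work

# One bash invocation per line (-n1), as many in parallel as possible (-P0).
xargs -n1 -P0 -t bash -c 'work "$@"' -- <"$my_file"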
As @KamilCuk pointed out here, you can easily make this run in parallel. However, if you run this script in parallel, you might end up getting throttled by EC2, so make sure you include some backoff + retry logic and respect the limits specified here: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/throttling.html
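A minimal sketch of such backoff + retry, wrapping a single CLI call (the retry helper and its limits are my own choices, not an AWS-prescribed policy):

# Retry a command up to 5 times with exponential backoff (2s, 4s, 8s, ...).
retry() {
    local attempt=1 max_attempts=5
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "Giving up after $max_attempts attempts: $*" >&2
            return 1
        fi
        sleep $(( 2 ** attempt ))
        (( attempt++ ))
    done
}

retry aws ec2 stop-instances --instance-ids "$instanceID"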
The script below is intended to get the content of each entry in the S3 logging bucket and save it to a file:
#!/bin/bash
#
# Get the content of each entry in the S3 logging bucket and save it to a file.
#
LOGGING_BUCKET=dtgd-hd00

aws s3api list-objects-v2 --bucket "$LOGGING_BUCKET" | jq '.Contents' >> entries.json &&
keys=$(jq '.[].Key' entries.json)

for key in $keys; do
    echo "$key"
    aws s3api get-object --bucket "$LOGGING_BUCKET" --key "$key" output_file_"$key"
done
Once executed, I got:
An error occurred (NoSuchKey) when calling the GetObject operation:
The specified key does not exist.
"dtgd-hd00/logs2021-08-10-05-43-18-01393D975686FA45"
However, if I do it from the CLI:
aws s3api get-object --bucket dtgd-hd00 \
--key "dtgd-hd00/logs2021-08-10-05-43-18-01393D975686FA45" \
output_file_"$key"
It works perfectly, getting the content and saving it to an output file as requested.
What could be wrong?
The variable $key will be a quoted string, so you're effectively double-quoting it, and S3 fails to find the key because the literal quotes become part of it. You can strip the quotes before passing the key along:
for key in $keys; do
    key="${key%\"}"
    key="${key#\"}"
    aws s3api get-object --bucket "$LOGGING_BUCKET" --key "$key" output_file_"$key"
done
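Alternatively (not part of the original answer), jq's -r flag emits raw, unquoted strings, which sidesteps the stripping entirely:

# -r (raw output) prints keys without surrounding quotes.
keys=$(jq -r '.[].Key' entries.json)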
Of course, it would be much more performant to use aws s3 sync and avoid this issue altogether.
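For example (a sketch; the local destination directory is arbitrary):

# Mirror the whole bucket into a local directory in one command.
aws s3 sync "s3://$LOGGING_BUCKET" ./bucket_logs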
I'm parsing AWS policy documents and I'm trying to send the errors in this command to /dev/null so that the user doesn't see them.
This is my code:
readarray -t aws_policy_effects < \
    <( if aws iam get-policy-version --policy-arn "$aws_policy_arn" \
            --version-id "$aws_policy_version_id" --profile="$aws_key" 2> /dev/null | \
            jq -r '.PolicyVersion.Document.Statement[].Effect'
       then
           true
       else
           aws iam get-policy-version --policy-arn "$aws_policy_arn" \
               --version-id "$aws_policy_version_id" --profile="$aws_key" | \
               jq -r '.PolicyVersion.Document.Statement.Effect'
       fi )
I'm using an 'if' statement so that the right jq query gets used based on the aws policy that we're reading.
I will get this error:
jq: error (at <stdin>:22): Cannot index string with string "Effect"
This happens because this command always runs as the first condition of the if (even when the Statement of the AWS policy document is not a list):
++ jq -r '.PolicyVersion.Document.Statement[].Effect'
++ aws iam get-policy-version --policy-arn arn:aws:iam::123456789101:policy/IP_RESTRICTION --version-id v11 --profile=company-lab
Why isn't the error being buried by sending it to /dev/null? How can I get the error to not print out to the screen using this if statement?
I'm using an 'if' statement so that the right jq query gets used based on the aws policy that we're reading.
But jq is telling you that your jq query is not always valid. So one option would be to make your jq query more robust, e.g. by using postfix ? (e.g. .Effect?), or testing the type of the input, etc.
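For instance, a sketch of a single type-tested query that handles both shapes (Statement as an array or as a single object), which would make the if/else unnecessary:

aws iam get-policy-version --policy-arn "$aws_policy_arn" \
    --version-id "$aws_policy_version_id" --profile="$aws_key" 2> /dev/null | \
    jq -r '.PolicyVersion.Document.Statement | if type == "array" then .[].Effect else .Effect end'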
It turned out I hadn't redirected jq's standard error, only the standard error of aws. Putting the 2> /dev/null at the end of the jq command works.
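That is, for the first branch (a sketch with both redirections in place):

aws iam get-policy-version --policy-arn "$aws_policy_arn" \
    --version-id "$aws_policy_version_id" --profile="$aws_key" 2> /dev/null | \
    jq -r '.PolicyVersion.Document.Statement[].Effect' 2> /dev/null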
We have to check the status of an instance, and I am trying to capture any error to a logfile. The logfile has the instance information, but the error is not being written to it. Below is the code; let me know what needs to be corrected.
function wait-for-status {
    instance=$1
    target_status=$2
    status=unknown
    while [[ "$status" != "$target_status" ]]; do
        status=`aws rds describe-db-instances \
            --db-instance-identifier $instance | head -n 1 \
            | awk -F \ '{print $10}'` >> ${log_file} 2>&1
        sleep 30
        echo $status >> ${log_file}
    done
}
Rather than using all that head/awk stuff, if you want a value out of the CLI, you should use the --query parameter. For example:
aws rds describe-db-instances --db-instance-identifier xxx --query 'DBInstances[*].DBInstanceStatus'
See: Controlling Command Output from the AWS Command Line Interface - AWS Command Line Interface
Also, if your goal is to wait until an Amazon RDS instance is available, then you should use db-instance-available — AWS CLI Command Reference:
aws rds wait db-instance-available --db-instance-identifier xxx
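Combining both suggestions with the logging from the question (a sketch; $instance and ${log_file} are assumed to be set elsewhere):

# Block until the instance is available, logging any errors.
aws rds wait db-instance-available --db-instance-identifier "$instance" >> "${log_file}" 2>&1

# Then record the final status (--output text strips the JSON quoting).
status=$(aws rds describe-db-instances --db-instance-identifier "$instance" \
    --query 'DBInstances[0].DBInstanceStatus' --output text 2>> "${log_file}")
echo "$status" >> "${log_file}"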
Our application requires the use of HAProxy to load balance and route traffic (one per AZ); ALBs and ELBs are not configurable enough for our purposes. When deploying new code via AWS CodeDeploy, we would like the instances being patched to be placed into maintenance mode (removed from load balancing, connections drained). We have modified the default CodeDeploy lifecycle bash scripts to remove the instances from their respective HAProxy instances by sending an SSM Run Command to HAProxy from the instances in question.
Currently this modification doesn't work, and the reason for failure is unknown. The script works when executed manually step by step (at least to the current point of failure). The part that fails is either the test that returns "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration.", or the setting of $HAPROXY_ID, on which that test depends. The script runs just fine up until that point, but then exits because it can't find the HAProxy instance ID.
I have checked IAM role permissions/credentials, environment variables, and file permissions, which all appear to be correct. Normally I would place more logging into the script to debug, but deployments are too few and far between for us to make that practical.
My question: Is there a better way to do this? I can only guess we're not the only ones to use HAProxy with CodeDeploy, and there has to be a reliable method of doing this. Below is the current code being used that is not working.
#!/bin/bash
#
# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
. $(dirname $0)/common_functions.sh
if [[ "$DEPLOYMENT_GROUP_NAME" != "redacted" ]]; then
msg "ELB Deregistration doesn't need to happen when not on redacted."
exit
fi
msg "Running AWS CLI with region: $(get_instance_region)"
# get this instance's ID
INSTANCE_ID=$(get_instance_id)
if [ $? != 0 -o -z "$INSTANCE_ID" ]; then
    error_exit "Unable to get this instance's ID; cannot continue."
fi
# Get current time
msg "Started $(basename $0) at $(/bin/date "+%F %T")"
start_sec=$(/bin/date +%s.%N)
msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group"
asg=$(autoscaling_group_name $INSTANCE_ID)
if [ $? == 0 -a -n "${asg}" ]; then
    msg "Found AutoScaling group for instance $INSTANCE_ID: ${asg}"
    msg "Checking that installed CLI version is at least at version required for AutoScaling Standby"
    check_cli_version
    if [ $? != 0 ]; then
        error_exit "CLI must be at least version ${MIN_CLI_X}.${MIN_CLI_Y}.${MIN_CLI_Z} to work with AutoScaling Standby"
    fi
    msg "Attempting to put instance into Standby"
    autoscaling_enter_standby $INSTANCE_ID "${asg}"
    if [ $? != 0 ]; then
        error_exit "Failed to move instance into standby"
    else
        msg "Instance is in standby"
    fi
fi
msg "Instance is not part of an ASG, continuing..."
## Get the instanceID of the HAProxy instance in this AZ and ENVIRONMENT - Will there ever be more than one???
HAPROXY_ID=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \
    grep INSTANCES | \
    awk '{print $8}')
HAPROXY_IP=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \
    grep INSTANCES | \
    awk '{print $13}')
if test -z "$HAPROXY_ID"; then
    msg "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration."
    exit
fi
## Put the current instance into MAINT mode with the HAProxy instance via SSM
msg "Deregistering $INSTANCE_ID from HAProxy $HAPROXY_ID"
DEREGCMD="{\"commands\":[\"haproxyctl disable server bk_app_servers/$INSTANCE_ID\"],\"executionTimeout\":[\"3600\"]}"
/usr/local/bin/aws ssm send-command \
    --document-name "AWS-RunShellScript" \
    --instance-ids "$HAPROXY_ID" \
    --parameters "$DEREGCMD" \
    --timeout-seconds 600 \
    --output-s3-bucket-name "redacted" \
    --output-s3-key-prefix "haproxy-codedeploy/deregister" \
    --region us-east-1
if [ $? != 0 ]; then
    error_exit "Failed to send SSM command to deregister instance $INSTANCE_ID from HAProxy $HAPROXY_ID"
fi
## Wait for all connections to drain from instance
SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCE_ID | awk -F "," '{print $5}')
DRAIN_TIME=60
msg "Initial session count: $SESS_COUNT"
while [[ "$SESS_COUNT" -gt 0 ]]; do
if [[ "$COUNTER" -gt "$DRAIN_TIME" ]]; then
msg "Instance failed to drain all connections within $DRAIN_TIME seconds. Continuing to deploy anyway."
break
fi
msg $SESS_COUNT
sleep 1
COUNTER=$(($COUNTER + 1))
SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
done
msg "Finished $(basename $0) at $(/bin/date "+%F %T")"
end_sec=$(/bin/date +%s.%N)
elapsed_seconds=$(echo "$end_sec - $start_sec" | /usr/bin/bc)
msg "Elapsed time: $elapsed_seconds"
At the moment, the only option for you is to add more logging, issue a deployment to test out this script, and then look at your deployment logs. It sounds like you don't know why it's failing, and only the logs can tell you that.
Try adding logging and see what happens. We should be executing your script as-is, so it shouldn't act any differently, but it's hard to tell without seeing the logs.
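One cheap way to add that logging without instrumenting every line (a sketch; the log path is hypothetical):

# Near the top of the hook script: copy all stdout/stderr to a file and trace each command.
exec > >(tee -a /tmp/codedeploy-haproxy-hook.log) 2>&1
set -x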
Good luck,
-Asaf