Airflow parameter passing - bash

I have a simple job that I'd like to move under an Airflow process, if possible. As it stands, I have a string of bash scripts that access a server and download the latest version of a file and then perform a variety of downstream manipulations to that file.
exec ./somescript.sh somefileurl
What I'd like to know is: how can I pass in the URL to this file every time I need to run this process?
It seems that if I try to run the bash script as a bash command like so:
download = BashOperator(
    task_id='download_release',
    bash_command='somescript.sh',
    # params={'URL': 'somefileurl'},
    dag=dag)
I have no way of passing in the one parameter that the bash script requires. Otherwise, if I try to send the bash script in as a bash command like so:
download = BashOperator(
    task_id='download_release',
    bash_command='./somescript.sh {{ URL }}',
    params={'URL': 'somefileurl'},
    dag=dag)
I receive an execution error as the program tries to execute the script in the context of a temporary directory. This breaks the script as it requires access to some credentials files that sit in the same directory and I'd like to keep the relative file locations intact...
Thoughts?
Update: What worked for me
download = BashOperator(
    task_id='download_release',
    bash_command='cd {{ params.dir }} && ./somescript.sh {{ params.url }}',
    params={'url': 'somefileurl',
            'dir': 'somedir'},
    dag=dag)
I did not implement any parameter passing yet, though.

Here is an example of passing a parameter to your BashOperator:
templated_command = """
cd /working_directory
somescript.sh {{ dag_run.conf['URL'] }}
"""
download = BashOperator(
    task_id='download_release',
    bash_command=templated_command,
    dag=dag)
For a discussion about this, see passing parameters to an externally triggered DAG. Airflow has two example DAGs that demonstrate this: example_trigger_controller_dag and example_trigger_target_dag. Also see the Airflow API reference on macros.
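For {{ dag_run.conf['URL'] }} to resolve, the DAG has to be triggered with a conf payload. As a minimal sketch from the command line, assuming a placeholder DAG id of download_dag (the exact CLI syntax depends on your Airflow version; Airflow 1.x uses airflow trigger_dag instead):
# Trigger the DAG with a conf payload so dag_run.conf['URL'] is populated.
# "download_dag" is a placeholder DAG id.
airflow dags trigger --conf '{"URL": "somefileurl"}' download_dag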

Related

Bash - Gitlab CI not converting variable to a string

I am using GitLab to deploy a project and have some environmental variables set up in the GitLab console, which I use in my GitLab deployment script below:
- export S3_BUCKET="$(eval \$S3_BUCKET_${CI_COMMIT_REF_NAME^^})"
- aws s3 rm s3://$S3_BUCKET --recursive
My environmental variables are declared like so:
Key: s3_bucket_development
Value: https://dev.my-bucket.com
Key: s3_bucket_production
Value: https://prod.my-bucket.com
The plan is that it grabs the bucket URL from the environmental variables depending on which branch is trying to deploy (CI_COMMIT_REF_NAME).
The problem is that the S3_BUCKET variable does not seem to get set properly and I get the following error:
> export S3_BUCKET=$(eval \$S3_BUCKET_${CI_COMMIT_REF_NAME^^})
> /scripts-30283952-2040310190/step_script: line 150: https://dev.my-bucket.com: No such file or directory
It looks like it picks up the environmental variable value fine but does not set it properly - any ideas why?
It seems like you are trying to get the value of the variables S3_BUCKET_DEVELOPMENT and S3_BUCKET_PRODUCTION based on the value of CI_COMMIT_REF_NAME. You can do this by using parameter indirection:
$ a=b
$ b=c
$ echo "${!a}" # c
In your case you would need a temporary variable as well; something like this might work:
- s3_bucket_variable=S3_BUCKET_${CI_COMMIT_REF_NAME^^}
- s3_bucket=${!s3_bucket_variable}
- aws s3 rm "s3://$s3_bucket" --recursive
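If it helps to see the indirection on its own, here is a small self-contained sketch; the variable values are stand-ins for the CI/CD variables defined in the GitLab console:
#!/usr/bin/env bash
# Stand-in values for the GitLab CI/CD variables described in the question
S3_BUCKET_DEVELOPMENT="https://dev.my-bucket.com"
CI_COMMIT_REF_NAME="development"

s3_bucket_variable="S3_BUCKET_${CI_COMMIT_REF_NAME^^}"   # -> S3_BUCKET_DEVELOPMENT
s3_bucket="${!s3_bucket_variable}"                        # indirect expansion
echo "$s3_bucket"                                         # https://dev.my-bucket.com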
You are basically telling bash to execute a command named https://dev.my-bucket.com, which obviously doesn't exist.
Since you want to assign the output of a command when using VAR=$(command), you should probably use echo:
export S3_BUCKET=$(eval echo \$S3_BUCKET_${CI_COMMIT_REF_NAME^^})
Simple test:
VAR=HELL; OUTPUT="$(eval echo "\$S${VAR^^}")"; echo $OUTPUT
/bin/bash
It dynamically builds the SHELL variable name and then successfully prints its value.

Taking output of terraform to bash script as input variable

I am writing a script that takes care of running a Terraform configuration and creating infrastructure. I have a requirement to take output from Terraform back into the same script so it can create the schema for a DB. I need to take the endpoint, username, password, and DB name as inputs to the script in order to log in to the DB and create the schema. In other words, I need to take the output of the aws_db_instance resource that Terraform has already created and push it into the bash script as input.
Any help as to how we can achieve this would be really appreciated; thanks in advance. Below is the schema code that I would be using in the script, which needs those inputs from Terraform.
RDS_MYSQL_USER="Username";
RDS_MYSQL_PASS="password";
RDS_MYSQL_BASE="DB-Name";
mysql -h $RDS_MYSQL_ENDPOINT -P $PORT -u $RDS_MYSQL_USER -p $RDS_MYSQL_PASS -D $RDS_MYSQL_BASE -e 'quit';
The usual way to export particular values from a Terraform configuration is to declare Output Values.
In your case it seems like you want to export several of the result attributes from aws_db_instance, which you could do with declarations like the following in your root module:
output "mysql_host" {
value = aws_db_instance.example.address
}
output "mysql_port" {
value = aws_db_instance.example.port
}
output "mysql_username" {
value = aws_db_instance.example.username
}
output "mysql_password" {
value = aws_db_instance.example.password
sensitive = true
}
output "mysql_database_name" {
value = aws_db_instance.example.name
}
After you run terraform apply you should see Terraform report the final values for each of these, with the password hidden behind (sensitive value) because I declared it with sensitive = true.
Once that's worked, you can use the terraform output command with its -raw option to retrieve these values in a way that's more convenient to use in a shell script. For example, if you are using a Bash-like shell:
MYSQL_HOST="$(terraform output -raw mysql_host)"
MYSQL_PORT="$(terraform output -raw mysql_port)"
MYSQL_USERNAME="$(terraform output -raw mysql_username)"
MYSQL_PASSWORD="$(terraform output -raw mysql_password)"
MYSQL_DB_NAME="$(terraform output -raw mysql_database_name)"
Each run of terraform output will need to retrieve the latest state snapshot from your configured backend, so running it five times might be slow if your chosen backend has a long round-trip time. You could potentially optimize this by installing separate software like jq to parse the terraform output -json result to retrieve all of the values from a single command. There are some further examples in terraform output: Use in Automation.
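For example, a hedged sketch of that single-command approach, assuming jq is installed and the output names match the declarations above:
# Read the state once and pull each value out of the JSON document
OUTPUTS_JSON="$(terraform output -json)"

MYSQL_HOST="$(echo "$OUTPUTS_JSON" | jq -r '.mysql_host.value')"
MYSQL_PORT="$(echo "$OUTPUTS_JSON" | jq -r '.mysql_port.value')"
MYSQL_USERNAME="$(echo "$OUTPUTS_JSON" | jq -r '.mysql_username.value')"
MYSQL_PASSWORD="$(echo "$OUTPUTS_JSON" | jq -r '.mysql_password.value')"
MYSQL_DB_NAME="$(echo "$OUTPUTS_JSON" | jq -r '.mysql_database_name.value')"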

Calling an Ansible script inside of an Ansible script and/or within a Loop module

Context:
I am trying to deploy an IDAM solution. In order to do that specific things need to be installed in a certain order.
How I am currently doing it:
I have a variables file that looks like vars/main.yml:
EPM:
  list_of_packages:
    - package1.msi
    - package2.msi
    - package3.msi
    - package4.msi
Now this method 1 works if I do:
- win_package:
    path: C:\path\{{ item }}
    arguments: /qn /norestart
  loop: "{{ EPM['list_of_packages'] }}"
The problem is that Package1 doesn't install properly. In order to get Package1 installed, I needed to create a separate Ansible script that executes a bunch of .SQL scripts. This script, for the sake of this example, is called epm_sql_scripts.yml.
This script loops through a bunch of .SQL scripts with the loop module and psiexec module.
It looks similar to method 1 script.
Is there a way I can call the epm_sql_scripts.yml from the method 1 script and/or from the vars/main.yml file?
I understand where you were heading, but Ansible won't allow you to do that. You cannot nest playbooks like in your example.
Another important point is that, if you choose to use import_tasks, the imported file must contain only tasks.

Bamboo: Access script variable in subsequent maven task

I have seen many posts where people ask how to access Bamboo variables in a script, but this is not about that.
I am defining a variable in a Shell Script task, as below, and then I would like to access that variable in the subsequent Maven task.
#!/bin/sh
currentBuildNumber=${bamboo.buildNumber}
toSubtract=1
newVersion=$(( currentBuildNumber - toSubtract ))
echo "Value of newVersion: ${newVersion}"
This part runs perfectly fine. However, I have a subsequent Maven 3 task, and when I try to access this variable by typing ${newVersion}, I get the error below:
error 07-Jun-2019 14:12:20 Exception in thread "main" java.lang.StackOverflowError
simple 07-Jun-2019 14:12:21 Failing task since return code of [mvn --batch-mode -Djava.io.tmpdir=/tmp versions:set -DnewVersion=1.0.${newVersion}] was 1 while expected 0
Basically, I would like to automate the version number of the built jar files just by using ${bamboo.buildNumber} and subtracting some number so that I won't have to enter the new version number every time I run a build.
Appreciate your help... thanks,
EDIT: I posted the same question on Atlassian forum too... I will update this post when I get an answer there... https://community.atlassian.com/t5/Bamboo-questions/Bamboo-Access-script-variable-in-subsequent-maven-task/qaq-p/1104334
Generally, the best solution I have found is to output the result to a file and use the Inject Variables task to read the variable into the build.
For example, in some builds I need a SUFFIX variable, so in a bash script I end up doing
SUFFIX=suffix=-beta-my-feature
echo $SUFFIX >> .suffix.cfg
Then I can use the Inject Variables Task to read that file.
Make sure it is a Result variable, and you should be able to get to it using ${bamboo.NAMESPACE.name}; for the suffix one, it would be ${bamboo.VERSION.suffix}.
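Applied to the version-number case in the question, a hedged sketch could look like the following; the file name version.cfg and the VERSION namespace are only examples of what you would configure in the Inject Variables task:
#!/bin/sh
# Bamboo substitutes ${bamboo.buildNumber} before the script runs
newVersion=$(( ${bamboo.buildNumber} - 1 ))
# The inject file must contain key=value pairs
echo "newVersion=${newVersion}" > version.cfg
With the Inject Variables task pointed at version.cfg, namespace VERSION and scope Result, the Maven task could then reference -DnewVersion=1.0.${bamboo.VERSION.newVersion}.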

Specifying a particular callback to be used in playbook

I have created different playbooks for different operations in ansible.
I have also created different callback scripts for different kinds of playbooks (and packaged them with Ansible and installed them).
The playbooks will be called from many different scripts/cron jobs.
Now, is it possible to specify a particular callback script to be called for a particular playbook? (Using a command line argument probably?)
What's happening right now is, all the Callback scripts are called for each playbook.
I cannot put the callback script relative to the location/folder of the playbook because it's already packaged inside the Ansible package. Also, all the playbooks are in the same location.
I am fine with modifying a bit of ansible source code to accommodate it too if needed.
After going through the code of Ansible, I was able to solve it with the below...
In each callback plugin, you can specify self.disabled = True and the callback won't be called at all.
Also, when calling a playbook, there's an option to pass extra arguments as key=value pairs. They will be part of the playbook object in the extra_vars field.
So I did something like this in my callback plugin:
def playbook_on_start(self):
    # self.playbook is populated in your callback plugin by Ansible
    callback_plugins = self.playbook.extra_vars.get('callback_plugin', '')
    if callback_plugins not in ['email_reporter', 'all']:
        self.disabled = True
And when calling the playbook, I can do something like:
ansible-playbook -e callback_plugin=email_reporter   # -e is the argument prefix for extra vars
If by callback scripts you mean callback plugins, you could decide in those plugins whether a given playbook should trigger some action.
In the playbook_on_play_start method you have the name of the play, which you could use to decide if further notifications should be processed or not.
playbook_on_stats then is called at the end of the playbook.
class CallbackModule(object):
    def __init__(self):
        self.should_trigger = False

    def playbook_on_play_start(self, name):
        if name == "My Cool Play":
            self.should_trigger = True

    def playbook_on_stats(self, stats):
        if self.should_trigger:
            pass  # do something cool here
Please note, playbook_on_play_start is called for every play in your playbook, so it might be called multiple times.
If you are simply running a playbook via a script, you can do something like this:
ANSIBLE_STDOUT_CALLBACK="json" ansible-playbook -i hosts play.yml
This sets the callback as an environment variable before the ansible-playbook command runs.
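As an assumption-laden aside: if your custom plugins declare that they need whitelisting (CALLBACK_NEEDS_WHITELIST), you can also choose which of them run per invocation with an environment variable; the exact variable name depends on your Ansible version (newer releases use ANSIBLE_CALLBACKS_ENABLED):
# Only the listed callback plugins are enabled for this run (Ansible 2.x naming);
# "email_reporter" reuses the example plugin name from the answer above.
ANSIBLE_CALLBACK_WHITELIST="email_reporter" ansible-playbook -i hosts play.yml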
