Apache Airflow pass data from BashOperator to SparkSubmitOperator - shell

I am trying to log in to server 100.18.10.182 and, from that .182 server, trigger my Spark submit job on server 100.18.10.36, all from Apache Airflow. I use a BashOperator (a shell script that SSHes into the 100.18.10.182 server), and for the Spark submit job I use a SparkSubmitOperator downstream of the BashOperator.
The BashOperator runs successfully, but the SparkSubmitOperator fails with:
Cannot execute: Spark submit
I think this is because I cannot pass my SSH session (to the .182 server) on to the SparkSubmitOperator, or it may be due to some other issue related to --jars or --packages; I am not sure.
I was thinking of using xcom_push to push some data from my BashOperator and xcom_pull to read it in the SparkSubmitOperator, but I am not sure how to pass it so that the server stays logged in and the SparkSubmitOperator is triggered from that box itself.
Airflow DAG code:
t2 = BashOperator(
    task_id='test_bash_operator',
    bash_command="/Users/hardikgoel/Downloads/Work/airflow_dir/shell_files/airflow_prod_ssh_script.sh ",
    dag=dag)
t2
t3_config = {
    'conf': {
        "spark.yarn.maxAppAttempts": "1",
        "spark.yarn.executor.memoryOverhead": "8"
    },
    'conn_id': 'spark_default',
    'packages': 'com.sparkjobs.SparkJobsApplication',
    'jars': '/var/spark/spark-jobs-0.0.1-SNAPSHOT-1/spark-jobs-0.0.1-SNAPSHOT.jar firstJob',
    'driver_memory': '1g',
    'total_executor_cores': '21',
    'executor_cores': 7,
    'executor_memory': '48g'
}
t3 = SparkSubmitOperator(
    task_id='t3',
    **t3_config)
t2 >> t3
Shell Script code:
#!/bin/bash
USERNAME=hardikgoel
HOSTS="100.18.10.182"
SCRIPT="pwd; ls"
ssh -l ${USERNAME} ${HOSTS} "${SCRIPT}"
echo "SSHed successfully"
if [ ${PIPESTATUS[0]} -eq 0 ]; then
    echo "successfull"
fi
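An SSH session opened by one task does not carry over to the next. The ssh process started by the BashOperator exits when the script ends, and the SparkSubmitOperator then runs spark-submit on the Airflow worker itself. One way around this, sketched below in the same shell style, is to hand the spark-submit command to ssh so that it runs on the remote box. This is a minimal sketch, not a verified fix. It assumes that com.sparkjobs.SparkJobsApplication is the job's main class and that firstJob is an application argument (the question lists them under packages and jars). build_spark_cmd and run_remote_job are hypothetical helper names. The sketch also reads the exit status straight after ssh, because after an intervening echo both $? and PIPESTATUS describe the echo rather than ssh.

```shell
#!/bin/bash
USERNAME=hardikgoel
HOST="100.18.10.182"
JAR="/var/spark/spark-jobs-0.0.1-SNAPSHOT-1/spark-jobs-0.0.1-SNAPSHOT.jar"

# Build the command line that should run on the remote host.
# Assumption: the class is the main class and firstJob is an application argument.
build_spark_cmd() {
    printf 'spark-submit --class %s --driver-memory %s %s %s' \
        "com.sparkjobs.SparkJobsApplication" "1g" "$JAR" "firstJob"
}

# Run the command inside the SSH session and return ssh's exit status.
run_remote_job() {
    ssh -l "$USERNAME" "$HOST" "$(build_spark_cmd)"
    local status=$?   # capture immediately, before any other command runs
    if [ "$status" -eq 0 ]; then
        echo "Spark job finished successfully"
    else
        echo "Remote command failed with status $status" >&2
    fi
    return "$status"
}

# Print the command for inspection; call run_remote_job to execute it for real.
echo "Would run on $HOST: $(build_spark_cmd)"
```

In the DAG, a single BashOperator could then call a script like this in place of the separate SparkSubmitOperator.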

Related

problem running script to a remote machine

I run a script on a remote machine via ssh:
ssh -p$port $user@$ip "bash /dati/bin/add_data.sh $t_ext_aria $t_pannello $t_ext_muro $t_cantina $t_bollitore $t_PT $t_P1 $t_P2 $H_PT $H_P1 $H_P2"
The content of the script add_data.sh (on the remote machine) is pretty self-explanatory: it runs a MySQL query, passing 11 parameters:
query="INSERT INTO temp (t_ext_aria , t_pannello , t_ext_muro , t_cantina , t_bollitore , t_PT , t_P1 , t_P2 , H_PT , H_P1 , H_P2) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,${10},${11})"
echo $query >> debug.log
mysql --user=rf --password=password tana << EOF
$query;
EOF
Everything works fine, except that nothing is logged to debug.log (I need the log to investigate why the MySQL query sometimes fails, probably because of wrongly formatted data).
But if I log in to the remote machine and run the script from there, e.g.
bash add_data.sh 1 1 1 1 1 1 1 1 1 1 1
it correctly writes to debug.log.
I set the permissions to 777 to avoid any issue.
What am I doing wrong?
Thanks, fab
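One possible explanation, offered as an assumption rather than a confirmed diagnosis: debug.log is a relative path, so it is created in whatever directory the script is started from. A command passed to ssh starts in the remote user's home directory, so the log may be landing there instead of in /dati/bin. The sketch below reproduces the effect in a temporary directory (the layout and the script names relative.sh and anchored.sh are hypothetical) and shows one way to anchor the log next to the script.

```shell
#!/bin/bash
# Hypothetical demo layout in a temp dir; the real script lives in /dati/bin.
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/home"

# Version 1: relative log path, as in the question.
cat > "$demo/bin/relative.sh" <<'EOF'
#!/bin/bash
echo "query text" >> debug.log
EOF

# Version 2: log path anchored at the script's own directory.
cat > "$demo/bin/anchored.sh" <<'EOF'
#!/bin/bash
script_dir=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)
echo "query text" >> "$script_dir/debug.log"
EOF

# Call both from another directory, the way "ssh host 'bash /path/script.sh'" does.
(cd "$demo/home" && bash "$demo/bin/relative.sh")
(cd "$demo/home" && bash "$demo/bin/anchored.sh")

ls "$demo/home"   # relative.sh wrote its log here
ls "$demo/bin"    # anchored.sh wrote its log next to the script
```

If this is the cause, a debug.log should already exist in the home directory of the remote user.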

Curl returns Invalid JSON error in a Jenkins Pipeline script but returns the expected response on a bash shell run or in a Jenkins Freestyle job

I am writing a Jenkins Pipeline job for setting up AWS infrastructure using API calls to our in-house AWS CLI wrapper library. Running the raw bash scripts on a CentOS box or as a Jenkins Freestyle job runs fine. However, it fails in the context of a Pipeline job. I think that the quotes may need to be different for the Pipeline job but I am not sure how.
After further investigation, I found that the curl command returns the wrong response from the service when running the scripts within a Jenkins Pipeline job.
pipeline {
    agent any
    stages {
        stage('Checkout code from Git'){
            steps {
                echo "Checkout code from a GitHub repository"
                // Checkout code from a GitHub repository
                checkout([$class: 'GitSCM', branches: [[name: '*/master']], doGenerateSubmoduleConfigurations: false, extensions: [[$class: 'SubmoduleOption', disableSubmodules: false, parentCredentials: false, recursiveSubmodules: true, reference: '', trackingSubmodules: false]], submoduleCfg: [], userRemoteConfigs: [[credentialsId: 'xxxx', url: 'git@github.com:bbc/repo.git']]])
            }
        }
        stage('Call our internal AWS CLI Wrapper System API to perform an ACTION on a specified ENVIRONMENT') {
            steps {
                script {
                    if("${params.ENVIRONMENT}" == 'int' && "${params.ACTION}" == 'create'){
                        echo "ENVIRONMENT=${params.ENVIRONMENT}, ACTION=${params.ACTION}"
                        echo ""
                        sh '''#!/bin/bash
                        # Create Neptune Cluster for the Int environment
                        cd blah-db
                        echo "Current working directory is $PWD"
                        CLOUD_FORMATION_FILE=$PWD/infrastructure/templates/neptune-cluster.json
                        echo "The CloudFormation file to operate on is $CLOUD_FORMATION_FILE"
                        echo "Running jq to transform the source CloudFormation file"
                        template=$(jq -M '.Parameters.Env.Default="int"' $CLOUD_FORMATION_FILE)
                        echo "Echoing the transformed CloudFormation file: \n$template"
                        echo "Running curl to make the http request to our internal AWS CLI Wrapper System"
                        curl -d "{\"aws_account\": \"1111111111\", \"region\": \"us-east-1\", \"name_suffix\": \"cluster\", \"template\": $template}" \
                        -H 'Content-Type: application/json' -H 'Accept: application/json' https://base.api.url/v1/services/blah-neptune/int/stacks \
                        --cert /path/to/client/certificate/client.crt --key /path/to/client/private-key/client.key
                        cd ..
                        pwd
                        # Set a timer to run for 300 seconds or 5 minutes to create a delay to allow for the Neptune Cluster to be fully provisioned first before adding instances to it.
                        '''
                    }
                }
            }
        }
    }
}
The actual result that I get from making the API call:
{"error": "Invalid JSON. Expecting property name: line 1 column 1 (char 1)"}
Try changing the curl command as follows:
curl -d '{"aws_account": "1111111111", "region": "us-east-1", "name_suffix": "cluster", "template": $template}'
Or assign the whole command to a variable and print it out to check whether it is what you intended.
cmd = '''#!/bin/bash
cd blah-db
...
'''
echo cmd // compare the output string to the cmd of freestyle job.
sh cmd
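A likely cause, stated as an assumption because it is not confirmed in the thread: Groovy processes backslash escapes inside a '''...''' string, so each \" reaches bash as a plain ". The inner quotes then only close and reopen the shell string and disappear from the payload, which would match the "Expecting property name ... (char 1)" error. The sketch below uses printf as a stand-in for curl -d and a short hypothetical template value to show what bash would send in that case, next to a quoting style that needs no backslashes.

```shell
#!/bin/bash
# Stand-in for the jq output used in the question.
template='{"Parameters":{"Env":{"Default":"int"}}}'

# What bash is left with once the backslashes are gone: the inner double
# quotes now close and reopen the string, so they vanish from the payload.
broken=$(printf '%s' "{"aws_account": "1111111111", "template": $template}")

# Single-quoted JSON with only the variable in double quotes needs no backslashes.
payload='{"aws_account": "1111111111", "template": '"$template"'}'

echo "broken:  $broken"
echo "payload: $payload"
```

Because the second form contains no backslashes, it should read the same whether the script runs from a shell, a Freestyle job, or a Pipeline sh step.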

command not found error on calling oozie action via shell script

I'm trying to trigger an Oozie job through a shell script, but when I execute the script I get the error below:
"command not found" error in the line: ooziejob =$(oozie job -oozie http://oozieserver:port/oozie -config /root/SqoopWrapper1/sqoop_job.properties -run);
My shell script containing the oozie command is:
input=/root/SqoopWrapper1/InputFile.txt
echo "internal field sep"
IFS='|'
while read SourceDB db_name Mysql_table hdfsdir libpath
do
    echo "do...while"
    if [ SourceDB = Mysql ]
    then
        driver = com.mysql.jdbc.Driver
        jdbcUri = jdbc:mysql://host:3306
        Mysql_table = WrapperTbl
        UserName = ****
        Password = ****
    fi
    echo "Oozie command exe"
    ooziejob =$(oozie job -oozie http://oozieserver:port/oozie -config /root/SqoopWrapper1/sqoop_job.properties -run);
    echo $ooziejob;
done < $input
exit 0
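You have a space before the equals sign. A shell assignment must not have whitespace on either side of =; otherwise the shell treats the first word as a command name.
BTW, when you post this kind of question, you should always say which shell and which OS you are using.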
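A minimal sketch of the corrected assignment syntax, assuming bash. The values are hypothetical, and echo stands in for the real oozie call so that the snippet runs anywhere. It also expands SourceDB with $ in the test, since the bare word SourceDB is compared as a literal string.

```shell
#!/bin/bash
# Hypothetical value; in the original script it comes from "while read".
SourceDB="Mysql"

# Correct: no spaces around '=' in an assignment.
driver="com.mysql.jdbc.Driver"

# With spaces, bash would try to run a command named 'driver':
#   driver = com.mysql.jdbc.Driver    # fails with "driver: command not found"

# Capture command output; echo stands in for the real 'oozie job ... -run' call.
ooziejob=$(echo "job: 0000001-oozie-W")

# Expand the variable with '$' and quote it so empty values do not break the test.
if [ "$SourceDB" = "Mysql" ]; then
    matched="yes"
else
    matched="no"
fi

echo "$driver | $ooziejob | $matched"
```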

Can not take jstat metrics using bash as Sensu plugin

I have created a bash script that collects jstat metrics from my JVM instances.
Here is an example of the output:
demo.server1.sms.jstat.eden 24.34 0
demo.server1.lcms.jstat.eden 54.92 0
demo.server1.lms.jstat.eden 89.49 0
demo.server1.tms.jstat.eden 86.05 0
But when the Sensu client runs this script, it returns:
Could not attach to 8584
Could not attach to 8588
Could not attach to 17141
Could not attach to 8628
demo.server1.sms.jstat.eden 0
demo.server1.lcms.jstat.eden 0
demo.server1.lms.jstat.eden 0
demo.server1.tms.jstat.eden 0
Here is the check_cron.json example:
{
    "checks": {
        "jstat_metrics": {
            "type": "metric",
            "handlers": ["graphite"],
            "command": "/etc/sensu/plugins/jstat-metrics.sh",
            "interval": 5,
            "subscribers": [ "webservers" ]
        }
    }
}
And a piece of my bash script:
jvm_list=("sms:$sms" "lcms:$lcms" "lms:$lms" "tms:$tms" "ums:$ums")
for jvm_instance in ${jvm_list[@]}; do
    project=${jvm_instance%%:*}
    pid=${jvm_instance#*:}
    if [ "$pid" ]; then
        metric=`jstat -gc $pid|tail -n 1`
        output=$output$'\n'"demo.server1.$project.jstat.eden"$'\t'`echo $metric |awk '{ print $3}'`$'\t0'
    fi
done
echo "$output"
I found out that the problem is with jstat, and I tried writing the full jstat path, like /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/jstat -gc $pid|tail -n 1, but it didn't help.
By the way, if I comment out this line, output like "Could not attach to 8584" disappears.
I'm not a Java or Sensu user, but I can guess what happens.
Most likely, sensu-client runs your script as a user different from the one you use when testing manually, which doesn't have permissions to "attach" (whatever that means) to your jvm instances.
To verify this you can add invocation of "whoami" to your script, run it from sensu-client again, see what user it runs your script under and, if it is different, try to run your script as that user.
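The check suggested above can be sketched as a couple of lines at the top of the plugin script. This is only an illustration: it uses id -un, which prints the same name as whoami, and writes to stderr on the assumption that stdout should carry nothing but metric lines.

```shell
#!/bin/bash
# Record which account the plugin runs under; stderr keeps the metric output clean.
current_user=$(id -un)
echo "jstat-metrics running as: $current_user (uid $(id -u))" >&2
```

Comparing this line between a manual run and a run triggered by the Sensu client shows whether the two use different accounts.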
Yes, you're right: Sensu runs all scripts as the sensu user. To use jstat, you have to add sensu to sudoers.
Just add the file /etc/sudoers.d/sensu.
Example:
Defaults:sensu !requiretty
Defaults:sensu secure_path = /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
sensu ALL = NOPASSWD: /etc/sensu/plugins/jsat-metrics.rb

How to use bash/expect to check if an SSH login works

My team manages many servers, and company policy dictates that the passwords on these servers must be changed every couple of weeks. Sometimes, our official database of passwords gets out of date for whatever reason (people forget to update it, usually), but we cannot identify this sometimes until months later, since we don't consistently use every server.
I want to write a script that will scrape the passwords from the database, and use those passwords to attempt an (ssh) login to each server every night, and send an email with the results to the team. I am able to scrape the database for login information, but I'm not sure how to check whether ssh login was successful or not in expect.
I cannot use public key authentication for this task. I want password authentication so I can verify the passwords.
I disable public-key authentication by specifying the following file:
PasswordAuthentication=yes
PubkeyAuthentication=no
My attempts at the expect script:
# $1 = host, $2 = user, $3 = password, $4 = config file
expect -c "spawn ssh $2@$1 -F $4
expect -re \".*?assword.*?\"
send \"$3\n\"
...
send \'^D\'"
I thought maybe the exit status could indicate success, but I couldn't find anything about it in the man pages.
I've been using something like the script below for a similar task.
#!/bin/sh
# Run using expect from path \
exec expect -f "$0" "$@"
# Above line is only executed by sh
set i 0; foreach n $argv {set [incr i] $n}
set pid [ spawn -noecho ssh $1@$3 $4 ]
set timeout 30
expect {
    "(yes/no)" {
        sleep 1
        send "yes\n"
        exp_continue
    }
    "(y/n)" {
        sleep 1
        send "y\n"
        exp_continue
    }
    password {
        sleep 1
        send "$2\n"
        exp_continue
    }
    Password {
        sleep 1
        send "$2\n"
        exp_continue
    }
    "Last login" {
        interact
    }
    "Permission denied" {
        puts "Access not granted, aborting..."
        exit 1
    }
    timeout {
        puts "Timeout expired, aborting..."
        exit 1
    }
    eof {
        #puts "EOF reached."
    }
}
set status [split [wait $pid]]
set osStatus [lindex $status 2]
set procStatus [lindex $status 3]
if { $osStatus == 0 } {
    exit $procStatus
} else {
    exit $procStatus
}
Do you specifically need to check that you can obtain a shell, or is executing a command also OK?
If you just want to check authentication, you may want to run ssh with a simple command (echo, hostname, or something similar) and check whether you get the expected result.
You may also want to launch ssh with the -v option and look for Authentication succeeded (at the debug1 log level).
crowbent has provided an expect script to test SSH login; however, I would recommend non-interactive SSH password authentication with sshpass for testing ssh/sftp. sshpass is more secure and less error-prone than expect.
The solution to the underlying problem (password database getting out of sync) is to use public key authentication. For everyone. Do NOT bother with passwords when it comes to SSH.
Successful login could be checked like this:
ssh -o PasswordAuthentication=no USER@HOST 'exit' || echo "SSH login failed."
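The exit status can be turned into a report line with a small helper. This is a sketch under stated assumptions: classify_ssh_status and check_login are hypothetical names, and the only ssh behaviour relied on is that ssh returns the remote command's exit status, or 255 when ssh itself fails (for example on a refused connection or failed authentication). BatchMode=yes makes ssh fail instead of prompting, so this variant tests key-based login in the same way as the one-liner above.

```shell
#!/bin/bash
# classify_ssh_status STATUS: print a human-readable verdict for an ssh exit status.
classify_ssh_status() {
    case "$1" in
        0)   echo "login ok" ;;
        255) echo "ssh error (connection or authentication failed)" ;;
        *)   echo "login ok, remote command exited with $1" ;;
    esac
}

# check_login HOST USER: hypothetical wrapper that runs a no-op command remotely.
check_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$2@$1" true
    classify_ssh_status "$?"
}

# Example verdicts without touching the network:
classify_ssh_status 0
classify_ssh_status 255
```

A nightly job could call check_login for each host and collect the verdict lines into the email report.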
