I am trying to log in to server 100.18.10.182 and trigger my spark-submit job on server 100.18.10.36 from the .182 server, all within Apache Airflow. I have used a BashOperator (a shell script to ssh into the 100.18.10.182 server), and for the spark-submit job I have used a SparkSubmitOperator downstream of the BashOperator.
I am able to execute the BashOperator successfully, but the SparkSubmitOperator fails with:
Cannot execute: Spark submit
I think this is because I am unable to pass the SSH session (to the .182 server) on to the next SparkSubmitOperator, or it may be due to some other issue related to --jars or --packages; I am not sure.
I was thinking of using xcom_push to push some data from my BashOperator and xcom_pull to pull it into the SparkSubmitOperator, but I am not sure how to pass it in a way that keeps the server logged in so that my SparkSubmitOperator gets triggered from that box itself.
Airflow dag code:
t2 = BashOperator(
    task_id='test_bash_operator',
    bash_command="/Users/hardikgoel/Downloads/Work/airflow_dir/shell_files/airflow_prod_ssh_script.sh ",
    dag=dag)
t2

t3_config = {
    'conf': {
        "spark.yarn.maxAppAttempts": "1",
        "spark.yarn.executor.memoryOverhead": "8"
    },
    'conn_id': 'spark_default',
    'packages': 'com.sparkjobs.SparkJobsApplication',
    'jars': '/var/spark/spark-jobs-0.0.1-SNAPSHOT-1/spark-jobs-0.0.1-SNAPSHOT.jar firstJob',
    'driver_memory': '1g',
    'total_executor_cores': '21',
    'executor_cores': 7,
    'executor_memory': '48g'
}

t3 = SparkSubmitOperator(
    task_id='t3',
    **t3_config)

t2 >> t3
Shell Script code:
#!/bin/bash
USERNAME=hardikgoel
HOSTS="100.18.10.182"
SCRIPT="pwd; ls"
ssh -l ${USERNAME} ${HOSTS} "${SCRIPT}"
status=$?   # capture ssh's exit status before any other command overwrites it
echo "SSHed successfully"
if [ $status -eq 0 ]; then
    echo "successful"
fi
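For what it's worth, the two steps could be collapsed into a single command that runs spark-submit on the remote box inside one ssh call, so no session needs to be carried between Airflow tasks. A minimal sketch: host, user, and jar path are copied from the question, the spark-submit flags are illustrative, and the final line is echoed as a dry run (drop the leading echo to actually execute):

```shell
#!/bin/sh
# Sketch: run spark-submit on the remote host inside one ssh call.
# Host, user, and jar path come from the question; flags are illustrative.
SPARK_USER="hardikgoel"
SPARK_HOST="100.18.10.36"
JAR="/var/spark/spark-jobs-0.0.1-SNAPSHOT-1/spark-jobs-0.0.1-SNAPSHOT.jar"
SUBMIT="spark-submit --conf spark.yarn.maxAppAttempts=1 $JAR firstJob"
# Echoed as a dry run; remove the leading echo to execute for real.
echo ssh -l "$SPARK_USER" "$SPARK_HOST" "$SUBMIT"
```

A command of this shape could then live in a single BashOperator (or an SSHOperator) instead of two tasks.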
I have 2 scripts:
script 1: demo_details.txt
script 2: demo.sh
Script 1: Path: /demo/d/demo_details.txt, containing the details below:
export CON_DB_TECY=Username/Password#host:port/Servicename -> `abc/abc#local:123/orabc`
Script 2: Path: /demo/d/demo.sh, containing the code below:
. /demo/d/demo_details.txt
sqlplus -s -S << EOF
$CON_DB_TECY
select * from dual;
exit;
EOF
When I run script 2 using -> sh -x demo.sh
it prints the contents of demo_details.txt => CON_DB_TECY=abc/abc#local:123/orabc,
which is the connection string that I want to secure; it
should not be displayed when I run the script using sh -x demo.sh.
You can put all the connection details and SQL in demo_details.txt and redirect it into sqlplus:
demo_details.txt
Username/Password.....
select * from dual;
exit;
then
demo.sh
sqlplus -s -S <demo_details.txt
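If demo.sh must keep sourcing a separate credentials file, another option is to switch shell tracing off just around the load, so that `sh -x demo.sh` never echoes the connection string. A runnable sketch, with /tmp paths standing in for the /demo/d ones:

```shell
#!/bin/sh
# Sketch: disable xtrace around the credential load so `sh -x demo.sh`
# does not print the connection string. /tmp paths stand in for /demo/d.
echo 'CON_DB_TECY="abc/abc#local:123/orabc"' > /tmp/demo_details.txt
{ set +x; } 2>/dev/null   # turn tracing off without tracing this line itself
. /tmp/demo_details.txt   # secret is loaded but never echoed
set -x                    # re-enable tracing for the rest of the script
echo "credentials loaded"
```

The `{ set +x; } 2>/dev/null` grouping hides even the `set +x` line itself from the trace output.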
I have a script which is run by an AutoSys job: JOB_ABC_S1
command: /ABC/script.sh
script.sh code:
grep -w "ABC" /d/file1.txt
status=$?
if [ $status -eq 0 ]
then
    echo "Passed"
else
    echo "Failed"
    exit 1
fi
My issue is that whether the script fails or passes, the AutoSys job is marked SU (SUCCESS).
I don't want it marked as success if the script fails: it should mark the AutoSys job FA (FAILURE) when the script fails, and SU (SUCCESS) when it passes.
What should I change in the script to make this happen?
Job :
insert_job : JOB_ABC_S1
machine : XXXXXXXXXXX
owner : XXXXXXXX
box_name : BOX_ABC_S1
application : XXXX
permission : XXXXXXXXXXX
max_run_alarm : 60
alarm_if_fails : y
send_notification : n
std_out_file : XXXXX
std_err_file : XXXXX
command : sh /ABC/script.sh
At first look, all seems to be fine.
However, I would suggest a script modification which you can try out.
By default, AutoSys fails a job if its exit code is non-zero, unless specified otherwise.
The job JIL seems to be fine.
Please update your script as below and check 2 things:
The executed job's EXIT-CODE: it should be either 1 or 2 (we are deliberately failing the job in both cases).
The std_out/std_err log files.
Script:
#!/bin/sh
srch_count=$(grep -cw ABC /d/file1.txt)
if [ "$srch_count" -gt 0 ]; then
    echo "Passed"
    #exit 0
    exit 2
else
    echo "Failed"
    exit 1
fi
This way we can confirm if the exit code is correctly being captured by Autosys.
I am trying to write a script which checks whether a db2 table exists or not. If it exists, I will continue and touch a file; if it does not exist, the script has to wait for 30 minutes and then check again. How might I achieve this?
#!/bin/sh
db2 "connect to <database> user <username> using <password>"
Variable=`db2 -x "SELECT COUNT(1) FROM SCHEMA.TABLEA WHERE 1=2"`
while read Variable ;
do
if $Variable=0
then touch triggerfile.txt
else
sleep 30
fi
done
You want to continually poll (without any limit on time) for a table to exist? It might be more readable to use bash or ksh syntax and avoid backticks, but that's your choice.
Usual caveats apply: don't hardcode the password.
Apart from the looping logic, you might try this inside the loop (bash or ksh syntax shown below), initialising the variables to suit yourself:
db2 "connect to $dbname user $username using $passwd"
(( $? > 0 )) && print "Failed to connect to database" && exit 1
db2 -o- "select 1 from syscat.tables where tabschema='$schema' and tabname='$tabname' with ur"
rc=$?
# rc = 0 : the table exists in that schema
# rc = 1 : the table does not exist
(( rc == 0 )) && touch triggerfile.txt
# rc >= 2 : some warning or error, need to investigate and correct
(( rc >= 2 )) && print "problems querying syscat.tables" && exit 1
db2 -o- connect reset
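Putting the loop around it, here is a runnable sketch of the 30-minute polling shape. check_table is a hypothetical stand-in for the syscat.tables query above, stubbed here to "find" the table on the third poll so the loop shape can run anywhere, and sleep 1 stands in for the real sleep 1800:

```shell
#!/bin/sh
# Polling-loop sketch. check_table stands in for the syscat.tables query;
# the stub succeeds on the 3rd poll so this is runnable without db2.
attempts=0
check_table() {
    attempts=$((attempts + 1))
    # real version: db2 -o- "select 1 from syscat.tables where ..." and
    # return its exit status instead of the stub test below
    [ "$attempts" -ge 3 ]
}
until check_table; do
    sleep 1        # use 1800 here for the real 30-minute wait
done
touch /tmp/triggerfile.txt
```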
I asked a related question and realized I wasn't asking the right question (i.e., this isn't about git).
The question is how to push a project to GitHub, without first creating the project in the clouds, using R. Currently you can do this from the git command line in RStudio using info from this question.
Now I'm trying to turn that into R code on a Windows machine (Linux was easy). I'm stuck at the first step, using curl from the command line via an R system call. I'll show what I have and then the error message (thanks to SimonO101 for getting me this far). Per his comments below, I've edited heavily to reflect the problem as it stands:
R Code:
repo <- "New"
user <- "trinker"
password <- "password"
url <- "http://curl.askapache.com/download/curl-7.23.1-win64-ssl-sspi.zip"
tmp <- tempfile( fileext = ".zip" )
download.file(url,tmp)
unzip(tmp, exdir = tempdir())
system(paste0(tempdir(), "/curl http://curl.haxx.se/ca/cacert.pem -o " ,
tempdir() , "/curl-ca-bundle.crt"))
cmd1 <- paste0(tempdir(), "/curl -u '", user, ":", password,
"' https://api.github.com/user/repos -d '{\"name\":\"", repo, "\"}'")
system(cmd1)
cmd2 <- paste0(tempdir(), "/curl -k -u '", user, ":", password,
"' https://api.github.com/user/repos -d '{\"name\":\"", repo, "\"}'")
system(cmd2)
Error Messages (same for both approaches):
> system(cmd1)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 12 0 0 100 12 0 24 --:--:-- --:--:-- --:--:-- 30
100 47 100 35 100 12 65 22 --:--:-- --:--:-- --:--:-- 83{
"message": "Bad credentials"
}
I know all the files are there because:
> dir(tempdir())
[1] "curl-ca-bundle.crt" "curl.exe" "file1aec62fa980.zip" "file1aec758c1415.zip"
It can't be my password or user name because this works on Linux Mint (the only difference is the part before curl):
repo <- "New"
user <- "trinker"
password <- "password"
cmd1 <- paste0("curl -u '", user, ":", password,
"' https://api.github.com/user/repos -d '{\"name\":\"", repo, "\"}'")
system(cmd1)
NOTE: Windows 7 machine. R 2.14.1
EDIT - After OP offered bounty
OK, it turns out it is down to some crazy Windows character escaping on the command line. Essentially, the problem was that we were passing improperly formatted JSON requests to GitHub.
You can use shQuote to properly format the offending portion of the curl request for Windows. We can test the platform type to see if we need to include special formatting for Windows, like so:
repo <- "NewRepository"
json <- paste0(" { \"name\":\"" , repo , "\" } ") # string we desire formatting
os <- .Platform$OS.type # check if we are on Windows
if( os == "windows" ){
    json <- shQuote(json , type = "cmd" )
    cmd1 <- paste0( tempdir() ,"/curl -i -u \"" , user , ":" , password , "\" https://api.github.com/user/repos -d " , json )
}
This worked on my Windows 7 box without any problems. I can update the GitHub script if you want?
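For contrast, here is the same request as issued from a Unix shell, where double quotes inside a double-quoted variable protect the JSON body (exactly the quoting cmd.exe lacks, which is why shQuote is needed on Windows). Credentials and repo name are the question's placeholders, and the curl line is echoed as a dry run so nothing is actually sent:

```shell
#!/bin/sh
# Dry-run sketch of the GitHub create-repo request from a Unix shell.
# Credentials and repo name are placeholders from the question.
USER="trinker"
PASSWORD="password"
REPO="New"
BODY="{\"name\":\"$REPO\"}"
# Echoed rather than executed: no network call, no real credentials.
echo curl -u "$USER:$PASSWORD" https://api.github.com/user/repos -d "$BODY"
```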
OLD ANSWER
I did some digging around here and here and it might be that the answer to your problem is to update the curl-ca-bundle. It may help on Windows to get R to use the internet2.dll.
repo <- "New"
user <- "trinker"
password <- "password"
url <- "http://curl.askapache.com/download/curl-7.23.1-win64-ssl-sspi.zip"
tmp <- tempfile( fileext = ".zip" )
download.file(url,tmp)
unzip(tmp, exdir = tempdir())
system( paste0( "curl http://curl.haxx.se/ca/cacert.pem -o " , tempdir() , "/curl-ca-bundle.crt" ) )
system( paste0( tempdir(),"/curl", " -u \'USER:PASS\' https://api.github.com/user/repos -d \'{\"name\":\"REPO\"}\'") )
Again, I can't test this as I don't have access to my Windows box, but updating the certificate authority file seems to have helped a few other people. From the curl website, the Windows version of curl should look for the curl-ca-bundle.crt file in the following order:
application's directory
current working directory
Windows System directory (e.g. C:\windows\system32)
Windows Directory (e.g. C:\windows)
all directories along %PATH%