I am trying to run a job in Airflow 2.1.2 that executes a Dataflow job. The Dataflow job reads data from a storage bucket and uploads it to BigQuery. The dataflow_default_options in the DAG define the region as europe-west1, but the actual job in the DAG overrides it to us-central1. Because of this, the Dataflow job fails on the BigQuery upload, since the region is us-central1.
It was working fine when I was using the older version of Airflow (1.10.15).
Code below:
DEFAULT_DAG_ARGS = {
    'start_date': YESTERDAY,
    'email': models.Variable.get('email'),
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 0,
    'project_id': models.Variable.get('gcp_project'),
    'dataflow_default_options': {
        'region': 'europe-west1',
        'project': models.Variable.get('gcp_project'),
        'temp_location': models.Variable.get('gcp_temp_location'),
        'runner': 'DataflowRunner',
        'zone': 'europe-west1-d'
    }
}

with models.DAG(dag_id='GcsToBigQueryTriggered',
                description='A DAG triggered by an external Cloud Function',
                schedule_interval=None,
                default_args=DEFAULT_DAG_ARGS,
                max_active_runs=1) as dag:
    # Args required for the Dataflow job.
    job_args = {
        'input': 'gs://{{ dag_run.conf["bucket"] }}/{{ dag_run.conf["name"] }}',
        'output': models.Variable.get('bq_output_table'),
        'fields': models.Variable.get('input_field_names'),
        'load_dt': DS_TAG
    }
    # Main Dataflow task that will process and load the input delimited file.
    dataflow_task = dataflow_operator.DataFlowPythonOperator(
        task_id="data-ingest-gcs-process-bq",
        py_file=DATAFLOW_FILE,
        options=job_args)
If I change the region in the options of the dataflow_task to europe-west1, the Dataflow job passes, but it then fails in Airflow with a 404 error, because Airflow polls for the JOB_DONE status of the Dataflow job in the wrong region (us-central1).
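For illustration, a rough sketch of that attempted change, reusing the names from the DAG above; this only shows how I tried it, not a working fix:
# Attempted workaround: pass the region explicitly in the task's options in
# addition to dataflow_default_options. The Dataflow job then runs in
# europe-west1, but Airflow still polls for its status in us-central1.
job_args = {
    'input': 'gs://{{ dag_run.conf["bucket"] }}/{{ dag_run.conf["name"] }}',
    'output': models.Variable.get('bq_output_table'),
    'fields': models.Variable.get('input_field_names'),
    'load_dt': DS_TAG,
    'region': 'europe-west1'  # added here as the workaround
}
dataflow_task = dataflow_operator.DataFlowPythonOperator(
    task_id="data-ingest-gcs-process-bq",
    py_file=DATAFLOW_FILE,
    options=job_args)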
Am I missing something? Any help would be highly appreciated.
I'm trying to set up a simple Batch Compute Environment using a LaunchTemplate, so that I can specify a larger-than-default volume size:
const templateName = 'my-template'
const jobLaunchTemplate = new ec2.LaunchTemplate(stack, 'Template', {
  launchTemplateName: templateName,
  blockDevices: [ /* ..vol config.. */ ]
})

const computeEnv = new batch.CfnComputeEnvironment(stack, 'CompEnvironment', {
  type: 'managed',
  computeResources: {
    instanceRole: jobRole.roleName,
    instanceTypes: [
      InstanceType.of(InstanceClass.C4, InstanceSize.LARGE).toString()
    ],
    maxvCpus: 64,
    minvCpus: 0,
    desiredvCpus: 0,
    subnets: vpc.publicSubnets.map(sn => sn.subnetId),
    securityGroupIds: [vpc.vpcDefaultSecurityGroup],
    type: 'EC2',
    launchTemplate: {
      launchTemplateName: templateName,
    }
  },
})
They both initialize fine when not linked; however, as soon as the launchTemplate block is added to the compute environment, I get the following error:
Error: Resource handler returned message: "Resource of type 'AWS::Batch::ComputeEnvironment' with identifier 'compute-env-arn' did not stabilize." (RequestToken: token, HandlerErrorCode: NotStabilized)
Any suggestions are greatly appreciated, thanks in advance!
For anyone running into this: check the resource that is being created in the AWS Console, i.e. go to aws.amazon.com and refresh the page repeatedly until you see it created by CloudFormation. This gave me a different error message about the instance profile not existing (a bit more helpful than the terminal error...).
A simple CfnInstanceProfile did the trick:
new iam.CfnInstanceProfile(stack, "batchInstanceProfile", {
  instanceProfileName: jobRole.roleName,
  roles: [jobRole.roleName],
});
I faced a similar error.
But in my case, CDK had created a subnetGroups list in cdk.context.json and was trying to use it in the CfnComputeEnvironment definition.
The problem was that I was using the default VPC and had manually modified a few subnets, and cdk.context.json was not updated.
I solved it by deleting cdk.context.json.
The file was recreated with the correct values on the next synth.
Tip for others facing a similar problem:
Don't just rely on the error message; look closely at the CloudFormation template that CDK generates for the resource.
I am trying to trigger a job A (which is configured for remote triggering) from another job B, and job B needs to wait until results come back to show success or failure. I initially tried the REST API with a curl command, and it works perfectly. Here's the curl call:
curl -v -X POST 'https://xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters?config_files=./jenkins/unit-tests.json' --user xxxx:110f4dfa33ba8f8ef5d8d299beb6aa1543
I then chose the Parameterized Remote Trigger plugin installed on the Jenkins server, because it handles the polling mechanism internally and also has handler-friendly methods. Please see the code for the remote job below. It fails with a 405 error, which means "method not allowed" in HTTP terms, so it looks like the plugin is using GET instead of POST. I added an option for logging, but it does not seem to show any more detail.
def handle = triggerRemoteJob(
    remoteJenkinsName: 'remote-master',
    job: 'https://xxx.xxx.com/xxx-xxx/job/xxx/job/master/buildWithParameters',
    remoteJenkinsUrl: 'https://xxx.xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters',
    auth: TokenAuth(apiToken: hudson.util.Secret.fromString('110f4dfa33ba8f8ef5d8d299beb6aa1543'), userName: 'xxxx'),
    parameters: 'config_files=./jenkins/unit-tests')
I am getting the following error:
[Pipeline] triggerRemoteJob
##########################################################################
Parameterized Remote Trigger Configuration:
- job: https://xxx.xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters
- remoteJenkinsUrl: https://xxx.xxx.xxx/xxx-xxx/job/ius/job/master/buildWithParameters
- auth: 'Token Authentication' as user 'sseri'
- parameters: [config_files=./jenkins/unit-tests]
- blockBuildUntilComplete: true
- connectionRetryLimit: 5
- trustAllCertificates: false
##########################################################################
Connection to remote server failed [405], waiting to retry - 10 seconds until next attempt. URL: https://xxx.xxx.xxx/xxx-xxx/job/xxx/job/master/buildWithParameters/api/json, parameters:
Retry attempt #1 out of 5
Please help me in this regard!
I am not sure about the plugins you are using, but it's quite simple to implement this scenario ("call a downstream job from the upstream and fail the upstream if the downstream fails") without any plugins.
Take a look at my example below.
Let's say you have two jobs called jobA and jobB, and your goal is to call jobB from jobA and fail jobA if jobB fails.
Scripted Pipeline for jobA
node() {
    try {
        // propagate: false makes the build step return the downstream result
        // instead of failing this build immediately
        def jobB = build(job: jobName, parameters: [string(name: "parameterName", value: "parameterValue")], propagate: false)
        def jobBStatus = jobB.getResult()
        if (jobBStatus == "FAILURE") {
            throw new RuntimeException("Downstream job-b failed with reason ...")
        }
        ...
    } catch (Exception e) {
        throw e
    }
}
Declarative Pipeline for jobA
pipeline {
    agent any
    stages {
        stage('call jobB') {
            steps {
                script {
                    // propagate: false makes the build step return the downstream result
                    // instead of failing this build immediately
                    def jobB = build(job: jobName, parameters: [
                        string(name: "parameterName", value: "parameterValue")
                    ], propagate: false)
                    def jobBStatus = jobB.getResult()
                    if (jobBStatus == "FAILURE") {
                        error("Downstream job-b failed with reason ...")
                    }
                }
            }
        }
    }
}
Try using this Parameterized-Remote-Trigger-Plugin. It should give you what you want. I'm having some problems configuring it with authentication tokens and users in a Jenkinsfile, but if you are using the GUI I'm sure you will get the job done.
Use case: I want to send the Jenkins job console log to Elasticsearch, and from there to Kibana, so that I can visualise the data.
I am using the Logstash plugin to achieve this. For freestyle jobs the Logstash plugin configuration works fine, but for Jenkins pipeline jobs I get all the required data such as build number, job name, and build duration, yet it does not show the build result, i.e. success or failure.
I tried it in two ways:
1.
stage('send to ES') {
    logstashSend failBuild: true, maxLines: -1
}
2.
timestamps {
    logstash {
        node() {
            sh '''
                echo 'Hello, World!'
            '''
            try {
                stage('GitSCM') {
                    git url: 'github repo.git'
                }
                stage('Initialize') {
                    jdk = tool name: 'jdk'
                    env.JAVA_HOME = "${jdk}"
                    echo "jdk installation path is: ${jdk}"
                    sh "${jdk}/bin/java -version"
                    sh '$JAVA_HOME/bin/java -version'
                    def mvnHome = tool 'mvn'
                }
                stage('Build Stage') {
                    def mvnHome = tool 'mvn'
                    sh "${mvnHome}/bin/mvn -B verify"
                }
                currentBuild.result = 'SUCCESS'
            } catch (Exception err) {
                currentBuild.result = 'FAILURE'
            }
        }
    }
}
But in both cases I do not get the build result, i.e. success or failure, in my Elasticsearch or Kibana.
Can someone help?
I didn't find a clear way to do that; my solution was to add these lines at the end of the Jenkinsfile:
echo "Current result: ${currentBuild.currentResult}"
logstashSend failBuild: true, maxLines: 3
In my case, I don't need it to send all console logs, only one entry with the result per job.
Objective: To trigger a downstream job from a different Jenkins instance and display the console output in the upstream job.
Job type: Pipeline scripts.
The complete code is below:
properties([
    parameters([
        string(name: 'var1', defaultValue: "value1", description: ''),
        string(name: 'var2', defaultValue: "value2", description: ''),
        string(name: 'var3', defaultValue: "value3", description: '')
    ])
])

node('unique tag') {
    stage("Trigger downstream") {
        // From Jenkins
        def remoteRunWrapper = triggerRemoteJob(
            mode: [$class: 'ConfirmStarted', timeout: [timeoutStr: '1h'], whenTimeout: [$class: 'StopAsFailure']],
            remotePathMissing: [$class: 'StopAsFailure'],
            parameterFactories: [[$class: 'SimpleString', name: 'var1', value: var1], [$class: 'SimpleString', name: 'var2', value: var2], [$class: 'SimpleString', name: 'var3', value: var3]],
            remotePathUrl: 'jenkins://..'
        )
        print(remoteRunWrapper.toString())
        // would want to use the other capabilities offered by RemoteRunWrapper
    }
}
triggerRemoteJob is able to trigger the downstream job and returns an instance of RemoteRunWrapper after the job has started. The RemoteRunWrapper instance should provide capabilities that let me check on the downstream job and retrieve its logs. There is, however, no documentation on RemoteRunWrapper that I could find. The methods described in the RunWrapper documentation cannot be used, and the script fails with the error:
groovy.lang.MissingMethodException: No signature of method: com.cloudbees.opscenter.triggers.RemoteRunWrapper.getId() is applicable for argument types: () values: []
How can I find the capabilities offered by RemoteRunWrapper? Are there any better ways to achieve this?
Note:
1) The use of
mode: [$class: 'ConfirmStarted', timeout: [timeoutStr: '1h'], whenTimeout: [$class: 'StopAsFailure']],
remotePathUrl: 'jenkins://...'
is necessary, because the following:
remoteJenkinsUrl: 'https://myjenkins:8080/...'
job: 'TheJob'
from the triggerRemoteJob documentation fails to trigger the job and returns a null object, and the methods that are described here also cause the script to fail with MissingMethodException.
2) [$class: 'RemoteBuildConfiguration'] provides an option 'enhancedLogging' that allows the console output of the remote job to also be logged. However, when it is used, a ClassNotFound exception is thrown (the import statement was included).
3) It does not really matter whether the downstream job is triggered asynchronously or synchronously, as long as it is possible to log the console output of the downstream job in the console output of the upstream job.
I'm playing on localhost with a DC/OS installation. While everything works fine, I can't seem to run a Docker image located in a private repository. I'm using Python to communicate with Chronos:
@celery.task(name='add-job', soft_time_limit=5)
def add_job(job_id):
    job_document = mongo.jobs.find_one({
        '_id': job_id
    })
    if job_document:
        worker_document = mongo.workers.find_one({
            '_id': job_document['workerId']
        })
        if worker_document:
            job = {
                'async': True,
                'name': job_document['_id'],
                'owner': 'owner@gmail.com',
                'command': "python /code/run.py",
                "disabled": False,
                "shell": True,
                "cpus": worker_document['cpus'],
                "disk": worker_document['disk'],
                "mem": worker_document['memory'],
                'schedule': 'R1//PT300S',  # start now
                "epsilon": "PT60M",
                "container": {
                    "type": "DOCKER",
                    "forcePullImage": True,
                    "image": "quay.io/username/container",
                    "network": "HOST",
                    "volumes": [{
                        "containerPath": "/images/",
                        "hostPath": "/images/",
                        "mode": "RW"
                    }]
                },
                "uris": [
                    "file:///images/docker.tar.gz"
                ]
            }
            return chronos_client.add(job)
        else:
            return 'worker not found'
    else:
        return 'job not found'
The job runs fine with a public image (alpine:latest), but it fails without any error inside the DC/OS installation.
The job gets executed but fails immediately. The error log of the job inside Chronos looks like this:
I1212 12:39:11.141639 25058 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/61d6d037-c9f5-482b-a441-11d85554461b-S1\/root","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":false,"value":"file:\/\/\/images\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/61d6d037-c9f5-482b-a441-11d85554461b-S1\/docker\/links\/7029bbea-4c3d-439a-8720-411f6fe40eb9","user":"root"}
I1212 12:39:11.143575 25058 fetcher.cpp:409] Fetching URI 'file:///images/docker.tar.gz'
I1212 12:39:11.143587 25058 fetcher.cpp:250] Fetching directly into the sandbox directory
I1212 12:39:11.143602 25058 fetcher.cpp:187] Fetching URI 'file:///images/docker.tar.gz'
I1212 12:39:11.143612 25058 fetcher.cpp:167] Copying resource with command:cp '/images/docker.tar.gz' '/var/lib/mesos/slave/slaves/61d6d037-c9f5-482b-a441-11d85554461b-S1/docker/links/7029bbea-4c3d-439a-8720-411f6fe40eb9/docker.tar.gz'
I1212 12:39:11.146726 25058 fetcher.cpp:547] Fetched 'file:///images/docker.tar.gz' to '/var/lib/mesos/slave/slaves/61d6d037-c9f5-482b-a441-11d85554461b-S1/docker/links/7029bbea-4c3d-439a-8720-411f6fe40eb9/docker.tar.gz'
Stdout is empty. When run directly inside Marathon as an application with the same settings, the authentication works and my image is downloaded and executed. Is this something that Chronos does not support? It should... I mean, it has options for Docker...
Update: digging deeper into the agent logs I found this:
Failed to run 'docker -H unix:///var/run/docker.sock pull quay.io/username/container': exited with status 1; stderr='Error: Status 403 trying to pull repository username/container: "{\"error\": \"Permission Denied\"}"
I tried the archive with its config.json file on the agent itself, and the image can be downloaded when triggered from the command line. I just can't understand why Chronos is not using it properly. I can't find any other reference on how to supply my credentials other than this.
As it turns out... the uris param is deprecated in favor of fetch. I started from scratch with a Marathon config applied to Chronos and watched the logs carefully, which is when I saw this: {'message': 'Tried to add both uri (deprecated) and fetch parameters on aBPepwhG5z33e4teG', 'status': 'Bad Request'}. Then I changed my uris parameter to:
"fetch": [{
"uri": "/images/docker.tar.gz",
"extract": true,
"executable": false,
"cache": false
}]
...and it worked.
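For clarity, a minimal sketch of how the job dictionary in the Python snippet above changes with this fix; only the affected key is shown, everything else stays as in the original, and the comments are just my reading of it:
# Sketch: replace the deprecated "uris" key in the Chronos job dict with "fetch".
job = {
    # ... all the other fields (name, command, container, ...) stay the same ...
    "fetch": [{
        "uri": "/images/docker.tar.gz",  # the archive holding the Docker config.json credentials
        "extract": True,
        "executable": False,
        "cache": False
    }]
    # the old "uris": ["file:///images/docker.tar.gz"] entry is removed
}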
Your post looked a little like this one, which turned out to be a problem with volumes.