How to create a Dataflow pipeline and auto-deploy it to Google Cloud? - maven

I'm using Apache Beam and Maven to create pipelines and run Dataflow jobs. After writing the pipeline logic, I run the following command to upload the job/template to Google Cloud.
mvn compile exec:java -Dexec.mainClass=com.package.MyMainClass -Dexec.args="--runner=DataflowRunner --autoscalingAlgorithm=NONE --numWorkers=25 --project=<PROJECT> --subnetwork=regions/us-east1/subnetworks/default --zone=us-east1-b --network=default --stagingLocation=gs://<TBD> --templateLocation=gs://<TBD> --otherCustomOptions"
After that, I've seen two ways the job can start running:
1. I had to go to the Dataflow UI page, click to create a new job from my own template, and so on, and then the job would start running.
2. The job started running right away.
I wonder how option 2 is implemented. I basically want to get rid of the hassle of going into the UI. I want to submit and start the job right from my laptop. Any insights will be appreciated!

It's important to make a distinction between traditional and templated Dataflow job execution:
If you use Dataflow templates (as in your case), staging and execution are separate steps. This separation gives you additional flexibility to decide who can run jobs and where the jobs are run from.
However, once your template is staged, you need to explicitly run your job from that template. To automate this process, you can make use of:
The API:
POST https://dataflow.googleapis.com/v1b3/projects/YOUR_PROJECT_ID/templates:launch?gcsPath=gs://YOUR_BUCKET_NAME/templates/TemplateName
{
  "jobName": "JOB_NAME",
  "parameters": {
    "inputFile": "gs://YOUR_BUCKET_NAME/input/my_input.txt",
    "outputFile": "gs://YOUR_BUCKET_NAME/output/my_output"
  },
  "environment": {
    "tempLocation": "gs://YOUR_BUCKET_NAME/temp",
    "zone": "us-central1-f"
  }
}
The gcloud command line tool:
gcloud dataflow jobs run JOB_NAME \
--gcs-location gs://YOUR_BUCKET_NAME/templates/MyTemplate \
--parameters inputFile=gs://YOUR_BUCKET_NAME/input/my_input.txt,outputFile=gs://YOUR_BUCKET_NAME/output/my_output
Or any of the client libraries.
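For illustration, here is a minimal sketch of the client-library route using the Google API client for Python (the project, bucket, and job names are placeholders; application default credentials are assumed):

from googleapiclient.discovery import build

# Build a Dataflow API client using application default credentials.
dataflow = build("dataflow", "v1b3")

# Launch a job from the staged template; the body mirrors the REST example above.
request = dataflow.projects().templates().launch(
    projectId="YOUR_PROJECT_ID",
    gcsPath="gs://YOUR_BUCKET_NAME/templates/TemplateName",
    body={
        "jobName": "JOB_NAME",
        "parameters": {
            "inputFile": "gs://YOUR_BUCKET_NAME/input/my_input.txt",
            "outputFile": "gs://YOUR_BUCKET_NAME/output/my_output",
        },
        "environment": {
            "tempLocation": "gs://YOUR_BUCKET_NAME/temp",
            "zone": "us-central1-f",
        },
    },
)
response = request.execute()
print(response["job"]["id"])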
Alternatively, if you don't want to create a Dataflow template and you just want to deploy and run the job directly (which is probably what you're referring to in point 2), you can simply remove the --templateLocation parameter. If you get any errors when doing this, make sure that your pipeline code can also be executed as a non-templated job; for reference, take a look at this question.

Once the template is staged, in addition to the UI you can start it using:
the REST API
the gcloud command line

Related

Create multiple MarkLogic Schedule Task for same module through ml-gradle

I am trying to create multiple instances of an application in the same MarkLogic environment. I can create all the configurations (users, roles, databases, forests, app servers...) but I cannot schedule individual tasks for separate databases that share the same module path.
When I try to run ml-gradle mlDeployApps, it fails at task creation.
My whole application configuration depends on a property file; for any APP-NAME, a separate instance needs to be created.
I tried deploying through ml-gradle.
mlDeployTasks fails because a task already exists for that module path. When I try to run a second deployment with a new database, it fails because it does not recognize the task database.
JSON:
{
  "task-enabled": true,
  "task-path": "/ext/schedules/monitor.xqy",
  "task-root": "/",
  "task-type": "daily",
  "task-period": 1,
  "task-start-time": "10:00:00",
  "task-database": "%%DATABASE%%",
  "task-modules": "%%MODULES_DATABASE%%",
  "task-user": "admin",
  "task-priority": "normal"
}
ERROR:
Logging HTTP response body to assist with debugging: {"errorResponse":{"statusCode":"500", "status":"Internal Server Error", "messageCode":"MANAGE-INVALID", "message":"MANAGE-INVALID (err:FOER0000): task-database"}}
Error occurred while sending PUT request to /manage/v2/tasks/5389046897270663947/properties?group-id=Default; logging request body to assist with debugging: {
Expectation:
I want to deploy and undeploy the whole application, including scheduled tasks, as a separate instance per APPLICATION-NAME.
Actual:
With mlDeployTasks, each task is identified by its module path and stays tied to the old existing database, so creating a new task fails.
Please suggest the right way to achieve this.
MarkLogic's Management API is seeing your request as an attempt to change the task-database, but it only allows one property for a scheduled task to change (task-enabled). I think what you'll need to do here is have different task-path values for your different databases. That's not ideal, but if the implementation logic is all in a library that's imported by the task, the different modules themselves will be very lightweight.
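For example (a hypothetical sketch; the file names and database names are placeholders, and the remaining task properties stay as in the JSON above), each instance would get its own thin task module:

monitor-app1.json:
{ "task-path": "/ext/schedules/monitor-app1.xqy", "task-database": "app1-content", ... }
monitor-app2.json:
{ "task-path": "/ext/schedules/monitor-app2.xqy", "task-database": "app2-content", ... }

Both monitor-app1.xqy and monitor-app2.xqy would then simply import and invoke the shared monitoring library.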
Try ml-gradle 3.10.0 - support for this now exists - see the release notes for ml-app-deployer 3.10.0 (which provides most of the functionality in ml-gradle) - https://github.com/marklogic-community/ml-app-deployer/releases/tag/3.10.0

Jenkins Templates

I am new to the CloudBees Enterprise edition of Jenkins and to the concept of "templates".
I am trying to define a new template, and this template will be used by 20-30 jobs. The job is a basic build job. After the build, I would like to run a code analysis plugin. How can I define that in the Jenkins template?
I can define it while creating a direct job under "Post Build Actions", but I am not sure how to define the same in a template.
Do you have any solutions/suggestions?
The CloudBees Templates plugin is very powerful but not easy to master. Creation and administration of templates is not as user-friendly as one would wish. ;)
For an introduction I recommend reading the following resources:
Basic concept
Template documentation
Tutorial for simple job template
Make sure you understand the difference between builder template and job template. I assume you want to create a number of jobs using the job template. Follow these steps:
First of all, create a normal job that contains all the actions you want every templated job to perform.
Make sure this job works as expected for one example configuration.
Now create a new job template:
You will need to decide which parts of the job configuration need to be adapted to each job's configuration (e.g. the source code repository). Create a parameter for each such configuration option.
You might want to do some pre-processing on the job template's parameters using a transformation script - but we will skip that for now.
Now you need to add an XML description of what the generated job should do. I recommend copying this XML description from the example job created in step #1. You can access it via this URL: http://your-jenkins/job/this-job/config.xml. Simply copy&paste the XML code from the browser. Newer Jenkins versions also allow you to read a job's XML configuration via the user interface.
Finally, you need to fill in the template's parameters within the XML configuration. Simply replace the specific (hard-coded) values with references to the names of the template parameters created before: ${param_name} (see the sketch after this list).
Save the template
Now create a new job. On the job creation screen you should be able to select your newly created job template as the job type. After creating a job of the template's type you can define all template parameters for this specific job.
Try to run the template-based job and make sure it works as expected.
Create more template-based jobs as needed.
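As a sketch of that substitution (the element and parameter names here are hypothetical, not from a real template), a hard-coded value in the copied config.xml such as

<url>https://github.com/example/my-repo.git</url>

would become a reference to a template parameter:

<url>${repo_url}</url>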
All template-based jobs share the build steps defined by the job template. If you change the job template later on, all dependent jobs are updated accordingly. This is a very efficient way to administer a large number of similar jobs. It is very much worth the effort. Good luck!

Run Jenkins Job Based on Another Job's Status

I have a number of different projects, with Jenkins CI jobs configured for each of them to run tests. When I create a new release, I have a second job that coordinates between a number of different jobs that go over each of the modules in the projects and updates the versions and the dependencies in the pom.xml's. I would like to make the "update" job conditional on the status of all the CI jobs - meaning that if one of the CI jobs is not green, then the update job will not run at all.
I had a look at the Run Condition Plugin as well as the Conditional BuildStep Plugin; however, it does not seem possible to configure them to depend on the status of another Jenkins job.
you could hit the other jobs via the API at [JOB_URL]/lastCompletedBuild/api/json and verify the result for each.
to mess around with this:
curl -s "[JOB_URL]/lastCompletedBuild/api/json" | jq '.result'
you probably want result to say SUCCESS.
this is not fancy, but you don't want fancy in CI; you want something that is not likely to break when you upgrade jenkins. :)
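a minimal sketch of that check in Python (the job URLs and trigger token are placeholders; authentication is left out):

import requests

# CI jobs whose last completed build must be green (placeholder URLs)
CI_JOBS = [
    "https://jenkins.example.com/job/project-a",
    "https://jenkins.example.com/job/project-b",
]

def all_green():
    # query each job's last completed build and check its result
    for job in CI_JOBS:
        result = requests.get(job + "/lastCompletedBuild/api/json").json()["result"]
        if result != "SUCCESS":
            return False
    return True

if all_green():
    # kick off the update job via Jenkins' remote build trigger (hypothetical token)
    requests.post("https://jenkins.example.com/job/update-job/build?token=MY_TOKEN")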
Have a look at the Multijob Plugin (https://wiki.jenkins.io/display/JENKINS/Multijob+Plugin).
In your case, you can add a job as the first step and configure, based on the result condition of that first step, whether you want to run the second step.
In the second step, you can configure one or many jobs, and you can also configure whether to run them in parallel.

Start wercker job hourly

I've just started using wercker and I'd like a job to run regularly (e.g. daily, hourly). I realize this may be an anti-pattern, but is it possible? My intent is not to keep the container running indefinitely, just that my workflow is executed on a particular interval.
You can use a call to the Wercker API to trigger a build for any project which is set up already in Wercker.
So maybe set up a cron job somewhere that uses curl to make the right API call?
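A sketch of what such a cron job could run, in Python (the v3 builds endpoint, the bearer-token header, and the applicationId field are assumptions based on Wercker's public API documentation; the token and id are placeholders):

import requests

# Trigger a new Wercker build for an application (assumed v3 API shape).
resp = requests.post(
    "https://app.wercker.com/api/v3/builds",
    headers={"Authorization": "Bearer YOUR_WERCKER_TOKEN"},
    json={"applicationId": "YOUR_APPLICATION_ID", "branch": "master"},
)
resp.raise_for_status()
print(resp.json())

An hourly crontab entry such as 0 * * * * python trigger_wercker_build.py would then execute your workflow on schedule.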

Working with Flask-Script and cron jobs

So I've been meaning to create a cron job for my prototype Flask app running on Heroku. Searching the web, I found that the best way is by using Flask-Script, but I fail to see the point of using it. Do I get easier access to my app logic and storage info? And if I do use Flask-Script, how do I organize it around my app? I'm using it right now to start my server without really knowing the benefits. My folder structure is like this:
/app
  /manage.py
  /flask_prototype
    (all my Flask code)
Should I put the 'script.py' to be run by the Heroku Scheduler in the app folder, at the same level as manage.py? If so, do I get access to the models defined within flask_prototype?
Thank you for any info
Flask-Script just provides a framework under which you can create your script(s). It does not give you any better access to the application than what you can obtain when you write a standalone script. But it handles a few mundane tasks for you, like command line arguments and help output. It also folds all of your scripts into a single, consistent command line master script (this is manage.py, in case it isn't clear).
As far as where to put the script, it does not really matter. As long as manage.py can import it and register it with Flask-Script, and that your script can import what it needs from the application you should be fine.
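For illustration, here is a minimal manage.py along those lines (the package and model names are hypothetical):

from flask_script import Manager

from flask_prototype import app  # your Flask application object

manager = Manager(app)

@manager.command
def scheduled_task():
    """Run with 'python manage.py scheduled_task', e.g. from the Heroku Scheduler."""
    # imports from the application package work as usual here
    from flask_prototype.models import User  # hypothetical model
    print(User.query.count())

if __name__ == "__main__":
    manager.run()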
