Where to put oozie.launcher.* configuration? - hadoop

While trying to use Oozie properly, I ended up setting a few parameters, namely:
oozie.launcher.mapreduce.map.memory.mb
oozie.launcher.mapreduce.map.java.opts
oozie.launcher.yarn.app.mapreduce.am.resource.mb
oozie.launcher.mapred.job.queue.name
If I set them in the workflow configuration, they work as expected.
Is there a way or a place to set them globally, i.e. not per workflow? I expected custom-oozie-site.xml to be the right place, but apparently not (they have no effect if put there). Is the workflow itself the only place where they can be configured?
If it is relevant, I am using HDP 2.5.

In the Oozie Parameterization of Workflows section of the documentation, they state
Workflow applications may define default values for the workflow job parameters. They must be defined in a config-default.xml file bundled with the workflow application archive... Workflow job properties have precedence over the default values.
Another option I've seen is defining a parent workflow and propagating the properties to child workflows. Granted, this only works in specific instances and isn't always a good idea.
In addition, the documentation notes in the Workflow Deployment section:
The config-default.xml file defines, if any, default values for the workflow job parameters. This file must be in the Hadoop Configuration XML format. EL expressions are not supported and the user.name property cannot be specified in this file. Any other resources like job.xml files referenced from a workflow action node must be included under the corresponding path; relative paths always start from the root of the workflow application.
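Putting the two doc excerpts together: a config-default.xml bundled at the root of each workflow application can carry these launcher settings, keeping the workflow.xml and per-job properties clean. A minimal sketch in the Hadoop configuration format (memory sizes and queue name are illustrative values, not recommendations):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>oozie.launcher.mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>oozie.launcher.mapreduce.map.java.opts</name>
    <value>-Xmx1536m</value>
  </property>
  <property>
    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>oozie.launcher.mapred.job.queue.name</name>
    <value>launchers</value>
  </property>
</configuration>
```

Since workflow job properties take precedence over these defaults, individual loads can still override a value without touching the shared file.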
This is a problem my team is currently trying to fix across 12 different ETL loads.

How does a Gradle task explicitly mark itself as having altered its output, or as up to date, for tasks that depend on it?

I am creating a rather custom task that processes a number of input files and produces a different number of output files.
I want to check the dates of the input files against the existing output files, and I might also look at the content of the input files, to determine whether the task is up to date or needs to run. What properties do I need to set, and where and when (in a doFirst, in the main action, or elsewhere), so that the dependency checker and task executor see the right state and dependent tasks are either forced to build or not?
Also, is there any documentation on standard library utilities for things like checking file dates and getting lists of files, as easy as the ones in Ruby's Rake?
How do I specify the inputs and outputs of the task? Especially as the outputs will not be known until the source is parsed and the output directory is scanned for what exists.
A sample that does this in a larger project, with tasks that depend on it, would be really nice. :)
What properties do I need to set, and where and when (in a doFirst, in the main action, or elsewhere), so that the dependency checker and task executor see the right state and dependent tasks are either forced to build or not?
Ideally this should be done as a custom task type. None of this logic should be in any of the Gradle build scripts at all. Either put the logic in a dedicated plugin project that gets published somewhere, which you can then reference in the project, or put it in buildSrc.
What you are trying to develop is what is known as an incremental task: https://docs.gradle.org/current/userguide/custom_tasks.html#incremental_tasks
These are used heavily throughout Gradle itself and are what make Gradle's incremental builds possible: https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:up_to_date_checks
How do I specify the inputs and outputs of the task? Especially as the outputs will not be known until the source is parsed and the output directory is scanned for what exists.
Once you have your tasks defined and whatever else you need, in your main Gradle files you would configure them as you would any other plugin or task.
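As a minimal sketch of such a custom task type (hypothetical class and property names, Groovy, placed in buildSrc): the key point is that the @InputFiles and @OutputDirectory declarations are what drive Gradle's up-to-date checking, and you only declare the output directory, not the individual files, so it works even when the exact outputs aren't known until the source is parsed.

```groovy
// buildSrc/src/main/groovy/ProcessSources.groovy -- hypothetical example
import org.gradle.api.DefaultTask
import org.gradle.api.file.ConfigurableFileCollection
import org.gradle.api.file.DirectoryProperty
import org.gradle.api.tasks.InputFiles
import org.gradle.api.tasks.OutputDirectory
import org.gradle.api.tasks.TaskAction

abstract class ProcessSources extends DefaultTask {

    @InputFiles                 // Gradle fingerprints these files for up-to-date checks
    abstract ConfigurableFileCollection getSources()

    @OutputDirectory            // declaring the directory is enough; the exact files may vary
    abstract DirectoryProperty getDestinationDir()

    @TaskAction
    void process() {
        File outDir = destinationDir.get().asFile
        outDir.mkdirs()
        sources.each { File src ->
            // stand-in for the real parse/generate logic
            new File(outDir, src.name + '.out').text = src.text.toUpperCase()
        }
    }
}
```

Wiring it up in a build script then looks like any other task; a downstream task that takes destinationDir as one of its inputs is re-run only when the generated files actually change:

```groovy
// build.gradle -- hypothetical usage
tasks.register('processSources', ProcessSources) {
    sources.from fileTree('src/inputs')
    destinationDir = layout.buildDirectory.dir('processed')
}
```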
The two links above should be enough to help get you started.
As for a small example: I developed a Gradle plugin that generates files based on some input that is not known until it is configured. The 'custom' task type just extends the provided JavaExec; the custom task is Wsdl2java. Then, based on user configuration, tasks get registered using the input file from the user. Since I reused built-in task types, I know for sure that no extra work will be done and can rely on Gradle doing the heavy lifting. There's also a test to ensure that the configuration cache works as expected: ConfigurationCacheFunctionalTests.
As I mentioned earlier, the links above should be enough to get you started.

Jenkins Templates

I am new to CloudBees Jenkins Enterprise and to the concept of "templates".
I am trying to define a new template, and this template will be used by 20-30 jobs. The job is a basic build job. After the build, I would like to have a code analysis plugin run. How can I define that in a Jenkins template?
I can define it under "Post Build Actions" when creating a job directly, but I am not sure how to define the same in a template.
Do you have any solutions or suggestions?
The CloudBees Templates plugin is very powerful but not easy to master. Creating and administering templates is not as user-friendly as one would wish. ;)
For an introduction I recommend reading the following resources:
Basic concept
Template documentation
Tutorial for simple job template
Make sure you understand the difference between a builder template and a job template. I assume you want to create a number of jobs using a job template. Follow these steps:
First of all, create a normal job that contains all the actions you want all templated jobs to perform.
Make sure this job works as expected for one example configuration.
Now create a new job template:
You will need to decide which parts of the job configuration need to be adapted to each job's configuration (e.g. the source code repository). Create a parameter for each such configuration option.
You might want to do some pre-processing on the job template's parameters using a transformation script, but we skip that for now.
Now you need to add an XML description of what the generated job should do. I recommend copying this XML description from the example job created in step 1. You can access it via this URL: http://your-jenkins/job/this-job/config.xml. Simply copy and paste the XML code from the browser. Newer Jenkins versions also allow you to read a job's XML configuration via the user interface.
Finally, you need to fill in the template's arguments within the XML configuration. Simply replace the specific (hard-coded) values with references to the names of the template parameters created before: ${param_name} (see the example after these steps).
Save the template.
Now create a new job. On the job creation screen you should be able to select your newly created job template as the job type. After creating a job of the template's type, you can define all template parameters for this specific job.
Try to run the template-based job and make sure it works as expected.
Create more template-based jobs as needed.
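As an example of the substitution described above (the Git URL fragment and the parameter name repo_url are made up for this illustration), a hard-coded value from the copied config.xml:

```xml
<hudson.plugins.git.UserRemoteConfig>
  <url>https://git.example.com/team/my-project.git</url>
</hudson.plugins.git.UserRemoteConfig>
```

becomes a reference to the template parameter:

```xml
<hudson.plugins.git.UserRemoteConfig>
  <url>${repo_url}</url>
</hudson.plugins.git.UserRemoteConfig>
```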
All template-based jobs share the build steps defined by the job template. If you change the job template later on, all dependent jobs are updated accordingly. This is a very efficient way to administer a large number of similar jobs. It is very much worth the effort. Good luck!

How to update properties of a NiFi template programmatically (REST API?)

I have a NiFi template exported as XML. I am using the REST API to upload the template to a NiFi instance. Now I want to update/add some properties (say, a password) of the template via the REST API (or any other option available programmatically).
I read the docs and various community threads without success. Links I referred to:
How to set props of processor
Update nifi flow on the fly
I am open to any approach. Thanks!
I think there is a bit of confusion in your wording. Correct me if I'm wrong, but I believe what you want to do is:
Create a template in one location
Export it
Upload it to another NiFi instance
Add the template to the canvas (so now it's just components on your NiFi canvas)
Edit the properties of the components that were added
There are generally two reasons you would want to edit the properties after importing a template: the properties are specific to the instance you're running on, or the properties are sensitive.
With the addition of the "variable registry" in NiFi 0.7.0, you can have multiple files that are read at NiFi's start-up to provide custom variables. Here is a section about it in the NiFi docs. This allows you to have custom variables, specific to each environment you run in, that you reference via Expression Language (EL).
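For example (hypothetical paths and variable name), you point nifi.properties at one or more custom properties files and then reference their entries via EL:

```
# conf/nifi.properties -- register one or more custom properties files
nifi.variable.registry.properties=./conf/prod-env.properties

# conf/prod-env.properties (hypothetical file)
db.connection.url=jdbc:postgresql://prod-db.example.com:5432/app
```

A non-sensitive processor property can then be set to ${db.connection.url}, which resolves differently on each environment.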
The "variable registry" doesn't help for the sensitive properties though, because the EL used to reference them doesn't get exported with the template (since the property is sensitive). You will need to use the rest-api to update the processor properties explicitly. The NiFi docs give the exact call to use to update a processor (under Processors -> Put). Upgrading the variable registry to work securely is on the NiFi roadmap.
If I'm completely off and you simply want to modify a template after importing it into a NiFi instance: you would have to add the template to your canvas, delete the template from the listing, and re-create it using the components on your canvas. Once imported/created, templates are immutable.

Is it possible to configure properties like jcr:primaryType from a Maven install

I'm following the steps from the Adobe instructions on How to Build AEM Projects using Maven, and I'm not seeing how to populate or configure the metadata for the contents.
I can edit and configure the actual files, but when I push the zip file to the CQ instance, the installed component has a jcr:primaryType of nt:folder, while the item I'm trying to duplicate has a jcr:primaryType of cq:Component (as well as many other properties). So is there a way to populate that data without needing to manually interact with CQ?
I'm very new to AEM, so it's entirely possible I've overlooked something very simple.
Yes, it is possible to configure JCR node types without manually changing them in CQ.
Make sure you have a .content.xml file in the component folder and that it contains the correct jcr:primaryType (e.g. jcr:primaryType="cq:Component").
This file contains the metadata used to map JCR nodes onto the file system.
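A minimal .content.xml for a component might look like this (the title and group values are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:cq="http://www.day.com/jcr/cq/1.0"
          xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="cq:Component"
    jcr:title="My Component"
    componentGroup="My Project"/>
```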
For beginners it may be useful to take a look at VLT, which is used to import/export JCR content to and from the file system. The component's files in your project should essentially look like the result of a VLT export of that component from the JCR.

How to enqueue more than one build of the same configuration?

We are using two TeamCity servers (one for builds and one for GUI tests). The GUI tests are triggered by an HTTP GET in the last step of the build (as in http://confluence.jetbrains.com/display/TCD65/Accessing+Server+by+HTTP).
The problem is that only one build of the same configuration can be in the queue at a time. Is there a way to enable multiple queued builds of the same configuration? Can I use some workaround, like sending a dummy id?
At the bottom of the section "Triggering a Custom Build" here: http://confluence.jetbrains.com/display/TCD65/Accessing+Server+by+HTTP, you can find information about passing custom parameters to the build.
Just define some unused configuration parameter, like "BuildId", and pass, for example, the current date (a GUID will work as well) to it:
...&buildId=12/12/23 12:12:12
