Can we control the source format/layout in DMExpress using environment variables?

I am using DMExpress tasks to transform my business data. The data comes in multiple formats/layouts, and I need to be able to use a single task to run transformations on multiple source layouts. Any DMExpress experts here?

One way I found to run transformations on multiple source layouts with a single task is to write the task as a script using the DMExpress SDK rather than building it in the GUI task editor. The SDK gives a lot more flexibility compared to the GUI editor.
But if you are bound to the GUI, there is a workaround for this specific purpose: define a common name for the source layout. Only the source layout name is bound to the task, not the actual layout definition, so you can alter the layout definition while keeping the layout name constant to get a generic task.

FYI- DMExpress is now called DMX (Syncsort changed the name about a year ago).
Do you have multiple different record types within a single file or is each type of record in a separate file? Your question is not clear on this.
If they are in separate files, this is very easy, but you will need to create a separate DMX task for each file. In each of these tasks, define one of the files as the source and create a record layout that matches the format of that file.
If they are in the SAME file, it is only a little more difficult. You can split them into separate files by creating multiple targets and defining a named condition for each target using the SourceName() function (this function returns the name of the file that the current record came from). Then you can process them as separate files (see above). This works UNLESS you have a parent-child relationship going on between the different types of records in that single file. If that is the case, please post some sample data and I can advise on how to handle it.

How can I auto generate inputs.tf and outputs.tf variables when working with Terraform?

Note: Please see the #### UPDATE ### section below. I've heavily modified the question for clarity on what I'm trying to achieve, but added it as an addendum rather than rewrite the question.
As my infrastructure grows, adding input variables in my variables.tf files and then syncing those values to output variables in my outputs.tf file has become impossible to do manually. Not only does it take up a lot of unnecessary time, but probably even more time is spent going back and fixing the ones that terraform validate tells me I missed through human error. This is especially true when building and using modules, whose arguments add an additional layer to manage.
There has to be a better way. Here is what I want to achieve.
Let's say I'm creating an Azure AKS Kubernetes cluster. The Terraform resource is azurerm_kubernetes_cluster.
Only 8 arguments are required to create a base install, but there are almost 250 additional ones. They all have default values, and per the documentation page they already have fantastic descriptions. (I'm tired of copying and pasting them into my variable { description = "this" } blocks.)
The information is there in the documentation. terraform plan also has knowledge of every single additional one, because it of course shows up in the pre-apply plan. (known after apply) means it's optional but will have a default value.
In my dream world, I'd run this hypothetical command sequence:
terraform plan
terraform document <- Here it auto generates every argument as a variables block and inserts it into variables.tf. It also auto generates every possible output "out_putable" {} block and inserts it into outputs.tf.
terraform apply -update-inputs -update-outputs <- Here everything that was optional (known after apply) is now known and it should auto update variables.tf and outputs.tf accordingly. Adding a -update-modules flag lets it take care of that additional layer introduced by using modules.
This feels like a problem that has been addressed before. Before I write a custom tool that parses the Terraform web docs and the output of terraform show, is there already a way to do this? terraform-docs is the closest I've come to finding a solution, but only for README.md. If it can do what I need, I haven't figured it out yet.
How can I automate all this?
############
UPDATE
############
This article and video are spot-on when it comes to Terraform's evolution in an organization. My organization is somewhere between late-stage pattern 3 and early pattern 5. As we decompose our "Terralith", we have inconsistencies among teams (patterns, naming conventions, variable and argument choices, etc.). These are starting to cause errors in CI/CD, forcing a ticket-review process that is slowing things down.
All resources have required and optional arguments, but in my organization some of the optional arguments are required for us.
Scenario: Dev A in Japan creates a resource, forgets an optional variable or two, or names them something obscure. Dev B in America is blocked until they can convene and discuss. Given time zones, language differences, and ticket review, this one issue is now delayed by a week or more.
I need to automate this and create exact consistency so that Dev A starts out with exactly what Dev B would start with or is expecting; and, what CI/CD tests are expecting - templating the initial process, if you will. In other words, I need to remove the human element of manually creating main.tf, variables.tf, outputs.tf, etc.
Here are thoughts on how to achieve this:
Use Golang to autogenerate the files by querying the API
How can I query the API to get a list of all required arguments for a specific resource?
I found that I can query for provider information, but I can't find a way to retrieve resource information. My thinking is that when a developer wants to create a new resource, they'll run a Go or TypeScript program that generates the manifest files with the expected naming conventions and populates main.tf, variables.tf, outputs.tf, etc., with exactly the data that everyone is expecting. I'm looking for something like curl registry.terraform.io/providers/hashicorp/azurerm/v2.99/resource_group?required=yes which would show me all required arguments along with descriptions and other info I can use straight from the API.
Use CDKTF to generate an HCL manifest.tf file from JSON
How can I use CDKTF to generate an HCL .tf file?
CDKTF is EXACTLY what I'm looking for - except in reverse. HCL is seamlessly compatible with JSON. Running cdktf synth creates ./out/cdk.tf.out. I'm so close! How do I turn that file into main.tf?!?
The goal here is to have a master file from which all future manifest files are derived. Whether we use azurerm_kubernetes_cluster 1 time or 1000 times, I know for certain that every argument, every variable name, every desired output is exactly the same. If a change is needed in our desired structure, it will be updated at the JSON level, and CI/CD can ensure those changes are propagated across all instances of its use.
I know that I can use the cdk.tf.out file as a drop-in replacement for a module, but I don't want my team members to have to learn TypeScript or how to read JSON. If I can create a templatized JSON file containing exactly what I'm expecting users to start with, and if they can run some command like cdktf convert cdk.tf.out --HCL output-file.tf, then I've accomplished my goal.
If cdktf synth can create an HCL JSON file, and cdktf convert can take a manifest.tf file and turn it into HCL JSON, can't it do the exact opposite? Turn the HCL JSON file into the human-readable, declarative, manifest.tf file?
Perhaps think of it this way. Terraform has a required file structure for a module if it's to be allowed into the module registry. I'm trying to create a similar required structure for each of the resources our organization uses regardless of when and where it's used.
If your goal is to derive input variables and output values from resource type schemas, then Terraform can provide you with the information to do so.
In the working directory of a configuration that already uses the provider whose resource type you want to use, run the following command:
terraform providers schema -json
The result contains a JSON description of all of the resource types available from the providers in the current configuration and, for each one, metadata about its attributes, including type constraints and descriptions.
From that output you can generate whatever other files you need.
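For example, here is a minimal sketch of how that output could be filtered with jq. It assumes terraform init has already been run in the working directory, that jq is installed, and that the azurerm provider and the azurerm_kubernetes_cluster resource are the ones of interest; the provider key must match the source address that appears in your own schema output.

# List the required arguments of azurerm_kubernetes_cluster with type and description.
# Assumes `terraform init` has been run here and jq is available on the PATH.
terraform providers schema -json \
  | jq -r '
      .provider_schemas["registry.terraform.io/hashicorp/azurerm"]
      | .resource_schemas["azurerm_kubernetes_cluster"]
      | .block.attributes
      | to_entries | .[]
      | select(.value.required == true)
      | "\(.key)\t\(.value.type)\t\(.value.description // "")"
    '

Dropping the select filter lists every attribute, which is enough raw material to template variable and output blocks for variables.tf and outputs.tf.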
Note that if you intend to build modules that export the entire surface area (all inputs and all outputs) of a particular resource type, the Terraform documentation explicitly recommends against this and suggests using the resource type directly instead, since such a module would often not offer enough benefit to outweigh the additional complexity and maintenance overhead it implies:
In principle any combination of resources and other constructs can be factored out into a module, but over-using modules can make your overall Terraform configuration harder to understand and maintain, so we recommend moderation.
A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers.
For example, aws_instance and aws_elb are both resource types belonging to the AWS provider. You might use a module to represent the higher-level concept "HashiCorp Consul cluster running in AWS" which happens to be constructed from these and other AWS provider resources.
We do not recommend writing modules that are just thin wrappers around single other resource types. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead.
I had the same question and developed a small bash script to create output definitions based on the module code.
The script requires the hcledit tool to extract blocks from the HCL code.
#!/usr/bin/env bash
# Generate Terraform "output" blocks for every module and resource found
# in the *.tf files of the current directory. Requires the hcledit tool
# (https://github.com/minamijoyo/hcledit) to be available on the PATH.
set -o pipefail

_hcledit=$(which hcledit)

for tf_file in *.tf; do
  cat "$tf_file" | "$_hcledit" block list | while read -r line; do
    block_type="${line%%.*}"   # e.g. "resource" from "resource.aws_instance.web"
    line="${line#*.}"          # strip the block type, keep the labels
    case $block_type in
      locals|output|variable|data)
        continue ;;            # nothing to export for these block types
      module)
        output_name=$line
        output_description="Module '$output_name' attributes"
        output_value="$block_type.$output_name"
        ;;
      resource)
        label_kind="${line%.*}"
        label_name="${line#*.}"
        output_name="${label_kind}_${label_name//[\-]/_}"   # hyphens become underscores
        output_description="Resource '$label_kind.$label_name' attributes"
        output_value="$label_kind.$label_name"
        ;;
      *)
        continue ;;            # skip terraform, provider and any other blocks
    esac
    cat <<EOT
output "$output_name" {
  description = "$output_description"
  value       = $output_value
}
EOT
  done
done
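For reference, one possible way to use it (the file name generate_outputs.sh and the redirect target are just an illustration, not part of the original answer):

# Save the script as generate_outputs.sh in the module directory, then:
chmod +x generate_outputs.sh
./generate_outputs.sh >> outputs.tf   # append the generated output blocks
terraform fmt                         # normalize formatting afterwards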

Am I misunderstanding something about using the List and Fetch combination?

I am trying to understand the combination of List and Fetch processors.
I have a directory with three JSON files and I use ListAzureDataLakeStorage to list them. But when I connect a FetchAzureDataLakeStorage with which I intend to fetch only one of the files, the Fetch takes the same file three times. In summary, it takes the file whose azure.filename matches the value that I put in the File Name property, but as many times as there are files in the listed directory.
I really want to use a single List and connect three Fetches to it, each one to take a different file, and thus use them for different streams.
In each Fetch I put in the "File Name" property the name of the file that I want to take. For example:
File Name: fileName1.json
I have also tried putting the following Expression Language in "File Name":
File Name: ${azure.filename:equals('fileName1.json')}. But this option causes a 404 empty body error.
Nothing I try works. Am I misunderstanding something about using the List and Fetch combination?
If you are statically entering file names and you want to respond to each one differently, then the ListX processors aren't very beneficial to your flow.
The easier option would be to use a GenerateFlowFile processor with the appropriate schedule to trigger a corresponding FetchX processor.
If you're only doing this for 3 files, it's not too much manual overhead. You could also achieve something similar using RouteOnContent/Attribute.

Is there a way to have multiple programs without having multiple projects?

I'm using Code::Blocks and don't want to create a new project every time I want to code something different. Is there any way to have something like a single project and then just open the files to work on them?
In other words, for example: I have one CPP file with some arrays and another file to read and write data from a text file. What I'm currently doing is having a separate project for the second file I described.
In a single project (with a single main function) you can have, for example, a menu that calls several functions, each executing one of the different pieces of code that interest you.
You can only have one int main() function per project.

Creating multiple TOCs for my RoboHelp project

I am creating RoboHelp output for my project and want to make it role-based, i.e. some pages will be hidden for some roles. For that I created multiple Tables of Contents. My question is how to call a particular TOC from my Java project depending on the role.
You can create multiple TOCs in RoboHelp, but you can only output one at a time. In other words, each compiled Help can only contain one TOC. But you can create multiple Helps, each with different TOCs and then you just have to tell your Java code which Help you want to open.
Create one Single Source Layout for each TOC you want to produce.
Specify which TOC you want to use in each Single Source Layout (Single Source Layout >> Content settings).
Generate each Single Source Layout to different folders (Single Source Layout >> General >> Output Location settings).
Tell Java which Help to launch based on your own criteria.

Reference custom table in WiX fragment file

I want to create a fragment file that will contain only a CustomTable. This is easy enough, but I do not know how to link/include it back into the main product.wxs file.
The fragment file is in the same project as the product file. I have also tried adding an include tag for the file, and even putting the custom table into a WiX include file, without success.
Is there a way to do this? Or is it going to have to live in the product file?
The WiX toolset compiles and links in a similar manner to the C/C++ compiler. The linker starts at the "main" entry point (the Product element, in your case), follows the references from there, which in turn lead to further references, and so on until all are resolved.
Part of your question is missing, but based on the title I'm going to guess that you want a CustomTable element. Typically that CustomTable is processed by a CustomAction, and there are a couple of good ways to reference a CustomAction.
I would not use an include file.
You could try using EnsureTable if you'd like to make sure the table is created whether or not there is data in it. If you'd like to separate the custom table's schema definition from the data, I believe you can define them in separate fragments and reference the schema definition from the data fragment by opening with <CustomTable Id="your table name"> and defining the rows of data within it.
In general, WiX won't pull fragments into the main authoring unless they contain elements that are referred to somewhere. Since there is currently no such thing as a CustomTableRef, you may opt to add other elements you can refer to, such as an empty PayloadGroup or ComponentGroup, and reference them (using a PayloadGroupRef or ComponentGroupRef respectively) from your main Bundle, Product or Module element as the case may be.
