Magic Chunks wildcard path - magic-chunks

I want to replace the value property of multiple web.config file settings. For this we already use Magic Chunks so rather than write out multiple transformations, I was hoping to target every instance of my particular use.
The short question being: Is it possible to target multiple transformations with Magic Chunks using a wildcard expression?

Related

Is there some standard for handling sets of ranges of numbers?

When you open the print dialog of an editor, you typically get a field for specifying which pages you want to print - which can be multiple ranges, e.g.: "5,11,31-33"
Now, there are other scenarios in which this kind of input from a user is relevant - especially in configuration files for sequential or iterative processes where you want to qualify which iterations or elements a certain action or feature should apply to.
However, I'm not aware of a name for this kind of strings; nor of an accepted standard format/convention for them (i.e. can you add spaces? Can you use semicolons instead of commas? Must the ranges be sorted? Are overlaps allowed and are they maintained or discarded? Can ranges use ".." instead of "-"? Can you range down instead of up? etc.).
Is there some such convention or such standard?
My motivation is double: I need to parse such ranges in a piece of code I'm looking at, any I want both to do it correctly (or rather per-convention), and secondly to go look for parsing functionality in existing libraries. Right now I don't even have a name to go on.

Am I misunderstanding something about using the List and Fetch combination?

I am trying to understand the combination of List and Fetch processors.
I have a directory with three JSON files and I get the ListAzureDataLakeStorage to list them. But when I connect a FetchAzureDataLakeStorage with which I intend to take only one of the files, the Fetch takes the same file three times. In summary, it takes the file whose azure.filename matches with the value that I put in the File Name property, but as many times as there are files in the listed directory.
I really want to use a single List and connect three Fetches to it, each one to take a different file, and thus use them for different streams.
In each Fetch I put in the "File Name" property the name of the file that I want to take. For example:
File Name: fileName1.json
I have also tried putting in "File Name" with Expression Language the following:
FileName: $ {azure.filename: equals ('fileName1.json')}. But this option causes a 404 empty body error.
But there is no way. Am I misunderstanding something about using the List and Fetch combination?
If you are statically entering file names and you want to respond to each one differently, then the ListX processors aren't very beneficial to your flow.
The easier option would be to use a GenerateFlowFile processor with the appropriate schedule to trigger a corresponding FetchX processor.
If you're only doing this for 3 files, it's not too much manual overhead. You could also achieve something similar using RouteOnContent/Attribute.

Alternative to using ant <modified> selector?

I'm trying to find an alternative to a helpful piece of functionality provided by ant - the <modified> selector.
When specifying a set of files in ant, you can use the <modified> selector to only include files whose content has changed since the last time it was run.
The selector computes a value for a file, compares that to the value stored in a cache and selects the file if these two values differ.
Is there an existing way of doing this in bash? I don't want to use a full blown build tool or similar just to return a list of modified file paths.

What is the most efficient way to make sure hadoop skips certain input files?

I have a hadoop application that -depending on a parameter- only needs certain (few!) input files from the input directory. My question is now: where is the best place (read: as early as possible) to skip those files? Right now I customized a RecordReader to take care of that, but I was wondering whether I could skip those files sooner? In my current implmentation hadoop still has a huge overhead due to irrelevant files.
Maybe I should add that it is very easy to see whether I need a certain input file. If the filename starts with a parameter, it is needed. Structuring my input directory hierachically might be a solution, but one that is not very likely for my project since every files would end up lonely in a certain directory.
I'd propose you to filter out the input files by applying the appropriate pattern on the input Paths as mentioned here: https://stackoverflow.com/a/13454344/1050422
Note that this solution doesn't consider subdirectories. Alter it
to be able to recursively visit all subdirectories, within the base path.
I've had success with using the setInputPaths() method on TextInputFormat to specify a single String containing comma-separated file names.

What is the best way to edit the middle of an existing flat file?

I have tool that creates variables for a simulation. The current workflow involves hand copying those variables into the simulation input file. The input file is a standard flat file, i.e. not binary or XML. I would like to automate the addition of the variables to the flat input file.
The variables copy over existing variables in the file, e.g.
New Variables:
Length 10
Height 20
Depth 30
Old Variables:
...
Weight 100
Age 20
Length 10
Height 20
Depth 30
...
Would like to have the old variables copy over the new variable. They are 200 lines into the flat input file.
Thanks for any insights.
P.S. This is on Windows.
If you're stuck using flat, then you're stuck using the old fashioned way of updating them: read from original, write to temp file, either write the original row or change the data and then write that. To add data, write it to the temp file at the appropriate point; to delete data, simply don't copy it from the original file.
Finally, close both files and rename the temp file to the original file name.
Alternatively, it might be time to think about a little database.
For something like this I'd be looking at a simple template engine. You'd have a base template with predefined marker tokens instead of variable values and then just pass the values required to your engine along with the template and it will spit out the resultant file, all present and correct. There are a number of Open Source template engines available in Java that would meet your needs, I imagine such things are also available in your language of choice. You could even roll your own without too much difficulty.
Note that under Unix, one would probably look at using mmap() because you can then use functions such as memmove() to move the data around and add new data or truncate() the result if the file is then smaller (you may also want to use truncate() to grow the file).
Under MS-Windows, you have the MapViewOfFileEx() function to do the same thing. The API is different, though,
and I'm not exactly sure what happens or how to grow/shrink the file (MSDN also includes a truncate()-like function and maybe that works).
Of course, it's important to use memcpy() or memmove() properly to not overwrite the wrong data or go outside the buffer.

Resources