I have flowfiles named 1, 3, 4, 5, and so on. I use the ${filename} attribute for invoking an online service; then I get a big response and split it line by line, but at the end I need to merge my flowfiles based on their names. I think MergeContent doesn't work properly. I use Correlation Attribute Name = filename and I have also increased the minimum and maximum number of entries, but nothing helped. Here is my workflow:
There are also several things I am interested in:
I think the main reason my MergeContent doesn't work properly is that my file names are not consecutive. Can this be the real reason?
Can you recommend a better solution for such a task?
Simple trick: the property asks for a correlation attribute name, so you should be using filename instead of ${filename}.
PS: I guess this answer is 2 years late, but you never know whom it will help :P
I am trying to understand the combination of List and Fetch processors.
I have a directory with three JSON files, and I use ListAzureDataLakeStorage to list them. But when I connect a FetchAzureDataLakeStorage with which I intend to take only one of the files, the Fetch takes the same file three times. In summary, it takes the file whose azure.filename matches the value that I put in the File Name property, but as many times as there are files in the listed directory.
I really want to use a single List and connect three Fetches to it, each one to take a different file, and thus use them for different streams.
In each Fetch I put in the "File Name" property the name of the file that I want to take. For example:
File Name: fileName1.json
I have also tried putting the following Expression Language in "File Name":
File Name: ${azure.filename:equals('fileName1.json')}. But this option causes a 404 error with an empty body.
But nothing works. Am I misunderstanding something about using the List and Fetch combination?
If you are statically entering file names and you want to respond to each one differently, then the ListX processors aren't very beneficial to your flow.
The easier option would be to use a GenerateFlowFile processor with the appropriate schedule to trigger a corresponding FetchX processor.
If you're only doing this for 3 files, it's not too much manual overhead. You could also achieve something similar using RouteOnContent/Attribute.
I have the following problem to solve:
There is a flat file to read, but the information is unfortunately spread over two rows, so I need to merge these two rows.
I thought about creating an incomplete object first and then adding the information from the next row, then moving on to the next pair. But I don't really see how to manage that.
Is there a way to read two lines and then process them, or to remember an object from one step to the next? I'm quite confused.
Any hint would be appreciated. Thanks.
This is a perfect use case for using a SingleItemPeekableItemReader. Check out this older answer for an example.
I have just started with Logstash. I have log files in which a whole object is printed. Since my object is huge, I can't write grok patterns for the whole object, and I only expect two values out of it. Can you please let me know how I can get them?
My log files look like the following:
2015-06-10 13:02:57,903 your done OBJ[name:test;loc:blr;country:india,acc:test#abe.com]
This is just an example; my object has a lot of attributes in it, and from that object I need to get only name and acc.
You can use the following pattern for this:
%{GREEDYDATA}\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]
GREEDYDATA is defined as follows:
GREEDYDATA .*
The key lies in understanding the GREEDYDATA pattern: it eats up as many characters as possible.
Logstash patterns don't have to match the entire line. You could also pull the leading information off (date, time, etc) in one grok{} and then use a different grok{} to pull off just the two fields that you want.
I am a novice Go programmer trying to learn the language's features. I want to split a large CSV file into multiple files in Go, each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good book for reference.
Depending on your shell fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines, or is it 5 GB?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (ignoring CSV for the moment) is read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
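For example, a minimal sketch of those first two steps (the input file name here is just a placeholder):

    package main

    import (
        "fmt"
        "io/ioutil"
        "strings"
    )

    func main() {
        // Read the whole file into memory; fine for smallish files.
        data, err := ioutil.ReadFile("input.csv") // placeholder file name
        if err != nil {
            panic(err)
        }

        // Split on newlines and peel off the header line.
        lines := strings.Split(string(data), "\n")
        header, body := lines[0], lines[1:]
        fmt.Println("header:", header)
        fmt.Println("data lines:", len(body))
    }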
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply want to divide our line count by our expected file count to get lines per file.
Now we can take slices of the appropriate size and write the file back out via:
http://golang.org/pkg/io/ioutil/#WriteFile
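Putting those pieces together, a rough sketch might look like the following (the target file count, the input/output file names, and the file permissions are assumptions; real CSV data with quoted newlines would need encoding/csv rather than a plain string split):

    package main

    import (
        "fmt"
        "io/ioutil"
        "strings"
    )

    func main() {
        data, err := ioutil.ReadFile("input.csv") // placeholder input name
        if err != nil {
            panic(err)
        }

        lines := strings.Split(strings.TrimRight(string(data), "\n"), "\n")
        header, body := lines[0], lines[1:]

        const numFiles = 3                               // assumed target file count
        perFile := (len(body) + numFiles - 1) / numFiles // lines per output file, rounded up

        for i := 0; i < numFiles; i++ {
            start := i * perFile
            if start >= len(body) {
                break
            }
            end := start + perFile
            if end > len(body) {
                end = len(body)
            }
            // Each output file gets the header followed by its slice of the body.
            chunk := append([]string{header}, body[start:end]...)
            name := fmt.Sprintf("part_%d.csv", i) // placeholder output name
            if err := ioutil.WriteFile(name, []byte(strings.Join(chunk, "\n")+"\n"), 0644); err != nil {
                panic(err)
            }
        }
    }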
A trick I sometimes use to help think through these things is to write down a mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces, taking the divide-and-conquer approach: don't try to solve the entire problem in one go, just break it up to where you can think about it.
Also, make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short comment inline with how you think the code should flow, and then get it down to the smallest portion that you can code and work from there.
By the way, many of the golang.org packages have example links where you can literally run the example code in your browser and copy/paste it to your own local environment.
Also, I know I'll catch some haters with this - but as for books - imo - you are going to learn a lot faster just by trying to get things working rather than reading. Action trumps passivity always. Don't be afraid to fail.
Here is a package that might help. You can set the desired chunk size in bytes, and the file will be split into the appropriate number of chunks.
After the reduce phase in Hadoop, I want the output file names to be something meaningful depending on the input key value. However, I have not been successful following the example in "Hadoop: The Definitive Guide", which used MultipleTextOutputFormat to do this. Is the reason that it's based on the old API and doesn't work with the new API?
Can anybody hint at the solution or point me to the relevant documentation?
You are probably right. Things that worked in the old API don't always work in the new one.
There is a "new way" of doing this now, called MultipleOutputs.