Spring Batch: Reading Multi-Line Records from a Flat File

I have the following problem to solve:
There is a flat file to read, but the information is unfortunately spread over two rows, so I need to merge each pair of rows.
I thought about creating an incomplete object first, then adding the information from the next row before moving on to the next pair, but I don't really see how to manage that.
Is there a way to read two lines and then process them, or to carry an object over from one step to the next? I'm quite confused.
Any hint would be appreciated. Thanks.

This is a perfect use case for a SingleItemPeekableItemReader. Check out this older answer for an example.
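In case that link moves, here is a rough sketch of the pattern, not the linked answer verbatim: the Record type, the field positions, and the strictly-two-lines-per-record layout are all assumptions.

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.support.SingleItemPeekableItemReader;

public class TwoLineRecordReader implements ItemReader<TwoLineRecordReader.Record> {

    // wraps a FlatFileItemReader<FieldSet> that emits one physical line per read
    private SingleItemPeekableItemReader<FieldSet> delegate;

    @Override
    public Record read() throws Exception {
        FieldSet first = delegate.read();
        if (first == null) {
            return null; // end of input
        }
        // For strictly two-line records, just read the second half.
        // For variable-length records, delegate.peek() lets you look at the
        // next line without consuming it, to decide where the record ends.
        FieldSet second = delegate.read();
        if (second == null) {
            throw new IllegalStateException("odd number of lines: last record is incomplete");
        }
        return new Record(first.readString(0), second.readString(0)); // hypothetical field positions
    }

    public void setDelegate(SingleItemPeekableItemReader<FieldSet> delegate) {
        this.delegate = delegate;
    }

    // hypothetical domain type holding both halves of one logical record
    public static class Record {
        final String headerPart;
        final String detailPart;

        Record(String headerPart, String detailPart) {
            this.headerPart = headerPart;
            this.detailPart = detailPart;
        }
    }
}

Because each read() consumes both physical lines, the rest of the step (processor, writer, restartability via the delegate) works on whole logical records.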

Related

How to rank values in ascending/descending order?

I'm struggling to rank values from highest to lowest; please see the attached example of what I'm trying to achieve.
My current custom expression is:
Sum([ViolationAmt])
I have tried this:
Sum([ViolationAmt]) over Rank([ViolationAmt])
I've played around with the rank expressions but have been unable to get them working... I'd be very grateful for some help.
[Screenshot: Spotfire Rank Example]
I need to make a lot of assumptions here because I don't know anything about your data set or really what your end goal is, so please comment back and/or provide more info in your question if I am off base.
The first assumption is that each row in your dataset represents one [AccountID] (for simplicity) with a [ViolationAmt]. I'm also guessing you want to show the top N accounts with the highest violations in a table, since that's what you've shown here.
So it sounds like you are going to need two calculated columns: one to get the total [ViolationAmt] per account, and another to rank them.
For the first, create a column called [TotalViolationAmt] or some such and use:
Sum([ViolationAmt]) OVER ([AccountID])
For the second:
Rank([TotalViolationAmt])
It will be useful to read the documentation on ranking functions if you haven't already.
You could probably combine these two into a single column with something like:
Rank(Sum([ViolationAmt]) OVER ([AccountID]))
But I haven't tested this at all. Again, if you put a bit more detail about what you're trying to accomplish into your question, it will help you get a better, more detailed answer :)
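To make that concrete, here's a tiny made-up example (the account IDs and amounts are invented). Rank also accepts an optional sort-order argument, so if the default ordering doesn't put the largest totals first, you can write Rank(Sum([ViolationAmt]) OVER ([AccountID]), "desc"), which would evaluate like this:

AccountID   ViolationAmt   Sum(...) OVER ([AccountID])   Rank(..., "desc")
A           100            300                           1
A           200            300                           1
B           50             150                           2
B           100            150                           2

Note that every row for the same account gets the same total and the same rank, which is what lets you filter or limit a table to the top N accounts.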

Logstash drop lines in file

Is it possible for Logstash to ignore the first 2 lines of a file? I have looked in many places, and the only solution seems to be using an if to check whether the line matches certain text, and if so, drop it... but this seems extremely inefficient, as I know for a fact that I only need to drop the first 2 lines and don't need an "if...then" check on the thousands or even millions of lines that follow.
Thanks.
The simple answer is no, not with the existing versions. The Logstash file input only has an option for start_position => beginning.
If you feel strongly about it, you could always fork the file input, update it to support something like start_position => skip_lines plus another parameter specifying how many lines, and then submit a pull request back to Elastic; it might get implemented.

error: spc0027: No relationship found among attributes in grid 'mygrid'

I hope someone can help.
I copied this grid from another web panel (WP) and it seems like it's not working. I have no clue why. I've checked everything I can, but it doesn't work. In the other WP it works properly alongside other grids.
Any ideas?
Check the columns in the new grid. Does it have the same attributes as the source grid?
If you copied them from a different KB (knowledge base), it is quite possible that the attributes are not the same...
The specifier tries to determine which table the attributes belong to. Don't think "it should work because I copied it from a working web panel"; instead, think "some attribute is breaking the specification," and take attributes out one by one until it works. Then check whether the resolved table is the desired one or another one. If it is the desired one, the last attribute you removed is the one breaking the specification. If it is a different table, remove the remaining attributes from the grid, put back the ones you removed before, and repeat the process. At some point you will see which attribute is causing the problem. Also keep in mind that there may be attributes in other places, such as in the Load event; they also influence the process.
It means that GeneXus cannot determine the base table; you may be using (or referencing) attributes from different transactions, and GeneXus cannot resolve the relationship between them.

How to split a large csv file into multiple files in GO lang?

I am a novice Go programmer trying to learn the language's features. I want to split a large CSV file into multiple files in Go, each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard would be greatly appreciated.
Also, please suggest a good book for reference.
Thank you.
Depending on your shell fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines, or is it 5 GB?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (ignoring CSV parsing for the moment) is read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply divide our line count by our expected file count to get lines per file.
Now we can take slices of the appropriate size and write the file back out via:
http://golang.org/pkg/io/ioutil/#WriteFile
A trick I sometimes use to help think through these things is to write down the mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces, taking the divide-and-conquer approach - don't try to solve the entire problem in one go - just break it up to where you can think about it.
Also - make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short comment inline describing how you think the code should flow, then get it down to the smallest portion you can code and work from there.
By the way - many of the golang.org packages have example links where you can run the example code right in your browser and cut/paste it into your own local environment.
Also, I know I'll catch some haters with this - but as for books - imo - you are going to learn a lot faster just by trying to get things working rather than reading. Action trumps passivity always. Don't be afraid to fail.
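To pull those pieces together, here's a minimal sketch using encoding/csv rather than raw string splitting, since it handles quoted fields with embedded newlines. The input file name and the 10,000-rows-per-file chunk size are assumptions, and error handling is deliberately crude:

package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
)

func main() {
	in, err := os.Open("big.csv") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer in.Close()

	r := csv.NewReader(in)
	header, err := r.Read() // the first line is the header
	if err != nil {
		panic(err)
	}

	const rowsPerFile = 10000 // hypothetical chunk size
	var (
		out   *os.File
		w     *csv.Writer
		part  int
		count int
	)
	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		// start a new chunk when needed, writing the header first
		if w == nil || count == rowsPerFile {
			if w != nil {
				w.Flush()
				out.Close()
			}
			part++
			out, err = os.Create(fmt.Sprintf("part_%d.csv", part))
			if err != nil {
				panic(err)
			}
			w = csv.NewWriter(out)
			w.Write(header) // repeat the header in every output file
			count = 0
		}
		w.Write(record)
		count++
	}
	if w != nil {
		w.Flush()
		out.Close()
	}
}

Because it streams row by row instead of reading the whole file with ioutil.ReadFile, this works the same whether the input is 100 lines or 5 GB.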
Here is a package that might help. You can set the chunk size you need in bytes, and the file will be split into the appropriate number of chunks.

MongoDB find and remove - the fastest way

I have a quick question: what is the fastest way to grab and delete an object from a Mongo collection? Here is the code I currently have:
$cursor = $coll->find()->sort(array('created' => 1))->limit(1);
$obj = $cursor->getNext();
$coll->remove(array('name' => $obj['name']));
As you can see above, it grabs one document from the database and deletes it (so it isn't processed again). However fast this may be, I need it to perform faster. The challenge is that we have multiple processes doing this and processing what they find, but sometimes two or more processes grab the same document, thereby creating duplicates. Basically, I need to make it so a document can only be grabbed once. Any ideas would be much appreciated.
Peter,
It's hard to say what the best solution is here without understanding all the context, but one approach you could use is findAndModify. This will query for a single document and return it, and also apply an update to it.
You could use this to find a document to process and simultaneously modify a "status" field to mark it as being processed, so that other workers can recognize it as such and ignore it.
There is an example here that may be useful:
http://docs.mongodb.org/manual/reference/command/findAndModify/
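To make that concrete, here's a rough sketch of the claim-by-status pattern. Driver syntax varies, so this is shown with the official MongoDB Go driver rather than the question's PHP; the status field, its values, and the sort on created are assumptions:

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// claimNext atomically finds the oldest unclaimed document and marks it as
// being processed, so concurrent workers never grab the same one.
func claimNext(ctx context.Context, coll *mongo.Collection) (bson.M, error) {
	opts := options.FindOneAndUpdate().
		SetSort(bson.D{{Key: "created", Value: 1}}). // oldest first
		SetReturnDocument(options.After)             // return the updated document
	var doc bson.M
	err := coll.FindOneAndUpdate(ctx,
		bson.M{"status": "new"},                        // only unclaimed documents
		bson.M{"$set": bson.M{"status": "processing"}}, // atomically mark as taken
		opts,
	).Decode(&doc)
	return doc, err // mongo.ErrNoDocuments means the queue is empty
}

Because the find and the update happen in one atomic operation on the server, two workers calling this concurrently can never both claim the same document.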
Use the findAndRemove function as documented here:
http://api.mongodb.org/java/current/com/mongodb/DBCollection.html
The findAndRemove function retrieves an object from the Mongo database and deletes it in a single (atomic) operation.
findAndRemove(query, sort[, options], callback)
The query object is used to retrieve the object from the database (see collection.find())
The sort parameter is used to sort the results (in case many were found).
I'm adding a new answer to highlight this fact:
As commented by @peterscodeproblems on the accepted answer, the native way to do this in MongoDB right now is to use
findAndModify(query=<document>, remove=True)
As pointed out in the documentation.
As it is native and atomic, I expect this to be the fastest way to do it.
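For completeness, here's a hedged sketch of that atomic grab-and-delete with the official MongoDB Go driver (reusing the imports from the sketch above; the sort on created mirrors the question's PHP code):

// grabAndDelete fetches the oldest document and removes it in one
// round trip, so no other process can grab it in between.
func grabAndDelete(ctx context.Context, coll *mongo.Collection) (bson.M, error) {
	opts := options.FindOneAndDelete().SetSort(bson.D{{Key: "created", Value: 1}})
	var doc bson.M
	err := coll.FindOneAndDelete(ctx, bson.D{}, opts).Decode(&doc)
	return doc, err // mongo.ErrNoDocuments when the collection is empty
}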
I am new to MongoDB and not entirely sure what your query is trying to do, but here is how I would do it:
# suppose database is staging
# suppose collection is data
use staging
db.data.remove(<your_query_criteria>)
where <your_query_criteria> is a map and can contain any search criteria you want
Not sure if this would help you.
