Handling Big Data with an OSB Proxy - Oracle

I have created an OSB Proxy Service (Messaging Service) which loads the data using an MFL file.
The format of data is:
1/1/2007;00:11:00;2.500;0.000;242.880;10.200;0.000;0.000;0.000;
1/1/2007;00:12:00;2.494;0.000;242.570;10.200;0.000;0.000;0.000;
The total number of data records is 2,075,259.
The total size of the file (.txt or .data) is 130 MB.
What is the best way to handle all this data so that it can be fed into an OSB Proxy and transformed into a single, simple XML file?
I have tested with a small number of records (5,000) and it works as expected, but how should I feed the full data set into the proxy?
Is the MFL transformation a valid approach, or should I create a FileAdapter proxy that receives the data from a database table?
I would appreciate your suggestions.
Thank you in advance.

ESBs are efficient at handling messages on the order of KBs, not MBs, although this is very subjective and depends a lot on the number of concurrent requests, transactions per second, hardware sizing, and so on. As Trent points out in a comment, you could implement a claim check pattern and delegate the file transformation to an external utility, such as Perl or similar.
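As an illustration of the claim check idea, here is a minimal Java sketch; the file-based staging area is a hypothetical example, and the transformation itself would be done by the external utility rather than the ESB:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.UUID;

    // Claim check pattern: persist the large payload outside the message
    // flow so that only a small reference travels through the ESB.
    public final class ClaimCheck {

        private static final Path STORE = Paths.get("/var/claims"); // hypothetical staging area

        // Check in the large file; the returned claim ID is all the proxy carries.
        public static String checkIn(Path largeFile) throws IOException {
            String claimId = UUID.randomUUID().toString();
            Files.copy(largeFile, STORE.resolve(claimId), StandardCopyOption.REPLACE_EXISTING);
            return claimId;
        }

        // The external utility (Perl, a batch job, ...) resolves the claim later.
        public static Path checkOut(String claimId) {
            return STORE.resolve(claimId);
        }
    }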

Related

Java8 Stream or Reactive / Observer for Database Requests

I'm rethinking our Spring MVC application behavior: is it better to pull (Java 8 Stream) data from the database, or to let the database push (Reactive / Observable) its data and use backpressure to control the amount?
Current situation:
User requests the 30 most recent articles
Service does a database query and puts the 30 results into a List
Jackson iterates over the List and generates the JSON response
Why switch the implementation?
It's quite memory-consuming, because we keep those 30 objects in memory all the time. That's not needed, because the application processes one object at a time. Instead, the application should be able to retrieve one object, process it, throw it away, and get the next one.
Java8 Streams? (pull)
With java.util.Stream this is quite easy: The Service creates a Stream, which uses a database cursor behind the scenes. And each time Jackson has written the JSON String for one element of the Stream, it will ask for the next one, which then triggers the database cursor to return the next entry.
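For illustration, a minimal sketch of this pull approach, assuming Spring Data JPA (which supports Stream-returning query methods) together with a hypothetical Article entity and repository; the JsonGenerator is assumed to have an ObjectMapper codec attached:

    import com.fasterxml.jackson.core.JsonGenerator;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    import java.io.IOException;
    import java.util.stream.Stream;

    @Service
    public class ArticleService {

        // Hypothetical Spring Data repository declaring:
        //   Stream<Article> streamTop30ByOrderByCreatedAtDesc();
        private final ArticleRepository repository;

        public ArticleService(ArticleRepository repository) {
            this.repository = repository;
        }

        // The read-only transaction keeps the database cursor open while the
        // Stream is consumed; try-with-resources closes the cursor afterwards.
        @Transactional(readOnly = true)
        public void writeRecentArticles(JsonGenerator json) throws IOException {
            try (Stream<Article> articles = repository.streamTop30ByOrderByCreatedAtDesc()) {
                json.writeStartArray();
                for (Article article : (Iterable<Article>) articles::iterator) {
                    json.writeObject(article); // Jackson pulls one entity at a time
                }
                json.writeEndArray();
            }
        }
    }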
RxJava / Reactive / Observable? (push)
Here we have the opposite scenario: The database has to push entry by entry and Jackson has to create the JSON String for each element until the onComplete method has been called.
i.e. the Controller tells the Service: give me an Observable<Article>. Then Jackson can ask for as many database entries as it can process.
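Conversely, a minimal sketch of the push variant, assuming RxJava 1.x and a hypothetical service method exposing the query result as an Observable<Article>:

    import com.fasterxml.jackson.core.JsonProcessingException;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import rx.Observable;

    public class ArticleStreamer {

        private final ObjectMapper mapper = new ObjectMapper();

        // articleService.recentArticles(int) is a hypothetical reactive query.
        public void streamArticles(ArticleService articleService) {
            Observable<Article> articles = articleService.recentArticles(30);

            articles
                .onBackpressureBuffer()      // buffer if the DB pushes faster than we serialize
                .map(this::toJson)           // Jackson serializes each element as it arrives
                .subscribe(
                    System.out::println,               // onNext: emit one JSON string at a time
                    Throwable::printStackTrace,        // onError
                    () -> System.out.println("done")); // onComplete: response finished
        }

        private String toJson(Article article) {
            try {
                return mapper.writeValueAsString(article);
            } catch (JsonProcessingException e) {
                throw new RuntimeException(e);
            }
        }
    }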
Differences and concern:
With Streams there's always some delay between asking for the next database entry and retrieving/processing it. This could slow down the JSON response time if the network connection is slow, or if a huge number of database requests have to be made to fulfill the response.
Using RxJava, there should always be data available to process. And if it's too much, we can use backpressure to slow down the data transfer from the database to our application. In the worst case the buffer/queue will contain all requested database entries; then the memory consumption will be equal to our current solution using a List.
Why am I asking / What am I asking for?
What did I miss? Are there any other pros / cons?
Why did the Spring Data team, in particular, extend their API to support Stream responses from the database, if there's always a (short) delay between each database request/response? This could add up to a noticeable delay for a huge number of requested entries.
Is it recommended to go for RxJava (or some other reactive implementation) for this scenario? Or did I miss any drawbacks?
You seem to be talking about the fetch size for an underlying database engine.
If you reduce it to one (fetching and processing one row at a time), yes, you will save some space for the duration of the request...
But it usually makes sense to have a reasonable chunk size.
If it is too small, you will have a lot of expensive network round trips. If the chunk size is too large, you risk running out of memory or introducing too much latency per fetch. So it is a compromise, and the right chunk/fetch size depends on your specific use case.
Regarding whether to take a reactive approach or not, I believe it is not relevant here. With RxJava and, say, Cassandra, one can create an Observable from an asynchronous result set, and it is up to the query (configuration) how many items should be fetched and pushed at a time.
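As a concrete illustration, a plain JDBC sketch of tuning the fetch size (the connection URL, table, and chunk size of 100 are hypothetical examples):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class FetchSizeDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app");
                 Statement stmt = conn.createStatement()) {
                conn.setAutoCommit(false); // the PostgreSQL driver only streams with autocommit off
                stmt.setFetchSize(100);    // rows per network round trip: the compromise knob
                try (ResultSet rs = stmt.executeQuery("SELECT id, title FROM article")) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id") + " " + rs.getString("title"));
                    }
                }
            }
        }
    }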

Talend Open Studio for ESB 5.2 Route to Job Optimisation/Performance Issue

Using Talend ESB 5.2.0, I want to create a mediation route that will call a processing job on the payload of an inbound request to a CXF messaging endpoint; however, my current implementation suffers from performance issues with large payloads.
I’ve investigated the issue and found that the bottleneck is in marshalling my inbound XML payload from the tRouteInput component to the internal row structure for processing, using a tXMLMap.
Is it possible, using a built-in type converter in the route, to marshal the internal row structure from the route and stream through POJOs or transport objects that are cheaper to process in the job? Or is there a better way to marshal XML to Talend’s internal row structure from a route using a less expensive transform?
Any thoughts would be welcome.
Cheers,
mids
It turns out that the issue was caused by the format of the inbound XML payload: when more than one loop element maps to separate output flows from the tXMLMap, Talend generates relative links for each item in each output flow, to enable more advanced processing involving the loops if required.
This caused the large memory overhead that led to poor throughput.
Since we did not require any more advanced processing in the XML-to-Talend-row conversion, we overcame the issue by splitting the payload into its distinct loop elements with tReplicate and tExtractXMLField components before mapping out of the XML in separate tXMLMaps, which avoids the auto-generation of those links. - mids

GWT RequestFactory Performance

I have a question regarding the performance of RequestFactory and GWT. I have a Domain Entity with 8 fields that returns around 1,000 EntityProxies. The time between the request firing and the response arriving is around 20 seconds. If I do the same but return 10 EntityProxies, the time is 17 seconds, almost the same.
Is this because I'm working in development mode, or will the time be the same when I release the code to the web?
Is there any way to improve the performance? I'm only reading data, so perhaps something read-only that doesn't track writes could be the solution?
I read this post with something similar to my problem:
GWT Requestfactory performance suggestions
Thanks a lot.
P.S.: I read somewhere that one solution could be to create an XML file on the server, send it to the client, and recreate the object there. I don't want to do this because it would really change the design of my app.
Thank you all for the help, I realize now that perhaps using Request Factory to retrieve thousands of records was a mistake.
I initially used a Locator to override the isLive() and find() methods, according to this post:
gwt-requestfactory-performance-suggestions
The response time was reduced to about 13 seconds, but it was still too high.
But I solved it easily. Instead of returning 1,000+ entities, I created a new database table in which each field holds all the values of the corresponding original field (1,000+) concatenated by a separator (each DB field has a length of about 10,000), so the table contains a single record with around 8 fields.
Something like this:
Field1 | Field2 | Field3
Field1val;Field1val;Field1val;....... | Field2val;Field2val;Field2val;...... | Field3val;Field3val;Field3val;......
I return that one record through RequestFactory to my client, and it reduced the response time a lot, to around 1 second. Parsing this large String on the client takes about 500 ms. So instead of wasting around 20 seconds, it now takes around 1-2 seconds to accomplish the same thing.
By the way, I am only displaying information; it is not necessary to insert, delete, or update records, so this solution works for me.
Thought I could share this solution.
Performance profiling and fixing issues in GWT is tricky. Avoid all profiling in GWT hosted mode; the numbers do not mean anything useful.
You should profile only in web mode.
GWT RequestFactory is by design slower than GWT RPC, GWT JSON, etc. This is the trade-off for GWT RF's ability to calculate deltas and send only a small amount of information to the server on save.
You should recheck your application design to avoid loading thousands of proxies. RF is meant for "Form"-like applications. The only reason you might need thousands of proxies is for a grid display; in that scenario you can probably use a paginated async grid.
You should profile your app in order to find out how much time is spent on the following steps:
Entities retrieved from the database (server): this can be improved using a second-level cache and optimized queries.
Entities serialized to JSON (server): there is overhead here because RequestFactory and AutoBeans rely on reflection. You can try to transmit only the entities that you are actually going to display on the client. Another optimization which greatly reduces latency is to override the isLive method of your EntityLocator and return true (see the sketch after this list).
HTTP request from server to client to transmit the data (wire): you can think about using gzip compression to reduce the amount of data that has to be transferred (important if you send a lot of objects over the wire).
De-serialization on the client (client): this should be quite fast. There was a benchmark showing that AutoBean serialization is one of the fastest ways to serialize JSON. Again, this will benefit from not sending the whole object graph over the wire.
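A hedged sketch of the isLive optimization mentioned in the list above, assuming a hypothetical Article entity with a Long id and a simple DAO; returning true skips the extra per-proxy liveness lookup on every request, which is safe for read-only display data:

    import com.google.web.bindery.requestfactory.shared.Locator;

    public class ArticleLocator extends Locator<Article, Long> {

        @Override
        public Article create(Class<? extends Article> clazz) {
            return new Article();
        }

        @Override
        public Article find(Class<? extends Article> clazz, Long id) {
            return ArticleDao.findById(id); // hypothetical DAO lookup
        }

        @Override
        public Class<Article> getDomainType() {
            return Article.class;
        }

        @Override
        public Long getId(Article domainObject) {
            return domainObject.getId();
        }

        @Override
        public Class<Long> getIdType() {
            return Long.class;
        }

        @Override
        public Object getVersion(Article domainObject) {
            return domainObject.getVersion();
        }

        // The default isLive() issues a find() per proxy; for read-only
        // display data we assume the entity is still live and skip that.
        @Override
        public boolean isLive(Article domainObject) {
            return true;
        }
    }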
One way to improve performance is to use caching. You can use HTML5 localStorage to cache data on the client. This applies specifically to data that doesn't change often.
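Since GWT client code is written in Java, a minimal sketch using GWT's HTML5 Storage wrapper (the cache key and the idea of storing an AutoBean-encoded payload are illustrative assumptions):

    import com.google.gwt.storage.client.Storage;

    public class ArticleCache {

        private static final String KEY = "articles"; // hypothetical cache key

        // Returns the cached JSON payload, or null if absent or unsupported.
        public static String load() {
            Storage localStorage = Storage.getLocalStorageIfSupported();
            return localStorage == null ? null : localStorage.getItem(KEY);
        }

        // Stores the serialized payload so that data which rarely changes
        // survives page reloads and saves a server round trip.
        public static void store(String json) {
            Storage localStorage = Storage.getLocalStorageIfSupported();
            if (localStorage != null) {
                localStorage.setItem(KEY, json);
            }
        }
    }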

Large number of concurrent ajax calls and ways to deal with it

I have a web page which, upon loading, needs to do a lot of JSON fetches from the server to populate various things dynamically. In particular, it updates parts of a large-ish data structure from which I derive a graphical representation of the data.
It works great in Chrome; however, Safari and Firefox appear to suffer somewhat. While the numerous JSON requests are being made, those browsers become sluggish and unusable. I am under the assumption that this is due to the rather expensive iteration over said data structure. Is this a valid assumption?
How can I mitigate this without changing the query language so that it's a single fetch?
I was thinking of applying a queue that could limit the number of concurrent Ajax queries (and hence also limit the number of concurrent updates to the data structure)... Any thoughts? Useful pointers? Other suggestions?
In browser-side JS, create a wrapper around jQuery.post() (or whichever method you are using) that appends the requests to a queue.
Also create a function 'queue_send' that will actually call jQuery.post(), passing the entire queue structure.
On the server, create a proxy function called 'queue_receive' that replays the JSON to your server interfaces as though it came from the browser, collects the results into a single response, and sends that back to the browser.
The browser-side queue_send_success() (the success handler for queue_send) must decode this response and populate your data structure.
With this, you should be able to reduce your initialization traffic to one actual request, and maybe consolidate some other requests on your website as well.
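A server-side sketch of the 'queue_receive' proxy in Java, assuming a servlet container and Jackson; the dispatch step to your actual server interfaces is left as a hypothetical placeholder:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ArrayNode;

    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import java.io.IOException;

    @WebServlet("/queue_receive")
    public class QueueReceiveServlet extends HttpServlet {

        private final ObjectMapper mapper = new ObjectMapper();

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // The browser posts the whole queue as a JSON array of {url, payload} entries.
            JsonNode batch = mapper.readTree(req.getInputStream());
            ArrayNode results = mapper.createArrayNode();
            for (JsonNode call : batch) {
                // Replay each queued request against the internal interface
                // as though it came from the browser, collecting the results.
                results.add(dispatch(call.get("url").asText(), call.get("payload")));
            }
            resp.setContentType("application/json");
            mapper.writeValue(resp.getOutputStream(), results);
        }

        // Hypothetical router to the existing JSON endpoints.
        private JsonNode dispatch(String url, JsonNode payload) {
            // ... look up the handler registered for 'url' and invoke it
            return payload; // placeholder
        }
    }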
"In particular, it updates parts of a largish data structure from which I derive a graphical representation of the data."
I'd try:
Queuing responses as they come in, then updating the structure once
Hiding the representation until all the responses are in
Magicianeer's answer is also good, but I'm not sure if it fits your definition of "without changing the query language so that it's a single fetch"; this approach, by contrast, would avoid re-engineering the existing logic.

ETL, Esper or Drools?

The question environment relates to Java EE and Spring.
I am developing a system which can start and stop arbitrary TCP (or other) listeners for incoming messages. There could be a need to authenticate these messages. The messages need to be parsed and stored in other entities, which model the fields they store.
So, for example, if I have property1 with two text fields, FillLevel1 and FillLevel2, I could receive messages over TCP which have both fill levels specified in text as F1=100;F2=90.
Later I could add another field, say FillLevel3, and start receiving messages of the form F1=xx;F2=xx;F3=xx. But this is a conscious decision on the part of the system modeler.
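For reference, a minimal Java sketch of parsing that key=value message format into a field map (the class and method names are illustrative only, not part of the system described above):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public final class FillLevelParser {

        // Parses messages such as "F1=100;F2=90" into {F1=100.0, F2=90.0}.
        // A newly added field (F3, ...) is picked up without code changes.
        public static Map<String, Double> parse(String message) {
            Map<String, Double> fields = new LinkedHashMap<>();
            for (String pair : message.split(";")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2) {
                    fields.put(kv[0].trim(), Double.parseDouble(kv[1].trim()));
                }
            }
            return fields;
        }
    }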
My question is: what do you think is better to use for parsing and storing the messages? One option is ETL (using Pentaho, which is already used in another system), where you store the raw message and use a task executor to consume the messages one by one and store the transformed data as per your rules.
One could instead use Esper or Drools to do the same thing, storing rules and executing them with a timer, but I am not sure how dynamic you can get when creating rules (they have to be made by the end user in a running system, preferably in the most user-friendly way possible, i.e. no scripts or code, only a GUI).
The end user should be capable of changing the parse rules. It is also possible that the end user might want to change the archived data as well (for example, if a new FillLevel is added as above, one might want to put FillLevel=-99 into the previous records to keep the data consistent).
Please ask for explanations, I have the feeling that I need to revise this question a bit.
Thanks
Well, Esper is a great CEP engine, but Drools has its own CEP implementation, Drools Fusion, which integrates really well with jBPM. That would be a good choice.
