ClickHouse HTTP interface: reading Native/RowBinary format data

I want to get query statistics like rows_read/bytes_read for a query. Using the native interface (the Go client), this seems impossible.
Using the HTTP interface we can get them from the response headers. But with the JSON/TabSeparated formats the server spends a lot of CPU on encoding, so I want to use the Native/RowBinary format instead to reduce that cost.
However, I can't find a recommended way to read and decode the data...
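A minimal sketch of that approach in Go, assuming the server exposes the X-ClickHouse-Summary header (older versions only send X-ClickHouse-Progress headers, enabled by send_progress_in_http_headers=1) and a hypothetical two-column (UInt64, String) result set:

package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Hypothetical table/columns; FORMAT RowBinary keeps server-side encoding cheap.
	q := url.QueryEscape("SELECT id, name FROM demo FORMAT RowBinary")
	resp, err := http.Get("http://localhost:8123/?send_progress_in_http_headers=1&query=" + q)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The statistics arrive in a response header, not in the body (assumed header name).
	fmt.Println("summary:", resp.Header.Get("X-ClickHouse-Summary"))

	r := bufio.NewReader(resp.Body)
	for {
		// RowBinary: fixed-width numerics are little-endian...
		var id uint64
		if err := binary.Read(r, binary.LittleEndian, &id); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		// ...and Strings are length-prefixed with a varint (LEB128).
		n, err := binary.ReadUvarint(r)
		if err != nil {
			panic(err)
		}
		name := make([]byte, n)
		if _, err := io.ReadFull(r, name); err != nil {
			panic(err)
		}
		fmt.Println(id, string(name))
	}
}

Note that RowBinary carries no column names or types, so the client has to know the schema up front; that is the price of skipping the text encoding.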

Related

In NiFi, does usage of the EvaluateJsonPath processor affect performance because of attribute creation?

I'm trying to integrate the NiFi REST APIs with my application. By mapping input and output from my application, I call the NiFi REST API for flow creation. In my use case, most of the time I will extract JSON values and apply expression language functions to them.
To simplify all the use cases, I use the EvaluateJsonPath processor to fetch all attributes via JSONPath, and then apply expression language functions on them in an extract processor. Below is the flow diagram for that (not reproduced here).
Is this the right approach? For JSON-to-JSON manipulation with 30 keys it is the simplest way, and since I'm integrating the NiFi REST APIs with my application I cannot generate JOLT transformation logic dynamically from the user mapping.
So, does the use of the EvaluateJsonPath processor create performance problems for roughly 50 use cases with different transformation logic? The documentation notes that heavy attribute usage can cause performance (memory) issues.
Your concern about having too many attributes in memory should not be an issue here; having 30 attributes per flowfile is higher than usual, but if these are all strings of up to ~100-200 characters, there should be minimal impact. If you start trying to extract KBs worth of data from the flowfile content into attributes on each flowfile, you will see increased heap usage, but the framework should still be able to handle this until you reach very high throughput (thousands of flowfiles per second on commodity hardware like a modern laptop).
You may want to investigate ReplaceTextWithMapping, as that processor can load from a definition file and handle many replace operations using a single processor.
It is usually a flow design "smell" to have multiple copies of the same flow process with different configuration values (with the occasional exception of database interaction). Rather, see if there is a way you can genericize the process and populate the relevant values for each flowfile using variable population (from the incoming flowfile attributes, the variable registry, environment variables, etc.).

How to conditionally process FlowFiles by a MongoDB query result?

I need to process a list of files based on the result of a MongoDB query, but I can't find any processor that would let me do that. I basically have to take each file and either process it or completely discard it, based on the result of a query that involves that file's attributes.
The only MongoDB-related processor that I see in NiFi 1.5.0 is GetMongo, which apparently can't receive connections, but only emits FlowFiles based on the configured parameters.
Am I looking in the wrong place?
NIFI-4827 is an improvement Jira that aims to allow GetMongo to accept incoming flowfiles; the content would contain the query, and the properties would accept Expression Language. The code is still under review, but the intent is to make it available in the upcoming NiFi 1.6.0 release.
As a possible workaround in the meantime, if there is a REST API you could use InvokeHttp to make the call(s) manually and parse the result(s). Also, if you have a JDBC driver for MongoDB (such as Unity), you might be able to use ExecuteSQL.

Handling Big Data with OSB Proxy

I have created an OSB proxy service (Messaging Service) which loads the data using an MFL file.
The format of the data is:
1/1/2007;00:11:00;2.500;0.000;242.880;10.200;0.000;0.000;0.000;
1/1/2007;00:12:00;2.494;0.000;242.570;10.200;0.000;0.000;0.000;
There are 2,075,259 data records in total, and the file (.txt or .data) is about 130 MB.
What is the best way to handle all this data so that it can be passed into an OSB proxy and transformed into a single XML file?
I have tested with a small number of records (5,000) and it works as expected, but how should I feed the full data set into the proxy?
Is the MFL transformation a valid idea, or should I create a FileAdapter proxy which receives the data from a database table?
I'd appreciate your suggestions. Thank you in advance.
ESBs are efficient at handling messages on the order of KBs, not MBs, although this is very subjective and depends a lot on the number of concurrent requests, transactions per second, hardware sizing, etc. As Trent points out in a comment, you could implement a claim check pattern and delegate the file transformation to an external utility, such as perl or similar.
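As a sketch of that idea (written in Go here rather than perl; the file name and XML element names are made up), the external utility can stream the semicolon-delimited file record by record and emit XML, so the 130 MB never has to be held in memory or pushed through the ESB in one message:

package main

import (
	"bufio"
	"encoding/xml"
	"fmt"
	"os"
	"strings"
)

func main() {
	in, err := os.Open("data.txt") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer in.Close()

	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	fmt.Fprintln(out, "<records>")
	sc := bufio.NewScanner(in)
	for sc.Scan() {
		// Each line looks like: 1/1/2007;00:11:00;2.500;...;0.000;
		fields := strings.Split(strings.TrimSuffix(sc.Text(), ";"), ";")
		fmt.Fprint(out, "  <record>")
		for i, f := range fields {
			fmt.Fprintf(out, "<f%d>", i) // generic element names; adjust to the real schema
			xml.EscapeText(out, []byte(f))
			fmt.Fprintf(out, "</f%d>", i)
		}
		fmt.Fprintln(out, "</record>")
	}
	fmt.Fprintln(out, "</records>")
}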

Go RPC, HTTP or WebSockets: which is fastest for transferring many small pieces of data, repeatedly, from one server to another?

Background
I'm experimenting with creating a memory + CPU profiler in Go, and wish to transfer the information quickly, maybe every second, from the program/service being profiled to a server which will do all of the heavy lifting by saving the data to a database and/or serving it via HTTP to a site; this reduces the load on the program being profiled, giving more accurate measurements. Only small pieces of data will be transferred. I know there are some libraries out there already, but like I said, I'm experimenting.
Transfer Content Type
I have not decided on a concrete transfer type, but it looks like JSON for HTTP or WebSockets, and just the struct for RPC (if I've done my research correctly).
Summary
I will likely try each just to see for myself, but I have little experience using RPC and WebSockets, and would like some opinions or recommendations on which may be faster or more suitable for what I'm trying to do:
HTTP
RPC
Websockets
Anything else I'm not thinking about
As you mentioned in your comment, HTTP is not a requirement.
In that case, in search of the fastest transfer solution, I would completely drop the HTTP transport layer and use plain TCP socket connections, as HTTP adds quite a big overhead just for transferring a few bytes.
Define your own protocol (which may be very simple), open a TCP connection to the server, and send the data packets every second or so, as your requirements dictate.
Your protocol for sending (and receiving) data can be as simple as:
Do an optional authentication or client/server identification step (to ensure you are connected to the server/program you intended).
Use the encoding/gob package from the standard library to send data in binary form over the connection.
So basically the profiled program (client) should open the TCP connection and use gob.NewEncoder() wrapping the connection to send data. The server should accept the incoming TCP connection and use gob.NewDecoder() wrapping the connection to receive data.
The client calls Encoder.Encode() to send profiling info, which can typically be a struct value. The server calls Decoder.Decode() to receive the profiling info, the struct value that the client sent. That's all.
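A minimal, runnable sketch of that setup (the ProfileSample type, address and one-second interval are illustrative, not from the question):

package main

import (
	"encoding/gob"
	"log"
	"net"
	"time"
)

// ProfileSample is a made-up payload; any gob-encodable struct works.
type ProfileSample struct {
	Timestamp  time.Time
	HeapBytes  uint64
	Goroutines int
}

func main() {
	ln, err := net.Listen("tcp", "localhost:9999")
	if err != nil {
		log.Fatal(err)
	}

	// Server: accept one connection and decode samples until the client hangs up.
	done := make(chan struct{})
	go func() {
		defer close(done)
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		dec := gob.NewDecoder(conn)
		for {
			var s ProfileSample
			if err := dec.Decode(&s); err != nil {
				return // io.EOF once the client closes the connection
			}
			log.Printf("server got: %+v", s)
		}
	}()

	// Client (the profiled program): dial once, reuse the encoder for every sample.
	conn, err := net.Dial("tcp", "localhost:9999")
	if err != nil {
		log.Fatal(err)
	}
	enc := gob.NewEncoder(conn)
	for i := 0; i < 3; i++ {
		sample := ProfileSample{Timestamp: time.Now(), HeapBytes: 1 << 20, Goroutines: 8}
		if err := enc.Encode(sample); err != nil {
			log.Fatal(err)
		}
		time.Sleep(time.Second) // "every second or so"
	}
	conn.Close()
	<-done
}

Reusing one encoder per connection matters: gob transmits the type description only once, with the first value, so subsequent samples cost just a few bytes each.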
Sending data in binary form using the encoding/gob package requires you to use the same type to describe the profiling data on both sides. If you want more flexibility, you may also use the encoding/json package to send/receive the profiling info as JSON text. The downside is that JSON requires more data to be sent, and producing and parsing JSON text takes more time than the binary representation.
If losing some profiling packets (or receiving duplicates) is not an issue, you may want to experiment with UDP instead of TCP, which may be even more efficient.
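A hedged sketch of the UDP variant (same made-up ProfileSample idea): each datagram must be decodable on its own, so use a fresh encoder per packet, which re-sends the gob type header every time:

package main

import (
	"bytes"
	"encoding/gob"
	"log"
	"net"
	"time"
)

type ProfileSample struct {
	HeapBytes  uint64
	Goroutines int
}

func main() {
	addr, err := net.ResolveUDPAddr("udp", "localhost:9998")
	if err != nil {
		log.Fatal(err)
	}
	srv, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		buf := make([]byte, 64*1024)
		for {
			n, _, err := srv.ReadFromUDP(buf)
			if err != nil {
				return
			}
			var s ProfileSample
			// Fresh decoder per datagram: every packet is self-contained.
			if err := gob.NewDecoder(bytes.NewReader(buf[:n])).Decode(&s); err == nil {
				log.Printf("server got: %+v", s)
			}
		}
	}()

	conn, err := net.Dial("udp", "localhost:9998")
	if err != nil {
		log.Fatal(err)
	}
	var pkt bytes.Buffer
	// Fresh encoder per packet so the type description is included each time.
	if err := gob.NewEncoder(&pkt).Encode(ProfileSample{HeapBytes: 1 << 20, Goroutines: 8}); err != nil {
		log.Fatal(err)
	}
	conn.Write(pkt.Bytes())
	time.Sleep(100 * time.Millisecond) // give the reader goroutine time to log
}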

GWT RequestFactory Performance

I have a question regarding the performance of RequestFactory and GWT. I have a domain entity with 8 fields that returns around 1000 EntityProxies. The time between the request firing and the response arriving is around 20 seconds. If I do the same but return only 10 EntityProxies, the time is 17 seconds, almost the same.
Is this because I'm working in development mode, or when I release the code to the web the time will be the same?
Is there any way to improve the performance? I'm only reading data, so perhaps something that only reads and doesn't write may be the solution.
I read this post with something similar to my problem:
GWT Requestfactory performance suggestions
Thanks a lot.
PS: I read somewhere that one solution could be to create an XML file on the server, send it to the client and recreate the object there. I don't want to do this because it would really change the design of my app.
Thank you all for the help. I realize now that perhaps using RequestFactory to retrieve thousands of records was a mistake.
I initially used a Locator to override the isLive() and find() methods according to this post:
gwt-requestfactory-performance-suggestions
The response time was reduced to about 13 seconds, but it is still too high.
But I solved it easily. Instead of returning 1000+ entities, I created a new database table in which each field holds all the records of the corresponding original field (1000+) concatenated by a separator (each DB field has a length of about 10,000), so the table holds only one record with around 8 fields.
Something like this:
Field1 | Field2 | Field3
Field1val;Field1val;Field1val;....... | Field2val;Field2val;Field2val;...... | Field3val;Field3val;Field3val;......
I return that one record through RequestFactory to my client, and it reduced the time a lot, to around 1 second. Parsing this large string on the client takes about 500 ms. So instead of spending around 20 seconds, it now takes around 1-2 seconds to accomplish the same thing.
By the way, I am only displaying information; there is no need to insert, delete or update records, so this solution works for me.
Thought I could share this solution.
Performance profiling and fixing issues in GWT is tricky. Avoid all profiling in GWT hosted mode; the numbers do not mean anything useful.
You should profile only in web mode.
GWT RequestFactory is by design slower than GWT RPC, GWT JSON, etc. This is the trade-off for GWT RF's ability to calculate deltas and send only a small amount of information to the server on save.
You should recheck your application design to avoid loading thousands of proxies. RF is meant for "Form"-like applications. The only reason you might need thousands of proxies is for a grid display, and you can probably use a paginated async grid in that scenario.
You should profile your app in order to find out how much time is spent on following steps:
Entities retrieved from the database (server): this can be improved using a second-level cache and optimized queries.
Entities serialized to JSON (server): there is overhead here because RequestFactory and AutoBean rely on reflection. You can try to transmit only the entities that you are actually going to display on the client. Another optimization which greatly reduces latency is to override the isLive method of your EntityLocator and return true.
HTTP request from server to client to transmit the data (wire): you can think about using gzip compression to reduce the amount of data that has to be transferred (important if you send a lot of objects over the wire).
Deserialization on the client (client): this should be quite fast. There was a benchmark showing that AutoBean serialization was one of the fastest ways to serialize JSON. Again, this will benefit from not sending the whole object graph over the wire.
One way to improve performance is to use caching. You can use HTML5 localstorage to cache data on the client. This applies specifically to data that doesn't change often.
