Named pipe performance - Windows

I have an application where I send approximately 125 data items via a named pipe.
Each data item consists of data block 1 with max. 300 characters and data block 2 with max. 600 characters.
This gives 125 data items * (300 + 600) characters * 2 bytes per character = 125 * 900 * 2 = 225000 bytes.
Each data item is surrounded by curly braces like {Message1}{Message2}.
I noticed that when I send the messages in quick succession, there are sending/receiving problems: instead of {Message1}{Message2} the receiving application gets {Messa{Message2}.
Then I changed the sending code so that the messages are sent at 500 ms intervals, and the problem disappeared.
If I do everything correctly (no bugs on my side, no misconfiguration of the named pipes), how much time is required to send 225000 bytes over a named pipe from a Delphi 2009 application to a .NET application on the same machine?
What is a reasonable time for sending data of that size?
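The truncated {Messa followed by {Message2} is the classic symptom of treating each read as exactly one complete message, while a pipe may deliver one write in several chunks or several writes in a single read. A minimal receiver-side sketch (in Python purely for illustration; the real applications are Delphi and .NET) that reassembles the brace-delimited frames from arbitrary read chunks:

def extract_messages(buffer, chunk):
    # Append the newly read chunk and pop out every complete {...} frame.
    # Partial frames stay in the buffer until a later read completes them.
    buffer += chunk
    messages = []
    while True:
        start = buffer.find("{")
        end = buffer.find("}", start + 1)
        if start == -1 or end == -1:
            break  # no complete frame yet; keep the remainder buffered
        messages.append(buffer[start:end + 1])
        buffer = buffer[end + 1:]
    return messages, buffer

# Usage: feed every chunk read from the pipe through the reassembler.
buf = ""
for chunk in ["{Messa", "ge1}{Mess", "age2}"]:  # simulated partial reads
    complete, buf = extract_messages(buf, chunk)
    for msg in complete:
        print(msg)  # prints {Message1} then {Message2}

As for raw speed, a local named pipe typically moves 225000 bytes in a few milliseconds rather than seconds, so if 500 ms pauses make the problem disappear, the cause is more likely message framing than pipe throughput.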

How to sum certain values from Grafana-Loki logs?

I have these log lines:
Successfully encrypted 189322 bytes for upload req_id=MediaUpload
Successfully encrypted 189322 bytes for upload req_id=MediaUpload
Successfully encrypted 492346 bytes for upload req_id=MediaUpload
Is there a way to sum the bytes of the matching log lines? For example, from those logs I would like to get a summed value of 870990 bytes, or 0.87099 MB.
Is that possible?
Sure you can. Check this out.
I've used the pattern parser to extract the bytes as a number out of your log lines.
Then you can run a range query on top of that, e.g.:
sum by (app) (
  sum_over_time(
    {app="your-app"}
      | pattern `Successfully encrypted <byte_size> bytes for upload req_id=<_>`
      | unwrap byte_size
      | __error__=""
    [$__interval]
  )
)
You can change $__interval based on your needs.

Nifi Group Content by Given Attributes

I am trying to run a script or a custom processor to group data by given attributes every hour. Queue size is up to 30-40k on a single run and it might go up to 200k depending on the case.
MergeContent does not fit since there is no limit on min-max counts.
RouteOnAttribute does not fit since there are too many combinations.
Solution 1: Consume all flow files, group them by attributes, create the new flow files, and push those. Not ideal, but I gave it a try.
While running this with 33k flow files waiting in the queue,
session.getQueueSize().getObjectCount()
kept returning 10k, even though I increased the queue threshold numbers on the output flows.
Solution 2: A better approach is to consume one flow file and then filter the queued flow files matching the provided attributes:
final List<FlowFile> flowFiles = session.get(file -> {
    if (correlationId.equals(Arrays.stream(keys).map(file::getAttribute).collect(Collectors.joining(":"))))
        return FlowFileFilter.FlowFileFilterResult.ACCEPT_AND_CONTINUE;
    return FlowFileFilter.FlowFileFilterResult.REJECT_AND_CONTINUE;
});
Again, with 33k waiting in the queue I was expecting around 200 new grouped flow files, but 320 were created. It looks like a similar issue to the one above: the filter does not scan all waiting flow files.
Problems / Questions:
Is there a parameter to change so that getObjectCount can go up to 300k?
Is there a way to make the filter consider all waiting flow files, either by changing a parameter or by changing the processor?
I tried setting the default queue threshold to 300k in nifi.properties, but it didn't help.
In nifi.properties there is a parameter that affects this batching behavior:
nifi.queue.swap.threshold=20000
Here is my test flow:
1. GenerateFlowFile with "batch size = 50K"
2. ExecuteGroovyScript with the script below
3. LogAttribute (disabled) - just to have a queue after the Groovy processor
Groovy script:
def ffList = session.get(100000) // get batch with maximum 100K files from incoming queue
if(!ffList)return
def ff = session.create() // create new empty file
ff.batch_size = ffList.size() // set attribute to real batch size
session.remove(ffList) // drop all incoming batch files
REL_SUCCESS << ff // transfer new file to success
With the parameters above, 4 files are generated in the output:
1. batch_size = 20000
2. batch_size = 10000
3. batch_size = 10000
4. batch_size = 10000
According to the documentation:
There is also the notion of "swapping" FlowFiles. This occurs when the number of FlowFiles in a connection queue exceeds the value set in the nifi.queue.swap.threshold property. The FlowFiles with the lowest priority in the connection queue are serialized and written to disk in a "swap file" in batches of 10,000.
This explains the result: of the 50K incoming files, 20K are kept in memory and the rest are swapped to disk in batches of 10K.
I don't know how increasing the nifi.queue.swap.threshold property will affect your system's performance and memory consumption, but I set it to 100K on my local NiFi 1.16.3 and it looks fine with many small files; the first batch increased to 100K as a result.

Why does ZeroMQ not receive a string when it becomes too large on a PUSH/PULL MT4 - Python setup?

I have an EA in place that loops over history trades and builds one large string with trade information. I then send this string every second from MT4 to the Python backend using a plain PUSH/PULL pattern.
For whatever reason, the data isn't received on the pull side when the string transferred becomes too long. The backend PULL-socket slices each string and further processes it.
Any chance that the PULL-side is too slow to grab and process all the data which then causes an overflow (so that a delay arises due to the processing part)?
In terms of data size, we are well below 5 kB per second.
This is the PULL-socket, which manipulates the data after receiving it:
import zmq
from time import sleep

# zmq_socket is assumed to be an already connected PULL socket with a receive timeout set
while True:
    # check 24/7 for available data in the pull socket
    try:
        msg = zmq_socket.recv_string()
        data = msg.split("|")
        print(data)
        # if data is available and msg is account info, handle as follows
        if data[0] == "account_info":
            [...]
    except zmq.error.Again:
        print("\nResource timeout.. please try again.")
        sleep(0.000001)
I am a bit curious now, since the PULL socket does not even seem to be able to process a string containing 40 trades with their associated information on a single MT4 client - Python connection. I actually planned to set it up to handle more than 5,000 MT4 client - Python backend connections at once.
Q : Any chance that the pull side is too slow to grab and process all the data which then causes an overflow (so that a delay arises due to the processing part)?
Zero chance.
Sending 640 B each second is definitely no showstopper (5 kB per second is nowhere near a performance ceiling...).
The posted problem formulation is otherwise undecidable.
Step 1) Prove (POSACK/NACK) whether the PUSH side accepts the payload for sending error-free.
Step 2) Prove the PULL side is not to blame: send [PUSH.send(640*chr(64+i)) for i in range(10)] via a python-to-python tcp:// transport-class solo channel crossing a host-to-host hop, over at least your local physical network (no VMCI/emulated vLAN, no other localhost colocation) - see the sketch after Step 3.
Step 3) If the steps above got POSACK-ed, your next candidates are the ZeroMQ configuration space and/or an MT4-side PUSH incompatibility, most probably hidden inside a (not mentioned) third-party ZeroMQ wrapper, or first-party issues with string handling / processing (which you must have already read about, as this has been observed and mentioned many times in past posts about trouble with well-hidden MQL4 internal ecosystem changes).
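A minimal sketch of that Step 2 cross-check, assuming plain pyzmq on both hosts and an arbitrary port 5555 (both assumptions, not taken from the question):

import zmq

def pull_side(bind_addr="tcp://0.0.0.0:5555"):
    # Run this on the receiving host first.
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    pull.bind(bind_addr)
    for _ in range(10):
        msg = pull.recv_string()
        print(len(msg), msg[:8])  # expect ten messages of 640 characters each
    pull.close()
    ctx.term()

def push_side(connect_addr):
    # Run this on the sending host, pointing at the PULL host, e.g. "tcp://192.0.2.10:5555".
    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.connect(connect_addr)
    for i in range(10):
        push.send_string(640 * chr(64 + i))  # the same payloads as in Step 2
    push.close()
    ctx.term()

If all ten payloads arrive intact on the host-to-host run, the ZeroMQ transport itself is cleared and the focus moves to Step 3.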
Anyway, stay tuned. ZeroMQ is a sure bet and a true workhorse for professional, low-latency designs in the distributed-systems domain.

Yammer JSON Feed returning only 20 messages

I am trying to get all the messages from a particular group, and I am getting the JSON feed back. The only problem is that it returns only 20 messages. Is this set as a default or something? Is there any way, when making the request, to specify whether I want all the messages, just the default 20, or even the messages posted between a start and an end date?
My REST API call is:
https://www.yammer.com/api/v1/messages/in_group/[id].json
From the Yammer Developer Documentation:
Autocomplete: 10 requests in 10 seconds.
Messages: 10 requests in 30 seconds.
Notifications: 10 requests in 30 seconds.
All Other Resources: 10 requests in 10 seconds.
These limits are independent e.g. in the same 30 seconds period, you could make 10 message calls and 10 notification calls. The specific rate limits are subject to change but following the guidelines below will ensure that your app is not blocked.
I have tried using limit as a parameter to request more than 20 messages, but it doesn't seem to be working.
Is this problem caused by the rate limit? If not, what's the problem?
From the official Yammer Developer documentation:
Messages - Viewing Messages
Endpoints:
1) All public messages in the user’s (whose access token is being used to make the API call henceforth referred to as current user) Yammer network. Corresponds to “All” conversations in the Yammer web interface.
GET https://www.yammer.com/api/v1/messages.json
2) The user’s feed, based on the selection they have made between “Following” and “Top” conversations.
GET https://www.yammer.com/api/v1/messages/my_feed.json
3) The algorithmic feed for the user that corresponds to “Top” conversations, which is what the vast majority of users will see in the Yammer web interface.
GET https://www.yammer.com/api/v1/messages/algo.json
4) The “Following” feed which is conversations involving people, groups and topics that the user is following.
GET https://www.yammer.com/api/v1/messages/following.json
5) All messages sent by the user. Alias for /api/v1/messages/from_user/logged-in_user_id.format.
GET https://www.yammer.com/api/v1/messages/sent.json
6) Private messages received by the user.
GET https://www.yammer.com/api/v1/messages/private.json
7) All messages received by the user.
GET https://www.yammer.com/api/v1/messages/received.json
Parameters:
The messages API endpoints return a similar structure and support the following query parameters:
older_than - Returns messages older than the message ID specified as a numeric string. This is useful for paginating messages. For example, if you’re currently viewing 20 messages and the oldest is number 2912, you could append “?older_than=2912″ to your request to get the 20 messages prior to those you’re seeing.
newer_than - Returns messages newer than the message ID specified as a numeric string. This should be used when polling for new messages. If you’re looking at messages, and the most recent message returned is 3516, you can make a request with the parameter “?newer_than=3516″ to ensure that you do not get duplicate copies of messages already on your page.
threaded=[true | extended] - threaded=true will only return the first message in each thread. This parameter is intended for apps which display message threads collapsed. threaded=extended will return the thread starter messages in order of most recently active as well as the two most recent messages, as they are viewed in the default view on the Yammer web interface.
limit - Return only the specified number of messages. Works for threaded=true and threaded=extended.
Note the limit parameter that you can set on your GET request. Based on this documentation, if it is correct (I'm not a Yammer developer, but I do use it), you should be able to do:
https://www.yammer.com/api/v1/messages.json?limit=50
That is the theory, but reading through the documentation there is a section on Search that says:
page - Only 20 results of each type will be returned for each page, but a total count is returned with each query. page=1 (the default) will return items 1-20, page=2 will return items 21-30, etc.
This says to me that results are limited to 20 per request.
UPDATE
After testing this with https://www.yammer.com/api/v1/messages.json?limit=50 and it not returning 50 messages, while https://www.yammer.com/api/v1/messages.json?limit=5 does return only 5 messages, I would say that Yammer restricts the number of messages to 20. Also, after reading through the documents a bit more, I read:
For example, if you’re currently viewing 20 messages and the oldest is number 2912, you could append “?older_than=2912″ to your request to get the 20 messages prior to those you’re seeing"
This says to me that they will only return a max of 20. So I think you are stuck with 20 messages at a time.
Hope this helps.
You need to use the query parameters documented above (older_than, newer_than, threaded, limit).
Example: GET https://www.yammer.com/api/v1/messages.json?older_than=2912
where older_than can be the ID of the 20th message, and so on, so you can fetch the messages 20 at a time.
I solved this by requesting subsequent pages recursively.
You can simply increase the page parameter until the response is empty, or update the older_than parameter until the property meta.older_available is false.
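A minimal paging sketch of that second approach (older_than plus meta.older_available), assuming Python with the requests library, a hypothetical ACCESS_TOKEN, and that the response carries the messages list, message id fields, and meta.older_available flag described above - adjust the field names if your payload differs:

import requests

ACCESS_TOKEN = "..."  # hypothetical OAuth token for your Yammer network
BASE_URL = "https://www.yammer.com/api/v1/messages.json"
HEADERS = {"Authorization": "Bearer " + ACCESS_TOKEN}

def fetch_all_messages():
    # Page backwards through the feed, at most 20 messages per request.
    messages = []
    older_than = None
    while True:
        params = {} if older_than is None else {"older_than": older_than}
        resp = requests.get(BASE_URL, headers=HEADERS, params=params)
        resp.raise_for_status()
        body = resp.json()
        batch = body.get("messages", [])
        if not batch:
            break
        messages.extend(batch)
        older_than = batch[-1]["id"]  # oldest message ID in this batch
        if not body.get("meta", {}).get("older_available", False):
            break  # no older messages left
    return messages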

Can Cube (js metrics framework) return more than 1000 events?

The Cube software (https://github.com/square/cube) allows you to retrieve events.
I want to retrieve a lot of events, but it appears that I am capped at 1000. There are well over 9000 in MongoDB in the collection and time range I am querying.
Example HTTP GET queries I issue:
# 1000 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type
# 1000 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type&start=2012-02-02&stop=2013-07-03
# 7 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type&limit=7
# 1000 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type&limit=9999
It appears that the limit is pinned here:
https://github.com/square/cube/blob/28dad4af27a6680deb46077b16952590f2c21cad/lib/cube/event.js
Line 166, based on the batchSize=1000 setting.
Is it possible that you can 'page' through the data in some way? Or is this just a hard limit?
Looks like there is a hard cap on results in three places that need to be updated for large domains:
event.js - line 166
metric.js - line 11
metric.js - line 12
In addition, I was unable to find any query-string APIs for these parameters. Ideally, the cap could stay at 1000 (to avoid server bloat for people not tuning their queries correctly) while allowing the consumer to define override behavior.
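Until the cap becomes configurable, one workaround is to page by time window with the start, stop, and limit parameters the /1.0/event endpoint already accepts (see the example queries above). A rough sketch, assuming Python with requests and that each returned event carries a time field in ascending order - both assumptions about the payload, not confirmed here:

import requests

BASE_URL = "http://1.2.3.4:1081/1.0/event"  # host and port taken from the example queries

def fetch_events(expression, start, stop, page_size=1000):
    # Repeatedly query, advancing the start bound past the last event received.
    events = []
    cursor = start
    while True:
        params = {"expression": expression, "start": cursor, "stop": stop, "limit": page_size}
        batch = requests.get(BASE_URL, params=params).json()
        if not batch:
            break
        events.extend(batch)
        if len(batch) < page_size:
            break  # last (partial) page reached
        cursor = batch[-1]["time"]  # assumed ascending event timestamp field
    return events

# Usage, mirroring the date range from the question:
# all_events = fetch_events("my_event_type", "2012-02-02", "2013-07-03")

Depending on whether start is inclusive, events sharing the boundary timestamp may be duplicated or skipped, so some client-side de-duplication may still be needed.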
