What would make a write call to a TCPSocket hang indefinitely?
lotsOfBytes = # a really large number of bytes, like 1 or 2 MB of data
socket = TCPSocket.new # some config
socket.write(lotsOfBytes) # this line hangs
I am trying to debug an issue where a get_multi operation sent to memcached with a large number of keys hangs indefinitely, and it does so on a line that resembles that code snippet. I'm trying to better understand how the low-level sockets on which this library is built are expected to work.
What are the values of following attributes on your TCPSocket:
Keep-alive activated and what value is set?
Timeout set and what value is set?
If you will do a Wireshark dump, it's much better so see what happens before hanging connection.
tcpdump? are there any attempts to send anything?
netstat - for see output queue.
does it work on a small number of bytes in your environment?
I have an EA set in place that loops history trades and builds one large string with trade information. I then send this string every second from MT4 to the python backend using a plain PUSH/PULL pattern.
For whatever reason, the data isn't received on the pull side when the string transferred becomes too long. The backend PULL-socket slices each string and further processes it.
Any chance that the PULL-side is too slow to grab and process all the data which then causes an overflow (so that a delay arises due to the processing part)?
Talking about file sizes we are well below 5kb per second.
This is the PULL-socket, which manipulates the data after receiving it:
while True:
# check 24/7 for available data in the pull socket
try:
msg = zmq_socket.recv_string()
data = msg.split("|")
print(data)
# if data is available and msg is account info, handle as follows
if data[0] == "account_info":
[...]
except zmq.error.Again:
print("\nResource timeout.. please try again.")
sleep(0.000001)
I am a bit curious now since the pull socket seems to not even be able to process a string containing 40 trades with their according information on a single MT4 client - Python connection. I actually planned to set it up to handle more than 5.000 MT4 clients - python backend connections at once.
Q : Any chance that the pull side is too slow to grab and process all the data which then causes an overflow (so that a delay arises due to the processing part)?
Zero chance.
Sending 640 B each second is definitely no showstopper ( 5kb per second - is nowhere near a performance ceiling... )
The posted problem formulation is otherwise undecidable.
Step 1) POSACK/NACK prove whether a PUSH side accepts the payload for sending error-free.
Step 2) prove the PULL side is not to be blamed - [PUSH.send(640*chr(64+i)) for i in range( 10 )] via a python-2-python tcp://-transport-class solo-channel crossing host-to-host hop, over at least your local physical network ( no VMCI/emulated vLAN, no other localhost colocation )
Step 3) if either steps above got POSACK-ed, your next chances are the ZeroMQ configuration space and/or the MT4-based PUSH-side incompatibility, most probably "hidden" inside a (not mentioned) third party ZeroMQ wrapper used / first-party issues with string handling / processing ( which you must have already read about, as it has been so many times observed and mentioned in the past posts about this trouble with well "hidden" MQL4 internal eco-system changes ).
Anyway, stay tuned. ZeroMQ is a sure bet and a truly horsepower for professional and low-latency designs in distributed-system's domain.
I am fairly new to Prometheus alertmanager and had a doubt regarding firing alerts only during a particular period
I have a microservice which receives a file and does some processing on it, which is only invoked when it gets a message through a Kafka queue. The aforementioned is supposed to come every day between 5 am and 6 am(UTC time). The microservice has a metric which is incremented by 1 every time it receives a file. I want to raise an alert if it does not receive a file in the interval. I have created a query like this :
expr : sum(increase(metric_name[1m]) and on() hour(vector(time()))==5) < 1
for: 1h
My questions:-
1) Is it correct or is there a better way to do it
2) In case of no update, will it return 0 or "datapoints not found"
3) Is increase the correct function as it tends to give results in decimals due to extrapolation, but I understand if increase is 0, it will show 0
I can't really play around with scrape_intervals, which is set at 30s.
I have not run this expression but I expect it will cause an alert to fire at 06:00 only and then go off at 06:01. It is the only time the expression would hold true for one hour.
Answering your questions
It is correct if what you want is a single fire of alert (sending a mail by example) but then no longer firing. Even with that, the schedule is a bit tight and may get hurt by alertmanager delay causing the alert to be lost.
In case of no increase, you will get the expression will evaluate to 0. It will be empty when there is an update
Increase is the right function. It even takes into account reset of the counter.
Answering if there is a better way to do it.
Regarding your expression, you can have the same result, without for clause, with:
expr: increase(metric_name[1h])==0 and on() hour()==6 and on() minute()<1
It reads a : starting at 6am and for 1 minutes, if there was no increase of metric over the lasthour.
Alerting longer
If you want the alert to last longer (say for the day and you silence it when it is solved), you can use sub-queries;
expr: increase((metric and on() hour()==5)[18h:])==0 and on() hour()>5
It reads as : starting at 6am (hour()>5), compute the increase over 5-6am for the next 18 hours. If you like having a pending, you can drop the trailing on() hour()>5 and use a for: 1h clause.
If you want to alert until a file is submitted and thus detect a resolution, simply transform the expression to evaluate the increase until now:
expr: increase((metric and on() hour()>5)[18h:])==0 and on() hour()>5
I am using veins4.6 with sumo 0.30 and omnet++5.1.1 in ubuntu 14.04. I have created a custom network with a cross(one intersection with 4 roads) and ran the simulation with 200 vehicles. I did not observe this behaviour for 4vehicles. I have seen it with 50 vehicles too. I need to get the count of total lost packets for my masters project. So I was looking at statistics and found that RXTXLostPackets is not zero. As far as I understood from documentation it should be zero if allowTxDuringRx=false. Default is false(PhyLayer80211p.ned). As I did not change any code yet, I was confused if that is expected behaviour.
What I have done so far.
from Mac1609_4::handleLowerControl, statsTXRXLostPackets is updated when Decider80211p responds with RECWHILESEND.
In Decider80211p::processSignalEnd, if value of whileSending is true RECWHILESEND is sent to mac layer as control message.
In Decider80211p::processSignalEnd, if(frame->getWasTransmitting() || phy11p->getRadioState() == Radio::TX) , this frame was considered as received while sending and sets the value for whileSending as true.
The wasTransmitting varilable is set to true in Decider80211p::switchToTx and Decider80211p::processNewSignal functions.
currentFrame->setWasTransmitting(true);
currentFrame->setBitError(true);
in Decider80211p::processNewSignal:
if (phy11p->getRadioState() == Radio::TX ) {
frame->setBitError(true); --> tried disabling both these values and the RXTXLostPackets was zero.
frame->setWasTransmitting(true);
DBG_D11P << "AirFrame: " << frame->getId() << " (" << recvPower << ") received, while already sending. Setting BitErrors to true" << std::endl;
}
There is one thread with similar issue with the fix of adding this line in processSignalEnd function. But looks like veins4.6 does not use curSyncFrame anymore.
Veins - Unexpected behavior with lost packets in certain vehicles
if (!frame->getWasTransmitting()){
curSyncFrame = 0;
}
I could not clearly understand the issue. The code and configuration files I have used are here. https://github.com/Rajeswar59/veins_learning.
Can anyone please take a look and help me with this. Thanks in advance.
edit: I went through the logs. This is what I could understand as of now.
https://drive.google.com/open?id=0BzjDW8PQhkSmSEUtZ2lpcld4ZXc --> some portion of logs are here.
---> order of sending
#13332 0.247987176594 node[30] --> node[48] id=22266
#13375 0.247987796864 node[18] --> node[20] id=22447
#13384 0.247987864534 node[20] --> node[30] id=22573
From logs I have concentrated on node 18. Two nodes that transmitted before 30 are 32 and 4. These 2 messages are received successfully by all 3 nodes. When a message arrives decider tries to set channel state as busy in processnewsignal and set idle after processing packet. This calls mac1609_4.cc channelBusy and channelIdle functions respectively. So the channelIdle variable is set accordingly. Also if channel is to be set busy it will stop contention and calculate currentBackoff if any packet is waiting to be transmitted. If channel is being set idle at the end of reception, startContent is called. Based on this only the lastIdle variable is set which is used to calculate nextMacEvent. So when the last successful message was received all the nodes which have a packet to send decide nextMacEvent and it is sent as self message in Mac1609_4.cc. on receiving the nextMacEvent self message we will start transmitting without checking if any other node has started transmission. We can not identify that probably because we are setting channel busy when we receive messages after some propagation delay. So between last successful transmission and nextMacEvent other nodes also take decision to transmit without checking current channel state. That's why the node has some receive events while sending. As far as my understanding goes before transmission we should sense current state of channel and retry backoff accordingly. We do not check this at the nextMacEvent. It looks like a collision behaviour but should we not check the current state of channel when backoff counter reaches zero and retry. Please correct me if I am wrong anywhere.
Thanks for your patience.
Any help or advice??
My Learnings(probably last update):
After Some digging, these are my learnings if it helps some one. The basic CSMA mechanism says before attempting for transmission, the node has to sense the channel, initiate transmission if the channel is sensed idle for AIFS time, or go in to back off if channel is busy. In veins the channel busy status is stored in idleChannel variable whose status is checked in Mac1609_4:channelBusySelf() function before initiating transmission (nextMacEvent in Mac1609_4::handleSelfMsg). The idleChannel is updated in Mac1609_4::channelBusy and Mac1609_4::channelIdle functions when a message reception starts and when message reception ends respectively. So when a previously transmitting node sends a packet, all the recieving nodes will receive the packet with varying delay i.e., starts receiving at different times and update their channelIdle variable. After that they calculate best time to transmit and starts transmission. It does check if channel is idle or not but as the channelIdle status is updated at next reception and because of transmission delay it takes some time between transmission start at sender and reception start at receiver side, both the transmitting nodes cant see other transmission. As far as I understand this is called a collision when more than two nodes start transmission at the same time. So the BitError statistic is set and statsTXRXLostPackets is also set. So while calculating totalLostPackets we can take only one of these two values.
I am listening to a server which sends certain messages to me with sequence numbers. My client parses out the sequence number in order to keep track of whether we get a duplicate or whether we miss a sequence number, though it is called generically by a wrapper object which expects a single incremental sequence number. Unfortunately this particular server sends different streams of sequence numbers, incremental only within each substream. In other words, a simpler server would send me:
1,2,3,4,5,7
and I would just report back 1,2,3,4,5,6,7 and the wrapper tool would notify of having lost one message. Unfortunately this more complex server sends me something like:
A1,A2,A3,B1,B2,A4,C1,A5,A7
(except the letters are actually numerical codes too, conveniently). The above has no gaps except for A6, but since I need to report one number to the wrapper object, i cannot report:
1,2,3,1,2,4,1,5,7
because that will be interpreted incorrectly. As such, I want to condense, in my client, what I receive into a single incremental stream of numbers. The example
A1,A2,A3,B1,B2,A4,C1,A5,A7
should really translate to something like this:
1,2,3,4 (because B1 is really the 4th unique message), 5, 6, 7, 8, 10 (since 9 could have been A6, B3, C2 or another letter-1)
then this would be picked up as having missed one message (A6). Another example sequence:
A1,A2,B1,A7,C1,A8
could be reported as:
1,2,3,8,9,10
because the first three are logically in a valid sequence without anything missing. Then we get A7 and that means we missed 4 messages (A3,A4,A5, and A6) so I report back 8 so the wrapper can tell. Then C1 comes in and that is fine so I give it #9, and then A8 is now the next expected A so I give it 10.
I am having difficulty figuring out a way to create this behavior though. What are some ways to go about it?
For each stream, make sure that that stream has the correct sequence. Then, emit the count of all valid sequence numbers you've seen as the aggregate one. Pseudocode:
function initialize()
for stream in streams do
stream = 0
aggregateSeqno = 0
function process(streamId, seqno)
if seqno = streams[streamId] then
streams[streamId] = seqno + 1
aggregateSeqno = aggregateSeqno + 1
return aggregateSeqno
else then
try to fix streams[streamId] by replying to the server
function main()
initialize()
while(server not finished) do
(streamId, seqno) = receive()
process(streamId, seqno)