How to condense a stream of incrementing sequence numbers down to one? - algorithm

I am listening to a server which sends certain messages to me with sequence numbers. My client parses out the sequence number in order to keep track of whether we get a duplicate or whether we miss a sequence number, though it is called generically by a wrapper object which expects a single incremental sequence number. Unfortunately this particular server sends different streams of sequence numbers, incremental only within each substream. In other words, a simpler server would send me:
1,2,3,4,5,7
and I would just report back 1,2,3,4,5,6,7 and the wrapper tool would notify of having lost one message. Unfortunately this more complex server sends me something like:
A1,A2,A3,B1,B2,A4,C1,A5,A7
(except the letters are actually numerical codes too, conveniently). The above has no gaps except for A6, but since I need to report one number to the wrapper object, I cannot report:
1,2,3,1,2,4,1,5,7
because that will be interpreted incorrectly. As such, I want to condense, in my client, what I receive into a single incremental stream of numbers. The example
A1,A2,A3,B1,B2,A4,C1,A5,A7
should really translate to something like this:
1,2,3,4 (because B1 is really the 4th unique message), 5, 6, 7, 8, 10 (since 9 could have been A6, B3, C2 or another letter-1)
then this would be picked up as having missed one message (A6). Another example sequence:
A1,A2,B1,A7,C1,A8
could be reported as:
1,2,3,8,9,10
because the first three are logically in a valid sequence without anything missing. Then we get A7 and that means we missed 4 messages (A3,A4,A5, and A6) so I report back 8 so the wrapper can tell. Then C1 comes in and that is fine so I give it #9, and then A8 is now the next expected A so I give it 10.
I am having difficulty figuring out a way to create this behavior though. What are some ways to go about it?

For each stream, make sure that that stream has the correct sequence. Then, emit the count of all valid sequence numbers you've seen as the aggregate one. Pseudocode:
function initialize()
    for streamId in streamIds do
        nextSeqno[streamId] = 1          // next expected per-stream sequence number
    aggregateSeqno = 0

function process(streamId, seqno)
    if seqno = nextSeqno[streamId] then
        nextSeqno[streamId] = seqno + 1
        aggregateSeqno = aggregateSeqno + 1
        return aggregateSeqno
    else
        try to fix nextSeqno[streamId] by asking the server to resend the missing messages

function main()
    initialize()
    while server not finished do
        (streamId, seqno) = receive()
        process(streamId, seqno)
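
If the server cannot resend, a variant that reproduces the numbering in the question's examples is to advance the aggregate counter by the size of the per-stream gap, so the wrapper still sees the loss. A minimal Python sketch of that variant (the function and variable names are illustrative, not from the answer above):

# Collapse per-stream sequence numbers into one aggregate stream.
# Assumes each per-stream sequence starts at 1 and only moves forward.
next_expected = {}     # stream id -> next expected per-stream sequence number
aggregate_seqno = 0    # last aggregate number handed to the wrapper

def process(stream_id, seqno):
    """Return the aggregate sequence number to report for this message."""
    global aggregate_seqno
    expected = next_expected.get(stream_id, 1)
    if seqno < expected:
        return None                       # duplicate or stale message: nothing new to report
    gap = seqno - expected                # how many per-stream messages were skipped
    next_expected[stream_id] = seqno + 1
    aggregate_seqno += gap + 1            # skip 'gap' aggregate numbers so the wrapper notices the loss
    return aggregate_seqno

# Example from the question: A1,A2,A3,B1,B2,A4,C1,A5,A7 -> 1,2,3,4,5,6,7,8,10
msgs = [("A",1),("A",2),("A",3),("B",1),("B",2),("A",4),("C",1),("A",5),("A",7)]
print([process(s, n) for s, n in msgs])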

Related

How many bytes are used for longer string when sending via ZMQ?

I'm using ZeroMQ / ZMQ from Python and Java and have a question. When sending a shorter string, ZMQ uses one byte as described here (http://zguide.zeromq.org/page:all#A-Minor-Note-on-Strings)
Then what goes onto the wire is a length (one byte for shorter
strings) and the string contents as individual characters.
Does anyone know how many bytes are used when sending a longer string?
How many bytes are used for longer string when sending via ZMQ?
That depends on a hell of a lot more things than just the string itself:
Your post refers to an indeed historical text, the zguide pages.
While this was surely a very helpful first-read source in the early days of ZeroMQ v.2.x, today we live with distributed systems spanning many versions, from v.2.1+, 3.x, 4.x, with 4.2 being so far the last stable API version in 2018-Q2.
No one can a priori guess what API version was used on the message-sender's side until a receiver actually sets up / accepts the link and .recv()-s the respective message. Relying on C-lang based s_recv()-helper tricks in the post-v4.0 API is not a sure direction to follow.
Being in Python, many protocol-hardwired details remain beyond your sight, yet there are byte-maps that get filled under the hood exactly as the benevolent dictatorship, indoctrinated in the published ZeroMQ RFC/ZMTP specifications, dictates.
If we cannot guess or know beforehand, can we ... ?
Yes, we can experiment. Best setup a controlled distributed-system experiment.
Node A : The Sender
can be pythonic, being a sender:
- setup a REQ-archetype AccessNode ( details in "ZeroMQ Hierarchy in less than a five seconds" ),
- setup .setsockopt( zmq.IDENTITY, ... ) with a randomness-generated static identity,
- setup a .setsockopt( zmq.REQ_RELAXED, 1 ),
- .bind() it to a known endpoint of a transport-class of one's choice
- start an xrange()-generator controlled for L in xrange( 1, int( 1E+9 ) )-loop of .send()-s
- there load a payload .send( r"{0:}|{1:}".format( str( L ), L * r"*" ) )
- handle the respective .recv() for a REP-side "answer",
- check the error-states and adapt the time.sleep()-s / re-loops, according to the sender-side socket capacity and capability to send further payloads
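A minimal pyzmq sketch of the Node A steps above ( the endpoint, identity string, loop bound and sleep interval are illustrative assumptions; zmq.REQ_RELAXED requires a libzmq 4.1+ build ):

import time
import zmq

# Node A : REQ-archetype sender, following the steps listed above
ctx = zmq.Context()
req = ctx.socket(zmq.REQ)
req.setsockopt(zmq.IDENTITY, b"NodeA-experiment-0001")   # static, pre-generated identity
req.setsockopt(zmq.REQ_RELAXED, 1)                       # tolerate a lost REP without deadlocking
req.bind("tcp://*:5555")                                 # a known endpoint for Node B to .connect() to

for L in range(1, 100000):                               # growing payload sizes
    req.send_string("{0:}|{1:}".format(L, "*" * L))      # "<length>|<padding>" payload
    reply = req.recv()                                    # wait for the REP-side "answer"
    time.sleep(0.01)                                      # pace the loop to the link capacity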
Node B : The Receiver | The MitM-sniffer
ought to be as low-level as possible, so as to dis-assemble the RFC/ZMTP wire-line protocol, so Python drops out of the candidate list. Other options may include wire-level sniffer(s), if the selected transport-class permits it ( an ipc:// or a vmci:// one will not ).
- setup a ROUTER-archetype AccessNode,
- .connect() it to the known Node A's transport-class endpoint used,
- start .recv()-ing messages,
If your experiment has correctly acquired / sniffed the wire-level details about the ZMTP-compliant transport sizes of the known payload compositions, your question gets repeatable, verifiable, quantitatively correct records of the string-size to message-size mapping function.
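As a cross-check for those records: under ZMTP/3.x framing ( an assumption about the negotiated wire protocol, which is exactly what the experiment is meant to confirm ), each data frame carries a flags octet plus a 1-octet size field for bodies up to 255 bytes, or an 8-octet size field for longer bodies. A tiny helper to compare the sniffed sizes against:

def zmtp3_frame_size(body_len):
    """Expected on-wire size of one ZMTP/3.x data frame for a body of body_len bytes."""
    size_field = 1 if body_len <= 255 else 8   # SHORT vs LONG size field
    return 1 + size_field + body_len           # flags octet + size field + body

print(zmtp3_frame_size(10), zmtp3_frame_size(300))   # -> 12 309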
A BONUS POINT, for those indeed interested: next, re-run the controlled white-box distributed-system experiment above, now with the Node A: The Sender side having extended its behaviour to also { randomly | deterministically } { change | alter } its own configuration ( or map both such options onto a pair of the same payload re-.send()-s ) with a .setsockopt( zmq.REQ_CORRELATE, { on | off } ) inside its for-loop, and record the observed changes in the expected outputs.
This adds a final touch to the definitive answer, as far as the API v.4.2.x permits in the 2018-Q2.

How to get recent events recorded in the event logs (e.g. logged about 10 seconds ago) in Windows using C++?

I need to collect Windows event log entries that were logged about 10 seconds ago. Using a pull subscription I could collect logs already saved before the program started, as well as logs saved while the program is running. I tried with the code available on MSDN:
Subscribing to Events
"I need to start to collect the event logged 10 seconds ago". Here I think I need to set value for LPWSTR pwsQuery to achieve that.
L"*[System/Level= 2]" gives the events with level equal to 2.
L"*[System/EventID= 4624]" gives events with eventID is 4624.
L"*[System/Level < 1]" gives events with level < 2.
In the same way, I need to set the value of pwsQuery to get events logged about 10 seconds ago. Can I do it in the same way as above? If so, how? If not, what are the other ways to do it?
EvtSubscribe() gives you new events as they happen. You need to use EvtQuery() to get existing events that have already been logged.
The Consuming Events documentation shows a sample query that retrieves events beginning at a specific time:
// The following query selects all events from the channel or log file where the severity level is
// less than or equal to 3 and the event occurred in the last 24 hour period.
XPath Query: *[System[(Level <= 3) and TimeCreated[timediff(@SystemTime) <= 86400000]]]
So, you can use TimeCreated[timediff(@SystemTime) <= 10000] to get events in the last 10 seconds.
The TimeCreated element is documented here:
TimeCreated (SystemPropertiesType) Element
The timediff() function is described on the Consuming Events documentation:
The timediff function is supported. The function computes the difference between the second argument and the first argument. One of the arguments must be a literal number. The arguments must use FILETIME representation. The result is the number of milliseconds between the two times. The result is positive if the second argument represents a later time; otherwise, it is negative. When the second argument is not provided, the current system time is used.
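Putting the pieces together, the value assigned to pwsQuery for a 10-second window would look something like the following sketch ( the channel / log name is passed separately, as in the MSDN sample ):
L"*[System[TimeCreated[timediff(@SystemTime) <= 10000]]]"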

Overwriting Messages with Same Correlation ID and Sequence Number in Spring

Based on my own experiments (I can't find this documented anywhere), if 2 messages have the same correlation ID and sequence number, the aggregator will only take the 1st message and discard/ignore the other message.
Is there a way to make aggregator use the last message received instead?
The aggregation will merge the payload into 1 string.
Simple scenario:
3 messages with same correlation ID and sequence size of 2, ordered by time received
sequence#: 1; payload: abc
sequence#: 1; payload: def
sequence#: 2; payload: ghi
Current output: abcghi
Expected output: defghi
This scenario happens when the sequence# 2 that was meant for the 1st message is missing. And the correlation ID (obtained from the decoded payload) is very limited, hence it will be reused multiple times.
Original message
The raw messages came in this format:
"Sequence Size","Sequence Number","ID","Text"
ID ranges between 0-9
example message: 2,1,8,abc
Sample raw message payload:
2,1,8,abc
2,1,8,def
2,2,8,ghi
The aggregator basically combines the text.
You need to use a custom release strategy (it can have the same logic as the default SequenceSizeReleaseStrategy, but it can't be that class). With the default strategy duplicate sequences are skipped.
However, you will also need a custom message group store to enact your desired behavior; otherwise the output will be abcdefghi.
However, the discarded message will not be sent to the discard channel in that case.
It's generally not a good idea to reuse correlation id; if you must, then use the group-timeout or a reaper to discard the partial group before any possibility of reuse of the correlation id.
BTW, you might find it easier to use a custom CorrelationStrategy and ReleaseStrategy rather than using the default and manipulating the headers.

Accessing New messages from Yahoo Mail using YQL

I am currently writing a JAVA application in which I need to access the following information from users' Yahoo email messages (to display back to them). YQL looked like a 'quick easy way' to do this, however it's proving to be more difficult. All tests I ran were done here: http://developer.yahoo.com/yql/console/ and I can replicate the same results using my webapp/oauth.
To
FromEmail
FromName
Subject
Message
Date
MID
I am having trouble getting this all into 1 query call (or even 2, although I have not invested as much time researching that as a solution). Here is the short of it: currently I have the following YQL:
SELECT folder.unread, message FROM ymail.msgcontent
WHERE (fid,mids )
IN
(SELECT folder.folderInfo.fid, mid
FROM ymail.messages
WHERE numMid=2
AND startMid=0)
AND fid='Inbox'
AND message.flags.isRead=0;
This works the best out of all the solutions I have; however, there is one major crippling flaw. If we have 10 emails, E1 - E10, all unread with the exception of E2 and E3, then after running that query the result set will show only E1, not E1 and E4. Obviously this is not good. So I tried plugging the "AND message.flags.isRead=0" into the sub-select:
SELECT folder.unread, message FROM ymail.msgcontent
WHERE (fid,mids )
IN
(SELECT folder.folderInfo.fid, mid
FROM ymail.messages
WHERE numMid=10
AND startMid=0
AND message.flags.isRead=0)
AND fid='Inbox'
However, this yields 'null' as a result. In order to debug this I just run the sub-select and come up with this:
SELECT folder.folderInfo.fid, mid
FROM ymail.messages
WHERE numMid=10
AND startMid=0
AND messageInfo.flags.isRead=0
This query returns 10 results; unfortunately, after further review, it does not filter read vs. unread. After some more toying around I changed the select statement to the following query:
SELECT folder.folderInfo.fid, messageInfo.mid
FROM ymail.messages
WHERE numMid=10
AND startMid=0
AND messageInfo.flags.isRead=0
Finally, this works! EXCEPT 47 emails are returned instead of just 10. And to make things more interesting, I know for a fact I have 207 (unread) emails in my inbox, so why 47? I have changed the 'numMid' (think of this as how many to show) from 0 - 300 and startMid (how many emails in to start, like an offset) from 0 - 300, and neither changes the result set count. Of course, when I change the select statement back from 'messageInfo.mid' to 'mid', the numMid / startMid 'work' again, however the filtering from the isRead no longer works. I know there are other solutions where I set numMid=50000 or something along those lines, however YQL is a bit slow to begin with, and I can only imagine that this would slow it down significantly.
So the question is, has any one done this? Is YQL just broke / not maintained or am I doing something wrong?
Thank you!
EDIT: Apparently this '47' that shows up is from the top 50 emails I have, 3 of which are read. I have yet to figure out how to 'trick' the YQL to allow me to override this 50 limit.
Bit late but I think I have the answer to your question.
Your query is almost correct, except for the missing numInfo query parameter. Try changing the query to:
SELECT *
FROM ymail.messages
WHERE numMid=75
AND startMid=0 AND numInfo=75
AND messageInfo.flags.isRead=0
Notice the numInfo=75. This should get you the last 75 unread messages. To read more about the different query parameters, refer to the official documentation here.
EDIT 1
The table ymail.messages should return unread messages by default. There is a GroupBy parameter which you should use if you want to get unread messages. Find documentation here

Chain of packets in MathLink: are the packets always strictly ordered?

The documentation does not clearly state the order of packets returned by the slave kernel via MathLink. It is natural to suppose that (when sending an input expression with head EnterExpressionPacket and working in standard mode):
1) the last packet before the next InputNamePacket is always
ReturnExpressionPacket
2) there is always only one ReturnExpressionPacket and one
OutputNamePacket for one EnterExpressionPacket
3) ReturnExpressionPacket is always the next after OutputNamePacket
4) after MessagePacket the next packet is always TextPacket with
all contents of that message
5) there are only 7 types of returned packets in the standard mode: InputNamePacket, OutputNamePacket, ReturnExpressionPacket, DisplayPacket, DisplayEndPacket,
MessagePacket, TextPacket.
Which of these statements are true?
1 is probably not guaranteed.
2 is definitely not true (evaluate: "2+2\n2+3").
3 is probably true but probably not guaranteed.
I believe 4 is true.
5 is not guaranteed.
In general you should write your code to not rely on the order of packets coming from the kernel. The evaluation should be considered "active" until you receive a new InputNamePacket. OutputNamePacket should update some variable. ReturnExpressionPacket should use the current output name from that variable. If you receive an unknown packet simply ignore it and move on to the next packet.
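A minimal sketch of that dispatch loop, written as Python-style pseudocode rather than against any particular MathLink binding ( read_packet and the pkt.kind / pkt.contents attributes are hypothetical stand-ins for whatever API you actually use ):

def drain_evaluation(read_packet):
    """Consume packets until the next InputNamePacket; return what was collected."""
    current_out, results, messages = None, [], []
    while True:
        pkt = read_packet()
        if pkt.kind == "InputNamePacket":            # next prompt: this evaluation is over
            return results, messages
        elif pkt.kind == "OutputNamePacket":         # remember the current Out[n]= label
            current_out = pkt.contents
        elif pkt.kind == "ReturnExpressionPacket":   # pair the expression with the last label seen
            results.append((current_out, pkt.contents))
        elif pkt.kind == "MessagePacket":            # the following TextPacket carries the message text
            messages.append(read_packet().contents)
        else:
            pass                                     # unknown packet type: ignore it and move on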
