Let's assume I have two documents in the same collection/partition, both at "version 1": A1, B1.
I update A1 -> A2, the write operation returns a session token SA.
Using SA to read document A will guarantee I get version A2.
Now I update B1 -> B2, and get a new session token SB.
Using SB to read document B will guarantee I get version B2.
My question is:
does using token SB guarantee I can see older writes as well?
I.e. will reading A with token SB always get me A2 ?
Yes. In your case SB > SA, and hence reading with SB will ensure you see the latest version of A (A2).
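For illustration, here is a minimal sketch with the azure-cosmos Python SDK. I'm assuming the SDK exposes the x-ms-session-token response header via client_connection.last_response_headers and accepts a session_token keyword on reads; the account, database, container, and item names are placeholders.

from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("db").get_container_client("coll")

# Write B2; the session token returned for this write (SB) also covers the
# earlier write of A2 (SA), since tokens for the same partition are monotonic.
container.upsert_item({"id": "B", "pk": "p1", "version": 2})
sb = container.client_connection.last_response_headers.get("x-ms-session-token")

# Reading A with SB should then observe at least A2.
item_a = container.read_item(item="A", partition_key="p1", session_token=sb)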
I have a vinyl space with some entities that are linked to each other by application logic. Inserting or updating any entity requires some calculations and an update of the linked entities too.
Let's say we have some entities with ids: e1, e2, e3 ... eN, eK
Any entity is updated by code like this:
function updateEntity(eN)
    space:update(eN)
    if eN_is_linked_to_eK then  -- pseudocode: eN is linked to eK
        -- Read the linked entity eK and do some calculations.
        local eK = space:get('eK')
        -- Calculations using eK (time consuming)
        ...
        -- Modify the linked entity eK
        space:update(eK)
        -- Some other calculations (time consuming)
        ...
        -- Use the linked entity eK again later somewhere else.
        local eKAgain = space:get('eK')
    end
end
updateEntity() is exposed globally (box.schema.func.create('updateEntity') + rawset(_G, 'updateEntity', updateEntity)) and is called from outside via the Node.js connector.
Problem:
When I call updateEntity() very frequently for entities that are linked to the same entity eK, I get multiple warnings like
get(['eK']) => ... took too long: 150.879 sec.
The 'stuck' time varies from 1 to 1500 seconds! So obviously I have some storage locks or something like that.
Questions:
How can this happen at all? I thought Tarantool was single-threaded, so if I call updateEntity(), another call to updateEntity() should only be possible once the first one has finished?
Can I solve this problem using fibers, running each updateEntity() call in its own fiber and making all the inner operations on the eK entity a transaction? Or do I misunderstand the purpose of fibers?
Maybe the problem is somewhere else and I'm missing something?
The vinyl engine supports multiversion concurrency control by default; you can read about it in more detail here.
Consider five nodes (S1, S2, S3, S4, S5) storing key/value data, and suppose the following steps happen:
S3 becomes leader and a client writes a key equal to 2 to the cluster.
S3 appends the entry to S1, S2, S4, S5; they all accept it and write the entry to their own logs (as shown in picture 1).
S3 receives a majority of responses (from S1, S2, S4, S5), commits the entry with key equal to 2, sends a write-succeeded message to the client, and then sends a commit message to S1, S2, S4, S5.
S2, S4, S5 receive the commit message and commit the entry successfully, but S1 crashes. As in picture 2, S1 is left with an uncommitted entry.
Now S1 restarts and becomes leader with an uncommitted entry for the key equal to 2 (as shown in picture 3).
After the above steps, the client queries the entry it just successfully wrote (key equal to 2). But now S1 is leader and that entry is uncommitted on S1, so the client cannot find the entry it wrote a moment ago. Where did I make a mistake? Please help me.
(picture 1)
(picture 2)
(picture 3)
I found the answer in the more detailed version of Diego Ongaro's paper, In Search of an Understandable Consensus Algorithm (Extended Version), Section 8 (Client interaction):
Raft handles this by having each leader commit a blank no-op entry into the log at the start of its term.
After this, in the question's scenario, S1 will commit the entry at log index 1 by committing a blank no-op entry.
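For illustration, here is a small Python sketch of the two rules at work (my own simplification, not code from the paper): the new leader appends a blank no-op entry from its current term, and the commit index may only advance over an entry whose term equals the leader's current term (sections 8 and 5.4.2).

NOOP = {"type": "no-op"}

def on_become_leader(log, current_term):
    # Appending a no-op from the current term gives the leader an entry it is
    # allowed to commit; committing it implicitly commits every earlier entry,
    # including S1's "key = 2" entry from the question.
    log.append({"term": current_term, "command": NOOP})

def advance_commit_index(log, match_index, current_term, commit_index, cluster_size):
    # match_index: highest log index known to be replicated on each follower.
    majority = cluster_size // 2 + 1
    for n in range(len(log), commit_index, -1):            # 1-based log indices
        replicated = 1 + sum(1 for m in match_index.values() if m >= n)
        if replicated >= majority and log[n - 1]["term"] == current_term:
            return n                                        # new commit index
    return commit_index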
Seeking advice on the COA correlation issue described below.
Background: application A feeds data to application B via MQ (nothing special: a remote queue definition pointing to a local queue definition on the remote queue manager). The sending app A requests COAs (confirm-on-arrival reports); a sketch of such a put is shown below the diagram. This setup has been stable and working for years:
App A -> QM.A[Q1] -channel-> QM.B[Q2] -> App B
Here:
Q1 is a remote queue definition pointing to Q2.
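For context, a COA request on the sending side typically looks something like the following. This is a hedged pymqi sketch, not app A's actual code; the queue manager, channel, connection, and queue names are placeholders.

import pymqi
from pymqi import CMQC

qmgr = pymqi.connect("QM.A", "APP.SVRCONN", "mqhost(1414)")
queue = pymqi.Queue(qmgr, "Q1")

md = pymqi.MD()
# Ask the destination queue manager to send a confirm-on-arrival report back
# to COA.REPLY.Q, copying the MsgId of the original message into the CorrelId
# of the report so the sender can correlate it.
md.Report = CMQC.MQRO_COA | CMQC.MQRO_COPY_MSG_ID_TO_CORREL_ID
md.ReplyToQ = b"COA.REPLY.Q"
md.ReplyToQMgr = b"QM.A"

queue.put(b"payload", md)
queue.close()
qmgr.disconnect()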
Problem: application C requires exactly the same data feed that A is sending to B via MQ, so the feed has to be duplicated, subject to the following constraint.
Constraint: neither the code nor the configuration of applications A and B can be changed; duplicating the data feed from A to B must be transparent to both. A keeps putting messages to the same queue Q1 on QM.A, and B keeps getting messages from the same queue Q2 on QM.B.
Proposed solution: duplicate the feed at the MQ layer by creating a topic/subscriber configuration on app B's queue manager:
App A -> QM.A[Q1] -channel-> QM.B[QA->T->{S2,S3}->{Q2,Q3}] -> {App B, QM.C[Q4] -> App C}
Here:
Q1 - has its RNAME property updated to point to QA (the topic alias) instead of Q2
QA - Queue Alias for Topic T
T - Topic
S2, S3 - subscriptions delivering the published messages to Q2 and Q3
Q2 - unchanged, the same local queue definition App B consumes from
Q3 - remote queue definition pointing to Q4
Q4 - local queue definition on QM.C, the queue receiving the copy of the messages sent from A to B
With this setup, duplication of the messages from app A to apps B and C works fine.
But ... there is an issue.
Issue: application A is not able to correlate the COAs it receives, and that is the problem.
I'm not sure whether app A cannot correlate COAs at all, or (the more likely guess) whether it cannot correlate the additional COAs, e.g. the ones coming from QM.C.
Any idea or advice is very much appreciated.
I have two different sources of data which I need to marry together. Data set A has a foo_key attribute which maps to data set B's bar_key attribute with a one-to-many relationship.
Data set A:
[{ foo_key: 12345, other: 'blahblah' }, ...]
Data set B:
[{ bar_key: 12345, other: '' }, { bar_key: 12345, other: '' }, { bar_key: 12345, other: '' }, ...]
Data set A is coming from a SQS queue and any relationships with data set B will be available as I poll A.
Data set B is coming from a separate SQS queue that I am trying to dump into a memcached cache to do quick look ups on when an object drops into data set A.
Originally I was planning on setting the memcached key to the bar_key from the objects in data set B, but then I realized that the value could be overwritten, since many objects can share the same bar_key. Then I thought I could make the key bar_key and the value an array of the SQS messages. But since I have multiple hosts polling the SQS queue, it is possible that while one host checks whether the key is in memcached, reads it, appends the new message, and sets it back, another host could be performing the same operation, and the first host's append would simply be overwritten.
I've looked around at memcached key locking but I'm not sure I understand it entirely. Would the solution be that when I get the key/value pair from memcached I create a temporary dummy lock on a new key called bar_key_dummy that expires in x seconds, and if I try to fetch a key that has a bar_key_dummy lock active I just send the SQS message back to the queue without deleting to try again in x seconds?
Here's some pseudocode for what I have going on in my head. Does this make any sense?
store = MemCache.new(host)

sqs_messages.poll do |message|
  dummy_key = "#{message.bar_key}_dummy"
  sqs.dont_delete_message && next unless store.get(dummy_key).nil?
  # set dummy_key in memcache with a value of 1 for 3 seconds
  store.set(dummy_key, 1, 3)
  temp_data = store.get(message.bar_key) || []
  temp_data << message
  store.set(message.bar_key, temp_data, 300)
  # delete dummy key when done, in case it took less than x seconds
  store.delete(dummy_key)
end
Thanks for any help!
Memcached has a special operation for this: cas, Compare-And-Swap.
The gets command returns the item along with its unique CAS value.
The data can then be modified, and the update must be issued with the cas command, which takes the original unique CAS value.
If the CAS value was changed between the two commands, the update operation fails with an EXISTS error.
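A minimal sketch of that read-modify-write loop in Python, assuming the pymemcache client and JSON-encoded lists (the client setup, key names, and TTLs are illustrative):

import json
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def append_message(bar_key, message, expire=300, max_retries=10):
    # Append `message` to the list cached under `bar_key` without losing
    # concurrent appends from other hosts, using gets + cas.
    key = str(bar_key)
    for _ in range(max_retries):
        value, cas_token = client.gets(key)
        if value is None:
            # No list yet: add() only succeeds if the key still does not exist,
            # so two hosts racing here cannot silently overwrite each other.
            if client.add(key, json.dumps([message]), expire=expire, noreply=False):
                return True
            continue  # another host created it first; re-read and retry
        messages = json.loads(value)
        messages.append(message)
        # cas() returns False if another host changed the value since our gets(),
        # in which case we retry the whole read-modify-write cycle.
        if client.cas(key, json.dumps(messages), cas_token, expire=expire, noreply=False):
            return True
    return False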
I am listening to a server which sends certain messages to me with sequence numbers. My client parses out the sequence number in order to keep track of whether we get a duplicate or whether we miss a sequence number, though it is called generically by a wrapper object which expects a single incremental sequence number. Unfortunately this particular server sends different streams of sequence numbers, incremental only within each substream. In other words, a simpler server would send me:
1,2,3,4,5,7
and I would just report back 1,2,3,4,5,6,7 and the wrapper tool would notify of having lost one message. Unfortunately this more complex server sends me something like:
A1,A2,A3,B1,B2,A4,C1,A5,A7
(except the letters are actually numerical codes too, conveniently). The above has no gaps except for A6, but since I need to report a single number to the wrapper object, I cannot report:
1,2,3,1,2,4,1,5,7
because that will be interpreted incorrectly. As such, I want to condense, in my client, what I receive into a single incremental stream of numbers. The example
A1,A2,A3,B1,B2,A4,C1,A5,A7
should really translate to something like this:
1,2,3,4 (because B1 is really the 4th unique message), 5, 6, 7, 8, 10 (since 9 could have been A6, B3, C2 or another letter-1)
then this would be picked up as having missed one message (A6). Another example sequence:
A1,A2,B1,A7,C1,A8
could be reported as:
1,2,3,8,9,10
because the first three are logically in a valid sequence without anything missing. Then we get A7 and that means we missed 4 messages (A3,A4,A5, and A6) so I report back 8 so the wrapper can tell. Then C1 comes in and that is fine so I give it #9, and then A8 is now the next expected A so I give it 10.
I am having difficulty figuring out a way to create this behavior though. What are some ways to go about it?
For each stream, make sure that the stream has the correct sequence. Then emit the count of all valid sequence numbers you've seen as the aggregate one. Pseudocode:
function initialize()
    for streamId in streams do
        streams[streamId] = 1          -- next expected seqno; sequences start at 1
    aggregateSeqno = 0

function process(streamId, seqno)
    if seqno == streams[streamId] then
        streams[streamId] = seqno + 1
        aggregateSeqno = aggregateSeqno + 1
        return aggregateSeqno
    else
        -- gap or duplicate: try to fix streams[streamId] by replying to the server

function main()
    initialize()
    while server not finished do
        (streamId, seqno) = receive()
        process(streamId, seqno)
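For reference, here is a runnable version of the same idea in Python, run against the first example sequence from the question (this assumes each stream's numbering starts at 1, as in the examples):

from collections import defaultdict

class Aggregator:
    # Maps per-stream sequence numbers (A1, B1, ...) onto a single incremental
    # sequence number by counting the messages that arrive in order.
    def __init__(self):
        self.next_expected = defaultdict(lambda: 1)  # next expected seqno per stream
        self.aggregate_seqno = 0

    def process(self, stream_id, seqno):
        if seqno == self.next_expected[stream_id]:
            self.next_expected[stream_id] = seqno + 1
            self.aggregate_seqno += 1
            return self.aggregate_seqno
        # Gap or duplicate in this stream: hand it to recovery logic
        # (e.g. ask the server for a resend) instead of numbering it.
        return None

agg = Aggregator()
for stream_id, seqno in [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2),
                         ("A", 4), ("C", 1), ("A", 5), ("A", 7)]:
    print(stream_id, seqno, "->", agg.process(stream_id, seqno))
# A7 arrives while A6 is still missing, so it gets no aggregate number here;
# this is the point where the wrapper would be told that a message was lost.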