I'm using Tarantool 1.5 and Lua procedures.
The documentation says a Lua procedure can yield execution to another procedure after a network or I/O operation, for example a box.update call.
My main question is: if I get the returned tuple from box.update, does it contain the state "after update, before yield" or "after update, after yield"?
Also, what are the best practices for preventing possible race conditions?
If you need something like a transaction in 1.5, either make the operation idempotent or re-select the data and re-check it after any yielding operation (update/delete/replace).
I have a database filled with rows and multiple threads that access these rows, feed some of their data into a function, produce an output, and then fill the row's missing columns with that output.
Here's the issue: each row has an unprocessed flag which is true by default, so each thread looks for rows with this flag. It turns out that each thread gets the SAME row, because the row is only marked as processed after the thread's job is complete, which may take a few seconds.
One way I avoided this was to add a currently_processed flag to each row, initially false, and set it to true once a thread accesses the row. When the thread is done, it changes the flag back to false. The problem with this is that I have to use some sort of locking so that no other thread can do anything until this happens. I was wondering if there's an alternative approach where I wouldn't have to do thread locking (via a mutex or something) and thus slow down the whole process.
If it helps, the code is in Ruby, although the problem is language agnostic. Here's the code to demonstrate the type of threading I'm using; nothing special, just the low-level threading that almost all languages have:
# times.map (not times) collects the Thread objects so they can be joined
3.times.map do
  Thread.new do
    row = get_database_row
    result = do_some_processing(row)
    insert_results_into_row(result)
  end
end.each(&:join)
The "real" answer here is that you need a database transaction. When one thread gets that row, then the database needs to know that this row is currently up for processing.
You can't resolve that within your application! You see, when two threads look at the same row at the same time, they could both try to write that flag, and yes, it certainly changes to "currently processed", but then both threads update the row data and write it back. That may not be a problem if every run of the processing produces the same final result; if not, all kinds of data integrity problems will arise.
So the real answer is to step back and look at how your specific database is designed to deal with such things.
I was wondering if there's an alternative approach where I wouldn't have to do thread locking (via a mutex or something) and thus slow down the whole process.
There are some ways to do this:
1) One common dispatcher for all threads. It reads all rows and puts them into a shared queue, from which the processing threads take rows.
2) Go deeper into the DB: find out whether it supports something like Oracle's SELECT ... FOR UPDATE SKIP LOCKED syntax and use it. In Oracle you need to use this syntax in a cursor, and the interaction is somewhat cumbersome, but at least it works.
3) Partition the input by, say, worker thread index. With 3 workers, the 1st worker only processes rows 1, 4, 7, etc., and the 2nd worker only processes rows 2, 5, 8, etc.
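Options 1 (single dispatcher feeding a shared queue) and 3 (partitioning by worker index) can be sketched in Go, with goroutines standing in for the Ruby threads. `Row` and `process` are hypothetical stand-ins for the database row and the do_some_processing step. This only coordinates workers inside one process; it does not replace a database-level lock if several processes read the same table.

```go
package main

import (
	"fmt"
	"sync"
)

// Row is a hypothetical stand-in for a database row.
type Row struct {
	ID     int
	Result int
}

// process is a hypothetical placeholder for do_some_processing.
func process(r Row) int { return r.ID * r.ID }

// dispatch implements option 1: one dispatcher sends each row index into a
// shared channel, and each index is received by exactly one worker.
func dispatch(rows []Row, workers int) []Row {
	jobs := make(chan int) // indexes into rows
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// No other worker receives i, so writing rows[i] is race-free.
				rows[i].Result = process(rows[i])
			}
		}()
	}
	for i := range rows {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops end
	wg.Wait()
	return rows
}

// partition implements option 3: worker w handles only positions
// w, w+workers, w+2*workers, ..., so the workers never overlap.
func partition(rows []Row, workers int) []Row {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(rows); i += workers {
				rows[i].Result = process(rows[i])
			}
		}(w)
	}
	wg.Wait()
	return rows
}

func main() {
	newRows := func() []Row {
		rows := make([]Row, 9)
		for i := range rows {
			rows[i] = Row{ID: i + 1}
		}
		return rows
	}
	fmt.Println(dispatch(newRows(), 3))
	fmt.Println(partition(newRows(), 3))
}
```

With 3 workers and 0-based positions, worker 0 in `partition` handles IDs 1, 4, 7, matching the example in option 3.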
I am performing an import of data wrapped in a CMSTransactionScope.
What would be the most efficient and practical way to import data in parallel and roll back if there are any errors? The problem I see is that, with the import running in parallel, I don't know whether the inserted objects can be part of the transaction if they are created on a new thread.
Is there any way to do this or should it be handled differently?
If you're running the code in parallel to achieve better performance, and you are basically inserting rows one by one, then it's unlikely to perform any better than it would in a single thread.
In this case I'd recommend using one thread in combination with CMSTransactionScope, and potentially ConnectionHelper.BulkInsert.
Anyway, if you still want to run your queries in parallel then you need to implement some kind of synchronization (locking, for instance) to ensure that all statements are executed before the code hits CMSTransactionScope.Commit() (this basically means a performance loss). Otherwise, queries would get executed in separate transactions. Moreover, you have to make sure that the CMSTransactionScope object always gets instantiated with the same IDataConnection (this should happen by default when you don't pass a connection to the constructor).
The second approach seems error-prone to me; I'd rather look at other ways of optimizing the code (using async, etc.).
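The single-writer idea can be sketched in Go: the per-record work runs in parallel, but every insert happens on one goroutine, inside one transaction, and only after all workers have finished, so a single failure means nothing is committed. `Tx` and `transform` are hypothetical in-memory stand-ins, not Kentico's CMSTransactionScope or ConnectionHelper API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Tx is a hypothetical in-memory transaction: inserts are buffered and
// only appended to the table on Commit.
type Tx struct {
	table   *[]string
	pending []string
}

func (tx *Tx) Insert(s string) { tx.pending = append(tx.pending, s) }

func (tx *Tx) Commit() {
	*tx.table = append(*tx.table, tx.pending...)
	tx.pending = nil
}

func (tx *Tx) Rollback() { tx.pending = nil }

// transform is a hypothetical per-record step that fails on empty input.
func transform(s string) (string, error) {
	if s == "" {
		return "", errors.New("empty record")
	}
	return strings.ToUpper(s), nil
}

// importAll transforms records in parallel, then writes every result from
// this goroutine inside one transaction; any failure rolls everything back.
func importAll(table *[]string, records []string) error {
	results := make([]string, len(records))
	errs := make([]error, len(records))
	var wg sync.WaitGroup
	for i, r := range records {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			results[i], errs[i] = transform(r) // each goroutine owns slot i
		}(i, r)
	}
	wg.Wait() // all workers are done before the transaction is touched

	tx := &Tx{table: table}
	for i, err := range errs {
		if err != nil {
			tx.Rollback()
			return fmt.Errorf("record %d: %w", i, err)
		}
		tx.Insert(results[i])
	}
	tx.Commit()
	return nil
}

func main() {
	var table []string
	err := importAll(&table, []string{"a", "b"})
	fmt.Println(err, table)
	err = importAll(&table, []string{"c", ""})
	fmt.Println(err, table)
}
```

Only the CPU-bound part runs in parallel here; the database work stays on one thread, which is the trade-off the answer recommends.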
My Golang code gets different records from the database using goroutines and increments the value in a specific field of each record.
I can avoid the race condition if I use a Mutex or Channels, but then I have a bottleneck, because every access to the database waits until the previous access is done.
I think I should use something like one Mutex for every different record, instead of a single Mutex for all of them.
How could I do it?
Thanks for any reply.
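One way to get one mutex per record, assuming all goroutines run in the same process, is a map of mutexes keyed by record ID. A minimal sketch; `keyedMutex` and `countConcurrently` are hypothetical names, and the `*int` values stand in for the database records. The lock map is never pruned, so it grows with the number of distinct keys.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per key, so goroutines working on
// different records do not block each other.
type keyedMutex struct {
	mu    sync.Mutex // guards the locks map itself
	locks map[string]*sync.Mutex
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: make(map[string]*sync.Mutex)}
}

// lock acquires the mutex for key (creating it on first use) and returns
// a function that releases it.
func (k *keyedMutex) lock(key string) func() {
	k.mu.Lock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	k.mu.Unlock()
	m.Lock()
	return m.Unlock
}

// countConcurrently spreads n increments across records "a" and "b".
func countConcurrently(n int) (int, int) {
	km := newKeyedMutex()
	// The map is only read concurrently; each *int is modified only while
	// its key's mutex is held.
	records := map[string]*int{"a": new(int), "b": new(int)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		key := "a"
		if i%2 == 1 {
			key = "b"
		}
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			unlock := km.lock(key)
			defer unlock()
			*records[key]++ // read-modify-write under the per-key lock
		}(key)
	}
	wg.Wait()
	return *records["a"], *records["b"]
}

func main() {
	fmt.Println(countConcurrently(100)) // 50 50
}
```

This removes contention between different records, but it only protects access within one process; the answers below point to database-side mechanisms that also work across processes.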
In the comments you said you are using Couchbase. If the record you wish to update consists of only an integer, you can use the built in atomic increment functionality with Bucket.Incr.
If the value is part of a larger document, you can use the database's "Check and Set" functionality. In essence, you want to create a loop that does the following:
1) Retrieve the record to be updated along with its CAS value using Bucket.Gets.
2) Modify the document returned by (1) as needed.
3) Store the modified document using Bucket.Cas, passing the CAS value retrieved in (1).
4) If (3) succeeds, break out of the loop. Otherwise, start again at (1).
Note that it is important to retrieve the document fresh each time in the loop. If the update fails due to an incorrect CAS value, it means the document was updated between the read and the write.
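The loop above can be sketched in Go against a hypothetical in-memory store whose `gets` and `casStore` methods mimic the Gets/Cas behaviour described; they are not the Couchbase client's actual signatures.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrCASMismatch means the document changed since it was read.
var ErrCASMismatch = errors.New("cas mismatch")

type doc struct {
	Name  string
	Count int
}

// store is a hypothetical stand-in for a bucket with Gets/Cas-style calls.
type store struct {
	mu   sync.Mutex
	docs map[string]doc
	cas  map[string]uint64
}

// gets returns a copy of the document together with its current CAS value.
func (s *store) gets(key string) (doc, uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.docs[key], s.cas[key]
}

// casStore writes d only if the stored CAS value still equals cas.
func (s *store) casStore(key string, d doc, cas uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cas[key] != cas {
		return ErrCASMismatch
	}
	s.docs[key] = d
	s.cas[key]++ // every successful write changes the CAS value
	return nil
}

// increment runs the read-modify-write loop, re-reading on every attempt.
func increment(s *store, key string) error {
	for {
		d, cas := s.gets(key)          // (1) fresh read with CAS value
		d.Count++                      // (2) modify the local copy
		err := s.casStore(key, d, cas) // (3) conditional write
		if err == nil {
			return nil // (4) success: leave the loop
		}
		if !errors.Is(err, ErrCASMismatch) {
			return err
		}
		// The document changed between (1) and (3); start again.
	}
}

// runIncrements performs n concurrent increments and returns the count.
func runIncrements(n int) int {
	s := &store{docs: map[string]doc{"k": {Name: "k"}}, cas: map[string]uint64{"k": 1}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := increment(s, "k"); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	d, _ := s.gets("k")
	return d.Count
}

func main() {
	fmt.Println(runIncrements(50)) // 50
}
```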
If the database has a way of handling that (e.g. a built-in atomic counter), you should use it.
If not, or if you want to do this in Go, you can use buffered channels. Sending on a buffered channel does not block unless the buffer is full.
Then, to handle the increments one at a time, you could run something like this in a goroutine:
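// Runs in a single goroutine, so increments are applied one at a time.
// The loop ends once CounterChan is closed and drained.
for value := range CounterChan {
	incrementCounterBy(value)
}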
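A complete, runnable version of this single-owner pattern, assuming the increments are plain integers. `sumConcurrently` and `counterOwner` are hypothetical names, and the `total += v` line stands in for `incrementCounterBy`:

```go
package main

import (
	"fmt"
	"sync"
)

// counterOwner is the only goroutine that touches total, so no mutex is
// needed; everyone else sends increments over the channel.
func counterOwner(increments <-chan int, result chan<- int) {
	total := 0
	for v := range increments {
		total += v // stands in for incrementCounterBy(value)
	}
	result <- total
}

// sumConcurrently sends 1..n from n goroutines and returns the total.
func sumConcurrently(n int) int {
	increments := make(chan int, 100) // buffered: sends block only when full
	result := make(chan int)
	go counterOwner(increments, result)

	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			increments <- v
		}(i)
	}
	wg.Wait()
	close(increments) // lets the owner's range loop finish
	return <-result
}

func main() {
	fmt.Println(sumConcurrently(10)) // 55
}
```

Closing the channel only after every sender has finished (via the WaitGroup) avoids a send on a closed channel.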
I have a process flow built by someone before me which calls a very simple stored procedure. Upon completion of the procedure, the process flow has 2 transitions: one if the stored procedure was successful and the other if not. However, the stored procedure itself does not return anything that can be directly evaluated by the process flow, like a return result. If this procedure fails (with the ubiquitous max extents problem), the process flow takes the branch that calls a stored procedure to send a failure email. If it succeeds, the other branch is taken.
I had to tweak the procedure, so I created a new one. Now the success branch is taken regardless of whether it fails or succeeds. I have checked all the Oracle docs on how to make this work and for the life of me cannot determine how to do it correctly. I first posted this on the Oracle forum and got no responses. Does anyone have an idea how to make this work?
According to the Oracle Warehouse Builder guide:
When you add a transition to the canvas, by default, the transition has no condition applied to it.
Make sure that you have correctly defined a conditional transition as described in the Defining Transition Conditions section of the documentation.
A User Defined Activity will return an ERROR outcome if:
it raises an exception, or
it returns the value 3 and the Use Return as Status option is set to true
"However, the stored procedure itself does not return anything that can be directly evaluated by the process flow like a return result."
This is the crux: if the original procedure produces no signal, how can you tell whether it was successful? Indeed, what is the definition of success in this circumstance?
I don't understand why, when you had to "tweak the procedure", you wrote a new one instead of, er, tweaking the original procedure. The only way you're going to solve this is to get some feedback out of the original procedure.
At this point we run out of details. The direct option is to edit the original procedure so it passes back result info, probably through OUT parameters or by introducing some logging capability. Alternatively, rewrite it to raise an exception on failure. The indirect option is to write some queries to establish what the procedure achieved on a given run, and then figure out whether that constitutes success.
Personally, re-writing the original procedure seems the better option.
If this answer doesn't help you then you need to explain more about your scenario. What your procedure does, how you need to evaluate it, why you can't re-write it.
I was wondering if there is a way to perform actions in a function after it returns a value.
For example, there is a method which returns a string. After the string is returned, I want the method to perform another action, like checking whether a condition is met so it can send out a notification or something else. Is that somehow possible?
The thing is that I am using a framework called Core Plot to add some plots to my application. Unfortunately, this framework does not have a didFinishAddingPlot method, so I have to program a mechanism myself that notifies me whenever the plot has finished plotting. When the addPlot method is called, another method is called which goes through an array of values and returns the value for a specific index to plot. My idea was to put in an "if (condition)" block that checks whether the index equals the count of my values array, so I know it is now fetching the last value. However, it first needs to return the value before sending a message that it has finished plotting; otherwise the last value won't get passed.
As soon as a function hits a return statement the function stops running. You would need to perform whatever other action you want to do before you return.
So, you want to return a value from your function or method, which by definition returns control (as well as your answer) to the call site. On the face of it, it's not possible; you've returned control, so you're done.
But you could spawn a new Thread during your method execution, to (for example) perform some cleanup tasks later on.
Since you tagged the question as Cocoa, check out Apple's Threaded Programming Guide, which will teach you about NSThread, POSIX threads, and more.
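In Go, the closest analogue to spawning a thread is starting a goroutine. A minimal sketch, assuming a hypothetical data source whose `valueForIndex` method is called once per index by the plotting code: on the last index it hands the "finished" signal to a new goroutine and returns the value immediately. The goroutine runs concurrently, so nothing guarantees it runs only after the caller has used the value; in this sketch the ordering comes from `plotAll` receiving on `done` after its loop.

```go
package main

import "fmt"

// valueForIndex is a hypothetical data-source method: it returns values[i]
// and, on the last index, signals done from a separate goroutine so the
// return itself is not delayed.
func valueForIndex(values []float64, i int, done chan<- struct{}) float64 {
	v := values[i]
	if i == len(values)-1 { // the last index is len-1, not len
		go func() { done <- struct{}{} }()
	}
	return v
}

// plotAll plays the role of the plotting framework: it asks for every
// value, then waits for the "finished" signal.
func plotAll(values []float64) []float64 {
	if len(values) == 0 {
		return nil // no last index, so no signal would ever arrive
	}
	done := make(chan struct{}, 1)
	var plotted []float64
	for i := range values {
		plotted = append(plotted, valueForIndex(values, i, done))
	}
	<-done // all values have been consumed by the time this returns
	return plotted
}

func main() {
	fmt.Println("finished plotting", plotAll([]float64{1.5, 2.5, 3.5}))
}
```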