Number sequences issues - session

We have an issue, more often than I would like, where either worker or client sessions crash while they are in the process of using a number sequence to create a new record. They end up blocking that number sequence, and anyone else trying to create a record using the same sequence will have their client frozen.
When this happens, I usually go into the NUMBERSEQUENCELIST table, find the correct DataAreaId and user, and delete the row whose Status = 1.
But this is kind of annoying, really. Is there any way I can configure the AOS server to release the number sequence when clients/workers crash?
For the worker sessions, I guess we can fine-tune the code that runs in them, but for client sessions crashing, there is not much we can do...
Any ideas?
Thanks!
EDIT: Turns out that in this situation, after restarting the AOS server, you can go in List in the number sequence menu, and clean it up. Prior to the restart, my client would freeze trying to do that. So no need to do it directly through SQL.

Continuous numbers in NumberSequenceList are automatically cleaned up every 24 hours (or as set up on the number sequence). The cleanup process is quite slow if there are many "dead" numbers (hundreds or thousands). This may be considered as a hang, but is not.
Things to consider:
Is a continuous number sequence needed?
Do the cleanup more frequently (say every half hour instead of the default 24 hours)
Set up the cleanup process as a batch process
Fix the bug in the client code using the number sequence
Also avoid reserving the number, just use it. Instead of the anti-pattern:
NumberSeq idSequence = NumberSeq::newGetNum(IntrastatParameters::numRefIntrastatArchiveID(), true);
this.IntrastatArchiveID = idSequence.num();
idSequence.used();
Just use the number:
this.IntrastatArchiveID = NumberSeq::newGetNum(IntrastatParameters::numRefIntrastatArchiveID()).num();
The makeDecisionLater parameter should only be used in forms, where the user may decide not to use the number (by deleting or by pressing escape). And in that case the NumberSeqFormHandler class should be used anyway.

Related

Auto save performance for rdbms

In my app user types in some content which I would like to auto save as the user types. The save call is not for every keystroke, rather I do autosave only when user pauses for more than 200ms. So in a typical paragraph there are 15-20 server calls. The content will not be read very often, so I need to optimize the writes.
I have to save data on MSSQL Server because of legacy code reasons. I'm getting 10 seconds avg response time in my load test. How do I improve the performance?
One approach I'm considering is, instead of directly saving data in MSSQL, to save it in Cassandra or Redis and then eventually (maybe at regular time intervals) write it to MSSQL.
Another approach is instead of doing frequent updates, I'll insert new record for each auto save. Then a background process will clean up all records except for latest, every few minutes.
Update:
I replaced the existing logic with simple update calls to 2 tables and now I am seeing improvements. There was a long stored procedure which was taking up to 10 seconds under load. So for now I have put the problem on hold. Still, I would like to know whether there is something I can do on the application server layer to reduce frequent DB calls.
It is quite hard to answer your question directly, but here are some hints based on what we do in a multiple active user situation.
If you are writing/triggering on every keystroke, pass the keystroke to a background thread and do not perform the database write, or any network call, while blocking the users typing. A fast typist can hit 20 keystrokes/second, and you cannot afford to introduce latency.
If recording on a web page, you might be able to use localStorage. Do not issue an AJAX-style call on every keystroke, as there is a limit to outstanding requests. You need to implement some kind of buffered send. Remember that network calls in the real world can take on the order of 300 ms just to traverse the network.
Do you really need to save every keystroke, or is every N seconds acceptable? Every save operation will eventually turn into a disk operation, so you really want to coalesce as many saves as possible. The quickest way to do something is not to do it at all.
If you are recording to a database, then it is often quicker to update an existing row, if you can fetch it by direct key first. Unfortunately, it can sometimes be quicker to insert a new row and clean up the excess later. This tends to be true if the table has few indexes. Which is quicker depends on the database engine in use and how it is being used. We use both methods.
When using a database keep in mind that they often keep journals of some kind, so if you are updating frequently you might create a large load on the journal files.
If you are using techniques (Using C terminology) like fopen, fwrite these can perform very well, but if you are worried about system failure recovery, you may need to call fsync, which then limits your maximum performance rate. If you need fsync, a database might be better.
You might like to consider writing to a transaction log table very frequently, and then posting to the real storage every N seconds. For example, if I am typing a customer's name I might record every keystroke into a keylog table, and then have a background job read the keylog table and transfer the data to the customers table. This reduces the operations on the customers table while also allowing the keylog table to be optimised for recording keystrokes, but at the cost of more code server side.
Overall, you want logic like this:
On keyup handler
    Add keystroke to background queue
    Wake background thread
Background thread
    Read/remove ALL data from background queue
    If no data, wait for wakeup and repeat
    Write to database/network/file etc. as one operation (this can now be synchronous calls)
    Optionally some velocity control; a simple one is sleep(50 ms) or sleep(2 s)
    Repeat
Keep in mind with the above the user can type and immediately hit close, so your final buffer write might not have flushed yet. You need to handle this.
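The keyup handler and background thread described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the CoalescingWriter class, its write_batch callback and the min_interval parameter are all hypothetical names, and the "database" here is just a Python list. The close() method shows one way to handle the final flush when the user types and immediately closes.

```python
import queue
import threading
import time


class CoalescingWriter:
    """Buffers keystrokes and writes them in batches from a background thread."""

    def __init__(self, write_batch, min_interval=0.05):
        self._write_batch = write_batch    # hypothetical sink: receives a list of items
        self._min_interval = min_interval  # simple velocity control between writes
        self._queue = queue.Queue()
        self._stop = object()              # sentinel that tells the worker to exit
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_keyup(self, key):
        # Called from the UI thread: only enqueues, never waits on the database.
        self._queue.put(key)

    def close(self):
        # Ask the worker to flush whatever is still buffered, then wait for it.
        self._queue.put(self._stop)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()       # block until there is something to do
            batch, stopping = [], False
            while True:                    # drain everything queued so far
                if item is self._stop:
                    stopping = True
                else:
                    batch.append(item)
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
            if batch:
                self._write_batch(batch)   # one synchronous write per batch
            if stopping:
                return
            time.sleep(self._min_interval)


# Usage: the "database" is a list that collects one entry per batch written.
saved_batches = []
writer = CoalescingWriter(saved_batches.append)
for ch in "hello world":
    writer.on_keyup(ch)
writer.close()
typed_text = "".join(key for batch in saved_batches for key in batch)
```

Because a single consumer drains a FIFO queue, keystrokes reach the sink in the order they were typed, and the number of writes is usually far smaller than the number of keystrokes.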
If you get this correct, the user will not notice any delay. In our usage, we are recording around 1000 keystrokes/sec on average, all of which are routed over private networks to central points. This load is barely a blip; even network monitoring does not see such a small amount of traffic.
Good luck.

How to 'lock' database rows being processed

I have a database filled with rows and multiple threads that are accessing these rows, inputting some of the data from them in a function, producing an output, and then filling the row's missing columns with the output.
Here's the issue: Each row has an unprocessed flag which is, by default, true. So each thread is looking for rows with this flag. But each thread is getting the SAME row, it turns out...because the row is being marked as processed after the thread's job is complete, which may happen after a few seconds.
One way I avoided this was to insert a currently_processed flag for each row, mark it as false, and once a thread accesses the row, change it to true. Then when the thread is done, just change it back to false. The problem with this is that I have to use some sort of locking and not allow any other thread to do anything until this occurs. I was wondering if there's an alternative approach where I wouldn't have to do thread locking (via a mutex or something) and thus slow down the whole process.
The code is in Ruby, but the problem is language agnostic. Here's the code to demonstrate the type of threading I'm using. Nothing special, just threading at the lowest level like almost all languages have:
3.times.map do
  Thread.new do
    row = get_database_row
    result = do_some_processing(row)
    insert_results_into_row(result)
  end
end.each(&:join)
The "real" answer here is that you need a database transaction. When one thread gets that row, then the database needs to know that this row is currently up for processing.
You can't resolve that within your application! You see, when two threads look at the same row at the same time, they could both try to write that flag ... and yep, it for sure changes to "currently processed"; and then both threads will update row data and write that back. Maybe that is not the problem if any processing results in the same final result; but if not, then all kinds of data integrity problems will arise.
So the real answer is that you step back and look how your specific database is designed in order to deal with such things.
I was wondering if there's an alternative approach where I wouldn't have to do thread locking (via a mutex or something) and thus slow down the whole process.
There are some ways to do this:
1) One common dispatcher for all threads. It should read all rows and put them into a shared queue from which the processing threads will get rows.
2) Go deeper into the DB: find out if it supports something like Oracle's "SELECT ... FOR UPDATE SKIP LOCKED" syntax and use it. For Oracle you need to use this syntax in a cursor, which makes for somewhat cumbersome interaction, but at least it can work this way.
3) Partition input by, say, index of worker thread. So 1st worker out of 3 will only process rows 1,4,7 etc. 2nd worker will only process rows 2, 5, 8 etc.
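Option 1 (a common dispatcher feeding a shared queue) could look something like the sketch below. It is written in Python rather than Ruby only for illustration; process_rows, the row dictionaries and the lambda standing in for the real processing are all hypothetical. The point is that the queue, not a flag in the database, guarantees that each row is handed to exactly one worker.

```python
import queue
import threading


def process_rows(rows, process, worker_count=3):
    """Run `process` on every row exactly once using a shared work queue."""
    work = queue.Queue()
    for row in rows:                      # the dispatcher is the only reader of the table
        work.put(row)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                row = work.get_nowait()   # each row is handed to exactly one worker
            except queue.Empty:
                return                    # nothing left to do
            output = process(row)
            with results_lock:            # protects the shared results dict only
                results[row["id"]] = output

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Usage with in-memory rows standing in for the database table.
rows = [{"id": i, "value": i * 10} for i in range(1, 10)]
results = process_rows(rows, lambda row: row["value"] + 1)
```

This only works when all workers live in one process; workers on separate hosts would need option 2 or 3, or a queue that lives outside the process.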

Sequel (Ruby), how to increment and use a DB counter in a safe way?

I found 4 "proper" ways to do this:
In the cheat sheet for ActiveRecord users, the substitutes for ActiveRecord's increment and increment_counter are supposed to be album.values[:column] -= 1 # or += 1 for increment and album.update(:counter_name=>Sequel.+(:counter_name, 1))
In a SO solution, update_sql is suggested for the same effect: s[:query_volume].update_sql(:queries => Sequel.expr(3) + :queries)
In a random thread I found this one: dataset.update_sql(:exp => 'exp + 10'.lit)
In Sequel's API docs for update I found this solution: http://sequel.jeremyevans.net/rdoc/classes/Sequel/Dataset.html#method-i-update
yet none of the solutions actually update the value and return the result in a safe, atomic way.
Solutions based on "adding a value and then saving" should, afaik, fail nondeterministically in multiprocessing environments resulting with errors such as:
album's counter is 0
thread A and thread B both fetch album
thread A and thread B both increment the value in the hash/model/etc
thread A and thread B both update the counter to same value
as a result: A and B both set the counter to 1 and work with counter value 1
Sequel.expr and Sequel.+ on the other hand don't actually return a value, but a Sequel::SQL::NumericExpression and (afaik) you have no way of getting it out short of doing another DB roundtrip, which means this can happen:
album's counter is 0
thread A and B both increment the value, value is incremented by 2
thread A and B both fetch the row from the DB
as a result: A and B both set the counter to 2 and work with counter value 2
So, short of writing custom locking code, what's the solution? If there's none, short of writing custom locking code :) what's the best way to do it?
Update 1
I'm generally not happy with answers saying that I want too much of life, as 1 answer suggests :)
The albums are just an example from the docs.
Imagine, for example, that you have a transaction counter on an e-commerce POS which can accept 2 transactions at the same time on different hosts, and you need to send them to the bank with an integer counter that is unique within 24h (called systan). Send 2 trx with the same systan and 1 will be declined. Worse, gaps in the counts raise alerts (because they hint at "missing transactions"), so it's not possible to use the DB's ID value.
A less severe example, but more related to my use case, several file exports get triggered simultaneously in a background worker, every file destination has its own counter. Gaps in the counters are alerted, workers are on different hosts (so mutexes are not useful). And I have a feeling I'll soon be solving the more severe problem anyway.
The DB sequences are no good either, because it would mean doing DDL on the addition of every terminal, and we're talking 1000s here. Even in my less severe use case, DDLing on web portal actions is still a PITA, and might not even work depending on the caching scheme below (due to the implementation of ActiveRecord and Sequel, and in my case I use both, it might require a server restart just to register a merchant).
Redis can do this, but it seems insane to add another infrastructure component just for counters when you're sitting on an ACID-compliant database.
If you are using PostgreSQL, you can use UPDATE RETURNING: DB[:table].returning(:counter).update(:counter => Sequel.expr(1) + :counter)
However, without support for UPDATE RETURNING or something similar, there is no way to atomically increment and return the incremented value at the same time.
The answer is: in a multithreaded environment, don't use DB counters. When faced with this dilemma:
If I need a unique integer counter, I use a threadsafe counter generator that parcels out counters as threads require them. This can be a simple integer or something more complex like a Twitter Snowflake-like generator.
If I need a unique identifier, I use something like a UUID.
In your particular situation, where you need a count of albums - is there a reason you need this on the database rather than as a derived field on the model?
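As a rough illustration of the "threadsafe counter generator" idea, here is a minimal in-process sketch. It is shown in Python for brevity, and the CounterGenerator class and its next_value method are hypothetical names. It assumes every consumer runs in the same process, which is not the case for workers on different hosts.

```python
import itertools
import threading


class CounterGenerator:
    """Hands out consecutive integers; safe to call from many threads."""

    def __init__(self, start=1):
        self._lock = threading.Lock()
        self._counter = itertools.count(start)

    def next_value(self):
        with self._lock:                  # one caller at a time advances the counter
            return next(self._counter)


# Usage: 8 threads each take 100 values; no value is issued twice or skipped.
generator = CounterGenerator()
issued = []
issued_lock = threading.Lock()


def take(n):
    for _ in range(n):
        value = generator.next_value()
        with issued_lock:
            issued.append(value)


threads = [threading.Thread(target=take, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The values are gap-free only as long as every issued value is actually used and the process does not restart; persisting the last issued value is left out of this sketch.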
Update 1:
Given that you're dealing with something approximating file exports with workers on multiple hosts, you either need to parcel out the ids in advance (i.e. seed a worker with a job and the next available id from a single canonical source) or have the workers call in to a central service which allocates transaction ids on a first come first served basis.
I can't think of another way to do it. I've never worked with a POS system, but the telecoms network provisioning systems I've worked on have generally used a single transaction generator service which namespaced ids as appropriate.

What is the purpose of the MaxConnectionLifeTime setting

The Mongo C Sharp Driver (at least the 1.9.2 version) has a setting for MaxConnectionLifeTime. From looking at the code, it looks like connections are removed from the pool when their age exceeds that lifetime. The default is set to 30 minutes.
Why?
Do connections somehow degrade in performance the more times they are used?
We have received anecdotal reports that in some scenarios connections die after a certain amount of time. This is presumably because some firewall/router along the way is periodically dropping connections that have reached a certain age.
By having the driver periodically close connections and open new ones we can avoid being affected by this.
Most users are not affected by this and could use any value they want for this setting.
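To make the behaviour concrete, here is a small sketch of a pool that retires connections once they exceed a maximum age. It is not the driver's implementation: ConnectionPool, PooledConnection, the connect callback and the injectable clock are all hypothetical, and only the 30 minute default (1800 seconds) comes from the discussion above.

```python
import time
from dataclasses import dataclass


@dataclass
class PooledConnection:
    handle: object        # whatever the hypothetical connect() callback returned
    created_at: float     # clock reading when the connection was opened


class ConnectionPool:
    def __init__(self, connect, max_lifetime_s=1800.0, clock=time.monotonic):
        self._connect = connect
        self._max_lifetime_s = max_lifetime_s
        self._clock = clock
        self._idle = []

    def acquire(self):
        now = self._clock()
        while self._idle:
            conn = self._idle.pop()
            if now - conn.created_at < self._max_lifetime_s:
                return conn               # still young enough to reuse
            # Too old: drop it here (a real pool would also close the socket).
        return PooledConnection(self._connect(), now)

    def release(self, conn):
        self._idle.append(conn)


# Usage with a fake clock so the ageing can be observed without waiting.
fake_now = [0.0]
opened = []


def connect():
    opened.append(len(opened) + 1)        # number each "connection" as it is opened
    return opened[-1]


pool = ConnectionPool(connect, max_lifetime_s=1800.0, clock=lambda: fake_now[0])
first = pool.acquire()
pool.release(first)
fake_now[0] = 600.0                       # 10 minutes later: the connection is reused
second = pool.acquire()
pool.release(second)
fake_now[0] = 1900.0                      # past 30 minutes: a fresh connection is opened
third = pool.acquire()
```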

Distributed time synchronization and web applications

I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHttpRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, but it only solves one half of the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (big security no-no, but I can mitigate some of the more obvious abuses of this), or by having the server make a single request to one of my magic NTP-over-HTTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as initial time of connection -- server time), and when the time of the event that will need to happen is known, notify the client with a message containing the number of milliseconds after the beginning value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t2 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
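The arithmetic in the illustration above can be written out as a short sketch. It is in Python only for illustration; delay_for_client and the connections dictionary are hypothetical names, and all times are server-side milliseconds.

```python
def delay_for_client(event_time_ms, connect_time_ms):
    """Milliseconds the client should wait, counted from its own connection moment."""
    return event_time_ms - connect_time_ms


# Server-side record of when each client connected (server clock, in ms).
connections = {"A": 0, "B": 120}
event_time_ms = 500

# One message per client: the event name and the client-specific delay.
messages = {
    client: ("eventName", delay_for_client(event_time_ms, connected_at))
    for client, connected_at in connections.items()
}
```

On the client, the elapsed time since connecting should be measured with a timer that is not affected by wall-clock changes, so a wrongly set system clock does not matter.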
It seems to me like you need to listen to a broadcast event from a server in many different places. Since you can accept 2-3 seconds of variation, you could just put all your clients into long-lived comet-style requests and just get the response from the server. It sounds to me like the clients wouldn't need to deal with time at all this way.
You could use ajax to do this, so you'd be avoiding any client-side lockups while waiting for new data.
I may be missing something totally here.
This works if you can assume that the clocks are reasonably stable, that is, they may be set wrong but tick at more or less the right rate.
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate its offset from its server (with possible round-trip complications if you want lots of accuracy).
Store that, then use the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
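A small sketch of that calculation follows, under two stated assumptions: each "difference" is defined as (first clock minus second clock), and the network delay is symmetric, so the server's timestamp corresponds to the midpoint of the client's request. The function names are hypothetical and all times are in milliseconds.

```python
def client_minus_server(sent_at, server_time, received_at):
    """Estimate (client clock - server clock), assuming symmetric network delay."""
    midpoint = (sent_at + received_at) / 2   # client clock reading when the server replied
    return midpoint - server_time


def client_trigger_time(scheduled_time, client_to_server, server_to_reference):
    """Client clock reading at which an event scheduled in reference time should fire."""
    return scheduled_time + client_to_server + server_to_reference


# Example: the client clock runs 5000 ms ahead of its server, and the server
# runs 300 ms behind the reference source.
client_diff = client_minus_server(sent_at=105_000, server_time=100_100, received_at=105_200)
server_diff = -300                           # server clock minus reference clock
fire_at = client_trigger_time(200_000, client_diff, server_diff)
```

Checking by hand: reference time 200000 is 199700 on the server, which is 204700 on the client, matching the computed value.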
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired up at the same time on all browsers.
Google found a way to treat time as absolute. That sounds heretical to a physicist, given General Relativity: time flows at a different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.
