Is there a big performance difference with Cassandra LWTs? - spring

I have a similar question, but the existing answers aren't clear, so I'm asking it myself. It's simple: I want to update a row only when the target row already exists; if it doesn't exist, nothing should be done. (No new rows should be inserted.) The two options are:
1. Select first, check the value, then update.
2. Update directly with IF EXISTS.
For option 2, the official documentation clearly warns about a performance cost. All things considered, is that cost really that serious? Or are 1 and 2 similar, with 2 only falling behind once there is a lot of data? Would it be a big problem to use IF EXISTS?

The first option, select-then-update, is invalid because there is no guarantee that the data won't change between the time you read it and the time you update it -- the data isn't locked.
Lightweight transactions (LWTs), also known as compare-and-set (CAS) statements, are expensive because they require four round trips to work. The main difference between LWTs and your select-then-update approach is that the row is locked for the duration of the "transaction", so no other writes can change its value.
Since you only want to perform an update if the row exists, you must specify the IF EXISTS condition. Yes, there is a performance penalty in using lightweight transactions, but it is unavoidable in your case because you have a conditional update. It isn't negotiable in your scenario. Cheers!
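For illustration, here is a minimal sketch of such a conditional update using the DataStax Java driver (the question is tagged spring, but plain driver calls are shown; the keyspace, table and column names are made up). For LWTs the server returns an [applied] flag, exposed as wasApplied(), which tells you whether the row existed and the update went through:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;

public class ConditionalUpdate {
    public static void main(String[] args) {
        // CqlSession.builder() picks up contact points from configuration;
        // connection details are omitted here for brevity.
        try (CqlSession session = CqlSession.builder().build()) {
            // UPDATE ... IF EXISTS is a lightweight transaction: it only applies
            // when the row is already present and never inserts a new row.
            ResultSet rs = session.execute(
                "UPDATE my_keyspace.users SET status = 'active' WHERE user_id = 42 IF EXISTS");
            // LWT results carry an [applied] column; wasApplied() exposes it.
            System.out.println("Row existed and was updated: " + rs.wasApplied());
        }
    }
}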

Related

How to deactivate safe mode in the mongo shell?

The short question is in the title: I work in the mongo shell, which is in safe mode by default, and I want to gain better performance by deactivating this behaviour.
The long question, for those who want the context:
I am working on a huge set of data like
{
    _id: ObjectId("azertyuiopqsdfghjkl"),
    stringdate: "2008-03-08 06:36:00"
}
plus some other fields. There are about 250M documents like that (the whole database, with its indexes, weighs 36 GB). I want to convert the date into a real ISODate field. I searched a bit for how to write an update query like
db.data.update({},{$set:{date:new Date("$stringdate")}},{multi:true})
but I did not find a way to make this work, so I resigned myself to writing a script that takes the documents one after the other and issues an update that sets a new field to new Date(stringdate). The query uses the _id, so the default index is used.
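For concreteness only -- the question is about the mongo shell, and this sketch uses a newer MongoDB Java driver than the 2.x-era setup described here -- the per-document loop might look roughly like this (the database name and connection string are assumptions):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.text.SimpleDateFormat;
import java.util.Date;

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

public class DateBackfill {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                client.getDatabase("test").getCollection("data");
            // Walk the collection and add a real Date field parsed from stringdate,
            // issuing one update per document, keyed on _id (the same shape as the
            // script described in the question).
            for (Document doc : coll.find()) {
                Date parsed = fmt.parse(doc.getString("stringdate"));
                coll.updateOne(eq("_id", doc.getObjectId("_id")), set("date", parsed));
            }
        }
    }
}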
The problem is that it takes a very long time. I have already figured out that if I had inserted empty date objects when I created the database, I would be getting better performance now, because adding a new field forces documents to be relocated. I also set an index on a relevant field so I can process the database chunk by chunk. Finally, I ran several concurrent mongo clients on both the server and my workstation to make sure that the limiting factor is write-lock availability and not something else like CPU or network cost.
I monitored the whole thing with mongotop, mongostat and the web monitoring interfaces, which confirmed that the write lock is held 70% of the time. I am a bit disappointed that MongoDB does not offer finer-grained write locking; why not allow concurrent write operations on the same collection as long as there is no risk of interference? Now that I think about it, I should have split the collection across a dozen shards, even on the same server, because each shard would have had its own lock.
But since I can't change the current database structure right now, I looked for ways to improve performance so that at least 90% of my time is spent writing to mongo (up from 70% currently). I realized that, since I run my script in the default mongo shell, every update is followed by a getLastError() call, which I don't want: there is a 99.99% chance of success, and even in case of failure I can run an aggregation query after the big process finishes to retrieve the few exceptions.
I am not sure how much performance I would gain by deactivating the getLastError calls, but I think it is worth trying.
I took a look at the documentation and found confirmation of the default behavior, but not the procedure for changing it. Any suggestions?
I work in the mongo shell, which is in safe mode by default, and I want to gain better performance by deactivating this behaviour.
You can use db.getLastError({w:0}) ( http://docs.mongodb.org/manual/reference/method/db.getLastError/ ) to do what you want, but it won't help.
This is because, for one:
make a script that takes the documents one after the other and issues an update that sets a new field to new Date(stringdate)
When the shell is used in a non-interactive mode, such as inside a loop, it doesn't actually call getLastError() after each operation. As such, lowering your write concern to 0 will do nothing.
I have already figured out that if I had inserted empty date objects when I created the database, I would be getting better performance now, because adding a new field forces documents to be relocated.
I did tell people, when they asked about this, to add those fields up front in case of document movement, but instead they listened to the guy who said "leave them out! They use space!".
I shouldn't feel smug, but I do. That's an unfortunate side effect of being right after being told you were wrong.
mongostat and the web monitoring interfaces, which confirmed that the write lock is held 70% of the time
That's because of all the movement of your documents - kind of hard to fix that.
I am a bit disappointed that MongoDB does not offer finer-grained write locking
The write lock doesn't actually determine MongoDB's concurrency; this is another common misconception carried over from transactional SQL technologies.
For one, write locks in MongoDB are mutexes.
Not only that, there are numerous rules under which a running operation will yield to queued operations in certain circumstances: one being how many operations are waiting, another being whether the data is in RAM or not, and more.
Unfortunately, I believe you have got yourself stuck between a rock and a hard place, and there is no easy way out. This does happen.

Write-heavy DML operations in MongoDB

I'm running MongoDB (2.2) on Linux, and I have a few questions.
I have a schema with many fields and sub-fields, plus one index over these fields.
1. How fast are updates/deletes applied to the index? I have about 3 updates/deletes per second.
2. Is there a rule of thumb, like "after 10,000 updates you have to compact or rebuild the index"?
3. Are changes to the fields immediately visible in the index? If not, is there a delay or a temporary table for these updates/deletes?
Thanks in advance - Brandon
1. Indexes are updated at the time of the insert/update/remove. As for performance, the best answer is to just test it.
2. Not that I know of. If you need to do regular compaction or repair, you should have replication too (you can keep it on the same host if resources permit).
3. Yes (well, on the same DB connection - on another connection it might take a bit more time. But if you're running into that problem, I'm not the right person to answer you anyway ;)
Having said that, I strongly suggest you take a look at some of the presentations at http://www.10gen.com/presentations - I'm sorry I can't point out the ones that are particularly interesting and usable; I suggest you browse and pick the ones that seem relevant to you.
Note that MongoDB does things VERY differently and has quite a few gotchas for the unprepared. It is, however, a great DB once you know how to use it.
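If you ever do decide to compact or rebuild indexes by hand, these are the admin commands being alluded to, shown here via the Java driver only for consistency with the other sketches (database and collection names are placeholders; both commands block while they run):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class IndexMaintenance {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("test");
            // Defragments the collection's data files and rebuilds its indexes in place.
            db.runCommand(new Document("compact", "mycollection"));
            // Drops and rebuilds all indexes on the collection (available in the
            // 2.x era this question targets; removed in recent server versions).
            db.runCommand(new Document("reIndex", "mycollection"));
        }
    }
}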

Implementation of Unsaved Changes Detection

It seems like three ways to approach detecting unsaved changes in a text/image/data file might be to:
1. Update a boolean flag every time the user makes a change or saves, which results in a lot of unnecessary updates.
2. Keep a cached copy of the original file and diff the two every time a save check is needed.
3. Keep a stack of all past operations and push/pop operations as needed, which costs a lot of extra memory.
In general, how do commercial applications detect whether unsaved changes exist and what are the advantages/disadvantages of each approach? I ran into this issue while writing a custom application that has special saving behavior and would like to know if there is a known best practice.
If you need an undo/redo system anyway, you need that stack of past operations. To detect which state the document is in, one item of the stack is marked as the 'saved state'. If the current stack node is not that item, the document has unsaved changes.
You can see an example of this in Qt's QUndoStack ( http://doc.qt.nokia.com/stable/qundostack.html ) and its isClean() and setClean() methods.
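A minimal sketch of that idea in plain Java (this is not the Qt API; the class and method names are made up): remember the undo-stack depth at the last save, and report unsaved changes whenever the current depth differs.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of QUndoStack's isClean()/setClean() idea: the depth of the undo stack
// at the last save is remembered, and the document counts as "dirty" whenever
// the current depth differs from it.
public class ChangeTrackingStack<T> {
    private final Deque<T> undo = new ArrayDeque<>();
    private final Deque<T> redo = new ArrayDeque<>();
    private int cleanDepth = 0; // undo-stack depth at the last save

    public void push(T command) {
        if (cleanDepth > undo.size()) {
            cleanDepth = -1; // the saved state sat on the redo branch and is now unreachable
        }
        undo.push(command);
        redo.clear();
    }

    public void undo() { redo.push(undo.pop()); } // precondition: at least one undoable command
    public void redo() { undo.push(redo.pop()); } // precondition: at least one redoable command

    public void markSaved()  { cleanDepth = undo.size(); }          // call after a successful save
    public boolean isClean() { return undo.size() == cleanDepth; }  // false => unsaved changes
}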
As for proposition 1, updating a boolean is not a problem and takes very little time.
This depends on the features you want and the size/format of the files, I guess.
The first option is the simplest, and it gives you just what you want with minimal overhead.
The second option has the advantage that you can detect when changes have been manually reverted, so that there is no real change after all (although that probably doesn't happen all too often). On the other hand, it is much more costly to compute a diff just to check whether anything was modified. You probably don't want to do that every time the user presses a key.
The third option gives you the ability to provide an undo history. You could limit the number of items in that history by grouping changes that were made consecutively (without moving the cursor in between), or something like that.

Improving NHibernate performance with too many objects in session

Our app was originally built with NHibernate, and with its batch-processing limitations in mind. Over time, however, it has turned into a data cruncher, and we are seeing significant performance decay.
The session ends up maintaining about 1,000 objects or more, and our profiling revealed that auto-flushing and dirty checking are the biggest offenders. We tried turning auto-flush off and managing flushing ourselves on Save/Update operations, but that led to disastrous performance for batch saves/updates.
We're now looking at the option of evicting unrequired objects from the session.
1. I came across the 2nd-level-cache eviction method (sessionFactory.Evict(typeof(Cat));), which lets us evict by type, but we do not use a 2nd-level cache. Can I still use this method to evict objects from the 1st-level cache?
2. I also read about a pattern of fetching objects, evicting them from the session, and then re-associating them with a session, if needed, by calling Update() on them. Is this a recommended and accepted pattern? I ask because I also read that NH3 has put up a wall against it. (We can still use it, as we have not upgraded to NH3.)
While we realize that we are not using NHibernate in the best way, we are just looking to improve the current situation somehow. Answers to the above questions and any other suggestions/recommendations are greatly appreciated. Thanks.
Update
After looking at the NH documentation and code, I realize that 1 is probably not possible. I'm still looking for pointers or tips on using Evict(). I was able to drastically reduce the number of objects in the session, but I still do not know whether there is a price to pay when updating or deleting evicted objects. Thanks in advance for your help.
It's hard to say without knowing more about your requirements, but maybe you could use IStatelessSession. It doesn't have a 1st-level cache to worry about.
Ayende has a good post on using it for bulk operations here.
Why not use more sessions instead of one large one? That, in conjunction with turning off auto-flush, has helped me in the past. Also, you should really think about using HQL for bulk updates if possible.
I know this is old, but I just came across it while looking for something else, having just solved this exact problem. I solved it as Trent mentioned, by using more than one session. I created one session to fetch all of the objects I wanted, then closed that session. In my case I was iterating through the list, operating on each object and trying to commit on each iteration. So I wrote a foreach over my list, creating and disposing of a new session inside the loop and reattaching the object from the list to the new session. That took a process from about 2.5 hours down to 2 minutes 40 seconds!
See this article for the inspiration for how I solved it -- although not exactly, as I have unit-of-work wrappers around NHibernate:
http://weblogs.asp.net/ricardoperes/archive/2013/03/21/attaching-disconnected-entities-in-nhibernate-without-going-to-the-database.aspx
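For illustration, the "one short session per object" pattern described above looks roughly like this in Java Hibernate terms (NHibernate is a .NET port of Hibernate, so the calls map closely); this is a sketch under the assumption that the entities were loaded earlier in a separate session, not the answerer's actual code:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import java.util.List;

// Rough sketch of the session-per-iteration pattern: each loop iteration opens a
// fresh session, reattaches one detached entity, does its work, commits, and
// disposes of the session, so the first-level cache never grows.
public class SessionPerObjectBatch {
    public static <T> void process(SessionFactory sessionFactory, List<T> detachedEntities) {
        for (T entity : detachedEntities) {
            // A fresh, short-lived session: its first-level cache only ever holds
            // this one entity, so dirty checking and flushing stay cheap.
            try (Session session = sessionFactory.openSession()) {
                Transaction tx = session.beginTransaction();
                session.update(entity); // reattach the detached object to this session
                // ... perform the per-object work here ...
                tx.commit();            // flush and persist, then the session is disposed
            }
        }
    }
}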

Does soCaseInsensitive greatly impact performance for a TdxMemIndex on a TdxMemDataset?

I am adding some indexes to my DevExpress TdxMemDataset to improve performance. TdxMemIndex has SortOptions, which include the soCaseInsensitive option. My data is usually a GUID string, so it is not case sensitive. I am wondering whether I am better off just forcing all the data to the same case, or whether using the soCaseInsensitive flag (and the loCaseInsensitive flag in the call to Locate) carries only a minor performance penalty (roughly equal to converting the case of my string every time I need to use the index).
At this point I am leaving soCaseInsensitive off and just converting the case myself.
IMHO, the best approach is to ensure data quality at Post time. Reasoning:
1. You (usually) know the nature of the data, so you can, e.g., use UpperCase (knowing that GUIDs are all in the ASCII range) instead of the much slower AnsiUpperCase, which a general component like TdxMemDataSet is forced to use.
2. You enter the data only once, whereas searching/sorting/filtering, which all rely on TdxMemDataSet's internal uppercasing engine, are repeated actions. There are also chained actions that will trigger this engine without you realizing it (e.g. a TcxGrid that is sorted by default with GridMode := True (I assume you use the DevExpress components) and a class acting as a broker passing the sort message to the underlying dataset).
3. Usually data entry is done in steps, one or a few records per batch. The only notable exception is data-acquisition applications. But in both cases users tolerate much greater response times on data entry, which gives you room to play with. (IOW, how much would an UpperCase call add to a record post that lasts 0.005 ms?) OTOH, users are very demanding about the speed of data retrieval operations (searching, sorting, filtering, etc.). Keep data retrieval as fast as you can.
4. Having the data in the database already normalized reduces the risk of processing errors when (or if) you write other modules; otherwise you need to remember to AnsiUpperCase the data in every module, in whatever language you write it. A classic example is when you use external tools to access the data (e.g. a DB manager executing an SQL SELECT over the data).
hth.
Maybe the DevExpress forums (or even a support email, if you have access to that) would be a better place to seek an authoritative answer to that performance question.
Anyway, it is better to guarantee that the data is in the format you want - for the reasons plainth already explained - the moment you save it. So, in this specific case, make sure the GUID is written in upper case (or lower case, it's a matter of taste). If it is SQL Server or another database server that has a GUID datatype, let the SELECT do that work - and, if applicable and possible, even the sort.
