RavenDB - slow write/save performance?

I started porting a simple ASP.NET MVC web app from SQL to RavenDB. I noticed that the pages were faster on SQL than on RavenDB.
Drilling down with MiniProfiler, it seems the culprit is the time it takes to do session.SaveChanges (150-220ms). The code for saving in RavenDB looks like:
var btime = new TimeData() { Time1 = DateTime.Now, TheDay = new DateTime(2012, 4, 3), UserId = 76 };
session.Store(btime);
session.SaveChanges();
Authentication Mode: When RavenDB is running as a service, I assume it is using "Windows Authentication". When deployed as an IIS application I just used the defaults, which was "Windows Authentication".
Background: The database machine is separate from my development machine, which acts as the web server. The databases are running on the same database machine. The test data is quite small, say 100 rows. The queries are simple, returning an object with 12 properties, 48 bytes in size. Using Fiddler to run a WCAT test against RavenDB generated higher utilization on the database machine (vs SQL) and far fewer pages. I tried running Raven as a service and as an IIS application, but didn't see a noticeable difference.
Edit
I wanted to ensure it wasn't a problem with a) one of my machines or b) the solution I created. So, I decided to try testing it on AppHarbor using another solution created by Michael Friis (the RavenDB sample app), simply adding MiniProfiler to that solution. Michael is one of the awesome guys at AppHarbor, and you can download the code here if you want to look at it.
Results from Appharbor
You can try it here (for now):
Read: (7-12ms with a few outliers at 100+ms).
Write/Save: (197-312ms) * WOW that's a long time to save *. To test the save, just create a new "thingy" and save it. You might want to do it at least twice since the first one usually takes longer as the application warms up.
Unless we're both doing something wrong, RavenDB is very slow to save - around 10-20x slower to save than read. Given that it re-indexes asynchronously, this seems very slow.
Are there ways to speed it up or is this to be expected?
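For anyone repeating the measurement, a minimal timing sketch follows. It only illustrates the point above that the first call should be treated as warm-up and excluded. The names time_calls, summarize and fake_save are hypothetical; fake_save merely sleeps and stands in for a real save such as session.SaveChanges().

```python
import statistics
import time


def time_calls(operation, repeats=5, warmup=1):
    """Call `operation` repeatedly and return per-call durations in milliseconds.

    The first `warmup` calls are executed but not recorded, so application
    warm-up does not distort the numbers.
    """
    for _ in range(warmup):
        operation()
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms):
    """Reduce a list of durations to min / median / max."""
    return {
        "min_ms": min(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
    }


def fake_save():
    # Hypothetical stand-in for a real save call; it just sleeps briefly.
    time.sleep(0.002)


if __name__ == "__main__":
    print(summarize(time_calls(fake_save, repeats=10)))
```

Reporting the median alongside min and max makes outliers like the 100+ms reads easier to spot than a single average would.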

First - Ayende is "the man" behind RavenDB (he wrote it). I have no idea why he's not addressing the question, although even in the Google groups, he seems to chime in once to ask some pointed questions, but rarely comes back to provide a complete answer. Maybe he's working hard to get RavenHQ off the ground?!?
Second - We experienced a similar problem. Below's a link to a discussion on Google Groups that may be the cause:
RavenDB Authentication and 401 Response.
A reasonable question might be: "If these recommendations fix the problem, why doesn't RavenDB work that way out of the box?" or at least provide documentation about how to get decent write performance.
We played for a while with the suggestions that were made in the thread above and the response-time did improve. In the end though, we switched back to MySQL because it's well-tested, we ran into this problem early (luckily) which caused concern that we might hit more problems and, finally, because we did not have the time to:
fully test whether it fixed the performance problems we saw on the RavenDB Server
investigate and test the implications of using UnsafeAuthenticatedConnectionSharing & pre-authentication.

To summarize Ayende's response: you're actually testing the sum of network latency and authentication chatter. As Joe pointed out, there are ways to optimize the authentication to be less chatty. This does, however, arguably reduce security; Microsoft clearly built its security to be secure first, with performance secondary. As the user of RavenDB, you can decide whether the default security model is too robust, as it arguably is for protected server-to-server communication.
RavenDB is clearly defined to be READ orientated. 10-20x slower for writes than reads is entirely acceptable because writes are full ACID and transactional.
If write speed is your limiting factor with RavenDB, you've likely not modeled transaction boundaries properly: you are probably saving documents that look too much like RDBMS table rows rather than well-modeled documents.
Edit: Reading your question again and looking into the background section, you explicitly define your test conditions to be an optimal scenario for SQL Server while being one of the least efficient methods for RavenDB. For data that size, that's almost certainly 1 document if it would be real world usage.

Related

XPages performance - 2 apps on same server, 1 runs and 1 doesn't

We have been having a bit of a nightmare this last week with a business critical XPage application, all of a sudden it has started crawling really badly, to the point where I have to reboot the server daily and even then some pages can take 30 seconds to open.
The server has 12GB RAM, and 2 CPUs, I am waiting for another 2 to be added to see if this helps.
The database has around 100,000 documents in it, with no more than 50,000 displayed in any one view.
The same database, set up as a training application with far fewer documents on the same server, always responds, even when the main copy is crawling.
There are a number of view panels in this application - I have read these are really slow. Should I get rid of them and replace with a Repeat control?
There are also Readers fields on the documents containing roles, and Authors fields, as it's a workflow application.
I removed quite a few unnecessary views from the back end over the weekend to help speed it up but that has done very little.
Any ideas where I can check to see what's causing this massive performance hit? It's only really become unworkable in the last week but as far as I know nothing in the design has changed, apart from me deleting some old views.
Try to get more info about state of your server and application.
Hardware troubleshooting is summarized here: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Domino_Server_performance_troubleshooting_best_practices
Since, in your experience, only one of the two applications is slow, it is more likely a code problem. The best thing is to profile your code: http://www.openntf.org/main.nsf/blog.xsp?permaLink=NHEF-84X8MU
To go deeper, you can look for semaphore locks: http://www-01.ibm.com/support/docview.wss?uid=swg21094630, look at javadumps: http://lazynotesguy.net/blog/2013/10/04/peeking-inside-jvms-heap-part-2-usage/ and NSDs: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Using_NSD_A_Practical_Guide/$file/HND202%20-%20LAB.pdf and check the garbage collector settings (see "Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit").
This presentation gives a good overview of Domino troubleshooting (among many others on the web).
Ok so we resolved the performance issues by doing a number of things. I'll list the changes we did in order of the improvement gained, starting with the simple tweaks that weren't really noticeable.
Defrag Domino drive - it was showing as 32% fragmented and I thought I was on to a winner but it was really no better after the defrag. Even though IBM docs say even 1% fragmentation can cause performance issues.
Reviewed all the main code in the application and took out a number of needless lookups where they could be replaced with applicationScope variables. For instance, on the search page, one of the drop-downs got its choices by doing an @Unique lookup on all documents in the database. I changed it to a keyword and put that in the applicationScope.
Removed multiple checks on database.queryAccessRole and put the user's roles in a sessionScope.
DB had 103,000 documents - 70,000 of them were tiny little docs with about 5 fields on them. They don't need to be indexed by the FTIndex so we moved them in to a separate database and pointed the data source to that DB when these docs were needed. The FTIndex went from 500mb to 200mb = faster indexing and searches but the overall performance on the app was still rubbish.
The big one - I finally got around to checking the application properties, advanced tab. I set the following options :
- Optimize document table map (ran copystyle compact)
- Don't overwrite free space
- Don't support specialized response hierarchy
- Use LZ1 compression (ran copystyle compact with options to change existing attachments -ZU)
- Don't allow headline monitoring
- Limit entries in $UpdatedBy and $Revisions to 10 (as per Domino documentation)
- And also don't allow the use of stored forms.
Now I don't know which one of these options was the biggest gain, and not all of them will be applicable to your own apps, but after doing this the application flies! It's running like there are no documents in there at all, views load super fast, documents open like they should - quickly and everyone is happy.
Until the HTTP threads get locked out. That's another question of mine that I am about to post, so please take a look if you have any idea what's going on :-)
Thanks to all who have suggested things to try.
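The applicationScope and sessionScope changes described above come down to computing an expensive lookup once and reusing the result. Here is a minimal sketch of that idea in Python rather than XPages SSJS. ScopedCache and expensive_unique_lookup are hypothetical names used only for illustration, not Domino APIs.

```python
class ScopedCache:
    """Tiny stand-in for an application-wide or session-wide variable map."""

    def __init__(self):
        self._values = {}
        self.misses = 0  # how many times the expensive computation actually ran

    def get_or_compute(self, key, compute):
        # Only run the expensive computation when the key is not cached yet.
        if key not in self._values:
            self.misses += 1
            self._values[key] = compute()
        return self._values[key]


def expensive_unique_lookup(documents, field):
    # Stand-in for scanning every document to build a drop-down's choices.
    return sorted({doc[field] for doc in documents if field in doc})


if __name__ == "__main__":
    application_scope = ScopedCache()
    documents = [{"status": "Open"}, {"status": "Closed"}, {"status": "Open"}]
    for _ in range(3):  # three page loads, one real lookup
        choices = application_scope.get_or_compute(
            "statusChoices", lambda: expensive_unique_lookup(documents, "status")
        )
    print(choices, application_scope.misses)
```

The trade-off is staleness: a cached value needs to be cleared or recomputed when the underlying documents change.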

Fast delivery webpages on shared hosting

I have a website (.org) for a project of mine on LAMP hosted on a shared plan.
It started very small but now I extended this community to other states (in US) and it's growing fast.
I had 30,000 (or so) visits per day (about 4 months ago) and my site was doing fine and today I reached 100,000 visits.
I want to make sure my site will load fast for everyone and since it's not making any money I can't really move it to a private server. (It's volunteer work).
Here's my setup:
- Apache 2
- PHP 5.1.6
- MySQL 5.5
I have 10 pages per state, and on each page people can contribute, write articles, like, share, etc. A few pages can reach 10,000 per hour during lunch time; the rest of the day is quiet.
All databases are set up properly (I personally paid a DBA expert to build the code). I am pretty sure the code is also good. Now, I could make pages faster with memcached, but I can't use it since I am on shared hosting.
Will MySQL be able to support that many people, with lots of requests per minute? Or should I create a fund to move to a private server and install all the tools I need to make it fast?
Thanks
To be honest, there's not much you can do on shared hosting. There's a reason it is cheap: it stops you from doing the kind of thing you want to do.
Either move to a VPS that allows memcache (some are cheap) and put up some Google ads, or stay on your shared hosting and use a pre-generated content system.
VPS can be very cheap (look for coupons) and you can install what ever you want since you are root.
for example hostmysite.com with the coupon: 50OffForLife you pay 20$ per month for life ... vs a 5$ shared hosting ...
If you want to keep the current hosting, then what you can do is this:
Pages are generated by a process (a cron job or on the fly) every time someone writes a comment or makes an update. This process fetches all the data for the page and saves it to a static web page.
So, say you have a page with comments: grab the contents (meta, h1, p, etc.) and the comments, and save both into the file.
Example: (using .htaccess - based on your answer you are familiar with this)
/topic/1/
If the file exists, then simply echo ...
if not:
select * from pages where page_id = 1;
select * from comments where page_id = 1;
file_put_contents('/my/public_html/topic/1/index.html', $content);
Or something along these lines.
Therefore, saving static HTML will be very fast since you don't have to call any DB. It just loads the file once it's generated.
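The generate-once, serve-from-file flow above can be sketched as follows. This is a Python illustration, not the PHP you would deploy. get_page, invalidate_page, render_page and the load_topic callback are hypothetical names, with load_topic standing in for the two SELECT queries.

```python
import html
from pathlib import Path


def render_page(title, comments):
    """Build the full HTML once, escaping user-supplied text."""
    items = "".join(f"<li>{html.escape(c)}</li>" for c in comments)
    return f"<html><body><h1>{html.escape(title)}</h1><ul>{items}</ul></body></html>"


def page_path(cache_dir, topic_id):
    return Path(cache_dir) / "topic" / str(topic_id) / "index.html"


def get_page(cache_dir, topic_id, load_topic):
    """Return the HTML for a topic, generating the static file on first use."""
    page_file = page_path(cache_dir, topic_id)
    if page_file.exists():
        return page_file.read_text(encoding="utf-8")  # no database call
    title, comments = load_topic(topic_id)  # the only place the database is hit
    content = render_page(title, comments)
    page_file.parent.mkdir(parents=True, exist_ok=True)
    page_file.write_text(content, encoding="utf-8")
    return content


def invalidate_page(cache_dir, topic_id):
    """Delete the static file so the next request regenerates it."""
    page_path(cache_dir, topic_id).unlink(missing_ok=True)
```

Calling invalidate_page whenever a comment is posted or a page is edited keeps the static copy from going stale. On real hosting the web server would normally serve index.html directly, so no script runs at all on a cache hit.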
I know I'm stepping on unstable ground by answering this question, but I think it is very indicative.
Pat R Ellery didn't provide enough details to do any kind of assessment, but the good news is that there can never be enough details. The explanation is quite simple: anybody can build as many mental models as they want, but the real system will always behave a bit differently.
So Pat, test your system all the time, as much as you can. What you are trying to do is plan the capacity of your solution.
You need the following:
Capacity test - To determine how many users and/or transactions a given system will support and still meet performance goals.
Stress test - To determine or validate an application’s behavior when it is pushed beyond normal or peak load conditions.
Load test - To verify application behavior under normal and peak load conditions.
Performance test - To determine or validate speed, scalability, and/or stability.
See details here:
Software performance testing
Types of Performance Testing
In other words (and a bit primitively): if you want to know whether your system can handle N requests per time_period, simulate N requests per time_period and see the result.
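A minimal sketch of "simulate N requests and see the result", assuming nothing about the real site: run_load and fake_page are hypothetical names, and fake_page only sleeps and occasionally fails, so it should be swapped for a real HTTP request before the numbers mean anything. Dedicated tools such as JMeter or ab do this far more thoroughly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(handle_request, total_requests, concurrency):
    """Issue `total_requests` calls using `concurrency` worker threads.

    Returns (elapsed_seconds, latencies_in_seconds, error_count).
    """
    def timed_call(request_number):
        start = time.perf_counter()
        try:
            handle_request(request_number)
            ok = True
        except Exception:
            ok = False  # count the failure but keep the test running
        return time.perf_counter() - start, ok

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    latencies = [latency for latency, _ in results]
    errors = sum(1 for _, ok in results if not ok)
    return elapsed, latencies, errors


def fake_page(request_number):
    # Hypothetical stand-in for fetching one page; replace with a real HTTP call.
    time.sleep(0.001)
    if request_number % 50 == 49:
        raise RuntimeError("simulated failure")


if __name__ == "__main__":
    elapsed, latencies, errors = run_load(fake_page, total_requests=200, concurrency=20)
    print(f"{len(latencies) / elapsed:.0f} requests/s, "
          f"slowest {max(latencies) * 1000:.1f} ms, {errors} errors")
```

Raising total_requests and concurrency step by step until latency or the error count degrades gives a rough picture of where the capacity limit sits.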
There are a lot of tools available:
Load Tester LITE
Apache JMeter
Apache HTTP server benchmarking tool
See list here

Which is better: {REST API, website} --> {database}, or {website} --> {REST API} --> {database}?

I have a product that gathers and displays measurements of all kinds (won't go into it). The display portion is, as one would expect, a database + website built on top of it (with Symfony).
However, we'll probably be creating an API to expose the data to third-parties as well.
Now, we either have the choice of building both the website and the API on top of the database, or just build the API on top, and have the website implement the API.
I would greatly prefer the latter, since otherwise I'll have to adapt both model layers for the API and the website every time the schema changes (which can be a few times).
If I have the latter I obviously have the advantage of only adapting the API model. If the API contract stays the same, the website wouldn't need adapting.
However, obviously there is a downside in performance.
With website <-> database, vs website <-> API <-> database, the first will obviously be the fastest.
My question is: what is your opinion on this trade-off?
I'm hoping the performance can be almost evened out, since all the machines will be on the same LAN + there will be caching. If that's the case, the ease of development would certainly make my life easier :-)
Looking forward to your opinions and experience!
If there was ever a case of premature optimization, this is it! You're not going to know the answer without more information, and I suspect very much that the performance differences between the two will be so negligible as to be irrelevant in your domain.
The best approach, IMO, is to spike on a few of your models using both approaches and see where that gets you.
There's no better way to make sure your API is going to be usable by others than to use it yourself. I would go website -> API -> database. Write it once; you can always tune it and "cheat" later if you have to.
Many modern websites use JavaScript (AJAX etc) and then make service calls to an API. If you took that approach you would simply have a carefully designed, reusable API layer in front of your DB.
I find that there's little or no extra effort here, and I'm sceptical that you'll incur noticeable performance penalties.
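To make the website -> API -> database layering concrete, here is a small sketch using an in-memory SQLite database. All table, column and function names are invented for illustration. The point is only that the website code sees the API's contract (value, timestamp) and never the schema.

```python
import sqlite3


def create_database():
    """Hypothetical schema; column names are free to change later."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (sensor TEXT, val REAL, taken_at TEXT)")
    db.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("temp", 21.5, "2012-04-03T10:00"), ("temp", 22.0, "2012-04-03T11:00")],
    )
    return db


def api_list_measurements(db, sensor):
    """API layer: the only code that knows table and column names."""
    rows = db.execute(
        "SELECT val, taken_at FROM readings WHERE sensor = ? ORDER BY taken_at",
        (sensor,),
    )
    return [{"value": val, "timestamp": taken_at} for val, taken_at in rows]


def website_render(measurements):
    """Website layer: depends only on the API contract (value, timestamp)."""
    return "\n".join(f"{m['timestamp']}: {m['value']}" for m in measurements)


if __name__ == "__main__":
    db = create_database()
    print(website_render(api_list_measurements(db, "temp")))
```

If the schema changes, only api_list_measurements needs adapting. In a real deployment the call between the two layers would be an HTTP request, which is where the extra latency discussed above comes from.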

RETS data fetching problem

I am working on a real estate website that uses a RETS service to pull data to my local server.
I have one small problem: I can fetch data from RETS, but the RETS database holds about 3 lakh (300,000) records, and I haven't found a way to fetch all of those records in batches of 50k at a time.
I didn't find any 'LIMIT' keyword in RETS, so how can I fetch 50k records at a time without 'LIMIT'?
Please help me.
RETS is not really much of a standard. It more closely resembles a pseudo-standard. It loosely defines an XML schema that describes real estate listings.
In version 1.x, the "standard" was composed of DTD documents. In 2.x, the "standard" uses XSD documents to describe the list.
http://www.rets.org/documentation
However, in practice, there is almost no consistency amongst implementers. Having connected to hundreds of "RETS Compliant" service providers, I'm convinced that not one of them is like any other one.
Furthermore, the 2.x "standard" has not changed in 3 years. It's an unmaintained, sloppy attempt at a standard. It (RETS) is often used as a business buzz word by non-technical people. In reality, it's just an arbitrary attempt at modeling real estate listing in XML.
Try asking the specific implementer for their documentation. Often, they don't have any. So, emailing the lead developer has frequently been helpful. Sometimes they'll provide a WSDL which will outline the supported calls. Often, the WSDL doesn't coincide with the actual service, so beware.
As for your specific question, try caching the results. Usually, needing a limit on a RETS call is a sign that your pages depend directly on their service. As requests for your service increase, so does the load your service puts on theirs, which may break it (and will not be appreciated). Also, if their service goes down (even temporarily), yours will be interrupted as well. Most importantly, it will make the live requests to your pages really, really slow (especially if their system is slow at the time). The listings usually don't change frequently enough to worry about stale data, so caching for up to an hour is pretty acceptable.
Best of luck!
libRets provides support for generating a query with fetch limits:
http://www.crt.realtors.org/projects/rets/librets/documentation/api/classlibrets_1_1_search_request.html
But last I knew, the company Intereality either ignored RETS or outright didn't provide complete compatibility with it. The quickest way to know you're dealing with them is that they also thought it a good idea to make all "System" names for table fields numeric.
If you're lucky, you're using a Rapattoni-backed server; they do provide spec-compatible servers.
Last point: I can't for the life of me remember its name, but I used to use a free Java-based RETS tool to build valid queries (including offset/limit clauses), and that made it a tad easier to build automated fetchers for a client's batch processing system.
In RETS, if the record count is more than the limit, we can download in batches, or we can remove that limit using a regex while downloading.
The best way to solve the problem is to divide the total count into small download units, keeping the download limit in mind. As the field to divide on in MLS/IDX, I suggest ModificationDate and ListingDate.
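The date-based splitting suggested above can be sketched like this. It is only an outline under stated assumptions: fetch_window is a hypothetical callback that would run one RETS search restricted to a date range (for example on the modification date field) and return the records it received, and max_per_request is whatever cap the server enforces.

```python
from datetime import date, timedelta


def date_windows(start, end, days_per_window):
    """Yield inclusive (window_start, window_end) pairs covering start..end."""
    current = start
    step = timedelta(days=days_per_window)
    while current <= end:
        window_end = min(current + step - timedelta(days=1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def fetch_all(fetch_window, start, end, days_per_window, max_per_request):
    """Fetch every record, splitting any window that hits the server limit."""
    records = []
    for window_start, window_end in date_windows(start, end, days_per_window):
        batch = fetch_window(window_start, window_end)
        if len(batch) >= max_per_request and window_start < window_end:
            # Possibly truncated: retry the same range with windows half as long.
            smaller = max(1, ((window_end - window_start).days + 1) // 2)
            batch = fetch_all(fetch_window, window_start, window_end,
                              smaller, max_per_request)
        records.extend(batch)
    return records
```

A single-day window that still reaches the limit is accepted as-is here; in practice that case would need a finer split, such as by hour or by a second field.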

subsonic with support for caching

I have a project with the following requirements in mind:
Data-reading-intensive application.
100 concurrent users at most at a time. The application has very high traffic.
Though the data is huge, it is modified only once a day.
I decided to use SubSonic because of its ease of development and its potential to work in a high-traffic environment.
However, a few things are not yet solved for working with SubSonic 3:
Which type of layer to use: ActiveRecord, Repository, or Linq To SQL?
Working with paging/sorting stored procedures (because they will give better performance than the built-in paging mechanism when displaying 10,000+ rows with paging and sorting, right?).
Caching: given the project requirements, it is quite clear that heavy use of caching is required, but I could not find a suitable solution that works with SubSonic.
Do I have to create a separate layer for it? If yes, a short example would be helpful.
I wrote a CacheUtil class for SubSonic 2.x ActiveRecord. It's based on some code someone posted on the old SubSonic forums. (This is from a forum that was deleted before the last forum was removed. This is why software forums should be permanent.) Here is an example of a cache Find method. You could adapt it to SS3. There are also inserts, fetchall, delete, clear, etc. Rob Conery said at the time that caching was problematic, and it was left out of SS2 on purpose. By using HttpRuntime.Cache I share the cache between a web application and service simultaneously. I believe I can do this since it's a small application, always on a single server.
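// Finds a cached record by primary key, loading the whole collection on a cache miss.
public static RecordBase<T> Find<T, ListType>(object primaryKeyValue)
    where T : RecordBase<T>, new()
    where ListType : AbstractList<T, ListType>, new()
{
    // One cache entry per record type, keyed by the type name.
    string key = typeof(T).ToString();
    if (HttpRuntime.Cache[key] == null)
        FetchAll<T, ListType>(); // expected to populate the cache entry for this type

    // Read the entry once so it cannot be evicted between the check and the cast.
    ListType collection = HttpRuntime.Cache[key] as ListType;
    if (collection != null)
    {
        foreach (T item in collection)
        {
            if (item.GetPrimaryKeyValue().Equals(primaryKeyValue))
                return item;
        }
    }
    return null;
}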
I wrote a post about how I used caching with SubSonic 2.x. It isn't 100% compatible with 3.x but the concepts are the same.
I answered this similarly over here: Thread-safe cache libraries for .NET. Basically you need a CollectionCacheManager. I then add a layer on top for each type and funnel all requests through these individual cache controllers, which in turn use the single collection cache controller. At the outer layer I mix pure SubSonic, LINQ, whatever fits the bill at the time. The beauty of SubSonic is that it should not get in your way.
As far as stored proc performance goes, I would point to Jeff Atwood's articles over at CodingHorror and reevaluate your savings. Hardware is dirt cheap, as is memory; databases are not. Personally I keep the database super simple and lightweight, and prefer to let my web server cache everything in memory. The database server gets to do very little work, which is the way I like it. Adding a few extra load-balanced web servers isn't nearly as big a deal as increasing database throughput, clustering, or sharding a DB.
SQL and stored procs can also be ridiculously difficult to write and maintain. Take the budget that you would have spent on your time doing that, and instead beef up your hardware. Remember: hardware is dirt cheap, good developers are not. Good luck!
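As a rough illustration of the collection-cache idea discussed in this thread, here is a small thread-safe sketch in Python rather than C#. CollectionCache and its parameters are hypothetical and are not part of SubSonic. fetch_all stands in for whatever loads the full collection (ActiveRecord, a repository, a stored procedure), and the time-to-live reflects the requirement that the data changes only once a day.

```python
import threading
import time


class CollectionCache:
    """Caches one full collection per type name and serves lookups from memory."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries = {}  # type name -> (expires_at, {primary_key: item})

    def find(self, type_name, primary_key, fetch_all, key_of):
        """Return the item with `primary_key`, loading the collection if needed."""
        # The lock is held while loading, so concurrent callers trigger one load.
        with self._lock:
            entry = self._entries.get(type_name)
            if entry is None or entry[0] <= self._clock():
                items = {key_of(item): item for item in fetch_all()}
                entry = (self._clock() + self._ttl, items)
                self._entries[type_name] = entry
            return entry[1].get(primary_key)

    def clear(self, type_name):
        """Drop a cached collection, e.g. right after the daily data update."""
        with self._lock:
            self._entries.pop(type_name, None)
```

Indexing the collection by primary key turns each Find into a dictionary lookup instead of a scan. Holding one lock during the load is the simplest safe choice; a per-type lock would reduce contention if many types are cached.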

Resources