CouchDB 1.5 & the NodeJS query server - performance

I've recently noticed that CouchDB is back in heavy development. One of the more interesting features added in the latest release (https://blogs.apache.org/couchdb/entry/apache_couchdb_1_5_0) is a NodeJS query server.
As the default SpiderMonkey query server is notoriously slow under medium to heavy loads, I was wondering whether anyone knows what kind of performance benefit can be achieved with the NodeJS server, and how it compares to writing views directly in Erlang.
Does anyone have any experience with it?

I have no experience with it, but since CouchDB is built in Erlang, I feel any external query server would be slower because of the communication overhead between the core and the query server. Still, I think we should have facts to prove it.
Maybe this will give you some idea (and I say this as a NodeJS nerd):
http://wiki.apache.org/couchdb/Performance#View_generation

Related

PostgreSQL from NodeJS application

I am exploring how best to access a PostgreSQL/PostGIS DB from NodeJS. All I need is simple SQL SELECT queries. Nothing more complex than:
SELECT *
FROM portal.catalog AS cat
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((20 50 ,19 50,19 49,20 50 ))'), cat.gpoly)
LIMIT 5000;
This will run on a Windows 7 or Windows Server 2008 machine with PostgreSQL 9.2/PostGIS 2.0. The traffic will be pretty light (only a few requests per minute).
Some preliminary research has turned up the following potential directions, but I am interested in hearing from others what is working for them (as an easy implementation):
https://github.com/brianc/node-postgres (but I am having trouble building it due to firewall issues). Supposedly the "pure" solution is better, but I am having issues there also: https://github.com/brianc/node-postgres-pure
http://www.infoq.com/articles/the_edge_of_net_and_node (and then I guess I would write my own ADO.NET adapter to PostgreSQL)
I have also seen references to ODBC for NodeJS (unclear if this is the way to go).
Is there something like the SQL Server adapter for NodeJS? http://blogs.msdn.com/b/sqlphp/archive/2012/06/08/introducing-the-microsoft-driver-for-node-js-for-sql-server.aspx
There was also a full-blown ORM by EntitySpaces (which went bankrupt). It is now a defunct open-source project: https://github.com/EntitySpaces/entityspaces.js
I've used node-postgres in the past, but recently opted for any-db, which has support for PostgreSQL.
Both have worked well, although I prefer any-db, particularly with respect to pooling and transactions. I believe any-db deserves more recognition.
Any-db is layered on top of BrianC's node-postgres.
But I just got https://github.com/brianc/node-postgres-pure working, and it is a pleasure.
EntitySpaces is the way to go, not defunct at all.
http://download.cnet.com/EntitySpaces-Studio/3000-10250_4-10590953.html?tag=mncol;1
I got BrianC's PostgresPure system working (a dependent module must have been malfunctioning, since I did not do anything special).
Works just great.
See: https://github.com/brianc/node-postgres-pure

MonetDB - is anyone using it in production?

I am very interested in using MonetDB as a datamart, holding some huge data tables for querying and reporting.
However, after some searching, I am unable to find any online posts or blogs about MonetDB being used in any kind of production capacity.
Also, there seems to be little to no activity online regarding MonetDB.
Is this a bad sign for the future of MonetDB?
> I am very interested in using MonetDB as a datamart, holding some huge data tables for querying and reporting
My boss is also interested in MonetDB and I had the same reaction as you. No one is writing about MonetDB... is no one using MonetDB?
Regardless, I have been running performance tests on datasets of 500,000 to 1,000,000 records, comparing MonetDB (a column-oriented DBMS) with MySQL (a row-oriented DBMS), and MonetDB beats MySQL in all regards, even in bulk inserts, which in theory it should not be as good at.
I can't speculate as to what all this means for MonetDB's future, but while it's around you might want to check it out because it performs well.
(I run Windows 7 and am communicating with each database using PHP)
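For anyone unfamiliar with the column-oriented versus row-oriented distinction mentioned above, here is a toy Python sketch of the two layouts. It is only an illustration under my own assumptions: the data and function names are made up, and it says nothing about how MonetDB or MySQL are actually implemented. It shows why an aggregate over one attribute only has to touch one column in a column store, and why a single-record insert has to touch every column.

```python
# Hypothetical toy data: the same three records stored two ways.
rows = [
    {"id": 1, "region": "north", "amount": 120.0},
    {"id": 2, "region": "south", "amount": 80.0},
    {"id": 3, "region": "north", "amount": 50.0},
]

# Column-oriented layout: one list per attribute.
columns = {
    "id": [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def total_amount_row_store(records):
    # Every full record is visited even though only one field is needed.
    return sum(record["amount"] for record in records)


def total_amount_column_store(cols):
    # Only the "amount" column is scanned; the other columns are never touched.
    return sum(cols["amount"])


def insert_row_store(records, record):
    # One append per new record.
    records.append(record)


def insert_column_store(cols, record):
    # One append per column, which is why single-record inserts
    # are expected to be the weaker side of a column layout.
    for name, value in record.items():
        cols[name].append(value)
```

Both total functions return the same sum; the difference is only in how much data each one has to walk through to get there.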
I'm reacting a bit late to this post, but I'd like to add my voice to the ones using MonetDB in a production environment. We use it as the back-end of Spinque, a framework for designing complex search solutions. I've been using MonetDB for about 10 years, but only in the past 3 years in a production environment. Clearly, it has pros and cons and bugs like all other products, but it is being developed and improved very actively (I don't understand the low-activity signs that you refer to). If you want a DB that allows you to be ahead of the market standards, it's a good choice. Otherwise, just go for MS SQL ;)
I've been evaluating it lately for a client so I've had some time with it. My impression at this point is that it is just finishing "growing up" from being an academic experimental playground. It clearly has yet to be really discovered, though it does have some rough edges which might hinder certain applications.
As I write, I'm in the process of loading over 100 million rows into an instance (27 million so far). So far, it performs startlingly well in some areas (aggregates), but is oddly sluggish in others (most joins I've tried so far); that said, I've not yet run the recommended sampling process, and I'm forcing it to live in just a single service with 32GB RAM.
I've found a few little glitches and one thing that caused a full service crash (obscure and reported), but I'm thinking that for many applications MonetDB could be just the ticket. Columnar storage (rather than NoSQL) seems to be the future IMO.
I'll update this if I find anything particularly interesting.
MonetDB is first and foremost a research system, but it has progressed far beyond the level of the average research prototype. It is the (only) relational column-store platform in open source that I know of that supports full SQL. I have used it myself at CWI in many research projects that are not core DB research, but do need advanced DB technology.
You can see on the users' mailing list that deployments happen in many different organisations. As Roberto Cornacchia stated in a different answer, it is the backend of all Spinque deployments and we are happy MonetDB users. MonetDB is also used in a variety of non-profit projects like OpenStreetMap and OpenKvK.
More and more commercial parties deploy MonetDB for analytics. (They do not always like to advertise that their analyses depend on an open source system.) Recently, MonetDB Solutions has started to provide dedicated commercial support for these deployments.
We have been using MonetDB in our business. We analyse very large data sets with many millions of rows. Traditional data warehousing on SQL databases became too slow, and the problem we faced was that the data was only going to get bigger! The only way forward was to go columnar.
The results have been amazing. When you have very few joins it is staggeringly quick. Even with joins on the data sets we are looking at, it is still frightening how fast it comes back.
Having seen some of the commercial partnerships, I think MonetDB is going to boom over the next few years. I believe some of the major BI suppliers are using MonetDB under the hood to perform the large data work.

Difference between Memcache, APC, XCache and other alternatives I've not heard of

At work, we've recently started designing an application to be "large scale" (we're engineering for the potential to serve up many millions of hits a day). One of the senior devs and the sysadmin have set up Memcache on the server.
As I understand it, Memcache will hold query results and certain tables in memory for X amount of time and keep everything hunky dory.
A drawback of Memcache, it seems, is that I just can't for the life of me manage to set it up on my local dev environment. I've followed a few different tutorials on how to compile it yourself. Most, if not all, of the steps seem to work properly, but I get this error when PHP loads:
[11-Sep-2010 16:02:30] PHP Warning: PHP Startup: Unable to load dynamic library '/Applications/MAMP/bin/php5.3/lib/php/extensions/no-debug-non-zts-20090626/memcached.so' - dlopen(/Applications/MAMP/bin/php5.3/lib/php/extensions/no-debug-non-zts-20090626/memcached.so, 9): image not found in Unknown on line 0
Not the primary question, but incidentally, if you've been able to compile Memcache for MAMP 1.9 on Snow Leopard, please let me know the trick.
My primary question is about the differences between the various web caching technologies. I've seen mention of Memcache, APC and XCache (here: Cache results of a mysql query manually to a txt file) but don't know the pros, cons and differences between each.
To my mind, Memcache has the advantage of being the one that the project's lead dev and our sysadmin chose. It has the disadvantage of being utter foobar to try to set up and compile on a Mac. :-^)
I'd love to hear from anyone who can enumerate the pros and cons of each (or even one) of the other caching technologies: where they are best used, how they are best used, and so on.
It's all useful information I think.
Thanks so much for lending your time to expanding my knowledge.
- Alex.
First, a list of opcode cachers for PHP.
Second, Memcache/Memcached is not an opcode cacher. It is a distributed memory caching system. It does not improve the speed of your PHP code itself; it can only be used to store data.
APC, eAccelerator, XCache and the others are non-distributed, meaning you can only store data on the local web server. However, all of these are opcode cachers and can improve the performance of your PHP app. Most, excluding eAccelerator (in the current version), can also store data.
I generally choose APC as the opcode cacher (it will reportedly be included in the core of PHP 6). However, if I have more than one web server for the site, I will also make use of Memcached.
Edit 1: I agree it is very annoying to set up APC and Memcache on MAMP. There are, however, tutorials out there dealing with this.
Edit 2: The best opcode cacher for your app really depends on which server you are using; some work better on some systems. How the cachers perform also depends on the size and scale of your app.
Edit 3: Very interesting article here comparing the performance of a few different cachers. (The article appears to have been written in 2006 and should not really be used as a current reference.)
APC is an opcode cache. It stores the compiled form of your PHP code so that your PHP files do not need to be parsed again on every request.
Memcache is a data cache. It stores data as key-value pairs.
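To make the "data cache" idea concrete, here is a minimal Python sketch of a key-value cache with expiry, plus the usual pattern of checking the cache before running a query. This is an in-process stand-in that I made up for illustration: `TTLCache` and `cached_query` are hypothetical names, not the API of Memcache or of any PHP extension, and a real deployment would talk to a separate cache server instead of a local dict.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a key-value data cache (not a real Memcache client)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl_seconds):
        # Remember the value together with the moment it stops being valid.
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped and reported as a miss.
            del self._store[key]
            return None
        return value


def cached_query(cache, key, run_query, ttl_seconds=60):
    """Return the cached result for key, running the query only on a miss."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result, ttl_seconds)
    return result
```

With this shape, the expensive query runs once per expiry window and every other request is served from memory. Note that the sketch treats a cached `None` as a miss, which is a simplification.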

SQLite for client-server

I've seen a couple of SQLite performance questions here on Stack Overflow, but the focus was on websites, and I'm considering using this DB in a client-server scenario:
I expect 1-10 clients for one server for now, could go up to 50 or more in the future.
slightly more reads than writes
the DB would sit behind a server process (i.e: not using direct DB access through a network)
Would using SQLite make the app less responsive as opposed to using PostgreSQL? My intuition tells me that it should be ok for these loads, but maybe someone has some practical experience with this kind of scenario.
I did use SQLite for a major client/server product with ~10 concurrent users, and I deeply regret that decision. In my opinion, PostgreSQL is much more suitable for client/server scenarios than SQLite because of its fine locking granularity.
You simply can't get very far when the entire database is locked whenever someone needs to write something.
I like SQLite very much (I even wrote a commercial utility for comparing SQLite databases, SQLite Compare), but I don't think it fits the bill in client/server scenarios.
Even SQLite's author says that it should be used as a replacement for custom file formats and not as a full-blown database server. I wish I had taken his advice more seriously.
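The whole-database write lock described above can be reproduced with a small Python sketch using the standard `sqlite3` module. The table name and the helper function are made up for illustration, and the sketch assumes SQLite's default settings. Two connections open the same file; once the first starts a write transaction, the second cannot start one, even though it has not touched the same rows.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot start a write while the first holds one."""
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    # timeout=0: fail immediately instead of waiting for the lock to clear.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
        first.execute("BEGIN IMMEDIATE")  # take the database-wide write lock
        first.execute("INSERT INTO notes VALUES ('from first')")
        try:
            second.execute("BEGIN IMMEDIATE")  # any write needs that same lock
        except sqlite3.OperationalError:
            return True  # reported as "database is locked"
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


with tempfile.TemporaryDirectory() as tmp:
    blocked = second_writer_is_blocked(os.path.join(tmp, "demo.db"))
```

In a real application you would normally leave the timeout at a non-zero value so that the second writer waits and retries, but the writes are still serialized one after another.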
You didn't mention which operating system and Postgres version you are using. However, before considering a change of database engine, try logging and benchmarking your current database under typical usage, then optimize the "heaviest" queries. And maybe your backend processing load makes the DB query time irrelevant? As SQLite is a file-based DBMS, concurrent access from multiple processes will degrade performance as the number of clients grows (edited after comment).
Following question may be helpful: How Scalable is SQLite?
I agree with S.Lott's answer.
I don't know how SQLite performs in comparison to PostgreSQL, since I don't know of any recent measurements, but my own experience with SQLite in a rather similar environment is rather good.
The only thing that might cause troubles in my view is that you have rather many writes. But it all depends on the total number per second I would say.
Also your setting to have one server process is optimal for SQLite in my opinion -- so you circumvent its weakness in multi-tasking.
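One way to picture the "single server process" setup is a writer thread that owns the only write connection, with every client request handing its write to that thread through a queue. The following Python sketch is my own illustration of that idea, not something taken from the question: `SingleWriter` and the `events` table are hypothetical names.

```python
import os
import queue
import sqlite3
import tempfile
import threading


class SingleWriter:
    """Hypothetical helper: all writes go through one thread that owns the connection."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The connection is created and used only inside this thread.
        conn = sqlite3.connect(self._db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS events (client TEXT, payload TEXT)")
        conn.commit()
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel posted by close()
                break
            sql, params, outcome, done = job
            try:
                conn.execute(sql, params)
                conn.commit()
            except sqlite3.Error as exc:
                conn.rollback()
                outcome["error"] = exc
            finally:
                done.set()
        conn.close()

    def write(self, sql, params=()):
        """Queue one write and block until the writer thread has committed it."""
        outcome = {}
        done = threading.Event()
        self._jobs.put((sql, params, outcome, done))
        done.wait()
        if "error" in outcome:
            raise outcome["error"]

    def close(self):
        self._jobs.put(None)
        self._thread.join()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.db")
    writer = SingleWriter(path)
    # Ten simulated clients, each submitting one write concurrently.
    clients = [
        threading.Thread(
            target=writer.write,
            args=("INSERT INTO events VALUES (?, ?)", (f"client-{i}", "hello")),
        )
        for i in range(10)
    ]
    for client in clients:
        client.start()
    for client in clients:
        client.join()
    writer.close()

    check = sqlite3.connect(path)
    row_count = check.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    check.close()
```

Because the writes are queued, clients never compete for the database lock directly; they only wait their turn in the queue. Whether that wait is acceptable depends on the number of writes per second, as noted above.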

High traffic web sites

What makes a site good for high traffic?
Does it have more to do with the hardware/infrastructure, or with how one writes the software, using Java as the example, if it matters?
I'm wondering how the software changes just because it is expected that billions of users will be on the site, if at all.
My understanding up to this point is that the code doesn't change, but that it is deployed on multiple servers, in a cluster, and a load balancer distributes the load, so really, on any one server/deployment, the application is just as any other standard application/website.
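The setup I have in mind looks roughly like the following Python sketch, where a round-robin balancer hands each request to the next identical application instance. All names here are made up for illustration, and a real load balancer is a separate piece of infrastructure rather than a class inside the application.

```python
import itertools


class RoundRobinBalancer:
    """Toy load balancer: hands each request to the next backend in turn."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def route(self, request):
        backend = next(self._cycle)
        return backend(request)


def make_server(name):
    # Each "server" is the same application code; only the name differs.
    def handle(request):
        return f"{name} handled {request}"

    return handle


balancer = RoundRobinBalancer([make_server("app-1"), make_server("app-2")])
responses = [balancer.route(f"GET /page/{i}") for i in range(4)]
```

From the point of view of `handle`, nothing about the code changes when more servers are added, which is the assumption I am asking about.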
I highly recommend reading Jeff Atwood's blog post on micro-optimization. In previous posts he talks about how this site was created and the hardware upgrades it has had (in short, better hardware helps only to the extent that it is faster), but the real speed of a site comes from good programming, and this article should answer some of your site programming questions quite well.
Hardware is cheap. Programming is expensive.
There are some programming techniques to make sure your code can handle multiple simultaneous views/updates. If you're using an existing framework, much of that work is (hopefully) done for you, but otherwise you're going to find stuff that worked for a few hundred hits an hour on one server isn't going to work when you're getting hundreds of thousands of hits and you have to deploy multiple load balancing machines.
Well, it is primarily an issue of hardware scaling but there are a few things to keep in mind with respect to the software involved in scaling. For example, if you are on a server farm, you'll need to work with a session management server (either via SQL Server or via a state server - which has implications in that your session variables need to be serializable).
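To illustrate the serialization point, here is a small Python sketch in which a plain dict stands in for the shared session store. This swaps in Python and JSON purely for illustration; it is not the SQL Server or state server mechanism itself, and the function names are hypothetical. The point is that session data has to survive a round trip through a serialized form before another server in the farm can read it.

```python
import json

# Stand-in for an external session store shared by every web server.
shared_store = {}


def save_session(store, session_id, data):
    # json.dumps raises TypeError if any value cannot be serialized,
    # so non-serializable session variables fail here, not later.
    store[session_id] = json.dumps(data)


def load_session(store, session_id):
    raw = store.get(session_id)
    return json.loads(raw) if raw is not None else None


# "Server A" saves the session; "server B" reads it back on the next request.
save_session(shared_store, "abc123", {"user_id": 42, "cart": ["book", "pen"]})
restored = load_session(shared_store, "abc123")
```

Anything that only makes sense inside one process, such as an open connection or an arbitrary object, cannot be stored this way and has to be kept out of the session.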
But, in the bigger picture, there are a variety of things that you would want to do to scale to an enterprise level. For example, it becomes particularly important that you abstract out your database calls to a DAL because you may well need to adopt the use of a middleware package for high volume environments.
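As a rough picture of what abstracting database calls into a DAL means, here is a minimal Python sketch. The interface and class names (`UserStore`, `SqliteUserStore`, `InMemoryUserStore`, `greet`) are hypothetical and chosen only for this example. Application code depends on the interface alone, so the backing store can be swapped without touching it.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The only database surface the application code is allowed to see."""

    def add_user(self, name: str) -> int: ...

    def get_user_name(self, user_id: int) -> Optional[str]: ...


class SqliteUserStore:
    """One possible backend: a SQL database accessed through sqlite3."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add_user(self, name: str) -> int:
        cursor = self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._conn.commit()
        return cursor.lastrowid

    def get_user_name(self, user_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryUserStore:
    """Another backend with the same methods, e.g. for tests or a cache layer."""

    def __init__(self) -> None:
        self._names = {}

    def add_user(self, name: str) -> int:
        user_id = len(self._names) + 1
        self._names[user_id] = name
        return user_id

    def get_user_name(self, user_id: int) -> Optional[str]:
        return self._names.get(user_id)


def greet(store: UserStore, user_id: int) -> str:
    # Application code: no SQL and no knowledge of which backend is in use.
    name = store.get_user_name(user_id)
    return f"Hello, {name}!" if name is not None else "Hello, stranger!"
```

If a middleware package or a different database has to be introduced later, only a new class implementing the same two methods is needed; `greet` and the rest of the application stay as they are.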

Resources