MonetDB - anyone using it in production?

I am very interested in using MonetDB as a data mart, holding some huge data tables for querying and reporting.
However, after some searching, I am unable to find any online posts or blogs describing use of MonetDB in any kind of production capacity.
Also, there seems to be little to no activity online regarding MonetDB.
Is this a bad sign for the future of MonetDB?

> I am very interested in using MonetDB as a data mart, holding some huge data tables for querying and reporting
My boss is also interested in MonetDB and I had the same reaction as you. No one is writing about MonetDB... is no one using MonetDB?
Regardless, I have been running performance tests on datasets of 500,000 to 1,000,000 records comparing MonetDB (column-oriented DBMS) vs. MySQL (row-oriented DBMS), and MonetDB beats MySQL in all regards, even in bulk inserts, which a column store should hypothetically not be as good at (row-at-a-time writes are the classic weak spot of columnar storage).
I can't speculate as to what all this means for MonetDB's future, but while it's around you might want to check it out because it performs well.
(I run Windows 7 and am communicating with each database using PHP)
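The original tests above were driven from PHP; for anyone who wants to reproduce a comparison like this, below is a minimal timing sketch in Python that works with any DB-API 2.0 driver (e.g. pymonetdb for MonetDB, mysql-connector-python for MySQL). The table name, column names, and connection details are illustrative assumptions, not part of the original test, and the parameter placeholder may need adjusting per driver:

```python
import time

def time_bulk_insert(conn, rows, placeholder="%s"):
    """Time a bulk insert of (id, value) rows over a DB-API 2.0 connection.

    placeholder: "%s" for pymonetdb / mysql-connector, "?" for e.g. sqlite3.
    """
    cur = conn.cursor()
    start = time.perf_counter()
    cur.executemany(
        f"INSERT INTO bench (id, value) VALUES ({placeholder}, {placeholder})",
        rows,
    )
    conn.commit()
    return time.perf_counter() - start

# Usage sketch (connection parameters are assumptions):
# rows = [(i, f"row-{i}") for i in range(500_000)]
# print("MonetDB:", time_bulk_insert(pymonetdb.connect(database="demo"), rows))
# print("MySQL:  ", time_bulk_insert(mysql.connector.connect(database="demo"), rows))
```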

I'm reacting a bit late to this post, but I'd like to add my voice to those using MonetDB in a production environment. We use it as the back-end of Spinque, a framework for designing complex search solutions. I've been using MonetDB for about 10 years, but only in the past 3 years in a production environment. Clearly, it has pros and cons and bugs like all other products, but it is being developed and improved very actively (I don't understand the low-activity signs that you refer to). If you want a DB that allows you to be ahead of the market standards, it's a good choice. Otherwise, just go for MS SQL ;)

I've been evaluating it lately for a client, so I've had some time with it. My impression at this point is that it is just finishing "growing up" from being an academic experimental playground. It clearly has yet to be widely discovered, and it does have some rough edges that might hinder certain applications.
As I write, I'm in the process of loading over 100 million rows into an instance (27 million at present). So far, it performs startlingly well in some areas (aggregates), but is oddly sluggish in others (most joins I've tried so far); that said, I haven't yet run the recommended sampling process, and I'm forcing it to live on a single server with 32 GB of RAM.
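For anyone following along: the sampling/statistics pass mentioned above can be triggered from SQL. Below is a minimal sketch using the pymonetdb driver; the table name, sample size, and credentials are assumptions, and the ANALYZE syntax is as I understand it from the MonetDB documentation:

```python
import pymonetdb

# Connection parameters are illustrative defaults, not from the original post.
conn = pymonetdb.connect(
    database="demo", hostname="localhost", username="monetdb", password="monetdb"
)
cur = conn.cursor()
# Gather sampled column statistics so the optimizer can pick better join plans.
# Table name and sample size are hypothetical.
cur.execute('ANALYZE sys."facts" SAMPLE 100000;')
conn.commit()
```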
I've found a few small glitches and one thing that caused a full crash of the service (obscure, and reported), but I'm thinking that for many applications MonetDB could be just the ticket. Columnar storage (rather than NoSQL) seems to be the future, IMO.
I'll update this if I find anything particularly interesting.

MonetDB is first and foremost a research system, but it has progressed far beyond the level of the average research prototype. It is the only open-source relational column-store platform I know of that supports full SQL. I have used it myself at CWI in many research projects that are not core DB research but do need advanced DB technology.
You can see on the users' mailing list that deployments happen in many different organisations. As Roberto Cornacchia stated in a different answer, it is the backend of all Spinque deployments, and we are happy MonetDB users. MonetDB is also used in a variety of non-profit projects like OpenStreetMap and Open KvK.
More and more commercial parties deploy MonetDB for analytics. (They do not always like to advertise that their analyses depend on an open source system.) Recently, MonetDB Solutions has started to provide dedicated commercial support for these deployments.

We have been using MonetDB in our business. We analyse very large data sets with many millions of rows. Traditional methods of data warehousing on SQL databases had become too slow, and the problem we were facing was that the data was only going to get bigger! The only way forward was to go columnar.
The results have been amazing. When you have very few joins it is staggeringly quick. Even with joins on the data sets we are looking at it is still frightening how fast it comes back.
Having seen some of the commercial partnerships, I think MonetDB is going to boom over the next few years. I believe some of the major BI suppliers are using MonetDB under the hood to perform the large-data work.

Related

Couchbase and NoSQL DBs for a serious web application?

I've read some posts here, but I didn't find a real answer.
I normally work, and have worked, with conventional SQL databases (MS SQL, MySQL) when developing applications (ERP, CRM, PPS, web shops, etc.). Real contact with, or experience of, document-oriented databases in a real business setting wasn't possible.
I have only tested MongoDB and CouchDB privately (hobby and experimental projects). The experience was good, but not enough to say "yes, let's use it for business", because I could not test them in a real environment.
But now there is a chance to build from scratch, which could be a big start for a business.
So my questions:
Can I use Couchbase for a big business application where thousands of users would use it? Is it fast and performant enough to handle thousands of queries, requests/responses, etc.?
What do backup and restore look like?
Where are Couchbase's limits?
Thank you for the answers.
In short, yes.
Your questions are too broad to fully address here. Couchbase has many real-world installations with clients doing production work at large scale. You can see several references with write-ups of their uses on the Couchbase site. (Note this is not a complete list of customers, only ones that have agreed to have their use highlighted.) You will definitely recognize some names.

NoSQL for multi-site archival logging with full-text search

I'm looking at building a somewhat complex log handling system to replace an old ad-hoc setup and could use a bit of advice. I'm pretty familiar with SQL databases and networking, but am very new to NoSQL stores, which seem to be the key to solving this mess. Note that we have a very good team, but a limited licensing budget, so free/open-source options are vastly preferred. (That said, availability of support if something goes pear-shaped would be nice.)
Requirements:
Archive (test) logs generated in the several GB/day range at multiple sites around the world.
Provide near-instantaneous full-text search of those logs at each site, for debugging purposes.
Push that archived data back to a central location (though a replica at each site would be absolutely okay).
Provide for analytics of that data back at the central location.
Constraints:
The sites have fairly crap Internet connections for the moment (high latency and fairly low bandwidth). Much of the data is generated during the day and a good portion of the sync would have to lag behind and finish overnight each day.
Sites MUST be able to function if the WAN goes completely off-line.
Extras
The log data is (as usual) highly compressible. Any solution that compresses data in transit from node to node across the WAN is preferred (see the sketch after this list).
Many log files are related to each other in multi-level hierarchies, and that relationship is very important and must be maintained!
Sites will generally not modify the same data or modify it again once stored. This is all archival for the most part.
We can either stream as the logs are generated or push blocks of logs. Streaming is preferred, as it would simplify things considerably.
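To make the compression point above concrete, here is a minimal sketch of gzipping a batch of log lines before shipping it across the WAN, using only Python's standard library. The log format shown is invented for illustration:

```python
import gzip

def compress_batch(log_lines):
    """Gzip a batch of log lines before sending it over the WAN."""
    raw = "\n".join(log_lines).encode("utf-8")
    return gzip.compress(raw, compresslevel=6)

# Repetitive log text typically compresses very well.
batch = [f"2015-08-01T12:00:{i:02d} INFO test-run step={i} status=ok" for i in range(60)]
payload = compress_batch(batch)
print(len("\n".join(batch).encode("utf-8")), "bytes ->", len(payload), "bytes")
```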
Options I'm aware of:
Local MySQL and folder structure for logging and local configuration management.
This is what we have now and it's running, but not a long-term solution by any means.
Elasticsearch
I've read that Elasticsearch would probably be really good for this, though from what I understand it doesn't support multi-site.
Cassandra
This seems to have built-in multi-site support, but I'm not exactly familiar with the data-model. Is this a good choice for something like this, or will I hate myself if I give it a try?
CouchDB
This is a document store that seems(?) like a good match for log data, but again doesn't appear to have multi-site support.
Apache Kafka
I read up on this, but I haven't quite wrapped my head around it yet...
Questions:
Do any of these actually let you stream-append logs or are they best suited to dumping completed files in?
Is there a solution I'm missing that might be better?
Any recommendations on multi-site with some of the options that don't support multi-site by themselves?
Interesting links:
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://blog.cloudera.com/blog/2015/07/deploying-apache-kafka-a-practical-faq/
https://www.elastic.co/blog/scaling_elasticsearch_across_data_centers_with_kafka
https://kafka.apache.org/08/ops.html
https://github.com/Stratio/cassandra-lucene-index
I may be a bit biased, since Couchbase is my employer, but this sounds like the kind of problem that XDCR (Cross Datacenter Replication) was made to solve.
You could stand up a cluster on multiple geographical sites (Couchbase calls these "datacenters") and then XDCR would automatically replicate (bidirectionally) the data between sites. If I understand your requirements correctly, this sounds like just what you need.

SQLite for client-server

I've seen a couple of SQLite performance questions here on Stackoverflow, but the focus was on websites, and I'm considering using this DB in a client-server scenario:
I expect 1-10 clients for one server for now, which could go up to 50 or more in the future.
slightly more reads than writes
the DB would sit behind a server process (i.e., clients would not access the DB directly over the network)
Would using SQLite make the app less responsive as opposed to using PostgreSQL? My intuition tells me that it should be ok for these loads, but maybe someone has some practical experience with this kind of scenario.
I did use SQLite for a major client/server product used by ~10 concurrent users, and I deeply regret that decision. In my opinion, PostgreSQL is much more suitable for client/server scenarios than SQLite, due to its finer locking granularity.
You simply can't get very far when the entire database is locked whenever someone needs to write something.
I like SQLite very much (I even wrote a commercial utility for comparing SQLite databases, SQLite Compare), but I don't think it fits the bill when you have client/server scenarios.
Even SQLite's author says that it should be used as a replacement for custom file formats and not as a full-blown database server. I wish I had taken his advice more seriously.
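A side note on that locking point: recent SQLite versions can soften, though not remove, the single-writer bottleneck with write-ahead logging. A minimal sketch using Python's built-in sqlite3 module; the file and table names are illustrative:

```python
import sqlite3

conn = sqlite3.connect("app.db", timeout=30.0)  # wait up to 30 s if the DB is locked
conn.execute("PRAGMA journal_mode=WAL;")    # readers no longer block the single writer
conn.execute("PRAGMA synchronous=NORMAL;")  # common pairing with WAL; slightly weaker durability
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO events (payload) VALUES (?)", ("hello",))
conn.commit()
```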
You didn't mention what operating system and Postgres versions you are using. However, before considering a change of database engine, try to do some logging and benchmarking of your current database under typical usage, then optimize the "heaviest" queries. And maybe your backend processing load makes DB query time irrelevant? As SQLite is a file-based DBMS, concurrent access from multiple processes will degrade performance as the number of clients grows (edited after comment).
The following question may be helpful: How Scalable is SQLite?
I would second S.Lott's answer.
I don't know how SQLite performs in comparison to PostgreSQL, since I don't know of any newer measurements, but my own experience with SQLite in a rather similar environment is rather good.
The only thing that might cause trouble, in my view, is that you have rather a lot of writes. But it all depends on the total number per second, I would say.
Also, your design of having one server process is optimal for SQLite, in my opinion; it circumvents SQLite's weakness in concurrent access.
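To illustrate that single-server-process design: one way to exploit it is to funnel all writes through a single thread that owns the SQLite connection, so request handlers never contend for the write lock. A minimal sketch, with invented table and file names:

```python
import queue
import sqlite3
import threading

write_q = queue.Queue()  # request handlers enqueue (sql, params) tuples

def writer():
    # The only connection that ever writes; owned by this one thread.
    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")
    while True:
        sql, params = write_q.get()
        conn.execute(sql, params)
        conn.commit()
        write_q.task_done()

threading.Thread(target=writer, daemon=True).start()

# A request handler just enqueues and returns; it never blocks on the DB lock.
write_q.put(("INSERT INTO events (payload) VALUES (?)", ("hello",)))
write_q.join()  # for the demo, wait until the write has been applied
```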

About dynamics CRM performance

My boss asked me to research the CRMs available on the market, because the CRM we are currently using is rather a mess.
For me as a .NET developer it would be great to choose and implement Dynamics CRM, because of its extensibility and perfect integration with the .NET environment and well-known tools.
All the marketing sounds great, but I'd like to know about the common DISADVANTAGES and ISSUES concerning this system.
The most important question is how it performs in a company with about 150 concurrent and very active users. I have heard that it's really slow compared to competitors' systems.
The Dynamics CRM Product team has published an excellent whitepaper with guidance and benchmarks for 500 concurrent users. You can learn a lot by studying this paper. The link is here:
Microsoft Dynamics CRM 4.0 Suggested Hardware for Deployments of up to 500 Concurrent Users
I can't answer regarding the number of users/activity, but I can refer you to the SDK article 'Performance Best Practices'. I'll speak to the side of you that would be writing plugins (on data-access messages), custom pages accessing the CRM web services, and SSRS reports. A couple of points I can relate to:
Disable plug-ins. This is an attractive and major integration point into CRM, so the fact that they list it as a performance issue is disheartening. We have seen OutOfMemory exceptions stemming from the plugin cache. We got around this by deploying plugins to disk rather than to the database: with database deployment, the assembly is reloaded and its signature confirmed every time a plugin is called, and we believe this was eating up the Large Object Heap. Probably not an issue for a normal CRM implementation.
Limit data retrieved. Definitely. Avoid lookups/picklists/bits you don't need when you can, as each causes an extra join. This is not going to be a huge deal on smaller entities, but if you need entities with a large number of attributes it could be. Probably not an issue for normal CRM customization; in other cases, a good design should avoid this issue.
I can't really offer any advice on how it compares to its leading competitors. I know the main things are that it's cheaper and very actively developed.
I can say a bit about performance though which might help.
We have about 400-600 concurrent users using the system. The system isn't particularly web-server intensive. We have two web servers for resiliency (it would be a disaster if the system went offline), but these servers are never taxed. They have a couple of virtual cores and 4 GB of RAM.
Our database is 130 GB in size and is hosted on a 24-core database server with 48 GB of RAM. It is clustered, but because SQL Server can't handle two active nodes, only one server is ever active.
The database server really never gets maxed out. However, there is one very important change we needed to make, and one that I think MS is now advising all users of large CRM installs to make. By default, SQL Server uses a locking mode that blocks writes to the database while a row is being read. In our system (and numerous others, apparently) that was causing huge issues.
We switched on a different mode (I think it's called "snapshot isolation", or something like that). To be fair, though, even if you did have 200 concurrent users, it won't be an issue until the more central tables like activitypointer and account get pretty large (in the millions of rows).
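For reference, the change described above maps to two documented SQL Server database options. A hedged sketch of enabling them from Python via pyodbc; the database name and connection string are assumptions, and switching READ_COMMITTED_SNAPSHOT on typically requires no other active connections to the database:

```python
import pyodbc  # assumes a SQL Server ODBC driver is installed

# Connection string and database name are illustrative, not from the post.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=crm-sql;DATABASE=master;"
    "Trusted_Connection=yes;",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)
cur = conn.cursor()
# Allow explicit SNAPSHOT-level transactions.
cur.execute("ALTER DATABASE MyCrmDb SET ALLOW_SNAPSHOT_ISOLATION ON;")
# Make plain READ COMMITTED use row versioning, so readers stop blocking writers.
cur.execute("ALTER DATABASE MyCrmDb SET READ_COMMITTED_SNAPSHOT ON;")
```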
So there is no doubt that CRM 2011 can handle that many users, as long as you have suitable hardware and someone who understands SQL Server.
HTH
S

Would SQLite be a 'better' choice for Joomla than MySQL, if it would be available?

Since this doesn't touch a real problem of mine, I'm somewhat uncertain whether it is even worth asking here. However, maybe some of you would like to share your opinion on it.
In general I have to admit that "better" can mean anything and nothing at the same time. So I probably should be more specific, but I tried not to overload the topic. In a regular hosted environment on one of those cheap web hosts (like DreamHost), with around 1000 articles in Joomla, a couple of users, and a few hundred visitors a day, would an SQLite database with a persistent connection (sqlite_popen) perform noticeably faster than the MySQL equivalent (with its TCP/IP overhead, etc.)?
Or in short: would it be wise to call on Joomla to support SQLite?
I have never used SQLite on a website, but I have used it extensively for other purposes and I quite like it. The truth is, you won't know until you try. If you do try, I recommend creating a DB abstraction layer first, so that you can easily swap in other DBs (see the sketch below).
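A minimal sketch of such an abstraction layer; the class, table, and placeholder-handling scheme are all invented for illustration:

```python
import sqlite3

class Database:
    """Thin wrapper around a DB-API 2.0 connection, so backends can be swapped."""

    def __init__(self, conn, placeholder="?"):
        self.conn = conn
        self.placeholder = placeholder  # "?" for sqlite3, "%s" for MySQL drivers

    def _prepare(self, sql):
        # SQL is written with a neutral {p} marker and rewritten per backend.
        return sql.replace("{p}", self.placeholder)

    def query(self, sql, params=()):
        cur = self.conn.cursor()
        cur.execute(self._prepare(sql), params)
        return cur.fetchall()

    def execute(self, sql, params=()):
        cur = self.conn.cursor()
        cur.execute(self._prepare(sql), params)
        self.conn.commit()

# SQLite today...
db = Database(sqlite3.connect("site.db"))
db.execute("CREATE TABLE IF NOT EXISTS articles (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO articles (title) VALUES ({p})", ("Hello",))
print(db.query("SELECT title FROM articles ORDER BY id"))
# ...and later, e.g. Database(mysql_conn, placeholder="%s") with no caller changes.
```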
The downside to SQLite is that it's not really meant to be a multi-user database. If you rarely write to the DB but do lots of reading, SQLite will probably be fine. If you find that you need multiple processes writing to the same DB, though: I believe SQLite uses file-level locking to maintain database consistency. So, since all your tables are in the same file, you'll lock the whole file while it's being written to, even if another process wants to modify a completely different table.
In my opinion, it's not the big multi-user databases of the world that should be worried about competition from SQLite... it's all the regular files out there (and their custom file formats) that applications create and use that should be shaking in their boots about SQLite...
Linux ISPs, for whatever reason, seem to have settled on MySQL. That is what they offer, and you will lock yourself into a limited number of service providers if you wander outside the norm.
