The NoSQL database most similar to RethinkDB

I need to do a project in RethinkDB. However, it doesn't run natively on Windows. Is there a database that is considered most similar, so that I can learn the ropes and then start my project in a UNIX environment? I've heard this could be MongoDB or CouchDB, but I would like some opinions.
Thanks

There are many different ways to tackle this problem.
Installing RethinkDB in Windows
First, you can, in fact, get RethinkDB working on Windows. You just have to use a virtual machine. Here is a tutorial on how to install RethinkDB on Windows. That being said, it's not as simple to get it running on Windows as in UNIX environments.
Similar Databases
Which NoSQL database is most similar to RethinkDB depends on which of its features you are looking for.
If you're looking for a NoSQL JSON document store with a similar API to use in your server, MongoDB is probably the closest thing to RethinkDB. That being said, there are some pretty big differences between MongoDB and RethinkDB. I would take both an objective look at the differences and a more opinionated look at the differences.
If you're looking for something that provides realtime features similar to RethinkDB's, you might take a look at Firebase. Firebase is a hosted service rather than a database you run yourself, but it provides some of the realtime features of RethinkDB.
CouchDB might also be a good option, since it's also an open-source NoSQL database and it has support for change notifications, which are a way to pull changes from a particular table (different from changefeeds' push model).
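For a sense of the push model mentioned above, here is a minimal changefeed sketch using the RethinkDB Python driver (the connection details and the `scores` table are made up for illustration):

```python
import rethinkdb as r  # classic driver import; newer drivers use `from rethinkdb import RethinkDB`

# Hypothetical connection details and table name.
conn = r.connect(host='localhost', port=28015, db='test')

# Open a changefeed: the server pushes a document every time the table changes.
feed = r.table('scores').changes().run(conn)
for change in feed:
    # Each change carries the previous and new versions of the document
    # ('old_val' is None for inserts, 'new_val' is None for deletes).
    print(change['old_val'], '->', change['new_val'])
```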
Native RethinkDB for Windows
If you're interested in keeping track of native Windows support, you should take a look at this GitHub issue.

Related

How to handle database in a gem intended for system-wide command line tool

I'm developing a gem that's basically used as a system-wide command line tool. This gem stores the data the app needs in a database. I'm wondering if there's any de-facto-standard-ish way to handle a database in this situation.
So far, I'm thinking of using sqlite3 because I don't want users to go through the pain-in-the-ass process of installing a system-wide MySQL or PostgreSQL. (And yup, I'm using a relational database; SQLite is more than enough in terms of performance etc., since my app is just a small, simple one.)
If this is the right decision, the question boils down to where I should put the sqlite3 database file. Putting it under the gem directory definitely isn't a good idea, and so far I'm thinking of locating it at /usr/local/MY_GEM/*.
Sorry if the question sounds a bit vague to some people, but if I were to define a single question, it would be "Am I doing all right?" or "Do you guys have any better ideas?".
Is the database completely user specific? Or is it static data your app needs? If it's user specific I'd put it in ~/.my_gem.db or ~/.my_gem/data.sqlite3 or similar.
@Philip's answer is great, but you might also be interested in the XDG Base Directory Specification: http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html I believe it is aimed at desktop applications, but it provides a specification for per-user data and config as well as system-wide configuration. High-profile applications like Chrome and Inkscape seem to make use of this specification.
The spec is based on environment variables but the result will likely be something like this:
~/.config/mygem/myconfig.db
~/.local/share/mygem/mydata.db
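As a rough illustration of how those paths get resolved, here is a small sketch (in Python rather than Ruby, purely for illustration; `mygem` and the file names are placeholders):

```python
import os
import sqlite3

APP = "mygem"  # placeholder application name

def xdg_data_home():
    # Per the XDG Base Directory spec, fall back to ~/.local/share
    # when XDG_DATA_HOME is unset or empty.
    return os.environ.get("XDG_DATA_HOME") or os.path.expanduser("~/.local/share")

data_dir = os.path.join(xdg_data_home(), APP)
os.makedirs(data_dir, exist_ok=True)

# Open (or create) the per-user SQLite database under the XDG data directory.
conn = sqlite3.connect(os.path.join(data_dir, "mydata.db"))
```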

PostgreSQL config settings on dynamically created EC2 instances

Let me start by saying that I think there is a better way of doing things than I'm doing now... so, please don't post comments and answers saying that I should be using a different technology, etc. I have a "reasonably" specific question.
A little background:
Basically, I have a system where I'm processing a lot of varied but fairly structured data feeds each day (CSV files). It's a fairly generic ETL type of system. I started off writing Python scripts to do it all in memory. But I found that I was writing a lot of code to check and enforce rules that could easily be described by a db schema. So I've got a series of SQS queues (one for each source) holding the locations of files (on S3) to process, and a PostgreSQL db script to do the loading. Hacky? Yes, probably. But, in a way, it's pretty easy to just define all of your rules in PostgreSQL. At least for me, with approx 15 years of RDBMS experience (what's that old saying about when you only have a hammer, everything looks like a nail?).
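A rough sketch of what one worker iteration in such a setup might look like (queue URL, bucket, table, and connection details here are hypothetical, and boto3/psycopg2 are used purely for illustration):

```python
import boto3          # AWS SDK for Python
import psycopg2       # PostgreSQL driver

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/feed-queue"  # hypothetical

# Pull one message; its body is assumed to hold the S3 key of a CSV file to load.
msg = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)["Messages"][0]
s3.download_file("my-feed-bucket", msg["Body"], "/tmp/feed.csv")

# Let the PostgreSQL schema enforce the rules while bulk-loading the file.
conn = psycopg2.connect("dbname=etl user=worker")
with conn, conn.cursor() as cur, open("/tmp/feed.csv") as f:
    cur.copy_expert("COPY staging_feed FROM STDIN WITH CSV HEADER", f)

# Only remove the message once the load has committed.
sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```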
So, all works pretty well. But, when creating EC2 instances, I have a choice of an image_id and a type/size. I have my base "PostgreSQL worker image" that I use, but it's really geared for one size (micro).
But now I'm thinking about playing around to see what kind of gains I could get if I went with small or medium instances. My initial thought is that I would just create separate image_ids with postgres conf settings geared to each size. But that seems a bit messy. (Then again, the whole thing is a bit messy and hacky.)
Given what I have in place, is there a better way to accomplish this than just separate AMIs?
Final notes:
My AMIs are all PostgreSQL 9.1 and Ubuntu 12.04. And the DBs are just temporary storage. They only exist for the 15 or 20 minutes they are needed to load/process/output the data.
If you feel like this question could be better answered on the SE's DBA site, then please feel free to add a comment. I usually start with StackOverflow because it's a bigger community and it's a community that I feel more at home with. I'm much more of a developer than a DBA.
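One way to avoid baking a separate AMI per instance size is a boot-time step that sizes the Postgres memory settings from the instance itself. A hypothetical sketch follows; the ratios are generic rules of thumb, and the conf path and service command assume the Ubuntu 12.04 / PostgreSQL 9.1 setup described above, with the file pulled in via an `include` in postgresql.conf:

```python
import subprocess

def total_mem_mb():
    # Read physical memory from /proc/meminfo (Linux only).
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) // 1024  # kB -> MB
    raise RuntimeError("MemTotal not found")

mem = total_mem_mb()
settings = {
    "shared_buffers": f"{mem // 4}MB",           # ~25% of RAM, a common starting point
    "effective_cache_size": f"{mem * 3 // 4}MB",
    "work_mem": f"{max(mem // 100, 1)}MB",
}

# Write the computed values to a file included by postgresql.conf (hypothetical path),
# then restart Postgres so they take effect.
with open("/etc/postgresql/9.1/main/tuning.conf", "w") as f:
    for key, value in settings.items():
        f.write(f"{key} = {value}\n")

subprocess.run(["service", "postgresql", "restart"], check=True)
```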

Are Pentaho ETL and Data Analyzer a good choice?

I was looking for an ETL tool and on Google found a lot about Pentaho Kettle.
I also need a data analyzer to run on a star schema so that business users can play around and generate any kind of report or matrix. Again, Pentaho Analyzer is looking good.
The other part of the application will be developed in Java, and the application should be database agnostic.
Is Pentaho good enough, or are there other tools I should check out?
Pentaho seems to be pretty solid, offering the whole suite of BI tools, with improved integration reportedly on the way. But the chances are that companies wanting to go the open-source route for their BI solution are also most likely to end up using open-source database technology, and in that sense "database agnostic" can easily be a double-edged sword. For instance, you can develop a cube in Microsoft's Analysis Services in the comfortable knowledge that whatever MDX/XMLA your cube sends to the database will be interpreted consistently, holding very little in the way of nasty surprises.
Compare that to the Pentaho stack, which will typically end up interacting with PostgreSQL or MySQL. I can't vouch for how PostgreSQL performs in the OLAP realm, but I do know from experience that MySQL, for all its undoubted strengths, has "issues" with the types of SQL that typically crop up all over the place in an OLAP solution (you can't get far in a cube without using GROUP BY or COUNT DISTINCT). So part of what you save in licence costs will almost certainly be spent solving issues arising from the fact that Pentaho doesn't always know which database it is talking to; robbing Peter to (at least partially) pay Paul, so to speak.
Unfortunately, more info is needed. For example:
will you need to exchange data with well-known apps (Oracle Financials, Remedy, etc)? If so, you can save a ton of time & money with an ETL solution that has support for that interface already built-in.
what database products (and versions) and file types do you need to talk to?
do you need to support querying of web-services?
do you need near real-time trickling of data?
do you need rule-level auditing & counts to account for every single row?
do you need delta processing?
what kinds of machines do you need this to run on? linux? windows? mainframe?
what kind of version control, testing and build processes will this tool have to comply with?
what kind of performance & scalability do you need?
do you mind if the database ends up driving the transformations?
do you need this to run in userspace?
do you need to run parts of it on various networks disconnected from the rest? (not uncommon for extract processes)
how many interfaces and of what complexity do you need to support?
You can spend a lot of time deploying and learning an ETL tool - only to discover that it really doesn't meet your needs very well. You're best off taking a couple of hours to figure that out first.
I've used Talend before with some success. You create your transformation by chaining operations together in a graphical designer. There were definitely some WTFs, and it was difficult to deal with multi-line records, but it worked well otherwise.
Talend also generates Java and you can access the ETL processes remotely. The tool is also free, although they provide enterprise training and support.
There are lots of choices. Look at BIRT, Talend and Pentaho, if you want free tools. If you want much more robustness, look at Tableau and BIRT Analytics.

Which full-text search package should I use for SQLite3?

SQLite3 appears to come with three different full-text search engines, called FTS1, FTS2, and FTS3. The documentation available on the website mentions that FTS1 is stable, FTS2 is in development, and that you should use FTS2. Examples I find online use FTS3, which is in CVS and not documented, unlike FTS2. None of the full-text search engines come with the amalgamated source, as near as I can tell.
So, my question: which of these three engines, if any, should I use for full-text indexing in SQLite? Or should I simply use a third-party tool like Sphinx, or a custom solution in Lucene, instead?
As of 3.6.21, FTS3 is well documented and has gained a more officially visible status.
FTS3 is part of the standard SQLite DLL build on Windows; I'm not sure about the amalgamated source.
We've been using it in production for about a year with no particular issues.
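For reference, using FTS3 from Python's built-in sqlite3 module looks roughly like this (it assumes the underlying SQLite library was compiled with FTS enabled; the table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a full-text indexed virtual table (requires an FTS-enabled SQLite build).
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(title, body)")
conn.execute("INSERT INTO docs VALUES (?, ?)",
             ("sqlite intro", "SQLite ships with optional full-text search modules"))
conn.commit()

# MATCH runs the full-text query against the indexed columns.
for (title,) in conn.execute("SELECT title FROM docs WHERE docs MATCH 'search'"):
    print(title)
```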
I've looked into full-text solutions recently too. It seems like SQLite has no de facto choice right now. No matter what you choose, it's inevitable that you'll have to re-architect as the various FTS2, FTS3, etc. solutions mature. So bite the bullet and assume you'll need to do more development in the future to keep pace with changing full-text technology.
Sphinx Search has no direct support for SQLite yet. It supports only MySQL and PostgreSQL right now (ca. August 2009). So you'd have to hack your own SQLite connector or else migrate SQLite data to MySQL or PostgreSQL and then index the data with Sphinx Search. I think someone is working on a Sphinx Search patch to support Firebird, so maybe it's not so hard if you're willing to roll up your sleeves.
Also be aware that Sphinx Search has some limitations about incrementally adding data to the index. You should spend an hour or so reading the doc before you decide to use it.
I don't know of any direct way to index SQLite data in Lucene either. You'd probably have to write your own code to process batches of SQLite data, adding rows to the Lucene index one at a time. This seems to be how Lucene is used no matter what the database is.
Update: Solr is a great companion technology for Lucene. Solr gives that search engine many features, including the ability to bulk-load query result data from any JDBC data source.

How to implement in-process full text search engine

In one of our commercial applications (Win32, written in Delphi) we'd like to implement full-text search. The application stores user data in some kind of binary format that is not directly recognizable as text.
Ideally, I'd like to find either an in-process solution (DLL would be OK) or a local server that I could access via TCP (preferably). The API should allow me to submit a textual information to the server (along with the metadata representing the binary blob it came from) and, of course, it should allow me to do a full-text search with at least minimal support for logical operators and substring searching. Unicode support is required.
I found an extensive list of search engines on Stack Overflow (What are some Search Servers out there?), but I don't really understand which of those engines could satisfy my needs. I thought I'd ask The Collective for opinions before I spend a day or two testing each of them.
Any suggestions?
There are a number of options on the market, either fully fledged commercial products or open-source variants. Your choice of search provider depends heavily on the customers you are targeting.
Microsoft has a free Express version of their Search Server. As far as I know the Express edition is limited to running the Application Tier on one server.
There is also the Apache Lucene project which is open source. It has a nice API that's easy to use and a large community of users. The original project is based on Java, but there are also other implementations such as NLucene for .NET that I have used personally.
I'd recommend having a look at SQLite -- full-text search is included in the latest version.
I suppose the answer depends on your db. For example SQL Server has full text search and also English Language Queries if ever needed.
Take a look at using PostgreSQL and tsearch.
Try using PostgreSQL with tsearch.
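To give a feel for what the tsearch route looks like (full-text search has been built into PostgreSQL core since 8.3), here is a small sketch via psycopg2; the table, data, and connection details are hypothetical:

```python
import psycopg2  # PostgreSQL driver

conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection string
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS notes (id serial PRIMARY KEY, body text)")
cur.execute("INSERT INTO notes (body) VALUES (%s)",
            ("PostgreSQL has built-in full text search",))
conn.commit()

# to_tsvector/plainto_tsquery handle tokenising and stemming; a GIN index on
# to_tsvector('english', body) would make this fast on real data.
cur.execute("""
    SELECT body
    FROM notes
    WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
""", ("searching",))
print(cur.fetchall())
```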
Sphinx is probably the most efficient and scalable option, while SQLite's FTS3 is the most straightforward one.
While not in-process, Solr is very fast (based on Lucene) and easily accessible from any platform (HTTP).

Resources