How to integrate Oracle and Kafka

I've been trying to find the most efficient/effective way to capture change notifications in a single Oracle 11g R2 instance and deliver those events to an Apache Kafka queue, but I haven't been able to find any simple examples or tutorials along these lines.
I've seen some possibilities on the Oracle side (Streams, Change Data Capture, triggers (yuck), etc.), but I'm still not sure which would be best to pursue.
There is a project on GitHub called mypipe that does this for MySQL and Kafka; I just haven't seen anything similar for Oracle. I'm not sure whether it would be best to focus on writing an Oracle package for this, a layer similar to the mypipe project, etc.
Any recommendations, suggestions or examples would be greatly appreciated. Thank you.

There is currently just one tool that is open source and has minimal impact on the database: OpenLogReplicator.
the license is GPL - it is fully open source
it has very low impact on the source database - it requires no extra licensing options, just turning on supplemental logging on the source (like all other replication tools)
it is written completely in C++ - so it has very low latency and high throughput
it works completely in memory
it supports all Oracle database versions since 11.2.0.1 (11.2, 12.1, 12.2, 18, 19)
It reads the binary format of the Oracle redo logs and sends the changes to Kafka. It can run on the database host, but you can also configure it to read the redo logs over sshfs from another host - with minimal load on the database.
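For reference, a minimal sketch of what the consuming side could look like, reading those change events from Kafka with kafkajs; the broker address, topic name, and JSON payload shape are assumptions that depend on how the replicator is configured.

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "oracle-cdc-reader", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "oracle-cdc" });

async function run(): Promise<void> {
  await consumer.connect();
  // Topic name is an assumption; use whatever the replicator writes to.
  await consumer.subscribe({ topic: "ORCL", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // Each message carries one committed change (insert/update/delete) as JSON.
      const change = JSON.parse(message.value?.toString() ?? "{}");
      console.log(change);
    },
  });
}

run().catch(console.error);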
disclaimer: I am the author of this solution

I think one approach might be to utilize Oracle GoldenGate for Big Data (researching this myself); obviously it's most likely a costly solution ($).
https://blogs.oracle.com/dataintegration/entry/introducing_oracle_goldengate_for_big
Let me know if you got anywhere with this, good luck ...

Related

Tibco EMS HA/DR solution

I am new to TIBCO EMS. We are currently using EMS 8 and are looking for an HA/DR option for it. I have heard of using Veritas for this purpose, but that might not be an option for us for now.
So I am looking for an open source alternative to Veritas. I have also seen a few discussions where people suggested using an RDBMS/MySQL for this, but I am not sure how to do it.
Can someone please point me in the right direction?
For High Availability of EMS there are two modes you could consider. The first mode is called "Unshared State" and means that while two servers act as a fault-tolerant pair, their state (and thus the messages) is not shared between the two. The other mode is called "Shared State", in which the secondary server has access to the state in case the primary server goes down. I've seen both being used for different types of use cases, so you'll have to judge for yourself which fits best.
If you want to go with a shared state, you'll have to make sure that both servers can access the state and generally speaking you'll have two options to do so:
Filesystem
Database
For the database option, there are a few databases that are supported by TIBCO. Please refer to the EMS User Guide, page 343 for more details on the supported databases as well as how to set up the data stores.
For the filesystem option, you'll have to make sure that your filesystem (either software or hardware) supports the four main characteristics that EMS needs:
Write Order
Synchronous Write Persistence
Distributed File Locking
Unique Write Ownership
Source: EMS User Guide, page 520
I've seen Veritas being used a lot, though I've also seen people use a clustered file system (like RedHat GFS or Oracle OCFS). Please be aware that depending on which option you choose, you want to properly test your scenarios and potentially reach out to TIBCO Support.
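For the shared-state route, the fault-tolerant pair boils down to two servers that use the same server name, share one store, and point at each other. A hypothetical tibemsd.conf sketch follows; the server name, paths, and URLs are placeholders, so check the EMS User Guide for your version.

# --- primary member's tibemsd.conf ---
server    = EMS-PROD                   # both members must use the same server name
store     = /shared/ems/datastore      # must sit on storage meeting the four
                                       # characteristics listed above
ft_active = tcp://secondary-host:7222  # URL of the other member

# --- secondary member's tibemsd.conf ---
server    = EMS-PROD
store     = /shared/ems/datastore
ft_active = tcp://primary-host:7222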

PostgreSQL from NodeJS application

I am exploring how best to access a PostgreSQL/PostGIS DB from NodeJS. All I need is simple SQL SELECT queries. Nothing more complex than:
SELECT *
FROM portal.catalog AS cat
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((20 50, 19 50, 19 49, 20 50))'), cat.gpoly)
LIMIT 5000;
This will be on a Windows 7 or Windows 2008 server running PostgreSQL 9.2/PostGIS 2.0. The traffic will be pretty light (only a few requests per minute).
Some preliminary research I have done has come up with the following potential directions. But I was interested in hearing from others what is working for them (as an easy implementation).
https://github.com/brianc/node-postgres (but I am having trouble building it due to firewall issues; supposedly the "pure" JavaScript solution is better, but I am having issues there also: https://github.com/brianc/node-postgres-pure)
http://www.infoq.com/articles/the_edge_of_net_and_node (And then I guess I would write my own ADO.NET adapter to PostgreSQL)
I have also seen references to ODBC for NodeJS (unclear whether this is the way to go).
Is there something like the SQL adapter for NodeJS? http://blogs.msdn.com/b/sqlphp/archive/2012/06/08/introducing-the-microsoft-driver-for-node-js-for-sql-server.aspx
There was also a full-blown ORM by EntitySpaces (the company went bankrupt). It is now a defunct open source project: https://github.com/EntitySpaces/entityspaces.js
I've used node-postgres in the past, but recently opted for any-db, which has support for PostgreSQL.
Both have worked well, although I prefer any-db, particularly with respect to pooling and transactions. I believe any-db deserves more recognition.
Any-db is layered on top of BrianC's node-postgres.
But I just got https://github.com/brianc/node-postgres-pure working, and it is a pleasure.
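For anyone landing here later, a minimal sketch of the question's query through node-postgres' connection pool; the connection settings are placeholders, and this uses the promise-based API of more recent pg releases rather than the callback style that was current at the time.

import { Pool } from "pg";

const pool = new Pool({
  host: "localhost",
  database: "portal",
  user: "portal_reader",  // placeholder credentials
  password: "secret",
  max: 10,                // a small pool is plenty for a few requests per minute
});

const polygon = "SRID=4326;POLYGON((20 50, 19 50, 19 49, 20 50))";

async function intersecting(): Promise<void> {
  // Parameterized query: the polygon is passed as $1 instead of being inlined.
  const res = await pool.query(
    `SELECT *
       FROM portal.catalog AS cat
      WHERE ST_Intersects(st_geogfromtext($1), cat.gpoly)
      LIMIT 5000`,
    [polygon]
  );
  console.log(`${res.rows.length} rows`);
}

intersecting().catch(console.error);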
EntitySpaces is the way to go; it is not defunct at all.
http://download.cnet.com/EntitySpaces-Studio/3000-10250_4-10590953.html
I got BrianC's node-postgres-pure system working (it must have been a dependent module malfunctioning earlier, since I did not do anything special).
It works just great.
See: https://github.com/brianc/node-postgres-pure

Migrating from Oracle to PostgreSQL: where are the limits?

I've found similar posts on this forum, but feel free to migrate this question to another one if needed.
We want to migrate to PostgreSQL from Oracle, but we have 6000 users simultaneously connected to a 4 TB GIS database (divided into 1 TB instances) and many other instances for web services.
Before looking at other problems: we have heard that around 500 concurrently connected users is the practical limit before performance decreases, and that the decrease gets worse as the database grows huge.
Do you have any successful experience with (or know of links about) such a migration? Do we have to wait for better PostgreSQL performance before migrating?
EDIT
Found another example.
Please read this article on the subject by Kevin Grittner; it explains a lot about why many connections are problematic and how the PostgreSQL core team decided to approach this issue.
For a list of success stories, refer to the EnterpriseDB site; this is a company offering support for the standard PostgreSQL distribution as well as support and licensing for the advanced products built on top of the standard distribution. For enterprise database usage, Postgres Plus Advanced Server might be a good choice.
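The practical takeaway from Grittner's article is to keep the number of active server connections small and queue the rest, which is usually done with a connection pooler in front of PostgreSQL. A hypothetical PgBouncer sketch; the database name and pool sizes are placeholders to adapt to your hardware.

; pgbouncer.ini - 6000 client connections funnelled into a small server pool
[databases]
gisdb = host=127.0.0.1 port=5432 dbname=gisdb

[pgbouncer]
listen_addr       = *
listen_port       = 6432
pool_mode         = transaction   ; a server connection is held only per transaction
max_client_conn   = 6000          ; all application users can stay connected
default_pool_size = 20            ; roughly cores * 2 + spindles, per Grittner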

MonetDB - anyone using it in production?

I am very interested in using MonetDB as a datamart, holding some huge data tables for querying and reporting.
However, after some searching, I am unable to find any online posts/blogs regarding the use of MonetDB in any kind of production capacity.
Also, there seems to be little or next to no activity online regarding MonetDB.
Is this a bad sign for the future of MonetDB?
> I am very interested in using MonetDB as a datamart, holding some huge data tables for querying and reporting
My boss is also interested in MonetDB and I had the same reaction as you. No one is writing about MonetDB... is no one using MonetDB?
Regardless, I have been running performance tests on datasets of 500,000 to 1,000,000 records, comparing MonetDB (column-oriented DBMS) vs. MySQL (row-oriented DBMS), and MonetDB beats MySQL in all regards - even in bulk inserts... which, hypothetically, it should not be as good at.
I can't speculate as to what all this means for MonetDB's future, but while it's around you might want to check it out because it performs well.
(I run Windows 7 and am communicating with each database using PHP)
I'm reacting a bit late to this post, but I'd like to add my voice to those using MonetDB in a production environment. We use it as the back-end of Spinque, a framework for designing complex search solutions. I've been using MonetDB for about 10 years, but only in the past 3 years in a production environment. Clearly, it has pros and cons and bugs like all other products, but it is being developed and improved very actively (I don't understand the low-activity signs that you refer to). If you want a DB that allows you to be ahead of the market standards, it's a good choice. Otherwise, just go for MS SQL ;)
I've been evaluating it lately for a client so I've had some time with it. My impression at this point is that it is just finishing "growing up" from being an academic experimental playground. It clearly has yet to be really discovered, though it does have some rough edges which might hinder certain applications.
As I write, I'm in the process of trying to load over 100 million rows into an instance (at 27 million presently). So far, it performs startlingly well in some areas (aggregates), but is oddly sluggish in others (most joins I've tried so far); that said, I've not yet run the recommended sampling process, and I'm forcing it to live on just a single server with 32 GB of RAM.
I've found a few little glitches and one thing that caused a full service crash (obscure and reported), but I'm thinking that for many applications MonetDB could be just the ticket. Columnar storage (rather than NoSQL) seems to be the future IMO.
I'll update this if I find anything particularly interesting.
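On bulk loading at that scale: MonetDB's COPY INTO is the usual route rather than row-by-row INSERTs, and declaring the record count up front lets it preallocate the columns. A hypothetical example; the table name, file path, and count are placeholders.

-- Bulk load a CSV file; table and path are placeholders.
COPY 100000000 RECORDS
INTO measurements
FROM '/data/measurements.csv'
USING DELIMITERS ',', '\n'
NULL AS '';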
MonetDB is first and foremost a research system, but it has progressed far beyond the level of the average research prototype. It is the (only) relational column-store platform in open source that I know of that supports full SQL. I have used it myself at CWI in many research projects that are not core DB research, but do need advanced DB technology.
You can see on the users' mailing list that deployments happen in many different organisations. As Roberto Cornacchia stated in a different answer, it is the backend of all Spinque deployments and we are happy MonetDB users. MonetDB is also used in a variety of non-profit projects like OpenStreetMap and Open KvK.
More and more commercial parties deploy MonetDB for analytics. (They do not always like to advertise that their analyses depend on an open source system.) Recently, MonetDB Solutions has started to provide dedicated commercial support for these deployments.
We have been using MonetDB in our business. We analyse very large data sets with many millions of rows. Traditional methods of data warehousing on SQL databases had become too slow. The problem we were facing was that the data was only going to get bigger! The only way forward was to go columnar.
The results have been amazing. When you have very few joins it is staggeringly quick. Even with joins on the data sets we are looking at it is still frightening how fast it comes back.
Having seen some of the commercial partnerships, I think MonetDB is going to boom over the next few years. I believe some of the major BI suppliers are using MonetDB under the hood to perform the large data work.

SQLite for client-server

I've seen a couple of SQLite performance questions here on Stack Overflow, but the focus was on websites, and I'm considering using this DB in a client-server scenario:
I expect 1-10 clients for one server for now, could go up to 50 or more in the future.
slightly more reads than writes
the DB would sit behind a server process (i.e., not using direct DB access over the network)
Would using SQLite make the app less responsive as opposed to using PostgreSQL? My intuition tells me that it should be ok for these loads, but maybe someone has some practical experience with this kind of scenario.
I did use SQLite for a major client/server product used by ~10 concurrent users and I deeply regret that decision. In my opinion, PostgreSQL is much more suitable for client/server scenarios than SQLite due to its finer locking granularity.
You simply can't get very far when the entire database is locked whenever someone needs to write something...
I like SQLite very much (I even wrote a commercial utility for comparing SQLite databases - SQLite Compare), but I don't think it fits the bill when you have client/server scenarios.
Even SQLite's author says that it should be used as a replacement for custom file formats and not as a full-blown database server. I wish I had taken his advice more seriously.
You didn't mention which operating system and PostgreSQL versions you are using. However, before considering a change of database engine, try to do some logging and benchmarking of your current database under typical usage, then optimize the "heaviest" queries. And maybe your backend processing load makes DB query time irrelevant? As SQLite is a file-based DBMS, concurrent access from multiple processes will degrade performance as the number of clients grows (edited after comment).
The following question may be helpful: How Scalable is SQLite?
I would concur with S.Lott's answer.
I don't know how SQLite performs in comparison to PostgreSQL, since I don't know of any newer measurements, but my own experience with SQLite in a rather similar environment is rather good.
The only thing that might cause trouble, in my view, is that you have rather many writes. But it all depends on the total number per second, I would say.
Also, your design of having a single server process is optimal for SQLite in my opinion - that way you circumvent its weakness in handling concurrent access.
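If that single-server-process design is kept, most of the locking pain described above can be sidestepped: the server serializes writes itself, and switching SQLite to WAL mode (available since 3.7.0) lets readers proceed while a write is in flight. A sketch of that pattern, assuming Node with the better-sqlite3 library and a hypothetical events table:

import Database from "better-sqlite3";

// One process owns the file; all client requests funnel through it.
const db = new Database("app.db");
db.pragma("journal_mode = WAL"); // readers and the writer no longer block each other
db.exec("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)");

const insert = db.prepare("INSERT INTO events (payload) VALUES (?)");
const select = db.prepare("SELECT * FROM events ORDER BY id DESC LIMIT 10");

// Writes are naturally serialized by the single process, so there is no
// database-level lock contention between clients.
export function recordEvent(payload: string): void {
  insert.run(payload);
}

export function latestEvents(): unknown[] {
  return select.all();
}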
