Dataflow output to Neo4j using JdbcIO vs Java driver

I'm currently working on a setup that uses Google Cloud Dataflow to transform data and save it into a Neo4j database hosted on a Compute Engine VM. The current setup uses JdbcIO to write to Neo4j by running a prepared statement, but it also seems possible to use the Neo4j driver directly in Java, which allows more flexibility in building the query dynamically.
I wonder if anyone has compared the two approaches and noticed any differences? I would guess the JDBC approach would be more efficient, since it only needs to run the prepared statement, but that is just my guess.
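For concreteness, here is a minimal sketch of the JdbcIO side, assuming the Neo4j JDBC (Bolt) driver is on the classpath; the URL, credentials, label, and property names are placeholders, not my actual setup:

    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PDone;

    public class Neo4jJdbcWrite {
        // Writes (id, name) pairs to Neo4j through the Neo4j JDBC driver.
        static PDone writeToNeo4j(PCollection<KV<String, String>> rows) {
            return rows.apply(JdbcIO.<KV<String, String>>write()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration
                    .create("org.neo4j.jdbc.Driver", "jdbc:neo4j:bolt://neo4j-host:7687")
                    .withUsername("neo4j")
                    .withPassword("password"))
                // The Neo4j JDBC driver accepts Cypher as the statement text.
                .withStatement("MERGE (p:Person {id: ?}) SET p.name = ?")
                .withPreparedStatementSetter((element, statement) -> {
                    statement.setString(1, element.getKey());
                    statement.setString(2, element.getValue());
                }));
        }
    }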
Update
I'm going to post my findings as I experiment more with the two methods.

One difference I found was that when using JdbcIO, I ran into a deadlock when multiple transactions tried to write to the same node at the same time. However, that issue did not occur when using the Neo4j driver directly.
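One likely factor: the official driver's transaction functions (writeTransaction) automatically retry transient failures, and Neo4j reports detected deadlocks as transient errors, whereas JdbcIO simply executes the prepared statement once. A minimal sketch of the direct-driver approach as a Beam DoFn, with a placeholder URI, credentials, and Cypher statement:

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import org.neo4j.driver.v1.AuthTokens;
    import org.neo4j.driver.v1.Driver;
    import org.neo4j.driver.v1.GraphDatabase;
    import org.neo4j.driver.v1.Session;
    import org.neo4j.driver.v1.Values;

    // Writes each element with the official Neo4j Java driver.
    // writeTransaction retries transient failures such as deadlocks,
    // which may explain why the deadlock did not surface here.
    public class Neo4jWriteFn extends DoFn<KV<String, String>, Void> {
        private transient Driver driver;

        @Setup
        public void setup() {
            driver = GraphDatabase.driver("bolt://neo4j-host:7687",
                    AuthTokens.basic("neo4j", "password"));
        }

        @ProcessElement
        public void processElement(ProcessContext c) {
            try (Session session = driver.session()) {
                session.writeTransaction(tx -> tx.run(
                        "MERGE (p:Person {id: $id}) SET p.name = $name",
                        Values.parameters("id", c.element().getKey(),
                                          "name", c.element().getValue())));
            }
        }

        @Teardown
        public void teardown() {
            if (driver != null) {
                driver.close();
            }
        }
    }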

Related

Neo4j to Oracle real time data sync

In one of the use cases in my application there is a requirement to publish Neo4j transaction data to an Oracle database in real time. I googled it, but couldn't find a tool or plug-in that can help; everything on the internet talks about RDBMS-to-Neo4j sync. So I am planning to do this by manually invoking JDBC commands.
Can you please suggest something?
In the end I had to write my own JDBC code.
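For anyone with the same requirement, a rough sketch of what that hand-rolled approach can look like, using Neo4j's TransactionEventHandler API (embedded mode or a server extension, Neo4j 3.x here) to push committed changes to Oracle over JDBC. The Oracle URL, target table, and property mapping are assumptions, and real code would need connection pooling plus handling for updates and deletes:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.event.TransactionData;
    import org.neo4j.graphdb.event.TransactionEventHandler;

    // After each Neo4j commit, mirrors newly created nodes to Oracle.
    public class OracleSyncHandler extends TransactionEventHandler.Adapter<Void> {

        private static final String ORACLE_URL =
                "jdbc:oracle:thin:@oracle-host:1521:ORCL"; // placeholder

        @Override
        public Void afterCommit(TransactionData data, Void state) {
            try (Connection conn = DriverManager.getConnection(
                         ORACLE_URL, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO NEO4J_NODES (NODE_ID, NAME) VALUES (?, ?)")) {
                for (Node node : data.createdNodes()) {
                    ps.setLong(1, node.getId());
                    ps.setString(2, (String) node.getProperty("name", null));
                    ps.addBatch();
                }
                ps.executeBatch();
            } catch (Exception e) {
                throw new RuntimeException("Oracle sync failed", e);
            }
            return null;
        }
    }

    // Registered once on the database, e.g.:
    // graphDb.registerTransactionEventHandler(new OracleSyncHandler());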

Using Saiku-ui without saiku-server/mondrian?

Is it possible to use the saiku-ui component with a different JOLAP provider than Mondrian, or with a different server backend than the saiku-server component?
I have been looking, but I have not found an architecture description of how these pieces fit together and what interfaces they use to communicate. Can anyone point me towards an understanding of what the saiku-ui wants to speak to and what the saiku-server provides?
The reason for my interest is that I have a set of data spread across hundreds of CSV files that I would like to query with a pivot and charting tool. It looks like the standard way to use this with Saiku would be an ETL process to load into an RDBMS. However, this would not be a simple process, because the files, their content, and the way the files relate to each other vary, so the ETL would have to do a lot of inspection of the data sources to figure it out.
Given this it seems to me that I would have three options in how to use saiku:
1) Write a complex ETL to load into an RDBMS, and then use a standard JDBC driver to provide the data to Mondrian. A side function of the ETL would be to analyze the inputs and write the Mondrian schema file describing the cubes.
2) Write a JDBC driver to access the data natively. This driver would parse SQL and provide access to the underlying tables; essentially, a custom read-only DBMS written on top of the CSV files. The JDBC connection would be used by Mondrian to access the data. A side function of this custom DBMS would be to produce the Mondrian schema file.
3) Write a tool that provides a JOLAP interface to the native data (accepting discovery and MDX queries). This would bypass Mondrian entirely and interface with the UI.
I may be a bit naive here, but I consider each of the three options to be feasible. Option #1 is my least preferred because of the likelihood of the data in the RDBMS becoming out of sync with the CSV files. Option #3 is most preferred because the data are simple, so not much aggregating is required, and I suspect that MDX will be easier to parse than SQL.
So, if I could produce my own JOLAP data source, would it be possible to hook the saiku-ui tools up to it? Where would I look to find out the interface configuration details?
Many years ago, Roland Bouman created xmondrian: a set of tools with an OLAP server and web UI tools for XMLA browsing and visualisation. But that project is no longer updated and has no source code available.
I just updated the OLAP server and libraries to the latest versions.
You may get it here and build it:
https://github.com/Muritiku/xmondrian-build
You may use the web package as an example. The Mondrian server works with the saiku-ui.
IMHO,
I would not be as confident as you are, because it took Julian Hyde more than a decade to build Mondrian (MDX->SQL) and Calcite (SQL), which fulfill your last two proposals.
You might simply consider using Calcite, or even better Dremio. Dremio has a JDBC interface and can query directories of CSV files in SQL. I tested Saiku over Dremio successfully (with a schema based on two separate RDBMSs). Just be careful to set up the tables' schemas accordingly in the Mondrian v4 schema.
Best regards,
Fabrice Etanchaud
Dremio
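To make the Calcite suggestion concrete, here is a minimal sketch that queries a directory of CSV files over Calcite's JDBC driver using its example CSV adapter (artifact org.apache.calcite:calcite-example-csv); the model path, data directory, and table name (one table per CSV file, here SALES.csv) are assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class CsvOverCalcite {
        public static void main(String[] args) throws Exception {
            // model.json points the CSV schema factory at the data directory:
            // {
            //   "version": "1.0",
            //   "defaultSchema": "CSV",
            //   "schemas": [{
            //     "name": "CSV",
            //     "type": "custom",
            //     "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
            //     "operand": { "directory": "/data/csv" }
            //   }]
            // }
            Properties props = new Properties();
            props.setProperty("model", "/path/to/model.json");
            try (Connection conn = DriverManager.getConnection("jdbc:calcite:", props);
                 Statement stmt = conn.createStatement();
                 // One table per file: SALES.csv is exposed as table SALES
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM CSV.SALES")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }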

Ruby: Mongoid Criteria Selector to SQL Query

As part of my bachelor's thesis I'm building a microservice using Postgres, which would in part replace an existing part of an application using MongoDB. Now, to change as little as possible on the client side for the moment, I was wondering if there is an easy way to translate a Mongoid::Criteria to an SQL query (assuming all fields are named the same, of course), without having to write a complete parser myself. Are there any gems out there that might support this?
Any input is highly appreciated.
Maybe you're looking for this: https://github.com/stripe/mosql
I haven't dug into it much, but it seems to do what you need:
"MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL."

Integration test with in memory db and spring jdbc

We have multiple Oracle schemas which we want to import into some kind of in-memory DB, so that when we run our integration tests we can use that DB and run our tests faster.
Is there any way this can be achieved using something like HSQLDB? We are using the Spring Framework, and it does support in-memory DBs.
Any link to some resource would be highly appreciated.
Try force full database caching mode if you're using 12.1.0.2. It's not exactly the same as a full in-memory database, but it should be close.
alter database force full database caching;
In-memory database performance is overrated anyway. Oracle's "old-fashioned" asynchronous I/O and caching often work just fine. For example, in this question, accessing a temporary table (which is stored on disk) runs faster than an equivalent solution using in-memory data structures. And I've seen a small Oracle database handle petabytes of I/O with the "boring" old buffer cache.
Or when you say "run our tests faster", are you referring to a more agile database; one that can be controlled by an individual, instead of the typical monolithic Oracle database installed on a server? I see that issue a lot, and there's no technical reason why Oracle can't be installed on your desktop. But that can be a tough cultural battle.
Yes, you can use HSQLDB for the purpose of unit testing - see this post for more information on how to integrate it with Spring.
Also, see this list as a good starting point for different usages of HSQLDB.
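As a concrete starting point, a minimal sketch of Spring's embedded-database support with HSQLDB; schema.sql, test-data.sql, and the employees table are assumptions standing in for DDL and data exported from your Oracle schemas (Oracle-specific syntax may need adapting for HSQLDB):

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabase;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

    public class InMemoryDbTestSupport {
        public static void main(String[] args) {
            // Spin up an in-memory HSQLDB and seed it from classpath scripts.
            EmbeddedDatabase db = new EmbeddedDatabaseBuilder()
                    .setType(EmbeddedDatabaseType.HSQL)
                    .addScript("classpath:schema.sql")
                    .addScript("classpath:test-data.sql")
                    .build();

            JdbcTemplate jdbc = new JdbcTemplate(db);
            System.out.println(jdbc.queryForObject(
                    "SELECT COUNT(*) FROM employees", Integer.class));

            db.shutdown(); // tear the in-memory instance down after the test
        }
    }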

Neo4j - get friend list up to the 4th degree

I am working on an application which requires showing a friend list up to the 4th degree. After some research I came across one solution: Neo4j.
I didn't get a clear idea from their tutorial: can I connect Neo4j to MySQL, and if not, how should I implement that myself? I am currently using the CodeIgniter framework with MySQL.
Thanks.
Neo4j is a database, and MySQL is a database, so this question is largely about connecting databases from different vendors together.
At this time, Neo4j and MySQL do not support direct connections to each other. You'd typically accomplish your desired task by exporting your data from MySQL as CSV files (http://www.mysqltutorial.org/mysql-export-table-to-csv/) and importing them into Neo4j (http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/).
Michael Hunger, a colleague of mine at Neo4j, recently wrote an auto importer. You might want to check it out to make this process much easier:
https://github.com/jexp/neo4j-rdbms-import
Before going through this data export/import, you may just want to download Neo4j and play with the movie dataset. You can do this in about a minute (https://www.youtube.com/watch?v=om6E-HqtrZ0).
Then, there are standalone PHP drivers for Neo4j:
http://neo4j.com/developer/php/
Josh Adell, the author of Neo4jPHP, has even written a post about how to use CodeIgniter 2 with his library:
http://blog.everymansoftware.com/2011/08/getting-neo4jphp-working-with.html
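Once the data is in Neo4j, the 4th-degree friend list itself is a single variable-length Cypher pattern. A minimal sketch using the official Java driver for illustration (the PHP drivers above run the same Cypher); the Person label, FRIEND relationship type, and id property are assumptions about your model:

    import java.util.List;

    import org.neo4j.driver.v1.AuthTokens;
    import org.neo4j.driver.v1.Driver;
    import org.neo4j.driver.v1.GraphDatabase;
    import org.neo4j.driver.v1.Record;
    import org.neo4j.driver.v1.Session;
    import org.neo4j.driver.v1.Values;

    public class FriendsToFourthDegree {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                    AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {
                // Friends within 1..4 hops, excluding the person themselves.
                List<Record> records = session.run(
                        "MATCH (me:Person {id: $id})-[:FRIEND*1..4]-(friend:Person) "
                                + "WHERE friend <> me "
                                + "RETURN DISTINCT friend.name AS name",
                        Values.parameters("id", 42)).list();
                for (Record r : records) {
                    System.out.println(r.get("name").asString());
                }
            }
        }
    }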
