How can I reuse a join in multiple Specifications - spring

How do I reuse a join across multiple Specifications in a single JPA query? I'd like to avoid joining the same table multiple times when several Specifications require the same join.

Related

JPA Specification on union of two tables

I'm trying to create a JPA Specification (using the CriteriaBuilder) to retrieve data from two separate tables (a union of the two tables). Any thoughts?
The main purpose is to make it easy to add search filters and pagination when querying the two tables.

How to read multiple tables using Spring Batch

I am looking to read data from multiple tables (in different databases), aggregate it, and create a final result set. In my case, each query returns a list of objects. I have searched the web many times and found nothing other than "Spring Batch How to read multiple table (queries) as Reader and write it as flat file write", but that returns only a single object.
Is there any way to do this? A working example would help a lot.
Example:
- One query returns a list of departments (from an Oracle DB).
- Another query returns a list of employees (from Postgres).

Now I want to build the Employee-Department relationship, send the final object to a processor for a further lookup against MongoDB, and then send the final object to the writer.
The question should rather be "how to join three tables from three different databases and write the result in a file". There is no built-in reader in Spring Batch that reads from multiple tables. You either need to create a custom reader, or decompose the problem at hand into tasks that can be implemented using Spring Batch tasklet/chunk-oriented steps.
I believe you can use the driving query pattern in a single chunk-oriented step. The reader reads employee items, then a processor enriches each item with 1) its department from Oracle and 2) other info from MongoDB. This should work for small/medium datasets. If you have a lot of data, you can use partitioning to parallelize the work and improve performance.
Another option, if you want to avoid a query per item, is to load all departments into a cache (presumably there are fewer departments than employees) and enrich items from the cache rather than with individual queries to the database.
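The processor step from this answer can be sketched in plain Java. This is a minimal sketch, not a Spring Batch implementation: the `Employee`, `Department` and `EnrichedEmployee` records and the `EmployeeEnricher` class are hypothetical, and in a real job `process` would live in a class implementing Spring Batch's `ItemProcessor<Employee, EnrichedEmployee>`. The enricher depends only on a department-lookup function, so the same class covers both options: `perItem` takes a per-item query (the driving query pattern), while `cached` reads all departments once into a map (the cache approach). The Oracle and MongoDB access itself is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Main {
    // Hypothetical domain types; field names are illustrative only.
    record Department(long id, String name) {}
    record Employee(long id, String name, long departmentId) {}
    record EnrichedEmployee(Employee employee, Department department) {}

    // Same shape as a Spring Batch ItemProcessor<Employee, EnrichedEmployee>,
    // but without the Spring dependency so the sketch runs stand-alone.
    static final class EmployeeEnricher {
        private final Function<Long, Department> departmentLookup;

        private EmployeeEnricher(Function<Long, Department> departmentLookup) {
            this.departmentLookup = departmentLookup;
        }

        // Driving-query style: the lookup may hit the database once per item.
        static EmployeeEnricher perItem(Function<Long, Department> query) {
            return new EmployeeEnricher(query);
        }

        // Cache style: read all departments once, then answer lookups from memory.
        static EmployeeEnricher cached(List<Department> allDepartments) {
            Map<Long, Department> byId = allDepartments.stream()
                    .collect(Collectors.toMap(Department::id, Function.identity()));
            return new EmployeeEnricher(byId::get);
        }

        // Returns null for an unknown department; a Spring Batch ItemProcessor
        // that returns null filters the item out instead of writing it.
        EnrichedEmployee process(Employee employee) {
            Department department = departmentLookup.apply(employee.departmentId());
            return department == null ? null : new EnrichedEmployee(employee, department);
        }
    }

    public static void main(String[] args) {
        List<Department> departments = List.of(
                new Department(10L, "Sales"), new Department(20L, "Engineering"));
        EmployeeEnricher enricher = EmployeeEnricher.cached(departments);

        List<Employee> employees = List.of(
                new Employee(1L, "Ada", 20L), new Employee(2L, "Bob", 99L));
        for (Employee e : employees) {
            System.out.println(enricher.process(e));
        }
    }
}
```

Running `main` prints the enriched record for Ada and `null` for Bob, whose department 99 is not in the cache. Passing a `Function` keeps database access out of the processor, so either lookup strategy can be swapped in without touching the enrichment logic.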

NIFI: join two tables from different databases

I have two transactional tables originating from different databases on different servers. I would like to join them on a common attribute and store the combined result in another database.
I have been looking at various options in NiFi to run this as a monthly job.
So far, I have tried various options, but none seem to work. For example, I used ExecuteSQL1 & ExecuteSQL2 -> MergeContent -> PutSQL.
Could anyone provide some pointers?
NiFi is not really meant to do a streaming join like this. The best option would be to implement the join in the SQL query using a single ExecuteSQL processor.
As Bryan said, NiFi doesn't (currently) do this. Perhaps look at Presto: you can set up multiple connections "under the hood" and use its JDBC driver to do what Bryan described, i.e. a join across tables in different databases.
I'm thinking about adding a JoinTables processor that would let you join two tables using two different DBCPConnectionPool controller services, but there are lots of things to consider, such as whether the join can be done in memory. When joining dimension tables to fact tables, for example, we could load the smaller table into memory and then do more of a streaming join for the larger fact table. Feel free to file a New Feature Jira and we can discuss it there.
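The dimension/fact idea in this answer is a classic in-memory hash join. A minimal Java sketch, assuming both result sets have already been fetched (for example, one per connection pool) and that rows are represented as hypothetical column-name-to-value maps: the smaller table is indexed by join key in a `HashMap`, then the larger table is streamed past that index one row at a time, so only the small side must fit in memory. This illustrates the technique only; it is not NiFi code, and `JoinTables` is just a proposed processor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class Main {
    // Inner hash join of a small table (held in memory) against a large table
    // (streamed). Rows are modelled as column-name -> value maps.
    static void hashJoin(List<Map<String, Object>> smallTable,
                         Iterator<Map<String, Object>> largeTable,
                         String joinKey,
                         Consumer<Map<String, Object>> sink) {
        // Build phase: index the small side by join-key value.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : smallTable) {
            Object key = row.get(joinKey);
            if (key != null) { // like SQL, NULL keys never match
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
            }
        }
        // Probe phase: read the large side one row at a time and emit matches
        // immediately, so memory use is bounded by the small table.
        while (largeTable.hasNext()) {
            Map<String, Object> row = largeTable.next();
            for (Map<String, Object> match : index.getOrDefault(row.get(joinKey), List.of())) {
                Map<String, Object> joined = new HashMap<>(match);
                joined.putAll(row); // on a column-name clash, the large-table value wins
                sink.accept(joined);
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> regions = List.of(
                Map.of("region_id", 1, "region", "North"),
                Map.of("region_id", 2, "region", "South"));
        List<Map<String, Object>> sales = List.of(
                Map.of("region_id", 1, "amount", 100),
                Map.of("region_id", 3, "amount", 50), // no matching region: dropped
                Map.of("region_id", 2, "amount", 75));
        hashJoin(regions, sales.iterator(), "region_id", System.out::println);
    }
}
```

Join-key values must have the same Java type on both sides (an `Integer` 1 never equals a `Long` 1), so normalize types when the two databases map the column differently.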

GraphQL as an abstraction for a data modelling tool

I'm trying to think out loud here to understand whether GraphQL is a likely candidate for my needs.
We have a home-grown, self-service report creation tool. It is web-based, and it starts with the user selecting a particular report type.
The report type itself is a base SQL query. In subsequent screens, one can select the required columns, filters, etc. The output of all these steps is a SQL query, which is then run on an Oracle database.
As you can see, this tool has a lot of drawbacks. It is tightly coupled to the Oracle OLTP tables, and there are hundreds of tables.
Given the current data model and the large number of tables, I'm wondering whether GraphQL would be the right approach for designing a UI that could act as a "data explorer". If I could combine some of the closely related tables and abstract them via GraphQL into logical groups, could I create a report out of them?
- **Logical Group 1**: Table1, Table2, Table3, Table4, Table5
- **Logical Group 2**: Table6, Table7, Table8, Table9, Table10
- and so on.
Let's say I want 2 columns from tables in Logical Group 1 and 4 columns from Logical Group 2. Is this something that could be defined as a GraphQL object and retrieved, either to be rendered on a screen or written to a file?
I think I'm trying to write a data modelling UI via GraphQL. Is this even a good candidate for such a need?
We have also been evaluating Looker as a possible data modelling layer. However, it seems like there could be some
Thanks.
Without understanding your data better, it is hard to say for certain, but at first glance, this does not seem like a problem that is well suited to GraphQL.
GraphQL's strength is its ability to model and traverse a graph of data. It sounds to me like you are not so much traversing a connected graph of data as cherry-picking tables from a database. It is certainly possible, but there may be a good deal of friction, since this is not what GraphQL was designed for.
The litmus test I would use is the following two questions:
1. Can you imagine your problem mapping well to a REST API?
2. Does your API get consumed by performance-sensitive clients?

If so, GraphQL may serve your needs well; if not, you may want to look at something like gRPC (https://grpc.io/).

Does RethinkDb support push updates on complex JOIN/Group by type of query

My use case: I have 20-30 tables that need to be stored in a database. The user requirement is the ability to dynamically query data from multiple tables, join and aggregate it, and have the results pushed to the client. Is this something RethinkDB can support? Also, how well does it scale, and how many simultaneous queries can it support with, say, a 4-table join (each table with 100K rows)?
Also, I do not see support for Red Hat. Can I use another Linux distribution, or will it not work?
RethinkDB currently doesn't support changefeeds on joins. You can track progress on that feature at https://github.com/rethinkdb/rethinkdb/issues/3997. However, you can get changefeeds on individual tables and run subqueries based on the changes to each table, e.g. `r.table('test').changes().concatMap(whatever)`.
