Hibernate Search: JSONB columns

We are currently using columns of type JSONB in PostgreSQL, with Hibernate-types for mapping and insert/update (which is working very well so far). The only limitation we are finding right now is related to searching with filters on properties of the JSON document (between two dates, ==, <=, >=, etc.).
Q1: Is there any way to use Hibernate for querying JSON documents?
Q2: Is it a good idea to use Hibernate Search to update Elasticsearch and then use Lucene syntax to query?

Hard to address Q2 without knowing what your requirements are exactly. IMO it's always a good idea to implement search using a dedicated solution such as Elasticsearch (through Hibernate Search of course), but I may be a little bit biased :) If you're fine with using PostgreSQL's non-standard features and those features are enough (e.g. you don't really want to use full-text search or faceting), then Hibernate Search + Elasticsearch may be overkill. I'd argue you probably should be using Elasticsearch's advanced full-text search, but to each their own.
The question really is: does PostgreSQL provide a syntax to do what you want, i.e. extract a value from the JSON and apply an operator to it? That's likely, though I'm not familiar enough with JSON in PostgreSQL to give you that syntax.
Once you have found the proper syntax, you can use it in HQL (Hibernate ORM's extension of JPA's query language, JPQL). Either:
[ORM 6.0+ only] by using the sql() function in your HQL, i.e. sql('<put some SQL here, using ? to represent arguments>', <put comma-separated arguments here>). Hibernate ORM will just insert the proper SQL into the query it sends to the database.
by declaring custom HQL functions and calling these functions in your (HQL) query.
Of course, if necessary, you can also fall back to native SQL for your whole query, though then mapping the results back to managed entities will prove a bit more cumbersome.
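For reference, PostgreSQL's ->> operator extracts a JSON field as text, and the result can then be cast and compared with ordinary operators. Below is a minimal sketch, in Python for brevity, that builds such a parameterized filter. The column name payload, the key names, and the helper jsonb_filter are all hypothetical, and the %s placeholders assume a psycopg-style driver. The generated fragment is the kind of SQL one would pass to sql() or wrap in a custom HQL function.

```python
# Hypothetical helper: builds a WHERE fragment over a JSONB column.
# Assumes psycopg-style %s placeholders; names are for illustration only.
ALLOWED_OPS = {"=", "<", "<=", ">", ">=", "between"}
ALLOWED_CASTS = {"text", "numeric", "date", "timestamptz"}


def jsonb_filter(column, key, op, values, cast="text"):
    """Return (sql_fragment, params) comparing column ->> key to values."""
    if op not in ALLOWED_OPS or cast not in ALLOWED_CASTS:
        raise ValueError("unsupported operator or cast")
    if not column.isidentifier():
        raise ValueError("unsafe column name")
    # ->> extracts the field as text; the cast makes the comparison typed.
    extracted = f"({column} ->> %s)::{cast}"
    if op == "between":
        low, high = values
        return f"{extracted} BETWEEN %s AND %s", [key, low, high]
    (value,) = values
    return f"{extracted} {op} %s", [key, value]


sql, params = jsonb_filter("payload", "createdAt", "between",
                           ["2023-01-01", "2023-12-31"], cast="date")
print(sql)     # (payload ->> %s)::date BETWEEN %s AND %s
print(params)  # ['createdAt', '2023-01-01', '2023-12-31']
```

Whitelisting the operator and cast keeps user input out of the SQL text; only the key and the comparison values travel as bound parameters.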

Related

Faceted Search Compared to a Field on an RDBMS

I am trying to grasp the ins and outs of Elasticsearch and others of its type. The problem for me has been all the new vocabulary that does not obviously track to systems I already know and understand, hence this post.
It looks like a faceted search is directly analogous to doing a field search on Postgres or any other RDBMS, but I'm uncertain if that's right, since Elasticsearch, I'm told, is 'sort of' a NoSQL. Can someone here either clarify for me directly, or point me to some good, non jargony explanations?
Also, what is the analogy to searching on a foreign key relation? Or is the fk just another "facet"?
Thanks.
Faceted search is aggregation, similar to SQL GROUP BY with aggregate functions. Querying by field is a completely different matter. In most cases there is nothing like a foreign key in ES.
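To make the GROUP BY analogy concrete, here is a small sketch that computes facet-style counts with SQL (using Python's built-in sqlite3) and shows the Elasticsearch terms aggregation request body that asks for the same buckets. The product table, its data, and the aggregation name by_category are invented for illustration, and the aggregation assumes category is mapped as a keyword field.

```python
import sqlite3

# Illustrative data; the table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("pen", "office"), ("stapler", "office"), ("mug", "kitchen")],
)

# SQL side: one row per category with its count.
facet_counts = dict(conn.execute(
    "SELECT category, COUNT(*) FROM product GROUP BY category"
).fetchall())
print(facet_counts)

# Elasticsearch side: a terms aggregation request body asking for the
# same buckets. "size": 0 skips the hits and returns only the buckets.
es_request_body = {
    "size": 0,
    "aggs": {"by_category": {"terms": {"field": "category"}}},
}
```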
I found the answers.
1. Yes, something closely analogous to a field search on Postgres (PG) can be done on ES, and in fact is done all the time. They are even called fields on ES, although there is not a one-to-one correspondence between what you would do on PG and what you do on ES.
A corollary to that is that you can, and probably should, use both, for different purposes, since ES is not transactional. Although this guy is teaching Ruby (I use Python) his video and script make syncing from PG to ES ridiculously easy: https://www.youtube.com/watch?v=uMctvIIgBGY&index=3&list=PLjQo0sojbbxWcy_byqkbe7j3boVTQurf9 (Episode 2 Mapping and Syncing).
Finally, on this point, "facets" are being deprecated by ES. Instead, the new term of art is "aggs", aka "aggregations", in this case, specifically, "bucket" aggregations. See Episode 7 from the Ruby guy mentioned above, and https://www.youtube.com/watch?v=H4V9ukR5fYQ Elasticsearch - Aggregations.
2. ES does in fact support relations that look very much like foreign keys: has_child and has_parent. What's more, you can use nesting to make your fields / attributes / whatever you want to call them "relate" to what had been FKs on PG when you sync. See the official docs as well as the aforementioned videos: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html.

Solr Slice v Page

Is it possible to use Slice via solrTemplate ?
Actually, I am struggling to see whether it would even make a difference, because even without Spring there doesn't appear to be any way of telling Solr to exclude its "numFound" (total results) from a query.
And when I use a normal Spring Data Page<..> query and look under the hood, I see only one query issued to Solr, i.e. no extra one for the count. Or is the count simply done inside Solr in an extra step?
confused
Total document count is part of the Solr query. No additional query is required. Therefore, there is no advantage to Slice vs. Page.
The only related concept is when somebody wants to export a significant amount of data, in which case built-in paging becomes slower the deeper into the result set you request. For that, Solr has exporting functionality.
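As a rough illustration of why no count query is needed, the sketch below derives Page-style metadata from a single response. The response body only mimics the shape of Solr's JSON output (numFound, start, docs), with invented values, and page_info is a hypothetical helper, not part of Spring Data or SolrJ.

```python
import json
import math

# Illustrative response body in the shape of Solr's JSON output;
# the documents and counts are invented for this sketch.
raw = """
{"response": {"numFound": 45, "start": 20,
              "docs": [{"id": "21"}, {"id": "22"}]}}
"""


def page_info(solr_json, rows):
    """Derive paging metadata from a single Solr response."""
    response = json.loads(solr_json)["response"]
    total = response["numFound"]  # total matches, already in the response
    return {
        "total_elements": total,
        "total_pages": math.ceil(total / rows),
        "page_number": response["start"] // rows,
        "content": response["docs"],
    }


info = page_info(raw, rows=10)
print(info["total_pages"], info["page_number"])
```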

Why SOLR has a schema and ElasticSearch does not?

We were comparing those search solutions and started to wonder why one does need a schema and the other does not. What are tradeoffs? Is it because one is like SQL and the other is like NoSQL in sense of schema configuration?
ES does have a schema, defined as templates and mappings. You don't have to use it, but in practice you will. A schema is actually a good thing, and if you notice a database claiming to be purely schemaless, there will be performance implications.
A schema is a tradeoff between ease of development and adoption on one side and performance on the other. It is easy to read from and write to a schemaless database, but it will be less performant, particularly for any non-trivial query.
Elasticsearch definitely has a schema. If you think it does not, try indexing a date into a field and then an int into the same field. Or even into different types with the same name (I think ES 2.0 disallows that now).
What Elasticsearch does is simplify auto-creation of a schema. That has tradeoffs such as possible incorrect type detection, fields that appear single-valued or multivalued in the result output depending on the number of elements they contain (they are always multivalued under the covers), and so on. Elasticsearch has some ways to work around that, mostly by defining some of the schema elements and explicit schema mapping, as Oleksii wrote.
Solr also has a schemaless mode that closely matches Elasticsearch's mode, down to storing all JSON as a single field. When you enable it, you get both the benefits and the disadvantages that Elasticsearch has. Except that in Solr you can change things like the order of auto-type strategies and the mapping to field types. In Elasticsearch (1.x at least) it was hard-coded. You can see a slightly dated comparison in my presentation from 2014.
As Slomo said, they both use Lucene underneath for storing and most of the search. So, the core engine approach cannot change.
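The "try indexing two different types into the same field" experiment can be mimicked with a toy model. The class below is only an illustration of the idea that the first value seen fixes a field's type; it is not Elasticsearch's actual type detection logic, and ToyIndex is a made-up name.

```python
class ToyIndex:
    """Toy model of dynamic mapping: the first value seen fixes a field's type.

    This is not Elasticsearch's actual detection logic, only an illustration.
    """

    def __init__(self):
        self.mapping = {}  # field name -> Python type inferred on first use
        self.docs = []

    def index(self, doc):
        for field, value in doc.items():
            expected = self.mapping.setdefault(field, type(value))
            if not isinstance(value, expected):
                raise TypeError(
                    f"field {field!r} is mapped as {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        self.docs.append(doc)


idx = ToyIndex()
idx.index({"views": 10})         # mapping for "views" becomes int
try:
    idx.index({"views": "ten"})  # conflicts with the inferred mapping
except TypeError as err:
    print(err)
```

Even though no schema was declared, the second document is rejected, which is the sense in which an auto-created schema is still a schema.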

How to search for multiple strings in very large database

I want to search for multiple strings in a very large database. These strings are part of different attributes of a database table. I have tried string search using LIKE in an SQL query, but it takes a lot of time to get results. I am using an Oracle database.
Should I use indexing of database? I found that Lucene can be used for it.
I also got some suggestions of using big data concepts. Which approach should I use?
The easiest way is:
1.) adding an index to the columns you would like to search through
2.) using Oracle Text as #lalitKumarB wrote
The most powerful way is:
3.) using a separate search engine (Solr, Elasticsearch).
But you will probably have to change your application so that it explicitly uses the search index for searching through the data.
I had the same situation some years ago, trying to search text in a big database. After a while I found out that database-based search will never reach the performance of a dedicated search engine. And you will have many more search features working out of the box if you use Solr (for example), like spelling correction, "More like this", ...
One option is to hold the data in Oracle, search in Solr, and return only the ID of the document, so that you load just the one row from Oracle that is referenced by the ID.
The second option is to keep Oracle as the base data pool for your search engine and search in Solr (or Elasticsearch), returning the whole document/row from Solr, not only the ID. Then you don't need to load the data from the database any more.
The best option depends on your needs.
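The first option (search returns IDs, the database supplies the rows) can be sketched with a toy inverted index, which is the core data structure behind Lucene-based engines. Everything here is a simplified stand-in: the rows dict plays the role of the Oracle table, the tokenizer is naive, and build_index and search_ids are hypothetical helpers rather than any Solr or Lucene API.

```python
from collections import defaultdict

# Stand-in for the Oracle table: primary key -> row. Data is invented.
rows = {
    1: {"title": "Red bicycle", "notes": "city bike with basket"},
    2: {"title": "Mountain bike", "notes": "red frame, 21 gears"},
    3: {"title": "Blue scooter", "notes": "electric, city use"},
}


def build_index(table):
    """Map each lowercase token to the set of row IDs that contain it."""
    index = defaultdict(set)
    for row_id, row in table.items():
        for text in row.values():
            for token in text.lower().replace(",", " ").split():
                index[token].add(row_id)
    return index


def search_ids(index, *terms):
    """Return IDs of rows containing every term, in any column."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


index = build_index(rows)
ids = search_ids(index, "red", "bike")
matches = [rows[i] for i in sorted(ids)]  # load full rows by ID only
```

Unlike LIKE '%...%', which scans every row, the lookup cost here depends on the number of matching entries rather than on the table size.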
You have the choice between Elasticsearch, Solr, or Lucene.

Elastic search and "databases"

Sorry for the ambiguous title, I couldn't think of anything more fitting.
I'm exploring Elasticsearch and it looks very cool. My question is conceptual, since I'm used to SQL.
In SQL, you have different databases and you store the data for each application there. Does the same concept exist in ES? Or is all data from all my applications going to end up in the same place? In that case, what are the best practices to avoid unwanted results from unfitting data?
Schemaless doesn't mean structureless:
In Elasticsearch you can organize your data into document collections
A top-level document collection is roughly equivalent to a database
You can also hierarchically create new document collections inside top-level collections, which is a very rough equivalent of a database table
When you search documents, you search for documents inside specific document collections (such as search for all posts inside blog1)
Individual documents can be viewed as equivalent to rows in a database table
Also please note that I say roughly equivalent -- data in SQL is often normalized into tables by relations, while documents (in ES) often hold large entities of data. For instance, it generally makes sense to embed all comments inside a blog post document, whereas in SQL you would normalize comments and blogposts into individual tables.
For a nice tutorial, I recommend taking a look at the "ElasticSearch in 5 minutes" tutorial.
Switching from SQL to a search engine can be challenging at times. Elasticsearch has a concept of an index, which can be roughly mapped to a database, and a type, which can, again very roughly, be mapped to a table. Elasticsearch has a very powerful mechanism for selecting records (rows) of a single type and combining results from different types and indices (union). However, there is no support for joins at the moment. The only relationship that Elasticsearch supports is has_child, but it's not suitable for modeling many-to-many relationships. So, in most cases, you need to be prepared to denormalize your data, so it can be stored in a single table.
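Denormalizing, as described above, usually means embedding related rows into the parent document before indexing. A minimal sketch with invented data follows; the posts and comments structures and the denormalize helper are hypothetical and only illustrate the reshaping step, not any Elasticsearch client call.

```python
# Normalized, SQL-style rows; all names and values are invented.
posts = [{"id": 1, "title": "Hello"}, {"id": 2, "title": "Second post"}]
comments = [
    {"post_id": 1, "author": "ann", "text": "Nice"},
    {"post_id": 1, "author": "bob", "text": "Agreed"},
    {"post_id": 2, "author": "ann", "text": "More please"},
]


def denormalize(posts, comments):
    """Embed each post's comments so one document holds the whole entity."""
    by_post = {}
    for comment in comments:
        # Drop the foreign key: the nesting itself now expresses the relation.
        embedded = {k: v for k, v in comment.items() if k != "post_id"}
        by_post.setdefault(comment["post_id"], []).append(embedded)
    return [dict(post, comments=by_post.get(post["id"], [])) for post in posts]


documents = denormalize(posts, comments)
```

Each resulting dict is what would be sent to the index as one document, so a search for a post can match on its comments without a join.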

Resources