I know Lucene has a built-in DuplicateFilter for deduplicating search results. This is an important feature for users searching a document database where the duplication rate is very high.
I am using Hibernate Search for full-text indexing and search, and I am wondering whether there is a way to use Lucene's DuplicateFilter with Hibernate Search.
It is possible by using filters. See the BestDriversFilter example in the filter section of the Hibernate Search documentation: it extends org.apache.lucene.search.Filter in the same way as DuplicateFilter does.
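To make clear what such a filter is expected to do, here is a minimal plain-Java sketch of the deduplication idea: keep the first hit for each value of a chosen key field and drop later hits with the same value. It is only an illustration and does not use the Lucene or Hibernate Search APIs; the Hit class and the contentHash field are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Plain-Java illustration only; this is NOT the Lucene DuplicateFilter API.
class DedupSketch {

    // Hypothetical search hit: a document id plus the field used to detect duplicates.
    static final class Hit {
        final int docId;
        final String contentHash;

        Hit(int docId, String contentHash) {
            this.docId = docId;
            this.contentHash = contentHash;
        }

        @Override
        public String toString() {
            return "Hit(" + docId + ", " + contentHash + ")";
        }
    }

    // Keeps the first hit seen for each key and drops later hits with the same key.
    static <T, K> List<T> keepFirstPerKey(List<T> hits, Function<T, K> keyOf) {
        Set<K> seen = new HashSet<>();
        List<T> unique = new ArrayList<>();
        for (T hit : hits) {
            // Set.add returns false when the key was already present.
            if (seen.add(keyOf.apply(hit))) {
                unique.add(hit);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(1, "a1"), new Hit(2, "b2"), new Hit(3, "a1"));
        System.out.println(keepFirstPerKey(hits, h -> h.contentHash));
    }
}
```

A real filter does this at the index level rather than on an in-memory list, but the choice of key field matters in the same way: it decides which documents count as duplicates.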
Related
We are using the Hibernate Search API with Elasticsearch. I came across the lenient option in Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html but cannot figure out how to set it through Hibernate Search.
Some of the fields in Elasticsearch are boolean or range fields, and we don't want to search through them.
There is no such option in Hibernate Search at the moment.
In Hibernate Search 5, you can write your whole query as JSON in order to take advantage of native options. That will require you to write everything as JSON, though, not just the simple-query-string query.
In Hibernate Search 6, you can write just the simple-query-string predicate as JSON to take advantage of a native option in that predicate, and continue using the Hibernate Search DSL for the other predicates in the same query, which should be more convenient.
If you need the feature in the DSL, you can create a JIRA ticket and it will be addressed in Hibernate Search 6 if it makes it to the top of the priority list. If you need it before then, you can of course submit a pull request. Note that new features should go into Hibernate Search 6 (master branch), not Hibernate Search 5 which is currently in maintenance mode.
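For the Hibernate Search 5 route described above, the whole query has to be produced as a JSON string. The following is a minimal sketch of building such a string with the lenient option set, using only the JDK. The field names title and description are hypothetical, and the small escaping helper only covers quotes and backslashes; a JSON library would be a safer choice in real code.

```java
// Sketch: builds an Elasticsearch query body as a JSON string, JDK only.
class LenientQueryJson {

    // Escapes backslashes and double quotes so the text can sit inside a JSON string.
    // Assumption: the input contains no control characters.
    static String jsonString(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a whole-query body with a lenient simple_query_string over the given fields.
    static String lenientSimpleQuery(String userInput, String... fields) {
        StringBuilder fieldList = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                fieldList.append(",");
            }
            fieldList.append(jsonString(fields[i]));
        }
        return "{\"query\":{\"simple_query_string\":{"
                + "\"query\":" + jsonString(userInput) + ","
                + "\"fields\":[" + fieldList + "],"
                + "\"lenient\":true}}}";
    }

    public static void main(String[] args) {
        // Field names are hypothetical examples.
        System.out.println(lenientSimpleQuery("fast car", "title", "description"));
    }
}
```

Listing the fields explicitly is also one way to keep boolean and range fields out of the search, independently of the lenient setting.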
Is it possible to execute a match_phrase_prefix query using a Hibernate Search Query? I have not found an appropriate query class so far. I also don't want to use ElasticsearchQueries.fromJson, since I need to combine different conditions using bool.
If you do not want to use ElasticsearchQueries.fromJson, then no, it's not possible to do that through the Hibernate Search APIs. That's a limitation of the experimental support for Elasticsearch in Hibernate Search 5.
It will be possible in Hibernate Search 6, but it's still an Alpha.
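If the JSON route turns out to be acceptable after all, combining conditions is still possible, because the bool query can be written inside the JSON itself. Below is a minimal sketch that assembles such a body as a string with the JDK only; the field names title and status and their values are hypothetical, and the escaping helper is deliberately simplistic.

```java
// Sketch: a bool query wrapping match_phrase_prefix, assembled as a JSON string.
class PhrasePrefixBoolJson {

    // Escapes backslashes and double quotes; assumes no control characters in the input.
    static String quote(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // {"match_phrase_prefix":{"<field>":"<prefix>"}}
    static String matchPhrasePrefix(String field, String prefix) {
        return "{\"match_phrase_prefix\":{" + quote(field) + ":" + quote(prefix) + "}}";
    }

    // {"term":{"<field>":"<value>"}}
    static String term(String field, String value) {
        return "{\"term\":{" + quote(field) + ":" + quote(value) + "}}";
    }

    // Wraps one scoring clause and one filter clause in a bool query.
    static String boolQuery(String mustClause, String filterClause) {
        return "{\"query\":{\"bool\":{\"must\":[" + mustClause + "],"
                + "\"filter\":[" + filterClause + "]}}}";
    }

    public static void main(String[] args) {
        // Hypothetical fields and values, for illustration only.
        System.out.println(boolQuery(
                matchPhrasePrefix("title", "quick bro"),
                term("status", "published")));
    }
}
```

The resulting string is the kind of whole-query JSON that ElasticsearchQueries.fromJson expects, so the trade-off is that every condition, not only the phrase-prefix part, has to be expressed this way.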
How can I write an aggregation query on a particular term in Hibernate Search using Elasticsearch? Is there any way to do this?
If you are looking to perform faceting (which is one way of using aggregations), Hibernate Search provides a dedicated feature: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#query-faceting
If you want to aggregate for other purposes, I'm afraid this is not possible (yet) using Hibernate Search. We are planning to introduce it in Hibernate Search 6, though (or at least to allow performing aggregations by bypassing Hibernate Search's abstractions).
I have a question about how indexed properties work in Alfresco 4.1.6 with SOLR 1.4.
I use something like this for my queries:
SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery(query);
ResultSet results = getSearchService().query(sp);
where the query variable is something like this:
PATH:" /app:company_home/app:user_homes/cm:_x0030_123//*" AND
((#cm\:title:food) OR (#cm\:name:abcde) OR (TEXT:valles) OR
(#doc\:custom_property:"report") OR (#doc\:custom_property2:"report")
AND (#doc\:custom_property3:"report") AND TYPE:"{my.model}voc_document"
In my model.xml I specify which custom properties are indexed:
<index enabled="true">
My question is: how does SOLR 1.4 use the indexes when the search query contains two or more indexed properties? Does it behave like Oracle, which picks the best index and uses only that one? Or does SOLR combine all the indexed properties and use every index in the query?
I need this answer to decide how many properties to index in my model.xml. Perhaps indexing many properties does not give the most efficient result, and it is better to index only a few.
One final question: I use LANGUAGE_FTS_ALFRESCO, but I can see there is also a LANGUAGE_SOLR_FTS_ALFRESCO. Are they the same? Do I need to use the second one if I use SOLR?
Thanks a lot!
Best regards
There is only one "index". Every field you mark as indexable (which is enabled by default) ends up in your solr index. Alfresco takes your query and sends it to SOLR for processing.
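As a rough illustration of the "one index" idea, the toy sketch below keeps every field of every document in a single inverted index keyed by field and term, and answers a multi-field query by combining the matching document sets. It is a simplified model, not SOLR or Lucene code, and the document contents are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a single inverted index shared by all fields; not SOLR/Lucene code.
class SingleIndexSketch {

    // One structure for everything: "field:term" -> ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    void add(int docId, String field, String term) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
    }

    // Returns a copy of the matching document ids (empty when nothing matches).
    Set<Integer> lookup(String field, String term) {
        return new TreeSet<>(postings.getOrDefault(field + ":" + term, new TreeSet<>()));
    }

    // AND is the intersection of two document sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR is the union of two document sets.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        SingleIndexSketch index = new SingleIndexSketch();
        index.add(1, "cm:title", "food");
        index.add(1, "TYPE", "voc_document");
        index.add(2, "cm:name", "abcde");
        index.add(2, "TYPE", "voc_document");
        index.add(3, "cm:title", "food");
        index.add(3, "TYPE", "other");

        // (cm:title:food OR cm:name:abcde) AND TYPE:voc_document
        Set<Integer> hits = and(
                or(index.lookup("cm:title", "food"), index.lookup("cm:name", "abcde")),
                index.lookup("TYPE", "voc_document"));
        System.out.println(hits);
    }
}
```

In this model every clause of the query is resolved against the same index and the results are combined, so there is no step where one index is chosen over another.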
If you don't have a lot of documents, you can go ahead and index every field. By far the biggest impact on indexing and search is the full text index of the content field, which is enabled by default also.
LANGUAGE_FTS_ALFRESCO will use whatever index subsystem you have enabled. In later versions it may use SOLR or the database, depending on your configuration. LANGUAGE_SOLR_FTS_ALFRESCO forces SOLR, so if you use it without SOLR enabled, you will get an error.
Regards!
I use Hibernate as well as the Grails Searchable Plugin, which is based on Lucene and Compass. I was wondering when I should use which for querying objects from the database.
Is there a rule of thumb for when to use Hibernate and when to use Searchable?
The Searchable plugin is most useful when you need free-form text search throughout your application.
To give an example, suppose you are working on a banking application and building a portal with a search feature. If you want the search to be free-form across key elements such as customer name, SSN, phone number and/or email ID, you would index those elements using Searchable and have the search feature query Searchable to get immediate results. For this to work you have to index at least those key elements. The indices grow as and when you add more key search elements.
On the other hand, Hibernate helps you provide the detailed information if you do not want to index a lot of elements. To extend the above example, once you have searched on an SSN and got a hit, on selecting that entry you can use Hibernate to fetch the detailed information from the underlying persistence layer.
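The split described above can be sketched in plain Java as follows. Two in-memory maps stand in for the search index and for the database reached through Hibernate; neither the Searchable plugin nor Hibernate is actually used, and the Customer class and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of "search a small index first, then load details by id"; no real libraries involved.
class SearchThenLoadSketch {

    // Hypothetical full record, as it would live in the database.
    static final class Customer {
        final long id;
        final String name;
        final String ssn;
        final String address;

        Customer(long id, String name, String ssn, String address) {
            this.id = id;
            this.name = name;
            this.ssn = ssn;
            this.address = address;
        }
    }

    // Stand-in for the search index: lower-cased key value -> matching customer ids.
    private final Map<String, List<Long>> index = new HashMap<>();
    // Stand-in for the database: id -> full record.
    private final Map<Long, Customer> database = new HashMap<>();

    void save(Customer customer) {
        database.put(customer.id, customer);
        // Only the key search elements are indexed; the address is not.
        indexValue(customer.name, customer.id);
        indexValue(customer.ssn, customer.id);
    }

    private void indexValue(String value, long id) {
        index.computeIfAbsent(value.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(id);
    }

    // Step 1: the search returns ids only.
    List<Long> search(String text) {
        return index.getOrDefault(text.toLowerCase(Locale.ROOT), List.of());
    }

    // Step 2: the details are loaded from the persistence layer by id.
    Customer loadDetails(long id) {
        return database.get(id);
    }

    public static void main(String[] args) {
        SearchThenLoadSketch app = new SearchThenLoadSketch();
        app.save(new Customer(7L, "Alice", "123-45-6789", "1 Main St"));
        for (long id : app.search("123-45-6789")) {
            System.out.println(app.loadDetails(id).address);
        }
    }
}
```

Keeping only the key elements in the index is what keeps it small; everything else stays in the database and is fetched on demand.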
Inference:
For speedy, high-performance, free-form search, Searchable is an option.
For gathering detailed information after the search, I think Hibernate is the way to go, unless you want to use Searchable for the detailed information as well, in which case the size of the indices will run into gigabytes.
Follow here in elastic search which might help to understand.
My point is to make elastic/searchable lighter keeping the heavy lifting part taken care by hibernate.
NOTE
On a side note, I would suggest using Elasticsearch instead of Searchable. It also has a Groovy API, which is useful. Also note that the elastic plugin currently uses v0.20.0 of Elasticsearch, the latest being v0.90.2, I guess. If required, you can use Elasticsearch directly as a dependency and get the latest features.