Hibernate full text search matching - spring

I have two users named 'Alex' and 'Andrei'. When I query for 'A', I get 0 results. I have to search with the full name and matching capitalization to get a result.
For example, I want to query for just 'e' and receive 2 records.
Session s = session.getCurrentSession();
FullTextSession fullTextSession = Search.getFullTextSession(s);
QueryBuilder qb = fullTextSession.getSearchFactory()
        .buildQueryBuilder().forEntity(User.class).get();
org.apache.lucene.search.Query q = qb
        .keyword().onFields("name")
        .matching(query)
        .createQuery();
org.hibernate.Query hibQuery =
        fullTextSession.createFullTextQuery(q, User.class);
List<User> results = hibQuery.list();

You are not showing how you index the first names, which matters here. For example, if you index them un-analyzed, you do need to take capitalization into account. Also, as mentioned in the comment, you need to check that you are using the right analyzer. Your case would only work if you used wildcard queries.
On a tangent, I don't know how close your example is to your actual use case, but searching for a single character is probably not a typical free-text search use case.
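To illustrate why a keyword query for a single letter finds nothing while a wildcard query would, here is a toy model in Python. It is not Hibernate Search code: the `INDEX` dictionary and both function names are assumptions made up for the illustration, and the model only mimics exact term matching against lowercased tokens.

```python
from fnmatch import fnmatchcase

# Toy stand-in for an analyzed index: lowercased token -> stored user name.
# This is NOT Hibernate Search code, only a model of how terms are matched.
INDEX = {"alex": "Alex", "andrei": "Andrei"}


def keyword_search(term):
    """Return users whose indexed token equals the (lowercased) term exactly."""
    hit = INDEX.get(term.lower())
    return [hit] if hit else []


def wildcard_search(pattern):
    """Return users whose indexed token matches a * / ? wildcard pattern."""
    # Wildcard terms typically bypass the analyzer, so lowercase them by hand.
    pattern = pattern.lower()
    return sorted(name for token, name in INDEX.items()
                  if fnmatchcase(token, pattern))


print(keyword_search("e"))     # no indexed token is exactly "e"
print(wildcard_search("*e*"))  # both indexed tokens contain "e"
```

In the Hibernate Search query DSL the wildcard variant is reached through `keyword().wildcard()`. Keep in mind that patterns with a leading `*` tend to be slow on large indexes.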

This behavior reminds me of an error I once ran into.
'A' is a stop word (in English) for the standard analyzer used by Hibernate Search, so you have to specify a custom analyzer without this stop word.
These links could help you:
To use a custom analyzer: https://docs.jboss.org/hibernate/search/3.1/reference/en/html/search-mapping.html
To understand what I mean by 'a' being a stop word: Getting error on a specific query

Related

How do I use a phrase query in liferay with stop words

I am using Liferay 7.1 together with ElasticSearch, and all I want to do is search for (EXAMPLE): "This is a test".
But in this case "is" and "a" are stop words, so they get filtered out, and I get results that I do not want, such as "This test rocks".
I am using a BooleanQuery like this:
BooleanQuery keywordQuery = new BooleanQueryImpl();
keywordQuery.addTerms(KEYWORDS, keyword, false);
The keyword in this case is "this is a test".
Can anyone tell me how to make the BooleanQuery not filter out stop words?
Best regards,
Daniel

Stop words are a concept of the analysis phase at indexing time, so your index does not contain "is" and "a". Therefore, there is no query-time parameter that brings the stop words back.
What you could do is use a different search index attribute that contains the full content including stop words. This depends on your configuration: maybe there is already an attribute without stop-word filtering, or you need to add one using an Index Post-Processor or by modifying your Elasticsearch mapping configuration.
Please check your document structure (e.g. with ElasticHQ) to see which attributes keep the stop words.
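As a rough sketch of the "second attribute" idea, the Python snippet below builds an Elasticsearch mapping with a sub-field that keeps stop words, plus a phrase query against it. The field names `content` and `content.exact` are hypothetical, and how such a mapping is wired into Liferay is not shown here. It relies on the `standard` analyzer applying no stop-word filter by default, while the `english` analyzer does.

```python
import json

# Assumed names: "content" is the existing analyzed field and
# "content.exact" is a hypothetical sub-field that keeps stop words.
MAPPING = {
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "english",           # drops stop words such as "is", "a"
            "fields": {
                "exact": {
                    "type": "text",
                    "analyzer": "standard",  # no stop-word filter by default
                }
            },
        }
    }
}


def phrase_query(text, field="content.exact"):
    """Build a match_phrase query against the stop-word-preserving field."""
    return {"query": {"match_phrase": {field: text}}}


print(json.dumps(phrase_query("this is a test"), indent=2))
```

Because the sub-field still contains "is" and "a", a phrase query on it should no longer match a document like "This test rocks".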

How to overcome maxClauseCount error when using multi_match query

I have 10+ indices on my Elasticsearch server.
Each index has one or more fields with different kinds of analyzers: keyword, standard, ngram, etc.
For global search I am using multi_match without specifying any explicit fields.
For querying I am using the elasticsearch-dsl library; the code is below:
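from elasticsearch_dsl import Search

def search_for_index(indice, term, num_of_result=10):
    # Sort by relevance and cap the number of hits returned.
    s = Search(index=indice).sort({"_score": "desc"})
    s = s[:num_of_result]
    # No explicit fields: multi_match runs against every eligible field.
    s = s.query('multi_match', query=term, operator='and')
    response = s.execute()
    return response.to_dict()['hits']['hits']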
I get very good results and search works just fine, but sometimes someone enters a slightly longer text and I get a maxClauseCount error.
For example, the search raises an error when the search term is:
term=We are working on your request and will keep you posted at the earliest.
Any other moderately long text raises the same error.
Can you help me figure out a better approach for my global search so that I can avoid this kind of error?

First of all, this limitation is there for a reason. The more boolean clauses you have, the heavier the search is. Think of it as intersecting (AND) or uniting (OR) the sets of document IDs matched by each clause. This is a very heavy operation, which is why the default limit is 1024 clauses.
The general recommendation is to reduce the number of fields you're searching. Maybe you have fields that contain no text data or only hold internal IDs. You can exclude them from the multi_match query by specifying the fields section explicitly.
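A minimal sketch of restricting multi_match to explicit fields, written as a plain request body so it runs without a cluster. The function name and the field names in the example are hypothetical; substitute the text fields that actually exist in your indices.

```python
def build_global_search(term, fields, size=10):
    """Build a multi_match request body restricted to the given fields."""
    if not fields:
        raise ValueError("list at least one text field explicitly")
    return {
        "size": size,
        "sort": [{"_score": "desc"}],
        "query": {
            "multi_match": {
                "query": term,
                "operator": "and",
                "fields": list(fields),
            }
        },
    }


# Hypothetical field names, used only for illustration.
body = build_global_search(
    "We are working on your request and will keep you posted at the earliest.",
    ["title", "description", "comments.text"],
)
print(body["query"]["multi_match"]["fields"])
```

With elasticsearch-dsl the same restriction can be expressed by passing `fields=[...]` to `s.query('multi_match', ...)`. Fewer fields means fewer generated clauses per term.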
If you still decide to go with the current approach and you're using Elasticsearch 5.5 or higher, you can raise the limit by adding the following line to elasticsearch.yml and restarting your instance.
indices.query.bool.max_clause_count: 250000
If you're using a pre-5 version of Elasticsearch, the setting is called index.query.bool.max_clause_count.

ElasticSearch: is it possible to highlight words in the query rather than the results

We use ElasticSearch in a reverse manner from what I usually see. We store lots of small documents, usually 1 or 2 words, for example, Job Titles like "software engineering", "car mechanics", "architect", etc.
Then we query with a longer string, for example a 1000 word Job Spec. This way we get all Job Titles present in the text of the Job Spec.
It works well. But I was wondering whether I could get ElasticSearch to highlight the matching Job Titles in the Job Spec, i.e. highlight the results in the query. I have tried the highlight keyword, but it doesn't highlight the query text, it highlights the results. I'm not sure how to get the query to be returned in the ElasticSearch response, let alone whether it can be highlighted.
You might wonder why I need ElasticSearch to highlight the query: can't I just pick out all the results from the text and highlight them myself? Yes, I can, but there are various things that make it hard, such as stemming and stop-word removal. For example, "jquery" is stemmed to "jqueri" during tokenising in ElasticSearch, so it's found as a result, but if I want to highlight it myself, I have to unstem it so it matches the original text. Elasticsearch also removes symbols, so "terms & conditions" would become "terms conditions", which is problematic if I want to highlight it manually, as I have to add back the "&" symbol. There are a hundred other problem cases, hence the question of whether ElasticSearch can do it for me.

I'm quite sure highlighting the query string isn't possible, only highlighting parts of documents in an index.
What you might try is indexing the query string itself in its own index and then using the results of the first query as the query terms for a second query against the query string (in the second index). You could then have highlighting on the query string. You'll have to make an extra request to ES each time, but I think it'll get you what you want.
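The two-step idea could be sketched roughly as follows, using plain request bodies in Python. The index name `job_specs`, the field name `text`, and both helper names are assumptions for illustration. For stemmed matches to line up, the `text` field would presumably need the same analyzer as the job-title field.

```python
def index_spec_request(spec_id, spec_text):
    """Describe the document that stores the query text in a second index."""
    return {"index": "job_specs", "id": spec_id, "body": {"text": spec_text}}


def highlight_titles_request(spec_id, matched_titles):
    """Search the stored spec for the titles returned by the first query."""
    return {
        "index": "job_specs",
        "body": {
            "query": {
                "bool": {
                    # Only look at the spec we just indexed.
                    "filter": [{"ids": {"values": [spec_id]}}],
                    # One phrase clause per job title found in step one.
                    "should": [
                        {"match_phrase": {"text": title}}
                        for title in matched_titles
                    ],
                    "minimum_should_match": 1,
                }
            },
            # number_of_fragments=0 asks for the whole field, highlighted.
            "highlight": {"fields": {"text": {"number_of_fragments": 0}}},
        },
    }


req = highlight_titles_request("spec-1", ["software engineering", "architect"])
print(len(req["body"]["query"]["bool"]["should"]))
```

The highlighted `text` in the response is then the Job Spec with the matching titles wrapped in highlight tags, at the cost of one extra index and one extra search request per spec.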

Kibana 4 - Why does my simple query return correct results when using .raw but not without?

I'm trying out Elasticsearch/Kibana 4, and my simple query:
program.raw:"MYAPPLICATION" AND entityId.raw:"12345-67N"
returns the results I want, i.e. posts that have the program and entityId fields and contain the queried terms.
However, I guess the right way to query this would be:
program:"MYAPPLICATION" AND entityId:"12345-67N"
but that gives me correct results only for the program part of the query, plus a lot of hits on terms containing N or n. The entityId part seems to query only on N. I'm confused; please explain this. I've read up on the Lucene query syntax and can't find anything explaining it.

The .raw fields are set up by logstash as "not_analyzed" fields in elasticsearch. As such, they are not split into tokens and can be used intact.
To elasticsearch, entityId really looks like ['12345', '67n'], which is why your query doesn't match.
Note that, in your example, program:myapplication should work (since there are no special characters). Lowercasing is automatic, IIRC.
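To make the tokenization concrete, here is a very rough Python approximation of what an analyzed field does to these values. It is not the real standard analyzer (which follows Unicode segmentation rules); the function name and the regular expression are simplifications chosen for this illustration.

```python
import re


def rough_standard_analyze(text):
    """Very rough approximation: split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


print(rough_standard_analyze("12345-67N"))      # ['12345', '67n']
print(rough_standard_analyze("MYAPPLICATION"))  # ['myapplication']
```

The hyphen acts as a token boundary, so the analyzed entityId field holds two separate terms, while the not_analyzed .raw field keeps the single value "12345-67N".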

Elasticsearch multi term search

I am using Elasticsearch to allow a user to type in a term to search. I have the following property 'name' I'd like to search, for instance:
'name': 'The car is black'
I'd like to have this document returned if either black car or car black is used to search.
I've tried doing a bool must and doing multiple terms ['black', 'car'], but it seems like it only works if the entire string is a match.
So what I'd really like is more of a "does the text contain both words, in any order" check.
Can someone please get me on the right track? I've been banging my head on this one for a while.

If it seems like it only works when the entire string is a match, first make sure that your string property name is analysed in the index mapping, i.e. that the mapping for this property doesn't contain "index": "not_analyzed". If it is not analysed, you'll need to change the mapping and reindex in order to be able to search for tokens rather than for the whole phrase only.
Once you're sure your strings are analysed you can use:
Terms query with the "minimum_should_match" parameter equal to the number of words entered.
Bool query with a must clause containing one term query per word.
Common terms query, which has a nice clean syntax for this purpose (you don't need to break down the input string and construct a more complex query structure in your app, as with the previous two) in addition to taking a smarter approach to stop-word analysis.
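The bool/must option from the list above can be sketched like this in Python, as a plain request body. The function name is hypothetical, and the sketch assumes the `name` field is analysed with a lowercasing analyzer, which is why the input is lowercased before building term queries (term queries are not analysed).

```python
def all_words_query(user_input, field="name"):
    """Build a bool query that requires every word, in any order."""
    # Term queries are not analysed, so lowercase and split the input here.
    words = user_input.lower().split()
    return {
        "query": {
            "bool": {
                "must": [{"term": {field: word}} for word in words]
            }
        }
    }


print(all_words_query("black car"))
print(all_words_query("Car Black"))
```

Both calls require the terms "black" and "car", so 'The car is black' should match either way. A match query with "operator": "and" is another way to get similar behaviour while letting Elasticsearch analyse the input itself.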
