Hibernate search : use Elastic Search decay_function in spatial() query? - elasticsearch

I have different fields to make a query from, like a title, a text, and a location. I want to put more weight on the location field. I found the decay_function from Elastic Search, which exactly answers my needs according to this example : https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_detailed_example .
However the system i'm on uses Hibernate Search queries. I found the Spatial ( https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#spatial ) filter, but is there a way to replace the "within()" by the decay_function ? Or by a personnalized function or something like that ?
I know I could use the within() but it means I would have to define specific ranges. It could work but it's not the optimal solution in my case.

Not using the DSL itself (for now, it will be possible in 6 to integrate native predicates in the DSL).
But you can create a native Elasticsearch query for this specific case as explained in our documentation: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_queries :
FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(entityManager);
QueryDescriptor query = ElasticsearchQueries.fromJson(
"{ 'query': { 'match' : { 'lastName' : 'Brand' } } }");
List<?> result = fullTextEm.createFullTextQuery(query, GolfPlayer.class).getResultList();
This way you can use whatever Elasticsearch construct you see fit. And you can still use the DSL for the other supported queries.
Unrelated to your question: I saw you are a student, are you using Hibernate Search for a school project? It would be nice having Search taught in a French school :).

Related

Simple query without a specified field searching in whole ElasticSearch index

Say we have an ElasticSearch instance and one index. I now want to search the whole index for documents that contain a specific value. It's relevant to the search for this query over multiple fields, so I don't want to specify every field to search in.
My attempt so far (using NEST) is the following:
var res2 = client.Search<ElasticCompanyModelDTO>(s => s.Index("cvr-permanent").AllTypes().
Query(q => q
.Bool(bo => bo
.Must( sh => sh
.Term(c=>c.Value(query))
)
)
));
However, the query above results in an empty query:
I get the following output, ### ES REQEUST ### {} , after applying the following debug on my connectionstring:
.DisableDirectStreaming()
.OnRequestCompleted(details =>
{
Debug.WriteLine("### ES REQEUST ###");
if (details.RequestBodyInBytes != null) Debug.WriteLine(Encoding.UTF8.GetString(details.RequestBodyInBytes));
})
.PrettyJson();
How do I do this? Why is my query wrong?
Your problem is that you must specify a single field to search as part of a TermQuery. In fact, all ElasticSearch queries require a field or fields to be specified as part of the query. If you want to search every field in your document, you can use the built-in "_all" field (unless you've disabled it in your mapping.)
You should be sure you really want a TermQuery, too, since that will only match exact strings in the text. This type of query is typically used when querying short, unanalyzed string fields (for example, a field containing an enumeration of known values like US state abbreviations.)
If you'd like to query longer full-text fields, consider the MultiMatchQuery (it lets you specify multiple fields, too.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
Try this
var res2 = client.Search<ElasticCompanyModelDTO>(s =>
s.Index("cvr-permanent").AllTypes()
.Query(qry => qry
.Bool(b => b
.Must(m => m
.QueryString(qs => qs
.DefaultField("_all")
.Query(query))))));
The existing answers rely on the presence of _all. In case anyone comes across this question at a later date, it is worth knowing that _all was removed in ElasticSearch 6.0
There's a really good video explaining the reasons behind this and the way the replacements work from ElasticOn starting at around 07:30 in.
In short, the _all query can be replaced by a simple_query_string and it will work with same way. The form for the _search API would be;
GET <index>/_search
{
"query": {
"simple_query_string" : {
"query": "<queryTerm>"
}
}
}
The NEST pages on Elastic's documentation for this query are here;

How can you match long strings?

One of the main challenges that I am facing at the moment is how to match long string applying fuzziness to them .
For example , let's say that we have the following document :
PUT my_index/type/2
{
"name":"longnameyesverylong"
}
if I apply a fuzzy search on that name , like the following :
"match": {
"name": {
"query": "longnameyesverylong",
"fuzziness": 2
}
I can find it but my goal would be to be able to open the net and allow more than two mistakes for this type of strings.
Let's say for example that I index something like :
PUT my_index/type/2
{
"name":"l1ngnam2yesver3long"
}
The previous match query won't be able to find this document, as the fuzziness is greater than 2 and that is not supported in ES.
I tried to use ngrams , but the tokens did not meet the requirement either and the index would grow too much.
The only option that I have on top of my head is to split the string manually at index time creating my "own tokenizer" and create a document that looks like
PUT my_index/type/2
{
"name":"longnamey esverylong"
}
And then , at search time , split the string again and apply a Boolean query with fuzziness on each token. This can probably do what I need , but I feel that there is probably a better solution for this problem.
Is there any other approach that you think it might be appropriate?
Thank you.
Problem solved. They key for this problem is the pattern_capture filter.

How can I find the true score from Elasticsearch query string with a wildcard?

My ElasticSearch 2.x NEST query string search contains a wildcard:
Using NEST in C#:
var results = _client.Search<IEntity>(s => s
.Index(Indices.AllIndices)
.AllTypes()
.Query(qs => qs
.QueryString(qsq => qsq.Query("Micro*")))
.From(pageNumber)
.Size(pageSize));
Comes up with something like this:
$ curl -XGET 'http://localhost:9200/_all/_search?q=Micro*'
This code was derived from the ElasticSearch page on using Co-variants. The results are co-variant; they are of mixed type coming from multiple indices. The problem I am having is that all of the hits come back with a score of 1.
This is regardless of type or boosting. Can I boost by type or, alternatively, is there a way to reveal or "explain" the search result so I can order by score?
Multi term queries like wildcard query are given a constant score equal to the boosting by default. You can change this behaviour using .Rewrite().
var results = client.Search<IEntity>(s => s
.Index(Indices.AllIndices)
.AllTypes()
.Query(qs => qs
.QueryString(qsq => qsq
.Query("Micro*")
.Rewrite(RewriteMultiTerm.ScoringBoolean)
)
)
.From(pageNumber)
.Size(pageSize)
);
With RewriteMultiTerm.ScoringBoolean, the rewrite method first translates each term into a should clause in a bool query and keeps the scores as computed by the query.
Note that this can be CPU intensive and there is a default limit of 1024 bool query clauses that can be easily hit for a large document corpus; running your query on the complete StackOverflow data set (questions, answers and users) for example, hits the clause limit for questions. You may want to analyze some text with an analyzer that uses an edgengram token filter.
Wildcard searches will always return a score of 1.
You can boost by a particular type. See this:
How to boost index type in elasticsearch?

Elasticsearch and NEST: Filtering with Lookups

I'm using Elasticsearch 1.7.x (and NEST 1.7.2) and trying to take advantage of filtering with lookups as documented here: Terms Filter Lookup. I'm able to craft the JSON by hand for the request I desire and execute it using Sense. Works great, awesome feature! However, in the NEST library, I don't see a way to create such a terms clause. For example, from the link referenced above, I can do something like:
"terms" : {
"proteins" : {
"index" : "microarrays",
"type" : "experiment",
"id" : "experiment1234",
"path" : "upregulated_proteins"
},
"_cache_key" : "experiment_1234"
}
Is there a way to build this query using NEST? If not, is there a way I can inject some JSON into a NEST query as I'm building it? I don't know if NEST 2.x+ supports this but upgrading to ES 2.x is a longer term plan for us and I'd like to leverage functionality that is already available in ES 1.7.
Awesome, I've already received an answer from Greg Marzouka of Elastic! He says:
It's mapped as TermsLookup() or TermsLookupFilter in 1.x. Check out the unit tests for some examples.
client.Search<Paper>(s => s
.Query(q => q
.Filtered(fq => qf
.Filter(f => f
.CacheKey("experiment_1234")
.TermsLookup(t => t
.Lookup<Protein>(p => p.UnregulatedProteins, "experiment1234", "microarrays", "experiment")
)
)
)
));
In 2.x it's a little more aligned with the ES query DSL.

How to use the elasticseach java api for dynamic searches?

So I'm trying to use elasticsearch for dynamic query building. Imagine that I can have a query like:
a = "something" AND b >= "other something" AND (c LIKE "stuff" OR c LIKE "stuff2" OR d BETWEEN "x" AND "y");
or like this:
(c>= 23 OR d<=43) AND (a LIKE "text" OR a LIKE "text2") AND f="text"
Should I use the QueryBuilder or the FilterBuilder, how do you match both? The official documentation says that for exact values we should use the filter approach? I assume I should use filters for equal comparisons? what about dates and numbers? Should I use the Filter or Query?
For the Like/Equals for the number/number problem I tried this:
#Field(type = String, index = FieldIndex.analyzed, pattern = "(\\d+\\/\\d+)|(\\d+\\/)|(\\d+)|(\\/\\d+)")
public String processNumber;
The pattern would deal with the structure number + slash + number, but also number and number + slash.
But when using either the term filter or the match_query I can't get only hits with the exact structure like 20/2014, if I type 20 I would still get hits on the term filter.
Query is the main component when you search for something, it takes into consideration ranking and other features such as stemming, synonyms and other things. Filter, on the other hand, just filters the result set you get from your query.
I suggest that if you don't care about the ranking use filters because they are faster. Otherwise, use query.

Resources