Sorting only by term frequency in Elasticsearch

I have users with the fields city, country, followersAmount and some others. When I search for "New York, USA" in the city and country fields, sorted by followers amount, I need to display people from "New York, USA" first, sorted by followersAmount descending, and after them people from other cities in the USA, also sorted by followersAmount descending. I think I can do this by scoring only by term frequency and sorting first by score and then by followersAmount, but I cannot find how to configure that.

What about something like this:
{
"query" : {
"bool" : {
"should" : [
{
"constant_score" : {
"query" : {
"match" : {
"city" : "New York"
}
}
}
},
{
"constant_score" : {
"query" : {
"match" : {
"country" : "USA"
}
}
}
}
]
}
},
"sort" : [
"_score",
{ "followersAmount" : { "order" : "desc"} }
]
}
You can expect all the people from "New York, USA" to get the same score. The people who are not from New York but are from the USA will also share a score, which is lower. Documents with the same score are then sorted by followersAmount. Of course, this is just an initial query to get you started; it might need more tweaks.
EDIT: Updated with constant_score
Originally I expected the basic TF-IDF algorithm and the field length norm to help out. Generally, I would expect the city terms to have a larger IDF than the country terms, so higher scores for a city match seem desirable. In terms of TF and field length norms, scoring a person with only a single matching city higher than a person with, say, two cities (if these fields are arrays that allow multiple cities) also seems favorable. But then, I am not sure what your data looks like. I have updated the query to use constant_score so that Elasticsearch's default scoring no longer has this effect.
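To make the expected ordering concrete, here is a minimal Python sketch. It builds the same request body and then mimics the ranking locally on a few made-up users, so it runs without a cluster. The helper names and the sample data are hypothetical, and the local scoring is only an approximation (one point per matching clause, exact string comparison instead of an analyzed match). Note the assumption that recent Elasticsearch versions expect "filter" inside constant_score where the query above uses "query".

```python
def build_location_query(city, country):
    """Build a search body: constant-score clauses plus a followers tie-break."""
    def constant(field, value):
        # Assumption: recent Elasticsearch versions use "filter" here;
        # the answer above uses the older "query" key.
        return {"constant_score": {"filter": {"match": {field: value}}}}

    return {
        "query": {
            "bool": {"should": [constant("city", city), constant("country", country)]}
        },
        "sort": ["_score", {"followersAmount": {"order": "desc"}}],
    }


def rank_locally(users, city, country):
    """Mimic the expected ordering in plain Python (illustration only)."""
    def score(user):
        # Each matching clause contributes the same constant amount.
        return int(user["city"] == city) + int(user["country"] == country)

    matched = [u for u in users if score(u) > 0]
    # Highest score first, then most followers first.
    return sorted(matched, key=lambda u: (-score(u), -u["followersAmount"]))


# Hypothetical sample data.
users = [
    {"name": "a", "city": "Boston", "country": "USA", "followersAmount": 900},
    {"name": "b", "city": "New York", "country": "USA", "followersAmount": 10},
    {"name": "c", "city": "New York", "country": "USA", "followersAmount": 50},
    {"name": "d", "city": "Paris", "country": "France", "followersAmount": 5000},
]
ranked_names = [u["name"] for u in rank_locally(users, "New York", "USA")]
print(ranked_names)  # ['c', 'b', 'a']
```

Both New York users come first regardless of follower count, and the Boston user follows even though it has more followers than either of them.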

Related

Filter on score after rescore in Elasticsearch

I have been on an internet manhunt for days for this and getting ready to give up. I need to filter on _score in Elasticsearch after the rescore function has completed. So given an example query like this:
POST /_search
{
"query" : {
"match" : {
"message" : {
"operator" : "or",
"query" : "the quick brown"
}
}
},
"rescore" : {
"window_size" : 50,
"query" : {
"rescore_query" : {
"match_phrase" : {
"message" : {
"query" : "the quick brown",
"slop" : 2
}
}
},
"query_weight" : 0.7,
"rescore_query_weight" : 1.2
}
}
}
Say just for simplicity's sake that the above returns 5 documents with scores ranging from 0.0 to 1.0. I want the final returned results set to only be the documents with a score above 0.90. In other words, take those newly-rescored docs, and hand them off to a filter where it drops all documents scored below 0.90.
I have tried many, many different ways but nothing is working. post_filter is apparently applied after the main query but before rescore, so that one doesn't work. min_score does not work at all with rescore; it only works with the original ES scores from the main query. Aggs is one piece of functionality that I am able to get to work after rescore, but aggregating is not what I need to do here. At least it shows me that ES has the ability to continue operating on the data after a rescore query.
Any thoughts on how to get this seemingly simple task accomplished? I have also tried using function_score and script_score but really those are just ways to further modify the scores, whereas I need to filter on the scores generated by the rescore. The requirement here is to get it done in the query. We can't do it as a post-processing step.

Elasticsearch 6.5 query scoring changed, how do we get the ES 5 type results?

I am making a recommender with Elasticsearch. I know what people have bought and this forms the query. The index is of items and has a field that contains items bought in common.
We were using ES 5, where the following query gives the highest scores to the items that have the most in common with the query. But in ES 6 this query returns only score = 1.0 and so no longer finds the most similar items.
{
"query": {
"bool": {
"should": [
{
"terms": {
"bought": [
"iPad Pro",
"iPhone 8"
]
}
}
]
}
}
}
How do we get the same results with an ES 6 query?

It's listed as a breaking change in Elasticsearch 6.0: the terms query now always returns a score equal to 1.
Unfortunately, it would be very difficult to get exactly the same behaviour, but given the logic in your question I would recommend using a bool query, e.g.
{
"query": {
"bool" : {
"should" : [
{ "term" : { "bought" : "iPad Pro" } },
{ "term" : { "bought" : "iPhone 8" } }
]
}
}
}
In this case you can mimic the old terms query behaviour while keeping the score tied to exactly the logic you want: if a person bought only 1 of the 2 items, the score will be lower.
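Since the list of bought items changes per user, the should clauses are usually generated rather than written by hand. A minimal sketch in Python, assuming the field name "bought" from the question; the helper name bought_query is hypothetical:

```python
def bought_query(items, field="bought"):
    """Build a bool/should query with one term clause per purchased item."""
    # Each matching term clause adds to the document's score, so documents
    # that share more items with the query score higher.
    clauses = [{"term": {field: item}} for item in items]
    return {"query": {"bool": {"should": clauses}}}


query_body = bought_query(["iPad Pro", "iPhone 8"])
print(query_body)
```

The resulting dict can be passed as the request body of a search call. The term values must match the indexed values exactly, including case, if the field is not analyzed.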

Elastic Search: find document by analyzed field only if all words from field are contained in query

Consider the following case: I have a list of video titles and a list of video categories, both indexed in Elasticsearch.
The task is to match videos with categories.
I have three categories in the index:
{
"category" : "rock climbing"
...
},
{
"category" : "rock and roll"
...
}
And
{
"category" : "outdoor"
...
}
The "category" field is mapped as an analyzed string and uses multiple analyzers.
So the inverted index for the first document is "rock, climb", for the second "rock, roll" and for the last one "outdoor".
Now I want to find all categories that apply to a video with the title "outdoor rock climbing in national park".
So I need to receive "rock climbing" and "outdoor".
I am trying to run the following query:
{
"query": {
"match": {
"category": "outdoor rock climbing in national park"
}
}
}
It returns all documents, because it matches documents on occurrences of "outdoor", "rock" and "climb".
But I want the query to return only those documents whose indexed terms are all contained in the query.
Analyzed query: outdoor,rock,climb,nation,park
Expected result: outdoor, rock+climb
I am a beginner with Elasticsearch, please help.
Thanks in advance!
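To pin down the requirement, here is a minimal Python sketch of the desired matching rule: a category matches only if every one of its indexed terms appears among the analyzed query terms. This is plain set logic for illustration, not an Elasticsearch query. The token lists are copied from the question, and the function name categories_covered_by is hypothetical.

```python
def categories_covered_by(query_tokens, category_tokens):
    """Return the categories whose indexed terms all occur in the query."""
    query_set = set(query_tokens)
    return [
        name
        for name, tokens in category_tokens.items()
        if set(tokens) <= query_set  # every category term is in the query
    ]


# Analyzed terms per category, as listed in the question.
category_tokens = {
    "rock climbing": ["rock", "climb"],
    "rock and roll": ["rock", "roll"],
    "outdoor": ["outdoor"],
}
query_tokens = ["outdoor", "rock", "climb", "nation", "park"]

matched_categories = categories_covered_by(query_tokens, category_tokens)
print(matched_categories)  # ['rock climbing', 'outdoor']
```

"rock and roll" is rejected because "roll" is missing from the query, which is exactly the case the plain match query gets wrong.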

Elastic Search boost query corresponding to first search term

I am using PyElasticsearch (elasticsearch python client library). I am searching for strings like Arvind Kejriwal India Today Economic Times and that gives me reasonable results. I was hoping I could give more weight to the first words in the search query. How can I do that?
res = es.search(index="article-index", fields="url", body={
"query": {
"query_string": {
"query": "keywordstr",
"fields": [
"text",
"title",
"tags",
"domain"
]
}
}
})
I am using the above command to search right now.

Split the given query into multiple terms. In your example these will be Arvind, Kejriwal... Now form a query_string query (or a field query, or any other that fits the need) for each of the terms. A query_string query looks like this:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-query-string-query.html
{
"query_string" : {
"default_field" : "content",
"query" : "<one of the given term>",
"boost": <any number>
}
}
Now you have multiple queries like the one above with different boost values (depending on which terms should carry more weight). Combine all of those queries into one query using a bool query. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
If you want all of the terms to be present in the result, query will be like this.
{
"bool" : {
"must" : [q1, q2, q3 ...]
}
}
You can use the other options of the bool query as well. For example, if you want at least 3 of the terms to be present in the result, the query will look like this:
{
"bool" : {
"should" : [q1, q2, q3 ...],
"minimum_should_match" : 3
}
}
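The steps above can be sketched in Python as follows. The boost scheme (the first term gets the highest boost, decreasing by 1 per position) is an assumption chosen for illustration, and the function name boosted_term_query is hypothetical. Terms containing query_string special characters would need escaping, which this sketch does not do.

```python
def boosted_term_query(search_text, fields):
    """Build a bool/should query with one boosted query_string per term."""
    terms = search_text.split()
    clauses = []
    for position, term in enumerate(terms):
        # Assumed scheme: earlier terms receive a larger boost.
        boost = len(terms) - position
        clauses.append(
            {"query_string": {"query": term, "fields": fields, "boost": boost}}
        )
    return {"query": {"bool": {"should": clauses}}}


search_body = boosted_term_query(
    "Arvind Kejriwal India", ["text", "title", "tags", "domain"]
)
boosts = [c["query_string"]["boost"] for c in search_body["query"]["bool"]["should"]]
print(boosts)  # [3, 2, 1]
```

The returned dict could be passed as the body argument of the es.search call from the question.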

theoretically:
split into terms using api
query against terms with different boosting

Lucene Query Syntax does the trick. Thanks
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Boosting%20a%20Term
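With the Lucene syntax a term is boosted by appending ^ and a number, so the weighting can be written directly into the query string. A minimal sketch, assuming the same decreasing boost scheme as an illustration; the helper name boosted_query_string is hypothetical and special characters in terms are not escaped:

```python
def boosted_query_string(search_text):
    """Append a Lucene boost (term^N) to each term, highest boost first."""
    terms = search_text.split()
    return " ".join(
        f"{term}^{len(terms) - position}" for position, term in enumerate(terms)
    )


keywordstr = boosted_query_string("Arvind Kejriwal India")
print(keywordstr)  # Arvind^3 Kejriwal^2 India^1
```

The resulting string would go into the "query" value of the query_string query shown in the question.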

Full-text schema in ElasticSearch

I'm (extremely) new to ElasticSearch so forgive my potentially ridiculous question. I currently use MySQL to perform full-text searches, and want to move this to ElasticSearch. Currently my table has a fulltext index spanning three columns:
title,description,tags
In ES, each document would therefore have title, description and tags fields, allowing me to do a fulltext search for a general phrase, or filter on a given tag.
I also want to add further searchable fields such as username (so I can retrieve posts by a given user). So, how do I specify that a fulltext search should match title OR description OR tags but not username?
From the OR filter example, I'd assume I'd have to use something like this:
{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"or" : [
{
"term" : { "title" : "foobar" }
},
{
"term" : { "description" : "foobar" }
},
{
"term" : { "tags" : "foobar" }
}
]
}
}
}
Coming at this new, it doesn't seem like this is very efficient. Is there a better way of doing this, or do I need to move the username field to a separate index?

This is fine.
In general I would suggest getting familiar with Elasticsearch mapping types and options.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
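As a possible alternative to the term filters in the question (not something the answer above prescribes), a multi_match query searches one phrase across a chosen list of fields, so username is excluded simply by leaving it out of the list. A minimal Python sketch that only builds the request body; the helper name fulltext_query is hypothetical:

```python
def fulltext_query(phrase, fields=("title", "description", "tags")):
    """Build a multi_match body that searches only the listed fields."""
    # username is not searched because it is not in the field list.
    return {"query": {"multi_match": {"query": phrase, "fields": list(fields)}}}


fulltext_body = fulltext_query("foobar")
print(fulltext_body)
```

Unlike term filters, multi_match analyzes the phrase before matching, which is usually what a full-text search over analyzed fields needs.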
