How to return term scores in Elasticsearch?

How do I get the term vector for an indexed document in Elasticsearch?
That is, once I have uploaded several documents to my Elasticsearch index, I would like to get the scored term vectors back. Then I can see which terms are over-represented in a given document and show the most influential terms for each document.
Is this possible?

You can achieve that using the Term Vectors API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
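Say you have already indexed several documents. You can then get the term vectors for one of them with the following API call:
curl -XGET 'http://localhost:9200/twitter/_termvectors/<id>?pretty=true&term_statistics=true'
Replace <id> with the id of the document you want the term vectors for. Adding term_statistics=true also returns each term's document frequency, which you need to judge how distinctive a term is. On older versions (6.x and earlier), where mapping types still existed, the path includes the type: http://localhost:9200/twitter/tweet/<id>/_termvectors.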
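The following is a minimal Python sketch of the request body and of ranking the returned terms. It assumes the document's text lives in a field called `message` (hypothetical) and that the JSON response has already been parsed into a dict. When a `filter` is supplied, Elasticsearch reports its own tf-idf based `score` per term, and the helper prefers that value. Otherwise it falls back to a plain client-side tf-idf. That fallback formula is my own choice, not the one Elasticsearch uses.

```python
import math


def termvectors_body(field, max_terms=10):
    """Request body for GET /<index>/_termvectors/<id>.

    `field` is a hypothetical field name; `filter` keeps only the
    `max_terms` highest-scoring terms of the document.
    """
    return {
        "fields": [field],
        "term_statistics": True,   # adds doc_freq / ttf per term
        "field_statistics": True,  # adds doc_count for the field
        "filter": {"max_num_terms": max_terms, "min_term_freq": 1, "min_doc_freq": 1},
    }


def rank_terms(response, field):
    """Return (term, score) pairs for `field`, highest score first."""
    tv = response["term_vectors"][field]
    doc_count = tv["field_statistics"]["doc_count"]
    ranked = []
    for term, stats in tv["terms"].items():
        if "score" in stats:
            # Score computed by Elasticsearch when a filter was requested.
            score = stats["score"]
        else:
            # Fallback: plain tf-idf from the returned statistics.
            doc_freq = max(stats.get("doc_freq", 1), 1)
            score = stats["term_freq"] * math.log(1 + doc_count / doc_freq)
        ranked.append((term, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With the `requests` library, for example, the response could be fetched with `requests.get("http://localhost:9200/twitter/_termvectors/1", json=termvectors_body("message")).json()` and then passed to `rank_terms(..., "message")`.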

Related

Elasticsearch and Search Ranking Models

I am new to Elasticsearch. I would like to know whether the following steps are how people typically use ES to build a search engine:
1. Use Elasticsearch to get a list of qualifying documents/results based on the user's input.
2. Build and use a search ranking model to sort this list.
3. Return this sorted list to the user as the output of the search engine.
I would add a few steps:
1. Think about your information model.
   - What kinds of documents are you indexing?
   - What are the important fields, and what field types are they?
   - Which fields should be shown in the search results?
   All of this becomes part of your mapping.
2. Index the documents.
   - Is the underlying data changing, or can you index it just once?
   - How are you detecting new documents, deletes, and updates?
   This goes into your connectors, which can be set up in multiple ways, for example using the Documents API.
3. Use a bit of trial and error to sort out your ranking model.
   - Depending on your use case, the default ranking may be enough.
   - Have a look at the Search API to try out different rankings.
4. Use the search result list to present the results to the end user.
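As a starting point for the mapping and ranking steps, here is a Python sketch of the request bodies involved. The field names (`title`, `body`, `popularity`) are hypothetical. Wrapping the query in `function_score` with `field_value_factor` is just one way to adjust Elasticsearch's default relevance ranking, not the only one.

```python
# Hypothetical mapping for the "information model" step.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "popularity": {"type": "integer"},
        }
    }
}


def search_body(user_input, boost_popularity=False, size=10):
    """Build a search request for the user's input.

    With boost_popularity=False the default relevance score is used; with
    True the score is multiplied by ln(2 + popularity).
    """
    query = {
        "multi_match": {
            "query": user_input,
            "fields": ["title^2", "body"],  # title matches count double
        }
    }
    if boost_popularity:
        query = {
            "function_score": {
                "query": query,
                "field_value_factor": {
                    "field": "popularity",
                    "modifier": "ln2p",  # ln(2 + x) stays > 0 even when x is 0
                    "missing": 0,
                },
                "boost_mode": "multiply",
            }
        }
    return {"size": size, "query": query}
```

Comparing the hits returned by `search_body(q)` and `search_body(q, boost_popularity=True)` on a handful of representative queries is one simple form of the trial and error described above.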

Elasticsearch: get a list of the terms that were matched in each result

How can I get the list of terms that elasticsearch matched in each result? I know the highlight contains this but I want to get a list of the terms that were found without manually performing postprocessing on the highlight for each result.
You could use named queries, with a separate query for each term and a unique _name on each one.
Each hit in the search response will then contain a matched_queries list with the names of the queries that the document matched.
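A Python sketch of the named-queries idea follows. It builds one `match` clause per term, tags each clause with `_name`, and combines them in a `bool` `should`. The field name `body` is hypothetical. Note that this reports which of your own terms matched. It does not expose the analyzed tokens Elasticsearch produced.

```python
def named_terms_query(field, terms):
    """One named match clause per term; a hit matches if any clause does."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": term, "_name": term}}}
                    for term in terms
                ],
                "minimum_should_match": 1,
            }
        }
    }


def matched_terms(response):
    """Map each hit's _id to the sorted names of the clauses it matched."""
    return {
        hit["_id"]: sorted(hit.get("matched_queries", []))
        for hit in response["hits"]["hits"]
    }
```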

How can I get responses filtered by score in Elasticsearch?

I need your help. I want to run a search that uses ordinary query conditions and also uses the score range as a condition. Is this possible? If you know how, please share.
I have an example in the picture:
In the picture, the score range is [0,1]. If I want to get only the responses whose scores are in [0.2,0.6], how do I do it? Please help, and excuse my English!
Elasticsearch provides a min_score field that can be included in a request body search to filter out documents with a _score less than a specified value.
There is no corresponding request parameter to filter out documents with a _score greater than a certain value. But why do you want to do this? Scores in Lucene by definition mean that documents were found matching your search query, and that some results are more relevant than others. I recommend that you read "What is Relevance?" in the Elasticsearch documentation, and "Apache Lucene - Scoring" for a basic understanding of how the scoring formula works.
Also, the Lucene score range isn't always [0,1]: it can be greater than 1.
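If you still need a score window such as [0.2, 0.6], here is a Python sketch that makes two assumptions. First, `min_score` handles the lower bound on the server. Second, the upper bound is filtered on the client after the response arrives. The client-side step only sees the hits that were actually returned, so it depends on `size`. Absolute score values are also not comparable across different queries.

```python
def min_score_body(query, min_score):
    """Search body that drops hits scoring below min_score on the server."""
    return {"min_score": min_score, "query": query}


def hits_in_score_range(response, low, high):
    """Keep hits with low <= _score <= high.

    min_score handles the lower bound server-side; there is no request
    parameter for the upper bound, so it is applied here on the client.
    """
    return [
        hit for hit in response["hits"]["hits"]
        if hit.get("_score") is not None and low <= hit["_score"] <= high
    ]
```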

How are fields associated with terms in the inverted index in Elasticsearch?

As I understand it, Elasticsearch uses a structure called an inverted index to provide full-text search. An inverted index maps terms to the ids of the documents that contain them. But a document can have any number of fields, and a field name can be used at query time to search only that field. In that case, how does Elasticsearch restrict the search to a particular field? I would like to know whether the inverted index contains the field name or a field id along with the terms and document ids.
The same question arises when you sort on a field, so there must be some way to associate terms with field names. Please help me understand the details involved here.
Thanks in advance.
I would like to know if inverted index contains fields name or field id along with terms and document id.
Quoting from the Lucene docs:
The same string in two different fields is considered a different term. Thus terms are represented as a pair of strings, the first naming the field, and the second naming text within the field.
In that case how elasticsearch restricts/limits search only to a particular field?
Each segment can also maintain term vectors: for each field in each document, a term vector may be stored. A term vector consists of term text and term frequency.
Hence, the index data is maintained per field.
In effect, we have an inverted index per field per index (more precisely, per segment).
There is also the field data cache (or doc values), which is effectively an "uninverted" index: it maps each document to the values of its fields. All document-to-field-value lookups, such as those needed for sorting, happen there.
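To make the (field, term) idea concrete, here is a toy Python model. It is an illustration only and does not reflect Lucene's actual data structures. Postings are keyed by the pair (field, term), so a search restricted to one field only looks up keys carrying that field name. A separate per-field map from document to value plays the role of doc values for sorting. Tokenization is a naive lowercase whitespace split, an assumption standing in for a real analyzer.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of field-qualified postings plus doc values."""

    def __init__(self):
        self.postings = defaultdict(set)     # (field, term) -> doc ids
        self.doc_values = defaultdict(dict)  # field -> {doc_id: value}

    def add(self, doc_id, doc):
        for field, value in doc.items():
            # "Uninverted" direction: document -> value, used for sorting.
            self.doc_values[field][doc_id] = value
            if isinstance(value, str):
                # Inverted direction: the same word in two fields gives two keys.
                for term in value.lower().split():
                    self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        """Doc ids containing `term` in `field` only."""
        return set(self.postings.get((field, term.lower()), set()))

    def sort_by(self, doc_ids, field):
        """Sort doc ids by their value in `field`, looked up per document."""
        values = self.doc_values[field]
        return sorted(doc_ids, key=lambda doc_id: values[doc_id])
```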
I had the same question, so I can share my understanding with you.
Elasticsearch creates an inverted index for each full-text field of the document. So if an index has 10 fields that allow full-text search, Elasticsearch will create 10 different inverted indices, one per field, and store the analyzed terms for each field in its own inverted index.
So when you perform a search and specify which fields you want to search, Elasticsearch searches the inverted indices of those specific fields only.
To summarize, an inverted index is created at the field level.
I hope that helps
Thanks

Grouping documents by name and lat/long in Elasticsearch

I want to group documents by name and lat/long in Elasticsearch. I explored the aggregations API, but it only gives a count for each criterion, not the actual documents. Is there a way to do this in Elasticsearch?
You could use nested sub-aggregations, for example aggregating by name and then by _id, and then use a second query to fetch the documents by those ids.
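The following Python sketch uses a different technique from the one in the answer. Instead of aggregating on _id and running a second query, it puts a `top_hits` sub-aggregation inside each bucket, and that sub-aggregation returns the documents directly. It assumes hypothetical fields `name` (keyword) and `location` (geo_point). `geohash_grid` groups points that fall into the same geohash cell rather than points with exactly equal coordinates. If exact lat/lon pairs matter, nested `terms` aggregations on separate lat and lon fields would be closer.

```python
def grouping_body(hits_per_group=3, precision=7):
    """Aggregation-only search body (size 0) grouping by name, then location.

    Assumes hypothetical fields: `name` (keyword) and `location` (geo_point).
    """
    return {
        "size": 0,
        "aggs": {
            "by_name": {
                "terms": {"field": "name"},
                "aggs": {
                    "by_location": {
                        # Nearby points share a geohash cell at this precision.
                        "geohash_grid": {"field": "location", "precision": precision},
                        "aggs": {
                            # Return the actual documents in each group.
                            "docs": {"top_hits": {"size": hits_per_group}}
                        },
                    }
                },
            }
        },
    }


def documents_by_group(response):
    """Map (name, geohash cell) to the _source of the documents in that group."""
    groups = {}
    for name_bucket in response["aggregations"]["by_name"]["buckets"]:
        for cell in name_bucket["by_location"]["buckets"]:
            hits = cell["docs"]["hits"]["hits"]
            groups[(name_bucket["key"], cell["key"])] = [h["_source"] for h in hits]
    return groups
```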
