Elasticsearch search does not return exact results - elasticsearch

I was trying to run elasticsearch on couchdb using river plugin. Unfortunately, the number of hits i have got is the same with the total of actual documents. Say that i have 10000 documents in my couch database, however, it only returns 9340 hits in elasticsearch. Anyone knows why this problem arise? Would you mind to explain it to me please?
Regards,
Jemie Effendy

Look into the bool query. This will return results that MUST have what you've defined in your query. Also look into your mapping, Elasticsearch may be tokenising your data in a way in which you don't expect.

Related

Get last document from index in Elasticsearch

I'm playing around the package github.com/olivere/elastic; all works fine, but I've a question: is it possible to get the last N inserted documents?
The From statement has 0 as default starting point for the Search action and I didn't understand if is possible to omit it in search.
Tldr;
Although I am not aware of a feature in elasticsearch api to retrieve the latest inserted documents.
There is a way to achieve something alike if you store the ingest time of the documents.
Then you can sort on the ingest time, and retrieve the top N documents.

elasticsearch query is giving results very slow on a simple query

I am using elasticsearch to perform some aggregations. Everything used to work fine, but currently I have 2 million docs in an index. I am performing a very simple search query list all documents in a given type of a given index.
{
"size":100000,
"query":
{"match_all":{}
}
}
This query is very slow and gives about 300k hits. What could be the possible reasons for it?
NOTE: i am having 2G ram . 2 cores
You are trying to get a response with 100.000 documents in it. This is just too much. Elasticsearch is intended for paging. Paging means fetch in small chunks. You try to fetch a bulk of 100.000. There is a reason why it defaults with a size of 10.
I finally found out that this configuration is enough for my needs that is searching over 2 million documents. i was having a wrong configuration and also the method of simply doing match_all is not correct even if we have 2 million docs performing a search based on some criteria would be very fast.

how can I find related keywords with elasticsearch?

I am pretty new to elasticsearch and already love it.
Right know I am interested in understanding on how I can let elasticsearch make suggestions for similar keywords.
I have already read this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents.
This is already more than I am looking for. I dont need similar documents but only related / similar keywords.
So lets say I have an index of documents about movies and I start a query about "godfather". Then elasticsearch should suggest related keywords - e.g. "al pacino" or "Marlon Brando" because they are likely to occur in the same documents.
any ideas how this can be done?
Unfortunately, there is no built-in way to do that in Elastic. What you could possibly do, is to write a program, that will query Elastic, return matched documents, then you will get the _source data, or just retrieve it from your original datasource (like DB or file), later you will need to calculate TF-IDF for each term in the retrieved ones and somehow combine everything all together to get top K terms out of all returned terms.

Elasticsearch: count returning wrong value

ES 1.7.3
We have around 20M documents. Each document has a unique ID. When we do a count-request (/index/type/_count) we get around 30K less documents than we indexed.
I checked the existence of each document by making requests on the ID field. Result: there is none missing.
Is there any reasons why _count returns not the exact count?
PS: I read about estimates when doing aggregations. Is this perhaps related?
Coutn API may result in inaccurate results. You can use search_type=count instead. It works in the same way as searching works but returns only count.
Use it like
GET /index/type/_search?search_type=count
Study more about search_type here.
You can also refer to this question

Can I narrow results from Elastic Search _stats get?

I am using elastic search for the project I'm working on and I was wondering if there was a way to narrow the results I get from an indices stats search.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html
I currently use the docs to narrow the data I get back about the indices but now I want to only get back ones with a doc count greater than 0. Does anyone know if this is possible or how to?
Thanks!
For elastic search 1.5.2
If you're concerned about the size of the response (i.e. if you many many indices with many shards), the best you can do is to use response filtering (available only since ES 1.7) and only retrieve the docs field that you can further filter on the client-side:
curl 'localhost:9200/_stats/docs?pretty&filter_path=**.docs.count'

Resources