How to eliminate duplicate documents while searching in Elasticsearch - elasticsearch

> Is there any technique to eliminate duplicate documents while searching in Elasticsearch? How can I compare the values among the different documents in the search results? Is any script available?

You can use the More Like This API to look for documents that match a specified document's field values. Some customization may be required.
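
As a rough sketch of what that customization might look like: the first helper below builds a request body for the more_like_this query (the query form of More Like This, documented at the link further down this page), and the second drops duplicate hits on the client side by comparing field values. The function names, the index name and the field names are made up for illustration, and the normalization rule (trim and lowercase) is only one possible definition of "duplicate".

```python
from typing import Any, Dict, Iterable, List


def build_mlt_query(index: str, doc_id: str, fields: List[str]) -> Dict[str, Any]:
    """Request body for a more_like_this query seeded with one stored document."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": [{"_index": index, "_id": doc_id}],
                # Low thresholds so short documents still produce query terms.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


def dedupe_hits(hits: Iterable[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Keep the first hit for each distinct combination of the given _source fields."""
    seen = set()
    unique = []
    for hit in hits:
        source = hit.get("_source", {})
        # Assumption: two documents are duplicates if these fields match
        # after trimming whitespace and lowercasing.
        key = tuple(str(source.get(f, "")).strip().lower() for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```

The hits passed to dedupe_hits are expected to look like the entries of hits.hits in a search response. Because hits arrive in relevance order, keeping the first one per key keeps the best-scoring copy.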

Related

How to get all the index patterns which never had any documents?

For Kibana server decommissioning purposes, I want to get a list of the index patterns that never had a single document and of those that did.
How to achieve this using Kibana only?
I tried this, but it doesn't give the list based on the document count:
GET /_cat/indices
Also, getting the count for each pattern individually to check whether it contains documents is time-consuming:
GET index-pattern*/_count
You can try this. v is for verbose and s stands for sort.
GET /_cat/indices?v&s=store.size:desc
From the docs:
These metrics are retrieved directly from Lucene, which Elasticsearch uses internally to power indexing and search. As a result, all document counts include hidden nested documents.
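
To get from that listing to two lists, one option is to ask the cat API for JSON (GET /_cat/indices?format=json&h=index,docs.count) and split the rows by docs.count. The sketch below does the split in Python; the sample rows are invented, and treating a missing count as zero is an assumption. Note that docs.count only shows whether an index is empty now, not whether it ever held documents. Sorting the original request with s=docs.count:desc instead of store.size may also be closer to what was asked.

```python
from typing import Dict, List, Optional, Tuple


def split_by_doc_count(rows: List[Dict[str, Optional[str]]]) -> Tuple[List[str], List[str]]:
    """Split _cat/indices rows into (empty, non_empty) lists of index names."""
    empty: List[str] = []
    non_empty: List[str] = []
    for row in rows:
        # The cat API returns numbers as strings. A missing or null count is
        # treated as 0 here, which is an assumption of this sketch.
        count = int(row.get("docs.count") or 0)
        (empty if count == 0 else non_empty).append(row["index"])
    return sorted(empty), sorted(non_empty)


# Made-up rows shaped like the JSON output of the cat indices API.
sample_rows = [
    {"index": "logs-2023", "docs.count": "1520"},
    {"index": "metrics-old", "docs.count": "0"},
    {"index": "audit", "docs.count": None},
]

empty_indices, non_empty_indices = split_by_doc_count(sample_rows)
```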

Elasticsearch/Lucene index on multiple words?

When I search for, say, car engine in Elasticsearch/Lucene (and this is the first time any user has searched for this keyword), does the search engine look up the individual words in the index table first and then find the intersection? For example, say the engine found 10 documents for car; it then searches for engine and gets 5 documents. Now, within those 5 documents (the smaller set), it searches for car and finds 2 documents.
The search engine then ranks the results based on the above. Is this how multiple words are searched in the index table at a high level?
For future searches with the same keyword, does the search engine make a new entry for the key car engine in the index table?
Yes, it does search for individual terms and takes the intersection or union of the results, according to your query. It uses something called an "inverted index", which it builds as the documents to be searched are "indexed" into Elasticsearch.
Indexing operations are different from searching. So no, it wouldn't index user searches unless you tell it to (in your application).
The basic functioning of elasticsearch can be split into two parts:
Indexing. You create an index of documents by indexing all the documents that you want to search in. These documents could come from anywhere: your MySQL store, Logstash, etc., or they could be made up of users' search queries that your application indexes into a relevant Elasticsearch index.
Searching. You search the indexed documents using keywords that could be user generated, application generated, or a mixture, using the Elasticsearch Query DSL. If a result is found (according to your query), Elasticsearch returns the relevant records.
I'd encourage you to read this post for a better understanding of how Elasticsearch searches documents:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
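
The lookup-then-combine idea described above can be illustrated with a toy inverted index. This is only a sketch of the concept in plain Python, not how Lucene stores or traverses its postings, and the whitespace tokenizer and function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str, operator: str = "and") -> Set[int]:
    """Look up each query term separately, then intersect (and) or union (or)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if operator == "and":
        # Start from the shortest posting list so later intersections stay small.
        postings.sort(key=len)
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
        return result
    return set().union(*postings)


docs = {
    1: "car engine repair",
    2: "car wash",
    3: "jet engine",
    4: "Car Engine oil",
}
index = build_inverted_index(docs)
both = search(index, "car engine", operator="and")
either = search(index, "car engine", operator="or")
```

Nothing is written to the index during search, which matches the point that queries are not indexed unless the application does so itself.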

How can I find related keywords with Elasticsearch?

I am pretty new to Elasticsearch and already love it.
Right now I am interested in understanding how I can get Elasticsearch to suggest similar keywords.
I have already read this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents.
This is already more than I am looking for. I don't need similar documents, only related / similar keywords.
So let's say I have an index of documents about movies and I start a query about "godfather". Then Elasticsearch should suggest related keywords, e.g. "Al Pacino" or "Marlon Brando", because they are likely to occur in the same documents.
Any ideas how this can be done?
Unfortunately, there is no built-in way to do that in Elastic. What you could do is write a program that queries Elastic and collects the matched documents, taking the text either from the _source data or from your original data source (such as a database or file). You would then calculate TF-IDF for each term in the retrieved documents and combine the scores to get the top K terms out of all returned terms.
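
A minimal sketch of that TF-IDF step, assuming the matched documents and the full corpus are already available as plain strings. The whitespace tokenizer, the scoring formula (term frequency in the matched set times log(N / document frequency)) and the sample corpus are simplifications chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def related_terms(matched: List[str], corpus: List[str], query: str,
                  k: int = 3) -> List[Tuple[str, float]]:
    """Rank terms from the matched documents by TF-IDF and return the top k."""
    n = len(corpus)
    # Document frequency over the whole corpus.
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    # Term frequency summed over the matched documents only.
    tf: Counter = Counter()
    for doc in matched:
        tf.update(tokenize(doc))
    query_terms = set(tokenize(query))
    scores = {
        term: count * math.log(n / df[term])
        for term, count in tf.items()
        if term not in query_terms and df[term] > 0
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Invented corpus; in practice these strings would come from _source or the DB.
corpus = [
    "godfather pacino brando crime",
    "godfather pacino sequel",
    "comedy crime caper",
    "comedy romance",
    "space crime romance",
]
matched = [doc for doc in corpus if "godfather" in tokenize(doc)]
top = related_terms(matched, corpus, "godfather", k=3)
```

Terms that appear often in the matched documents but rarely elsewhere float to the top, while a term that is common across the whole corpus (here "crime") is pushed down.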

Can ElasticSearch create/store just the indexes while leaving the source document where it is?

Assuming I already have a set of documents living in some document store can I have ElasticSearch create its indexes and store them in its various replicated nodes while leaving the documents themselves where they are? In other words can I use ES just for search and not for storage? (I understand this might not be ideal but assume there are good reasons I need to keep the documents themselves where they are).
If I take this approach does it remove any functionality from search, for example showing where in a document the search term was found?
Thanks.
The link Konstantin referenced should show you how to disable _source.
There is another way to store fields (store=true). You are better off using _source and excluding any specific fields you don't want stored as part of _source, though.
Functionality removed:
Viewing fields that are returned from search
Highlighting
Easily rebuilding an index from _source. Probably not an issue, since data is stored elsewhere
There are probably other features I am missing.
The only case I've come across where I really don't need _source is when building an analytics engine where I am only returning aggregates (term and histogram).
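
For reference, the options mentioned above (disabling _source, excluding fields from it, or marking individual fields as stored) all live in the index mapping. The helper below builds such an index-creation body; it assumes the typeless mapping layout of Elasticsearch 7 and later, and the helper name and field names are made up.

```python
from typing import Any, Dict, Iterable, Optional


def index_body(disable_source: bool = False,
               source_excludes: Optional[Iterable[str]] = None,
               stored_fields: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Build a create-index body controlling what Elasticsearch keeps on disk."""
    mappings: Dict[str, Any] = {}
    if disable_source:
        # Nothing of the original JSON is kept; highlighting and reindexing
        # from _source are no longer possible.
        mappings["_source"] = {"enabled": False}
    elif source_excludes:
        # Keep _source but leave out the listed fields.
        mappings["_source"] = {"excludes": list(source_excludes)}
    if stored_fields:
        # Individually stored fields; "text" is an assumed type for the example.
        mappings["properties"] = {
            name: {"type": "text", "store": True} for name in stored_fields
        }
    return {"mappings": mappings}


search_only = index_body(disable_source=True)
trimmed = index_body(source_excludes=["body"], stored_fields=["title"])
```

The resulting dictionaries would be sent as the body of a create-index request (PUT on the index name).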

retaining case in elasticsearch faceted search

Is there a way to do faceted searches using the Elasticsearch Search API while maintaining case (as opposed to having the results converted to lowercase)?
Thanks in advance, Chuck
Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.
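
To make the effect of analysis concrete, here is a toy comparison of counting terms from an analyzed field versus an untouched one, followed by a mapping and request body in the style of recent Elasticsearch versions. That second part is an assumption about your version: there, a keyword sub-field declared under fields plays the role of multi_field, and a terms aggregation plays the role of the terms facet. All function and field names are invented.

```python
from collections import Counter
from typing import Any, Dict, List


def facet_counts(values: List[str], analyzed: bool) -> Counter:
    """Toy terms facet: count index terms, with or without lowercasing analysis."""
    terms: List[str] = []
    for value in values:
        # Analyzed: lowercase and split into words. Not analyzed: keep as-is.
        terms.extend(value.lower().split() if analyzed else [value])
    return Counter(terms)


def case_preserving_mapping(field: str) -> Dict[str, Any]:
    """Mapping with an analyzed field plus an unanalyzed 'raw' sub-field."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "fields": {"raw": {"type": "keyword"}}}
            }
        }
    }


def terms_agg_body(field: str, size: int = 10) -> Dict[str, Any]:
    """Terms aggregation on the unanalyzed sub-field, returning no hits."""
    return {
        "size": 0,
        "aggs": {"by_" + field: {"terms": {"field": field + ".raw", "size": size}}},
    }


cities = ["New York", "New York", "Boston"]
lowercased = facet_counts(cities, analyzed=True)
preserved = facet_counts(cities, analyzed=False)
```

Faceting on the analyzed field yields lowercased single words, while the raw sub-field keeps each value whole and in its original case.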
