Intersectional search from two indices - elasticsearch

I have two indices in Elastic: Track and Messages.
Each index has ServiceId field.
Search in the Tracks index is always performed by fields (exact value) in the index.
Search in the Messages is always performed by free text search in the index.
I need to implement an intersectional search from the both indices. To find ServiceIds from Tracks index by exact fields values and then find ServiceIds from Messages index by free text search and then to intersect the results = array of ServiceIds.
The actual result to the user is the list of Tracks by these ServiceIds received from the intersect action.
How can I do it? Each search (from Tracks and Messages) should return all the documents and only then intersect should be performed but as I understand Elastic can not return all the documents (suppose I have hundred of millions documents in each index) and intersect action is not correct...
Intersectional search from two indices

It could be achieved by using aliasing. Refer https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html for more information.
You should have some common fields in both your indices, in this case, it is serviceIds.

Related

ElasticSearch: how to search from multiple indexes

I have a situation where I need to search from multiple indexes (products and users). Below is a sample query I am using to do that search
http://localhost:9200/_all/_search?q=*wood*
http://localhost:9200/users,products/_search?q=*wood*
With the above API request, it only returns search results for the product index. But if I search using the below API it returns search results for users index
http://localhost:9200/users/_search?q=*wood*
As you can see I am passing same value for "q" parameter. I need to search for both product and users index and check if there is the word "wood" in any attribute in both indexes. How can I achieve this
You can pass multiple index names instead of _all as it will search in other indices that you don't intent to by using the comma seprated index name like
http://localhost:9200/users,products/_search?q=*wood*
Although, _all should also fetch the result from users index which you get when you specify its name, you need to debug why its happening, maybe increase the size param to 1000 as by default Elasticsearch returns only 10 results and it seems in case of _all all the top results coming from products index only.

How to retrieve and list the first element of a field use Elasticsearch query (two compare and find end deleted duplicated documents in same index)?

in my elasticsearch index all logs have a field called RES and the structure look like this :
Number:"12131", amount:8, referenceNumber:"140102129728883", expire:"1365", securityControl:0
I want to compare number in all indexed documents and delete duplicated documents.
can anybody help me?

screen out document results that share the same property value accept the first one

I have a db of documents. Every document has a property(keyword) called index (noting to do with the elastic index) and a property(keyword) named superIndex. There can be multiple documents with the same index and multiple documents with the same superIndex in the DB, these fields are not unique.
I run a compound query searching free text on the text content of these documents, with sorting, and get the results I want. However, I get many documents having the same index and/or superIndex. Currently I programmatically filter the result list and take only the first result from each index and superIndex. My requirement is that at the end I'm left with the top results from the sort, the first from each index and superIndex.
Can this be done using elastic query. If so how?
Field collapsing allows you to collapse all search results having the same value in a field (e.g. index). (See Elasticsearch Reference: Field Collapsing)

Elasticsearch extract/add id's from multiple queries

I have multiple queries that need to filter data on elasticsearch. This queries are returning document ids from indexes that match the filter.
However i need to do another operation depending from user selection, to extract/add document unique id's from previous sum of queries with current query. The maximum number of query search is 5.
Is there an option in elastic so it will extract/add document id's from previous query? Right now i am doing this part in PHP with foreach iteration that takes a lot of time.
Edit
Example :
Ok let say we have one query on same index that contains :
{"query":{"bool":{"filter":[{"wildcard":{"182_empanalyzed":"example"}}]}}}
we will need to substract the document ids from the following query on same index :
{"query":{"bool":{"must_not":[{"nested":{"path":"184","query":{"exists":{"field":"184.*"}}}}]}}}
Keep in mind that this queries are example with only one condition in it, there might be more complexes queries with many fields to be searched on in each query. And from each following query there is an option to substract/add documents ids

How fields are associtated with terms in inverted index in elasticsearch?

As per my understanding, elasticsearch uses a structure called inverted index to provide full text search. It is clear that inverted index has terms and ids of the documents which has that term but the document can have any number of fields and the field name can be used in the query time to look/search only on that field. In that case how elasticsearch restricts/limits search only to a particular field? I would like to know if inverted index contains fields name or field id along with terms and document id.
Similar thing happens when you sort based on any field. So there could be a way to associate terms with field names. Please help me understand the intricacies involved here.
Thanks in advance.
I would like to know if inverted index contains fields name or field id
along with terms and document id.
Quoting from Lucene Docs
The same string in two different fields is considered a different term. Thus terms are represented as a pair of strings, the first naming the field, and the second naming text within the field.
In that case how elasticsearch restricts/limits search only to a
particular field?
Each segment index maintains Term Vectors : For each field in each document, the term vector is stored. A term vector consists of term text and term frequency.
Hence, the indexes are maintained for each field in each document.
We have a inverted index per field per index.
And there is something called field data cache ( or doc values ) which has the inverted "inverted index". All doc to field value lookup happens here.
I was also having this question
I can share my understanding here with you.
Elasticsearch creates an inverted index for each full-text field of the document. So if an index has 10 fields that allow full-text search then Elasticsearch will create 10 different inverted index for the 10 fields and store the analyzer results in those inverted indices for each field.
Thus when you perform a search operation and specify what all fields you want to search then Elasticsearch will search on the inverted indices of those specific fields only
Thus to summarize, an inverted index is created at the field level.
I hope that helps
Thanks

Resources