Query the first element of a list in Elasticsearch

In my Elasticsearch index I have fields that are lists of strings:
"city" = ["Boston","NY","Chicago"]
I need to write a query that searches only the first element of the list.
I have accomplished this by adding a new field that contains only the first element.
"city_first"="Boston"
I would like to avoid creating a new field. Is there a way to write a query that searches only the first element of the list in Elasticsearch?
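For reference, the workaround described in the question (a separate field holding only the first element) could be derived at indexing time along these lines. This is only a sketch of that workaround, not a query-side solution: the class and method names are hypothetical, and the document is assumed to be available as a plain Map before it is sent to Elasticsearch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FirstElementField {
    // Returns a copy of the document with "city_first" set to the first
    // entry of the "city" list. Documents whose "city" is missing, empty,
    // or not a list are copied unchanged.
    static Map<String, Object> withFirstCity(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object city = doc.get("city");
        if (city instanceof List<?> && !((List<?>) city).isEmpty()) {
            out.put("city_first", ((List<?>) city).get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("city", List.of("Boston", "NY", "Chicago"));
        System.out.println(withFirstCity(doc).get("city_first"));
    }
}
```

A query against city_first then only ever sees the first element, at the cost of the extra field the question would like to avoid.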

Related

How to filter text array with items in list case insensitively in Supabase?

I have a text array (text[]) in Supabase. Let's say it's like this:
["Car", "Red", "New"]
I would like to get this row by filtering with a list of words. So when I query new, car, red, or any combination of these words, I should get the row with the array above. In other words, I want to query a text[] column with a list of items, ignoring case.
So a query with ["new", "red"] should return the row above.
Any idea how to do that?

Searching HTML content in elastic search

I am storing HTML data in Elasticsearch. When I search for a word or phrase, I want to list all matching records, with a pre tag and a post tag around the matched words so that they can be highlighted. Below is an example:
<p>This is paragraph tag<span>This is span tag</span></p>
Search word: "span tag"
What query should I send to get the list of records that match the search word?

Elastic Search - Tokenization and Multi Match query

I need to perform tokenization and a multi-match search in a single query in Elasticsearch.
Currently,
1) I am using the analyzer to get the tokens, as below:
String text = // 4 line log data;
List<AnalyzeToken> analyzeTokenList = new ArrayList<AnalyzeToken>();
AnalyzeRequestBuilder analyzeRequestBuilder = this.client.admin().indices().prepareAnalyze();
for (String newIndex : newIndexes) {
    analyzeRequestBuilder.setIndex(newIndex);
    analyzeRequestBuilder.setText(text);
    analyzeRequestBuilder.setAnalyzer(analyzer);
    // Run the analyzer against this index and collect the tokens it produces
    AnalyzeResponse analyzeResponse = analyzeRequestBuilder.get();
    analyzeTokenList.addAll(analyzeResponse.getTokens());
}
Then I iterate through the AnalyzeToken list and collect the terms:
List<String> tokens = new ArrayList<String>();
for (AnalyzeToken token : analyzeTokenList) {
    // Collapse runs of whitespace inside each term to a single space
    tokens.add(token.getTerm().replaceAll("\\s+", " "));
}
Then I use the tokens to build the multi-match query, as below:
String query = "";
for (String data : tokens) {
    // Separate the tokens with a space so they are not fused into one term
    query = query + data + " ";
}
MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(query, "abstract", "title");
Iterable<Document> result = documentRepository.search(multiMatchQueryBuilder);
Based on the result, I am checking whether similar data exists in the database.
Is it possible to combine the analyze step and the multi-match query into a single query?
Any help is appreciated!
EDIT :
Problem statement: Say I have 90 entries in one index, in which every 10 entries are near-identical (not exact duplicates, but roughly a 70% match), so I have 9 groups.
I need to process only one entry in each group, so I took the following approach (not a good one, but it is what I have ended up with for now).
Approach:
Get each of the 90 entries from the index.
Tokenize it using the analyzer (this removes the unwanted keywords).
Search the same index to check whether similar data is already there, filtering on a "processed" flag. This flag is updated after the first log of a group gets processed.
If no similar entry (70% match) is flagged as processed, I process this log and set its flag to processed.
If a similar entry is already flagged as processed, I treat this entry as already handled and continue with the next one.
So the goal is to process only one entry out of each group of 10 similar entries.
Thanks,
Harry
Multi-match queries internally use match queries, which are analyzed: they apply the analyzer defined in the field's mapping, or the standard analyzer if none is defined.
From the multi-match query documentation:
The multi_match query builds on the match query to allow multi-field
queries:
Also, accepts analyzer, boost, operator, minimum_should_match,
fuzziness, lenient, as explained in match query.
So what you are trying to do is overkill. Even if you want a different analyzer at search time (because you need different tokens), you can set a search analyzer instead of generating the tokens yourself and then feeding them into a multi-match query.
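As a rough illustration of that point, the raw log text can be sent straight to multi_match and analyzed by Elasticsearch at search time. The sketch below only builds the request body as a string; the class name, the helper names, and the choice of "70%" for minimum_should_match are assumptions made for the example. Note that minimum_should_match counts matching terms, which only approximates the "70% match" described in the question.

```java
final class MultiMatchSketch {
    // Minimal JSON string escaping (backslash, quote, common control chars).
    // A real client or JSON library would normally handle this.
    static String jsonString(String s) {
        String escaped = s.replace("\\", "\\\\")
                          .replace("\"", "\\\"")
                          .replace("\n", "\\n")
                          .replace("\r", "\\r")
                          .replace("\t", "\\t");
        return "\"" + escaped + "\"";
    }

    // Builds a search body that lets Elasticsearch analyze rawText itself.
    // "analyzer" overrides the search-time analyzer for this query only.
    static String buildQuery(String rawText, String analyzerName) {
        return """
            {
              "query": {
                "multi_match": {
                  "query": %s,
                  "fields": ["abstract", "title"],
                  "analyzer": %s,
                  "minimum_should_match": "70%%"
                }
              }
            }
            """.formatted(jsonString(rawText), jsonString(analyzerName));
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("connection \"timed out\"\non node-1", "standard"));
    }
}
```

If the fields should always be searched with a particular analyzer, setting search_analyzer in the mapping avoids passing "analyzer" on every request.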

Elasticsearch: Query on one of the fields given in the list

Elasticsearch has documents indexed with the following fields:
{"id":"1", "title":"test", "locale_1_title":"locale_test"}
Given a query, the following behaviour is needed for each document:
1) If the locale_1_title field is not empty (""), search only the locale_1_title field. Do not search the title field.
2) If the locale_1_title field is empty, search the title field.
What is a simple Elasticsearch query that gives this behaviour?

Sum of total tokens in array

I have a document as below -
{
"array" : [ "Aone" , "Btwo" , "Aone" ]
}
I need to compute the sum of the number of elements in array using an aggregation.
value_count is giving me the unique tokens, but that is not what I am looking for.
First, make array a multi-field with a new sub-field called numOfTokens, and declare that sub-field as type token_count.
You can find more about it here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
This creates an additional field called array.numOfTokens per document, which holds the number of tokens in that field.
Next, run a simple sum aggregation on that field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#search-aggregations-metrics-sum-aggregation
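The two steps above might look roughly like the request bodies below, wrapped in Java string constants so they can be sent with any HTTP client. This is a sketch under assumptions: it uses the typeless mapping syntax of recent Elasticsearch versions, a text parent field, and the standard analyzer for counting; the class name and the aggregation name total_tokens are made up for the example.

```java
final class TokenCountSketch {
    // Index mapping: "array" gets a token_count sub-field "numOfTokens".
    // token_count needs an analyzer to decide what counts as a token.
    static final String MAPPING = """
        {
          "mappings": {
            "properties": {
              "array": {
                "type": "text",
                "fields": {
                  "numOfTokens": {
                    "type": "token_count",
                    "analyzer": "standard"
                  }
                }
              }
            }
          }
        }
        """;

    // Search body: sum the per-document token counts; "size": 0 skips hits.
    static final String SUM_AGG = """
        {
          "size": 0,
          "aggs": {
            "total_tokens": {
              "sum": { "field": "array.numOfTokens" }
            }
          }
        }
        """;

    public static void main(String[] args) {
        System.out.println(MAPPING);
        System.out.println(SUM_AGG);
    }
}
```

Existing documents would need to be reindexed after the mapping change before array.numOfTokens is populated.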
