Querying Elasticsearch using an array of values

I index items in Elasticsearch, where each item has these properties:
tags - array of strings, e.g. [ 'c++', 'java', 'python' ]
submitter_id - uuid
id - uuid
I also have users, each with these properties:
tags - array of strings
following_ids - array of uuids
What I want to do is query Elasticsearch for items whose tags match the user's tags or whose submitter_id is one of the user's following_ids. I also boost some fields. Right now I build the query like this (note the comma after the last tags clause, which was missing):
"should"=>[
  {"match"=>{"tags"=>{"query"=>"yoga", "boost"=>3}}},
  {"match"=>{"tags"=>{"query"=>"yogic technique", "boost"=>3}}},
  {"match"=>{"tags"=>{"query"=>"lag jaa gale", "boost"=>3}}},
  {"match"=>{"tags"=>{"query"=>"jonita gandhiband", "boost"=>3}}},
  {"match"=>{"submitter_id"=>"fc8b720f-a306-4849-8bc1-38fafae7c92b"}},
  {"match"=>{"submitter_id"=>"c35ec42f-2df0-4870-89a4-9e59c9df04ea"}}
]
But if the user has many tags or following_ids, I will soon hit the maximum clause count limit. How should I handle this?

You are looking for exact ids and tags, so you should be using the Terms Query anyway. The added advantage in this case is that it accepts multiple terms. You therefore need only one clause for all your tags and one clause for your user ids.
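Here is a minimal sketch of the two-clause query. It is written in Java rather than the question's Ruby hashes, and it assembles the body as a plain JSON string so that it runs without an Elasticsearch client.

It rests on a few assumptions:
- tags and submitter_id are mapped as exact-value fields (keyword, or not_analyzed in older versions). On an analyzed field, a terms query has to match individual tokens, so a multi-word tag such as "yogic technique" would not match.
- The boost on a terms query applies to the whole clause. The boost of 3 is therefore kept for the tag clause as a whole, but individual tags cannot be weighted differently.

The class and method names (FeedQuery, build) are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the search body as a JSON string so the sketch
// needs no Elasticsearch client. Assumes "tags" and "submitter_id" hold exact
// (keyword / not_analyzed) values.
final class FeedQuery {

    // Wraps a value in quotes, escaping backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders a list of strings as a JSON array, e.g. ["a","b"].
    static String jsonArray(List<String> values) {
        return values.stream()
                .map(FeedQuery::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Two should clauses in total: one terms clause for all tags (boosted)
    // and one terms clause for all followed submitter ids.
    static String build(List<String> userTags, List<String> followingIds) {
        return "{\"query\":{\"bool\":{\"should\":["
                + "{\"terms\":{\"tags\":" + jsonArray(userTags) + ",\"boost\":3}},"
                + "{\"terms\":{\"submitter_id\":" + jsonArray(followingIds) + "}}"
                + "]}}}";
    }
}
```

The body stays at two clauses however many tags or ids the user has. The number of values in a single terms query is still capped by the index.max_terms_count index setting (65,536 by default in recent versions). The quote helper only escapes quotes and backslashes; real code would use a JSON library or the client's query builders.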

Related

Can't get Power Automate Filter Array to render distinct result

I've got two arrays. One is a list of Office 365 contacts; the other contains a list of customers from an API. I've used a Select action on my Contacts array so that it has the same format as my Customers array. When I create the following Filter array action, I never get the distinct items I'm looking for.
Here is the code shown by Peek code:
{
  "inputs": {
    "from": "@variables('Customers Arrary')",
    "where": "@not(contains(variables('Contacts Array'), item()))"
  },
  "metadata": {
    "operationMetadataId": "9ab3697e-6f5c-41b4-b94e-41641ce3dacf"
  }
}
The result I'm looking for is all the items in the Customers array that don't match anything in the Contacts array. The unique identifiers are email address and mobile number.
Any help is much appreciated! Thanks in advance!
I finally got this to work. For it to work, the data sets have to match. For example, if you are comparing two data sets by email address, the email address must be the only data they contain, and you'd better not have any null values. Here is an example filter I created. The JSON from the two arrays contains only email address values.
So make sure to remove any data that does not appear in both data sets. I would also remove any items that contain null values.
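To make the logic of that filter concrete, here is a small Java sketch of the same "keep the customers that the contacts do not contain" comparison. It is not Power Automate syntax. The comparison runs after both arrays have been reduced to the same shape.

As an assumption for illustration, each array is reduced to its email addresses only, and null values are dropped before comparing, as recommended above. CustomerDiff and missingFromContacts are hypothetical names.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of "Filter array: not(contains(contacts, item()))"
// once both sides have been projected to the same shape (email strings only).
final class CustomerDiff {

    // Returns customer emails that do not appear among the contact emails.
    // Null entries are skipped on both sides, mirroring the advice to remove them.
    static List<String> missingFromContacts(List<String> customerEmails, List<String> contactEmails) {
        Set<String> contacts = contactEmails.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toSet());
        return customerEmails.stream()
                .filter(Objects::nonNull)
                .filter(email -> !contacts.contains(email))
                .collect(Collectors.toList());
    }
}
```

Comparing on a single projected value is what makes the containment check meaningful. If one side still carried extra properties, whole items would never be equal. The filter would then return every customer instead of only the unmatched ones.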

How to have one common filter for different fields in Kibana?

I have one index with two index patterns, each built on an alias.
Example:
Index Name: my_index
Fields: sender_name, receiver_name, item_name
Alias: my_index_alias_1, my_index_alias_2
Index Patterns: my_index_alias_1, my_index_alias_2
I have a dashboard with two data tables using my_index_alias_1 and my_index_alias_2.
The same person can be both a sender and a receiver, but there should be only one filter to select the user.
Example:
If a user named Bob is filtered.
my_index_alias_1 Data Table should filter by receiver_name
my_index_alias_2 Data Table should filter by sender_name
I don't want to have a duplicate index, so I think a scripted field is the better option.
But a scripted field can solve this only if I can access the alias name through doc values. Then I could write a condition like the pseudocode below:
if doc['_alias'].value=='my_index_alias_1' then doc['receiver_name'].value
if doc['_alias'].value=='my_index_alias_2' then doc['sender_name'].value

Hashing methodology for collection of strings and integer ranges

I have data like the following:
I need to match the content against the input provided for the Content and Range fields and return the matching rows. The Content field is a collection of strings, and the Range field is a range between two numbers. I am considering hashing the data so that it can be matched against the hashed input.

For the Content field, I was thinking of iterating through the collection, computing each string's hash code, and storing the results. For the Range field, I was considering interval trees.

The challenge comes at query time. When I hash the Content input and the Range input, how do I find out whether that hash code is present among the hash codes generated for the Content field's strings? The same question applies to the Range field.
Please let me know if there are any alternative ways to achieve this. Thanks.
There is a simple solution to your problem: Inverted Index.
For each item in Content, create an inverted index that maps 'Content' to 'RowID'. That is, create another table with two columns: Content (string) and RowIDs (comma-separated strings).
For your first row, add the entries {Azd, 1}, {Zax, 1}, {Gfd, 1}, ..., {Mni, 1} to that table. For the second row, add entries for any new Content strings. For a Content string already present in the first row ('Gfd', for example), just append the new row id to the entry you created for the first row. So Gfd's entry will look like {Gfd, 1,2}.
When you are done processing, you will have a table that maps each 'Content' string to all the rows in which it is present.
Do the same inverted indexing to map 'Range' to 'RowID', creating another table of Range (int) and RowIDs (comma-separated strings).
You will then have a table whose rows tell you which range values are present in which row ids.
Finally, for each query you have to process, look up the corresponding Content and Range rows in the inverted index tables and intersect their comma-separated lists of row ids. The result is your answer.
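Here is a minimal Java sketch of the two inverted indexes and the final intersection. It departs from the answer in two ways:
- It keeps row ids in sorted sets instead of comma-separated strings.
- It indexes every integer inside a row's range, which follows the "Range (int) to RowIDs" table above. This is only practical for narrow ranges. For wide ranges, the interval tree mentioned in the question would replace the range map.

The class name RowIndex and the sample rows used below are hypothetical, since the question's original data table is not available.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the two inverted indexes described in the answer:
// content string -> row ids, and each integer range value -> row ids.
final class RowIndex {
    private final Map<String, Set<Integer>> byContent = new HashMap<>();
    private final Map<Integer, Set<Integer>> byRangeValue = new HashMap<>();

    // Adds one row: each content string and each integer in [low, high]
    // gets rowId appended to its posting set.
    void addRow(int rowId, List<String> content, int low, int high) {
        for (String c : content) {
            byContent.computeIfAbsent(c, k -> new TreeSet<>()).add(rowId);
        }
        for (int v = low; v <= high; v++) {
            byRangeValue.computeIfAbsent(v, k -> new TreeSet<>()).add(rowId);
        }
    }

    // Rows that contain `content` and whose range covers `value`:
    // the intersection of the two posting sets.
    Set<Integer> query(String content, int value) {
        Set<Integer> result = new TreeSet<>(byContent.getOrDefault(content, Set.of()));
        result.retainAll(byRangeValue.getOrDefault(value, Set.of()));
        return result;
    }
}
```

Take two hypothetical rows: row 1 holds Azd, Zax, Gfd with range 10 to 20, and row 2 holds Gfd, Qwe with range 15 to 30. Then query("Gfd", 18) returns [1, 2], query("Gfd", 25) returns [2], and query("Azd", 25) returns an empty set.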

Search Multiple Indexes with condition

Here is the requirement I am working on:
There are multiple indexes named content_ssc, content_teal and content_mmy.
These indexes can have common data (co_code is one of the fields in the documents of these indexes):
a. content_ssc can have documents with co_code = teal/ssc/mmy
b. content_mmy can have documents with co_code = ssc/mmy
I need to get the data using the condition below (this is one approach to getting unique data from these indexes):
a. (Index = content_ssc and site_code = ssc) OR (Index = content_mmy and site_code = mmy)
Basically, I am currently getting duplicate data from these indexes, so I need a solution that fetches unique data from them using the above condition.
I have tried using a boolean query across multiple indices from this link, but it didn't produce unique results.
Please suggest.
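Elasticsearch has no query type called "distinct". However, you can deduplicate the hits with field collapsing (the collapse parameter of the search request) on a keyword field that identifies duplicate documents. You will then get one result per unique value.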
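Alternatively, the condition from the question can be expressed directly as one bool query across the indexes. It uses the _index metadata field, which supports term queries. The sketch below builds that request body as a JSON string in Java, under these assumptions:
- The compared field is co_code. The condition in the question calls it site_code.
- Its values are stored as exact values.
- The search is sent to the indexes involved, for example content_ssc,content_mmy/_search.

IndexScopedQuery and build are hypothetical names. The index and code values are not JSON-escaped, so they must be plain identifiers like those in the question.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds
//   (index = content_ssc AND code = ssc) OR (index = content_mmy AND code = mmy)
// as a bool query: one should clause per index, each filtering on _index and the code field.
final class IndexScopedQuery {

    // codeByIndex maps an index name to the code value that index "owns".
    // Values are inserted verbatim (no JSON escaping), so they must be simple identifiers.
    static String build(String codeField, Map<String, String> codeByIndex) {
        String clauses = codeByIndex.entrySet().stream()
                .map(e -> "{\"bool\":{\"filter\":["
                        + "{\"term\":{\"_index\":\"" + e.getKey() + "\"}},"
                        + "{\"term\":{\"" + codeField + "\":\"" + e.getValue() + "\"}}"
                        + "]}}")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"should\":[" + clauses + "],\"minimum_should_match\":1}}}";
    }
}
```

With this query, a document copied into several indexes qualifies only through the index paired with its code, so it is returned once. Codes that appear in the data but have no entry in the map (teal, for instance) are not returned at all. Each such index needs its own entry.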

Searching without duplication - aggregations and tophit

I am just getting started with Elasticsearch and really like it; however, I am stuck on a fairly simple scenario.
I am indexing Worker documents with this structure:
NAME SURENAME ID AGE SEX NAME_SURENAME BIRTH_DATE
NAME_SURENAME - not analyzed - this field is indexed for grouping purposes
NAME, SURENAME - analyzed
The task is simple: find 5 unique workers sorted by birth_date. Two workers count as duplicates when they have the same name and surename, even if they differ in age and are different people.
I read about aggregation queries, and as I understand it, they return only aggregations, without documents. Unfortunately, I aggregate by name and surename, so the buckets won't contain any other fields, not even the document ID. But I also read that the TopHits aggregation returns documents, so I tried it; that is the second idea below.
I have two ideas:
1) Don't use aggregations. Just search for 5 workers, filter out duplicates in Java, and repeat the search and filtering until I reach 5 unique results.
2) Use aggregations. I tried it as shown below, and it works on test data. But since this is my first time, please advise whether it works by accident or is done correctly. My idea was to get 5 buckets, each with one TopHit document. I have no idea how the TopHit document is chosen, but it seems to work. Below is the code:
String searchString = "test";
// Match workers whose name or surename matches the search string
BoolQueryBuilder query = boolQuery()
        .minimumNumberShouldMatch(1)
        .should(matchQuery("name", searchString))
        .should(matchQuery("surename", searchString));
// One bucket per unique name_surename, buckets ordered by the newest birth_date they contain
TermsBuilder terms = AggregationBuilders.terms("namesAgg").size(5);
terms.field("name_surename");
terms.order(Terms.Order.aggregation("birthAgg", false))
        .subAggregation(AggregationBuilders.max("birthAgg").field("birth_date"))
        // topHit is a sibling of birthAgg: metric aggregations cannot have sub-aggregations
        .subAggregation(AggregationBuilders.topHits("topHit").setSize(1).addSort("birth_date", SortOrder.DESC));
SearchRequestBuilder searchRequestBuilder = client.prepareSearch("workers")
        .addAggregation(terms)
        .setQuery(query)
        .setSize(1)
        .addSort(SortBuilders.fieldSort("birth_date").order(SortOrder.DESC));
Terms aggregations = searchRequestBuilder.execute().actionGet().getAggregations().get("namesAgg");
List<Worker> results = new ArrayList<>();
Transformer<SearchHit, Worker> transformer = transformers.get(Worker.class);
for (Terms.Bucket bucket : aggregations.getBuckets()) {
    // The single document with the newest birth_date in this name_surename bucket
    TopHits topHits = bucket.getAggregations().get("topHit");
    SearchHit searchHit = topHits.getHits().getHits()[0];
    results.add(transformer.transform(searchHit));
}
return results;
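To check what the aggregation should return, here is an in-memory Java sketch of the same result. It is also the deduplication step of idea 1, done in a single pass: sort by birth_date descending, then keep the first worker seen for each name_surename.

Ignoring ties on birth_date, this matches the aggregation in two ways:
- The worker kept for each group is the one topHits (size 1, birth_date descending) selects.
- The order in which groups are kept matches ordering the terms buckets by max(birth_date) descending.

The Worker class here is a hypothetical stand-in with only the fields the grouping needs. birthDate is an assumed numeric encoding, for example epoch days.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory equivalent of: terms(name_surename, size N, order by max(birth_date) desc)
// + top_hits(size 1, sort birth_date desc).
final class UniqueWorkers {

    // Minimal stand-in for the real Worker type.
    static final class Worker {
        final String nameSurename;
        final String id;
        final long birthDate; // assumed numeric, e.g. epoch days; larger = born later

        Worker(String nameSurename, String id, long birthDate) {
            this.nameSurename = nameSurename;
            this.id = id;
            this.birthDate = birthDate;
        }
    }

    // Sorts newest first, then keeps the first worker of each name_surename
    // until `limit` unique workers have been collected.
    static List<Worker> newestUnique(List<Worker> workers, int limit) {
        List<Worker> sorted = new ArrayList<>(workers);
        sorted.sort(Comparator.comparingLong((Worker w) -> w.birthDate).reversed());
        Set<String> seen = new HashSet<>();
        List<Worker> result = new ArrayList<>();
        for (Worker w : sorted) {
            if (result.size() == limit) {
                break;
            }
            if (seen.add(w.nameSurename)) {
                result.add(w);
            }
        }
        return result;
    }
}
```

If the aggregation's buckets, read in order, yield the same workers as this function on the same data, the aggregation approach is doing what was intended rather than working by accident.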
