How to filter a query using another query in Elasticsearch

Given the example user and product docs below:
{
  "_id": "1",
  "_type": "user",
  "_source": {
    "id": "1",
    "following": ["2", "3", ... , "10000"]
  }
}
{
  "_id": "1",
  "_type": "product",
  "_source": {
    "id": "1",
    "owner_id": "2"
  }
}
{
  "_id": "2",
  "_type": "product",
  "_source": {
    "id": "2",
    "owner_id": "10001"
  }
}
I want to get the products that belong to the users followed by the user with id=1. I don't want to make 2 separate queries (the first to get the users followed by user id=1, the second to get the products), since user id=1 is following ~10000 users.
Is there any way of getting the result using only one query?

I think you're looking for the terms query with a terms lookup, which takes the values to filter on from a field of another document:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-terms-query.html
Check out the documentation; its example is very similar to your case:
GET /product/_search
{
"query" : {
"terms" : {
"owner_id": {
"index": "<your_index>",
"type": "user",
"id": "1",
"path": "following"
}
}
}
}
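If the request body is assembled in application code, the lookup can be wrapped in a small helper. This is only a minimal Python sketch that builds the JSON body; the function name terms_lookup_query and the index name my_index are made up for illustration, and sending the request is left to whatever client you use. The "type" key follows the 5.5 documentation linked above.

```python
import json


def terms_lookup_query(field, index, doc_type, doc_id, path):
    """Build a terms-lookup body: match `field` against the values stored
    under `path` in the document identified by index/type/id."""
    return {
        "query": {
            "terms": {
                field: {
                    "index": index,
                    "type": doc_type,
                    "id": doc_id,
                    "path": path,
                }
            }
        }
    }


# "my_index" is a placeholder for the index that holds the user documents.
body = terms_lookup_query("owner_id", "my_index", "user", "1", "following")
print(json.dumps(body, indent=2))
```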

Related

Difference between match and wildcard query

What is the difference between the match and wildcard queries? If the requirement is to search for a combination of words in a paragraph or log, which approach is better?
A match query analyzes the search text and returns the documents whose field contains the resulting terms as whole tokens (case-insensitively with the default analyzer), whereas a wildcard query is not analyzed and returns the documents that have a term matching the pattern, so it can also match part of a word.
Adding a working example
Index Data:
{
"name":"breadsticks with soup"
}
{
"name":"multi grain bread"
}
Search Query using Match query:
{
"query": {
"match": {
"name": "bread"
}
}
}
Search Result will be
"hits": [
{
"_index": "67706115",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"name": "multi grain bread"
}
}
]
Search Query using wildcard query:
{
"query": {
"wildcard": {
"name": "*bread*"
}
}
}
Search Result will be
"hits": [
{
"_index": "67706115",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "multi grain bread"
}
},
{
"_index": "67706115",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "breadsticks with soup"
}
}
]
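The difference between the two results can be reproduced without a cluster. The following Python sketch is only an approximation: analyze is a rough stand-in for the default analyzer (lowercase, split on whitespace), and the match and wildcard functions are hypothetical helpers, not Elasticsearch APIs.

```python
from fnmatch import fnmatchcase


def analyze(text):
    # Rough stand-in for the default analyzer: lowercase, split on whitespace.
    return text.lower().split()


def match(docs, text):
    # A document matches if it shares at least one whole token with the query.
    wanted = set(analyze(text))
    return [d for d in docs if wanted & set(analyze(d))]


def wildcard(docs, pattern):
    # The pattern is compared against each indexed token; it is not analyzed.
    return [d for d in docs
            if any(fnmatchcase(token, pattern) for token in analyze(d))]


docs = ["breadsticks with soup", "multi grain bread"]
print(match(docs, "bread"))       # ['multi grain bread']
print(wildcard(docs, "*bread*"))  # ['breadsticks with soup', 'multi grain bread']
```

"breadsticks" is indexed as a single token, so the whole-token comparison in match misses it while the pattern *bread* still finds it.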

Elastic search - query-string - return result based on custom order

When the search contains more than one keyword, the query below returns results in the order listed after it.
{
"query": {
"query_string" : {
"query" : "(Sony Music) OR (Sony Music*) OR (*Sony Music) OR (*Sony Music*)",
"fields" : ["MDMGlobalData.Name1"]
}
}
}
Exact matches first.
Then those that start with the search term.
Then those that end with the search term.
Then the remainder.
But if the query is just one word, say sony, the order is messed up.
Can someone please tell me why this is happening, and what the best approach is to get the ordering above using a query_string search?
When you only query sony, it should have the lowest score. Is that not what you expect? By default, the query string does seem to take into consideration the order of the OR clauses so I'd say yours is already pretty optimized.
Have you tried tinkering with the default_operator option?
Also, what do you mean by sony "being in the query data"? The query string itself or a document whose field MDMGlobalData.Name1 is sony?
Regarding your statement "But if its just one word, say sony in query data. The order is messed up." and the comment you left on the answer above, here is a working example with sample docs and a search query.
Index Sample Data:
{
"MDMGlobalData":{
"name":"Sony Music"
}
}
{
"MDMGlobalData":{
"name":"Sony Music Corp"
}
}
{
"MDMGlobalData":{
"name":"All Sony Music Corp"
}
}
{
"MDMGlobalData":{
"name":"Sony"
}
}
Search Query:
{
"query": {
"query_string": {
"query": "(Sony) OR (Sony*) OR (*Sony) OR (*Sony*)",
"fields": [
"MDMGlobalData.name"
]
}
}
}
Search Result:
"hits": [
{
"_index": "foo1",
"_type": "_doc",
"_id": "4",
"_score": 3.1396344,
"_source": {
"MDMGlobalData": {
"name": "Sony"
}
}
},
{
"_index": "foo1",
"_type": "_doc",
"_id": "1",
"_score": 3.114749,
"_source": {
"MDMGlobalData": {
"name": "Sony Music"
}
}
},
{
"_index": "foo1",
"_type": "_doc",
"_id": "2",
"_score": 3.097392,
"_source": {
"MDMGlobalData": {
"name": "Sony Music Corp"
}
}
},
{
"_index": "foo1",
"_type": "_doc",
"_id": "3",
"_score": 3.084596,
"_source": {
"MDMGlobalData": {
"name": "All Sony Music Corp"
}
}
}
]
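As you can see, the order is still maintained: Sony has the highest score, as it should for this query, and the longer names follow. Note that all four documents contain the token sony, so each of them matches every OR clause; the small differences in score most likely come from the plain term clause, whose relevance score drops as the field gets longer, rather than from the order in which the OR clauses are written.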
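If the search term comes from user input, the four OR clauses can be generated instead of typed by hand. A minimal Python sketch, assuming the term contains no characters that are special in query_string syntax (no escaping is done here); ordered_query_string is a hypothetical helper name.

```python
def ordered_query_string(term, field):
    """Build the exact / prefix / suffix / contains clauses for one term."""
    clauses = [f"({term})", f"({term}*)", f"(*{term})", f"(*{term}*)"]
    return {
        "query": {
            "query_string": {
                "query": " OR ".join(clauses),
                "fields": [field],
            }
        }
    }


body = ordered_query_string("Sony", "MDMGlobalData.name")
print(body["query"]["query_string"]["query"])
# (Sony) OR (Sony*) OR (*Sony) OR (*Sony*)
```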

How to remove a field from json field in Elastic Search

I would like to remove member2 from members. I saw the script
ctx._source.list_data.removeIf{list_item -> list_item.list_id == remove_id}
for a list, but in my case it's not working. Is that possible?
{
"_index": "test",
"_type": "test",
"_id": "5",
"_score": 1.0,
"_source": {
"id": "1",
"description": "desc",
"name": "ss",
"members": {
"member1": {
"id": "2",
"role": "owner"
},
"member2": {
"role": "owner",
"id": "3"
}
}
}
}
You can use the update API:
POST test/_update/5
{
"script": "ctx._source.members.remove('member2')"
}
removeIf is for lists. Your members field is an object, so you need to use remove with the key instead. To remove member2 only when its id is 3:
POST test/_update/5
{
"script": "if (ctx._source.members.member2.id == '3') { ctx._source.members.remove('member2') }"
}
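When the key to remove is not fixed, it can be passed as a script parameter instead of being concatenated into the script text. The Python sketch below builds such an update body and then shows the same operation on a plain dict; remove_member_body is a hypothetical helper, and the parameter name key is an arbitrary choice.

```python
def remove_member_body(member_key):
    """Build an update body that removes one key from the members object."""
    return {
        "script": {
            "source": "ctx._source.members.remove(params.key)",
            "params": {"key": member_key},
        }
    }


body = remove_member_body("member2")

# Local illustration of what the script does to the document source:
# removing a key from a map, as opposed to filtering a list.
source = {
    "members": {
        "member1": {"id": "2", "role": "owner"},
        "member2": {"role": "owner", "id": "3"},
    }
}
source["members"].pop("member2", None)
print(source)  # {'members': {'member1': {'id': '2', 'role': 'owner'}}}
```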

find by query and push to array in Elastic search

I store data in Elasticsearch like this:
{
"_index": "my_index",
"_type": "doc",
"_id": "6lDquGEBFRQVe0x93eHk",
"_version": 1,
"_score": 1,
"_source": {
"ID_Number": "6947503728601",
"Userrname":"Jack.m07",
"name": "Jack",
"photos": ["img/one.png"]
}
}
I want to find the user by ID_Number and push a new value to photos, e.g.
"photos": ["img/one.png","img/two.png"]
How can I implement this? What is the query?
I found the answer. This is the query:
POST my_index/_update_by_query?conflicts=proceed
{
  "script": {
    "source": "ctx._source.photos.add(params.new_photos)",
    "params": {
      "new_photos": "img/two.png"
    }
  },
  "query": {
    "term": {
      "ID_Number": "6947503728601"
    }
  }
}
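To make the effect of the script concrete, here is a small in-memory Python sketch of what the update does to the matching documents. It does not talk to Elasticsearch; push_photo is a hypothetical helper. Unlike the script, it also creates the photos list when a document does not have one yet.

```python
def push_photo(docs, id_number, new_photo):
    """Append new_photo to every document whose ID_Number matches.

    Returns the number of documents that were updated.
    """
    updated = 0
    for doc in docs:
        if doc.get("ID_Number") == id_number:
            # setdefault creates the list if the field is missing.
            doc.setdefault("photos", []).append(new_photo)
            updated += 1
    return updated


docs = [
    {"ID_Number": "6947503728601", "name": "Jack", "photos": ["img/one.png"]},
]
count = push_photo(docs, "6947503728601", "img/two.png")
print(count, docs[0]["photos"])  # 1 ['img/one.png', 'img/two.png']
```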

Is it possible to perform user count / cardinality with logical relationship in ElasticSearch?

I have documents of Users with the following format:
{
userId: "<userId>",
userAttributes: [
"<Attribute1>",
"<Attribute2>",
...
"<AttributeN>"
]
}
I want to be able to get the number of unique users that satisfy a logical expression, for example: how many users have attribute1 AND attribute2 OR attribute3?
I've read about the cardinality aggregation, but it seems to work on a single value and lacks the logic of "AND" and "OR".
Note that I have around 1,000,000,000 documents and I need the results as fast as possible, which is why I was looking at the cardinality estimation.
What about this attempt, considering the userAttributes as a simple array of strings (analyzed in my case, but single lowercase terms):
POST /users/user/_bulk
{"index":{"_id":1}}
{"userId":123,"userAttributes":["xxx","yyy","zzz"]}
{"index":{"_id":2}}
{"userId":234,"userAttributes":["xxx","yyy","aaa"]}
{"index":{"_id":3}}
{"userId":345,"userAttributes":["xxx","yyy","bbb"]}
{"index":{"_id":4}}
{"userId":456,"userAttributes":["xxx","ccc","zzz"]}
{"index":{"_id":5}}
{"userId":567,"userAttributes":["xxx","ddd","ooo"]}
GET /users/user/_search
{
"query": {
"query_string": {
"query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
}
},
"aggs": {
"unique_ids": {
"cardinality": {
"field": "userId"
}
}
}
}
which gives the following:
"hits": [
{
"_index": "users",
"_type": "user",
"_id": "2",
"_score": 0.16471066,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"aaa"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "3",
"_score": 0.04318809,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"bbb"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "5",
"_score": 0.021594046,
"_source": {
"userAttributes": [
"xxx",
"ddd",
"ooo"
]
}
}
]
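The hits above can be checked by evaluating the same boolean expression over the five sample documents in plain Python. This sketch assumes the query string is read as ((xxx AND yyy) AND NOT zzz) OR ooo, which is the reading that agrees with the hits shown; unlike the cardinality aggregation, which returns an approximate count, the set here is exact.

```python
docs = [
    {"userId": 123, "userAttributes": ["xxx", "yyy", "zzz"]},
    {"userId": 234, "userAttributes": ["xxx", "yyy", "aaa"]},
    {"userId": 345, "userAttributes": ["xxx", "yyy", "bbb"]},
    {"userId": 456, "userAttributes": ["xxx", "ccc", "zzz"]},
    {"userId": 567, "userAttributes": ["xxx", "ddd", "ooo"]},
]


def matches(attributes):
    # ((xxx AND yyy) AND NOT zzz) OR ooo
    attrs = set(attributes)
    return ("xxx" in attrs and "yyy" in attrs and "zzz" not in attrs) or "ooo" in attrs


# Unique user ids among the matching documents.
matching_ids = {d["userId"] for d in docs if matches(d["userAttributes"])}
print(sorted(matching_ids))  # [234, 345, 567]
print(len(matching_ids))     # 3
```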
