Is it possible to perform user count / cardinality with logical relationship in ElasticSearch?

I have documents of Users with the following format:
{
userId: "<userId>",
userAttributes: [
"<Attribute1>",
"<Attribute2>",
...
"<AttributeN>"
]
}
I want to get the number of unique users that satisfy a logical statement, for example: how many users have attribute1 AND attribute2 OR attribute3?
I've read about the cardinality aggregation, but it seems to work for a single value and lacks the logic abilities of "AND" and "OR".
Note that I have around 1,000,000,000 documents and I need the results as fast as possible, which is why I was looking at cardinality estimation.

What about this attempt, considering the userAttributes as a simple array of strings (analyzed in my case, but single lowercase terms):
POST /users/user/_bulk
{"index":{"_id":1}}
{"userId":123,"userAttributes":["xxx","yyy","zzz"]}
{"index":{"_id":2}}
{"userId":234,"userAttributes":["xxx","yyy","aaa"]}
{"index":{"_id":3}}
{"userId":345,"userAttributes":["xxx","yyy","bbb"]}
{"index":{"_id":4}}
{"userId":456,"userAttributes":["xxx","ccc","zzz"]}
{"index":{"_id":5}}
{"userId":567,"userAttributes":["xxx","ddd","ooo"]}
GET /users/user/_search
{
"query": {
"query_string": {
"query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
}
},
"aggs": {
"unique_ids": {
"cardinality": {
"field": "userId"
}
}
}
}
which gives the following:
"hits": [
{
"_index": "users",
"_type": "user",
"_id": "2",
"_score": 0.16471066,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"aaa"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "3",
"_score": 0.04318809,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"bbb"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "5",
"_score": 0.021594046,
"_source": {
"userAttributes": [
"xxx",
"ddd",
"ooo"
]
}
}
]
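The same logic can also be written as an explicit bool query instead of a query_string. The following is a minimal Python sketch, not a definitive implementation: it only builds the request body as a dict and checks it against the five sample documents with a tiny local evaluator. The helper names (term, build_count_request, matches, matching_user_ids) are hypothetical, and the term clauses assume the attributes are indexed as single lowercase terms, as stated above.

```python
def term(value, field="userAttributes"):
    """Exact-term clause on one attribute value."""
    return {"term": {field: value}}


def build_count_request():
    """Request body for ((xxx AND yyy) NOT zzz) OR ooo, counted by distinct userId."""
    xxx_and_yyy_not_zzz = {
        "bool": {
            "filter": [term("xxx"), term("yyy")],  # AND
            "must_not": [term("zzz")],             # NOT
        }
    }
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "should": [xxx_and_yyy_not_zzz, term("ooo")],  # OR
                "minimum_should_match": 1,
            }
        },
        "aggs": {"unique_ids": {"cardinality": {"field": "userId"}}},
    }


# The five documents from the bulk request above.
SAMPLE_DOCS = [
    {"userId": 123, "userAttributes": ["xxx", "yyy", "zzz"]},
    {"userId": 234, "userAttributes": ["xxx", "yyy", "aaa"]},
    {"userId": 345, "userAttributes": ["xxx", "yyy", "bbb"]},
    {"userId": 456, "userAttributes": ["xxx", "ccc", "zzz"]},
    {"userId": 567, "userAttributes": ["xxx", "ddd", "ooo"]},
]


def matches(clause, attributes):
    """Local evaluator for the term/bool subset used above (not Elasticsearch itself)."""
    if "term" in clause:
        (value,) = clause["term"].values()
        return value in attributes
    body = clause["bool"]
    if not all(matches(c, attributes) for c in body.get("filter", [])):
        return False
    if any(matches(c, attributes) for c in body.get("must_not", [])):
        return False
    should = body.get("should", [])
    needed = body.get("minimum_should_match", 0)
    return sum(matches(c, attributes) for c in should) >= needed


def matching_user_ids(request, docs):
    """Distinct userIds of the sample docs that satisfy the request's query."""
    query = request["query"]
    return sorted({d["userId"] for d in docs if matches(query, d["userAttributes"])})


if __name__ == "__main__":
    print(matching_user_ids(build_count_request(), SAMPLE_DOCS))  # [234, 345, 567]
```

The local check selects the same three documents (ids 2, 3 and 5) as the hits above. Keep in mind that the cardinality aggregation returns an approximate count; its precision_threshold option trades memory for accuracy.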

Related

Difference between match vs wild card query

What is the difference between the Match and Wild card query? If the requirement is to search a combination of words in a paragraph or log which approach is better?
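A match query analyzes the search text and returns documents that contain the resulting token as a whole term (so with the default analyzer the match ignores case), whereas a wildcard query is a term-level query whose pattern is compared against the individual indexed terms, so it also returns documents where the search term is only part of a word.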
Adding a working example
Index Data:
{
"name":"breadsticks with soup"
}
{
"name":"multi grain bread"
}
Search Query using Match query:
{
"query": {
"match": {
"name": "bread"
}
}
}
Search Result will be
"hits": [
{
"_index": "67706115",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"name": "multi grain bread"
}
}
]
Search Query using wildcard query:
{
"query": {
"wildcard": {
"name": "*bread*"
}
}
}
Search Result will be
"hits": [
{
"_index": "67706115",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "multi grain bread"
}
},
{
"_index": "67706115",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "breadsticks with soup"
}
}
]
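To see why the two queries return different documents, here is a small Python sketch that imitates the behaviour locally. It is a simplified model, not how Elasticsearch is implemented: analyze is a rough stand-in for the standard analyzer (lowercase, split on whitespace), and the function names are hypothetical. Python's fnmatchcase is used for the pattern because it supports the same * and ? wildcards.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_query(docs, text):
    """Keep docs that share at least one whole token with the analyzed search text."""
    wanted = set(analyze(text))
    return [d for d in docs if wanted & set(analyze(d))]


def wildcard_query(docs, pattern):
    """Keep docs where any single indexed token matches the wildcard pattern."""
    return [d for d in docs if any(fnmatchcase(t, pattern) for t in analyze(d))]


DOCS = ["breadsticks with soup", "multi grain bread"]

if __name__ == "__main__":
    print(match_query(DOCS, "bread"))       # ['multi grain bread']
    print(wildcard_query(DOCS, "*bread*"))  # ['breadsticks with soup', 'multi grain bread']
```

"breadsticks" is indexed as one token, so the whole-token comparison in match_query does not find "bread" in it, while the pattern *bread* does.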

Elastic search negate phrase and words in simple query string

I'm trying to negate some words and phrases in an Elastic Search request using the simple query string.
This is what I do:
&q=-"the witcher 3"-game-novel
So basically, trying to negate a phrase AND the words after it. But that doesn't seem to work.
If I try to negate the words alone it works.
How can I negate phrases and sentences in a simple query string?
Here is a working example with index data, search query, and search result.
Index Data:
{
"name":"test"
}
{
"name":"game"
}
{
"name":"the witcher"
}
{
"name":"the witcher 3"
}
{
"name":"the"
}
Search Query:
{
"query": {
"simple_query_string" : {
"query": "-(game | novel) -(the witcher 3)",
"fields": ["name"],
"default_operator": "and"
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64133051",
"_type": "_doc",
"_id": "4",
"_score": 2.0,
"_source": {
"name": "the"
}
},
{
"_index": "stof_64133051",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"name": "the witcher"
}
},
{
"_index": "stof_64133051",
"_type": "_doc",
"_id": "1",
"_score": 2.0,
"_source": {
"name": "test"
}
}
]
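If the phrase has to be excluded as an exact phrase, one alternative (a different technique from the simple_query_string above) is a bool query with must_not clauses, using match_phrase for phrases and match for single words. The Python sketch below only builds that request body and mimics the exclusion locally on the sample names; the function names are hypothetical and the whitespace tokenizer is a simplification of real analysis.

```python
def build_negation_request(field, phrases, words):
    """bool/must_not body: exclude docs containing any phrase or any word."""
    must_not = [{"match_phrase": {field: p}} for p in phrases]
    must_not += [{"match": {field: w}} for w in words]
    return {"query": {"bool": {"must_not": must_not}}}


def tokens(text):
    """Simplified analysis: lowercase and split on whitespace."""
    return text.lower().split()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively inside doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def surviving(names, phrases, words):
    """Local imitation of the must_not clauses on a list of name values."""
    kept = []
    for name in names:
        toks = tokens(name)
        if any(contains_phrase(toks, tokens(p)) for p in phrases):
            continue
        if any(w.lower() in toks for w in words):
            continue
        kept.append(name)
    return kept


NAMES = ["test", "game", "the witcher", "the witcher 3", "the"]

if __name__ == "__main__":
    print(build_negation_request("name", ["the witcher 3"], ["game", "novel"]))
    print(surviving(NAMES, ["the witcher 3"], ["game", "novel"]))
    # ['test', 'the witcher', 'the']
```

The local result keeps the same three names as the search result above.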

elasticsearch query for finding id in fields in json file

I have a JSON file that I indexed in Elasticsearch, and I need a query to retrieve "_id_osm". Can you help me, please?
This is one entry of my JSON file:
{
"index": {
"_index": "pariss",
"_type": "sig",
"_id": 1
}
}
{
"fields": {
"_id_osm": 416747747,
"_categorie": "",
"_name": [
""
],
"_location": [
36.1941834,
5.3595221
]
}
}
Updated based on the comments on this answer:
If the mapping for _id_osm has store set to true, you can use the query below to fetch the field value.
{
"stored_fields" : ["_id_osm"],
"query": {
"match": {
"_id": 1
}
}
}
The call above returns the response below; note the fields section, which contains the field name and value.
"hits": [
{
"_index": "intqu",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"_id_osm": [
416747747
]
}
}
]
If store is not set to true (false is the default), use _source filtering to get the data instead.
{
"_source": [ "_id_osm" ],
"query": {
"match": {
"_id": 1
}
}
}
This returns the response below, where _source contains the data.
"hits": [
{
"_index": "intqu",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"_id_osm": 416747747
}
}
]
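On the client side, the two variants differ only in where the value shows up in each hit. A minimal Python sketch, assuming hits shaped like the two responses above (the helper names are hypothetical):

```python
def stored_fields_request(doc_id, field):
    """Request body that asks for a stored field (needs store: true in the mapping)."""
    return {"stored_fields": [field], "query": {"match": {"_id": doc_id}}}


def source_filter_request(doc_id, field):
    """Request body that uses _source filtering instead."""
    return {"_source": [field], "query": {"match": {"_id": doc_id}}}


def extract_field(hit, field):
    """Read a field from a hit, whichever of the two requests produced it."""
    if field in hit.get("fields", {}):
        # Stored fields come back as arrays; this sketch returns the first value only.
        return hit["fields"][field][0]
    return hit.get("_source", {}).get(field)


if __name__ == "__main__":
    stored_hit = {"_id": "1", "fields": {"_id_osm": [416747747]}}
    source_hit = {"_id": "1", "_source": {"_id_osm": 416747747}}
    print(extract_field(stored_hit, "_id_osm"))  # 416747747
    print(extract_field(source_hit, "_id_osm"))  # 416747747
```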

ElasticSearch: why it is not possible to get suggest by criteria?

I want to get suggestions from some text for a specific user.
As I understand it, Elasticsearch provides suggestions based on the whole dictionary (the inverted index), which contains all the terms in the index.
So if user1 posts some text, that text can be suggested to user2. Am I right?
Is it possible to add a filter by some criterion (by user, for example) to reduce the set of terms to be suggested?
Yes, that's very much possible. Let me show you with an example that uses a query with a filter context:
Index definition
{
"mappings": {
"properties": {
"title": {
"type": "text" --> inverted index for storing suggestions on title field
},
"userId" : {
"type" : "keyword" --> like in your example
}
}
}
}
Index sample doc
{
"title" : "foo baz",
"userId" : "katrin"
}
{
"title" : "foo bar",
"userId" : "opster"
}
Search query without userId filter
{
"query": {
"bool": {
"must": {
"match": {
"title": "foo"
}
}
}
}
}
Search results (both documents are returned)
"hits": [
{
"_index": "so_suggest",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"title": "foo bar",
"userId": "opster" --> note another user
}
},
{
"_index": "so_suggest",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"title": "foo baz",
"userId": "katrin" -> note user
}
}
]
Now let's narrow the suggestions by keeping only the docs created by user katrin.
Search query
{
"query": {
"bool": {
"must": {
"match": {
"title": "foo"
}
},
"filter": { --> note filter on userId field
"term": {
"userId": "katrin"
}
}
}
}
}
Search result
"hits": [
{
"_index": "so_suggest",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"title": "foo baz",
"userId": "katrin"
}
}
]
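For reference, the filtered query can be built per user with a small helper. This is a hedged Python sketch: build_user_scoped_query and local_search are hypothetical names, and local_search only imitates the match-plus-filter behaviour on the two sample docs with whitespace tokenization.

```python
def build_user_scoped_query(text, user_id):
    """match on title in query context, term on userId in filter context."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": text}},
                "filter": {"term": {"userId": user_id}},
            }
        }
    }


DOCS = [
    {"title": "foo baz", "userId": "katrin"},
    {"title": "foo bar", "userId": "opster"},
]


def local_search(docs, text, user_id=None):
    """Imitation of the query: token match on title, optional exact userId filter."""
    wanted = set(text.lower().split())
    hits = [d for d in docs if wanted & set(d["title"].lower().split())]
    if user_id is not None:
        hits = [d for d in hits if d["userId"] == user_id]
    return hits


if __name__ == "__main__":
    print(local_search(DOCS, "foo"))            # both docs
    print(local_search(DOCS, "foo", "katrin"))  # only katrin's doc
```

Because the userId clause sits in filter context, it narrows the result set without contributing to the score, which is why the score of the remaining hit is unchanged above.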

elasticsearch : merge 2 fields from different types in search result

Do you know if it is possible to dynamically merge 2 fields that belong to different types into one single field?
I have an index my_index with 2 types, type1 and type2.
I am running a search on those 2 types:
POST /my_index/_search
{
"min_score": 1,
"query": {
"bool": {
"should": [
{
"match": {
"titreType1": {
"query": "boy"
}
}
},
{
"match": {
"titreType2": {
"query": "boy"
}
}
}
]
}
}
}
I get results from the 2 different types that look like this:
"hits": [
{
"_index": "my_index",
"_type": "type1",
"_id": "AVo0LhFj8N13TOVDqMo9",
"_score": 13.171456,
"_source": {
"titreType1": "the boy !"
}
},
{
"_index": "my_index",
"_type": "type1",
"_id": "AVo0Lg5X8N13TOVDqMUH",
"_score": 12.986091,
"_source": {
"titreType1": "if i were a boy"
}
},
{
"_index": "my_index",
"_type": "type2",
"_id": "AVo0S-nM8N13TOVDqNPX",
"_score": 12.34135,
"_source": {
"titreType2": "boy are very nasty and it is sad"
}
},
...
]
I would like my result to have just one column named "title" that displays the value from titreType1 or titreType2.
Do you know how to do this?
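One option that does not depend on any Elasticsearch feature is to merge the two fields on the client after the search returns. The Python sketch below assumes hits shaped like the ones shown above; unify_titles and its parameters are hypothetical names, and this is a post-processing workaround rather than a server-side solution.

```python
def unify_titles(hits, candidates=("titreType1", "titreType2"), target="title"):
    """Return one row per hit with a single target column taken from the first
    candidate field present in _source (None if neither is present)."""
    rows = []
    for hit in hits:
        source = hit.get("_source", {})
        value = next((source[name] for name in candidates if name in source), None)
        rows.append({"_id": hit["_id"], "_type": hit.get("_type"), target: value})
    return rows


if __name__ == "__main__":
    hits = [
        {"_id": "AVo0LhFj8N13TOVDqMo9", "_type": "type1",
         "_source": {"titreType1": "the boy !"}},
        {"_id": "AVo0S-nM8N13TOVDqNPX", "_type": "type2",
         "_source": {"titreType2": "boy are very nasty and it is sad"}},
    ]
    for row in unify_titles(hits):
        print(row)
```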
