How to perform a multi-level logical query in Elasticsearch

Let's say we have a query like this:
(a or b or c) and (x or y or z) and (f or g or h) ...
How can we perform such a query using Elasticsearch?
I tried the bool query, but I am still confused about the must and should clauses and how to place the different conditions in them to obtain the right logical result.
Here is a sample of my data (courses type):
[{
"_index": "training-hub",
"_type": "courses",
"_id": "58ad8090604aff26df131971",
"_score": 1,
"_source": {
"title": "Getting Started with Vue.js",
"description": "In this course, you will learn the basics of Vue.js framework and how to create amazing web applications using this framework",
"slug": "hzrthtrjhyjyt--by--Jemli-Fathi",
"format": [
"Intra-entreprise"
],
"duration": 6,
"language": "EN",
"requirements": [
"Basics of web development"
],
"tags": [],
"audience": [
"Web developers"
],
"price": 600,
"cover": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487840598/training-hub/k1h0tzciyjflqvtlyr2l.png",
"scheduleUrl": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487765627/training-hub/sozofm68nrwxhta3ga3u.png",
"trainer": {
"firstname": "Jemli",
"lastname": "Fathi",
"photo": "https://ucarecdn.com/5923a1bb-3e77-47d0-bd5b-7d07b0f559fe/16487460_1269388943138870_3235853817844339449_o.jpg"
},
"courseCategory": {
"name": "Game Development"
},
"createdAt": "2017-02-22T12:14:08.186Z",
"updatedAt": "2017-02-24T10:39:22.896Z"
}
},
{
"_index": "training-hub",
"_type": "courses",
"_id": "58ad81c0604aff26df131973",
"_score": 1,
"_source": {
"title": "Create your first Laravel application today!",
"description": "In this course, you're gonna learn how create fancy web applications using Laravel PHP Framework ...",
"slug": "sdvrehtrthrth--by--Jemli-Fathi",
"format": [
"Intra-entreprise",
"Inter-entreprise"
],
"duration": 6,
"language": "EN",
"requirements": [
"Basics of Web development",
"Basics of PHP language"
],
"tags": [],
"audience": [
"Web developers",
"PHP developers"
],
"price": 600,
"cover": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487841464/training-hub/dpgqbchualnfc78n69gs.png",
"scheduleUrl": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487765627/training-hub/sozofm68nrwxhta3ga3u.png",
"trainer": {
"firstname": "Jemli",
"lastname": "Fathi",
"photo": "https://ucarecdn.com/5923a1bb-3e77-47d0-bd5b-7d07b0f559fe/16487460_1269388943138870_3235853817844339449_o.jpg"
},
"courseCategory": {
"name": "Web Development"
},
"createdAt": "2017-02-22T12:19:12.376Z",
"updatedAt": "2017-02-23T09:39:23.414Z"
}
},
{
"_index": "training-hub",
"_type": "courses",
"_id": "58aead4faecfc31e4559d49b",
"_score": 1,
"_source": {
"title": "Getting Started with Docker",
"description": "After taking this course, you will be able to use Docker in real life projects and you can bootstrap your projects into different containers",
"slug": "regrehgreh--by--Jemli-Fathi",
"format": [
"Intra-entreprise"
],
"duration": 5,
"language": "EN",
"requirements": [
"Basic Linux Shell skills",
"Basic knowledge of Linux environment"
],
"tags": [],
"audience": [
"Dev-Ops",
"Web Developers"
],
"price": 999,
"cover": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487939101/training-hub/vkqupkiqerq0kgjgd0km.png",
"scheduleUrl": "http://res.cloudinary.com/dqihnnzaj/image/upload/v1487842509/training-hub/r9y1qisyfelseeuzgtt3.png",
"trainer": {
"firstname": "Jemli",
"lastname": "Fathi",
"photo": "https://ucarecdn.com/5acb1b5f-b550-4560-b085-2d75384e5ec8/13567067_1064268623650904_3773193220506255422_n.jpg"
},
"courseCategory": {
"name": "Web Development"
},
"createdAt": "2017-02-23T09:37:19.758Z",
"updatedAt": "2017-02-24T12:31:32.078Z"
}
}]
What I need exactly is to filter courses like this:
(category1 or category2 ...) and (format1 or format2...) and
(language1 or language2)...

You can use a bool query combined with terms queries to get the desired behavior:
GET _search
{
"query" : {
"bool": {
"must": [ //AND
{
"terms": {
"courseCategory.name": [ // category name is VALUE1 or VALUE2 (field name as in the sample documents)
"VALUE1",
"VALUE2"
]
}
},
{
"terms": {
"format": [// format with value FORMAT1 or FORMAT2
"FORMAT1",
"FORMAT2"
]
}
},
{
"terms": {
"language": [// language with value LANG1 or LANG2
"LANG1",
"LANG2"
]
}
}
]
}
}
}
Also, make sure that the fields used for term matching are not_analyzed; otherwise the indexed tokens will not equal the exact values passed to the terms query.
Read up on mappings, starting here: http://www.elasticsearch.org/guide/reference/mapping/.
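The pattern above generalizes to any number of groups: one terms clause per field (OR within the field), all placed under must (AND across fields). Below is a minimal Python sketch that builds such a request body from a plain dictionary. The helper name build_and_of_ors is made up for illustration, the field names follow the sample documents, and the values are placeholders.

```python
import json


def build_and_of_ors(filters):
    """Build a bool query: AND across fields, OR across each field's values."""
    clauses = [
        {"terms": {field: list(values)}}
        for field, values in filters.items()
        if values  # skip fields for which nothing was selected
    ]
    return {"query": {"bool": {"must": clauses}}}


# Placeholder values; replace them with real categories, formats and languages.
body = build_and_of_ors({
    "courseCategory.name": ["VALUE1", "VALUE2"],
    "format": ["FORMAT1", "FORMAT2"],
    "language": ["LANG1", "LANG2"],
})
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a _search request with whatever HTTP or Elasticsearch client you already use.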

The Elasticsearch query string query does what you are looking for.
Look at the following snippet, taken from the official docs:
GET /_search
{
"query": {
"query_string": {
"query": "(content:this OR name:this) AND (content:that OR name:that)"
}
}
}
Hope this helps!
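For the course filters in the question, the query string can be assembled from the selected values. The sketch below is one possible way to do it in Python; the helper names are hypothetical, and quoting each value as a phrase (so that multi-word values stay together) is my assumption rather than something the docs snippet shows.

```python
def or_group(field, values):
    """Render one parenthesised OR group, e.g. (language:"EN" OR language:"FR")."""
    parts = []
    for value in values:
        # Escape backslashes and double quotes so the value stays inside its phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append('{}:"{}"'.format(field, escaped))
    return "(" + " OR ".join(parts) + ")"


def build_query_string(filters):
    """AND together one OR group per field that has at least one value."""
    groups = [or_group(field, values) for field, values in filters.items() if values]
    return {"query": {"query_string": {"query": " AND ".join(groups)}}}


body = build_query_string({
    "language": ["EN", "FR"],
    "format": ["Intra-entreprise"],
})
print(body["query"]["query_string"]["query"])
# (language:"EN" OR language:"FR") AND (format:"Intra-entreprise")
```

Keep in mind that query_string parses its input, so values coming from users should be escaped or validated before they are interpolated.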

Related

Elasticsearch wildcard query with a space failing in 7.11

My data is indexed in Elasticsearch 7.11. This is the mapping I got when I added documents directly to my index:
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
I haven't added the keyword part myself and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for search terms that contain spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers mentioning not_analyzed, and I tried setting {"index":"true"} in the mapping, but it did not help. How can I make the wildcard search work in this version of Elasticsearch?
I tried changing the field to the wildcard type:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
And got the following response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Here is a sample document that should match:
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r#gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif#gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
You cannot modify a field mapping once created. However, you can create another sub-field of type wildcard, like this:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
When the mapping is updated, you need to reindex your data so that the new field gets indexed, like this:
POST http://localhost:9001/indexname/_update_by_query
And then when this finishes, you'll be able to query on this new field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
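Why the original query finds nothing can be illustrated with a small simulation. A wildcard query is a term-level query, so on an analyzed text field the pattern is compared with each indexed token separately, and no single token contains a space. The Python sketch below is a deliberately simplified model: the analyze function only approximates the standard analyzer, and fnmatch stands in for Elasticsearch's wildcard matching.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase, split into words."""
    return re.findall(r"\w+", text.lower())


def wildcard_on_text(pattern, value):
    """On an analyzed text field the pattern is tested against each token."""
    return any(fnmatchcase(token, pattern) for token in analyze(value))


def wildcard_on_whole_value(pattern, value):
    """On keyword or wildcard fields the pattern is tested against the whole string."""
    return fnmatchcase(value, pattern)


name = "hello world example"
print(analyze(name))                                   # ['hello', 'world', 'example']
print(wildcard_on_text("*hello world*", name))         # False
print(wildcard_on_whole_value("*hello world*", name))  # True
```

This is why querying a sub-field that keeps the original string intact, such as name.wildcard, returns the document.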

Rest Query on the Patient Resource for Finding BOTH/ALL Given Name(s)

How do I search for a person with BOTH given names I provide?
I have the following 2 patients who are "close". Everything (in the HumanName area) is the same except that one of the given names differs.
Note "Apple" vs "Banana".
{
"resourceType": "Bundle",
"id": "269caf66-0ccc-43e7-b9a5-f16f84db0149",
"meta": {
"lastUpdated": "2019-11-20T19:30:26.858917+00:00"
},
"type": "searchset",
"link": [
{
"relation": "self",
"url": "https://localhost:44348/Patient?given=Jingerheimer"
}
],
"entry": [
{
"fullUrl": "https://localhost:44348/Patient/504f6bd3-e9b4-4846-8948-97bf09c70722",
"resource": {
"resourceType": "Patient",
"id": "504f6bd3-e9b4-4846-8948-97bf09c70722",
"meta": {
"versionId": "1",
"lastUpdated": "2019-11-20T19:26:11.005+00:00"
},
"identifier": [
{
"system": "ssn",
"value": "111-11-1111"
},
{
"system": "uuid",
"value": "da55d068e0784b359fa97498a11543c5"
}
],
"name": [
{
"family": "Smith",
"given": [
"John",
"Apple",
"Jingerheimer"
]
}
]
},
"search": {
"mode": "match"
}
},
{
"fullUrl": "https://localhost:44348/Patient/10054ce9-6141-4eca-bc5b-0978f8c8afcb",
"resource": {
"resourceType": "Patient",
"id": "10054ce9-6141-4eca-bc5b-0978f8c8afcb",
"meta": {
"versionId": "1",
"lastUpdated": "2019-11-20T19:26:48.962+00:00"
},
"identifier": [
{
"system": "ssn",
"value": "222-22-2222"
},
{
"system": "uuid",
"value": "52d09f9436d44591816fd229dd139523"
}
],
"name": [
{
"family": "Smith",
"given": [
"John",
"Banana",
"Jingerheimer"
]
}
]
},
"search": {
"mode": "match"
}
}
]
}
One patient's given names include "Apple"; the other's include "Banana".
This search works fine:
https://localhost:44348/Patient/?given=Jingerheimer
What I have tried is:
https://localhost:44348/Patient/?given=Jingerheimer&given=Apple
but that gives me no results.
Note that omitting "given=Jingerheimer" is not an option, since it filters out a bunch of other patients.
I'm trying to get
"Has BOTH of the given names I provide"
Your syntax is correct, so I think the server does not handle the search correctly. Can you check the self link for your second search to see if it reflects the search you performed? Does the result Bundle have an OperationOutcome detailing something went wrong? If all that seems okay, you'll need to check your server's code.
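To make the expected behavior concrete, here is a small Python sketch of what a server should do with a repeated given parameter: every value must match at least one of the patient's given names. The helper names are made up, and the case-insensitive prefix matching reflects FHIR's default string search as I understand it, so treat it as an assumption when comparing against your server.

```python
from urllib.parse import urlencode


def search_url(base, given_names):
    """Repeat the `given` parameter once per name; repeated parameters are ANDed."""
    return base + "?" + urlencode([("given", g) for g in given_names])


def matches_all_given(patient, wanted):
    """True if every wanted value is a case-insensitive prefix of some given name."""
    givens = [
        g.lower()
        for name in patient.get("name", [])
        for g in name.get("given", [])
    ]
    return all(any(g.startswith(w.lower()) for g in givens) for w in wanted)


apple = {"name": [{"family": "Smith", "given": ["John", "Apple", "Jingerheimer"]}]}
banana = {"name": [{"family": "Smith", "given": ["John", "Banana", "Jingerheimer"]}]}
wanted = ["Jingerheimer", "Apple"]

print(search_url("https://localhost:44348/Patient", wanted))
# https://localhost:44348/Patient?given=Jingerheimer&given=Apple
print(matches_all_given(apple, wanted), matches_all_given(banana, wanted))  # True False
```

If the server returns an empty Bundle for this URL, comparing its behavior against a check like matches_all_given can help locate where the second parameter is dropped or combined incorrectly.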

Highlight on ElasticSearch autocomplete

I have the following data to be indexed on ElasticSearch.
I want to implement an autocomplete feature, and highlight why a specific document matched a query.
These are the settings of my index:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"autocomplete_filter"
]
}
}
}
}
}
Index analysis:
Splits text on word boundaries.
Removes punctuation.
Lowercases.
Edge n-grams each token.
So the inverted index looks like:
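The analysis steps above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's actual analyzer: the word splitting is approximated with a regular expression, the document ids are made up, and the two names are taken from the search results shown further down.

```python
import re
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` that are min_gram to max_gram characters long."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_terms(text):
    """Split on word boundaries, lowercase, then edge-n-gram each token."""
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        terms.extend(edge_ngrams(word))
    return terms


# Hypothetical ids; the names come from the example results.
docs = {1: "Software AG", 2: "is soft ware ok"}

inverted = defaultdict(set)
for doc_id, name in docs.items():
    for term in index_terms(name):
        inverted[term].add(doc_id)

print(edge_ngrams("software"))
# ['s', 'so', 'sof', 'soft', 'softw', 'softwa', 'softwar', 'software']
print(sorted(inverted["soft"]))  # [1, 2]
```

Looking up "soft" therefore finds both the document that contains the word "soft" and the one that only contains "Software".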
This is how I defined the mapping for the name field:
{
"index_type": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
When I query:
GET http://localhost:9200/index/type/_search
{
"query": {
"match": {
"name": "soft"
}
},
"highlight": {
"fields" : {
"name" : {}
}
}
}
Search for: soft
After the standard tokenizer is applied, "soft" is the term to look up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I would expect the highlighted part to be "soft" and not the whole word:
{
"hits": [
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
},
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> AG"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> AG2"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> AG good <em>software</em> better"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> AG"
]
}
},
{
"_source": {
"name": "is soft ware ok"
},
"highlight": {
"name": [
"is <em>soft</em> ware ok"
]
}
}
]
}
Search for: software ag
After the standard tokenizer is applied, "software ag" is split into "software" and "ag", which are looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I would expect the highlighted parts to be "software" and "ag" and not the whole words around them:
{
"hits": [
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG2</em>"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em> good <em>software</em> better"
]
}
},
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
}
]
}
I read the highlight documentation on Elasticsearch, but I cannot understand how the highlighting is performed. For the two examples above I expect only the token matched in the inverted index to be highlighted, not the whole word.
Can anyone help me highlight only the passed value?
Update
It seems that the autocomplete on the Elasticsearch website is implemented on the server side in a way similar to mine. However, it seems that they highlight the matched query on the client.
If they do it like this, there is probably no proper solution on the Elasticsearch side, so I implemented the highlighting on the server side instead of on the client side (as they seem to do).
My server-side implementation (using PHP) is:
public function search($term)
{
$params = [
'index' => $this->getIndexName(),
'type' => $this->getIndexType(),
'body' => [
'query' => [
'match' => [
'name' => $term
]
]
]
];
$results = $this->client->search($params);
$hits = $results['hits']['hits'];
$data = [];
$wrapBefore = '<strong>';
$wrapAfter = '</strong>';
foreach ($hits as $hit) {
$data[] = [
$hit['_source']['id'],
$hit['_source']['name'],
preg_replace('/(' . preg_quote($term, '/') . ')/i', $wrapBefore . '$1' . $wrapAfter, strip_tags($hit['_source']['name'])) // preg_quote keeps regex metacharacters in $term literal
];
}
return $data;
}
This outputs what I was aiming for with this question:
I added a bounty to see if there is a solution at the Elasticsearch level to achieve what I described above.
As of now, with the latest version of Elasticsearch, this is not possible, as the highlight documentation does not mention any setting or query for this. I checked the Elastic autocomplete example in the browser console, under the XHR requests tab, and found the following response for the keyword "att".
url - https://search.elastic.co/suggest?q=att
{
"current_page": 1,
"last_page": 4,
"total_hits": 49,
"hits": [
{
"tags": [],
"url": "/elasticon/tour/2016/jp/not-attending",
"section": "Elasticon",
"title": "Not <em>Attending</em> - JP"
},
{
"section": "Elasticon",
"title": "<em>Attending</em> from Training - JP",
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-training"
},
{
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-keynote",
"title": "<em>Attending</em> from Keynote - JP",
"section": "Elasticon"
},
{
"tags": [],
"url": "/elasticon/tour/2016/not-attending",
"section": "Elasticon",
"title": "Thank You - Not <em>Attending</em>"
},
{
"tags": [],
"url": "/elasticon/tour/2016/attending",
"section": "Elasticon",
"title": "Thank You - <em>Attending</em>"
},
{
"section": "Blog",
"title": "What It's Like to <em>Attend</em> Elastic Training",
"tags": [],
"url": "/blog/what-its-like-to-attend-elastic-training"
},
{
"tags": "Elasticsearch",
"url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
"section": "Docs/",
"title": "Highlighting <em>attachments</em>"
},
{
"title": "<em>attachments</em> » email",
"section": "Docs/",
"tags": "Logstash",
"url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
},
{
"section": "Docs/",
"title": "Configuring Email <em>Attachments</em> » Actions",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
},
{
"url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
"tags": "Watcher",
"title": "HipChat Action <em>Attributes</em> » Actions",
"section": "Docs/"
},
{
"title": "Slack Action <em>Attributes</em> » Actions",
"section": "Docs/",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
}
],
"aggs": {
"sections": [
{
"Elasticon": 5
},
{
"Blog": 1
},
{
"Docs/": 43
}
],
"top_tags": [
{
"XPack": 14
},
{
"Elasticsearch": 12
},
{
"Watcher": 9
},
{
"Logstash": 4
},
{
"Clients": 3
},
{
"Shield": 1
}
]
}
}
But on the frontend, only "att" is highlighted in the autosuggest results. Hence they handle the highlighting in the browser layer.
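Highlighting outside Elasticsearch boils down to wrapping every case-insensitive occurrence of the typed text in the returned name. A minimal Python sketch of that idea follows; the function name and default tags are illustrative, and it assumes the name is plain text that has already been sanitized for HTML output.

```python
import re


def highlight(name, term, before="<strong>", after="</strong>"):
    """Wrap every case-insensitive occurrence of `term` in `name`."""
    if not term:
        return name
    # re.escape keeps characters such as "+" or "(" in the term literal.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: before + m.group(0) + after, name)


print(highlight("Op Software AG good software better", "soft"))
# Op <strong>Soft</strong>ware AG good <strong>soft</strong>ware better
```

Using the matched text (m.group(0)) rather than the typed term preserves the original capitalization of the name.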

Elasticsearch query on inner list and get only matching objects from list instead of entire list in result document

In the following Elasticsearch document, I need to find the comments from a specific name, e.g. "Mary Brown". Basically, I want to query the inner list and get only the matching objects from the list instead of the entire list in the result document. Is this possible? I have mapped 'comments' as nested.
{
"title": "Investment secrets",
"body": "What they don't tell you ...",
"tags": [ "shares", "equities" ],
"comments": [
{
"name": "Mary Brown",
"comment": "Lies, lies, lies",
"age": 42,
"stars": 1,
"date": "2014-10-18"
},
{
"name": "John Smith",
"comment": "You're making it up!",
"age": 28,
"stars": 2,
"date": "2014-10-16"
},
{
"name": "Mary Brown",
"comment": "making it!!!",
"age": 42,
"stars": 3,
"date": "2014-10-20"
}
]
}
Since you have properly mapped your comments field as nested, yes, this is possible using inner_hits, like this:
{
"_source": false,
"query": {
"nested": {
"path": "comments",
"inner_hits": { <---- use inner_hits here
"_source": [
"comment", "date"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"comments.name": "Mary Brown"
}
}
]
}
}
}
}
}
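With "_source": false on the parent, the matching comments come back only under each hit's inner_hits section. The Python sketch below shows one way to pull them out of a response that has already been parsed into a dictionary. The helper name is made up, and the sample response is a hand-written fragment with an invented _id, reduced to the keys the helper reads.

```python
def matching_comments(response, path="comments"):
    """Collect the _source of every inner hit under `path`, keyed by parent _id."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        result[hit["_id"]] = [h["_source"] for h in inner]
    return result


# Hand-written fragment for illustration only.
sample = {"hits": {"hits": [{
    "_id": "1",
    "inner_hits": {"comments": {"hits": {"hits": [
        {"_source": {"comment": "Lies, lies, lies", "date": "2014-10-18"}},
        {"_source": {"comment": "making it!!!", "date": "2014-10-20"}},
    ]}}},
}]}}

print(matching_comments(sample))
```

Each parent document id maps to the list of comments that matched, without the comment from "John Smith".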

How to search exact text in nested document in elasticsearch

I have an index like this:
"_index": "test",
"_type": "products",
"_id": "URpYIFBAQRiPPu1BFOZiQg",
"_score": null,
"_source": {
"currency": null,
"colors": [],
"api": 1,
"sku": 9999227900050002,
"category_path": [
{
"id": "cat00000",
"name": "B1"
},
{
"id": "abcat0400000",
"name": "Cameras & Camcorders"
},
{
"id": "abcat0401000",
"name": "Digital Cameras"
},
{
"id": "abcat0401005",
"name": "Digital SLR Cameras"
},
{
"id": "pcmcat180400050006",
"name": "DSLR Package Deals"
}
],
"price": 1034.99,
"status": 1,
"description": null,
}
And I want to search for only the exact text ["Camcorders"] in the category_path field.
I tried a match query, but it returns all the products that have "Camcorders" as part of the text. Can someone help me solve this?
Thanks
To search in a nested field, use a query like the following:
{
"query": {
"term": {
"category_path.name": {
"value": "b1"
}
}
}
}
Hope it helps!
You could add one more nested field, raw_name, mapped as not_analyzed, and match against it.
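The difference between the two suggestions can be shown with a small simulation. A term query is not analyzed, so on an analyzed field it has to equal one of the indexed (lowercased) tokens, while on a not_analyzed field it has to equal the whole original value. The Python sketch below is a simplified model: analyze only approximates the default analyzer, and the function names are made up.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split into words."""
    return re.findall(r"\w+", text.lower())


def term_matches_analyzed(term, value):
    """Term query on an analyzed field: the term must equal one indexed token."""
    return term in analyze(value)


def term_matches_raw(term, value):
    """Term query on a not_analyzed field: the term must equal the whole value."""
    return term == value


print(term_matches_analyzed("B1", "B1"))                                 # False
print(term_matches_analyzed("b1", "B1"))                                 # True
print(term_matches_analyzed("camcorders", "Cameras & Camcorders"))       # True
print(term_matches_raw("Camcorders", "Cameras & Camcorders"))            # False
print(term_matches_raw("Cameras & Camcorders", "Cameras & Camcorders"))  # True
```

So the lowercase "b1" in the query above is deliberate, and a raw_name field only matches when the complete category name is supplied.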
