Complex nested arrays search with boosting - elasticsearch

I have an Elasticsearch index with a few thousand documents, each with the rather complex structure shown below, which I need to search as follows.
Say I have the search string "I am single and have a dog". I need to be able to
search all doc.ratedTags.tags.keywords and all doc.describedTagCombos.tags.keywords, while giving a boost to matched doc.ratedTags.tags.keywords whose sibling rating has a higher value.
To be honest, I'm not sure this is even possible.
{
"_index": "neighborhood",
"_type": "neighborhood",
"_id": "2338",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"id": 2338,
"neighborhoodId": 4427,
"name": "East Village",
"xMin": -73.9926238051035,
"xMax": -73.9718363715137,
"xAvg": -73.9822300883086,
"yMin": 40.7192801452175,
"yMax": 40.7337200243512,
"yAvg": 40.7265000847843,
"city": {
"id": 393,
"name": "New York City-Manhattan",
"county": "New York",
"state": {
"id": 118,
"name": "New York",
"abbreviation": "NY",
"country": "USA"
}
},
"ratedTags": [
{
"id": 1064,
"rating": 4,
"tags": [
{
"id": 213,
"tagId": 15,
"name": "restaurants",
"synonym": "",
"keywords": "restaurants, restaurant, eatery, eateries"
}
]
},
{
"id": 995,
"rating": 5,
"tags": [
{
"id": 199,
"tagId": 1,
"name": "artists and creative types",
"synonym": "",
"keywords": "artsy, artistic, artists, creative types, musicians, artist, creative type, musician"
}
]
}
],
"describedTagCombos": [
{
"id": 9524,
"descriptor": "Area has tons of excellent bars, restaurants, and fun that appeals to post-college, students, and creative types",
"tags": [
{
"id": 213,
"tagId": 15,
"name": "restaurants",
"synonym": "",
"keywords": "restaurants, restaurant, eatery, eateries"
},
{
"id": 219,
"tagId": 21,
"name": "nightlife",
"synonym": "",
"keywords": "nightlife, bars, nightclubs, lounge, lounges, nightclub, bar, clubs, club, party, parties, night, pubs, tavern, sports bar, taverns, pub, wine bar, night life, night club, night clubs, sportsbar, sportsbars, night light, night lights, nightlight, nite light"
},
{
"id": 226,
"tagId": 28,
"name": "bar scene",
"synonym": "",
"keywords": "bar scene, bars, lounges, wine bar, lounge, pub, pubs, tavern, taverns"
},
{
"id": 238,
"tagId": 40,
"name": "dining options",
"synonym": "",
"keywords": "dining places, dining spots, dining options, restaurants, food options, food places, places to eat, eateries, dining, fast food, chain restaurant, chain restaurants, eatery"
}
]
},
{
"id": 9525,
"descriptor": "Traveler Tip: walk around St Marks and check out the shops, people, & vibe",
"tags": [
{
"id": 235,
"tagId": 37,
"name": "tourist attractions",
"synonym": "sightseeing options",
"keywords": "tourist attractions, sightseeing, attractions, sightsee, tourist"
},
{
"id": 236,
"tagId": 38,
"name": "entertainment options",
"synonym": "fun options",
"keywords": "fun, entertainment, good time, enjoyment, enjoy, pleasure"
}
]
}
]
}
}
"_nayberz": {
"type": "text",
"analyzer": "autocomplete",
"store": true
},
...
"keywords": {
"type": "text",
"analyzer": "autocomplete",
"copy_to": "_nayberz"
},

Related

Ranking results based on field in nested array

Suppose I have an Elasticsearch index called 'neighborhood' in which I store documents (example further below) with the following settings and mappings.
The model has many nested fields, so I use copy_to on many of them to make searching easier, e.g. I just run a match query on the _nayberz field.
The search works quite well; however, I would like matches where ratedTags[n].rating (which can be 1-5) is higher to rank higher.
Mappings:
{
"properties": {
"_nayberz": {
"type": "text",
"analyzer": "autocomplete",
"store" : true
},
"describedTagCombos": {
"type": "nested",
"properties": {
"tags": {
"type": "nested",
"properties": {
"keywords": {
"type": "text",
"analyzer": "autocomplete",
"boost": 5,
"copy_to": "_nayberz"
},
"name": {
"type": "text",
"analyzer": "autocomplete",
"boost": 5,
"copy_to": "_nayberz"
},
"synonym": {
"type": "text",
"analyzer": "autocomplete",
"boost": 5,
"copy_to": "_nayberz"
}
}
}
}
},
"name": {
"type": "text",
"fielddata" : true,
"copy_to": "_nayberz",
"boost": 1
},
"ratedTags": {
"type": "nested",
"properties": {
"tags": {
"type": "nested",
"properties": {
"keywords": {
"type": "text",
"boost": 5,
"copy_to": "_nayberz"
},
"name": {
"type": "text",
"boost": 5,
"copy_to": "_nayberz"
},
"synonym": {
"type": "text",
"boost": 5,
"copy_to": "_nayberz"
}
}
}
}
}
}
}
Example document:
{
"id": 6475,
"neighborhoodId": 2495,
"name": "Some neighborhood name",
"xMin": -87.7351229157221,
"xMax": -87.687849915678,
"xAvg": -87.7114864157001,
"yMin": 41.9316410223988,
"yMax": 41.9466920224128,
"yAvg": 41.9391665224058,
"city": {
"id": 539,
"name": "SomeCity",
"county": "SomeCounty",
"state": {
"id": 174,
"name": "Illinois",
"abbreviation": "IL",
"country": "USA"
}
},
"ratedTags": [
{
"id": 11572,
"rating": 2,
"tags": [
{
"id": 2323,
"tagId": 36,
"name": "shopping options",
"synonym": "",
"keywords": "shopping options, shopping, shop, shopper, shoppers, shops"
}
]
},
{
"id": 11418,
"rating": 3,
"tags": [
{
"id": 2292,
"tagId": 5,
"name": "public transportation options",
"synonym": "",
"keywords": "public transport, public transportation"
}
]
},
{
"id": 11434,
"rating": 4,
"tags": [
{
"id": 2295,
"tagId": 8,
"name": "quiet",
"synonym": "",
"keywords": "quiet, chill, peaceful, not noisy, not loud, not too noisy, not too loud, relaxed"
}
]
},
{
"id": 11458,
"rating": 3,
"tags": [
{
"id": 2300,
"tagId": 13,
"name": "expensive relative to other neighborhoods",
"synonym": "costly",
"keywords": "upscale, chic"
}
]
},
{
"id": 11469,
"rating": 4,
"tags": [
{
"id": 2302,
"tagId": 15,
"name": "restaurants",
"synonym": "",
"keywords": "restaurants, restaurant, eatery, eateries"
}
]
},
{
"id": 11477,
"rating": 2,
"tags": [
{
"id": 2304,
"tagId": 17,
"name": "clean",
"synonym": "",
"keywords": "clean, cleanest, not dirty"
}
]
},
{
"id": 11603,
"rating": 3,
"tags": [
{
"id": 2329,
"tagId": 42,
"name": "safe compared to other neighborhoods",
"synonym": "",
"keywords": "safe, safety, safest, not dangerous"
}
]
},
{
"id": 11557,
"rating": 2,
"tags": [
{
"id": 2320,
"tagId": 33,
"name": "green space and parks",
"synonym": "",
"keywords": "green space, green, parks, open space, nature"
}
]
},
{
"id": 11577,
"rating": 2,
"tags": [
{
"id": 2324,
"tagId": 37,
"name": "tourist attractions",
"synonym": "sightseeing options",
"keywords": "tourist attractions, sightseeing, attractions, sightsee, tourist"
}
]
},
{
"id": 11582,
"rating": 2,
"tags": [
{
"id": 2325,
"tagId": 38,
"name": "entertainment options",
"synonym": "fun options",
"keywords": "fun, entertainment, good time, enjoyment, enjoy, pleasure"
}
]
},
{
"id": 11588,
"rating": 3,
"tags": [
{
"id": 2326,
"tagId": 39,
"name": "cafes",
"synonym": "coffee shops",
"keywords": "cafes, coffee shops, cafe, coffee shop, coffee houses, coffee house, café, cafés"
}
]
},
{
"id": 11594,
"rating": 4,
"tags": [
{
"id": 2327,
"tagId": 40,
"name": "dining options",
"synonym": "",
"keywords": "dining places, dining spots, dining options, restaurants, food options, food places, places to eat, eateries, dining, fast food, chain restaurant, chain restaurants, eatery"
}
]
}
],
"describedTagCombos": [
{
"id": 26842,
"descriptor": "Foodies: must try a polish sausage in this neighborhood",
"tags": [
{
"id": 2302,
"tagId": 15,
"name": "restaurants",
"synonym": "",
"keywords": "restaurants, restaurant, eatery, eateries"
},
{
"id": 2327,
"tagId": 40,
"name": "dining options",
"synonym": "",
"keywords": "dining places, dining spots, dining options, restaurants, food options, food places, places to eat, eateries, dining, fast food, chain restaurant, chain restaurants, eatery"
}
]
},
{
"id": 26843,
"descriptor": "Addison Mall in area with a Target, Starbucks, etc.",
"tags": [
{
"id": 2323,
"tagId": 36,
"name": "shopping options",
"synonym": "",
"keywords": "shopping options, shopping, shop, shopper, shoppers, shops"
}
]
}
]
}
Settings:
{
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 20
},
"english_stop": {
"type": "stop",
"stopwords": [
"a",
"an",
"and",
"are",
"as",
"at",
"be",
"but",
"by",
"for",
"if",
"in",
"into",
"is",
"it",
"no",
"not",
"of",
"on",
"or",
"such",
"that",
"the",
"their",
"then",
"there",
"these",
"they",
"this",
"to",
"was",
"will",
"with",
"have"
]
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter",
"english_stop"
]
}
}
}
}
I think the best candidate here is the field_value_factor function:
GET neighborhood/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "_nayberz",
            "query": "enjoy"
          }
        },
        {
          "nested": {
            "path": "ratedTags",
            "query": {
              "function_score": {
                "functions": [
                  {
                    "field_value_factor": {
                      "field": "ratedTags.rating",
                      "factor": 1
                    }
                  }
                ],
                "boost_mode": "sum"
              }
            }
          }
        }
      ]
    }
  }
}
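The query above scores every ratedTags entry by its rating, whether or not its keywords matched. If the boost should apply only to rated tags whose keywords match the search text, one possible variant is sketched below in Python, which simply builds the request body. It assumes the nested mappings shown in the question and that ratedTags.rating is indexed as a number. The function name, the rating_factor parameter and the choice of "multiply" and "sum" are my own, and I have not run it against a live cluster.

```python
import json
from typing import Any, Dict


def build_rated_tag_query(text: str, rating_factor: float = 1.0) -> Dict[str, Any]:
    """Build a search body that scales keyword matches by their sibling rating.

    Hypothetical helper: field names come from the question's mapping,
    everything else is an assumption of this sketch.
    """
    # Match the text against keywords of tags nested under ratedTags.
    rated_keywords = {
        "nested": {
            "path": "ratedTags.tags",
            "query": {"match": {"ratedTags.tags.keywords": text}},
        }
    }
    # Evaluate per ratedTags entry, so the rating used is the sibling of the
    # matched tags; the keyword score is multiplied by that rating.
    rated = {
        "nested": {
            "path": "ratedTags",
            "score_mode": "sum",
            "query": {
                "function_score": {
                    "query": rated_keywords,
                    "functions": [
                        {
                            "field_value_factor": {
                                "field": "ratedTags.rating",
                                "factor": rating_factor,
                            }
                        }
                    ],
                    "boost_mode": "multiply",
                }
            },
        }
    }
    # Plain (unboosted) match on the describedTagCombos keywords.
    described = {
        "nested": {
            "path": "describedTagCombos",
            "query": {
                "nested": {
                    "path": "describedTagCombos.tags",
                    "query": {"match": {"describedTagCombos.tags.keywords": text}},
                }
            },
        }
    }
    return {"query": {"bool": {"should": [rated, described]}}}


if __name__ == "__main__":
    print(json.dumps(build_rated_tag_query("I am single and have a dog"), indent=2))
```

The printed JSON can be sent as the body of a _search request. Whether "multiply" or "sum" gives the better ranking depends on your data, so treat both as starting points.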

LUIS treats TIME and DURATION as number entities

I was expecting entities such as DATE, TIME, DURATION and the number of people joining the call in the JSON below.
The DATE, TIME and DURATION entities come back correctly, but the number of people is a problem.
I get four entities of type NUMBER, so I am unsure how to pick the one that represents the number of people. It should be 6, but I don't see on what basis I can decide that 6 is the number of people.
{
"query": "book audio bridge tomorrow for 6 people for 30 mins starts at 5:30 PM",
"topScoringIntent": {
"intent": "BookAudioBridge",
"score": 0.9895838
},
"intents": [
{
"intent": "BookAudioBridge",
"score": 0.9895838
},
{
"intent": "ListBooking",
"score": 0.00677821552
}
],
"entities": [
{
"entity": "tomorrow",
"type": "builtin.datetimeV2.date",
"startIndex": 18,
"endIndex": 25,
"resolution": {
"values": [
{
"timex": "2018-06-21",
"type": "date",
"value": "2018-06-21"
}
]
}
},
{
"entity": "30 mins",
"type": "builtin.datetimeV2.duration",
"startIndex": 44,
"endIndex": 50,
"resolution": {
"values": [
{
"timex": "PT30M",
"type": "duration",
"value": "1800"
}
]
}
},
{
"entity": "5:30 pm",
"type": "builtin.datetimeV2.time",
"startIndex": 62,
"endIndex": 68,
"resolution": {
"values": [
{
"timex": "T17:30",
"type": "time",
"value": "17:30:00"
}
]
}
},
{
"entity": "6",
"type": "builtin.number",
"startIndex": 31,
"endIndex": 31,
"resolution": {
"value": "6"
}
},
{
"entity": "30",
"type": "builtin.number",
"startIndex": 44,
"endIndex": 45,
"resolution": {
"value": "30"
}
},
{
"entity": "5",
"type": "builtin.number",
"startIndex": 62,
"endIndex": 62,
"resolution": {
"value": "5"
}
},
{
"entity": "30",
"type": "builtin.number",
"startIndex": 64,
"endIndex": 65,
"resolution": {
"value": "30"
}
}
]
}
You could build a composite entity made of the number entity and a simple entity for "people", so that the entity that comes back identifies 6 as the number and "people" as the simple entity.
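If changing the model is not an option, a client-side workaround is to drop every builtin.number entity whose character span lies inside a datetimeV2 entity. The Python sketch below does this with the startIndex/endIndex values from the response in the question. The function name is hypothetical and this is a post-processing heuristic, not a LUIS feature; it stops being enough as soon as an utterance contains more than one free-standing number.

```python
from typing import Any, Dict, List


def standalone_numbers(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return number entities that are not part of a date, time or duration."""
    # Character spans covered by any datetimeV2 entity.
    spans = [
        (e["startIndex"], e["endIndex"])
        for e in entities
        if e["type"].startswith("builtin.datetimeV2")
    ]

    def covered(entity: Dict[str, Any]) -> bool:
        return any(
            start <= entity["startIndex"] and entity["endIndex"] <= end
            for start, end in spans
        )

    return [
        e for e in entities if e["type"] == "builtin.number" and not covered(e)
    ]


# Entities copied from the response in the question (resolution omitted).
entities = [
    {"entity": "tomorrow", "type": "builtin.datetimeV2.date", "startIndex": 18, "endIndex": 25},
    {"entity": "30 mins", "type": "builtin.datetimeV2.duration", "startIndex": 44, "endIndex": 50},
    {"entity": "5:30 pm", "type": "builtin.datetimeV2.time", "startIndex": 62, "endIndex": 68},
    {"entity": "6", "type": "builtin.number", "startIndex": 31, "endIndex": 31},
    {"entity": "30", "type": "builtin.number", "startIndex": 44, "endIndex": 45},
    {"entity": "5", "type": "builtin.number", "startIndex": 62, "endIndex": 62},
    {"entity": "30", "type": "builtin.number", "startIndex": 64, "endIndex": 65},
]

if __name__ == "__main__":
    print([e["entity"] for e in standalone_numbers(entities)])  # prints ['6']
```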

elasticsearch get specific fields

I am trying to search the following indexed data:
PUT /nested_test100/t/1
{
"title": "Nest eggs1",
"body": "Making your money work...",
"tags": [ "cash1", "shares1" ],
"comments": [
{
"name": "John Smith1",
"comment": "Great article1",
"age": 28,
"stars": 1,
"date": "2014-09-01"
},
{
"name": "Alice White1",
"comment": "More like this please1",
"age": 31,
"stars": 1,
"date": "2014-10-22"
}
]
}
PUT /nested_test100/t/2
{
"title": "Nest eggs2",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"name": "John Smith2",
"comment": "Great article2",
"age": 30,
"stars": 2,
"date": "2014-09-01"
},
{
"name": "Alice White2",
"comment": "More like this please2",
"age": 31,
"stars": 2,
"date": "2014-10-22"
}
]
}
PUT /nested_test100/t/3
{
"title": "Nest eggs3",
"body": "Making your money work...",
"tags": [ "cash3", "shares3" ],
"comments": [
{
"name": "John Smith3",
"comment": "Great article3",
"age": 28,
"stars": 3,
"date": "2014-09-01"
},
{
"name": "Alice White3",
"comment": "More like this please3",
"age": 30,
"stars": 3,
"date": "2014-10-22"
}
]
}
GET /nested_test100/t/_search
What I want is to get only title, body and tags, plus only the comments having age=28.
How should I write the query DSL for that?
What I have written is the following:
POST /nested_test100/t/_search
{
"fields" : ["title","comments.age","body","tags"],
"query" : {
"term" : { "comments.age" : "28" }
}
}
and it gives me data like this:
"hits": [
{
"_index": "nested_test100",
"_type": "t",
"_id": "1",
"_score": 1,
"fields": {
"comments.age": [
28,
31
],
"title": [
"Nest eggs1"
],
"body": [
"Making your money work..."
],
"tags": [
"cash1",
"shares1"
]
}
},
{
"_index": "nested_test100",
"_type": "t",
"_id": "3",
"_score": 1,
"fields": {
"comments.age": [
28,
30
],
"title": [
"Nest eggs3"
],
"body": [
"Making your money work..."
],
"tags": [
"cash3",
"shares3"
]
}
}
]
but I don't want comments with an age other than 28.
I am using Elasticsearch version 1.7.
Please use inner_hits to include the matching nested objects as inner hits of each search hit.
The search request would look like this:
POST test/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match": {"comments.age": 28}
      },
      "inner_hits": {}
    }
  }
}
P.S : You will need to map the comments as a nested field (as already mentioned by Val)
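To make the shape of the result clearer, here is a small Python sketch that builds the request above and pulls the matching comments out of a response. The helper names are hypothetical, the "_source" list is my addition to restrict the top-level fields (worth verifying on 1.7), and the sample response is hand-written to show the structure only; it is not real output.

```python
from typing import Any, Dict, List


def build_request(age: int) -> Dict[str, Any]:
    """Nested query on comments.age that also asks for inner hits."""
    return {
        "_source": ["title", "body", "tags"],  # assumption: top-level fields only
        "query": {
            "nested": {
                "path": "comments",
                "query": {"match": {"comments.age": age}},
                "inner_hits": {},
            }
        },
    }


def matching_comments(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Map each document id to the nested comments that matched."""
    result: Dict[str, List[Dict[str, Any]]] = {}
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get("comments", {})
            .get("hits", {})
            .get("hits", [])
        )
        result[hit["_id"]] = [h["_source"] for h in inner]
    return result


# Hand-written, trimmed illustration of a response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"title": "Nest eggs1"},
                "inner_hits": {
                    "comments": {
                        "hits": {
                            "hits": [
                                {"_source": {"name": "John Smith1", "age": 28}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

if __name__ == "__main__":
    print(build_request(28))
    print(matching_comments(sample_response))
```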

Query string for nested field in elasticsearch ( how to make AND work?)

I am trying to hit a query string on a nested field abc.answer which does not use any analyzer. Here is the query I am using:
{
"query": {
"nested":{
"path": "abc",
"query":{
"query_string": {
"query": "photo AND delhi",
"fields": ["answer"]
}
},
"inner_hits": {
"explain": true
}
}
},
"explain" : true,
"sort" : [ {
"_score" : { }
} ]
}
I want to search for the document which has both dtp operator and bangalore in the nested field abc. But it is not showing any results.
My nested field structure :
"abc": [
{
"updatedAt": 1452073190000,
"questionId": 1,
"labelOriginal": "name",
"createdAt": 1452073190000,
"answerOriginal": "wexe",
"answer": "wexe",
"answerId": 0,
"label": "name",
"question": "what is your name?"
},
{
"updatedAt": 1452073190000,
"questionId": 2,
"labelOriginal": "mobile",
"createdAt": 1452073190000,
"answerOriginal": "9000000000",
"answer": "9000000000",
"answerId": 0,
"label": "mobile",
"question": "What is your mobile number?"
},
{
"updatedAt": 1452073190000,
"questionId": 3,
"labelOriginal": "email id",
"createdAt": 1452073190000,
"answerOriginal": "legallss#yahoo.com",
"answer": "legallss#yahoo.com",
"answerId": 0,
"label": "email Id",
"question": "What is your e-mail id ?"
},
{
"updatedAt": 1452073190000,
"questionId": 4,
"labelOriginal": "current role",
"createdAt": 1452073190000,
"answerOriginal": "dtp operator",
"answer": "DTP Operator",
"answerId": 597,
"label": "current role",
"question": "What is your current role?"
},
{
"updatedAt": 1452073190000,
"questionId": 5,
"labelOriginal": "city",
"createdAt": 1452073190000,
"answerOriginal": "bangalore",
"answer": "Bangalore",
"answerId": 23,
"label": "city",
"question": "Which city do you live in ?"
},
{
"updatedAt": 1452073190000,
"questionId": 6,
"labelOriginal": "locality",
"createdAt": 1452073190000,
"answerOriginal": "80 ft. road",
"answer": "80 Ft. Road",
"answerId": 0,
"label": "locality",
"question": "Which locality do you live in ?"
},
{
"updatedAt": 1452073190000,
"questionId": 13,
"labelOriginal": "job type",
"createdAt": 1452073190000,
"answerOriginal": "part time jobs",
"answer": "Part Time Jobs",
"answerId": 64,
"label": "Job Type",
"question": "Are you comfortable with working Full Time or Part Time?"
},
{
"labelOriginal": "userDesiredCity",
"answerOriginal": "bangalore",
"answer": "Bangalore",
"answerId": 23,
"label": "userDesiredCity"
}
]
You don't need nested functionality for this kind of query; the flat structure of a simple array is enough.
Give your rsa field include_in_parent: true:
"rsa": {
"type": "nested",
"include_in_parent": true,
"properties": {
Re-index the test documents and then use a query like this one (no nested query):
"query": {
"query_string": {
"query": "dtp operator AND bangalore",
"fields": [
"rsa.answer"
]
}
}
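The reason the nested query returns nothing is that each nested object is matched on its own, and no single answer contains both terms. The toy Python model below illustrates the difference. It is not how Lucene works internally: it assumes simple lowercase whitespace tokenization and uses two of the answers from the question.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Very rough stand-in for an analyzer: lowercase and split on spaces."""
    return set(text.lower().split())


def nested_match(objects: List[Dict[str, str]], required: Set[str]) -> bool:
    """Nested semantics: one single object must contain all required terms."""
    return any(required <= tokens(obj["answer"]) for obj in objects)


def flattened_match(objects: List[Dict[str, str]], required: Set[str]) -> bool:
    """Flattened semantics: terms may come from different objects."""
    all_terms: Set[str] = set()
    for obj in objects:
        all_terms |= tokens(obj["answer"])
    return required <= all_terms


answers = [
    {"label": "current role", "answer": "DTP Operator"},
    {"label": "city", "answer": "Bangalore"},
]
required_terms = {"dtp", "bangalore"}

if __name__ == "__main__":
    print(nested_match(answers, required_terms))     # prints False
    print(flattened_match(answers, required_terms))  # prints True
```

With include_in_parent the answers are also indexed on the parent document, which corresponds to the flattened case here.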

REQL to match string expression

I have the following json:
{
"release": {
"genres": {
"genre": "Electronic"
},
"identifiers": {
"identifier": [
{
"description": "Text",
"value": "5 709498 101026",
"type": "Barcode"
},
{
"description": "String",
"value": 5709498101026,
"type": "Barcode"
}
]
},
"status": "Accepted",
"videos": {
"video": [
{
"title": "Future 3 - Renaldo",
"duration": 446,
"description": "Future 3 - Renaldo",
"src": "http://www.youtube.com/watch?v=hpc9aQpnUjc",
"embed": true
},
{
"title": "Future 3 - Silver M from album We are the Future / 1995 Denmark / Archivos de Kraftwerkmusik",
"duration": 461,
"description": "Future 3 - Silver M from album We are the Future / 1995 Denmark / Archivos de Kraftwerkmusik",
"src": "http://www.youtube.com/watch?v=nlcHRI8iV4g",
"embed": true
},
{
"title": "Future 3 - Bubbles At Dawn",
"duration": 710,
"description": "Future 3 - Bubbles At Dawn",
"src": "http://www.youtube.com/watch?v=ABBCyvGMOFw",
"embed": true
}
]
},
"labels": {
"label": {
"catno": "APR 010CD",
"name": "April Records"
}
},
"companies": {
"company": {
"id": 26184,
"catno": "",
"name": "Voices Of Wonder",
"entity_type_name": "Published By",
"resource_url": "http://api.discogs.com/labels/26184",
"entity_type": 21
}
},
"styles": {
"style": [
"Abstract",
"IDM",
"Downtempo"
]
},
"formats": {
"format": {
"text": "",
"name": "CD",
"qty": 1,
"descriptions": {
"description": "Album"
}
}
},
"country": "Denmark",
"id": 5375,
"released": "1995-00-00",
"artists": {
"artist": {
"id": 5139,
"anv": "",
"name": "Future 3",
"role": "",
"tracks": "",
"join": ""
}
},
"title": "We Are The Future 3",
"master_id": 638422,
"tracklist": {
"track": [
{
"position": 1,
"duration": "8:04",
"title": "Future 3"
},
{
"position": 2,
"duration": "7:38",
"title": "Silver M"
},
{
"position": 3,
"duration": "7:27",
"title": "Renaldo"
},
{
"position": 4,
"duration": "6:04",
"title": "B.O.Y.D."
},
{
"position": 5,
"duration": "6:12",
"title": "Fumble"
},
{
"position": 6,
"duration": "6:12",
"title": "Dawn"
},
{
"position": 7,
"duration": "11:54",
"title": "Bubbles At Dawn"
},
{
"position": 8,
"duration": "6:03",
"title": "D.A.W.N. At 6"
},
{
"position": 9,
"duration": "8:50",
"title": 4684351684651
}
]
},
"data_quality": "Needs Vote",
"extraartists": {
"artist": [
{
"id": 2647642,
"anv": "",
"name": "Danesadwork",
"role": "Cover",
"tracks": "",
"join": ""
},
{
"id": 2647647,
"anv": "",
"name": "Djon Edvard Petersen",
"role": "Photography By",
"tracks": "",
"join": ""
},
{
"id": 114164,
"anv": "",
"name": "Anders Remmer",
"role": "Written-By",
"tracks": "",
"join": ""
},
{
"id": 435979,
"anv": "",
"name": "Jesper Skaaning",
"role": "Written-By",
"tracks": "",
"join": ""
},
{
"id": 15691,
"anv": "",
"name": "Thomas Knak",
"role": "Written-By",
"tracks": "",
"join": ""
}
]
},
"notes": "© 1995 April Records APS ℗ 1995 April Records APS"
}
}
I am trying to get those titles which end with 'At Dawn'.
I am using the following command
r.db("discogs1").table("releases").filter(function(doc){ return doc('release')('title').match('At Dawn$')})
But I get errors as follows:
RqlRuntimeError: Expected type STRING but found NUMBER in:r.db("discogs1").table("releases").filter(function(var_24) { return var_24("release")("title").match("At Dawn$"); })
I tried different combinations but I can't seem to get it to work
It seems that some of your documents don't have a row('release')('title') property that is a string. Some of them are numbers, so when you try to call .match on them, they throw an error because .match only works on strings.
To see if this is true, try the following:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().ne('STRING'))
.count()
Ideally, the result of this should be 0, since no document should have a title property that's not a string. If it's higher than 0, that's why you're getting an error.
If you want to only get documents where the title is a string, you can do the following:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().eq('STRING'))
.filter(function(doc){ return doc('release')('title').match('At Dawn$')})
This query will work because it first filters out all documents where the title is not a string.
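The same guard-then-match idea, shown in plain Python purely as an illustration (this is not the RethinkDB driver). The sample documents are made up for the example, with one numeric title to reproduce the type problem.

```python
import re
from typing import Any, Dict, List


def titles_matching(docs: List[Dict[str, Any]], pattern: str) -> List[str]:
    """Return string titles matching the pattern, skipping non-string titles."""
    regex = re.compile(pattern)
    matches = []
    for doc in docs:
        title = doc["release"]["title"]
        # Type guard first, as in the typeOf().eq('STRING') filter.
        if isinstance(title, str) and regex.search(title):
            matches.append(title)
    return matches


# Made-up sample documents; the numeric title triggers the type problem.
docs = [
    {"release": {"title": "Bubbles At Dawn"}},
    {"release": {"title": 4684351684651}},
    {"release": {"title": "We Are The Future 3"}},
]

if __name__ == "__main__":
    print(titles_matching(docs, r"At Dawn$"))  # prints ['Bubbles At Dawn']
    try:
        re.search(r"At Dawn$", docs[1]["release"]["title"])
    except TypeError as err:
        # Without the guard, matching a number fails, much like .match in ReQL.
        print("TypeError:", err)
```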
If you want to coerce all titles into strings in the query result, you can do the following:
r.db("discogs1").table("releases")
  .filter(r.row('release')('title').typeOf().ne('STRING'))
  .merge(function (row) {
    return {
      'release': {
        'title': row('release')('title').coerceTo('string')
      }
    }
  })
If you want to delete all documents where the title is not a string, you can do this:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().ne('STRING'))
.delete()
