Hello StackOverflowers,
I have hundreds of thousands of documents in the following structure. I can modify the document before loading it into Elasticsearch, like adding vectors, synonyms or other annotations. Please assume that all the documents are well attributed. Attributes vary based on the category of the product.
If there is a query, I want to show the precise product for the query.
For example, if someone is searching for "Lee jeans" I want to show all the products which are "Jeans" from the brand "Lee".
If the user searches for "Lee black jeans", I want to filter by the variant "Black".
If the user searches for "Lee spring summer jeans", I want to show only the following product.
It should tolerate typos.
It should lemmatize. For example, "Chocolate milk" is "Milk", and "Milk chocolate" is "Chocolate".
I've seen some approaches on the internet (some of the projects are outdated or no longer maintained), but I would like recommendations from the developer community here on which open-source solutions I can use and what changes I should make to the documents before loading them into Elasticsearch to achieve this.
{
"product_id": 489929,
"name_en": "Spring Summer Jeans",
"attributes": {
"category": "Pants",
"type": [
"Jeans",
"Denim"
],
"brand": "Lee",
"material": [
"Cotton"
]
},
"variants": {
"size": [
28,
30,
32,
34,
36
],
"colors": [
"Blue",
"Black"
],
"fit": [
"Regular",
"Narrow"
],
"gender": [
"Men",
"Women"
]
},
"description_en": "Quick brown fox jumps over the lazy dog.",
"variant_ids": {
"1467547": {
"size": 30,
"color": "Black",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"7487751": {
"size": 32,
"color": "Blue",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"11089927": {
"size": 32,
"color": "Blue",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"11258137": {
"size": 34,
"color": "Blue",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"13266321": {
"size": 30,
"color": "Black",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"13549929": {
"size": 30,
"color": "Blue",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"17846649": {
"size": 28,
"color": "Blue",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"22602397": {
"size": 36,
"color": "Blue",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"22709931": {
"size": 28,
"color": "Black",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"23937102": {
"size": 28,
"color": "Black",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"28519361": {
"size": 30,
"color": "Blue",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"31165878": {
"size": 36,
"color": "Black",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"31631591": {
"size": 30,
"color": "Blue",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"36914467": {
"size": 36,
"color": "Black",
"fit": "Regular",
"gender": "Men",
"in_stock": false
},
"39141069": {
"size": 28,
"color": "Blue",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"41416888": {
"size": 36,
"color": "Blue",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"43504246": {
"size": 34,
"color": "Black",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"45374599": {
"size": 34,
"color": "Blue",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"46361047": {
"size": 28,
"color": "Blue",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"46909634": {
"size": 32,
"color": "Black",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"49407526": {
"size": 32,
"color": "Black",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"54529078": {
"size": 34,
"color": "Black",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"55659499": {
"size": 28,
"color": "Blue",
"fit": "Narrow",
"gender": "Women",
"in_stock": false
},
"55762371": {
"size": 34,
"color": "Blue",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"57049076": {
"size": 36,
"color": "Black",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"57973674": {
"size": 36,
"color": "Black",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"58218538": {
"size": 28,
"color": "Black",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"58227462": {
"size": 30,
"color": "Blue",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"58232621": {
"size": 30,
"color": "Black",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"59320783": {
"size": 30,
"color": "Black",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"63244508": {
"size": 32,
"color": "Black",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"66194331": {
"size": 36,
"color": "Blue",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"71212553": {
"size": 32,
"color": "Blue",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"84143801": {
"size": 34,
"color": "Black",
"fit": "Narrow",
"gender": "Men",
"in_stock": true
},
"86881320": {
"size": 34,
"color": "Blue",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"89177537": {
"size": 32,
"color": "Black",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"90449959": {
"size": 36,
"color": "Blue",
"fit": "Narrow",
"gender": "Women",
"in_stock": true
},
"92989653": {
"size": 34,
"color": "Black",
"fit": "Regular",
"gender": "Men",
"in_stock": true
},
"93319121": {
"size": 32,
"color": "Blue",
"fit": "Regular",
"gender": "Women",
"in_stock": true
},
"95212291": {
"size": 28,
"color": "Black",
"fit": "Regular",
"gender": "Men",
"in_stock": true
}
}
}
Elasticsearch doesn't support NLP out of the box. Version 8 added NLP features, but they require a licensed version of Elasticsearch. Also, as of now it seems you can use them only at index time through an ingest pipeline; at query time you would need to write a custom wrapper. You can read my article here on how to use them at index time.
You can use the multi_match query below, with type cross_fields, against your current index mapping for the first three use cases. It gives the expected results without any NLP integration.
{
"query": {
"multi_match": {
"query": "Lee spring summer jeans",
"fields": [
"attributes.brand",
"attributes.type",
"variants.colors",
"name_en"
],
"operator": "and",
"type": "cross_fields"
}
}
}
multi_match with cross_fields doesn't support fuzziness, so the query above will not handle use case 4 (typos). You can read this answer, which shows how to use copy_to to create one field holding all the values and apply fuzziness to it. You can then combine that query with the one above in a bool clause, something like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "Lee spring summer jeans",
"fields": [
"attributes.brand",
"attributes.type",
"variants.colors",
"name_en"
],
"operator": "and",
"type": "cross_fields"
}
},
{
"match": {
"my_search_field": {
"query": "Lee spring summer jeans",
"fuzziness": 1
}
}
}
]
}
}
}
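The combined field queried above has to be populated at index time. A minimal sketch of a mapping that does this with copy_to follows, written as a JavaScript object so it can be passed to any client. The helper name buildProductMapping and the choice of the text type for every field are assumptions, not part of the original answer; only my_search_field comes from the query above.

```javascript
// Sketch of an index mapping that copies several fields into one
// combined field. The "text" type for every field is an assumption.
function buildProductMapping(combinedField) {
  const copied = { type: "text", copy_to: combinedField };
  return {
    mappings: {
      properties: {
        name_en: { ...copied },
        attributes: {
          properties: {
            brand: { ...copied },
            type: { ...copied },
          },
        },
        variants: {
          properties: {
            colors: { ...copied },
          },
        },
        // Target of copy_to; searchable, but not part of _source.
        [combinedField]: { type: "text" },
      },
    },
  };
}

const productMapping = buildProductMapping("my_search_field");
console.log(JSON.stringify(productMapping, null, 2));
```

The resulting object would be sent as the body when creating the index, before any documents are loaded, since copy_to only applies to documents indexed after the mapping exists.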
Use case 5 is not really about lemmatization; it is a synonyms use case. You can configure synonyms so that "Chocolate milk" maps to "Milk" and "Milk chocolate" maps to "Chocolate" (and likewise for other terms), and apply them at index time or query time using a custom analyzer. You can read about synonyms here.
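As a rough illustration of the synonym setup, the sketch below builds index settings with a synonym token filter and a custom analyzer that uses it. The names buildSynonymSettings, product_synonyms and product_search are hypothetical, and the two rules are just the examples from the question; how they interact with the rest of your analysis chain is best checked with the _analyze API.

```javascript
// Sketch of index settings with a synonym token filter.
// "product_synonyms" and "product_search" are hypothetical names.
function buildSynonymSettings(rules) {
  return {
    settings: {
      analysis: {
        filter: {
          product_synonyms: { type: "synonym", synonyms: rules },
        },
        analyzer: {
          product_search: {
            type: "custom",
            tokenizer: "standard",
            // Lowercase first so the rules can be written in lowercase.
            filter: ["lowercase", "product_synonyms"],
          },
        },
      },
    },
  };
}

const synonymSettings = buildSynonymSettings([
  "chocolate milk => milk",
  "milk chocolate => chocolate",
]);
console.log(JSON.stringify(synonymSettings, null, 2));
```

The analyzer can then be referenced from a field mapping as analyzer (index time) or search_analyzer (query time), depending on where you want the synonyms applied.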
I came across this Vega-Lite pie chart visualization:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple pie chart with labels.",
"data": {
"values": [
{"category": "a", "value": 4},
{"category": "b", "value": 6},
{"category": "c", "value": 10},
{"category": "d", "value": 3},
{"category": "e", "value": 7},
{"category": "f", "value": 8}
]
},
"encoding": {
"theta": {"field": "value", "type": "quantitative", "stack": true},
"color": {"field": "category", "type": "nominal", "legend": null}
},
"layer": [{
"mark": {"type": "arc", "outerRadius": 80}
}, {
"mark": {"type": "text", "radius": 90},
"encoding": {
"text": {"field": "category", "type": "nominal"}
}
}]
}
It renders as follows:
My data contains color key:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple pie chart with labels.",
"data": {
"values": [
{"category": "a", "value": 4, "color": "rgb(121, 199, 227)"},
{"category": "b", "value": 6, "color": "rgb(140, 129, 22)"},
{"category": "c", "value": 10, "color": "rgb(96, 43, 199)"},
{"category": "d", "value": 3, "color": "rgb(196, 143, 99)"},
{"category": "e", "value": 7, "color": "rgb(12, 103, 19)"},
{"category": "f", "value": 8, "color": "rgb(196, 243, 129)"}
]
},
"encoding": {
"theta": {"field": "value", "type": "quantitative", "stack": true},
"color": {"field": "color", "type": "nominal", "legend": null}
},
"layer": [{
"mark": {"type": "arc", "outerRadius": 80}
}, {
"mark": {"type": "text", "radius": 90},
"encoding": {
"text": {"field": "category", "type": "nominal"}
}
}]
}
I want to use the rgb() value stored under the color key to color each individual pie slice. I tried specifying this field in the color channel with "field": "color":
"color": {"field": "color", "type": "nominal", "legend": null}
However, this has no effect; it still renders the same as above. How can I use the color value stored in the field as the actual color?
PS: Link to above visualization.
The documentation says:
To directly encode the data value, the scale property can be set to null.
So you need to set the scale to null.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple pie chart with labels.",
"data": {
"values": [
{"category": "a", "value": 4, "color": "rgb(121, 199, 227)"},
{"category": "b", "value": 6, "color": "rgb(140, 129, 22)"},
{"category": "c", "value": 10, "color": "rgb(96, 43, 199)"},
{"category": "d", "value": 3, "color": "rgb(196, 143, 99)"},
{"category": "e", "value": 7, "color": "rgb(12, 103, 19)"},
{"category": "f", "value": 8, "color": "rgb(196, 243, 129)"}
]
},
"encoding": {
"theta": {"field": "value", "type": "quantitative", "stack": true},
"color": {"field": "color", "type": "nominal", "legend": null, "scale":null}
},
"layer": [
{"mark": {"type": "arc", "outerRadius": 80}},
{
"mark": {"type": "text", "radius": 90},
"encoding": {"text": {"field": "category", "type": "nominal"}}
}
]
}
This outputs:
I want to have two types of chart in one visual.
Desired example
Current example, Editor
You can use two layers of bar marks. The first layer is a stacked bar, with a filter that keeps only the used and free fields. The second layer shows the total field. Using xOffset, you can position the two bars side by side. Refer to the code below or the editor:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"config": {"view": {"stroke": "transparent"}, "axis": {"domainWidth": 1}},
"width": 300,
"data": {
"values": [
{"branch": "V", "free": 300, "used": 800, "total": 1000},
{"branch": "K", "free": 100, "used": 400, "total": 500},
{"branch": "D", "free": 100, "used": 200, "total": 300}
]
},
"transform": [
{"calculate": "[ datum.used, datum.free, datum.total]", "as": "table"},
{"calculate": "['used', 'free', 'total']", "as": "headline"},
{"flatten": ["table", "headline"]},
{
"calculate": "datum.headline +':'+ datum.table + ' ('+round(datum.table *100/ datum.total)+'%)'",
"as": "tooltip"
}
],
"encoding": {
"y": {"aggregate": "sum", "field": "table", "axis": {"grid": false}},
"x": {
"field": "branch",
"axis": {"grid": false, "domain": false, "labelAngle": 0, "ticks": false},
"sort": {"op": "sum", "field": "table", "order": "descending"}
},
"tooltip": {"field": "tooltip", "type": "nominal"},
"color": {
"field": "headline",
"type": "nominal",
"scale": {"range": ["#409b66", "#878787", "#1b5c9e"]}
}
},
"layer": [
{
"transform": [
{"filter": {"field": "headline", "oneOf": ["used", "free"]}}
],
"mark": {"type": "bar", "width": 15, "xOffset": -10}
},
{
"mark": {"type": "bar", "width": 15, "xOffset": 8},
"encoding": {
"y": {"field": "total", "axis": {"grid": false}, "stack": false}
}
}
]
}
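To see which rows the two layers receive, the calculate and flatten steps above can be emulated in plain JavaScript. This is only an illustration of the data shape, not Vega-Lite code; flattenBranch is a hypothetical helper.

```javascript
// Plain-JavaScript emulation of the calculate + flatten transforms above,
// for inspecting the rows the layers receive.
function flattenBranch(row) {
  const table = [row.used, row.free, row.total];
  const headline = ["used", "free", "total"];
  // flatten emits one output row per array position and keeps the
  // original fields (including total) on every row.
  return headline.map((name, i) => ({
    ...row,
    headline: name,
    table: table[i],
    tooltip:
      name + ":" + table[i] + " (" + Math.round((table[i] * 100) / row.total) + "%)",
  }));
}

const flattened = flattenBranch({ branch: "V", free: 300, used: 800, total: 1000 });
console.log(flattened.map((r) => r.tooltip));
// → [ 'used:800 (80%)', 'free:300 (30%)', 'total:1000 (100%)' ]
```

Each branch therefore yields three rows. The first layer keeps the used and free rows and stacks them, while the second layer plots the total field directly with stacking disabled.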
I'm very new to Elasticsearch and Kibana. I'm using the Vega plugin for Kibana visualizations,
but I'm not able to create a bar chart from Elasticsearch aggs.
I get the proper result when I run the query in Kibana Dev Tools.
I'm attaching the details with sample code below. After running it, I get a blank page.
Visualization Section:
{
"$schema": "https://vega.github.io/schema/vega/v3.0.json",
"autosize": "fit",
"padding": 6,
"data": [
{
"name": "traffic-revenue",
"url": {
"index": "brnl_tms_plaza",
"body": {
"size": "0",
"aggs": {
"group_by_vehicle_subcat": {
"terms": {
"field": "VehicleSubCatCode.keyword"
}
}
}
},
"format": {
"property": "aggregations.group_by_vehicle_subcat.buckets"
}
}
}
],
"scales": [
{
"name": "xscale",
"type": "band",
"domain": {
"data": "traffic-revenue",
"field": "key"
},
"range": "width",
"padding": 0.05,
"round": true
},
{
"name": "yscale",
"domain": {
"data": "traffic-revenue",
"field": "doc_count"
},
"nice": true,
"range": "height"
}
],
"axes": [
{
"orient": "bottom",
"scale": "xscale"
},
{"orient": "left", "scale": "yscale"}
],
"marks": [
{
"type": "rect",
"from": {
"data": "traffic-revenue"
},
"encode": {
"enter": {
"x": {
"scale": "xscale",
"field": "key",
"axis": {"title": "Vehicle category"}
},
"width": {
"scale": "xscale",
"band": 1
},
"y": {
"scale": "yscale",
"field": "doc_count",
"axis": {"title": "Vehicle Rate Count"}
},
"y2": {
"scale": "yscale",
"value": 0
}
},
"update": {
"fill": {"value": "steelblue"}
},
"hover": {"fill": {"value": "red"}}
}
}
]
}
Data Set
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 48,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_vehicle_subcat": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "LMV",
"doc_count": 35
},
{
"key": "BUS",
"doc_count": 3
},
{
"key": "LCV",
"doc_count": 3
},
{
"key": "MAV-5",
"doc_count": 3
},
{
"key": "MAV-4 with trailer",
"doc_count": 2
},
{
"key": "MAV-3 without trailer",
"doc_count": 1
},
{
"key": "MINI-BUS",
"doc_count": 1
}
]
}
}
}
I would recommend debugging your vega code using static data to make sure it is defined properly.
I'm not sure why, but I was able to get your visualization to draw when I set the autosize property to none and set the height and width explicitly.
Here is a vega specification based off of the one you provided which should run in the online vega editor.
{
"$schema": "https://vega.github.io/schema/vega/v3.0.json",
"autosize": "none",
"width": 400,
"height": 500,
"padding": 20,
"data": [
{
"name": "traffic-revenue",
"values": [
{"key": "a", "doc_count": 5},
{"key": "b", "doc_count": 22},
{"key": "c", "doc_count": 1},
{"key": "d", "doc_count": 7},
{"key": "e", "doc_count": 12},
{"key": "f", "doc_count": 2}
]
}
],
"scales": [
{
"name": "xscale",
"type": "band",
"domain": {
"data": "traffic-revenue",
"field": "key"
},
"range": "width",
"padding": 0.05,
"round": true
},
{
"name": "yscale",
"domain": {
"data": "traffic-revenue",
"field": "doc_count"
},
"nice": true,
"range": "height"
}
],
"axes": [
{
"orient": "bottom",
"scale": "xscale"
},
{"orient": "left", "scale": "yscale"}
],
"marks": [
{
"type": "rect",
"from": {
"data": "traffic-revenue"
},
"encode": {
"enter": {
"x": {
"scale": "xscale",
"field": "key",
"axis": {"title": "Vehicle category"}
},
"width": {
"scale": "xscale",
"band": 1
},
"y": {
"scale": "yscale",
"field": "doc_count",
"axis": {"title": "Vehicle Rate Count"}
},
"y2": {
"scale": "yscale",
"value": 0
}
},
"update": {
"fill": {"value": "steelblue"}
},
"hover": {"fill": {"value": "red"}}
}
}
]
}
You may already know this, since you have the format tag on your Elasticsearch data, but if your visualization works with statically defined data and not when you pull data from an Elasticsearch query, try looking at the data source directly using the Vega debugging functions described here: https://vega.github.io/vega/docs/api/debugging/.
Running the following in the browser console should let you look at the data in the format Vega is receiving it:
VEGA_DEBUG.view.data("")
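When comparing static data with the live response, it can help to check what the dotted property path is expected to select. The sketch below emulates that lookup in plain JavaScript against a trimmed copy of the response from the question. resolveProperty is a hypothetical helper, not a Vega or Kibana API.

```javascript
// Emulates how a dotted "property" path selects the array that becomes
// the dataset. resolveProperty is a hypothetical helper.
function resolveProperty(response, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), response);
}

// Trimmed copy of the aggregation response shown in the question.
const sampleResponse = {
  aggregations: {
    group_by_vehicle_subcat: {
      buckets: [
        { key: "LMV", doc_count: 35 },
        { key: "BUS", doc_count: 3 },
      ],
    },
  },
};

const buckets = resolveProperty(
  sampleResponse,
  "aggregations.group_by_vehicle_subcat.buckets"
);
console.log(buckets);
```

If the dataset seen in the browser console is the whole response object rather than an array like this one, the format setting is probably not being applied. In Kibana's documented examples, format sits next to url inside the data entry rather than inside url, so that placement is worth checking.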
I have the following json:
{
"release": {
"genres": {
"genre": "Electronic"
},
"identifiers": {
"identifier": [
{
"description": "Text",
"value": "5 709498 101026",
"type": "Barcode"
},
{
"description": "String",
"value": 5709498101026,
"type": "Barcode"
}
]
},
"status": "Accepted",
"videos": {
"video": [
{
"title": "Future 3 - Renaldo",
"duration": 446,
"description": "Future 3 - Renaldo",
"src": "http://www.youtube.com/watch?v=hpc9aQpnUjc",
"embed": true
},
{
"title": "Future 3 - Silver M from album We are the Future / 1995 Denmark / Archivos de Kraftwerkmusik",
"duration": 461,
"description": "Future 3 - Silver M from album We are the Future / 1995 Denmark / Archivos de Kraftwerkmusik",
"src": "http://www.youtube.com/watch?v=nlcHRI8iV4g",
"embed": true
},
{
"title": "Future 3 - Bubbles At Dawn",
"duration": 710,
"description": "Future 3 - Bubbles At Dawn",
"src": "http://www.youtube.com/watch?v=ABBCyvGMOFw",
"embed": true
}
]
},
"labels": {
"label": {
"catno": "APR 010CD",
"name": "April Records"
}
},
"companies": {
"company": {
"id": 26184,
"catno": "",
"name": "Voices Of Wonder",
"entity_type_name": "Published By",
"resource_url": "http://api.discogs.com/labels/26184",
"entity_type": 21
}
},
"styles": {
"style": [
"Abstract",
"IDM",
"Downtempo"
]
},
"formats": {
"format": {
"text": "",
"name": "CD",
"qty": 1,
"descriptions": {
"description": "Album"
}
}
},
"country": "Denmark",
"id": 5375,
"released": "1995-00-00",
"artists": {
"artist": {
"id": 5139,
"anv": "",
"name": "Future 3",
"role": "",
"tracks": "",
"join": ""
}
},
"title": "We Are The Future 3",
"master_id": 638422,
"tracklist": {
"track": [
{
"position": 1,
"duration": "8:04",
"title": "Future 3"
},
{
"position": 2,
"duration": "7:38",
"title": "Silver M"
},
{
"position": 3,
"duration": "7:27",
"title": "Renaldo"
},
{
"position": 4,
"duration": "6:04",
"title": "B.O.Y.D."
},
{
"position": 5,
"duration": "6:12",
"title": "Fumble"
},
{
"position": 6,
"duration": "6:12",
"title": "Dawn"
},
{
"position": 7,
"duration": "11:54",
"title": "Bubbles At Dawn"
},
{
"position": 8,
"duration": "6:03",
"title": "D.A.W.N. At 6"
},
{
"position": 9,
"duration": "8:50",
"title": 4684351684651
}
]
},
"data_quality": "Needs Vote",
"extraartists": {
"artist": [
{
"id": 2647642,
"anv": "",
"name": "Danesadwork",
"role": "Cover",
"tracks": "",
"join": ""
},
{
"id": 2647647,
"anv": "",
"name": "Djon Edvard Petersen",
"role": "Photography By",
"tracks": "",
"join": ""
},
{
"id": 114164,
"anv": "",
"name": "Anders Remmer",
"role": "Written-By",
"tracks": "",
"join": ""
},
{
"id": 435979,
"anv": "",
"name": "Jesper Skaaning",
"role": "Written-By",
"tracks": "",
"join": ""
},
{
"id": 15691,
"anv": "",
"name": "Thomas Knak",
"role": "Written-By",
"tracks": "",
"join": ""
}
]
},
"notes": "© 1995 April Records APS ℗ 1995 April Records APS"
}
}
I am trying to get those titles which end with 'At Dawn'.
I am using the following command
r.db("discogs1").table("releases").filter(function(doc){ return doc('release')('title').match('At Dawn$')})
But I get errors as follows:
RqlRuntimeError: Expected type STRING but found NUMBER in:r.db("discogs1").table("releases").filter(function(var_24) { return var_24("release")("title").match("At Dawn$"); })
I tried different combinations, but I can't seem to get it to work.
It seems that some of your documents don't have a row('release')('title') property that is a string. Some of them are numbers, so when you try to call .match on them, they throw an error because .match only works on strings.
To see if this is true, try the following:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().ne('STRING'))
.count()
Ideally, the result of this should be 0, since no document should have a title property that's not a string. If it's higher than 0, that's why you're getting an error.
If you want to only get documents where the title is a string, you can do the following:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().eq('STRING'))
.filter(function(doc){ return doc('release')('title').match('At Dawn$')})
This query will work because it filters out all documents where the title is not a string.
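The same two-step idea (check the type first, then apply the regular expression) can be illustrated in plain JavaScript. This is not ReQL; the sample array is a hypothetical stand-in for the table, and titlesEndingWith is an illustrative helper.

```javascript
// Plain-JavaScript illustration of the type check followed by the
// regex match.
function titlesEndingWith(docs, suffixPattern) {
  return docs
    .map((doc) => doc.release.title)
    // Keep only string titles, mirroring typeOf().eq('STRING').
    .filter((title) => typeof title === "string")
    .filter((title) => suffixPattern.test(title));
}

// Hypothetical sample data: two string titles and one numeric title.
const sampleDocs = [
  { release: { title: "We Are The Future 3" } },
  { release: { title: "Bubbles At Dawn" } },
  { release: { title: 4684351684651 } },
];

console.log(titlesEndingWith(sampleDocs, /At Dawn$/));
// → [ 'Bubbles At Dawn' ]
```

Without the type check, the numeric title would reach the match step, which is the situation that produces the "Expected type STRING but found NUMBER" error in ReQL.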
If you want to coerce all titles into strings in the query result (merge does not write anything back to the table), you can do the following:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().ne('STRING'))
.merge(function (row) {
return {
'release': {
'title': row('release')('title').coerceTo('string')
}
}
})
If you want to delete all documents where the title is not a string, you can do this:
r.db("discogs1").table("releases")
.filter(r.row('release')('title').typeOf().ne('STRING'))
.delete()