Vega-Lite / Kibana difference to manage URL object - elasticsearch

I found an interesting article that used several datasets in Vega-Lite. The tabular data were combined by key, as in a relational database.
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"title": "Test",
"datasets": {
"stores": [
{"cmdb_id1": 1, "group": "type1"},
{"cmdb_id1": 2, "group": "type1"},
{"cmdb_id1": 3, "group": "type2"}
],
"labelsTimelines": [
{"cmdb_id2": 1, "value": 83},
{"cmdb_id2": 2, "value": 53},
{"cmdb_id2": 3, "value": 23}
]
},
"data": {"name": "stores"},
"transform": [
{
"lookup": "cmdb_id1",
"from": {
"data": {"name": "labelsTimelines"},
"key": "cmdb_id2",
"fields": ["value"]
}
}
],
"mark": "bar",
"encoding": {
"y": {"aggregate": "sum", "field": "value", "type": "quantitative"},
"x": {"field": "group", "type": "ordinal"}
}
}
Vega Editor
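To make the join concrete, here is a minimal Python sketch of what the lookup transform followed by the sum aggregate computes on the two inline datasets above. The helper names `lookup` and `sum_by` are made up for illustration and are not part of Vega-Lite.

```python
from collections import defaultdict

# The two inline datasets from the spec above.
stores = [
    {"cmdb_id1": 1, "group": "type1"},
    {"cmdb_id1": 2, "group": "type1"},
    {"cmdb_id1": 3, "group": "type2"},
]
labels_timelines = [
    {"cmdb_id2": 1, "value": 83},
    {"cmdb_id2": 2, "value": 53},
    {"cmdb_id2": 3, "value": 23},
]


def lookup(primary, secondary, primary_key, secondary_key, fields):
    """Copy `fields` from the matching secondary row onto each primary row."""
    by_key = {row[secondary_key]: row for row in secondary}
    joined = []
    for row in primary:
        match = by_key.get(row[primary_key], {})
        merged = dict(row)
        for field in fields:
            merged[field] = match.get(field)
        joined.append(merged)
    return joined


def sum_by(rows, group_field, value_field):
    """Sum `value_field` per distinct `group_field`, like the bar chart's y encoding."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field] or 0
    return dict(totals)


joined = lookup(stores, labels_timelines, "cmdb_id1", "cmdb_id2", ["value"])
totals = sum_by(joined, "group", "value")
```

With the data above, `totals` holds one summed bar height per group.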
This raised the question of whether the same result can be obtained using the construction:
"data": {"url": "...."}
I changed the source to an Elasticsearch query:
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"datasets": {
"stores": [{
"url": {
"%context%": "true"
"index": "test_cmdb"
"body": {
"size": 1000,
"_source": ["cmdb_id", "street","group"]
}
}
format: {property: "hits.hits"}
}]}
"data": {
"name": "stores"
},
"encoding": {
"x": {"field": "url.body.size", "type": "ordinal", "title": "X"},
"y": {"field": "url.body.size", "type": "ordinal", "title": "Y"}
},
"layer": [
{
"mark": "rect",
"encoding": {
"tooltip": [
{"field": "url"}]
}
}
]
}
I understand that there is a syntax error somewhere, because no data came back from Elasticsearch.
Thanks in advance!

No, it is not currently possible to specify URL data within top-level "datasets". The relevant open feature request in Vega-Lite is here: https://github.com/vega/vega-lite/issues/4004.

You're much better off using Vega rather than Vega-Lite for this. In Vega you can specify as many datasets as you like, each with its own URL. For example:
...
data: [
{
name: dataset_1
url: {
...
}
}
{
name: dataset_2
url: {
...
}
}
]
...
This can actually get very interesting since it means you can combine data from multiple indices into one visualisation.
I know this is late, but figured this might help people who are looking around.
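As a rough illustration of the multi-dataset idea, the following Python sketch assembles a Vega spec with two URL datasets and joins them with Vega's lookup transform. It is only a sketch under assumptions: the second index name `test_values`, the `value` field stored there, and the helper names are hypothetical, and the schema version should match whatever your Kibana release supports.

```python
import json


def es_dataset(name, index, source_fields):
    """Build one Vega dataset backed by a Kibana Elasticsearch query."""
    return {
        "name": name,
        "url": {
            "%context%": True,
            "index": index,
            "body": {"size": 1000, "_source": source_fields},
        },
        # Each hit becomes one row; document fields live under _source.
        "format": {"property": "hits.hits"},
    }


def build_spec():
    # "test_cmdb" comes from the question; "test_values" is a made-up index.
    stores = es_dataset("stores", "test_cmdb", ["cmdb_id", "street", "group"])
    labels = es_dataset("labelsTimelines", "test_values", ["cmdb_id", "value"])
    # Join labels onto stores by cmdb_id using the Vega lookup transform.
    stores["transform"] = [
        {
            "type": "lookup",
            "from": "labelsTimelines",
            "key": "_source.cmdb_id",
            "fields": ["_source.cmdb_id"],
            "values": ["_source.value"],
            "as": ["value"],
        }
    ]
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "data": [labels, stores],
    }


spec = build_spec()
spec_json = json.dumps(spec, indent=2)
```

The resulting `spec_json` still needs scales and marks before it renders anything; only the data section is sketched here.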

Related

Filter documents out of the facet count in enterprise search

We use enterprise search indexes to store items that can be tagged by multiple tenants.
For example:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
],
}
...
Can I modify the request so that no tags with "company_id" = 2 are returned?
I have a solution that involves modifying the results to strip the extra data after they are retrieved but I'm looking for a better solution.
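For reference, the post-processing workaround mentioned above can be sketched roughly as follows in Python. It assumes the facet values are the JSON-encoded strings shown in the response; the function name `facets_for_company` is made up for illustration.

```python
import json


def facets_for_company(facet_data, company_id):
    """Keep only facet entries whose JSON-encoded tag belongs to company_id."""
    kept = []
    for entry in facet_data:
        tag = json.loads(entry["value"])
        if tag.get("company_id") == company_id:
            kept.append(entry)
    return kept


# The "data" array from the facet response above.
facet_data = [
    {"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
    {"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
]

company_one = facets_for_company(facet_data, 1)
```

This runs on the client after the search returns, so the facet counts are still computed over every tenant's tags on the server.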

Hl7 FHIR Extensions: nesting, slicing, ids and discriminators

Due to a (somewhat) special requirement to store data alongside a FHIR resource, I am looking for the cleanest way to extend it to hold that extra data, which is basically dynamic key-value lists.
On the one hand I have something like the following: 0..* sets of key-value pairs, which I have tried to store by nesting three extensions (adjusting the context at each level), with the innermost one limited to the accepted value types (string/integer). This does not allow me to repeat ids, but I am getting around that by using prefixes.
"n-lists-of-key-value": [
{
"list1-key1": "list1-value1",
"list1-key2": "list1-value1",
"list1-key3": "list1-value1"
},
{
"list2-key1": "list2-value1",
"list2-key2": "list2-value2"
}
]
becomes:
···,
"extension": [ {
  "id": "n-lists-of-key-value",
  "url": "http://x.org/fhir/Extension/NListOfKeyValueExtension",
  "extension": [ {
    "id": "list_1",
    "url": "http://x.org/fhir/Extension/ListKeyValueExtension",
    "extension": [ {
      "id": "list1-key1",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list1-value1"
    }, {
      "id": "list1-key2",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list1-value1"
    }, {
      "id": "list1-key3",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list1-value1"
    } ]
  }, {
    "id": "list_2",
    "url": "http://x.org/fhir/Extension/ListKeyValueExtension",
    "extension": [ {
      "id": "list2-key1",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list2-value1"
    }, {
      "id": "list2-key2",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list2-value2"
    } ]
  } ]
} ],
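The mapping from the plain key-value lists to the nested extensions can be sketched in Python as below. This only mirrors the structure of the example; the function names are invented, the `list_N` ids follow the numbering used above, and nothing here validates against a FHIR profile.

```python
BASE = "http://x.org/fhir/Extension/"


def key_value_extension(key, value):
    """Innermost extension: one key with a string or integer value."""
    ext = {"id": key, "url": BASE + "KeyValueExtension"}
    if isinstance(value, bool):
        raise TypeError("only string and integer values are accepted")
    if isinstance(value, int):
        ext["valueInteger"] = value
    elif isinstance(value, str):
        ext["valueString"] = value
    else:
        raise TypeError("only string and integer values are accepted")
    return ext


def lists_to_extension(root_id, lists):
    """Wrap 0..* key-value dicts into the three-level nested extension."""
    return {
        "id": root_id,
        "url": BASE + "NListOfKeyValueExtension",
        "extension": [
            {
                "id": "list_%d" % number,
                "url": BASE + "ListKeyValueExtension",
                "extension": [key_value_extension(k, v) for k, v in pairs.items()],
            }
            for number, pairs in enumerate(lists, start=1)
        ],
    }


source = [
    {"list1-key1": "list1-value1", "list1-key2": "list1-value1"},
    {"list2-key1": "list2-value1", "list2-key2": "list2-value2"},
]
nested = lists_to_extension("n-lists-of-key-value", source)
```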
But on the other hand I have this, which is what I find most difficult to model, because trying to reuse the extension definitions I run into (conceptual?) difficulties with slicing and the use of the discriminators, failing to get past the error "Element matches more than one slice".
"my-3-fixed-named-sets": {
"fixed-named-set1": {
"list1-key1": "list1-value1",
"list1-key2": "list1-value2",
"list1-key3": "list1-value3"
},
"fixed-named-set2": {
"list2-key1": "list2-value1",
"list2-key2": "list2-value2"
},
"fixed-named-set3": {
"list2-key1": "list2-value1",
"list2-key2": "list2-value2"
}
}
becomes:
···,
"extension": [ {
  "id": "my-3-fixed-named-sets",
  "url": "http://x.org/fhir/Extension/NListOfKeyValueExtension",
  "extension": [ {
    "id": "fixed-named-set1",
    "url": "http://x.org/fhir/Extension/ListKeyValueExtension",
    "extension": [ {
      "id": "list1-key1",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list1-value1"
    }, {
      "id": "list1-key2",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list1-value2"
    }, {
      "id": "list1-key3",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list1-value3"
    } ]
  }, {
    "id": "fixed-named-set2",
    "url": "http://x.org/fhir/Extension/ListKeyValueExtension",
    "extension": [ {
      "id": "list2-key1",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list2-value1"
    }, {
      "id": "list2-key2",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list2-value2"
    } ]
  }, {
    "id": "fixed-named-set3",
    "url": "http://x.org/fhir/Extension/ListKeyValueExtension",
    "extension": [ {
      "id": "list2-key1",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list2-value1"
    }, {
      "id": "list2-key3",
      "url": "http://x.org/fhir/Extension/KeyValueExtension",
      "valueString": "list2-value3"
    } ]
  } ]
} ],
I am aware that by indiscriminately adding new extensions with different names/URLs I can end up ingesting all the data, but I suspect there is a much cleaner way. I am probably failing in how to add an extra condition to the discriminator (https://hl7.org/fhir/profiling.html#slicing) so that I can point to the proper slice, or in the very fact of extending extensions or extending value[x] slices.
"discriminator": [
{
"type": "value",
"path": "url"
},
{ "???": "??Extension.value.slideName??
···
So my specific question is: can I slice the value[x] of the extension of a specific profile, and in turn make those slices extensions? The latter is needed to hold n key-value lists.
Any comment is very welcome :)

Cannot sort search results within nested objects in Elasticsearch 7

I want to sort objects in ascending order but the sort doesn't work.
Here is a sort query below.
"sort":[
{
"category.position": {
"order":"asc",
"mode":"min",
"nested": {
"path": "category",
"filter": {
"term": {"category_category_id":42} }
}
}
}]
And here are the objects below.
"name": "Yeti",
"category": [
{
"category_id": 42,
"name": "Raamiga",
"position": 3
},
],
"name": "Venus",
"category": [
{
"category_id": 42,
"name": "Raamiga",
"position": 4
}
],
Please, help! Many thanks in advance!
Solved. There was a typo: it must be "category.category_id" instead of "category_category_id".
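For illustration, here is the corrected sort clause written as a Python dict, plus a small local function that orders the two sample documents the way the nested sort is expected to. The names `build_sort` and `sort_locally` are made up; the local sort is only a stand-in to show the intent, not how Elasticsearch evaluates the query.

```python
def build_sort(category_id):
    """Nested sort on category.position, filtered on the full nested field path."""
    return [
        {
            "category.position": {
                "order": "asc",
                "mode": "min",
                "nested": {
                    "path": "category",
                    # The filter field must include the nested path prefix.
                    "filter": {"term": {"category.category_id": category_id}},
                },
            }
        }
    ]


def sort_locally(docs, category_id):
    """Order docs by the smallest position among matching categories."""
    def key(doc):
        positions = [
            c["position"] for c in doc["category"] if c["category_id"] == category_id
        ]
        # Documents without a matching category go last.
        return min(positions) if positions else float("inf")

    return sorted(docs, key=key)


docs = [
    {"name": "Venus", "category": [{"category_id": 42, "name": "Raamiga", "position": 4}]},
    {"name": "Yeti", "category": [{"category_id": 42, "name": "Raamiga", "position": 3}]},
]
sort_clause = build_sort(42)
ordered_names = [d["name"] for d in sort_locally(docs, 42)]
```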

How to allow AJAX endpoints in mkdocs

The goal is to get Vega visuals working in MkDocs, driven by a local AJAX server.
This one is hard to give an example for, but I will. Before I do that: the problem is that I want to use various AJAX endpoints to drive Vega visuals in MkDocs, but I run into CORS permission errors.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://machine1:8080/dataflare. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 200.
I have struggled to find documentation on how to solve this. Does anyone know how to enable this in MkDocs?
The sample is long but this works until you use a local data source other than the vega data. When you change the AJAX endpoint from "url": "https://vega.github.io/vega/data/flare.json" to http://myhost:8080/shortflare (or fullflare endpoint) you get the CORS permission error above.
So should the MkDocs client mark the endpoint as a safe cross-site source, or should the Bottle AJAX server be sending a CORS header? I don't understand why the original AJAX request works when the Bottle endpoint does not.
Now the hard part. Showing an example.
This simple Bottle server mimics the original flare data as an AJAX endpoint:
sample of data is on http://myhost:8080/shortflare
full dataset http://myhost:8080/fullflare is served via proxy from https://vega.github.io/vega/data/flare.json
from bottle import route, run, template
import requests
DATAFLARE = '''
[
{
"id": 1,
"name": "flare"
},
{
"id": 2,
"name": "analytics",
"parent": 1
},
{
"id": 3,
"name": "cluster",
"parent": 2
},
{
"id": 4,
"name": "AgglomerativeCluster",
"parent": 3,
"size": 3938
}
]
'''
# Serve the small inline sample.
@route('/shortflare')
def getflare():
    return DATAFLARE

# Proxy the full dataset from the Vega site.
@route('/fullflare')
def proxyflare():
    return requests.get('https://vega.github.io/vega/data/flare.json').text

if __name__ == "__main__":
    run(host='0.0.0.0', port=8080, debug=True)
mkdocs setup (i.e mkdocs.yml)
site_name: VEGA
dev_addr: '0.0.0.0:2001'
theme:
  name: material
  nav_style: dark
  palette:
    accent: pink
    primary: lime
plugins:
  - search
  - charts
markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        - name: vegalite
          class: vegalite
          format: !!python/name:mkdocs_charts_plugin.fences.fence_vegalite
extra_javascript:
  - https://cdn.jsdelivr.net/npm/vega@5
  - https://cdn.jsdelivr.net/npm/vega-lite@5
  - https://cdn.jsdelivr.net/npm/vega-embed@6
The virtual environment requires:
mkdocs==1.2.3
mkdocs-charts-plugin==0.0.6
mkdocs-material==7.3.6
and the markdown to make the vegalite graphic (add this in index.md or anywhere)
Relational maps.
```vegalite
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"description": "An example of Cartesian layouts for a node-link diagram of hierarchical data.",
"width": 600,
"height": 1600,
"padding": 5,
"signals": [
{
"name": "labels", "value": true,
"bind": {"input": "checkbox"}
},
{
"name": "layout", "value": "tidy",
"bind": {"input": "radio", "options": ["tidy", "cluster"]}
},
{
"name": "links", "value": "diagonal",
"bind": {
"input": "select",
"options": ["line", "curve", "diagonal", "orthogonal"]
}
},
{
"name": "separation", "value": false,
"bind": {"input": "checkbox"}
}
],
"data": [
{
"name": "tree",
"url": "https://vega.github.io/vega/data/flare.json",
"transform": [
{
"type": "stratify",
"key": "id",
"parentKey": "parent"
},
{
"type": "tree",
"method": {"signal": "layout"},
"size": [{"signal": "height"}, {"signal": "width - 100"}],
"separation": {"signal": "separation"},
"as": ["y", "x", "depth", "children"]
}
]
},
{
"name": "links",
"source": "tree",
"transform": [
{ "type": "treelinks" },
{
"type": "linkpath",
"orient": "horizontal",
"shape": {"signal": "links"}
}
]
}
],
"scales": [
{
"name": "color",
"type": "linear",
"range": {"scheme": "magma"},
"domain": {"data": "tree", "field": "depth"},
"zero": true
}
],
"marks": [
{
"type": "path",
"from": {"data": "links"},
"encode": {
"update": {
"path": {"field": "path"},
"stroke": {"value": "#ccc"}
}
}
},
{
"type": "symbol",
"from": {"data": "tree"},
"encode": {
"enter": {
"size": {"value": 100},
"stroke": {"value": "#fff"}
},
"update": {
"x": {"field": "x"},
"y": {"field": "y"},
"fill": {"scale": "color", "field": "depth"}
}
}
},
{
"type": "text",
"from": {"data": "tree"},
"encode": {
"enter": {
"text": {"field": "name"},
"fontSize": {"value": 9},
"baseline": {"value": "middle"}
},
"update": {
"x": {"field": "x"},
"y": {"field": "y"},
"dx": {"signal": "datum.children ? -7 : 7"},
"align": {"signal": "datum.children ? 'right' : 'left'"},
"opacity": {"signal": "labels ? 1 : 0"}
}
}
}
]
}
```
I actually found out why. The bottle server would need to send the proper header. This is done by adding these few lines to the bottle server.
from bottle_cors_plugin import cors_plugin
from bottle import app

app = app()
app.install(cors_plugin('*'))

if __name__ == "__main__":
    run(app=app, host='0.0.0.0', port=8080, debug=True)
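To show what "the proper header" means on the wire, here is a standard-library-only Python sketch that serves a small JSON payload with `Access-Control-Allow-Origin` set and then reads the header back. It deliberately swaps Bottle for `http.server` so it runs without extra packages; the handler, function, and sample data names are illustrative, and `*` allows any origin, which may be too permissive outside local development.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny stand-in for the flare sample data.
SAMPLE = [{"id": 1, "name": "flare"}, {"id": 2, "name": "analytics", "parent": 1}]


class CorsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(SAMPLE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # This is the header the browser was complaining about.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_cors_header():
    """Start the server on a free local port, fetch once, return (header, data)."""
    server = HTTPServer(("127.0.0.1", 0), CorsHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # Bypass any configured proxy for this local request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = "http://127.0.0.1:%d/shortflare" % port
        with opener.open(url, timeout=5) as resp:
            header = resp.headers.get("Access-Control-Allow-Origin")
            data = json.loads(resp.read().decode("utf-8"))
        return header, data
    finally:
        server.shutdown()
        server.server_close()


header, data = fetch_cors_header()
```

A browser applies the same check: it only hands the response to the page's script when this header permits the page's origin.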

Highlight on ElasticSearch autocomplete

I have the following data to be indexed on ElasticSearch.
I want to implement an autocomplete feature, and highlight why a specific document matched a query.
These are the settings of my index:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"autocomplete_filter"
]
}
}
}
}
}
Index Analyzing
Splits text on word boundaries.
Removes punctuation.
Lowercases.
Edge n-grams each token.
So the Inverted Index looks like:
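As a rough illustration of the steps above, the following Python sketch tokenizes, lowercases and edge-n-grams two of the names and collects them into an inverted index. The document ids and function names are invented for the example. It also assumes lowercasing happens as listed; in Elasticsearch a custom analyzer only lowercases if a `lowercase` filter is included in its filter list.

```python
import re
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=15):
    """All prefixes of token with length between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text):
    """Split on word boundaries, lowercase, then edge-n-gram every token."""
    tokens = re.findall(r"\w+", text.lower())
    return [gram for token in tokens for gram in edge_ngrams(token)]


def build_inverted_index(docs):
    """Map every indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for term in analyze(name):
            index[term].add(doc_id)
    return index


# Hypothetical ids; the names are taken from the search results shown later.
docs = {1: "Software AG", 2: "is soft ware ok"}
index = build_inverted_index(docs)
```

Looking up the search term `soft` in `index` then returns both documents, because "software" was indexed under every one of its prefixes.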
This is how I defined the mappings for a name field:
{
"index_type": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
When I query:
GET http://localhost:9200/index/type/_search
{
"query": {
"match": {
"name": "soft"
}
},
"highlight": {
"fields" : {
"name" : {}
}
}
}
Search for: soft
After the standard tokenizer is applied, "soft" is the term looked up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I would expect only "soft" to be highlighted, not the whole word:
{
"hits": [
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
},
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> AG"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> AG2"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> AG good <em>software</em> better"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> AG"
]
}
},
{
"_source": {
"name": "is soft ware ok"
},
"highlight": {
"name": [
"is <em>soft</em> ware ok"
]
}
}
]
}
Search for: software ag
After the standard tokenizer is applied, "software ag" is split into "software" and "ag", which are looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I would expect only "software" and "ag" to be highlighted, not the whole words around them:
{
"hits": [
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG2</em>"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em> good <em>software</em> better"
]
}
},
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
}
]
}
I read the highlighting documentation for Elasticsearch, but I cannot understand how the highlighting is performed. For the two examples above I expect only the token matched in the inverted index to be highlighted, not the whole word.
Can anyone explain how to highlight only the value that was passed in?
Update
It seems that the autocomplete on the Elasticsearch website is implemented on the server side in much the same way as mine. However, they appear to highlight the matched query on the client.
If that is what they do, then I suspect there is no proper way to do it on the Elasticsearch side, so I implemented the highlighting in my server-side code rather than on the client (as they seem to do).
My server-side implementation (using PHP) is:
public function search($term)
{
    $params = [
        'index' => $this->getIndexName(),
        'type' => $this->getIndexType(),
        'body' => [
            'query' => [
                'match' => [
                    'name' => $term
                ]
            ]
        ]
    ];
    $results = $this->client->search($params);
    $hits = $results['hits']['hits'];
    $data = [];
    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';
    // Escape the term so regex metacharacters in user input are matched literally.
    $pattern = '/(' . preg_quote($term, '/') . ')/i';
    foreach ($hits as $hit) {
        $data[] = [
            $hit['_source']['id'],
            $hit['_source']['name'],
            preg_replace($pattern, "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))
        ];
    }
    return $data;
}
This outputs what I was aiming for with this question:
I added a bounty to see if there is a solution at the Elasticsearch level to achieve what I described above.
As of the latest version of Elasticsearch this is not possible, as the highlighting documentation does not mention any setting or query for it. I checked the Elastic autocomplete example in the browser console, under the XHR requests tab, and found the following response for the keyword "att".
url - https://search.elastic.co/suggest?q=att
{
"current_page": 1,
"last_page": 4,
"total_hits": 49,
"hits": [
{
"tags": [],
"url": "/elasticon/tour/2016/jp/not-attending",
"section": "Elasticon",
"title": "Not <em>Attending</em> - JP"
},
{
"section": "Elasticon",
"title": "<em>Attending</em> from Training - JP",
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-training"
},
{
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-keynote",
"title": "<em>Attending</em> from Keynote - JP",
"section": "Elasticon"
},
{
"tags": [],
"url": "/elasticon/tour/2016/not-attending",
"section": "Elasticon",
"title": "Thank You - Not <em>Attending</em>"
},
{
"tags": [],
"url": "/elasticon/tour/2016/attending",
"section": "Elasticon",
"title": "Thank You - <em>Attending</em>"
},
{
"section": "Blog",
"title": "What It's Like to <em>Attend</em> Elastic Training",
"tags": [],
"url": "/blog/what-its-like-to-attend-elastic-training"
},
{
"tags": "Elasticsearch",
"url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
"section": "Docs/",
"title": "Highlighting <em>attachments</em>"
},
{
"title": "<em>attachments</em> » email",
"section": "Docs/",
"tags": "Logstash",
"url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
},
{
"section": "Docs/",
"title": "Configuring Email <em>Attachments</em> » Actions",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
},
{
"url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
"tags": "Watcher",
"title": "HipChat Action <em>Attributes</em> » Actions",
"section": "Docs/"
},
{
"title": "Slack Action <em>Attributes</em> » Actions",
"section": "Docs/",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
}
],
"aggs": {
"sections": [
{
"Elasticon": 5
},
{
"Blog": 1
},
{
"Docs/": 43
}
],
"top_tags": [
{
"XPack": 14
},
{
"Elasticsearch": 12
},
{
"Watcher": 9
},
{
"Logstash": 4
},
{
"Clients": 3
},
{
"Shield": 1
}
]
}
}
But on the frontend only "att" is highlighted in the autosuggest results. Hence they are handling the highlighting in the browser layer.
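A minimal sketch of that kind of highlighting outside Elasticsearch, written in Python: it wraps only the typed prefix at the start of each word, escapes the query terms for the regex, and HTML-escapes the text. The function name and the choice of `<strong>` tags are assumptions for illustration, not what the Elastic site actually runs.

```python
import html
import re


def highlight_terms(text, query, before="<strong>", after="</strong>"):
    """Wrap each query term where it occurs as a word prefix in text."""
    terms = [t for t in query.split() if t]
    if not terms:
        return html.escape(text)
    # Longest terms first so the alternation prefers the longer match.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + ")", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append(before + html.escape(match.group(0)) + after)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


single = highlight_terms("SoftwareRocks everytime", "soft")
multi = highlight_terms("Software AG2", "software ag")
```

Anchoring on a word boundary mirrors the edge n-gram behaviour, where only prefixes of a token can match.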
