How do I sort colour stacks by a specific field in Vega-Lite in this example?

I'm trying to work out how to sort a bar chart where the color channel is used to encode some information, as in the chart linked below, and, well… I'm stumped.
I'm trying to sort the stacked colors by 'yield', so that items with the largest yield are at the bottom, while keeping the grouping based on 'site'.
Is this possible with Vega-Lite?
Here's what I assumed would handle the sorting, based on what I read in the documentation on sorting, but I'm not having much luck.
"encoding": {
"color": {
"type": "nominal",
"field": "site",
"sort": {
"field":"yield",
"op": "count",
"order": "ascending"
}
},
"x": {"type": "nominal", "field": "variety"},
"y": {"type": "quantitative", "aggregate": "sum", "field": "yield"}
}
What do I need to do to sort a bar chart in this way?
Here's the chart in Vega Editor

You can use the order channel as described at https://vega.github.io/vega-lite/docs/stack.html#order
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"data": {"url": "data/barley.json"},
"mark": "bar",
"encoding": {
"color": {"type": "nominal", "field": "site"},
"y": {"type": "quantitative", "aggregate": "sum", "field": "yield"},
"order": {"aggregate": "sum", "field": "yield", "type": "quantitative"}
}
}


Vega-Lite / Kibana difference to manage URL object

I found an interesting article that used several data models on Vega-Lite. Tabular data were combined by key like in relational databases.
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"title": "Test",
"datasets": {
"stores": [
{"cmdb_id1": 1, "group": "type1"},
{"cmdb_id1": 2, "group": "type1"},
{"cmdb_id1": 3, "group": "type2"}
],
"labelsTimelines": [
{"cmdb_id2": 1, "value": 83},
{"cmdb_id2": 2, "value": 53},
{"cmdb_id2": 3, "value": 23}
]
},
"data": {"name": "stores"},
"transform": [
{
"lookup": "cmdb_id1",
"from": {
"data": {"name": "labelsTimelines"},
"key": "cmdb_id2",
"fields": ["value"]
}
}
],
"mark": "bar",
"encoding": {
"y": {"aggregate": "sum", "field": "value", "type": "quantitative"},
"x": {"field": "group", "type": "ordinal"}
}
}
Vega Editor
The question arose as to whether it was possible to obtain the same result using the construction:
"data": {"url": "...."}
I changed the source to an Elasticsearch query:
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"datasets": {
"stores": [{
"url": {
"%context%": "true"
"index": "test_cmdb"
"body": {
"size": 1000,
"_source": ["cmdb_id", "street","group"]
}
}
format: {property: "hits.hits"}
}]}
"data": {
"name": "stores"
},
"encoding": {
"x": {"field": "url.body.size", "type": "ordinal", "title": "X"},
"y": {"field": "url.body.size", "type": "ordinal", "title": "Y"}
},
"layer": [
{
"mark": "rect",
"encoding": {
"tooltip": [
{"field": "url"}]
}
}
]
}
I understand that there is a syntax error; the data did not come from Elasticsearch.
Thanks in advance!
No, it is not currently possible to specify URL data within top-level "datasets". The relevant open feature request in Vega-Lite is here: https://github.com/vega/vega-lite/issues/4004.
You're much better off using Vega rather than Vega-Lite for this. In Vega you can specify as many datasets as you like with a URL. For example...
...
data: [
{
name: dataset_1
url: {
...
}
}
{
name: dataset_2
url: {
...
}
}
]
...
This can actually get very interesting since it means you can combine data from multiple indices into one visualisation.
I know this is late, but figured this might help people who are looking around.

How to sort data in Elasticsearch based on the filter data

I am relatively new to Elasticsearch. I have data stored in Elasticsearch in the following way:
[{
"name": "user1",
"city": [{
"name": "city1",
"count": 18
},{
"name": "city2",
"count": 15
},{
"name": "city3",
"count": 10
},{
"name": "city4",
"count": 5
}]
},{
"name": "user2",
"city": [{
"name": "city2",
"count": 2
},{
"name": "city5",
"count": 5
},{
"name": "city6",
"count": 8
},{
"name": "city8",
"count": 15
}]
},{
"name": "user3",
"city": [{
"name": "city1",
"count": 2
},{
"name": "city5",
"count": 5
},{
"name": "city7",
"count": 28
},{
"name": "city2",
"count": 1
}]
}]
So, what I am trying to do is find the users who have "city2" in their city list and order the results by the "count" of "city2".
Here is the query I have tried:
{
"sort": [{
"city.count": {
"order" : "desc"
}
}],
"query": {
"bool": {
"must": [
{"match": {"city.name": "city2"}}
]
}
}
}
I am not able to figure out how to do the sort part.
The sort considers the "count" values of all the cities in each matching document, but I want the ordering to be based only on the "count" of "city2".
Any kind of help would be appreciated. Thanks in advance.
Since the field city is an object and not a nested object, what you are trying to achieve won't be possible. The reason is that when you define a field as an object, Elasticsearch flattens the values of each sub-field into an array. So,
"city": [
{
"name": "city1",
"count": 18
},
{
"name": "city2",
"count": 15
},
{
"name": "city3",
"count": 10
},
{
"name": "city4",
"count": 5
}
]
is indexed as :
"city.name" : ["city1", "city2", "city3", "city4"]
"city.count": [18, 15, 10, 5]
As you can see, because of the way Elasticsearch indexes the object, the relation between each city and its count is lost.
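A minimal sketch of the flattening described above, in plain Python rather than Elasticsearch itself. The helper name `flatten_object_field` is hypothetical and only mimics the indexing behaviour for illustration:

```python
# Hypothetical helper: mimics how an "object" field ends up as one
# list of values per sub-field. This is an illustration, not Elasticsearch code.
def flatten_object_field(prefix, objects):
    flattened = {}
    for obj in objects:
        for key, value in obj.items():
            flattened.setdefault(f"{prefix}.{key}", []).append(value)
    return flattened


city = [
    {"name": "city1", "count": 18},
    {"name": "city2", "count": 15},
    {"name": "city3", "count": 10},
    {"name": "city4", "count": 5},
]

flat = flatten_object_field("city", city)

# The two lists are independent: nothing records that 15 belongs to "city2".
# A sort on "city.count" can only pick a value from the whole list.
largest_count = max(flat["city.count"])
```

Here `largest_count` comes from "city1", even though the query matched on "city2", which is the behaviour the question runs into.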
So, whenever you want to preserve the relation, you should define the field as a nested type.
{
"city": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"count": {
"type": "long"
}
}
}
}
Sorting can then be achieved using this nested field:
{
"sort": [
{
"city.count": {
"order": "desc",
"mode": "avg",
"nested": {
"path": "city",
"filter": {
"match": {
"city.name": "city2"
}
}
}
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"city.name": "city2"
}
}
]
}
}
}
Reaching your goal will be a little complex.
First, your query says that you want to get the docs with "city2" in them. Since at least one of the elements in the "city" array matches, the whole document will be returned.
The problem is that you only want to return the count for city2, not the counts for all of the cities. This is where the complex part comes in.
There are several paths you can follow:
Change your index design. Instead of having an array of users, have one document per user with all their info, including the cities they have visited. The "I only want one element from the array" problem will still be there, but you will only have to fight with one array at a time instead of n.
You can use Painless to bring back only the count of that particular city, but it would imply a lot of scripting. Don't trust the name: Painless is very painful.
You can bring back all the elements and do the filtering within your code. For example, if you use the Python Elasticsearch Client, you can execute the query, return all the objects, and select only the wanted elements with Python.
Don't consider using the terms aggregation. It would bring back the total count for each city, without the relationship to each user, and that is not what you want.
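The client-side filtering option could look roughly like the sketch below. It assumes the matching documents have already been fetched and that `users` holds their `_source` bodies; the function name `users_sorted_by_city_count` is made up for this example, and the data is a trimmed copy of the sample from the question.

```python
# Sketch only: `users` stands in for the `_source` of the hits returned by
# the "city.name": "city2" query. No Elasticsearch call is made here.
def users_sorted_by_city_count(users, city_name):
    matches = []
    for user in users:
        for city in user.get("city", []):
            if city["name"] == city_name:
                # Keep only the count that belongs to the requested city.
                matches.append({"name": user["name"], "count": city["count"]})
                break
    # Highest count first, mirroring "order": "desc".
    return sorted(matches, key=lambda m: m["count"], reverse=True)


users = [
    {"name": "user1", "city": [{"name": "city1", "count": 18},
                               {"name": "city2", "count": 15}]},
    {"name": "user2", "city": [{"name": "city2", "count": 2},
                               {"name": "city8", "count": 15}]},
    {"name": "user3", "city": [{"name": "city7", "count": 28},
                               {"name": "city2", "count": 1}]},
]

result = users_sorted_by_city_count(users, "city2")
```

This moves the work out of Elasticsearch, so it only makes sense when the number of matching documents is small enough to fetch in full.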
Hope this is helpful, and sorry we can't get a straightforward solution :(

How to find records matching the result of a previous search using ElasticSearch Painless scripting

I have the index I attached below.
Each doc in the index holds the name and height of Alice or Bob and the age at which the height was measured. Measurements taken at the age of 10 are flagged as "baseline_height_at_age_10": true.
I need to do the following:
Find the height of Alice and Bob at age 10.
Return, for Alice and Bob, the records where the height is lower than their height at age 10.
So my question is: can Painless do this type of search?
I'd appreciate it if you could point me to a good example of that.
Also: Is ElasticSearch Painless even a good approach for this problem? Can you sugges
The Index Mappings
PUT /shlomi_test/
{
"mappings": {
"_doc": {
"properties": {
"first_name": {
"type": "keyword",
"fields": {
"raw": {
"type": "text"
}
}
},
"surname": {
"type": "keyword",
"fields": {
"raw": {
"type": "text"
}
}
},
"baseline_height_at_age_10": {
"type": "boolean"
},
"age": {
"type": "integer"
},
"height": {
"type": "integer"
}
}
}
}
}
The Index Data
POST /test/_doc/alice_green_8_110
{
"first_name": "Alice",
"surname": "Green",
"age": 8,
"height": 110,
"baseline_height_at_age_10": false
}
POST /test/_doc/alice_green_10_120
{
"first_name": "Alice",
"surname": "Green",
"age": 10,
"height": 120,
"baseline_height_at_age_10": true
}
POST /test/_doc/alice_green_13_140
{
"first_name": "Alice",
"surname": "Green",
"age": 13,
"height": 140,
"baseline_height_at_age_10": false
}
POST /test/_doc/alice_green_23_170
{
"first_name": "Alice",
"surname": "Green",
"age": 23,
"height": 170,
"baseline_height_at_age_10": false
}
POST /test/_doc/bob_green_8_120
{
"first_name": "Bob",
"surname": "Green",
"age": 8,
"height": 120,
"baseline_height_at_age_10": false
}
POST /test/_doc/bob_green_10_130
{
"first_name": "Bob",
"surname": "Green",
"age": 10,
"height": 130,
"baseline_height_at_age_10": true
}
POST /test/_doc/bob_green_15_160
{
"first_name": "Bob",
"surname": "Green",
"age": 15,
"height": 160,
"baseline_height_at_age_10": false
}
POST /test/_doc/bob_green_21_180
{
"first_name": "Bob",
"surname": "Green",
"age": 21,
"height": 180,
"baseline_height_at_age_10": false
}
You should be able to do it using just aggregations. Assuming people only ever get taller and the measurements are accurate, you could restrict the query to documents with age 10 or under, find the max height among those, then filter those results to exclude the baseline record:
POST test/_search
{
"size": 0,
"query": {
"range": {
"age": {
"lte": 10
}
}
},
"aggs": {
"names": {
"terms": {
"field": "first_name",
"size": 10
},
"aggs": {
"max_height": {
"max": {
"field": "height"
}
},
"non-baseline": {
"filter": {
"match": {
"baseline_height_at_age_10": false
}
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
}
}
I've posted the same question, with emphasis on Painless scripting, on the Elasticsearch support forum ("How to find records matching the result of a previous search using ElasticSearch Painless scripting"),
and the answer was:
"I don't think the Painless approach will work here. You cannot use
the results of one query to execute a second query with Painless.
The two-step approach that you outline at the end of your post is the
way to go."
The bottom line is that you cannot use a result from one query as an input to another query. You can filter and aggregate and more, but not this.
So the approach, as I understand the answer, is to run the first search, process the data externally, and then run an additional search. This basically translates to:
Search the record where first_name=Alice and baseline_height_at_age_10=True.
Process externally, to extract the value of height for Alice at age 10.
Search for Alice's records where her height is lower than the value calculated externally.
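As a rough sketch of these three steps, the functions below only build the two search bodies and read the height out of a response; sending them to the `_search` endpoint is left to whichever client you use. The function names are hypothetical, and the `term` filters assume `first_name` is mapped as `keyword`, as in the mapping from the question.

```python
# Step 1: body for the search that finds the baseline record for one person.
def baseline_query(first_name):
    return {
        "size": 1,
        "query": {"bool": {"filter": [
            {"term": {"first_name": first_name}},
            {"term": {"baseline_height_at_age_10": True}},
        ]}},
    }


# Step 2: external processing, pulling the height out of the first response.
def extract_height(response):
    hits = response["hits"]["hits"]
    return hits[0]["_source"]["height"] if hits else None


# Step 3: body for the second search, using the value from step 2.
def shorter_than_query(first_name, height):
    return {
        "query": {"bool": {"filter": [
            {"term": {"first_name": first_name}},
            {"range": {"height": {"lt": height}}},
        ]}},
    }


# Stand-in for a real response to baseline_query("Alice").
fake_response = {"hits": {"hits": [{"_source": {"height": 120}}]}}
alice_height = extract_height(fake_response)
second_body = shorter_than_query("Alice", alice_height)
```

The same pair of searches would be repeated per person, for example once for Alice and once for Bob.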

Elasticsearch Context Suggester geo context - boost without filtering?

I'm creating a completion suggester with a geo context (Elastic 5.x).
mapping...
"suggest": {
"type": "completion",
...
"contexts": [
{
"name": "geoloc",
"type": "geo",
"precision": 3,
"path": "geolocation"
}
]
When I query this, I'd like to have it not filter by the geo context, only boost results that are within the geohash. It works great to filter by a single geohash, or filter by a lower precision, and then boost a higher precision within that original filter like this:
GET /my-index/_search
{
"suggest": {
...
"completion": {
"field": "suggest",
"size": "10",
"contexts": {
"geoloc": [
{
"lat": 44.8214564,
"lon": -93.475399,
"precision": 1
},
{
"lat": 44.8214564,
"lon": -93.475399,
"boost": 2
}
]
}
}
}
}
However, I can't get it to only boost on a single geo context without filtering.
When I submit the following query, it filters and boosts:
GET /my-index/_search
{
"suggest": {
...
"completion": {
"field": "suggest",
"size": "10",
"contexts": {
"geoloc": [
{
"lat": 44.8214564,
"lon": -93.475399,
"boost": 2
}
]
}
}
}
}
Is what I'm trying to do just not supported, or am I missing something?
Thanks!
Jason
Just ran into this issue as well.
The solution I came up with through trial and error was to use a category context that first matches all my documents. Say you had added a category named "all" to your documents; you could then do this:
GET /my-index/_search
{
"suggest": {
...
"completion": {
"field": "suggest",
"size": "10",
"contexts": {
"category": ["all"],
"geoloc": [
{
"lat": 44.8214564,
"lon": -93.475399,
"precision": 2,
"boost": 2
}
]
}
}
}
}
When this is done, it seems to select everything with the "all" category and then boost the ones within the specified precision level to the top.
Using Elastic 6.*

Calculating GEO transformation params in Vega/d3

Copy this code to https://en.wikipedia.org/wiki/Special:GraphSandbox
The scale and translate parameters of the geo transformation were manually set to match the width & height of the image (see red crosses). How can I make it so that geo transformation matches the entire graph size (or maybe some signal values) automatically, without the manual adjustments?
UPDATE: The translate parameter should have been set to HALF of the WIDTH and HEIGHT of the image, and center should have been set to [0,0]; see the answer below. For the equirectangular projection, the graph size should have a 2:1 ratio.
{
"version": 2, "width": 800, "height": 400, "padding": 0,
"data": [
{
"name": "data",
"values": [
{"lat":0, "lon":0},
{"lat":90, "lon":-180},
{"lat":-90, "lon":180}
]
}
],
"marks": [
{
"type": "image",
"properties": {
"enter": {
"url": {"value": "wikirawupload:{{filepath:Earthmap1000x500compac.jpg|190}}"},
"width": {"signal": "width"},
"height": {"signal": "height"}
}
}
},
{
"name": "points",
"type": "symbol",
"from": {
"data": "data",
"transform": [{
"type": "geo",
"projection": "equirectangular",
"scale": 127,
"center": [0,0],
"translate": [400,200],
"lon": "lon",
"lat": "lat"
}]
},
"properties": {
"enter": {
"x": {"field": "layout_x"},
"y": {"field": "layout_y"},
"fill": {"value": "#ff0000"},
"size": {"value": 500},
"shape": {"value": "cross"}
}
}
}
]
}
Found an answer (the example above was updated): "translate" should be set to the center of the image, while "center" should be set to [0,0]. The "scale" for the equirectangular projection needs to be set to width/(2*PI).
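To check these values outside Vega, here is a small Python sketch that derives the parameters from the graph size and applies the standard equirectangular formulas with center [0,0]. The function names are made up for illustration and are not part of Vega or d3.

```python
import math


# Derive the geo transform parameters from the graph size (assumes 2:1 ratio).
def equirectangular_params(width, height):
    return {
        "scale": width / (2 * math.pi),
        "center": [0, 0],
        "translate": [width / 2, height / 2],
    }


# Project lon/lat in degrees to pixel coordinates, assuming center [0, 0].
def project(lon, lat, params):
    scale = params["scale"]
    tx, ty = params["translate"]
    x = tx + scale * math.radians(lon)
    y = ty - scale * math.radians(lat)  # screen y grows downwards
    return x, y


params = equirectangular_params(800, 400)
origin = project(0, 0, params)
top_left = project(-180, 90, params)
bottom_right = project(180, -90, params)
```

For the 800x400 graph above this gives a scale of about 127.3, which matches the hand-tuned value of 127, and the three sample points land on the image center and its two opposite corners.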
