I'm new to trino and I'm trying to use it to query nested objects in elastic search.
This is my mapping in elasticsearch:
{
"product_index": {
"mappings": {
"properties" :{
"id" : { "type" : "keyword"},
"name" { "type" : "keyword"},
"linked_products" :{
"type": "nested",
"properties" :{
"id" : { "type" : "keyword"}
}
}
}
}
}
}
I need to perform a query on the id field under linked_products .
what is the syntax in trino to perform a query on the id field?
Do I need to use special definitions on the target index mapping in elastic to map the nested section for trino?
=========================================================
Hi,
I will try to add some clarifications to my question.
We are trying to query the data according to the id field.
This is the query in Elastic:
get product_index/_search
{
"query": {
"nested" : {
"path" : "linked_products",
"query": {
"bool": {
"should" : [
{ "match" : {"linked_products.id" :123}}
]
}
}
}
}
}
We tried to query the id field in 2 ways:
Trino query -
select count(*)
from es_table aaa
where any_match(aaa.linked_products, x-> x.id=123)
When we try to query according to the id field the Pushdown to elastic doesn't happen and the connector retrieve all the documents to trino (this only happens with queries on nested documents).
send es-query from trino to elastic:
SELECT * FROM es.default."$query:"
It works but when we are trying to retrieve id's with many documents we got timeout from the elastic client.
I don't understand from the documentation if it is possible to perform scrolling when we are using es-query to avoid the timeout problem.
Trino maps nested object type to a ROW the same way that it maps a standard object type during a read. The nested designation itself serves no purpose to Trino since it only determines how the object is stored in Elasticsearch.
Assume we push the following document to your index.
curl -X POST "localhost:9200/product_index/_doc?pretty"
-H 'Content-Type: application/json' -d'
{
"id": "1",
"name": "foo",
"linked_products": {
"id": "123"
}
}
'
The way you would read this out in Trino would just be to use the standard ROW syntax.
SELECT
id,
name,
linked_products.id
FROM elasticsearch.default.product_index;
Result:
|id |name|id |
|---|----|---|
|1 |foo |123|
This is fine and well, but judging from the fact that the name of your nested object is plural, I'll assume you want to store an array of objects like so.
curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
"id": "2",
"name": "bar",
"linked_products": [
{
"id": "123"
},
{
"id": "456"
}
]
}
'
If you run the same query as above, with the second document inserted, you'll get the following error.
SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]
This is because, Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. So to enable querying over this array, you'll need to follow the instructions in the docs to explicitly identify that field as an Array type in Trino using the _meta field. Here is the command that would be used in this example to indetify linked_products as an ARRAY.
curl --request PUT \
--url localhost:9200/product_index/_mapping \
--header 'content-type: application/json' \
--data '
{
"_meta": {
"presto":{
"linked_products":{
"isArray":true
}
}
}
}'
Now, you will need to account in the SELECT statement that linked_products is an ARRAY of type ROW. Not all of the indexes will have values, so you should use the index safe element_at function to avoid errors.
SELECT
id,
name,
element_at(linked_products, 1).id AS id1,
element_at(linked_products, 2).id AS id2
FROM elasticsearch.default.product_index;
Result:
|id |name|id1|id2 |
|---|----|---|----|
|1 |foo |123|NULL|
|2 |bar |123|456 |
=========================================================
Update to answer #gil bob's updated question.
There is currently no support for pushdown aggregates in the Elasticsearch connector but this is getting added in PR 7131
You can set the elasticsearch.request-timeout properties in your elasticsearch.properties file to increase the request timeout as a workaround until the pushdown occurs. If it's taking Elasticsearch this long to return it, this will need to get set whether you run the aggregation in Trino or Elasticsearch.
I am trying to import a Kibana 6 visualization into Elasticsearch 6, to be viewed in Kibana. I am trying to do this with a curl command, or essentially a script without going through the Kibana UI. This is the command I’m using:
curl -XPUT http://localhost:9200/.kibana/doc/visualization:vis1 -H
'Content-Type: application/json' -d #visual1.json
And this is visual1.json:
{
"type": "visualization",
"visualization": {
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\”,\”filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\”,\”negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"ALOG\”,\”params\":{\"query\":\"AUTH_LOGIN\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"AUTH_LOGIN\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
Now a couple things to note about the curl command and this json file. The index I push the visualization to is .kibana. I found that when I pushed these to other index’s such as “test”, my data would not show up as a stored object in Kibana, and thus wouldn’t show up on the visualization tab. When I PUT to .kibana with this syntax ‘.kibana/doc/visualization:vis1 ‘, my object shows up on the visualization tab.
Now concerning the json file. Note that when you export a visualization from Kibana 6, it doesn’t look like this. It looks like:
{
"_id": "vis1",
"_type": "visualization",
"_source": {
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"LOG\",\"params\":{\"query\":\"LOG\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"LOG\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
Note the first few lines. I found from this link Unable to create visualization using curl command in elaticearch that you have to modify the json export in order to import it. Seems strange right?
Anyway, then I’ve had two errors on the actual visualization object once in Kibana. The first was that “The index pattern associated with this object no longer exists.” I was able to get around this by creating an index pattern with the id referenced in the searchSourceJson of my visualization. I had to do this within the Kibana UI, so technically this solution would not work for me. In any case, I created an index with a document in it by calling
curl -X PUT "localhost:9200/test57/_doc/1" -H 'Content-Type: application/json' -d'
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
'
And then in the Kibana UI, created an index pattern and gave it the custom index pattern ID def097e0-550f-11e8-9266-93ce640e5839.
Now when I go try to view my visualization, I get a new error. “A field associated with this object no longer exists in the index pattern.”
I am guessing this has something to do with me pushing a random object into the index, but even with debug settings on for elastic and kibana, I don’t really get enough information to fix this problem.
If anyone could point me in the right direction that would be great! Thanks in advance.
You need to make sure that the fields you reference in your visualization definition are also present in the Kibana index pattern (Kibana main screen > Management > Index Patterns). The easiest way to do that would be to include said fields in the dummy index you created and then 'refresh field list' in the Kibana Index Patterns screen.
You can do this via CLI by creating a document of _type index-pattern in the .kibana index.
It is possible to import through kibana endpoint using api saved_objects.
This needs to modify the exported json wrapping it inside {"attributes":....}
Base on your example it should be something like:
curl -XPOST "http://localhost:5601/api/saved_objects/visualization/myvisualisation?overwrite=true" -H "kbn-xsrf: reporting" -H 'Content-Type: application/json' -d'
{"attributes":{
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"LOG\",\"params\":{\"query\":\"LOG\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"LOG\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
'
I have a very simple question :
I want to update multiple documents to elasticsearch. Sometimes the document already exists but sometimes not. I don't want to use a get request to check the existence of the document (this is decreasing my performance). I want to use directly my update request to index the document directly if it doesn't exist yet.
I know that we can use upsert to create a non existing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist. I don't know if upsert can do this.
Can you provide me some explaination ?
Thanks in advance!
This is doable using the update api. It does require that you define the id of each document, since the update api requires the id of the document to determine its presence.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want using the update api with the following api call.
POST /cars/car/3/_update
{
"doc": {
"color" : "brown",
"brand" : "ford"
},
"doc_as_upsert" : true
}
This api call will add the document to the index since it does not exist.
Running the call a second time after changing the color of the car, will update the document, instead of creating a new document.
POST /cars/car/3/_update
{
"doc": {
"color" : "black",
"brand" : "ford"
},
"doc_as_upsert" : true
}
AFAIK when you index the documents (with a PUT call), the existing version gets replaced with the newer version. If the document did not exist, it gets created. There is no need to make a distinction between INSERT and UPDATE in ElasticSearch.
UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
For bulk API use
bulks.push({
update: {
_index: 'index',
_type: 'type',
_id: id
}
});
bulks.push({"doc_as_upsert":true, "doc": your_doc});
As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback.
after_commit on: :update do
begin
__elasticsearch__.update_document
rescue Elasticsearch::Transport::Transport::Errors::NotFound
__elasticsearch__.index_document
end
end
I think you want "create" action
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create
(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update
(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc
(Optional, object) The partial document to index. Required for update operations.