I'm new to Trino and I'm trying to use it to query nested objects in Elasticsearch.
This is my mapping in Elasticsearch:
{
  "product_index": {
    "mappings": {
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "keyword" },
        "linked_products": {
          "type": "nested",
          "properties": {
            "id": { "type": "keyword" }
          }
        }
      }
    }
  }
}
I need to perform a query on the id field under linked_products.
What is the syntax in Trino to perform a query on the id field?
Do I need to use special definitions on the target index mapping in Elasticsearch to map the nested section for Trino?
=========================================================
Hi,
I will try to add some clarifications to my question.
We are trying to query the data by the id field.
This is the query in Elasticsearch:
GET product_index/_search
{
  "query": {
    "nested": {
      "path": "linked_products",
      "query": {
        "bool": {
          "should": [
            { "match": { "linked_products.id": 123 } }
          ]
        }
      }
    }
  }
}
We tried to query the id field in two ways:
Trino query:
select count(*)
from es_table aaa
where any_match(aaa.linked_products, x -> x.id = 123)
When we query by the id field, pushdown to Elasticsearch doesn't happen and the connector retrieves all the documents into Trino (this only happens with queries on nested documents).
Sending an es-query from Trino to Elasticsearch:
SELECT * FROM es.default."$query:"
It works, but when we try to retrieve ids matching many documents we get a timeout from the Elasticsearch client.
I can't tell from the documentation whether it is possible to use scrolling with the es-query approach to avoid the timeout problem.
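For what it's worth, outside of Trino the timeout can be worked around by paging through results with the Elasticsearch scroll API directly. Below is a minimal sketch of the paging logic; the HTTP transport is injected as a callable so the function is self-contained, and everything except the `_search?scroll` / `_search/scroll` endpoints is illustrative:

```python
def scroll_all(post, index, query, page_size=1000):
    """Collect all hits for `query` by paging with the ES scroll API.

    `post` is any callable (path, body) -> parsed-JSON response, so a real
    HTTP client (e.g. requests) or a fake can be plugged in.
    """
    # First page: open a scroll context that stays alive for 1 minute.
    resp = post(f"/{index}/_search?scroll=1m",
                {"size": page_size, "query": query})
    hits = resp["hits"]["hits"]
    scroll_id = resp["_scroll_id"]
    while True:
        # Subsequent pages: keep asking for the next batch until it's empty.
        resp = post("/_search/scroll",
                    {"scroll": "1m", "scroll_id": scroll_id})
        page = resp["hits"]["hits"]
        if not page:
            break
        hits.extend(page)
        scroll_id = resp["_scroll_id"]
    return hits
```

With requests this could be wired up as `post = lambda path, body: requests.post("http://localhost:9200" + path, json=body).json()`.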
Trino maps the nested object type to a ROW the same way that it maps a standard object type during a read. The nested designation itself serves no purpose to Trino, since it only determines how the object is stored in Elasticsearch.
Assume we push the following document to your index.
curl -X POST "localhost:9200/product_index/_doc?pretty" \
  -H 'Content-Type: application/json' -d'
{
  "id": "1",
  "name": "foo",
  "linked_products": {
    "id": "123"
  }
}
'
The way you would read this out in Trino would just be to use the standard ROW syntax.
SELECT
id,
name,
linked_products.id
FROM elasticsearch.default.product_index;
Result:
|id |name|id |
|---|----|---|
|1 |foo |123|
This is all well and good, but judging from the fact that the name of your nested object is plural, I'll assume you want to store an array of objects like so.
curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "id": "2",
  "name": "bar",
  "linked_products": [
    {
      "id": "123"
    },
    {
      "id": "456"
    }
  ]
}
'
If you run the same query as above, with the second document inserted, you'll get the following error.
SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]
This is because Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. So to enable querying over this array, you'll need to follow the instructions in the docs to explicitly identify that field as an Array type in Trino using the _meta field. Here is the command that would be used in this example to identify linked_products as an ARRAY.
curl --request PUT \
  --url localhost:9200/product_index/_mapping \
  --header 'content-type: application/json' \
  --data '
{
  "_meta": {
    "presto": {
      "linked_products": {
        "isArray": true
      }
    }
  }
}'
Now you will need to account for the fact that linked_products is an ARRAY of type ROW in your SELECT statement. Not all of the indexes will have values, so you should use the index-safe element_at function to avoid errors.
SELECT
id,
name,
element_at(linked_products, 1).id AS id1,
element_at(linked_products, 2).id AS id2
FROM elasticsearch.default.product_index;
Result:
|id |name|id1|id2 |
|---|----|---|----|
|1 |foo |123|NULL|
|2 |bar |123|456 |
=========================================================
Update to answer @gil bob's updated question.
There is currently no support for pushdown aggregates in the Elasticsearch connector, but this is being added in PR 7131.
You can set the elasticsearch.request-timeout property in your elasticsearch.properties file to increase the request timeout as a workaround until the pushdown lands. If it's taking Elasticsearch this long to return the result, this will need to be set whether you run the aggregation in Trino or Elasticsearch.
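As a concrete example, the catalog file with the workaround applied might look like this (host, port, and the timeout value are illustrative, not recommendations):

```properties
connector.name=elasticsearch
elasticsearch.host=localhost
elasticsearch.port=9200
elasticsearch.default-schema-name=default
# raised from the default as a stopgap until aggregate pushdown lands
elasticsearch.request-timeout=2m
```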
I am trying to post some data to an Elasticsearch server. I am using curl for this. The code is:
curl -X PUT https://username:password@someurl:443/index_name?pretty -H 'Content-Type: application/json' -d ' {"mappings": {"properties": {"my_field": {"type": "search_as_you_type"}}}}'
Only the basic stuff, like retrieving index info via _cat/indices?v, works.
The errors include:
"Content-Type header [application/x-www-form-urlencoded] is not supported", curl: (6) Could not resolve host: application
curl: (3) [globbing] unmatched brace in column 1
curl: (3) [globbing] unmatched brace in column 1
curl: (3) [globbing] unmatched brace in column 1
curl: (3) [globbing] unmatched brace in column 1
curl: (3) [globbing] unmatched close brace/bracket in column 19
The error message makes it clear that you are using the wrong content type, application/x-www-form-urlencoded, in its very first line:
"Content-Type header [application/x-www-form-urlencoded] is not
supported"
As the given mapping seems to be copied from the official ES docs (https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html), you can find the corresponding curl format at that link.
I pasted that format here and tried it locally, and it worked. Note the only difference is the Content-Type, which should be application/json:
curl -X PUT "localhost:9500/so_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "search_as_you_type"
      }
    }
  }
}
'
Figured it out after some trial and error. Here is the code used (after creating the index).
import requests
import json
url = 'http://localhost:9200/companies_list/_mapping'
headers1 = {'Content-Type':'application/json'}
data_obj = {
    "properties": {
        "date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
        }
    }
}
response = requests.put(url, headers=headers1, data=json.dumps(data_obj))
print(response.content)
Here are some useful commands for people starting out on Elasticsearch or upgrading to the latest version.
import requests
import json
headers1 = {'Content-Type':'application/json'}
#create index
url = 'http://localhost:9200/my_index'
response = requests.put(url)
#view indexes
url = 'http://localhost:9200/_cat/indices?v'
response = requests.get(url)
#create mappings. if mappings are not created, then dynamic mapping will be created by Elasticsearch
data_obj = {
    "properties": {
        "names": {
            "type": "search_as_you_type"
        }
    }
}
url = 'http://localhost:9200/my_index/_mapping'
response = requests.post(url, headers=headers1, data=json.dumps(data_obj))
#insert data. if mappings have not been created initially, then dynamic mapping will be created by elastic search
url = 'http://localhost:9200/my_index/_doc/'
my_list = ['test name 1', 'test name 2', 'test name 3']
for i, j in enumerate(my_list):
    response = requests.post(url+str(i), headers=headers1, data=json.dumps({"names": j}))
    print(response.content)
#retrieve data.
url = 'http://localhost:9200/my_index/_doc/_search'
url = 'http://localhost:9200/my_index/_search' #works with both
data_obj = {
    "query": {
        "multi_match": {
            "query": "test name 2",
            "fields": [
                "names",
                "names._2gram",
                "names._3gram"
            ]
        }
    }
}
response = requests.get(url, headers=headers1, data=json.dumps(data_obj))
print(response.content)
x = json.loads(response.content)
#x = json.loads(response.content.decode("utf-8")) #in case the response is in bytes and not str
for i in x["hits"]["hits"]:
    print(i["_source"]["names"], i["_score"])
I am new to Elasticsearch. Will keep updating the post with more basics.
I've beaten on this issue for what seems like a week and nothing online solves it. No data object is returned, not even "null". I'm converting from a working REST CRUD app to GraphQL. Not as easy as I expected.
My mutation in Playground:
mutation createMember($input: MemberInput!) {
  createMember(input: $input) {
    first_name
    last_name
    user_name
  }
}
Playground Query Variables below: (Not in the headers section.)
{
  "input": {
    "first_name": "John",
    "last_name": "Creston",
    "user_name": "jc"
  }
}
The schema: (The queries work fine and the Member type, an entity in TypeORM, works with them and full REST CRUD.)
input MemberInput {
  first_name: String!
  last_name: String!
  user_name: String!
}

type Mutation {
  createMember(input: MemberInput!): Member!
}

type Query {
  getMembers: [Member]
  getMember(member_id: Int!): Member!
  checkUserName(user_name: String): Member
  checkEmail(email: String): Member
}
I don't see how the resolver could be the problem for this error message but I'll add the Nestjs resolver:
@Mutation('createMember')
async createMember(@Args('input') input: MemberInput): Promise<Members> {
  console.log('input in resolver: ', input);
  return await this.membersService.addItem(input);
}
The service works with REST and the queries, so that shouldn't be the problem. The console.log in the resolver never appears in the terminal.
From the Copy CURL button:
curl 'http://localhost:3000/graphql' -H 'Accept-Encoding: gzip,
deflate, br' -H 'Content-Type: application/json' -H 'Accept:
application/json' -H 'Connection: keep-alive' -H 'DNT: 1' -H 'Origin:
http://localhost:3000' --data-binary '{"query":"mutation
createMember($input: MemberInput!) {\n createMember (input: $input)
{\n first_name\n middle_initial\n last_name\n
user_name\n pitch\n main_skill_title\n \tskill_id_array\n
skills_comments\n other_skills\n links\n country\n
email\n member_status\n }\n}"}' --compressed
I had the same problem: in the variables I had a comma at the end of the last parameter, and GraphQL gave me the same error.
change this:
{
  "input": {
    "param1": "val1",
    "param2": "val2",
  }
}
to this:
{
  "input": {
    "param1": "val1",
    "param2": "val2" // no comma here
  }
}
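The trailing comma really is the culprit: GraphQL variables must be strict JSON, which forbids it. A quick way to sanity-check a variables blob before pasting it into Playground (plain Python; the parameter names are just placeholders):

```python
import json

good = '{"input": {"param1": "val1", "param2": "val2"}}'
bad = '{"input": {"param1": "val1", "param2": "val2",}}'

json.loads(good)  # parses fine

try:
    json.loads(bad)  # strict JSON rejects the trailing comma
except json.JSONDecodeError as err:
    print("invalid variables JSON:", err.msg)
```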
A bit complicated to explain, but we've probably all been there. I was trying different "solutions" and the remains of one were accidentally left in my JSON data object. The full object has more properties than my brief one above. An array shouldn't be "[]" (a quoted string) in JSON. I was getting tired and desperate last night. I changed the resolver to what I posted above, but it didn't fix the problem because the leftover experiment in the JSON caused the same error as before, so I was looking in the wrong places. Once I noticed and removed the quotes, my full object's data was posted to the Postgres db.
The code above works.
It would be nice if there were more specific GraphQL errors. I suspected that this one is very general, but that didn't help. It didn't indicate that the original problem was in the resolver, nor later that I had a data input error.
In my case, my code implements the graphql-query-complexity plugin, for which there is a known bug: https://github.com/slicknode/graphql-query-complexity/issues/69.
Wrapping the plugin in a try/catch I am able to skip the error validation.
I am trying to import a Kibana 6 visualization into Elasticsearch 6, to be viewed in Kibana. I am trying to do this with a curl command, or essentially a script without going through the Kibana UI. This is the command I’m using:
curl -XPUT http://localhost:9200/.kibana/doc/visualization:vis1 -H
'Content-Type: application/json' -d @visual1.json
And this is visual1.json:
{
"type": "visualization",
"visualization": {
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\”,\”filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\”,\”negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"ALOG\”,\”params\":{\"query\":\"AUTH_LOGIN\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"AUTH_LOGIN\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
Now a couple of things to note about the curl command and this JSON file. The index I push the visualization to is .kibana. I found that when I pushed these to other indexes, such as "test", my data would not show up as a stored object in Kibana, and thus wouldn't show up on the visualization tab. When I PUT to .kibana with the syntax '.kibana/doc/visualization:vis1', my object shows up on the visualization tab.
Now concerning the json file. Note that when you export a visualization from Kibana 6, it doesn’t look like this. It looks like:
{
"_id": "vis1",
"_type": "visualization",
"_source": {
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"LOG\",\"params\":{\"query\":\"LOG\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"LOG\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
Note the first few lines. I found from this link Unable to create visualization using curl command in elaticearch that you have to modify the JSON export in order to import it. Seems strange, right?
Anyway, I've since had two errors on the actual visualization object once it was in Kibana. The first was that "The index pattern associated with this object no longer exists." I was able to get around this by creating an index pattern with the id referenced in the searchSourceJSON of my visualization. I had to do this within the Kibana UI, so technically this solution would not work for me. In any case, I created an index with a document in it by calling:
curl -X PUT "localhost:9200/test57/_doc/1" -H 'Content-Type: application/json' -d'
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}
'
And then in the Kibana UI, created an index pattern and gave it the custom index pattern ID def097e0-550f-11e8-9266-93ce640e5839.
Now when I go try to view my visualization, I get a new error. “A field associated with this object no longer exists in the index pattern.”
I am guessing this has something to do with me pushing a random object into the index, but even with debug settings on for elastic and kibana, I don’t really get enough information to fix this problem.
If anyone could point me in the right direction that would be great! Thanks in advance.
You need to make sure that the fields you reference in your visualization definition are also present in the Kibana index pattern (Kibana main screen > Management > Index Patterns). The easiest way to do that would be to include said fields in the dummy index you created and then 'refresh field list' in the Kibana Index Patterns screen.
You can do this via CLI by creating a document of _type index-pattern in the .kibana index.
It is possible to import through the Kibana endpoint using the saved_objects API.
This requires modifying the exported JSON to wrap it inside {"attributes": ...}.
Based on your example, it should be something like:
curl -XPOST "http://localhost:5601/api/saved_objects/visualization/myvisualisation?overwrite=true" -H "kbn-xsrf: reporting" -H 'Content-Type: application/json' -d'
{"attributes":{
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"LOG\",\"params\":{\"query\":\"LOG\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"LOG\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
'
I'm using Elasticsearch Bulk Index to update some stats of documents, but it may happen that the document I am trying to update does not exist; in this case I want it to do nothing.
I don't want it to create the document in that case.
I haven't found anything about this in the docs, or perhaps I missed it.
My current actions (in this form it creates the document):
{
  update: {
    _index: "index1",
    _type: "interaction",
    _id: item.id
  }
},
{
  script: {
    file: "update-stats",
    lang: "groovy",
    params: {
      newCommentsCount: newRetweetCount
    }
  },
  upsert: normalizedItem
}
How do I update the document only if it exists, otherwise nothing?
Thank you
Don't use upsert; use a normal update.
If the document does not exist when updating, the update will simply fail.
Thereby it should work well for you.
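To make that concrete in bulk form: dropping the upsert key from the action payload means a missing document fails with document_missing_exception instead of being created. A sketch of building the NDJSON _bulk body, modeled on the question's action (the helper name and input shape are mine):

```python
import json

def bulk_script_update(items, index, doc_type="interaction"):
    """Build an ES _bulk NDJSON body of script updates with no upsert,
    so updates on missing documents fail instead of creating them."""
    lines = []
    for item in items:
        # Action line: which document to update.
        lines.append(json.dumps({
            "update": {"_index": index, "_type": doc_type, "_id": item["id"]}
        }))
        # Payload line: the script only -- no "upsert" key, so a missing
        # document produces document_missing_exception rather than a create.
        lines.append(json.dumps({
            "script": {
                "file": "update-stats",
                "lang": "groovy",
                "params": {"newCommentsCount": item["retweets"]},
            }
        }))
    return "\n".join(lines) + "\n"
```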
The following worked for me with Elasticsearch 7.15.2 (you need to check the lowest supported version for this; ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#update-api-example)
curl --location --request POST 'http://127.0.0.1:9200/exp/_update/8' \
--header 'Content-Type: application/json' \
--data-raw '
{
  "scripted_upsert": true,
  "script": {
    "source": "if ( ctx.op == \"create\" ) {ctx.op=\"noop\"} else {ctx._source.name=\"updatedName\"} ",
    "params": {
      "count": 4
    }
  },
  "upsert": {}
}
'
If ES is about to create a new record (ctx.op is "create"), then we change the op to "noop" and nothing is done; otherwise we do the normal update through the script.
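The branch logic of that script can be mirrored in plain Python to make the behaviour concrete (this is a simulation of the script's control flow, not actual Elasticsearch semantics):

```python
def simulate_update(existing_doc):
    """Mimic the scripted_upsert flow: ES sets op to 'create' when the
    document is missing; the script flips it to 'noop' so nothing happens."""
    ctx = {"op": "create" if existing_doc is None else "update",
           "_source": dict(existing_doc or {})}
    if ctx["op"] == "create":
        ctx["op"] = "noop"                      # document missing: do nothing
    else:
        ctx["_source"]["name"] = "updatedName"  # document exists: update it
    return ctx
```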