ElasticSearch MapperParsingException object mapping

I am following an article about Elasticsearch and I tried to run this example on my own instance.
example:
curl -XPUT 'elasticsearch:9200/twitter/tweet/1' -d '{
"user": "david",
"message": "C'est mon premier message de la journée !",
"postDate": "2010-03-15T15:23:56",
"priority": 2,
"rank": 10.2
}'
I tried to send this request from a bash script (I use PuTTY), but I get this error:
{"error":"MapperParsingException[object mapping for [tweet] tried to parse as object,
but got EOF, has a concrete value been provided to it?]","status":400}
I also tried to look for a problem with "cat -e tweet.sh", but I still don't understand why I'm getting this error.
Thanks in advance.

It's a type mismatch. I've run into this issue too. It looks like you are trying to index a concrete value into a field that is mapped as an object, i.e., at some point you indexed something like this:
{
"obj1": {
"field1": "value1"
}
}
and then indexed this:
{
"obj1": "value"
}
Check your existing mapping via elasticsearch:9200/twitter/_mapping and you will see whether one of the fields was mapped as an object.
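For example, a quick way to inspect the mapping from the shell (hostname kept from the question; ?pretty just formats the response):
curl -XGET 'elasticsearch:9200/twitter/_mapping?pretty'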

Related

Elastic/Opensearch: HowTo create a new document from an _ingest/pipeline

I am working with Elastic/Opensearch and want to create a new document in a different index out of an _ingest/pipeline.
I found no help on the web...
All my documents (filebeat) get parsed and modified at the start by a pipeline, let's say "StartPipeline".
Triggered by a value in a field of the incoming document, let's say "Start", I want to store that value in a special way by creating a new document in a different long-term index, together with some more information from the triggering document.
I found ways to do this manually from the console (update_by_query / reindex / painless scripts), but it has to be triggered by an incoming document...
Perhaps this is easier to understand; in my head it looks something like this:
PUT _ingest/pipeline/StartPipeline
{
  "description" : "create a document in/to a different index",
  "processors" : [ {
    "PutNewDoc" : {
      "if": "ctx.FieldThatTriggers == 'start'",
      "index": "DestinationIndex",
      "_id": "123",
      "document": {
        "message": "",
        "script": "start",
        "server": "alpha",
        ...
      }
    }
  } ]
}
Does anyone have an idea?
And sorry, I am not a native speaker; I am from Germany.
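For what it's worth, a minimal sketch of the closest built-in behaviour: an ingest pipeline cannot emit a second document, but it can reroute the one being processed by overwriting its _index metadata field, for example with a set processor (the destination index name below is an assumption):
PUT _ingest/pipeline/StartPipeline
{
  "description": "reroute triggering documents to a long-term index",
  "processors": [
    {
      "set": {
        "if": "ctx.FieldThatTriggers == 'start'",
        "field": "_index",
        "value": "longterm-index"
      }
    }
  ]
}
Note that this moves the document to the other index instead of copying it, so it only covers part of the question.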

Trino/Presto with Elastic: how to search nested objects?

I'm new to Trino and I'm trying to use it to query nested objects in Elasticsearch.
This is my mapping in Elasticsearch:
{
  "product_index": {
    "mappings": {
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "keyword" },
        "linked_products": {
          "type": "nested",
          "properties": {
            "id": { "type": "keyword" }
          }
        }
      }
    }
  }
}
I need to perform a query on the id field under linked_products.
What is the syntax in Trino to perform a query on the id field?
Do I need special definitions in the target index mapping in Elasticsearch to map the nested section for Trino?
=========================================================
Hi,
I will try to add some clarification to my question.
We are trying to query the data according to the id field.
This is the query in Elasticsearch:
GET product_index/_search
{
  "query": {
    "nested": {
      "path": "linked_products",
      "query": {
        "bool": {
          "should": [
            { "match": { "linked_products.id": 123 } }
          ]
        }
      }
    }
  }
}
We tried to query the id field in two ways.
First, a Trino query:
select count(*)
from es_table aaa
where any_match(aaa.linked_products, x -> x.id = 123)
When we query on the id field, the pushdown to Elasticsearch doesn't happen and the connector retrieves all the documents into Trino (this only happens with queries on nested documents).
Second, sending a raw es-query from Trino to Elasticsearch:
SELECT * FROM es.default."$query:"
This works, but when we try to retrieve ids that match many documents, we get a timeout from the Elasticsearch client.
I can't tell from the documentation whether scrolling can be used with the es-query approach to avoid the timeout problem.
Trino maps nested object type to a ROW the same way that it maps a standard object type during a read. The nested designation itself serves no purpose to Trino since it only determines how the object is stored in Elasticsearch.
Assume we push the following document to your index.
curl -X POST "localhost:9200/product_index/_doc?pretty" \
-H 'Content-Type: application/json' -d'
{
"id": "1",
"name": "foo",
"linked_products": {
"id": "123"
}
}
'
The way you would read this out in Trino would just be to use the standard ROW syntax.
SELECT
id,
name,
linked_products.id
FROM elasticsearch.default.product_index;
Result:
|id |name|id |
|---|----|---|
|1 |foo |123|
This is all well and good, but judging from the fact that the name of your nested object is plural, I'll assume you want to store an array of objects, like so.
curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
"id": "2",
"name": "bar",
"linked_products": [
{
"id": "123"
},
{
"id": "456"
}
]
}
'
If you run the same query as above, with the second document inserted, you'll get the following error.
SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]
This is because Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. So to enable querying over this array, you'll need to follow the instructions in the docs to explicitly identify that field as an Array type in Trino using the _meta field. Here is the command that would be used in this example to identify linked_products as an ARRAY.
curl --request PUT \
--url localhost:9200/product_index/_mapping \
--header 'content-type: application/json' \
--data '
{
"_meta": {
"presto":{
"linked_products":{
"isArray":true
}
}
}
}'
Now you will need to account for the fact that linked_products is an ARRAY of type ROW in your SELECT statement. Not every array position will have a value, so you should use the index-safe element_at function to avoid errors.
SELECT
id,
name,
element_at(linked_products, 1).id AS id1,
element_at(linked_products, 2).id AS id2
FROM elasticsearch.default.product_index;
Result:
|id |name|id1|id2 |
|---|----|---|----|
|1 |foo |123|NULL|
|2 |bar |123|456 |
=========================================================
Update to answer #gil bob's updated question.
There is currently no support for aggregation pushdown in the Elasticsearch connector, but this is being added in PR 7131.
As a workaround until that pushdown lands, you can set the elasticsearch.request-timeout property in your elasticsearch.properties file to increase the request timeout. If it's taking Elasticsearch this long to return the data, this will need to be set whether you run the aggregation in Trino or in Elasticsearch.
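For example, in the connector's catalog file (the two-minute value is only an illustration):
# etc/catalog/elasticsearch.properties
elasticsearch.request-timeout=2m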

How can you sort documents by a nested object in Elasticsearch using Go?

I'm new to Elasticsearch.
I have documents and each of them has a structure like this:
{
"created_at": "2018-01-01 01:01:01",
"student": {
"first_name": "john",
"last_name": "doe"
},
"parent": {
"first_name": "susan",
"last_name": "smile"
}
}
I just want to sort those documents based on student.first_name, using the olivere/elastic package for Go.
This is my query at the moment:
searchSvc = searchSvc.SortBy(elastic.NewFieldSort("student.first_name").Asc())
and I'm getting this error:
elastic: Error 400 (Bad Request): all shards failed
[type=search_phase_execution_exception]
However, when I tried sorting by created_at, it worked:
searchSvc = searchSvc.SortBy(elastic.NewFieldSort("created_at").Asc())
I don't have any mapping in the index (is this the problem?).
I tried searching for something like "Elasticsearch sort by nested object", but I kept finding questions about sorting an array inside a nested object.
It turns out that this was a beginner mistake: you can't sort on text fields. I got it from here: elasticsearch-dsl-py Sorting by Text() field.
What you can do, though, if you don't specify mappings, is sort on the keyword sub-field of the field.
searchSvc = searchSvc.SortBy(elastic.NewFieldSort("student.first_name.keyword").Asc())
And it works!
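For reference, this builds roughly the following raw search request, which you can reproduce with curl (the index name is a placeholder):
curl -X GET "localhost:9200/your_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "sort": [
    { "student.first_name.keyword": { "order": "asc" } }
  ]
}
'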

How do I import a Kibana 6 visualization into elasticsearch 6 without using the Kibana UI?

I am trying to import a Kibana 6 visualization into Elasticsearch 6, to be viewed in Kibana. I am trying to do this with a curl command, essentially a script, without going through the Kibana UI. This is the command I'm using:
curl -XPUT http://localhost:9200/.kibana/doc/visualization:vis1 \
-H 'Content-Type: application/json' -d @visual1.json
And this is visual1.json:
{
"type": "visualization",
"visualization": {
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\”,\”filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\”,\”negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"ALOG\”,\”params\":{\"query\":\"AUTH_LOGIN\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"AUTH_LOGIN\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
Now a couple of things to note about the curl command and this JSON file. The index I push the visualization to is .kibana. I found that when I pushed these to other indices, such as "test", my data would not show up as a stored object in Kibana, and thus wouldn't show up on the visualization tab. When I PUT to .kibana with the syntax '.kibana/doc/visualization:vis1', my object shows up on the visualization tab.
Now concerning the JSON file. Note that when you export a visualization from Kibana 6, it doesn't look like this. It looks like:
{
"_id": "vis1",
"_type": "visualization",
"_source": {
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"LOG\",\"params\":{\"query\":\"LOG\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"LOG\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
Note the first few lines. I found from this link Unable to create visualization using curl command in elaticearch that you have to modify the JSON export in order to import it. Seems strange, right?
Anyway, I then had two errors on the actual visualization object once in Kibana. The first was "The index pattern associated with this object no longer exists." I was able to get around this by creating an index pattern with the id referenced in the searchSourceJSON of my visualization. I had to do this within the Kibana UI, so technically this solution would not work for me. In any case, I created an index with a document in it by calling:
curl -X PUT "localhost:9200/test57/_doc/1" -H 'Content-Type: application/json' -d'
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
'
Then, in the Kibana UI, I created an index pattern and gave it the custom index pattern ID def097e0-550f-11e8-9266-93ce640e5839.
Now when I go to view my visualization, I get a new error: "A field associated with this object no longer exists in the index pattern."
I am guessing this has something to do with me pushing a random object into the index, but even with debug settings on for Elasticsearch and Kibana, I don't get enough information to fix this problem.
If anyone could point me in the right direction that would be great! Thanks in advance.
You need to make sure that the fields you reference in your visualization definition are also present in the Kibana index pattern (Kibana main screen > Management > Index Patterns). The easiest way to do that would be to include said fields in the dummy index you created and then 'refresh field list' in the Kibana Index Patterns screen.
You can do this via CLI by creating a document of _type index-pattern in the .kibana index.
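A sketch of what such a document could look like in Kibana 6 (the saved-object layout follows the export shown above; the pattern title and time field are assumptions based on the test57 index):
curl -X PUT "localhost:9200/.kibana/doc/index-pattern:def097e0-550f-11e8-9266-93ce640e5839" \
-H 'Content-Type: application/json' -d'
{
  "type": "index-pattern",
  "index-pattern": {
    "title": "test57*",
    "timeFieldName": "post_date"
  }
}
'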
It is also possible to import through the Kibana endpoint using the saved_objects API.
This requires modifying the exported JSON by wrapping it inside {"attributes": ...}.
Based on your example, it should be something like:
curl -XPOST "http://localhost:5601/api/saved_objects/visualization/myvisualisation?overwrite=true" -H "kbn-xsrf: reporting" -H 'Content-Type: application/json' -d'
{"attributes":{
"title": "Logins",
"visState": "{\"title\":\"Logins\",\"type\":\"histogram\",\"params\":{\"type\":\"histogram\",\"grid\":{\"categoryLines\":false,\"style\":{\"color\":\"#eee\"}},\"categoryAxes\":[{\"id\":\"CategoryAxis-1\",\"type\":\"category\",\"position\":\"bottom\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\"},\"labels\":{\"show\":true,\"truncate\":100},\"title\":{}}],\"valueAxes\":[{\"id\":\"ValueAxis-1\",\"name\":\"LeftAxis-1\",\"type\":\"value\",\"position\":\"left\",\"show\":true,\"style\":{},\"scale\":{\"type\":\"linear\",\"mode\":\"normal\"},\"labels\":{\"show\":true,\"rotate\":0,\"filter\":false,\"truncate\":100},\"title\":{\"text\":\"Count\"}}],\"seriesParams\":[{\"show\":\"true\",\"type\":\"histogram\",\"mode\":\"stacked\",\"data\":{\"label\":\"Count\",\"id\":\"1\"},\"valueAxis\":\"ValueAxis-1\",\"drawLinesBetweenPoints\":true,\"showCircles\":true}],\"addTooltip\":true,\"addLegend\":true,\"legendPosition\":\"right\",\"times\":[],\"addTimeMarker\":false},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"principal.keyword\",\"otherBucket\":true,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"filter\":[{\"meta\":{\"index\":\"def097e0-550f-11e8-9266-93ce640e5839\",\"negate\":false,\"disabled\":false,\"alias\":null,\"type\":\"phrase\",\"key\":\"requestType.keyword\",\"value\":\"LOG\",\"params\":{\"query\":\"LOG\",\"type\":\"phrase\"}},\"query\":{\"match\":{\"requestType.keyword\":{\"query\":\"LOG\",\"type\":\"phrase\"}}},\"$state\":{\"store\":\"appState\"}}],\"query\":{\"query\":\"\",\"language\":\"lucene\"}}"
}
}
}
'

Can I create a document with the update API if the document doesn't exist yet

I have a very simple question:
I want to update multiple documents in Elasticsearch. Sometimes the document already exists, but sometimes it doesn't. I don't want to use a GET request to check for the document's existence (this hurts my performance). I want my update request to directly index the document if it doesn't exist yet.
I know that we can use upsert to create a non-existing field when updating a document, but this is not what I want. I want to index the whole document if it doesn't exist. I don't know if upsert can do this.
Can you provide some explanation?
Thanks in advance!
This is doable using the update API. It does require that you define the id of each document, since the update API needs the id to determine whether the document is present.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want by using the update API with the following call.
POST /cars/car/3/_update
{
"doc": {
"color" : "brown",
"brand" : "ford"
},
"doc_as_upsert" : true
}
This call will add the document to the index, since it does not exist yet.
Running the call a second time, after changing the color of the car, will update the document instead of creating a new one.
POST /cars/car/3/_update
{
"doc": {
"color" : "black",
"brand" : "ford"
},
"doc_as_upsert" : true
}
AFAIK, when you index a document (with a PUT call), the existing version gets replaced with the newer version. If the document did not exist, it gets created. There is no need to distinguish between INSERT and UPDATE in Elasticsearch.
UPDATE: According to the documentation, if you use op_type=create, or the special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following URI:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
For the bulk API, use:
// Each update action line is followed by its payload line:
bulks.push({
  update: {
    _index: 'index',
    _type: 'type',
    _id: id
  }
});
bulks.push({ doc_as_upsert: true, doc: your_doc });
As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback:
after_commit on: :update do
begin
__elasticsearch__.update_document
rescue Elasticsearch::Transport::Transport::Errors::NotFound
__elasticsearch__.index_document
end
end
I think you want the "create" action.
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create
(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update
(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc
(Optional, object) The partial document to index. Required for update operations.
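A minimal sketch of the create action in a bulk request (index, id, and document are placeholders; create fails for any id that already exists, which gives you insert-only semantics):
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "create": { "_index": "twitter", "_id": "1" } }
{ "user": "kimchy", "message": "trying out Elasticsearch" }
'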
