Bulk API error while indexing data into Elasticsearch

I want to import some data into Elasticsearch using the bulk API. This is the mapping I have created using the Kibana Dev Tools:
PUT /main-news-test-data
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "title": {
        "type": "text"
      },
      "lead": {
        "type": "text"
      },
      "agency": {
        "type": "keyword"
      },
      "date_created": {
        "type": "date"
      },
      "url": {
        "type": "keyword"
      },
      "image": {
        "type": "keyword"
      },
      "category": {
        "type": "keyword"
      },
      "id": {
        "type": "keyword"
      }
    }
  }
}
And this is my bulk data:
{ "index" : { "_index" : "main-news-test-data", "_id" : "1" } }
{
  "content": "\u0641\u0647\u06cc\u0645\u0647 \u062d\u0633\u0646\u200c\u0645\u06cc\u0631\u06cc: \u0627\u06af\u0631\u0686\u0647 \u062f\u0631 \u0647\u06cc\u0627\u0647\u0648\u06cc ",
  "title": "\u06a9\u0627\u0631\u0647\u0627\u06cc \u0642\u0627\u0644\u06cc\u0628\u0627\u0641",
  "lead": "\u062c\u0627\u0645\u0639\u0647 > \u0634\u0647\u0631\u06cc -.",
  "agency": "13",
  "date_created": 1494518193,
  "url": "http://www.khabaronline.ir/(X(1)S(bud4wg3ebzbxv51mj45iwjtp))/detail/663749/society/urban",
  "image": "uploads/2017/05/11/1589793661.jpg",
  "category": "15",
  "id": "2981643"
}
{ "index" : { "_index" : "main-news-test-data", "_id" : "2" } }
{
....
but when I want to post data I receive this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Malformed action/metadata line [3], expected START_OBJECT but found [VALUE_STRING]"
      }
    ]
  },
  "status" : 400
}
What is the problem? I tried both PowerShell and the POST method in Kibana Dev Tools, but I receive the same error in both.

Each document should be specified on a single line, like this:
{ "index" : { "_index" : "main-news-test-data", "_id" : "1" } }
{ "content":"\u0641\u0647","title":"\u06a9" }
Please refer to this SO answer.
Try the format of bulk JSON shown below. I have tested this bulk API request locally as well, and it works fine:
{ "index" : { "_index" : "main-news-test-data", "_id" : "1" } }
{"content":"\u0641\u0647\u06cc\u0645\u0647 \u062d\u0633\u0646\u200c\u0645\u06cc\u0631\u06cc: \u0627\u06af\u0631\u0686\u0647 \u062f\u0631 \u0647\u06cc\u0627\u0647\u0648\u06cc ", "title":"\u06a9\u0627\u0631\u0647\u0627\u06cc \u0642\u0627\u0644\u06cc\u0628\u0627\u0641", "lead":"\u062c\u0627\u0645\u0639\u0647 > \u0634\u0647\u0631\u06cc -.", "agency":"13", "date_created":1494518193, "url":"http://www.khabaronline.ir/(X(1)S(bud4wg3ebzbxv51mj45iwjtp))/detail/663749/society/urban", "image":"uploads/2017/05/11/1589793661.jpg", "category":"15", "id":"2981643"}
Don't forget to add a newline at the end of your content.
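If you are building the bulk body programmatically, here is a minimal Python sketch (assuming Elasticsearch on localhost:9200; the document fields are samples) that keeps each document on a single line and appends the required trailing newline:

import json
import requests

docs = [
    {"agency": "13", "category": "15", "id": "2981643"},  # sample fields only
    {"agency": "13", "category": "15", "id": "2981644"},
]

lines = []
for i, doc in enumerate(docs, start=1):
    # Each action/metadata line and each document must be exactly one line of JSON.
    lines.append(json.dumps({"index": {"_index": "main-news-test-data", "_id": str(i)}}))
    lines.append(json.dumps(doc))

body = "\n".join(lines) + "\n"  # the bulk body must end with a newline

r = requests.post(
    "http://localhost:9200/_bulk",
    data=body.encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},
)
print(r.json())

json.dumps never emits literal newlines, so each document is guaranteed to stay on one line.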

Related

Elasticsearch | Mapping exclude field with bulk API

I am using the bulk API to create an index and store data fields. I also want to set a mapping to exclude a field "field1" from the source. I know this can be done using the create index API (reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html), but I am using the bulk API. Below is a sample API call:
POST _bulk
{ "index" : { "_index" : "test", _type = 'testType', "_id" : "1" } }
{ "field1" : "value1" }
Is there a way to add mapping settings while bulk indexing, similar to the code below?
{ "index" : { "_index" : "test", _type = 'testType', "_id" : "1" },
"mappings": {
"_source": {
"excludes": [
"field1"
]
}
}
}
{ "field1" : "value1" }
How can I do the mapping with the bulk API?
It is not possible to define the mapping for a new index while using the bulk API. You have to create your index beforehand and define the mapping then, or you have to define an index template and use an index name in your bulk request that triggers that template.
The following example code can be executed via the Dev Tools window in Kibana:
PUT /_index_template/mytemplate
{
  "index_patterns": [
    "te*"
  ],
  "priority": 1,
  "template": {
    "mappings": {
      "_source": {
        "excludes": [
          "testexclude"
        ]
      },
      "properties": {
        "testfield": {
          "type": "keyword"
        }
      }
    }
  }
}
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "testfield" : "value1", "defaultField" : "asdf", "testexclude": "this shouldn't be in source" }
GET /test/_mapping
You can see from the response that the index template was applied to the new test index: testfield has only the keyword type, and the _source excludes setting comes from the template.
{
  "test" : {
    "mappings" : {
      "_source" : {
        "excludes" : [
          "testexclude"
        ]
      },
      "properties" : {
        "defaultField" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "testexclude" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "testfield" : {
          "type" : "keyword"
        }
      }
    }
  }
}
Also, the document is returned without the excluded field:
GET /test/_doc/1
Response:
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "defaultField" : "asdf",
    "testfield" : "value1"
  }
}
Hope this answers your question and solves your use-case.
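For anyone scripting this instead of using Kibana, here is a rough Python equivalent of the flow above, as a sketch (assuming Elasticsearch on localhost:9200):

import json
import requests

BASE = "http://localhost:9200"

# Create the index template first; any new index matching "te*" picks it up.
template = {
    "index_patterns": ["te*"],
    "priority": 1,
    "template": {
        "mappings": {
            "_source": {"excludes": ["testexclude"]},
            "properties": {"testfield": {"type": "keyword"}},
        }
    },
}
requests.put(BASE + "/_index_template/mytemplate", json=template)

# Bulk-index into an index name that matches the pattern; the template
# supplies the mapping and the _source excludes.
bulk = (
    json.dumps({"index": {"_index": "test", "_id": "1"}}) + "\n"
    + json.dumps({"testfield": "value1", "testexclude": "hidden"}) + "\n"
)
requests.post(BASE + "/_bulk", data=bulk,
              headers={"Content-Type": "application/x-ndjson"})

print(requests.get(BASE + "/test/_mapping").json())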

Elasticsearch - Can't search using suggestion field (“is not a completion suggest field”)

I'm completely new to Elasticsearch, and I'm trying to use the Elasticsearch completion suggester on an existing field called "identity.full_name" (index = "search", type = "person").
I followed the steps below to change the mapping of the field.
1)
POST /search/_close
2)
POST search/person/_mapping
{
  "person": {
    "properties": {
      "identity.full_name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
3)
POST /search/_open
When I check the mappings at this point, using
GET search/_mapping/person/field/identity.full_name
I get the following result:
{
  "search": {
    "mappings": {
      "person": {
        "identity.full_name": {
          "full_name": "identity.full_name",
          "mapping": {
            "full_name": {
              "type": "text",
              "fields": {
                "completion": {
                  "type": "completion",
                  "analyzer": "simple",
                  "preserve_separators": true,
                  "preserve_position_increments": true,
                  "max_input_length": 50
                },
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                },
                "suggest": {
                  "type": "completion",
                  "analyzer": "simple",
                  "preserve_separators": true,
                  "preserve_position_increments": true,
                  "max_input_length": 50
                }
              }
            }
          }
        }
      }
    }
  }
}
This suggests that the field has been updated to be a completion field.
However, when I'm querying to check if this works using,
GET search/person/_search
{
  "suggest": {
    "person-suggest" : {
      "prefix" : "EMANNUEL",
      "completion" : {
        "field" : "identity.full_name"
      }
    }
  }
}
It is giving me the error "Field [identity.full_name] is not a completion suggest field"
I'm not sure why I'm getting this error. Is there anything else I can try?
Sample data:
{
  "_index": "search",
  "_type": "person",
  "_id": "3106105149",
  "_score": 1,
  "_source": {
    "identity": {
      "id": "3106105149",
      "first_name": "FLORENT",
      "last_name": "TEBOUL",
      "full_name": "FLORENT TEBOUL"
    }
  }
}
{
  "_index": "search",
  "_type": "person",
  "_id": "125296353",
  "_score": 1,
  "_source": {
    "identity": {
      "id": "125296353",
      "first_name": "CHRISTINA",
      "last_name": "BHAN",
      "full_name": "CHRISTINA K BHAN"
    }
  }
}
So when I do a GET based on the prefix "CHRISTINA":
GET search/person/_search
{
  "suggest": {
    "person-suggest" : {
      "prefix" : "CHRISTINA",
      "completion" : {
        "field" : "identity.full_name.suggest"
      }
    }
  }
}
I'm getting all the results like a match_all query.
You should use it like this:
GET search/person/_search
{
  "suggest": {
    "person-suggest" : {
      "prefix" : "EMANNUEL",
      "completion" : {
        "field" : "identity.full_name.suggest"
      }
    }
  }
}
The mapping for GET search/_mapping/person/field/identity.full_name should look like this:
{
  "search" : {
    "mappings" : {
      "person" : {
        "identity.full_name" : {
          "full_name" : "identity.full_name",
          "mapping" : {
            "full_name" : {
              "type" : "text",
              "fields" : {
                "suggest" : {
                  "type" : "completion",
                  "analyzer" : "simple",
                  "preserve_separators" : true,
                  "preserve_position_increments" : true,
                  "max_input_length" : 50
                }
              }
            }
          }
        }
      }
    }
  }
}
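The same suggest query from Python, as a sketch (assuming the index above exists on localhost:9200):

import requests

query = {
    "suggest": {
        "person-suggest": {
            "prefix": "EMANNUEL",
            # The completion type lives on the "suggest" sub-field,
            # not on identity.full_name itself.
            "completion": {"field": "identity.full_name.suggest"},
        }
    }
}
r = requests.post("http://localhost:9200/search/person/_search", json=query)
for option in r.json()["suggest"]["person-suggest"][0]["options"]:
    print(option["text"])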

ElasticSearch => How to update with a partial document using update_by_query

I want to update the data in my index whose cname is wang.
My index code is as follows:
PUT index_c
{
  "mappings": {
    "_doc" : {
      "properties" : {
        "cid" : {
          "type" : "keyword"
        },
        "cname" : {
          "type" : "keyword"
        },
        "cage" : {
          "type" : "short"
        },
        "chome" : {
          "type" : "text"
        }
      }
    }
  }
}
And my update request is as follows:
POST index_c/_update_by_query
{
  "query" : {
    "match": {
      "cname": "wang"
    }
  },
  "doc" : {
    "cage" : "100",
    "chome" : "china"
  }
}
But I got an error like this:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a START_OBJECT in [doc].",
        "line": 1,
        "col": 43
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a START_OBJECT in [doc].",
    "line": 1,
    "col": 43
  },
  "status": 400
}
So I want to know how to implement this using update_by_query.
I think this will work for you; just replace the doc part with a script. If inline shows as deprecated for you, use source instead:
POST index_c/_update_by_query
{
  "query" : {
    "match": {
      "cname": "wang"
    }
  },
  "script" : {
    "inline" : "ctx._source.cage = '100'; ctx._source.chome = 'china';",
    "lang" : "painless"
  }
}
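If you are calling this from code, here is the same request as a Python sketch (assuming Elasticsearch on localhost:9200), using source and script params instead of hardcoded values:

import requests

body = {
    "query": {"match": {"cname": "wang"}},
    "script": {
        # "source" replaces the deprecated "inline" key.
        "source": "ctx._source.cage = params.cage; ctx._source.chome = params.chome;",
        "lang": "painless",
        "params": {"cage": "100", "chome": "china"},
    },
}
r = requests.post("http://localhost:9200/index_c/_update_by_query", json=body)
print(r.json())

Using params lets Elasticsearch reuse the compiled script across calls with different values.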

How to speed up document append to existing array in elasticsearch?

I am using elasticsearch version 6.3.1.
I am creating a nested field; I use this field to append all documents with the same ID.
Here is my index schema:
curl -XPUT 'localhost:9200/axes_index_test12?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "axes_type_test12": {
      "properties": {
        "totalData": {
          "type": "nested",
          "properties": {
            "gpsdt": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "extbatlevel": {
              "type": "integer"
            },
            "intbatlevel": {
              "type": "integer"
            },
            "lastgpsdt": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "satno": {
              "type": "integer"
            },
            "srtangle": {
              "type": "integer"
            }
          }
        },
        "imei": {
          "type": "long"
        },
        "date": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "id": {
          "type": "long"
        }
      }
    }
  }
}'
To append to the existing array, I call the following API. Here is the document that I append:
data = {
    "script": {
        "source": "ctx._source.totalData.add(params.count)",
        "lang": "painless",
        "params": {
            "count": {
                "gpsdt": gpsdt,
                "analog1": analog1,
                "analog2": analog2,
                "analog3": analog3,
                "analog4": analog4,
                "digital1": digital1,
                "digital2": digital2,
                "digital3": digital3,
                "digital4": digital4,
                "extbatlevel": extbatlevel,
                "intbatlevel": intbatlevel,
                "lastgpsdt": lastgpsdt,
                "latitude": latitude,
                "longitude": longitude,
                "odo": odo,
                "odometer": odometer,
                "satno": satno,
                "srtangle": srtangle,
                "speed": speed
            }
        }
    }
}
Document parsing:
json_data = json.dumps(data)
And the API URL is:
API_ENDPOINT = "http://localhost:9200/axes_index_test12/axes_type_test12/"+str(documentId)+"/_update"
And finally I call this API:
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
r = requests.post(url=API_ENDPOINT, data=json_data, headers=headers)
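For reference, here are the snippets above consolidated into one runnable sketch (documentId and the count values are placeholders):

import json
import requests

documentId = 12345  # placeholder document ID
count = {"gpsdt": "2019-01-01T00:00:00", "satno": 8}  # placeholder field values

data = {
    "script": {
        "source": "ctx._source.totalData.add(params.count)",
        "lang": "painless",
        "params": {"count": count},
    }
}

API_ENDPOINT = ("http://localhost:9200/axes_index_test12/axes_type_test12/"
                + str(documentId) + "/_update")
headers = {"Content-Type": "application/json"}
r = requests.post(url=API_ENDPOINT, data=json.dumps(data), headers=headers)
print(r.status_code, r.text)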
Everything works fine, but I am not getting good performance when I append new documents to the existing array.
So please suggest what changes I should make.
I have a 4-node cluster: 1 master, 2 data nodes, and 1 coordinator node.

Elasticsearch data model

I'm currently parsing text from internal résumés at my company. The goal is to index everything in Elasticsearch to perform searches on them.
For the moment I have the following JSON document, with no mapping defined.
Each coworker has a list of projects with the client name:
{
  "name": "Jean Wisser",
  "position": "Junior Developer",
  "projects": [
    {
      "client": "SutrixMedia",
      "missions": [
        "Responsible for the quality on time and within budget",
        "Writing specs, testing,..."
      ],
      "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
    },
    {
      "client": "Société Générale",
      "missions": [
        "Writing test cases and scenarios",
        "UAT"
      ],
      "technologies": "HP QTP/QC"
    }
  ]
}
The two main questions we would like to answer are:
Which coworker has already worked for this company?
Which client uses this technology?
The first question is really easy to answer; for example:
Projects.client="SutrixMedia" returns the right résumé.
But how can I answer the second one?
I would like to make a query like this: Projects.technologies="HP QTP/QC", and the answer would be only the client name ("Société Générale" in this case) and NOT the entire document.
Is it possible to get this answer by defining a mapping with nested type ?
Or should I go for a parent/child mapping ?
Yes, indeed, that's possible with ES 1.5.* if you map projects as a nested type and then retrieve nested inner_hits.
So here goes the mapping for your sample document above:
curl -XPUT localhost:9200/resumes -d '
{
  "mappings": {
    "resume": {
      "properties": {
        "name": {
          "type": "string"
        },
        "position": {
          "type": "string"
        },
        "projects": {
          "type": "nested",          <--- declare "projects" as nested type
          "properties": {
            "client": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "missions": {
              "type": "string"
            },
            "technologies": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}'
Then, you can index your sample document from above:
curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'
Finally, with the following query, which only retrieves the nested inner_hits, you can fetch only the nested object that matches Projects.technologies="HP QTP/QC":
curl -XPOST localhost:9200/resumes/resume/_search -d '
{
  "_source": false,
  "query": {
    "nested": {
      "path": "projects",
      "query": {
        "term": {
          "projects.technologies.raw": "HP QTP/QC"
        }
      },
      "inner_hits": {               <----- only retrieve the matching nested document
        "_source": "client"         <----- and only the "client" field
      }
    }
  }
}'
which yields only the client name instead of the whole matching document:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "resumes",
      "_type" : "resume",
      "_id" : "1",
      "_score" : 1.4054651,
      "inner_hits" : {
        "projects" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.4054651,
            "hits" : [ {
              "_index" : "resumes",
              "_type" : "resume",
              "_id" : "1",
              "_nested" : {
                "field" : "projects",
                "offset" : 1
              },
              "_score" : 1.4054651,
              "_source" : {"client" : "Société Générale"}   <--- here is the client name
            } ]
          }
        }
      }
    } ]
  }
}
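The same nested inner_hits search from Python, as a sketch against the resumes index (assuming localhost:9200):

import requests

query = {
    "_source": False,  # don't return the whole matching document
    "query": {
        "nested": {
            "path": "projects",
            "query": {"term": {"projects.technologies.raw": "HP QTP/QC"}},
            # Only return the matching nested object, and only its "client" field.
            "inner_hits": {"_source": "client"},
        }
    },
}
r = requests.post("http://localhost:9200/resumes/resume/_search", json=query)
for hit in r.json()["hits"]["hits"]:
    for nested in hit["inner_hits"]["projects"]["hits"]["hits"]:
        print(nested["_source"]["client"])  # e.g. "Société Générale"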
