I am new to Elasticsearch.
I am trying to upload my existing MySQL data to Elasticsearch. The Elasticsearch bulk import uses JSON as the data format, so I converted my data to JSON.
employee.json:
[{"EmpId":"101", "Name":"John Doe", "Dept":"IT"}
{"EmpId":"102", "Name":"FooBar", "Dept":"HR"}]
But I am not able to upload my data using the following curl command:
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @employee.json
I get a parsing exception message.
After reading the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), I realized that the data format should be something like this:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
I am still not sure how to format my data this way and perform the upload.
Basically, I want to know the exact data format expected by the Elasticsearch bulk upload, and I would also like to know whether my curl command is correct.
Your data should be in this form:
// if you want to use EmpId as the document id, specify _id as below; otherwise leave out the _id part
{ "index" : { "_index" : "index_name", "_type" : "type_name", "_id" : "101" } }
{"EmpId":"101", "Name":"John Doe", "Dept":"IT"}
{ "index" : { "_index" : "index_name", "_type" : "type_name", "_id" : "102" } }
{"EmpId":"102", "Name":"FooBar", "Dept":"HR"}
....
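Once the file is in that newline-delimited form (make sure it ends with a trailing newline), the upload itself is only a small variation of your original command; the @ before the file name matters, and newer Elasticsearch versions also require the Content-Type header shown here:
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/x-ndjson' --data-binary @employee.json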
Or you can use Logstash: https://www.elastic.co/blog/logstash-jdbc-input-plugin
From the docs:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
So you would probably want each entry in your file to read something like this (note that with the update action the partial document has to be wrapped in a "doc" object):
{ "update" : {"_id" : "101", "_type" : "foo", "_index" : "bar"} }
{ "doc" : {"EmpId":"101", "Name":"John Doe", "Dept":"IT"} }
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
Related
I want to update the count field in the following doc, for example. Please help.
{
  "_index" : "test-object",
  "_type" : "data",
  "_id" : "2.5.179963",
  "_score" : 10.039009,
  "_source" : {
    "object_id" : "2.5.179963",
    "block_time" : "2022-04-09T13:16:32",
    "block_number" : 46975476,
    "parent" : "1.2.162932",
    "field_type" : "1.3.2",
    "count" : 57000,
    "maintenance_flag" : false
  }
}
You can simply use the Update API as:
POST <your-index>/_update/<your-doc-id>
{
  "doc": {
    "count": "" // provide the value you want to update to
  }
}
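As a concrete sketch for the document shown in the question (assuming a recent Elasticsearch version with the typeless _update endpoint; the new count value 58000 is just an illustrative placeholder):
POST test-object/_update/2.5.179963
{
  "doc": {
    "count": 58000
  }
}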
I am using the Attachment Processor in a pipeline.
Everything works fine, but I wanted to do multiple posts, so I tried to use the bulk API.
Bulk works fine too, but I can't find how to send the URL parameter "pipeline=attachment".
This single-document request works:
POST testindex/type1/1?pipeline=attachment
{
  "data": "Y291Y291",
  "name" : "Marc",
  "age" : 23
}
This bulk request works:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2" } }
{ "name" : "jean", "age" : 22 }
But how can I index Marc with his data field in a bulk request so that it is processed by the pipeline?
Thanks to Val's comment, I did the following and it works fine:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2", "pipeline": "attachment"} } }
{"data": "Y291Y291", "name" : "jean", "age" : 22}
The documentation for Elasticsearch 5.5 offers no examples of how to use the bulk operation to index documents into the default mapping of an index. It also gives no indication of why this is not possible, unless I'm missing that somewhere else in the documentation.
The ES 5.5 documentation gives one explicit example of bulk indexing:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
But it also says that:
The endpoints are /_bulk, /{index}/_bulk, and {index}/{type}/_bulk. When the index or the index/type are provided, they will be used by default on bulk items that don't provide them explicitly.
So the middle endpoint is valid, which implies to me that either a) you have to explicitly provide a type in the metadata for each document indexed, or b) you can index documents into the default mapping ("_default_").
But I can't get this to work.
I've tried the /myindex/_bulk endpoint with no type specified in the metadata.
I've tried it with "_type": "_default_" specified.
I've tried /myindex/_default_/_bulk.
This has nothing to do with the _default_ mapping. This is about falling back to the default type that you specify in the URL. You can do the following:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
However, the following snippet is exactly equivalent:
POST /test/type1/_bulk
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
And you can mix the two:
POST foo/bar/_bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
In this example, one document would be indexed into foo and one into test.
Hope this makes sense.
Here is my problem: I'm trying to insert a bunch of data into Elasticsearch and visualize it using Kibana; however, I have an issue with Kibana's timestamp recognition.
My time field is called "dateStart", and I tried to use it as a timestamp using the following command:
curl -XPUT 'localhost:9200/test/type1/_mapping' -d'{ "type1" :{"_timestamp":{"enabled":true, "format":"yyyy-MM-dd HH:mm:ss","path":"dateStart"}}}'
But this command gives me the following error message:
{"error":"MergeMappingException[Merge failed with failures {[Cannot update path in _timestamp value. Value is null path in merged mapping is missing]}]","status":400}
I'm not sure I understand what this command does, but what I would like is to tell Elasticsearch and Kibana to use my "dateStart" field as a timestamp.
Here is a sample of my insert file (I use a bulk insert):
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1"} }
{ "dateStart" : "15-03-31 06:00:00", "score":0.9920092243874442}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2"} }
{ "dateStart" : "15-03-23 06:00:00", "score":0.0}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3"} }
{ "dateStart" : "15-03-29 12:00:00", "score":0.0}
The following is the sample data I have used in my environment.
Data:
{ "index" : { "_index" : "cases", "_type" : "case", "_id" : "101" } }
{ "admission" : "2015-01-03", "discharge" : "2015-01-04", "injury" : "broken arm" }
{ "index" : { "_index" : "cases", "_type" : "case", "_id" : "102" } }
{ "admission" : "2015-01-03", "discharge" : "2015-01-06", "injury" : "broken leg" }
{ "index" : { "_index" : "cases", "_type" : "case", "_id" : "103" } }
{ "admission" : "2015-01-06", "discharge" : "2015-01-07", "injury" : "broken nose" }
{ "index" : { "_index" : "cases", "_type" : "case", "_id" : "104" } }
{ "admission" : "2015-01-07", "discharge" : "2015-01-07", "injury" : "bruised arm" }
{ "index" : { "_index" : "cases", "_type" : "case", "_id" : "105" } }
{ "admission" : "2015-01-08", "discharge" : "2015-01-10", "injury" : "broken arm" }
{ "index" : { "_index" : "patients", "_type" : "patient", "_id" : "101" } }
{ "name" : "Adam", "age" : 28 }
{ "index" : { "_index" : "patients", "_type" : "patient", "_id" : "102" } }
{ "name" : "Bob", "age" : 45 }
{ "index" : { "_index" : "patients", "_type" : "patient", "_id" : "103" } }
{ "name" : "Carol", "age" : 34 }
{ "index" : { "_index" : "patients", "_type" : "patient", "_id" : "104" } }
{ "name" : "David", "age" : 14 }
{ "index" : { "_index" : "patients", "_type" : "patient", "_id" : "105" } }
{ "name" : "Eddie", "age" : 72 }
I indexed the data into the node:
$ curl -X POST 'http://localhost:9200/_bulk' --data-binary @./hospital.json
[2015-02-12 08:18:01,347][INFO ][shield.license ] [node0] enabling license for [shield]
[2015-02-12 08:18:01,347][INFO ][license.plugin.core ] [node0] license for [shield] - valid
[2015-02-12 08:18:01,355][ERROR][shield.license ] [node0]
#
# Shield license will expire on [Saturday, March 14, 2015]. Cluster health, cluster stats and indices stats operations are
# blocked on Shield license expiration. All data operations (read and write) continue to work. If you
# have a new license, please update it. Otherwise, please reach out to your support contact.
#
I installed Shield and started the node as shown above.
The data is protected, and I see the following if I try to access it:
$ curl localhost:9200/cases/case/101?pretty=true
{
  "error" : "AuthenticationException[missing authentication token for REST request [/cases/case/1]]",
  "status" : 401
}
Users and roles were added like below:
$ elasticsearch/bin/shield/esusers useradd alice -r nurse
$ elasticsearch/bin/shield/esusers useradd bob -r doctor
I have edited roles.yml and tried to add the doctor and nurse roles according to the example mentioned above, but security is not working for me.
ubuntu@ip-10-142-247-183:~/elkproject/elasticsearch-1.4.4/config/shield$ curl --user alice:abc123 localhost:9200/_count?pretty=true
{
  "error" : "AuthenticationException[unable to authenticate user [alice] for REST request [/_count?pretty=true]]",
  "status" : 401
}
Note: I referred to this blog: http://blog.trifork.com/2015/03/05/shield-your-kibana-dashboards/
Any help would be highly appreciated.
Did you install Elasticsearch from a package (like an RPM or DEB)? If so, there may be an issue with the esusers tool putting the users in the wrong place. Right now, you have to configure your environment with the right location and then add the users. If this is the case, you can move the $ES_HOME/config/shield directory to /etc/elasticsearch, which is the default configuration directory for RPM and DEB installations. When using the esusers commands in the future, just make sure the environment variables are set as shown in the link.
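For example, on a DEB or RPM install the move could look roughly like this (the paths are illustrative, assuming Elasticsearch's home is /usr/share/elasticsearch; adjust them to your installation):
sudo mv /usr/share/elasticsearch/config/shield /etc/elasticsearch/shield
sudo chown -R elasticsearch:elasticsearch /etc/elasticsearch/shield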
You can also remove Shield and start the install over, following the full getting started guide, and then modify the files as mentioned in the blog. To remove the existing Shield install: bin/plugin -r shield
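For reference, a reinstall along the lines of the Shield 1.x getting started guide typically looked like the commands below; treat them as a sketch and verify the exact syntax against the guide for your version:
bin/plugin -r shield
bin/plugin -i elasticsearch/license/latest
bin/plugin -i elasticsearch/shield/latest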