How to add documents to an existing index in Elasticsearch

I am using Elasticsearch 1.4. My requirement is that I will receive new data every hour and it needs to be uploaded. The approach I have taken is to create an index, "demo", and upload the data, so the first hour's data gets inserted. Now, my question is how to append the subsequent hours' data into this index.
PUT /demo/userdetails/1
{
"user" : "kimchy",
"message" : "trying out Elastic Search"
}
Now I am trying to add another document:
{"user": "swarna","message":"hi"}

You simply need to PUT the additional documents. In your example above you did
PUT /demo/userdetails/1 { "user" : "kimchy", "message" : "trying out Elastic Search" }
Now you would do this:
PUT /demo/userdetails/2 {"user": "swarna","message":"hi"}
In your command, demo is the index, userdetails is the type, and the number is the document id. If you omit the document id, ES will generate one for you.
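If you would rather let Elasticsearch generate the id, a POST to the type works; a minimal sketch, reusing the index and type from the question:
POST /demo/userdetails
{
"user" : "swarna",
"message" : "hi"
}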

Related

How to create a map chart with GeoIP mapping?

I'm fairly new to ELK (7.10), and I would like to know how to create a map chart using GeoIP mapping.
I already have logs parsed and one field is "remote_ip" which I want to view on a map chart.
I've seen lots of instructions on how to do this but most are out of date and do not apply to my version which is 7.10. I'm using filebeats/logstash/kibana/elasticsearch.
Could someone show me the high level steps required to do this? Or point me to a detailed guide appropriate to my version? I have no idea how to begin.
I'm assuming those IP addresses are public, so they can be geocoded. Since your logs are already indexed, you now need to geocode the existing documents. Here is how to do it.
First, you need to modify your mapping to add a geo_point field, like this:
PUT your-index/_mapping
{
"properties": {
"remote_location": {
"type": "geo_point"
}
}
}
Once you've added that new field to your mapping, you can update your index to geocode the IP addresses. For that, you first need to create an ingest pipeline with the geoip processor:
PUT _ingest/pipeline/geoip
{
"description" : "Geocode IP address",
"processors" : [
{
"geoip" : {
"field" : "remote_ip",
"target_field": "remote_location"
}
}
]
}
Once this ingest pipeline is created you can use it to update your index using the _update_by_query endpoint like this:
POST your-index/_update_by_query?pipeline=geoip
Once the update is over, you can go into Kibana, create an index pattern and then go to Analytics > Maps and create your map.
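As a quick sanity check before building the map, a search for documents that now carry the new field should return hits; a minimal sketch, using the field name from the mapping above:
GET your-index/_search
{
"query": {
"exists": {
"field": "remote_location"
}
}
}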

How to implement ElasticSearch new index creation after every fixed number of days?

How can I implement new index creation in Elasticsearch after every fixed number of days, and if that's possible, how can I search over all the previous indices? Currently we have only one index which holds all the data. I looked at the Rollover API of ES; is this the correct way? The problem seems to be that when we want to search for some data in the previous indices, how can this be done? Any answers are appreciated, thanks.
Yes, you are on the correct path. For searching across your old indices, you can link multiple indices to one alias using the aliases API, and instead of searching a single index, you search against the unified alias.
Refer to this official example on how to link multiple indices to the same alias (alias1 in the example below):
POST /_aliases
{
"actions" : [
{ "add" : { "index" : "test1", "alias" : "alias1" } },
{ "add" : { "index" : "test2", "alias" : "alias1" } }
]
}

Update a document using a field other than _id in ElasticSearch

I would like to do a partial update of a document in ElasticSearch 2.3. The documentation shows:
POST /website/blog/1/_update
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
Is there a way to update a document using a field other than the _id (here 1) to identify it?
Use the _update_by_query API and run a query which selects the documents that match the other field you want. Basically, with that query you identify the documents to update following your own rules.
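As a rough sketch, assuming a hypothetical slug field identifies the post and you want to reset its view counter (the script key is inline in 2.x and source in recent versions):
POST /website/_update_by_query
{
"query": {
"term": { "slug": "my-first-post" }
},
"script": {
"inline": "ctx._source.views = 0"
}
}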

Can I create a document with the update API if the document doesn't exist yet

I have a very simple question:
I want to update multiple documents in Elasticsearch. Sometimes the document already exists, but sometimes it doesn't. I don't want to use a get request to check for the existence of the document (this decreases my performance). I want my update request to index the document directly if it doesn't exist yet.
I know that we can use upsert to create a non-existing field when updating a document, but this is not what I want. I want to index the whole document if it doesn't exist. I don't know if upsert can do this.
Can you provide me some explanation?
Thanks in advance!
This is doable using the update api. It does require that you define the id of each document, since the update api requires the id of the document to determine its presence.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want using the update api with the following api call.
POST /cars/car/3/_update
{
"doc": {
"color" : "brown",
"brand" : "ford"
},
"doc_as_upsert" : true
}
This api call will add the document to the index since it does not exist.
Running the call a second time after changing the color of the car will update the document instead of creating a new one.
POST /cars/car/3/_update
{
"doc": {
"color" : "black",
"brand" : "ford"
},
"doc_as_upsert" : true
}
AFAIK when you index the documents (with a PUT call), the existing version gets replaced with the newer version. If the document did not exist, it gets created. There is no need to make a distinction between INSERT and UPDATE in ElasticSearch.
UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
For the bulk API, queue up pairs of an action line and a document line:
// action/metadata entry: update the document with this id
bulks.push({
  update: {
    _index: 'index',
    _type: 'type',
    _id: id
  }
});
// document entry: the partial doc, created via upsert if the id does not exist yet
bulks.push({ "doc_as_upsert": true, "doc": your_doc });
As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback.
after_commit on: :update do
begin
__elasticsearch__.update_document
rescue Elasticsearch::Transport::Transport::Errors::NotFound
__elasticsearch__.index_document
end
end
I think you want the "create" action.
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create
(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update
(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc
(Optional, object) The partial document to index. Required for update operations.
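For instance, a bulk body along these lines (reusing the cars index from the earlier example, with hypothetical ids) mixes an upsert-style update with a create:
POST /_bulk
{ "update": { "_index": "cars", "_type": "car", "_id": "4" } }
{ "doc": { "color": "red", "brand": "honda" }, "doc_as_upsert": true }
{ "create": { "_index": "cars", "_type": "car", "_id": "5" } }
{ "color": "green", "brand": "audi" }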

Index huge data into Elasticsearch

I am new to Elasticsearch and have a huge amount of data (more than 16k rows in a MySQL table). I need to push this data into Elasticsearch and am facing problems indexing it.
Is there a way to make indexing data faster? How to deal with huge data?
Expanding on the Bulk API
You will make a POST request to the /_bulk endpoint.
Your payload will follow this format, where \n is the newline character:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
...
Make sure your JSON is not pretty-printed.
The available actions are index, create, update and delete.
Bulk Load Example
To answer your question: if you just want to bulk load data into your index, it looks like this.
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
The first line contains the action and metadata. In this case, we are calling create. We will be inserting a document of type type1 into the index named test with a manually assigned id of 3 (instead of elasticsearch auto-generating one).
The second line contains all the fields in your mapping, which in this example is just field1 with a value of value3.
You just concatenate as many of these as you'd like to insert into your index.
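To actually send the payload, something along these lines should work, assuming the bulk body is saved in a file named requests (the file must end with a newline, and recent versions also expect a Content-Type: application/x-ndjson header):
curl -s -XPOST 'localhost:9200/_bulk' --data-binary "@requests"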
This may be an old thread, but I thought I would comment anyway for anyone looking for a solution to this problem. The JDBC river plugin for Elasticsearch is very useful for importing data from a wide array of supported DBs.
Link to the JDBC River source here.
Using Git Bash's curl command, I PUT the following configuration document to allow for communication between the ES instance and the MySQL instance:
curl -XPUT 'localhost:9200/_river/uber/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"strategy" : "simple",
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://localhost:3306/elastic",
"user" : "root",
"password" : "root",
"sql" : "select * from tbl_indexed",
"poll" : "24h",
"max_retries": 3,
"max_retries_wait" : "10s"
},
"index": {
"index": "uber",
"type" : "uber",
"bulk_size" : 100
}
}'
Ensure you have the mysql-connector-java-VERSION-bin JAR in the river-jdbc plugin directory, which contains jdbc-river's necessary JAR files.
Try the bulk API:
http://www.elasticsearch.org/guide/reference/api/bulk.html
