Copy an Elasticsearch index with Logstash

I have a ready-built Apache index on one machine that I would like to clone to another machine using Logstash. Fairly easy, I thought:
input {
elasticsearch {
host => "xxx.xxx.xxx.xxx"
index => "logs"
}
}
filter {
}
output {
elasticsearch {
cluster => "Loa"
host => "127.0.0.1"
protocol => http
index => "logs"
index_type => "apache_access"
}
}
That pulls over the docs, but it doesn't stop, as it uses the default query "*" (the original index has ~50,000 docs, and I killed the script when the new index was over 600,000 docs and rising).
Next I tried to make sure the docs would get updated instead of duplicated, but this commit hasn't made it in yet, so I don't have a primary key.
Then I remembered sincedb, but I don't seem to be able to use that in the query (or is that possible?).
Any advice? Maybe a completely different approach? Thanks a lot!

Assuming that the elasticsearch input creates a Logstash event with the document id (I assume it will be _id or something similar), try setting up the elasticsearch output as follows:
output {
elasticsearch {
cluster => "Loa"
host => "127.0.0.1"
protocol => http
index => "logs"
index_type => "apache_access"
document_id => "%{_id}"
}
}
That way, even if the elasticsearch input, for whatever reason, keeps pushing the same documents indefinitely, Elasticsearch will merely update the existing documents instead of creating new documents with new ids.
Once you reach 50,000, you can stop.
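A rough way to see why this helps is to model the target index as a map keyed by document id. The sketch below is a simplified Python simulation, not Logstash or Elasticsearch code. The helper names and the shape of the source documents (`_id`/`_source` pairs) are assumptions for illustration. When the source id is reused, re-reading the source several times only overwrites existing keys. Auto-generated ids keep adding new documents instead.

```python
import itertools
from typing import Dict, List


def copy_with_source_ids(source: List[dict], passes: int) -> Dict[str, dict]:
    """Simulate document_id => "%{_id}": every re-read overwrites the same keys."""
    target: Dict[str, dict] = {}
    for _ in range(passes):
        for doc in source:
            target[doc["_id"]] = doc["_source"]  # same id -> update, not a new doc
    return target


def copy_with_auto_ids(source: List[dict], passes: int) -> Dict[str, dict]:
    """Simulate the default behaviour: every event gets a fresh, unique id."""
    counter = itertools.count()
    target: Dict[str, dict] = {}
    for _ in range(passes):
        for doc in source:
            target[f"auto-{next(counter)}"] = doc["_source"]
    return target


# Hypothetical 5-document source index, read three times over
source_index = [{"_id": str(i), "_source": {"n": i}} for i in range(5)]
print(len(copy_with_source_ids(source_index, passes=3)))  # 5
print(len(copy_with_auto_ids(source_index, passes=3)))    # 15
```

The count with reused ids stays at the size of the source however many passes run. That is why stopping "once you reach 50,000" is safe.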

Related

Collecting logs from different remote servers using just Logstash

Is it possible to send logs from different remote machines to Elasticsearch using just Logstash (no Filebeat)? If so, do I define the same index in the conf.d files on all the machines? I want all the logs to be in the same index.
Would I use logs-%{+YYYY.MM.dd} as the index in all the config files so that everything is indexed into the same index?
input {
file {
path => "/home/ubuntu/logs/data.log"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index =>"logs-%{+YYYY.MM.dd}"
}
}
What you're doing is OK and it will work. The one thing I would change is that you should simply write to a data stream, so you don't have to care about the index name and ILM matters (rollover, retention, etc.), like this:
input {
file {
path => "/home/ubuntu/logs/data.log"
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
data_stream => "true"
data_stream_type => "logs"
data_stream_dataset => "ubuntu"
data_stream_namespace => "prod"
}
}
The data stream name will be logs-ubuntu-prod; you can change the last two parts to your liking.
Make sure to properly set up your data stream first, with an adequate Index Lifecycle Management policy, though.
On a different note, it's a waste of resources to install Logstash, which is meant to work as a centralized streaming engine, on all your remote machines. You should definitely use either Filebeat or, even better now, the Elastic Agent, which is fully manageable through Fleet in Kibana. It's worth a look.
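On the original question: `%{+YYYY.MM.dd}` is expanded per event from that event's `@timestamp`, which Logstash keeps in UTC. Every machine using the same pattern therefore writes a given (UTC) day's events to the same index. The small Python model of that naming below is only an illustration; `daily_index` is a hypothetical helper, not part of Logstash.

```python
from datetime import datetime, timedelta, timezone


def daily_index(prefix: str, ts: datetime) -> str:
    """Mimic "logs-%{+YYYY.MM.dd}": format the (timezone-aware) event time in UTC."""
    utc = ts.astimezone(timezone.utc)
    return f"{prefix}-{utc.strftime('%Y.%m.%d')}"


# Two machines in different time zones logging at the same instant
a = datetime(2024, 5, 1, 23, 30, tzinfo=timezone.utc)
b = datetime(2024, 5, 2, 1, 30, tzinfo=timezone(timedelta(hours=2)))
print(daily_index("logs", a))  # logs-2024.05.01
print(daily_index("logs", b))  # logs-2024.05.01
```

Both events land in the same daily index even though machine b's local date is already the next day.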

I want to delete documents with Logstash, but it throws an exception

I've run into a problem. My Logstash configuration file is as follows:
input {
redis {
host => "127.0.0.1"
port => 6379
db => 10
data_type => "list"
key => "local_tag_del"
}
}
filter {
}
output {
elasticsearch {
action => "delete"
hosts => ["127.0.0.1:9200"]
codec => "json"
index => "mbd-data"
document_type => "localtag"
document_id => "%{album_id}"
}
file {
path => "/data/elasticsearch/result.json"
}
stdout {}
}
I want to read ids from Redis with Logstash and have Elasticsearch delete the matching documents.
Excuse me, my English is poor; I hope someone can help me.
Thanks.
I can't help you much beyond this, because your problem is spelled out in your error message: Logstash couldn't connect to your Elasticsearch instance.
That usually means one of:
elasticsearch isn't running
elasticsearch isn't bound to localhost
That has nothing to do with your Logstash config. Using Logstash to delete documents is a bit unusual, though, so I'm not entirely sure this isn't an XY problem.
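If it isn't obvious which of these applies, check from the Logstash host whether anything accepts TCP connections on the address the output uses (`127.0.0.1:9200` here). That separates "not running / not bound to localhost" from a config problem. The Python sketch below is minimal; `is_listening` is a hypothetical helper. A successful connection only shows that something is listening on that port, not that it is Elasticsearch.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    # Same host/port as hosts => ["127.0.0.1:9200"] in the elasticsearch output
    print(is_listening("127.0.0.1", 9200))
```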

Elasticsearch: assign my own IDs while indexing with Logstash

I am indexing a large corpus of information and I have a string key that I know is unique. I would like to avoid searching and instead access documents directly by this artificial identifier.
Since the path directive is discontinued in ES 1.5, does anyone know a workaround for this problem?
My data look like:
{unique-string},val1, val2, val3...
{unique-string2},val4, val5, val6...
I am using logstash to index the files and would prefer to fetch the documents through a direct get, rather than through an exact-match.
In your elasticsearch output plugin, just specify the document_id setting with a reference to the field you want to use as the id, i.e. the one named 1 in your csv filter.
input {
file {...}
}
filter {
csv{
columns=>["1","2","3"]
separator => ","
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
port => "9200"
index => "index-name"
document_id => "%{1}" # <--- add this line
workers => 2
cluster => "elasticsearch-cluster"
protocol => "http"
}
}
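Conceptually, the csv filter turns each line into fields named after `columns`, and `document_id => "%{1}"` makes the first column the document id. A later lookup is then a direct get by that key rather than an exact-match search. The Python model below is simplified and is not the Logstash csv filter itself. `index_by_first_column` is a hypothetical name, and any values beyond the three named columns are simply dropped in this sketch.

```python
import csv
import io
from typing import Dict

COLUMNS = ["1", "2", "3"]  # same names as the csv filter's columns setting


def index_by_first_column(text: str) -> Dict[str, Dict[str, str]]:
    """Map each CSV row to a document keyed by its column "1" value."""
    docs: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        fields = dict(zip(COLUMNS, row))  # extra values are dropped here
        docs[fields["1"]] = fields  # like document_id => "%{1}"
    return docs


data = "key-a,val1, val2\nkey-b,val4, val5\n"
docs = index_by_first_column(data)
print(docs["key-b"]["3"])  # val5 (direct lookup by id, no search)
```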

Logstash reindex elasticsearch Issue

I'm trying to reindex data by using Elasticsearch as the input and sending it back to Elasticsearch as the output. The script runs fine, but the indexing goes on indefinitely. The script is as follows:
input {
elasticsearch {
host => "10.0.0.11"
index => "logstash-2015.02.05"
}
}
output {
elasticsearch {
host => "10.0.0.11"
protocol => "http"
cluster => "logstash"
node_name => "logindexer"
index => "logstash-2015.02.05_new"
}
}
This means that if I have 200 docs in the logstash-2015.02.05 index, it creates duplicate records in logstash-2015.02.05_new and keeps going until I stop the Logstash agent. Is there a way to make the new index contain exactly the same documents as the old index? Please help.

Way to populate Logstash output variable without getting it from an Input?

Is there another way to tell Logstash to supply a value to an output variable without pulling it from a Logstash input? For example, in my case I'd like to create an Elasticsearch index based on a performance run ID (which I'd do from an external script) and then have Logstash send to that. For now I was thinking of creating a tcp input just for receiving perf run info and then having a filter match on the run id. That seems like a convoluted way to do this, though. For example:
input {
tcp {
type => "perfinfo"
port => 8888
}
}
filter {
if [type] == "perfinfo" {
# do some matching to extract the id
}
}
output {
elasticsearch {
cluster => "mycluster"
manage_template => false
index => "%{id}-perftest"
}
}
I'm not sure if setting manage_template to false would actually be necessary. I've read that it is.
Update
Thanks, Nirdesh. Using Ruby might be very handy.
While I was waiting I tried using a grok filter like so:
grok {
match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
}
This produced the following stdout during debugging:
{
"message" => "awperf-14",
"@version" => "1",
"@timestamp" => "2014-10-17T20:01:19.758Z",
"host" => "0:0:0:0:0:0:0:1:33361",
"type" => "perfinfo",
"perftype" => "awperf",
"perfid" => "14"
}
I then tried creating an index based on this, like so:
index => "%{perftype}-%{perfid}"
So when I passed 'awperf-14' to the input, I ended up creating these indexes
%{perftype}-%{perfid}
awperf-14
Which is not what I was expecting. Also, it's the %{perftype}-%{perfid} index that starts to be populated, not awperf-14, the one I actually wanted.
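What happened is consistent with field references being resolved per event. Only the perfinfo event carries `perftype` and `perfid`. For any other event the reference has nothing to substitute, so Logstash leaves `%{perftype}-%{perfid}` in the index name literally. The Python sketch below models both steps under stated simplifications. The regex only approximates `%{WORD:perftype}-%{POSINT:perfid}`, and `parse_perfinfo` and `sprintf` are hypothetical helpers, not Logstash APIs.

```python
import re
from typing import Dict

# Rough equivalent of grok's %{WORD:perftype}-%{POSINT:perfid}
PERF_RE = re.compile(r"\b(?P<perftype>\w+)-(?P<perfid>[1-9][0-9]*)\b")
FIELD_REF = re.compile(r"%\{([^}]+)\}")


def parse_perfinfo(event: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the event with perftype/perfid added if the message matches."""
    match = PERF_RE.search(event.get("message", ""))
    return {**event, **match.groupdict()} if match else dict(event)


def sprintf(template: str, event: Dict[str, str]) -> str:
    """Replace %{field} with the event's value; keep it literally if the field is missing."""
    return FIELD_REF.sub(
        lambda m: str(event[m.group(1)]) if m.group(1) in event else m.group(0),
        template,
    )


perf_event = parse_perfinfo({"type": "perfinfo", "message": "awperf-14"})
data_event = {"type": "apache", "message": "GET /index.html 200"}  # hypothetical data event
print(sprintf("%{perftype}-%{perfid}", perf_event))  # awperf-14
print(sprintf("%{perftype}-%{perfid}", data_event))  # %{perftype}-%{perfid}
```

Fields extracted from one event are not carried over to later events, so the data events keep landing in the literal index.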
Yes.
You can add any number of your own variables, either for intermediate results or permanently, using a property called add_field. Almost all filters in Logstash support this property.
So, for your solution, you can use a ruby script to work out the id dynamically and store it in a new variable called id, which you can then use in the output.
For Example :
input {
tcp {
type => "perfinfo"
port => 8888
}
}
filter{
if [type] == "perfinfo" {
ruby {
# placeholder: put the processing that computes the id in code
code => "# do some processing"
add_field => { "id" => "Some value" }
}
}
}
output {
elasticsearch {
cluster => "mycluster"
manage_template => false
index => "%{id}-perftest"
}
}
I'm not sure I can do what I was trying to do via Logstash. To be clearer, I simply wanted to change the index based on the performance run ID I'm executing. There's nothing in the data that would have this information (I have to pull it from a DB). So instead of trying to have Logstash listen for a performance run ID, I scripted this externally. The script uses the Elasticsearch API to create a new index and then does a string replace of the index in the Logstash config file. It then restarts Logstash, which normally happens between performance runs anyway. This approach was much easier and seems cleaner.
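The string-replace step of that external script could look roughly like the Python sketch below. It assumes the config sets the target index with an `index => "..."` line, as in the output blocks in this question. `set_index` is a hypothetical name, and creating the index and restarting Logstash are left out.

```python
import re

# Matches index => "..." but not index_type => "..." or other *_index settings
INDEX_SETTING = re.compile(r'(\bindex\s*=>\s*")[^"]*(")')


def set_index(config_text: str, new_index: str) -> str:
    """Rewrite every index => "..." setting to point at new_index."""
    return INDEX_SETTING.sub(lambda m: m.group(1) + new_index + m.group(2), config_text)


config = 'output {\nelasticsearch {\ncluster => "mycluster"\nindex => "13-perftest"\n}\n}\n'
print(set_index(config, "14-perftest"))
```

A callback is used instead of a `\1`-style replacement string so that an index name starting with digits cannot be misread as a group reference.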
