logstash keeps placing just 1 entry in my index - elasticsearch

I have the following logstash conf file:
input {
  jdbc {
    jdbc_driver_library => "C:\Program Files\Microsoft JDBC DRIVER 6.2 for SQL Server\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://TST-DBS-20;user=Elasticsearch;password=elastic123;"
    jdbc_user => "Elasticsearch"
    statement => "SELECT NewsID, HeadLine, BodyText, DateSubmitted, Approved, NULLIF(UserName,'') as UserName, NULLIF(Type,'') as NewsType, NULLIF(Caption,'') as caption, NULLIF(Author,'') as Author, NULLIF(Contact,'') as Contact, NULLIF(StaffID,'') as StaffID, SocialClubRegionID, DateCreated, CreatedBy, LastModifiedDate, ModifiedBy
                  FROM [News].[dbo].[News]"
  }
}
filter {
}
output {
  elasticsearch {
    hosts => ["tst-sch-20:9200"]
    index => "newsindex"
    document_id => "%{id}"
    user => "elastic"
    password => elastic123
  }
  stdout { codec => json }
}
and I've created the following index:
PUT newsindex
{
  "settings" : {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings" : {
    "news": {
      "properties": {
        "NewsId": {
          "type": "integer"
        },
        "newstype": {
          "type": "text"
        },
        "bodytext": {
          "type": "text"
        }
      }
    }
  }
}
After running the above script, there's no entry in the Logstash log files to suggest anything went wrong. If I run the SQL command directly in SQL Server, then, strangely enough, the single entry in the index is the last row of my SELECT statement, so it's almost as if the script is inserting and then overwriting, such that I end up with a single record.

If you look at the _id field of the record loaded into Elasticsearch, you'll see that it is %{id}, because your query does not have an id field. You'll want to change it to document_id => "%{newsid}", or whatever makes sense based on your query.
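For reference, a corrected output block based on the config above might look like this sketch (the jdbc input lowercases column names by default, so the NewsID column arrives on the event as newsid):
output {
  elasticsearch {
    hosts => ["tst-sch-20:9200"]
    index => "newsindex"
    # NewsID from the SELECT is available as "newsid" because the jdbc input
    # lowercases column names by default
    document_id => "%{newsid}"
    user => "elastic"
    password => "elastic123"
  }
  stdout { codec => json }
}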

Related

Does Logstash support Elasticsearch's _update_by_query?

Does the Elasticsearch output plugin support Elasticsearch's _update_by_query?
https://www.elastic.co/guide/en/logstash/6.5/plugins-outputs-elasticsearch.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
The elasticsearch output plugin can only make calls to the _bulk endpoint, i.e. using the Bulk API.
If you want to call the Update by Query API, you need to use the http output plugin and construct the query inside the event yourself. If you explain what you want to achieve, I can update my answer with some more details.
Note: There's an issue requesting this feature, but it's still open after two years.
UPDATE
So if your input event is {"cname":"wang", "cage":11} and you want to update by query all documents with "cname":"wang" to set "cage":11, your query needs to look like this:
POST your-index/_update_by_query
{
  "script": {
    "source": "ctx._source.cage = params.cage",
    "lang": "painless",
    "params": {
      "cage": 11
    }
  },
  "query": {
    "term": {
      "cname": "wang"
    }
  }
}
So your Logstash config should look like this (your input may vary but I used stdin for testing purposes):
input {
  stdin {
    codec => "json"
  }
}
filter {
  mutate {
    add_field => {
      "[script][lang]" => "painless"
      "[script][source]" => "ctx._source.cage = params.cage"
      "[script][params][cage]" => "%{cage}"
      "[query][term][cname]" => "%{cname}"
    }
    remove_field => ["host", "@version", "@timestamp", "cname", "cage"]
  }
}
output {
  http {
    url => "http://localhost:9200/index/doc/_update_by_query"
    http_method => "post"
    format => "json"
  }
}
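To try this pipeline out, one could pipe a test event into Logstash on stdin, something along these lines (update_by_query.conf is just a placeholder for wherever you saved the config above):
# hypothetical config file name; point -f at your own config path
echo '{"cname":"wang", "cage":11}' | bin/logstash -f update_by_query.conf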
The same result can be obtained with standard elasticsearch plugins:
input {
  elasticsearch {
    hosts => "${ES_HOSTS}"
    user => "${ES_USER}"
    password => "${ES_PWD}"
    index => "<your index pattern>"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
filter {
  ...
}
output {
  elasticsearch {
    hosts => "${ES_HOSTS}"
    user => "${ES_USER}"
    password => "${ES_PWD}"
    action => "update"
    document_id => "%{[@metadata][_id]}"
    index => "%{[@metadata][_index]}"
  }
}

Creating an elasticsearch index from logstash

I am trying to load data from SQL Server into Elasticsearch. I am using Logstash with the jdbc plugin and the elasticsearch plugin. I am loading my data into Elasticsearch but cannot figure out how to set up my index. I am using an index template to try this. Below is what I am using, but whenever I search I do not get any results.
logstash.config
# contents of logstash\bin\logstash.config
input {
  jdbc {
    jdbc_driver_library => ".\Microsoft JDBC Driver 6.2 for SQL Server\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://mydbserver;databaseName=mydb;"
    jdbc_user => "******"
    jdbc_password => "******"
    schedule => "* * * * *"
    parameters => { "classification" => "EMPLOYEE" }
    statement => "SELECT Cost_Center, CC_Acct_1, CC_Acct_2, CC_Acct_3 from dbo.Cost_Center where CC_Classification = :classification"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "allocation-testweb"
    template => "index_template.json"
  }
  #stdout { codec => rubydebug }
}
index_template.json
{
  "template": "allocation-*",
  "order": 1,
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "costcenter": {
      "properties": {
        "cc_acct_1": {
          "type": "string",
          "analyzer": "substring_analyzer"
        },
        "cc_acct_2": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}
I have created a similar index in code while doing some initial research. Is my index_template incorrect or is there another way I should be doing this?
Update:
I had mismatched index names between my two files. I'm now able to search using Postman and curl. However, when I try to get data using a NEST client, I never get data back. Below is the code snippet for the query.
var searchResult = client.Search<CostCenter>(s => s
    .Size(1000)
    .Index("allocation_testweb")
    .MatchAll());
This previously worked with the same data loaded from a file. CostCenter is simply an object with members called Cost_Center, CC_Acct_1, CC_Acct_2, and CC_Acct_3. I'm sure, once again, that I am overcomplicating the issue and missing something obvious.
UPDATE II:
I have made the changes suggested by @RussCam below and still do not get any results back. Below is my updated code.
var node = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(node);
    //.InferMappingFor<CostCenter>(m => m.IndexName("allocation_testweb"));
var client = new ElasticClient(settings);
var searchResult = client.Search<CostCenter>(s => s
    .Type("costCenter")
    .Size(1000)
    .Index("allocation_testweb")
    .MatchAll());
I commented out the InferMappingFor<> since it was not providing a result.
Mapping image requested by @RussCam. I've also included my costcenter class (I have tried naming all variations of costcenter).
public class costcenter
{
    public string cost_center { get; set; }
    public string cc_acct_1 { get; set; }
    public string cc_acct_2 { get; set; }
    public string cc_acct_3 { get; set; }
}

Logstash - Send output from log files to elk

I have an index in Elasticsearch that has a field named locationCoordinates. It's being sent to Elasticsearch from Logstash.
The data in this field looks like this...
-38.122, 145.025
When this field appears in ElasticSearch it is not coming up as a geo point.
I know that if I do the following, it works:
{
  "mappings": {
    "logs": {
      "properties": {
        "http_request.locationCoordinates": {
          "type": "geo_point"
        }
      }
    }
  }
}
But what I would like to know is how I can change my logstash.conf file so that it does this at startup.
At the moment my logstash.conf looks a bit like this...
input {
  # Default GELF input
  gelf {
    port => 12201
    type => gelf
  }
  # Default TCP input
  tcp {
    port => 5000
    type => syslog
  }
  # Default UDP input
  udp {
    port => 5001
    type => prod
    codec => json
  }
  file {
    path => [ "/tmp/app-logs/*.log" ]
    codec => json {
      charset => "UTF-8"
    }
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
And I end up with this in Kibana (without the little Geo sign).
You simply need to modify your elasticsearch output to configure an index template in which you can add your additional mapping.
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    template_overwrite => true
    template => "/path/to/template.json"
  }
}
And then in the file at /path/to/template.json you can add your additional geo_point mapping
{
  "template": "logstash-*",
  "mappings": {
    "logs": {
      "properties": {
        "http_request.locationCoordinates": {
          "type": "geo_point"
        }
      }
    }
  }
}
If you want to keep the official logstash template, you can download it and add your specific geo_point mapping to it.
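As a rough sketch of how that could be done (assuming Elasticsearch is reachable at elasticsearch:9200 and the template was installed under the default name logstash), you could pull the installed template, add the geo_point mapping to the saved copy, and point the template option at that file. Note that the GET response wraps the template body in a top-level "logstash" key, which you would strip out before using the file.
# fetch the template Logstash installed, then edit the saved copy
curl -XGET 'http://elasticsearch:9200/_template/logstash?pretty' > /path/to/template.json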

logstash output to elasticsearch with document_id; what to do when I don't have a document_id?

I have some Logstash input where I use the document_id to remove duplicates. However, most input doesn't have a document_id. The following plumbs the actual document_id through, but if it doesn't exist, it gets taken literally as %{document_id}, which means most documents are seen as duplicates of each other. Here's what my output block looks like:
output {
  elasticsearch_http {
    host => "127.0.0.1"
    document_id => "%{document_id}"
  }
}
I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.
output {
  elasticsearch_http {
    host => "127.0.0.1"
    if document_id {
      document_id => "%{document_id}"
    }
  }
}
Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
elasticsearch_http {
host => "127.0.0.1"
if
I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:
if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {
You're close with the conditional idea but you can't place it inside a plugin block. Do this instead:
output {
  if [document_id] {
    elasticsearch_http {
      host => "127.0.0.1"
      document_id => "%{document_id}"
    }
  } else {
    elasticsearch_http {
      host => "127.0.0.1"
    }
  }
}
(But the suggestion in one of the other answers to use the uuid filter is good too.)
One way to solve this is to make sure a document_id is always available. You can achieve this by adding a UUID filter in the filter section that creates the document_id field if it is not present.
filter {
  if !("" in [document_id]) {
    uuid {
      target => "document_id"
    }
  }
}
Edited per Magnus Bäck's suggestion. Thanks!
Reference: docinfo_fields
For any document added to Elasticsearch, the _id is auto-generated if not specified during insert. We can reuse that same _id later in update/delete/search queries by using the docinfo_fields feature of the elasticsearch filter.
Example:
filter {
  json {
    source => "message"
  }
  elasticsearch {
    hosts => "http://localhost:9200/"
    user => "elastic"
    password => "elastic"
    query => "..."
    docinfo_fields => {
      "_id" => "docid"
      "_index" => "document_index"
    }
  }
  if ("_elasticsearch_lookup_failure" not in [tags]) {
    #... doc update logic ...
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200/"
    user => "elastic"
    password => "elastic"
    index => "%{document_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{docid}"
  }
}

How to stop logstash from creating a default mapping in ElasticSearch

I am using logstash to feed logs into ElasticSearch.
I am configuring logstash output as:
input {
  file {
    path => "/tmp/foo.log"
    codec => plain {
      format => "%{message}"
    }
  }
}
output {
  elasticsearch {
    #host => localhost
    codec => json {}
    manage_template => false
    index => "4glogs"
  }
}
I notice that as soon as I start Logstash it creates a mapping (logs) in ES, as shown below.
{
  "4glogs": {
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "@version": {
            "type": "string"
          },
          "message": {
            "type": "string"
          }
        }
      }
    }
  }
}
How can I prevent logstash from creating this mapping?
UPDATE:
I have now resolved this error too. "object mapping for [logs] tried to parse as object, but got EOF, has a concrete value been provided to it?"
As John Petrone has stated below, once you define a mapping, you have to ensure that your documents conform to the mapping. In my case, I had defined a mapping of "type: nested" but the output from logstash was a string.
So I removed all codecs ( whether json or plain ) from my logstash config and that allowed the json document to pass through without changes.
Here is my new Logstash config (with some additional filters for multiline logs).
input {
  kafka {
    zk_connect => "localhost:2181"
    group_id => "logstash_group"
    topic_id => "platform-logger"
    reset_beginning => false
    consumer_threads => 1
    queue_size => 2000
    consumer_id => "logstash-1"
    fetch_message_max_bytes => 1048576
  }
  file {
    path => "/tmp/foo.log"
  }
}
filter {
  multiline {
    pattern => "^\s"
    what => "previous"
  }
  multiline {
    pattern => "[0-9]+$"
    what => "previous"
  }
  multiline {
    pattern => "^$"
    what => "previous"
  }
  mutate {
    remove_field => ["kafka"]
    remove_field => ["@version"]
    remove_field => ["@timestamp"]
    remove_tag => ["multiline"]
  }
}
output {
  elasticsearch {
    manage_template => false
    index => "4glogs"
  }
}
You will need a mapping to store data in Elasticsearch and to search on it - that's how ES knows how to index and search those content types. You can either let logstash create it dynamically or you can prevent it from doing so and instead create it manually.
Keep in mind you cannot change existing mappings (although you can add to them). So first off you will need to delete the existing index. You would then modify your settings to prevent dynamic mapping creation. At the same time you will want to create your own mapping.
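Deleting the existing index could look like the following sketch (assuming Elasticsearch is listening on localhost:9200; note this removes the index and all the data in it):
# drop the 4glogs index so it can be recreated with an explicit mapping
$ curl -XDELETE 'http://localhost:9200/4glogs'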
For example, this will create the mappings for the logstash data but also restrict any dynamic mapping creation via "strict":
$ curl -XPUT 'http://localhost:9200/4glogs/logs/_mapping' -d '
{
  "logs" : {
    "dynamic": "strict",
    "properties" : {
      "@timestamp": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "@version": {
        "type": "string"
      },
      "message": {
        "type": "string"
      }
    }
  }
}
'
Keep in mind that the index name "4glogs" and the type "logs" need to match what is coming from logstash.
For my production systems I generally prefer to turn off dynamic mapping as it avoids accidental mapping creation.
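One way to apply that preference across new indices is an index template that makes strict mapping the default. This is a sketch only: the template name no_dynamic is arbitrary, and the _default_ mapping type assumes an older Elasticsearch release that still supports it.
$ curl -XPUT 'http://localhost:9200/_template/no_dynamic' -d '
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic": "strict"
    }
  }
}
'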
The following links should be useful if you want to make adjustments to your dynamic mappings:
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html
logs in this case is the index_type. If you don't want to create it as logs, specify some other index_type on your elasticsearch output. Every record in Elasticsearch is required to have an index and a type. Logstash defaults to logs if you haven't specified one.
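Building on the output block above, setting the type explicitly might look like this sketch (applogs is just a placeholder name; newer releases of the elasticsearch output call this option document_type rather than index_type):
output {
  elasticsearch {
    manage_template => false
    index => "4glogs"
    # placeholder type name; use document_type on newer Logstash versions
    index_type => "applogs"
  }
}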
There's always an implicit mapping created when you insert records into Elasticsearch, so you can't prevent it from being created. You can create the mapping yourself before you insert anything (via say a template mapping).
The setting manage_template of false just prevents it from creating the template mapping for the index you've specified. You can delete the existing template if it's already been created by using something like curl -XDELETE http://localhost:9200/_template/logstash?pretty
Index templates can help you. Please see this jira for more details. You can create index templates with wildcard support to match an index name and put your default mappings.
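For instance, an index template along the following lines (a sketch; the template name 4glogs_template is arbitrary) would apply the same logs mapping shown earlier to any index whose name starts with 4glogs, so new indices pick it up automatically:
$ curl -XPUT 'http://localhost:9200/_template/4glogs_template' -d '
{
  "template": "4glogs*",
  "mappings": {
    "logs": {
      "dynamic": "strict",
      "properties": {
        "@timestamp": { "type": "date", "format": "dateOptionalTime" },
        "@version": { "type": "string" },
        "message": { "type": "string" }
      }
    }
  }
}
'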
