Creating an elasticsearch index from logstash - elasticsearch

I am trying to load data from SQL Server into Elasticsearch. I am using Logstash with the jdbc input plugin and the elasticsearch output plugin. The data loads into Elasticsearch, but I cannot figure out how to configure my index. I am trying to do this with an index template. Below is what I am using, but whenever I search I do not get any results.
logstash.config
# contents of logstash\bin\logstash.config
input {
jdbc {
jdbc_driver_library => ".\Microsoft JDBC Driver 6.2 for SQL Server\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://mydbserver;databaseName=mydb;"
jdbc_user => "******"
jdbc_password => "******"
schedule => "* * * * *"
parameters => { "classification" => "EMPLOYEE" }
statement => "SELECT Cost_Center, CC_Acct_1, CC_Acct_2, CC_Acct_3 from dbo.Cost_Center where CC_Classification = :classification"
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "allocation-testweb"
template => "index_template.json"
}
#stdout { codec => rubydebug }
}
index_template.json
{
"template": "allocation-*",
"order":1,
"settings": {
"number_of_replicas": 0,
"number_of_shards": 1,
"analysis": {
"analyzer": {
"substring_analyzer": {
"tokenizer": "ngram_tokenizer",
"filter": ["lowercase"]
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": ["letter","digit"]
}
}
}
},
"mappings":{
"costcenter": {
"properties": {
"cc_acct_1": {
"type": "string",
"analyzer": "substring_analyzer"
},
"cc_acct_2": {
"type": "string",
"analyzer": "substring_analyzer"
}
}
}
}
}
I have created a similar index in code while doing some initial research. Is my index_template incorrect or is there another way I should be doing this?
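To see what the substring_analyzer in the template is expected to produce, here is a rough Python sketch of an edge_ngram tokenizer (min_gram 2, max_gram 10, splitting on anything that is not a letter or digit) followed by a lowercase filter. It only approximates Elasticsearch's behavior, and the function name edge_ngrams is made up for illustration.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate an edge_ngram tokenizer plus a lowercase filter."""
    tokens = []
    # token_chars ["letter", "digit"]: split on every other character
    for word in re.findall(r"[^\W_]+", text):
        longest = min(max_gram, len(word))
        # Emit prefixes of the word, from min_gram up to max_gram characters
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size].lower())
    return tokens

print(edge_ngrams("CC-1234"))
```

Words shorter than min_gram produce no tokens at all, which is worth keeping in mind when searching for one-character values.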
Update:
I had mismatched index names between my two files. I can now search using Postman and curl. However, when I query with a NEST client I never get any data back. Below is the code snippet for the query.
var searchResult = client.Search<CostCenter>(s => s
.Size(1000)
.Index("allocation_testweb")
.MatchAll());
This previously worked with the same data loaded from a file. CostCenter is simply an object with members called Cost_Center, CC_Acct_1, CC_Acct_2, and CC_Acct_3. I'm sure I am overcomplicating the issue again and missing something obvious.
UPDATE II:
I have made the changes suggested by @RussCam below and still do not get any results back. Below is my updated code.
var node = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(node);
//.InferMappingFor<CostCenter>(m => m.IndexName("allocation_testweb"));
var client = new ElasticClient(settings);
var searchResult = client.Search<CostCenter>(s => s
.Type("costCenter")
.Size(1000)
.Index("allocation_testweb")
.MatchAll());
I commented out the InferMappingFor<> since it was not providing a result.
Mapping image requested by @RussCam. I've also included my costcenter class (I have tried all naming variations of costcenter).
public class costcenter
{
public string cost_center { get; set; }
public string cc_acct_1 { get; set; }
public string cc_acct_2 { get; set; }
public string cc_acct_3 { get; set; }
}
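One thing that may be worth checking here: the Logstash output above writes to allocation-testweb (hyphen), while the NEST queries target allocation_testweb (underscore). Below is a minimal Python sketch, assuming the template pattern behaves like a simple shell-style wildcard. The helper name template_applies is hypothetical, and this is a simplified model rather than Elasticsearch's actual matching code.

```python
from fnmatch import fnmatchcase

# Pattern taken from index_template.json in the question
TEMPLATE_PATTERN = "allocation-*"

def template_applies(index_name, pattern=TEMPLATE_PATTERN):
    """Return True if the index name matches the wildcard pattern."""
    return fnmatchcase(index_name, pattern)

for name in ("allocation-testweb", "allocation_testweb"):
    print(name, template_applies(name))
```

If the two names really differ, the index being searched would not be the one Logstash populated.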

Related

ElasticSearch: populating ip_range type field via logstash

I'm experimenting with the ip_range field type in Elasticsearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and am struggling to find a way to load IP data into the field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out how to do the same via Logstash.
Index definition
PUT test_ip_range
{
"mappings": {
"_doc": {
"properties": {
"ip_from_to_range": {
"type": "ip_range"
},
"ip_from": {
"type": "ip"
},
"ip_to": {
"type": "ip"
}
}
}
}
}
Add sample doc:
PUT test_ip_range/_doc/3
{
"ip_from_to_range" :
{
"gte" : "<dotted_ip_from>",
"lte": "<dotted_ip_to>"
}
}
Logstash config (reading from DB)
input {
jdbc {
...
statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "<host>"
"index" => "test_ip_range"
"document_type" => "_doc"
}
}
Question:
How do I get the ip_from and ip_to DB fields into the respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the ip range in CIDR notation, but would like to be able to have both options - loading in CIDR notation and loading as a range.
After some trial and error, I finally figured out the Logstash config.
I had posted about a similar issue here, which got me on the right track with the syntax for this use case as well.
input { ... }
filter {
mutate {
add_field => {
"[ip_from_to_range]" =>
'{
"gte": "%{ip_from}",
"lte": "%{ip_to}"
}'
}
}
json {
source => "ip_from_to_range"
target => "ip_from_to_range"
}
}
output { ... }
Filter parts explained
mutate add_field: creates a new field [ip_from_to_range] whose value is a JSON string ( '{...}' ). It is important to write the field as [field_name]; otherwise the next step, which parses the string into a JSON object, does not work.
json: parses the string representation into a JSON object.
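The two filter steps above can be sketched in Python as follows. This is only an illustration of the idea (build a JSON string from the event's fields, then parse it into a nested object), not how Logstash is implemented. The names RANGE_TEMPLATE, sprintf and add_ip_range and the sample addresses are made up for the example.

```python
import json
import re

# Mirrors the add_field value used in the mutate filter
RANGE_TEMPLATE = '{ "gte": "%{ip_from}", "lte": "%{ip_to}" }'

def sprintf(template, event):
    """Replace each %{field} reference with the event's value (left as-is if missing)."""
    return re.sub(
        r"%\{(\w+)\}",
        lambda m: str(event.get(m.group(1), m.group(0))),
        template,
    )

def add_ip_range(event):
    # Step 1 (mutate add_field): the new field holds a JSON *string*
    as_string = sprintf(RANGE_TEMPLATE, event)
    # Step 2 (json filter): parse the string into a nested object
    event["ip_from_to_range"] = json.loads(as_string)
    return event

event = add_ip_range({"ip_from": "10.0.0.1", "ip_to": "10.0.0.254"})
print(event["ip_from_to_range"])
```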

logstash keeps placing just 1 entry in my index

I have the following logstash conf file:
input {
jdbc {
jdbc_driver_library => "C:\Program Files\Microsoft JDBC DRIVER 6.2 for SQL Server\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://TST-DBS-20;user=Elasticsearch;password=elastic123;"
jdbc_user => "Elasticsearch"
statement => "SELECT NewsID, HeadLine, BodyText, DateSubmitted, Approved, NULLIF(UserName,'') as UserName, NULLIF(Type,'') as NewsType, NULLIF(Caption,'') as caption, NULLIF(Author,'') as Author, NULLIF(Contact,'') as Contact, NULLIF(StaffID,'') as StaffID, SocialClubRegionID, DateCreated, CreatedBy, LastModifiedDate, ModifiedBy
FROM [News].[dbo].[News]"
}
}
filter {
}
output {
elasticsearch {
hosts => ["tst-sch-20:9200"]
index => "newsindex"
document_id => "%{id}"
user => "elastic"
password => elastic123
}
stdout { codec => json }
}
and I've created the following index:
PUT newsindex
{
"settings" : {
"number_of_shards":3,
"number_of_replicas":2
},
"mappings" : {
"news": {
"properties": {
"NewsId": {
"type": "integer"
},
"newstype": {
"type": "text"
},
"bodytext": {
"type": "text"
}
}
}
}
}
After running the above script, there is no entry in the Logstash log files to suggest anything went wrong. If I run the SQL command directly in SQL Server then, strangely enough, the single entry in the index is the last row of my select statement, so it is almost as if the script is inserting and then overwriting, leaving me with a single record.
If you look at the _id field of the record loaded into Elasticsearch, you'll see it is the literal string %{id}, because your query does not return an id field. The reference is never resolved, so every row is written to the same document. You'll want to change to document_id => "%{newsid}" or whatever makes sense based on your query.
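The overwrite effect described above can be reproduced with a small Python model in which an index is just a dict keyed by _id. This is a sketch under that assumption only; render, index_rows and the sample rows are hypothetical. The lowercase key newsid assumes the jdbc input lowercases column names, which is its default as far as I know.

```python
import re

def render(pattern, row):
    """Resolve %{field} references; unknown fields are left as literal text."""
    def lookup(match):
        name = match.group(1)
        return str(row[name]) if name in row else match.group(0)
    return re.sub(r"%\{(\w+)\}", lookup, pattern)

def index_rows(rows, id_pattern):
    """Model an index as a dict keyed by _id: a repeated _id overwrites the document."""
    index = {}
    for row in rows:
        index[render(id_pattern, row)] = row
    return index

rows = [{"newsid": n, "headline": f"Story {n}"} for n in (1, 2, 3)]
overwritten = index_rows(rows, "%{id}")      # no "id" column in the rows
distinct = index_rows(rows, "%{newsid}")     # one _id per row
print(len(overwritten), len(distinct))
```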

reindex while converting a string value of a specific field (present in old index) into a number field value (in the new index)

How can I reindex while converting a string field, e.g. "field2": "123.2" (in the old index's documents), into a float/double number, e.g. "field2": 123.2 (as intended in the new index)? This post is the closest I could get, but I do not know which function to use to cast a string to a number. I am using Elasticsearch version 2.3.3. Thank you very much for any advice!
You could use Logstash to reindex your data and convert the field. Something like the following:
input {
elasticsearch {
hosts => "es.server.url"
index => "old_index"
query => "*"
size => 500
scroll => "5m"
docinfo => true
}
}
filter {
mutate {
# mutate's convert accepts integer, float, string or boolean; use float for values like "123.2"
convert => { "fieldname" => "float" }
}
}
output {
elasticsearch {
# "hosts" and "document_type" are the option names in Logstash 2.x; adjust for your version
hosts => "es.server.url"
index => "new_index"
document_type => "%{[@metadata][_type]}"
document_id => "%{[@metadata][_id]}"
}
}
Use Elasticsearch templates to specify the mapping for the new index and specify the field as a double type.
The easiest way to build a template is to use the existing mapping.
GET oldindex/_mapping
POST _template/templatename
{
"template" : "newindex", // this can be a wildcard pattern to match indexes
"mappings": { // this is copied from the response of the previous call
"mytype": {
"properties": {
"field2": {
"type": "double" // change the type
}
}
}
}
}
POST newindex
GET newindex/_mapping
Then use the Elasticsearch _reindex API to move the data from the old index to the new index, parsing the field as a double with an inline script (you may need to enable inline scripting).
POST _reindex
{
"source": {
"index": "oldindex"
},
"dest": {
"index": "newindex"
},
"script": {
"inline": "ctx._source.field2 = ctx._source.field2.toDouble()"
}
}
Edit: Updated to use _reindex endpoint
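For illustration, the per-document transformation that the inline script performs can be sketched in Python like this. The function name convert_field is made up, and the guard for missing or already-numeric values is an addition of this sketch; the inline script above does not include it.

```python
def convert_field(source, field="field2"):
    """Return a copy of the document with the string field parsed as a float."""
    converted = dict(source)
    value = converted.get(field)
    # Only parse strings; leave missing or already-numeric values untouched
    if isinstance(value, str):
        converted[field] = float(value)
    return converted

print(convert_field({"field1": "abc", "field2": "123.2"}))
```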

ELK for windows logs processing

I've set up a working ELK stack on Debian Wheezy and configured Nxlog to gather Windows logs. I see the logs in Kibana and everything is working fine, but I get too much data and want to filter it by removing some fields that I don't need.
I've written a filter section but it's not working at all. What could be the reason?
The config is below:
input {
tcp {
type => "eventlog"
port => 3515
format => "json"
}
}
filter {
type => "eventlog"
mutate {
remove => { "Hostname", "Keywords", "SeverityValue", "Severity", "SourceName", "ProviderGuid" }
remove => { "Version", "Task", "OpcodeValue", "RecordNumber", "ProcessID", "ThreadID", "Channel" }
remove => { "Category", "Opcode", "SubjectUserSid", "SubjectUserName", "SubjectDomainName" }
remove => { "SubjectLogonId", "ObjectType", "IpPort", "AccessMask", "AccessList", "AccessReason" }
remove => { "EventReceivedTime", "SourceModuleName", "SourceModuleType", "@version", "type" }
remove => { "_index", "_type", "_id", "_score", "_source", "KeyLength", "TargetUserSid" }
remove => { "TargetDomainName", "TargetLogonId", "LogonType", "LogonProcessName", "AuthenticationPackageName" }
remove => { "LogonGuid", "TransmittedServices", "LmPackageName", "ProcessName", "ImpersonationLevel" }
}
}
output {
elasticsearch {
cluster => "wisp"
node_name => "io"
}
}
I think you are trying to remove fields that do not exist in some logs.
Do all your logs contain all the fields you're trying to remove?
If not, you have to identify your logs before removing fields.
Your filter config will look like this:
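filter {
# Only touch events of type "eventlog" that match the identifying field
if [type] == "eventlog" and [somefield] == "somevalue" {
mutate {
# remove_field takes an array of field names
remove_field => [ "specificfieldtoremove1", "specificfieldtoremove2" ]
}
}
}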

Creating an index Nest

How would I recreate the following index using Elasticsearch Nest API?
Here is the json for the index including the mapping:
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"data": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
}
Here is my attempt:
var newIndex = client.CreateIndexAsync(indexName, index => index
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true))
.AddMapping<Object>(mapping => mapping
.IndexAnalyzer("trigram")
.Type("string"))
);
The documentation does not mention anything about this?
UPDATE:
Found this post that uses
var index = new IndexSettings()
and then adds Analysis with the string literal json.
index.Add("analysis", @"{json}");
Where can one find more examples like this one and does this work?
Creating an index in older versions
There are two main ways to accomplish this, as outlined in the NEST Create Index documentation.
The first is to declare the index settings directly as fluent dictionary entries, just as you are doing in your example above. I tested this locally and it produces index settings that match your JSON above.
var response = client.CreateIndex(indexName, s => s
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Settings(settings => settings
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "1s")
.Add("mapping.allow_type_wrapper", true)
.Add("analysis.filter.trigrams_filter.type", "nGram")
.Add("analysis.filter.trigrams_filter.min_gram", "3")
.Add("analysis.filter.trigrams_filter.max_gram", "3")
.Add("analysis.analyzer.trigrams.type", "custom")
.Add("analysis.analyzer.trigrams.tokenizer", "standard")
.Add("analysis.analyzer.trigrams.filter.0", "lowercase")
.Add("analysis.analyzer.trigrams.filter.1", "trigrams_filter")
)
.AddMapping<Object>(mapping => mapping
.Type("data")
.AllField(af => af.Enabled())
.Properties(prop => prop
.String(sprop => sprop
.Name("text")
.IndexAnalyzer("trigrams")
)
)
)
);
Please note that NEST also includes the ability to create index settings using strongly typed classes as well. I will post an example of that later, if I have time to work through it.
Creating index with NEST 7.x
Please also note that in NEST 7.x the CreateIndex method has been removed. Use Indices.Create instead. Here's an example.
_client.Indices
.Create(indexName, s => s
.Settings(se => se
.NumberOfReplicas(replicas)
.NumberOfShards(shards)
.Setting("merge.policy.merge_factor", "10")));
If you are on NEST 2.0, .NumberOfReplicas(x) and .NumberOfShards(y) now live under Settings, so specify them within the lambda expression passed to Settings.
EsClient.CreateIndex("indexname", c => c
.Settings(s => s
.NumberOfReplicas(replicasNr)
.NumberOfShards(shardsNr)
)
);
NEST 2.0 has a lot of changes and moved things around a bit so these answers are a great starting point for sure. You may need to adjust a little for the NEST 2.0 update.
Small example :
EsClient.CreateIndex("indexname", c => c
.NumberOfReplicas(replicasNr)
.NumberOfShards(shardsNr)
.Settings(s => s
.Add("merge.policy.merge_factor", "10")
.Add("search.slowlog.threshold.fetch.warn", "15s")
)
#region Analysis
.Analysis(descriptor => descriptor
.Analyzers(bases => bases
.Add("folded_word", new CustomAnalyzer()
{
Filter = new List<string> { "icu_folding", "trim" },
Tokenizer = "standard"
}
)
)
.TokenFilters(i => i
.Add("engram", new EdgeNGramTokenFilter
{
MinGram = 1,
MaxGram = 20
}
)
)
.CharFilters(cf => cf
.Add("drop_chars", new PatternReplaceCharFilter
{
Pattern = @"[^0-9]",
Replacement = ""
}
)
)
)
#endregion
#region Mapping Categories
.AddMapping<Categories>(m => m
.Properties(props => props
.MultiField(mf => mf
.Name(n => n.Label_en)
.Fields(fs => fs
.String(s => s.Name(t => t.Label_en).Analyzer("folded_word"))
)
)
)
)
#endregion
);
In case anyone has migrated to NEST 2.4 and has the same question - you would need to define your custom filters and analyzers in the index settings like this:
elasticClient.CreateIndex(_indexName, i => i
.Settings(s => s
.Analysis(a => a
.TokenFilters(tf => tf
.EdgeNGram("edge_ngrams", e => e
.MinGram(1)
.MaxGram(50)
.Side(EdgeNGramSide.Front)))
.Analyzers(analyzer => analyzer
.Custom("partial_text", ca => ca
.Filters(new string[] { "lowercase", "edge_ngrams" })
.Tokenizer("standard"))
.Custom("full_text", ca => ca
.Filters(new string[] { "standard", "lowercase" } )
.Tokenizer("standard"))))));
For NEST 7.x and later, you can use the following code to create an index with shards, replicas, and automapping:
if (!_elasticClient.Indices.Exists(_elasticClientIndexName).Exists)
{
var response = _elasticClient.Indices
.Create(_elasticClientIndexName, s => s
.Settings(se => se
.NumberOfReplicas(1)
.NumberOfShards(shards)
).Map<YourDTO>(
x => x.AutoMap().DateDetection(false)
));
if (!response.IsValid)
{
// Elasticsearch index status is invalid, log an exception
}
}

Resources