Adding Geo_shape to Elasticsearch using Logstash - elasticsearch

I have a CSV file which contains geometries in WKT format, and I am trying to ingest them as geo_shape data. I created the mapping given in the file "input_mapping.json":
{
  "mappings" : {
    "doc" : {
      "properties" : {
        "Lot" : {
          "type" : "long"
        },
        "Lot_plan" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Parcel_Address_Line_1" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Plan" : {
          "type" : "long"
        },
        "Tenure" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "WKT" : {
          "type" : "geo_shape"
        }
      }
    }
  }
}
WKT is my geo_shape field and its values are WKT strings.
Below is the input CSV file which I am trying to ingest using Logstash:
WKT,Lot_plan,Tenure,Parcel_Address_Line_1,Lot,Plan
"POLYGON ((148.41503356 -26.62829003,148.44798048 -26.62800857,148.45234634 -26.63457929,148.45507096 -26.64778132,148.41735984 -26.64808729,148.41514107 -26.64091476,148.41503356 -26.62829003))",21MM1,FH,MASSEY DOWNS,21,1
"POLYGON ((148.45507096 -26.64778132,148.45779641 -26.66098396,148.45859297 -26.66259081,148.45801376 -26.66410383,148.45989472 -26.67278979,148.42510081 -26.67310328,148.42434355 -26.67065659,148.41735984 -26.64808729,148.45507096 -26.64778132))",21MM2,FH,,21,2
"POLYGON ((148.39514404 -26.68791317,148.37228669 -26.68894235,148.37188338 -26.68895271,148.37092744 -26.68897445,148.37051869 -26.68898023,148.36312088 -26.68908468,148.36261958 -26.66909425,148.39598678 -26.66869309,148.39584372 -26.66934742,148.39583604 -26.66968184,148.39590526 -26.67007957,148.39598629 -26.67039933,148.39614586 -26.67085156,148.39625052 -26.67085085,148.42434355 -26.67065659,148.42510081 -26.67310328,148.42537156 -26.67397795,148.42549108 -26.68541445,148.41781484 -26.68547248,148.39988482 -26.68562107,148.39966009 -26.68562292,148.39704234 -26.68564442,148.39514404 -26.68791317))",21MM3,LL,DERWENT PARK,21,3
And my Logstash conf file is:
input {
  file {
    path => "D:/input.csv"
    start_position => "beginning"
    sincedb_path => "D:/sample.text"
  }
}
filter {
  csv {
    separator => ","
    columns => ["WKT","Lot_plan","Tenure","Parcel_Address_Line_1","Lot","Plan"]
    skip_header => true
    skip_empty_columns => true
    convert => {
      "Lot" => "integer"
      "Plan" => "integer"
    }
    remove_field => [ "_source","message","host","path","@version","@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9701"
    index => "input_mapping"
    template => "D:/input_mapping.json"
    template_name => "input_mapping"
    manage_template => true
  }
}
For some reason the data is not getting ingested into Elasticsearch. I am using Elasticsearch version 6.5.4 and Logstash version 6.5.4.
Kindly let me know if I have missed anything.

I realized there will be many other developers looking at a problem similar to the one I faced. Later on, I looked at GDAL (ogr2ogr), which supports ingestion into Elasticsearch, and I used PostgreSQL to load the CSV file. The ogr2ogr tool solved it for me with the following steps:
First, ingest the CSV file into PostgreSQL, keeping WKT as a text column in the table.
Create another geometry column in the table and populate it with the ST_GeomFromText function:
UPDATE TableName SET WKT_GEOM = ST_GeomFromText("WKT", 4632)
(Note: I had already installed PostGIS in PostgreSQL.)
Now start Elasticsearch.
Use ogr2ogr, following the examples provided in the driver documentation (a rough sketch is given below):
a. First create the Elasticsearch mapping using ogr2ogr.
b. Then ingest the data from PostgreSQL into Elasticsearch.
https://gdal.org/drivers/vector/elasticsearch.html
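For anyone following the same route, here is a rough sketch of step 4 as shell commands. The connection string, table name and index name are placeholders for illustration, and the WRITE_MAPPING and MAPPING layer creation options are the ones I recall from the driver page linked above, so check that page for the options available in your GDAL version:

# 4a. Let ogr2ogr derive an Elasticsearch mapping from the PostGIS layer and
#     write it to a file (nothing is sent to Elasticsearch in this step)
ogr2ogr -f "Elasticsearch" http://localhost:9200 \
  PG:"host=localhost dbname=mydb user=postgres" my_table \
  -lco WRITE_MAPPING=D:/es_mapping.json

# 4b. Ingest the PostGIS table into Elasticsearch, re-using the mapping from 4a
ogr2ogr -f "Elasticsearch" http://localhost:9200 \
  PG:"host=localhost dbname=mydb user=postgres" my_table \
  -lco MAPPING=D:/es_mapping.json -nln my_index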
In this way I was able to perform geo queries in Elasticsearch, though unfortunately without Logstash. :(
Please comment if you have any doubts.

Related

Creating a new field into existing index - ElasticSearch

I want to create a new field and add it to an existing index so that I can send a unique value to that new field. I was hoping there was an API to do this without having to use the Kibana console, but I ran into this article that tells you how to add new fields to an existing index.
I tried to add it under the _source field, but that was not allowed:
PUT customer-simulation-es-app-logs-development-2021-07/_mapping
{
  "_source": {
    "TransactionKey": {
      "type": "keyword"
    }
  }
}
So I then added it under properties, which was allowed:
PUT customer-simulation-es-app-logs-development-2021-07/_mapping
{
  "properties": {
    "TransactionKey": {
      "type": "keyword"
    }
  }
}
To make sure it was updated I ran GET customer-simulation-es-app-logs-development-2021-07/_mapping, which did return it:
{
  "customer-simulation-es-app-logs-development-2021-07" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "TransactionKey" : {
          "type" : "keyword"
        },
        "exceptions" : {
          "properties" : {
            "ClassName" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            .....
But when I go to Discover and search for TransactionKey in the field list, nothing pops up. Did I not add the new field correctly to the existing index?
If you're running a version prior to 7.11, you need to go to Stack Management > Index Patterns and refresh your index pattern before you can see the new field in the Discover view. You need to do this every time your index mapping changes.
Since 7.11, index patterns are refreshed automatically whenever needed.

Time-based field not working when configuring an index pattern

Hi!
I have an issue with setting a date field as time-based when I configure my index pattern. When I choose my date field as the time field name, I cannot visualize any data in the Discover tab.
However, when I uncheck the box named "Index contains time-based events", all the data appears.
Maybe I forgot something in my mapping? Here is the mapping I've set for this index:
"index_test" : {
"mappings": {
"tr": {
"_source": {
"enabled":true
},
"properties" : {
"id" : { "type" : "integer" },
"volume" : { "type" : "integer" },
"high" : { "type" : "float" },
"low" : { "type" : "float" },
"timestamp" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss" }
}
}
}'
}
I am currently trying to use Timelion as well, and it does not seem to find any data to show. I think that is because of this time-based option being unchecked... Any idea how to set this timestamp as time-based without losing access to the data in the Discover tab?
Simple question with a simple answer... I had just forgotten to set the time picker in the top right of the Discover tab to show past data.

Logstash/Elasticsearch CSV Field Types, Date Formats and Multifields (.raw)

I've been playing around with getting a tab delimited file into Elasticsearch using the CSV filter in Logstash. Getting the data in was actually incredibly easy, but I'm having trouble getting the field types to come in right when I look at the data in Kibana. Dates and integers continue to come in as strings, so I can't plot by date or do any analysis functions on integers (sum, mean, etc).
I'm also having trouble getting the .raw version of the fields to populate. For example, in device I have data like "HTC One", but when I do a pie chart in Kibana, it shows up as two separate groupings, "HTC" and "One". When I try to chart device.raw instead, it comes up as a missing field. From what I've read, it seems like Logstash should automatically create a raw version of each string field, but that doesn't seem to be happening.
I've been sifting through the documentation, google and stack, but haven't found a solution. Any ideas appreciated! Thanks.
Config file:
#logstash.conf
input {
  file {
    path => "file.txt"
    type => "event"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["userid","date","distance","device"]
    separator => "	"
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "userid"
    workers => 2
    template => "template.json"
  }
  #stdout {
  #  codec => rubydebug
  #}
}
Here's the template file:
#template.json:
{
  "template": "event",
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0,
    "index" : {
      "query" : { "default_field" : "userid" }
    }
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "_source": { "compress": true },
      "dynamic_templates": [
        {
          "string_template" : {
            "match" : "*",
            "mapping": { "type": "string", "index": "not_analyzed" },
            "match_mapping_type" : "string"
          }
        }
      ],
      "properties" : {
        "date" : { "type" : "date", "format": "yyyy-MM-dd HH:mm:ss" },
        "device" : { "type" : "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } },
        "distance" : { "type" : "integer" }
      }
    }
  }
}
Figured it out - the template pattern has to match the index name. So the "template" : "event" line should have been "template" : "userid".
I found another (easier) way to specify the type of the fields. You can use Logstash's mutate filter to change the type of a field. Simply add the following filter after your csv filter in your Logstash config:
mutate {
  convert => [ "fieldname", "integer" ]
}
For details check out the logstash docs - mutate convert

How to add default values while adding a new field in existing mapping in elasticsearch

This is my existing mapping in Elasticsearch for one of the child documents:
sessions" : {
"_routing" : {
"required" : true
},
"properties" : {
"operatingSystem" : {
"index" : "not_analyzed",
"type" : "string"
},
"eventDate" : {
"format" : "dateOptionalTime",
"type" : "date"
},
"durations" : {
"type" : "integer"
},
"manufacturer" : {
"index" : "not_analyzed",
"type" : "string"
},
"deviceModel" : {
"index" : "not_analyzed",
"type" : "string"
},
"applicationId" : {
"type" : "integer"
},
"deviceId" : {
"type" : "string"
}
},
"_parent" : {
"type" : "userinfo"
}
}
In the above mapping, the "durations" field is an integer array. I need to update the existing mapping by adding a new field called "durationCount" whose default value should be the size of the durations array.
PUT sessions/_mapping
{
  "properties" : {
    "sessionCount" : {
      "type" : "integer"
    }
  }
}
Using the above request I am able to update the existing mapping, but I am not able to figure out how to assign a value (which would vary for each session document, i.e. the size of its durations array) while updating the mapping. Any ideas?
Well, 2 recommendations here:
Instead of adding a default value, you can account for it in the query using the missing filter. Say you want to search based on a match query: instead of just the match query, use a filtered query whose filter is a bool with should clauses combining your condition and a missing filter (see the sketch below). This way, documents which do not have the field are also accounted for.
If you absolutely need the value in that field for existing documents, you need to reindex the whole set of documents, or use the out-of-the-box plugin, update by query.
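To make the first suggestion concrete, here is a sketch of such a query in the query DSL of that era (the index name and the value 10 are placeholders; durationCount is the new field). The missing clause lets documents indexed before the mapping change, which have no durationCount at all, still match:

POST sessions/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "durationCount": 10 } },
            { "missing": { "field": "durationCount" } }
          ]
        }
      }
    }
  }
}

For the second suggestion, the backfill has to be done per document with a script. With the update-by-query plugin (or the _update_by_query API that is built into later Elasticsearch releases) the request would look roughly like the following; the exact script syntax depends on the version you run:

POST sessions/_update_by_query
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "durationCount" } }
    }
  },
  "script": {
    "source": "ctx._source.durationCount = ctx._source.durations.size()"
  }
}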

How to search on a URL exactly in ElasticSearch / Kibana

I have imported an IIS log file, and the data has moved through Logstash (1.4.2) into Elasticsearch (1.3.1) and is then displayed in Kibana.
My filter section is as follows:
filter {
  grok {
    match =>
      ["message" , "%{TIMESTAMP_ISO8601:iisTimestamp} %{IP:serverIP} %{WORD:method} %{URIPATH:uri} - %{NUMBER:port} - %{IP:clientIP} - %{NUMBER:status} %{NUMBER:subStatus} %{NUMBER:win32Status} %{NUMBER:timeTaken}"]
  }
}
When using a Terms panel in Kibana with "uri" (one of my captured fields from Logstash), it matches the tokens within the URI. Therefore it matches items like:
'Scripts'
'/'
'EN'
Q: How do I display the 'Top URLs' in their full form?
Q: How do I inform Elasticsearch that the field is 'not_analyzed'? I don't mind having 2 fields, for example:
uri - The tokenized URI
uri.raw - the fully formed URL.
Can this be done Logstash side, or is this a mapping that needs to be set up in ElasticSearch?
The mapping is as follows:
//http://localhost:9200/iislog-2014.10.09/_mapping?pretty
{
  "iislog-2014.10.09" : {
    "mappings" : {
      "iislogs" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "@version" : {
            "type" : "string"
          },
          "clientIP" : {
            "type" : "string"
          },
          "device" : {
            "type" : "string"
          },
          "host" : {
            "type" : "string"
          },
          "id" : {
            "type" : "string"
          },
          "iisTimestamp" : {
            "type" : "string"
          },
          "logFilePath" : {
            "type" : "string"
          },
          "message" : {
            "type" : "string"
          },
          "method" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string"
          },
          "os" : {
            "type" : "string"
          },
          "os_name" : {
            "type" : "string"
          },
          "port" : {
            "type" : "string"
          },
          "serverIP" : {
            "type" : "string"
          },
          "status" : {
            "type" : "string"
          },
          "subStatus" : {
            "type" : "string"
          },
          "tags" : {
            "type" : "string"
          },
          "timeTaken" : {
            "type" : "string"
          },
          "type" : {
            "type" : "string"
          },
          "uri" : {
            "type" : "string"
          },
          "win32Status" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
In your Elasticsearch mapping:
"url": {
  "type": "string",
  "index": "not_analyzed"
}
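Since the index setting of an existing string field cannot be changed in place, this mapping has to exist before the index (or the next daily index) is created, for example via an index template. A minimal sketch for this setup (the template name and pattern are made up for illustration; uri is the field captured by the grok filter above, kept analyzed with a not_analyzed raw sub-field, which mirrors the .raw behaviour described below):

PUT _template/iislog
{
  "template": "iislog-*",
  "mappings": {
    "iislogs": {
      "properties": {
        "uri": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}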
The problem is that the iislog- index name does not match the logstash- pattern, and hence it did not pick up the default template.
My index format was iislog-YYYY.MM.dd, so it did not use the out-of-the-box mappings provided by Logstash. When using the logstash- index format, Logstash's default template creates two fields for each string. For example, uri becomes:
uri (appears in Kibana)
uri.raw (does not appear in Kibana)
Note that the uri.raw will not appear in Kibana - but it is queryable.
So rather than using an alternative index format, the solution is to:
Don't bother! Use the default index format of logstash-%{+YYYY.MM.dd}
Add a "type" to the file input to help you filter the correct logs in Kibana (whilst using the logstash- index format):
input {
  file {
    type => "iislog"
    ....
  }
}
Apply filtering in Kibana based on the type
OR
If you really, really do want a different index format:
Copy the default template file to a new file, say iislog-template.json
Reference the template file in the elasticsearch output like this:
output {
  elasticsearch_http {
    host => "localhost"
    template_name => "iislog-template.json"
    template => "<path to template>"
    index => "iislog-%{+YYYY.MM.dd}"
  }
}
