I have an Elasticsearch index with the following mapping:
"pickup_datetime": {
"type": "date",
"format": "dateOptionalTime"
}
Here is an example of a date contained in the file that is being read in
"pickup_datetime": "2013-01-07 06:08:51"
I am using Logstash to read and insert data into ES with the following lines to attempt to convert the date string into the date type.
date {
match => [ "pickup_datetime", "yyyy-MM-dd HH:mm:ss" ]
target => "pickup_datetime"
}
But the match never seems to occur.
What am I doing wrong?
It turns out the date filter was before the csv filter, where the columns get named, hence the date filter was not finding the pickup_datetime column since it had not yet been named.
It might be a good idea to clearly mention the sequentiality of the filters in the documentation to avoid others having similar problems in the future.
Related
I need to modify CSV file in Apache Nifi environment.
My CSV looks like file:
Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
...
I want to transform dates with AM/PM values to simple date format. From 1/29/2020 3:00:00 AM to 2020-01-29 for each row. I read about UpdateRecord processor, but there is a problem. As you can see, CSV headers contain spaces and I can't even parse these fields with both Replacement Value Strategy (Literal and Record Path).
Any ideas to solve this problem? Maybe somehow I should modify headers from Advertiser ID to advertiser_id, etc?
You don't need to actually make the transformation yourself, you can let your Readers and Writers handle it for you. To get the CSV Reader to recognize dates though, you will need to define a schema for your rows. Your schema would look something like this (I've removed the spaces from the column names because they are not allowed):
{
"type": "record",
"name": "ExampleCSV",
"namespace": "Stackoverflow",
"fields": [
{"name": "AdvertiserID", "type": "string"},
{"name": "CampaignStartDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
{"name": "CampaignEndDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
{"name": "CampaignName", "type": "string"}
]
}
To configure the reader, set the following properties:
Schema Access Strategy = Use 'Schema Text' property
Schema Text = (Above codeblock)
Treat First Line as Header = True
Timestamp Format = "MM/dd/yyyy hh:mm:ss a"
Additionally you can set this property to ignore the Header of the CSV if you don't want to or are unable to change the upstream system to remove the spaces.
Ignore CSD Header Column Names = True
Then in your CSVRecordSetWriter service you can specify the following:
Schema Access Strategy = Inherit Record Schema
Timestamp Format = "yyyy-MM-dd"
You can use UpdateRecord or ConvertRecord (or others as long as they allow you to specify both a reader and a writer)and it will just do the conversion for you. The difference between UpdateRecord and ConvertRecord is that UpdateRecord requires you to specify a user defined property, so if this is the only change you will make, just use ConvertRecord. If you have other transformations, you should use UpdateRecord and make those changes at the same time.
Caveat: This will rewrite the file using the new column names (in my example, ones without spaces) so keep that in mind for downstream usage.
I have a date column in my table that I fetch using jdbc input in logstash. The problem is logstash gives a wrong value to elasticsearch stack.
For example if I have a date start_date="2018-03-01" in elasticsearch I would get the value "2018-02-28 23:00:00.000".
What I want is to keep the format of start_date or at least output the value "2018-03-01 00:00:00.000" to elasticsearch.
I tried to use this filter :
date {
timezone => "UTC"
match => ["start_date" , "ISO8601", "yyyy-MM-dd HH:mm:ss"]
}
but it didn't work.
It is because, you are trying to convert it to UTC timezone. You need to change your configuration like this:
date {
match => ["start_date" , "yyyy-MM-dd"]
}
This would be enough to parse your date.
Let me know if that works.
I have a date field defined in index as
"_reportDate": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
and I have a query to query from _source field which gives _reportDate field in string of 2015-12-05 01:05:00.
I can't seems to find a way to get date in different date format during query retrieval apart from using script field (which is not preferable). From what I understand a date field will be parse to long value to be indexed in elastic search, can we retrieve the long value as well during elasticsearch query?
You need to store the field and at search time ask for this stored field.
If it does not work you can always apply the script at index time with ingest feature and a script processor.
I am trying to parse log files from IIS to the ELK stack (Logstash:2.3, Elastic:2.3 and Kibana:4.5, CentOS 7 vm).
I have attempted to parse a date field from the log message as the event timestamp using the date filter below in my logstash configuration:
date {
match => ["date_timestamp", "yyyy-MM-dd HH:mm:ss"]
timezone => "Europe/London"
locale => "en"
target => "#timestamp"
}
The first few characters of the entire log message that was parsed to Elastic Search is:
"message": "2016-03-01 03:30:49 .........
The date field above was parsed to Elastic Search as:
"date_timestamp": "16-03-01 03:30:49",
However, the event timestamp that was parsed to Elastic Search using the date filter above is:
"#timestamp": "0016-03-01T03:32:04.000Z",
I will like the #timestamp to be exactly 2016-03-01T03:30:49 as I can't immediately figure out why there is a difference between the hours and minutes.
I have looked at similar problems and documentations such as this one on SO and this one on logstash documentation and logstash documentation.
Any pointer in the right direction will be appreciated.
Regards
SO
in your date_timestamp you have only 2 characters for year: "16-03-01 03:30:49", so the date pattern in your date filter is incorrect, should be:
date {
match => ["date_timestamp", "yy-MM-dd HH:mm:ss"]
timezone => "Europe/London"
locale => "en"
target => "#timestamp"
}
I am trying to insert records in Elasticsearch using bulk api and I am getting below error
"error": "MapperParsingException[failed to parse [created_date]]; nested: MapperParsingException[failed to parse date field [2015-07-18 13:00:22], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: \"2015-07-18 13:00:22\" is malformed at \" 13:00:22\"]; "
while I am passing below date
"created_date":"2015-07-18 13:00:22"
and below mapping is used
"created_date": {
"format": "yyyy-MM-DD HH:mm:ss",
"type": "date"
},
I can see that date is correct and mapping is also correct, error is giving for this particular record only and other records are inserted successfully. What could be the reason?
I doubt your mapping has been applied to the field you are expecting.
Logs says tried both date format [dateOptionalTime], and timestamp number with locale []
It does not say that it tries yyyy-MM-DD HH:mm:ss.
May be your created_date is another created_date field?
use "created_date":"2015-07-18T13:00:22"
It may help You