Parsing CSV using Logstash - Elasticsearch

Is there a way the top row of the CSV can be used to create the column headers for the Elasticsearch index? In my CSV the columns are not fixed in number or name.
There is also a requirement to convert the datatypes to integer/float based on a regex. Can mutate do that?

You can do it using the Embulk tool's 'guess' command. This command inspects the data and automatically infers the column names and formats.
Look at the 'Loading a CSV file' section in this link:
http://www.embulk.org/docs/recipe/scheduled-csv-load-to-elasticsearch-kibana4.html
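As a rough sketch of that workflow (the file path, index name, and output options below are placeholders; the exact output keys depend on the embulk-output-elasticsearch plugin version you use), you start from a minimal seed config and let guess fill in the CSV schema:

# seed.yml - declare only the input file and the output target
in:
  type: file
  path_prefix: /path/to/sample.csv
out:
  type: elasticsearch
  index: my_csv_index

# 'guess' samples the data and writes a complete config, including the
# column names taken from the header row and the guessed column types
embulk guess seed.yml -o config.yml
embulk run config.yml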

Related

Apache NiFi: add a new CSV field based on an existing one, with modification

I have a .csv file with several fields. One of them (for example the 3rd) contains an email address. How can I add an additional field that will contain only the serverName from the email field?
Example:
input:
01;city;name#servername.com;age;
result:
01;city;name#servername.com;age;servername;
I guess it is possible through the ReplaceText processor, but I can't work out the correct values for the "Search Value" and "Replacement Value" fields.
You can convert your flowfile to a record with the help of ConvertRecord.
It allows you to convert to JSON (or another format), whatever you prefer.
Then you can add a new field, like the following, with an UpdateRecord processor:
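For example (a sketch only: the /email field name is an assumption, use whatever name your record schema gives that column), set the processor's Replacement Value Strategy to "Record Path Value" and add a user-defined property:

Property name:  /servername
Property value: replaceRegex(/email, '^.*#([^.]+)\..*$', '$1')

The RecordPath replaceRegex function keeps only the part of the email between the # and the first dot, and writes it into the new servername field. If back-references are not supported in your NiFi version, the substringAfter/substringBefore RecordPath functions can be combined instead.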
I recommend the following reading:
Update Record tutorial

Problems defining a new Elastic data source in Grafana when the time field name contains dots

I'm trying to define a new data source in Grafana.
The data source is an Elastic index (which I'm not responsible for).
When trying to Save & Test the new data source I get the following error:
No date field named Date.Epoch found
This field is the same field that is set in the Kibana index pattern as the time filter field, so I'm sure there is no typo or other confusion.
After a lot of searching online, I suspect the problem is caused by the dot (.) in the field name.
Is there any way to escape the dot, or another solution that does not require changing the index?
Update: I opened an issue in Grafana's github project https://github.com/grafana/grafana/issues/27702
Try using advanced variable formatting and use the raw value if you have escaping problems:
$variable
or
${variable:raw}

Dealing with Empty Fields

I am new to StormCrawler and Elasticsearch in general. I am currently using StormCrawler 2.0 to index website data (including non-HTML items such as PDFs and Word documents) into Elasticsearch. In some cases the metadata of a PDF or Word document does not contain a title, so the field is stored blank/null in Elasticsearch. This is unfortunately causing issues in the webapp I am using to display search results (search-ui). Is there a way I can have StormCrawler insert a default value of "Untitled" into the title field if none exists in the metadata?
I understand that Elasticsearch has a null_value field parameter, but if I understand correctly that parameter cannot be used for text fields and only helps with searching.
Thanks!
One option would be to write a custom ParseFilter to give an arbitrary value to any missing key or any key with an empty value. The StormCrawler code has quite a few examples of ParseFilters; see also the wiki.
The same could be done as a custom Bolt placed between the parser and the indexer; grab the metadata and normalise to your heart's content.
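A rough sketch of such a ParseFilter, assuming the ParseFilter/Metadata APIs as they look in StormCrawler 2.x (verify the package names and method signatures against the examples shipped with the project), could look like this:

import org.w3c.dom.DocumentFragment;
import com.digitalpebble.stormcrawler.Metadata;
import com.digitalpebble.stormcrawler.parse.ParseFilter;
import com.digitalpebble.stormcrawler.parse.ParseResult;

// Sketch only: sets a default title when the parsed metadata has none.
public class DefaultTitleFilter extends ParseFilter {

    @Override
    public void filter(String url, byte[] content, DocumentFragment doc, ParseResult parse) {
        Metadata md = parse.get(url).getMetadata();
        String title = md.getFirstValue("title");
        if (title == null || title.trim().isEmpty()) {
            md.setValue("title", "Untitled");
        }
    }

    @Override
    public boolean needsDOM() {
        // we only touch metadata, no need for the parsed DOM
        return false;
    }
}

The filter then has to be declared in the parse filters configuration (parsefilters.json) so that it runs as part of the parsing bolt.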

How to parse a CSV file in which some fields contain the separator (comma) as part of the value

Sample message: 111,222,333,444,555,val1in6th,val2in6th,777
The sixth column contains a value that itself contains commas (val1in6th,val2in6th is a sample value of the 6th column).
When I use a simple csv filter this message gets converted into 8 fields. I want to be able to tell the filter that val1in6th,val2in6th should be treated as a single value and placed in the 6th column (it's okay not to have the comma between val1in6th and val2in6th in the output for the 6th column).
Change your plugin: instead of the csv filter, use the grok filter - doc here.
Then use a debugger to build a pattern for your lines, like this one: https://grokdebug.herokuapp.com/
For your lines you could use this grok expression:
%{WORD:FIELD1},%{WORD:FIELD2},%{WORD:FIELD3},%{WORD:FIELD4},%{WORD:FIELD5},%{GREEDYDATA:FIELD6}
or:
%{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}
The second pattern validates that the first 5 fields are integers; to actually store them as numbers in Elasticsearch, append :int to each capture (for example %{INT:FIELD1:int}).
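A minimal Logstash filter block along those lines (a sketch rather than a drop-in config; the field names are placeholders):

filter {
  grok {
    match => {
      "message" => "%{INT:FIELD1:int},%{INT:FIELD2:int},%{INT:FIELD3:int},%{INT:FIELD4:int},%{INT:FIELD5:int},%{GREEDYDATA:FIELD6}"
    }
  }
}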
To learn more about parsing CSV with the grok filter in the Elastic stack, you can use this official Elastic blog guide; it explains how to use grok in an ingest pipeline, but the same applies to Logstash.

Using custom fields in FreeMarker

I am trying to print a custom field, obtained from a saved search, in a PDF template. The field's label is Bank Name (Custom). How do I reference this field in the FreeMarker PDF template? I am not able to fetch the value using anything like the following: subs.bankname, where subs is the result of the saved search.
A NetSuite custom field id is prefixed with "cust" (i.e. custrecord, custentity, custbody, etc.), so you need to find out the correct field id in order to display it using the FreeMarker syntax. Also, as "subs" is the result of a saved search, you might need to iterate over all rows.
In your case it would be something like (to display the first row):
${subs[0].cust_bankname}
Or the following to iterate over all rows:
<#list subs as sub>
${sub.cust_bankname}
</#list>
I hope it helps.
