Dump from MySQL to Elasticsearch

We are using Logstash to dump data from MySQL to Elasticsearch. I am trying to dump a list of all payments against a userId (this will be my _id for the _type).
The Elasticsearch mapping looks like this:
{
"Users": {
"properties" :{
"userId" : {
"type" : "long"
},
"payment" :
{
"properties":{
"paymentId": {
"type": "long"
}
}
}
}
}
}
The SQL table has userId and paymentId columns.
Which filter should I use to get the JSON output that I can feed to Elasticsearch?

Use the jdbc Logstash input plugin.
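A minimal sketch of such a pipeline, assuming a hypothetical MySQL database named mydb and a payments table with userId and paymentId columns (driver path, credentials and query are placeholders to adjust). Note that grouping all of a user's payments into a single document would need extra work, for example a GROUP_CONCAT in the SQL or Logstash's aggregate filter:
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # keep original column casing so %{userId} works below
    lowercase_column_names => false
    statement => "SELECT userId, paymentId FROM payments"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "users"
    # use userId as the document _id, as described in the question
    document_id => "%{userId}"
  }
}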

Related

Disable mapping for a specific field using an Index Template in Elasticsearch 6.8

I have an EFK pipeline set up. Every day a new index is created using the logstash-* prefix. Every time a new field is sent by Fluentd, the field is added to the index pattern logstash-*. I'm trying to create an index template that will disable indexing on a specific field when an index is created. I got this to work in ES 7.1 using the PUT below:
PUT _template/logstash-test
{
"index_patterns": ["logstash-*"],
"mappings": {
"dynamic_templates" : [
{
"params" : {
"path_match" : "params",
"mapping" : {
"enabled": false
}
}
}
]
}
}
However when I try this on Elasticsearch 6.8 I get the following error:
"type": "illegal_argument_exception",
"reason": "Malformed [mappings] section for type [dynamic_templates], should include an inner object describing the mapping"
It is a little different in Elasticsearch 6.x, as it still had mapping types, which are not used anymore.
Try something like this:
PUT _template/logstash-test
{
"index_patterns": ["logstash-*"],
"mappings": {
"_doc": {
"dynamic_templates" : [
{
"params" : {
"path_match" : "params",
"mapping" : {
"enabled": false
}
}
}
]
}
}
}
If your index has a different custom type and is not using the _doc type, you should use that in the mapping.
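To confirm the template was stored, you can retrieve it by name (using the template name from the example above):
GET _template/logstash-test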

Elasticsearch: Alternative to the flattened datatype in Elasticsearch 7.1

I have two Elasticsearch versions: one is 7.3 and the other is 7.1. I am using the flattened data type in Elasticsearch 7.3 and I also want to use this data type in Elasticsearch 7.1, so that I can store my data the same way I store it in Elasticsearch 7.3.
I researched the flattened data type and learned that it is supported in 7.x, but when I tried it in 7.1 it gave me a mapper_parsing_exception error.
What I tried is as shown below.
In Elasticsearch 7.3
Index Creation
PUT demo-flattened
Response:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "demo-flattened"
}
Insert Mapping
PUT demo-flattened/_mapping
{
"properties": {
"host": {
"type": "flattened"
}
}
}
Response:
{
"acknowledged": true
}
In Elasticsearch 7.1
PUT demo-flattened
Response:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "demo-flattened"
}
Insert Mapping
PUT demo-flattened/_mapping
{
"properties": {
"host": {
"type": "flattened"
}
}
}
Response:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "No handler for type [flattened] declared on field [host]"
}
],
"type": "mapper_parsing_exception",
"reason": "No handler for type [flattened] declared on field [host]"
},
"status": 400
}
I want to use the flattened data type in Elasticsearch 7.1. Is there any alternative to the flattened data type in 7.1, given that flattened is only supported from Elasticsearch 7.3?
Any help or suggestions will be appreciated.
First, the flattened type is an X-Pack feature (it is not available in the OSS distribution), so as an alternative you can use the object type with the enabled flag set to false.
This will let you store that field as-is without any indexing.
{
"properties": {
"host": {
"type": "object",
"enabled": false
}
}
}
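For instance, applied to the demo-flattened index from the question (assuming the host field has not been mapped yet), a minimal sketch would be:
PUT demo-flattened/_mapping
{
  "properties": {
    "host": {
      "type": "object",
      "enabled": false
    }
  }
}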
Check the version of your Elasticsearch. If it's the OSS version, then it won't work for you.
You can check it by running GET / in Kibana. You would get something like:
{
"version" : {
"number" : "7.10.2",
"build_flavor" : "oss",
}
}
But for an Elasticsearch distribution that does support the flattened type, you would get something like:
{
"version" : {
"number" : "7.10.2",
"build_flavor" : "default",
}
}
You can find more details on the official Kibana GitHub page: No handler for type [flattened] declared on field [state] #52324.
Internally, it works like this:
Because of the similarities in the way values are indexed, flattened fields share much of the same mapping and search functionality as keyword fields.
Here, you have only one field called host. You can replace it with keyword.
The similarities:
Mapping:
"labels": {
"type": "flattened"
}
Data:
"labels": {
"priority": "urgent",
"release": ["v1.2.5", "v1.3.0"],
"timestamp": {
"created": 1541458026,
"closed": 1541457010
}
}
During indexing, tokens are created for each leaf value in the JSON object. The values are indexed as string keywords, without analysis or special handling for numbers or dates.
To query them, you can use "term": {"labels": "urgent"} or "term": {"labels.release": "v1.3.0"}.
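For illustration, a complete search request using one of those term queries might look like this (the index name my-index is hypothetical):
GET my-index/_search
{
  "query": {
    "term": {
      "labels.release": "v1.3.0"
    }
  }
}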
When you use keyword instead, you can have them as separate fields:
{
"host":{
"type":"keyword"
}
}
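And if you need the individual labels from the earlier example as keywords, a sketch would map each leaf explicitly (field names taken from the example above; the index name my-index is again hypothetical):
PUT my-index/_mapping
{
  "properties": {
    "labels": {
      "properties": {
        "priority": { "type": "keyword" },
        "release": { "type": "keyword" }
      }
    }
  }
}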

Elasticsearch Date Range Query

I am new to Elasticsearch and I am struggling with a date range query. I have to query the records which fall between some particular dates. The JSON records pushed into the Elasticsearch database are as follows:
"messageid": "Some message id",
"subject": "subject",
"emaildate": "2020-01-01 21:09:24",
"starttime": "2020-01-02 12:30:00",
"endtime": "2020-01-02 13:00:00",
"meetinglocation": "some location",
"duration": "00:30:00",
"employeename": "Name",
"emailid": "abc#xyz.com",
"employeecode": "141479",
"username": "username",
"organizer": "Some name",
"organizer_email": "cde#xyz.com",
I have to query the records which have a starttime between "2020-01-02 12:30:00" and "2020-01-10 12:30:00". I have written a query like this:
{
"query":
{
"bool":
{
"filter": [
{
"range" : {
"starttime": {
"gte": "2020-01-02 12:30:00",
"lte": "2020-01-10 12:30:00"
}
}
}
]
}
}
}
This query is not giving results as expected. I assume that the person who pushed the data into the Elasticsearch database at my office did not set the mapping, and Elasticsearch dynamically decided the data type of "starttime" as "text". Hence I am getting inconsistent results.
I can set the mapping like this:
PUT /meetings
{
"mappings": {
"dynamic": false,
"properties": {
.
.
.
.
"starttime": {
"type": "date",
"format":"yyyy-MM-dd HH:mm:ss"
}
.
.
.
}
}
}
And the query will work, but I am not allowed to do so (office policies). What alternatives do I have so that I can achieve my task?
Update:
I assumed the data type to be "text", but by default Elasticsearch applies both "text" and "keyword" so that we can implement both full-text and keyword-based searches. Since it is also set as "keyword", will this benefit me in any case? I do not have access to lots of stuff in the office, which is why I am unable to debug the query. I only have the search API, for which I have to build the query.
GET /meetings/_mapping output:
'
'
'
"starttime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
'
'
'
Date range queries will not work on a text field; for that, you have to use a date field.
Since you are working with dates, best practice is to use the date field type.
I would suggest you reindex your index into another index so that you can change the type of your text field to a date field.
Step 1: Create index2 using the index1 mapping, and make sure to change the type of your date field from text to date (a sketch follows the reindex call below).
Step 2: Run the Elasticsearch reindex and reindex all your data from index1 to index2. Since you have changed your field type to the date type, Elasticsearch will now recognize this field as a date.
POST _reindex
{
"source":{ "index": "index1" },
"dest": { "index": "index2" }
}
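For reference, a minimal sketch of the index2 creation from step 1, assuming the date field is called starttime as in the question and that the rest of the index1 mapping is copied over unchanged:
PUT index2
{
  "mappings": {
    "properties": {
      "starttime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}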
Now you can run your normal date queries on index2.
As @jzzfs suggested, the idea is to add a date sub-field to the starttime field. You first need to modify the mapping like this:
PUT meetings/_mapping
{
"properties": {
"starttime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"date": {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss",
}
}
}
}
}
When done, you need to reindex your data using the update by query API so that the starttime.date field gets populated and indexed:
POST meetings/_update_by_query
When the update is done, you'll be able to leverage the starttime.date sub-field in your query:
{
"query": {
"bool": {
"filter": [
{
"range": {
"starttime.date": {
"gte": "2020-01-02 12:30:00",
"lte": "2020-01-10 12:30:00"
}
}
}
]
}
}
}
There are ways of parsing text fields as dates at search time but the overhead is impractical... You could, however, keep the starttime as text by default but make it a multi-field and query it using starttime.as_date, for example.

Adding Geo_shape to Elasticsearch using Logstash

I have a CSV file which contains geometries in WKT format. I was trying to ingest geo_shape data using the CSV file. I created a mapping as given in the file "input_mapping.json":
{
"mappings" : {
"doc" : {
"properties" : {
"Lot" : {
"type" : "long"
},
"Lot_plan" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Parcel_Address_Line_1" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"Plan" : {
"type" : "long"
},
"Tenure" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"WKT" : {
"type" : "geo_shape"
}
}
}
}
}
WKT is my geo_shape field and it is in WKT (string) format.
Below is input CSV file which I am trying to insert using logstash:
WKT,Lot_plan,Tenure,Parcel_Address_Line_1,Lot,Plan
"POLYGON ((148.41503356 -26.62829003,148.44798048 -26.62800857,148.45234634 -26.63457929,148.45507096 -26.64778132,148.41735984 -26.64808729,148.41514107 -26.64091476,148.41503356 -26.62829003))",21MM1,FH,MASSEY DOWNS,21,1
"POLYGON ((148.45507096 -26.64778132,148.45779641 -26.66098396,148.45859297 -26.66259081,148.45801376 -26.66410383,148.45989472 -26.67278979,148.42510081 -26.67310328,148.42434355 -26.67065659,148.41735984 -26.64808729,148.45507096 -26.64778132))",21MM2,FH,,21,2
"POLYGON ((148.39514404 -26.68791317,148.37228669 -26.68894235,148.37188338 -26.68895271,148.37092744 -26.68897445,148.37051869 -26.68898023,148.36312088 -26.68908468,148.36261958 -26.66909425,148.39598678 -26.66869309,148.39584372 -26.66934742,148.39583604 -26.66968184,148.39590526 -26.67007957,148.39598629 -26.67039933,148.39614586 -26.67085156,148.39625052 -26.67085085,148.42434355 -26.67065659,148.42510081 -26.67310328,148.42537156 -26.67397795,148.42549108 -26.68541445,148.41781484 -26.68547248,148.39988482 -26.68562107,148.39966009 -26.68562292,148.39704234 -26.68564442,148.39514404 -26.68791317))",21MM3,LL,DERWENT PARK,21,3
And my Logstash conf file is:
input{
file{
path=>"D:/input.csv"
start_position=>"beginning"
sincedb_path=>"D:/sample.text"
}
}
filter{
csv{
separator =>","
columns =>["WKT","Lot_plan","Tenure","Parcel_Address_Line_1","Lot","Plan"]
skip_header=>true
skip_empty_columns=>true
convert => {
"Lot" => "integer"
"Plan" => "integer"
}
remove_field =>[ "_source","message","host","path","@version","@timestamp" ]
}
}
output{
elasticsearch{
hosts=>"http://localhost:9701"
index=>"input_mapping"
template =>"D:/input_mapping.json"
template_name => "input_mapping"
manage_template => true
}
}
For some reason it is not getting ingested into Elasticsearch. I am using Elasticsearch version 6.5.4 and Logstash version 6.5.4.
Kindly let me know if I have missed anything.
I realized there will be many other developers looking for a problem similar to the one I faced. Later on, I checked GDAL (ogr2ogr), which provides Elasticsearch ingestion. I also use PostgreSQL to ingest the CSV file. The ogr2ogr tool helped me by following the steps below:
First, ingest the CSV file into PostgreSQL, putting WKT as a text column in a table.
Create another column within the table and update it with the ST_GeomFromText function (a fuller sketch follows the note below):
update TableName set WKT_GEOM=ST_GeomFromText("WKT",4632)
(Note: I already installed the PostGIS extension in PostgreSQL.)
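A rough SQL sketch of these two steps, assuming a hypothetical table named parcels with column names following the CSV header; the SRID here is a placeholder, so use whatever matches your data (the statement above used 4632):
-- add a geometry column alongside the "WKT" text column
ALTER TABLE parcels ADD COLUMN wkt_geom geometry;
-- populate it from the WKT text ("WKT" refers to the text column holding the geometry)
UPDATE parcels SET wkt_geom = ST_GeomFromText("WKT", 4326);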
Now I start Elasticsearch.
Then use ogr2ogr, following the examples provided at the link below (a rough sketch follows the link):
a. First, create the Elasticsearch mapping using ogr2ogr.
b. Then ingest the data from PostgreSQL into Elasticsearch.
https://gdal.org/drivers/vector/elasticsearch.html
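As a rough sketch, the ingestion command (step b) looks like the following; the exact driver name and mapping-related options depend on your GDAL version, and the PostgreSQL connection string and table name are placeholders, so check the driver page linked above:
# push the parcels table from PostgreSQL into the Elasticsearch instance from the question
ogr2ogr -f "Elasticsearch" http://localhost:9701 \
  PG:"host=localhost dbname=mydb user=postgres password=postgres" parcels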
In this way, I was able to perform geo queries in Elasticsearch, but unfortunately without Logstash. :(
Please comment if you have any doubts.

Setting up a Kibana terms panel for an Elasticsearch field that is a list of strings

I have a Kibana dashboard that contains a terms panel to show the number of instances for a particular field (let's call it field1). Field1 is, effectively, a list of strings. Each string usually contains multiple words. Since it's analyzed, Elasticsearch breaks the terms up into separate columns. I need to keep the text together, so I need a not_analyzed version. Here's my attempt to do that with a template, located at ~\config\templates\doc_template.json on a Windows box, which does not seem to be working. Elasticsearch is running as a Windows service.
{
"doc_template": {
"template": "*",
"mappings": {
"Type-*": {
"properties": {
"Field1": {
"type": "multi_field",
"fields": {
"Field1": { "index": "analyzed" },
"RawField1": { "index": "not_analyzed" }
}
}
}
}
}
}
}
In the terms panel, I expect the necessary field to be either RawField1 or Field1.RawField1, but I've tried other variations including and excluding .raw, with no luck.
New indexes are created daily. Field1 exists in 4 separate types, each of which begins with "Type-". I suspect my attempt at using a wildcard there is problematic, but I'm not sure. All data is being sent to Elasticsearch via NEST in a C# .NET application. Here's the mapping for Field1 as it currently exists for one of the types:
{
"index-2014.12.08" : {
"mappings" : {
"Type-1" : {
"properties" : {
"Field1" : {
"type" : "string"
},
"Field2" : {
"type" : "string"
},
"Field3" : {
"type" : "string"
}
}
}
}
}
}
Obviously, the mapping doesn't look like how I expect. What's the best way to remedy this issue?
