Why does the index return nothing after setting an explicit mapping? - elasticsearch

I am using Elasticsearch 7.12.0, Logstash 7.12.0, and Kibana 7.12.0 on Windows 10 x64. My Logstash config file is logistics.conf:
input {
  jdbc {
    jdbc_driver_library => "D:\\tools\\postgresql-42.2.16.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5433/ld"
    jdbc_user => "xxxx"
    jdbc_password => "sEcrET"
    schedule => "*/5 * * * *"
    statement => "select * from inventory_item_report();"
  }
}
filter {
  uuid {
    target => "uuid"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "localdist"
    document_id => "%{uuid}"
    doc_as_upsert => "true"
  }
}
Run logstash
logstash -f logistics.conf
If I don't set the mapping explicitly, the query
GET /localdist/_search
{
  "query": {
    "match_all": {}
  }
}
returns many results.
My mapping requests:
POST localdist/_mapping
{
}
DELETE /localdist
PUT /localdist
{
}
POST /localdist
{
}
PUT localdist/_mapping
{
"properties": {
"unt_cost": {
"type": "double"
},
"ii_typ": {
"type": "keyword"
},
"qty_uom_id": {
"type": "keyword"
},
"prod_id": {
"type": "keyword"
},
"root_cat_id": {
"type": "keyword"
},
"uom": {
"type": "keyword"
},
"product_name": {
"type": "text"
},
"ii_id": {
"type": "keyword"
},
"wght_uom_id": {
"type": "keyword"
},
"iid_seq_id": {
"type": "long"
},
"avai_diff": {
"type": "double"
},
"invt_change_typ": {
"type": "keyword"
},
"ccy": {
"type": "keyword"
},
"exp_date": {
"type": "date"
},
"req_amt": {
"type": "text"
},
"pur_cost": {
"type": "double"
},
"tot_pri": {
"type": "long"
},
"own_pid": {
"type": "keyword"
},
"doc_type": {
"type": "keyword"
},
"ii_date": {
"type": "date"
},
"fac_id": {
"type": "keyword"
},
"shipment_type_id": {
"type": "keyword"
},
"lot_id": {
"type": "keyword"
},
"phy_invt_id": {
"type": "keyword"
},
"facility_name": {
"type": "text"
},
"amt_ohand_diff": {
"type": "double"
},
"reason_id": {
"type": "keyword"
},
"cat_id": {
"type": "keyword"
},
"qty_ohand_diff": {
"type": "double"
},
"#timestamp": {
"type": "date"
}
}
}
running the query
GET /localdist/_search
{
  "query": {
    "match_all": {}
  }
}
returns nothing.
How can I fix this and make the explicit mapping work correctly?

If I understand you correctly, you are indexing via Logstash. Elasticsearch then creates the index if it is missing, indexes the documents, and tries to guess the mapping based on the very first documents.
TL;DR: You are DELETING the index that contains your data yourself.
With
DELETE /localdist
you are deleting the whole index, including all of its data. After that, by issuing
PUT /localdist
{
}
you are re-creating the previously deleted index, which is now empty again. And at the end, you are setting the index mapping with
PUT localdist/_mapping
{
"properties": {
"unt_cost": {
"type": "double"
},
"ii_typ": {
"type": "keyword"
},
...
Now that you have an empty Elasticsearch index with a mapping set, start the Logstash pipeline again. If your documents match the index mapping, they should start to appear very quickly.
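If you want to skip the separate DELETE / PUT / PUT _mapping round trips, a minimal sketch (only a handful of the fields from the question are shown, the rest follow the same pattern) is to create the index together with its mapping in a single request before starting the pipeline:
PUT /localdist
{
  "mappings": {
    "properties": {
      "unt_cost": { "type": "double" },
      "ii_typ": { "type": "keyword" },
      "product_name": { "type": "text" },
      "ii_date": { "type": "date" },
      "@timestamp": { "type": "date" }
    }
  }
}
Then run logstash -f logistics.conf again; with the 5-minute schedule in the config above, the first documents should show up in GET /localdist/_search after the next scheduled run.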

Related

Reindex parent/child relationship from 2.x to 6.x in elasticsearch

I need a way to re-index parent/child data from 2.4 to the 6.8 join field. The old index has multiple types.
old-index mapping:
{
"mappings": {
"parentType": {
"properties": {
"field_1": {
"type": "text"
},
"field_2": {
"type": "text"
},
"field_3": {
"type": "keyword"
},
"filed_4": {
"type": "long"
}
}
},
"child_type_1": {
"_parent": {
"type": "case"
},
"_routing": {
" required": true
},
"child1_field_1": {
"type": "text"
},
"child1_field_2": {
"type": "text"
},
"child1_field_3": {
"type": "keyword"
}
},
"child_type_2": {
"_parent": {
"type": "parentType"
},
"_routing": {
"required": true
},
"child2_field_1": {
"type": "text"
},
"child2_field_2": {
"type": "text"
}
}
}
}
I want to transform it into the following 6.8 mapping:
{
"mappings": {
"doc": {
"properties": {
"parentType": {
"properties": {
"field_1": {
"type": "text"
},
"field_2": {
"type": "text"
},
"field_3": {
"type": "keyword"
},
"filed_4": {
"type": "long"
}
}
},
"child_type_1": {
"child1_field_1": {
"type": "text"
},
"child1_field_2": {
"type": "text"
},
"child1_field_3": {
"type": "keyword"
}
},
"child_type_2": {
"child2_field_1": {
"type": "text"
},
"child2_field_2": {
"type": "text"
}
},
"join_field": {
"type": "join",
"relations": {
"parentType": [
"child_type_1",
"child_type_2"
]
}
}
}
}
}
}
I know I am supposed to use the reindex API, but I am not sure how exactly the script has to be written. I want to re-index all the parent and child documents into the new index, where the _type is "doc".
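A minimal sketch of the usual reindex-with-script approach, assuming the 2.4 cluster is reachable via reindex-from-remote, that the target new-index was already created with the join mapping above, and that ctx._type and ctx._parent are populated for the old documents (the host name old-cluster is a placeholder):
POST _reindex
{
  "source": {
    "remote": { "host": "http://old-cluster:9200" },
    "index": "old-index"
  },
  "dest": {
    "index": "new-index",
    "type": "doc"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._type == 'parentType') { ctx._source.join_field = 'parentType'; } else { ctx._source.join_field = ['name': ctx._type, 'parent': ctx._parent]; ctx._routing = ctx._parent; }"
  }
}
The routing assignment is what keeps each child on the same shard as its parent, which the join field requires.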

long and float fields showing up as text fields in Kibana

Running Kibana version 5.5.2.
My current setup: Logstash takes the logs from Docker containers and runs grok filters before sending them to Elasticsearch. The specific values that I need to show up as long and float are two timings from AWS calls to ECS and EC2, which a grok filter currently pulls out. Here is the custom pattern that extracts the ECS timings: ECS_DESCRIBE_CONTAINER_INSTANCES (AWS)(%{SPACE})(ecs)(%{SPACE})(%{POSINT})(%{SPACE})(?<ECS_DURATION>(%{NUMBER}))(s)(%{SPACE})(?<ECS_RETRIES>(%{NONNEGINT}))(%{SPACE})(retries), so I need ECS_DURATION to be a float and ECS_RETRIES to be a long. In the Docker log handler I have the following:
if [ECS_DURATION] {
  mutate {
    convert => ["ECS_DURATION", "float"]
  }
}
if [ECS_RETRIES] {
  mutate {
    convert => ["ECS_RETRIES", "integer"]
  }
}
When I look at the field in Kibana, it still shows as a text field, but when I make the following request to elasticsearch for the mappings, it shows those fields as long and float.
GET /logstash-2020.12.18/_mapping
{
"logstash-2020.12.18": {
"mappings": {
"log": {
"_all": {
"enabled": true,
"norms": false
},
"dynamic_templates": [
{
"message_field": {
"path_match": "message",
"match_mapping_type": "string",
"mapping": {
"norms": false,
"type": "text"
}
}
},
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"norms": false,
"type": "text"
}
}
}
],
"properties": {
"#timestamp": {
"type": "date",
"include_in_all": false
},
"#version": {
"type": "keyword",
"include_in_all": false
},
"EC2_DURATION": {
"type": "float"
},
"EC2_RETRIES": {
"type": "long"
},
"ECS_DURATION": {
"type": "float"
},
"ECS_RETRIES": {
"type": "long"
},
I even created a custom mapping template in elasticsearch with the following call
PUT /_template/aws_durations?pretty
{
"template": "logstash*",
"mappings": {
"type1": {
"_source": {
"enabled": true
},
"properties": {
"ECS_DURATION": {
"type": "half_float"
},
"ECS_RETRIES": {
"type": "byte"
},
"EC2_DURATION": {
"type": "half_float"
},
"EC2_RETRIES": {
"type": "byte"
}
}
}
}
}
Have you checked that it's actually going into the if [ECS_DURATION] and if [ECS_RETRIES] conditions? (I wasn't able to comment.)
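One way to check, a minimal sketch assuming you can run a test pipeline: tag the event inside each conditional (the tag names here are made up, purely for debugging) and print the result with the rubydebug codec, so you can see both whether the branch was entered and whether the value came out numeric:
filter {
  if [ECS_DURATION] {
    mutate {
      convert => ["ECS_DURATION", "float"]
      add_tag => ["ecs_duration_converted"]   # debug-only tag
    }
  }
  if [ECS_RETRIES] {
    mutate {
      convert => ["ECS_RETRIES", "integer"]
      add_tag => ["ecs_retries_converted"]    # debug-only tag
    }
  }
}
output {
  stdout { codec => rubydebug }   # print the full event to the console
}
If the tags never appear, the grok pattern is not producing those fields; if they do appear and the mapping already says long/float, it may only be the Kibana index pattern's cached field list that needs a refresh.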

How to get more than one record?

I am using the JDBC input with Logstash to get data from a PostgreSQL query and export it to ES v7. Here is the configuration file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://ldatabase_rds_path?useSSL=true"
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/home/z/Documents/postgresql-42.2.18.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    tracking_column => "id"
    tracking_column_type => "numeric"
    clean_run => true
    schedule => "0 */1 * * *"
    statement => "SELECT id as id, type as type, z_id as z_id, sender_id as sender_id, receiver_id as receiver_id, status as status, amount as amount, fees as fees, created as created, metadata as metadata, funding_source_from_id as funding_source_from_id, funding_source_to_id as funding_source_to_id, is_parent as is_parent, destination_type as destination_type, source_type as source_type FROM payments_transfer"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "payments_transfer_data"
    document_id => "%{id}"
  }
}
It takes a long time and only gets one record from the database!
I tried some solutions, like explicitly defining a mapping, so I added a mapping for the data like this:
PUT payments_transfer_data/_mapping/doc?include_type_name=true
{
"properties": {
"#timestamp": {
"type": "date"
},
"#version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"amount": {
"type": "float"
},
"created": {
"type": "date"
},
"destination_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"z_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"fees": {
"type": "float"
},
"funding_source_from_id": {
"type": "long"
},
"funding_source_to_id": {
"type": "long"
},
"id": {
"type": "long"
},
"is_parent": {
"type": "boolean"
},
"metadata": {
"type": "keyword"
},
"receiver_id": {
"type": "long"
},
"sender_id": {
"type": "long"
},
"source_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Here is the only record that I get:
source_type:customer sender_id:3 destination_type:funding-source type:funded_transfer z_id:863a240c-2011-e911-8114-bacd823e9f1d receiver_id:65 status:processed amount:550 funding_source_from_id:332 @timestamp:Nov 8, 2020 @ 16:00:08.809 fees:5.61 is_parent:false created:Jan 5, 2019 @ 21:28:21.847 @version:1 id:2,160 funding_source_to_id: - metadata: - _id:2160 _type:doc _index:payments_transfer_data _score: -
According to the official documentation, you should use tracking_column in conjunction with use_column_value. With your current settings, tracking_column will not have any effect.
Have you tried using SELECT * FROM to see whether everything is being pulled?
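A minimal sketch of that combination (the connection details are placeholders): pair use_column_value => true with tracking_column and reference :sql_last_value in the statement, so each scheduled run only fetches rows newer than the last id already loaded:
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb?useSSL=true"  # placeholder
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/home/z/Documents/postgresql-42.2.18.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    use_column_value => true           # makes tracking_column take effect
    tracking_column => "id"
    tracking_column_type => "numeric"
    schedule => "0 */1 * * *"
    statement => "SELECT * FROM payments_transfer WHERE id > :sql_last_value ORDER BY id"
  }
}
Note that clean_run => true resets sql_last_value on every start, so it is usually dropped once the incremental loading works as expected.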

How to debug elastic search error number_format_exception with reason "empty String" but no field name

I just deployed a small application that loads a few thousand docs into an index, and when working with production data I get an error in my search request.
The HTTP code is 400 and the error is:
{
"error": {
"root_cause": [
{
"type": "number_format_exception",
"reason": "empty String"
}
],
"type": "number_format_exception",
"reason": "empty String"
},
"status": 400
}
Okay, I kind of get that my mapping defines some numeric field which I obviously don't store correctly, but how am I supposed to find that field?
Each doc contains hundreds of fields... I mean, really?
I tried looking in /var/log/elasticsearch but found nothing useful there...
Please help me, I need to get this thing going.
I defined some fields as integer; they should hold arrays and might be empty. Could that be a problem?
My ES version is 6.6.0.
Update:
The error occurs while searching; during indexing everything is fine.
My mapping for that index:
{
"development-object-1551202425": {
"mappings": {
"_doc": {
"dynamic": "false",
"properties": {
"accommodation": {
"properties": {
"badges": {
"properties": {
"maskedProp1": {
"type": "boolean"
},
"maskedProp2": {
"type": "boolean"
},
"maskedProp3": {
"type": "boolean"
},
"maskedProp4": {
"type": "boolean"
},
"maskedProp5": {
"type": "boolean"
},
"maskedProp6": {
"type": "boolean"
}
}
},
"businessTypes": {
"type": "integer"
},
"classification": {
"properties": {
"classification": {
"type": "keyword"
},
"classificationValue": {
"type": "short"
}
}
},
"endowments": {
"type": "integer"
},
"hasPrice": {
"type": "boolean"
},
"lowestPrice": {
"type": "float"
},
"metascore": {
"type": "short"
},
"rating": {
"type": "short"
},
"regionscore": {
"type": "short"
}
}
},
"certificates": {
"type": "integer"
},
"geoLocation": {
"type": "geo_point"
},
"id": {
"type": "text"
},
"isAccommodation": {
"type": "boolean"
},
"location": {
"properties": {
"maskedProp1": {
"type": "integer"
},
"maskedProp2": {
"type": "integer"
},
"id": {
"type": "integer"
},
"name": {
"type": "text",
"fielddata": true
},
"zipcodes": {
"type": "integer"
}
}
},
"maskedProp1": {
"type": "integer"
},
"maskedProp2": {
"type": "integer"
},
"description": {
"type": "text"
},
"sortTitle": {
"type": "keyword"
},
"title": {
"type": "text"
}
}
}
}
}
}
The index name consists of an environment string (development) with a timestamp appended (I work with automatic index switching and query via an alias, which is called {env}-{name}-current).
In my case the error was an empty "size" parameter in the query; I tried to find the error in my filters and did not see that...
A more verbose error message (at least saying at which property or setting the error occurred) could save thousands of hours of debugging all around the world, I guess.
For now, you would have to take your query DSL apart section by section to find the issue.
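For reference, a sketch of the kind of request that triggers this (the index name is a placeholder); appending error_trace=true at least returns the server-side stack trace, which can hint at which parameter was being parsed when it failed:
GET /my-index/_search?error_trace=true
{
  "size": "",
  "query": {
    "match_all": {}
  }
}
Here Elasticsearch tries to parse the empty string as a number for size and answers with number_format_exception: empty String, without naming the offending parameter.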

How to define specific field tokenization on Logstash

I am using Logstash to index some MySQL data into Elasticsearch:
input {
  jdbc {
    // JDBC configurations
  }
}
output {
  elasticsearch {
    index => ""
    document_type => ""
    document_id => ""
    hosts => [ "" ]
  }
}
When checking results I found that elasticsearch automatically tokenizes the text like this:
"Foo/Bar" -> "Foo", "Bar"
"The thing" -> "The", "thing"
"Fork, Knife" -> "Fork", "Knife"
Well, that is OK for most of my fields. But there is one specific field that I'd like to have a custom tokenizer for. It is a comma-separated field (or semicolon-separated). So it should be:
"Foo/Bar" -> "Foo/Bar"
"The thing" -> "The thing"
"Fork, Knife" -> "Fork", "Knife"
I wonder if there is a way to configure this in my Logstash configuration.
UPDATE:
This is one example of the index that I have. The specific field is kind:
{
"index-name": {
"aliases": {},
"mappings": {
"My-type": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"kind": {
"type": "string"
},
"id": {
"type": "long"
},
"text": {
"type": "string"
},
"version": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "",
"number_of_shards": "",
"number_of_replicas": "",
"uuid": "",
"version": {
"created": ""
}
}
},
"warmers": {}
}
}
It's possible to do so by using an index template.
First delete your current index:
DELETE index_name
Then create the template for your index with the appropriate mapping for the kind field, like this:
PUT _template/index_name
{
"template": "index-name",
"mappings": {
"My-type": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"kind": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "long"
},
"text": {
"type": "string"
},
"version": {
"type": "string"
}
}
}
}
}
Then you can run Logstash again and the index will be re-created with the proper mapping.
Well, the real answer to this question is: you cannot do it from Logstash. So I had to add an additional step, as follows.
I finally got this done following the path shown by @Val. Thanks, pal. So what I had to do was create the index before the Logstash ETL with a specific tokenizer:
{
"settings": {
"analysis": {
"analyzer": {
"simple_analyzer": {
"tokenizer": "simple_tokenizer"
}
},
"tokenizer": {
"simple_tokenizer": {
"type": "pattern",
"pattern": ","
}
}
}
},
"template": "my-index",
"mappings": {
"my-type": {
"properties": {
"kind": {
"type": "string",
"analyzer": "simple_analyzer"
}
}
}
}
}
This creates a comma tokenizer for the kind field. After that, I can run the Logstash ETL and it won't overwrite the kind mapping.
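To verify the analyzer, a quick sketch (assuming the index created from this template is called my-index) is to run a sample value through the _analyze API:
POST my-index/_analyze
{
  "analyzer": "simple_analyzer",
  "text": "Fork, Knife"
}
This should come back as two tokens (Fork and Knife), while Foo/Bar and The thing are each returned as a single token, since the pattern tokenizer only splits on commas.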
