Issue creating a pipeline in Elasticsearch - elasticsearch

I'm trying to simulate an ingest pipeline that contains grok, date, and remove processors, but I get a "required property is missing" error even though I explicitly set the field "message" under docs.
GET _ingest/pipeline/_simulate
{
"pipeline" : {
"processors" : [
{
"grok" : {
"field" : "message",
"pattern" : "%{COMMONAPACHELOG}"
}
},
{
"date" : {
"match_field" : "timestamp",
"match_formats" : ["dd/MMM/YYYY:HH:mm:ss Z"]
}
},
{
"remove" : {
"field" : "message"
}
}
]
},
"docs" : [
{
"_source" : {
"message" : "52.35.38.35 -- [19/Apr/2016:12:00:04 +0200] \"GET/ HTTP/1.1\" 200 24"
},
"_index" : "indexer"
}
]
}
I'm getting this error:
{
"error" : {
"root_cause" : [
{
"type" : "parse_exception",
"reason" : "[patterns] required property is missing",
"property_name" : "patterns",
"processor_type" : "grok",
"suppressed" : [
{
"type" : "parse_exception",
"reason" : "[field] required property is missing",
"property_name" : "field",
"processor_type" : "date"
}
]
}
],
"type" : "parse_exception",
"reason" : "[patterns] required property is missing",
"property_name" : "patterns",
"processor_type" : "grok",
"suppressed" : [
{
"type" : "parse_exception",
"reason" : "[field] required property is missing",
"property_name" : "field",
"processor_type" : "date"
}
]
},
"status" : 400
}
I looked for a video on YouTube and found someone running the same code successfully.
Here's the video:
https://www.youtube.com/watch?v=PEHnBa19Gxs&t=1s
The relevant part is at around minute 34.

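As it turns out, it worked in the YouTube video because it was run on an older version. On the newer version, the grok processor expects "patterns" (an array) instead of "pattern", and the date processor expects "field" and "formats" instead of "match_field" and "match_formats". The sample log line below also uses "- -" (two hyphens separated by a space) rather than "--".
This will work on the newer version: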
GET _ingest/pipeline/_simulate
{
"pipeline" : {
"processors" : [
{
"grok" : {
"field" : "message",
"patterns" : ["%{COMMONAPACHELOG}"]
}
},
{
"date" : {
"field" : "timestamp",
"formats" : ["dd/MMM/YYYY:HH:mm:ss Z"]
}
},
{
"remove" : {
"field" : "message"
}
}
]
},
"docs" : [
{
"_source" : {
"message" : "52.35.38.35 - - [19/Apr/2016:12:00:04 +0200] \"GET/ HTTP/1.1\" 200 24"
},
"_index" : "indexer"
}
]
}
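The renames between the failing request and the working one can be summarised in a small Python sketch. This is only an illustration: `upgrade_processor` and `RENAMES` are hypothetical names, not part of Elasticsearch, and the rename table covers only the three properties that differ between the question and the answer above.

```python
# Hypothetical helper: rewrites the old-style processor properties from the
# question into the names used in the working request.
RENAMES = {
    "grok": {"pattern": "patterns"},
    "date": {"match_field": "field", "match_formats": "formats"},
}


def upgrade_processor(processor):
    """Return a copy of a single-key processor dict with renamed properties."""
    (ptype, options), = processor.items()
    renames = RENAMES.get(ptype, {})
    upgraded = {}
    for key, value in options.items():
        new_key = renames.get(key, key)
        # grok takes a list of patterns in the working request, not a string
        if ptype == "grok" and new_key == "patterns" and isinstance(value, str):
            value = [value]
        upgraded[new_key] = value
    return {ptype: upgraded}


old = [
    {"grok": {"field": "message", "pattern": "%{COMMONAPACHELOG}"}},
    {"date": {"match_field": "timestamp",
              "match_formats": ["dd/MMM/YYYY:HH:mm:ss Z"]}},
    {"remove": {"field": "message"}},
]
new = [upgrade_processor(p) for p in old]
print(new)
```

Processors that are not in the table, such as remove, pass through unchanged.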

Related

Extract Hashtags and Mentions into separate fields

I am building a DIY tweet sentiment analyser. I have an index of tweets like these:
"_source" : {
"id" : 26930655,
"status" : 1,
"title" : "Here’s 5 underrated #BTC and realistic crypto accounts that everyone should follow: #Quinnvestments , #JacobOracle , #jevauniedaye , #ginsbergonomics , #InspoCrypto",
"hashtags" : null,
"created_at" : 1622390229,
"category" : null,
"language" : 50
},
{
"id" : 22521897,
"status" : 1,
"title" : "#bulls gonna overtake the #bears soon #ATH coming #ALTSEASON #BSCGem #eth #btc #memecoin #100xgems #satyasanatan 🙏🚩🚩🇮🇳""",
"hashtags" : null,
"created_at" : 1620045296,
"category" : null,
"language" : 50
}
The mappings and settings look like this:
"sentiment-en" : {
"mappings" : {
"properties" : {
"category" : {
"type" : "text"
},
"created_at" : {
"type" : "integer"
},
"hashtags" : {
"type" : "text"
},
"id" : {
"type" : "long"
},
"language" : {
"type" : "integer"
},
"status" : {
"type" : "integer"
},
"title" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
},
"raw_text" : {
"type" : "text"
},
"stop" : {
"type" : "text",
"index_options" : "docs",
"analyzer" : "stop_words_filter"
},
"syn" : {
"type" : "text",
"index_options" : "docs",
"analyzer" : "synonyms_filter"
}
},
"index_options" : "docs",
"analyzer" : "all_ok_filter"
}
}
}
}
}
"settings" : {
"index" : {
"number_of_shards" : "10",
"provided_name" : "sentiment-en",
"creation_date" : "1627975717560",
"analysis" : {
"filter" : {
"stop_words" : {
"type" : "stop",
"stopwords" : [ ]
},
"synonyms" : {
"type" : "synonym",
"synonyms" : [ ]
}
},
"analyzer" : {
"stop_words_filter" : {
"filter" : [ "stop_words" ],
"tokenizer" : "standard"
},
"synonyms_filter" : {
"filter" : [ "synonyms" ],
"tokenizer" : "standard"
},
"all_ok_filter" : {
"filter" : [ "stop_words", "synonyms" ],
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "0",
"uuid" : "Q5yDYEXHSM-5kvyLGgsYYg",
"version" : {
"created" : "7090199"
}
}
Now the problem is that I want to extract all the hashtags and mentions into a separate field.
This is the output I want:
"id" : 26930655,
"status" : 1,
"title" : "Here’s 5 underrated #BTC and realistic crypto accounts that everyone should follow: #Quinnvestments , #JacobOracle , #jevauniedaye , #ginsbergonomics , #InspoCrypto",
"hashtags" : BTC,
"created_at" : 1622390229,
"category" : null,
"language" : 50
},
{
"id" : 22521897,
"status" : 1,
"title" : "#bulls gonna overtake the #bears soon #ATH coming #ALTSEASON #BSCGem #eth #btc #memecoin #100xgems #satyasanatan 🙏🚩🚩🇮🇳""",
"hashtags" : bulls,bears,ATH, ALTSEASON, BSCGem, eth , btc, memecoin, 100xGem, satyasanatan
"created_at" : 1620045296,
"category" : null,
"language" : 50
}
What I have tried so far:
I created a pattern-based tokenizer that reads only hashtags and mentions, and no other tokens, for the hashtags and mentions fields, but did not have much success with it.
I tried to write an n-gram tokenizer without any analysers, but did not achieve much success with that either.
Any help would be appreciated. I am open to reindexing my data. Thanks in advance!
You can use the Logstash Twitter input plugin to index the data and configure the Ruby script below in the filter plugin, as mentioned in the blog.
if [message] {
ruby {
code => "event.set('hashtags', event.get('message').scan(/\#[a-z]*/i))"
}
}
To reprocess an existing index, you can use the Logstash Elasticsearch input plugin to read from the source index, apply the Ruby code above in the filter plugin, and write to the destination index with the Logstash Elasticsearch output plugin.
input {
elasticsearch {
hosts => "localhost:9200"
index => "current_twitter"
query => '{ "query": { "query_string": { "query": "*" } } }'
size => 500
scroll => "5m"
}
}
filter{
if [message] {
ruby {
code => "event.set('hashtags', event.get('message').scan(/\#[a-z]*/i))"
}
}
}
output {
elasticsearch {
index => "new_twitter"
}
}
Another option is to use the Reindex API with an ingest pipeline, but ingest pipelines do not support Ruby code, so you would need to convert the Ruby code above to a Painless script.
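As a rough illustration of the extraction logic itself, here is a minimal Python sketch. It is not the Logstash or Painless code: it assumes the text lives in the `title` field from the question, and `extract_tags` and the `mentions` field are hypothetical names. Unlike the Ruby regex above, it uses `\w+` so that tags containing digits (such as #100xgems) are captured, and it drops the leading `#` or `@` to match the desired output.

```python
import re

# \w+ matches letters, digits and underscores, so "#100xgems" is captured.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def extract_tags(doc, source_field="title"):
    """Return a copy of doc with 'hashtags' and 'mentions' lists filled in."""
    text = doc.get(source_field) or ""  # tolerate a missing or null field
    enriched = dict(doc)
    enriched["hashtags"] = HASHTAG_RE.findall(text)
    enriched["mentions"] = MENTION_RE.findall(text)
    return enriched


tweet = {
    "id": 22521897,
    "title": "#bulls gonna overtake the #bears soon #ATH coming #100xgems",
}
result = extract_tags(tweet)
print(result["hashtags"])
```

The same two patterns could serve as a starting point when porting the logic to a Painless script for an ingest pipeline.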

Elasticsearch Suggestions Multi Index and Multi Fields

I have several indexes that contain different fields, and I am trying to figure out how to get suggestions from all indexes and all fields. I know that with GET /_all/_search I can search for results across all indexes. But how can I get suggestions from all indexes and all fields? I want a feature like Google's "Did you mean" suggestions.
So, I tried this out:
GET /_all/_search
{
"query" : {
"multi_match" : {
"query" : "berlin"
}
},
"suggest" : {
"text" : "berlin",
"my-suggest-1" : {
"term" : {
"field" : "street"
}
},
"my-suggest-2" : {
"term" : {
"field" : "city"
}
},
"my-suggest-3" : {
"term" : {
"field" : "description"
}
}
}
}
"my-suggest-1" and "my-suggest-2" belong to the index address (see below), and "my-suggest-3" belongs to the index product. I get the following error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [street]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [city]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [description]"
}
]
}
But if I use only the fields of 1 index I get suggestions, see:
GET /_all/_search
{
"query" : {
"multi_match" : {
"query" : "berlin"
}
},
"suggest" : {
"text" : "berlin",
"my-suggest-1" : {
"term" : {
"field" : "street"
}
},
"my-suggest-2" : {
"term" : {
"field" : "city"
}
}
}
}
Response
...
"failures" : {
...
},
"hits" : {
...
}
"suggest" : {
"my-suggest-1" : [
{
"text" : "berlin",
"offset" : 0,
"length" : 10,
"options" : [
{
"text" : "berliner",
"score" : 0.9,
"freq" : 12
},
{
"text" : "berlinger",
"score" : 0.9,
"freq" : 1
}
]
}
],
"my-suggest-2" : [
{
"text" : "berlin",
"offset" : 0,
"length" : 10,
"options" : []
}
]
...
I don't know how to get suggestions from both the address and product indexes. I would be happy if someone could help me.
Index 1 - Address:
"address" : {
"aliases" : {
....
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"street" : {
"type" : "text"
},
"city" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
Index 2 - Product:
"product" : {
"aliases" : {
...
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"description" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
You can search multiple indices in one request, but the fields you suggest on must then exist in the mapping of every index you search. In your case, you need to define all three fields in both indices: "street" and "city" are filled only in the first index, and "description" is filled only in the second. The mapping for the "address" index would then look like the one below, where the "description" field exists but has no data. Likewise, in the second index, "street" and "city" exist but have no data.
"address" : {
"aliases" : {
....
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"street" : {
"type" : "text"
},
"city" : {
"type" : "text"
},
"description" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
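To work out which fields each index is missing, a small Python sketch like the following could be used. It is only an illustration under stated assumptions: `missing_fields` is a hypothetical helper, the mappings are passed in as plain dictionaries shaped like the "mappings" sections above, and every added field is assumed to be of type text.

```python
def missing_fields(mappings, suggest_fields, field_type="text"):
    """For each index, build a mapping body containing the suggest fields
    that are not yet defined in that index."""
    additions = {}
    for index, mapping in mappings.items():
        present = mapping.get("properties", {})
        missing = {
            field: {"type": field_type}
            for field in suggest_fields
            if field not in present
        }
        if missing:
            additions[index] = {"properties": missing}
    return additions


mappings = {
    "address": {"properties": {"street": {"type": "text"},
                               "city": {"type": "text"}}},
    "product": {"properties": {"description": {"type": "text"}}},
}
additions = missing_fields(mappings, ["street", "city", "description"])
print(additions)
```

Each resulting body could then be sent to the corresponding index with a mapping update request (PUT /<index>/_mapping), after which all three fields exist in both indices.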

Elasticsearch suggest from multiple indices

I am working with Elasticsearch and want to use a suggester on multiple indices at once. I have two indices, tags and pool_tags, each of which has a name field. How can I use a suggester on these two indices when both have a field called name?
I tried giving the suggest sub-field a different name in pool_tags (pool_tag_suggest). Here are the mappings:
tags:
{
"tags" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
},
"suggest" : {
"type" : "completion",
"analyzer" : "simple",
"preserve_separators" : true,
"preserve_position_increments" : true,
"max_input_length" : 50
}
}
}
}
}
}
}
pool_tags:
{
"pool_tags" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
},
"pool_tag_suggest" : {
"type" : "completion",
"analyzer" : "simple",
"preserve_separators" : true,
"preserve_position_increments" : true,
"max_input_length" : 50
}
}
}
}
}
}
}
WHAT I TRIED
POST pool_tags,tags/_search
{
"suggest": {
"tags_suggestor": {
"text": "ww",
"term": {
"field": "name.suggest"
}
},
"pooltags_suggestor": {
"text": "ww",
"term": {
"field": "name.pool_tag_suggest"
}
}
}
}
ERROR
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [name.suggest]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [name.pool_tag_suggest]"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "pool_tags",
"node" : "g2rCnS4PQMWyldWABVJawQ",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [name.suggest]"
}
},
{
"shard" : 0,
"index" : "tags",
"node" : "g2rCnS4PQMWyldWABVJawQ",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [name.pool_tag_suggest]"
}
}
],
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [name.suggest]",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [name.suggest]"
}
}
},
"status" : 400
}

Elasticsearch ILM Error policy does not exists

So this is my index template:
{
"net-stat-template" : {
"order" : 0,
"index_patterns" : [
"net-stat-*"
],
"settings" : {
"index" : {
"lifecycle" : {
"name" : "net-stat",
"rollover_alias" : "net-stat"
},
"routing" : {
"allocation" : {
"require" : {
"data" : "hot"
}
}
},
"refresh_interval" : "15s",
"number_of_shards" : "1",
"number_of_replicas" : "0"
}
},
"mappings" : { },
"aliases" : { }
}
}
and this is my ilm/policy :
"net-stat" : {
"version" : 1,
"modified_date" : "2020-05-10T19:20:18.979Z",
"policy" : {
"phases" : {
"hot" : {
"min_age" : "0ms",
"actions" : {
"rollover" : {
"max_size" : "50gb",
"max_age" : "5d"
},
"set_priority" : {
"priority" : 50
}
}
},
"delete" : {
"min_age" : "10d",
"actions" : {
"delete" : { }
}
},
"warm" : {
"min_age" : "0ms",
"actions" : {
"allocate" : {
"number_of_replicas" : 0,
"include" : { },
"exclude" : { },
"require" : {
"data" : "warm"
}
},
"set_priority" : {
"priority" : 50
}
}
}
}
}
}
But it doesn't delete indices that are more than 10 days old, and when I run GET net-stat-2020.04.20/_ilm/explain it returns:
{
"indices" : {
"net-stat-2020.04.20" : {
"index" : "net-stat-2020.04.20",
"managed" : true,
"policy" : "netstat",
"step_info" : {
"type" : "illegal_argument_exception",
"reason" : "policy [netstat] does not exist"
}
}
}
}
I'm not sure where this netstat came from. Also, when I try POST /net-stat-2020.04.20/_ilm/retry it returns this error:
"type": "illegal_argument_exception",
"reason": "cannot retry an action for an index [net-stat-2020.04.20] that has not encountered an error when running a Lifecycle Policy"
Is there something I'm missing, or are my settings wrong somehow?

Elasticsearch 'failed to find filter under name '

I've just started with ES 5.2.2.
I'm trying to add an analyzer with support for Russian morphology.
I run ES using Docker; I created an image with elasticsearch-analysis-morphology installed.
Then I:
create the index,
put the settings,
and after that get the settings, and everything seems right:
curl http://localhost:9200/news/_settings?pretty
{
"news" : {
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "news",
"creation_date" : "1489343955314",
"analysis" : {
"analyzer" : {
"russian_analyzer" : {
"filter" : [
"stop",
"custom_stop",
"russian_stop",
"custom_word_delimiter",
"lowercase",
"russian_morphology",
"english_morphology"
],
"char_filter" : [
"html_strip",
"ru"
],
"type" : "custom",
"tokenizer" : "standard"
}
},
"char_filter" : {
"ru" : {
"type" : "mapping",
"mappings" : [
"Ё=>Е",
"ё=>е"
]
}
},
"filter:" : {
"custom_stop" : {
"type" : "stop",
"stopwords" : [
"n",
"r"
]
},
"russian_stop" : {
"ignore_case" : "true",
"type" : "stop",
"stopwords" : [
"а",
"без",
]
},
"custom_word_delimiter" : {
"split_on_numerics" : "false",
"generate_word_parts" : "false",
"preserve_original" : "true",
"catenate_words" : "true",
"generate_number_parts" : "true",
"catenate_all" : "true",
"split_on_case_change" : "false",
"type" : "word_delimiter",
"catenate_numbers" : "false"
}
}
},
"number_of_replicas" : "1",
"uuid" : "IUkHHwWrStqDMG6fYOqyqQ",
"version" : {
"created" : "5020299"
}
}
}
}
}
Then I try to open the index, but ES gives me this:
{
"error" : {
"root_cause" : [
{
"type" : "exception",
"reason" : "Failed to verify index [news/IUkHHwWrStqDMG6fYOqyqQ]"
}
],
"type" : "exception",
"reason" : "Failed to verify index [news/IUkHHwWrStqDMG6fYOqyqQ]",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Custom Analyzer [russian_analyzer] failed to find filter under name [custom_stop]"
}
},
"status" : 500
}
I can't understand where I went wrong.
Can anyone see what the problem is?
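There was a mistake in the "filter" section. The key was written with a stray colon inside the quotes ("filter:" instead of "filter"), so the analyzer could not find the filters defined under it.
It was: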
"filter:" : {
"custom_stop" : {
"type" : "stop",
"stopwords" : [
"n",
"r"
]
}...
Thanks #asettou and #andrey-morozov
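A typo like this can be caught before the settings are applied with a quick check of the section names. The Python sketch below is only an illustration: `unknown_sections` is a hypothetical helper, and `KNOWN_SECTIONS` lists the analysis section names I am assuming are valid (analyzer, tokenizer, filter, char_filter, normalizer).

```python
# Section names assumed to be valid under "analysis" in the index settings.
KNOWN_SECTIONS = {"analyzer", "tokenizer", "filter", "char_filter", "normalizer"}


def unknown_sections(analysis):
    """Return the keys under 'analysis' that are not recognised section names."""
    return sorted(key for key in analysis if key not in KNOWN_SECTIONS)


analysis = {
    "analyzer": {"russian_analyzer": {"type": "custom", "tokenizer": "standard"}},
    "char_filter": {"ru": {"type": "mapping", "mappings": ["Ё=>Е", "ё=>е"]}},
    "filter:": {"custom_stop": {"type": "stop", "stopwords": ["n", "r"]}},
}
print(unknown_sections(analysis))
```

With the settings from the question, the misspelled key is the only one reported.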
