Elasticsearch Suggestions Multi Index and Multi Fields - elasticsearch

I have different indexes that contain different fields. And I try to figure out how to get suggests from all indexes and all fields. I know that with GET /_all/_search I can search for results through all indexes. But how can I get all suggestions from all indexes and all fields? Because I want to have a feature like Google "Did you mean: suggests"
So, I tried this out:
GET /_all/_search
{
"query" : {
"multi_match" : {
"query" : "berlin"
}
},
"suggest" : {
"text" : "berlin",
"my-suggest-1" : {
"term" : {
"field" : "street"
}
},
"my-suggest-2" : {
"term" : {
"field" : "city"
}
},
"my-suggest-3" : {
"term" : {
"field" : "description"
}
}
}
}
"my-suggest-1" and "-2" belongs to Index address (see below) and "my-suggest-3" belongs to Index product. I get the following error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [street]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [city]"
},
{
"type" : "illegal_argument_exception",
"reason" : "no mapping found for field [description]"
}
]
}
But if I use only the fields of 1 index I get suggestions, see:
GET /_all/_search
{
"query" : {
"multi_match" : {
"query" : "berlin"
}
},
"suggest" : {
"text" : "berlin",
"my-suggest-1" : {
"term" : {
"field" : "street"
}
},
"my-suggest-2" : {
"term" : {
"field" : "city"
}
}
}
}
Response
...
"failures" : {
...
},
"hits" : {
...
}
"suggest" : {
"my-suggest-1" : [
{
"text" : "berlin",
"offset" : 0,
"length" : 10,
"options" : [
{
"text" : "berliner",
"score" : 0.9,
"freq" : 12
},
{
"text" : "berlinger",
"score" : 0.9,
"freq" : 1
}
]
}
],
"my-suggest-2" : [
{
"text" : "berlin",
"offset" : 0,
"length" : 10,
"options" : []
}
]
...
I don't know how I can get suggests from index address and product? I would be happy if someone can help me.
Index 1 - Address:
"address" : {
"aliases" : {
....
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"street" : {
"type" : "text"
},
"city" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}
Index 2 - Product:
"product" : {
"aliases" : {
...
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"description" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}

You can add multiple indices to your search. In this case, you need to search over the fields that exist on all indices. So In your case, you need to define all three fields in both of the indices. The fields "street" and "city" are filed in the first index and the field "description" is filled only in the second index. This will be your mapping for the "Address" index. In this index, the "description" field exists but has no data. In the second index, "street" and "city" exist but have no data.
"address" : {
"aliases" : {
....
},
"mappings" : {
"dynamic" : "strict",
"properties" : {
"_entity_type" : {
"type" : "keyword",
"index" : false
},
"street" : {
"type" : "text"
},
"city" : {
"type" : "text"
},
"description" : {
"type" : "text"
}
}
},
"settings" : {
...
}
}

Related

match_only_text fields do not support sorting and aggregations elasticsearch

I would like to count and sort the number of occurred message on a field of type match_only_text. Using a DSL query the output needed to have to look like this:
{" Text message 1":615
" Text message 2":568
....}
So i tried this on kibana:
GET my_index_name/_search?size=0
{
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message"
}
}
}
}
However i get this error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "match_only_text fields do not support sorting and aggregations"
}
I am interested in the field "message" this is its mapping:
"message" : {
"type" : "match_only_text"
}
This is a part of the index mapping:
"mappings" : {
"_meta" : {
"package" : {
"name" : "system"
},
"managed_by" : "ingest-manager",
"managed" : true
},
"_data_stream_timestamp" : {
"enabled" : true
},
"dynamic_templates" : [
{
"strings_as_keyword" : {
"match_mapping_type" : "string",
"mapping" : {
"ignore_above" : 1024,
"type" : "keyword"
}
}
}
],
"date_detection" : false,
"properties" : {
"#timestamp" : {
"type" : "date"
}
.
.
.
"message" : {
"type" : "match_only_text"
},
"process" : {
"properties" : {
"name" : {
"type" : "keyword",
"ignore_above" : 1024
},
"pid" : {
"type" : "long"
}
}
},
"system" : {
"properties" : {
"syslog" : {
"type" : "object"
}
}
}
}
}
}
}
Please Help
Yes, by design, match_only_text is of the text field type family, hence you cannot aggregate on it.
You need to:
A. create a message.keyword sub-field in your mapping of type keyword:
PUT my_index_name/_mapping
{
"properties": {
"message" : {
"type" : "match_only_text",
"fields": {
"keyword": {
"type" : "keyword"
}
}
}
}
}
B. update the whole index (using _update_by_query) so the sub-field gets populated and
POST my_index_name/_update_by_query?wait_for_completion=false
Then, depending on the size of your index, call GET _tasks?actions=*byquery&detailed regularly to check the progress of the task.
C. run the aggregation on that sub-field.
POST my_index_name/_search
{
"size": 0,
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message.keyword"
}
}
}
}

Extract Hashtags and Mentions into separate fields

I am doing a DIY Tweet Sentiment analyser, I have an index of tweets like these
"_source" : {
"id" : 26930655,
"status" : 1,
"title" : "Hereโ€™s 5 underrated #BTC and realistic crypto accounts that everyone should follow: #Quinnvestments , #JacobOracle , #jevauniedaye , #ginsbergonomics , #InspoCrypto",
"hashtags" : null,
"created_at" : 1622390229,
"category" : null,
"language" : 50
},
{
"id" : 22521897,
"status" : 1,
"title" : "#bulls gonna overtake the #bears soon #ATH coming #ALTSEASON #BSCGem #eth #btc #memecoin #100xgems #satyasanatan ๐Ÿ™๐Ÿšฉ๐Ÿšฉ๐Ÿ‡ฎ๐Ÿ‡ณ""",
"hashtags" : null,
"created_at" : 1620045296,
"category" : null,
"language" : 50
}
There Mappings are settings are like
"sentiment-en" : {
"mappings" : {
"properties" : {
"category" : {
"type" : "text"
},
"created_at" : {
"type" : "integer"
},
"hashtags" : {
"type" : "text"
},
"id" : {
"type" : "long"
},
"language" : {
"type" : "integer"
},
"status" : {
"type" : "integer"
},
"title" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "keyword"
},
"raw_text" : {
"type" : "text"
},
"stop" : {
"type" : "text",
"index_options" : "docs",
"analyzer" : "stop_words_filter"
},
"syn" : {
"type" : "text",
"index_options" : "docs",
"analyzer" : "synonyms_filter"
}
},
"index_options" : "docs",
"analyzer" : "all_ok_filter"
}
}
}
}
}
"settings" : {
"index" : {
"number_of_shards" : "10",
"provided_name" : "sentiment-en",
"creation_date" : "1627975717560",
"analysis" : {
"filter" : {
"stop_words" : {
"type" : "stop",
"stopwords" : [ ]
},
"synonyms" : {
"type" : "synonym",
"synonyms" : [ ]
}
},
"analyzer" : {
"stop_words_filter" : {
"filter" : [ "stop_words" ],
"tokenizer" : "standard"
},
"synonyms_filter" : {
"filter" : [ "synonyms" ],
"tokenizer" : "standard"
},
"all_ok_filter" : {
"filter" : [ "stop_words", "synonyms" ],
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "0",
"uuid" : "Q5yDYEXHSM-5kvyLGgsYYg",
"version" : {
"created" : "7090199"
}
}
Now the problem is i want to extract all the Hashtags and mentions in a seprate field.
What i want as O/P
"id" : 26930655,
"status" : 1,
"title" : "Hereโ€™s 5 underrated #BTC and realistic crypto accounts that everyone should follow: #Quinnvestments , #JacobOracle , #jevauniedaye , #ginsbergonomics , #InspoCrypto",
"hashtags" : BTC,
"created_at" : 1622390229,
"category" : null,
"language" : 50
},
{
"id" : 22521897,
"status" : 1,
"title" : "#bulls gonna overtake the #bears soon #ATH coming #ALTSEASON #BSCGem #eth #btc #memecoin #100xgems #satyasanatan ๐Ÿ™๐Ÿšฉ๐Ÿšฉ๐Ÿ‡ฎ๐Ÿ‡ณ""",
"hashtags" : bulls,bears,ATH, ALTSEASON, BSCGem, eth , btc, memecoin, 100xGem, satyasanatan
"created_at" : 1620045296,
"category" : null,
"language" : 50
}
What i have tried so far
Create a pattern based tokenizer to just read Hashtags and mentions and no other token for field hashtag and mentions did not had much success there.
Tried to write an n-gram tokenizer without any analysers did not achive much success there as well.
Any help would be appreciated, I am open to reindex my data. Thanks in advance !!!
You can use Logstash Twitter input plugin for indexing data and configured below ruby script in filter plugin as mentioned in blog.
if [message] {
ruby {
code => "event.set('hashtags', event.get('message').scan(/\#[a-z]*/i))"
}
}
You can use Logtstash Elasticsearch Input plugin for source index and configured about ruby code in Filter plugin and Logtstash elasticsearch output plugin with destination index.
input {
elasticsearch {
hosts => "localhost:9200"
index => "current_twitter"
query => '{ "query": { "query_string": { "query": "*" } } }'
size => 500
scroll => "5m"
}
}
filter{
if [message] {
ruby {
code => "event.set('hashtags', event.get('message').scan(/\#[a-z]*/i))"
}
}
}
output {
elasticsearch {
index => "new_twitter"
}
}
Another option is to use reingest API with ingest pipeline but ingest pipeline not support ruby code. So you need to convert above ruby code to the painless script.

Kibana index pattern mapping conflict

I am tired of reindexing every 2 3 weeks i have to do reindex.
{
"winlogbeat_sysmon" : {
"order" : 0,
"index_patterns" : [
"log-wlb-sysmon-*"
],
"settings" : {
"index" : {
"lifecycle" : {
"name" : "winlogbeat_sysmon_policy",
"rollover_alias" : "log-wlb-sysmon"
},
"refresh_interval" : "1s",
"number_of_shards" : "1",
"number_of_replicas" : "1"
}
},
"mappings" : {
"properties" : {
"thread_id" : {
"type" : "long"
},
"z_elastic_ecs.event.code" : {
"type" : "long"
},
"geoip" : {
"type" : "object",
"properties" : {
"ip" : {
"type" : "ip"
},
"latitude" : {
"type" : "half_float"
},
"location" : {
"type" : "geo_point"
},
"longitude" : {
"type" : "half_float"
}
}
},
"dst_ip_addr" : {
"type" : "ip"
}
}
},
"aliases" : { }
}
}
this is the template i set earlier from then i didn't change anything
in current and previous indices of log-wlb-sysmon has dst_ip_addr has ip field and older indices of log-wlb-sysmon has text field in logstash i didn't see any warnning for this issue

Elasticsearch Aggregation sorting

My Elasticsearch mapping is
{
"mappings" : {
"loc" : {
"dynamic": "true",
"properties" : {
"geoip" : {
"properties" : {
"location" : { "type": "geo_point"}
}
},
"lon" : { "type" : "double" },
"lat" : { "type" : "double" },
"altitude" : { "type" : "double" },
"id" : { "type" : "long" },
"date" : { "type" : "date", "format" : "epoch_millis" },
"ip" : { "type" : "string" },
"port" : { "type" : "string" }
}
}
}
}
And I want to sort by time.
So i made query.
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "0.2km",
"geoip.location" : {
"lat" : 36.773353,
"lon" : 126.933847
}
}
}
}
},
"size" : 0,
"sort" : { "date" : { "order" : "desc" } },
"aggs" : {
"ids" : {
"terms" : {
"field" : "id"
},
"aggs" : {
"dedup_docs" : {
"top_hits" : {"size" : 1}
}
}
}
}
}
I want to return the latest time by grouping the results of applying the gps filter by id and sorting in chronological order.
However, the date value of the result is an unordered result.
I do not know how to modify the query.

aggregation fails on nested aggregation field

I've this mapping for fuas type:
curl -XGET 'http://localhost:9201/living_team/_mapping/fuas?pretty'
{
"living_v1" : {
"mappings" : {
"fuas" : {
"properties" : {
"backlogStatus" : {
"type" : "long"
},
"comment" : {
"type" : "string"
},
"dueTimestamp" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"matter" : {
"type" : "string"
},
"metainfos" : {
"properties" : {
"category 1" : {
"type" : "string"
},
"key" : {
"type" : "string"
},
"null" : {
"type" : "string"
},
"processos" : {
"type" : "string"
}
}
},
"resources" : {
"properties" : {
"noteId" : {
"type" : "string"
},
"resourceId" : {
"type" : "string"
}
}
},
"status" : {
"type" : "long"
},
"timestamp" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"user" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
I'm trying to perform this aggregation:
curl -XGET 'http://ESNode01:9201/living_team/fuas/_search?pretty' -d '
{
"aggs" : {
"demo" : {
"nested" : {
"path" : "metainfos"
},
"aggs" : {
"key" : { "terms" : { "field" : "metainfos.key" } }
}
}
}
}
'
ES realizes me:
"error" : {
"root_cause" : [ {
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [metainfos] is not nested"
} ],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query_fetch",
"grouped" : true,
"failed_shards" : [ {
"shard" : 3,
"index" : "living_v1",
"node" : "HfaFBiZ0QceW1dpqAnv-SA",
"reason" : {
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [metainfos] is not nested"
}
} ]
},
"status" : 500
}
Any ideas?
You're missing "type":"nested" from your metainfos mapping.
Should have been:
"metainfos" : {
"type":"nested",
"properties" : {
"category 1" : {
"type" : "string"
},
"key" : {
"type" : "string"
},
"null" : {
"type" : "string"
},
"processos" : {
"type" : "string"
}
}
}

Resources