Elasticsearch query on an object nested under an array of objects

Assuming I have the following document structure:
{
"title": "Early snow this year",
"body": "After a year with hardly any snow, this is going to be a serious winter",
"source": [
{
"name":"CNN",
"details": {
"site": "cnn.com"
}
},
{
"name":"BBC",
"details": {
"site": "bbc.com"
}
}
]
}
and I have a bool query to try and retrieve this document here:
{
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "snow",
"fields" : ["title", "body"]
}
},
"filter": {
"bool": {
"must" : [
{ "term" : {"source.name" : "bbc"}},
{ "term" : {"source.details.site" : "BBC.COM"}}
]
}
}
}
}
}
But it returns zero hits. How should I modify my query? It works only if I remove { "term" : {"source.details.site" : "BBC.COM"}}.
Here is the mapping:
{
"news" : {
"mappings" : {
"article" : {
"properties" : {
"body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"properties" : {
"details" : {
"properties" : {
"site" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}

You are doing a term query on "source.details.site". A term query means the value you provide is not analysed at query time. With the default mapping, the text field source.details.site is lowercased at index time. So when you query it with term and "BBC.COM", "BBC.COM" is not analysed, and ES tries to match "BBC.COM" against the indexed token "bbc.com" (lowercased at index time), which fails.
You can use match instead of term to get the value analysed. But it is better to use a term query on your keyword sub-field if you know in advance the exact value that was indexed. Term queries benefit from filter caching on the ES side and are faster than match queries.
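For example, a corrected version of the query along those lines, using the keyword sub-fields with the exact values from the document above (a sketch, assuming the document was indexed exactly as shown):
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "snow",
          "fields": ["title", "body"]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "source.name.keyword": "BBC" } },
            { "term": { "source.details.site.keyword": "bbc.com" } }
          ]
        }
      }
    }
  }
}
Note that source is mapped as a plain object rather than nested, so the two filters may match across different elements of the source array; if the name/site pair must match within the same array element, the field would need a nested mapping and a nested query.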
You should clean your data at index time, since you write once and read many times. Anything like "/" or "http" should be removed if it doesn't change the semantics. You can do this in your code while indexing, or you can create custom analysers in your mapping. But remember that custom analysers don't work on keyword fields, so if you try to do this on the ES side you won't be able to aggregate on that field without enabling fielddata, which should be avoided. There is experimental support for normalisers in the latest release, but as it is experimental, don't use it in production. So in my opinion you should clean the data in your code.

Related

Elasticsearch change type existing fields

In my case, NiFi receives data from a syslog firewall and, after transformation, sends the JSON to Elasticsearch. This is my first contact with Elasticsearch.
{
"LogChain" : "Corp01 input",
"src_ip" : "162.142.125.228",
"src_port" : "61802",
"dst_ip" : "177.16.1.13",
"dst_port" : "6580",
"timestamp_utc" : 1646226066899
}
Elasticsearch automatically created the index with these types:
{
"mt-firewall" : {
"mappings" : {
"properties" : {
"LogChain" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dst_ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dst_port" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"src_ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"src_port" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp_utc" : {
"type" : "long"
}
}
}
}
}
How can I change the field types in Elasticsearch?
"src_ip": type "ip"
"dst_ip": type "ip"
"timestamp_utc": type "date"
You can change or configure a field's type using mappings in Elasticsearch; some of the ways are given below:
1. Explicit Index Mapping
Here, you define the index mapping yourself, with all the required fields and their specific types, before indexing any document into Elasticsearch.
PUT /my-index-000001
{
"mappings": {
"properties": {
"src_ip": { "type": "ip" },
"dst_ip": { "type": "ip" },
"timestamp_utc": { "type": "date" }
}
}
}
2. Dynamic Template:
Here, you provide a dynamic template when creating the index, and based on a condition ES will map fields to a specific data type; for example, if a field name ends with _ip, map it as the ip type.
PUT my-index-000001/
{
"mappings": {
"dynamic_templates": [
{
"strings_as_ip": {
"match_mapping_type": "string",
"match": "*ip",
"runtime": {
"type": "ip"
}
}
}
]
}
}
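Note that the runtime section above maps matching fields as runtime fields, which are evaluated at query time. If you would rather have them indexed like regular fields, a variant of the same template using mapping instead (a sketch):
PUT my-index-000001/
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_ip": {
          "match_mapping_type": "string",
          "match": "*ip",
          "mapping": {
            "type": "ip"
          }
        }
      }
    ]
  }
}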
Update 1:
Updating the mapping of an existing index is not recommended (and not possible for existing field types), as it would make the data inconsistent.
You can follow the steps below (see the sketch after this list):
Use the Reindex API to copy the data to a temporary index.
Delete your original index.
Recreate the index, defining the mapping with one of the methods above.
Use the Reindex API to copy the data from the temporary index back to the original index (the newly created index with the mapping).
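A sketch of those steps as requests (the temporary index name mt-firewall-temp is just an example):
POST _reindex
{
  "source": { "index": "mt-firewall" },
  "dest": { "index": "mt-firewall-temp" }
}

DELETE /mt-firewall

PUT /mt-firewall
{
  "mappings": {
    "properties": {
      "src_ip": { "type": "ip" },
      "dst_ip": { "type": "ip" },
      "timestamp_utc": { "type": "date" }
    }
  }
}

POST _reindex
{
  "source": { "index": "mt-firewall-temp" },
  "dest": { "index": "mt-firewall" }
}
The date type accepts the epoch-millis values shown above by default.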

ES index mapping has "analyzer" parameter

One of my indices had this mapping (a lot of not closely related content was omitted for simplicity):
{
"properties" : {
"analyzer" : {
"type" : "keyword",
"ignore_above" : 2048
},
"query" : {
"properties" : {
"bool" : {
"properties" : {
"filter" : {
"properties" : {
"range" : {
"properties" : {
"publish_time" : {
"properties" : {
"gte" : {
"type" : "date",
"format" : "yyyy-MM-dd"
},
"lte" : {
"type" : "date",
"format" : "yyyy-MM-dd"
}
}
}
}
}
}
},
"must" : {
"properties" : {
"match" : {
"properties" : {
"content" : {
"type" : "keyword",
"ignore_above" : 2048
}
}
},
"term" : {
"properties" : {
"topic_id" : {
"type" : "keyword",
"ignore_above" : 2048
}
}
}
}
}
}
},
"match" : {
"properties" : {
"doc_id" : {
"type" : "keyword",
"ignore_above" : 2048
}
}
},
"match_all" : {
"type" : "object"
}
}
}
}
}
I don't know why "properties" has a field called "analyzer"; I have read the official docs and searched Google for quite a while, but found nearly nothing that helps.
Actually, I don't know why there is a "query" field either, but fortunately I found the answer in the post "ES index mapping has "query" parameter".
I guess there may be three possible explanations:
the "analyzer" parameter is indeed the keyword for specifying the analyzer used when indexing all fields of the current index (I guess this has a medium possibility);
the "analyzer" parameter is the result of an incorrect use of some command, just as with the "query" parameter (I guess this has a medium possibility; see the sketch after this list);
the "analyzer" parameter is a user-defined field (I guess this has a lower possibility).
So, could someone help me with this issue please?

Elasticsearch ignores my index template mappings when creating new index

No matter what I do, when the index is created by a Heartbeat process (7.10.2),
Elasticsearch maps all fields dynamically and monitor.id ends up like:
GET /heartbeat-7.10.2-2021.05.25
[...]
"monitor" : {
"properties" : {
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
[...]
even if I delete the index and the template, and update the template to:
{
"order" : 1,
"index_patterns" : [
"heartbeat-7.10.2-*"
],
"settings" : {
},
"mappings" : {
"dynamic": false,
"properties" : {
"monitor" : {
"properties" : {
"id" : {
"ignore_above" : 1024,
"type" : "keyword"
}
}
}
}
},
"aliases" : { }
}
It appears that the template configuration is ignored.
There is no other heartbeat template.
This is problematic, because this way I cannot use e.g. monitor.id for aggregations. The same happens with multiple fields.
I'm relatively new to templates, so maybe I'm missing something.
So apparently I had both a legacy _template and a composable _index_template, and the _index_template took priority.
After
DELETE _index_template/heartbeat*
it works just fine.
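To diagnose this kind of conflict, you can list both template types and simulate which one wins for a given index name (a sketch):
GET _template/heartbeat*
GET _index_template

POST _index_template/_simulate_index/heartbeat-7.10.2-2021.05.25
The simulate call returns the settings and mappings the index would receive, along with the templates that were applied.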

Is there a way to search through Elasticsearch to get all results that have an ID contained in an array of IDs?

I've been trying to find a way to do this for a couple of days now. I've looked through 'bool', 'constant_score', and 'filtered' queries, none of which seem to produce the result I want.
One that HAS come close is the 'ids' query (it does exactly what I described in the title of this question); the one problem is that the key I'm trying to search on is not the '_id' value of the Elasticsearch index. Instead it is 'posterId' in the document below:
"_index": "activity",
"_type": "activity",
"_id": "<unique string id>",
"_score": null,
"_source": {
...
misc keys
...
"posterId": "<QUERY BASED ON THIS VALUE>",
"time": 20171007173623
}
Query that returns based on the _id value:
ids : {
type : "activity",
values : ["<unique string id>", ...]
}
as seen here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
How I want my query to work:
posterId : {
type : "activity",
values : [<list of posterIds>]
}
Returning all documents whose posterId is contained in "<list of posterIds>".
< Edit > I'm trying to do this in one query, as opposed to looping through each member of my list of posterIds, because I also need to sort based on the time key and be able to page the results.
So, does anyone know of a built in query that does this or a work around?
Side note: if you feel like you're about to downvote this, please just comment why; I'm about to be banned, and I've read through all the guidelines and feel like I'm following them, but my questions rarely perform well. :( It would be much appreciated.
Edit:
{
"activity" : {
"aliases" : { },
"mappings" : {
"activity" : {
"properties" : {
"-Kvp7f3epvW_dXSONzKj" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"activityType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"cardType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"noteTitleDict" : {
"properties" : {
"noun" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"subject" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"posterId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"segueType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"time" : {
"type" : "long"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1507678305995",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "<id>",
"version" : {
"created" : "5010199"
},
"provided_name" : "activity"
}
}
}
}
I think what you are looking for is a Terms Query
{
"query": {
"constant_score" : {
"filter" : {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
}
}
This finds documents that contain the exact term kimchy or elasticsearch in the user field. You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
In your case you need to replace
user with posterId.keyword
kimchy and elasticsearch with all your posterIds
Keep in mind that a terms query is case sensitive and that the keyword field does not use a lowercase analyzer (which means it saves/indexes the value in exactly the case it was received).
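Adapted to the mapping above, and with the time sort and paging mentioned in the question, the query might look like this (the posterId values are placeholders):
{
  "from": 0,
  "size": 10,
  "query": {
    "constant_score": {
      "filter": {
        "terms": { "posterId.keyword": ["posterId-1", "posterId-2"] }
      }
    }
  },
  "sort": [
    { "time": { "order": "desc" } }
  ]
}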

Elasticsearch query response influenced by _id

I created an index with the following mappings and settings:
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive_index": {
"type": "custom",
"tokenizer": "filename",
"filter": ["icu_folding", "edge_ngram"]
},
"default_search": {
"type":"standard",
"tokenizer": "filename",
"filter": [
"icu_folding"
]
}
},
"tokenizer" : {
"filename" : {
"pattern" : "[^\\p{L}\\d]+",
"type" : "pattern"
}
},
"filter" : {
"edge_ngram" : {
"side" : "front",
"max_gram" : 20,
"min_gram" : 3,
"type" : "edgeNGram"
}
}
}
},
"mappings": {
"metadata": {
"properties": {
"title": {
"type": "string",
"analyzer": "case_insensitive_index"
}
}
}
}
}
I have the following documents:
{"title":"P-20150531-27332_News.jpg"}
{"title":"P-20150531-27341_News.jpg"}
{"title":"P-20150531-27512_News.jpg"}
{"title":"P-20150531-27343_News.jpg"}
creating the documents with simple numerical IDs
111
112
113
114
and querying using the query
{
"from" : 0,
"size" : 10,
"query" : {
"match" : {
"title" : {
"query" : "P-20150531-27332_News.jpg",
"type" : "boolean",
"fuzziness" : "AUTO"
}
}
}
}
results in the correct scoring and ordering of the documents returned:
P-20150531-27332_News.jpg -> 2.780985
P-20150531-27341_News.jpg -> 0.8262239
P-20150531-27512_News.jpg -> 0.8120311
P-20150531-27343_News.jpg -> 0.7687101
Strangely, creating the same documents with UUIDs
557eec2e3b00002c03de96bd
557eec0f3b00001b03de96b8
557eec0c3b00001b03de96b7
557eec123b00003a03de96ba
as IDs results in different scores for the documents:
P-20150531-27341_News.jpg -> 2.646321
P-20150531-27332_News.jpg -> 2.1998127
P-20150531-27512_News.jpg -> 1.7725387
P-20150531-27343_News.jpg -> 1.2718291
Is this intentional behaviour of Elasticsearch? If so, how can I preserve the correct ordering regardless of the IDs used?
In the query, it looks like you should be using 'default_search' as the analyzer for the match query, unless you actually intended to apply the edge-ngram analyzer to the search query too.
Example :
{
"from" : 0,
"size" : 10,
"query" : {
"match" : {
"title" : {
"query" : "P-20150531-27332_News.jpg",
"type" : "boolean",
"fuzziness" : "AUTO",
"analyzer" : "default_search"
}
}
}
}
default_search is used as the search analyzer only if there is no explicit search_analyzer or analyzer specified in the mapping of the field.
The article here gives a good explanation of the rules by which analyzers are applied.
Also, to ensure IDF takes documents across shards into account, you can use search_type=dfs_query_then_fetch. Documents are routed to shards based on their _id by default, so different sets of IDs distribute the same documents differently across shards, which changes the per-shard term statistics and hence the scores.
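A usage sketch, mirroring the query above (the index name my-index is a placeholder):
GET /my-index/_search?search_type=dfs_query_then_fetch
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "title": {
        "query": "P-20150531-27332_News.jpg",
        "type": "boolean",
        "fuzziness": "AUTO",
        "analyzer": "default_search"
      }
    }
  }
}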
