ES index mapping has "analyzer" parameter - elasticsearch

One of my indices has this mapping (a lot of unrelated content was omitted for simplicity):
{
"properties" : {
"analyzer" : {
"type" : "keyword",
"ignore_above" : 2048
},
"query" : {
"properties" : {
"bool" : {
"properties" : {
"filter" : {
"properties" : {
"range" : {
"properties" : {
"publish_time" : {
"properties" : {
"gte" : {
"type" : "date",
"format" : "yyyy-MM-dd"
},
"lte" : {
"type" : "date",
"format" : "yyyy-MM-dd"
}
}
}
}
}
}
},
"must" : {
"properties" : {
"match" : {
"properties" : {
"content" : {
"type" : "keyword",
"ignore_above" : 2048
}
}
},
"term" : {
"properties" : {
"topic_id" : {
"type" : "keyword",
"ignore_above" : 2048
}
}
}
}
}
}
},
"match" : {
"properties" : {
"doc_id" : {
"type" : "keyword",
"ignore_above" : 2048
}
}
},
"match_all" : {
"type" : "object"
}
}
}
}
}
I don't know why "properties" has a field named "analyzer". I have read the official docs and searched Google for quite a while, but found almost nothing helpful.
I didn't know why there is a "query" field either, but fortunately I found the answer to that in the post "ES index mapping has "query" parameter".
I guess there are three possible explanations:
"analyzer" is indeed the keyword that specifies the analyzer used when indexing all fields of the current index (I guess this has a medium probability);
"analyzer" is the result of incorrectly using some command, just like the "query" field (I guess this has a medium probability);
"analyzer" is a user-defined field (I guess this has a lower probability).
So, could someone help me with this issue please?
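Every key listed directly under "properties" in a mapping is treated as a field name, not as a mapping parameter. An index-wide default analyzer would be configured under the index settings (settings.analysis) instead. The minimal Python sketch below assumes the mapping has already been loaded into a dict, and it uses only an abbreviated copy of the mapping above. It flattens the mapping into dotted field paths, so "analyzer" shows up as an ordinary keyword field next to paths such as query.match.doc_id. The helper name field_paths is hypothetical.

```python
def field_paths(properties, prefix=""):
    """Yield (dotted_path, type) for every leaf field under a mapping's "properties"."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: its sub-fields live under a nested "properties" key.
            yield from field_paths(spec["properties"], prefix=path + ".")
        else:
            yield path, spec.get("type", "object")

# Abbreviated copy of the mapping from the question.
mapping = {
    "properties": {
        "analyzer": {"type": "keyword", "ignore_above": 2048},
        "query": {
            "properties": {
                "match": {"properties": {"doc_id": {"type": "keyword", "ignore_above": 2048}}},
                "match_all": {"type": "object"},
            }
        },
    }
}

for path, field_type in field_paths(mapping["properties"]):
    print(path, field_type)
```

With this input, "analyzer" comes out as a keyword field at the same level as the top-level "query" object.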

Related

ElasticSearch - Create Query That Show Different Properties Between Two Indexes

I am trying to create an Elasticsearch query that will show the non-common properties between two indices. Say the first index is:
{
"myFirstIndex" : {
"mappings" : {
"properties" : {
"CAT" : {
"type" : "keyword",
"ignore_above" : 256
},
"DATE_OF_BIRTH" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"ID" : {
"type" : "keyword",
"ignore_above" : 256
},
"NAME" : {
"type" : "text"
},
"timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
}
and the second is:
{
"mySecondIndex" : {
"mappings" : {
"properties" : {
"CAT" : {
"type" : "keyword",
"ignore_above" : 256
},
"DATE_OF_BIRTH" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"ID" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
I have never run a query across indices, so I am not sure how to do this. I don't care much about whether the properties are nested; for my purposes, finding the appropriate common properties at the base level is sufficient.
Grateful for any assistance. Thank you.
Based on your clarifications, you can't do that natively in Elasticsearch.
You'd need to retrieve both mappings from some code (for example, with the get mapping API) and compare the two indices there.
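The comparison suggested above could look roughly like the minimal Python sketch below. It assumes both mappings were fetched beforehand (for example with GET /myFirstIndex/_mapping and GET /mySecondIndex/_mapping) and pasted in as dicts trimmed to the relevant parts. The helper name property_names is hypothetical, and the sketch compares only top-level properties, which the question says is sufficient.

```python
def property_names(mapping_response, index):
    """Top-level field names from a GET /<index>/_mapping response body."""
    return set(mapping_response[index]["mappings"]["properties"])

# Trimmed copies of the two mappings from the question.
first = {"myFirstIndex": {"mappings": {"properties": {
    "CAT": {"type": "keyword"}, "DATE_OF_BIRTH": {"type": "date"},
    "ID": {"type": "keyword"}, "NAME": {"type": "text"}, "timestamp": {"type": "date"}}}}}
second = {"mySecondIndex": {"mappings": {"properties": {
    "CAT": {"type": "keyword"}, "DATE_OF_BIRTH": {"type": "date"}, "ID": {"type": "keyword"}}}}}

a = property_names(first, "myFirstIndex")
b = property_names(second, "mySecondIndex")

common = sorted(a & b)      # present in both indices
non_common = sorted(a ^ b)  # present in exactly one index
print(common)      # ['CAT', 'DATE_OF_BIRTH', 'ID']
print(non_common)  # ['NAME', 'timestamp']
```

Set operations keep the comparison independent of field order. Reporting a - b and b - a separately would also show which index each extra field belongs to.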

Why is _source disabled by default for the indices I create in Elasticsearch?

I did not use any templates. No matter how the index is created, _source is disabled.
My cluster may have had insufficient disk space before, but after cleaning up, disk usage is now under 20%.
These are my index details:
{
"test" : {
"aliases" : { },
"mappings" : {
"_source" : {
"enabled" : false
},
"properties" : {
"#timestamp" : {
"type" : "date"
},
"Exception" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"log" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1636705210909",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "8BRNRGnQSyOJzqMOHLXjHw",
"version" : {
"created" : "7030099"
},
"provided_name" : "test"
}
}
}
}
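The minimal Python sketch below checks the symptom from code instead of by eye. It reads _source.enabled from a GET /test response like the one shown above. _source is enabled by default, so the hypothetical helper source_enabled treats a missing entry as enabled. The create_body dict is also only an illustration: it is a mapping that states _source explicitly, which may help narrow down whether the disabled setting comes from your request or from somewhere else.

```python
def source_enabled(index_response, index):
    """Return whether _source is enabled in a GET /<index> response body.
    _source is enabled by default, so a missing "_source" entry means True."""
    mappings = index_response[index]["mappings"]
    return mappings.get("_source", {}).get("enabled", True)

# Trimmed copy of the "test" index details from the question.
test_index = {"test": {"mappings": {
    "_source": {"enabled": False},
    "properties": {"log": {"type": "text"}},
}}}
print(source_enabled(test_index, "test"))  # False

# Hypothetical create-index body that states _source explicitly instead of relying on the default.
create_body = {"mappings": {
    "_source": {"enabled": True},
    "properties": {"log": {"type": "text"}},
}}
```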

ElasticSearch Tokenizer keywords

I'm wondering how Elasticsearch tokenizes keywords.
Example:
I'm using a search box to search for keywords in comments.
When I search for "Zelle", only comments in Spanish show up.
But if I search for "Zell", all comments containing "Zelle" show up, with "Zell" highlighted.
Can anyone please tell me why, when I search for some keywords, only comments in specific languages show up?
Edit 1:
The mapping is like this:
{
"comments" : {
"mappings" : {
"ios" : {
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"date" : {
"type" : "date"
},
"language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"product_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"product_version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"rating" : {
"type" : "long"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"user_language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
It does not contain any information about the tokenizer.
How can I tell which tokenizer Elasticsearch uses for searching?
I recommend you read the Mapping chapter of the official book; it will help you a lot.
To answer your question, we need to know the Mapping of your documents, specifically, the mapping of the field you search in.
By the look of it, you are not using the default analyzer (called "standard"), because "Zell" would not match "Zelle" with it.
In Elasticsearch, analyzers tokenize your content the way you want, and by the look of it some analyzer is set up in your mapping, because "Zelle" and "Zell" are matching.
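The answer's point depends on exact token comparison: a match query only hits when a token produced from the query equals a token in the index. The rough Python sketch below shows why "Zell" would not match "Zelle" under the standard analyzer. It assumes a simplified stand-in for that analyzer (extract word characters, then lowercase), whereas the real analyzer follows Unicode text segmentation rules. The comment text is made up.

```python
import re

def rough_standard_tokens(text):
    """Rough stand-in for the "standard" analyzer: extract runs of word
    characters and lowercase them. Not the real implementation."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# Made-up comment text, used only for illustration.
indexed_tokens = set(rough_standard_tokens("Paid with Zelle yesterday"))

def matches(query):
    # A match query hits if any analyzed query token equals an indexed token.
    return any(token in indexed_tokens for token in rough_standard_tokens(query))

print(matches("Zelle"))  # True
print(matches("Zell"))   # False
```

Because "zell" is a different token from "zelle", a prefix-style hit like the one in the question has to come from some other analysis or query setup.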

Is there a way to search Elasticsearch for all results that have an ID contained in an array of IDs?

I've been trying to find a way to do this for a couple of days now. I've looked through the 'bool', 'constant_score' and 'filtered' queries, none of which seems to produce the result I want.
The one that HAS come close is the 'ids' query (it does exactly what I described in the title of this question). The one problem is that the key I'm trying to search on is not the '_id' value of the Elasticsearch index. Instead, it is 'posterId' in the document below:
"_index": "activity",
"_type": "activity",
"_id": "<unique string id>",
"_score": null,
"_source": {
...
misc keys
...
"posterId": "<QUERY BASED ON THIS VALUE>",
"time": 20171007173623
}
Query that returns based on the _id value:
ids : {
type : "activity",
values : ["<unique string id>", ...]
}
as seen here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
How I want my query to work:
posterId : {
type : "activity",
values : [<list of posterIds>]
}
That is, it should return all documents whose posterId is contained in "<list of posterIds>".
Edit: I'm trying to do this in one query, as opposed to looping through each member of my list of posterIds, because I also need to sort on the time key and be able to page the query.
So, does anyone know of a built-in query that does this, or a workaround?
Side note: if you feel like you're about to downvote this, please just comment why. I'm about to be banned, and I've read through all the guidelines and feel like I'm following them, but my questions rarely perform well. :( It would be much appreciated.
Edit:
{
"activity" : {
"aliases" : { },
"mappings" : {
"activity" : {
"properties" : {
"-Kvp7f3epvW_dXSONzKj" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"activityType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"cardType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"noteTitleDict" : {
"properties" : {
"noun" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"subject" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"posterId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"segueType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"time" : {
"type" : "long"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1507678305995",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "<id>",
"version" : {
"created" : "5010199"
},
"provided_name" : "activity"
}
}
}
}
I think what you are looking for is a Terms Query:
{
"query": {
"constant_score" : {
"filter" : {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
}
}
This finds documents that contain the exact term kimchy or elasticsearch in the index of the user field. You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
In your case you need to replace
user with posterId.keyword
kimchy and elasticsearch with all your posterIds
Keep in mind that a terms query is case-sensitive and that the keyword field does not use a lowercase analyzer (it indexes the value in the same case in which it was received).
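The Python sketch below combines the answer's suggestions with the sorting and paging requirement from the question. It uses a bool filter instead of constant_score, and both run the terms query in filter context. It sorts on the numeric time field and pages with from/size. The helper name, the page size and the sample IDs are assumptions. The resulting dict would be sent as the body of a search request against the activity index.

```python
import json

def poster_ids_query(poster_ids, page=0, page_size=20):
    """Build a search body: documents whose posterId is in poster_ids,
    newest first by the numeric "time" field, one page at a time."""
    return {
        "query": {
            "bool": {
                # Filter context: exact, case-sensitive match on the keyword sub-field.
                "filter": {"terms": {"posterId.keyword": list(poster_ids)}}
            }
        },
        "sort": [{"time": {"order": "desc"}}],
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,
    }

# Hypothetical IDs; the second page of results (hits 20-39).
body = poster_ids_query(["id-1", "id-2"], page=1)
print(json.dumps(body, indent=2))
```

Sorting and paging happen in the same request as the filter, so there is no need to loop over the posterIds.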

Elasticsearch on object nested under objects array

Assume I have the following document structure:
{
"title": "Early snow this year",
"body": "After a year with hardly any snow, this is going to be a serious winter",
"source": [
{
"name":"CNN",
"details": {
"site": "cnn.com"
}
},
{
"name":"BBC",
"details": {
"site": "bbc.com"
}
}
]
}
and I have a bool query that tries to retrieve this document:
{
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "snow",
"fields" : ["title", "body"]
}
},
"filter": {
"bool": {
"must" : [
{ "term" : {"source.name" : "bbc"}},
{ "term" : {"source.details.site" : "BBC.COM"}}
]
}
}
}
}
}
But it returns zero hits. How should I modify my query? It only works if I remove the { "term" : {"source.details.site" : "BBC.COM"}} clause.
Here is the mapping:
{
"news" : {
"mappings" : {
"article" : {
"properties" : {
"body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"properties" : {
"details" : {
"properties" : {
"site" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
You are doing a term query on "source.details.site". A term query means that the value you provide is not analysed at query time. If you are using the default mapping, source.details.site is a text field and is lowercased at index time. When you query it with term and "BBC.COM", "BBC.COM" is not analysed, so ES tries to match "BBC.COM" against the indexed "bbc.com" (lowercased at index time), and the match fails.
You can use match instead of term to get the value analysed. But it's better to use a term query on your keyword field if you know in advance the exact value that was indexed. Term queries benefit from caching on the ES side and are faster than match queries.
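Both fixes from the answer can be written as Python dicts based on the original query. The first variant uses match, so "BBC.COM" is analysed into the same lowercase token as the indexed text. The second uses term on the keyword sub-field, where the value has to equal what was indexed, case included ("bbc.com" in the sample document). The helper build_query is hypothetical.

```python
import json

# Option 1: let Elasticsearch analyse the value ("BBC.COM" -> "bbc.com"), like the indexed text.
site_match = {"match": {"source.details.site": "BBC.COM"}}

# Option 2: exact term on the keyword sub-field; the value must equal what was indexed.
site_term = {"term": {"source.details.site.keyword": "bbc.com"}}

def build_query(site_clause):
    """The query from the question, with the failing site clause swapped out."""
    return {"query": {"bool": {
        "must": {"query_string": {"query": "snow", "fields": ["title", "body"]}},
        "filter": {"bool": {"must": [
            {"term": {"source.name": "bbc"}},
            site_clause,
        ]}},
    }}}

print(json.dumps(build_query(site_match), indent=2))
```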
You should clean your data at index time, since you write once and read many times. So anything like "/" or "http" should be removed if doing so doesn't lose meaning. You can do this in your code while indexing, or you can create custom analysers in your mapping. But remember that custom analysers won't work on keyword fields, so if you do this on the ES side, you won't be able to aggregate on that field without enabling fielddata, which should be avoided. There is experimental support for normalizers in the latest release, but since it is experimental, don't use it in production. So in my opinion, you should clean the data in your code.
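Cleaning the value in code before indexing, as the answer recommends, might look like the minimal Python sketch below. The normalization rules are assumptions that you should adjust to your data so nothing meaningful is lost: lowercase the value, then drop the scheme, any path and the trailing slash. clean_site is a hypothetical helper name, and the input URL is a made-up example.

```python
from urllib.parse import urlsplit

def clean_site(raw):
    """Normalize a site value before indexing: lowercase it and keep only
    the host, dropping any scheme ("http://"), path and trailing "/"."""
    value = raw.strip().lower()
    # urlsplit only recognizes the host after "//", so add it when there is no scheme.
    if "//" not in value:
        value = "//" + value
    return urlsplit(value).netloc

# Made-up input, cleaned before the document is sent to Elasticsearch.
doc = {"title": "Early snow this year",
       "source": [{"name": "BBC", "details": {"site": "https://BBC.com/"}}]}
for src in doc["source"]:
    src["details"]["site"] = clean_site(src["details"]["site"])
print(doc["source"][0]["details"]["site"])  # bbc.com
```

Because the stored value is already normalized, the keyword sub-field can be used for exact term queries and aggregations without a custom analyser.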
