Elastic Search Aggregations not working with subfield keyword - elasticsearch

I am new to ES and trying to work with facet creation. But I am stuck at the first step itself. I have an index with a "text" field that has subfield as "keyword" shown below as an example.
PUT my-index
{
"mappings": {
"properties": {
"flavor": {
"type": "text",
"fields": {
"text": {
"type": "keyword"
}
}
}
}
}
}
I have 5 documents indexed, and I am trying to create buckets using the query below:
GET my-index/_search
{
"aggs": {
"my_buckets": {
"terms": {
"field": "flavor.keyword"
}
}
}
}
I do not get any error but also I do not get any buckets. But I also tried to do it in a reverse way. i.e. make the flavor field to be a keyword and in turn added text as the subfield. This method works and gives me the desired result. The problem here is I cannot change the original mapping which I have. I was expecting to get the results by using flavor.keyword but I hit a dead end now. Can someone please explain this behavior and anyway I can make it working?

You named the sub field as text. So, you have to refer it by flavor.text.
GET my-index/_search
{
"aggs": {
"my_buckets": {
"terms": {
"field": "flavor.text"
}
}
}
}

Related

How to search by non-tokenized field length in ElasticSearch

Say I create an index people which will take entries that will have two properties: name and friends
PUT /people
{
"mappings": {
"properties": {
"friends": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and I put two entries, each one of them has two friends.
POST /people/_doc
{
"name": "Jack",
"friends": [
"Jill", "John"
]
}
POST /people/_doc
{
"name": "Max",
"friends": [
"John", "John" # Max will have two friends, but both named John
]
}
Now I want to search for people that have multiple friends
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['friends.keyword'].length > 1"
}
}
}
]
}
}
}
This will only return Jack and ignore Max. I assume this is because we are actually traversing the inversed index, and John and John create only one token - which is 'john' so the length of the tokens is actually 1 here.
Since my index is relatively small and performance is not the key, I would like to actually traverse the source and not the inversed index
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "ctx._source.friends.length > 1"
}
}
}
]
}
}
}
But according to the https://github.com/elastic/elasticsearch/issues/20068 the source is supported only when updating, not when searching, so I cannot.
One obvious solution to this seems to take the length of the field and store it to the index. Something like friends_count: 2 and then filter based on that. But that requires reindexing and also this appears as something that should be solved in some obvious way I am missing.
Thanks a lot.
There is a new feature in ES 7.11 as runtime fields a runtime field is a field that is evaluated at query time. Runtime fields enable you to:
Add fields to existing documents without reindexing your data
Start working with your data without understanding how it’s structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema
you can find more information here about runtime fields, but how you can use runtime fields you can do something like this:
Index Time:
PUT my-index/
{
"mappings": {
"runtime": {
"friends_count": {
"type": "keyword",
"script": {
"source": "doc['#friends'].size()"
}
}
},
"properties": {
"#timestamp": {"type": "date"}
}
}
}
You can also use runtime fields in search time for more information check here.
Search Time
GET my-index/_search
{
"runtime_mappings": {
"friends_count": {
"type": "keyword",
"script": {
"source": "ctx._source.friends.size()"
}
}
}
}
Update:
POST mytest/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.arrayLength = ctx._source.friends.size()"
}
}
You can update all of your document with query above and adjust your query.
For everyone wondering about the same issue, I think #Kaveh answer is the most likely way to go, but I did not manage to make it work in my case. It seems to me that source is created after the query is performed and therefore you cannot access source for the purposes of filtering query.
This leaves you with two options:
filter the result on the application level (ugly and slow solution)
actually save the filed length in a separate field. Such as friends_count
possibly there is another option I don't know about(?).

Only getting results when elasticsearch is case sensitive

I currently have a problem with my multi match search in ES.
Its simple like that: If I'm searching for the City "Sachsen", I'm getting results.
If I'm searching for "sachsen" (lowercase), I'm getting no results.
how to avoid this?
QUERY with no results
{
"match" : {
"City" : {
"query" : "sachsen"
}
}
My analyzer is analyzer_keyword. Should I have anything add ?
MAPPING
City: {
type: "string",
analyzer: "analyzer_keyword"
}
Your analyzer_keyword analyzer is most probably of type keyword which means you can only perform exact matches on it.
It's standard practice to apply multiple "variants" of a field, one of which is going to match lowercase, possibly ascii-tokenized characters (think München -> munchen) and one which will not be tokenized in any way (this is what you have in your analyzer_keyword).
Since you intend to search the lowercase version of Sachsen, your mapping could look something like
PUT sachsen
{
"mappings": {
"properties": {
"City": {
"type": "keyword", <----
"fields": {
"standard": {
"type": "text",
"analyzer": "standard" <----
}
}
}
}
}
}
After indexing a doc
POST sachsen/_doc
{
"City": "Sachsen"
}
The following will work for exact matches:
GET sachsen/_search
{
"query": {
"match": {
"City": "Sachsen"
}
}
}
and this for lowercase
GET sachsen/_search
{
"query": {
"match": {
"City.standard": "sachsen"
}
}
}
Note that I'm using the default, standard analyzer here but you can choose any one you deem appropriate.

elasticsearch query string(full text search) which can't be searched

we have a document below. I can't searched with financialmarkets. but it can be searched with industry_icon_financialmarkets.png. Can anyone tell me what is the reason?
content is the text type field.
document:
{
"title":"test",
"content":"industry_icon_financialmarkets.png"
}
Query:
{
"from": 0,
"size": 2,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "\"industry_icon_financialmarkets.png\""
}
}
]
}
}
}
The default analyzer for text field is standard which won't break industry_icon_financialmarkets into tokens using _ as a delimiter. I would suggest you to use simple analyzer instead which will breaks text into terms whenever it encounters a character which is not a letter.
You can also add sub-field of type keyword to retain the original value.
So the mapping of the field should be:
{
"content": {
"type": "text",
"analyzer": "simple",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
At the time of creating index, we should have our own mapping for each fields based on its type to get the expected result.
Mapping
PUT relevance
{"mapping":{"ID":{"type":"long"},"title":
{"type":"keyword","analyzer":"my_analyzer"},
"content":
{"type":"string","analyzer":"my_analyzer","search_analyzer":"my_analyzer"}},
"settings":
{"analysis":
{"analyzer":
{"my_analyzer":
{"tokenizer":"my_tokenizer"}},
"tokenizer":
{"my_tokenizer":
{"type":"ngram","min_gram":3,"max_gram":30,"token_chars":
["letter","digit"]
}
}
},"number_of_shards":5,"number_of_replicas":2
}
}
Then start inserting documents,
POST relevance/_doc/1
{
"name": "1elastic",
"content": "working fine" //replace special characters with space using program before inserting into ES index.
}
Query
GET relevance/_search
{"size":20,"query":{"bool":{"must":[{"match":{"content":
{"query":"fine","fuzziness":1}}}]}}}

How to store URLs in Elasticsearch for fast access?

I would like to use Elasticsearch for a website and need a fast way to retrieve documents via the page's URL strings - actually paths (e.g. /shoes/sneakers/nike). The paths are unique.
Following solutions come to my mind:
Store as string, indexed, not analyzed
Store in the _id field
Which one would be the better solution and are there maybe better methods?
Thanks!
You can store the url field as keyword datatype and use the below query to get the results.
https://www.elastic.co/guide/en/elasticsearch/reference/master/keyword.html
POST index/_search
{
"query": {
"term": {
"url": "/shoes/sneakers/nike"
}
}
}
if you store it as text data type then elasticsearch will automatically create a keyword field for you and you can use the below query to get the results
mapping created by Elasticsearch
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
query to search
POST index/_search
{
"query": {
"term": {
"url.keyword": "/shoes/sneakers/nike"
}
}
}
{
"query": {
"match_phrase": {
"url": "/shoes/sneakers/nike"
}
}
}
You need to store url field with their values. This match_phrase will solve your problem.

Multiple Paths in Nested Queries

I'm cross-posting this from the elasticsearch forums (https://discuss.elastic.co/t/multiple-paths-in-nested-query/96851/1)
Below is an example, but first I’ll tell you about my use case, because I’m not sure if this is a good approach. I’m trying to automatically index a large collection of typed data. What this means is I’m trying to generate mappings and queries on those mappings all automatically based on information about my data. A lot of my data is relational, and I’m interested in being able to search accross the relations, thus I’m also interested in using Nested data types.
However, the issue is that many of these types have on the order of 10 relations, and I’ve got a feeling its not a good idea to pass 10 identical copies of a nested query to elasticsearch just to query 10 different nested paths the same way. Thus, I’m wondering if its possible to instead pass multiple paths into a single query? Better yet, if its possible to search over all fields in the current document and in all its nested documents and their fields in a single query. I’m aware of object fields, and they’re not a good fit because I want to retrive some data of matched nested documents.
In this example, I create an index with multiple nested types and some of its own types, upload a document, and attempt to query the document and all its nested documents, but fail. Is there some way to do this without duplicating the query for each nested document, or is that actually a performant way to do this? Thanks
PUT /my_index
{
"mappings": {
"type1" : {
"properties" : {
"obj1" : {
"type" : "nested",
"properties": {
"name": {
"type":"text"
},
"number": {
"type":"text"
}
}
},
"obj2" : {
"type" : "nested",
"properties": {
"color": {
"type":"text"
},
"food": {
"type":"text"
}
}
},
"lul":{
"type": "text"
},
"pucci":{
"type": "text"
}
}
}
}
}
PUT /my_index/type1/1
{
"obj1": [
{ "name":"liar", "number":"deer dog"},
{ "name":"one two three", "number":"you can call on me"},
{ "name":"ricky gervais", "number":"user 123"}
],
"obj2": [
{ "color":"red green blue", "food":"meatball and spaghetti"},
{ "color":"orange", "food":"pineapple, fish, goat"},
{ "color":"none", "food":"none"}
],
"lul": "lul its me user123",
"field": "one dog"
}
POST /my_index/_search
{
"query": {
"nested": {
"path": ["obj1", "obj2"],
"query": {
"query_string": {
"query": "ricky",
"all_fields": true
}
}
}
}
}

Resources