language analyzer not working with span query elasticsearch? - elasticsearch

I have an index my_index and a type my_type
{
"mappings": {
"my_type": {
"properties": {
"my_text": {
"type": "text",
"analyzer": "spanish"
}
}
}
}
}
then I want to search for 2 or more words, with, or without gaps but in the specified order.
For that I wrote the following query:
{
"query": {
"span_near": {
"clauses": [
{
"span_term": {
"my_text": "creación"
}
},
{
"span_term": {
"my_text": "fundación"
}
}
],
"slop":4,
"in_order": true
}
}
}
The query is not producing results. But if i search "creacion" and "fundacion" (without diacritics) the query show results.
If i do a phrase_query "creación de una fundación" produces results, so i think there is not support for language analyzers in span queries?
Thanks in advance...

Span queries are NOT analyzed, nor stemmed. They are LOW level queries exposed by the API and you need to know how it is stored before u do the SPAN query.

Related

ElasticSearch autocomplete doesn't work with the middle words

Using python elasticsearch-dsl:
class Record(Document):
tags = Keyword()
tags_suggest = Completion(preserve_position_increments=False)
def clean(self):
self.tags_suggest = {
"input": self.tags
}
class Index:
name = 'my-index'
settings = {
"number_of_shards": 2,
}
When I index
r1 = Record(tags=['my favourite tag', 'my hated tag'])
r2 = Record(tags=['my good tag', 'my bad tag'])
And when I try to use autocomplete with the word in the middle:
dsl = Record.search()
dsl = dsl.suggest("auto_complete", "favo", completion={"field": "tags_suggest"})
search_response = dsl.execute()
for option in search_response.suggest.auto_complete[0].options:
print(option.to_dict())
It won't return anything, but it will when I search "my favo". Any good practices to fix that (make it return 'my favourite tag' when I request suggestions for "favo")?
Check Mapping
Search in Elasticsearch, Is also depends on how you are indexing your data. I would suggest to have look on index mapping with the below query:
curl -X GET "elasticsearch.url:port/index_name/_mapping?pretty"
You need to check how data is being inserted like is it using any analyzer or tokeninzer to save data. If you have not specified any analyzer elasticsearch default uses standard analyzer. It will produce the terms accordingly.
As per your use case you need to apply analyzer, tokens & filters. Here is the one Example where i have to use like query and implemented ngram token filter.
Solution
As i can see you are using suggester, The suggest feature suggests similar looking terms based on a provided text by using a suggester.
If you want to achieve autocomplete, I would suggest to use search as you type.
I tried to reproduce your use case and below is something which worked for me.
Create Index
PUT /test1?pretty
{
"mappings": {
"properties": {
"tags": {
"type": "search_as_you_type"
}
}
}
}
Indexing data
POST test1/_doc?pretty
{
"tags":"my favourite tag"
}
POST test1/_doc?pretty
{
"tags":"my hated tag"
}
POST test1/_doc?pretty
{
"tags":"my good tag"
}
POST test1/_doc?pretty
{
"tags":"my bad tag"
}
Query with your keyword
GET /test1/_search?pretty
{
"query": {
"multi_match": {
"query": "my",
"type": "bool_prefix",
"fields": [
"tags",
"tags._2gram",
"tags._3gram"
]
}
}
}
GET /test1/_search?pretty
{
"query": {
"multi_match": {
"query": "bad",
"type": "bool_prefix",
"fields": [
"tags",
"tags._2gram",
"tags._3gram"
]
}
}
}
GET /test1/_search?pretty
{
"query": {
"multi_match": {
"query": "fav",
"type": "bool_prefix",
"fields": [
"tags",
"tags._2gram",
"tags._3gram"
]
}
}
}
You can achive this by setting preserve_position_increments parameter to false in your mappings.
"tags_completion": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": false,
"preserve_position_increments": false,
"max_input_length": 50
}
You can query it in console like this:
GET /_search
{
"suggest" : {
"my-suggester": {
"prefix": "favou",
"completion": {
"field": "tags_completion",
"skip_duplicates": true,
"fuzzy": {
"fuzziness": 1
}
}
}
}
}
}

Only getting results when elasticsearch is case sensitive

I currently have a problem with my multi match search in ES.
Its simple like that: If I'm searching for the City "Sachsen", I'm getting results.
If I'm searching for "sachsen" (lowercase), I'm getting no results.
how to avoid this?
QUERY with no results
{
"match" : {
"City" : {
"query" : "sachsen"
}
}
My analyzer is analyzer_keyword. Should I have anything add ?
MAPPING
City: {
type: "string",
analyzer: "analyzer_keyword"
}
Your analyzer_keyword analyzer is most probably of type keyword which means you can only perform exact matches on it.
It's standard practice to apply multiple "variants" of a field, one of which is going to match lowercase, possibly ascii-tokenized characters (think München -> munchen) and one which will not be tokenized in any way (this is what you have in your analyzer_keyword).
Since you intend to search the lowercase version of Sachsen, your mapping could look something like
PUT sachsen
{
"mappings": {
"properties": {
"City": {
"type": "keyword", <----
"fields": {
"standard": {
"type": "text",
"analyzer": "standard" <----
}
}
}
}
}
}
After indexing a doc
POST sachsen/_doc
{
"City": "Sachsen"
}
The following will work for exact matches:
GET sachsen/_search
{
"query": {
"match": {
"City": "Sachsen"
}
}
}
and this for lowercase
GET sachsen/_search
{
"query": {
"match": {
"City.standard": "sachsen"
}
}
}
Note that I'm using the default, standard analyzer here but you can choose any one you deem appropriate.

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to" but I it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of fields or many of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

Elasticsearch boost with Wildcardsearch on _all

Im trying to search documents with wildcard and _all. But It does not seem like it's possible to get boosted result with wildcard on _all ?
MappingRequest:
"theboostingclass": {
"properties": {
"Important": {
"boost": 2.0,
"type": "string"
},
"LessImportant": {
"type": "string"
},
"Garbage": {
"type": "string"
}
}
}
}
Indexing:
{
"index" :
{
"_index":"boosting",
"_type":"theboostingclass"
}
}
{
"Important":"bomb",
"LessImportant":"kruka",
"Garbage":"kalkon"
}
{
"index" :
{
"_index":"boosting",
"_type":"theboostingclass"
}
}
{
"Important":"kalkon",
"LessImportant":"bomb",
"Garbage":"bomber"
}
{
"index" :
{
"_index":"boosting",
"_type":"theboostingclass"
}
}
{
"Important":"kruka",
"LessImportant":"bomber",
"Garbage":"bomb"
}
Query
"query": {
"wildcard": {
"_all": {
"value": "*bomb*"
}
}
}
The result returs all hits with a Score of 1 and a seemingly random order. Which is not really what Im after. I want the hit on "Important"field to yield a higher score.
If I do a wildcard search on all 3 fields the scoring seems correct. However I want to use it on _all. Any ideas?
Please see documentation here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-term-rewrite.html. Note that the reason it works with a constant scoring by default is for performance.
I believe you need to modify your query as follows:
"query": {
"wildcard": {
"_all": {
"value": "*bomb*",
"rewrite": "scoring_boolean"
}
}
}

Elastic search multiple terms in a dictionary

I have mapping like:
"profile": {
"properties": {
"educations": {
"properties": {
"university": {
"type": "string"
},
"graduation_year": {
"type": "string"
}
}
}
}
}
which obviously holds the educations history of people. Each person can have multiple educations. What I want to do is search for people who graduated from "SFU" in "2012". To do that I am using filtered search:
"filtered": {
"filter": {
"and": [
{
"term": {
"educations.university": "SFU"
}
},
{
"term": {
"educations.graduation_year": "2012"
}
}
]
}
But what this query does is to find the documents who have "SFU" and "2012" in their education, so this document would match, which is wrong:
educations[0] = {"university": "SFU", "graduation_year": 2000}
educations[1] = {"university": "UBC", "graduation_year": 2012}
Is there anyway I could filter both terms on each education?
You need to define nested type for educations and use nested filter to filter it, or Elasticsearch will internally flattens inner objects into a single object, and return the wrong results.
You can refer here for detail explainations and samples:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/

Resources