Elasticsearch indexing multi_field with array field

I'm new to Elasticsearch and I'm trying to create a multi_field index with string fields and an array-of-strings field. With the plain string fields everything works great, but when I try to get results for values that are inside the array, the search returns nothing.
My data:
{
  "string_field_one": "one",
  "string_field_two": "two",
  "array_of_strings_field": [
    "2010", "2011", "2012", "2013"
  ]
}
Mappings:
{
  "string_field_one": {
    "type": "string",
    "analyzer": "snowball",
    "copy_to": "group"
  },
  "string_field_two": {
    "type": "string",
    "analyzer": "snowball",
    "copy_to": "group"
  },
  "array_of_strings_field": {
    "type": "string",
    "analyzer": "keyword",
    "copy_to": "group"
  },
  "group": {
    "type": "multi_field"
  }
}
Search:
"body": {
"query": {
"multi_match": {
"query": "one two 2010",
type: "cross_fields",
operator: "and",
fields: [
"string_field_one",
"string_field_two",
"array_of_strings_field",
"group"
]
}
}
}
Expecting:
Searching by one, two, or 2010 should return the result
Searching by one two should return the result
Searching by one two 2010 should return the result
Searching by one two 2008 should not return the result
What am I missing?

The cross_fields type has the constraint that all the fields involved should share the same search analyzer, or rather, all of the query terms must occur within fields that use the same search analyzer, which is not the case here.
You would need to use query_string for the above case:
Example:
"body": {
"query": {
"query_string": {
"query": "one two 2010",
"default_operator": "AND",
"fields": [
"string_field_one",
"string_field_two",
"array_of_strings_field",
"group"
]
}
}
}

Related

Elasticsearch lowercase search doesn't work

I am trying to search against content using a prefix query, and if I search for diode I get results that differ from Diode. How do I get ES to return the same results for both diode and Diode? These are the mappings and settings I am using in ES.
"settings":{
"analysis": {
"analyzer": {
"lowercasespaceanalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"articles": {
"properties": {
"title": {
"type": "text"
},
"url": {
"type": "keyword",
"index": "true"
},
"imageurl": {
"type": "keyword",
"index": "true"
},
"content": {
"type": "text",
"analyzer" : "lowercasespaceanalyzer",
"search_analyzer":"whitespace"
},
"description": {
"type": "text"
},
"relatedcontentwords": {
"type": "text"
},
"cmskeywords": {
"type": "text"
},
"partnumbers": {
"type": "keyword",
"index": "true"
},
"pubdate": {
"type": "date"
}
}
}
}
Here is an example of the query I use:
POST _search
{
  "query": {
    "bool": {
      "must": {
        "prefix": { "content": "capacitance" }
      }
    }
  }
}
It happens because you use two different analyzers at search time and at indexing time.
So when you enter the query "Diode" at search time, it stays "Diode", because the "whitespace" analyzer you use for searching does not lowercase.
However, because you use "lowercasespaceanalyzer" at index time, "Diode" was indexed as "diode". Use the same analyzer at both search and index time, or at least a search analyzer that lowercases your strings, because the default "whitespace" analyzer doesn't: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html
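For example, a minimal sketch of the corrected content mapping (only that field shown; dropping search_analyzer means lowercasespaceanalyzer is applied at both index and search time):
"content": {
  "type": "text",
  "analyzer": "lowercasespaceanalyzer"
}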
There will be no term Diode in your index, only diode. So if you want the same results, you should have your query text analyzed by the same analyzer.
You can use a query_string query, like:
"query_string" : {
"default_field" : "content",
"query" : "Diode",
"analyzer" : "lowercasespaceanalyzer"
}
UPDATE
You can also analyze your query text yourself before querying.
// Run the index's analyzer over the query text first
AnalyzeResponse resp = client.admin().indices()
    .prepareAnalyze(index, text)
    .setAnalyzer("lowercasespaceanalyzer")
    .get();
// getTokens() returns the analyzed tokens; take the term of the first one
String analyzedContext = resp.getTokens().get(0).getTerm();
...
Then use analyzedContext as the new query text.
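For instance, if the analyzed text comes back as diode, the follow-up request could be a plain term query with that value (a sketch; term queries are not analyzed, so the pre-analyzed value is used verbatim):
{
  "query": {
    "term": { "content": "diode" }
  }
}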

How do I search for partial accented keyword in elasticsearch?

I have the following elasticsearch settings:
"settings": {
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":["lowercase", "asciifolding"]
}
}
}
}
}
The above works fine for the following keywords:
Beyoncé
Céline Dion
The above data is stored in elasticsearch as beyonce and celine dion respectively.
I can search for Celine or Celine Dion without the accent and I get the same results. However, the moment I search for Céline, I don't get any results. How can I configure elasticsearch to search for partial keywords with the accent?
The query body looks like:
{
  "track_scores": true,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": ["name"],
            "type": "phrase",
            "query": "Céline"
          }
        }
      ]
    }
  }
}
and the mapping is:
"mappings": {
  "artist": {
    "properties": {
      "name": {
        "type": "string",
        "fields": {
          "orig": {
            "type": "string",
            "index": "not_analyzed"
          },
          "simple": {
            "type": "string",
            "analyzer": "analyzer_keyword"
          }
        }
      }
    }
  }
}
I would suggest this mapping and then go from there:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "analyzer_keyword"
        }
      }
    }
  }
}
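With that mapping, the same analyzer_keyword analysis (whitespace, lowercase, asciifolding) is applied at query time, so accented and unaccented input should match the same documents. A quick check might look like this (a sketch; the index name test is just a placeholder):
GET /test/_search
{
  "query": {
    "match_phrase": {
      "name": "Céline"
    }
  }
}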
Confirm that the same analyzer is getting used at query time. Here are some possible reasons why that might not be happening:
you specify a separate analyzer at query time, on purpose, that does not perform similar analysis
you are using a term or terms query, for which no analyzer is applied (see Term Query and the section titled "Why doesn't the term query match my document?")
you are using a query_string query (e.g. see Simple Query String Query); I have found that when multiple fields with different analyzers are specified, I have needed to separate the fields into separate queries and set the analyzer parameter on each (working with version 2.0), as in the sketch below
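For example, pinning the analyzer on a single-field query might look like this (a sketch against the name.simple sub-field from the question's mapping; normally that sub-field's own analyzer would be picked up automatically anyway):
{
  "query": {
    "match": {
      "name.simple": {
        "query": "Céline",
        "analyzer": "analyzer_keyword"
      }
    }
  }
}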

bidirectional match on elasticsearch

I've indexed a list of terms and now I want to query for some of them
Say that I've indexed 'dog food','red dog','dog','food','cats'
How do I create an exact bidirectional match query, i.e., when I search for 'dog' I want to get only the term dog and not the other terms (because they don't match back)?
One primitive solution I thought of is indexing the terms together with their length (word-wise) and then, when searching with a query of length X, restricting it to terms of length X, but that seems overcomplicated.
Create a custom analyzer to lowercase and normalize your search terms. So that would be your index:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "my_analyzer_keyword"
        }
      }
    }
  }
}
So if you have indexed 'dog' and a user types in Dog, dog, or DOG, it will match only dog; 'dog food' won't be brought back.
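A quick way to check that behaviour (a sketch; your_index is a placeholder for whichever index uses the mapping above):
GET /your_index/_search
{
  "query": {
    "match": {
      "name": "DOG"
    }
  }
}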
Just set your field's index property to not_analyzed and use a term filter in your query to search for the text.
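A minimal sketch of that variant (ES 1.x-style filtered query; with not_analyzed the indexed term is the exact original string, so the term value has to match it exactly, including case):
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "name": "dog" }
      }
    }
  }
}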
As per Evaldas' suggestion, find below a more complete solution, that also keeps the original value indexed with standard analyzer but uses a sub-field with a lowercased version of the terms:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "asset": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "case_ignore": {
              "type": "string",
              "analyzer": "my_keyword_lowercase_analyzer"
            }
          }
        }
      }
    }
  }
}
POST /test/asset/1
{
  "name": "dog"
}
POST /test/asset/2
{
  "name": "dog food"
}
POST /test/asset/3
{
  "name": "red dog"
}
GET /test/asset/_search
{
  "query": {
    "match": {
      "name.case_ignore": "Dog"
    }
  }
}

Elasticsearch multi-word, multi-field search with analyzers

I want to use elasticsearch for multi-word searches, where all the fields are checked in a document with the assigned analyzers.
So if I have a mapping:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings": {
    "typeName": {
      "date_detection": false,
      "properties": {
        "stringfield": {
          "type": "string",
          "index": "folding"
        },
        "numberfield": {
          "type": "multi_field",
          "fields": {
            "numberfield": { "type": "double" },
            "untouched": { "type": "string", "index": "not_analyzed" }
          }
        },
        "datefield": {
          "type": "multi_field",
          "fields": {
            "datefield": { "type": "date", "format": "dd/MM/yyyy||yyyy-MM-dd" },
            "untouched": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
As you see I have different types of fields, but I do know the structure.
What I want to do is start a search with a string that checks all the fields, using their analyzers too.
For example if the query string is:
John Smith 2014-10-02 300.00
I want to search for "John", "Smith", "2014-10-02" and "300.00" in all the fields, calculating the relevance score as well. The best solution is the one that has the most field matches in a single document.
So far I was able to search in all the fields by using multi_field, but in that case I was not able to parse 300.00, since 300 was stored in the string part of multi_field.
If I was searching in "_all" field, then no analyzer was used.
How should I modify my mapping or my queries to be able to do a multi-word search, where dates and numbers are recognized in the multi-word query string?
Right now when I do a search, an error occurs, since the whole string cannot be parsed as a number or a date. And if I search the string representation of the multi_field, then 300.00 will not be a result, since the string representation is 300.
(what I would like is similar to google search, where dates, numbers and strings are recognized in a multi-word query)
Any ideas?
Thanks!
Using a whitespace tokenizer in the analyzer and applying that analyzer as the search_analyzer on the fields in the mapping will split the query into parts, and each part is then matched against the index to find the best matches. Using ngram for the index_analyzer also greatly improves the results.
I am using the following setup for the query:
"query": {
  "multi_match": {
    "query": "sample query",
    "fuzziness": "AUTO",
    "fields": [
      "title",
      "subtitle"
    ]
  }
}
And for mappings and settings:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "standard",
            "lowercase",
            "ngram"
          ]
        }
      },
      "filter": {
        "ngram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "title": {
      "type": "string",
      "search_analyzer": "whitespace",
      "index_analyzer": "autocomplete"
    },
    "subtitle": {
      "type": "string"
    }
  }
}
See following answer and article for more details.

Elastic Search - display all distinct values of an array

For a field mapped as string I have stored a list of strings in the ES index, for example:
subject: ["Scientific Research", "Numerical Analysis", "History of Art"]
I would like to query this field and retrieve the full names of categories with their frequency count. What I tried so far with facets:
"query":{
"match_all": {}
},
"facets":{
"tag":{
"terms":{
"field":"subject"}
}
}
is not working as expected because it splits my subject fields into tokens and returns me the top most frequent stopwords. How can I get full entries ordered by counts for an analyzed field, and not only the top 10, if possible? Thanks!
I would use a multi_field; define your mapping like so:
{
  .....
  ....
  .....
  "subject": {
    "type": "multi_field",
    "store": "yes",
    "fields": {
      "analyzed": {
        "type": "string",
        "analyzer": "standard"
      },
      "notanalyzed": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
Then I would carry out your faceting on the notanalyzed field like so -
"query":{
"match_all": {}
},
"facets":{
"tag":{
"terms":{
"field":"subject.notanalyzed",
"size": 50
}
}
}
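On newer Elasticsearch versions, where facets have been removed, the equivalent is a terms aggregation on the same sub-field (a sketch with the same field and size):
"query": {
  "match_all": {}
},
"aggs": {
  "tag": {
    "terms": {
      "field": "subject.notanalyzed",
      "size": 50
    }
  }
}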
