Wildcard search on Chinese/Japanese characters in Elasticsearch

I indexed text containing Chinese words in Elasticsearch using the default analyzer. My text contains 聲譽, which means "reputation" in Chinese.
When I run a wildcard search using ?, e.g. 聲? or ?譽, I do not get any results.
However, a wildcard search using *, e.g. 聲* or *聲, returns results.
Is this how it is supposed to work?
Here's my query:
{
  "_source": ["content_id"],
  "size": 10,
  "query": {
    "query_string": {
      "default_field": "txt_1",
      "query": "聲?"
    }
  }
}
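One possible explanation, sketched in Python rather than in Elasticsearch itself: assuming the default (standard) analyzer emits each Han ideograph as its own token, 聲譽 is indexed as two one-character terms, 聲 and 譽. Wildcards are matched against individual terms, so 聲? (聲 plus exactly one more character) has nothing to match, while 聲* (聲 plus zero or more characters) matches the term 聲. The helper names below are hypothetical, and `fnmatch` only stands in for Lucene's term-level wildcard matching.

```python
from fnmatch import fnmatchcase


def unigram_tokens(text: str) -> list[str]:
    # Assumption: the analyzer turns every ideograph into its own term.
    return [ch for ch in text if not ch.isspace()]


def wildcard_hits(pattern: str, terms: list[str]) -> list[str]:
    # A wildcard is tested against each indexed term, never across terms.
    return [term for term in terms if fnmatchcase(term, pattern)]


terms = unigram_tokens("聲譽")
for pattern in ("聲?", "?譽", "聲*", "*聲"):
    print(pattern, wildcard_hits(pattern, terms))
```

If that assumption holds for your index, only the * patterns find a term, which is consistent with the behaviour described in the question.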

Related

Fuzziness in Elasticsearch: search by letter, not by word

I am trying to run a fuzzy search in Elasticsearch. I wrote this query:
"query": {
  "match": {
    "NAME": {
      "query": "data",
      "fuzziness": "AUTO"
    }
  }
}
but it keeps returning the best match by word, not by the nearest letters.
What I need is closer to how Google search behaves. Any ideas?
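For context on what "fuzziness": "AUTO" allows, here is a small Python sketch of the documented default thresholds (AUTO is equivalent to AUTO:3,6). The function name is made up for illustration; Elasticsearch computes this internally for each analyzed term.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum number of edits allowed for a term under AUTO:low,high."""
    length = len(term)
    if length < low:
        return 0  # very short terms must match exactly
    if length < high:
        return 1  # medium-length terms allow one edit
    return 2      # longer terms allow two edits


for term in ("da", "data", "database"):
    print(term, auto_fuzziness(term))
```

Since the edit budget applies to each whole term, the query data only reaches indexed terms within one edit of data.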

Elasticsearch: how to write bool query that will contain multiple conditions on the same token?

I have a field with a tokenizer that splits on dots.
At search time, the value aaa.bbb is split into two terms, aaa and bbb.
My question is how to write a bool query that applies multiple conditions to the same term.
For example, I want to get all docs whose field contains a term that matches a fuzzy search for gmail, where that same term must not contain gamil.
Here are some examples of what I want to achieve:
bmail // MATCH: it matches the fuzzy search and is not gamil
gamil.bmail // MATCH: the term bmail matches the fuzzy search and is not gamil
gamil // NO MATCH: it matches the fuzzy search but equals gamil
NOTE: the following query does NOT appear to work. It looks as if a document is considered a hit when one term matches one condition and a different term matches the other.
{
  ...
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "fuzzy": {
              "my_field": {
                "value": "gmail",
                "fuzziness": 1,
                "max_expansions": 2100000000
              }
            }
          },
          {
            "bool": {
              "must_not": [
                {
                  "query_string": {
                    "default_field": "my_field",
                    "query": "*gamil*",
                    "analyzer": "keyword"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
I ended up using highlighting: I execute the fuzzy (or any other) query and then programmatically filter the results by the returned highlight object.
Span queries might also be a good option if you don't need regular expressions, or if you can make sure you don't exceed the boolean query clause limit.
(see more details in the provided link)
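The per-term condition that the client-side filter has to enforce can be sketched in Python. This is not the highlight-based code from the answer, only an illustration of the check itself. It assumes the field is split on dots and that the fuzzy query counts a transposition as one edit (the Elasticsearch default), approximated here with optimal string alignment distance. All function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition costs one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def same_term_match(value: str, target: str = "gmail", banned: str = "gamil") -> bool:
    # Both conditions must hold for the SAME dot-separated term.
    return any(osa_distance(term, target) <= 1 and banned not in term
               for term in value.split("."))


for value in ("bmail", "gamil.bmail", "gamil"):
    print(value, same_term_match(value))
```

The key point is that `any(...)` evaluates both conditions on one term at a time, which is what the bool query above does not guarantee.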

ANDing search keywords in Elasticsearch

How can we configure Elasticsearch so that it only returns results that match all the words in the search query? The indexed documents have multiple fields, so the words of the search query may match different fields, but every word must be matched somewhere in the result.
You can use the query string query to search for results.
Sample search query:
GET /_search
{
  "query": {
    "query_string": {
      "query": "(content:this OR name:this) AND (content:that OR name:that)"
    }
  }
}
In this query, content and name are the field names, and this and that are the search terms.
You can build a search query similar to that.
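Building that query string for an arbitrary list of words and fields could look roughly like the Python sketch below. The function name is hypothetical, and the sketch assumes the words contain no characters reserved by the query string syntax (it does no escaping).

```python
import json


def build_all_words_query(words: list[str], fields: list[str]) -> str:
    # One OR group per word (any field may match it), AND between groups.
    groups = []
    for word in words:
        alternatives = " OR ".join(f"{field}:{word}" for field in fields)
        groups.append(f"({alternatives})")
    return " AND ".join(groups)


query_string = build_all_words_query(["this", "that"], ["content", "name"])
body = {"query": {"query_string": {"query": query_string}}}
print(json.dumps(body, indent=2))
```

With the words and fields from the sample above, this produces the same query string.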
I think you're looking for a multi_match query together with the and operator. Here is the link to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html. It seems that cross_fields is the query type you're looking for. I'd read more on that page, but this is probably what you need:
GET /_search
{
  "query": {
    "multi_match": {
      "query": "Will Smith",
      "type": "cross_fields",
      "fields": ["first_name", "last_name"],
      "operator": "and"
    }
  }
}

Punctuation with wildcard search in Elasticsearch

I have a custom analyzer on the field authlast that replaces punctuation with spaces. When I search with saint-, I get results, but when I search with saint-* I get none. Any idea why?
How does query_string analyze the string before submitting it for the search? If it does not analyze it, what does the term look like when query_string submits it to the ES index?
"query": {
  "query_string": {
    "query": "saint-*",
    "fields": ["authlast"],
    "default_operator": "AND"
  }
}
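A possible explanation, as a Python sketch rather than real Elasticsearch code: by default query_string does not run wildcard terms through the full analyzer (see the analyze_wildcard option), so saint-* may be looked up as the literal prefix saint- while the index only holds saint. The exact behaviour differs between versions. The analyzer below is a hypothetical stand-in for the custom one on authlast, and the sample value is made up.

```python
import re


def analyze(text: str) -> list[str]:
    # Hypothetical analyzer: lowercase, punctuation -> space, split on whitespace.
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


index_terms = set(analyze("Saint-John"))  # made-up indexed value


def analyzed_term_query(raw: str) -> bool:
    # Plain terms go through the analyzer before lookup.
    return any(term in index_terms for term in analyze(raw))


def unanalyzed_prefix_query(raw: str) -> bool:
    # Assumption: the wildcard term is only lowercased, so "-" survives.
    prefix = raw.rstrip("*").lower()
    return any(term.startswith(prefix) for term in index_terms)


print(analyzed_term_query("saint-"))
print(unanalyzed_prefix_query("saint-*"))
print(unanalyzed_prefix_query("saint*"))
```

Under these assumptions, dropping the hyphen from the wildcard term (saint*) finds the indexed term again.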

AND between tokens in elasticsearch

When I search for documents with a query like this (the field is indexed with the standard analyzer):
"query": {
  "match": {
    "Book": "OG/44"
  }
}
I get the terms 'OG' and '44', and the result set contains documents that have either of these terms. Which analyzer/tokenizer should I use to get only results where both terms are present?
You can set the operator in the match query (by default it is or):
"query": {
  "match": {
    "Book": {
      "query": "OG/44",
      "operator": "and"
    }
  }
}
You get two tokens because the standard analyzer splits the text on the slash, so if you don't want this behaviour you can escape it
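The difference between the two operators can be illustrated with a small Python simulation. It is only a sketch: the tokenizer is a rough stand-in for the standard analyzer (lowercase, split on non-word characters), and the sample documents are invented.

```python
import re


def tokens(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: "OG/44" -> ["og", "44"].
    return [t.lower() for t in re.findall(r"\w+", text)]


def match(doc: str, query: str, operator: str = "or") -> bool:
    doc_terms = set(tokens(doc))
    hits = [term in doc_terms for term in tokens(query)]
    # "or": at least one query term present; "and": every query term present.
    return all(hits) if operator == "and" else any(hits)


docs = ["OG/44", "OG/12", "XX/44"]
print([d for d in docs if match(d, "OG/44")])
print([d for d in docs if match(d, "OG/44", operator="and")])
```

With "and", only documents that contain both og and 44 remain; their order and adjacency are still not checked.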
