I use a match query to search the field "syslog5424_app":
{
  "query": {
    "filtered": {
      "query": { "match": { "syslog5424_app": "e1c28ca3-dc7e-4425-ba14-7778f126bdd6" } }
    }
  }
}
Here is the query result:
{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 45,
    "successful": 29,
    "failed": 0
  },
  "hits": {
    "total": 8340,
    "max_score": 17.623652,
    "hits": [
      {
        "_index": "logstash-2014.12.16",
        "_type": "applog",
        "_id": "AUpTBuwKsotKslj7c27d",
        "_score": 17.623652,
        "_source": {
          "message": "132 <14>1 2014-12-16T12:16:09.889089+00:00 loggregator e1c28ca3-dc7e-4425-ba14-7778f126bdd6 [App/0] - - Get the platform's MBean server",
          "@version": "1",
          "@timestamp": "2014-12-16T12:16:10.127Z",
          "host": "9.91.32.178:33128",
          "type": "applog",
          "syslog5424_pri": "14",
          "syslog5424_ver": "1",
          "syslog5424_ts": "2014-12-16T12:16:09.889089+00:00",
          "syslog5424_host": "loggregator",
          "syslog5424_app": "e1c28ca3-dc7e-4425-ba14-7778f126bdd6",
          "syslog5424_proc": "[App/0]",
          "syslog5424_msg": "Get the platform's MBean server",
          "syslog_severity_code": 5,
          "syslog_facility_code": 1,
          "syslog_facility": "user-level",
          "syslog_severity": "notice",
          "@source_host": "%{syslog_hostname}",
          "@message": "%{syslog_message}"
        }
      },
      ...
But when I change "match" to "term", I get nothing. The content of the field syslog5424_app is exactly "e1c28ca3-dc7e-4425-ba14-7778f126bdd6", but I can't find it using "term". Any advice would be appreciated.
{
  "query": {
    "filtered": {
      "query": { "term": { "syslog5424_app": "e1c28ca3-dc7e-4425-ba14-7778f126bdd6" } }
    }
  }
}
What analyser are you using on the field syslog5424_app?
If it's the standard analyser, then the data is probably being broken down into search terms.
e.g.
e1c28ca3-dc7e-4425-ba14-7778f126bdd6
is broken down into:
e1c28ca3
dc7e
4425
ba14
7778f126bdd6
When you use match query, your search string will also be broken down - so a match is made.
However when you use a term query, the search string won't be analysed. i.e. you are looking for e1c28ca3-dc7e-4425-ba14-7778f126bdd6 in the 5 individual terms - it's not going to match.
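You can check this yourself with the _analyze API. A minimal sketch, using the JSON body form (on older 1.x/2.x clusters you would pass analyzer and text as URL parameters instead):

GET _analyze
{
  "analyzer": "standard",
  "text": "e1c28ca3-dc7e-4425-ba14-7778f126bdd6"
}

The tokens array in the response shows exactly the individual terms listed above.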
So - my recommendation would be to update your mapping to use not_analyzed - you wouldn't normally need to match on part of a UUID, so turn off all analysis for this field.
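A minimal sketch of such a mapping for ES 1.x/2.x, reusing the index and type names from your result above (an assumption about your setup; note that an existing analyzed field cannot be switched in place - you would apply this via an index template or a new index and then reindex):

PUT logstash-2014.12.16/_mapping/applog
{
  "properties": {
    "syslog5424_app": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}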
Related
I have a test collection with these two documents:
{ _id: ObjectId("636ce11889a00c51cac27779"), sku: 'kw-lids-0009' }
{ _id: ObjectId("636ce14b89a00c51cac2777a"), sku: 'kw-fs66-gre' }
I've created a search index with this definition:
{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "sku": {
        "type": "string"
      }
    }
  }
}
If I run this aggregation:
[{
  $search: {
    index: 'test',
    text: {
      query: 'kw-fs',
      path: 'sku'
    }
  }
}]
Why do I get 2 results? I only expected the one with sku: 'kw-fs66-gre' 😬
During indexing, the standard analyzer breaks the string "kw-lids-0009" into 3 tokens [kw][lids][0009], and similarly tokenizes "kw-fs66-gre" as [kw][fs66][gre]. When you query for "kw-fs", the same analyzer tokenizes the query as [kw][fs], and so Lucene matches on both documents, as both have the [kw] token in the index.
To get the behavior you're looking for, you should index the sku field as type autocomplete and use the autocomplete operator in your $search stage instead of text.
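A minimal sketch of that setup (the tokenization, minGrams and maxGrams values below are arbitrary assumptions, not requirements):

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "sku": {
        "type": "autocomplete",
        "tokenization": "edgeGram",
        "minGrams": 2,
        "maxGrams": 15
      }
    }
  }
}

with the corresponding query:

[{
  $search: {
    index: 'test',
    autocomplete: {
      query: 'kw-fs',
      path: 'sku'
    }
  }
}]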
You're still getting 2 results because of the tokenization, i.e., you're still matching on [kw] in two documents. If you search for "fs66", you'll get a single match only. Results are scored based on relevance; they are not filtered. You can add {$project: {score: { $meta: "searchScore" }}} to your pipeline and see the difference in score between the matching documents.
If you are looking to get exact matches only, you can look at using the keyword analyzer or a custom analyzer that strips the dashes, so you deal with a single token per field and not 3.
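For example, a sketch of the index definition using the built-in lucene.keyword analyzer, which treats the whole field value as a single token (exact, case-sensitive, whole-string matches only):

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "sku": {
        "type": "string",
        "analyzer": "lucene.keyword",
        "searchAnalyzer": "lucene.keyword"
      }
    }
  }
}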
I have a trove of several million documents which I'm querying like this:
const query = {
  min_score: 1,
  query: {
    bool: {
      should: [
        {
          multi_match: {
            query: "David",
            fields: ["displayTitle^2", "synopsisList.text"],
            type: "phrase",
            slop: 2
          }
        },
        {
          nested: {
            path: "contributors",
            query: {
              multi_match: {
                query: "David",
                fields: [
                  "contributors.characterName",
                  "contributors.contributionBy.displayTitle"
                ],
                type: "phrase",
                slop: 2
              }
            },
            score_mode: "sum"
          }
        }
      ]
    }
  }
};
This query is giving sane looking results for a wide range of terms. However, it has a problem with "David" - and presumably others.
"David" crops up fairly regularly in the text. With the min_score option this query always returns 0 documents. When I remove min_score I get thousands of documents the best of which has a score of 22.749.
Does anyone know what I'm doing wrong? I guess min_score doesn't work the way I think it does.
Thanks
The problem I was trying to solve was that when I added some filter clauses to the above query, Elasticsearch would return all the documents that satisfied the filter, even those with a score of zero. That's how should works. I didn't realise that I can nest the should inside a must, which achieves the desired effect, as sketched below.
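A minimal sketch of that restructuring, with a hypothetical term filter standing in for the added filter clauses (someField/someValue are placeholders, and the second should clause from the query above is elided for brevity):

const query = {
  min_score: 1,
  query: {
    bool: {
      must: [
        {
          // the original should clauses move inside a must,
          // so min_score applies to their combined score
          bool: {
            should: [
              {
                multi_match: {
                  query: "David",
                  fields: ["displayTitle^2", "synopsisList.text"],
                  type: "phrase",
                  slop: 2
                }
              }
            ]
          }
        }
      ],
      // hypothetical added filter clause (placeholder field and value)
      filter: [{ term: { someField: "someValue" } }]
    }
  }
};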
I need a query that makes a partial match on a string and filters out documents that have a specific value for a field.
I tried this payload for ES:
payload = {
search_request: {
_source: [ 'name', 'source','pg_id' ],
query: {
match: { name: query_string }
bool: {
must_not: {
term: { "source.source": source_value }
}
}
},
size: 100
},
query_hint: query,
algorithm: algorithm,
field_mapping: { title: ["_source.name", "_source.source"]}
}
But ES throws this error:
{
  :error => {
    :root_cause => [
      {
        :type => "parse_exception",
        :reason => "failed to parse search source. expected field name but got [START_OBJECT]"
      }
    ],
    :type => "search_phase_execution_exception",
    :reason => "all shards failed",
    :phase => "query",
    :grouped => true,
    :failed_shards => [
      {
        :shard => 0,
        :index => "articles",
        :node => "3BUP3eN_TB2-zExigd_k2g",
        :reason => {
          :type => "parse_exception",
          :reason => "failed to parse search source. expected field name but got [START_OBJECT]"
        }
      }
    ]
  },
  :status => 400
}
I am using Elasticsearch 2.4
First of all, your JSON format is not valid - the query object has match and bool as sibling keys, and a query must contain exactly one top-level clause, which is what produces the START_OBJECT parse error. Check for missing commas and quotes.
Also, if you just need to filter documents out, filters are much faster than queries. Check the documentation.
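A minimal sketch of the corrected search_request body, moving the match inside the bool (the query string and source value are placeholders for the variables in your payload):

{
  "_source": ["name", "source", "pg_id"],
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "your query_string" } }
      ],
      "must_not": [
        { "term": { "source.source": "your source_value" } }
      ]
    }
  },
  "size": 100
}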
I am using Elasticsearch 5.1.1 in my environment. I have chosen the completion suggester on a field named post_hashtags with an array of strings to have suggestions on it. I am getting the response below for the prefix "inv".
Req:
POST hashtag/_search?pretty&filter_path=suggest.hash-suggest.options.text,suggest.hash-suggest.options._source
{
  "_source": ["post_hashtags"],
  "suggest": {
    "hash-suggest": {
      "prefix": "inv",
      "completion": {
        "field": "post_hashtags"
      }
    }
  }
}
Response:
{
  "suggest": {
    "hash-suggest": [
      {
        "options": [
          {
            "text": "invalid",
            "_source": {
              "post_hashtags": [
                "invalid"
              ]
            }
          },
          {
            "text": "invalid",
            "_source": {
              "post_hashtags": [
                "invalid",
                "coment_me",
                "daya"
              ]
            }
          }
        ]
      }
    ]
  }
}
Here "invalid" is returned twice because it is also a input string for same field "post_hashtags" in other document.
Problems is if same "invalid" input string present in 1000 documents in same index then i would get 1000 duplicated suggestions which is huge and not needed.
Can I apply an aggregation on a field of type completion ?
Is there any way I can get unique suggestion instead of duplicated text field, even though if i have same input string given to a particular field in multiple documents of same index ?
Elasticsearch 6.1 introduced the skip_duplicates option. Example usage:
{
  "suggest": {
    "autocomplete": {
      "prefix": "MySearchTerm",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    }
  }
}
Edit: This answer only applies to Elasticsearch 5
No, you cannot de-duplicate suggestion results. The completion suggester is document-oriented in Elasticsearch 5 and will thus return suggestions for all documents that match.
In Elasticsearch 1 and 2, the completion suggester automatically de-duplicated suggestions. There is an open GitHub ticket to bring back this functionality, and it looks like it may return in a future version.
For now, you have two options:
Use Elasticsearch version 1 or 2.
Use a different suggestion implementation not based on the completion suggester. The only semi-official suggestion I have seen so far involves putting your suggestion strings in a separate index.
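A minimal sketch of that separate-index workaround, with hypothetical index, type and field names, using each unique string as the document _id so duplicates collapse on write:

PUT suggestions
{
  "mappings": {
    "hashtag": {
      "properties": {
        "suggest": { "type": "completion" }
      }
    }
  }
}

PUT suggestions/hashtag/invalid
{
  "suggest": { "input": ["invalid"] }
}

Suggesting against this index then returns each string at most once, regardless of how many application documents contained it.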
I have indexed documents, each with a field: "CodeName" that has values like the following:
document 1 has CodeName: "AAA01"
document 2 has CodeName: "AAA02"
document 3 has CodeName: "AAA03"
document 4 has CodeName: "BBB02"
When I try to use a match query on the field:
"query": {
  "match": {
    "CodeName": "AAA"
  }
}
I expect to get results for "AAA01" and "AAA02", but instead, I am getting an empty array. When I pass in "AAA01" (I type in the whole thing), I get a result. How do I make it such that it matches more generically? I tried using "prefix" instead of "match" and am getting the same problem.
The mapping for "CodeName" is a "type": "string".
I expect to get results for "AAA01" and "AAA02"
This is not how Elasticsearch behaves. ES breaks your string into tokens using a tokenizer that you specify. If you didn't specify any tokenizer/analyzer, the default standard tokenizer splits words on spaces, hyphens, etc. In your case, the tokens are stored as "AAA01", "AAA02" and so on. There is no such term as "AAA", and hence you don't get any results back.
To fix this, you can use a match_phrase_prefix query or set the type of the match query to phrase_prefix. Try this code:
"query": {
"match_phrase_prefix": {
"CodeName": "AAA"
}
}
OR
"query": {
"match": {
"CodeName": {
"query": "AAA",
"type": "phrase_prefix"
}
}
}
Here is the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html. Also pay attention to the max_expansions parameter, as this query can be slow sometimes depending upon your data.
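For example, a sketch capping the number of prefix expansions (the value 10 is an arbitrary assumption; tune it for your data):

"query": {
  "match_phrase_prefix": {
    "CodeName": {
      "query": "AAA",
      "max_expansions": 10
    }
  }
}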
Note that for this technique, you should go with the default mapping. You don't need to use nGram.
As far as I know, first of all you should index your data using a tokenizer of type nGram.
You can check the details in the documentation.
COMMENT RELATED:
I'm familiar with the Symfony way of using Elasticsearch, and we are using it like this:
indexes:
    search:
        client: default
        settings:
            index:
                analysis:
                    analyzer:
                        custom_index_analyzer:
                            type: custom
                            tokenizer: nGram
                            filter: [lowercase, kstem]
                    tokenizer:
                        nGram:
                            type: nGram
                            min_gram: 2
                            max_gram: 20
        types:
            skill:
                mappings:
                    skill.name:
                        search_analyzer: custom_index_analyzer
                        index_analyzer: custom_index_analyzer
                        type: string
                        boost: 1