How to request a single document by _id via alias? - elasticsearch

Is it possible to request a single document by its id by querying an alias, provided that all keys across all indices in the alias are unique (it is an external guarantee)?

From Elasticsearch 5.1 the query looks like:
GET /my_alias_name/_search/
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "_id": "AUwNrOZsm6BwwrmnodbW"
        }
      }
    }
  }
}

Yes, querying an alias spanning multiple indices works the same way as querying a single index.
Just do this query over the alias:
POST my_alias_name/_search
{
  "filter": {
    "term": { "_id": "AUwNrOZsm6BwwrmnodbW" }
  }
}
EDIT: GET operations are not real searches and can't be done on aliases spanning multiple indices. So the following query is in fact not permitted:
GET my_alias_name/my_type/AUwNrOZsm6BwwrmnodbW

Following the Elasticsearch 8.2 docs, you can retrieve a single document by using the GET API:
GET my-index-000001/_doc/0
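Note that the document GET API works against an alias only when that alias resolves to a single index; for an alias spanning several indices you still need a search. A minimal sketch, reusing the alias name from the question with the ids query available in recent versions:
GET /my_alias_name/_search
{
  "query": {
    "ids": {
      "values": ["AUwNrOZsm6BwwrmnodbW"]
    }
  }
}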

The 7.2 version docs suggest:
GET /_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}
Response should look like:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "indexwhatever",
        "_type": "_doc",
        "_id": "anyID",
        "_score": 1.0,
        "_source": {
          "field1": "value1",
          "field2": "value2"
        }
      }
    ]
  }
}

In case you want to find a document by an internal id field with curl:
curl -X GET 'localhost:9200/_search?q=id:42&pretty'
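Note that q=id:42 matches a field named id in the documents, not the built-in _id. To look the built-in _id up through the alias with curl, a sketch using the ids query (the alias name and id value are placeholders):
curl -X GET 'localhost:9200/my_alias_name/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "ids": { "values": ["42"] }
  }
}'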

Related

Elasticsearch query with fuzziness AUTO not working as expected

From the Elasticsearch documentation regarding fuzziness:
AUTO
Generates an edit distance based on the length of the term. Low and high distance arguments may be optionally provided AUTO:[low],[high]. If not specified, the default values are 3 and 6, equivalent to AUTO:3,6 that make for lengths:
0..2: Must match exactly
3..5: One edit allowed
>5: Two edits allowed
However, when I am trying to specify low and high distance arguments in the search query the result is not what I am expecting.
I am using Elasticsearch 6.6.0 with the following index mapping:
{
  "fuzzy_test": {
    "mappings": {
      "_doc": {
        "properties": {
          "description": {
            "type": "text"
          },
          "id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Inserting a simple document:
{
  "id": "1",
  "description": "hello world"
}
And the following search query:
{
  "size": 10,
  "timeout": "30s",
  "query": {
    "match": {
      "description": {
        "query": "helqo",
        "fuzziness": "AUTO:7,10"
      }
    }
  }
}
I assumed that fuzziness:AUTO:7,10 would mean that for an input term of length <= 6 only documents with an exact match would be returned. However, here is the result of my query:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.23014566,
    "hits": [
      {
        "_index": "fuzzy_test",
        "_type": "_doc",
        "_id": "OQtUu2oBABnEwrgM3Ejr",
        "_score": 0.23014566,
        "_source": {
          "id": "1",
          "description": "hello world"
        }
      }
    ]
  }
}
This is strange, but it seems that the bug exists only in Elasticsearch 6.6.0. I've tried 6.4.2 and 6.6.2 and both of them work just fine.

Fuzziness behavior on a match_phrase query

Days ago I ran into this "problem". I was running a match_phrase query against my index. Everything was as expected until I did the same search with multi-word nouns (before, I was using single-word nouns, e.g. university). With one misspelling the search did not find the document; if I removed a word (say, the one that was spelled correctly), the search found it.
Here is the example I made:
The settings
PUT index1
{
  "mappings": {
    "myType": {
      "properties": {
        "field1": {
          "type": "string",
          "analyzer": "standard"
        }
      }
    }
  }
}
POST index1/myType/1
{
  "field1": "Commercial Banks"
}
Case 1: Single noun search
GET index1/myType/_search
{
  "query": {
    "match": {
      "field1": {
        "type": "phrase",
        "query": "comersial",
        "fuzziness": "AUTO"
      }
    }
  }
}
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19178303,
    "hits": [
      {
        "_index": "index1",
        "_type": "myType",
        "_id": "1",
        "_score": 0.19178303,
        "_source": {
          "field1": "Commercial Banks"
        }
      }
    ]
  }
}
Case 2: Multiple noun search
GET index1/myType/_search
{
  "query": {
    "match": {
      "field1": {
        "type": "phrase",
        "query": "comersial banks",
        "fuzziness": "AUTO"
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
So, in the second case, why am I not finding the document when performing the match_phrase query? Is there something I am missing?
Those results make me doubt what I know.
Am I using fuzzy search incorrectly? I'm not sure whether this is a bug or whether I'm the one who doesn't understand the behavior.
Many thanks in advance for reading my question. I hope you can help me with this.
Fuzziness is not supported in phrase queries.
Currently, ES is silent about it, i.e. it allows you to specify the parameter but doesn't warn you that it is not supported. A pull request (#18322) (related to issue #7764) exists that will remedy this problem. Once merged into ES 5, this query will error out.
In the breaking changes document for 5.0, we can see that this won't be supported:
The multi_match query will fail if fuzziness is used for cross_fields, phrase or phrase_prefix type. This parameter was undocumented and silently ignored before for these types of multi_match.
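As a workaround (not part of the answer above, so treat it as a sketch): if typo tolerance matters more than word order, a plain match query with operator and plus fuzziness will find the misspelled multi-word input, at the cost of losing the phrase/proximity guarantee:
GET index1/myType/_search
{
  "query": {
    "match": {
      "field1": {
        "query": "comersial banks",
        "operator": "and",
        "fuzziness": "AUTO"
      }
    }
  }
}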

Retrieving top terms query in Elasticsearch

I am using Elasticsearch 1.1.0 and trying to retrieve the top 10 terms in a field called text
I've tried the following, but it instead returned all of the documents:
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}
EDIT
The following is an example of the result that is returned:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2747,
    "max_score": 1,
    "hits": [
      {
        "_index": "index_name",
        "_type": "type_name",
        "_id": "621637640908050432",
        "_score": 1,
        "_source": {
          "metadata": {
            "result_type": "recent",
            "iso_language_code": "en"
          },
          "in_reply_to_status_id_str": null,
          "in_reply_to_status_id": null,
          "created_at": "Thu Jul 16 11:08:57 +0000 2015",
          ...
What am I doing wrong?
Thanks.
First of all, don't use facets. They are deprecated. Even though you use an old version of Elasticsearch, switch to aggregations. Quoting the documentation:
Faceted search refers to a way to explore large amounts of data by displaying summaries about various partitions of the data and later allowing to narrow the navigation to a specific partition. In Elasticsearch, facets are also the name of a feature that allowed to compute these summaries. Facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
Use this query instead:
POST /your_index/your_type/_search?search_type=count
{
  "aggs": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}
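For reference, search_type=count was removed in later Elasticsearch versions; the equivalent is setting "size": 0, which suppresses the hits and returns only the aggregation. A sketch with the same index and field names, assuming the field is aggregatable:
POST /your_index/_search
{
  "size": 0,
  "aggs": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}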
This will work fine. Try this:
GET /index_name/type_name/_search?search_type=count
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "text": {
      "terms": {
        "field": "text",
        "size": 10
      }
    }
  }
}

How to filter out elements from an array that don't match the query?

Stack Overflow won't let me write that much example code, so I put it on a gist.
So I have this index
with this mapping
here is a sample document I insert into the newly created mapping
this is my query
GET products/paramSuggestions/_search
{
  "size": 10,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "paramName": {
            "query": "col",
            "operator": "and"
          }
        }
      }
    }
  }
}
This is the unwanted result I get from the previous query:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.33217794,
    "hits": [
      {
        "_index": "products",
        "_type": "paramSuggestions",
        "_id": "1",
        "_score": 0.33217794,
        "_source": {
          "productName": "iphone 6",
          "params": [
            {
              "paramName": "color",
              "value": "white"
            },
            {
              "paramName": "capacity",
              "value": "32GB"
            }
          ]
        }
      }
    ]
  }
}
And finally the wanted result, i.e. how I want the query result to look:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.33217794,
    "hits": [
      {
        "_index": "products",
        "_type": "paramSuggestions",
        "_id": "1",
        "_score": 0.33217794,
        "_source": {
          "productName": "iphone 6",
          "params": [
            {
              "paramName": "color",
              "value": "white"
            }
          ]
        }
      }
    ]
  }
}
What should the query look like to achieve the wanted result, with the array field filtered down to the items that match the query? In other words, all other non-matching array items should not appear in the final result.
The final result is the _source document that you indexed. There is no feature that lets you mask field elements of your document out of the Elasticsearch response.
That said, depending on your goal, you can look into how Highlighters and Suggesters identify result terms matching the query, or possibly, roll-your-own client-side masking using info returned from setting "explain": true in your query.
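For example, a sketch of the client-side route: rerun the query from the question with "explain": true, then inspect each hit's _explanation to see which params entries produced the matching terms and drop the rest in your application code:
GET products/paramSuggestions/_search
{
  "size": 10,
  "explain": true,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "paramName": {
            "query": "col",
            "operator": "and"
          }
        }
      }
    }
  }
}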

Strip HTML tags before indexing into elasticsearch

I am working on indexing HTML files into Elasticsearch using Node.js. However, even before using Node.js, I tried to run the following manual indexing, which doesn't seem to work. What am I missing?
Index sample HTML tag using html_strip filter:
curl -XPOST 'localhost:9200/bhs/articles/_analyzer?tokenizer=standard&char_filters=html_strip' -d '
{
  "content": "<title>Dilip Kumar</title>"
}'
Search to get all documents:
http://localhost:9200/bhs/articles/_search
It gives the following result:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "bhs",
        "_type": "articles",
        "_id": "AUt2TGl9aadd5iLJ3mue",
        "_score": 1,
        "_source": {
          "content": "<title>Dilip Kumar</title>"
        }
      }
    ]
  }
}
Ideally it should not index the tags, as I have used the html_strip filter to strip them.
What you are seeing in the returned search result is the stored content, i.e. this is not the individual terms that have been indexed.
"content": "<title>Dilip Kumar</title>"
To see what has been indexed is a little more challenging - the indexed terms are not designed to be returned to a user and are instead only used when searching.
However you can access and view them by using a script:
curl 'http://localhost:9200/bhs/articles/_search?pretty=true' -d '{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "terms": {
      "script": "doc[field].values",
      "params": {
        "field": "content"
      }
    }
  }
}'
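To actually strip the tags at index time, the html_strip char filter has to be part of an analyzer defined in the index settings and assigned to the field. A sketch in the 1.x syntax used here (the index name bhs_stripped and analyzer name html_analyzer are made up); note that the stored _source will still contain the raw HTML, only the indexed terms are stripped:
curl -XPUT 'localhost:9200/bhs_stripped' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["html_strip"]
        }
      }
    }
  },
  "mappings": {
    "articles": {
      "properties": {
        "content": {
          "type": "string",
          "analyzer": "html_analyzer"
        }
      }
    }
  }
}'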
