I am trying to fetch data from Elasticsearch by matching on the name field. I have the following two records:
{
"_index": "sam_index",
"_type": "doc",
"_id": "key",
"_version": 1,
"_score": 2,
"_source": {
"name": "Sample Name"
}
}
and
{
"_index": "sam_index",
"_type": "doc",
"_id": "key1",
"_version": 1,
"_score": 2,
"_source": {
"name": "Sample Name"
}
}
When I search using texts like sam, sample, Sa, etc., I am able to fetch both records using a match_phrase_prefix query. The query I tried with match_phrase_prefix is:
GET sam_index/doc/_search
{
"query": {
"match_phrase_prefix" : {
"name": "sample"
}
}
}
I am not able to fetch the records when I search with the string samplen. I need the search to return results irrespective of spaces between the texts. How can I achieve this in Elasticsearch?
First, you need to understand how Elasticsearch works internally and why it returns results for some searches but not for others.
ES works on token matching: documents you index go through an analysis process, and the tokens generated by that process are stored in an inverted index, which is used for searching.
When you run a query, the query also generates search tokens. These are either used as-is from the search string (in the case of a term query) or produced by the analyzer defined on the search field (in the case of a match query). Hence it's very important to understand the internals of your search query.
It's equally important to understand the mapping of your index; ES uses the standard analyzer by default on text fields.
You can use the Explain API to understand the internals of a query: which search tokens it generates, how documents matched them, and how the score is calculated.
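For example, to see why the first document from the question matched, you can run a sketch like the following (index, type, and id taken from the question; the endpoint shape is the pre-7.x one with mapping types):
GET sam_index/doc/key/_explain
{
"query": {
"match_phrase_prefix": {
"name": "sample"
}
}
}
The response describes the matching tokens and the score computation for that single document.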
In your case, I created the name field as text with the word-join analyzer explained in Ignore spaces in Elasticsearch, and I was able to get the document containing Sample Name when searching for samplen.
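A minimal sketch of such an index, assuming a recent ES version without mapping types; the analyzer and char filter names (word_join_analyzer, strip_spaces) are illustrative:
PUT sam_index
{
"settings": {
"analysis": {
"char_filter": {
"strip_spaces": {
"type": "pattern_replace",
"pattern": "\\s+",
"replacement": ""
}
},
"analyzer": {
"word_join_analyzer": {
"type": "custom",
"char_filter": ["strip_spaces"],
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "word_join_analyzer"
}
}
}
}
With this mapping, Sample Name is indexed as the single token samplename, so a match_phrase_prefix search for samplen matches it.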
Let us know if you also want to achieve the same and if it solves your issue.
In my Elasticsearch index I have documents that contain a field "fieldname" with values "abc" and "abc-def". When I use a match_phrase query to search for documents with fieldname "abc", it returns the documents with value "abc-def" as well. However, when I query for "abc-def", it works fine. My query is as follows:
GET my_index/_search
{
"query" : {
"match_phrase" : {"fieldname" : "abc"}
}
}
Can someone please help me in understanding the issue?
The match_phrase query analyzes the search term based on the analyzer defined for the field (if no analyzer is specified, the standard analyzer is used by default).
A match phrase query matches those documents whose field contains all the terms produced from the search term, and the terms must appear in the same order.
In your case, "abc-def" gets tokenized into "abc" and "def" (because of the standard analyzer). When you use a match phrase query for "abc-def", it searches for all documents that have both abc and def in that order, so you get only one document in the result.
When searching for "abc", it searches for documents that have abc in the fieldname field; since both documents contain abc, both are returned in the result.
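You can verify how the standard analyzer tokenizes the value using the _analyze API:
POST _analyze
{
"analyzer": "standard",
"text": "abc-def"
}
This returns the two tokens abc and def, which is why a phrase search for "abc" alone matches both documents.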
If you want to return only exact matching documents in the result, then you need to change the way the terms are analyzed.
If you have not explicitly defined any mapping, you can add .keyword to the fieldname field. This queries the keyword sub-field that dynamic mapping creates alongside every text field; it is not analyzed, so the whole value is stored as a single term and only exact values match (notice the ".keyword" suffix after the field name in the query below).
Here is a working example with index data, search query, and search result.
Index data:
{
"name":"abc-def"
}
{
"name":"abc"
}
Search Query:
{
"query": {
"match_phrase": {
"name.keyword": "abc"
}
}
}
Search Result:
"hits": [
{
"_index": "67394740",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"name": "abc"
}
}
]
I'm using Elasticsearch as the search engine for a human resources database.
The user submits a competence (e.g. 'disruption'), and Elasticsearch returns all users ordered by best match.
I have configured the field 'competences' to use synonyms, so 'innovation' would match 'disruption'.
I want to show the user (who is performing the search) how a particular search result matched the search query. For this I use the Explain API.
The query works as expected and returns an _explanation to each hit.
Details (simplified a bit) for a particular hit could look like the following:
{
"description": "weight(Synonym(skills:innovation skills:disruption))",
"value": 3.0988
}
Problem: I cannot see what the original search term was from the _explanation. (As in the example above: I can see that some search query matched 'innovation' or 'disruption', but I need to know which skill the user actually searched for.)
Question: Is there any way to solve this (for example, by passing a custom 'description' with info about the search query through to the _explanation)?
Expected Result:
{
"description": "weight(Synonym(skills:innovation skills:disruption))",
"value": 3.0988,
"customDescription": "innovation"
}
Maybe you can put the original query in the _name field?
As explained in https://qbox.io/blog/elasticsearch-named-queries:
GET /_search
{
"query": {
"query_string" : {
"default_field" : "skills",
"query" : "disruption",
"_name": "disruption"
}
}
}
You can then find the original query in the matched_queries section of the returned object:
{
"_index": "testindex",
"_type": "employee",
"_id": "2",
"_score": 0.19178301,
"_source": {
"skills": "disruption"
},
"matched_queries": [
"disruption"
]
}
Add explain to this and I think it would work fine.
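A sketch of the combined request (explain and _name are both standard parameters; the skills field is taken from the example above):
GET /_search
{
"explain": true,
"query": {
"query_string": {
"default_field": "skills",
"query": "disruption",
"_name": "disruption"
}
}
}
Each hit then carries both an _explanation and a matched_queries list, so the original search term can be read from matched_queries next to the synonym-expanded explanation.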
What's the difference between
{
"query": {
"filtered": {
"filter": { "term": { "folder": "inbox" } }
}
}
}
and
{
"query": {
"term": { "folder": "inbox" }
}
}
It seems they both filter the index on the folder field by the inbox value.
A query can run in two kinds of context in Elasticsearch: query context and filter context. Query context tells how well a document matches the query, i.e. it calculates a score, whereas filter context only tells whether a document matches the query; no scoring is done.
A query in query context tells you which documents match the query better: the higher the score, the more relevant the document.
A query in filter context behaves like a conditional operator: true if the document matches the query and false if it doesn't.
To answer your question: both queries will match the same set of documents, but the first query will not calculate scores (it will be faster because score calculation is skipped), whereas the second one will calculate scores and be comparatively slower. So if you just want to filter, it is better to tell Elasticsearch that scores need not be calculated by putting the query in filter context. This saves the computational cost of scoring, which would be pure overhead when only filtering is required; that is why the two contexts exist.
Sample output for 1st query (filter context):
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0, <-------- no scoring done
}
Sample output for 2nd query (query context):
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9808292 <-------- score calculated
}
So use query context to get relevant matches and filter context to filter out documents. You can use the combination of both as well.
You can read more on query and filter context here.
I agree with the answer above, but there is one thing to add: query and filter contexts can be used together in a single query to reduce time.
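A sketch of such a combined query in current Elasticsearch versions, where the filtered query from the question has been replaced by bool (the subject field here is illustrative):
GET /_search
{
"query": {
"bool": {
"must": { "match": { "subject": "meeting" } },
"filter": { "term": { "folder": "inbox" } }
}
}
}
The must clause runs in query context and contributes to the score, while the filter clause runs in filter context and only narrows down the candidate documents.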
I'd appreciate any help with this, I'm really stuck.
I am trying to create a simple visualization in Kibana, a line graph based on a number value in my data (origin_file_size_bytes). When I try to add a Visualization graph, I get this error:
No Compatible Fields: The "test*" index pattern does not contain any of the following field types: number or date
My actual index does contain a field with number, as does my data.
Thank you for any help!
Andrew
Here's a sample entry from the Discover Menu:
{
"_index": "lambda-index",
"_type": "lambda-type",
"_id": "LC08_L1TP_166077.TIF",
"_version": 1,
"_score": 2,
"_source": {
"metadata_processed": {
"BOOL": true
},
"origin_file_name": {
"S": "LC08_L1TP_166077.TIF"
},
"origin_file_size_bytes": {
"N": "61667800"
}
}
}
My index pattern classifies it as a string, even though it isn't:
origin_file_size_bytes.N string
You cannot aggregate on a string field. As seen from the field listing above, your field has been indexed as a string and NOT as a number. Elasticsearch dynamically determines the mapping type of a field if it is not explicitly defined. Since you ingested the field as a string ("N": "61667800" is a quoted string value), ES correctly determined that the field is of type string.
For example, if you index a document with two fields as shown below, without an explicit mapping, ES creates the message field as type string and the size field as a number (long):
POST my_index/_doc/1
{
"message": "100",
"size": 100
}
Index your field into ES as a number instead, and you should be able to aggregate on it.
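For instance, a sketch of re-ingesting the document with the size as an unquoted number (index, type, and id taken from the question; note that an existing field's mapping cannot be changed in place, so you would recreate the index or reindex into a new one, then refresh the index pattern in Kibana):
POST lambda-index/lambda-type/LC08_L1TP_166077.TIF
{
"metadata_processed": { "BOOL": true },
"origin_file_name": { "S": "LC08_L1TP_166077.TIF" },
"origin_file_size_bytes": 61667800
}
Dynamic mapping will then pick a numeric type (long) for origin_file_size_bytes, and the field becomes available for number-based visualizations.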
I am running ES 2.3.3. I want to index a non-analyzed string but truncate it to a certain number of characters. The ignore_above property, according to the documentation, will NOT index a field above the provided length; I don't want that. I want to take a field that could potentially be 30K characters long and truncate it to 10K, but still be able to filter and sort on the 10K that is retained.
Is this possible in ES 2.3.3, or do I need to do this in Java prior to indexing a document?
I want to index a non-analyzed String but truncate it to a certain number of characters.
Technically it's possible with the Update API and the upsert option but, depending on your exact needs, it may not be very handy.
Let's say you want to index this document:
{
"name": "foofoofoofoo",
"age": 29
}
but you need to truncate the name field so that it has only 5 characters. Using the Update API, you'd have to execute a script:
POST http://localhost:9200/insert/test/1/_update
{
"script" : "ctx._source.name = ctx._source.name.substring(0,5);",
"scripted_upsert": true,
"upsert" : {
"name": "foofoofoofoo",
"age": 29
}
}
This means that if ES does not find a document with the given id (here id=1), it indexes the document inside the upsert element and then runs the given script. As you can see, it's rather inconvenient if you want automatically generated ids, as you have to provide the id in the URI.
Result:
GET http://localhost:9200/insert/test/1
{
"_index": "insert",
"_type": "test",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"name": "foofo",
"age": 29
}
}