Elastic Search Term Query Not Matching URL's - elasticsearch

I am a beginner with Elastic search and I am working on a POC from last week.
I am having a URL field as a part of my document which contains URL's in the following format :"http://www.example.com/foo/navestelre-04-cop".
I can not define mapping to my whole object as every object has different keys except the URL.
Here is how I am creating my Index :
POST
{
"settings" : {
"number_of_shards" : 5,
"mappings" : {
"properties" : {
"url" : { "type" : "string","index":"not_analyzed" }
}
}
}
}
I am keeping my URL field as not_analyzed as I have learned from some resource that marking a field as not_analyzed will prevent it from tokenization and thus I can look for an exact match for that field in a term query.
I have also tried using the whitespace analyzer as the URL value thus not have any of the white space character. But again I am unable to get a successful Hit.
Below is my term query :
{
"query":{
"constant_score": {
"filter": {
"term": {
"url":"http://www.example.com/foo/navestelre-04-cop"
}
}
}
}
}
I am guessing the problem is somewhere with the Analyzers and Tokenizers but I am unable to get to a solution. Any kind of help would be great to enhance my knowledge and would help me reach to a solution.
Thanks in Advance.

You have the right idea, but it looks like some small mistakes in your settings request are leading you astray. Here is the final index request:
POST /test
{
"settings": {
"number_of_shards" : 5
},
"mappings": {
"url_test": {
"properties": {
"url": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Notice the added url_test type in the mapping. This lets ES know that your mapping applies to this document type. Also, settings and mappings are also different keys of the root object, so they have to be separated. Because your initial settings request was malformed, ES just ignored it, and used the standard analyzer on your document, which led to you not being able to query it with your query. I point you to the ES Mapping docs
We can index two documents to test with:
POST /test/url_test/1
{
"url":"http://www.example.com/foo/navestelre-04-cop"
}
POST /test/url_test/2
{
"url":"http://stackoverflow.com/questions/37326126/elastic-search-term-query-not-matching-urls"
}
And then execute your unmodified search query:
GET /test/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"url": "http://www.example.com/foo/navestelre-04-cop"
}
}
}
}
}
Yields this result:
"hits": [
{
"_index": "test",
"_type": "url_test",
"_id": "1",
"_score": 1,
"_source": {
"url": "http://www.example.com/foo/navestelre-04-cop"
}
}
]

Related

Elasticsearch term query issue

As the pictures show the record field of dispatchvoucher value is "True".
But when I searched with the term it couldn´t found any record.
when I changed the value to "true", the result matched. What's the reason for this?
As mentioned in the documentation :
Avoid using the term query for text fields.
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
To search text field values, use the match query instead.
The standard analyzer is the default analyzer which is used if none is specified. It provides grammar-based tokenization.
GET /_analyze
{
"analyzer" : "standard",
"text" : "True"
}
The token generated is -
{
"tokens": [
{
"token": "true",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
}
]
}
Term query returns documents that contain an exact term in a provided field. Since True gets tokenized to true, so when you are using the term query for "dispatchvoucher": "True", it will not show any results.
You can either change your index mapping to
{
"mappings": {
"properties": {
"dispatchvoucher": {
"type": "keyword"
}
}
}
}
OR You need to add .keyword to the dispatchvoucher field. This uses the keyword analyzer instead of the standard analyzer (notice the ".keyword" after dispatchvoucher field).
Adding a working example with index data, search query, and search result
Index Data:
{
"dispatchvoucher": "True"
}
Search Query:
{
"query": {
"bool": {
"filter": {
"term": {
"dispatchvoucher.keyword": "True"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "65605120",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"dispatchvoucher": "True"
}
}
]

Only getting results when elasticsearch is case sensitive

I currently have a problem with my multi match search in ES.
Its simple like that: If I'm searching for the City "Sachsen", I'm getting results.
If I'm searching for "sachsen" (lowercase), I'm getting no results.
how to avoid this?
QUERY with no results
{
"match" : {
"City" : {
"query" : "sachsen"
}
}
My analyzer is analyzer_keyword. Should I have anything add ?
MAPPING
City: {
type: "string",
analyzer: "analyzer_keyword"
}
Your analyzer_keyword analyzer is most probably of type keyword which means you can only perform exact matches on it.
It's standard practice to apply multiple "variants" of a field, one of which is going to match lowercase, possibly ascii-tokenized characters (think München -> munchen) and one which will not be tokenized in any way (this is what you have in your analyzer_keyword).
Since you intend to search the lowercase version of Sachsen, your mapping could look something like
PUT sachsen
{
"mappings": {
"properties": {
"City": {
"type": "keyword", <----
"fields": {
"standard": {
"type": "text",
"analyzer": "standard" <----
}
}
}
}
}
}
After indexing a doc
POST sachsen/_doc
{
"City": "Sachsen"
}
The following will work for exact matches:
GET sachsen/_search
{
"query": {
"match": {
"City": "Sachsen"
}
}
}
and this for lowercase
GET sachsen/_search
{
"query": {
"match": {
"City.standard": "sachsen"
}
}
}
Note that I'm using the default, standard analyzer here but you can choose any one you deem appropriate.

SuggestionBuilder with BoolQueryBuilder in Elasticsearch

I am currently using BoolQueryBuilder to build a text search. I am having an issue with wrong spellings. When someone searches for a "chiar" instead of "chair" I have to show them some suggestions.
I have gone through the documentation and observed that the SuggestionBuilder is useful to get the suggestions.
Can I send all the requests in a single query, so that I can show the suggestions if the result is zero?
No need to send different search terms ie chair, chiar to get suggestions, it's not efficient and performant and you don't know all the combinations which user might misspell.
Instead, Use the fuzzy query or fuzziness param in the match query itself, which can be used in the bool query.
Let me show you an example, using the match query with the fuzziness parameter.
index def
{
"mappings": {
"properties": {
"product": {
"type": "text"
}
}
}
}
Index sample doc
{
"product" : "chair"
}
Search query with wrong term chiar
{
"query": {
"match" : {
"product" : {
"query" : "chiar",
"fuzziness" : "4" --> control it according to your application
}
}
}
}
Search result
"hits": [
{
"_index": "so_fuzzy",
"_type": "_doc",
"_id": "1",
"_score": 0.23014566,
"_source": {
"product": "chair"
}
}

how to properly use wildcard on a elastic search find?

I just jumped into ES and I dont have a lot of experience on this, so might be something I am missing on this.
I found this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html that basically explains how to do a wildcard search.
I am trying to look for all messages inside my document that have certain patter.
So, using Kibana Sense (Elastic search query UI)I did this:
GET _search
{
"query": {
"wildcard" : {
"model.message": "my*"
}
}
}
with this I am trying to obtain all the messages that start with "my"
But I get no results...
Here is a copy of my document structure (or at least the first lines...)
"_index": "my_index",
"_type": "my_type",
"_id": "123456",
"_source": {
"model": {
"id": "123456",
"message": "my message",
Any idea what could be wrong?
Your sample document actually contains the model.content.message field but not the model.message field, so the following query should work:
GET _search
{
"query": {
"wildcard" : {
"model.content.message": "my*"
}
}
}
Can you share your mapping? It looks like you need to use a nested query:
GET /_search
{
"query": {
"nested" : {
"path" : "model",
"score_mode" : "avg",
"query" : {
"wildcard" : {
"model.message": "my*"
}
}
}
}
}
You can read more about nested queries here.

how to get total tokens count in documents in elasticsearch

I am trying to get the total number of tokens in documents that match a query. I haven't defined any custom mapping and the field for which I want to get the token count is of type 'string'.
I tried the following query, but it gives a very large number in the order of 10^20, which is not the correct answer for my dataset.
curl -XPOST 'localhost:9200/nodename/comment/_search?pretty' -d '
{
"query": {
"match_all": {}
},
"aggs": {
"tk_count": {
"sum": {
"script": "_index[\"body\"].sumttf()"
}
}
},
"size": 0
}
Any idea how to get the correct count of all tokens? ( I do not need counts for each term, but the total count).
This worked for me, is it what you need?
Rather than getting token count on query (using tk_count aggregation, as suggested in the other answer), my solution stores the token count on indexing using the token_count datatype., so that I could get "name.stored_length" values returned in query results.
token_count is a "multi-field" it works on one-field-at-a-time (i.e. the "name" field or the "body" field). I modified the example slightly to store the "name.stored_length"
Notice in my example it does not count cardinality of tokens (i.e. distinct values), it counts total tokens; "John John Doe" has 3 tokens in it; "name.stored_length"===3; (even though its count distinct tokens is only 2). Notice I ask for specific "stored_fields" : ["name.stored_length"]
Finally, you may need to re-update your documents (i.e. send a PUT), or any technique to get the values you want! In this case I PUT "John John Doe", even if it was already POST/PUT in elasticsearch; the tokens were not counted until a PUT again, after adding tokens to the mapping.!)
PUT test_token_count
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"stored_length": {
"type": "token_count",
"analyzer": "standard",
//------------------v
"store": true
}
}
}
}
}
}
}
PUT test_token_count/_doc/1
{
"name": "John John Doe"
}
Now we can query, or search for results, and configure results to include the name.stored_length field (which is both a multi-field and a stored field!):
GET/POST test_token_count/_search
{
//------------------v
"stored_fields" : ["name.stored_length"]
}
And results to the search should include the total token count as named.stored_length...
{
...
"hits": {
...
"hits": [
{
"_index": "test_token_count",
"_type": "_doc",
"_id": "1",
"_score": 1,
"fields": {
//------------------v
"name.stored_length": [
3
]
}
}
]
}
}
Seems like you want to retrieve cardinality of total tokens in body field.
In such case you can just use cardinality aggregation like below.
curl -XPOST 'localhost:9200/nodename/comment/_search?pretty' -d '
{
"query": {
"match_all": {}
},
"aggs": {
"tk_count": {
"cardinality" : {
"field" : "body"
}
}
},
"size": 0
}
For detailed information, see this official document

Resources