Elasticsearch: "like" query not working when the string contains & (ampersand)?

I have a query that is not working; I think it is because the searched value contains an & (ampersand) symbol. I am talking about the following query:
GET "path"/_search
{
  "from": 0,
  "size": 10000,
  "query": {
    "query_string": {
      "query": "(Department_3037.Department_3037_analyzed:*P&C*)"
    }
  }
}
Why is this query not working, and how can I overcome this issue? I need a "like" query for strings containing &, such as P&C, L&T, etc.
Let me know how this can be fixed.
Thank you.

The field you are searching in is analyzed. If the field contains the text (Department_3037.Department_3037_analyzed:*P&C*), it will be tokenized as: Department_3037, Department_3037_analyzed, P, C.
You can check this using:
curl -XGET "http://localhost:9200/_analyze?tokenizer=standard" -d "(Department_3037.Department_3037_analyzed:*P&C*)"
You will get the tokens as follows:
{
  "tokens": [
    {"token": "Department_3037", "start_offset": 1, "end_offset": 16, "type": "<ALPHANUM>", "position": 1},
    {"token": "Department_3037_analyzed", "start_offset": 17, "end_offset": 41, "type": "<ALPHANUM>", "position": 2},
    {"token": "P", "start_offset": 43, "end_offset": 44, "type": "<ALPHANUM>", "position": 3},
    {"token": "C", "start_offset": 45, "end_offset": 46, "type": "<ALPHANUM>", "position": 4}
  ]
}
If you want to retrieve the documents, you will have to escape the special characters:
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "\\(Department_3037\\.Department_3037_analyzed\\:\\*P\\&C\\*\\)"
    }
  }
}
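To apply this escaping programmatically, here is a minimal Python sketch; the helper name and the character list (taken from the reserved characters documented for query_string) are my own, not part of any client library:

```python
# Reserved query_string characters, per the Elasticsearch docs
# (&& and || are two-character operators; escaping each & or | is harmless).
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')

def escape_query_string(text):
    """Backslash-escape reserved characters so `text` is matched
    literally instead of being parsed as query_string syntax."""
    return ''.join('\\' + ch if ch in RESERVED else ch for ch in text)

print(escape_query_string('P&C'))      # P\&C
print(escape_query_string('(a:*b*)'))  # \(a\:\*b\*\)
```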
Hope it helps.

Related

Elasticsearch email search mismatch with match query using "com"

GET candidates1/candidate/_search
{
  "fields": ["contactInfo.emails.main"],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contactInfo.emails.main": "com"
          }
        }
      ]
    }
  }
}
GET candidates1/candidate/_search
{
  "size": 5,
  "fields": ["contactInfo.emails.main"],
  "query": {
    "match": {
      "contactInfo.emails.main": "com"
    }
  }
}
Hi,
When I use the above query I get results like ['nraheem@dbtech1.com', 'arelysf456@gmai1.com', 'ron@rgb52.com'], but I do not get emails like ['pavann.aryasomayajulu@gmail.com', 'kumar@gmail.com', 'raj@yahoo.com'].
But when I use the query to match "gmail.com", I do get the results that have gmail.com.
So my question is: when I use "com" in the first query, I expect results that include gmail.com, since "com" is present in gmail.com. But that is not happening.
Note: we have almost 2 million email ids and most of them are gmail.com, yahoo.com or hotmail; only a few are of other types.
"contactInfo.emails.main" seems to be an analyzed field.
In Elasticsearch, all string fields are analyzed by the Standard Analyzer by default and converted into tokens. You can see how your text gets analyzed using the _analyze API. The email ids you mention that end in a number before "com" are analyzed as nraheem, dbtech1, com. Use the following query to see the tokens:
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer": "standard",
  "text": "nraheem@dbtech1.com"
}'
As you can see, a separate term com is created. Whereas if you analyze kumar@gmail.com, you will get tokens like kumar, gmail.com; no separate token com is created in this case.
This is because the Standard Analyzer splits terms on special characters like @, and also breaks at a dot when it follows a digit (which is why dbtech1.com splits but gmail.com does not). You can create a custom analyzer to meet your requirement.
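To see why the two kinds of addresses tokenize differently, here is a rough Python approximation of the two tokenizer rules at play; it is a teaching sketch (the helper is hypothetical), not the real UAX#29 algorithm the Standard tokenizer implements:

```python
import re

def approx_standard_tokens(text):
    """Rough approximation of two Standard-tokenizer rules that matter
    for email addresses: '@' always splits, and a '.' splits when it
    follows a digit (but not between two letters, so 'gmail.com' stays
    whole)."""
    tokens = []
    for part in text.split('@'):
        # break at a dot that immediately follows a digit
        tokens.extend(re.split(r'(?<=\d)\.', part))
    return tokens

print(approx_standard_tokens("nraheem@dbtech1.com"))  # ['nraheem', 'dbtech1', 'com']
print(approx_standard_tokens("kumar@gmail.com"))      # ['kumar', 'gmail.com']
```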
Hope this helps!!

Scope Elasticsearch Results to Specific Ids

I have a question about the Elasticsearch DSL.
I would like to do a full text search, but scope the searchable records to a specific array of database ids.
In SQL world, it would be the functional equivalent of WHERE id IN(1, 2, 3, 4).
I've been researching, but I find the Elasticsearch query DSL documentation a little cryptic and devoid of useful examples. Can anyone point me in the right direction?
Here is an example query which might work for you. This assumes that the _all field is enabled on your index (which is the default). It will do a full text search across all the fields in your index. Additionally, with the added ids filter, the query will exclude any document whose id is not in the given array.
{
  "bool": {
    "must": {
      "match": {
        "_all": "your search text"
      }
    },
    "filter": {
      "ids": {
        "values": ["1", "2", "3", "4"]
      }
    }
  }
}
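If you build the search body in application code, the same query can be assembled as a plain dictionary; this is a minimal sketch (the helper name is mine), mirroring the SQL WHERE id IN (...) analogy:

```python
def ids_scoped_search(text, ids):
    """Full-text match on _all, scoped to the given document ids --
    the analogue of SQL's WHERE id IN (...)."""
    return {
        "bool": {
            "must": {"match": {"_all": text}},
            "filter": {"ids": {"values": [str(i) for i in ids]}},
        }
    }

body = ids_scoped_search("your search text", [1, 2, 3, 4])
```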
Hope this helps!
As discussed by Ali Beyad, the ids field in the query can do that for you. Just to complement his answer, here is a working example, in case anyone needs it in the future.
GET index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field": "your query"
          }
        },
        {
          "ids": {
            "values": ["0aRM6ngBFlDmSSLpu_J4", "0qRM6ngBFlDmSSLpu_J4"]
          }
        }
      ]
    }
  }
}
You can create a bool query that contains an Ids query in a MUST clause:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-ids-query.html
By using a must clause in a bool query, your search will be further limited to the ids you specify. I'm assuming that by ids you mean the _id value of your documents.
According to the ES docs, you can return documents based on their IDs:
GET /_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}
With ElasticaBundle and Symfony 5.2:
$query = new Query();
$idsQuery = new Query\Ids();
$idsQuery->setIds($id);
$query->setQuery($idsQuery);
$this->finder->find($query, $limit);
You have two options.
The ids query:
GET index/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}
or
The terms query:
GET index/_search
{
  "query": {
    "terms": {
      "yourNonPrimaryIdField": ["1", "2", "3"]
    }
  }
}
The ids query targets the document's internal _id field (i.e. the primary ID). But it often happens that documents contain secondary (and further) IDs, which you'd target through the terms query.
Note that if your secondary IDs contain uppercase characters and you don't set their field's mapping to keyword, they'll be normalized (lowercased) at index time, and the terms query will appear broken because it only works with exact matches. More on this here: Only getting results when elasticsearch is case sensitive
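A small sketch of that caveat, assuming the secondary-ID field was indexed with a default lowercasing analyzer rather than as keyword (the helper name is mine):

```python
def secondary_id_terms_query(field, ids):
    """terms queries are exact-match and not analyzed; if the id field
    was indexed with a lowercasing analyzer instead of as a keyword,
    normalize the values the same way before querying."""
    return {"query": {"terms": {field: [v.lower() for v in ids]}}}

q = secondary_id_terms_query("yourNonPrimaryIdField", ["AbC-1", "XyZ-2"])
```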

Elasticsearch wildcard query not honoring the analyzer of the field

I have a field named "tag" which is analyzed (the default behavior) in Elasticsearch. The "tag" field can hold a single word or a comma-separated string to store multiple tags, e.g. "Festive, Fast, Feast".
Now, for example, if a tag is "Festive", before indexing I convert it to lower case (to ignore case sensitivity) and index it as "festive".
{
  "query": {
    "match": {
      "tag": "FESTIVE"
    }
  }
}
But if I do a wildcard query as mentioned below I don't get results :(
{
  "query": {
    "wildcard": {
      "tag": {
        "value": "F*"
      }
    }
  }
}
If I change the value field in the wildcard search to "f*" instead of "F*", then I get results.
Does anyone have a clue why the wildcard query is behaving case-sensitively?
Wildcard queries fall under term-level queries and hence are not analyzed. From the docs:
Matches documents that have fields matching a wildcard expression (not
analyzed)
You will get the expected results with a query_string query, because it lowercases the expanded terms by default (lowercase_expanded_terms defaults to true). Try this:
GET your_index/_search
{
  "query": {
    "query_string": {
      "default_field": "tag",
      "query": "F*"
    }
  }
}
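Alternatively, since wildcard patterns are not analyzed, you can normalize the pattern client-side before sending the wildcard query; a minimal sketch (the helper is hypothetical):

```python
def wildcard_query(field, pattern):
    """Wildcard queries are not analyzed, so lowercase the pattern
    client-side to match terms produced by a lowercasing analyzer."""
    return {"query": {"wildcard": {field: {"value": pattern.lower()}}}}

q = wildcard_query("tag", "F*")  # searches for "f*"
```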
Hope this helps!

Spring Data Elastic Search with special characters

As part of our project we are using Spring Data on top of Elastic Search.
We found a very interesting issue with findBy queries: if we pass a string that contains a space, it doesn't find the right element unless we pad the string with quotes. For example, for getByName(String name) we have to pass getByName("\"John Do\"").
Is there any way to eliminate such redundant padding?
I'm taking my first steps with Spring (Boot Starter) Data ES and stumbled upon the same issue as you, only in my case it was a : that 'messed things up'. I've learned that this character is among the reserved characters (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_reserved_characters). The quoting that you mention is exactly the solution I use for now. It results in a query like this:
{
  "from": 0,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "\"John Do\"",
          "fields": ["name"]
        }
      }
    }
  }
}
(You can use this in a rest console or in ElasticHQ to check the result.)
A colleague suggested that switching to a 'term' query:
{
  "from": 0,
  "size": 100,
  "query": {
    "term": {
      "name": "John Do"
    }
  }
}
might help to avoid the quoting. I tried this out by using the @Query annotation on the findByName method in your repository. It would go something like this:
@Query(value = "{\"term\" : {\"name\" : \"?0\"}}")
List<Person> findByName(String name);

elasticsearch - confused on how to searching items that a field contains string

This query is working fine and returns only one item, "steve_jobs".
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "steve_jobs"
        }
      }
    }
  }
}
Now I want to get all people with the name prefix steve_, so I try this:
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "steve_"
        }
      }
    }
  }
}
This is returning nothing. Why?
I'm confused about when to use term query / term filter / terms filter / querystring query.
What you need is Prefix Query.
If you are indexing your document like so:
POST /testing_nested_query/class/
{
  "name": "my name is steve_jobs"
}
and you are using the default analyzer, then the problem is that steve_jobs will be indexed as a single term. So your term query will never be able to find any docs matching the term steve_, as there is no such term in the index. The Prefix Query solves this by searching for a prefix among all the indexed terms.
You can solve the same problem by creating custom analyzers (read this and this) so that steve_jobs is stored as the separate terms steve and jobs.
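For completeness, a prefix query built in client code might look like the following sketch (the helper name is mine; it assumes terms were lowercased at index time):

```python
def prefix_query(field, prefix):
    """Prefix query: matches any indexed term starting with `prefix`.
    Lowercase it first, since the standard analyzer lowercases terms at
    index time (note the underscore does not split "steve_jobs", so it
    is indexed as one term)."""
    return {"query": {"prefix": {field: prefix.lower()}}}

q = prefix_query("name", "Steve_")  # matches terms starting with "steve_"
```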
