Elasticsearch bulk or search - elasticsearch

Background
I am working on an API that allows the user to pass in a list of details about a member (name, email addresses, ...) I want to use this information to match up with account records in my Elasticsearch database and return a list of potential matches.
I thought this would be as simple as doing a bool query on the fields I want, however I seem to be getting no hits.
I'm relatively new to Elasticsearch, my current _search request looks like this.
Example Query
POST /member/account/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" [{
"term" : {
"email": "jon.smith#gmail.com"
}
},{
"term" : {
"email": "samy#gmail.com"
}
},{
"term" : {
"email": "bo.blog#gmail.com"
}
}]
}
}
}
}
}
Question
How should I update this query to return records that match any of the email addresses?
Am I able to prioritise records that match email and another field? Example "family_name".
Will this be a problem if I need to do this against a few hundred emails addresses?

Well , you need to make the change in the index side rather than query side.
By default your email ID is broken into
jon.smith#gmail.com => [ jon , smith , gmail , com]
While indexing.
Now when you are searching using term query , it does not apply the analyzer and it tries to get the exact match of jon.smith#gmail.com , which as you can see , wont work.
Even if you use match query , then you will end up getting all document as matches.
Hence you need to change the mapping to index email ID as a single token , rather than tokenizing it.
So using not_analyzed would be the best solution here.
When you define email field as not_analyzed , the following happens while indexing.
jon.smith#gmail.com => [ jon.smith#gmail.com]
After changing the mapping and indexing all your documents , now you can freely run the above query.
I would suggest to use terms query as following -
{
"query": {
"terms": {
"email": [
"jon.smith#gmail.com",
"samy#gmail.com",
"bo.blog#gmail.com"
]
}
}
}
To answer the second part of your question - You are looking for boosting and would recommend to go through function score query

Related

Is it possible to check that specific data matches the query without loading it to the index?

Imagine that I have a specific data string and a specific query. The simple way to check that the query matches the data is to load the data into the Elastic index and run the online query. But can I do it without putting it into the index?
Maybe there are some open-source libraries that implement the Elastic search functionality offline, so I can call something like getScore(data, query)? Or it's possible to implement by using specific API endpoints?
Thanks in advance!
What you can do is to leverage the percolator type.
What this allows you to do is to store the query instead of the document and then test whether a document would match the stored query.
For instance, you first create an index with a field of type percolator that will contain your query (you also need to add in the mapping any field used by the query so ES knows what their types are):
PUT my_index
{
"mappings": {
"properties": {
"query": {
"type": "percolator"
},
"message": {
"type": "text"
}
}
}
}
Then you can index a real query, like this:
PUT my_index/_doc/match_value
{
"query" : {
"match" : {
"message" : "bonsai tree"
}
}
}
Finally, you can check using the percolate query if the query you've just stored would match
GET /my_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document" : {
"message" : "A new bonsai tree in the office"
}
}
}
}
So all you need to do is to only store the query (not the documents), and then you can use the percolate query to check if the documents would have been selected by the query you stored, without having to store the documents themselves.

Checking for not null with completion suggester query in Elastic Search

I have an existing query that is providing suggestions for postcode having the query as below (I have hard coded it with postcode as T0L)
"suggest":{
"suggestions":{
"text":"T0L",
"completion":{
"field": "postcode.suggest"
}
}
}
This works fine, but it searches for some results where the city contains null values. So I need to filter the addresses where the city is not null.
So I followed the solution on this and prepared the query like this.
{
"query": {
"constant_score": {
"filter": {
"exists": {
"field": "city"
}
}
}
},
"suggest":{
"suggestions":{
"text":"T0L",
"completion":{
"field": "postcode.suggest"
}
}
}
}
But unfortunately this is not giving the required addresses where the postcode contains T0L, rather I am getting results where postcode starts with A1X. So I believe it is querying for all the addresses where the city is present and ignoring the completion suggester query. Can you please let me know where is the mistake. Or may be how to write it correctly.
There is no way to filter out suggestions at query time, because completion suggester use FST (special in-memory data structure that built at index time) for lightning-fast search.
But you can change your mapping and add context for your suggester. The basic idea of context that it also filled at index time along with completion field and therefore can be used at query time with suggest query.

Elasticsearch bool search matching incorrectly

So I have an object with an Id field which is populated by a Guid. I'm doing an elasticsearch query with a "Must" clause to match a specific Id in that field. The issue is that elasticsearch is returning a result which does not match the Guid I'm providing exactly. I have noticed that the Guid I'm providing and one of the results that Elasticsearch is returning share the same digits in one particular part of the Guid.
Here is my query source (I'm using the Elasticsearch head console):
{
query:
{
bool:
{
must: [
{
text:
{
couchbaseDocument.doc.Id: 5cd1cde9-1adc-4886-a463-7c8fa7966f26
}
}]
must_not: [ ]
should: [ ]
}
}
from: 0
size: 10
sort: [ ]
facets: { }
}
And it is returning two results. One with ID of
5cd1cde9-1adc-4886-a463-7c8fa7966f26
and the other with ID of
34de3d35-5a27-4886-95e8-a2d6dcf253c2
As you can see, they both share the same middle term "-4886-". However, I would expect this query to only return a record if the record were an exact match, not a partial match. What am I doing wrong here?
The query is (probably) correct.
What you're almost certainly seeing is the work of the 'Standard Analyzer` which is used by default at index-time. This Analyzer will tokenize the input (split it into terms) on hyphen ('-') among other characters. That's why a match is found.
To remedy this, you want to set your couchbaseDocument.doc.Id field to not_analyzed
See: How to not-analyze in ElasticSearch? and the links from there into the official docs.
Mapping would be something like:
{
"yourType" : {
"properties" : {
"couchbaseDocument.doc.Id" : {"type" : "string", "index" : "not_analyzed"},
}
}
}

Elasticsearch doesn't return results

I am facing a strange issue in elasticsearch query. I don't know much about elasticsearch. My query is:
{
"query":
{
"bool":
{
"must":
[
{
"text":
{
"countryCode2":"DE"
}
}
],
"must_not":[],
"should":[]
}
},"from":0,"size":1,"sort":[],"facets":{}
}
The issues is for "DE". It is giving me results but for "BE" or "IN" it returns empty result.
You are indexing using the default mapping, which by default removes english stopwords. The country codes "IN", "BE", and many more are stopwords which don't even get indexed, therefore it's not possible to have matching documents, nor get back those country codes when faceting on that field.
The solution is to reindex after having submitted your own mapping for the country code field:
{
"your_type_name" : {
"country" : {
"type" : "string", "index" : "not_analyzed"
}
}
}
If you already tried to do this but nothing changed, the mapping didn't get submitted properly. I would suggest to double check that its json structure is correct and that you can actually get it back using the get mapping api.
As this is a common problem the defaults are probably going to change in the future to be less intrusive and avoid applying any language dependent text analysis.

Sorting a match query with ElasticSearch

I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
"query":
{
"match": {"_all": "some_search_string"}
},
"sort": [
{
"some_field": {
"order": "asc"
}
}
] }
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.

Resources