Checking for not null with completion suggester query in Elastic Search - elasticsearch

I have an existing query that is providing suggestions for postcode having the query as below (I have hard coded it with postcode as T0L)
"suggest":{
"suggestions":{
"text":"T0L",
"completion":{
"field": "postcode.suggest"
}
}
}
This works fine, but it searches for some results where the city contains null values. So I need to filter the addresses where the city is not null.
So I followed the solution on this and prepared the query like this.
{
"query": {
"constant_score": {
"filter": {
"exists": {
"field": "city"
}
}
}
},
"suggest":{
"suggestions":{
"text":"T0L",
"completion":{
"field": "postcode.suggest"
}
}
}
}
But unfortunately this is not giving the required addresses where the postcode contains T0L, rather I am getting results where postcode starts with A1X. So I believe it is querying for all the addresses where the city is present and ignoring the completion suggester query. Can you please let me know where is the mistake. Or may be how to write it correctly.

There is no way to filter out suggestions at query time, because completion suggester use FST (special in-memory data structure that built at index time) for lightning-fast search.
But you can change your mapping and add context for your suggester. The basic idea of context that it also filled at index time along with completion field and therefore can be used at query time with suggest query.

Related

Filtering documents by an unknown value of a field

I'm trying to create a query to filter my documents by one (can be anyone) value from a field (in my case "host.name"). The point is that I don't know previously the unique values of this field. I need found these and choose one to be used in the query.
I had tried the below query using a painless script, but I have not been able to achieve the goal.
{
"sort" : [{"#timestamp": "desc"}, {"host.name": "asc"}],
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": """
String k = doc['host.name'][0];
return doc['host.name'].value == k;
""",
"lang": "painless"
}
}
}
}
}
I'll appreciate if any can help me improving this idea of suggesting me a new one.
TL;DR you can't.
The script query context operates on one document at a time and so you won't have access to the other docs' field values. You can either use a scripted_metric aggregation which does allow iterating through all docs but it's just that -- an aggregation -- and not a query.
I'd suggest to first run a simple terms agg to figure out what values you're working with and then build your queries accordingly.

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API call would I make to first, GET all documents with 100 in the sourceId field (to verify the results before deletion) and then to DELETE same documents?
You probably need to make two API calls here. First to view the count of documents, second one to perform the deletion.
Query would be the same, however the end points are different. Also I'm assuming the sourceId would be of type keyword
Query to Verify
POST <your_index_name>/_search
{
"size": 0,
"query": {
"term": {
"sourceId": "100"
}
}
}
Execute the above Term Query and take a note at the hits.total of the response.
Remove the "size":0 in the above query if you want to view the entire documents as response.
Once you have the details, you can go ahead and perform the deletion using the same query as shown in the below query, notice the endpoint though.
Query to Delete
POST <your_index_name>/_delete_by_query
{
"query": {
"term": {
"sourceId": "100"
}
}
}
Once you execute the Deletion By Query, notice the deleted field in the response. It must show you the same number.
I've used term queries however you can also make use of any Match or any complex Bool Query. Just make sure that the query is correct.
Hope it helps!
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
"query": {
"match_all": {}
}
}
Delete all the documents of an index without deleting the mapping and settings:
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

Elasticsearch bulk or search

Background
I am working on an API that allows the user to pass in a list of details about a member (name, email addresses, ...) I want to use this information to match up with account records in my Elasticsearch database and return a list of potential matches.
I thought this would be as simple as doing a bool query on the fields I want, however I seem to be getting no hits.
I'm relatively new to Elasticsearch, my current _search request looks like this.
Example Query
POST /member/account/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" [{
"term" : {
"email": "jon.smith#gmail.com"
}
},{
"term" : {
"email": "samy#gmail.com"
}
},{
"term" : {
"email": "bo.blog#gmail.com"
}
}]
}
}
}
}
}
Question
How should I update this query to return records that match any of the email addresses?
Am I able to prioritise records that match email and another field? Example "family_name".
Will this be a problem if I need to do this against a few hundred emails addresses?
Well , you need to make the change in the index side rather than query side.
By default your email ID is broken into
jon.smith#gmail.com => [ jon , smith , gmail , com]
While indexing.
Now when you are searching using term query , it does not apply the analyzer and it tries to get the exact match of jon.smith#gmail.com , which as you can see , wont work.
Even if you use match query , then you will end up getting all document as matches.
Hence you need to change the mapping to index email ID as a single token , rather than tokenizing it.
So using not_analyzed would be the best solution here.
When you define email field as not_analyzed , the following happens while indexing.
jon.smith#gmail.com => [ jon.smith#gmail.com]
After changing the mapping and indexing all your documents , now you can freely run the above query.
I would suggest to use terms query as following -
{
"query": {
"terms": {
"email": [
"jon.smith#gmail.com",
"samy#gmail.com",
"bo.blog#gmail.com"
]
}
}
}
To answer the second part of your question - You are looking for boosting and would recommend to go through function score query

Elasticsearch doesn't return results

I am facing a strange issue in elasticsearch query. I don't know much about elasticsearch. My query is:
{
"query":
{
"bool":
{
"must":
[
{
"text":
{
"countryCode2":"DE"
}
}
],
"must_not":[],
"should":[]
}
},"from":0,"size":1,"sort":[],"facets":{}
}
The issues is for "DE". It is giving me results but for "BE" or "IN" it returns empty result.
You are indexing using the default mapping, which by default removes english stopwords. The country codes "IN", "BE", and many more are stopwords which don't even get indexed, therefore it's not possible to have matching documents, nor get back those country codes when faceting on that field.
The solution is to reindex after having submitted your own mapping for the country code field:
{
"your_type_name" : {
"country" : {
"type" : "string", "index" : "not_analyzed"
}
}
}
If you already tried to do this but nothing changed, the mapping didn't get submitted properly. I would suggest to double check that its json structure is correct and that you can actually get it back using the get mapping api.
As this is a common problem the defaults are probably going to change in the future to be less intrusive and avoid applying any language dependent text analysis.

Sorting a match query with ElasticSearch

I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
"query":
{
"match": {"_all": "some_search_string"}
},
"sort": [
{
"some_field": {
"order": "asc"
}
}
] }
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.

Resources