ElasticSearch Blacklist (Subset Matching) - elasticsearch

I'd like to implement a keyword blacklist using ElasticSearch. Basically I want to create a list of banned queries that a user is not allowed to search for. Then I want to be able to pass in a checked query and see which banned queries it matches (if any).
A checked query matches a banned query if the banned query has a subset of its keywords. To illustrate, let me provide an example:
Banned Queries:
"black lives"
"black lives matter"
"black lives matters"
"black lives matter rulez"
Checked Query: "black lives matter"
Matches:
"black lives"
"black lives matter"
Only the first two banned queries match, because all of their keywords appear in the checked query. The third banned query doesn't match because it uses "matters", not "matter". The last banned query doesn't match because it contains an additional keyword, "rulez", that the checked query lacks.
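The matching rule described above can be sketched in plain Python, independent of Elasticsearch. This is only an illustration of the intended semantics: the helper names tokenize and matching_banned are made up for this sketch, and whitespace splitting stands in for a real analyzer.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace; a simplified stand-in for a real analyzer."""
    return set(text.lower().split())


def matching_banned(banned: list[str], checked: str) -> list[str]:
    """Return the banned queries whose keywords all occur in the checked query."""
    checked_terms = tokenize(checked)
    return [b for b in banned if tokenize(b) <= checked_terms]


banned_queries = [
    "black lives",
    "black lives matter",
    "black lives matters",
    "black lives matter rulez",
]
print(matching_banned(banned_queries, "black lives matter"))
# ['black lives', 'black lives matter']
```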
I've been told that the best way to implement this is a percolate index. My question is how do I create a percolate query that implements a subset match against a checked query (the incoming document)?
Here is the documentation page about percolate queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
Here is a related answer about subset matching:
https://discuss.elastic.co/t/subset-in-an-array/237459

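The best way to achieve your use case is a percolate query. Store each banned query as a match query with "operator": "AND", so that it matches the percolated document only when every one of its terms appears in the checked query.
Below is a working example with the index mapping, index data, search query, and search result.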
Index Mapping:
{
"mappings": {
"properties": {
"field": {
"type": "text"
},
"query": {
"type": "percolator"
}
}
}
}
Index Data:
{
"query": {
"match": {
"field": {
"query": "black lives matter rulez",
"operator": "AND"
}
}
}
}
{
"query": {
"match": {
"field": {
"query": "black lives matters",
"operator": "AND"
}
}
}
}
{
"query": {
"match": {
"field": {
"query": "black lives matter",
"operator": "AND"
}
}
}
}
{
"query": {
"match": {
"field": {
"query": "black lives",
"operator": "AND"
}
}
}
}
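If the banned list is long, the stored queries could be generated rather than written by hand. The following is a minimal sketch that only builds the request payload for the _bulk API; the index name blacklist and the helper names are assumptions, and nothing here talks to a cluster.

```python
import json


def percolator_doc(banned_query: str, field: str = "field") -> dict:
    """Wrap one banned query in a match query that requires every term (operator AND)."""
    return {"query": {"match": {field: {"query": banned_query, "operator": "AND"}}}}


def bulk_body(index: str, banned_queries: list[str]) -> str:
    """Build a newline-delimited _bulk payload: one action line and one source line per query."""
    lines = []
    for doc_id, banned in enumerate(banned_queries, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(percolator_doc(banned)))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


# "blacklist" is a hypothetical index name that uses the mapping shown above.
payload = bulk_body("blacklist", ["black lives", "black lives matter"])
print(payload)
```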
Search Query:
{
"query": {
"percolate": {
"field": "query",
"document": {
"field": "black lives matter"
}
}
}
}
Search Result:
"hits": [
{
"_index": "68734373",
"_type": "_doc",
"_id": "2",
"_score": 0.39229372,
"_source": {
"query": {
"match": {
"field": {
"query": "black lives matter",
"operator": "AND"
}
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
},
{
"_index": "68734373",
"_type": "_doc",
"_id": "1",
"_score": 0.26152915,
"_source": {
"query": {
"match": {
"field": {
"query": "black lives",
"operator": "AND"
}
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
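To turn such a response into the list of banned queries that were hit, something like the following could work. It is a sketch that assumes the stored queries have exactly the match shape indexed above and that the full response nests the hits under hits.hits; banned_matches is a hypothetical helper name.

```python
def banned_matches(response: dict, field: str = "field") -> list[str]:
    """Extract the banned query strings from a percolate search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["query"]["match"][field]["query"] for hit in hits]


# Trimmed-down copy of the search result shown above.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "2", "_source": {"query": {"match": {"field": {"query": "black lives matter", "operator": "AND"}}}}},
            {"_id": "1", "_source": {"query": {"match": {"field": {"query": "black lives", "operator": "AND"}}}}},
        ]
    }
}
print(banned_matches(sample_response))
# ['black lives matter', 'black lives']
```

An empty list means the checked query is not blacklisted.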

Related

Elasticsearch match_phrase query inside multi_match

I have a simple multi_match query like this:
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "RNA sequencing"
}
}
}
This works well as intended, however I'd like to make my query a match phrase query so it returns "RNA sequencing" as a phrase and not "RNA" and "sequencing" separately. I tried doing this
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"query": "RNA sequencing", "type": "phrase"
}
}
}
And
{
"from": 0,
"size": 10,
"query": {
"multi_match": {
"match_phrase": {"query": "RNA sequencing"}
}
}
}
but they both result in parsing errors. Any ideas on what to do?
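multi_match accepts "type": "phrase", whereas a match_phrase clause cannot be nested inside multi_match. Here is a working example with index data, search query, and search result.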
Index Data:
{
"title":"sequencing"
}
{
"title":"RNA sequencing"
}
{
"title":"RNA"
}
Search Query:
{
"query": {
"multi_match": {
"query": "RNA sequencing",
"type": "phrase"
}
}
}
Search Result:
"hits": [
{
"_index": "65314008",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"title": "RNA sequencing"
}
}
]
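The difference between matching individual terms and matching a phrase can be illustrated without a cluster. This is a simplified sketch: whitespace tokenization stands in for the analyzer, and the function names are invented for the illustration.

```python
def tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace; a simplified stand-in for the analyzer."""
    return text.lower().split()


def matches_any_term(doc: str, query: str) -> bool:
    """Default multi_match-like behaviour: any query term in the document is enough."""
    return bool(set(tokens(doc)) & set(tokens(query)))


def matches_phrase(doc: str, query: str) -> bool:
    """Phrase behaviour: the query terms must appear adjacent and in order."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


titles = ["sequencing", "RNA sequencing", "RNA"]
print([t for t in titles if matches_any_term(t, "RNA sequencing")])
# ['sequencing', 'RNA sequencing', 'RNA']
print([t for t in titles if matches_phrase(t, "RNA sequencing")])
# ['RNA sequencing']
```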

Query and exclude in ElasticSearch

I'm trying to use the match_phrase_prefix query with an exclude query, so that it matches all terms except for the terms to be excluded. I have it figured out in a basic URI query, but not the regular JSON query. How do I convert this URI into a JSON type query?
"http://127.0.0.1:9200/topics/_search?q=name:"
+ QUERY + "* AND !name=" + CURRENT_TAGS
Where CURRENT_TAGS is a list of tags not to match with.
This is what I have so far:
{
"query": {
"bool": {
"must": {
"match_phrase_prefix": {
"name": "a"
}
},
"filter": {
"terms": {
"name": [
"apple"
]
}
}
}
}
}
However, when I do this apple is still included in the results. How do I exclude apple?
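You are almost there. The filter clause with terms keeps only the documents that match apple, which is the opposite of excluding them. Use must_not, which is part of the bool query, to exclude the documents you don't want. Below is a working example based on your sample.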
Index mapping
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Index two sample docs with the names apple and amazon, both of which match the prefix in your search criteria.
Search query to exclude apple
{
"query": {
"bool": {
"must": {
"match_phrase_prefix": {
"name": "a"
}
},
"must_not": {
"match": {
"name": "apple"
}
}
}
}
}
Search results
"hits": [
{
"_index": "matchprase",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"name": "amazon"
}
}
]
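Since CURRENT_TAGS in the question is a list, the request body could be generated with one must_not clause per tag. This is a sketch only; prefix_excluding is a hypothetical helper, and it assumes the tags are plain values of the name field.

```python
import json


def prefix_excluding(field: str, prefix: str, excluded: list[str]) -> dict:
    """Build a bool query: prefix match in must, one match clause per excluded tag in must_not."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase_prefix": {field: prefix}},
                "must_not": [{"match": {field: tag}} for tag in excluded],
            }
        }
    }


body = prefix_excluding("name", "a", ["apple"])
print(json.dumps(body, indent=2))
```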

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help on this.
In the query below I only want the first document to be returned, but instead both documents are returned. How can I write a query to search for two wildcard strings on two separate fields, but only return documents that match both?
I think what's being returned currently is score dependent, but I don't need the score.
POST /pr/_doc/1
{
"type": "Type ONE",
"currency":"USD"
}
POST /pr/_doc/2
{
"type": "Type TWO",
"currency":"USD"
}
GET /pr/_search
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Type ON*",
"fields": ["type"],
"analyze_wildcard": true
}
},
{
"simple_query_string": {
"query": "US*",
"fields": ["currency"],
"analyze_wildcard":true
}
}
]
}
}
}
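The original query returns both documents because simple_query_string defaults to the OR operator, so "Type ON*" matches any document whose type field contains "type". Use the query below, a query_string query with default_operator set to AND.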
Search query
{
"query": {
"query_string": {
"query": "(Type ON*) AND (US*)",
"fields" : ["type", "currency"],
"default_operator" : "AND"
}
}
}
Index your sample docs and it returns your expected doc only:
"hits": [
{
"_index": "multiplequery",
"_type": "_doc",
"_id": "1",
"_score": 2.1823215,
"_source": {
"type": "Type ONE",
"currency": "USD"
}
}
]
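The effect of the default operator can be reproduced in a small simulation. This is a rough sketch, not how Elasticsearch evaluates queries internally: whitespace splitting and lowercasing stand in for the analyzer, fnmatch stands in for wildcard expansion, and the helper names are invented.

```python
from fnmatch import fnmatchcase


def field_matches(value: str, query: str, operator: str) -> bool:
    """Check a field value against space-separated (wildcard) terms under OR or AND."""
    terms = value.lower().split()
    patterns = query.lower().split()
    hits = [any(fnmatchcase(t, p) for t in terms) for p in patterns]
    return all(hits) if operator == "and" else any(hits)


docs = {
    1: {"type": "Type ONE", "currency": "USD"},
    2: {"type": "Type TWO", "currency": "USD"},
}


def search(operator: str) -> list[int]:
    """Return ids of docs whose type matches 'Type ON*' and whose currency matches 'US*'."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if field_matches(doc["type"], "Type ON*", operator)
        and field_matches(doc["currency"], "US*", operator)
    ]


print(search("or"))   # [1, 2]
print(search("and"))  # [1]
```

With OR, the bare term "Type" is enough to match document 2; with AND, "ON*" must match as well, which only document 1 satisfies.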

Elasticsearch boost

I have an index called find and a type called song.
Song type structure :
"_index": "find",
"_type": "song",
"_id": "192108",
"_source": {
"id": 192108,
"artist": "Melanie",
"title": "Dark Night",
"lyrics": "Hot air hangs like a dead man\nFrom a white oak tree",
"downloadCount": 234
}
Because multiple songs may have the same field values, I need to boost results by a popularity field such as downloadCount.
How can I change the query below to take downloadCount into account?
GET /find/song/_search
{
"query": {
"multi_match": {
"query": "like a dead hangs",
"type": "most_fields",
"fields": ["artist","title","lyrics"],
"operator": "or"
}
}
}
You can use the field_value_factor feature of Elasticsearch to boost results by downloadCount:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
With a function_score query you can also compute the score from a document field through a script_score function:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"term": {
"you_filter_field": {
"value": "VALUE"
}
}
}]
}
},
"functions": [{
"script_score": {
"script": "doc['downloadCount'].value"
}
}]
}
}
}
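For comparison, here is a sketch of the field_value_factor variant mentioned above, wrapped around the multi_match query from the question. The choice of the log1p modifier, the missing value of 1, and boost_mode multiply are assumptions made for illustration and would need tuning; log1p_factor mirrors the documented behaviour of the modifier (add 1, then take the common logarithm) with the factor left at 1.

```python
import json
import math


def popularity_boosted(query_text: str) -> dict:
    """Wrap the question's multi_match query in function_score with field_value_factor."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": query_text,
                        "type": "most_fields",
                        "fields": ["artist", "title", "lyrics"],
                        "operator": "or",
                    }
                },
                "field_value_factor": {
                    "field": "downloadCount",
                    "modifier": "log1p",  # assumed: dampens very large counts
                    "missing": 1,         # assumed default for docs without the field
                },
                "boost_mode": "multiply",  # relevance score times popularity factor
            }
        }
    }


def log1p_factor(download_count: float) -> float:
    """Multiplier produced by the log1p modifier when factor is 1."""
    return math.log10(1 + download_count)


print(json.dumps(popularity_boosted("like a dead hangs"), indent=2))
print(round(log1p_factor(234), 3))  # 2.371
```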

How to make a query on a field I have not defined a mapping for

I have a field current_country that I am adding to brands, and which has not been defined in my elasticsearch mapping.
I would like to do a filtered query on this field. Since it is not defined in the mapping, I assume it is not analyzed and a term query should work.
This is the query I am doing
{
"index": "products",
"type": "brand",
"body": {
"from": 0,
"size": 100,
"sort": [
{
"n_name": "asc"
}
],
"query": {
"filtered": {
"query": {
"function_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"current_country": "DK"
}
}
]
}
}
}
}
}
}
}
}
which returns no documents from the index.
I run the following query to check if current country exists
{
"index": "products",
"type": "brand",
"body": {
"from": 0,
"size": 100,
"sort": [
{
"n_name": "asc"
}
],
"query": {
"filtered": {
"query": {
"function_score": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "current_country"
}
}
]
}
}
}
}
}
}
}
}
which returns a total of 693 documents.
Here is an example document from the index, returned when I ran the query above.
{
"_index": "products",
"_type": "brand",
"_id": "195da951241478LuxoLivingbrand",
"_score": null,
"_source": {
"categories": [
"Bordlamper og designer bordlamper der giver liv og lys"
],
"image": "http://www.fotoagent.dk/single_picture/11385/138/mega/and_tradition_flowerpot_bordlampe_lilla.jpg",
"top_price": 1695,
"low_price": 1695,
"n_name": "&Tradition",
"name": "&Tradition",
"current_country": "DK",
"current_currency": "DKK"
}
}
How can I query against current_country (preferably with a filtered query)?
If you do not define a mapping for a field, Elasticsearch tries to detect whether it is a string, date, or numeric field. If it detects a string, it analyzes your input with the default analyzer (the standard analyzer). Because the standard analyzer applies a lowercase token filter, your input is indexed as "dk". Term filters do not analyze their input, so "DK" won't match "dk".
This can be solved in a couple of ways:
(hack) Lowercase the term in your filter. This won't work for phrases.
(better) Define a mapping for the field so that it is not analyzed. New fields can be added to a mapping at any time, but documents that were already indexed under the old mapping generally need to be reindexed for the change to take effect.
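The mismatch can be reproduced in a few lines. This is a simplified sketch: standard_analyze only lowercases and splits on whitespace, which is enough to show the effect but is not the real standard analyzer. The mapping dict is an assumption for a pre-5.x cluster (which the filtered query in the question suggests); on 5.x and later the equivalent would be a field of type keyword.

```python
def standard_analyze(text: str) -> list[str]:
    """Simplified stand-in for the standard analyzer: split on whitespace and lowercase."""
    return text.lower().split()


def term_lookup(indexed_terms: set[str], term: str) -> bool:
    """A term filter does no analysis: it looks for the exact term in the index."""
    return term in indexed_terms


# Dynamic mapping: the value is analyzed, so "DK" is stored as "dk".
analyzed_terms = set(standard_analyze("DK"))
print(term_lookup(analyzed_terms, "DK"))  # False
print(term_lookup(analyzed_terms, "dk"))  # True

# Assumed explicit mapping for the "brand" type on a pre-5.x cluster.
not_analyzed_mapping = {
    "brand": {
        "properties": {
            "current_country": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# With a not-analyzed field the value is stored verbatim.
verbatim_terms = {"DK"}
print(term_lookup(verbatim_terms, "DK"))  # True
```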
