How to do a wildcard or regex match on _id in Elasticsearch?

From the sample Elasticsearch data below, I want to apply a wildcard such as *.000ANT.* on _id to fetch all docs whose _id contains 000ANT. Please help.
"hits": [
  {
    "_index": "data_collector",
    "_type": "agents",
    "_id": "Org000LAN_example1.com",
    "_score": 1,
    "fields": {
      "host": [
        "example1.com"
      ]
    }
  },
  {
    "_index": "data_collector",
    "_type": "agents",
    "_id": "000BAN_example2.com",
    "_score": 1,
    "fields": {
      "host": [
        "example2.com"
      ]
    }
  },
  {
    "_index": "data_collector",
    "_type": "agents",
    "_id": "000ANT_example3.com",
    "_score": 1,
    "fields": {
      "host": [
        "example3.com"
      ]
    }
  }
]

This is just an extension of Andrei Stefan's answer:
{
  "query": {
    "script": {
      "script": "doc['_id'][0].indexOf('000ANT') > -1"
    }
  }
}
Note: I do not know the performance impact of such a query; most probably it is a bad idea, because the script has to be evaluated for every document. Use it with caution and avoid it if possible.

You can use a wildcard query like this, but it is not advisable to start a wildcard term with *, because Elasticsearch then has to scan every term in the field and performance will suffer.
{
  "query": {
    "wildcard": {
      "_uid": "*000ANT*"
    }
  }
}
Also note that if the wildcard term you're searching for matches the type name of your documents, searching _uid will not work as expected, because _uid is simply the concatenation of the type and the id: type#id.
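To see why the type name matters, here is a small Python simulation (not Elasticsearch itself) that builds _uid values as type#id and applies the pattern with fnmatch.fnmatchcase. For these simple patterns it behaves like a case-sensitive Lucene wildcard on a not_analyzed term (fnmatch also understands [...] classes, which Lucene wildcards do not). The ids and the type agents come from the question; the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# Sample ids from the question; "agents" is the document type shown there.
DOC_TYPE = "agents"
IDS = ["Org000LAN_example1.com", "000BAN_example2.com", "000ANT_example3.com"]


def uid(doc_type: str, doc_id: str) -> str:
    """Build a legacy _uid value: the type and the id joined by '#'."""
    return f"{doc_type}#{doc_id}"


def wildcard_on_uid(pattern: str) -> list:
    """Return the ids whose _uid matches the pattern (case-sensitive)."""
    return [i for i in IDS if fnmatchcase(uid(DOC_TYPE, i), pattern)]


print(wildcard_on_uid("*000ANT*"))  # ['000ANT_example3.com']
print(wildcard_on_uid("*agents*"))  # every id, because the type is part of _uid
```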

Try this:
{
  "filter": {
    "bool": {
      "must": [
        {
          "regexp": {
            "_uid": {
              "value": ".*000ANT.*"
            }
          }
        }
      ]
    }
  }
}
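Lucene regular expressions are anchored to the whole term, which is why the pattern needs .* on both sides. The Python sketch below mimics that anchoring with re.fullmatch on the _uid strings from the question. Python's re syntax is not identical to Lucene's regexp syntax, so treat this only as an illustration for this simple pattern; regexp_matches is a hypothetical name.

```python
import re

UIDS = [
    "agents#Org000LAN_example1.com",
    "agents#000BAN_example2.com",
    "agents#000ANT_example3.com",
]


def regexp_matches(pattern: str, terms: list) -> list:
    """Mimic an anchored Lucene regexp: the pattern must match the entire term."""
    return [t for t in terms if re.fullmatch(pattern, t)]


print(regexp_matches("000ANT", UIDS))      # [] because the pattern must cover the whole term
print(regexp_matches(".*000ANT.*", UIDS))  # ['agents#000ANT_example3.com']
```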

Make the _id field indexed in your mapping:
{
  "mappings": {
    "agents": {
      "_id": {
        "index": "not_analyzed"
      }
    }
  }
}
And use a query_string to search for it:
{
  "query": {
    "query_string": {
      "query": "_id:(*000ANT*)",
      "lowercase_expanded_terms": false
    }
  }
}
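The lowercase_expanded_terms flag matters because, assuming the older default where query_string lowercases wildcard terms, *000ANT* would be turned into *000ant* and would no longer match the case-preserved not_analyzed ids. The Python sketch below only simulates that effect with fnmatch.fnmatchcase; query_string_wildcard is a hypothetical name.

```python
from fnmatch import fnmatchcase

# not_analyzed terms are stored exactly as indexed (case preserved).
INDEXED_IDS = ["Org000LAN_example1.com", "000BAN_example2.com", "000ANT_example3.com"]


def query_string_wildcard(pattern: str, lowercase_expanded_terms: bool) -> list:
    """Simulate how lowercasing a wildcard term affects matching on exact terms."""
    if lowercase_expanded_terms:
        pattern = pattern.lower()  # "*000ANT*" becomes "*000ant*"
    return [t for t in INDEXED_IDS if fnmatchcase(t, pattern)]


print(query_string_wildcard("*000ANT*", lowercase_expanded_terms=True))   # []
print(query_string_wildcard("*000ANT*", lowercase_expanded_terms=False))  # ['000ANT_example3.com']
```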
Or like this (with scripts and still querying only the _id):
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "org.elasticsearch.index.mapper.Uid.splitUidIntoTypeAndId(new org.apache.lucene.util.BytesRef(doc['_uid'].value))[1].utf8ToString().contains('000ANT')"
        }
      }
    }
  }
}
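The script above splits _uid into its type and id parts and tests only the id, which avoids the type-name problem of a plain wildcard on _uid. The plain-Python equivalent below is a sketch of that logic only, on the assumption that the type is everything before the first '#'; split_uid and id_contains are hypothetical names.

```python
def split_uid(uid: str) -> tuple:
    """Split a legacy _uid ("type#id") into (type, id) at the first '#'."""
    doc_type, sep, doc_id = uid.partition("#")
    if not sep:
        raise ValueError(f"not a type#id value: {uid!r}")
    return doc_type, doc_id


def id_contains(uid: str, needle: str) -> bool:
    """Same check as the script: look for the substring in the id part only."""
    return needle in split_uid(uid)[1]


print(id_contains("agents#000ANT_example3.com", "000ANT"))  # True
print(id_contains("000ANT#example4.com", "000ANT"))         # False: the match is only in the type
```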

You have two options here. The first is partial matching, which is easiest to do by wrapping the search term in wildcards, as in the other answers. This works on not_analyzed fields and is case-sensitive.
POST /my_index/my_type/_search
{
  "query": {
    "wildcard": {
      "_id": {
        "value": "*000ANT*"
      }
    }
  }
}
The second option is to use Elasticsearch analyzers and a proper mapping to describe the functionality you are looking for; see the Elasticsearch documentation on analyzers.
The basic premise is that you add an analyzer with a tokenizer to your mapping; the tokenizer breaks strings down into smaller tokens that can then be matched. A simple query for "000ANT" on the tokenized _id field will then return all results containing that string.
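An n-gram tokenizer is one kind of tokenizer that makes this substring search possible. The Python sketch below imitates what Elasticsearch's ngram tokenizer would emit with min_gram and max_gram both set to 6 (values chosen here only because "000ANT" is six characters long, and assuming no token_chars restriction); it is an illustration, not the Elasticsearch implementation, and ngrams is a hypothetical name.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> set:
    """Return every substring whose length is between min_gram and max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


IDS = ["Org000LAN_example1.com", "000BAN_example2.com", "000ANT_example3.com"]

# Index time: each id is broken into 6-character grams.
index = {doc_id: ngrams(doc_id, 6, 6) for doc_id in IDS}

# Query time: a plain term lookup for "000ANT" now finds the id containing it.
hits = [doc_id for doc_id, grams in index.items() if "000ANT" in grams]
print(hits)  # ['000ANT_example3.com']
```

The trade-off of this design is index size: every id produces many extra tokens, in exchange for cheap exact-term lookups at query time instead of a leading-wildcard scan.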

Related

combine terms and bool query in elasticsearch

I would like to search an Elasticsearch index, but only within a list of ids. I can select the ids with a terms query:
{
  "query": {
    "terms": {
      "_id": list_of_ids
    }
  }
}
Now I want to search within the resulting list, which can be done with a query like this:
{
  "query": {
    "bool": {
      "must": {}
    }
  }
}
My question is: how can I combine those two queries?
One solution I found is to add the ids to the bool query as should clauses, like this:
{
  "query": {
    "bool": {
      "must": {},
      "should": [
        {
          "term": {
            "_id": id1
          }
        },
        {
          "term": {
            "_id": id2
          }
        }
      ]
    }
  }
}
This works fine. However, if the list of ids is very large, it leads to errors:
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to create query:
I am wondering whether there is a more compact way to write such a query. I think the error above is caused by the query simply being too long, since I added thousands of term clauses. Is there a way to just provide an array, as in the terms query?
Solved it:
{
  "query": {
    "bool": {
      "must": {},
      "filter": {
        "terms": {
          "_id": list_of_ids
        }
      }
    }
  }
}
Sorry, I am a bit of a newbie to Elasticsearch.
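The error message in the question is truncated, so its exact cause is not shown; a likely candidate (an assumption) is the bool clause limit, indices.query.bool.max_clause_count, which defaults to 1024 in older versions. The terms filter sidesteps that by packing every id into a single clause, although the terms query has its own limit (index.max_terms_count, 65,536 by default in recent versions). The sketch below builds that request body from a Python list; ids_filtered_query is a hypothetical helper, and the match clause and ids are illustrative.

```python
import json


def ids_filtered_query(must_clause: dict, ids: list) -> dict:
    """Hypothetical helper: score documents with must_clause, restricted to the given ids."""
    return {
        "query": {
            "bool": {
                "must": must_clause,
                # A single terms clause carries the whole id list, instead of
                # one term clause per id inside a should array.
                "filter": {"terms": {"_id": list(ids)}},
            }
        }
    }


list_of_ids = ["1", "2", "4"]  # illustrative ids
body = ids_filtered_query({"match": {"name": "bread"}}, list_of_ids)
print(json.dumps(body, indent=2))
```

The function only produces the request body as a dict; how it is sent depends on the client library and version you use.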
You can also use the ids query, which returns documents based on their IDs.
Adding a working example with index data, search query, and search result.
Index Data:
{
  "name": "buiscuit",
  "cost": "55",
  "discount": "20"
}
{
  "name": "multi grain bread",
  "cost": "55",
  "discount": "20"
}
Search Query:
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "bread"
        }
      },
      "filter": {
        "ids": {
          "values": [
            "1",
            "2",
            "4"
          ]
        }
      }
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "65431114",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.5754429,
    "_source": {
      "name": "multi grain bread",
      "cost": "55",
      "discount": "20"
    }
  }
]
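The result follows from combining the two clauses: the ids filter is a plain yes/no restriction, and the match clause then needs the "bread" token. The Python model below reproduces that; lowercasing and whitespace splitting stand in for the standard analyzer on these simple strings, and the biscuit document's id ("2") is a guess, since the answer does not show it.

```python
# "multi grain bread" is _id "1" per the search result above; the biscuit
# document's id is not shown, so "2" is assumed here for illustration.
DOCS = {
    "1": {"name": "multi grain bread", "cost": "55", "discount": "20"},
    "2": {"name": "buiscuit", "cost": "55", "discount": "20"},
}


def tokens(text: str) -> set:
    """Rough stand-in for the standard analyzer on these strings: lowercase and split."""
    return set(text.lower().split())


def search(match_text: str, id_values: list) -> list:
    """ids filter (exact id membership) combined with a match clause (any shared token)."""
    wanted = set(id_values)  # ids that do not exist, like "4", simply never match
    return [
        doc_id
        for doc_id, doc in DOCS.items()
        if doc_id in wanted and tokens(match_text) & tokens(doc["name"])
    ]


print(search("bread", ["1", "2", "4"]))  # ['1']
```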

ElasticSearch - Search if this term include in the list index or not?

The field is mapped like this:
"parent_ids": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
The actual data looks like this:
parent_ids = ["asdf", "aeraeg", "A123"]
I want to filter all products with parent_ids "A123":
"filter": {
  "match": {
    "parent_ids": "{{parent_ids}}"
  }
}
But it is not working.
You can use a terms query, which returns documents that contain one or more exact terms in a provided field.
Search Query:
{
  "query": {
    "terms": {
      "parent_ids.keyword": ["A123"]
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "64745756",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "parent_ids": [
        "asdf",
        "aeraeg",
        "A123"
      ]
    }
  }
]
Search Query using bool query:
{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "parent_ids": "A123"
        }
      }
    }
  }
}
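The two working queries differ in where analysis happens: terms is not analyzed, so it needs the exact value stored in the keyword sub-field, while match analyzes its input before looking it up in the text field. The Python model below illustrates this, assuming the standard analyzer turns "A123" into the single token "a123"; the function names are hypothetical.

```python
VALUES = ["asdf", "aeraeg", "A123"]

# Assumption: the standard analyzer lowercases each value in the text field,
# while the keyword sub-field stores each value exactly as sent.
text_terms = {v.lower() for v in VALUES}  # parent_ids
keyword_terms = set(VALUES)               # parent_ids.keyword


def terms_query(indexed: set, values: list) -> bool:
    """terms is not analyzed: a raw value must equal an indexed term."""
    return any(v in indexed for v in values)


def match_query(indexed: set, text: str) -> bool:
    """match analyzes its input first (modelled as lowercasing), then looks it up."""
    return text.lower() in indexed


print(terms_query(keyword_terms, ["A123"]))  # True
print(terms_query(text_terms, ["A123"]))     # False: the text field only holds "a123"
print(match_query(text_terms, "A123"))       # True
```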
If you need your search template to accept a JSON array as a parameter, format the query like this:
{
  "terms": {
    "parent_ids.keyword": {{#toJson}}parent_ids{{/toJson}}
  }
}
Note that the match query doesn't support an array of values, only the terms query does.
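The mustache toJson function serializes the parameter as JSON, so the rendered query contains a real array that the terms query can accept. The Python sketch below only imitates that substitution for this one placeholder using json.dumps; it is not the Elasticsearch template engine, and render is a hypothetical name.

```python
import json

# The template from the answer, as a single string.
TEMPLATE = '{"terms": {"parent_ids.keyword": {{#toJson}}parent_ids{{/toJson}}}}'


def render(template: str, params: dict) -> str:
    """Hypothetical stand-in for toJson: replaces this one placeholder with JSON."""
    placeholder = "{{#toJson}}parent_ids{{/toJson}}"
    return template.replace(placeholder, json.dumps(params["parent_ids"]))


rendered = render(TEMPLATE, {"parent_ids": ["A123", "asdf"]})
print(rendered)  # {"terms": {"parent_ids.keyword": ["A123", "asdf"]}}
```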

Query and exclude in ElasticSearch

I'm trying to combine a match_phrase_prefix query with an exclusion, so that it matches all terms except the ones to be excluded. I have it working in a basic URI query, but not in a regular JSON query. How do I convert this URI into a JSON query?
"http://127.0.0.1:9200/topics/_search?q=name:"
+ QUERY + "* AND !name=" + CURRENT_TAGS
where CURRENT_TAGS is a list of tags that should not match.
This is what I have so far:
{
  "query": {
    "bool": {
      "must": {
        "match_phrase_prefix": {
          "name": "a"
        }
      },
      "filter": {
        "terms": {
          "name": [
            "apple"
          ]
        }
      }
    }
  }
}
However, when I do this apple is still included in the results. How do I exclude apple?
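You are almost there. A filter clause keeps only the documents that match it, so your terms filter restricts the results to apple instead of removing it. Use must_not, which is part of the bool query, to exclude the documents you don't want. Below is a working example based on your sample.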
Index mapping
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}
Index two sample docs, apple and amazon (two of the world's biggest companies), both of which match your search criteria :)
Search query to exclude apple
{
  "query": {
    "bool": {
      "must": {
        "match_phrase_prefix": {
          "name": "a"
        }
      },
      "must_not": {
        "match": {
          "name": "apple"
        }
      }
    }
  }
}
Search results
"hits": [
  {
    "_index": "matchprase",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.6931471,
    "_source": {
      "name": "amazon"
    }
  }
]
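Since CURRENT_TAGS in the question is a list, the must_not part can be generated from it rather than written by hand. The sketch below builds the request body as a Python dict with one match clause per tag inside must_not (which accepts an array); prefix_query_excluding is a hypothetical helper name, and sending the body to the cluster is left out.

```python
import json


def prefix_query_excluding(prefix: str, excluded: list) -> dict:
    """Hypothetical helper: match_phrase_prefix on name, minus docs matching any excluded tag."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase_prefix": {"name": prefix}},
                # must_not removes every document that matches any of these clauses.
                "must_not": [{"match": {"name": tag}} for tag in excluded],
            }
        }
    }


current_tags = ["apple"]  # stands in for CURRENT_TAGS from the question
body = prefix_query_excluding("a", current_tags)
print(json.dumps(body, indent=2))
```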

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help on this.
In the query below I only want the first document to be returned, but instead both documents are returned. How can I write a query that searches for two wildcard strings on two separate fields and only returns documents that match both?
I think what's being returned currently is score-dependent, but I don't need the score.
POST /pr/_doc/1
{
  "type": "Type ONE",
  "currency": "USD"
}
POST /pr/_doc/2
{
  "type": "Type TWO",
  "currency": "USD"
}
GET /pr/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "Type ON*",
            "fields": ["type"],
            "analyze_wildcard": true
          }
        },
        {
          "simple_query_string": {
            "query": "US*",
            "fields": ["currency"],
            "analyze_wildcard": true
          }
        }
      ]
    }
  }
}
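Use the query below, which sets default_operator to AND (see the query_string documentation for in-depth information and further reading). simple_query_string uses OR by default, so in your query "Type ON*" also matches any document that merely contains "Type".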
Search query
{
  "query": {
    "query_string": {
      "query": "(Type ON*) AND (US*)",
      "fields": ["type", "currency"],
      "default_operator": "AND"
    }
  }
}
With your sample docs indexed, it returns only the expected doc:
"hits": [
  {
    "_index": "multiplequery",
    "_type": "_doc",
    "_id": "1",
    "_score": 2.1823215,
    "_source": {
      "type": "Type ONE",
      "currency": "USD"
    }
  }
]
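The difference comes down to the operator inside each clause. The plain-Python model below reproduces both outcomes for the question's original bool query: lowercasing and whitespace splitting stand in for the analyzer, and a trailing * is treated as a prefix, so it is only a rough model. Going by this reasoning, adding "default_operator": "AND" to each original simple_query_string clause should have the same effect, though that variant is untested here. The function names are hypothetical.

```python
DOCS = {
    "1": {"type": "Type ONE", "currency": "USD"},
    "2": {"type": "Type TWO", "currency": "USD"},
}


def term_matches(query_term: str, tokens: list) -> bool:
    """A trailing * acts as a prefix; otherwise the token must match exactly."""
    if query_term.endswith("*"):
        return any(t.startswith(query_term[:-1]) for t in tokens)
    return query_term in tokens


def clause_matches(doc: dict, field: str, query: str, operator: str) -> bool:
    """Rough model of one simple_query_string clause on a single field."""
    tokens = doc[field].lower().split()
    results = [term_matches(q, tokens) for q in query.lower().split()]
    return all(results) if operator == "AND" else any(results)


def search(operator: str) -> list:
    """Both clauses sit in bool.must, so a doc has to satisfy each of them."""
    return [
        doc_id
        for doc_id, doc in DOCS.items()
        if clause_matches(doc, "type", "Type ON*", operator)
        and clause_matches(doc, "currency", "US*", operator)
    ]


print(search("OR"))   # ['1', '2']: "Type" alone is enough for doc 2
print(search("AND"))  # ['1']
```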

Elasticsearch - pass fuzziness parameter in query_string

I have a fuzzy query with a customized AUTO:10,20 fuzziness value:
{
  "query": {
    "match": {
      "name": {
        "query": "nike",
        "fuzziness": "AUTO:10,20"
      }
    }
  }
}
How can I convert it to a query_string query? I tried nike~AUTO:10,20, but it is not working.
It's possible with query_string as well. Let me show it using the same example the OP provided: the OP's match query and the query_string query both fetch the same document with the same score.
According to the ES docs, Elasticsearch supports the AUTO:10,20 format, which my example uses as well.
Index mapping
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}
Index a doc
{
  "name": "nike"
}
Search query using match with fuzziness
{
  "query": {
    "match": {
      "name": {
        "query": "nike",
        "fuzziness": "AUTO:10,20"
      }
    }
  }
}
And result
"hits": [
  {
    "_index": "so-query",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9808292,
    "_source": {
      "name": "nike"
    }
  }
]
Query_string with fuzziness
{
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "nike",
      "fuzziness": "AUTO:10,20"
    }
  }
}
And result
"hits": [
  {
    "_index": "so-query",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9808292,
    "_source": {
      "name": "nike"
    }
  }
]
Lucene syntax only allows you to specify "fuzziness" with the tilde symbol "~", optionally followed by 0, 1 or 2 to indicate the edit distance.
The Elasticsearch Query DSL supports a configurable special value, AUTO, which is then used to build the proper Lucene query.
You would need to implement that logic on your application side, by evaluating the desired edit distance based on the length of your search term and then use <searchTerm>~<editDistance> in your query_string-query.
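The application-side logic described above could look like the Python sketch below. It assumes the documented meaning of AUTO:[low],[high]: terms shorter than low must match exactly, terms shorter than high allow one edit, and longer terms allow two (the defaults are 3 and 6). The function names are hypothetical; the result is plain Lucene syntax for query_string.

```python
def auto_edit_distance(term: str, low: int = 3, high: int = 6) -> int:
    """Edit distance allowed by AUTO:low,high: 0 below low, 1 below high, otherwise 2."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def fuzzy_term(term: str, low: int, high: int) -> str:
    """Build the <searchTerm>~<editDistance> form for a query_string query."""
    return f"{term}~{auto_edit_distance(term, low, high)}"


body = {"query": {"query_string": {"fields": ["name"], "query": fuzzy_term("nike", 10, 20)}}}
print(body["query"]["query_string"]["query"])  # nike~0
print(fuzzy_term("elasticsearch", 10, 20))     # elasticsearch~1
```

Note that with AUTO:10,20 a four-letter term like "nike" gets an edit distance of 0, so no fuzziness is applied at all; that is likely also why the match and query_string examples above return the same hit with the same score.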
