how to properly use wildcard on a elastic search find? - elasticsearch

I just jumped into ES and I dont have a lot of experience on this, so might be something I am missing on this.
I found this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html that basically explains how to do a wildcard search.
I am trying to look for all messages inside my document that have certain patter.
So, using Kibana Sense (Elastic search query UI)I did this:
GET _search
{
"query": {
"wildcard" : {
"model.message": "my*"
}
}
}
with this I am trying to obtain all the messages that start with "my"
But I get no results...
Here is a copy of my document structure (or at least the first lines...)
"_index": "my_index",
"_type": "my_type",
"_id": "123456",
"_source": {
"model": {
"id": "123456",
"message": "my message",
Any idea what could be wrong?

Your sample document actually contains the model.content.message field but not the model.message field, so the following query should work:
GET _search
{
"query": {
"wildcard" : {
"model.content.message": "my*"
}
}
}

Can you share your mapping? It looks like you need to use a nested query:
GET /_search
{
"query": {
"nested" : {
"path" : "model",
"score_mode" : "avg",
"query" : {
"wildcard" : {
"model.message": "my*"
}
}
}
}
}
You can read more about nested queries here.

Related

SuggestionBuilder with BoolQueryBuilder in Elasticsearch

I am currently using BoolQueryBuilder to build a text search. I am having an issue with wrong spellings. When someone searches for a "chiar" instead of "chair" I have to show them some suggestions.
I have gone through the documentation and observed that the SuggestionBuilder is useful to get the suggestions.
Can I send all the requests in a single query, so that I can show the suggestions if the result is zero?
No need to send different search terms ie chair, chiar to get suggestions, it's not efficient and performant and you don't know all the combinations which user might misspell.
Instead, Use the fuzzy query or fuzziness param in the match query itself, which can be used in the bool query.
Let me show you an example, using the match query with the fuzziness parameter.
index def
{
"mappings": {
"properties": {
"product": {
"type": "text"
}
}
}
}
Index sample doc
{
"product" : "chair"
}
Search query with wrong term chiar
{
"query": {
"match" : {
"product" : {
"query" : "chiar",
"fuzziness" : "4" --> control it according to your application
}
}
}
}
Search result
"hits": [
{
"_index": "so_fuzzy",
"_type": "_doc",
"_id": "1",
"_score": 0.23014566,
"_source": {
"product": "chair"
}
}

Elasticsearch filter fields by data

I need to use an API with an elasticsearch backend. I don't have any controls about that server but I need to reduce the fields I get in the responses. I already figured out how to use "filter_path" and how to include specific fields by their name. But I didn't find anything about how to dismiss specific fields by data. I already tried using nested terms but with no positive results.
My actual JSON query looks like this:
{
"_source" : ["field1", "field2", "field3.1", "field3.2"],
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "*"
}
}
}
}
}
And the current output like this:
{
"hits": {
"hits": [
{
"_source": {
"field1": "data",
"field2": "data",
"field3": [
{
"field3.1": "SPECIALDATA",
"field3.2": "SPECIALDATA"
},
{
"field3.1": "data",
"field3.2": "data"
},
{
"field3.1": "data",
"field3.2": "data"
}
]
}
}
I only want field3.1 and 3.2 with the SPECIALDATA in it. All other fields with 3.1 and 3.2 should be dismissed.
Is this possible? And if so, how do I have to change my query to achieve this?

Elasticsearch 5.1: applying additional filters to the "more like this" query

Building a search engine on top of emails. MLT is great at finding emails with similar bodies or subjects, but sometimes I want to do something like: show me the emails with similar content to this one, but only from joe#yahoo.com and only during this date range. This seems to have been possible with ES 2.x, but it seems that 5.x doesn't allow allow filtration on fields other than that being considered for similarity. Am I missing something?
i still can't figure how to do what i described. Imagine I have an index of emails with two types for the sake of simplicity: body and sender. I know now to find messages that are restricted to a sender, the posted query would be something like:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"sender": "mike#foo.com"
}
}
]
}
}
}
}
}
Similarly, if I wish to know how to find messages that are similar to a single hero message using the contents of the body, i can issue a query like:
{
"query": {
"more_like_this": {
"fields" : ["body"],
"like" : [{
"_index" : "foo",
"_type" : "email",
"_id" : "a1af33b9c3dd436dabc1b7f66746cc8f"
}],
"min_doc_freq" : 2,
"min_word_length" : 2,
"max_query_terms" : 12,
"include" : "true"
}
}
}
both of these queries specify the results by adding clauses inside the query clause of the root object. However, any way I try to put these together gives me parse exceptions. I can't find any examples of documentations that would say, give me emails that are similar to this hero, but only from mike#foo.com
You're almost there, you can combine them both using a bool/filter query like this, i.e. make an array out of your filter and put both constraints in there:
{
"query": {
"bool": {
"filter": [
{
"term": {
"sender": "mike#foo.com"
}
},
{
"more_like_this": {
"fields": [
"body"
],
"like": [
{
"_index": "foo",
"_type": "email",
"_id": "a1af33b9c3dd436dabc1b7f66746cc8f"
}
],
"min_doc_freq": 2,
"min_word_length": 2,
"max_query_terms": 12,
"include": "true"
}
}
]
}
}
}

Elastic Search Term Query Not Matching URL's

I am a beginner with Elastic search and I am working on a POC from last week.
I am having a URL field as a part of my document which contains URL's in the following format :"http://www.example.com/foo/navestelre-04-cop".
I can not define mapping to my whole object as every object has different keys except the URL.
Here is how I am creating my Index :
POST
{
"settings" : {
"number_of_shards" : 5,
"mappings" : {
"properties" : {
"url" : { "type" : "string","index":"not_analyzed" }
}
}
}
}
I am keeping my URL field as not_analyzed as I have learned from some resource that marking a field as not_analyzed will prevent it from tokenization and thus I can look for an exact match for that field in a term query.
I have also tried using the whitespace analyzer as the URL value thus not have any of the white space character. But again I am unable to get a successful Hit.
Below is my term query :
{
"query":{
"constant_score": {
"filter": {
"term": {
"url":"http://www.example.com/foo/navestelre-04-cop"
}
}
}
}
}
I am guessing the problem is somewhere with the Analyzers and Tokenizers but I am unable to get to a solution. Any kind of help would be great to enhance my knowledge and would help me reach to a solution.
Thanks in Advance.
You have the right idea, but it looks like some small mistakes in your settings request are leading you astray. Here is the final index request:
POST /test
{
"settings": {
"number_of_shards" : 5
},
"mappings": {
"url_test": {
"properties": {
"url": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Notice the added url_test type in the mapping. This lets ES know that your mapping applies to this document type. Also, settings and mappings are also different keys of the root object, so they have to be separated. Because your initial settings request was malformed, ES just ignored it, and used the standard analyzer on your document, which led to you not being able to query it with your query. I point you to the ES Mapping docs
We can index two documents to test with:
POST /test/url_test/1
{
"url":"http://www.example.com/foo/navestelre-04-cop"
}
POST /test/url_test/2
{
"url":"http://stackoverflow.com/questions/37326126/elastic-search-term-query-not-matching-urls"
}
And then execute your unmodified search query:
GET /test/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"url": "http://www.example.com/foo/navestelre-04-cop"
}
}
}
}
}
Yields this result:
"hits": [
{
"_index": "test",
"_type": "url_test",
"_id": "1",
"_score": 1,
"_source": {
"url": "http://www.example.com/foo/navestelre-04-cop"
}
}
]

Elastic Search - Querying on values

I have an elasticsearch index with the following values
{
"_index": "article",
"_type": "articleId",
"_id": "10970",
"_score": 1,
"_source": {
"url": "http%3A%2F%2Fwww.tomshardware.com%2Fnews%2FAir-Traffic-Software-DoS-Attacks%2C16471.html%23xtor%3DRSS-181",
"title": "Air%20Traffic%20Software%20Vulnerable%20to%20DoS%20Attacks",
"publicationId": "888",
"text": "%20%3Cp%3E%3Cstrong%3EA%20security%20researcher%20revealed%20a%20flaw%20in%20commonly%20used%20air%20traffic%20control%20software%20that%20would%20allow%20an%20attacker%20to%20create%20an%20unlimited%20number%20of%20phantom%20flights.%3C%2Fstrong%3E%3C%2Fp%3E%20%3Cp%3E%3Ca%20target%3D%22_blank%22%3E%3C%2Fa%3E%3C%2Fp%3E%20%3Cp%3EAccording%20to%20Andrei%20Costin%2C%20%242%2C000%20in%20equipment%20and%20%22modest%20tech%20skills%22%20are%20enough%20to%20throw%20an%20air%20traffic%20control%20system%20of%20virtually%20any%20airport%20into%20complete%20disarray.%20The%20ADS-B%20system%20that%20is%20used%20across%20the%20world%20is%20vulnerable%20as%20it%20does%20not%20verify%20that%20incoming%20traffic%20signals%20as%20genuine.%20%3C%2Fp%3E%20%3Cp%3ECostin%20says%20that%20a%20hacker%20could%20inject%20flights%20that%20do%20not%20exist%20and%20could%20confuse%20an%20air%20controller%20station.%20Air%20controllers%20could%20cross-check%20flights%20with%20flight%20schedules%2C%20but%20if%20the%20number%20of%20phantom%20flights%20is%20high%20enough%2C%20there%20is%20no%20way%20that%20cross-checks%20would%20work.%20Consider%20it%20like%20an%20DoS%20attack%20on%20an%20air%20traffic%20control%20system.%3C%2Fp%3E%20%3Cp%3ECostin%20noted%20that%20rogue%20signals%20from%20the%20ground%20can%20be%20generally%20identified%20and%20ruled%20out%20as%20malicious%20signals%2C%20but%20there%20is%20no%20way%20to%20do%20the%20same%20for%20robotic%20aircraft%2C%20for%20example.%20He%20also%20noted%20that%20data%20sent%20from%20airplanes%20to%20air%20traffic%20controllers%20is%20unencrypted%20and%20can%20be%20captured%20by%20unidentified%20sources.%20Since%20this%20applies%20to%20any%20aircraft%2C%20it%20is%20in%20theory%20possible%20to%20deploy%20airplane%20tracking%20devices%20to%20track%20specific%20aircraft.%3C%2Fp%3E%20%3C%2Fp%3E%3Cp%3E%20%3Cp%3E%3Ca%20target%3D%22_blank%22%20href%3D%22mailto%3Anews-us%40bestofmedia.com%3Fsubject%3DNews%2520Article%2520Feedback%22%3E%3Cem%3E%3Csub%3EContact%20Us%20for%20News%20Tips%2C%20Corrections%20and%20Feedback%3C%2Fsub%3E%3C%2Fem%3E%3C%2Fa%3E%3C%2Fp%3E",
"keywords": {
"air": "3.4965034965034962",
"traffic": "3.4965034965034962",
"flights": "2.797202797202797",
"": "2.797202797202797",
"Costin": "2.097902097902098",
"aircraft": "2.097902097902098",
"signals": "2.097902097902098",
"control": "2.097902097902098",
"system": "2.097902097902098",
"there": "1.3986013986013985"
}
}
}
I am trying to write a query to search does this index have the keyword flights (which it does) but I am having difficulty
Its straightforward running a match query on one of the other fields like text but encountering problems when trying to do the same or similar for keywords
Is there a way of performing this search with the current setup or should I add the keywords in differently?
If I understood you correctly, you would like to find all records that have the field keyword.flights and the value of this field is not important. You can do it using string query:
curl "http://localhost:9200/_search?q=keywords.flights:*"
Or using the exist filter:
curl "http://localhost:9200/_search" -d '{
"query": {
"constant_score" : {
"filter" : {
"exists" : { "field" : "keywords.flights" }
}
}
}
}'

Resources