What is the difference between simple_query_string and query_string in Elasticsearch?
Which is better for searching?
In the Elasticsearch simple_query_string documentation, it is written:
Unlike the regular query_string query, the simple_query_string query
will never throw an exception and discards invalid parts of the
query.
but it is not clear. Which one is better?
There is no simple answer. It depends :)
In general, query_string is intended for more advanced use. It has more options, but, as you quoted, it throws an exception when the query it is sent cannot be parsed as a whole. In contrast, simple_query_string has fewer options but does not throw an exception on invalid parts; it simply discards them.
As an example, take a look at the two queries below:
GET _search
{
  "query": {
    "query_string": {
      "query": "hyperspace AND crops",
      "fields": [
        "description"
      ]
    }
  }
}
GET _search
{
  "query": {
    "simple_query_string": {
      "query": "hyperspace + crops",
      "fields": [
        "description"
      ]
    }
  }
}
Both are equivalent and return the same results from your index. But if you break the query and send:
GET _search
{
  "query": {
    "query_string": {
      "query": "hyperspace AND crops AND",
      "fields": [
        "description"
      ]
    }
  }
}
GET _search
{
  "query": {
    "simple_query_string": {
      "query": "hyperspace + crops +",
      "fields": [
        "description"
      ]
    }
  }
}
Then you will get results only from the second one (simple_query_string). The first one (query_string) will throw something like this:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [hyperspace AND crops AND]",
        "index_uuid": "FWz0DXnmQhyW5SPU3yj2Tg",
        "index": "your_index_name"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      ...
    ]
  },
  "status": 400
}
I hope you now understand the difference between throwing and not throwing an exception.
Which is better? If you want to expose the search to plain end users, I would rather recommend simple_query_string. Thanks to that, the end user will get some results for every query, even if they make a mistake in it. query_string is recommended for more advanced users who have been trained in the correct query syntax, so they will know why they get no results in any particular situation.
Adding to what @Piotr has mentioned,
What I understand is that when external users or consumers are to make use of the search solution, simple_query_string offers a better solution in terms of error handling and limiting what kind of queries users can construct.
In other words, if the search solution is exposed publicly for any consumer, then I guess simple_query_string makes sense; however, if I know who my end users are, to the point that I can guide them in what they are looking for, there is no reason why I cannot expose it via query_string.
Also, QueryStringQueryBuilder.java makes use of QueryStringQueryParser.java while SimpleQueryStringBuilder.java makes use of SimpleQueryStringQueryParser.java, which makes me think there are certain limitations in parsing, and that the creators did not want many features to be managed by end users, e.g. dis_max, which is available in query_string.
Perhaps the main purpose of simple_query_string is to limit end users to simple querying and keep them away from all forms of complex querying and advanced features, so that we have more control over our search engine (which I'm not really sure about, but just a thought).
Plus, the potential to misuse query_string is greater, as only advanced users are capable of constructing certain complex queries correctly, which may be a bit too much for simple users who are looking for a basic search solution.
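To make that gap concrete, here is a sketch of a request that only query_string can parse: field-qualified terms and range syntax inside the query text are not part of the simple_query_string syntax (the title and year fields are hypothetical, purely for illustration):
# hypothetical fields "title" and "year", for illustration only
GET _search
{
  "query": {
    "query_string": {
      "query": "title:(castle OR fort) AND year:[2010 TO 2018]"
    }
  }
}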
Related
I am looking for a way to implement auto-suggest with synonyms & fuzziness
For example, when the user tries to search for "replce ar"
My synonym list has ar => audio record
So, the result should include the items matching
changing audio record
replacing audio record
etc..,
Here we need fuzziness because there is a typo in "replce" (in the user's search text)
Synonyms to match ar => audio record
Auto-suggest with regex pattern.
Is it possible to implement all the three features in a single field?
Edit:
A regex + fuzzy query just throws an error.
I hadn't explained my need for a regex pattern well.
So, I needed a regex for doing a partial-word lookup ('encyclopedic' contains 'cyclo').
Now, after investigating what options I have for this purpose, which pointed me to the NGram tokenizer, and after looking into the other suggesters, I found that the phrase suggester may really be what I'm looking for, so I'll try it and report back.
Yes, you can use synonyms as well as fuzziness for suggestions. Synonyms are handled by defining a synonym token filter and adding it to your language analyzer. Then, when you create the field mapping for the field(s) you want to use for suggestions, you assign that analyzer to the field.
As for fuzziness, that happens at query time. Most text-based queries support a fuzziness option, which allows you to specify how many corrections (character edits) you want to allow. The default auto value adjusts the number of corrections to the length of the term (none for very short terms, one for medium-length terms, two for longer ones), so that's usually best.
Notional analysis setup (synonym_graph reference)
{
  "analysis": {
    "filter": {
      "synonyms": {
        "type": "synonym_graph",
        "expand": "false",
        "synonyms": [
          "ar => audio record"
        ]
      }
    },
    "analyzer": {
      "synonyms": {
        "tokenizer": "standard",
        "type": "custom",
        "filter": [
          "standard",
          "lowercase",
          "synonyms"
        ]
      }
    }
  }
}
Notional Field Mapping (Analyzer + Mapping reference)
(Note that the analyzer matches the name of the analyzer defined above)
{
  "properties": {
    "suggestion": {
      "type": "text",
      "analyzer": "synonyms"
    }
  }
}
Notional Query
{
  "query": {
    "match": {
      "suggestion": {
        "query": "replce ar",
        "fuzziness": "auto",
        "operator": "and"
      }
    }
  }
}
Keep in mind that there are several different options for suggestions, so depending on which option you use, you may need to adjust the way the field is mapped, or even add another token filter to the analyzer. But a custom analyzer is just a tokenizer plus a series of token filters, so you can usually combine whatever token filters you need to achieve your goal. Just make sure you understand what each filter is doing so you get the filters in the correct order.
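If you want to check what the filter chain actually produces, the _analyze API is useful; a minimal sketch, assuming an index that uses the analysis settings above ("your_index" is a placeholder, "synonyms" is the custom analyzer defined there):
# "your_index" is a placeholder name
GET your_index/_analyze
{
  "analyzer": "synonyms",
  "text": "replce ar"
}
The response lists the tokens after every filter has run, so you can confirm that "ar" is expanded to "audio record" before you move on to query-time fuzziness.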
If you get stuck in part of this process, just submit another question with the specific issue you're running into. Good luck!
Assuming I have an index with two fields, title and loc, I would like to search in these two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one, which has "castle" in its title and "pontivy" in its loc. (I have tried to simplify the example and the underlying data; the real case is a bit more complicated.) So I tried this query, but it does not seem accurate (it's a feeling, not really easy to explain):
GET merimee/_search
{
  "query": {
    "multi_match": {
      "query": "castle pontivy",
      "fields": [ "title", "loc" ]
    }
  }
}
Is this the right way to search in multiple fields and get the document that matches best across all of the fields?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user types "castle pontivy" and I want to get the "best" result for this query, which is the second document because it contains "castle" in "title" and "pontivy" in "loc". In other words, I want the result that matches best across both fields.
As the other poster suggested, you could use a bool query, but that might not work for your use case, since you have a single search box that you want to query against multiple fields.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
  "query": {
    "simple_query_string": {
      "query": "castle pontivy",
      "fields": ["title", "loc"],
      "default_operator": "and"
    }
  }
}
So this will try to give you the best documents that match both terms in either of those fields. The default operator is set to AND here because otherwise it is OR, which might not give you the expected results.
It is worthwhile to experiment with the other options available for this query type as well. You might also explore using a Query String query, as it gives more flexibility, but the Simple Query String query works very well for most cases.
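For example, if a match in title should count for more than a match in loc, per-field boosting in the fields list is one such option (the ^3 factor below is only an illustrative guess, not a recommendation):
# the ^3 boost is illustrative only
GET _search
{
  "query": {
    "simple_query_string": {
      "query": "castle pontivy",
      "fields": ["title^3", "loc"],
      "default_operator": "and"
    }
  }
}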
This can be done by using a bool query and then matching on each field.
GET _search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "castle" } },
        { "match": { "loc": "pontivy" } }
      ]
    }
  }
}
As per the Elasticsearch 5.1 documentation, I have built the following query to implement basic search functionality for a part of the piece of software I am building. For some reason, this query never returns any results, even though all of the fields are present. All users are guaranteed to have all of these fields, but to be safe I tested it with each individual field and got the same result each time.
"query": {
"multi_match": {
"fields": [
"displayName",
"title",
"team",
"teamLeader"
],
"query": "a",
"fuzziness": "AUTO"
}
}
}
I have also attempted using other types like best_fields, phrase_prefix, etc., to no avail. I know the data is there because my filter query works just fine, but no data comes back once I add this section. Is there anything I can do to better debug this situation?
How can I use Elasticsearch to find the missing word in a phrase? For example, I want to find all documents which contain the pattern make * great again. I tried using a wildcard query, but it returned no results:
{
  "fields": [
    "file_name",
    "mime_type",
    "id",
    "sha1",
    "added_at",
    "content.title",
    "content.keywords",
    "content.author"
  ],
  "highlight": {
    "encoder": "html",
    "fields": {
      "content.content": {
        "number_of_fragments": 5
      }
    },
    "order": "score",
    "tags_schema": "styled"
  },
  "query": {
    "wildcard": {
      "content.content": "make * great again"
    }
  }
}
If I put in a word and use a match_phrase query, I get results, so I know I have data which matches the pattern.
Which type of query should I use? Or do I need to add some type of custom analyzer to the field?
Wildcard queries operate on terms, so if you use it on an analyzed field, it will actually try to match every term in that field separately. In your case, you can create a not_analyzed sub-field (such as content.content.raw) and run the wildcard query on that. Or just map the actual field to not be analyzed, if you don't need to query it in other ways.
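A minimal sketch of that approach, under a few assumptions: the index/type names are placeholders, the mapping uses the pre-5.x string/not_analyzed syntax (on 5.x+ you would use a keyword sub-field instead), and the parent content.content definition must be repeated exactly as it already is in your mapping. Note also that on an un-analyzed field the wildcard has to cover the entire field value, hence the leading and trailing *:
# placeholder index/type names; keep the existing definition of content.content and just add the sub-field
PUT your_index/_mapping/your_type
{
  "properties": {
    "content": {
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
GET your_index/_search
{
  "query": {
    "wildcard": {
      "content.content.raw": "*make * great again*"
    }
  }
}
Existing documents would need to be reindexed before the new raw sub-field has anything to match against, and leading wildcards like this can be slow on large indices, so test it on your data first.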
This question feels very similar to an old question posted here: Retrieve analyzed tokens from ElasticSearch documents, but to see if there are any changes I thought it would make sense to post it again for the latest version of Elasticsearch.
We are trying to search bodies of text in Elasticsearch, with the search query and field mapping using the snowball stemmer built into Elasticsearch. The performance and results are great, but because we need the stemmed text body for post-analysis, we would like the search results to return the actual stemmed tokens of the text field for each document.
The mapping for the field currently looks like:
"TitleEnglish": {
"type": "string",
"analyzer": "standard",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"stemming": {
"type": "string",
"analyzer": "snowball"
}
}
}
and the search query is performed specifically on TitleEnglish.stemming. Ideally I would like it to return that field, but requesting that field returns the original text, not the analyzed tokens.
Does anybody know of any way to do this? We have looked at Term Vectors, but they only seem to be returnable for individual documents or a body of documents, not for a search result?
Or perhaps other solutions like Solr or Sphinx do offer this option?
To add some extra information: if we run the following query:
GET /_analyze?analyzer=snowball&text=Eight issue of Industrial Lorestan eliminate barriers to facilitate the Committees review of
It returns the stemmed words: eight, issu, industri, etc. This is exactly the result we would like back for each matching document for all of the words in the text (so not just the matches).
Unless I'm missing something evident, why not simply return a terms aggregation on the TitleEnglish.stemming field?
{
  "query": {...},
  "aggs": {
    "stems": {
      "terms": {
        "field": "TitleEnglish.stemming",
        "size": 50
      }
    }
  }
}
Adding that aggregation to your query, you'd get a breakdown of all the stemmed terms in the TitleEnglish.stemming sub-field from the documents that matched your query.
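One caveat, depending on your Elasticsearch version: on 5.x and later, a terms aggregation on an analyzed (text) field requires fielddata to be enabled, so the sub-field mapping may need an update along these lines (a sketch with placeholder index/type names, using the 5.x text type and showing only the stemming sub-field; keep your other sub-fields as they are):
# placeholder index/type names; only the relevant sub-field is shown
PUT your_index/_mapping/your_type
{
  "properties": {
    "TitleEnglish": {
      "type": "text",
      "analyzer": "standard",
      "fields": {
        "stemming": {
          "type": "text",
          "analyzer": "snowball",
          "fielddata": true
        }
      }
    }
  }
}
On 2.x, analyzed string fields have fielddata enabled by default, so no mapping change should be needed there.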