Combining results of two queries - elasticsearch

I'm using Kibana v6.1.1 and trying to send two different queries in a single GET request so that I can use the "must" or "should" clauses more than once.
When I run this query under "Dev Tools" in Kibana, it works.
When I try to apply this "double query" (without the GET line, of course) under "Discover" -> "Add a filter" -> "Edit filter" -> "Edit Query DSL", it doesn't accept the {} syntax that creates an 'OR' between the queries.
The two "must" clauses need to stay separate but remain in the same filter.
GET _my_index/_search
{
"query" : {
"bool" : {
"must" : [{
...
}]
}
}
}
{}
{
"query" : {
"bool" : {
"must" : [{
...
}]
}
}
}
P.S.
Using the simple_query_string doesn't seem to solve the problem and so far, I couldn't find the way to combine these two queries.

I'm not sure what you actually want to achieve. Use the following if at least one of the shoulds has to match (there is an implicit minimum_should_match if there are no other conditions, but you can also set an explicit value for that):
{
"query" : {
"bool" : {
"should" : [
{
...
},
{
...
}
]
}
}
}
If you want to run independent queries, use a multi search.
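As a rough illustration of the multi search option, the sketch below builds the newline-delimited body that the `_msearch` endpoint expects: a header line followed by a query line for each search. The index name `my_index` and the fields `field_a` and `field_b` are placeholders, not taken from the question.

```python
import json

def build_msearch_body(index, queries):
    """Return an NDJSON multi search body: a header line and a query line per search."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))  # header line for this search
        lines.append(json.dumps({"query": query}))  # body line for this search
    # The multi search body has to end with a newline character.
    return "\n".join(lines) + "\n"

# Placeholder queries; the real "must" clauses are elided in the question.
first = {"bool": {"must": [{"match": {"field_a": "foo"}}]}}
second = {"bool": {"must": [{"match": {"field_b": "bar"}}]}}

body = build_msearch_body("my_index", [first, second])
print(body)
```

The resulting string can be sent as the request body of `GET _msearch`. Each search returns its own result set, so this does not combine the two queries into one Kibana filter; for that, the `should` form above is the closer fit.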

Related

How to limit elasticsearch to a list of documents each identified by a unique keyword

I have an elasticsearch document repository with ~15M documents.
Each document has an 11-character string field (it comes from a mongo DB) that is unique to the document. This field is indexed as keyword.
I'm using C#.
When I run a search, I want to be able to limit the search to a set of documents that I specify (via some list of the unique field ids).
My query text uses bool with must to supply a filter for the unique identifiers and additional clauses to actually search the documents. See example below.
To search a large number of documents, I generate multiple query strings and run them concurrently. Each query handles up to 64K unique ids (determined by the limit on terms).
In this case, I have 262,144 documents to search (list comes, at run time, from a separate mongo DB query). So my code generates 4 query strings (see example below).
I run them concurrently.
Unfortunately, this search takes over 22 seconds to complete.
When I run the same search but drop the terms node (so it searches all the documents), a single such query completes the search in 1.8 seconds.
An incredible difference.
So my question: Is there an efficient way to specify which documents are to be searched (when each document has a unique self-identifying keyword field)?
I want to be able to specify up to a few 100K of such unique ids.
Here's an example of my search specifying unique document identifiers:
{
"_source" : "talentId",
"from" : 0,
"size" : 10000,
"query" : {
"bool" : {
"must" : [
{
"bool" : {
"must" : [ { "match_phrase" : { "freeText" : "java" } },
{ "match_phrase" : { "freeText" : "unix" } },
{ "match_phrase" : { "freeText" : "c#" } },
{ "match_phrase" : { "freeText" : "cnn" } } ]
}
},
{
"bool" : {
"filter" : {
"bool" : {
"should" : [
{
"terms" : {
"talentId" : [ "goGSXMWE1Qg", "GvTDYS6F1Qg",
"-qa_N-aC1Qg", "iu299LCC1Qg",
"0p7SpteI1Qg", ... 4,995 more ... ]
}
}
]
}
}
}
}
]
}
}
}
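The batching described above can be sketched roughly as follows. This is only an illustration under assumptions: the ids are generated placeholders, the batch size of 65,536 is the "64K" limit mentioned in the question, and the query is written in a flatter form (phrase clauses in `must`, the id list directly in `filter`) that may or may not change the timings reported here.

```python
def chunk(ids, size=65536):
    """Split the id list into batches of at most `size` entries."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def build_query(id_batch, phrases):
    """Build one search body restricted to the given batch of talentId values."""
    return {
        "_source": "talentId",
        "from": 0,
        "size": 10000,
        "query": {
            "bool": {
                # Scoring clauses: every phrase has to match.
                "must": [{"match_phrase": {"freeText": p}} for p in phrases],
                # Non-scoring restriction to the requested documents.
                "filter": [{"terms": {"talentId": id_batch}}],
            }
        },
    }

# Placeholder 11-character ids standing in for the ones fetched at run time.
ids = [f"id{i:09d}" for i in range(262144)]
phrases = ["java", "unix", "c#", "cnn"]
queries = [build_query(batch, phrases) for batch in chunk(ids)]
print(len(queries))
```

Each element of `queries` is one request body; running them concurrently is left to the caller.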
@jarmod is right.
But if you don't want to completely redo your architecture, is there some other single talent-related shared field you could query instead of thousands of talentIds? It could be one more simple match_phrase query.

How to use elastic search for advanced queries:

I'm using elasticsearch. I'm already pretty deep into it but I'm very confused as to how to go about writing advanced queries. There are queries / filters / etc. I'm confused as to how to proceed.
I have a schema that looks like this:
photos: {people: [{person_id: 1, person_name:"john kealy"}],
tags: [{tag_id: 1, tag_name:"other tag"}],
by_line: "John D Kealy/My website.com",
location: "Some Place OUt West"}
I need to be able to string together these queries dynamically ALWAYS pulling in FULL MATCHES, e.g. I would like to search for
people.person_id: [1,2] (pulls in only photos with BOTH or more people)
tags.tag_id: [1,2,3] (pulls in only photos with all three or more tags)
by_line: "John D. Kealy/My Website.com" (the full name including the slash)
location: "some place out west"
I would like to write one query with all these items. I need to include the slash in "by_line", and I don't care about upper or lower case. I need the exact match "some place out west". What do I use here? Queries or filters / filtered?
General guidelines for bool filters/queries can be found here.
If you are constructing an "exact match" query, you can often use the term filter (or query).
If you need a search that performs well, a filtered query is often advisable: filters are applied before the query is run, which often improves performance.
As for your specific example, the filters below should work; wrap them around a match_all query or anything else you need (the first version is for a non-analyzed by_line field; the second, for an analyzed one, adds a query). This should give you an idea of how to construct future queries:
NOTE: This assumes that your by_line field is not analyzed. The double backslash escapes the slash delimiter; if you are using an analyzed field, you must use a match query.
Without analyzer on by_line
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : [
{ "terms" : {"people.person_id" : ["1", "2"]}},
{ "terms" : {"tags.tag_id" : ["1", "2", "3"]}},
{ "term" : {"by_line" : "John D. Kealy\\/My Website.com"}},
{ "term" : {"location" : "some place out west"}}
]
}
}
}
}
}
I will keep the above for future readers. However, I see in your post history that you are using the standard analyzer, so your query should be structured as follows.
With analyzer on by_line
{
"query" : {
"filtered" : {
"query": {
"match": {
"by_line": "John Kealy/BFA.com"
}
},
"filter" : {
"bool" : {
"must" : [
{ "terms" : {"people.person_id" : ["1", "2"]}},
{ "terms" : {"tags.tag_id" : ["1", "2", "3"]}},
{ "term" : {"location" : "some place out west"}}
]
}
}
}
}
}
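Two caveats for readers on newer versions: the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0, and a `terms` clause matches documents containing any of the listed values rather than all of them. A minimal sketch of an equivalent `bool` query that requires every id is shown below. It assumes `people` and `tags` are plain object fields (not `nested`) and that `location` is indexed so that an exact `term` lookup works; the helper names are made up for illustration.

```python
def all_of(field, values):
    """One term clause per value, so that every value must be present."""
    return [{"term": {field: value}} for value in values]

def build_photo_query(person_ids, tag_ids, by_line, location):
    """Build a bool query: analyzed match on by_line, exact filters for the rest."""
    filters = (
        all_of("people.person_id", person_ids)
        + all_of("tags.tag_id", tag_ids)
        + [{"term": {"location": location}}]
    )
    return {
        "query": {
            "bool": {
                "must": [{"match": {"by_line": by_line}}],
                "filter": filters,
            }
        }
    }

query = build_photo_query(
    [1, 2], [1, 2, 3], "John D. Kealy/My Website.com", "some place out west"
)
print(query)
```

Because the clause list is generated from the input lists, the same helper covers the "string these together dynamically" requirement from the question.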

elasticsearch query to find documents that don't exist

Is there a way in Elasticsearch through filters, queries, aggregations etc to search for a list of document ids and have returned which ids did not hit?
With a small list it is easy enough to compare the results against the requested ids list but I'm dealing with lists of ids in the tens of thousands and it is not going to be performant to do that.
Do you mean, from https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-not-filter.html
"filtered" : {
"query" : {
"term" : { "name.first" : "shay" }
},
"filter" : {
"not" : {
"range" : {
"postDate" : {
"from" : "2010-03-01",
"to" : "2010-04-01"
}
}
}
}
}
Take a look at the guide at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
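For the original question (finding which requested ids did not hit), one client-side option is a set difference between the requested ids and the ids that came back. The sketch below assumes the response has the usual `hits.hits[]._id` layout and that all hits were retrieved; the sample response is fabricated purely for illustration. Set lookups are constant time on average, so tens of thousands of ids should not be a bottleneck.

```python
def missing_ids(requested_ids, response):
    """Return the requested ids that do not appear among the hits, in request order."""
    found = {hit["_id"] for hit in response["hits"]["hits"]}
    return [doc_id for doc_id in requested_ids if doc_id not in found]

# Made-up response fragment standing in for a real search result.
sample_response = {"hits": {"hits": [{"_id": "a1"}, {"_id": "c3"}]}}

print(missing_ids(["a1", "b2", "c3", "d4"], sample_response))
```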

How to use lucene SpanQuery in ElasticSearch

For my project, I thought of using the Span Near Queries of ElasticSearch, with the constraint that certain tokens may have to be searched with fuzziness. I was able to generate a set of SpanQuery (org.apache.lucene.search.spans.SpanQuery) objects, some with fuzzy enabled and some without. I couldn't figure out how to use this set of SpanQueries in the ElasticSearch spanNearQuery.
Can someone point me to the right samples or docs? And is there any way to construct an ES SpanNearQueryBuilder with some clauses fuzzy enabled?
You can wrap a fuzzy query into a span query with Span Multi Term Query:
{
"span_near" : {
"clauses" : [
{ "span_term" : { "field" : "value1" } },
{ "span_multi" : {
"match" : {
"fuzzy" : { "field" : { "value" : "value2" } }
}
} }
],
...
}
}
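As a rough sketch of how such a query could be assembled when only some tokens are fuzzy, the helper below emits a `span_term` clause for exact tokens and a `span_multi` clause wrapping a `fuzzy` query for the others. The function names, the field name `body`, and the `slop` and `in_order` values are illustrative assumptions.

```python
def span_clause(field, token, fuzzy=False):
    """Return a span clause for one token, fuzzy or exact."""
    if fuzzy:
        # span_multi wraps a multi-term query (here: fuzzy) under the "match" key.
        return {"span_multi": {"match": {"fuzzy": {field: {"value": token}}}}}
    return {"span_term": {field: token}}

def span_near(field, tokens, slop=1, in_order=True):
    """Build a span_near query from (token, is_fuzzy) pairs."""
    return {
        "span_near": {
            "clauses": [span_clause(field, tok, fz) for tok, fz in tokens],
            "slop": slop,
            "in_order": in_order,
        }
    }

query = span_near("body", [("value1", False), ("value2", True)])
print(query)
```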

Elastic Search boost query corresponding to first search term

I am using PyElasticsearch (elasticsearch python client library). I am searching strings like Arvind Kejriwal India Today Economic Times and that gives me reasonable results. I would like to give the first words in the search query more weight. How can I do that?
res = es.search(index="article-index", fields="url", body={
"query": {
"query_string": {
"query": "keywordstr",
"fields": [
"text",
"title",
"tags",
"domain"
]
}
}
})
I am using the above command to search right now.
Split the given query into multiple terms. In your example these will be Arvind, Kejriwal, and so on. Now form a query string query (or a field query, or any other that fits the need) for each of the terms. A query string query looks like this:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-query-string-query.html
{
"query_string" : {
"default_field" : "content",
"query" : "<one of the given term>",
"boost": <any number>
}
}
Now you have multiple queries like the one above, with different boost values (depending on which terms should carry more weight). Combine all of those queries into one query using a bool query. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
If you want all of the terms to be present in the result, the query will look like this:
{
"bool" : {
"must" : [q1, q2, q3 ...]
}
}
You can use other options of the bool query. For example, if you want any of the 3 terms to be present in the result, the query will look like this:
{
"bool" : {
"should" : [q1, q2, q3 ...],
"minimum_should_match" : 1
}
}
theoretically:
split into terms using api
query against terms with different boosting
Lucene Query Syntax does the trick. Thanks
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Boosting%20a%20Term
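A minimal sketch of the Lucene boosting approach: split the search string on whitespace and append `^boost` to each term, with earlier terms getting larger boosts. The linear weighting (first term gets `n`, last term gets 1) is an arbitrary choice for illustration, and terms containing query-syntax special characters would need escaping first.

```python
def boosted_query(text):
    """Append a Lucene ^boost to each term, decreasing from the first term to the last."""
    terms = text.split()
    n = len(terms)
    return " ".join(f"{term}^{n - i}" for i, term in enumerate(terms))

def build_body(text, fields):
    """Wrap the boosted string in a query_string query over the given fields."""
    return {"query": {"query_string": {"query": boosted_query(text), "fields": fields}}}

body = build_body("Arvind Kejriwal India Today", ["text", "title", "tags", "domain"])
print(body["query"]["query_string"]["query"])
```

The resulting `body` has the same shape as the one passed to `es.search` in the question.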

Resources