How do I return just the fields from a query? - elasticsearch

If I run a search query, I want to be able to select just the distinct fields from the hits/sources. To be clear, I don't want to limit the fields that are returned, I want to select the available fields as a list. Is this possible?
For example, if I ran a typical search and there are three results each with different fields:
R1
"Field1": "foo"
"Field2": "bar"
R2
"Field2": "bara"
R3
"Field1": "fooa"
"Field3": "baz"
The result would be ["Field1", "Field2", "Field3"]
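One possible client-side approach, sketched here under assumptions: run the search as usual, then take the union of the top-level keys of each hit's `_source`. The function name `distinct_source_fields` and the hand-written `sample_response` are hypothetical; the only thing taken from Elasticsearch is the standard `hits.hits[]._source` layout of a search response. Nested objects are not flattened in this sketch.

```python
from typing import Any, Dict, List


def distinct_source_fields(response: Dict[str, Any]) -> List[str]:
    """Return the sorted union of top-level _source keys across all hits."""
    names = set()
    for hit in response.get("hits", {}).get("hits", []):
        names.update(hit.get("_source", {}).keys())
    return sorted(names)


# Hand-written response mirroring R1..R3 from the question (not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"Field1": "foo", "Field2": "bar"}},
            {"_source": {"Field2": "bara"}},
            {"_source": {"Field1": "fooa", "Field3": "baz"}},
        ]
    }
}

print(distinct_source_fields(sample_response))  # ['Field1', 'Field2', 'Field3']
```

This only sees the hits actually returned, so the search `size` limits which fields can show up in the list.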

Related

Create a keyword field concatenated of other fields

I've got an index with a mapping of 3 fields. Let's say f1, f2 and f3.
I want a new keyword field with the concatenation of the values of f1, f2 and f3 to be able to aggregate by it to avoid having lots of nested loops when checking the search results.
I've seen that this could be achieved by source transformation, but that feature was removed in Elasticsearch 5.
Elasticsearch version used: 6.5
Q: How can I achieve the concatenation in Elasticsearch 6.5?
There was indeed source transformation prior to ES 5, but as of ES 5 there is now a more powerful feature called ingest nodes which will allow you to easily achieve what you need:
First, define an ingest pipeline using a set processor that will help you concatenate three fields into one:
PUT _ingest/pipeline/concat
{
  "processors": [
    {
      "set": {
        "field": "field4",
        "value": "{{field1}} {{field2}} {{field3}}"
      }
    }
  ]
}
You can then index a document using that pipeline:
PUT index/doc/1?pipeline=concat
{
  "field1": "1",
  "field2": "2",
  "field3": "3"
}
And the indexed document will look like:
{
  "field1": "1",
  "field2": "2",
  "field3": "3",
  "field4": "1 2 3"
}
Just make sure to create the index with the appropriate mapping for field4 prior to indexing the first document.
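As a rough illustration of that last step, the sketch below builds a create-index body that maps field4 as a keyword. It assumes the ES 6.x mapping layout (with a mapping type name) and reuses the type name `doc` from the `PUT index/doc/1` request above; the helper `keyword_mapping_body` is a hypothetical name, not part of any client library.

```python
import json
from typing import Any, Dict


def keyword_mapping_body(type_name: str, field: str) -> Dict[str, Any]:
    """Build a create-index body mapping `field` as a keyword (ES 6.x layout)."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: {"type": "keyword"}
                }
            }
        }
    }


# Assumed names: type "doc" and field "field4", as in the requests above.
body = keyword_mapping_body("doc", "field4")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a `PUT index` request before the first document is indexed, so that field4 is not dynamically mapped as text.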

Combine queries and order results by score

I want Elasticsearch to execute multiple (multi-match) queries and sort the combined results by score. The score of each query should be calculated independently of the other queries (which, I think, is different from what I have found so far with the bool/should clause).
Example:
Query 1:
"multi_match" : {
  "query": "test",
  "fields": ["a", "b", "c"],
  "tie_breaker": 0.2,
  "minimum_should_match": "50%"
}
Query 2:
"multi_match" : {
  "query": "test2",
  "fields": ["a", "b", "c"],
  "tie_breaker": 0.2,
  "minimum_should_match": "50%"
}
Combine both results and order by score. How can I do that with Elastic?
I believe the Dis Max query is what you are looking for:
A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking
increment for any additional matching subqueries.
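A minimal sketch of how the two queries from the question might be wrapped in a `dis_max` query, plus a small function that mirrors the scoring rule quoted above. The helper names (`multi_match`, `dis_max_body`, `dis_max_score`) are hypothetical, and the outer `tie_breaker` of 0.0 is an assumption chosen so that each document is scored only by its best-matching query.

```python
from typing import Any, Dict, List


def multi_match(text: str, fields: List[str]) -> Dict[str, Any]:
    """One multi_match clause with the settings used in the question."""
    return {
        "multi_match": {
            "query": text,
            "fields": fields,
            "tie_breaker": 0.2,
            "minimum_should_match": "50%",
        }
    }


def dis_max_body(queries: List[Dict[str, Any]], tie_breaker: float = 0.0) -> Dict[str, Any]:
    """Wrap the clauses in a dis_max query (search request body)."""
    return {"query": {"dis_max": {"queries": queries, "tie_breaker": tie_breaker}}}


def dis_max_score(scores: List[float], tie_breaker: float) -> float:
    """Quoted rule: maximum score plus tie_breaker times the other scores."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


fields = ["a", "b", "c"]
body = dis_max_body([multi_match("test", fields), multi_match("test2", fields)])
print(body)
print(dis_max_score([2.0, 1.0], 0.5))  # 2.5
```

Results are sorted by `_score` by default, so no explicit sort clause is needed in this sketch.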

Elasticsearch simple query string: removing documents containing words

I created a toy example to express what I mean. Suppose we have an index whose documents contain the words Text and Texture.
I'd like to select all documents containing the word Text (I'm using the simple query string).
When I use the query "query": "Text", I get areas 1, 2 and 3 from the picture below.
When I use the query "query": "Text -Texture", I get only area 3 from the picture below.
How could I get both areas 2 and 3?
Thanks.
To understand your problem you need to post your query.
Try to use term:
{
  "query": {
    "term": {
      "myField": "Text"
    }
  }
}

elasticsearch: or operator, number of matches

Is it possible to score my searches according to the number of matches when using operator "or"?
Currently the query looks like this:
"query": {
  "function_score": {
    "query": {
      "match": {
        "tags.eng": {
          "query": "apples banana juice",
          "operator": "or",
          "fuzziness": "AUTO"
        }
      }
    },
    "script_score": {
      "script": # TODO
    },
    "boost_mode": "replace"
  }
}
I don't want to use "and" operator, since I want documents containing "apple juice" to be found, as well as documents containing only "juice", etc. However a document containing the three words should score more than documents containing two words or a single word, and so on.
I found a possible solution here https://github.com/elastic/elasticsearch/issues/13806
which uses bool queries. However I don't know how to access the tokens (in this example: apples, banana, juice) generated by the analyzer.
Any help?
Based on the discussions above I came up with the following solution, which is a bit different than I imagined when I asked the question, but works for my case.
First of all I defined a new similarity:
"settings": {
  "similarity": {
    "boost_similarity": {
      "type": "scripted",
      "script": {
        "source": "return 1;"
      }
    }
  }
  ...
}
Then I had the following problem:
A query for "apple banana juice" gave the same score to a doc with tags ["apple juice", "apple"] and to another doc with tags ["banana", "apple juice"], although I would like the second one to score higher.
From this other discussion I found out that the issue was caused by my tags being a nested field, so I created a regular text field to address it.
But I also wanted to distinguish between a doc with tags ["apple", "banana", "juice"] and another doc with the tag ["apple banana juice"] (all three words in the same tag). The final solution was therefore to keep both fields (a nested and a text field) for my tags.
Finally, the query consists of a bool query with two should clauses: the first should clause is performed on the text field and uses an "or" operator. The second should clause is performed on the nested field and uses an "and" operator.
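The final query described above might look roughly like the following sketch. The field names are assumptions, since the answer does not give them: `tags_text` stands in for the plain text field, and `tags` / `tags.eng` for the nested path and its text subfield. The helper `tag_query` is hypothetical.

```python
from typing import Any, Dict


def tag_query(
    text: str,
    text_field: str = "tags_text",   # assumed name of the plain text field
    nested_path: str = "tags",       # assumed nested path
    nested_field: str = "tags.eng",  # assumed field inside the nested objects
) -> Dict[str, Any]:
    """Bool query with two should clauses, as described in the answer."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Clause 1: any of the words may match on the text field.
                    {"match": {text_field: {"query": text, "operator": "or"}}},
                    # Clause 2: all words must match within a single nested tag.
                    {
                        "nested": {
                            "path": nested_path,
                            "query": {
                                "match": {
                                    nested_field: {"query": text, "operator": "and"}
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


body = tag_query("apple banana juice")
print(body)
```

With this shape, a document whose single tag contains all three words matches both clauses, while a document that only has the words spread over several tags matches the first clause alone.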
Although I found a solution for this specific issue, I still face a few other problems when using ES to search for tagged documents. The examples in the documentation seem to work very well for full-text search. But does someone know where I can find something more specific to tagged documents?

How do boolean predicates work in Elasticsearch query string syntax

I have a question regarding the ES query string syntax. I am searching Logstash log entries containing XML documents and I'd like to search for documents containing certain XML attributes with certain values. When searching for:
id: foobar AND attrName=SomeValue
In my data set this query finds, let's say, 100 documents.
When searching for:
id: foobar AND attrName SomeValue
I get fewer documents. Why is that, when according to the query_string docs the default operator is OR?
When I escape the " character and query like this I get the correct results:
id: foobar AND attrName=\"SomeValue\"
I'm running the query using the following json:
{
  "sort": [
    "@timestamp"
  ],
  "query": {
    "query_string": {
      "query": "mySearchText"
    }
  },
  "fields": [
    "_id"
  ],
  "size": 100
}
Any tips on how to search in XML documents containing only elements and attributes but no text nodes?
Edit #1: I just stumbled upon another thing I don't understand. Why is this query:
a AND b OR c
different from:
a AND (b OR c)
Any tips on how these queries are evaluated?
Edit #2: Okay I think I nailed down what behaviour is confusing me.
When my query string looks like this:
id: foo AND attrName=\"SomeValue\" AND field2:bar
I get all documents where:
- id=foo
- field2=bar
- contain the text attrName AND the text SomeValue
When I change my query to (added parentheses):
id: foo AND (attrName=\"SomeValue\") AND field2:bar
I get all documents where:
- id=foo
- field2=bar
- contain the text attrName OR the text SomeValue
Why is (attrName=\"SomeValue\") evaluated as attrName OR SomeValue, whereas without parentheses it is attrName AND SomeValue?
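As a side note on Edit #1, the two expressions already differ as plain Boolean formulas, independent of any Elasticsearch specifics. The sketch below enumerates the truth table using Python's own operators, where `and` binds tighter than `or`; this is only an analogy, since the query_string parser is not guaranteed to follow the same precedence, and the Elasticsearch documentation for query_string advises using parentheses whenever several operators are combined. The function names are hypothetical.

```python
from itertools import product


def without_parens(a: bool, b: bool, c: bool) -> bool:
    # Python precedence: evaluated as (a and b) or c
    return a and b or c


def with_parens(a: bool, b: bool, c: bool) -> bool:
    return a and (b or c)


# Collect every assignment where the two formulas disagree.
differing = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if without_parens(a, b, c) != with_parens(a, b, c)
]
print(differing)  # [(False, False, True), (False, True, True)]
```

Under this reading, the forms disagree exactly when a is false and c is true: a document matching only c satisfies the unparenthesized form but not the parenthesized one.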
