Elasticsearch: Is it possible to query for a term facet that contains more than one term?

Part of my mapping looks like this:
{
...
INFO_NODO: {
properties: {
CODIGO: {
type: string
}
ESTADO: {
type: string
}
IN_HOME: {
type: string
}
TEXTO: {
type: string
}
ID_NODO: {
type: integer
}
...
}
}
}
I need to build a facet that returns the fields ID_NODO, TEXTO, IN_HOME, ESTADO, and CODIGO, plus a COUNT, so that I can parse the result and feed it to my application. The key point is that every field except COUNT depends on ID_NODO: if ID_NODO is the same, the rest of the information is the same. Ideally, then, I would like the facet to be keyed on the whole INFO_NODO field rather than on its sub-fields.
I found several suggested solutions, but either I fail to implement them properly or they simply do not work. Any thoughts on my unusual situation?
EDIT: What I'd need to do is:
{
"facets": {
"FACET_X_NODO": {
"terms": {
"field": "INFO_NODO"
}
}
}
}
I just can't find the syntax in any documentation, since INFO_NODO is a subdocument and not a field.

If I understood you correctly, you should be able to do something like this:
{
"query" : {
"match_all" : { }
},
"facets" : {
"info_node_facet" : {
"terms" : {
"script_field" : "_source.INFO_NODO.CODIGO + _source.INFO_NODO.ESTADO",
"size" : 10
}
}
}
}
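Since the question asks for all five fields plus the count, one way to extend this idea is to join the sub-fields with a delimiter in the script and split each returned term on the client. The sketch below is JavaScript; the `|` delimiter, the helper names, and the sample response are assumptions for illustration, and it relies on the legacy terms facet returning `terms` as a list of `{ term, count }` entries.

```javascript
// Sub-fields of INFO_NODO to pack into one facet term (order matters for parsing).
const FIELDS = ["ID_NODO", "TEXTO", "IN_HOME", "ESTADO", "CODIGO"];
const DELIMITER = "|"; // assumption: never occurs inside the field values

// Build the request body with a script that concatenates the sub-fields.
function buildFacetRequest(facetName, size) {
  const script = FIELDS
    .map((f) => `_source.INFO_NODO.${f}`)
    .join(` + '${DELIMITER}' + `);
  return {
    query: { match_all: {} },
    facets: { [facetName]: { terms: { script_field: script, size } } },
  };
}

// Split each facet term back into named fields and attach the count.
function parseFacetTerms(response, facetName) {
  return response.facets[facetName].terms.map(({ term, count }) => {
    const parts = String(term).split(DELIMITER);
    const row = { COUNT: count };
    FIELDS.forEach((f, i) => { row[f] = parts[i]; });
    return row;
  });
}

// Hypothetical response, shaped like a legacy terms facet result.
const sampleResponse = {
  facets: { info_node_facet: { terms: [{ term: "7|Home|S|A|X1", count: 12 }] } },
};

const request = buildFacetRequest("info_node_facet", 10);
const rows = parseFacetTerms(sampleResponse, "info_node_facet");
console.log(JSON.stringify(request, null, 2));
console.log(rows);
```

Because every field except the count is determined by ID_NODO, the number of distinct terms stays the same as when faceting on ID_NODO alone; the extra fields just ride along in the term.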

Related

Spring Boot Mongo update nested array of documents

I'm trying to set an attribute of a document inside an array to uppercase.
This is a document example
{
"_id": ObjectId("5e786a078bc3b3333627341e"),
"test": [
{
"itemName": "alpha305102992",
"itemNumber": ""
},
{
"itemName": "beta305102630",
"itemNumber": "P5000"
},
{
"itemName": "gamma305102633 ",
"itemNumber": ""
}]
}
I have already tried a lot of things.
private void NameElementsToUpper() {
AggregationUpdate update = AggregationUpdate.update();
//This one does not work
update.set("test.itemName").toValue(StringOperators.valueOf(test.itemName).toUpper());
//This one also
update.set(SetOperation.set("test.$[].itemName").withValueOfExpression("test.#this.itemName"));
//And every variant in between these two.
// ...
Query query = new Query();
UpdateResult result = mongoTemplate.updateMulti(query, update, aClass.class);
log.info("updated {} records", result.getModifiedCount());
}
I see that the Fields class in Spring Data hooks into the "$" character and treats it specially when it appears in a field name, but I cannot find the relevant documentation.
EDIT: The following update seems to work, but I have not managed to translate it into spring-batch-mongo code:
db.collection.update({},
[
{
$set: {
"test": {
$map: {
input: "$test",
in: {
$mergeObjects: [
"$$this",
{
itemName: {
$toUpper: "$$this.itemName"
}
}
]
}
}
}
}
}
])
Any solutions?
Thanks!
For now I'm using the following, which does what I need, but a Spring Data way would be cleaner.
mongoTemplate.getDb().getCollection(mongoTemplate.getCollectionName(Application.class)).updateMany(
new BasicDBObject(),
Collections.singletonList(BasicDBObject.parse("""
{
$set: {
"test": {
$map: {
input: "$test",
in: {
$mergeObjects: [
"$$this",
{
itemName: { $toUpper: "$$this.itemName" }
}
]
}
}
}
}
}
"""))
);
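To make clear what the pipeline stage computes, here is a plain JavaScript sketch of the same transformation applied to an in-memory document. It only mirrors the semantics of `$map`, `$mergeObjects`, and `$toUpper`; it does not talk to MongoDB, and the function name is made up for illustration.

```javascript
// Mirrors: { $set: { test: { $map: { input: "$test", in: { $mergeObjects: [...] } } } } }
// Assumes every array element has a string itemName.
function upperCaseItemNames(doc) {
  return {
    ...doc,
    test: doc.test.map((item) => ({
      ...item, // "$$this": keep every existing field of the element
      itemName: item.itemName.toUpperCase(), // $toUpper: "$$this.itemName"
    })),
  };
}

const before = {
  test: [
    { itemName: "alpha305102992", itemNumber: "" },
    { itemName: "beta305102630", itemNumber: "P5000" },
    { itemName: "gamma305102633 ", itemNumber: "" },
  ],
};

const after = upperCaseItemNames(before);
console.log(after.test.map((item) => item.itemName));
```

The point of `$mergeObjects` is visible in the spread: only `itemName` is overwritten, while `itemNumber` and any other fields of each element are carried over unchanged.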

Translate ElasticSearch query to Nest c#

I need some help in creating an AggregationDictionary from the following elasticsearch query
GET organisations/_search
{
"size": 0,
"aggs": {
"by_country": {
"nested": {
"path": "country"
},
"aggs": {
"by_country2": {
"filter": {
"bool": {
"must": [
{
"term": {
"country.isDisplayed": "true"
}
}
]
}
},
"aggs": {
"by_country3": {
"terms": {
"field": "country.displayName.keyword",
"size": 9999
}
}
}
}
}
}
}
}
I managed to write this horrible piece of code, which I am pretty sure is wrong; I am totally new to this.
AggregationDictionary aggs = new AggregationDictionary()
{
{
"countries_step1",
new NestedAggregation("countries_step1")
{
Path = "country",
Aggregations = new AggregationDictionary()
{
{
"countries_step2",
new FilterAggregation("countries_step2")
{
Filter = new BoolQuery
{
Must = new QueryContainer[] {
new NestedQuery
{
Query = new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
}
},
Aggregations = new AggregationDictionary
{
{
"countries_step3",
new TermsAggregation("countries_step3")
{
Field = "country.displayName.keyword",
Size = 9999
}
}
}
}
}
}
}
}
};
Can someone tell me if I am heading in the right direction? I am using NEST 6.6.0. Is there any tool that helps with these translations?
What you have so far is pretty solid, but when you try to execute this aggregation with the following call
var searchAsync = await client.SearchAsync<Document>(s => s.Size(0).Aggregations(aggs));
you will get this error
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "query malformed, empty clause found at [14:22]"
}
],
"type" : "illegal_argument_exception",
"reason" : "query malformed, empty clause found at [14:22]"
},
"status" : 400
}
Checking the request that was sent to Elasticsearch shows why this happened:
{
"aggs": {
"countries_step1": {
"aggs": {
"countries_step2": {
"aggs": {
"countries_step3": {
"terms": {
"field": "country.displayName.keyword",
"size": 9999
}
}
},
"filter": {}
}
},
"nested": {
"path": "country"
}
}
},
"size": 0
}
The filter clause is empty because you used a nested query without passing the path parameter. We don't need a nested query here (your example query doesn't use one either), so we can simplify the whole thing to
var aggs = new AggregationDictionary()
{
{
"countries_step1",
new NestedAggregation("countries_step1")
{
Path = "country",
Aggregations = new AggregationDictionary()
{
{
"countries_step2",
new FilterAggregation("countries_step2")
{
Filter = new BoolQuery
{
Must = new QueryContainer[]
{
new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
},
Aggregations = new AggregationDictionary
{
{
"countries_step3",
new TermsAggregation("countries_step3")
{
Field = "country.displayName.keyword",
Size = 9999
}
}
}
}
}
}
}
}
};
Now we have a valid request sent to elasticsearch.
There are a couple of things we can improve here:
1. Remove unnecessary bool query
Filter = new BoolQuery
{
Must = new QueryContainer[]
{
new TermQuery
{
Field = "country.isDisplayed",
Value = true
}
}
},
to
Filter =
new TermQuery
{
Field = "country.isDisplayed",
Value = true
},
2. Replace string field names
Usually, when calling Elasticsearch from .NET, there is a POCO type that lets us write strongly typed requests, which keeps the code clean and makes refactoring easier. With such a type, we can change the field definition from
"country.displayName.keyword"
to
Infer.Field<Document>(f => f.Country.FirstOrDefault().DisplayName.Suffix("keyword"))
My type definitions:
public class Document
{
public int Id { get; set; }
[Nested]
public List<Country> Country { get; set; }
}
public class Country
{
public bool IsDisplayed { get; set; }
public string DisplayName { get; set; }
}
3. Consider using a fluent syntax
With NEST you can write queries in two ways: with object initializer syntax (which you used) or with the fluent syntax. Writing the query above with the fluent syntax gives something like
var searchResponse = await client.SearchAsync<Document>(s => s
.Size(0)
.Aggregations(a => a.Nested("by_country", n => n
.Path(p => p.Country)
.Aggregations(aa => aa
.Filter("by_country2", f => f
.Filter(q => q
.Term(t => t
.Field(field => field.Country.FirstOrDefault().IsDisplayed)
.Value(true)))
.Aggregations(aaa => aaa
.Terms("by_country3", t => t
.Field(field => field.Country.FirstOrDefault().DisplayName.Suffix("keyword"))
.Size(9999)
)))))));
which I find a little easier to follow and write; maybe it will suit you better as well.
As a final note, have a look at the docs to see how you can debug your queries.
Hope that helps.
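The empty "filter": {} in the serialized request is what triggers the error, so a quick check on the request body can catch it before it is sent. The sketch below, in JavaScript rather than C#, walks an aggregation tree shaped like the JSON above and reports filter aggregations with an empty clause. The function name is hypothetical, and it only looks at the `filter` and `aggs` keys used in this example.

```javascript
// Recursively collect the paths of filter aggregations whose clause is empty.
function findEmptyFilters(aggs, path = []) {
  const problems = [];
  for (const [name, agg] of Object.entries(aggs || {})) {
    const here = [...path, name];
    if ("filter" in agg && Object.keys(agg.filter).length === 0) {
      problems.push(here.join(" > "));
    }
    problems.push(...findEmptyFilters(agg.aggs, here));
  }
  return problems;
}

// The request body that was rejected (filter serialized as {}).
const broken = {
  countries_step1: {
    nested: { path: "country" },
    aggs: {
      countries_step2: {
        filter: {},
        aggs: {
          countries_step3: {
            terms: { field: "country.displayName.keyword", size: 9999 },
          },
        },
      },
    },
  },
};

// The same tree with a plain term filter in place of the dropped nested query.
const fixed = JSON.parse(JSON.stringify(broken));
fixed.countries_step1.aggs.countries_step2.filter = {
  term: { "country.isDisplayed": true },
};

console.log(findEmptyFilters(broken));
console.log(findEmptyFilters(fixed));
```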

GraphQL filters in GatsbyJS

I'm having trouble understanding how to write filters for GraphQL queries in GatsbyJS.
This works:
filter: { contentType: { in: ["post", "page"] } }
I basically need the reverse of that, like:
filter: { "post" in: { contentTypes } } // where contentTypes is array
That doesn't work because "NAME is expected" (where "post" is in my example).
After going through GatsbyJS docs I found this:
elemMatch: short for element match, this indicates that the field you are filtering will return an array of elements, on which you can apply a filter using the previous operators
filter:{
packageJson:{
dependencies:{
elemMatch:{
name:{
eq:"chokidar"
}
}
}
}
}
Great! That's what I need! So I try that, and I get:
error GraphQL Error Field "elemMatch" is not defined by type markdownRemarkConnectionFrontmatterTagsQueryList_2.
Keywords defined in markdownRemarkConnectionFrontmatterTagsQueryList_2 are:
eq: string | null;
ne: string | null;
regex: string | null;
glob: string | null;
in: Array | null;
Why am I limited to these keywords when more keywords such as elemMatch are mentioned in docs? Why am I not allowed to use the filter structure "element in: { array }"?
How can I create this filter?
Filter by value in an array
Let's say you have a markdown blog with categories as an array of strings; you can filter posts that have "historical" in categories like this:
{
allMarkdownRemark(filter:{
frontmatter:{
categories: {
in: ["historical"]
}
}
}) {
edges {
node {
id
frontmatter {
categories
}
}
}
}
}
You can try this query out in any of the GraphiQL blocks in the Gatsby.js docs.
ElemMatch
I think elemMatch is only 'turned on' for fields that hold an array of objects, something like comments: [{ id: "1", content: "" }, { id: "2", content: ""}]. This way, you can apply further filters on the fields of each comment:
comments: { elemMatch: { id: { eq: "1" } } }
Here's an example you can try out in the GraphiQL blocks in the Gatsby docs:
# only show plugins which have "@babel/runtime" as a dependency
{
allSitePlugin (filter:{
packageJson:{
dependencies: {
elemMatch: {
name: {
eq: "@babel/runtime"
}
}
}
}
}) {
edges {
node {
name
version
packageJson {
dependencies {
name
}
}
}
}
}
}
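As a mental model of the two operators, the following JavaScript sketch mimics how they appear to behave: `in` on an array field matches when any element is in the given list, while `elemMatch` applies a sub-filter to each object in an array. This illustrates the semantics as described above, not Gatsby's actual implementation, and the helper names and sample data are made up.

```javascript
// "in": true when any value of the field appears in the allowed list.
function matchesIn(fieldValue, allowed) {
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  return values.some((v) => allowed.includes(v));
}

// "elemMatch": true when at least one object in the array passes the sub-filter.
function matchesElemMatch(items, predicate) {
  return Array.isArray(items) && items.some(predicate);
}

// Hypothetical posts with an array of strings in frontmatter.
const posts = [
  { id: "a", categories: ["historical", "travel"] },
  { id: "b", categories: ["cooking"] },
];
const historical = posts
  .filter((p) => matchesIn(p.categories, ["historical"]))
  .map((p) => p.id);

// Hypothetical plugins with an array of objects.
const plugins = [
  { name: "plugin-one", dependencies: [{ name: "chokidar" }, { name: "lodash" }] },
  { name: "plugin-two", dependencies: [{ name: "lodash" }] },
];
const usingChokidar = plugins
  .filter((p) => matchesElemMatch(p.dependencies, (d) => d.name === "chokidar"))
  .map((p) => p.name);

console.log(historical, usingChokidar);
```

This is why `in: ["post"]` already covers the "element in array" case for an array of strings, and why `elemMatch` is rejected there: there are no object fields to filter on.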

Elasticsearch custom sorting / adding filter clauses scores

I have this simple set of documents:
{
id : 1,
book_ids : [2,3],
collection_ids : ['a','b']
},
{
id : 2,
book_ids : [1,2]
}
If I run this filter query, it will match both documents:
{
bool: {
filter: [
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'book_ids'
}
}
}
},
{
bool: {
filter: {
term: {
book_ids: 2
}
}
}
}
]
}
},
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'collection_ids'
}
}
}
},
{
bool: {
filter: {
term: {
collection_ids: 'a'
}
}
}
}
]
}
}
]
}
}
The thing is I want to sort these documents, and I would like the first one (id: 1) to be returned first because it matched both the book_ids value and the collection_ids values provided.
A simple sort clause like this one is not working:
[
'book_ids',
'collection_ids'
]
because it returns document 2 first, owing to the first value of its book_ids array.
Edit: this is a simplified example of the problem I am facing, which has N such clauses in the should clause. Moreover, there is an order between the clauses, as I tried to reflect with the sort snippet: results matching the first clause (book_ids) should appear before results matching the second clause (collection_ids). I am really looking for something like an SQL sort operation that takes into account only the matching value of the array field. A viable option might be to assign decreasing constant_score values to each term clause, according to the expected sort order, and have ES sum these sub-scores to compute the final score. But I cannot figure out how to do it, or whether it is even possible.
Bonus question:
Is there any way for Elasticsearch to return a new document containing only the matching values? Here is what I would expect as a response to the above filter query:
{
id : 1,
book_ids : [2],
collection_ids : ['a']
},
{
id : 2,
book_ids : [2]
}
I think you're right about the constant score idea. You should be able to do it like this:
{
query: {
bool: {
must: [
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'book_ids'
}
}
}
},
{
constant_score: {
filter: {
term: {
book_ids: 2
}
},
boost: 100
}
}
]
}
},
{
bool: {
should: [
{
bool: {
must_not: {
exists: {
field: 'collection_ids'
}
}
}
},
{
constant_score: {
filter: {
term: {
collection_ids: 'a'
}
},
boost: 50
}
}
]
}
}
]
}
}
}
I think the only thing you were missing with constant score is that the top-level clause needs to be must, not filter. (Filters are not scored; all the scores are 0.)
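To see why the desired ordering falls out of the boosts, here is a rough JavaScript sketch of the arithmetic for the two sample documents. It assumes each matching constant_score clause contributes exactly its boost, that the must_not/exists branch contributes nothing worth counting, and that the scores of the must clauses are summed. Real scores may differ by Elasticsearch version, so treat it as an illustration only.

```javascript
// One entry per should-group in the query, in priority order.
const clauses = [
  { field: "book_ids", value: 2, boost: 100 },
  { field: "collection_ids", value: "a", boost: 50 },
];

// Returns the summed boosts, or null when the document does not match at all.
function score(doc) {
  let total = 0;
  for (const { field, value, boost } of clauses) {
    const values = doc[field];
    if (values === undefined) continue; // field missing: passes via must_not/exists
    if (!values.includes(value)) return null; // field present without the value: no match
    total += boost;
  }
  return total;
}

const docs = [
  { id: 1, book_ids: [2, 3], collection_ids: ["a", "b"] },
  { id: 2, book_ids: [1, 2] },
];

const ranked = docs
  .map((d) => ({ id: d.id, score: score(d) }))
  .filter((r) => r.score !== null)
  .sort((a, b) => b.score - a.score);

console.log(ranked); // [ { id: 1, score: 150 }, { id: 2, score: 100 } ]
```

With N clauses, picking boosts so that each one exceeds the sum of all lower-priority boosts (for example powers of two) keeps a match on an earlier clause ahead of any combination of later ones.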
An alternative would be to put the filter inside a function_score query (but leave it as a filter), and then compute the score as you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html)
As to the bonus question, it's possible if you use a script field to filter and add a new field like you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html), but it's not possible in a straightforward way. It's probably easier and makes more sense to do that filtering after you receive the result, unless you have very long lists in your values.
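Doing that filtering on the client could look like the JavaScript sketch below. It trims each array field of a returned document down to the values that were actually requested; the `wanted` map and the function name are hypothetical.

```javascript
// The values used in the term clauses of the query, keyed by field name.
const wanted = { book_ids: [2], collection_ids: ["a"] };

// Return a copy of the document whose array fields keep only the requested values.
function keepMatchingValues(doc, wantedValues) {
  const out = { ...doc };
  for (const [field, values] of Object.entries(wantedValues)) {
    if (Array.isArray(doc[field])) {
      out[field] = doc[field].filter((v) => values.includes(v));
    }
  }
  return out;
}

// Documents as they would come back in the hits' _source.
const hits = [
  { id: 1, book_ids: [2, 3], collection_ids: ["a", "b"] },
  { id: 2, book_ids: [1, 2] },
];

const trimmed = hits.map((h) => keepMatchingValues(h, wanted));
console.log(trimmed);
```

Fields that are absent from a document, such as collection_ids on document 2, stay absent, which matches the response shape asked for in the question.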

Implementing Fuzziness in Autocomplete in ElasticSearch

I have implemented Elasticsearch autocomplete. This is the current query that I use (node.js - elasticsearch.js):
body: {
query: {
match_phrase_prefix: {
schoolname: {
query: clientSearchterm,
slop: 10,
max_expansions: 50,
fuzzy : {
fuzziness : 2
}
}
}
}
}
It works just fine. How do I implement the fuzziness parameter?
Simple. Just add this:
"fuzzy" : {
"fuzziness" : 2
}
