How to do mongodb aggregation lookup using mongotemplate that involves a localfield array? - spring

I have two collections, Customers and Accounts. Customers contains an accounts field that is an array of account ids. Using an aggregation lookup, I want to join each entry in a customer's account list to the corresponding account object. According to the MongoDB 5 manual this is perfectly doable, and in fact I can create an aggregation pipeline in both the mongo shell and MongoDB Compass like the following (and get the correct results):
db.customers.aggregate([{ "$lookup" : { "from" : "accounts", "localField" : "accounts", "foreignField" : "account_id", "as" : "accountInfo"}},{ "$limit": 10 }] )
In my Java model for Customer I added an additional field, List accountInfo, and ran the following query using my MongoTemplate from the Customer repo:
LookupOperation stage = Aggregation.lookup("accounts", "accounts", "account_id", "accountInfo");
Aggregation aggregation = Aggregation.newAggregation(stage);
AggregationResults<Customer> aggResults = secondaryMongoTemplate.aggregate(aggregation,"customers",Customer.class);
List<Customer> results = aggResults.getMappedResults();
When I run this, I get neither errors nor results. Any thoughts?
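For reference, the result that the shell pipeline produces can be pictured with a small in-memory sketch. It is not Spring or MongoDB driver code; it only mimics how $lookup treats an array localField, where each element of the array is matched against the foreign field. The class name LookupSketch and the sample values are hypothetical, and the order of the matched accounts here is an artifact of the loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LookupSketch {

    // Mimics $lookup with an array localField: every account whose "account_id"
    // appears in the customer's "accounts" array is collected under "accountInfo".
    static Map<String, Object> lookup(Map<String, Object> customer,
                                      List<Map<String, Object>> accounts) {
        Object raw = customer.get("accounts");
        // A missing "accounts" field is treated as an empty array in this sketch.
        List<?> ids = (raw instanceof List) ? (List<?>) raw : List.of();
        List<Map<String, Object>> matched = new ArrayList<>();
        for (Map<String, Object> account : accounts) {
            Object id = account.get("account_id");
            if (id != null && ids.contains(id)) {
                matched.add(account);
            }
        }
        // Copy the customer and attach the joined accounts, leaving the input untouched.
        Map<String, Object> joined = new LinkedHashMap<>(customer);
        joined.put("accountInfo", matched);
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<Map<String, Object>> accounts = List.of(
            Map.<String, Object>of("account_id", 101, "limit", 1000),
            Map.<String, Object>of("account_id", 102, "limit", 5000),
            Map.<String, Object>of("account_id", 103, "limit", 9000));
        Map<String, Object> customer = Map.<String, Object>of(
            "name", "Example Customer", "accounts", List.of(101, 103));
        System.out.println(lookup(customer, accounts).get("accountInfo"));
    }
}
```

If the MongoTemplate call returns mapped Customer objects, each one would be expected to carry an accountInfo list of this shape.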

Related

How to set OpType on IndexQuery in Spring Data Elasticsearch

Assume spring-boot-starter-data-elasticsearch version 2.1.0.RC1.
Take the following, simple implementation for indexing an entity:
IndexQuery indexQuery = new IndexQueryBuilder().withId(entity.getId()).withObject(entity).build();
String id = elasticsearchTemplate.index(indexQuery);
How do I set OpType.CREATE on this operation, so that only documents that don't already exist get indexed?
The equivalent REST API request would look like the following:
POST /{index}/{entity id}?op_type=create
{
"id" : "{entity id}",
"attribute" : "value"
}
This is not supported at the moment by Spring Data ES.
There's an open issue requesting exactly that feature; you might want to check it out: https://jira.spring.io/browse/DATAES-247

Logstash -> Elasticsearch - update denormalized data

Use case explanation
We have a relational database with data about our day-to-day operations. The goal is to allow users to search the important data with a full-text search engine. The data is normalized and thus not in the best form to make full-text queries, so the idea was to denormalize a subset of the data and copy it in real-time to Elasticsearch, which allows us to create a fast and accurate search application.
We already have a system in place that enables Event Sourcing of our database operations (inserts, updates, deletes). The events only contain the changed columns and primary keys (on an update we don't get the whole row). Logstash already gets notified for each event, so this part is already handled.
Actual problem
Now we are getting to our problem. Since the plan is to denormalize our data we will have to make sure updates on parent objects are propagated to the denormalized child objects in Elasticsearch. How can we configure logstash to do this?
Example
Let's say we maintain a list of Employees in Elasticsearch. Each Employee is assigned to a Company. Since the data is denormalized (for the purpose of faster search), each Employee also carries the name and address of the Company. An update changes the name of a Company. How can we configure Logstash to update the company name in all Employees assigned to that Company?
Additional explanation
@Darth_Vader:
The problem we are facing is that we get an event saying a Company has changed, but we want to modify documents of type Employee in Elasticsearch, because they carry the company data themselves. Your answer expects that we will get an event for every Employee, which is not the case.
Maybe this will make it clearer. We have 3 employees in Elasticsearch:
{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}
Then an update happens in the source DB.
UPDATE company SET name = 'Company NEW' WHERE cmp_id = 1;
We get an event in logstash, where it says something like this:
{type:'company',cmp_id:'1',old.name:'Company A',new.name:'Company NEW'}
This should then be propagated to Elasticsearch, so that the resulting employees are:
{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}
Notice that the field company.name changed.
I suggest a similar solution to what I've posted here, i.e. to use the http output plugin in order to issue an update by query call to the Employee index. The query would need to look like this:
POST employees/_update_by_query
{
"script": {
"source": "ctx._source.company.name = params.name",
"lang": "painless",
"params": {
"name": "Company NEW"
}
},
"query": {
"term": {
"company.cmp_id": "1"
}
}
}
So your Logstash config should look like this:
input {
...
}
filter {
mutate {
add_field => {
"[script][lang]" => "painless"
"[script][source]" => "ctx._source.company.name = params.name"
"[script][params][name]" => "%{new.name}"
"[query][term][company.cmp_id]" => "%{cmp_id}"
}
remove_field => ["host", "@version", "@timestamp", "type", "cmp_id", "old.name", "new.name"]
}
}
output {
http {
url => "http://localhost:9200/employees/_update_by_query"
http_method => "post"
format => "json"
}
}
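The update-by-query body above has only two variable parts: the new company name and the company id. As a minimal sketch of that templating step, the following Java builds the same body from the values of a company event. The class and method names are hypothetical, and the escaping is deliberately simplistic; a real implementation would more likely use a JSON library.

```java
/** Hypothetical helper: turns a company-rename event into an _update_by_query body. */
final class CompanyRenameRequest {

    // Escapes backslashes and double quotes only. Control characters are not
    // handled, so this is a sketch rather than a complete JSON encoder.
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the same body as the request shown above, with the company id
    // and the new name taken from the event.
    static String buildBody(String cmpId, String newName) {
        return "{"
            + "\"script\":{"
            + "\"source\":\"ctx._source.company.name = params.name\","
            + "\"lang\":\"painless\","
            + "\"params\":{\"name\":\"" + jsonEscape(newName) + "\"}"
            + "},"
            + "\"query\":{\"term\":{\"company.cmp_id\":\"" + jsonEscape(cmpId) + "\"}}"
            + "}";
    }

    public static void main(String[] args) {
        // Values from the example event: cmp_id 1 renamed to "Company NEW".
        System.out.println(buildBody("1", "Company NEW"));
    }
}
```

Passing the name through params rather than concatenating it into the script source keeps the script text constant across events.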

Elasticsearch: document relationship

I'm building an Elasticsearch autocomplete-as-you-type feature.
I'm using features like n-grams to create the analyzers I need.
Currently I'm racking my brain over how to index the following data.
Let's say I have a Payments type,
and each document in this type looks like this:
{
..elastic meta data..
paymentId: 123453425342,
providerAccount : {
id: 123456,
firstName: Alex,
lastName: Web
},
consumerAccount : {
id: 7575757,
firstName: John,
lastName: Doe
},
amount: 556,
date : 342523454235345 (some unix timestamp)
}
So basically this document represents not only the payment itself but also its relationships: the two entities related to the payment.
A payment always has a provider and a consumer.
I need this data in the payment document because I want to show it in the UI.
Indexed like this, updates to a Consumer or Provider could be a big pain, because each time one of them changes its properties I have to update all the payments that contain this entity.
Another possible solution is to store only the ids of the consumers/providers, query the payments, and then run two more queries to retrieve the needed entity fields. I'm not sure about this, though, because I'm making an ajax request each time a character is entered, which raises a performance question.
I have also looked into the parent/child relationship solution, which basically fits my case, but I wasn't able to figure out whether I can also retrieve the parent (consumer/provider) fields while querying the child (payment).
What would you suggest?
Thanks!
Yes, you can retrieve the parent while querying the child using has_child.
Considering payment as the child and consumer as the parent, you can search all the consumers with:
GET /index_name/consumer/_search
{
"query": {
"has_child": {
"type": "payment",
"query": {
// any query on payment table
},
"inner_hits": {}
}
}
}
This fetches all the consumers that match the query on the child, i.e. payment in your case.
inner_hits is what you are looking for. It retrieves the children as well. It was introduced in Elasticsearch 1.5.0, so you need version 1.5.0 or later.
See https://www.elastic.co/blog/elasticsearch-1-5-0-released.
Your problem is not an issue. I suppose you want to freeze the data after the payment, right? So you don't need to update the account data in existing payment documents.
Further: parent/child is easy to update, but less efficient to query. For autocomplete, stay with your current mapping!

get count of items not listed in a parent-child relationship model in elasticsearch

Let's say that we have employee and department types stored in an Elasticsearch index. I need the following queries:
Count the number of employees that are assigned to any department (not a specific department). The employee just has to be assigned to some department, that's all.
Count the number of employees that aren't assigned to any department yet.
I am oversimplifying my question with a toy example to make clearer what is needed.
Any thoughts/help on this are appreciated.
Assume that your employees type has a field like this
{ "department" : "departmentXYZ" }
Then you can use an aggregation to get the employees assigned to each department, like so:
{
"aggs" :{
"employees_per_department" : {
"terms" : {
"field" : "department"
}
}
}
}
For the unassigned employees, it depends on how you store the unassigned value for "department". If it's an empty string, take a look at this:
Find documents with empty string value on elasticsearch

How does elasticsearch facet feature work with async search query?

I am aware that with the facet feature of Elasticsearch we can get aggregated values for specified fields, based on the search query's result data.
I have an application where I am monitoring logs and using Elasticsearch to search through the log entries. On the UI side I have a paging mechanism in place and hence use the async feature of the search to fetch 'n' entries at a time.
So my question is: if I modify my async search query to fetch the facet information for certain fields, will it give the aggregated values for the subset of results fetched by the async query, or for the entire search result (and not just the subset returned to the user)?
Many thanks and regards,
Komal
Facets are returned for the entire search result. You can even set size to 0 in your request, which will result in not fetching any results and you will still get all facets.
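The distinction can be pictured with a small in-memory sketch. This is not Elasticsearch client code; it only illustrates, under the assumption of a hypothetical list of log levels standing in for all matching entries, that paging selects a slice of the hits while a terms facet counts over every hit.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetScopeSketch {

    // Returns one page of hits, the way from/size paging would.
    static List<String> page(List<String> hits, int from, int size) {
        int start = Math.min(from, hits.size());
        int end = Math.min(start + size, hits.size());
        return hits.subList(start, end);
    }

    // Counts every hit per value, ignoring paging, the way a terms facet does.
    static Map<String, Integer> termsFacet(List<String> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : hits) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical log levels of all entries matching a search.
        List<String> hits = List.of("ERROR", "INFO", "INFO", "WARN", "ERROR", "INFO");
        System.out.println(page(hits, 0, 2));  // only the first page of two entries
        System.out.println(termsFacet(hits));  // counts over all six matching entries
    }
}
```

Requesting a page size of 0 corresponds to calling only termsFacet here: no entries come back, but the counts still cover every match.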
Please refer here for detailed documentation. You can use a match_all query to fetch facets on all documents:
{
"query" : {
"match_all" : { }
},
"facets" : {
"tag" : {
"terms" : {
"field" : "tag",
"size" : 10
}
}
}
}
Please post your code gist for more information.
