Get count of items not listed in a parent-child relationship model in Elasticsearch

Let's say that we have employee and department types stored in an Elasticsearch index. I need to answer the following queries:
Count the number of employees that are assigned to any department (not a specific one). The employee just needs to be assigned to some department.
Count the number of employees that aren't assigned to any department yet.
I am oversimplifying my question with a toy example to make clear what is needed.
Any thoughts or help on this are appreciated.

Assume that your employee type has a field like this:
{ "department" : "departmentXYZ" }
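Then you can use a terms aggregation to get the number of employees assigned to each department (this assumes department is mapped as a keyword field, since a text field cannot be aggregated on by default):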
{
"aggs" :{
"employees_per_department" : {
"terms" : {
"field" : "department"
}
}
}
}
This depends on how you store the "not assigned" value for department. If it's an empty string, take a look at this question:
Find documents with empty string value on elasticsearch
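For the two counts in the question, a minimal sketch that builds the request bodies as Python dicts. It assumes department is a keyword field and that unassigned employees either lack the field or have it set to null. The exists query, the missing aggregation and "size": 0 are standard Query DSL; the helper names are hypothetical. The first two bodies would go to the _count endpoint and the third to _search. The local helper at the end only approximates exists semantics to show why the storage convention matters: an empty string still counts as an existing value.

```python
def assigned_count_body(field="department"):
    """_count body: employees that have any indexed department value."""
    return {"query": {"exists": {"field": field}}}


def unassigned_count_body(field="department"):
    """_count body: employees with no indexed department value."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


def per_department_body(field="department"):
    """_search body: one bucket per department plus a bucket for missing values."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "employees_per_department": {"terms": {"field": field}},
            "employees_without_department": {"missing": {"field": field}},
        },
    }


def has_indexed_value(doc, field="department"):
    """Local approximation of `exists`: null and [] count as missing, "" does not."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(v is not None for v in value)
    return True
```

If unassigned employees are stored with department set to "", the exists-based counts would treat them as assigned, so a term query on the empty string (as in the linked question) would be needed instead.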

Related

How to do a MongoDB aggregation $lookup using MongoTemplate with a localField array?

I have two collections, Customers and Accounts. Customers contains an accounts field that is an array of account ids. Using an aggregation $lookup, I want to join each entry of a customer's accounts list to the corresponding account object. According to the MongoDB 5 manual this is perfectly doable, and in fact I can create an aggregation pipeline like the following in both the mongo shell and MongoDB Compass (and get the correct results):
db.customers.aggregate([{ "$lookup" : { "from" : "accounts", "localField" : "accounts", "foreignField" : "account_id", "as" : "accountInfo"}},{ "$limit": 10 }] )
In my Java model for Customer I added an additional field List accountInfo and ran the following query using my MongoTemplate from the Customer repo:
LookupOperation stage = Aggregation.lookup("accounts", "accounts", "account_id", "accountInfo");
Aggregation aggregation = Aggregation.newAggregation(stage);
AggregationResults<Customer> aggResults = secondaryMongoTemplate.aggregate(aggregation,"customers",Customer.class);
List<Customer> results = aggResults.getMappedResults();
When I run this, I get neither errors nor results. Any thoughts?
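The shell pipeline relies on $lookup's behaviour for array fields: when localField is an array, a foreign document matches if its foreignField equals any element of that array. As a sketch for checking which results to expect, here is a small pure-Python model of that matching (not MongoDB or Spring Data code; the function name and sample data are made up). Like MongoDB, it compares values by type, so 1 and "1" do not match.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Pure-Python model of a $lookup equality match.

    An array-valued local field matches foreign documents whose
    foreign field equals any element of the array.
    """
    results = []
    for doc in local_docs:
        local_value = doc.get(local_field)
        keys = local_value if isinstance(local_value, list) else [local_value]
        matches = [f for f in foreign_docs if f.get(foreign_field) in keys]
        # Copy the local document and attach the matches under `as_field`.
        results.append({**doc, as_field: matches})
    return results
```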

How to join indexes in Elasticsearch

I have two indexes. Student:
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
and University, which has a one-to-many relationship with Student.
How should I declare the mappings for University?
While you can use the join field type here, you should really beware of trying to use Elasticsearch as a relational database. It doesn't support joins without a significant performance cost and some limitations (for example, parent and child documents must be indexed into the same shard).
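A sketch of what the join-field approach could look like, written as Python dicts for the request bodies. The join field works within a single index, so students and universities would share one index; the field name relation, the helper names and the ids are hypothetical. Child documents have to be indexed with routing set to the parent id so they land on the parent's shard, which is the limitation mentioned above.

```python
# Mapping for one index holding universities (parents) and students (children).
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            # Hypothetical join field: "university" is the parent of "student".
            "relation": {"type": "join", "relations": {"university": "student"}},
        }
    }
}


def university_doc(name):
    """Parent document body."""
    return {"name": name, "relation": {"name": "university"}}


def student_doc(name, university_id):
    """Child document body; index it with ?routing=<university_id>."""
    return {"name": name, "relation": {"name": "student", "parent": university_id}}


def students_of_university_query(university_id):
    """Search body returning the students (children) of one university."""
    return {"query": {"parent_id": {"type": "student", "id": university_id}}}
```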
Usually the answer would be to denormalize the relation. For example, in this case put some of the University fields directly in the Student document, allowing you to search on them directly.
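A minimal sketch of that denormalization, assuming the search only needs the university's id and name (the copied fields and the helper name are hypothetical). The student document can then be queried with a plain match on university.name, with no join at query time.

```python
def denormalize_student(student, university):
    """Copy the university fields needed for search into the student document."""
    return {
        **student,
        # Hypothetical choice of copied fields; copy whatever you filter or search on.
        "university": {"id": university["id"], "name": university["name"]},
    }


def students_at_university_query(university_name):
    """Search body on the denormalized field, no join involved."""
    return {"query": {"match": {"university.name": university_name}}}
```

The trade-off is on the write side: renaming a university means updating every student document that embeds it (for example with an update-by-query).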

GraphQL - How to retrieve the id of a previous mutation when a query contains multiple mutations

I would like to run multiple mutations in the same query.
In the example below, I create an order and then a product record that refers to the previously created order.
I need two mutations.
First, I insert an order. In the output, I retrieve, among other things, idorder.
Then I insert a product, which must be linked to that order:
mutation {
createOrder(input: {
order: {
ordername: "My order"
}
}) {
order {
idorder
ordername
}
},
createProduct(input: {
product: {
quantity: 3
idrefproduct: 25 # link to refProduct
idorder: XXXX # how can i retrieve idorder from output of createOrder above ? 🤔
}
}) {
product {
idproduct
}
}
}
A real example with the SQL structure:
user(iduser, othersFields);
scenarios(idscenario, iduser, name, otherFields);
cultA(idcultA, idscenario, ...); // this table needs the idscenario field
cultB(idcultB, idscenario, ...); // this table needs the idscenario field
cultC(idcultC, idscenario, ...); // this table needs the idscenario field
How can I retrieve idorder from the output of createOrder above? 🤔
Is it possible?
If I forgot any information, don't hesitate to ask.
Thanks in advance.
EDIT:
With PostGraphile: the plugin "postgraphile-plugin-nested-mutations", or "custom mutations" (with a PL/pgSQL function).
Without PostGraphile: a resolver like the one in #xadm's example permits this particular nested mutation.
IMHO you should search for "nested mutations"; they aren't described here, but you'll easily find examples and tutorials.
Proposed DB structure (n-to-n relation):
order{orderID,lines[{orderLineID}] } >
order_line{orderLineID, productID, amount, price} >
product {productID}
... created in nested mutations (in reverse order product>order_line>order)
Product doesn't need orderID, but when you ask for it [in the product resolver]:
query product($id: ID!) {
product(id: $id) {
id
orderedRecently {
orderID
date
price
}
}
}
... you can simply get it (or rather many of them, as an array) from the orderLines and orders tables [using a simple SQL query, where price is read from orderLines].
The orderedRecently resolver can get the product id from the parent object (usually its 1st parameter).
Of course you can (and should) return data as order and orderLine types (to be cached separately, normalized):
query product($id: ID!) {
product(id: $id) {
id
orderedRecently {
id
date
orderLine {
id
amount
price
}
}
}
}
where orderedRecently has type [Order!]; the array can be empty if the product hasn't been ordered yet.
Update
I slightly misunderstood your requirements (naming convention) ... you already have a proper DB structure. The mutation can be fed with complex data/input:
mutation {
createOrder(input: {
order: {
ordername: "My order"
products: [
{
quantity: 3
idrefproduct: 25
},
{
quantity: 5
idrefproduct: 28
}
]
}
}) {
order {
id
ordername
products {
id
idrefproduct
quantity
}
}
}
}
Your product is my orderLine, and your idrefproduct is my product.
createOrder creates/inserts the order and then uses its id to create the product records (order.id, idrefproduct and quantity). The resolver can return only the order id or structured data (as above).
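A sketch of that createOrder resolver logic against an in-memory SQLite database. Column names follow the question (idorder, ordername, idproduct, idrefproduct, quantity); the table names, schema and function name are assumptions. The point is that the resolver inserts the order, takes the generated id from the insert itself, and reuses it for the product rows, so the client never has to pass idorder between mutations.

```python
import sqlite3

# Assumed schema mirroring the question's naming.
SCHEMA = """
CREATE TABLE orders (idorder INTEGER PRIMARY KEY, ordername TEXT);
CREATE TABLE product (
    idproduct INTEGER PRIMARY KEY,
    idorder INTEGER NOT NULL REFERENCES orders(idorder),
    idrefproduct INTEGER,
    quantity INTEGER
);
"""


def create_order(conn, order_input):
    """Hypothetical createOrder resolver: insert the order, then its product rows."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO orders (ordername) VALUES (?)", (order_input["ordername"],)
        )
        order_id = cur.lastrowid  # the generated idorder, reused below
        conn.executemany(
            "INSERT INTO product (idorder, idrefproduct, quantity) VALUES (?, ?, ?)",
            [(order_id, p["idrefproduct"], p["quantity"])
             for p in order_input.get("products", [])],
        )
    rows = conn.execute(
        "SELECT idproduct, idrefproduct, quantity FROM product "
        "WHERE idorder = ? ORDER BY idproduct",
        (order_id,),
    ).fetchall()
    # Structured result shaped like the mutation's selection set.
    return {
        "id": order_id,
        "ordername": order_input["ordername"],
        "products": [{"id": r[0], "idrefproduct": r[1], "quantity": r[2]} for r in rows],
    }
```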

Best practice for schema naming of entity/collection

I am building a GraphQL schema and I was wondering what the best practice is for returning a single item vs. a collection of items of a type. Let's say we want to retrieve users.
One option (if it's possible somehow) would be a query like this, where the ID is optional: if an ID is passed we return a single item, if not, a collection of all users:
query {
user (id: 1234) {
name
}
}
// return a single [User]
query {
user (id: null) {
name
}
}
// return a collection [User,User,User,...]
Another option would be to have separate user and users fields:
query {
user (id: 1234) {
name
}
}
// return a single User
query {
users {
name
}
}
// return a collection [User,User,User,...]
I was wondering what the best practice is, or whether you can point me to some resources on this.
I use singular and plural nouns to name query fields that return a single object and a list of objects, respectively. I think this naming style feels very natural to most developers.
So to return a single user, it is:
type Query {
user(id:Int!) : User
}
It always returns a single user. Just make the id input parameter mandatory so that it cannot accept NULL.
And to return a list of users, normally it is:
type Query {
users : [User]
}
But if there can be many users, you most likely need to consider something like pagination, which lets developers fetch users page by page. For offset-based pagination, I do something like the following:
type Query {
users(offset:Int limit:Int) : UserPage
}
type UserPage {
data : [User]
pageInfo : PageInfo
}
type PageInfo {
# When paginating forwards, are there more items?
hasNextPage : Boolean!
# When paginating backwards, are there more items?
hasPreviousPage: Boolean!
# Total number of records across all pages (Long is a custom scalar, not a built-in GraphQL type)
total : Long
}
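A sketch of how a resolver could fill the UserPage shape above for offset-based pagination, using a plain Python list as a stand-in for the data source (the function name is hypothetical). hasNextPage is true when rows remain after the current window, and hasPreviousPage when the window does not start at the first row; null arguments are treated as 0, since both arguments are nullable in the schema.

```python
def paginate(items, offset=0, limit=10):
    """Build a UserPage-shaped dict for offset-based pagination."""
    offset = max(offset or 0, 0)  # nullable GraphQL Int: treat null as 0
    limit = max(limit or 0, 0)
    total = len(items)
    return {
        "data": items[offset:offset + limit],
        "pageInfo": {
            "hasNextPage": offset + limit < total,
            "hasPreviousPage": offset > 0,
            "total": total,
        },
    }
```

With a database behind it, data would come from a LIMIT/OFFSET query and total from a separate COUNT, which is why total is often made optional: it can be expensive on large tables.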
Depending on the requirements, you can consider adding an orderBy or filter input parameter to the users query field, giving developers more options to get the result set they are interested in.
If you want to return the user list using cursor-based pagination, take a look at the Relay specification.

Not able to understand this Elasticsearch query

{
"query": {
"nested": {
"path": "product_vendors",
"query": {
"bool" :{
"must" : {
"bool" : {
"should" : [
{ "terms": {"product_vendors.manufacturer_style":["FSS235D-26","SG463-1128-5","SG463-2879-4"]}},
{ "terms": {"product_vendors.id":["71320"]}}
]
}
}
}
}
}
}
}
I have the above Elasticsearch query but I can't understand it. Could anyone please explain what it means and which documents it will return?
Update: #christinabo, I tried your query and it returned results, but there is a small issue: apart from the matching documents, two additional documents are returned in which only the vendor id matches. Why are these two extra documents returned? Do we need some attribute or setting to make the search strict? Please advise.
By looking at the query, I can tell that there is a nested object in the data. I imagine it has this structure:
product_vendors: {
'id': 'the_id',
'manufacturer_style': 'some style'
}
In order to query a nested object, you need a nested query. This is why you have the nested keyword there. In a nested query, you need to specify the path (product_vendors) that leads to the embedded fields (id, manufacturer_style).
Then, the query defines a bool query with the must keyword, which means that the clause that follows must match in matching documents. In this case, what must match is another bool query, defined with the should keyword. It contains two terms sub-queries (one for manufacturer_style and one for id) and means that matching documents should match at least one of them. Each sub-query targets an embedded field by its full path, using dot notation (i.e. product_vendors.manufacturer_style).
I would expect the query to return the documents that match at least one of the terms queries, with documents that match both getting a higher score.
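To make the matching rule concrete, here is a small pure-Python model of which documents the query selects (scoring is ignored, and the function name is hypothetical). A document matches if at least one of its product_vendors objects satisfies either terms clause, which also explains the update: a vendor whose id is 71320 matches through the second clause alone. If both conditions had to hold on the same vendor, the two terms clauses would go in must (or filter) instead of should.

```python
def nested_should_matches(doc, styles, ids, path="product_vendors"):
    """Model of the nested bool/should query (matching only, no scoring).

    `terms` matches when the field equals any listed value; a bool with only
    `should` clauses needs at least one of them to match (OR), and the nested
    query evaluates the clauses per nested object.
    """
    vendors = doc.get(path) or []
    if isinstance(vendors, dict):  # a single object behaves like a one-element array
        vendors = [vendors]
    return any(
        v.get("manufacturer_style") in styles or v.get("id") in ids
        for v in vendors
    )
```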
I hope that this explanation gives you an overall idea of this query.
More about bool queries in the documentation.
