How to join 2 arrays in a performant way using JSONata? - jsonata

I would like to join two arrays of about 500 elements each in a performant way using JSONata.
I have found a way to join the two arrays, but it is not very performant.
See https://try.jsonata.org/VqzeZDAjA (The same input/output and query is copied here below).
Input:
{
  "_msgid": "a070e32c.e71ed",
  "topic": "",
  "rc": {
    "code": 0
  },
  "table1": {
    "array1": [
      {
        "country_region": "Thailand",
        "field_A": "A for Thailand"
      },
      {
        "country_region": "Japan",
        "field_A": "A for Japan"
      }
    ]
  },
  "array2": [
    {
      "country_region": "Thailand",
      "field_B": "B for Thailand"
    },
    {
      "country_region": "Japan",
      "field_B": "B for Japan"
    }
  ]
}
Expected output:
{
  "array1": [
    {
      "country_region": "Thailand",
      "field_A": "A for Thailand",
      "field_B": "B for Thailand"
    },
    {
      "country_region": "Japan",
      "field_A": "A for Japan",
      "field_B": "B for Japan"
    }
  ]
}
Working query, but not very performant for arrays with 500 elements:
(
  $array2 := array2;
  table1 ~> | array1 | { "field_B":
    ($country_region := country_region;
     $array2[$country_region = $.country_region])
    .field_B } |
)
Update 2020-03-29
Above, it is claimed that the working query is not very performant. Further analysis revealed that this is not true: the actual performance of the above query is fine (similar to that of the accepted answer). The performance issue I encountered was caused by another JSONata query that has nothing to do with this join!

Not sure if it's any more performant, but I'd use the join syntax to do this:
{
  "array1": array2#$A2.table1.array1#$A1[$A1.country_region = $A2.country_region].$merge([$A1, $A2])
}
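Whatever the relative speed of the two JSONata forms, the reason a keyed join can beat a per-element scan is worth spelling out. Here is a minimal sketch in Python (not JSONata; the function name is invented) that builds a lookup on country_region once, making the join O(n + m) instead of the O(n × m) nested scan:

```python
def join_on_key(array1, array2, key="country_region"):
    """Merge each object in array1 with the matching object in array2.

    Builds a dict index on `key` first, so each element of array1 is
    matched in O(1) instead of scanning all of array2.
    """
    index = {item[key]: item for item in array2}
    return [{**a, **index.get(a[key], {})} for a in array1]

array1 = [
    {"country_region": "Thailand", "field_A": "A for Thailand"},
    {"country_region": "Japan", "field_A": "A for Japan"},
]
array2 = [
    {"country_region": "Thailand", "field_B": "B for Thailand"},
    {"country_region": "Japan", "field_B": "B for Japan"},
]

merged = join_on_key(array1, array2)
```

For 500-element arrays the indexed version does roughly 1,000 dictionary operations instead of 250,000 comparisons.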

Related

NiFi Jolt Specification for array input

I have the following input in Nifi Jolt Specification processor:
[
  {
    "values": [
      {
        "id": "paramA",
        "value": 1
      }
    ]
  },
  {
    "values": [
      {
        "id": "paramB",
        "value": 3
      }
    ]
  }
]
Expected output:
[
  {
    "id": "paramA",
    "value": 1
  },
  {
    "id": "paramB",
    "value": 3
  }
]
Can you explain how to do this?
Thanks in advance.
You want to reach the objects of the values arrays, which are nested within separate objects ({}). A "*" wildcard is needed to cross over each of those objects, and then another "*" wildcard for the indexes of each values array, with "" as the counterpart value so that nothing but the sub-objects is grabbed, such as:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "values": {
          "*": ""
        }
      }
    }
  }
]
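As a plain-language check of what that shift does (this is Python, not Jolt, and the helper name is invented): the spec simply collects the objects inside every element's values array into one flat top-level list:

```python
def flatten_values(docs):
    """Collect the objects inside each element's "values" array
    into a single top-level list, mirroring the Jolt shift spec."""
    return [obj for doc in docs for obj in doc.get("values", [])]

docs = [
    {"values": [{"id": "paramA", "value": 1}]},
    {"values": [{"id": "paramB", "value": 3}]},
]
flat = flatten_values(docs)
```

Note that a pure shift only moves data; the values themselves pass through unchanged.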

How to cleanly batch queries together in Gremlin

I am writing a GraphQL resolver that retrieves all vertices by a particular edge using the following query (created returns label person):
software {
  created {
    name
  }
}
Which would resolve to the following Gremlin Query for each software node found:
g.V().hasLabel('software').has('name', 'ripple').in('created')
This returns a result that includes all properties of the object:
{
  "result": [
    {
      "#type": "d",
      "#rid": "#24:0",
      "#version": 6,
      "#class": "person",
      "in_knows": [
        "#35:0"
      ],
      "name": "josh",
      "out_created": [
        "#32:0",
        "#33:0"
      ],
      "age": 32,
      "#fieldTypes": "in_knows=g,out_created=g"
    }
  ],
  "dbStats": {
    ...
  }
}
I realize that this will fall foul of GraphQL's N+1 query problem, so I'm trying to batch queries together using a DataLoader pattern. (I'm also hoping to do property selection, so I'm not asking the database to return too much info.)
So I'm trying to craft a query like so:
g.V().union(
  __.hasLabel('software').has('name', 'ripple').
    project('parent', 'child').by('id').
    by(__.in('created').fold()),
  __.hasLabel('software').has('name', 'lop').
    project('parent', 'child').by('id').
    by(__.in('created').fold())
)
But this results in the following, where the props are missing and it just includes the ids of the vertices I want:
{
  "result": [
    {
      "parent": "ripple",
      "child": [
        "#24:0"
      ]
    },
    {
      "parent": "lop",
      "child": [
        "#22:0",
        "#23:0",
        "#24:0"
      ]
    }
  ],
  "dbStats": {
    ...
  }
}
My question is: how can I have the Gremlin query return all of the props for the found vertices, and none of the other props? Should I even be doing batching this way?
For anyone else reading: the query I was trying to write wouldn't work, because the traversal set created in .by(__.in('created')) can't be cast from a List to an ElementMap, as the stream cardinality wouldn't be enforced. (You can only have one record per row, I think?)
My working query duplicates the keys for each row and specifies the props needed (the query below is OK for Gremlin 3.3 as used in OrientDB; if you have Gremlin >= 3.4, replace the last by step with by(elementMap('name', 'age'))):
g.V().union(
  __.hasLabel('software').has('name', 'ripple').
    as('parent').
    in('created').as('child').
    select('parent', 'child').
    by(values('name')).
    by(properties('id', 'name', 'age').
       group().by(__.key()).
       by(__.value())),
  __.hasLabel('software').has('name', 'lop').
    as('parent').
    in('created').as('child').
    select('parent', 'child').
    by(values('name')).
    by(properties('id', 'name', 'age').
       group().by(__.key()).
       by(__.value()))
)
So that you get a result like this:
{
  "data": [
    {
      "parent": "ripple",
      "child": {
        "id": 5717,
        "name": "josh",
        "age": 32
      }
    },
    {
      "parent": "lop",
      "child": {
        "id": 5709,
        "name": "peter",
        "age": 35
      }
    },
    {
      "parent": "lop",
      "child": {
        "id": 5713,
        "name": "marko",
        "age": 29
      }
    },
    {
      "parent": "lop",
      "child": {
        "id": 5717,
        "name": "josh",
        "age": 32
      }
    }
  ]
}
This allows you to create a lookup that concatenates all results for "lop" and "ripple" into arrays.
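That lookup-building step can be sketched outside Gremlin; this Python fragment (names are illustrative, not part of any Gremlin client) groups the flat parent/child rows by parent, which is exactly what a DataLoader batch function needs to return results keyed per input:

```python
from collections import defaultdict

def build_lookup(rows):
    """Group flat {parent, child} rows into a parent -> [children] map."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[row["parent"]].append(row["child"])
    return dict(lookup)

rows = [
    {"parent": "ripple", "child": {"name": "josh", "age": 32}},
    {"parent": "lop", "child": {"name": "peter", "age": 35}},
    {"parent": "lop", "child": {"name": "marko", "age": 29}},
]
lookup = build_lookup(rows)
```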

Incorrectly selected data in the query

Only articles that contain the EmailMarketing tag are needed.
I'm probably searching on the tag incorrectly, since it's an array of values, not a single object, but I don't know how to do it right; I'm just learning GraphQL. Any help would be appreciated.
query:
query {
  enArticles {
    title
    previewText
    tags(where: {name: "EmailMarketing"}) {
      name
    }
  }
}
result:
{
  "data": {
    "enArticles": [
      {
        "title": "title1",
        "previewText": "previewText1",
        "tags": [
          {
            "name": "EmailMarketing"
          },
          {
            "name": "Personalization"
          },
          {
            "name": "Advertising_campaign"
          }
        ]
      },
      {
        "title": "title2",
        "previewText": "previewText2",
        "tags": [
          {
            "name": "Marketing_strategy"
          },
          {
            "name": "Marketing"
          },
          {
            "name": "Marketing_campaign"
          }
        ]
      },
      {
        "title": "article 12",
        "previewText": "article12",
        "tags": []
      }
    ]
  }
}
I believe you first need to have coded an equality operator within your GraphQL schema. There's a good explanation of that here.
Once you add an equality operator - say, for example _eq - you can use it something like this:
query {
  enArticles {
    title
    previewText
    tags(where: {name: {_eq: "EmailMarketing"}}) {
      name
    }
  }
}
Specifically, you would need to create a filter and resolver.
The example here may help.
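One caveat worth hedging: in many GraphQL APIs, a where argument on a nested tags field filters which tags are returned per article, not which articles are returned, so articles with no matching tag may still come back (with an empty tags list). If the schema offers no article-level filter, a client-side post-filter is a fallback; a minimal Python sketch (the helper name is invented) over the result shape from the question:

```python
def articles_with_tag(articles, tag_name):
    """Keep only articles whose tags array contains the given tag name."""
    return [
        a for a in articles
        if any(t["name"] == tag_name for t in a.get("tags", []))
    ]

articles = [
    {"title": "title1", "tags": [{"name": "EmailMarketing"}, {"name": "Personalization"}]},
    {"title": "title2", "tags": [{"name": "Marketing"}]},
    {"title": "article 12", "tags": []},
]
filtered = articles_with_tag(articles, "EmailMarketing")
```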

ReferenceManyFields (One to Many Relationship)

I am working on a project where I have to create a one-to-many relationship: I need to fetch all the records referenced by id in another table and display the selected data in a multi-select field (SelectArrayInput). Please help me out with this; an example would be great.
Thanks in advance.
Example:
district
id  name
1   A
2   B
3   C

block
id  district_id  name
1   1            ABC
2   1            XYZ
3   2            DEF
I am using the ra-data-hasura-graphql data provider (https://github.com/Steams/ra-data-hasura-graphql) for my application.
You're likely looking for "nested object queries" (see: https://hasura.io/docs/1.0/graphql/manual/queries/nested-object-queries.html#nested-object-queries)
An example...
query MyQuery {
  district(where: {id: {_eq: 1}}) {
    id
    name
    blocks {
      id
      name
    }
  }
}
result:
{
  "data": {
    "district": [
      {
        "id": 1,
        "name": "A",
        "blocks": [
          {
            "id": 1,
            "name": "ABC"
          },
          {
            "id": 2,
            "name": "XYZ"
          }
        ]
      }
    ]
  }
}
Or...
query MyQuery2 {
  block(where: {district: {name: {_eq: "A"}}}) {
    id
    name
    district {
      id
      name
    }
  }
}
result:
{
  "data": {
    "block": [
      {
        "id": 1,
        "name": "ABC",
        "district": {
          "id": 1,
          "name": "A"
        }
      },
      {
        "id": 2,
        "name": "XYZ",
        "district": {
          "id": 1,
          "name": "A"
        }
      }
    ]
  }
}
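The nested shape those queries return can also be reproduced from the two flat tables in the question; this Python sketch (function name invented, not part of ra-data-hasura-graphql) groups blocks under their district via the district_id foreign key:

```python
def nest_blocks(districts, blocks):
    """Attach each block to its district (one-to-many via district_id)."""
    by_district = {}
    for b in blocks:
        by_district.setdefault(b["district_id"], []).append(
            {"id": b["id"], "name": b["name"]}
        )
    return [
        {**d, "blocks": by_district.get(d["id"], [])}
        for d in districts
    ]

districts = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
blocks = [
    {"id": 1, "district_id": 1, "name": "ABC"},
    {"id": 2, "district_id": 1, "name": "XYZ"},
    {"id": 3, "district_id": 2, "name": "DEF"},
]
nested = nest_blocks(districts, blocks)
```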
Setting up the tables this way (screenshots of the "blocks" and "districts" table definitions omitted)...
Aside: I recommend using the plural table names "districts" and "blocks", as they are more standard.

mongo slow query

My MongoDB currently holds 105,000 documents, and I still have to insert 500,000 more; it is taking more than 4 hours just to insert 1,000 documents, due to querying for references:
Insert DocA, which has many citations (about 30).
Find the documents in the database which are cited by DocA [i.e. findBy-Doi-Or-Pmid-Or-Pmc(...)].
So each of the queries for DocA's citations takes about 400 ms to complete.
Following is one of the profiles:
Query: { $or: [ { $or: [ { doi: "" }, { pmid: "10508155" } ] }, { pmc: "" } ] }
{
  "ts": ISODate("2012-12-22T11:55:39.796Z"),
  "op": "query",
  "ns": "fyparticles.mArticle",
  "query": {
    "$or": {
      "0": {
        "$or": {
          "0": {
            "doi": ""
          },
          "1": {
            "pmid": "10508155"
          }
        }
      },
      "1": {
        "pmc": ""
      }
    }
  },
  "ntoreturn": NumberInt(1),
  "nscanned": NumberInt(105707),
  "responseLength": NumberInt(20),
  "millis": NumberInt(477),
  "client": "192.168.0.15",
  "user": ""
}
And the index I have created:
{
  "v": NumberInt(1),
  "key": {
    "doi": NumberInt(1),
    "pmid": NumberInt(1),
    "pmc": NumberInt(1)
  },
  "ns": "fyparticles.system.indexes",
  "background": NumberInt(1),
  "name": "params"
}
Please help me out here! Am I missing something or doing something wrong?
First off, you are using an $or, which in itself is not the fastest operator in the world, due to its need to run multiple queries and then merge out duplicates to return a result.
Second, you are using an $or with a single compound index. Since an $or is basically one or more queries, you may need one or more indexes to cover the unique fields you have in each clause.
Third, you are using nested $ors; it is good to note that nested $ors do not use indexes: https://jira.mongodb.org/browse/SERVER-3327
So you already have three or more performance problems with your query.
First, take out the nested $or:
{ $or: [ {doi: ""}, {pmid: "10508155"}, {pmc: ""} ] }
And then you will probably need to create three indexes on this (you might be able to get one index to fit all; I haven't tested):
db.col.ensureIndex({ doi: 1 });
db.col.ensureIndex({ pmid: 1 });
db.col.ensureIndex({ pmc: 1 });
That should be the first place to start to make your query faster.
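The rewrite from the nested $or to the flat one is mechanical; this small Python sketch (the helper is hypothetical, not part of any MongoDB driver) lifts nested $or clauses into a single top-level list:

```python
def flatten_or(query):
    """Recursively lift nested $or clauses into one flat $or list.

    Clauses that are not $or documents pass through unchanged.
    """
    if "$or" not in query:
        return query
    clauses = []
    for clause in query["$or"]:
        flat = flatten_or(clause)
        if "$or" in flat:
            clauses.extend(flat["$or"])
        else:
            clauses.append(flat)
    return {"$or": clauses}

nested = {"$or": [{"$or": [{"doi": ""}, {"pmid": "10508155"}]}, {"pmc": ""}]}
flat = flatten_or(nested)
```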
