Add Edge Property through OrientDB ETL

I have 2 csv files.
Person.csv
ID,PetID,Jumps
1,101,Yes
2,102,No
3,103,Yes
Pet.csv
ID,Name
101,Dog
102,Cat
103,Rabbit
I am writing an ETL configuration to populate my graph with these two entities.
I want to add an edge named HAS_PET between Person and Pet, and I also want this edge to have a property called Jumps. How can I achieve this?
I tried the following:
{
"source":{
"file":{
"path":"C:/Users/60886/Project/person.csv"
}
},
"extractor":{
"row":{
}
},
"transformers":[
{
"csv":{
}
},
{
"vertex":{
"class":"Person"
}
},
{
"edge":{
"class":"HAS_PET",
"joinFieldName":"PETID",
"lookup":"PET.ID",
"direction":"out",
"unresolvedLinkAction":"NOTHING"
}
}
],
"loader":{
"orientdb":{
"dbURL":"remote:localhost/GratefulDeadConcerts",
"dbType":"graph",
"wal":false,
"tx":false,
"batchCommit":1000
}
}
}

In the edge transformer, use edgeFields to bind properties to edges. Example:
"edge":{
"class":"HAS_PET",
"joinFieldName":"PETID",
"lookup":"PET.ID",
"direction":"out",
"edgeFields": { "Jumps": "${input.Jumps}" },
"unresolvedLinkAction":"NOTHING"
}
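For context, here is a rough sketch of how the whole configuration might look once the edgeFields setting is combined with the config from the question. It is written as a small Python script that prints the JSON. The helper name build_etl_config and its csv_path parameter are made up for illustration. The transformer keys are copied from this thread. The final "field" step is the clean-up this answer recommends, assuming the field to drop is named "Jumps" like the CSV column.

```python
import json


def build_etl_config(csv_path: str) -> dict:
    """Assemble the ETL config from the question plus the edgeFields setting.

    build_etl_config and csv_path are hypothetical names used for this sketch.
    """
    return {
        "source": {"file": {"path": csv_path}},
        "extractor": {"row": {}},
        "transformers": [
            {"csv": {}},
            {"vertex": {"class": "Person"}},
            {
                "edge": {
                    "class": "HAS_PET",
                    "joinFieldName": "PETID",
                    "lookup": "PET.ID",
                    "direction": "out",
                    # Copies the Jumps column of the current row onto the edge.
                    "edgeFields": {"Jumps": "${input.Jumps}"},
                    "unresolvedLinkAction": "NOTHING",
                }
            },
            # Clean-up step from the answer: drop Jumps from the Person vertex.
            # Assumes the field name matches the CSV column "Jumps".
            {"field": {"fieldName": "Jumps", "operation": "remove"}},
        ],
        "loader": {
            "orientdb": {
                "dbURL": "remote:localhost/GratefulDeadConcerts",
                "dbType": "graph",
                "wal": False,
                "tx": False,
                "batchCommit": 1000,
            }
        },
    }


config = build_etl_config("C:/Users/60886/Project/person.csv")
print(json.dumps(config, indent=2))
```

The order of the transformers matters here: the field removal comes after the edge transformer so that the value is still available when the edge is created.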
Remember to remove "Jumps" from the vertex, after the edge transformer, with:
"field": { "fieldName": "Jumps", "operation": "remove" },


What's the difference between top-level edges and top-level nodes?

I'm new to GraphQL and I'm using WPGraphQL and WooGraphQL.
Their top level connections allow me to expand both nodes and edges like so:
{
wpgraphql {
productCategories {
# What's the difference between these?
# 1) top-level nodes
nodes {
id
name
}
# 2) top-level edges
edges {
node {
id
name
}
}
}
}
}
Which returns a response like so (IDs omitted):
{
"data": {
"wpgraphql": {
"productCategories": {
"nodes": [
{
"name": "Accessories"
},
{
"name": "Gift Cards"
},
{
"name": "Indoor"
}
],
"edges": [
{
"node": {
"name": "Accessories"
}
},
{
"node": {
"name": "Gift Cards"
}
},
{
"node": {
"name": "Indoor"
}
}
]
}
}
}
}
My question is simply: Which one do I use? Why are there both?
Here is a screenshot of the GraphiQL explorer if that helps.
GraphQL schemas that implement the Relay specification utilize Connection types to model one-to-many or many-to-many relationships.
Each connection includes a list of edges and a PageInfo object. Each edge includes a node and the cursor for that node.
Edges may also contain additional fields -- for example, if we have a friends connection between User nodes, we might include the timestamps when the friendships were created. Normally, though, edges are only used for the cursor field they expose. The cursor value is used when paginating through the connection, and exposing it for every edge means you can start your pagination from any arbitrary point in the results. The cursor is not included as part of the node because it may be specific to the connection and not just the node itself (for example, some cursors encode sort criteria).
However, if as a client you don't need to paginate the results of a connection and just want to fetch all the nodes, you probably don't care about cursors. In these scenarios, having edges doesn't add any value and just increases the depth of your query. As a result, as a convenience to the client, some GraphQL services have opted to expose just the nodes for the connection in addition to the edges.
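To make the relationship concrete, here is a minimal in-memory sketch of a connection that exposes both edges and nodes. Everything in it is an assumption for illustration: build_connection is a hypothetical helper, and the cursor is just a base64-encoded list offset. Real services such as WPGraphQL choose their own cursor encoding.

```python
import base64


def _encode_cursor(offset: int) -> str:
    # Hypothetical cursor format: a base64-encoded "offset:<n>" string.
    return base64.b64encode(f"offset:{offset}".encode()).decode()


def _decode_cursor(cursor: str) -> int:
    return int(base64.b64decode(cursor).decode().split(":")[1])


def build_connection(items, first, after=None):
    """Return a Relay-style connection over a plain list (illustration only)."""
    start = _decode_cursor(after) + 1 if after else 0
    window = items[start:start + first]
    edges = [
        {"cursor": _encode_cursor(start + i), "node": node}
        for i, node in enumerate(window)
    ]
    return {
        "edges": edges,
        # The convenience field: the same nodes, without the edge wrapper.
        "nodes": [edge["node"] for edge in edges],
        "pageInfo": {
            "hasNextPage": start + first < len(items),
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }


categories = [{"name": "Accessories"}, {"name": "Gift Cards"}, {"name": "Indoor"}]
page = build_connection(categories, first=2)
# Because every edge carries its own cursor, pagination can resume after any edge.
resumed = build_connection(categories, first=2, after=page["edges"][0]["cursor"])
```

In this sketch nodes is derived directly from edges, which is why a client that never paginates from an arbitrary edge can ignore edges entirely.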

I have two JSON payloads. I want to merge them into a single JSON object

I have two payloads and want to merge them into a single JSON object (streaming join). In a few places people suggest using AttributesToJSON, but since one of the JSON payloads does not have a fixed set of attributes, I guess that would not be possible.
First payload is
{
"title":"API-Actions Documentation",
"title_link":"https://api.slack.com/",
"author_name":"name",
"author_link":"http://flickr.com/bobby/",
"author_icon":"http://flickr.com/icons/bobby.jpg",
"text":"Optional",
"image_url":"http://my-website.com/path/to/image.jpg",
"thumb_url":"http://example.com/path/to/thumb.png",
"footer":null,
"pretext":"#name",
"color":"#7CD197"
}
And second one is,
{
"fields":[
{
"title":"Priority",
"value":"low",
"short":"true"
},
{
"title":"Priority",
"value":"medium",
"short":"true"
},
{
"title":"Priority",
"value":"high",
"short":"true"
},
{
"title":"Priority",
"value":"blocker",
"short":"true"
}
]
}
I want the output as
{
"title":"API-Actions Documentation",
"title_link":"https://api.slack.com/",
"author_name":"name",
"author_link":"http://flickr.com/bobby/",
"author_icon":"http://flickr.com/icons/bobby.jpg",
"text":"Optional",
"image_url":"http://my-website.com/path/to/image.jpg",
"thumb_url":"http://example.com/path/to/thumb.png",
"footer":null,
"pretext":"#name",
"color":"#7CD197",
"fields":[
{
"title":"Priority",
"value":"low",
"short":"true"
},
{
"title":"Priority",
"value":"medium",
"short":"true"
},
{
"title":"Priority",
"value":"high",
"short":"true"
},
{
"title":"Priority",
"value":"blocker",
"short":"true"
}
]
}
Easy! Just use MergeContent and set the following configuration:
Merge Format: Binary Concatenation
Minimum Number of Entries: 2
Delimiter Strategy: Text
Header: [
Footer: ]
Demarcator: ,
(You could use MergeRecord, but it is a little buggy, for me at least.)
Then transfer to JoltTransformJSON and set Jolt Transformation DSL to Shift and Jolt Specification to:
{
"*": {
"*": "&"
}
}
This should do the job :)
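The two steps above can be mimicked outside NiFi to see what the flow is expected to produce. The Python sketch below is only a model of the idea, not NiFi or Jolt code: merge_payloads is a hypothetical helper, and it assumes the two payloads share no top-level keys. Jolt's handling of edge cases such as colliding keys or null values is not modelled here.

```python
import json


def merge_payloads(first_payload: str, second_payload: str) -> dict:
    """Model of MergeContent (wrap in an array) followed by the shift spec."""
    # Step 1: Header "[", Demarcator ",", Footer "]" turn two objects into an array.
    merged_text = "[" + ",".join([first_payload, second_payload]) + "]"
    objects = json.loads(merged_text)

    # Step 2: the spec {"*": {"*": "&"}} moves every key of every array
    # element to the top level of a single output object.
    result = {}
    for obj in objects:
        for key, value in obj.items():
            result[key] = value  # assumes keys do not collide
    return result


first = '{"title": "API-Actions Documentation", "color": "#7CD197"}'
second = '{"fields": [{"title": "Priority", "value": "low", "short": "true"}]}'
merged = merge_payloads(first, second)
print(json.dumps(merged, indent=2))
```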
Generally NiFi is not meant to do traditional streaming joins, but this recent thread on the mailing list can help explain what is possible:
http://apache-nifi-users-list.2361937.n4.nabble.com/join-two-datasets-td7039.html

Nested pagination with relay graphql

I am currently having an issue with the Relay approach to nested pagination. Here is an example to illustrate what I mean:
{
"data": {
"locations": {
"edges": [
{
"node": {
"id": "Location_254"
}
},
{
"node": {
"id": "Location_247"
}
},
{
"node": {
"id": "Location_217"
}
}
]
}
}
}
Here I have 3 locations returned from a query. Now I want to paginate on these locations and look at their 'history'.
query {
locations {
edges {
node {
history(
first:10
after:"eyJzbm9vemVJZCI6Mzg3fQ=="
)
}
}
}
}
This would fetch 10 results after the specified cursor. My issue is that this cursor is specific to the location it was obtained from, so it is only meaningful for that one location.
Nested pagination tries to paginate on ALL locations here, when in actuality the cursor being used was grabbed from a specific location.
Am I seeing this incorrectly, or is there a better way I could be approaching this issue?
Regards, Sebastian

GraphQL GitHub API formatting

I am wondering how to deal with the following problem. I am using GraphQL to query the v4 GitHub API with the following query:
{
viewer {
repositories(first: 30) {
edges {
node {
name
}
}
}
}
}
This gets me a response that looks like so:
{
"data": {
"viewer": {
"repositories": {
"edges": [
{
"node": {
"name": "test-repo"
}
},
{
"node": {
"name": "another-repo"
}
}
]
}
}
}
}
I am pretty new to GraphQL. I understand that my query needs to include the edges and nodes, but I would rather get the response back in the following shape, because my frontend has no interest in "edges" and "nodes":
{
"data": {
"viewer": {
"repositories": [
{
"name": "test-repo"
},
{
"name": "another-repo"
}
]
}
}
}
I am guessing this kind of response is normal for GraphQL, but it would be pretty cumbersome to rewrite the response every time for easier use in my frontend. Is there some way to omit the "edges" and "nodes" and get the formatting I would like, or is this simply up to me to deal with?
I have looked at some libraries like Apollo, but I have no idea if they are the right fit for dealing with things like this. Hopefully someone a bit more experienced with GraphQL can tell me more.
Sometimes services provide two endpoints: a Relay endpoint (with edges and nodes) and a simple endpoint.
It looks like GitHub only has a Relay endpoint. In this case, the only thing you can do is manually format the response on your frontend.
Such a complex response structure is needed because we often need to paginate. Take a look at this example:
{
getArticle(id: "some-id") {
id
userId
user {
id
name
}
tags(first: 10, after: "opaqueCursor") {
edges {
node {
id
name
itemsCount
}
}
pageInfo {
hasNextPage
hasPreviousPage
endCursor
startCursor
}
}
}
}
pageInfo is located at the same level as edges.
So if you will need pagination later, it is better to keep the response format as is.
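If manual formatting is still wanted, it does not have to be rewritten for every query. Below is a minimal sketch of a generic helper, shown in Python for brevity (the same recursion translates directly to frontend JavaScript). unwrap_connections is a hypothetical name, and the helper assumes that any object with a list-valued "edges" key is a connection. It discards pageInfo and cursors, so it only suits responses that are not paginated further.

```python
def unwrap_connections(value):
    """Recursively replace {"edges": [{"node": X}, ...]} with [X, ...]."""
    if isinstance(value, list):
        return [unwrap_connections(item) for item in value]
    if isinstance(value, dict):
        edges = value.get("edges")
        if isinstance(edges, list):
            # Treat this object as a connection: keep only the nodes.
            return [unwrap_connections(edge["node"]) for edge in edges]
        return {key: unwrap_connections(item) for key, item in value.items()}
    return value


response = {
    "data": {
        "viewer": {
            "repositories": {
                "edges": [
                    {"node": {"name": "test-repo"}},
                    {"node": {"name": "another-repo"}},
                ]
            }
        }
    }
}
flattened = unwrap_connections(response)
```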
You can remove the edges query if you know you aren't searching along those relationships. Cursor-based pagination will work by checking the pageInfo value hasNextPage and using endCursor as the after query parameter:
viewer {
repositories(first: 30,after:"<CURSOR_STRING>") {
totalCount
pageInfo{
hasNextPage
endCursor
}
nodes{
name
}
}
}
returns
"viewer": {
"repositories": {
"totalCount": 38,
"pageInfo": {
"hasNextPage": true,
"endCursor": "Y3Vyc29yOnYyOpHOAl/5mw=="
},
"nodes": [
{
"name": "AllStarRoom"
},
{
"name": "shimsham"
},
{
"name": "Monitor-Docs"
}
]
}
}
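The hasNextPage / endCursor loop described above can be sketched as follows. This is an illustration under assumptions: iter_repository_names and fake_fetch_page are hypothetical names, and fake_fetch_page stands in for a real HTTP call to the API. It returns canned pages of two items, with made-up cursors that are just stringified offsets.

```python
from typing import Callable, Iterator, Optional


def iter_repository_names(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[str]:
    """Yield repository names page by page until hasNextPage is false."""
    cursor = None
    while True:
        repositories = fetch_page(cursor)["viewer"]["repositories"]
        for node in repositories["nodes"]:
            yield node["name"]
        page_info = repositories["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Pass endCursor as the "after" argument of the next request.
        cursor = page_info["endCursor"]


def fake_fetch_page(after: Optional[str]) -> dict:
    # Stand-in for the real request; the cursor format here is invented.
    names = ["AllStarRoom", "shimsham", "Monitor-Docs"]
    start = int(after) if after else 0
    end = start + 2
    return {
        "viewer": {
            "repositories": {
                "totalCount": len(names),
                "pageInfo": {"hasNextPage": end < len(names), "endCursor": str(end)},
                "nodes": [{"name": name} for name in names[start:end]],
            }
        }
    }


all_names = list(iter_repository_names(fake_fetch_page))
```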

Validate a $Location in Firebase

I am trying to do some validation on incoming data into my firebase app. My structure is at the bottom. I have removed existing validation rules for clarity - however we can assume that reads and writes are allowed at the root rules level.
$categoryid will look something like this:
1234: {1:{...}, 2:{...}, 3:{...}}
I want to ensure that $categoryid (which is 1234 in the above example) is numerical - however the rule ".validate": "$categoryid.isNumeric()" results in a "no such method or property" error.
I could check for data.child($categoryid) in categories, however the variable doesn't exist at that level and results in an "unknown variable" error.
I'm sure I'm missing a trick here...
{
"rules": {
"categories": {
"$categoryid": {
"$itemid": {
"members": {
"$id": {
}
}
}
}
}
}
}
There is currently no good way to do this, but there is a hacky workaround that involves storing the $categoryid in a field, then checking that the field is numeric.
Using these security rules:
{
"rules": {
"categories": {
"$categoryid": {
".validate": "'' + newData.child('meta/id').val() === $categoryid && newData.child('meta/id').isNumber()",
"meta": {},
"items": {
"$itemid": {
"members": {
"$id": {
}
}
}
}
}
}
}
}
We can then create a new category by running:
categoriesRef.child(1234).set({meta: {id: 1234}});
These rules check that a) $categoryid matches $categoryid/meta/id and b) $categoryid/meta/id is a number.
To do this validation you can use the regular expression /^[0-9]+$/:
{
"rules": {
"categories": {
"$categoryid": {
".validate": "$categoryid.matches(/^[0-9]+$/)",
"$itemid": {
"members": {
"$id": {
}
}
}
}
}
}
}
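The same pattern can be checked on the client before writing, which gives earlier feedback than a rejected write. This is a minimal sketch in Python. is_valid_category_id is a hypothetical helper, not part of any Firebase SDK, and it only mirrors the regular expression from the rule above. The security rule remains the actual enforcement.

```python
import re

# Equivalent of the rule's /^[0-9]+$/ pattern; fullmatch requires the whole
# key to consist of digits, so no explicit anchors are needed.
NUMERIC_KEY = re.compile(r"[0-9]+")


def is_valid_category_id(category_id: str) -> bool:
    """Return True when the key is a non-empty string of digits."""
    return NUMERIC_KEY.fullmatch(category_id) is not None


for candidate in ["1234", "12a4", ""]:
    print(repr(candidate), is_valid_category_id(candidate))
```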
