Data not being written to OrientDB - etl

I have a single-column CSV file with the column name Links in the first row and a link in each subsequent row. The following is my ETL file:
{
  "source": { "file": { "path": "fb22.csv" } },
  "extractor": {
    "csv": {
      "columns": ["Links:string"]
    }
  },
  "transformers": [
    { "vertex": { "class": "Links" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/fblinks1",
      "dbType": "graph",
      "classes": [
        { "name": "Links", "extends": "V" }
      ]
    }
  }
}
I have a database with no vertices created. When I run oetl fblinks.json, I get the following in the console:
E:\orientdb-server\bin>oetl fblinks.json
OrientDB etl v.2.2.29 (build 9914189f972103907c24377a1567897e68642920) https://www.orientdb.com
[file] INFO Load from file fb22.csv
[csv] INFO column types: {Links=STRING}
[file] INFO Reading from file fb22.csv with encoding UTF-8
[orientdb] INFO committing
E:\orientdb-server\bin>
But no data is written to the database. Please advise: what am I doing wrong?
Thanks in advance.

Related

How to filter data on whether a field is present or not in an object in GraphQL

Below is my sample data, which is stored in Dgraph:
{
  "data": {
    "comp": [
      {
        "topologytemplate": {
          "namespace": "a",
          "nodetemplates": [
            { "name": "a" },
            { "name": "b" }
          ]
        }
      },
      {
        "topologytemplate": {
          "namespace": "c",
          "nodetemplates": [
            { "name": "a" },
            {
              "name": "b",
              "directives": ["a"]
            }
          ]
        }
      }
    ]
  }
}
I want to filter the data so that the result contains only the entries that do not have the "directives" field. I want to do this filtering with a GraphQL query.
Currently, I am trying to filter the data as follows:
query {
  comp(func: eq(dgraph.type, "QQQ")) {
    name
    topologytemplate {
      nodetemplates #filter (eq(nodetypename,"a")){
        name
        directives
      }
    }
  }
}
How can I write a query that checks whether the directives field is present in a nodetemplate?

Why does FaunaDB output differ from GraphiQL?

I have created a simple user.gql file
type Query {
  users: [user]
  userById(id: ID!): user
}

type user {
  id: ID!
  chat_data: String
}
My data is
[
  {
    "id": "0815960b-9725-48d5-b326-7718c4749cf5",
    "chat_data": ""
  }
]
When I run this on my local server and use the query
{users{id}}
I see the expected output
{
  "data": {
    "users": [
      {
        "id": "0815960b-9725-48d5-b326-7718c4749cf5"
      }
    ]
  }
}
I have created a user collection on FaunaDB with the data
{
  "ref": Ref(Collection("user"), "324407037973758152"),
  "ts": 1645691670220000,
  "data": {
    "id": "0815960b-9725-48d5-b326-7718c4749cf5",
    "chat_data": ""
  }
}
and uploaded my user.gql, but when I run the GraphQL query
{users{id}}
I get the error
{
  "data": null,
  "errors": [
    {
      "message": "Cannot query field 'id' on type 'userPage'. (line 3, column 5):\n id\n ^",
      "locations": [
        { "line": 3, "column": 5 }
      ]
    }
  ]
}
What am I doing wrong?
This is very unintuitive, but Fauna returns a paginated result; see the Fauna documentation on pagination for details.
The best thing would be to use GraphiQL to have a look at the schema of the Fauna GraphQL endpoint. Autocomplete should also work when you look for fields to query. The error basically says that you can't query id directly. Try this:
{ users { data { id } } }
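If you want to check the result outside of GraphiQL, here is a minimal Python sketch (not part of the original answer) that sends the corrected query. The graphql.fauna.com endpoint and the Bearer-token header are assumptions based on Fauna's hosted GraphQL API, and YOUR_FAUNA_SECRET is a placeholder:

import requests

FAUNA_GRAPHQL_URL = "https://graphql.fauna.com/graphql"  # hosted endpoint (assumed)
FAUNA_SECRET = "YOUR_FAUNA_SECRET"                       # database secret (placeholder)

# Note the extra 'data' level coming from Fauna's userPage wrapper type.
query = "{ users { data { id } } }"

resp = requests.post(
    FAUNA_GRAPHQL_URL,
    json={"query": query},
    headers={"Authorization": f"Bearer {FAUNA_SECRET}"},
)
print(resp.json())  # expected shape: {"data": {"users": {"data": [{"id": "..."}]}}}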

Unable to parse schemas received from the schema registry while tracking Oracle database changes

I am using Confluent and kafka-connect-oracle (https://github.com/erdemcer/kafka-connect-oracle) to track changes in an Oracle Database 11g XE instance, and I can see the schema content using the Schema Registry API, e.g. "curl -X GET http://localhost:8081/schemas/ids/44":
{"subject":"TEST.KAFKAUSER.TEST-value","version":1,"id":44,"schema":"{"type":"record","name":"row","namespace":"test.kafkauser.test","fields":[{"name":"SCN","type":"long"},{"name":"SEG_OWNER","type":"string"},{"name":"TABLE_NAME","type":"string"},{"name":"TIMESTAMP","type":{"type":"long","connect.version":1,"connect.name":"org.apache.kafka.connect.data.Timestamp","logicalType":"timestamp-millis"}},{"name":"SQL_REDO","type":"string"},{"name":"OPERATION","type":"string"},{"name":"data","type":["null",{"type":"record","name":"value","namespace":"","fields":[{"name":"ID","type":["null","double"],"default":null},{"name":"NAME","type":["null","string"],"default":null}],"connect.name":"value"}],"default":null},{"name":"before","type":["null","value"],"default":null}],"connect.name":"test.kafkauser.test.row"}","deleted":false}
However, this schema cannot be parsed by Confluent's schema registry client in Python:
from confluent.schemaregistry.client import CachedSchemaRegistryClient
from confluent.schemaregistry.serializers import MessageSerializer

schemaRegistryClientURL = "http://localhost:8081"

schema_registry_client = CachedSchemaRegistryClient(url=schemaRegistryClientURL)
schema_registry_client.get_by_id(44)
I get the following error:
Traceback (most recent call last):
File "", line 1, in
File "build/bdist.linux-x86_64/egg/confluent/schemaregistry/client/CachedSchemaRegistryClient.py", line 140, in get_by_id
confluent.schemaregistry.client.ClientError: Received bad schema from registry.
Does kafka-connect-oracle send an invalid schema to the schema registry? How can I get this schema into the proper format?
Thanks.
Looks like there is a problem with your schema: a JSON formatter says it is in an invalid format. You can check whether your JSON is formatted correctly here: https://jsonformatter.curiousconcept.com/#
Looking at it, I see two extra quote marks:
The first one is in the first row, after "schema":
The second one is in the last row, between test.row"} and ,"deleted":false}
After deleting these two, it is in valid form. If you are asking for a way to do this automatically, I don't know of one; maybe you can search for some Python code to validate and fix the JSON format.
This is the valid format:
{
  "subject": "TEST.KAFKAUSER.TEST-value",
  "version": 1,
  "id": 44,
  "schema": {
    "type": "record",
    "name": "row",
    "namespace": "test.kafkauser.test",
    "fields": [
      { "name": "SCN", "type": "long" },
      { "name": "SEG_OWNER", "type": "string" },
      { "name": "TABLE_NAME", "type": "string" },
      {
        "name": "TIMESTAMP",
        "type": {
          "type": "long",
          "connect.version": 1,
          "connect.name": "org.apache.kafka.connect.data.Timestamp",
          "logicalType": "timestamp-millis"
        }
      },
      { "name": "SQL_REDO", "type": "string" },
      { "name": "OPERATION", "type": "string" },
      {
        "name": "data",
        "type": [
          "null",
          {
            "type": "record",
            "name": "value",
            "namespace": "",
            "fields": [
              { "name": "ID", "type": ["null", "double"], "default": null },
              { "name": "NAME", "type": ["null", "string"], "default": null }
            ],
            "connect.name": "value"
          }
        ],
        "default": null
      },
      {
        "name": "before",
        "type": ["null", "value"],
        "default": null
      }
    ],
    "connect.name": "test.kafkauser.test.row"
  },
  "deleted": false
}
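If you want to do this programmatically rather than by hand, a small Python sketch along these lines might help. This is only an illustration, not tested against your setup; it assumes the registry actually returns the "schema" field as a properly escaped JSON string, which is what the Schema Registry REST API normally does:

import json
import requests

REGISTRY_URL = "http://localhost:8081"
SCHEMA_ID = 44

# Fetch the outer registry payload, e.g. {"schema": "<escaped Avro schema string>", ...}
resp = requests.get(f"{REGISTRY_URL}/schemas/ids/{SCHEMA_ID}")
resp.raise_for_status()
payload = resp.json()

# The "schema" field is itself a JSON document encoded as a string,
# so it has to be parsed a second time to get the Avro record schema.
avro_schema = json.loads(payload["schema"])

print(avro_schema["name"])                         # "row"
print([f["name"] for f in avro_schema["fields"]])  # SCN, SEG_OWNER, ...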

Loading Numeric data into BigQuery with Avro files created with goavro

I am trying to figure out how to load dollar values into a NUMERIC column in BigQuery using an Avro file. I am using Go and the goavro package to generate the Avro file.
It appears that the appropriate data type in Go for handling money is big.Rat.
The BigQuery documentation indicates it should be possible to use Avro for this.
I can see from a few goavro test cases that encoding a *big.Rat into a fixed.decimal type is possible.
I am using a goavro.OCFWriter to encode data using a simple Avro schema, as follows:
{
  "type": "record",
  "name": "MyData",
  "fields": [
    {
      "name": "ID",
      "type": ["string"]
    },
    {
      "name": "Cost",
      "type": [
        "null",
        {
          "type": "fixed",
          "size": 12,
          "logicalType": "decimal",
          "precision": 4,
          "scale": 2
        }
      ]
    }
  ]
}
I am attempting to Append data with the "Cost" field as follows:
map[string]interface{}{"fixed.decimal": big.NewRat(617, 50)}
This is successfully encoded, but the resulting Avro file fails to load into BigQuery:
Err: load Table MyTable Job: {Location: ""; Message: "Error while reading data, error message: The Apache Avro library failed to parse the header with the following error: Missing Json field \"name\": {\"logicalType\":\"decimal\",\"precision\":4,\"scale\":2,\"size\":12,\"type\":\"fixed\"}"; Reason: "invalid"}
So I am doing something wrong here... hoping someone can point me in the right direction.
I figured it out: I need to use bytes.decimal instead of fixed.decimal.
{
  "type": "record",
  "name": "MyData",
  "fields": [
    {
      "name": "ID",
      "type": ["string"]
    },
    {
      "name": "Cost",
      "type": [
        "null",
        {
          "type": "bytes",
          "logicalType": "decimal",
          "precision": 4,
          "scale": 2
        }
      ]
    }
  ]
}
Then encode similarly
map[string]interface{}{"bytes.decimal": big.NewRat(617, 50)}
And it works nicely!
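For anyone wondering why bytes works here: the Avro decimal logical type stores the unscaled integer value as a big-endian two's-complement byte array, which is the representation BigQuery expects for NUMERIC. Below is a small illustrative Python sketch of that layout; the helper is hypothetical and only shows the byte representation (goavro does this conversion for you when you pass a *big.Rat):

from decimal import Decimal

def avro_decimal_bytes(value: Decimal, scale: int) -> bytes:
    """Big-endian two's-complement bytes of the unscaled value, as Avro's decimal logical type stores it."""
    unscaled = int(value.scaleb(scale))                # 12.34 at scale 2 -> 1234
    length = max(1, (unscaled.bit_length() + 8) // 8)  # enough bytes to include the sign bit
    return unscaled.to_bytes(length, byteorder="big", signed=True)

print(avro_decimal_bytes(Decimal("12.34"), 2))  # b'\x04\xd2' (617/50 = 12.34, unscaled 1234)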

OrientDB: slow import of a large dataset, how to make it faster?

I'm working on a network of 17M edges and 20K vertices, and I'm loading it into OrientDB using the ETL tool, but it is taking forever to load.
I tried varying the batch size from 1000 to 100000, yet there was still no change.
Is there an optimized way to make it load faster, other than using the Java API?
Any help would be appreciated.
I'm using the 2.2.20 community version.
Here is the ETL configuration for the import:
{
  "source": { "file": { "path": "C:/Users/Muuna/Desktop/files/H.csv" } },
  "extractor": {
    "csv": {
      "separator": ",",
      "columnsOnFirstLine": true,
      "ignoreEmptyLines": true,
      "columns": ["id:Integer", "p1:String", "p2:String", "s:Integer"]
    }
  },
  "transformers": [
    {
      "command": { "command": "UPDATE H set p='${input.p1}' UPSERT WHERE p='${input.p1}'" },
      "vertex": { "class": "H", "skipDuplicates": true }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "PLOCAL:C:/orientdb/databases/Graph",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbType": "graph",
      "classes": [
        { "name": "H", "extends": "V" },
        { "name": "HAS_S", "extends": "E" }
      ],
      "indexes": [
        { "class": "H", "fields": ["p:String"], "type": "UNIQUE" }
      ]
    }
  }
}
Based on [1]: orientdb load graph csv of nodes and edges
The same script is run twice to import the two vertices, and a separate ETL configuration loads the edges.
Edges (based on [1]):
{
  "source": { "file": { "path": "C:/Users/Muuna/Desktop/files/H.csv" } },
  "extractor": {
    "csv": {
      "separator": ",",
      "columnsOnFirstLine": true,
      "ignoreEmptyLines": true,
      "columns": ["id:Integer", "p1:String", "p2:String", "s:Integer"]
    }
  },
  "transformers": [
    {
      "command": { "command": "CREATE EDGE HAS_S FROM (SELECT FROM H WHERE p='${input.p1}') TO (SELECT FROM H WHERE p='${input.p2}') set score=${input.s}" }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "PLOCAL:C:/orientdb/databases/Graph",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbType": "graph",
      "classes": [
        { "name": "H", "extends": "V" },
        { "name": "HAS_S", "extends": "E" }
      ],
      "indexes": [
        { "class": "H", "fields": ["p:String"], "type": "UNIQUE" }
      ]
    }
  }
}
