Oracle NoSQL Database: How to import JSON documents into a specific column? - oracle-nosql

I have a question about the import command in the Oracle NoSQL Database SQL shell tool. I am wondering if there is an option to read a file and put the data into a JSON column instead of putting the data into individual table columns. Currently, it matches the fields from the JSON file to the columns in the table. Something like:
import -table <table_name> -column <myjsoncolumn> -file <myfile>
Here is an example (simplified version):
sql-> create table stat (reportTime long, reportTimeHuman string , primary key (reportTime));
Statement completed successfully
sql-> import -table stat -file file.json
Loaded 736 rows to stat.
sql-> select * from stat limit 5;
{"reportTime":1624370080000,"reportTimeHuman":"2021-06-22 13:54:40.000 UTC"}
{"reportTime":1624366760000,"reportTimeHuman":"2021-06-22 12:59:20.000 UTC"}
{"reportTime":1624368660000,"reportTimeHuman":"2021-06-22 13:31:00.000 UTC"}
{"reportTime":1624370980002,"reportTimeHuman":"2021-06-22 14:09:40.002 UTC"}
Instead, I want to do the following:
CREATE TABLE IF NOT EXISTS stat
( id INTEGER GENERATED ALWAYS AS IDENTITY, myJson JSON, PRIMARY KEY (id))
import -table stat -column myJson -file file.json
sql-> select myJson from stat limit 5;
{"reportTime":1624370080000,"reportTimeHuman":"2021-06-22 13:54:40.000 UTC"}
{"reportTime":1624366760000,"reportTimeHuman":"2021-06-22 12:59:20.000 UTC"}
{"reportTime":1624368660000,"reportTimeHuman":"2021-06-22 13:31:00.000 UTC"}
{"reportTime":1624370980002,"reportTimeHuman":"2021-06-22 14:09:40.002 UTC"}
I am expecting to have the JSON documents in the myJson column; id is a generated value in this case.

No, there is no such option in the SQL shell, but the Oracle NoSQL Database Migrator tool can do this.
Step 1: create your table
Step 2: create the following configuration file, ./migrator-export.json:
{
  "source" : {
    "type" : "file",
    "format" : "json",
    "dataPath" : "/data/test/kvstore_export/"
  },
  "sink" : {
    "type" : "nosqldb",
    "storeName" : "OUG",
    "helperHosts" : ["localhost:5000"],
    "table" : "stat",
    "requestTimeoutMs" : 5000
  },
  "transforms" : {
    "aggregateFields" : {
      "fieldName" : "myJson",
      "skipFields" : []
    }
  },
  "abortOnError" : true,
  "migratorVersion" : "1.0.0"
}
Step 3: execute the Migrator command using the previous configuration file.
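For example, from the directory where the Migrator utility is unpacked, the run might look like this (the runMigrator script name and --config option are assumed from the standard Migrator distribution; adjust the path to your environment):
./runMigrator --config ./migrator-export.json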
Alternatively, you can follow the interactive wizard:
Would you like to create table as part of migration process?
Use this option if you want to create table through the migration tool.
If you select yes, you will be asked to provide a file that contains table DDL or to use default schema.
(y/n) (n): y
We identified source as file.
Would you like to use below default schema?
CREATE TABLE IF NOT EXISTS stat(id LONG GENERATED ALWAYS AS IDENTITY(CACHE 5000), document JSON, PRIMARY KEY(SHARD(id)))
Where 'id' will be auto generated and 'document' is all the
fields aggregated into one JSON column.
Please note that tool will internally create table with above mentioned schema.
For aggregation below transforms are applied
internally by the tool.
"transforms" : {
  "aggregateFields" : {
    "fieldName" : "document"
  }
}
(y/n) (n): y

Related

How to update table from JSON flowfile

I have flow-files with the below structure:
{
  "PN" : "U0-WH",
  "INPUT_DATE" : "44252.699895833335",
  "LABEL" : "Marker",
  "STATUS" : "Approved"
}
and I need to execute an update statement using some fields
update table1 set column1 = 'value' where pn=${PN}
I found ConvertJSONToSQL but am not sure how to use it in this case.
You can use a processor named ConvertJSONToSQL. Using this you can convert your JSON into an update query.
It takes the following parameters:
1. JDBC Connection Pool : Create a JDBC pool which takes DB connection information as input.
2. Statement Type : Here you need to provide the type of statement you want to create. In your case it's 'UPDATE'.
3. Table Name : Name of the table for which the update query needs to be created.
4. Schema Name : Name of the schema of your database.
5. Translate Field Names : If true, the Processor will attempt to translate JSON field names into the appropriate column names for the table specified. If false, the JSON field names must match the column names exactly, or the column will not be updated
6. Unmatched Field Behaviour : If an incoming JSON element has a field that does not map to any of the database table's columns, this property specifies how to handle the situation.
7. Unmatched Column Behaviour : If an incoming JSON element does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation.
8. Update Keys : A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. This property is ignored if the Statement Type is INSERT
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Read the description above and try to use the properties given. Detailed description of the processor is given in the link.
ConvertJSONToSQL Description
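For the example flow-file above, with Statement Type set to UPDATE and Update Keys set to PN, the processor would emit a parameterized statement roughly along these lines (an illustrative sketch, not verbatim output; the actual values are carried as sql.args.N.value flow-file attributes and bound by a downstream PutSQL processor):
UPDATE table1 SET INPUT_DATE = ?, LABEL = ?, STATUS = ? WHERE PN = ?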

Rule-based mapping on Copy Activity in Azure Data Factory

I'm trying to create a dynamic mapping when I use a copy data activity in Azure Data Factory.
I want to create a parquet file that contains the same data that I'm reading from the source, but I want to modify some column names to remove the white spaces in them (a limitation of the Parquet format), and I want to do that automatically.
I have seen that this is possible in mapping data flow, but I don't see any such functionality on Copy Activity (Mapping data flow is limited to a few connectors as a source, so I can't use it).
As you can see in the image, it seems that I can only modify individual columns, not a set of them that fulfil certain conditions.
How can I do that?
Thanks in advance
Here is a solution to apply a dynamic column name mapping with ADF so that you can still use the copy data activities with parquet format, even when the source column names have pesky white-space characters which are not supported.
The solution involves three parts:
Dynamically generate your list of mapped column names. The example below demonstrates how you could encode the white-space from an SQL database table source dataset dynamically with a lookup activity (referred to as 'lookup column mapping' below).
;with cols as (
  select
    REPLACE(column_name, ' ', '__wspc__') as new_name,
    column_name as old_name
  from INFORMATION_SCHEMA.columns
  where table_name = '#{pipeline().parameters.SOURCE_TABLE}'
    and table_schema = '#{pipeline().parameters.SOURCE_SCHEMA}'
)
select ' |||'+old_name+'||'+new_name+'|' as mapping
from cols;
Use an expression to repack the column mapping derived in the lookup activity in step 1 into the JSON syntax expected by the copy data activity template. You can insert this into a Set Variable activity with an Array type variable (referred to as 'column_mapping_list' below).
#json(
  concat(
    '[ ',
    join(
      split(
        join(
          split(
            join(
              split(
                join(
                  xpath(
                    xml(
                      json(
                        concat(
                          '{\"root_xml_node\": ',
                          string(activity('lookup column mapping').output),
                          '}'
                        )
                      )
                    ),
                    '/root_xml_node/value/mapping/text()'
                  ),
                  ','
                ),
                '|||'
              ),
              '{\"source\": { \"name\": \"'
            ),
            '||'
          ),
          '\" },\"sink\": { \"name\": \"'
        ),
        '|'
      ),
      '\" }}'
    ),
    ' ]'
  )
)
Unfortunately the expression is more convoluted than we would like, as the xpath function requires a single root node which is not provided by the lookup activity output, and the string escaping of the ADF JSON templates presents some challenges to simplifying this.
Lastly, use the column mapping list variable as "dynamic content" in the mapping section of the copy data activity with the following expression:
#json(
  concat(
    '{ \"type\": \"TabularTranslator\", \"mappings\":',
    string(variables('column_mapping_list')),
    '}'
  )
)
Expected results:
Step 1.
'my wspccol' -> '|||my wspccol||my__wspc__wspccol|'
Step 2.
'|||my wspccol||my__wspc__wspccol|' -> ['{ "source": { "name": "my wspccol" }, "sink": { "name": "my__wspc__wspccol" } }']
Step 3.
{
  "type": "TabularTranslator",
  "mappings": [
    {
      "source": { "name": "my wspccol" },
      "sink": { "name": "my__wspc__wspccol" }
    }
  ]
}
Additionally:
Keep in mind that the solution can just as easily be reversed, so that if you want to load that parquet file back into an SQL table with the original column names, you can use the same expressions to build your dynamic copy data mapping; just switch over the old_name and new_name values in step 1 to map back to the original names.
A data type can also be specified in the mapping where needed (see the sketch below). Adjust the syntax accordingly, following the documentation here: https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.datafactory.models.tabulartranslator.mappings?view=azure-dotnet
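For instance, a single mapping entry with explicit types might look like the following (a sketch based on the TabularTranslator documentation; the column names are the illustrative ones from above):
{
  "source": { "name": "my wspccol", "type": "String" },
  "sink": { "name": "my__wspc__wspccol", "type": "String" }
}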
The Copy activity can convert from one file type to another (e.g. CSV to JSON, or Parquet to a database), but it does not inherently allow any transform, such as changing the content of columns or adding additional columns.
Alternatively, consider using ADF to call a Databricks notebook for these complex rule-based transforms.

Set data to a specific name with Spring & MongoDB

My JSON data:
[{"cameraid":"000000001","timestamp":"2016-06-17 23:08","filename":"7e3800fbd0557c683874ed2f41ed7057"},
{"cameraid":"000000002","timestamp":"2016-06-17 23:08","filename":"b260cc730da88a6af4e5038d6e1e32db"}]
How can I link a cameraid to a specific name?
For example, I want cameraid "000000001" to be called bedok.
Does anybody have an idea on how to do it?
Create another collection with names (let's call it db_name) and link the cameraid to the _id of the db_name collection. This way you can fetch the names using cameraid. This is much like the primary key and foreign key concept in relational databases (RDBMS).
More on this with code here: Primary Key and Foreign Key Concept in MongoDB
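A minimal sketch of that approach in the mongo shell, using a hypothetical camera_names reference collection and a records collection (the collection and field names are chosen for illustration only):
// Reference collection: one document per camera id, holding its display name.
db.camera_names.insert({ _id: "000000001", name: "bedok" })
// Resolve the display name for a record via its cameraid.
var rec = db.records.findOne({ cameraid: "000000001" })
var cam = db.camera_names.findOne({ _id: rec.cameraid })
print(cam.name)  // prints "bedok"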
Assuming you want to add a name field for a specific cameraid, try the following query:
db.collname.update({"cameraid":"000000001"},{$set : {name : 'bedok'}});
EDIT:-
The above query will update only one record which matches the query {"cameraid":"000000001"}.
Add multi:true to the query to update multiple records.
db.collname.update({"cameraid":"000000001"},{$set : {name : 'bedok'}},{multi : true});
Now it will update all the records that matches the query {"cameraid":"000000001"}.

Laravel Eloquent - get distinct values of one column

I have a MongoDB collection organized like the following document:
{ "_id" : ObjectId("55f699bb1638cf0f4ba139f1"), "code" : 169, "categories" : "Consulenti del lavoro", "listing" : "B", "macrocategory" : "Consulenti" }
I need to get an array of the macrocategory field, but it's assigned to multiple documents and I need only one occurrence of each value. I tried using distinct in my Eloquent query, but I couldn't get it to work because I don't know what the return value is.
My query now, which returns the macrocategories multiple times:
$macrocategories = Category::orderBy('macrocategory','asc')->get();
Thanks!
You are supposed to use distinct when you want an entire row to be unique; when you want a specific column to be unique you should use GROUP BY.
I guess the distinct column would be the same as your order by column:
$macrocategories = Category::orderBy('macrocategory','asc')->groupBy('macrocategory')->get();
By the way, if you have columns that you don't need, you should use select and fetch only the columns you do need.
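For instance, combining select with the grouping above so that only the macrocategory column is fetched (a sketch against the Category model shown in the question):
// One result per distinct macrocategory, fetching only that column.
$macrocategories = Category::select('macrocategory')
    ->groupBy('macrocategory')
    ->orderBy('macrocategory', 'asc')
    ->get();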

Grails GORM find retrieves different date value compared to database

In my Oracle database, there is an Agreement table with a column effectivityDate with a data type of DATE. When I try to query a certain row
select * from agreement where id = 'GB'
it returns a row with this value:
id: GB
name: MUITF - Double bypass
...
effectivityDate: 7/2/2015
I created a Grails Domain class for this:
class Agreement implements Serializable {
    String id
    Date effectivityDate

    static mapping = {
        table "agreement"
        version false
        id column: "id"
        name column: "name"
        ...
        effectivityDate column: "effectivityDate"
    }
}
But when I tried to query it in Groovy using:
Agreement a = Agreement.findById("GB")
println a
It returns this object:
[id:GB, name:MUITF - Double bypass, ..., effectivityDate: 2015-07-01T16:00:00Z]
^^^^^^^^^^^^^^^^^^^^
My question is, why would the date fetched directly from the database be different from the one retrieved by GORM? Does this have something to do with time zones?
I just saw in your profile that you are from the Philippines (PHT, GMT+8).
Since 2015-07-01T16:00:00Z === 2015-07-02T00:00:00+08:00, the most likely cause is that you are using the PHT time zone to display the date when querying the database directly and the GMT/Zulu time zone when querying/displaying with Groovy/Grails.
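To see the effect, here is a quick Groovy sketch (illustrative only; the epoch millis below correspond to 2015-07-01T16:00:00Z) that formats the same instant in both zones:
import java.text.SimpleDateFormat

// The single instant stored in the database column.
Date effectivityDate = new Date(1435766400000L) // 2015-07-01T16:00:00Z

def utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX")
utc.timeZone = TimeZone.getTimeZone('UTC')

def pht = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX")
pht.timeZone = TimeZone.getTimeZone('Asia/Manila')

println utc.format(effectivityDate)  // 2015-07-01T16:00:00Z
println pht.format(effectivityDate)  // 2015-07-02T00:00:00+08:00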
