Does CockroachDB support JSON?

I’m interested in trying out CockroachDB, but typically SQL databases don’t natively support JSON. Is there a way for me to access fields of JSON objects in queries if I store them in CockroachDB?

UPDATE: CockroachDB now supports JSON. From the blog post "Be Flexible & Consistent: JSON Comes to CockroachDB":
"We are excited to announce support for JSON in our 2.0 release (coming in April) and available now via our most recent 2.0 Beta release. Now you can use both structured and semi-structured data within the same database."

CockroachDB supports JSON. It stores JSON data using the JSONB data type (binary JSON).
CREATE TABLE my_table1 (
id INT PRIMARY KEY,
data JSONB
);
The size of a JSONB field is variable but should be kept within 1 MB to ensure satisfactory performance.
We can insert JSON strings as follows:
INSERT INTO my_table1 (id, data)
VALUES
(1, '{"name": "Mary", "age": 16, "city": "Singapore"}'::JSONB),
(2, '{"name": "John", "age": 17, "city": "Malaysia" }'::JSONB),
(3, '{"name": "Pete", "age": 18, "city": "Vienna" }'::JSONB),
(99,'{"name": "Anna", "gender": "Female" }'::JSONB);
SELECT * from my_table1;
id | data
-----+---------------------------------------------------
1 | {"age": 16, "city": "Singapore", "name": "Mary"}
2 | {"age": 17, "city": "Malaysia", "name": "John"}
3 | {"age": 18, "city": "Vienna", "name": "Pete"}
99 | {"gender": "Female", "name": "Anna"}
(4 rows)
Return rows showing the age and city subfields of the data field, as JSONB:
SELECT id,
data->'age' AS "age",
data->'city' AS "city"
FROM my_table1;
id | age | city
-----+------+--------------
1 | 16 | "Singapore"
2 | 17 | "Malaysia"
3 | 18 | "Vienna"
99 | NULL | NULL
(4 rows)
Return rows if the data field has an age subfield:
SELECT id,
data->'age' AS "age",
data->'city' AS "city"
FROM my_table1
WHERE data ? 'age';
id | age | city
-----+-----+--------------
1 | 16 | "Singapore"
2 | 17 | "Malaysia"
3 | 18 | "Vienna"
(3 rows)
Return rows if the age subfield (as a string) is equal to 17:
SELECT * FROM my_table1 WHERE data->>'age' = '17';
id | data
-----+--------------------------------------------------
2 | {"age": 17, "city": "Malaysia", "name": "John"}
(1 row)
Return rows if the age subfield (as JSONB) is equal to 17:
SELECT * FROM my_table1 WHERE data->'age' = '17'::JSONB;
id | data
-----+--------------------------------------------------
2 | {"age": 17, "city": "Malaysia", "name": "John"}
(1 row)
Return rows if the name and gender subfields exist in the data field:
SELECT * FROM my_table1 WHERE data ?& ARRAY['name', 'gender'];
id | data
-----+---------------------------------------
99 | {"gender": "Female", "name": "Anna"}
(1 row)
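If you run these key lookups often, CockroachDB also lets you create an inverted index on a JSONB column and filter with the containment operator @>. A minimal sketch against the my_table1 table above; the index name is just illustrative:
-- Inverted index to speed up JSONB containment queries on the data column
CREATE INVERTED INDEX idx_my_table1_data ON my_table1 (data);
-- Containment: return rows whose data contains the given key/value pair
SELECT * FROM my_table1 WHERE data @> '{"city": "Vienna"}'::JSONB;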

Related

How do I Update Oracle database by JSON data with 1 query?

I am trying to update an Oracle 19 table using the sample JSON data below (I want to update 1000 rows from the JSON with one query):
create table jt_test (
CUST_NUM int, SORT_ORDER int, CATEGORY varchar2(100)
);
[
{"CUST_NUM": 12345, "SORT_ORDER": 1, "CATEGORY": "ICE CREAM"},
{"CUST_NUM": 12345, "SORT_ORDER": 2, "CATEGORY": "ICE CREAM"},
{"CUST_NUM": 12345, "SORT_ORDER": 3, "CATEGORY": "ICE CREAM"}
]
I used this tutorial and this one to insert rows from JSON, and that works perfectly. But I have no idea how to update rows. How can I do it?
Note: I use Oracle 19c, and I connect and insert to the DB with the cx_Oracle Python module.
Code for inserting from JSON into Oracle columns:
DECLARE
myJSON varchar2(1000) := '[
{"CUST_NUM": 12345, "SORT_ORDER": 1, "CATEGORY": "ICE CREAM"},
{"CUST_NUM": 12345, "SORT_ORDER": 2, "CATEGORY": "ICE CREAM"},
{"CUST_NUM": 12345, "SORT_ORDER": 3, "CATEGORY": "ICE CREAM"}
]';
BEGIN
insert into jt_test
select * from json_table ( myjson, '$[*]'
columns (
CUST_NUM, SORT_ORDER, CATEGORY
)
);
END;
In SQL Developer, use the code below:
MERGE INTO jt_test destttt USING (
    SELECT CUST_NUM, SORT_ORDER, CATEGORY FROM json_table (
        '[
        {"CUST_NUM": 12345, "SORT_ORDER": 1, "CATEGORY": "ICE CREAM"},
        {"CUST_NUM": 12345, "SORT_ORDER": 2, "CATEGORY": "ICE CREAM"},
        {"CUST_NUM": 12345, "SORT_ORDER": 3, "CATEGORY": "ICE CREAM"}
        ]'
        , '$[*]'
        COLUMNS
            CUST_NUM int PATH '$.CUST_NUM',
            SORT_ORDER int PATH '$.SORT_ORDER',
            CATEGORY varchar2 PATH '$.CATEGORY' ) ) srccccc
ON ( destttt.CUST_NUM = srccccc.CUST_NUM )
WHEN MATCHED THEN UPDATE SET destttt.CATEGORY = srccccc.CATEGORY
WHEN NOT MATCHED THEN INSERT ( CUST_NUM, SORT_ORDER, CATEGORY ) VALUES ( srccccc.CUST_NUM, srccccc.SORT_ORDER, srccccc.CATEGORY );
In Python with cx_Oracle, use the code below:
long_json_string = '''[
{"CUST_NUM": 12345, "SORT_ORDER": 1, "CATEGORY": "ICE CREAM"},
{"CUST_NUM": 12345, "SORT_ORDER": 2, "CATEGORY": "ICE CREAM"},
{"CUST_NUM": 12345, "SORT_ORDER": 3, "CATEGORY": "ICE CREAM"}
]'''
sql = '''
DECLARE jsonvalue CLOB := :long_json_string ;
begin
MERGE INTO jt_test destttt using(
SELECT CUST_NUM,SORT_ORDER,CATEGORY FROM json_table (jsonvalue
,'$[*]'
COLUMNS
CUST_NUM int PATH '$.CUST_NUM',
SORT_ORDER int PATH '$.SORT_ORDER',
CATEGORY varchar2 PATH '$.CATEGORY' ) ) srccccc
ON ( destttt.CUST_NUM= srccccc.CUST_NUM)
WHEN MATCHED THEN UPDATE SET destttt.CATEGORY=srccccc.CATEGORY
WHEN NOT MATCHED THEN INSERT ( CUST_NUM,SORT_ORDER,CATEGORY) VALUES (srccccc.CUST_NUM,srccccc.SORT_ORDER,srccccc.CATEGORY);
end;
'''
cursor.execute(sql, long_json_string=long_json_string)
Note 1: Do not forget to COMMIT at the end.
Note 2: Make sure that the column you use as the merge key is not repeated in the JSON, or the MERGE will fail with an error (a deduplication sketch follows these notes).
Note 3: JSON keys are case sensitive, i.e. CUST_NUM is different from cust_num, CUST_num, and so on.
Wrong: CUST_NUM int PATH '$.CUST_num' or CUST_NUM int PATH '$.cust_num'
Ok: CUST_NUM int PATH '$.CUST_NUM'
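If the incoming JSON can repeat the same CUST_NUM (Note 2), one way to guard against that is to deduplicate the json_table output before merging. This is only a minimal sketch, assuming you want to keep the row with the highest SORT_ORDER per CUST_NUM; the dest/src aliases and that ordering choice are illustrative, and :long_json_string is bound as in the Python example above:
MERGE INTO jt_test dest
USING (
    SELECT CUST_NUM, SORT_ORDER, CATEGORY
    FROM (
        -- Rank the JSON rows per CUST_NUM so we can keep exactly one of them
        SELECT j.CUST_NUM, j.SORT_ORDER, j.CATEGORY,
               ROW_NUMBER() OVER (PARTITION BY j.CUST_NUM
                                  ORDER BY j.SORT_ORDER DESC) AS rn
        FROM json_table(:long_json_string, '$[*]'
                 COLUMNS (
                     CUST_NUM   NUMBER        PATH '$.CUST_NUM',
                     SORT_ORDER NUMBER        PATH '$.SORT_ORDER',
                     CATEGORY   VARCHAR2(100) PATH '$.CATEGORY'
                 )) j
    )
    WHERE rn = 1   -- keep one row per CUST_NUM to avoid the duplicate-key MERGE error
) src
ON (dest.CUST_NUM = src.CUST_NUM)
WHEN MATCHED THEN UPDATE SET dest.CATEGORY = src.CATEGORY
WHEN NOT MATCHED THEN INSERT (CUST_NUM, SORT_ORDER, CATEGORY)
    VALUES (src.CUST_NUM, src.SORT_ORDER, src.CATEGORY);
-- Note 1: commit when you are done
COMMIT;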

How to get opening balance on the first record given accumulated balance in power query

Given the table below, where the starting date is variable (it can start from any date in a month):
Component | Type | Date | AccumulateBalance
A | PO | 31 Jan | 240
A | PO | 1 Feb | 240
B | PO | 28 Jan | 300
B | PO | 29 Jan | 300
A | SO | 31 Jan | 100
A | SO | 1 Feb | 100
I need to calculate the opening balance on the first record, given only the accumulated balance; it resets by Component + Type:
Component | Type | Date | OpenBalance
A | PO | 31 Jan | 240
A | PO | 1 Feb | 0
B | PO | 28 Jan | 300
B | PO | 29 Jan | 0
A | SO | 31 Jan | 100
A | SO | 1 Feb | 0
Any help or advice will be very much appreciated!
Thank you,
Andrea
You can use a bit of custom Power Query code pasted into Home ... Advanced Editor ...
Assuming the data is loaded into Table1, the third line adds a column that is the minimum date of all rows with matching Component and Type. The fourth line then checks whether the current row's date matches that minimum; if so, it shows the balance, otherwise zero.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Component", type text}, {"Type", type text}, {"Date", type date}, {"AccumulateBalance", Int64.Type}}),
AddMinDateColumn = Table.AddColumn(#"Changed Type", "Earliest Date", (thisrow) => List.Min(Table.SelectRows(#"Changed Type", each [Component] = thisrow[Component] and [Type] = thisrow[Type])[Date]), type date),
#"Added Custom" = Table.AddColumn(AddMinDateColumn, "OpenBalance", each if [Date]=[Earliest Date] then [AccumulateBalance] else 0)
in #"Added Custom"
Another way is to select the Component and Type columns and Group them, using the minimum of the Date column and All Rows. If you expand the AccumulateBalance column and remove duplicates, that gives you a table of the minimum dates and their values. You can then merge it back into the original table, matching on Component, Type, and Date, and expand the balance field. A sample is below, which can be pasted into Home ... Advanced Editor ...; it assumes the data is loaded into Table1.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Component", type text}, {"Type", type text}, {"Date", type date}, {"AccumulateBalance", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Component", "Type"}, {{"MinDate", each List.Min([Date]), type date}, {"Data", each _, type table}}),
#"Expanded Data" = Table.ExpandTableColumn(#"Grouped Rows", "Data", {"AccumulateBalance"}, {"Data.AccumulateBalance"}),
Table2= Table.Distinct(#"Expanded Data"),
#"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Component", "Type", "Date"},Table2,{"Component", "Type", "MinDate"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Table2", {"Data.AccumulateBalance"}, {"OpeningBalance"})
in #"Expanded Table2"

PowerQuery Table Transformation

I want to make the first column in my table pair with all the others, creating a new row for every combination of the two.
I need to be able to turn this:
A | 1 | 2 | 3
B | 4 | 5 | 6
C | 6 | 7 | 9
Into this:
A | 1
A | 2
A | 3
B | 4
B | 5
B | 6
C | 7
C | 8
C | 9
Is there any way this can be done using just Power Query?
Just unpivot other columns:
unpivot = Table.UnpivotOtherColumns(Source, {"col1"}, "a", "b")[[col1],[b]]
Are they in separate columns? Then load into Power Query, right-click the first column, and choose Unpivot Other Columns.
Are they in a single column separated by |? Then use the steps below:
Right-click the column, Split Column by Delimiter, select or enter delimiter --Custom-- |, split at each occurrence of the delimiter, and ignore the advanced options.
Remove the latter part of the query so that
= Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4"}),
becomes
= Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv))
Right-click the first column, Unpivot Other Columns.
Right-click the Attribute column, Remove Column.
Assuming the data is in Table1 with no column headers (or a header of Column1), the code is:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv)),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"Column1.1"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"})
in #"Removed Columns"

Create associations issue when inserting into an associated table

type Group struct {
    gorm.Model
    CreatedBy   uint64
    GroupOrders []GroupOrder `gorm:"many2many:group_orders;association_jointable_foreignkey:group_id;jointable_foreignkey:group_id;"`
}
type GroupOrder struct {
    gorm.Model
    GroupID uint64
    OrderID uint64
    UserID  uint64
    Group   Group
}
I am trying to insert a record like this
newGroup := &Group{
    CreatedBy: newGroupDetails.UserID,
    GroupOrders: []GroupOrder{
        {
            OrderID: newGroupDetails.OrderID,
            UserID:  newGroupDetails.UserID,
        },
    },
}
I am creating a record by using this.
db.Create(newGroup)
It creates the record correctly in the Group model, but while inserting into the GroupOrder model it inserts a NULL value into the group_id column.
After that, it fires this query:
INSERT INTO group_orders (group_id) SELECT ? FROM DUAL WHERE NOT EXISTS (SELECT * FROM group_orders WHERE group_id = ?)[30 30] 1 pkg=mysql
and then it inserts another record into the GroupOrder model with all empty fields, but with group_id set to the id of the previously inserted group_order row.
The resulting data in MySQL:
GroupOrder
| id | group_id | order_id | user_id |
+----+---------------+----------+---------+
| 30 | 0 | 8764822 | 678972 |
| 31 | 30 | NULL | NULL |
Group
| id | created_by |
+----+------------+
| 18 | 678972 |
At the very least, it should insert 18 instead of 30 in the group_id column of the last row of the GroupOrder table.
Why is this happening? Can someone explain whether this is a bug?
PS: For brevity, I removed a few other columns from both models.
Found the bug myself. Group has a has-many association with GroupOrder, not many-to-many. I removed the many2many tag and it worked cleanly.
Hope it helps someone :)

Manage insert update scheduled tasks

I'm not used to work with scheduled tasks, I need some advice (is my thought good or bad)
I'm designing a function that runs every 20 minutes. This function retrieves data from a json file (which I do not have control over) and inserts the data into the database.
When I was doing this I did not think that this will create a unique ID problem in the database view that it is the same data that updates each time.
I thought of doing two functions:
1: the first insertions (INSERT)
2: Update the data according to the ID (UPDATE)
@Component
public class LoadSportsCompetition {

    @PostConstruct
    public void insert() {
        // 1 : get JSON data
        // 2 : insert into DB
    }

    @Scheduled(cron="0 0/20 * * * ?")
    public void update() {
        // 1 : get JSON data
        // 2 : update rows by ID
    }
}
The (most probably) best way to handle this in PostgreSQL 9.5 and later is to use INSERT ... ON CONFLICT ... DO UPDATE.
Let's assume this is your original table (very simple, for the sake of this example):
CREATE TABLE tbl
(
tbl_id INTEGER,
payload JSONB,
CONSTRAINT tbl_pk
PRIMARY KEY (tbl_id)
) ;
We fill it with the starting data:
INSERT INTO tbl
(tbl_id, payload)
VALUES
(1, '{"a":12}'),
(2, '{"a":13, "b": 25}'),
(3, '{"a":15, "b": [12,13,14]}'),
(4, '{"a":12, "c": "something"}'),
(5, '{"a":13, "x": 1234.567}'),
(6, '{"a":12, "x": 1234.789}') ;
Now we perform a non-conflicting insert (i.e.: the ON CONFLICT ... DO won't be executed):
-- A normal insert, no conflict
INSERT INTO tbl
(tbl_id, payload)
VALUES
(7, '{"x": 1234.56, "y": 3456.78}')
ON CONFLICT ON CONSTRAINT tbl_pk DO
UPDATE
SET payload = excluded.payload ; -- Note: the excluded pseudo-table comprises the conflicting rows
And now we perform one INSERT that would generate a PRIMARY KEY conflict, which will be handled by the ON CONFLICT clause and will perform an update:
-- A conflicting insert
INSERT INTO tbl
(tbl_id, payload)
VALUES
(3, '{"a": 16, "b": "I don''t know"}')
ON CONFLICT ON CONSTRAINT tbl_pk DO
UPDATE
SET payload = excluded.payload ;
And now, a two row insert that will conflict on one row, and insert the other:
-- Now one of each
-- A conflicting insert
INSERT INTO tbl
(tbl_id, payload)
VALUES
(4, '{"a": 18, "b": "I will we updated"}'),
(9, '{"a": 17, "b": "I am nuber 9"}')
ON CONFLICT ON CONSTRAINT tbl_pk DO UPDATE
SET payload = excluded.payload ;
Now we check the table:
SELECT * FROM tbl ORDER BY tbl_id ;
tbl_id | payload
-----: | :----------------------------------
1 | {"a": 12}
2 | {"a": 13, "b": 25}
3 | {"a": 16, "b": "I don't know"}
4 | {"a": 18, "b": "I will we updated"}
5 | {"a": 13, "x": 1234.567}
6 | {"a": 12, "x": 1234.789}
7 | {"x": 1234.56, "y": 3456.78}
9 | {"a": 17, "b": "I am nuber 9"}
Your code should loop through your incoming data and perform the INSERT/UPDATE (sometimes called MERGE or UPSERT) either one row at a time or in batches, using multi-row VALUES.
You can get all the code at dbfiddle here
There is also an alternative, better suited if you work in batches: use a WITH statement that has one UPDATE clause, followed by an INSERT:
-- Avoiding (most) concurrency issues.
BEGIN TRANSACTION ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
WITH data_to_load (tbl_id, payload) AS
(
VALUES
(3, '{"a": 16, "b": "I don''t know"}' :: jsonb),
(4, '{"a": 18, "b": "I will we updated"}'),
(7, '{"x": 1234.56, "y": 3456.78}'),
(9, '{"a": 17, "b": "I am nuber 9"}')
),
update_existing AS
(
UPDATE
tbl
SET
payload = data_to_load.payload
FROM
data_to_load
WHERE
tbl.tbl_id = data_to_load.tbl_id
)
-- Insert the non-existing
INSERT INTO
tbl
(tbl_id, payload)
SELECT
tbl_id, payload
FROM
data_to_load
WHERE
data_to_load.tbl_id NOT IN (SELECT tbl_id FROM tbl) ;
COMMIT TRANSACTION ;
You'll get the same results, as you can see at dbfiddle here.
In both cases, be ready for error handling, and be prepared to retry your transactions if they conflict because of concurrent actions also modifying your database. Your transactions can be explicit (as in the second case) or implicit, if you have some kind of auto-commit on every single INSERT.
