Vertica: Parse string as JSON when inserting data from a table into a flex table

I use Vertica 9.2.1 in Eon Mode. I have a fact table with a column that holds JSON strings. I want to load this data, together with some identifiers from the fact table, into a flex table so that we can run analysis on it. What I want to avoid is loading all the necessary data onto an ETL machine to transform it and then loading it into the flex table, since all the data is already available in Vertica. How can I tell Vertica to parse a VARCHAR column as JSON?
CREATE TABLE public.tmp_facts ("id" INTEGER, "user_id" VARCHAR(64), "event_type" VARCHAR(50), /* other columns omitted */ "additional" VARCHAR(65000));
INSERT INTO public.tmp_facts ("id", "user_id", "event_type", "additional")
SELECT 1, 'user1', 'event1', '{"os":"Android", "time":"'||NOW()||'"}';
CREATE FLEX TABLE public.fact_additional
(
"id" INTEGER NOT NULL,
"user_id" VARCHAR(64) NOT NULL,
"event_type" VARCHAR(50)
);
INSERT INTO public.fact_additional ("id", "user_id", "event_type", "additional")
SELECT "id", "user_id", "event_type", "additional" FROM tmp_facts;
SELECT "additional", "additional.os", "additional[os]" FROM fact_additional;
I expected the last query to output Android in at least one of those columns.

You need to pass the additional column through the MapJSONExtractor() function when inserting from public.tmp_facts into public.fact_additional:
INSERT INTO public.fact_additional ("id", "user_id", "event_type", "additional")
SELECT "id",
       "user_id",
       "event_type",
       MapJSONExtractor("additional") AS additional
FROM tmp_facts;
SELECT "additional"['os'] as os FROM fact_additional;
os
---------
Android
(1 row)
Note the use of single quotes (for the map key 'os') and double quotes (for column identifiers) in the appropriate places.
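If the values still don't come back as expected, a quick way to inspect what actually landed in the flex table is MapToString on the raw map column (a minimal check, assuming the flex table's default __raw__ column):
SELECT MapToString(__raw__) FROM fact_additional LIMIT 1;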

Related

I got error ORA-01747 when inserting data into a table

I ran into trouble when inserting data using this query:
INSERT ALL
INTO obat ('id_obat','nama_obat','tanggal_kadarluarsa','stock','harga')
VALUES (1, 'Indomethacin', '2023-09-01', 50, 3000)
SELECT * FROM dual;
This is the table DDL:
CREATE TABLE obat (
id_obat INTEGER NOT NULL,
nama_obat VARCHAR2(255) NOT NULL,
tanggal_kadarluarsa DATE NOT NULL,
stock INTEGER NOT NULL,
harga NUMBER(20, 2) NOT NULL,
CONSTRAINT obat_pk PRIMARY KEY ( id_obat )
);
Is something wrong with my code?
Single quotes are for string literals; identifiers (e.g. column names) go in double quotes. But identifiers only have to be quoted if they were created as quoted identifiers, and in that case the name has to match the data dictionary name exactly, including case. Yours are not quoted identifiers in the create statement, so you can just do:
INSERT ALL
INTO obat (id_obat,nama_obat,tanggal_kadarluarsa,stock,harga)
VALUES (1, 'Indomethacin', DATE '2023-09-01', 50, 3000)
SELECT * FROM dual;
If you really wanted to quote them then you would need to do (including the table name to demonstrate, as the same rules apply):
INSERT ALL
INTO "OBAT" ("ID_OBAT","NAMA_OBAT","TANGGAL_KADARLUARSA","STOCK","HARGA")
VALUES (1, 'Indomethacin', DATE '2023-09-01', 50, 3000)
SELECT * FROM dual;
but that's just more typing, arguably harder to read, and easier to get wrong.
You can read more about quoted and non-quoted identifiers in the documentation.
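For instance, because unquoted identifiers are stored in uppercase and matched case-insensitively, all of these refer to the same column (a small illustration, not from the original answer):
select id_obat, Id_Obat, "ID_OBAT" from obat;
whereas the quoted lowercase "id_obat" would raise ORA-00904: invalid identifier.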
With a single row you don't really need ALL; you can also do:
INSERT INTO obat (id_obat,nama_obat,tanggal_kadarluarsa,stock,harga)
VALUES (1, 'Indomethacin', DATE '2023-09-01', 50, 3000)
Note that I've added the DATE keyword to these statements; '2023-09-01' is not a date, it's a string literal, so your original statement relies on Oracle implicitly converting it to an actual date based on your current session NLS settings. With DATE '2023-09-01' it is now a proper date literal. Again, there is more in the documentation.
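If you prefer an explicit conversion, or ever need a time component (an ANSI DATE literal is always midnight), TO_DATE with a format mask is the usual NLS-independent alternative; the same row again, shown only for illustration:
INSERT INTO obat (id_obat, nama_obat, tanggal_kadarluarsa, stock, harga)
VALUES (1, 'Indomethacin', TO_DATE('2023-09-01', 'YYYY-MM-DD'), 50, 3000);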

I want to make a query to add a condition to my board table

INSERT INTO ABOARD(BNO, BNAME, BPW, BTITLE, BCONTENT, BDATE, BGROUP,
BSTEP, BINDENT, COMCNT, ISDEL, HAVEFILE)
VALUES(#{bno}, #{bname }, #{bpw },
#{btitle }, #{bcontent }, SYSDATE, GROUPSEQ.NEXTVAL, 0, 0, 0,'N','N')
This is my board insert. I want to set the last column, HAVEFILE, when a file is attached to the board post at registration time. The table below holds the attachment files. How can I join the two tables or write a query for this? (A sketch follows the table definition below.)
CREATE TABLE MP_FILE
(
FILE_NO NUMBER,
BNO NUMBER NOT NULL,
ORG_FILE_NAME VARCHAR2(260) NOT NULL,
STORED_FILE_NAME VARCHAR2(36) NOT NULL,
FILE_SIZE NUMBER,
REGDATE DATE DEFAULT SYSDATE NOT NULL,
DEL_GB VARCHAR2(1) DEFAULT 'N' NOT NULL,
PRIMARY KEY(FILE_NO)
);
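A minimal sketch, assuming BNO links ABOARD to MP_FILE and that the attachment rows are inserted first; HAVEFILE could then be maintained with a correlated update:
UPDATE ABOARD
SET HAVEFILE = 'Y'
WHERE EXISTS (SELECT 1
              FROM MP_FILE f
              WHERE f.BNO = ABOARD.BNO
              AND f.DEL_GB = 'N');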

Expensive subquery tuning with SQLite

I'm working on a small media/file management utility that uses SQLite for its persistent storage needs. I have a table of files:
CREATE TABLE file
( file_id INTEGER PRIMARY KEY AUTOINCREMENT
, file_sha1 BINARY(20)
, file_name TEXT NOT NULL UNIQUE
, file_size INTEGER NOT NULL
, file_mime TEXT NOT NULL
, file_add_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
And also a table of albums
CREATE TABLE album
( album_id INTEGER PRIMARY KEY AUTOINCREMENT
, album_name TEXT
, album_poster INTEGER
, album_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
, FOREIGN KEY (album_poster) REFERENCES file(file_id)
);
to which files can be assigned
CREATE TABLE album_file
( album_id INTEGER NOT NULL
, file_id INTEGER NOT NULL
, PRIMARY KEY (album_id, file_id)
, FOREIGN KEY (album_id) REFERENCES album(album_id)
, FOREIGN KEY (file_id) REFERENCES file(file_id)
);
CREATE INDEX file_to_album ON album_file(file_id, album_id);
Part of the functionality is to list albums, exposing
the album id,
the album's name,
a poster image for that album, and
the number of files in the album
which currently uses this query:
SELECT a.album_id, a.album_name,
COALESCE(
a.album_poster,
(SELECT file_id FROM file
NATURAL JOIN album_file af
WHERE af.album_id = a.album_id
ORDER BY file.file_name LIMIT 1)),
(SELECT COUNT(file_id) AS file_count
FROM album_file WHERE album_id = a.album_id)
FROM album a
ORDER BY album_name ASC
The only "tricky" part of that query is that the album_poster column may be null, in which case the COALESCE expression is used to just return the first file in the album as the "default poster".
With currently ~260000 files, ~2600 albums and ~250000 entries in the album_file table, this query takes over 10 seconds, which makes for a not-so-great user experience. Here's the query plan:
0|0|0|SCAN TABLE album AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|1|SEARCH TABLE album_file AS af USING COVERING INDEX album_to_file (album_id=?)
1|1|0|SEARCH TABLE file USING INTEGER PRIMARY KEY (rowid=?)
1|0|0|USE TEMP B-TREE FOR ORDER BY
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE album_file USING COVERING INDEX album_to_file (album_id=?)
Replacing the COALESCE expression with just a.album_poster, sacrificing the auto-poster functionality, brings the query time down to a few milliseconds:
0|0|0|SCAN TABLE album AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE album_file USING COVERING INDEX album_to_file (album_id=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
What I don't understand is that limiting the album listing to 1 or 1000 rows makes no difference. It seems SQLite runs the expensive sub-query for the "default" poster on all albums, only to throw away most of the results when the result set is finally cut down to the LIMIT specified with the query.
Is there something I can do to make the original query substantially faster, especially given that I'm usually only querying a small subset (using LIMIT) of all rows for display?
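One common workaround, offered as a hedged sketch rather than a tested fix (the page size of 20 is made up): apply the LIMIT to the album rows first in a derived table, so the correlated sub-queries only run for the page actually being displayed:
SELECT a.album_id, a.album_name,
COALESCE(
a.album_poster,
(SELECT af.file_id FROM album_file af
JOIN file f ON f.file_id = af.file_id
WHERE af.album_id = a.album_id
ORDER BY f.file_name LIMIT 1)),
(SELECT COUNT(*) FROM album_file WHERE album_id = a.album_id)
FROM (SELECT * FROM album ORDER BY album_name ASC LIMIT 20) a
ORDER BY a.album_name ASC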

cx_Oracle.DatabaseError: ORA-00947: not enough values

I have a table:
create table employee (
employee_id NUMBER NOT NULL,
name VARCHAR2(255) NOT NULL,
notes VARCHAR2(4000),
created_by varchar2(255) not null,
created_at date default sysdate not null,
updated_by varchar2(255) not null,
updated_at date default sysdate not null,
PRIMARY KEY(employee_id)
);
so when I insert from SQL developer:
insert into employee(employee_id, name,notes) values(1,'xyz','test');
it auto-populates created_by, created_at, updated_at and updated_by.
The row gets inserted successfully.
Whereas if I try to insert using the cx_Oracle module in Python,
cursor.execute("INSERT INTO employee VALUES (:employee_id,:name,:notes)",
{
'employee_id' : max_value,
'name' : each_vendor,
'notes' : 'test'
}
)
it throws an error saying not enough values.
Why do I get this error? How can I solve it?
The answer is very simple, and has nothing to do with Python. Your two INSERT statements are very different.
In the first, you explicitly name the columns you intend to provide values for: (employee_id, name, notes). In the INSERT statement used from Python, however, you don't name the three columns. As a result, the statement expects you to provide values for all columns in the table.
The fix: explicitly name the 3 columns:
cursor.execute("INSERT INTO employee (employee_id, name, notes) VALUES (:employee_id,:name,:notes)",
{
'employee_id' : max_value,
'name' : each_vendor,
'notes' : 'test'
}
)

"Create table as select" does not preserve not null

I am trying to use the "Create Table As Select" feature in Oracle to do a fast update. The problem I am seeing is that the NOT NULL constraint is not being preserved.
I defined the following table:
create table mytable(
accountname varchar2(40) not null,
username varchar2(40)
);
When I do a raw CTAS, the NOT NULL on accountname is preserved:
create table ctamytable as select * from mytable;
describe ctamytable;
Name Null Type
----------- -------- ------------
ACCOUNTNAME NOT NULL VARCHAR2(40)
USERNAME VARCHAR2(40)
However, when I do a replace on accountname, the NOT NULL is not preserved.
create table ctamytable as
select replace(accountname, 'foo', 'foo2') accountname,
username
from mytable;
describe ctamytable;
Name Null Type
----------- ---- -------------
ACCOUNTNAME VARCHAR2(160)
USERNAME VARCHAR2(40)
Notice that the accountname column no longer has the NOT NULL constraint, and its VARCHAR2 length went from 40 to 160 characters. Has anyone seen this before?
This is because you are no longer selecting ACCOUNTNAME, which has a column definition and metadata. Rather, you are selecting the result of the replace function, a plain string expression that carries none of the original column's metadata. It is a different data type entirely.
A (potentially) better way that might work is to create the table using a query with the original columns, but with a WHERE clause that guarantees 0 rows.
Then you can insert in to the table normally with your actual SELECT.
By having a query that returns 0 rows, you still get the column metadata, so the table is created with the right definitions but no rows are inserted. Make sure your WHERE clause is something fast, like WHERE primary_key = -999999, some value you know will never exist.
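A minimal sketch of that two-step approach, using WHERE 1 = 0 as the guaranteed-empty (and trivially fast) predicate:
-- Step 1: clone the column definitions, including NOT NULL, with zero rows
create table ctamytable as
select * from mytable
where 1 = 0;
-- Step 2: load the transformed data with a normal insert
-- (note: each replaced value must still fit the original VARCHAR2(40))
insert into ctamytable (accountname, username)
select replace(accountname, 'foo', 'foo2'), username
from mytable;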
Another option here is to define the columns when you call the CREATE TABLE AS SELECT. It is possible to list the column names and include constraints while excluding the data types.
An example is shown below:
create table ctamytable (
accountname not null,
username
)
as
select
replace(accountname, 'foo', 'foo2') accountname,
username
from mytable;
Be aware that although this syntax is valid, you cannot include the data type. Also, explicitly declaring all the columns somewhat defeats the purpose of using CREATE TABLE AS SELECT.
