Flex tables are one of the new features in Vertica 7.0.
Can anyone tell me how a flex table converts unstructured data into structured data?
Thanks in advance!
Well, flex tables are a new feature in Vertica 7.0. This feature creates a different kind of table designed especially for loading and querying unstructured data, also called semi-structured data in HP Vertica.
Syntax to create a flex table:
create flex table unstruc_data();
The table unstruc_data has two default columns, __identity__ and __raw__. The __raw__ column holds the content of the semi-structured data, with type LONG VARBINARY, and __identity__ serves as the row id.
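As a quick sketch, you can peek at those default columns directly; maptostring is the flex helper that renders the binary map as readable text (assuming the flex table functions are installed, and with the table name from above):
-- Inspect the default columns of a flex table.
-- maptostring() renders the LONG VARBINARY map in readable form.
select __identity__, maptostring(__raw__) from unstruc_data limit 5;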
Flex tables come with a set of helper functions:
COMPUTE_FLEXTABLE_KEYS
BUILD_FLEXTABLE_VIEW
COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW
MATERIALIZE_FLEXTABLE_COLUMNS
RESTORE_FLEXTABLE_DEFAULT_KEYS_TABLE_AND_VIEW
I am not going to explain all of them here, as I think you should go and study them.
For more details on the new Vertica features, see this link: Vertica 7.0 New Stuff
Consider a scenario where a JSON document is passed to you by a client and you need to store it in a Vertica DB.
Without using a flex table, here are a few problems:
1) You need to know the structure of the JSON.
2) Create a table in the Vertica DB.
3) Extract each column value from the JSON document.
4) Insert the values into the table.
Apart from this process, if a new key is added to the JSON, there is the additional task on the Vertica DB of altering the table, and the processing logic must also change to fetch the new key-value pair (see the sketch below).
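A rough sketch of that rigid approach, using the Name/Age/Gender keys from the walkthrough further down (the table name EMP_rigid is illustrative):
-- Traditional approach: the schema must be known up front.
create table EMP_rigid (Name varchar(20), Age int);
-- Every new JSON key forces a schema change plus new parsing logic:
alter table EMP_rigid add column Gender varchar(1);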
Using a flex table, here is a detailed explanation of how we simplify it:
1) Take the JSON below, saved as EE.txt:
{"Name":"Rahul","Age":30}
2) Create a flex table EMP_Test:
dbadmin=> create flex table EMP_Test();
CREATE TABLE
3) Load the data into the flex table:
dbadmin=> copy EMP_Test from '/home/dbadmin/EE.txt' parser fjsonparser();
Rows Loaded
-------------
1
(1 row)
4) To find out what keys exist in your JSON, you have to compute the keys using the command below:
dbadmin=> select compute_flextable_keys('EMP_Test');
compute_flextable_keys
--------------------------------------------------
Please see public.EMP_Test_keys for updated keys
(1 row)
dbadmin=> select * from EMP_Test_keys;
key_name | frequency | data_type_guess
----------+-----------+-----------------
Age | 1 | varchar(20)
Name | 1 | varchar(20)
(2 rows)
5) Build the view for the flex table using the command below. You can then query the view for the data:
dbadmin=> select build_flextable_view('EMP_Test');
build_flextable_view
-----------------------------------------------------
The view public.EMP_Test_view is ready for querying
(1 row)
dbadmin=> select * from EMP_Test_view;
age | name
-----+-------
30 | Rahul
(1 row)
6) Now, suppose your JSON structure changes and an additional key 'Gender' is added (saved as EE1.txt):
{"Name":"Sid","Age":22,"Gender":"M"}
7) You can load the new data directly into the table EMP_Test:
dbadmin=> copy EMP_Test from '/home/dbadmin/EE1.txt' parser fjsonparser();
Rows Loaded
-------------
1
(1 row)
8) Recompute the keys and rebuild the view using the commands below:
dbadmin=> select compute_flextable_keys('EMP_Test');
compute_flextable_keys
--------------------------------------------------
Please see public.EMP_Test_keys for updated keys
(1 row)
dbadmin=> select build_flextable_view('EMP_Test');
build_flextable_view
-----------------------------------------------------
The view public.EMP_Test_view is ready for querying
(1 row)
9) You can find the newly added data and the new keys using the queries below:
dbadmin=> select * From EMP_Test_keys;
key_name | frequency | data_type_guess
----------+-----------+-----------------
Age | 2 | varchar(20)
Name | 2 | varchar(20)
Gender | 1 | varchar(20)
(3 rows)
dbadmin=> select * from EMP_Test_view;
age | name | gender
-----+-------+--------
30 | Rahul |
22 | Sid | M
(2 rows)
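Note that keys can also be queried directly against the flex table itself, without building the view; Vertica extracts the values from the raw map at query time. A sketch against the same table (key lookup here is case-insensitive, so the lowercase names should resolve):
select name, age, gender from EMP_Test;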
This is how a flex table converts unstructured (semi-structured) data into structured data.
Flex tables have made it very easy to integrate any data service with a Vertica DB.
All the unstructured data is saved to the raw data field, which is a BLOB.
When you need to access an unstructured field, it is slow, because the BLOB has to be extracted and parsed.
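If a key is queried often, one mitigation is to promote it to a real column with the MATERIALIZE_FLEXTABLE_COLUMNS helper listed earlier (a sketch, reusing the EMP_Test example above; with no extra arguments it materializes the most frequent keys):
-- Materialize frequent virtual columns as real table columns,
-- so queries no longer extract them from the raw BLOB each time.
select materialize_flextable_columns('EMP_Test');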
I have a requirement to perform some calculation on a column of a table with a large data set (300 GB) and return that value.
Basically I need to create a view on that table. The table has 21 years of data and is partitioned on the date column (daily). We cannot put a date condition in the view's query; the user will apply the filter at runtime while querying the view.
For example:
Create view v_view as
select * from table;
Now I want to query the view like:
Select * from v_view where ts_date between '1-Jan-19' and '1-Jan-20'
How does Oracle internally execute the above statement? Will it execute the view's query first and then apply the date filter to the result?
If so, won't there be a performance issue? And how can it be resolved?
Oracle first generates the view and then applies the filter. You can create a function whose input is supplied by the user. The function returns a CREATE VIEW statement, and if you run that statement the view will be created. Just run:
create or replace function fnc_x(where_condition in varchar2)
return varchar2
as
begin
return ' CREATE OR REPLACE VIEW sup_orders AS
SELECT suppliers.supplier_id, orders.quantity, orders.price
FROM suppliers
INNER JOIN orders
ON suppliers.supplier_id = orders.supplier_id
'||where_condition||' ';
end fnc_x;
This function should be run with an input string like this:
WHERE suppliers.supplier_name = 'Microsoft'
Then you should run a block like this to execute the function's result:
cl scr
set SERVEROUTPUT ON
declare
szSql varchar2(3000);
crte_vw varchar2(3000);
begin
szSql := 'select fnc_x(''WHERE suppliers.supplier_name = ''''Microsoft'''''') from dual';
dbms_output.put_line(szSql);
execute immediate szSql into crte_vw; -- generate the 'create view' command that depends on the user's where_condition
dbms_output.put_line(crte_vw);
execute immediate crte_vw ; -- create the view
end;
In this manner, you just need to receive the where_condition from the user.
Oracle can "push" the predicates inside simple views and can then use those predicates to enable partition pruning for optimal performance. You almost never need to worry about what Oracle will run first - it will figure out the optimal order for you. Oracle does not need to mindlessly build the first step of a query, and then send all of the results to the second step. The below sample schema and queries demonstrate how only the minimal amount of partitions are used when a view on a partitioned table is queried.
--drop table table1;
--Create a daily-partitioned table.
create table table1(id number, ts_date date)
partition by range(ts_date)
interval (numtodsinterval(1, 'day'))
(
partition p1 values less than (date '2000-01-01')
);
--Insert 1000 values, each in a separate day and partition.
insert into table1
select level, date '2000-01-01' + level
from dual
connect by level <= 1000;
--Create a simple view on the partitioned table.
create or replace view v_view as select * from table1;
The following explain plan shows "Pstart" and "Pstop" set to 3 and 4, which means that only 2 of the many partitions are used for this query.
--Generate an explain plan for a simple query on the view.
explain plan for
select * from v_view where ts_date between date '2000-01-02' and date '2000-01-03';
--Show the explain plan.
select * from table(dbms_xplan.display(format => 'basic +partition'));
Plan hash value: 434062308
-----------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | | |
| 1 | PARTITION RANGE ITERATOR| | 3 | 4 |
| 2 | TABLE ACCESS FULL | TABLE1 | 3 | 4 |
-----------------------------------------------------------
However, partition pruning and predicate pushing do not always work when we may think they should. One thing we can do to help the optimizer is to use date literals instead of strings that look like dates. For example, replace
'1-Jan-19' with date '2019-01-01'. When we use ANSI date literals, there is no ambiguity and Oracle is more likely to use partition pruning.
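Applied to the question's query, that advice looks like this (using the v_view definition above):
--Query the view with ANSI date literals so partition pruning can kick in.
select * from v_view where ts_date between date '2019-01-01' and date '2020-01-01';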
I have a column which I set as an enum earlier:
$table->enum('column_name', ['value1', 'value2']);
Now I want to change it into a string without losing data.
I am using a Postgres database.
Please help me,
thanks
You actually can't change the type of an enum column with a simple migration.
The way I achieved it was by using DB::statement to alter the column type:
DB::statement('ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(200)');
I'm not sure about Postgres (MODIFY is MySQL syntax), so you may need to modify the query for your needs. Do make a backup first, since we aren't sure whether it will make you lose data.
In plain SQL the solution is more something like "ALTER TABLE table_name ALTER COLUMN column_name SET DATA TYPE VARCHAR(20)".
For example:
postgres=# create type enumval as enum ('val1', 'val2');
CREATE TYPE
postgres=# create table tenum(x serial, y enumval);
CREATE TABLE
postgres=# select relfilenode from pg_class where relname='tenum';
relfilenode
-------------
61038
(1 row)
postgres=# insert into tenum(y) values ('val1');
INSERT 0 1
postgres=# insert into tenum(y) values ('val2');
INSERT 0 1
postgres=# select * from tenum;
x | y
---+------
1 | val1
2 | val2
(2 rows)
postgres=# alter table tenum alter column y set data type varchar(20);
ALTER TABLE
postgres=# select relfilenode from pg_class where relname='tenum';
relfilenode
-------------
61042
(1 row)
postgres=# select * from tenum;
x | y
---+------
1 | val1
2 | val2
(2 rows)
Note that PostgreSQL rewrites the table because of the data type change; you can see this in the relfilenode, which changes from 61038 to 61042.
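If PostgreSQL ever complains that the column cannot be cast automatically to the new type, the standard escape hatch is an explicit USING clause (shown here against the same demo table):
alter table tenum alter column y set data type varchar(20) using y::text;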
Incoming file format: a mainframe/COBOL record layout, one single record that is more than 21000 characters long. Please be aware of the OCCURS 350 TIMES clause, which makes the record length very long: a horizontal layout instead of a row-like layout in the incoming file.
id pic x(23).
idnum pic 9(04).
filler pic x(10).
grp occurs 350 times
grpkey1 PIC X(25).
grpkeynum PIC X(09).
grpsubkey PIC X(01).
grptyp PIC X(01).
grpst PIC X(08).
grpend PIC X(08).
filler PIC X(10).
Target Table Definition (Preferably Oracle External Table)
create table grpkeys (
id CHAR(23),
idnum CHAR(04),
filler10 CHAR(10),
grpkey1 CHAR(25),
grpkeynum CHAR(09),
grpsubkey CHAR(01),
grptyp CHAR(01),
grpst CHAR(08),
grpend CHAR(08),
filler20 CHAR(10)
)
I have to load a file in the above record format into a table (preferably a working Oracle external table, if possible). The id, idnum, and filler10 values need to be copied into all 350 rows created in the Oracle table (preferably an external table) for each single record of the incoming file. Please suggest the easiest way to accomplish this.
I'll stick to 5 group columns for this example, but there should be no syntactic or performance restriction on scaling from 5 up to 350. The example assumes your file is on the Oracle database server in file /tmp/test/test.txt.
My recommendation is to use an Oracle external table definition that effectively reads the data file "as is" (in all its 350-column glory), but does not worry about parsing out the 350-repeat field into components like grpkey1, grpsubkey, etc.
CREATE DIRECTORY TEST_DIR AS '/tmp/test';
CREATE TABLE TEST_XT
(
id VARCHAR2(23),
idnum INTEGER,
filler10 VARCHAR2(10),
GRP_001 varchar2(62),
GRP_002 varchar2(62),
GRP_003 varchar2(62),
GRP_004 varchar2(62),
GRP_005 varchar2(62)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY TEST_DIR
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS
(
ID CHAR(23),
IDNUM INTEGER EXTERNAL (4),
FILLER10 CHAR(10),
GRP_001 CHAR(62),
GRP_002 CHAR(62),
GRP_003 CHAR(62),
GRP_004 CHAR(62),
GRP_005 CHAR(62)
)
)
LOCATION ('test.txt')
);
Then wrap that external table definition with a view definition that performs an UNPIVOT operation (plus some basic SUBSTR functions to slice each of the 350 unpivoted GRP fields into its constituent pieces).
CREATE OR REPLACE VIEW TEST_V AS
SELECT ID, IDNUM, FILLER10, GRP_NUM,
SUBSTR(GRP_STR,01,25) GRPKEY1,
SUBSTR(GRP_STR,26,09) GRPKEYNUM,
SUBSTR(GRP_STR,35,01) GRPSUBKEY,
SUBSTR(GRP_STR,36,01) GRPTYP,
SUBSTR(GRP_STR,37,08) GRPST,
SUBSTR(GRP_STR,45,08) GRPEND,
SUBSTR(GRP_STR,53,10) FILLER20
FROM
(
SELECT *
FROM TEST_XT
UNPIVOT
(GRP_STR FOR GRP_NUM IN
(
GRP_001 as 1,
GRP_002 as 2,
GRP_003 as 3,
GRP_004 as 4,
GRP_005 as 5
)
)
);
Of course, you can query the view directly, or load it into a standard table (insert into standard_table select * from test_v) for indexing, partitioning, and other needs.
Also, you can scale to the desired level of performance by adding parallelism to the external table:
ALTER TABLE TEST_XT PARALLEL 8;
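From there, a one-time load into a permanent table could be as simple as the sketch below (GRPKEYS_STD is a hypothetical target name):
--Load the unpivoted rows into a standard table for indexing, partitioning, etc.
CREATE TABLE GRPKEYS_STD PARALLEL 8 AS
SELECT * FROM TEST_V;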
I have a HBase table where the rowkey looks like this.
08:516485815:2013 1
06:260070837:2014 1
00:338289200:2014 1
I create a Hive-linked table using the query below.
create external table hb
(key string,value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,e:-1")
tblproperties("hbase.table.name"="hbaseTable");
When I query the table, I get the result below:
select * from hb;
08:516485815 1
06:260070837 1
00:338289200 1
This is very strange to me. Why is the serde not able to map the whole content of the HBase key? The Hive table is missing everything after the second ':'.
Has anybody faced a similar kind of issue?
I tried recreating your scenario on HBase 1.1.2 and Hive 1.2.1000; it works as expected and I am able to get the whole rowkey from Hive.
hbase> create 'hbaseTable','e'
hbase> put 'hbaseTable','08:516485815:2013','e:-1','1'
hbase> scan 'hbaseTable'
ROW COLUMN+CELL
08:516485815:2013 column=e:-1, timestamp=1519675029451, value=1
1 row(s) in 0.0160 seconds
Since I have 08:516485815:2013 as the rowkey, I created the Hive table:
hive> create external table hb
(key string,value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,e:-1")
tblproperties("hbase.table.name"="hbaseTable");
hive> select * from hb;
+--------------------+-----------+--+
| hb.key | hb.value |
+--------------------+-----------+--+
| 08:516485815:2013 | 1 |
+--------------------+-----------+--+
Can you make sure that your HBase table rowkey actually contains the data after the second ':'?
I have created a cluster in Oracle
CREATE CLUSTER myLovelyCluster (clust_id NUMBER(38,0))
SIZE 1024 SINGLE TABLE HASHKEYS 11;
Then a table for the cluster:
CREATE TABLE Table_cluster
CLUSTER myLovelyCluster (columnRandom)
AS SELECT * FROM myTable ;
The columnRandom is well defined as NUMBER(38,0), so why am I getting an error about an incompatible column definition?
Thanks
Are you sure that columnRandom is number(38,0)? In Oracle, NUMBER != NUMBER(38,0).
Let's create two tables:
create table src_table ( a number);
create table src_table2( a number(38,0));
select column_name,data_precision,Data_scale from user_tab_cols where table_name like 'SRC_TABLE%';
The result of the query is below; the definitions of the columns are different.
+-------------+----------------+------------+
| Column_name | Data_Precision | Data_scale |
+-------------+----------------+------------+
| A | | |
| A | 38 | 0 |
+-------------+----------------+------------+
And if I try to create the clustered table for the first table:
CREATE TABLE Table_cluster
CLUSTER myLovelyCluster (a)
AS SELECT * FROM src_table ;
ORA-01753: column definition incompatible with clustered column definition
For the second one, everything is OK:
CREATE TABLE Table_cluster
CLUSTER myLovelyCluster (a)
AS SELECT * FROM src_table2 ;
If you add a cast into the select, execution is also correct:
CREATE TABLE Table_cluster CLUSTER myLovelyCluster (a)
AS SELECT cast(a as number(38,0)) a FROM src_table;
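To verify the resulting column definition, the same dictionary query from earlier can be reused (with the table name as created above):
select column_name, data_precision, data_scale
from user_tab_cols
where table_name = 'TABLE_CLUSTER';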