I'm using Vertica flex tables, and I want to know how to apply projections on a flex table.
Thanks in advance
A freshly created flex table looks like this when you export its definition:
CREATE FLEX TABLE public.teas
(
);
CREATE PROJECTION public.teas_super /*+basename(teas),createtype(P)*/
(
__identity__,
__raw__
)
AS
SELECT teas.__identity__,
teas.__raw__
FROM public.teas
ORDER BY teas.__identity__
SEGMENTED BY hash(teas.__identity__) ALL NODES OFFSET 0;
No columns but the two hidden ones: __identity__, which is an INT, and __raw__, which is a LONG VARBINARY(130000).
There is not much point in creating different projections here.
You can load into the flex table and then materialise, as real columns, the few keys you find especially interesting.
They will then appear in the output of EXPORT_OBJECTS(), and you will be able to create projections for them.
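As a sketch of what that looks like (the key name "name" and the projection below are made up for illustration, assuming the flex data contains a string key called name):

```sql
-- Materialise a virtual flex column as a real column:
ALTER TABLE public.teas ADD COLUMN name VARCHAR(50) DEFAULT name::VARCHAR(50);

-- Once materialised, the column can drive a projection of your own design:
CREATE PROJECTION public.teas_by_name AS
SELECT name, __identity__, __raw__
FROM public.teas
ORDER BY name
SEGMENTED BY hash(__identity__) ALL NODES;
```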
Hope this helps ..
Marco
Related
I'm using MonetDB in a project and currently evaluating moving to DuckDB. As part of that I'm also reevaluating how we do versioning of data and if there's a better way to do it in DuckDB.
We manage data versions by having tables structured like this:
create table data(
row int,
key varchar(50),
value varchar(50),
version int
);
create table data_version(
row int,
next_version int
);
and then querying like this (to show a surface of all data up to and including version 2):
select d.key, d.value from data d
left join data_version dv on dv.row = d.row
where d.version <= 2 and (dv.next_version > 2 or dv.next_version is null);
Minimal working example here
This has the advantage of being append only (no table updates, just inserts) and seems to be quite performant. Bulk loading can be tricky because you have to keep track of what was already written in order to update the data_version table, but it's not too bad.
DuckDB has a lot of great functionality above and beyond standard SQL (like window functions), and I'm wondering whether this means there's a better way to do versioning of data. I'm hoping someone more familiar with DuckDB might know. (Maybe there's just a better way to do versioning anyway!)
(Note: the example above isn't really showing off why we need a column-oriented database, but the data table will have lots of other columns we perform grouping-style queries on, with the versioning clause.)
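A minimal runnable sketch of the scheme above, using Python's built-in sqlite3 purely to keep it self-contained (the SQL itself is dialect-neutral and works the same in DuckDB); the sample rows and the surface() helper are made up for illustration:

```python
import sqlite3

# In-memory database standing in for DuckDB/MonetDB.
con = sqlite3.connect(":memory:")
con.executescript("""
create table data(row int, key varchar(50), value varchar(50), version int);
create table data_version(row int, next_version int);
""")

# Append-only writes: version 1 inserts two rows; version 2 replaces row 1
# by inserting row 3 and recording that row 1 is superseded from version 2.
con.executemany("insert into data values (?,?,?,?)", [
    (1, 'colour', 'red',   1),
    (2, 'size',   'large', 1),
    (3, 'colour', 'blue',  2),
])
con.execute("insert into data_version values (1, 2)")

def surface(asof):
    """The surface of all data up to and including version `asof`."""
    return con.execute("""
        select d.key, d.value from data d
        left join data_version dv on dv.row = d.row
        where d.version <= ? and (dv.next_version > ? or dv.next_version is null)
        order by d.row
    """, (asof, asof)).fetchall()
```

At version 1 the surface shows red/large; at version 2 the colour row has been superseded by blue, with no row ever updated in place.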
I'm struggling to understand what the TABLE clause does. Per the Oracle docs:
it transforms a collection like a nested table into a table which could be used in an SQL statement.
That seems clear enough, but I don't know how it works in practice.
These are the relevant types and tables:
create type movies_type as table of ref movie_type;
create type actor_type under person_type
(
  starring movies_type
) final;
create table actor of actor_type
  nested table starring store as starring_nt;
I want to list actors and the movies they starred in. This works:
select firstname, lastname, value(b).title
from actor a, table(a.starring) b;
but I don't understand why. Why isn't
actor a, table(a.starring) b
a Cartesian product?
Also, why does value(b) work here? Since it's a table of refs, I would expect to need deref, but that doesn't work.
My question is: why does this query work as intended? I would expect it to list every actor with every movie (a Cartesian product), as there are no join conditions specified.
I don't have a mental model for Oracle SQL; help on how to learn it properly is very much appreciated.
Thank you very much.
It's not a Cartesian product because table(a.starring) is correlated by a: for each row in a, it runs the table function against that row's starring nested table.
This is not a very common way of modelling data in Oracle; usually you would use a junction table to get a properly normalised model (which is usually much easier to query and allows for better performance).
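For comparison, a sketch of the junction-table version of the same model (all names here are illustrative, chosen to avoid clashing with the object tables above):

```sql
create table actors (actor_id number primary key,
                     firstname varchar2(50),
                     lastname  varchar2(50));
create table movies (movie_id  number primary key,
                     title     varchar2(100));
-- Junction table: one row per (actor, movie) pairing.
create table roles  (actor_id  number references actors,
                     movie_id  number references movies,
                     primary key (actor_id, movie_id));

-- Same question, answered with explicit join conditions:
select a.firstname, a.lastname, m.title
from   actors a
join   roles  r on r.actor_id = a.actor_id
join   movies m on m.movie_id = r.movie_id;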
I need to find a solution to the following problem: there should be a common and single "interface" that I can use in an insert into statement, something like this: insert into INTERFACE (fields) select ...
But there are many tables with the same structure behind the interface which should decide based on list of values (coming in a field) where to put the data. The tables are partitioned by range interval (daily) right now.
I was thinking about having a composite-partitioned table which cannot be SELECT-ed directly, to avoid mixing different types of data in a single select query, with views created on top of it. In this case the table would be partitioned like this: partition by list FIELD, subpartition by range interval. But Oracle 12 does not support this.
Any idea how to solve this? (There is a reason why I need a single interface and why I have to store data separately.)
Thank you in advance!
The INSERT ALL syntax can route data to specific tables based on conditions:
create table interface1(a number, b number);
create table interface2(a number, b number);

insert all
  when a <= 1 then
    into interface1
  else
    into interface2
select 1 a, 2 b from dual;
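To keep reads consistent with the single-interface idea, a view can stitch the routed tables back together (the view name here is made up):

```sql
create view interface_all as
select a, b from interface1
union all
select a, b from interface2;
```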
Can we insert into a view in Hive?
I have done this in the past with Oracle and Teradata.
But, doesn't seem to work in Hive.
create table t2 (id int, key string, value string, ds string, hr string);
create view v2 as select id, key, value, ds, hr from t2;
insert into v2 values (1,'key1','value1','ds1','hr1')
***Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if null is encrypted: java.lang.NullPointerException***
There seems to be some sort of update support for views. But I can't see anything about inserting into a view.
https://cwiki.apache.org/confluence/display/Hive/UpdatableViews
Thanks for the feedback. Makes sense. The reason behind needing this functionality is that we use an ETL tool that has problems handling high-precision decimals (>15 digits). If the object (table column, in this case) is represented as a string within the tool, we don't have a problem. So I thought I'd define a bunch of views with string datatypes and use those in the tool instead. But Hive can't insert into a view, so maybe I need to think of something else. I have done it this way before with Oracle and Teradata.
Can we have two tables with different structures point to the same underlying HDFS content? Probably wouldn't work because of the Parquet storage, which stores the schema. Sorry, not a Hadoop expert.
Thanks a lot for your time.
It is not possible to insert data into a Hive view: a Hive view is just a projection of a Hive table (you can think of it as a saved query). From the Hive documentation:
Note that a view is a purely logical object with no associated
storage. (No support for materialized views is currently available in
Hive.) When a query references a view, the view's definition is
evaluated in order to produce a set of rows for further processing by
the query. (This is a conceptual description; in fact, as part of
query optimization, Hive may combine the view's definition with the
query's, e.g. pushing filters from the query down into the view.)
The link (https://cwiki.apache.org/confluence/display/Hive/UpdatableViews) seems to be for a proposed feature.
Per the official documentation:
Views are read-only and may not be used as the target of LOAD/INSERT/ALTER.
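Given the ETL-tool constraint mentioned above, one workaround sketch (table and column names are made up): keep the high-precision type in the table, give the tool a read-only view that casts to string, and do the writes against the table itself, casting back:

```sql
-- Base table keeps the real precision:
CREATE TABLE amounts (id INT, amount DECIMAL(30,10));

-- The ETL tool reads only through this view, so it only ever sees strings:
CREATE VIEW amounts_str AS
SELECT id, CAST(amount AS STRING) AS amount FROM amounts;

-- Writes target the table, casting the string form back
-- (staging_amounts is a hypothetical landing table):
INSERT INTO TABLE amounts
SELECT id, CAST(amount AS DECIMAL(30,10)) FROM staging_amounts;
```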
I'm wondering if it's possible to create a view that automatically checks whether a new monthly table has been created and, if there is one, includes it.
We have a new table created each month and each one ends with the number of the month, like
table for January: table_1
table for February: table_2
etc...
Is it possible to create a view that takes data from all those tables and also finds when there is a new one created?
No, a view's definition is static. You would have to replace the view each month with a new copy that includes the new table; you could write a dynamic PL/SQL program to do this. Or you could create all the empty tables now and include them all in the view definition; if necessary you could postpone granting any INSERT access to the future tables until they become "live".
But really, this model is flawed - see Michael Pakhantsov's answer for a better alternative - or just have one simple table with a MONTH column.
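A sketch of the dynamic rebuild mentioned above, to be run each month after the new table appears (the view name all_months is made up; this assumes the generated text stays under LISTAGG's length limit):

```sql
DECLARE
  v_sql VARCHAR2(32767);
BEGIN
  -- Build "CREATE OR REPLACE VIEW ... UNION ALL ..." over every TABLE_n
  -- found in the data dictionary.
  SELECT 'CREATE OR REPLACE VIEW all_months AS '
         || LISTAGG('SELECT * FROM ' || table_name, ' UNION ALL ')
                    WITHIN GROUP (ORDER BY table_name)
  INTO   v_sql
  FROM   user_tables
  WHERE  table_name LIKE 'TABLE\_%' ESCAPE '\';
  EXECUTE IMMEDIATE v_sql;
END;
/
```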
It will be possible if, instead of creating a new table each month, you create a new partition of an existing table.
UPDATE:
If you have Oracle SE without the partitioning option, you can create two tables: LiveTable and ArchiveTable. Then each month you move the rows from LiveTable to ArchiveTable and clean out the live table. In this case you only need to create the view over those two tables.
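A sketch of that two-table variant (table and view names are illustrative):

```sql
create view all_rows as
select * from live_table
union all
select * from archive_table;

-- Monthly housekeeping: move everything across, then empty the live table.
insert into archive_table select * from live_table;
truncate table live_table;
```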
Another option is to create the tables in another schema with grants to the relevant user and create public synonyms to them.
As the monthly tables get created in the local schema, they'll take precedence over the public synonyms and the view will pick them up. The view will still get invalidated and need recompiling, but the actual view text shouldn't need changing, which may be simpler from a code-control point of view.
You can write a procedure or function that looks at USER_TABLES or ALL_TABLES to determine whether a table exists, generates dynamic SQL, and returns a ref cursor with the data. The same can be done with a pipelined function.
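A sketch of the ref-cursor approach (function name and the TABLE_n naming convention are assumptions from the question):

```sql
CREATE OR REPLACE FUNCTION month_rows(p_month IN PLS_INTEGER)
  RETURN SYS_REFCURSOR
IS
  v_cnt PLS_INTEGER;
  v_rc  SYS_REFCURSOR;
BEGIN
  -- Check the data dictionary before touching the table.
  SELECT COUNT(*) INTO v_cnt
  FROM   user_tables
  WHERE  table_name = 'TABLE_' || p_month;

  IF v_cnt = 1 THEN
    OPEN v_rc FOR 'SELECT * FROM table_' || p_month;  -- dynamic SQL
  ELSE
    OPEN v_rc FOR SELECT * FROM dual WHERE 1 = 0;     -- empty result set
  END IF;
  RETURN v_rc;
END;
/
```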