How many ROS containers would be created if I am loading a table with 100 rows in a 3 node cluster - vertica

I have a 3 node cluster. There is 1 database and 1 table. I have not created a projection. If I load 100 rows in the table using copy command then:
How many projections would be created? I suspect only 1 super projection, am I correct?
If I am using segmentation, would that distribute the data evenly (~33 rows) per node? Does that mean I now have 3 Read Optimised Storage (ROS) containers, one per node, so the projection has 3 ROSes?
If I use a K-safety value of 1, would a copy of each ROS (a buddy) be stored on another node? So do I have 6 ROSes now, each containing ~33 rows?

Well, let's play the scenario through ...
You will see that you get a projection and its identical buddy projection ...
And you can query the catalogue to count the rows and identify the projections ..
-- load a file with 100 randomly generated rows into table example:
-- generate the rows from within Vertica, export them to a file,
-- then create a new table and see what the projections look like
CREATE TABLE rows100 AS
SELECT
  (ARRAY['Ann','Lucy','Mary','Bob','Matt'])[RANDOMINT(5)] AS fname,
  (ARRAY['Lee','Ross','Smith','Davis'])[RANDOMINT(4)] AS lname,
  '2001-01-01'::DATE + RANDOMINT(365*10) AS hdate,
  (10000 + RANDOM()*9000)::NUMERIC(7,2) AS salary
FROM (
  SELECT tm FROM (
    SELECT now() + INTERVAL ' 1 second' AS t UNION ALL
    SELECT now() + INTERVAL '100 seconds' AS t -- creates 100 rows
  ) x TIMESERIES tm AS '1 second' OVER(ORDER BY t)
) y
;
-- set field separator to vertical bar (the default, actually...)
\pset fieldsep '|'
-- toggle to tuples only .. no column names and no row count
\tuples_only
-- spool to example.bsv - in bar-separated-value format
\o example.bsv
SELECT * FROM rows100;
-- spool to file off - closes output file
\o
-- create a table matching the test data, without bothering to define projections
DROP TABLE IF EXISTS example;
CREATE TABLE example LIKE rows100;
-- load the new table ...
COPY example FROM LOCAL 'example.bsv';
-- check the nodes ..
SELECT node_name FROM nodes;
-- out node_name
-- out ----------------
-- out v_sbx_node0001
-- out v_sbx_node0002
-- out v_sbx_node0003
SELECT
node_name
, projection_schema
, anchor_table_name
, projection_name
, row_count
FROM v_monitor.projection_storage
WHERE anchor_table_name='example'
ORDER BY projection_name, node_name
;
-- out node_name | projection_schema | anchor_table_name | projection_name | row_count
-- out ----------------+-------------------+-------------------+-----------------+-----------
-- out v_sbx_node0001 | public | example | example_b0 | 38
-- out v_sbx_node0002 | public | example | example_b0 | 32
-- out v_sbx_node0003 | public | example | example_b0 | 30
-- out v_sbx_node0001 | public | example | example_b1 | 30
-- out v_sbx_node0002 | public | example | example_b1 | 38
-- out v_sbx_node0003 | public | example | example_b1 | 32
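To answer the ROS-container part of the question directly, you can also count the containers themselves in v_monitor.storage_containers. The query below is only a sketch - the exact column names (for example total_row_count) can differ slightly between Vertica versions - but with a single COPY you would typically expect one ROS container per projection per node here, i.e. 6 containers in total for the two buddy projections on 3 nodes.
SELECT
node_name
, projection_name
, COUNT(*) AS ros_containers
, SUM(total_row_count) AS total_rows
FROM v_monitor.storage_containers
WHERE projection_name LIKE 'example%'
GROUP BY node_name, projection_name
ORDER BY projection_name, node_name
;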

Related

How to determine, for each table, the maximum value of one field from a list?

I have a list of Oracle tables and fields, and I would like to determine, for each table, the maximum value of the listed field.
Input:
+------+--------+
| TAB | FIELDS |
+------+--------+
| tab1 | field1 |
+------+--------+
| tab2 | field2 |
+------+--------+
Output:
+------+--------+-----------+
| TAB | FIELDS | Max value |
+------+--------+-----------+
| tab1 | field1 | 10 |
+------+--------+-----------+
| tab2 | field2 | 15 |
+------+--------+-----------+
I want to write a PL/SQL function to create the loop, but I have very little knowledge of this language. Do you have any examples to show me?
The input table is dynamic, which is why I want to use a loop.
Thanks in advance.
The input is built from a system table like all_column_tab. The output must be stored in a table.
It is indeed not a great design for storing and retrieving data, but I presume something like this should work for you. I've used a VARCHAR2 variable for storing the max value instead of a numeric one, in order to handle MAX for non-numeric fields. The table that stores the max value should define that column as VARCHAR2 for such cases to work normally.
DECLARE
  v_maxVal VARCHAR2(400);
BEGIN
  FOR rec IN (
    SELECT table_name, column_name
      FROM user_tab_columns
     WHERE table_name IN ('TAB1','TAB2')
  )
  LOOP
    EXECUTE IMMEDIATE
      'SELECT MAX(' || rec.column_name || ') FROM ' || rec.table_name
      INTO v_maxVal;
    INSERT INTO fieldstab (tab, fields, max_val)
    VALUES (rec.table_name, rec.column_name, v_maxVal);
  END LOOP;
END;
/
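For reference, the target table assumed by that snippet could be defined along these lines (the name fieldstab and the column sizes are just the ones implied above; the key point is that max_val is a VARCHAR2):
CREATE TABLE fieldstab (
  tab     VARCHAR2(128),
  fields  VARCHAR2(128),
  max_val VARCHAR2(400)
);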

Insert value based on min value greater than value in another row

It's difficult to explain the question well in the title.
I am inserting 6 values from (or based on values in) one row.
I also need to insert a value from a second row where:
The values in one column (ID) must be equal
The value in the CODE column of the main source row must be IN (100,200), whereas the other row must have a value of 300 or 400
The value in another column (OBJID) in the secondary row must be the lowest value above that in the primary row.
Source Table looks like:
OBJID | CODE | ENTRY_TIME | INFO | ID | USER
---------------------------------------------
1 | 100 | x timestamp| .... | 10 | X
2 | 100 | y timestamp| .... | 11 | Y
3 | 300 | z timestamp| .... | 10 | F
4 | 100 | h timestamp| .... | 10 | X
5 | 300 | g timestamp| .... | 10 | G
So to provide an example..
In my second table I want to insert OBJID, OBJID2, CODE, ENTRY_TIME, substr(INFO(...)), ID, USER
i.e. from my example a line inserted in the second table would look like:
OBJID | OBJID2 | CODE | ENTRY_TIME | INFO | ID | USER
-----------------------------------------------------------
1 | 3 | 100 | x timestamp| substring | 10 | X
4 | 5 | 100 | h timestamp| substring2| 10 | X
My insert for everything that just comes from one row works fine.
INSERT INTO TABLE2
  (ID, OBJID, INFO, USER, ENTRY_TIME)
SELECT ID, OBJID,
       DECODE(CODE, 100, SUBSTR(INFO, 12, LENGTH(INFO) - 27),
                    600, 'CREATE') INFO,
       USER, ENTRY_TIME
  FROM TABLE1
 WHERE CODE IN (100, 200);
I'm aware that I'll need to use an alias on TABLE1, but I don't know how to get the rest to work, particularly in an efficient way. There are 2 million rows right now, but there will be closer to 20 million once I start using production data.
You could try this:
select primary.* ,
(select min(objid)
from table1 secondary
where primary.objid < secondary.objid
and secondary.code in (300,400)
and primary.id = secondary.id
) objid2
from table1 primary
where primary.code in (100,200);
Ok, I've come up with:
select OBJID,
min(case when code in (300,400) then objid end)
over (partition by id order by objid
range between 1 following and unbounded following
) objid2,
CODE, ENTRY_TIME, INFO, ID, USER1
from table1;
So you need an INSERT ... SELECT of the above query, with WHERE objid2 IS NOT NULL AND code IN (100, 200).
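As a sketch only (the TABLE2 column list, including OBJID2 and CODE, is assumed here, and USER1 is the alias used in the query above), that could look like:
-- Sketch: wrap the analytic query in an inline view, then filter and insert
INSERT INTO table2 (id, objid, objid2, code, info, user1, entry_time)
SELECT id, objid, objid2, code,
       DECODE(code, 100, SUBSTR(info, 12, LENGTH(info) - 27),
                    600, 'CREATE') AS info,
       user1, entry_time
  FROM (
        select OBJID,
               min(case when code in (300,400) then objid end)
                 over (partition by id order by objid
                       range between 1 following and unbounded following
                      ) objid2,
               CODE, ENTRY_TIME, INFO, ID, USER1
          from table1
       )
 WHERE objid2 IS NOT NULL
   AND code IN (100, 200);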

How to load data into the same Hive table if a file has a different number of columns

I have a main table (Employee) which has 10 columns, and I can load data into it using load data inpath '/file1.txt' into table Employee.
My question is how to handle the same table (Employee) if my file file2.txt has the same columns but columns 3 and 5 are missing. If I load it directly, the last columns will end up NULL, NULL; instead the 3rd column should be loaded as NULL and the 5th column as NULL.
Suppose I have a table Employee and I want to load file1.txt and file2.txt into that table.
file1.txt
==========
id name sal deptid state country
1 aaa 1000 01 TS india
2 bbb 2000 02 AP india
3 ccc 3000 03 BGL india
file2.txt
id name deptid country
1 second 001 US
2 third 002 ENG
3 forth 003 AUS
In file2.txt we are missing 2 columns, i.e. sal and state.
We need to use the same Employee table; how do we handle this?
I'm not aware of any way to create a table backed by data files with a non-homogeneous structure. What you can do, however, is define separate tables for the different column configurations and then define a view that queries both.
I think it's easier if I provide an example. I will use two tables of people, both have a column for name, but one stores height as well, while the other stores weight instead:
> create table table1(name string, height int);
> insert into table1 values ('Alice', 178), ('Charlie', 185);
> create table table2(name string, weight int);
> insert into table2 values ('Bob', 98), ('Denise', 52);
> create view people as
> select name, height, NULL as weight from table1
> union all
> select name, NULL as height, weight from table2;
> select * from people order by name;
+---------+--------+--------+
| name | height | weight |
+---------+--------+--------+
| Alice | 178 | NULL |
| Bob | NULL | 98 |
| Charlie | 185 | NULL |
| Denise | NULL | 52 |
+---------+--------+--------+
Or, as a closer example to your problem, let's say that one table has name, height and weight, while the other only has name and weight, so height is "missing from the middle":
> create table table1(name string, height int, weight int);
> insert into table1 values ('Alice', 178, 55), ('Charlie', 185, 78);
> create table table2(name string, weight int);
> insert into table2 values ('Bob', 98), ('Denise', 52);
> create view people as
> select name, height, weight from table1
> union all
> select name, NULL as height, weight from table2;
> select * from people order by name;
+---------+--------+--------+
| name | height | weight |
+---------+--------+--------+
| Alice | 178 | 55 |
| Bob | NULL | 98 |
| Charlie | 185 | 78 |
| Denise | NULL | 52 |
+---------+--------+--------+
Be sure to use union all and not just union, because the latter tries to remove duplicate rows, which makes it very expensive.
It seems like there is no way to load data directly into specified columns.
As such, this is what you probably need to do:
Load data inpath into a (temporary?) staging table that matches the file.
Insert into the relevant columns of the final table by selecting the contents of that staging table (as sketched below).
The situation is very similar to this question, which covers the opposite scenario (you only want to load a few columns).
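A rough sketch of those two steps for file2.txt, assuming the six Employee columns shown in file1.txt (id, name, sal, deptid, state, country) and a tab-delimited file (adjust the delimiter to your actual format):
-- 1. Stage file2.txt in a table that matches its four columns
create table employee_stage (id int, name string, deptid string, country string)
row format delimited fields terminated by '\t';
load data inpath '/file2.txt' into table employee_stage;
-- 2. Fill the missing columns with NULL while inserting into Employee
insert into table Employee
select id, name, NULL as sal, deptid, NULL as state, country
from employee_stage;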

How to use Oracle DBMS_ADVANCED_REWRITE with bind variable?

We need to implement a query rewrite with a bind variable because we don't have the option of modifying the web application source code. Example:
BEGIN
  SYS.DBMS_ADVANCED_REWRITE.declare_rewrite_equivalence (
    name             => 'test_rewrite2',
    source_stmt      => 'select COUNT(*) from ViewX where columnA = :1',
    destination_stmt => 'select COUNT(*) from ViewY where columnA = :1',
    validate         => FALSE,
    rewrite_mode     => 'recursive');
END;
The above command results in an error because there is a bind variable:
30353. 00000 - "expression not supported for query rewrite"
*Cause: The SELECT clause referenced UID, USER, ROWNUM, SYSDATE,
CURRENT_TIMESTAMP, MAXVALUE, a sequence number, a bind variable,
correlation variable, a set result, a trigger return variable, a
parallel table queue column, collection iterator, a non-deterministic
date format token RR, etc.
*Action: Remove the offending expression or disable the REWRITE option on
the materialized view.
I have read that there is a workaround, but I just cannot find the document anywhere online.
Could you please tell me what the workaround is?
You can't specify bind parameters, but it should already work as you wish. The key is the recursive mode you passed as rewrite_mode.
The recursive and general modes intercept all statements that involve the table (or view), disregarding the filter, and transform them to target the second table (or view), adapting the filter condition from your original statement.
(If you had declared it as text_match, it would have checked for the presence of the same filter in both the original and the target statement before triggering the transformation.)
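(A hypothetical text_match declaration, shown only for comparison, would tie the rewrite to the literal statement text, filter included:)
-- Hypothetical text_match variant: only statements whose text matches the
-- source_stmt (including the literal filter) would be rewritten.
BEGIN
  SYS.DBMS_ADVANCED_REWRITE.declare_rewrite_equivalence (
    name             => 'test_rewrite_tm',   -- hypothetical name
    source_stmt      => 'SELECT * FROM A1 WHERE id = 2',
    destination_stmt => 'SELECT * FROM A2 WHERE id = 2',
    validate         => FALSE,
    rewrite_mode     => 'text_match');
END;
/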
In the example below you can see that even though we don't define any bind condition, the filter id = 2 is applied nevertheless; in other words, the engine actually transforms SELECT * FROM A1 WHERE id = 2 into SELECT * FROM A2 WHERE id = 2.
set LINESIZE 300
drop table A1;
drop view A2;
drop index A1_IDX;
EXEC SYS.DBMS_ADVANCED_REWRITE.drop_rewrite_equivalence (name => 'test_rewrite');
create table A1 (id number, name varchar2(20));
insert into A1 values(1, 'hello world');
insert into A1 values(2, 'hola mundo');
create index A1_IDX on A1(id);
select * from A1;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = TRUSTED;
CREATE OR REPLACE VIEW A2 AS
SELECT id,
       INITCAP(name) AS name
  FROM A1
 ORDER BY id DESC;
BEGIN
  SYS.DBMS_ADVANCED_REWRITE.declare_rewrite_equivalence (
    name             => 'test_rewrite',
    source_stmt      => 'SELECT * FROM A1',
    destination_stmt => 'SELECT * FROM A2',
    validate         => FALSE,
    rewrite_mode     => 'recursive');
END;
/
select * from A1;
ID NAME
---------- --------------------
2 Hola Mundo
1 Hello World
select * from A1 where id = 2;
ID NAME
---------- --------------------
2 Hola Mundo
explain plan for
select * from A1 where id = 2;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
Plan hash value: 1034670462
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 2 (0)| 00:00:01 |
| 1 | VIEW | A2 | 1 | 25 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID | A1 | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN DESCENDING| A1_IDX | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("ID"=2)
Note
-----
- dynamic sampling used for this statement (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
20 rows selected
As you can see:
The engine transparently applies the transformation and returns the filtered result.
On top of that, the transformation of the filter is applied as well: the filter is correctly "pushed" into the source table to extract the values from A1. It does not blindly extract all values from A2 and then apply the filter, so performance is preserved.
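As a quick (unverified here) way to exercise the bind-variable case itself, you could run the same query with an actual bind from SQL*Plus and confirm that the plan still goes through A2:
-- Bind-variable check (SQL*Plus syntax); the recursive equivalence declared
-- above should intercept this statement as well and redirect it to A2.
VARIABLE v_id NUMBER
EXEC :v_id := 2
SELECT * FROM A1 WHERE id = :v_id;
EXPLAIN PLAN FOR SELECT * FROM A1 WHERE id = :v_id;
SELECT * FROM table(dbms_xplan.display);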

LOOP update with column size

I tried to create a PL/SQL script that would update my table with the column size.
My table looks like this:
| ID | TEXT | SIZE |
--------------------
| 1 | .... | null |
| 2 | .... | null |
| 3 | .... | null |
...
I want the PL/SQL script to fill the SIZE column based on the length of the text for each document, and then delete the contents of the TEXT column.
Here's what I've tried:
DECLARE
  CURSOR s1 IS SELECT id FROM table WHERE size IS NULL;
BEGIN
  FOR d1 IN s1 LOOP
    UPDATE table
       SET size = (SELECT LENGTH(text) FROM table WHERE id = d1.id)
     WHERE id = d1.id;
  END LOOP;
END;
/
Unless there is a good reason, do this in pure SQL (or put the following statement into PL/SQL):
UPDATE t
SET size = LENGTH(text),
text = NULL
WHERE size IS NULL;
This is both easier to read and faster.
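If it really has to live in PL/SQL, a minimal wrapper around that same statement could look like this (table and column names are taken from the answer above; whether to COMMIT inside the block is your call):
BEGIN
  UPDATE t
     SET size = LENGTH(text),
         text = NULL
   WHERE size IS NULL;
  COMMIT;  -- optional: commit here or let the caller decide
END;
/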
