Virtual column names in ClickHouse

ClickHouse documentation says:
Virtual column is an integral table engine attribute that is defined in the engine source code. You shouldn't specify virtual columns in the CREATE TABLE query and you can't see them in SHOW CREATE TABLE and DESCRIBE TABLE query results. Virtual columns are also read-only, so you can't insert data into virtual columns.
But I cannot find a list of virtual column names. What are their names, and how do I select their values and data types?

Do you mean MergeTree engine or file or HDFS or what?
MergeTree
_part -- name of a part
_part_index -- sequential index of the part in the query result
_partition_id -- name of a partition
_part_uuid -- unique part identifier, available if the `MergeTree` setting `assign_part_uuids` is enabled (used for part movement between shards)
_partition_value -- values (tuple) of a `partition by` expression
_sample_factor -- sample_factor from the query
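As a rough sketch of how to read these: virtual columns are not returned by `SELECT *`, so they have to be named explicitly, and `toTypeName` reports each column's data type. The table name `my_table` is a placeholder, not something from the question, and the exact set of virtual columns can vary by ClickHouse version.

```sql
-- my_table is a hypothetical MergeTree table; substitute your own.
SELECT
    _part,
    _partition_id,
    _part_index,
    toTypeName(_part)         AS part_type,
    toTypeName(_partition_id) AS partition_id_type,
    toTypeName(_part_index)   AS part_index_type
FROM my_table
LIMIT 5;
```

Virtual columns can also be used in WHERE and GROUP BY, for example to count rows per part.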

Related

Hive not creating separate directories for skewed table

My Hive version is 1.2.1. I am trying to create a skewed table, but it does not seem to be working. Here are my table creation and insert statements:
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable
(
country string,
payload string
)
PARTITIONED BY (year int,month int,day int,hour int)
SKEWED BY (country) on ('USA','Brazil') STORED AS DIRECTORIES
STORED AS TEXTFILE;
INSERT OVERWRITE TABLE mydb.mytable PARTITION(year = 2019, month = 10, day=05, hour=18)
SELECT country,payload FROM mydb.mysource;
The SELECT query returns country names and some associated string data (payload). Given that I specified skewing on the column 'country', I expected the INSERT statement to create separate directories for USA and Brazil (the SELECT query returns enough rows with country as USA and Brazil), but this did not happen. Instead, Hive created a directory called 'HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME' and all the values went into a single file in that directory. A skewed table is supposed to send only rows with default values (those not specified in the table creation statement) to the common directory (which is what HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME seems to be) and create dedicated directories for rows with skew values. But everything goes to the default directory and the other directories are not even created. Do I have to toggle any Hive options to make this work?
This looks like an old bug that does not appear to be fixed yet: https://issues.apache.org/jira/browse/HIVE-13697. Internally, when Hive stores the skew values specified at table creation, it converts them to lower case before writing them to the metastore. The workaround for now is to convert the case in the SELECT statement so that rows go to the right bucket. I tested this and it works.
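A minimal sketch of that workaround, assuming "convert case" means lowercasing the skew column so it matches the lowercased values in the metastore. Note that this also changes the stored country values to lower case, which may or may not be acceptable for your data.

```sql
-- Same insert as in the question, with the skew column lowercased
-- so that 'USA' and 'Brazil' match the metastore's 'usa' and 'brazil'.
INSERT OVERWRITE TABLE mydb.mytable PARTITION (year = 2019, month = 10, day = 5, hour = 18)
SELECT lower(country) AS country, payload
FROM mydb.mysource;
```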

Searching over Oracle partitions

Is there a big performance hit when running SQL on a partitioned Oracle table where the SQL does not reference the column that is used for partitioning?
It depends. If your WHERE clause is not based on any indexed column, your query will be very slow, because Oracle has to do a full table scan.
If you have a global index on the columns in the WHERE clause, it does not matter whether you access only one partition or all of them.
But you do get a degradation when your table has many partitions and the WHERE clause uses a local index. Assume your table has 50 partitions and a local index: a query that does not specify the partition (either by a WHERE condition or explicitly by partition name) has to scan 50 individual index partitions.
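To make the local-index case concrete, here is a small sketch. The table `orders`, its columns, and the index name are invented for illustration and are not from the question.

```sql
-- Hypothetical range-partitioned table.
CREATE TABLE orders (
    order_id    NUMBER,
    customer_id NUMBER,
    order_date  DATE
)
PARTITION BY RANGE (order_date) (
    PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01'),
    PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01')
);

-- Local index: one index partition per table partition.
-- (Omitting the LOCAL keyword would create a global index instead.)
CREATE INDEX orders_cust_ix ON orders (customer_id) LOCAL;

-- No predicate on the partition key: every local index partition is probed.
SELECT * FROM orders WHERE customer_id = 42;

-- Predicate on the partition key: Oracle can prune to a single partition.
SELECT * FROM orders
WHERE customer_id = 42
  AND order_date >= DATE '2024-01-01'
  AND order_date <  DATE '2025-01-01';
```

Comparing the execution plans of the two queries (the Pstart/Pstop columns) should show whether pruning takes place.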

Have dynamic columns in external tables

My requirement is that I have to use a single external table in a stored procedure for different text files which have different columns.
Can I use dynamic columns in external tables in Oracle 11g? Like this:
create table ext_table as select * from TBL_test
organization external (
type oracle_loader
default directory DATALOAD
access parameters(
records delimited by newline
fields terminated by '#'
missing field values are null
)
location ('APD.txt')
)
reject limit unlimited;
The set of columns that are defined for an external table, just like the set of columns that are defined for a regular table, must be known at the time the external table is defined. You can't choose at runtime to determine that the table has 30 columns today and 35 columns tomorrow. You could also potentially define the external table to have the maximum number of columns that any of the flat files will have, name the columns generically (i.e. col1 through col50) and then move the complexity of figuring out that column N of the external table is really a particular field to the ETL code. It's not obvious, though, why that would be more useful than creating the external table definition properly.
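The generic-column approach might look roughly like the sketch below. The table name `ext_generic`, the three-column width, the VARCHAR2 sizes, and the second file name are assumptions. The directory, delimiter, and first file name are taken from the question.

```sql
-- External table with generically named columns; extend to the maximum
-- number of fields any of the flat files can have.
CREATE TABLE ext_generic (
    col1 VARCHAR2(4000),
    col2 VARCHAR2(4000),
    col3 VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
    TYPE oracle_loader
    DEFAULT DIRECTORY DATALOAD
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY '#'
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('APD.txt')
)
REJECT LIMIT UNLIMITED;

-- Point the same definition at another file (hypothetical name) before loading it.
ALTER TABLE ext_generic LOCATION ('other_file.txt');
```

The ETL code then has to know, per file, which `colN` corresponds to which real field.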
Why is there a requirement that you use a single external table definition to load many differently formatted files? That does not seem reasonable.
Can you drop and re-create the external table definition at runtime? Or does that violate the requirement for a single external table definition?

Index is not being used on a view with WHERE CONTAINS clause

I created a table in which one of the columns is address. I then created a view with a WHERE CONTAINS clause so that a select only returns addresses that contain a specific word.
I then created an index of the address column on the original table.
It says index created.
When I type
select * from myview
It says
drg-10599: column is not indexed.
Any idea why this isn't working?
You would need to create an Oracle Text index, not a standard b-tree index. There are quite a few options for creating and maintaining Oracle Text indexes that you should really read through in order to figure out exactly what options you want to use.
The simplest possible DDL statement would be
CREATE INDEX myindex ON table_a(address)
INDEXTYPE IS CTXSYS.CONTEXT;
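For illustration, once a CONTEXT index like the one above exists, a view with a CONTAINS predicate can be queried. The view definition and the search word 'London' below are assumptions, since the question does not show them. By default a CONTEXT index is not updated automatically when rows change, so the sketch also includes a manual sync call.

```sql
-- Hypothetical view; the real search word is not given in the question.
CREATE OR REPLACE VIEW myview AS
SELECT *
FROM table_a
WHERE CONTAINS(address, 'London') > 0;

SELECT * FROM myview;

-- After inserting or updating rows, bring the text index up to date.
BEGIN
    CTX_DDL.SYNC_INDEX('myindex');
END;
/
```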

Efficiency of storing an Oracle User Defined Type in an index organized table

Are there known issues with storing user defined types within index organized tables in Oracle 10G?
CREATE OR REPLACE TYPE MyList AS VARRAY(256) OF NUMBER(8,0);
CREATE TABLE myTable (
id NUMBER(10,0) NOT NULL,
my_list MyList NOT NULL,
CONSTRAINT pk_myTable_id PRIMARY KEY(id))
ORGANIZATION INDEX NOLOGGING;
With this type and table setup, I loaded via insert append ~2.4M records and it took 20G of space at which point I ran out of disk space. Looking at the size of the data types, this seemed to be taking up a lot of space for what was being stored. I then changed the table to be a regular table (not IOT) and I stored 6M+ records in ~7G of storage, added the PK index which took an additional 512M.
I've used IOT many times in the past, but not with a user defined type.
Why is it that the storage requirements when using a UDT and IOT are so high?
AFAICR, Oracle always stores VARRAYs out of row in IOTs.
I'll now try to find the references in the docs to confirm this.
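One way to check where the space is going, without relying on memory, is to look at the segments the table actually owns. This is only a diagnostic sketch: it assumes the objects live in the current schema and that the table was created with an unquoted name (so its dictionary name is MYTABLE).

```sql
-- Largest segments in the current schema. Out-of-row VARRAY data would be
-- expected to show up as LOB segments rather than in the IOT's index segment.
SELECT segment_name, segment_type, ROUND(bytes / 1024 / 1024) AS size_mb
FROM user_segments
ORDER BY bytes DESC;

-- LOB segments attached to the table, and whether in-row storage is enabled.
SELECT column_name, segment_name, in_row
FROM user_lobs
WHERE table_name = 'MYTABLE';
```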
