Partitions are still showing in Hive even though they were dropped for an external table - hadoop

I have an external table in Hive partitioned by year, month, and day. I dropped one partition, but I still see it in SHOW PARTITIONS.
>use test_raw_tables;
>show partitions test1_raw;
[year=2016/month=01/day=01]
[year=2017/month=03/day=24]
> alter table test1_raw drop partition (year=2016, month=01, day=01);
> refresh test1_raw;
> show partitions test1_raw;
[year=2016/month=01/day=01]
[year=2017/month=03/day=24] ---Still see the dropped partition here----
> msck repair table test1_raw;
> show partitions test1_raw;
[year=2016/month=01/day=01]
[year=2017/month=03/day=24] ---Still see the dropped partition here----
I'm running this from Impala with Hive as the engine.
describe test1_raw
col_name         data_type  comment
amount_hold      int
id               int
transaction_id   string
recipient_id     string
year             string
month            string
day              string
# Partition Information
# col_name       data_type  comment
year             string
month            string
day              string
location 'hdfs://localhost/sys/datalake/testing/test1_raw'
What is the problem here?
The data in HDFS for that partition was deleted after dropping it. I cannot figure out the issue.

In your table definition, the columns year, month, and day are strings, so the partition values must be quoted. Try '2016', '01', and '01'.
I used the statement below and it works:
alter table test1_raw drop partition (year='2016', month='01', day='01');

Not that it matters anymore, but you changed the definition of a table; you didn't change the data. Hive is unusual in that it lets you apply schema on read: altering the definition is just altering the definition, not the data. You can have multiple tables sit on top of the same data without issues, and a definition change in one doesn't affect the others. (At least for schema-on-read tables.)
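As a sketch of that schema-on-read point (the table and path names here are made up), two external tables can share one HDFS location, and a metadata change to one leaves the other, and the files, untouched:

```sql
-- Hypothetical names: two external tables over the same HDFS directory.
CREATE EXTERNAL TABLE events_a (id INT)
PARTITIONED BY (year STRING)
LOCATION 'hdfs://localhost/sys/datalake/testing/shared_events';

CREATE EXTERNAL TABLE events_b (id INT)
PARTITIONED BY (year STRING)
LOCATION 'hdfs://localhost/sys/datalake/testing/shared_events';

-- Register the same partition in both tables' metadata.
ALTER TABLE events_a ADD PARTITION (year='2016');
ALTER TABLE events_b ADD PARTITION (year='2016');

-- Dropping the partition from an EXTERNAL table only edits events_a's
-- metastore entry; events_b still lists year=2016 and the files remain.
ALTER TABLE events_a DROP PARTITION (year='2016');
```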

Related

HBase Need to export data from one cluster and import it to another with slight modification in row key

I am trying to export data from the HBase table 'mytable' whose row keys start with 'abc'.
scan 'mytable', {ROWPREFIXFILTER => 'abc'}
The exported data needs to be imported into another cluster with the row-key prefix changed from 'abc' to 'def'.
Old Data:
hbase(main):002:0> scan 'mytable', {ROWPREFIXFILTER => 'abc'}
ROW COLUMN+CELL
abc-6535523 column=track:aid, timestamp=1339121507633, value=some stream/pojos
New Data: (In another cluster)
hbase(main):002:0> get 'mytable', 'def-6535523'
ROW COLUMN+CELL
def-6535523 column=track:aid, timestamp=1339121507633, value=some stream/pojos
Only part of the row key needs to be modified; the rest of the data must stay the same.
I tried bin/hbase org.apache.hadoop.hbase.mapreduce.Export table_name file:///tmp/db_dump/, but Export has no provision to specify a start row and end row, and I don't know how to import the data with the changed row key.
Also, is there anything built into HBase/Hadoop to achieve this? Please help.

ClickHouse: how to create a JDBC engine table?

CREATE TABLE jdbc_table
ENGINE = JDBC('jdbc:mysql://192.168.10.16:4307/?user=root&password=root', 'test', 'test')
This statement fails with an error:
Syntax error: failed at position 114 (end of query):
CREATE TABLE jdbc_table ENGINE =
JDBC('jdbc:mysql://192.168.10.16:4307/?user=root&password=root',
'test', 'test')
Expected one of: AS, SETTINGS, TTL, PARTITION BY, PRIMARY KEY, ORDER
BY, SAMPLE BY
I don't know why.
The documentation is incorrect: CREATE TABLE needs either a column list (e.g. (A String, B UInt32)) or an AS SELECT clause.
create table jdbc.category engine
= JDBC('datasource://postgresql', 'public', 'category')
as select * from jdbc('datasource://postgresql', 'public', 'category');
or
create table jdbc.category(A String, B UInt32)
engine = JDBC('datasource://postgresql', 'public', 'category')

How to change a table to a partitioned table with DBMS_REDEFINITION when the current table has no primary key

How do I convert a non-partitioned table (with no primary key) to a partitioned table? Someone said I can use ROWID, but I cannot find any sample in the Oracle docs.
My database is Oracle 12c Release 1, which does not have the newer feature of using the MODIFY clause of ALTER TABLE to convert a table online to a partitioned table.
Please provide a sample if you can.
"Someone says I can use rowid, but I can not find any sample from the Oracle doc"
I think the option you are looking for is the options_flag parameter of DBMS_REDEFINITION.START_REDEF_TABLE.
Like this:
BEGIN
  DBMS_REDEFINITION.START_REDEF_TABLE(
    uname        => 'your_schema',
    orig_table   => 'your_current_table',
    int_table    => 'your_interim_table',
    options_flag => DBMS_REDEFINITION.CONS_USE_ROWID);
END;
/
You can find out more in the Oracle documentation for DBMS_REDEFINITION.
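For context, START_REDEF_TABLE is only the middle step. Here is a hedged sketch of the full ROWID-based flow; the schema, table names, and partitioning details are placeholders, and the interim table must already exist with the target partitioning:

```sql
-- 1. Verify the table can be redefined by ROWID (no primary key required).
BEGIN
  DBMS_REDEFINITION.CAN_REDEF_TABLE('YOUR_SCHEMA', 'YOUR_CURRENT_TABLE',
                                    DBMS_REDEFINITION.CONS_USE_ROWID);
END;
/

-- 2. Create the interim table with the partitioning you want, e.g.:
--    CREATE TABLE your_interim_table (...)
--    PARTITION BY RANGE (created_date) (...);

-- 3. Start, sync, and finish the redefinition.
BEGIN
  DBMS_REDEFINITION.START_REDEF_TABLE(
    uname        => 'YOUR_SCHEMA',
    orig_table   => 'YOUR_CURRENT_TABLE',
    int_table    => 'YOUR_INTERIM_TABLE',
    options_flag => DBMS_REDEFINITION.CONS_USE_ROWID);
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE('YOUR_SCHEMA', 'YOUR_CURRENT_TABLE',
                                       'YOUR_INTERIM_TABLE');
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('YOUR_SCHEMA', 'YOUR_CURRENT_TABLE',
                                       'YOUR_INTERIM_TABLE');
END;
/
```

If CAN_REDEF_TABLE raises an error, the table cannot be redefined this way; otherwise it completes silently and you can proceed.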

How to get HISTOGRAM to "Yes" when adding columns with ALTER TABLE

I'm working on an Oracle database and have to alter the table RETURNS, adding the columns RENTAL_SALES and INBOUND_SALES.
ALTER TABLE
RETURNS
ADD(
RENTAL_SALES NUMBER (14,2) NULL,
INBOUND_SALES NUMBER (14,2) NULL
);
How do I set the Histogram to "Yes"?
Gather statistics using method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 <column name on which you want to enable the histogram>'.
Then check whether it is enabled:
SELECT column_name, histogram
FROM user_tab_col_statistics
WHERE table_name = 'TABLENAME';
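Putting that method_opt string into a runnable call (the schema name is a placeholder, and RENTAL_SALES is just the column from the question):

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'YOUR_SCHEMA',
    tabname    => 'RETURNS',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 RENTAL_SALES');
END;
/
```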
Why do you need histograms? Are you seeing bad query plans?
There are several types of histograms, and the type is assigned depending on the number of distinct values: frequency (and top-frequency) histograms, height-balanced histograms, and hybrid histograms.
The database will create a histogram if you gather statistics automatically, then query the table (column usage is recorded in SYS.COL_USAGE$), then gather statistics again:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('SCHEMA_NAME', 'TABLE_NAME',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
SELECT * FROM table_name WHERE ...;
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('SCHEMA_NAME', 'TABLE_NAME',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
Note: if you already created an index, or already gathered statistics and have been querying the table, gathering statistics again will create the histogram.
Another note: method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 <column name>' will assign a height-balanced histogram to that column, but maybe the column needs a frequency histogram. So if you don't know the number of distinct values and how much data there is, it is better to let the database choose, or you may get a bad query plan. The remaining columns will have no histograms at all, because SIZE 1 collects only base column statistics.

How to analyze a table using DBMS_STATS package in PL/SQL?

Here's the code I'm working on:
begin
DBMS_STATS.GATHER_TABLE_STATS (ownname => 'appdata' ,
tabname => 'TRANSACTIONS',
cascade => true,
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt=>'for all indexed columns size 1',
granularity => 'ALL',
degree => 1);
end;
After executing the code, "PL/SQL procedure successfully completed" is displayed.
How do I view the statistics gathered for this table by DBMS_STATS?
You can see information in DBA_TABLES:
SELECT *
FROM DBA_TABLES WHERE table_name = 'TRANSACTIONS';
e.g. the LAST_ANALYZED column shows when the table was last analyzed.
There is also per-column information in:
SELECT * FROM all_tab_columns WHERE table_name = 'TRANSACTIONS';
where you can find the min value, max value, etc.
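If you want the optimizer statistics themselves rather than the table metadata, the *_TAB_COL_STATISTICS views expose them directly (TRANSACTIONS is the table from the question):

```sql
-- Per-column optimizer statistics: distinct values, density, nulls,
-- histogram type, and when they were last gathered.
SELECT column_name, num_distinct, density, num_nulls, histogram, last_analyzed
FROM   all_tab_col_statistics
WHERE  table_name = 'TRANSACTIONS';
```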
