Greenplum date/time field out of range: "10/10/2020" - insert

I'm testing Greenplum (which is based on Postgres) with a table of this form:
CREATE TABLE whiteglove (
    bigint    BIGINT,
    varbinary BYTEA,
    boolean   BOOLEAN,
    date      DATE,
    decimal   DECIMAL,
    double    FLOAT,
    real      REAL,
    integer   INTEGER,
    smallint  SMALLINT,
    timestamp TIMESTAMP,
    tinyint   SMALLINT,
    varchar   VARCHAR
)
Then I try to insert this row using the Postgres JDBC driver:
INSERT INTO whiteglove VALUES (100000,'68656c6c6f',TRUE,'10/10/2020',0.5,1.234567,1.234,10,2,'4/14/2015 7:32:33PM',2,'hello')
which fails with the following error:
org.postgresql.util.PSQLException: ERROR: date/time field value out of range: "10/10/2020"
Hint: Perhaps you need a different "datestyle" setting.
Position: 57
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:310)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:296)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:273)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:268)
If I take that same query and execute it from the terminal using psql, it passes without problems:
dev=# select * from whiteglove ;
bigint | varbinary | boolean | date | decimal | double | real | integer | smallint | timestamp | tinyint | varchar
--------+-----------+---------+------+---------+--------+------+---------+----------+-----------+---------+---------
(0 rows)
dev=# INSERT INTO whiteglove VALUES (100000,'68656c6c6f',TRUE,'10/10/2020',0.5,1.234567,1.234,10,2,'4/14/2015 7:32:33PM',2,'hello');
INSERT 0 1
dev=# select * from whiteglove ;
bigint | varbinary | boolean | date | decimal | double | real | integer | smallint | timestamp | tinyint | varchar
--------+------------+---------+------------+---------+----------+-------+---------+----------+---------------------+---------+---------
100000 | 68656c6c6f | t | 2020-10-10 | 0.5 | 1.234567 | 1.234 | 10 | 2 | 2015-04-14 19:32:33 | 2 | hello
(1 row)
Any pointers on why I'm getting this out-of-range error?
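The hint in the error message is the likely lead. Under YMD field ordering, "10/10/2020" is read as year 10, month 10, day 2020, which is out of range, so the session opened by the JDBC driver is evidently parsing dates with a different datestyle than the psql session. A hedged sketch of two workarounds, assuming the server accepts a session-level SET:

-- Option 1: pin the field ordering for the JDBC session before inserting
SET datestyle = 'ISO, MDY';

-- Option 2: sidestep datestyle entirely with unambiguous ISO 8601 literals
INSERT INTO whiteglove
VALUES (100000, '68656c6c6f', TRUE, '2020-10-10', 0.5, 1.234567, 1.234,
        10, 2, '2015-04-14 19:32:33', 2, 'hello');

A parameterized INSERT that binds the values with setDate/setTimestamp instead of string literals avoids the ambiguity as well.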

Related

How to drop hive partitions with hivevar passed as partition variable?

I have been trying to run this piece of code to drop the current day's partition from a Hive table, and for some reason it does not drop the partition. Not sure what's wrong.
Table Name : prod_db.products
desc:
+----------------------------+-----------------------+-----------------------+--+
| col_name                   | data_type             | comment               |
+----------------------------+-----------------------+-----------------------+--+
| name                       | string                |                       |
| cost                       | double                |                       |
| load_date                  | string                |                       |
|                            | NULL                  | NULL                  |
| # Partition Information    | NULL                  | NULL                  |
| # col_name                 | data_type             | comment               |
|                            | NULL                  | NULL                  |
| load_date                  | string                |                       |
+----------------------------+-----------------------+-----------------------+--+
I am using the following code:
SET hivevar:current_date=current_date();
ALTER TABLE prod_db.products DROP PARTITION(load_date='${current_date}');
Partitions before and after running it (identical):
+-----------------------+--+
| partition |
+-----------------------+--+
| load_date=2022-04-07 |
| load_date=2022-04-11 |
| load_date=2022-04-18 |
| load_date=2022-04-25 |
+-----------------------+--+
It runs without any error but won't drop the partition. The table is internal/managed.
I have tried different ways mentioned on Stack Overflow, but it is just not working for me. Help.
You don't need to set a variable, and the one you set never held a date: SET hivevar:current_date=current_date(); stores the literal text current_date(), and variable substitution pastes that text in verbatim, so your DROP looks for a partition literally named 'current_date()'. You can drop directly using SQL:
ALTER TABLE prod_db.products
DROP PARTITION (load_date = current_date());
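A minimal end-to-end sketch of that suggestion, with the caveat that some Hive versions only accept constant values in a DROP PARTITION spec; on those, substitute the literal date for current_date():

-- drop today's partition directly, no variable involved
ALTER TABLE prod_db.products DROP PARTITION (load_date = current_date());

-- verify the partition list afterwards
SHOW PARTITIONS prod_db.products;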

hive table shows 0 results while querying

My Hive table is a managed table and I can see the files present in HDFS.
Querying it through Hive does not display any results.
hive> describe formatted emp;
Result:
| Table Type:                 | MANAGED_TABLE |
| Table Parameters:           | NULL          |
|   bucketing_version         | 2             |
|   numFiles                  | 1376          |
|   numPartitions             | 43            |
|   numRows                   | 0             |
|   parquet.compression       | gzip          |
|   rawDataSize               | 0             |
|   totalSize                 | 4770821594    |
|   transactional             | true          |
|   transactional_properties  | insert_only   |
|   transient_lastDdlTime     | 1612857428    |
Selecting data from the table:
select * from emp;
it fetches no results.
Why is there a difference between what is in HDFS and the select output?
This command worked for me:
ANALYZE TABLE table_name COMPUTE STATISTICS FOR COLUMNS;
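A plausible reading, offered as an assumption rather than a confirmed diagnosis: the table parameters above show numRows as 0 even though there are 1376 files, and when Hive answers queries from statistics (hive.compute.query.using.stats=true), stale stats like these can produce empty results. Applied to the table from the question:

-- recompute statistics so they reflect the data actually in HDFS
ANALYZE TABLE emp COMPUTE STATISTICS FOR COLUMNS;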

Querying HIVE Metadata

I need to query the following table and view information from my Apache Hive cluster:
Each row needs to contain the following:
TABLE SCHEMA
TABLE NAME
TABLE DESCRIPTION
COLUMN NAME
COLUMN DATA TYPE
COLUMN LENGTH
COLUMN PRECISION
COLUMN SCALE
NULL OR NOT NULL
PRIMARY KEY INDICATOR
This can be easily queried in most RDBMSs (metadata tables/views), but I am struggling to find much information about the equivalent metadata tables/views in Hive.
Please help :)
This information is available from the Hive metastore. The below example query is for a MySQL-backed metastore (Hive version 1.2).
SELECT
    DBS.NAME AS TABLE_SCHEMA,
    TBLS.TBL_NAME AS TABLE_NAME,
    TBL_COMMENTS.TBL_COMMENT AS TABLE_DESCRIPTION,
    COLUMNS_V2.COLUMN_NAME AS COLUMN_NAME,
    COLUMNS_V2.TYPE_NAME AS COLUMN_DATA_TYPE_DETAILS
FROM DBS
JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN
(
    -- Collapse TABLE_PARAMS to one row per table: the 'comment' value when
    -- present, an empty string otherwise. (A plain SELECT DISTINCT over the
    -- per-parameter rows emits two rows for a commented table and would
    -- duplicate every column row in the outer join.)
    SELECT TBL_ID,
           MAX(CASE WHEN PARAM_KEY = 'comment' THEN PARAM_VALUE ELSE '' END) AS TBL_COMMENT
    FROM TABLE_PARAMS
    GROUP BY TBL_ID
) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;
Sample output:
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_DESCRIPTION | COLUMN_NAME | COLUMN_DATA_TYPE_DETAILS |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| default | temp003 | This is temp003 table | col1 | string |
| default | temp003 | This is temp003 table | col2 | array<string> |
| default | temp003 | This is temp003 table | col3 | array<string> |
| default | temp003 | This is temp003 table | col4 | int |
| default | temp003 | This is temp003 table | col5 | decimal(10,2) |
| default | temp004 | | col11 | string |
| default | temp004 | | col21 | array<string> |
| default | temp004 | | col31 | array<string> |
| default | temp004 | | col41 | int |
| default | temp004 | | col51 | decimal(10,2) |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
Metastore tables referenced in the query:
DBS: Details of databases/schemas.
TBLS: Details of tables.
COLUMNS_V2: Details about columns.
SDS: Details about storage.
TABLE_PARAMS: Details about table parameters (key-value pairs).
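A note on the remaining requested fields: the metastore has no separate length/precision/scale columns; they are embedded in TYPE_NAME (e.g. decimal(10,2)), and a Hive 1.2 metastore keeps no nullability or primary-key metadata at all (key constraints arrived in later Hive versions). A hedged sketch for extracting precision and scale from TYPE_NAME, assuming a MySQL-backed metastore so that SUBSTRING_INDEX is available; it only targets decimal types:

SELECT
    COLUMN_NAME,
    TYPE_NAME,
    -- text between '(' and the first ',': decimal(10,2) -> 10
    CASE WHEN TYPE_NAME LIKE 'decimal(%'
         THEN SUBSTRING_INDEX(SUBSTRING_INDEX(TYPE_NAME, '(', -1), ',', 1)
    END AS COLUMN_PRECISION,
    -- text between the last ',' and ')': decimal(10,2) -> 2
    CASE WHEN TYPE_NAME LIKE 'decimal(%'
         THEN SUBSTRING_INDEX(SUBSTRING_INDEX(TYPE_NAME, ',', -1), ')', 1)
    END AS COLUMN_SCALE
FROM COLUMNS_V2;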

Automatically generating documentation about the structure of the database

There is a database that contains several views and tables.
I need to create a report (documentation of the database) with a list of all the fields in these tables, indicating the type and, if possible, the minimum/maximum values and the values from the first row. For example:
.------------.--------.--------.--------------.--------------.--------------.
| Table name | Column | Type   | MinValue     | MaxValue     | FirstRow     |
:------------+--------+--------+--------------+--------------+--------------:
| Table1     | day    | date   | '2010-09-17' | '2016-12-10' | '2016-12-10' |
:------------+--------+--------+--------------+--------------+--------------:
| Table1     | price  | double | 1030.8       | 29485.7      | 6023.8       |
:------------+--------+--------+--------------+--------------+--------------:
| …          |        |        |              |              |              |
:------------+--------+--------+--------------+--------------+--------------:
| TableN     | day    | date   | '2014-06-20' | '2016-11-28' | '2016-11-16' |
:------------+--------+--------+--------------+--------------+--------------:
| TableN     | owner  | string | NULL         | NULL         | 'Joe'        |
'------------'--------'--------'--------------'--------------'--------------'
I think executing many queries like
SELECT MAX(column_name) AS max_value, MIN(column_name) AS min_value
FROM table_name
will be inefficient on the huge tables stored in Hadoop.
After reading the documentation, I found an article about "Statistics in Hive".
It seems I must use a query like this:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
But this command ended with an error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
Do I understand correctly that this command adds information to the table's metadata rather than displaying a result? Will it work with views?
Please suggest how to efficiently and automatically create documentation for a database in Hive.
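One way to keep the scans tractable, sketched under the assumption that a single full pass per table is acceptable: compute the min/max for every column of a table in one query instead of one query per column, and fetch the first-row values with a separate LIMIT 1 probe (Table1 and its columns below are the example from the question):

-- one scan gathers min/max for all columns of Table1 at once
SELECT
    MIN(day)   AS day_min,   MAX(day)   AS day_max,
    MIN(price) AS price_min, MAX(price) AS price_max
FROM Table1;

-- sample values from one row (the row chosen is unspecified without ORDER BY)
SELECT day, price FROM Table1 LIMIT 1;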

How to use ResultSet to fetch the ID of the record

I have a table named table_listnames whose structure is given below:
mysql> desc table_listnames;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.04 sec)
It has sample data as shown:
mysql> select * from table_listnames;
+----+------------+
| id | name |
+----+------------+
| 6 | WWW |
| 7 | WWWwww |
| 8 | WWWwwws |
| 9 | WWWwwwsSSS |
| 10 | asdsda |
+----+------------+
5 rows in set (0.00 sec)
I have a requirement where, if the name is not found in the table, I need to insert it, or else do nothing.
I am achieving it this way:
String sql = "INSERT INTO table_listnames (name) SELECT name FROM (SELECT ?) AS tmp WHERE NOT EXISTS (SELECT name FROM table_listnames WHERE name = ?) LIMIT 1";
pst = dbConnection.prepareStatement(sql);
pst.setString(1, salesName);
pst.setString(2, salesName);
pst.executeUpdate();
Is it possible to know the id of the record with the given name in this case?
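A hedged sketch: because this INSERT ... WHERE NOT EXISTS touches zero rows when the name is already present, the driver's generated-keys facility (getGeneratedKeys) would come back empty in that case, so a follow-up lookup by name returns the id in either situation. Run it as a second prepared statement, bind the ? to salesName as in the question, and read the id from its ResultSet:

-- fetch the id whether the row was just inserted or already existed
SELECT id
FROM table_listnames
WHERE name = ?;

Executing this with executeQuery yields a ResultSet whose getInt("id") is the value you need.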
