How to set parameters for HiveQL queries - Hadoop

I am running a HiveQL file with the following content:
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE DIRECTORY '${hivevar:hadoop_temp_output_dir}${hivevar:filepattern}${hivevar:business_date}' select data,'~','~${hivevar:src_sys_file_nm}~${hivevar:outbound_file_nm}~${hivevar:prd_typ_cd}~${hivevar:brnch_cd}~${hivevar:orig_src_sys}~${hivevar:my_src_sys}~${hivevar:extract_ts}~$hivevar:business_date}' from mydb.${hivevar:data_table} where src_sys_file_nm=${hivevar:src_sys_file_nm} and business_date=${hivevar:business_date};
In this example, everything except the data column in the select query comes in as an argument variable. I am also just appending some strings like this -- '~${hivevar:src_sys_file_nm}~${hivevar:outbound_file_nm}~${hivevar:prd_typ_cd}~${hivevar:brnch_cd}~${hivevar:orig_src_sys}~${hivevar:my_src_sys}~${hivevar:extract_ts}~$hivevar:business_date}' --
these are not table columns; they are just strings I am passing. This query works very well outside of HiveQL. When I execute the above HQL script, I get the following error:
Invalid table alias or column reference 'somestring': (possible column names are: data, business_date, src_sys_file_nm, prd_typ_cd)
Note: business_date, src_sys_file_nm, prd_typ_cd are partitions.
It would be great if anyone could point out the mistake I have made here.
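For reference, a minimal sketch of how such a script is usually invoked and how string-valued variables are typically quoted; the invocation, variable values and the quoting below are illustrative assumptions, not a confirmed fix for the script above:

-- hypothetical invocation of the script (values are made up):
-- hive --hivevar business_date=20160101 --hivevar src_sys_file_nm=SRC01 --hivevar data_table=my_table -f my_extract.hql

-- inside the script, string comparisons generally need quotes around the substituted value,
-- otherwise Hive parses the raw substituted text as an identifier:
select data
from mydb.${hivevar:data_table}
where src_sys_file_nm = '${hivevar:src_sys_file_nm}'
  and business_date = '${hivevar:business_date}';

Hive performs plain text substitution for ${hivevar:...} before parsing, so an unquoted string value ends up looking like a column reference, which would be consistent with the "Invalid table alias or column reference" error reported above.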

Related

How to create Oracle Spatial Index?

I am trying to create an Oracle Spatial index but seeing strange behavior.
I have a table in my schema as follows:
CREATE TABLE "Event" (
"EventID" NUMBER(32,0) GENERATED ALWAYS AS IDENTITY INCREMENT BY 1 START WITH 1 NOT NULL,
"Name" NVARCHAR2(30),
"Location" "SDO_GEOMETRY" NOT NULL,
CONSTRAINT "PK_EVENT" PRIMARY KEY ("EventID")
) ;
This works fine, and I know I have to create an entry in user_sdo_geom_metadata; that works as you would expect with the following:
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid) values ('Event','Location',
sdo_dim_array(sdo_dim_element('X',-180.0,180.0, 0.005),sdo_dim_element('Y',-90.0,90.0, 0.005)), 4326);
This reports success and when I do a select on user_sdo_geom_metadata I see the row. However, when I try to create the spatial index with:
CREATE INDEX "EVINDEX" ON "Event" ("Location") INDEXTYPE IS MDSYS.SPATIAL_INDEX_V2
I get the following error:
SQL Error [29855] [99999]: ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-13203: failed to read USER_SDO_GEOM_METADATA view
ORA-13203: failed to read USER_SDO_GEOM_METADATA view
ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 10
The weird thing is that the index looks like it has been created.
select * from all_indexes where table_name='Event';
Shows the index. The other odd thing is that when I do a select * on ALL_SDO_GEOM_METADATA, no rows are returned. I'm connecting as a user with almost every privilege and role, but not as SYSDBA. I can't get my head around this one.
UPDATE
Incredibly, this seems to be a case-sensitivity issue. If you change the table and column names to all UPPERCASE it works. It seems my never-ending disappointment in Oracle has a whole new chapter. Going to try to struggle through this somehow, but like most things with Oracle, it's one unrelenting slog to get anything done :(
The documentation says:
The table name cannot contain spaces or mixed-case letters in a quoted string when inserted into the USER_SDO_GEOM_METADATA view, and it cannot be in a quoted string when used in a query (unless it is in all uppercase characters).
and
The column name cannot contain spaces or mixed-case letters in a quoted string when inserted into the USER_SDO_GEOM_METADATA view, and it cannot be in a quoted string when used in a query (unless it is in all uppercase characters).
However, it also says:
All letters in the names are converted to uppercase before the names are stored in geometry metadata views or before the tables are accessed. This conversion also applies to any schema name specified with the table name.
which you can see if you query the user_sdo_geom_metadata view after your insert; the mixed-case names have become uppercase EVENT and LOCATION.
But then:
Note: Letter case conversion does not apply if you use mixed case (“CamelCase”) names enclosed in quotation marks. However, be aware that many experts recommend against using mixed-case names.
And indeed, rather unintuitively, it seems to work if you include the quotes in the user_sdo_geom_metadata insert:
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
values (
'"Event"',
'"Location"',
sdo_dim_array(sdo_dim_element('X',-180.0,180.0, 0.005),
sdo_dim_element('Y',-90.0,90.0, 0.005)), 4326
);
So it appears that the values from the view are at some point concatenated into a dynamic SQL statement, which would explain some of the behaviour.
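For completeness, a sketch of the all-uppercase route the documentation describes, based on the table above; it avoids quoted identifiers entirely (the names are simply the same schema in upper case, so this is an illustration rather than a drop-in replacement for the existing objects):

CREATE TABLE EVENT (
  EVENTID NUMBER(32,0) GENERATED ALWAYS AS IDENTITY INCREMENT BY 1 START WITH 1 NOT NULL,
  NAME NVARCHAR2(30),
  LOCATION SDO_GEOMETRY NOT NULL,
  CONSTRAINT PK_EVENT PRIMARY KEY (EVENTID)
);

INSERT INTO user_sdo_geom_metadata (table_name, column_name, diminfo, srid)
VALUES ('EVENT', 'LOCATION',
        sdo_dim_array(sdo_dim_element('X', -180.0, 180.0, 0.005),
                      sdo_dim_element('Y', -90.0, 90.0, 0.005)),
        4326);

CREATE INDEX EVINDEX ON EVENT (LOCATION) INDEXTYPE IS MDSYS.SPATIAL_INDEX_V2;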

Update Query in For each loop container

"How to write sql task and vary something based on a variable in foreach loop container"
I have created a table (Dummy) with 10 columns. These columns exactly replicate the input files. In a folder I have five Excel files with different names. Each file contains, say, 100 line items. While loading all these files into my table, I need to capture each of the file names.
So, I created a column in my table (Dummy) called "File_name". I created a Foreach Loop in SSIS and declared the variable by referencing the folder and path name. How do I write an update query in a SQL task to pick up this variable and set it as the file_name for all the rows loaded?
What's your WHERE clause in the UPDATE statement (is there a column in the Dummy table that identifies each set of data from the different files)? Unless you have a column or set of columns that identifies each set of data, an UPDATE does not make sense here.
The other alternative is to have your INSERT (Data Flow task) and the UPDATE SQL task (WHERE file_name IS NULL or empty) in the same Foreach Loop container and parameterize your Excel file connection as well; a sketch of such an UPDATE is below.
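A minimal sketch of that Execute SQL Task statement, assuming an OLE DB connection where ? is the parameter marker and a package variable (say User::FileName) holding the current file name is mapped to it in the task's Parameter Mapping; the variable name is illustrative:

-- runs inside the Foreach Loop, right after the Data Flow task that loaded the current file
UPDATE Dummy
SET File_name = ?                      -- mapped to User::FileName in Parameter Mapping
WHERE File_name IS NULL OR File_name = '';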
Here is the link to Pass variable to Excel Connection : SSIS with variable excel connection manager

Load string data that does not have quotes to Hive

I'm trying to load some test data into a simple Hive table. The data is comma separated, but the individual elements are not enclosed in double quotes. I'm getting an error due to this. How do I tell Hive not to expect varchar fields to be enclosed in quotes? Manually adding quotes to the varchar fields is not an option, since the input file I'm trying to use has thousands of records. Sample query and data are below.
create table mydatabase.flights(FlightDate varchar(10),Airline int,FlightNum int,Origin varchar(4),Destination varchar(4),Departure varchar(4),DepDelay double,Arrival varchar(4),ArrivalDelay double,Airtime double,Distance double) row format delimited;
insert into mydatabase.flights(FlightDate,Airline,FlightNum,Origin,Destination,Departure,DepDelay,Arrival,ArrivalDelay,Airtime,Distance)
values(2014-04-01,19805,1,JFK,LAX,0854,-6.00,1217,2.00,355.00,2475.00);
The insert query above gives me an error message. It works fine if I enclose the varchar fields in quotes.
Error while compiling statement: FAILED: ParseException line 11:11 mismatched input '-' expecting ) near '2014' in value row constructor
I'm loading data using the following query
load data inpath '/user/alpsusa/hive/flights.csv' overwrite into table mydatabase.flights;
After the load, I see only the first field being populated; the rest are all NULL.
Sample data
2014-04-01,19805,1,JFK,LAX,0854,-6.00,1217,2.00,355.00,2475.00
2014-04-01,19805,2,LAX,JFK,0944,14.00,1736,-29.00,269.00,2475.00
2014-04-01,19805,3,JFK,LAX,1224,-6.00,1614,39.00,371.00,2475.00
2014-04-01,19805,4,LAX,JFK,1240,25.00,2028,-27.00,264.00,2475.00
2014-04-01,19805,5,DFW,HNL,1300,-5.00,1650,15.00,510.00,3784.00
Below is the output of DESCRIBE FORMATTED
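The symptom of only the first field being populated usually means the table is still using Hive's default Ctrl-A field delimiter, because ROW FORMAT DELIMITED was specified without FIELDS TERMINATED BY. A hedged sketch of a table definition that should match the comma-separated sample above (not a confirmed fix for the original table):

create table mydatabase.flights(
  FlightDate varchar(10), Airline int, FlightNum int,
  Origin varchar(4), Destination varchar(4), Departure varchar(4),
  DepDelay double, Arrival varchar(4), ArrivalDelay double,
  Airtime double, Distance double)
row format delimited
fields terminated by ','
stored as textfile;

-- then reload the file:
load data inpath '/user/alpsusa/hive/flights.csv' overwrite into table mydatabase.flights;

Quotes are only required for string literals in an INSERT ... VALUES statement; LOAD DATA reads the file as-is, so the unquoted fields themselves are not a problem once the delimiter is declared.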

Using SQL reserved words in Hive when creating external temporary table

I need to create an external Hive table from an HDFS location where one column in the files has a reserved name (end).
When running the script I get the error:
"cannot recognize input near 'end' 'STRUCT' '<' in column specification"
I found 2 solutions.
The first one is to set hive.support.sql11.reserved.keywords=false, but this option has been removed.
https://issues.apache.org/jira/browse/HIVE-14872
The second solution is to use quoted identifiers (hive.support.quoted.identifiers=column).
But in this case I get the error:
"org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): was expecting comma to separate OBJECT entries"
This is my code for table creation:
CREATE TEMPORARY EXTERNAL TABLE ${tmp_db}.${tmp_table}
(
id STRING,
email STRUCT<string:STRING>,
start STRUCT<long:BIGINT>,
end STRUCT<long:BIGINT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '${input_dir}';
It's not possible to rename the column.
Does anybody know the solution for this problem? Or maybe any ideas?
Thanks a lot in advance!
Can you try the below?
hive> set hive.support.quoted.identifiers=column;
hive> create temporary table sp_char ( `#` int, `end` string);
OK
Time taken: 0.123 seconds
OK
Time taken: 0.362 seconds
hive>
When you set the Hive property hive.support.quoted.identifiers=column, all values within backticks are treated as literal column names.
The other value for this property is none; when it is set to none, backticked values are instead evaluated as regular expressions over the column names.
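Applied to the table in the question, a hedged sketch of the same DDL with the reserved names back-quoted (placeholders kept from the question; whether the separate JsonSerDe parse error also goes away depends on the underlying JSON data):

set hive.support.quoted.identifiers=column;

CREATE TEMPORARY EXTERNAL TABLE ${tmp_db}.${tmp_table}
(
  id STRING,
  email STRUCT<string:STRING>,
  `start` STRUCT<long:BIGINT>,
  `end` STRUCT<long:BIGINT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '${input_dir}';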
Hope this helps

Oracle: Import CSV file

I've been searching for a while now but can't seem to find answers so here goes...
I've got a CSV file that I want to import into a table in Oracle (9i/10g).
Later on I plan to use this table as a lookup for another use.
This is actually a workaround I'm working on, since querying using the IN clause with more than 1000 values is not possible.
How is this done using SQLPLUS?
Thanks for your time! :)
SQL Loader helps load csv files into tables: SQL*Loader
If you want SQL*Plus only, it gets a bit complicated. You need to locate your SQL*Loader control file and CSV file, then run the sqlldr command.
Another solution you can use is SQL Developer.
With it, you have the ability to import from a csv file (other delimited files are available).
Just open the table view, then:
choose actions
import data
find your file
choose your options.
You have the option to have SQL Developer do the inserts for you, create an SQL insert script, or create the data for a SQL*Loader script (I have not tried this last option myself).
Of course all that is moot if you can only use the command line, but if you are able to test it with SQL Developer locally, you can always deploy the generated insert scripts (for example).
Just adding another option to the 2 already very good answers.
An alternative solution is using an external table: http://www.orafaq.com/node/848
Use this when you have to do this import very often and very fast.
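A hedged sketch of what such an external table could look like for a simple CSV lookup; the directory object, file name, and columns are illustrative:

-- directory object pointing at the folder that holds the CSV (needs the CREATE DIRECTORY privilege)
CREATE DIRECTORY ext_data_dir AS '/data/csv';

CREATE TABLE lookup_ext (
  id   NUMBER,
  name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  )
  LOCATION ('lookup.csv')
)
REJECT LIMIT UNLIMITED;

-- which also sidesteps the 1000-value IN-list limit from the question, e.g.:
-- SELECT t.* FROM my_table t WHERE t.id IN (SELECT id FROM lookup_ext);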
SQL Loader is the way to go.
I recently loaded my table from a CSV file; I'm new to this concept and would like to share an example.
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION
)
Place the control file and csv at the same location on the server.
Locate the sqlldr executable and invoke it.
sqlldr userid/passwd@DBname control=
Ex: sqlldr abc/xyz@ora control=load.ctl
Hope it helps.
Somebody asked me to post a link to the framework that I presented at Open World 2012; the full blog post demonstrates how to architect a solution with external tables.
I would like to share two tips: (tip 1) how to create a CSV file, and (tip 2) how to load rows from a CSV file into a table.
====[ (tip 1) SQLPLUS to create a csv file from an Oracle table ]====
I use SQLPLUS with the following commands:
set markup csv on
set lines 1000
set pagesize 100000 linesize 1000
set feedback off
set trimspool on
spool /MyFolderAndFilename.csv
Select * from MYschema.MYTABLE where MyWhereConditions ;
spool off
exit
====[ (tip 2) SQLLDR to load a csv file into a table ]====
I use SQLLDR and a CSV (comma-separated) file to add (APPEND) rows from the CSV file to a table.
The file has a comma between fields; text fields have a " before and after the text.
CRITICAL: if the last column is null, there is still a comma at the end of the line.
Example of data lines in the csv file:
11,"aa",1001
22,"bb',2002
33,"cc",
44,"dd",4004
55,"ee',
This is the control file:
LOAD DATA
APPEND
INTO TABLE MYSCHEMA.MYTABLE
fields terminated by ',' optionally enclosed by '"'
TRAILING NULLCOLS
(
ColumnName1,
ColumnName2,
ColumnName3
)
This is the command to execute sqlldr on Linux. If you run it on Windows, use \ instead of / in the paths (e.g. C:\...).
sqlldr userid=MyOracleUser/MyOraclePassword@MyOracleServerIPaddress:port/MyOracleSIDorService DATA=datafile.csv CONTROL=controlfile.ctl LOG=logfile.log BAD=notloadedrows.bad
Good luck !
From Oracle 18c you could use Inline External Tables:
Inline external tables enable the runtime definition of an external table as part of a SQL statement, without creating the external table as persistent object in the data dictionary.
With inline external tables, the same syntax that is used to create an external table with a CREATE TABLE statement can be used in a SELECT statement at runtime. Specify inline external tables in the FROM clause of a query block. Queries that include inline external tables can also include regular tables for joins, aggregation, and so on.
INSERT INTO target_table(time_id, prod_id, quantity_sold, amount_sold)
SELECT time_id, prod_id, quantity_sold, amount_sold
FROM EXTERNAL (
(time_id DATE NOT NULL,
prod_id INTEGER NOT NULL,
quantity_sold NUMBER(10,2),
amount_sold NUMBER(10,2))
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir1
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '|')
LOCATION ('sales_9.csv') REJECT LIMIT UNLIMITED) sales_external;
