Skip hyphen in hive - hadoop

I executed a query in the Hive CLI that should create an external table:
"create EXTERNAL TABLE IF NOT EXISTS hassan( code int, area_name string,
male_60_64 STRUCT,
male_above_65 STRUCT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
It works fine, but if I use "-" instead of "_" I get an error:
"create EXTERNAL TABLE IF NOT EXISTS hassan ( code int, area_name string, male-60-64 STRUCT< c1 : string, x-user : string>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
Any help would be greatly appreciated.

The answer by Addy already provided an example of how to use a hyphen in a column name. Here is an addition that describes how this works in different versions of Hive, according to the documentation:
In Hive 0.12 and earlier, only
alphanumeric and underscore characters are allowed in table and
column names.
In Hive 0.13 and later, column names can contain any
Unicode character (see HIVE-6013). Any column name that is specified
within backticks (`) is treated literally. Within a backtick string,
use double backticks (``) to represent a backtick character. Backtick
quotation also enables the use of reserved keywords for table and
column identifiers.
To revert to pre-0.13.0 behavior and restrict
column names to alphanumeric and underscore characters, set the
configuration property hive.support.quoted.identifiers to none. In
this configuration, backticked names are interpreted as regular
expressions. For details, see Supporting Quoted Identifiers in Column
Names.
In addition to that, you can also find the syntax for STRUCT there, which should help you with the error that you mentioned in the comments:
struct_type : STRUCT < col_name : data_type [COMMENT col_comment],
...>
Update:
Note that hyphens in complex types (so inside structs) do not appear to be supported.
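The backtick rule quoted above (wrap the name in backticks, double any embedded backtick) can be sketched as a small helper, for example when a script generates the DDL. This is only an illustration: `quote_hive_identifier` is a hypothetical name, not a Hive API, and the `string` type for the hyphenated column is a placeholder because the struct fields are not shown in the question. It only covers top-level column names.

```python
def quote_hive_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


# Column types here are placeholders (assumption), not the question's real schema.
columns = [("code", "int"), ("area_name", "string"), ("male-60-64", "string")]
column_list = ", ".join(f"{quote_hive_identifier(n)} {t}" for n, t in columns)
ddl = f"CREATE TABLE IF NOT EXISTS hassan ({column_list})"
print(ddl)
```

The generated statement still needs Hive 0.13 or later with hive.support.quoted.identifiers left at its default for the backticks to be read literally.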

Try Quoted Identifiers
create table hassan( code int, `area_name` string, `male-60-64` STRUCT, `male-above-65` STRUCT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Reference:
https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html

Related

How to avoid " " in select statement in column name for column that has numberAlphabet pattern in Oracle?

I have doubts regarding double-quoted column names in Oracle. I tried creating a column name in a number_alphabets pattern, but that didn't work. Then I used double quotes and was able to create the table with this column name. When I do a select, the column name comes back within double quotes.
I have attached the script here.
CREATE TABLE test
(
"100_title" VARCHAR2(200) NULL
);
SELECT * FROM test;
When I do select, in result set, column name will be "100_title" but I do not want "" in it. Is there a way to fix this?
From the Database Object Names and Qualifiers documentation:
Nonquoted identifiers cannot be Oracle Database reserved words. Quoted identifiers can be reserved words, although this is not recommended.
and
Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character.
Nonquoted identifiers can only contain alphanumeric characters from your
database character set and the underscore (_). Database links can contain
periods (.) and "at" signs (@).
Quoted identifiers can contain any characters and punctuations marks as well
as spaces. However, neither quoted nor nonquoted identifiers can contain
double quotation marks or the null character (\0).
So your question:
When I do select, in result set, column name will be "100_title" but I do not want "" in it. Is there a way to fix this?
The column identifier 100_title starts with a non-alphabetic character so by point 6 of that documentation you must use double quotes with the identifier.
How the column name displays depends on the user interface you are using. On db<>fiddle, the column name is displayed without quotes and this will be the same with many other interfaces.
If the user interface you are using only outputs the identifier with surrounding quotes then you could change the identifier from "100_title" to title_100 as this starts with an alphabetic character and contains only alpha-numeric and underscore characters and, thus, does not need to be quoted.
The short version is "no; pick a name that starts with a letter".
If you use a name that starts with a number, you'll have to use " every time you mention the column name, and you'll have to get the case right. Your column is called "100_title", not "100_Title" or "100_TITLE".
Call it title_100; then you can refer to it in any case, even TiTLe_100 if you like, and generally your life will be easier.
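As a rough way to check candidate names before creating a table, the rule described above (start with a letter, then only letters, digits, and underscores) can be sketched in a few lines. This is a conservative, ASCII-only approximation: `needs_quotes` is a hypothetical helper, and it does not check reserved words or name length.

```python
import re

# Conservative, ASCII-only approximation of the nonquoted identifier rule:
# a letter first, then letters, digits, or underscores.
# Reserved words and length limits are NOT checked here.
_NONQUOTED = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def needs_quotes(identifier: str) -> bool:
    """Return True if the name would have to be written in double quotes."""
    return _NONQUOTED.fullmatch(identifier) is None


for name in ("100_title", "title_100"):
    print(name, needs_quotes(name))
```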

sqlldr WHEN clause

I am trying to code a sqlldr.ctl file WHEN Clause to limit the records imported to those matching a portion of the current Schema's name.
The code I have (which does NOT work) is:
LOAD DATA
TRUNCATE INTO TABLE TMP_PRIM_ACCTS
when REGION_NUM = substr(user,-3,3)
Fields terminated by "|" Optionally enclosed by '"'
Trailing NULLCOLS
( PORTFOLIO_ACCT,
PRIMARY_ACCT_ID NULLIF (PRIMARY_ASSET_ID="NULL"),
REGION_NUM NULLIF (PARTITION_NUM="NULL")
)
sqlldr returns:
SQL*Loader-350: Syntax error at line 3.
Expecting quoted string or hex identifier, found "substr".
when PARTITION_NUM = substr(user,-3,3)
I cannot put single quotes around "user", because that turns it into the literal string "user". Can anyone explain how I can reference the "active" User in this WHEN Clause?
Thank you!
Can you try something like this? (I can't test with SQLLDR right now, but this is the syntax I used for changing values):
when REGION_NUM = "substr(:user,-3,3)"
It doesn't look like you can. The documentation only shows fixed values:
If you try to use an expression in that when clause (or in nullif; I thought I'd see whether you could cause a rejection based on a null PK value), you just see the literal value in the log:
Table TMP_PRIM_ACCTS, loaded when REGION_NUM = 0X73756273747228757365722c2d332c3329(character 'substr(user,-3,3)')
which is sort of what you referred to when you said you couldn't quote user, but you'd have to quote the whole thing anyway. Using :user doesn't work either; the colon is seen as just another character, and it doesn't try to find a column called user instead.
The simplest approach may be to pre-process the data file and remove any rows which don't match the pattern (e.g. via a regex). That would actually be slightly easier if you used an external table instead of SQL*Loader.
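A pre-processing step along those lines might look like the sketch below. It assumes REGION_NUM is the third pipe-delimited field (taken from the order of the field list in the question) and that the region value is already known; `filter_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def filter_rows(lines, region: str):
    """Yield only rows whose third field (assumed to be REGION_NUM) equals region."""
    reader = csv.reader(lines, delimiter="|", quotechar='"')
    for row in reader:
        if len(row) >= 3 and row[2] == region:
            yield row


# Hypothetical sample data in the question's "|"-delimited layout.
sample = io.StringIO('A1|P1|042\nA2|"P2"|007\nA3|P3|042\n')
kept = list(filter_rows(sample, "042"))
print(kept)
```

The kept rows would then be written to a new file and loaded with a control file that has no WHEN clause.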
Alternatively, generate your control file and embed the correct literal value based on the user you'll connect as.
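One possible way to generate the control file is sketched below. The template copies the question's control file unchanged apart from the WHEN line; `build_control_file` and the user name `ACCT_LOADER_042` are hypothetical, and the slice `[-3:]` is meant to mirror substr(user,-3,3).

```python
CONTROL_TEMPLATE = """LOAD DATA
TRUNCATE INTO TABLE TMP_PRIM_ACCTS
WHEN (REGION_NUM = '{region}')
FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( PORTFOLIO_ACCT,
  PRIMARY_ACCT_ID NULLIF (PRIMARY_ASSET_ID="NULL"),
  REGION_NUM NULLIF (PARTITION_NUM="NULL")
)
"""


def build_control_file(schema_user: str) -> str:
    """Embed the last three characters of the user name as a literal."""
    region = schema_user[-3:]  # same idea as substr(user, -3, 3)
    if "'" in region:
        raise ValueError("region must not contain a single quote")
    return CONTROL_TEMPLATE.format(region=region)


control_text = build_control_file("ACCT_LOADER_042")  # hypothetical user name
print(control_text)
```

The resulting text would be written to the .ctl file just before sqlldr is invoked for that user.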

SQL Loader incompatible length

This is my control file
FIELDS (
dummy1 filler terminated by "cid=",
address enclosed by "<address>" and "</address>"
...
The address column in the table is varchar(10).
If the address in the file is longer than 10 characters, SQL*Loader cannot load it.
How can I load the address truncated to 10 characters?
The documentation has a section on applying SQL operators to fields.
A wide variety of SQL operators can be applied to field data with the SQL string. This string can contain any combination of SQL expressions that are recognized by the Oracle database as valid for the VALUES clause of an INSERT statement. In general, any SQL function that returns a single value that is compatible with the target column's datatype can be used.
In this case you can use the substr() function on the value from the file:
...
dummy filler terminated by "cid=",
address enclosed by "<address>" and "</address>" "substr(:address, 1, 10)"
...
The quoted "substr(:address, 1, 10)" passes the initial value from the file through the function before inserting the resulting 10 character (maximum) value, however long the original value in the file was. Note the colon before the name in that function call.
If your file is XML then you might be better off loading it as an external table and then using the built-in XML query tools to extract the data you want, rather than trying to parse it through delimited field definitions.

How to call ora_hash function inside control file in sql loader?

I'm trying to call a function (ORA_HASH) inside sqlldr, but I'm not able to get it working.
Data File
abc.txt
AKY,90035,"G","DP",20150121,"",0,,,,,,"","E8BD4346-A174-468B-ABC2-1586B81A8267",1,17934,5099627512855,"TEST of CLOROM","",14.00,"",14.00,17934,5099627512855,"TEST of CLOROM",14.00,"ONE TO BE T ONE",344,0,"98027f93-4f1a-44b2-b609-7ffbb041a375",,,AKY8035,"Taken Test","L-20 Shiv Lok"
AKY,8035,"D","DP",20150121,"",0,,,,,,"","E8BD4346-A174-468B-ABC2-1586B81A8267",2,17162,5099627885843,"CEN TESt","",15.00,"",250.00,17162,5099627885843,"CEN TESt",15.00,"ONE TDAILY",3659,0,"09615cc8-77c9-4781-b51f-d44ec85bbe54",,,LLY8035,"Taken Test","L-20 Shiv Lok"
Control file
cnt_file.ctl
load data
into table Table_XYZ
fields terminated by "," optionally enclosed by '"'
F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12,F13,F14,F15,F16,F17,F18,F19,F20,F21,F22,F23,F24,F25,F26,F27,F28,F29,F30,F31 ORA_HASH(CONCAT(F2,F5,F6,F9,F10,F12,F13,F14,F15,F16,F17,F19,F21,F22)),F32 ORA_HASH(CONCAT(f23,H24,F7,F8,F3)),F33,F34,F35
sqlldr "xxxxx/yyyyy" control=cnt_file.ctl data=abc.txt
Whenever I execute sqlldr from a Linux box, I get the error below:
SQL*Loader-350: Syntax error at line 4.
Expecting "," or ")", found "ORA_HASH".
F29,F30,F31,KEY_CLMNS_HASH ORA_HASH(CONCAT( F2,F5
^
Any ideas?
You might consider using a virtual column on the table to which you are loading the data.
For columns which are deterministically based on other column values in the same row, that usually ends up being a simpler solution than anything involving SQL*Loader.
You're doing a few things wrong. The immediate error is because the Oracle function call has to be enclosed in double quotes:
...,F31 "ORA_HASH(CONCAT(F2,F5,F6,...))",...
The second issue is that the concat function only takes two arguments, so you would either have to nest (lots of) concat calls, or more readably use the concatenation operator instead:
...,F31 "ORA_HASH(F2||F5||F6||...)",...
And finally you need to prefix the field names inside your function call with a colon:
...,F31 "ORA_HASH(:F2||:F5||:F6||...)",...
This is explained in the documentation:
The following requirements and restrictions apply when you are using SQL strings:
...
The SQL string must be enclosed in double quotation marks.
And
To refer to fields in the record, precede the field name with a colon (:). Field values from the current record are substituted. A field name preceded by a colon (:) in a SQL string is also referred to as a bind variable. Note that bind variables enclosed in single quotation marks are treated as text literals, not as bind variables.
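With this many fields, typing the expression by hand is error-prone, so the SQL string could be generated instead. The sketch below applies the three fixes described above (double quotes around the SQL string, the || operator instead of CONCAT, and a colon before each field name); `ora_hash_sql_string` is a hypothetical helper, and the three-field list is shortened for illustration.

```python
def ora_hash_sql_string(field_names):
    """Return a double-quoted SQL string that hashes the concatenated fields."""
    binds = "||".join(f":{name}" for name in field_names)
    return f'"ORA_HASH({binds})"'


# Shortened field list for illustration; the real control file uses many more.
print("F31 " + ora_hash_sql_string(["F2", "F5", "F6"]))
# prints: F31 "ORA_HASH(:F2||:F5||:F6)"
```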

LINES TERMINATED BY only supports newline '\n' right now

I have files where the columns are delimited by char(30) and the lines are delimited by char(31). I'm using these delimiters mainly because the columns may contain newlines (\n), so the default line delimiter for Hive is not useful for us.
I have tried to change the line delimiter in hive but get the error below:
LINES TERMINATED BY only supports newline '\n' right now.
Any suggestion?
Would writing a custom SerDe work?
Is there any plan to add this functionality to Hive in a future release?
Thanks.
Not sure if this helps, or is the best answer, but when faced with this issue, what we ended up doing is setting the 'textinputformat.record.delimiter' Map/Reduce java property to the value being used. In our case it was a string "{EOL}", but could be any unique string for all practical purposes.
We set this in our beeline shell which allowed us to pull back the fields correctly. It should be noted that once we did this, we converted the data to Avro as fast as possible so we didn't need to explain to every user, and the user's baby brother, to set the {EOL} line delimiter.
set textinputformat.record.delimiter={EOL};
Here is the full example.
#example CSV data (fields broken by '^' and end of lines broken by the String '{EOL}')
ID^TEXT
11111^Some THings WIth
New Lines in THem{EOL}11112^Some Other THings..,?{EOL}
111113^Some crazy thin
gs
just crazy{EOL}11114^And Some Normal THings.
#here is the CSV table we laid on top of the data
CREATE EXTERNAL TABLE CRAZY_DATA_CSV
(
ID STRING,
TEXT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\136'
STORED AS TEXTFILE
LOCATION '/archive/CRAZY_DATA_CSV'
TBLPROPERTIES('skip.header.line.count'='1');
#here is the Avro table which we'll migrate into below.
CREATE EXTERNAL TABLE CRAZY_DATA_AVRO
(
ID STRING,
TEXT STRING
)
STORED AS AVRO
LOCATION '/archive/CRAZY_DATA_AVRO'
TBLPROPERTIES ('avro.schema.url'='hdfs://nameservice/archive/avro_schemas/CRAZY_DATA.avsc');
#And finally, the magic is here. We set the custom delimiter and import into our Avro table.
set textinputformat.record.delimiter={EOL};
INSERT INTO TABLE CRAZY_DATA_AVRO SELECT * from CRAZY_DATA_CSV;
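To see what a custom record delimiter changes, here is a small stand-alone sketch that splits text on '{EOL}' first and on '^' second, so that newlines inside a field survive. It only illustrates the idea and is not how Hive or MapReduce implement it; `parse_records` and the shortened sample string are made up for the example.

```python
RECORD_DELIMITER = "{EOL}"
FIELD_DELIMITER = "^"


def parse_records(raw: str):
    """Split on the record delimiter first, then split each record into two fields."""
    records = [r for r in raw.split(RECORD_DELIMITER) if r]
    return [tuple(r.split(FIELD_DELIMITER, 1)) for r in records]


# Shortened sample: the first TEXT value contains an embedded newline.
raw = "11111^Some THings WIth\nNew Lines in THem{EOL}11112^Some Other THings{EOL}"
rows = parse_records(raw)
print(len(rows))
```

Splitting on "\n" instead would break the first record in two, which is the problem the delimiter setting works around.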
I worked around it by using the sqoop option --hive-delims-replacement ' ' during the extract, so that the characters \n, \001, and \r in the columns are replaced with a space.
