Using reserved words in Hive

I'm migrating data to Hive 1.2, and I realized that, by default, I'm no longer allowed to use reserved words as column names. If you want to keep using reserved words, you need to explicitly set the following property:
hive.support.sql11.reserved.keywords=false
My question is, does changing this default value result in any unexpected issues? Are there any problems I should be aware of before changing it?
By the way, this change is documented in this ticket: https://issues.apache.org/jira/browse/HIVE-6617

The configuration property hive.support.sql11.reserved.keywords was added in Hive 1.2.0 with HIVE-6617 and removed in Hive 2.3.0 with HIVE-14872.
It was removed to simplify the parser logic and reduce the size of the generated parser code.
Please read the description in HIVE-14872 for more details.
Taking this into account, rewrite your code to use quoted identifiers (backticks) or rename the offending identifiers; the sooner the better.
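For example (the table and column names below are invented for illustration; date, user, and timestamp are all reserved words in SQL11 mode):
-- pre-2.3 workaround; this property no longer exists in Hive 2.3.0+
SET hive.support.sql11.reserved.keywords=false;
-- the forward-compatible fix: quote reserved words with backticks
CREATE TABLE events (`date` STRING, `user` STRING, `timestamp` BIGINT);
SELECT `date`, `user` FROM events;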

Related

Why does protobuf's FieldMask use field names instead of field numbers?

In the docs for FieldMask the paths use the field names (e.g., foo.bar.buzz), which means renaming the message field names can result in a breaking change.
Why doesn't FieldMask use the field numbers to define the path?
Something like 1.3.1?
You may want to consider filing an issue on the GitHub protocolbuffers repo for a definitive answer from the code's authors.
Your proposal seems logical. Using names may be a historical artifact. There's a possibly relevant comment on an issue thread in that repo:
https://github.com/protocolbuffers/protobuf/issues/3793#issuecomment-339734117
"You are right that if you use FieldMasks then you can't safely rename fields. But for that matter, if you use the JSON format or text format then you have the same issue that field names are significant and can't be changed easily. Changing field names really only works if you use the binary format only and avoid FieldMasks."
The answer to your question lies in the fact that FieldMasks are a convention/utility developed on top of the proto3 schema definition language, not a feature of it (and that utility is not present in all of the language bindings).
While you're right in your observation that it can break easily (as schemas tend to evolve and change), you need to consider this design choice from a user-friendliness POV:
If you're building an API and want to allow the user to select the field set present inside the response payload (the common use case for field masks), it'll be much more convenient for you to allow that using field paths, rather than binary field indices, as the latter would force the user of the gRPC/protocol generated code to be "aware" of the schema. That's not always desirable when providing an API as generated-code software packages.
While implementing this as a proto schema feature could give the user the best of both worlds for binary encoding (specify field paths, have them encoded as binary indices), it would also:
Complicate code generation requirements
Still be an issue for plain text encoding.
So, you can understand why it was left as an “external utility”.

Tarantool Java connector & space IDs

The Tarantool Java connector provides an API to select/update/insert/delete/... tuples in spaces. The first argument in these API methods is a space ID. There is no documentation for this API and I don't clearly understand how to get these IDs.
The sample code from GitHub gets the IDs by evaluating box.space.<space>.id, i.e. not through the API but by directly "writing" a command into the socket... It seems this is not a good approach (?).
As far as I can see, the system spaces _space/_vspace have constant IDs 280/281. Is it a good approach to use these constants to select space IDs?
UPD: I found the constant _VSPACE = 281 in the class SQLDatabaseMetadata, which is used in the Tarantool JDBC driver. It's protected.
You are right. You need to fetch the space id-name mapping from _VSPACE first and then use those values to perform requests against specific spaces. Or you can lean on the fact that the first user-defined space has id 512, the next one 513, and so on.
We plan to support automatic schema loading and space names, but don't support it yet: https://github.com/tarantool/tarantool-java/issues/137

How to set Oracle to deal with object names case-sensitively?

I know I can use double quotes to force case-sensitive object names, but I want to avoid using them. I wonder if there is a database option that makes Oracle treat object names case-sensitively without double quotes. Thanks!
There is no such option. If you want to force everything to be case-sensitive (and I would strongly question the wisdom of that), you would need to use double-quoted identifiers throughout your code.
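To see why quoting is the only lever (the object names below are invented for illustration): Oracle folds unquoted identifiers to upper case, so only the quoted spelling matches a mixed-case name:
CREATE TABLE "MyTable" ("Id" NUMBER);
SELECT * FROM MyTable;        -- fails with ORA-00942: the unquoted name folds to MYTABLE
SELECT * FROM "MyTable";      -- works: the quoted name matches exactly
CREATE TABLE plain_table (id NUMBER);
SELECT * FROM PLAIN_TABLE;    -- works: unquoted names are case-insensitive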

A leading question mark in Oracle when using DataStage to import from text to Oracle?

The question mark "?" appears only in the front of the first field of the first row to insert.
For once, I changed the ftp upload file type to text/ascii (rather than binary) and it seemed resolve the problem. But later it came back.
The server OS is aix5.3.
DataStage is 7.5x2.
Oracle is 11g.
I used ue to save the file to utf-8, using unix end mark.
Has anyone got this thing before?
The question mark itself doesn't mean much, as it could be just a "mask" for some special character that is not recognized by the database. You didn't provide many details about your environment, so my opinions here are only a guess; I hope they shed a little light.
How is the text file created? If it's a file created in a Windows environment, you're very likely to get characters like this due to the {CR}{LF} line-break characters.
What is the datatype of the Oracle table column?
The CHAR datatype will "fill" every position up to the declared size of the field; I'd recommend using VARCHAR2 instead in this case.
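A quick way to see that blank-padding in action (the table name here is hypothetical):
CREATE TABLE pad_demo (c CHAR(10), v VARCHAR2(10));
INSERT INTO pad_demo VALUES ('abc', 'abc');
SELECT LENGTH(c), LENGTH(v) FROM pad_demo;   -- returns 10 and 3: CHAR is blank-padded to its full size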
If that's not the case, I would edit the file in hex mode, check the ASCII code of this specific character, and then use a Trim (if parallel) or Convert (if server) to replace the character.
The convert function would be something like this:
Convert(Char([ascii_char_number]),'',[your_string])
Alternatively you can use the Trim function if your job is a parallel job
Trim([your_string],[ascii_char_number],'L')
The option "L" will remove all leading characters. You might need to adapt this function to suit your needs. If you're not familiar with the TRIM function you can find more details at the datastage online documentation.
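If some of the bad rows have already landed in Oracle, a quick alternative to a hex editor is Oracle's DUMP function (the table and column names below are hypothetical):
-- show the internal byte codes of the suspect column for one loaded row
SELECT DUMP(first_field, 1016) FROM staging_table WHERE ROWNUM = 1;
-- e.g. 'Typ=1 Len=6 CharacterSet=AL32UTF8: ef,bb,bf,61,62,63' would reveal a UTF-8 BOM in front of 'abc'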
The only warning I'd give when doing this is that you'll be deleting data from your original source, so make sure you're not removing any valid information when manipulating a file like this, as this is not a highly recommended practice among the ETL gurus out there.
Any questions, give me a shout. Happy to help if I can.
Cheers
I had a similar issue where unprintable characters were being displayed as '?' and DataStage was throwing a warning when processing those records. It was OK for me not to display the unprintable characters, so I used the ICONV function, which converts them into printable ones. There are multiple options; I chose the one that converts them to '.', which worked for me. More details are available on the pages below:
https://www-01.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/r_deeref_String_Functions.html
http://docs.intersystems.com/ens201317/csp/docbook/DocBook.UI.Page.cls?KEY=RVBS_foconv
The conversion I used:
ICONV(column_name,"MCP")

How to do SQL injection on Oracle

I'm doing an audit of a system, which the developers insist is SQL injection proof. This they achieve by stripping out the single-quotes in the login form - but the code behind is not parameterized; it's still using literal SQL like so:
username = username.Replace("'", "");
var sql = "select * from user where username = '" + username + "'";
Is this really secure? Is there another way of inserting a single quote, perhaps by using an escape character? The DB in use is Oracle 10g.
Maybe you can also fail them because not using bind variables will have a very negative impact on performance.
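For comparison, a bind-variable version of the query under audit might look like this in SQL*Plus syntax (the application would use its client library's parameter binding instead; the table and column names here are hypothetical):
VARIABLE uname VARCHAR2(30)
EXEC :uname := 'O''Brian'
-- the quote inside the value is data, not syntax, so it cannot break out of the literal
SELECT * FROM app_users WHERE username = :uname;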
A few tips:
1- It is not necessarily the ' character that can be used as a quote. Try this:
select q'#Oracle's quote operator#' from dual;
2- Another tip, from the "Innocent Code" book: don't massage invalid input to make it valid (by escaping or removing). Read the relevant section of the book for some very interesting examples. A summary of the rules is here.
Have a look at the testing guide here: http://www.owasp.org/index.php/Main_Page That should give you more devious test scenarios, perhaps enough to prompt a reassessment of the SQL coding strategy :-)
No, it is not secure. SQL injection doesn't require a single-quote character to succeed. You can use AND, OR, JOIN, etc. to make it happen. For example, suppose a web application has a URL like this: http://www.example.com/news.php?id=100.
You can do many things if the ID parameter is not properly validated. For example, if its type is not checked, an attacker can simply append more SQL to the numeric value, such as ?id=100 OR 1=1, or splice in a UNION SELECT against other tables. I won't teach how to exploit it further because not all readers have good intentions like you appear to have. So, for those planning to use a simple REPLACE, be aware that this WILL NOT prevent an attack.
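To make the numeric case concrete (the table and column names below are invented for illustration), note that no quote character appears anywhere in the attack:
-- intended query for ?id=100
SELECT title, body FROM news WHERE id = 100;
-- the same query once an attacker supplies ?id=100 UNION SELECT username, password FROM app_users
SELECT title, body FROM news WHERE id = 100 UNION SELECT username, password FROM app_users;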
So, no one can have a name like O'Brian in their system?
The single quote check won't help if the parameter is numeric - then 1; DROP TABLE user;-- would cause some trouble =)
I wonder how they handle dates...
If the mechanism for executing queries got smart like PHP, and limited queries to only ever run one query, then there shouldn't be an issue with injection attacks...
What is the client language? That is, we'd have to be sure exactly what the datatype of username is, what the Replace method does with regard to that datatype, and how the actual concatenation works for that datatype. There may be some character-set translation that would translate some quote-like character in UTF-8 into a "regular" quote.
For the very simple example you show it should just work, but the performance will be awful (as per Thilo's comment). You'd need to look at the options for cursor_sharing.
For this SQL
select * from user where username = '[blah]'
As long as [blah] didn't include a single quote, it should be interpreted as a single CHAR value. If the string were more than 4000 bytes, it would raise an error, and I'd be interested to see how that is handled. Similarly for an empty string, or one consisting solely of single quotes. Control characters (end-of-file, for example) might also give it some issues, but that might depend on whether they can be entered at the front end.
For a username, it would be legitimate to limit the character set to alphanumerics, and possibly a limited set of punctuation (dot and underscore, perhaps). So if you did take a character-filtering approach, I'd prefer to see a whitelist of acceptable characters rather than a blacklist of single quotes, control characters, etc.
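That kind of whitelist can even be checked in the database itself, for instance with Oracle's REGEXP_LIKE (the exact pattern below is just an illustration):
-- accept only 1-30 alphanumerics, dots and underscores; reject everything else
SELECT CASE WHEN REGEXP_LIKE(:uname, '^[A-Za-z0-9._]{1,30}$')
            THEN 'ok' ELSE 'reject' END AS verdict
FROM dual;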
In summary, as a general approach it is poor security and negatively impacts performance. In particular cases, it might not (and probably won't) expose any vulnerabilities, but you'd want to do a LOT of testing to be sure it doesn't.
