HUE Query Results - Expired - hadoop

Team,
I am using Hue Beeswax (the Hive UI) to execute Hive queries. So far I have always been able to access the results of queries executed on the same day, but today a lot of the query results are shown as expired, despite having been run just an hour ago.
My questions are:
When does a query result set become expired?
What settings control this?
Is it possible to retain the result set somewhere in HDFS? (How?)
Regards

My understanding is that it's controlled by Hive, not Hue (Beeswax). When HiveServer is restarted it cleans up the scratch directories.
This is controlled by the setting hive.start.cleanup.scratchdir.
Are you restarting your HiveServers?
Looking through some code, I found that Beeswax sets the scratch directory to "/tmp/hive-beeswax-" + the Hadoop username.
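On the third question (keeping a result set in HDFS): rather than relying on the Beeswax scratch output, you can materialize the result yourself with a CREATE TABLE ... AS SELECT or an INSERT OVERWRITE DIRECTORY statement, both standard HiveQL. A minimal sketch over the HiveServer2 JDBC driver; the host, port, user, table names, and HDFS path below are placeholders for your environment:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PersistHiveResult {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; adjust host/port/auth for your cluster.
            String url = "jdbc:hive2://hiveserver.example.com:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "hue_user", "");
                 Statement stmt = conn.createStatement()) {
                // Option A: materialize the result as a managed table in the warehouse on HDFS.
                stmt.execute("CREATE TABLE my_saved_result AS "
                           + "SELECT col1, col2 FROM source_table WHERE dt = '2015-01-01'");
                // Option B: write the rows to an HDFS directory you own, outside any scratch dir.
                stmt.execute("INSERT OVERWRITE DIRECTORY '/user/hue_user/query_results/run1' "
                           + "SELECT col1, col2 FROM source_table WHERE dt = '2015-01-01'");
            }
        }
    }

Either way the data lives in a location you control, so a HiveServer restart and scratch-directory cleanup won't take it with it.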

Related

Take Data From Oracle to Cassandra in every day

We want to move tables from Oracle to Cassandra every day, because the tables are updated in Oracle daily. When I searched for this, I found these options:
Extract the Oracle tables to a file, then write them to Cassandra
Use Sqoop to get the tables from Oracle, then write a MapReduce job to insert them into Cassandra
I am not sure which way is appropriate. Also, are there other options?
Thank you.
Option 1
Extracting the Oracle tables to a file and then writing them to Cassandra manually every day can be a tiresome process unless you schedule it as a cron job. I have tried this before, but if the process fails, logging it can be an issue. If you are using this process, exporting to CSV, and trying to write to Cassandra, then I would suggest using the Cassandra bulk loader (https://github.com/brianmhess/cassandra-loader).
Option 2
I haven't worked with this, so can't speak about this.
Option 3 (I use this)
I use an open source tool, Pentaho Data Integration (Spoon) (https://community.hitachivantara.com/docs/DOC-1009855-data-integration-kettle), to solve this problem. It's a fairly simple process in Spoon. You can automate it by using a Carte server (Spoon server), which has logging capabilities as well as automatic restarting if the process fails partway through.
Let me know if you found any other solution that worked for you.
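If you do end up writing the load step yourself (option 2, or a scripted variant of option 1), the Cassandra side usually reduces to a prepared-statement insert loop. A rough sketch with the DataStax Java driver; the contact point, keyspace, table, column layout, and CSV path are all assumptions standing in for whatever your Oracle export actually produces:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    import java.io.BufferedReader;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class CsvToCassandra {
        public static void main(String[] args) throws Exception {
            // Placeholder contact point, keyspace, and export file.
            try (Cluster cluster = Cluster.builder().addContactPoint("cassandra.example.com").build();
                 Session session = cluster.connect("my_keyspace");
                 BufferedReader reader = Files.newBufferedReader(Paths.get("/data/oracle_export.csv"))) {
                // Assumes a table: CREATE TABLE customers (id bigint PRIMARY KEY, name text, updated_at timestamp)
                PreparedStatement insert = session.prepare(
                        "INSERT INTO customers (id, name, updated_at) VALUES (?, ?, ?)");
                String line;
                while ((line = reader.readLine()) != null) {
                    // Assumes three comma-separated fields: id, name, updated_at (epoch millis).
                    String[] f = line.split(",");
                    session.execute(insert.bind(Long.parseLong(f[0]), f[1],
                            new java.util.Date(Long.parseLong(f[2]))));
                }
            }
        }
    }

For anything beyond a small volume, the bulk loader or a tool like Spoon above will handle retries and logging better than a hand-rolled loop.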

Hanging JDBC query with Impala

I am having a weird issue. I am building an automated A/B testing tool to compare Impala to other sources. I have 100 queries I am trying to run; they run fine through the other sources and within Impala through Hue. When I run certain queries against Impala from Java, they hang, yet running the same text in Hue works. I figured it was a non-printable character and removed all of them before issuing the query. It still hangs. No exception bubbles up; it just sits there for an hour, and there is no evidence of the query within Cloudera Manager. I am at a loss. This is for SQL A/B testing: I have Jethro running consistently in the 1-2 second range on almost all the queries we tested, and I can't document the comparison to Impala across the 100 queries until I get over this hump. I tried enabling Simba JDBC logging through the connection string and that did nothing.
Any thoughts on next steps for debugging the issue? Believe it or not, I am on a locked-down corporate machine and do not have a hex or ASCII character view of my query.
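Since the suspicion is already a stray character and there's no hex viewer available on the machine, a few lines of Java can print the code point of anything non-printable in the query string, and a statement timeout (if the Simba driver honors it) at least turns the silent hang into an exception you can log. A debugging sketch; the JDBC URL and query text are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ImpalaQueryDebug {
        public static void main(String[] args) throws Exception {
            String sql = "SELECT count(*) FROM my_table";  // the query that hangs

            // Dump the code point of every character outside the printable ASCII range.
            for (int i = 0; i < sql.length(); i++) {
                char c = sql.charAt(i);
                if (c < 0x20 || c > 0x7e) {
                    System.out.printf("suspicious char at index %d: U+%04X%n", i, (int) c);
                }
            }

            // Placeholder Impala JDBC URL; adjust host/port/auth for your cluster.
            String url = "jdbc:impala://impalad.example.com:21050";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {
                stmt.setQueryTimeout(120);  // fail after two minutes instead of hanging indefinitely
                try (ResultSet rs = stmt.executeQuery(sql)) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1));
                    }
                }
            }
        }
    }

It's also worth checking the coordinator's own web UI (the /queries page on the impalad debug port): if nothing shows up there either, the query is stalling on the client side before it is ever submitted.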

After restarting the services, the impala tables are not coming up

After restarting the Impala server, we are not able to see the tables (i.e., the tables are not coming up). Can anyone tell me what order we have to follow to avoid this issue?
Thanks,
Srinivas
You should try running "invalidate metadata;" from impala-shell. This usually clears up tables not being visible, since Impala caches metadata.
From:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_invalidate_metadata.html
The following example shows how you might use the INVALIDATE METADATA statement after creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the INVALIDATE METADATA statement was issued, Impala would give a "table not found" error if you tried to refer to those table names.
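If this happens after every restart, it can also be scripted rather than typed into impala-shell each time. A small sketch over JDBC; the host is a placeholder, and the URL assumes an unsecured cluster where Impala's HiveServer2-compatible port (21050) is reachable with the Hive JDBC driver:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RefreshImpalaCatalog {
        public static void main(String[] args) throws Exception {
            // Placeholder host; ";auth=noSasl" is for an unsecured cluster, adjust as needed.
            String url = "jdbc:hive2://impalad.example.com:21050/;auth=noSasl";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {
                // Reload the catalog so tables created or changed outside Impala become visible.
                stmt.execute("INVALIDATE METADATA");
                // For a single table the narrower form is cheaper:
                // stmt.execute("INVALIDATE METADATA db1.my_table");
            }
        }
    }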

Changing Transaction Isolation level setting behavior in Sqoop

We are currently trying to use Sqoop to move data from Hadoop into Azure SQL Data Warehouse, but we are getting an error related to the transaction isolation level. What's happening is that Sqoop tries to set the transaction isolation level to READ COMMITTED during import/export, whereas that level isn't currently supported in Azure SQL Data Warehouse. I've tried Sqoop's --relaxed-isolation parameter, but it has no effect.
As a solution, I am thinking of:
1. Changing the Sqoop source code to alter its behavior so that it does not set the transaction isolation level
2. Looking for APIs (if any) that may allow me to change this behavior programmatically.
Has anyone encountered such a scenario? I'm looking for suggestions on the proposed solutions and how to go about them.
This issue has just been resolved in Sqoop: https://issues.apache.org/jira/browse/SQOOP-2349
Otherwise, @wBob's comment about using PolyBase is definitely best practice: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-azure-sql-data-warehouse-connector#use-polybase-to-load-data-into-azure-sql-data-warehouse
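For context on option 1: the call that trips this up is just the standard java.sql isolation setter that Sqoop invokes while opening its connections, which is presumably the area the linked JIRA addresses. A bare sketch of the equivalent plain-JDBC call; the Azure SQL Data Warehouse connection string and credentials are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class IsolationLevelSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder Azure SQL Data Warehouse JDBC URL and credentials.
            String url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydw";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                // Sqoop issues the equivalent of this call on its connections; per the question,
                // READ COMMITTED is the level Azure SQL Data Warehouse rejects.
                conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            }
        }
    }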

Oozie cannot access metastore database in HUE

I'm on CDH4. In Hue, I have a database in the Metastore Manager named db1. I can run Hive queries that create objects in db1 with no problem. When I put those same queries into scripts and run them through Oozie, they fail with this message:
FAILED: SemanticException 0:0 Error creating temporary folder on: hdfs://lad1dithd1002.thehartford.com:8020/appl/hive/warehouse/db1.db. Error encountered near token 'TOK_TMP_FILE'
I created db1 in the Metastore Manager as Hue user db1 and as Hue user admin, and nothing works. The db1 user also has a db1 ID on the underlying Linux cluster, if that helps.
I have chmod'd /appl/hive/warehouse/db1.db to give read, write, and execute to owner, group, and other, and none of that makes a difference.
I'm almost certain it's a permissions issue, but what is it? Oddly, I have this working under another ID where I had hacked together some combination of things that seemed to work, but I'm not sure how. It was all in Hue, so if possible I'd like a solution doable in Hue, so I can easily hand it off to folks who prefer to work at the GUI level.
Thanks!
Did you also add hive-site.xml to your Files and Job XML fields? Hue has a great tutorial about how to run a Hive job; watch it here. Adding hive-site.xml is described around 4:20.
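For reference, outside of Hue the same thing lands in the workflow definition as the <job-xml> and <file> elements of the Hive action; the Files and Job XML fields in Hue's editor map onto them. A minimal sketch of such an action, where the action name, script name, and the ${jobTracker}/${nameNode} properties are placeholders:

    <action name="run-db1-script">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <script>create_db1_objects.hql</script>
            <file>hive-site.xml#hive-site.xml</file>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

Without hive-site.xml, the action doesn't know about your real metastore and scratch settings, which is why the question about it comes up first.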
I hit the exact same error on MapR Hadoop.
Root cause: the main database and the temporary (scratch) database were created by different users.
Resolution: creating both folders with the same ID might help with this.
