Hive ql Driver: how to specify a database name other than default - hadoop

I am writing a sample program to connect to the Hive metastore using the org.apache.hadoop.hive.ql.Driver class. A sample snippet is below:
String userName = "test";
HiveConf conf = new HiveConf(SessionState.class);
conf.set("fs.default.name", "hdfs://" + hadoopMasterHost + ":8020");
conf.set("hive.metastore.local", "false");
conf.set("hive.metastore.warehouse.dir", "/user/hive/warehouse");
conf.set("hive.metastore.uris", "thrift://" + hadoopMasterHost + ":9083");
conf.set("hadoop.bin.path", "/usr/hdp/2.2.0.0-2041/hadoop/bin");
conf.set("yarn.nodemanager.hostname", hadoopMasterHost);
conf.set("yarn.resourcemanager.hostname", hadoopMasterHost);

ss = new MyCliSessionState(conf);
ss.out = new PrintStream(System.out, true, "UTF-8");
ss.err = new PrintStream(System.err, true, "UTF-8");
SessionState.start(ss);

driver = new Driver(conf);
query = "show tables";
if (userName == null || userName.isEmpty())
    return driver.run(query);

UserGroupInformation ugi = createUgi(userName);
CommandProcessorResponse response = ugi.doAs(new PrivilegedExceptionAction<CommandProcessorResponse>() {
    public CommandProcessorResponse run() throws Exception {
        CliSessionState ss = new MyCliSessionState(conf);
        ss.out = new PrintStream(System.out, true, "UTF-8");
        ss.err = new PrintStream(System.err, true, "UTF-8");
        // refresh thread-local SessionState and Hive
        SessionState.start(ss);
        Hive.get(conf, true);
        return driver.run(query);
    }
});
return response;
I am able to connect to the default database and get a list of all its tables. But how can I connect to a database other than default? I tried searching the Hive configuration properties, but could not find one that specifies the database name. Can somebody help me, please?

Looks like you want to do things the hard way and re-implement the Beeline utility. To most people it would appear to be a masochistic endeavor, but who am I to judge?
Anyway, at this point you have to execute HQL commands, like anyone else... and anyone should know about the "use" command:
driver.run("use " +argDatabase) ;
// check status
driver.run("show tables") ;
// check status, parse output
driver.run("describe extended " +argTable) ;
// check status, parse output
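Wired into the question's snippet, a minimal sketch could look like the following; it assumes the Driver and SessionState initialization shown above, and targetDatabase is a hypothetical variable (not from the thread) holding the database name:
// Minimal sketch, assuming the Driver/SessionState setup from the question;
// targetDatabase is a hypothetical variable, not from the thread.
CommandProcessorResponse use = driver.run("use " + targetDatabase);
if (use.getResponseCode() != 0) {
    throw new RuntimeException("use " + targetDatabase + " failed: " + use.getErrorMessage());
}
// Every statement on this session now runs against targetDatabase.
CommandProcessorResponse tables = driver.run("show tables");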

Related

Inserting large amounts of data into hive (around 64MB)

Ok, so I have a Hive table on a remote Hadoop node set up on a Linux machine. I'm having an issue when attempting to insert a large JSON string, large as in possibly 64MB or more, given that MapReduce won't work well unless I approach that limit. I've successfully transferred 8-9MB, but that's as high as it gets; if I attempt to do more, the query fails. I also had to override C#'s default JSON serializer to do this. Not a good practice, I know, but I really don't know any other way to do it.
Anyway, this is how I store data into Hive:
namespace HadoopWebService.Controllers
{
    public class LogsController : Controller
    {
        // POST: HadoopRequest
        [HttpPost]
        public ContentResult Create(string json)
        {
            OdbcConnection hiveConnection = new OdbcConnection("DSN=Hadoop Server;UID=XXXX;PWD=XXXX");
            hiveConnection.Open();
            Stream req = Request.InputStream;
            req.Seek(0, SeekOrigin.Begin);
            string request = new StreamReader(req).ReadToEnd();
            ContentResult response;
            string query;
            try
            {
                query = "INSERT INTO TABLE error_log (json_error_log) VALUES('" + request + "')";
                OdbcCommand command = new OdbcCommand(query, hiveConnection);
                command.ExecuteNonQuery();
                command.CommandText = query;
                response = new ContentResult { Content = "{status: 1}", ContentType = "application/json" };
                hiveConnection.Close();
                return response;
            }
            catch (Exception error)
            {
                response = new ContentResult { Content = "{status: 0, message:" + error.ToString() + "}" };
                System.Diagnostics.Debug.WriteLine(error.Message.ToString());
                hiveConnection.Close();
                return response;
            }
        }
    }
}
Is there some setting I can use to insert larger amounts of data? I assume there must be some buffer that is failing to load everything. I've checked on Google but haven't found anything, mainly because this probably isn't the proper way to insert into Hadoop, but I'm really out of options right now: I can't use HDInsight, and all I've got is the ODBC connection.
EDIT: This is the error I get:
System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000][HiveODBC]
(35) Error from Hive: error code: ‘0’ error message: ‘ExecuteStatement
finished with operation state: ERROR_STATE’.
message:System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000]
[Microsoft][HiveODBC] (35) Error from Hive: error code: '0' error
message: 'ExecuteStatement finished with operation state:
ERROR_STATE'. at
System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle,
RetCode retcode) at
System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior
behavior, String method, Boolean needReader, Object[] methodArguments,
SQL_API odbcApiMethod) at
System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior
behavior, String method, Boolean needReader) at
System.Data.Odbc.OdbcCommand.ExecuteNonQuery()
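No accepted answer is recorded here. One hedged workaround, not from the thread: the failure pattern (small literals succeed, huge ones fail) suggests the problem is inlining a 64MB JSON value into the SQL text itself, which ODBC drivers and HiveServer commonly cap. If anything besides the ODBC channel is available, staging the payload as a file and loading it sidesteps the statement-size limit entirely. A minimal Java sketch of the Hadoop-side staging, with the hostname and path as placeholder assumptions:
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ErrorLogStager {
    // Writes the JSON payload to HDFS; a follow-up
    // "LOAD DATA INPATH '/tmp/error_log.json' INTO TABLE error_log"
    // issued through any Hive client then moves it into the table.
    public static void stage(String jsonPayload) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoopMasterHost:8020"); // placeholder host
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/error_log.json"), true)) {
            out.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }
    }
}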

Pig Not Interpreting Int Correctly -- Custom Loader

So this is my first time ever using Pig, and I'm having a hard time getting it to interpret my data correctly. I don't want to have to define a schema for my input files until run time, so I wrote a super simple custom loader; the only change I made to PigStorage was the getSchema method, which now reads the first two lines of my file and creates a schema from them:
public ResourceSchema getSchema(String location, Job job) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(location.replace("file://", "")));
    String[] line = br.readLine().split(",");
    String[] data = br.readLine().split(",");
    List<FieldSchema> fields = new ArrayList<FieldSchema>();
    for (int f = 0; f < line.length; f++) {
        Byte type = GetType(data[f].replace("\"", ""));
        fields.add(new FieldSchema(line[f].replace("\"", ""), type));
    }
    schema = new ResourceSchema(new Schema(fields));
    return schema;
}

private Byte GetType(Object Data) {
    try {
        int number = Integer.parseInt(Data.toString());
        return org.apache.pig.data.DataType.INTEGER;
    } catch (Exception e) {}
    try {
        double dnumber = Double.parseDouble(Data.toString());
        return org.apache.pig.data.DataType.DOUBLE;
    } catch (Exception e) {}
    return org.apache.pig.data.DataType.CHARARRAY;
}
When I load a file and run DESCRIBE on it, it looks like what I want, for instance:
{CU_NUMBER: int,CYCLE_DATE: chararray,JOIN_NUMBER: int,RSSD: int,CU_TYPE: int,CU_NAME: chararray}
And the first 10 Rows look like this:
(1,9/30/2013 0:00:00,2,"50377","1","MORRIS SHEPPARD TEXARKANA")
(5,9/30/2013 0:00:00,6,"859879","1","FIRST CASTLE")
(6,9/30/2013 0:00:00,7,"54571","1","THE NEW ORLEANS FIREMEN'S")
(12,9/30/2013 0:00:00,11,"56678","1","FRANKLIN TRUST")
(13,9/30/2013 0:00:00,12,"861676","1","E")
(16,9/30/2013 0:00:00,14,"59277","1","WOODMEN")
(19,9/30/2013 0:00:00,16,"863773","1","NEW HAVEN TEACHERS")
(22,9/30/2013 0:00:00,17,"61074","1","WATERBURY CONNECTICUT TEACHER")
(26,9/30/2013 0:00:00,19,"866372","1","FARMERS")
(28,9/30/2013 0:00:00,21,"953375","1","CENTRIS")
However, when I try to do stuff with the data like:
FOICU = LOAD 'file:///home/biadmin/NCUA/foicu.txt' USING org.apache.pig.builtin.PigStorageInferSchema(',', '-schema');
FirstSixColumns = FOREACH FOICU GENERATE CU_NUMBER, CYCLE_DATE, JOIN_NUMBER, RSSD, CU_TYPE, CU_NAME;
TopTen = LIMIT FirstSixColumns 10;
FOICUFiltered = FILTER TopTen BY CU_NUMBER > 20;
CU_FIVE = FILTER TopTen BY CU_NUMBER == 5;
DUMP FOICUFiltered;
DUMP CU_FIVE;
FOICUFiltered returns all 10 rows even though 7 of them have a CU_NUMBER less than 20:
(1,9/30/2013 0:00:00,2,"50377","1","MORRIS SHEPPARD TEXARKANA")
(5,9/30/2013 0:00:00,6,"859879","1","FIRST CASTLE")
(6,9/30/2013 0:00:00,7,"54571","1","THE NEW ORLEANS FIREMEN'S")
(12,9/30/2013 0:00:00,11,"56678","1","FRANKLIN TRUST")
(13,9/30/2013 0:00:00,12,"861676","1","E")
(16,9/30/2013 0:00:00,14,"59277","1","WOODMEN")
(19,9/30/2013 0:00:00,16,"863773","1","NEW HAVEN TEACHERS")
(22,9/30/2013 0:00:00,17,"61074","1","WATERBURY CONNECTICUT TEACHER")
(26,9/30/2013 0:00:00,19,"866372","1","FARMERS")
(28,9/30/2013 0:00:00,21,"953375","1","CENTRIS")
And CU_FIVE returns no rows at all.
Does anybody know what I've done wrong here, and is there a better way to dynamically load the schema at run time without using schema files?
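No answer is recorded here. A hedged diagnosis, not confirmed in the thread: getSchema only declares the types; PigStorage's getNext still hands Pig the raw field bytes, so at run time CU_NUMBER is not actually an int and comparisons like CU_NUMBER > 20 don't behave as expected. A quick script-level workaround would be an explicit cast, such as FILTER TopTen BY (int)CU_NUMBER > 20. A loader-level sketch of a fix is to convert each field so the runtime values match the inferred schema:
// Hedged sketch: override getNext in the custom loader so runtime values
// match the schema built in getSchema above. Assumes the `schema` field from
// the question; imports assumed: org.apache.pig.data.DataType,
// org.apache.pig.data.Tuple, org.apache.pig.ResourceSchema.ResourceFieldSchema.
@Override
public Tuple getNext() throws IOException {
    Tuple t = super.getNext(); // PigStorage returns fields as raw bytes
    if (t == null) return null;
    ResourceFieldSchema[] fields = schema.getFields();
    for (int i = 0; i < t.size() && i < fields.length; i++) {
        Object raw = t.get(i);
        if (raw == null) continue;
        String s = raw.toString().replace("\"", "");
        switch (fields[i].getType()) {
            case DataType.INTEGER: t.set(i, Integer.parseInt(s)); break;
            case DataType.DOUBLE:  t.set(i, Double.parseDouble(s)); break;
            default:               t.set(i, s); break;
        }
    }
    return t;
}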

OrientDB POJO Method proxy not working properly

I am using the OObjectDatabaseTx implementation of OrientDB to store my POJOs in the database. When I try to retrieve some POJOs with a SQL command, I get the result set, but the attributes of the POJOs seem to be empty (getters returning null).
I register my classes properly with
db.getEntityManager().registerEntityClass(MyUser.class);
The following code describes my problem:
Map<String, String> params = new HashMap<String, String>();
params.put("name", username);
List<MyUser> users = db.command(
        new OSQLSynchQuery<MyUser>("select * from MyUser where name = :name"))
        .execute(params);
for (MyUser founduser : users) {
    ODocument doc = db.getRecordByUserObject(founduser, false);
    String pass = doc.field("pwd");
    assertEquals(pass != null, true); // passes
    assertEquals(founduser.getPwd() != null, true); // fails
}
How can I get the method getPwd to return the proper value?
I am now using Version 1.3.0 and this has worked before (afaik in 1.1.0).
Can you see if the POJO has the "pwd" field set inside of it?
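One thing worth trying, offered as an assumption rather than a confirmed fix: in OrientDB SQL, select * builds a projection of the record's fields, and a projected result can lose the class identity the object layer needs to build its method proxies. Selecting the record itself avoids the projection:
// Hedged sketch: drop the "*" so the query returns the records themselves
// rather than a field projection.
Map<String, Object> params = new HashMap<String, Object>();
params.put("name", username);
List<MyUser> users = db.command(
        new OSQLSynchQuery<MyUser>("select from MyUser where name = :name"))
        .execute(params);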

NHib 3 Configuration & Mapping returning empty results?

Note: I'm specifically not using Fluent NHibernate but am using 3.x's built-in mapping-by-code style. However, I am getting a blank result set when I think I should be getting records returned.
I'm sure I'm doing something wrong and it's driving me up a wall. :)
Background / Setup
I have an Oracle 11g database for a product by IBM called Maximo
This product has a table called workorder which lists workorders; that table has a field called "wonum" which represents a unique work order number.
I have a "reporting" user which can access the table via the maximo schema
e.g. "select * from maximo.workorder"
I am using Oracle's Managed ODP.NET DLL to accomplish data tasks, and using it for the first time.
Things I've Tried
I created a basic console application to test this
I added the OracleManagedClientDriver.cs from the NHibernate.Driver on the master branch (it is not officially in the release I'm using).
I created a POCO called WorkorderBriefBrief, which only has a WorkorderNumber field.
I created a class map, WorkorderBriefBriefMap, which maps only that value as a read-only value.
I created a console application with console output to attempt to write the lines of work orders.
The session and transaction appear to open correctly.
I tested a standard ODP.NET OracleConnection to my connection string
The Code
POCO: WorkorderBriefBrief.cs
namespace PEApps.Model.WorkorderQuery
{
    public class WorkorderBriefBrief
    {
        public virtual string WorkorderNumber { get; set; }
    }
}
Mapping: WorkorderBriefBriefMap.cs
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;
using PEApps.Model.WorkorderQuery;

namespace ConsoleTests
{
    public class WorkorderBriefBriefMap : ClassMapping<WorkorderBriefBrief>
    {
        public WorkorderBriefBriefMap()
        {
            Schema("MAXIMO");
            Table("WORKORDER");
            Property(x => x.WorkorderNumber, m =>
            {
                m.Access(Accessor.ReadOnly);
                m.Column("WONUM");
            });
        }
    }
}
Putting it Together: Program.cs
namespace ConsoleTests
{
    class Program
    {
        static void Main(string[] args)
        {
            NHibernateProfiler.Initialize();
            try
            {
                var cfg = new Configuration();
                cfg
                    .DataBaseIntegration(db =>
                    {
                        db.ConnectionString = "[Redacted]";
                        db.Dialect<Oracle10gDialect>();
                        db.Driver<OracleManagedDataClientDriver>();
                        db.KeywordsAutoImport = Hbm2DDLKeyWords.AutoQuote;
                        db.BatchSize = 500;
                        db.LogSqlInConsole = true;
                    })
                    .AddAssembly(typeof(WorkorderBriefBriefMap).Assembly)
                    .SessionFactory().GenerateStatistics();
                var factory = cfg.BuildSessionFactory();

                List<WorkorderBriefBrief> query;
                using (var session = factory.OpenSession())
                {
                    Console.WriteLine("session opened");
                    Console.ReadLine();
                    using (var transaction = session.BeginTransaction())
                    {
                        Console.WriteLine("transaction opened");
                        Console.ReadLine();
                        query =
                            (from workorderbriefbrief in session.Query<WorkorderBriefBrief>() select workorderbriefbrief)
                            .ToList();
                        transaction.Commit();
                        Console.WriteLine("Transaction Committed");
                    }
                }

                Console.WriteLine("result length is {0}", query.Count);
                Console.WriteLine("about to write WOs");
                foreach (WorkorderBriefBrief wo in query)
                {
                    Console.WriteLine("{0}", wo.WorkorderNumber);
                }
                Console.WriteLine("DONE!");
                Console.ReadLine();

                // Test a standard connection below
                string constr = "[Redacted]";
                OracleConnection con = new OracleConnection(constr);
                con.Open();
                Console.WriteLine("Connected to Oracle Database {0}, {1}", con.ServerVersion, con.DatabaseName.ToString());
                con.Dispose();
                Console.WriteLine("Press RETURN to exit.");
                Console.ReadLine();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error : {0}", ex);
                Console.ReadLine();
            }
        }
    }
}
Thanks in advance for any help you can give!
Update
The following code (standard ADO.NET with OracleDataReader) works fine, returning the 16 workorder numbers it should. To me, this points to my use of NHibernate rather than the Oracle Managed ODP.NET, so I'm hoping it's just something stupid I did above in the mapping or configuration.
// Test a standard connection below
string constr = "[Redacted]";
OracleConnection con = new Oracle.ManagedDataAccess.Client.OracleConnection(constr);
con.Open();
Console.WriteLine("Connected to Oracle Database {0}, {1}", con.ServerVersion, con.DatabaseName);

var cmd = new OracleCommand();
cmd.Connection = con;
cmd.CommandText = "select wonum from maximo.workorder where upper(reportedby) = 'MAXADMIN'";
cmd.CommandType = CommandType.Text;

Oracle.ManagedDataAccess.Client.OracleDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine(reader.GetString(0));
}
con.Dispose();
When configuring NHibernate, you need to tell it about your mappings.
I found the answer -- thanks to Oskar's initial suggestion, I realized it wasn't just that I hadn't added the assembly; I also needed to create a new mapper. To do this, I added the following code to the configuration before building my session factory:
var mapper = new ModelMapper();
// define mappingType(s) -- could be an array; in my case it was just 1
var mappingType = typeof(WorkorderBriefBriefMap);
// use AddMappings instead if you're mapping an array
mapper.AddMapping(mappingType);
// add the compiled results of the mapper to the configuration
cfg.AddMapping(mapper.CompileMappingForAllExplicitlyAddedEntities());
var factory = cfg.BuildSessionFactory();
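For what it's worth, this matches how the two APIs divide the work: Configuration.AddAssembly only scans an assembly for embedded hbm.xml resources, so mapping-by-code classes like WorkorderBriefBriefMap are invisible to it until a ModelMapper compiles them into an HbmMapping that cfg.AddMapping can consume.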

ODP.NET and parameters

I have built a common app that works with PostgreSQL and should work on Oracle.
However, I'm getting strange errors when inserting records through a parameterized query.
My formatted query looks like this:
"INSERT INTO layer_mapping VALUES (#lm_id,#lm_layer_name,#lm_layer_file);"
Unlike Npgsql, which documents how to use the parameters, I could not find how Oracle "prefers" them to be used. I could only find :1, :2, :3, for example.
I do not want to use sequential parameters; I want to use them in a named way.
Is there a way to do it? Am I doing something wrong?
Thanks
You can use named parameters with ODP.NET like so:
using (var cx = new OracleConnection(connString))
{
    using (var cmd = cx.CreateCommand())
    {
        cmd.CommandText = "Select * from foo_table where bar = :bar";
        cmd.BindByName = true;
        cmd.Parameters.Add("bar", barValue);
        // ...
    }
}
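Note that ODP.NET binds parameters by position by default; the BindByName = true line is what makes the parameter names, rather than the order in which they are added, significant.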
I made this lib https://github.com/pedro-muniz/ODPNetConnect/blob/master/ODPNetConnect.cs, so you can do parameterized writes and reads like this:
ODPNetConnect odp = new ODPNetConnect();
if (!String.IsNullOrWhiteSpace(odp.ERROR))
{
    throw new Exception(odp.ERROR);
}

// Write:
string sql = @"INSERT INTO TABLE (D1, D2, D3) VALUES (:D1, :D2, :D3)";
Dictionary<string, object> parameters = new Dictionary<string, object>();
parameters["D1"] = "D1";
parameters["D2"] = "D2";
parameters["D3"] = "D3";
int affectedRows = odp.ParameterizedWrite(sql, parameters);
if (!String.IsNullOrWhiteSpace(odp.ERROR))
{
    throw new Exception(odp.ERROR);
}

// Read:
sql = @"SELECT * FROM TABLE WHERE D1 = :D1";
parameters = new Dictionary<string, object>();
parameters["D1"] = "D1";
DataTable dt = odp.ParameterizedRead(sql, parameters);
if (!String.IsNullOrWhiteSpace(odp.ERROR))
{
    throw new Exception(odp.ERROR);
}
Notes: you have to change these lines in ODPNetConnect.cs to set your connection strings:
static private string devConnectionString = "SET YOUR DEV CONNECTION STRING";
static private string productionConnectionString = "SET YOUR PRODUCTION CONNECTION STRING";
And you need to change line 123 to set the environment to dev or prod:
public OracleConnection GetConnection(string env = "dev", bool cacheOn = false)
