I am able to execute the following Sqoop command from the CLI perfectly:
sqoop list-tables
--connect 'jdbc:sqlserver://xx.xx.xx.xx\MSSQLSERVER2012:1433;username=usr;password=xxx;database=db'
--connection-manager org.apache.sqoop.manager.SQLServerManager
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver
-- --schema schma
But I am getting errors when trying the same thing in Oozie (Hue):
2055 [main] ERROR org.apache.sqoop.manager.CatalogQueryManager -
Failed to list tables java.sql.SQLException: No suitable driver found
for 'jdbc:sqlserver://xx.xx.xx.xx\MSSQLSERVER2012:1433;username=usr;password=xxx;database=db'
2057 [main] ERROR org.apache.sqoop.Sqoop - Got exception running
Sqoop: java.lang.RuntimeException: java.sql.SQLException: No suitable
driver found for 'jdbc:sqlserver://xx.xx.xx.xx\MSSQLSERVER2012:1433;username=usr;password=xxx;database=db'
How can we get this to work in Oozie?
(We are working on the Cloudera Hadoop distribution.)
This worked for me using CDH 5.11 and the Hue Workflow Editor to create an Oozie > Sqoop 1 workflow, but it REQUIRES you to hard-code the username and password arguments. Screenshots are included below.
Here is the Step-by-Step:
1. Open the Hue > Workflow Editor.
2. Create a new workflow.
3. Drag the Sqoop 1 action into the "drop your action here" grey box.
4. Ignore the default Sqoop command box; instead, hit the + to the right of ARGUMENTS below the Sqoop command box to add a new argument.
5. Add "import" (without the double quote marks) as the very first argument.
6. Delete the entire content of the Sqoop command box; it needs to be empty.
7. Add a new argument with the value of "--connect" (here and below, without the double quotes).
8. Add a new argument with the value of "jdbc:sqlserver://YourServerNameHere;database=YourDatabaseNameHere".
9. Add a new argument with the value of "--username".
10. Add a new argument with the value of "YourSQLServerNamedUserNameHere".
11. Add a new argument with the value of "--password", followed by another argument holding the actual password value you hard-code.
12. Add a new argument with the value of "--query".
13. Add a new argument with the value of "Select * from OptionalDBNameHere.SchemaNameHere.TableNameHere Where $CONDITIONS".
14. Add a new argument with the value of "--delete-target-dir".
15. Add a new argument with the value of "--target-dir".
16. Add a new argument with the value of "hdfs://FQDNServerName:PortNumber8020IsDefault/User/full/path/to/where/you/want/the/csv/file/placed/in/hdfs/NewFolderForThisTableHere". The last folder will be deleted and re-created each time you run the Sqoop job.
17. Add a new argument with the value of "--num-mappers".
18. Add a new argument with the value of "1".
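For reference, the arguments above add up to the equivalent of this single Sqoop CLI invocation (a sketch using the same placeholder names; the value after --password is the password you hard-code):
sqoop import
--connect 'jdbc:sqlserver://YourServerNameHere;database=YourDatabaseNameHere'
--username YourSQLServerNamedUserNameHere
--password YourPasswordHere
--query 'Select * from OptionalDBNameHere.SchemaNameHere.TableNameHere Where $CONDITIONS'
--delete-target-dir
--target-dir hdfs://FQDNServerName:8020/User/full/path/to/NewFolderForThisTableHere
--num-mappers 1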
Important:
A. The "Where $CONDITIONS" clause at the end of the SQL Select statement in item 13 is critical. The job will not run without it.
B. This uses a SQL Server named user account with access to the database and table you want to Sqoop.
C. Entering the arguments like this is required if your named user does not have its default schema set to "dbo", or if the schema of your table is not the default schema for the database and user.
D. Make sure the SQL Server JDBC driver is placed correctly in your installation. For my particular version of Cloudera the location is "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/sqoop/lib/sqljdbc41.jar", but you may also try putting it in "/var/lib/oozie" or "/var/lib/sqoop"; I am not sure either of those works on its own.
E. I have not been successful at replacing the username and password I hard-coded as arguments with values from a job.properties file. I believe it is possible, but I have been unable to find anyone who can clearly show how to do it, and days of brute-force trial and error have been unsuccessful.
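If you would rather ship the driver with the workflow itself instead of installing it on every node, a commonly used alternative (an assumption on my part, not verified in this particular setup) is to upload the jar into the workflow's lib/ directory in HDFS, which Oozie automatically adds to the action's classpath:
hdfs dfs -mkdir -p /user/yourUserName/workflows/yourWorkflowName/lib
hdfs dfs -put sqljdbc41.jar /user/yourUserName/workflows/yourWorkflowName/lib/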
Here are screenshots showing what this looks like when done: SqoopCommandAsArguments and SqoopCommandAsArgumentsSuccess.
We are trying to Sqoop data from MySQL to HDFS, but when we run the code, the data gets stored on the local file system instead. We want the data to be in HDFS. Can anyone spot the problem in the following code?
import com.cloudera.sqoop.SqoopOptions;    // org.apache.sqoop.SqoopOptions on newer Sqoop 1.x
import com.cloudera.sqoop.tool.ImportTool; // org.apache.sqoop.tool.ImportTool on newer Sqoop 1.x

SqoopOptions options = new SqoopOptions();
options.setConnectString("jdbc:mysql://hostname/db_name"); // note the "//" after jdbc:mysql:
options.setUsername("user");
options.setPassword("pass");
options.setTableName("table");
options.setDirectMode(true);
options.setNumMappers(4);
options.setDriverClassName("com.mysql.jdbc.Driver");
options.setSqlQuery("select * from table");
options.setWhereClause("value > 15.0");
options.setTargetDir("output");
options.doHiveImport(); // doHiveImport() is only a getter; setHiveImport(true) is what actually enables a Hive import

int ret = new ImportTool().run(options);
System.out.println(ret);
I ran the same program in HDFS and got the output. :)
The issue here is with options.setTargetDir("output");
You are not specifying a fully qualified HDFS path. If you replace "output" with a valid HDFS path, you should be able to run the code from anywhere and still get the proper result.
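For example (a sketch; the NameNode host, port, and path are placeholders for your own cluster):
options.setTargetDir("hdfs://namenodeHost:8020/user/yourUser/output"); // fully qualified HDFS URI instead of a relative local path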
I'm trying to execute a SqlSensor task in Airflow using a connection to a Teradata database. The connection is configured as follows:
In particular, I have provided two driver paths separated by ", ", but I am not sure whether that is the proper way to do it:
/home/airflow/java_sample/tdgssconfig.jar
/home/airflow/java_sample/terajdbc4.jar
When the DAG executes, it triggers this error message:
[2017-08-02 02:32:45,162] {models.py:1342} INFO - Executing <Task(SqlSensor): check_running_batch> on 2017-08-02 02:32:12
[2017-08-02 02:32:45,179] {base_hook.py:67} INFO - Using connection to: jdbc:teradata://myservername.mycompanyname.org/database=MYDBNAME,TMODE=ANSI,CHARSET=UTF8
[2017-08-02 02:32:45,313] {sensors.py:109} INFO - Poking: SELECT BATCH_KEY FROM MYDBNAME.AUDIT_BATCH WHERE BATCH_OWNER='ARO_TEST' AND AUDIT_STATUS_KEY=1;
[2017-08-02 02:32:45,316] {base_hook.py:67} INFO - Using connection to: jdbc:teradata://myservername.mycompanyname.org/database=MYDBNAME,TMODE=ANSI,CHARSET=UTF8
[2017-08-02 02:32:45,497] {models.py:1417} ERROR - java.lang.RuntimeException: Class com.teradata.jdbc.TeraDriver not found
What am I doing wrong?
The appropriate way to input multiple jars on the connections page is to separate the fully qualified paths with a comma, which you did above.
I can confirm this is the approach I took, and it worked (Airflow 1.10.1 and 1.10.2).
See: https://github.com/apache/airflow/blob/master/airflow/hooks/jdbc_hook.py#L51
Bonus: if you use Ad Hoc Query in Data Profiling to test it out, you'll notice that you get an error when you send a SELECT statement, because Airflow wraps it in a LIMIT clause, which Teradata doesn't support.
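Using the two paths from the question, the driver path field of the connection would then contain, for example (no space after the comma, to be safe):
/home/airflow/java_sample/tdgssconfig.jar,/home/airflow/java_sample/terajdbc4.jar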
The solution provided by my team member was to merge the two jars into a single jar file. After doing that and pointing the driver path at the new jar file, it worked as expected.
Here is the link to the JAR file: https://github.com/alexisrolland/linux-setup/blob/master/teradataDriverJdbc.jar
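If you want to build such a merged jar yourself, one simple approach (a sketch, assuming the JDK's jar tool is on your PATH; watch out for clashing META-INF entries) is to extract both jars into one directory and repackage it:
mkdir merged && cd merged
jar xf /home/airflow/java_sample/tdgssconfig.jar
jar xf /home/airflow/java_sample/terajdbc4.jar
jar cf ../teradataDriverJdbc.jar .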
Here is a code snippet example that uses the connection in a SqlSensor task:
# Import path for Airflow 1.x of this era; newer releases moved SqlSensor.
from airflow.operators.sensors import SqlSensor

CheckRunningBatch = SqlSensor(
    task_id='check_running_batch',
    conn_id='ed_data_quality_edw_dev',
    sql="SELECT CASE WHEN MAX(BATCH_KEY) IS NOT NULL THEN 0 ELSE 1 END FROM DATABASE.AUDIT_BATCH WHERE STATUS_KEY=1;",
    poke_interval=300,
    dag=dag)
I am trying to run some basic queries in the Hive editor in the Hue browser, but they return the following error, whereas my Hive CLI works fine and is able to execute queries. Could someone help me?
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=1,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='r\t\x80\xac\x1a\xa0K\xf8\xa4\xa0\x85?\x03!\x88\xa9',
guid='\x852\x0c\x87b\x7fJ\xe2\x9f\xee\x00\xc9\xeeo\x06\xbc')),
orientation=4, maxRows=-1):
TFetchResultsResp(status=TStatus(errorCode=0, errorMessage="Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]",
sqlState=None,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]:24:23",
'org.apache.hive.service.cli.operation.OperationManager:getOperationLogRowSet:OperationManager.java:229',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:687',
'sun.reflect.GeneratedMethodAccessor14:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy19:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'], statusCode=3), results=None,
hasMoreRows=None)
This error could be due either to HiveServer2 not running or to Hue not having access to hive_conf_dir.
Check whether HiveServer2 has been started and is running. It uses port 10000 by default.
netstat -ntpl | grep 10000
If it is not running, start the HiveServer2
$HIVE_HOME/bin/hiveserver2
Also check the Hue configuration file hue.ini. The hive_conf_dir property must be set under the [beeswax] section. If it is not set, add this property under [beeswax]:
hive_conf_dir=$HIVE_HOME/conf
Restart the Hue supervisor after making these changes.
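For example, on a typical package-based install (an assumption; on a cluster managed by Cloudera Manager you would restart the Hue service from its UI instead):
sudo service hue restart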
I am using ODI (12.1.3.0.0). I created a topology for an Oracle DB, which is OK, and I created a topology for HDFS using the File technology, which is where I think the problem lies.
For the HDFS DataServer, I left the JDBC driver empty and filled the JDBC URL with hdfs://remotehostname:port.
For the HDFS Physical Schema, I filled both Schema and Work Schema with /my/path.
Then I created a Logical Schema and a Model. After that, I created a Datastore under the model with these definitions:
Name: TestName
Resource Name: TESTFILE.txt
File Format: Fixed
After all of this, I created a project and a mapping under the project.
Finally, when I run the mapping, I see these errors:
ODI-1217: Session Oracle2HDFSMapping_Physical_SESS (15) fails with return code ODI-1298.
ODI-1226: Step Physical_STEP fails after 1 attempt(s).
ODI-1240: Flow Physical_STEP fails while performing a Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- operation. This flow loads target table null.
ODI-1298: Serial task "SERIAL-MAP_MAIN- (10)" failed because child task "SERIAL-EU-GGUSER_UNIT (20)" is in error.
ODI-1298: Serial task "SERIAL-EU-GGUSER_UNIT (20)" failed because child task "Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- (40)" is in error.
Caused By: java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:450)
at java.lang.Runtime.exec(Runtime.java:347)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:54)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:29)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:52)
at oracle.odi.runtime.agent.execution.SessionTask.processTask(SessionTask.java:203)
at oracle.odi.runtime.agent.execution.SessionTask.doExecuteTask(SessionTask.java:114)
at oracle.odi.runtime.agent.execution.AbstractSessionTask.execute(AbstractSessionTask.java:886)
at oracle.odi.runtime.agent.execution.SessionExecutor$SerialTrain.runTasks(SessionExecutor.java:2198)
at oracle.odi.runtime.agent.execution.SessionExecutor.executeSession(SessionExecutor.java:591)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:718)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:611)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:203)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor.doProcessStartAgentTask(TaskExecutorAgentRequestProcessor.java:800)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$1400(StartSessRequestProcessor.java:74)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:702)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:180)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
at java.lang.ProcessImpl.start(ProcessImpl.java:136)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 20 more
I wonder where I went wrong?
For a file Datastore, you need to define the attributes (columns) by opening the Datastore and going to the Attributes tab. If the file already exists, you can reverse-engineer the attributes, rename them, and change the datatypes if needed.
The error message you received for the second task mentions that the file (generated in the first task) does not exist. So there might be a problem with the first task, probably due to the missing attributes in your Datastore.
Here is a detailed article about SQL To HDFS file (Sqoop) KM written by the ODI A-Team : http://www.ateam-oracle.com/importing-data-from-sql-databases-into-hadoop-with-sqoop-and-oracle-data-integrator-odi/
I have a VS2010 database project pointing to a SQL2005 database. When I deploy it, it correctly picks up the DefaultDataPath from the SQL instance and everything works.
Today, I changed the project type from SQL2005 to SQL2008 and changed the deploy properties to point to my SQL2008 server. However, now when I try to deploy, I get this error:
Error SQL01268: .Net SqlClient Data Provider: Msg 5105, Level 16, State 2, Line 1 A file activation error occurred. The physical file name '\AutoDeployedTRS.mdf' may be incorrect. Diagnose and correct additional errors, and retry the operation.
Error SQL01268: .Net SqlClient Data Provider: Msg 1802, Level 16, State 1, Line 1 CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
An error occurred while the batch was being executed.
The reason for this error is that the SQL script created by VS contains these three lines:
:setvar DatabaseName "AutoDeployedTRS"
:setvar DefaultDataPath "\"
:setvar DefaultLogPath "\"
If I check the SQL instance properties (through the UI or by reading the registry), they are set correctly, so it seems like VS2010 can't pick them up for some reason.
Any ideas?
Try going to Schema Objects\Database level objects\Storage\Files.
Two files can be found there.
Open the file [your_database_name].sql and set the parameter
FILENAME = '$(DefaultDataPath)$(DatabaseName).mdf'.
Then open the file [your_database_name]_log.sql and set the parameter
FILENAME = '$(DefaultDataPath)$(DatabaseName)_log.ldf'.
After that, try to deploy your project. These parameters are now resolved during deployment according to the current target database path. Hope it helps.
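For reference, the relevant fragment of such a file script typically looks like this (a sketch; the NAME and any SIZE/FILEGROWTH options stay as your project generated them, only the FILENAME value changes):
ALTER DATABASE [$(DatabaseName)]
    ADD FILE (NAME = [your_database_name],
              FILENAME = '$(DefaultDataPath)$(DatabaseName).mdf')
    TO FILEGROUP [PRIMARY];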