Apache Kylin and Sqoop - Is there a way to edit the Sqoop generated SQL statement?

I am working with Apache Kylin and using Sqoop to connect to my PostgreSQL database. I have a cube based on a fact table that references the same dimension table twice. The problem arises when I try to build the cube: I get the following error on the first step of the job (#1 Step Name: Sqoop To Flat Hive Table):
ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: table name "d_date" specified more than once
The problem is that the SQL Sqoop generates references the table d_date twice and gives it the same alias both times, so the statement fails. Can I configure anything to work around this?
Edit: If the answer is no, that is also helpful; I just really need to know whether there is anything I can do to fix this.
This is the generated SQL (the duplicated d_date joins are where the problem is):
SELECT f_exam.course_id as F_EXAM_COURSE_ID ,f_exam.academic_year_id as F_EXAM_ACADEMIC_YEAR_ID ,f_exam.semester_id as F_EXAM_SEMESTER_ID ,f_exam.exam_id as F_EXAM_EXAM_ID ,f_exam.exam_app_user_created_id as F_EXAM_EXAM_APP_USER_CREATED_ID ,f_exam.exam_available_from_date_id ,f_exam.exam_available_from_time_id as F_EXAM_EXAM_AVAILABLE_FROM_TIME_ID ,f_exam.exam_available_to_date_id as F_EXAM_EXAM_AVAILABLE_TO_DATE_ID ,f_exam.exam_available_to_time_id as F_EXAM_EXAM_AVAILABLE_TO_TIME_ID ,f_exam.exam_ordinal_id as F_EXAM_EXAM_ORDINAL_ID ,d_time_day.time_day_id as D_AVAILABLE_FROM_TIME_TIME_DAY_ID ,d_time_day.hour_minutes_seconds as D_AVAILABLE_FROM_TIME_HOUR_MINUTES_SECONDS ,d_time_day.the_seconds as D_AVAILABLE_FROM_TIME_THE_SECONDS ,d_time_day.the_minutes as D_AVAILABLE_FROM_TIME_THE_MINUTES ,d_time_day.the_hours as D_AVAILABLE_FROM_TIME_THE_HOURS ,d_time_day.period_of_day as D_AVAILABLE_FROM_TIME_PERIOD_OF_DAY ,d_time_day.time_day_id as D_AVAILABLE_TO_TIME_TIME_DAY_ID ,d_time_day.hour_minutes_seconds as D_AVAILABLE_TO_TIME_HOUR_MINUTES_SECONDS ,d_time_day.the_seconds as D_AVAILABLE_TO_TIME_THE_SECONDS ,d_time_day.the_minutes as D_AVAILABLE_TO_TIME_THE_MINUTES ,d_time_day.the_hours as D_AVAILABLE_TO_TIME_THE_HOURS ,d_time_day.period_of_day as D_AVAILABLE_TO_TIME_PERIOD_OF_DAY ,f_exam.number_of_questions as F_EXAM_NUMBER_OF_QUESTIONS ,f_exam.duration_in_seconds as F_EXAM_DURATION_IN_SECONDS ,f_exam.number_of_students_participated as F_EXAM_NUMBER_OF_STUDENTS_PARTICIPATED ,f_exam.is_forward_only_01 as F_EXAM_IS_FORWARD_ONLY_01 ,f_exam.max_score_possible as F_EXAM_MAX_SCORE_POSSIBLE ,f_exam.max_score as F_EXAM_MAX_SCORE ,f_exam.min_score as F_EXAM_MIN_SCORE ,f_exam.pass_percentage as F_EXAM_PASS_PERCENTAGE ,f_exam.max_score_percentage as F_EXAM_MAX_SCORE_PERCENTAGE ,f_exam.min_score_percentage as F_EXAM_MIN_SCORE_PERCENTAGE ,f_exam.avg_score as F_EXAM_AVG_SCORE ,f_exam.median as F_EXAM_MEDIAN ,f_exam.first_quartile as F_EXAM_FIRST_QUARTILE ,f_exam.third_quartile as F_EXAM_THIRD_QUARTILE ,f_exam.interquartile_range as F_EXAM_INTERQUARTILE_RANGE ,f_exam.minimum_without_outliers as F_EXAM_MINIMUM_WITHOUT_OUTLIERS ,f_exam.maximum_without_outliers as F_EXAM_MAXIMUM_WITHOUT_OUTLIERS
FROM public.f_exam f_exam
INNER JOIN public.d_course d_course ON f_exam.course_id = d_course.course_id
INNER JOIN public.d_academic_year d_academic_year ON f_exam.academic_year_id = d_academic_year.academic_year_id
INNER JOIN public.d_semester d_semester ON f_exam.semester_id = d_semester.semester_id
INNER JOIN public.d_exam d_exam ON f_exam.exam_id = d_exam.exam_id
INNER JOIN public.d_app_user d_app_user ON f_exam.exam_app_user_created_id = d_app_user.app_user_id
INNER JOIN public.d_date d_date ON f_exam.exam_available_from_date_id = d_date.date_id
INNER JOIN public.d_time_day d_time_day ON f_exam.exam_available_from_time_id = d_time_day.time_day_id
INNER JOIN public.d_date d_date ON f_exam.exam_available_to_date_id = d_date.date_id
INNER JOIN public.d_time_day d_time_day ON f_exam.exam_available_to_time_id = d_time_day.time_day_id
INNER JOIN public.d_ordinal d_ordinal ON f_exam.exam_ordinal_id = d_ordinal.ordinal_id
WHERE 1=1 AND (f_exam.exam_available_from_date_id >= 20120101 AND f_exam.exam_available_from_date_id < 20170101) AND (1 = 0)

The long SQL is generated by Kylin and submitted to Sqoop for execution, so what you really need to fix is the duplicated alias of the two public.d_date lookups in the Kylin model definition.
In the Kylin model designer, your fact table f_exam must be joined to public.d_date twice. Set the two lookups' aliases to different names, save the model, and build again. This changes the generated SQL and lets the Sqoop step pass.
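For illustration only (a sketch; the alias names d_date_from and d_date_to are hypothetical, use whatever aliases you set in the model), the join section of the generated SQL would then look roughly like this:
-- join section after giving each d_date lookup its own alias (sketch, not Kylin's exact output)
FROM public.f_exam f_exam
INNER JOIN public.d_date d_date_from ON f_exam.exam_available_from_date_id = d_date_from.date_id
INNER JOIN public.d_date d_date_to ON f_exam.exam_available_to_date_id = d_date_to.date_id
-- the two d_time_day joins in the generated SQL share an alias as well and would need the same treatment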

Related

"Commands out of sync" error when trying to execute a procedure in MySQL 8

When executing the code below in phpMyAdmin:
use db;
DELIMITER $$
DROP PROCEDURE IF EXISTS McaTest3$$
CREATE PROCEDURE McaTest3()
BEGIN
SELECT
cl.*
FROM `condition_library` cl
LEFT JOIN condition_custom cc on cl.condition_library_id = cc.condition_library_id
and cc.active = 1
AND (cc.permit_application_id = 20231 OR cc.permit_id = NULL)
WHERE FIND_IN_SET(cl.`condition_library_id`, '13070')
AND cl.active = 1
and cc.condition_library_id IS NULL;
END$$
DELIMITER ;
call McaTest3();
Getting error:
Error
Static analysis:
1 errors were found during analysis.
Missing expression. (near "ON" at position 25)
SQL query:
SET FOREIGN_KEY_CHECKS = ON;
MySQL said:
#2014 - Commands out of sync; you can't run this command now
This happens when no record is found in the LEFT JOINed table.
When the same code is run in MySQL Workbench: NO ERROR, and it returns an empty result set.
The same procedure also fails when executed from the application (Appian)… Any clues?
Another question on Stack Overflow answered my issue:
link: MySQL error #2014 - Commands out of sync; you can't run this command now

confluent - kafka-connect - JDBC source connector - ORA-00933: SQL command not properly ended

I have the following SQL query in my Kafka JDBC source connector properties file:
query=SELECT * FROM JENNY.WORKFLOW where ID = '565231'
If I run the same query in SQL Developer, it works fine and fetches the results. But if I use the same query in "jdbc_workflow_connect.properties", I get the following error:
(io.confluent.connect.jdbc.source.JdbcSourceTaskConfig:223)
[2018-09-19 12:32:15,130] INFO WorkerSourceTask{id=Workflow-DB-source-0}
Source task finished initialization and start
(org.apache.kafka.connect.runtime.WorkerSourceTask:158)
[2018-09-19 12:32:15,328] ERROR Failed to run query for table
TimestampIncrementingTableQuerier{name='null', query='SELECT * FROM
JENNY.WORKFLOW where ID = '565231'', topicPrefix='workflow_data1',
timestampColumn='null', incrementingColumn='ID'}: {}
(io.confluent.connect.jdbc.source.JdbcSourceTask:247)
java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1017)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:655)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:215)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:58)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:776)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1034)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3820)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3867)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1502)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.executeQuery(TimestampIncrementingTableQuerier.java:201)
at io.confluent.connect.jdbc.source.TableQuerier.maybeStartQuery(TableQuerier.java:84)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.maybeStartQuery(TimestampIncrementingTableQuerier.java:55)
at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:179)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Here is my JDBC source connector properties file content:
name=Workflow-DB-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.password = ******
connection.url = jdbc:oracle:thin:#1.1.1.1:****/****
connection.user = *****
table.types=TABLE
query=SELECT * FROM JENNY.WORKFLOW where ID = '565231'
mode=incrementing
incrementing.column.name=ID
topic.prefix=workflow_data1
timestamp.delay.interval.ms=60000
transforms:createKey
transforms.createKey.type:org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields:ID
I'm using ojdbc7.jar
Observation:
If I remove the WHERE clause, the query works fine (like below):
SELECT * FROM JENNY.WORKFLOW
Please let me know if I'm doing something wrong or whether any modifications are required to the JDBC source connector settings.
Thanks in advance.
From the documentation of the JDBC Connect configuration options you may read:
If specified, the query to perform to select new or updated rows. Use this setting if you want to join tables, select subsets of columns in a table, or filter data. If used, this connector will only copy data using this query – whole-table copying will be disabled. Different query modes may still be used for incremental updates, but in order to properly construct the incremental query, it must be possible to append a WHERE clause to this query (i.e. no WHERE clauses may be used).
So if you really want to consider only the part of the table with a given ID, you must wrap the query as follows:
select * from (SELECT * FROM JENNY.WORKFLOW where ID = '565231')
But please be sure you have checked the documentation of the configuration options and understand the role of the query parameter.
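To see why the wrapping matters, here is a sketch (the exact clause Connect appends depends on the connector version and mode, so treat the appended part as an approximation): in incrementing mode the connector appends its own WHERE/ORDER BY to your query.
-- without wrapping, the appended incremental clause produces a second WHERE and ORA-00933:
SELECT * FROM JENNY.WORKFLOW where ID = '565231' WHERE ID > ? ORDER BY ID ASC
-- with the wrapped (derived-table) form, the appended clause is valid SQL:
SELECT * FROM (SELECT * FROM JENNY.WORKFLOW where ID = '565231') WHERE ID > ? ORDER BY ID ASC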

Sonar 5.6: cannot update issue

I am trying to update an issue in the UI (assign / set severity / open), but nothing happens.
When I look at the network exchange I see a 404:
{"errors":[{"msg":"Issue with key '76b53a17-fa8f-4d04-b999-1fd5e401fee0' does not exist"}]}
But I can find my issue in my database (MySQL):
mysql> select kee from issues where kee='76b53a17-fa8f-4d04-b999-1fd5e401fee0';
+--------------------------------------+
| kee |
+--------------------------------------+
| 76b53a17-fa8f-4d04-b999-1fd5e401fee0 |
+--------------------------------------+
1 row in set (0.00 sec)
We tried to find the query executed by Sonar. We only found it in the head version of Sonar (IssueFinder and the iBatis config), and it works:
select i.id,
i.kee as kee,
i.rule_id as ruleId,
i.severity as severity,
i.manual_severity as manualSeverity,
i.message as message,
i.line as line,
i.locations as locations,
i.gap as gap,
i.effort as effort,
i.status as status,
i.resolution as resolution,
i.checksum as checksum,
i.assignee as assignee,
i.author_login as authorLogin,
i.tags as tagsString,
i.issue_attributes as issueAttributes,
i.issue_creation_date as issueCreationTime,
i.issue_update_date as issueUpdateTime,
i.issue_close_date as issueCloseTime,
i.created_at as createdAt,
i.updated_at as updatedAt,
r.plugin_rule_key as ruleKey,
r.plugin_name as ruleRepo,
r.language as language,
p.kee as componentKey,
i.component_uuid as componentUuid,
p.module_uuid as moduleUuid,
p.module_uuid_path as moduleUuidPath,
p.path as filePath,
root.kee as projectKey,
i.project_uuid as projectUuid,
i.issue_type as type
from issues i
inner join rules r on r.id=i.rule_id
inner join projects p on p.uuid=i.component_uuid
inner join projects root on root.uuid=i.project_uuid
where i.kee='76b53a17-fa8f-4d04-b999-1fd5e401fee0';
It returns one row.
What can I do? Is it a bug?
The ES folder is probably corrupted. Here are the steps to clean it up:
Stop the SonarQube server
Remove the {SONARQUBE_INSTALLATION}/data/es folder
Restart the server

SPARK SQL (1.5.1) connect to Oracle and write to Avro

I am using Spark SQL to connect to an Oracle database and get data as DataFrames. I would like to write the retrieved data to an Avro file. While writing to Avro I am seeing multiple issues; could you help?
Here is the code -
val df = sqlContext.read.format("jdbc")
  .options(Map(
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "url" -> "jdbc:oracle:thin:user/password#host/service",
    "numPartitions" -> "1",
    "dbtable" -> "(Select * from schema.table WHERE STAGE_NUM <=39 and guid='I284ba1f9cdba11dea82ab9f4ee295c21')"))
  .load()
df.write.format("com.databricks.spark.avro").save("Outputfile")
Dependencies in my project:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.10</artifactId>
  <version>2.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
</dependency>
Here is the exception information -
java.lang.RuntimeException: com.databricks.spark.avro.DefaultSource does not allow create table as select
If I use - df.write.avro("headnotes"), I get the following exception.
java.lang.IllegalAccessError: tried to access class org.apache.avro.SchemaBuilder$FieldDefault from class com.databricks.spark.avro.SchemaConverters$$anonfun$convertStructToAvro$1

Test the existence of a Teradata table and create the table if non-existent

Our Continuous Integration server (Hudson) is having a strange issue when attempting to run a simple create table statement in Teradata.
This statement tests the existence of the max_call_attempt_parameters table:
unless $teradata_connection.table_exists? :arm_custom_db__max_call_attempt_parameters
$teradata_connection.run('CREATE TABLE all_wkscratchpad_db.max_call_attempt_parameters AS (SELECT * FROM arm_custom_db.max_call_attempt_parameters ) WITH NO DATA')
end
The table_exists? method does the following:
def table_exists?(name)
v ||= false # only retry once
sch, table_name = schema_and_table(name)
name = SQL::QualifiedIdentifier.new(sch, table_name) if sch
from(name).first
true
rescue DatabaseError => e
if e.to_s =~ /Operation not allowed for reason code "7" on table/ && v == false
# table probably needs reorg
reorg(name)
v = true
retry
end
false
end
So as per the from(name).first line, the test which this method is performing is just a simple select statement, which, in SQL, looks like: SELECT TOP 1 MAX(CAST(MAX_CALL_ATTEMPT_CNT AS BIGINT)) FROM ALL_WKSCRATCHPAD_DB.MAX_CALL_ATTEMPT_PARAMETERS
The above SQL statement executes perfectly fine within Teradata SQL Assistant, so it's not a SQL syntax issue. The generic ID which our testing suite (Rubymine) uses is also not the issue; that ID has select access to the arm_custom_db.
The exception which I can see being thrown (within the build's console output on Hudson) is
Sequel::DatabaseError: Java::ComTeradataJdbcJdbc_4Util::JDBCException. Since this exception is a subclass of DatabaseError, the exception shouldn't be the problem either.
Also: We use unless statements like this every day for hundreds of different tables, and all except this one work correctly. This statement just seems to be a problem.
The complete error message which appears in the build's console output on Hudson is as follows:
[2015-01-07T13:56:37.947000 #16702] ERROR -- : Java::ComTeradataJdbcJdbc_4Util::JDBCException: [Teradata Database] [TeraJDBC 13.10.00.17] [Error 3807] [SQLState 42S02] Object 'ALL_WKSCRATCHPAD_DB.MAX_CALL_ATTEMPT_PARAMETERS' does not exist.: SELECT TOP 1 MAX(CAST(MAX_CALL_ATTEMPT_CNT AS BIGINT)) FROM ALL_WKSCRATCHPAD_DB.MAX_CALL_ATTEMPT_PARAMETERS
Sequel::DatabaseError: Java::ComTeradataJdbcJdbc_4Util::JDBCException: [Teradata Database] [TeraJDBC 13.10.00.17] [Error 3807] [SQLState 42S02] Object 'ALL_WKSCRATCHPAD_DB.MAX_CALL_ATTEMPT_PARAMETERS' does not exist.
I don't understand why this specific bit of code is giving me issues...there does not appear to be anything special about this table or database, and all SQL code executes perfectly fine in Teradata when I am signed in with the same exact user ID that is being used to execute the code from Hudson.
