Docker dies executing JDBC Query - spring

I'm running a spring boot project in docker and I'm executing the following JDBC code:
SqlRowSet rowSet = jdbcTemplate.queryForRowSet(queryString);
The query string is the following:
SELECT *
FROM tableName
WHERE timestamp >= '2022-04-03 01:00:00.000000 -0700'
AND timestamp <= '2022-04-13 01:59:59.999999 -0700'
ORDER BY timestamp
LIMIT 2500
OFFSET 0
When this query is executed through JDBC, the Docker container dies and is removed. I have other queries that work fine, but for some reason this one, which includes dates, kills the Docker container. Does anyone know why this is happening?
The last log line the Docker container emits is:
2022-06-20 19:56:00,985 DEBUG org.springframework.jdbc.core.JdbcTemplate :
Executing SQL query
[SELECT * FROM tableName WHERE timestamp >= '2022-04-03 01:00:00.000000 -0300' AND timestamp <= '2022-04-13 01:59:59.999999 -0300' ORDER BY timestamp LIMIT 2500 OFFSET 0]

I figured out what was wrong: not enough memory was being allocated for the Docker container in my environment file.
I followed the directions in the link @eol provided:
You can try and change the entrypoint to keep the container running even after your spring-app crashes. You should then be able to see what's really going on. Check this for example: Keep Your Docker Container Running for Debugging
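For example, here is a hedged sketch of that approach (the image and container names are placeholders, not from my setup):
# Override the entrypoint so the container stays alive even if the Spring app would have crashed
docker run -d --name spring-app-debug --entrypoint tail my-spring-image -f /dev/null
# Then open a shell inside it to inspect logs and state
docker exec -it spring-app-debug /bin/sh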
I removed the option we had set to remove the Docker container on a failure.
I SSH'd into the container and did some investigation. The container status showed an out-of-memory error: "OOMKilled": "true". This led me to the conclusion that the container wasn't allocated as much memory as we needed.
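A hedged sketch of the check and the fix (the container/service names and the memory figure are placeholders, not the values from my environment file):
# Confirm the container was killed by the kernel OOM killer
docker inspect --format '{{.State.OOMKilled}}' <container_name>
# Example docker-compose snippet raising the container's memory limit (Compose v2 file syntax)
services:
  spring-app:
    mem_limit: 2g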

Related

Failed to set ConnectorClientConfigOverridePolicy to All in Debezium mysql connector in docker

I'm trying to set ConnectorClientConfigOverridePolicy by adding CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All. During startup, the Debezium connector fails with "matches all=All". It seems CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY duplicates the value: instead of "All", the value becomes "all=All".
Stopping due to error [org.apache.kafka.connect.cli.ConnectDistributed]
org.apache.kafka.connect.errors.ConnectException: Failed to find any class that implements interface org.apache.kafka.connect.connector.policy.ConnectorClientConfigOverridePolicy and which name matches all=All
Is this a bug, or am I doing something wrong?
I'm using the Debezium Docker image 1.5.
This is partly a bug and partly a misconfiguration.
The env var should be named CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All.
At the same time, the start script that processes all env vars prefixed with CONNECT_ does not check for the trailing underscore, so CONNECTOR... matches too, which breaks the subsequent logic.
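For reference, a hedged docker-compose sketch with the correctly prefixed variable (the service name is a placeholder; adjust the image tag to your deployment):
services:
  connect:
    image: debezium/connect:1.5
    environment:
      # CONNECT_ prefix + the Kafka Connect worker property connector.client.config.override.policy
      - CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All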

Connect to Teradata Using Airflow JDBC Connection

I'm trying to execute a SqlSensor task in Airflow using a connection to a Teradata database. The connection is configured as follows:
In particular, I have provided two driver paths separated by ", ", but I am not sure if that's the proper way to do it:
/home/airflow/java_sample/tdgssconfig.jar
/home/airflow/java_sample/terajdbc4.jar
When the DAG executes, it triggers the following error message:
[2017-08-02 02:32:45,162] {models.py:1342} INFO - Executing <Task(SqlSensor): check_running_batch> on 2017-08-02 02:32:12
[2017-08-02 02:32:45,179] {base_hook.py:67} INFO - Using connection to: jdbc:teradata://myservername.mycompanyname.org/database=MYDBNAME,TMODE=ANSI,CHARSET=UTF8
[2017-08-02 02:32:45,313] {sensors.py:109} INFO - Poking: SELECT BATCH_KEY FROM MYDBNAME.AUDIT_BATCH WHERE BATCH_OWNER='ARO_TEST' AND AUDIT_STATUS_KEY=1;
[2017-08-02 02:32:45,316] {base_hook.py:67} INFO - Using connection to: jdbc:teradata://myservername.mycompanyname.org/database=MYDBNAME,TMODE=ANSI,CHARSET=UTF8
[2017-08-02 02:32:45,497] {models.py:1417} ERROR - java.lang.RuntimeException: Class com.teradata.jdbc.TeraDriver not found
What am I doing wrong?
The appropriate way to input multiple JARs on the connections page is to separate the fully qualified paths with a comma, which you did above.
I can confirm this is the approach I took and it worked (Airflow 10.1.1 and 10.1.2).
See: https://github.com/apache/airflow/blob/master/airflow/hooks/jdbc_hook.py#L51
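For illustration, and assuming the JDBC connection form's Driver Path and Driver Class fields, the values would look something like this (paths taken from the question):
Driver Path: /home/airflow/java_sample/tdgssconfig.jar,/home/airflow/java_sample/terajdbc4.jar
Driver Class: com.teradata.jdbc.TeraDriver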
Bonus: if you use Ad Hoc Query in Data Profiling to test it out, you'll notice that you get an error when you send a SELECT statement, because Airflow wraps it in a LIMIT clause, which Teradata doesn't support.
The solution provided by my team member was to merge the two JARs into a single JAR file. After doing so and pointing the driver path at that new JAR file, it worked as expected.
Here is the link to the JAR file: https://github.com/alexisrolland/linux-setup/blob/master/teradataDriverJdbc.jar
Here is a code snippet example to use the connection in a SQLSensor Task:
# Import path for Airflow 1.x (the version used here); newer Airflow versions
# expose SqlSensor from a different module.
from airflow.operators.sensors import SqlSensor

CheckRunningBatch = SqlSensor(
    task_id='check_running_batch',
    conn_id='ed_data_quality_edw_dev',
    sql="SELECT CASE WHEN MAX(BATCH_KEY) IS NOT NULL THEN 0 ELSE 1 END FROM DATABASE.AUDIT_BATCH WHERE STATUS_KEY=1;",
    poke_interval=300,
    dag=dag)

Hive Browser Throwing Error

I am trying to run a basic query in the Hive editor in the Hue browser, but it returns the following error, whereas my Hive CLI works fine and is able to execute queries. Could someone help me?
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=1,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='r\t\x80\xac\x1a\xa0K\xf8\xa4\xa0\x85?\x03!\x88\xa9',
guid='\x852\x0c\x87b\x7fJ\xe2\x9f\xee\x00\xc9\xeeo\x06\xbc')),
orientation=4, maxRows=-1):
TFetchResultsResp(status=TStatus(errorCode=0, errorMessage="Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]",
sqlState=None,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]:24:23",
'org.apache.hive.service.cli.operation.OperationManager:getOperationLogRowSet:OperationManager.java:229',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:687',
'sun.reflect.GeneratedMethodAccessor14:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy19:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'], statusCode=3), results=None,
hasMoreRows=None)
This error could be due either to HiveServer2 not running or to Hue not having access to hive_conf_dir.
Check whether HiveServer2 has been started and is running. It uses port 10000 by default.
netstat -ntpl | grep 10000
If it is not running, start HiveServer2:
$HIVE_HOME/bin/hiveserver2
Also check the Hue configuration file hue.ini. The hive_conf_dir property must be set under the [beeswax] section. If it is not set, add this property under [beeswax]:
hive_conf_dir=$HIVE_HOME/conf
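For reference, the relevant part of hue.ini would look something like this (a sketch; the host, port, and conf directory are assumptions to adapt to your install):
[beeswax]
  hive_server_host=localhost
  hive_server_port=10000
  # Directory containing hive-site.xml
  hive_conf_dir=/usr/local/hive/conf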
Restart supervisor after making these changes.

LogStash::ConfigurationError: com.mysql.jdbc.Driver not loaded

When I use the logstash-input-jdbc plugin to sync MySQL with my local Elasticsearch, the errors below appear. I have searched for a long time, but I have not found a solution so far.
./logstash -f ./logstash_jdbc_test/jdbc.conf
Pipeline aborted due to error {:exception=>#,
:backtrace=>["/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/plugin_mixins/jdbc.rb:156:in
prepare_jdbc_connection'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/inputs/jdbc.rb:167:in
register'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:330:in
start_inputs'", "org/jruby/RubyArray.java:1613:ineach'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:329:in
start_inputs'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:180:in
start_workers'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:136:in
run'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/agent.rb:465:in
start_pipeline'"], :level=>:error}
Yesterday, I found the reason.
The reason is:
In my install path /elasticsearch-jdbc-2.3.2.0/lib, the size of mysql-connector-java-5.1.38.jar was zero.
So I downloaded a new mysql-connector-java-5.1.38.jar and copied it to /elasticsearch-jdbc-2.3.2.0/lib.
That resolved my problem.
Now I can sync data between MySQL and Elasticsearch quickly.
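For anyone with a similar setup, a minimal logstash-input-jdbc configuration along these lines (the connection string, credentials, table, and index are placeholders) is what points the plugin at the driver JAR:
input {
  jdbc {
    # Must point at a valid, non-empty Connector/J JAR
    jdbc_driver_library => "/elasticsearch-jdbc-2.3.2.0/lib/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * FROM my_table"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
  }
}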

Hive issue using yarn

I am running Hive SQL on YARN and it throws an error with a join condition. I am able to create external as well as internal tables, but it fails when I create a table with a statement such as
create table <table_name> AS SELECT name from student;
When running the same query through the Hive CLI it works fine, but with a Spring job it throws this error:
2016-03-28 04:26:50,692 [Thread-17] WARN
org.apache.hadoop.hive.shims.HadoopShimsSecure - Can't fetch tasklog:
TaskLogServlet is not supported in MR2 mode.
Task with the most failures(4):
-----
Task ID:
task_1458863269455_90083_m_000638
-----
Diagnostic Messages for this Task:
AttemptID:attempt_1458863269455_90083_m_000638_3 Timed out after 1 secs
2016-03-28 04:26:50,842 [main] INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application
application_1458863269455_90083
2016-03-28 04:26:50,849 [main] ERROR com.mapr.fs.MapRFileSystem - Failed to
delete path maprfs:/home/pro/amit/warehouse/scratdir/hive_2016-03-28_04-
24-32_038_8553676376881087939-1/_task_tmp.-mr-10003, error: No such file or
directory (2)
2016-03-28 04:26:50,852 [main] ERROR org.apache.hadoop.hive.ql.Driver -
FAILED: Execution Error, return code 2 from
As per my findings, I think there is some issue with the scratch dir (scratdir).
Kindly suggest if anyone has faced the same issue.
This issue occurs if the nested directory does not exist; Hive does not automatically create directories recursively.
Please check the existence of the directories from the root down to the child/table level, as sketched below.
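As a hedged example (the path is the scratch dir from the error above; adjust it to your warehouse location), you can verify and create the full path like this:
# Check whether every level of the scratch directory path exists
hadoop fs -ls /home/pro/amit/warehouse/scratdir
# Create the full path recursively if any level is missing
hadoop fs -mkdir -p /home/pro/amit/warehouse/scratdir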
I faced a similar issue while running the Hive query below:
select * from <db_name>.<internal_tbl_name> where <field_name_of_double_type> in (<list_of_double_values>) order by <list_of_order_fields> limit 10;
I performed an EXPLAIN on the above statement, and below was the result:
fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2d00001590]: it still exists.
2017-05-08 04:26:37,969 WARN [41289638-cd53-4d4b-88c9-3359e9ec99e2 main] fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2700001591]: it still exists.
Time taken: 0.886 seconds, Fetched: 24 row(s)
And I checked the logs through:
yarn logs -applicationId application_1458863269455_90083
The error happened after a MapR upgrade by the admin team. It is probably due to some upgrade or installation issue together with the Tez configuration (as suggested by the log line below), or possibly the Hive query syntax does not work with the Tez optimization. I say so because another Hive query on an external table runs fine in my case. I have to check a bit deeper, though.
Though I'm not sure, the error line in the logs that looks most relevant is as follows:
2017-05-08 00:01:47,873 [ERROR] [main] |web.WebUIService|: Tez UI History URL is not set
Solution:
It is probably happening due to some open files or applications that are using some resources. Please check https://unix.stackexchange.com/questions/11238/how-to-get-over-device-or-resource-busy
1. You can run explain <your_Hive_statement>.
2. In the resulting execution plan, you may come across the filenames/dirs that the Hive execution engine fails to delete, e.g.:
2017-05-08 04:26:37,969 WARN [41289638-cd53-4d4b-88c9-3359e9ec99e2 main] fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2d00001590]: it still exists.
3. Go to the path given in step 2, e.g. /hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/
4. In that path, running ls -a or lsof +D /path will show the open process IDs blocking the files from being deleted.
5. If you run ps -ef | grep <pid>, you get:
hive_username <pid> 19463 1 05:19 pts/8 00:00:35 /opt/mapr/tools/jdk1.7.0_51/jre/bin/java -Xmx256m -Dhiveserver2.auth=PAM -Dhiveserver2.authentication.pam.services=login -Dmapr_sec_enabled=true -Dhadoop.login=maprsasl -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.id.str=hive_username -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dlog4j.configurationFile=hive-log4j2.properties -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/opt/mapr/hive/hive-2.1/bin/../conf/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.saslprovider=com.mapr.security.maprsasl.MaprSaslProvider -Djavax.net.ssl.trustStore=/opt/mapr/conf/ssl_truststore org.apache.hadoop.util.RunJar /opt/mapr/hive/hive-2.1//lib/hive-cli-2.1.1-mapr-1703.jar org.apache.hadoop.hive.cli.CliDriver
CONCLUSION:
The HiveCliDriver shows that running "Hive on Spark" (or managed) tables through the Hive CLI is no longer supported from Hive 2.0 onwards and is going to be deprecated going forward. You have to use HiveContext in Spark for running Hive queries, but you can still run queries on Hive external tables through the Hive CLI.
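As a hedged illustration of that Spark route, using SparkSession (the Spark 2.x successor to HiveContext; the table name is the one from the question, everything else is an assumption):
# Minimal PySpark sketch: query a Hive-managed table through Spark instead of the Hive CLI.
# Assumes Spark was built with Hive support and hive-site.xml is on its classpath.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-query-example")
         .enableHiveSupport()
         .getOrCreate())

df = spark.sql("SELECT name FROM student LIMIT 10")
df.show()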
