I am running the PySpark code below (PySpark version 1.6.0):
from pyspark.sql import HiveContext
from pyspark import SparkContext, SparkConf

if __name__ == '__main__':
    conf = SparkConf().setAppName('Testing')
    sc = SparkContext(conf=conf)
    hivec = HiveContext(sc)
    # Select the rows to export from the Hive table.
    df = hivec.sql("select * from product_replica where product_price>100")
    # Write the result to HDFS as Parquet (snappy compression requested), overwriting existing output.
    df.write.option("compression","snappy").mode("overwrite").save("/user/cloudera/practice1/problem8/product/output",format="parquet")
    sc.stop()
This is the output I get. Parquet files are created in the HDFS directory, but I cannot read them with parquet-tools; it reports that the .parquet file doesn't exist.
[cloudera@quickstart sparkTransformationsPractice]$ spark-submit hive_context.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.13.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/hive-jdbc-1.1.0-cdh5.13.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-format-2.1.0-cdh5.13.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-hadoop-bundle-1.5.0-cdh5.13.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-pig-bundle-1.5.0-cdh5.13.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
19/12/11 11:21:14 WARN thread.QueuedThreadPool: 2 threads could not be stopped
Can you please explain what this output means? I think I might be doing something wrong.
Related
I am using Hadoop 2.6.0 and Oozie 5.2.0.
I am trying to run the example MapReduce job in Oozie but get the error below.
hadoop1@ip-172-31-84-37:/usr/local/oozie-5.2.0/examples/target/examples/apps/map-reduce$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/oozie/oozie-5.2.0/embedded-oozie-server/webapp/WEB-INF/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/oozie/oozie-5.2.0/embedded-oozie-server/webapp/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/oozie/oozie-5.2.0/libext/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Error: E0307 : E0307: Runtime error [Could not check whether file [hdfs://localhost:8020/user/hadoop1/examples/apps/map-reduce/workflow.xml] exists on HDFS. Error message: Call From ip-172-31-84-37/172.31.84.37 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused]
I'm not sure whether I have to change any configuration. I tried giving the public DNS name in the job.properties file instead of localhost but still got the same error. Any help is appreciated.
The message "Call From ip-172-31-84-37/172.31.84.37 to localhost:8020 failed on connection exception" says that nothing accepted the connection on localhost:8020, so HDFS is most likely not running, or not listening on port 8020, on the machine where your Oozie server runs. Use the wiki page linked in the error to debug your network setup.
If HDFS runs at an external address, then fs.defaultFS in $HADOOP_CONF_DIR/core-site.xml needs to be updated with the correct address.
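To see which address the client configuration actually points at, and whether anything is listening there, a small Python sketch along these lines may help. It reads fs.defaultFS from the text of a core-site.xml and attempts a plain TCP connection. The function names (read_default_fs, can_connect) and the sample XML are illustrative assumptions and not part of Hadoop or Oozie.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical core-site.xml content; substitute the text of your real file.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>"""


def read_default_fs(xml_text):
    """Return (host, port) from fs.defaultFS, or None if the property is absent.

    port is None when the URI does not carry an explicit port.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.defaultFS":
            url = urlparse(prop.findtext("value", "").strip())
            return url.hostname, url.port
    return None


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = read_default_fs(SAMPLE_CORE_SITE)
    state = "reachable" if can_connect(host, port) else "not reachable"
    print(host, port, state)
```

A "not reachable" result for the configured address would be consistent with the "Connection refused" error above. A successful connection only shows that something is listening on that port, not that it is a healthy NameNode.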
I am trying to run Apache Kafka built from the git source, but I get a "multiple SLF4J bindings" warning, which leads to a failure when using Docker containers.
According to the documentation:
git clone --depth 1 --branch 3.1 https://github.com/apache/kafka.git
cd kafka
./gradlew clean
./gradlew dependencyUpdates
./gradlew jar -PscalaVersion=2.13.6
export UUID=$(./bin/kafka-storage.sh random-uuid); ./bin/kafka-storage.sh format -t $UUID -c ./config/kraft/server.properties
As a result I get:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/core/build/dependant-libs-2.13.6/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/tools/build/dependant-libs-2.13.6/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/trogdor/build/dependant-libs-2.13.6/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/api/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/transforms/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/runtime/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/file/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/mirror/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/mirror-client/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/json/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/basic-auth-extension/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/core/build/dependant-libs-2.13.6/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/tools/build/dependant-libs-2.13.6/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/trogdor/build/dependant-libs-2.13.6/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/api/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/transforms/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/runtime/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/file/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/mirror/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/mirror-client/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/json/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/connect/basic-auth-extension/build/dependant-libs/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Any ideas on how to solve this issue?
The only issue here is that the kafka-storage script is loading the same jar multiple times.
The versions aren't different, so there's no actual problem and the warning is harmless. The bin/kafka-run-class script should be doing proper classpath isolation.
You shouldn't need Gradle to run KRaft mode.
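To check whether a given SLF4J warning lists copies of one jar or genuinely different bindings, the "Found binding in" lines can be grouped by jar file name. The following Python sketch does that; the function names and the shortened sample log are illustrative assumptions.

```python
import re
from collections import defaultdict

# Captures the jar path inside "Found binding in [jar:file:<path>!/...]".
BINDING_RE = re.compile(r"Found binding in \[jar:file:(?P<jar>[^!\]]+)!")

# Two lines taken from the Kafka output above.
KAFKA_LOG = (
    "SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/core/build/dependant-libs-2.13.6/"
    "slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n"
    "SLF4J: Found binding in [jar:file:/home/ubuntu/kafka/tools/build/dependant-libs-2.13.6/"
    "slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n"
)


def group_bindings(log_text):
    """Map each binding jar's file name to the list of paths where it was found."""
    found = defaultdict(list)
    for match in BINDING_RE.finditer(log_text):
        path = match.group("jar")
        found[path.rsplit("/", 1)[-1]].append(path)
    return dict(found)


def same_binding_everywhere(log_text):
    """True when every reported binding is a copy of one jar (same name and version)."""
    return len(group_bindings(log_text)) == 1


if __name__ == "__main__":
    for jar, paths in group_bindings(KAFKA_LOG).items():
        print(jar, len(paths))
    print(same_binding_everywhere(KAFKA_LOG))
```

When the function reports more than one jar name, for example log4j-slf4j-impl next to slf4j-log4j12, the class path contains different bindings and SLF4J picks one of them, as the "Actual binding is of type" line shows.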
I am new to Hadoop. After installing Hive, when I enter the hive command in the command prompt, it gives me the following error. The installed Hadoop version is 3.1.3. Derby 10.12.1.1 is also installed.
C:\apache-hive-2.1.0-bin\bin>hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Connecting to jdbc:hive2://
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Beeline version 2.1.0 by Apache Hive
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Connection is already closed.
C:\apache-hive-2.1.0-bin\bin>
I use HDP 3.1 and deployed the Hadoop cluster and Hive with Ambari. After deployment, I can run hive in the shell successfully. I then deployed Apache Kylin 2.6, and it can sync Hive tables. But when I build the cube, I get the following error:
java.io.IOException: OS command error exit with return code: 1, error message: SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://datacenter1:2181,datacenter2:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/15 10:04:53 [main]: INFO jdbc.HiveConnection: Connected to datacenter3:10000
19/02/15 10:04:53 [main]: WARN jdbc.HiveConnection: Failed to connect to datacenter3:10000
19/02/15 10:04:53 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify dfs.replication at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Cannot run commands specified using -e. No current connection
The command is:
hive -e "USE default;
When I run the hive command in the shell, it succeeds. The connection string is the same as the one used when building the cube in Kylin. I'm confused about why it succeeds in the shell but fails when building the cube.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://datacenter1:2181,datacenter2:2181,datacenter3:2181/default;password=hdfs;serviceDiscoveryMode=zooKeeper;user=hdfs;zooKeeperNamespace=hiveserver2
19/02/15 12:10:19 [main]: INFO jdbc.HiveConnection: Connected to datacenter3:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
0: jdbc:hive2://datacenter1:2181,datacenter2:>
You can try to add these two properties to hive-site.xml.
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist</name>
  <value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
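Before changing hive-site.xml, it can be useful to check which parameter names a whitelist pattern would cover. The Python sketch below assumes that Hive treats the whitelist as a regular expression that must match the whole parameter name; that is an assumption about Hive's behaviour, so verify it against the documentation for your version. The EXTENDED_WHITELIST value is hypothetical and does not come from the answer above.

```python
import re

# Pattern copied from the two properties above.
SUGGESTED_WHITELIST = r"mapred.*|hive.*|mapreduce.*|spark.*"

# Hypothetical extension that also covers dfs.* names (an assumption, untested against Hive).
EXTENDED_WHITELIST = SUGGESTED_WHITELIST + r"|dfs.*"


def is_whitelisted(param, pattern):
    """Return True if the whole parameter name matches the whitelist pattern."""
    return re.fullmatch(pattern, param) is not None


if __name__ == "__main__":
    for name in ("hive.exec.parallel", "dfs.replication"):
        print(
            name,
            is_whitelisted(name, SUGGESTED_WHITELIST),
            is_whitelisted(name, EXTENDED_WHITELIST),
        )
```

Under that assumption, the pattern as written does not match dfs.replication, which is the parameter named in the error message.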
Finally, I found the root cause. The error log contains the message 'Cannot modify dfs.replication at runtime.' Kylin sets this property in $KYLIN_HOME/conf/kylin_hive_conf.xml, and when it runs the hive command it automatically appends the properties from that file. The final command looks like: hive --hiveconf dfs.replication=2 ..........
It looks like the dfs.replication property can't be appended to the hive command. I removed this property from kylin_hive_conf.xml, and it works now.
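For anyone who prefers to script that change, here is a minimal Python sketch that removes one property from a Hadoop-style configuration file. The sample XML is a hypothetical stand-in for kylin_hive_conf.xml, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of $KYLIN_HOME/conf/kylin_hive_conf.xml.
SAMPLE_CONF = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>hive.exec.compress.output</name>
    <value>true</value>
  </property>
</configuration>"""


def drop_property(xml_text, name):
    """Return the config XML with every <property> whose <name> equals `name` removed."""
    root = ET.fromstring(xml_text)
    for prop in list(root.findall("property")):
        if prop.findtext("name", "").strip() == name:
            root.remove(prop)
    return ET.tostring(root, encoding="unicode")


def property_names(xml_text):
    """List the property names present in the config XML, in document order."""
    root = ET.fromstring(xml_text)
    return [p.findtext("name", "").strip() for p in root.iter("property")]


if __name__ == "__main__":
    cleaned = drop_property(SAMPLE_CONF, "dfs.replication")
    print(property_names(cleaned))
```

ElementTree's default parser discards XML comments, so keep a backup of the original file if it contains comments you want to preserve.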
Today I installed sqoop-1.4.6 in a Hadoop cluster. Importing data into HDFS with Sqoop works, but the Sqoop log is very short and the MapReduce output is not written to the log file. I think the cause is on the Hadoop side, but I don't know how to resolve it.
The Sqoop log shows:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hbase/hbase-1.1.5/lib/phoenix-4.7.0-HBase-1.1-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hbase/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Note: /tmp/sqoop-hadoop/compile/737c3aeaaf2b0b38c618b5a7bd2b3411/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.