Sqoop import keeps running indefinitely on a simple import - hadoop
I was trying to execute a simple Sqoop import on my single-node Hadoop cluster on localhost, but the sqoop import command keeps running and never finishes. However, when I execute a sqoop eval command it works fine. Can anybody help me? The command and its output are below:
hdoop@satyam-VirtualBox:~$ sqoop import \
--connect "jdbc:mysql://localhost:3306/hdoop?useSSL=false" \
--username root \
--password root \
--table emp \
--target-dir /user/sqoop/emp
Warning: /home/hdoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hdoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hdoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hdoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/home/hdoop/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: invalid variable name
/home/hdoop/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: invalid variable name
2023-02-17 14:05:40,223 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2023-02-17 14:05:40,321 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2023-02-17 14:05:40,437 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2023-02-17 14:05:40,437 INFO tool.CodeGenTool: Beginning code generation
2023-02-17 14:05:40,988 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
2023-02-17 14:05:41,167 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
2023-02-17 14:05:41,182 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hdoop/hadoop-3.2.1
Note: /tmp/sqoop-hdoop/compile/a51ad891661499fbe0ae5f8f1561fe1a/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2023-02-17 14:05:46,831 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdoop/compile/a51ad891661499fbe0ae5f8f1561fe1a/emp.jar
2023-02-17 14:05:46,858 WARN manager.MySQLManager: It looks like you are importing from mysql.
2023-02-17 14:05:46,858 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2023-02-17 14:05:46,858 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2023-02-17 14:05:46,859 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2023-02-17 14:05:46,869 INFO mapreduce.ImportJobBase: Beginning import of emp
2023-02-17 14:05:46,870 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2023-02-17 14:05:47,038 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2023-02-17 14:05:49,801 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2023-02-17 14:05:49,954 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
2023-02-17 14:05:51,144 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hdoop/.staging/job_1676616824652_0005
2023-02-17 14:05:51,358 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,595 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,745 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,869 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,972 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,046 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,117 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,170 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,260 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,404 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:53,116 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:53,486 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:57,153 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:57,979 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:58,260 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:58,953 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:59,259 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:59,498 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:59,881 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:00,090 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:00,324 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:01,008 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:01,349 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:03,067 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:04,058 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:04,821 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:05,708 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:06,831 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:07,953 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:08,450 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:09,294 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:09,970 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:13,460 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:13,555 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,020 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,075 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,154 INFO db.DBInputFormat: Using read commited transaction isolation
2023-02-17 14:06:14,154 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(eno), MAX(eno) FROM emp
2023-02-17 14:06:14,163 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 1 to: 8
2023-02-17 14:06:14,216 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,275 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,306 INFO mapreduce.JobSubmitter: number of splits:4
2023-02-17 14:06:14,692 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,726 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1676616824652_0005
2023-02-17 14:06:14,726 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-02-17 14:06:15,076 INFO conf.Configuration: resource-types.xml not found
2023-02-17 14:06:15,077 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-02-17 14:06:15,386 INFO impl.YarnClientImpl: Submitted application application_1676616824652_0005
2023-02-17 14:06:15,442 INFO mapreduce.Job: The url to track the job: http://satyam-VirtualBox:8088/proxy/application_1676616824652_0005/
2023-02-17 14:06:15,444 INFO mapreduce.Job: Running job: job_1676616824652_0005
Note: after this it just keeps running and produces no further output. I tried clearing the /tmp files after stopping all services and then restarting them, but nothing worked.
I also tried restarting the virtual machine, but that did not help either. Please help me get the Sqoop import to run smoothly.
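For reference, a minimal diagnostic sketch (assuming the application ID from the output above and the stock Hadoop 3.x yarn CLI) is to check whether YARN ever allocates containers for the job, since a job that hangs at "Running job:" is often stuck in the ACCEPTED state waiting for resources:

# Check the application state and whether any NodeManager is registered
yarn application -list -appStates ALL
yarn node -list -all

# Once the application has a running attempt, inspect its logs
yarn logs -applicationId application_1676616824652_0005

If no NodeManager shows up or the application never leaves the ACCEPTED state, the YARN and MapReduce memory settings in yarn-site.xml and mapred-site.xml are usually worth checking on a single-node VirtualBox setup.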