Sqoop import keeps running indefinitely on a simple import - hadoop
I was trying to execute a simple Sqoop import on my single-node Hadoop cluster on localhost, but the sqoop import command keeps running and never finishes. However, when I execute a sqoop eval command it works fine. Can anybody help me? The command and its output are below:
hdoop@satyam-VirtualBox:~$ sqoop import \
--connect "jdbc:mysql://localhost:3306/hdoop?useSSL=false" \
--username root \
--password root \
--table emp \
--target-dir /user/sqoop/emp
Warning: /home/hdoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hdoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hdoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hdoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/home/hdoop/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: invalid variable name
/home/hdoop/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: invalid variable name
2023-02-17 14:05:40,223 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2023-02-17 14:05:40,321 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2023-02-17 14:05:40,437 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2023-02-17 14:05:40,437 INFO tool.CodeGenTool: Beginning code generation
2023-02-17 14:05:40,988 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
2023-02-17 14:05:41,167 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
2023-02-17 14:05:41,182 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hdoop/hadoop-3.2.1
Note: /tmp/sqoop-hdoop/compile/a51ad891661499fbe0ae5f8f1561fe1a/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2023-02-17 14:05:46,831 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdoop/compile/a51ad891661499fbe0ae5f8f1561fe1a/emp.jar
2023-02-17 14:05:46,858 WARN manager.MySQLManager: It looks like you are importing from mysql.
2023-02-17 14:05:46,858 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2023-02-17 14:05:46,858 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2023-02-17 14:05:46,859 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2023-02-17 14:05:46,869 INFO mapreduce.ImportJobBase: Beginning import of emp
2023-02-17 14:05:46,870 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2023-02-17 14:05:47,038 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2023-02-17 14:05:49,801 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2023-02-17 14:05:49,954 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
2023-02-17 14:05:51,144 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hdoop/.staging/job_1676616824652_0005
2023-02-17 14:05:51,358 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,595 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,745 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,869 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:51,972 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,046 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,117 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,170 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,260 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:52,404 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:53,116 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:53,486 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:57,153 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:57,979 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:58,260 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:58,953 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:59,259 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:59,498 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:05:59,881 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:00,090 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:00,324 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:01,008 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:01,349 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:03,067 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:04,058 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:04,821 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:05,708 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:06,831 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:07,953 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:08,450 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:09,294 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:09,970 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:13,460 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:13,555 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,020 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,075 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,154 INFO db.DBInputFormat: Using read commited transaction isolation
2023-02-17 14:06:14,154 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(eno), MAX(eno) FROM emp
2023-02-17 14:06:14,163 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 1 to: 8
2023-02-17 14:06:14,216 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,275 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,306 INFO mapreduce.JobSubmitter: number of splits:4
2023-02-17 14:06:14,692 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-17 14:06:14,726 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1676616824652_0005
2023-02-17 14:06:14,726 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-02-17 14:06:15,076 INFO conf.Configuration: resource-types.xml not found
2023-02-17 14:06:15,077 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-02-17 14:06:15,386 INFO impl.YarnClientImpl: Submitted application application_1676616824652_0005
2023-02-17 14:06:15,442 INFO mapreduce.Job: The url to track the job: http://satyam-VirtualBox:8088/proxy/application_1676616824652_0005/
2023-02-17 14:06:15,444 INFO mapreduce.Job: Running job: job_1676616824652_0005
Note: after this it just keeps running and produces no further output. I tried clearing the /tmp files after stopping all services and then restarting them, but nothing worked.
I also tried restarting the virtual machine, but that did not help either. Please help me get the Sqoop import to run smoothly.
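For reference, a minimal diagnostic sketch (assuming the application ID from the output above and the stock Hadoop 3.x yarn CLI) is to check whether YARN ever allocates containers for the job, since a job that hangs at "Running job:" is often stuck in the ACCEPTED state waiting for resources:

# Check the application state and whether any NodeManager is registered
yarn application -list -appStates ALL
yarn node -list -all

# Once the application has a running attempt, inspect its logs
yarn logs -applicationId application_1676616824652_0005

If no NodeManager shows up or the application never leaves the ACCEPTED state, the YARN and MapReduce memory settings in yarn-site.xml and mapred-site.xml are usually worth checking on a single-node VirtualBox setup.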