Spark 1.3 cannot read data from HDFS - hadoop

I'm using Spark 1.3.1 and trying to read data from HDFS as follows:
val sc = new SparkContext(sparkConf)
val lines = sc.textFile("hdfs://192.168.0.104:9000/cur/part-r-02554")
I get the following exception:
Exception in thread "main" java.io.IOException: Failed on local exception:
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group
tag did not match expected tag.; Host Details : local host is:
"hadoop104/192.1168.1.104"; destination host is: "hadoop104":9000;

Look for the property fs.defaultFS or fs.default.name in your core-site.xml, and check whether 192.168.0.104 is configured as the value rather than a hostname.
If a hostname is configured as the value, this is bound to give you an error, as it is matched very strictly. Either use exactly what is configured in core-site.xml, or don't use an IP/hostname at all and just go ahead with hdfs:/cur/part-r-02554
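For reference, here is a minimal sketch (Scala/Spark, assuming core-site.xml is on the classpath) that resolves fs.defaultFS at runtime instead of hard-coding an IP, so the URI always matches the cluster configuration:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("read-hdfs"))

// hadoopConfiguration picks up core-site.xml from the classpath, so this
// returns whatever fs.defaultFS is actually set to, e.g. "hdfs://hadoop104:9000"
val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS")

val lines = sc.textFile(defaultFs + "/cur/part-r-02554")
println(lines.count())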

Related

JMeter - Listen to a port on a given IP

I want to run a JMeter test which listens on a port at a given IP and prints the messages being sent to that port. I have tried using this:
SocketAddress inetSocketAddress = new InetSocketAddress(InetAddress.getByName("<client ipAddress>"), <port number>);
def server = new ServerSocket();
server.bind(inetSocketAddress);
while (true) {
    server.accept { socket ->
        log.info('Someone is connected')
        socket.withStreams { input, output ->
            def line = input.newReader().readLine()
            log.info('Received message: ' + line)
        }
        log.info("Connection processed")
    }
}
But this is giving me the error: "Cannot assign requested address: JVM_Bind".
Is there any alternate way to approach this? Or what changes do I need to make for the current approach to work?
You copied and pasted this code from the right place and it should work just fine. Evidence:
as per the BindException documentation
Signals that an error occurred while attempting to bind a socket to a local address and port. Typically, the port is in use, or the requested local address could not be assigned.
So I can think of 2 options:
Your <client ipAddress> is not correct and cannot be resolved.
Something is already running on the <port number>; you cannot have 2 applications listening on the same port. The first one will succeed and the other will fail. (A quick bind sanity check is sketched below.)
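To narrow it down, you can try binding first in isolation and log the exact failure. A minimal sketch (Scala here for illustration; the same java.net calls work from the JMeter Groovy sampler; host and port are placeholders):

import java.net.{BindException, InetAddress, InetSocketAddress, ServerSocket}

// Placeholder values; substitute the address/port from your test plan.
val host = "127.0.0.1"
val port = 12345

val server = new ServerSocket()
try {
  // Throws BindException if the address is not local to this machine
  // or the port is already in use.
  server.bind(new InetSocketAddress(InetAddress.getByName(host), port))
  println(s"Successfully bound to $host:$port")
} catch {
  case e: BindException => println(s"Cannot bind to $host:$port -> ${e.getMessage}")
} finally {
  server.close()
}

If the bind succeeds in isolation, the problem is almost certainly the first option above: the IP in the script is not an address of the machine running JMeter.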
More information:
Fixing java.net.BindException: Cannot assign requested address: JVM_Bind in Tomcat, Jetty
Apache Groovy - Why and How You Should Use It

What's the meaning of "java.sql.SQLRecoverableException: I/O-Error: Unknown host specified" if a TNS alias is specified?

Using the Oracle JDBC driver with a TNS alias instead of host:port:SID, à la
jdbc:oracle:thin:@TNS_ALIAS
you may get this error message:
java.sql.SQLRecoverableException: I/O-Error: Unknown host specified
while calling
java.sql.DriverManager.getConnection
But there is no problem with the hostname specified by the TNS alias.
Sadly, this error message does not point to the real reason:
the error occurs if the driver cannot find the tnsnames.ora config file.
Solution
You have to ensure that the system property oracle.net.tns_admin is set before connecting and points to the directory containing tnsnames.ora.
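A minimal sketch (Scala; the directory path, alias, and credentials are placeholders) of setting the property before connecting:

import java.sql.DriverManager

// Must be set before the first getConnection call; point it at the
// directory that contains tnsnames.ora (path here is a placeholder).
System.setProperty("oracle.net.tns_admin", "/opt/oracle/network/admin")

val conn = DriverManager.getConnection(
  "jdbc:oracle:thin:@MY_TNS_ALIAS", // alias resolved via tnsnames.ora
  "user",
  "password")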

/hbase/meta-region-server because node does not exist (not an error)

I'm running HBase in cluster mode and I'm getting the following error:
DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil - catalogtracker-on-hconnection-0x6e704bd0x0, quorum=node2:2181, baseZNode=/hbase Set watcher on znode that does not yet exist, /hbase/meta-region-server
I had a similar error and got it resolved by doing the following:
1) Making sure the HBase client version is compatible with the HBase version on the cluster.
2) Adding hbase-site.xml to your application classpath, so that the HBase client determines all the appropriate HBase configurations from it.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin

val conf = HBaseConfiguration.create()
// Instead of the following settings, pass hbase-site.xml in the classpath
// conf.set("hbase.zookeeper.quorum", hbaseHost)
// conf.set("hbase.zookeeper.property.clientPort", hbasePort)
HBaseAdmin.checkHBaseAvailable(conf)
log.debug("HBase found! with conf " + conf)

Error in Accumulo's tablet server when scanning for data

I have an Accumulo setup with one master and 2 tablet servers containing a bunch of tables storing millions of records. The problem is that whenever I scan the tables to get a few records out, the tablet server logs keep throwing this error:
2015-11-12 04:38:56,107 [hdfs.DFSClient] WARN : Failed to connect to /192.168.250.12:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 received exception java.io.IOException: Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
java.io.IOException: Got error, status message opReadBlock BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 received exception java.io.IOException: Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.accumulo.core.file.rfile.bcfile.Utils$Version.<init>(Utils.java:264)
at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:823)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:843)
at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:279)
at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:322)
at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214)
at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1976)
at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2093)
at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)
I think this is more of an HDFS-related issue than an Accumulo one, so I checked the DataNode logs and found the same message,
Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
but logged as INFO. What I don't understand is why I am getting this error at all.
I can see that the pool name of the file I am trying to access (BP-1881591466-192.168.1.111-1438767154643) contains an IP address (192.168.1.111) which does not match the IP address of either of the servers involved (self and remote). In fact, 192.168.1.111 was the old IP address of the Hadoop master server, but I have since changed it. I use domain names instead of IP addresses, so the only place I made the change was in the hosts files of the machines in the cluster. None of the Hadoop/Accumulo configurations use IP addresses. Does anyone know what the issue is here? I have spent days on it and still cannot figure it out.
The error you are receiving indicates that Accumulo cannot read part of one of its files from HDFS. The NameNode is reporting that a block is located on a particular DataNode (in your case, 192.168.250.12). However, when Accumulo attempts to read from that DataNode, it fails.
This likely indicates a corrupt block in HDFS or a temporary network issue. You can try running hadoop fsck / (the exact command may vary depending on the version) to perform a health check of HDFS.
Also, the IP address mismatch appears to indicate that the DataNode is confused about the HDFS pool it is a part of. You should restart that DataNode after double-checking its configuration, DNS, and /etc/hosts for any anomalies.
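If you want to see where the NameNode thinks the blocks live, here is a hedged sketch (Scala, Hadoop FileSystem API; assumes the cluster configuration is on the classpath; the file path is taken from the log above):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val path = new Path("/accumulo/tables/1/default_tablet/F0000gne.rf")

val status = fs.getFileStatus(path)
val blocks = fs.getFileBlockLocations(status, 0, status.getLen)

// A reported host that no longer resolves (e.g. the master's old IP)
// points at stale metadata or a misconfigured DataNode.
blocks.foreach { b =>
  println(s"offset=${b.getOffset} len=${b.getLength} hosts=${b.getHosts.mkString(", ")}")
}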

Setting elasticsearch properties in spark-submit

I'm trying to launch Spark jobs that use Elasticsearch input from the command line using spark-submit, as described in http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html
I'm setting the properties in a file, but when launching spark-submit it gives the following warnings:
~/spark-1.0.1-bin-hadoop1/bin/spark-submit --class Main --properties-file spark.conf SparkES.jar
Warning: Ignoring non-spark config property: es.resource=myresource
Warning: Ignoring non-spark config property: es.nodes=mynode
Warning: Ignoring non-spark config property: es.query=myquery
...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed
My config file looks like (with correct values):
es.nodes nodeip:port
es.resource index/type
es.query query
Setting the properties in the Configuration object in the code works, but I need to avoid this workaround.
Is there a way to set those properties via command line?
I don't know if you resolved your issue (if so, how?), but I found this solution:
import org.elasticsearch.spark.rdd.EsSpark
EsSpark.saveToEs(rdd, "spark/docs", Map("es.nodes" -> "10.0.5.151"))
Bye
When you pass a config file to spark-submit, it only loads configs whose keys start with 'spark.'
So, in my config I simply use
spark.es.nodes <es-ip>
and in the code itself I have to do
val conf = new SparkConf()
conf.set("es.nodes", conf.get("spark.es.nodes"))
