Use Flume to stream webpage data to HDFS - hadoop

I have a 3-node cluster using the latest Cloudera parcels for version 5.9. The OS is CentOS 6.7 on all three of them. I am using Flume for the first time.
My purpose is to stream webpage data into HDFS. However, this webpage belongs to a third-party website (a news site in my case), so I don't know which port to use to connect.
Curl and telnet worked on port 80, hence I used it, but I am getting an error.
My Flume.conf is:
tier1.sources = http-source
tier1.channels = mem-channel-1
tier1.sinks = hdfs-sink
tier1.sources.http-source.type = http
tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler
tier1.sources.http-source.bind = 132.247.1.32
tier1.sources.http-source.port = 80
tier1.sources.http-source.channels = mem-channel-1
tier1.channels.mem-channel-1.type = memory
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = mem-channel-1
tier1.sinks.hdfs-sink.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.mem-channel-1.capacity = 100
Error
2016-12-19 16:45:00,353 WARN org.mortbay.log: failed SelectChannelConnector#132.247.1.32:80: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 WARN org.mortbay.log: failed Server#36772002: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. Exception follows.
java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)
2016-12-19 16:45:00,364 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:IDLE} } - Exception follows.
java.lang.RuntimeException: java.net.BindException: Cannot assign requested address
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.Server.doStart(Server.java:235)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Try changing the source config as below:
httpagent.sources.http-source.port = 80
httpagent.sources.http-source.bind = localhost
httpagent.sources.http-source.url = 132.247.1.32
Note: If 132.247.1.32 doesn't work, try giving the hostname.

The HTTP source for Flume provides a way to use GET/POST to send data into a Flume agent. The HTTP source does not go out and fetch data from websites for you; it sets up an HTTP server, waits for GET/POST requests, and accepts the data sent to that endpoint.
My recommendation would be to create a custom source that fetches the webpage you require, along the lines of the sketch below.
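Here is a minimal, hedged sketch of such a pollable custom source, assuming Flume 1.6+ as shipped with CDH 5.x. The class name WebPageSource and its url property are my own illustrative choices, not anything built into Flume.

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

// Illustrative custom source: periodically fetches one URL and pushes the
// page body into the channel as a single Flume event.
public class WebPageSource extends AbstractSource implements Configurable, PollableSource {

    private String url;

    @Override
    public void configure(Context context) {
        // e.g. tier1.sources.web-source.url = http://some.news.site/page
        url = context.getString("url");
    }

    @Override
    public Status process() throws EventDeliveryException {
        try (InputStream in = new URL(url).openStream()) {
            // Read the whole page body into memory.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            // Hand the page body to the configured channel(s) as one event.
            getChannelProcessor().processEvent(EventBuilder.withBody(buf.toByteArray()));
            return Status.READY;
        } catch (Exception e) {
            return Status.BACKOFF; // back off and retry on any fetch failure
        }
    }

    // Required by the PollableSource interface in Flume 1.6+.
    @Override
    public long getBackOffSleepIncrement() { return 1000L; }

    @Override
    public long getMaxBackOffSleepInterval() { return 5000L; }
}

You would then point the agent at this class instead of the built-in HTTP source, e.g. tier1.sources.http-source.type = WebPageSource (use the fully qualified class name if you put it in a package), and make sure its jar is on the Flume agent's classpath.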

Related

failed to launch apache.spark.master

Whenever I run the start-master.sh command on my local machine, I get the following error. Can someone please help me fix this issue?
Terminal Error
The error I get in the terminal:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.0.1-bin-hadoop2.6/logs/spark-andani-org.apache.spark.deploy.master.Master-1-andani.sakha.com.out
failed to launch org.apache.spark.deploy.master.Master:
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
Log Error
If I check the Spark log file, the following is the error:
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkMaster' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
The error occurs because your Spark master was not able to bind to your internal IP.
Check whether your /etc/hosts file points to the proper hostname; your previous IP address might have changed.
Reconfigure it and run the command once again. An example entry is shown below.
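For illustration only, a minimal /etc/hosts entry for the host named in the log file above might look like this; the IP address is a placeholder that must be replaced with your machine's current address:

192.168.1.10   andani.sakha.com   andani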

com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect

I have 2 nodes in a Cassandra cluster, with IP:port aa.aaa.a.aaa:9043 and xx.xxx.x.xxx:9043. When I try to connect using the following config:
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 4)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 200)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 200);
cluster = Cluster.builder()
.addContactPointsWithPorts(socketAddressList)
.withPoolingOptions(poolingOptions)
.withRetryPolicy(DefaultRetryPolicy.INSTANCE)
.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy())).build();
Session session = cluster.connect(cassandraDB);
I am getting the following exception:
16/01/14 09:52:45 INFO core.NettyUtil: Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16/01/14 09:52:46 WARN core.Cluster: You listed /xx.xxx.x.xxx:9043 in your contact points, but it could not be reached at startup
16/01/14 09:52:47 INFO policies.DCAwareRoundRobinPolicy: Using data-center name 'name' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
com.google.common.util.concurrent.Futures$CombinedFuture setExceptionAndMaybeLog
SEVERE: input future failed.
com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156)
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /xx.xxx.x.xxx:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 6 more
16/01/14 09:52:47 ERROR core.Session: Error creating pool to /xx.xxx.x.xxx:9042
com.datastax.driver.core.TransportException: [/xx.xxx.x.xxx:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156)
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /xx.xxx.x.xxx:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
My questions are:
Why is it trying to connect on port 9042, when I am not using this port anywhere in my code or config file?
cassandra version: Cassandra 2.2.1
Why is it trying to connect on port 9042, when I am not using this port anywhere in my code or config file?
9042 is the default port for the CQL binary protocol.
Can you show us the content of the variable socketAddressList that you passed to the cluster builder?
Is there any reason you're using port 9043 instead of the default 9042 port? If the nodes really listen on 9043, socketAddressList needs to carry that port explicitly, roughly as in the sketch below.
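A hedged sketch, assuming Java driver 2.1.x (which matches the DCAwareRoundRobinPolicy constructor used above); the addresses are the redacted placeholders from the question. Contact points added without an explicit port, and peers the driver discovers on its own, are tried on the default 9042, which would explain the errors above.

import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

import com.datastax.driver.core.Cluster;

public class ContactPoints {
    public static void main(String[] args) {
        // Replace the placeholders with the real node addresses.
        List<InetSocketAddress> socketAddressList = Arrays.asList(
                new InetSocketAddress("aa.aaa.a.aaa", 9043),
                new InetSocketAddress("xx.xxx.x.xxx", 9043));

        Cluster cluster = Cluster.builder()
                .addContactPointsWithPorts(socketAddressList) // port taken from each entry
                .withPort(9043) // port used for nodes the driver discovers by itself
                .build();
    }
}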

Error writing event data into HDFS through flume

I am using the CDH3 Update 4 tarball for development purposes. I have Hadoop up and running. I also downloaded the matching Flume tarball from Cloudera (version 1.1.0) and tried writing the tail of a log file into HDFS using the HDFS sink. When I run the Flume agent, it starts okay but ends in an error when it attempts to write new event data into HDFS. I couldn't find a better place to post this question than Stack Overflow.
Here is the Flume configuration I am using:
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
agent.sources.exec-source.type=exec
agent.sources.exec-source.command=tail -F /locationoffile
agent.sinks.hdfs-sink.type=hdfs
agent.sinks.hdfs-sink.hdfs.path=hdfs://localhost:8020/flume
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess
agent.channels.ch1.type=memory
agent.channels.ch1.capacity=1000
agent.sources.exec-source.channels=ch1
agent.sinks.hdfs-sink.channel=ch1
Also, this is a small snippet of the error that gets displayed in the console when Flume receives new event data and tries writing it into HDFS.
13/03/16 17:59:21 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/user/hdfs-user/flume/apacheaccess.1363436060424.tmp
13/03/16 17:59:22 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "sumit-HP-Pavilion-dv3-Notebook-PC/127.0.0.1"; destination host is: "localhost":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1215)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1173)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:272)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:261)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:78)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:805)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1060)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:369)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:65)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:49)
at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:190)
at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:157)
at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:154)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:154)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:316)
at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:718)
at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:715)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:71)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:861)
at org.apache.hadoop.ipc.Client.call(Client.java:1141)
... 37 more
13/03/16 17:59:27 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/user/hdfs-user/flume/apacheaccess.1363436060425.tmp
13/03/16 17:59:27 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "sumit-HP-Pavilion-dv3-Notebook-PC/127.0.0.1"; destination host is: "localhost":8020;
As people on the Cloudera mailing list suggest, there are two probable reasons for this error:
1. HDFS safemode is turned on. Try running hadoop dfsadmin -safemode leave and see if the error goes away.
2. The Flume and Hadoop versions are mismatched. To check this, replace the hadoop-core.jar in the flume/lib directory with the one found in Hadoop's installation folder.

Server returns 403 during secondary namenode docheckpoint with namenode

I am configuring Hadoop on a cluster.
All nodes started successfully, but the secondary namenode failed doCheckpoint with the following log:
2011-10-25 11:09:07,207 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2011-10-25 11:09:07,208 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Server returned HTTP response code: 403 for URL: https://name.node.http:50470/getimage?getimage=1
at sun.reflect.GeneratedConstructorAccessor24.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:234)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:364)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:353)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:353)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:438)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:329)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:288)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:337)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1110)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:285)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://name.node.http:50470/getimage?getimage=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2308)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getHeaderField(HttpsURLConnectionImpl.java:271)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
... 14 more
It seems the namenode rejects the request of the secondary namenode with HTTP error code 403.
Kerberos is configured for Hadoop, and authentication passes on the namenode side, which accepts the request of the secondary namenode:
2011-10-25 11:27:40,033 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successfull for hadoop/secondarynamenode@MY.DOMAIN.COM
2011-10-25 11:27:40,100 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successfull for hadoop/secondarynamenode@MY.DOMAIN.COM for protocol=interface org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol
2011-10-25 11:27:40,101 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 123.58.169.92
Does anyone know how that could happen? How can I fix it?
Thanks very much.
I think it's more appropriate to move my comment above here as an answer.
This error is caused by the _HOST macro in the secondary namenode principal in hdfs-site.xml: if dfs.secondary.http.address is not set in hdfs-site.xml, _HOST is translated by whichever process uses it.
In this case, the code runs on the namenode, so _HOST resolves to the namenode's address. Since a Kerberos principal is composed of a name, a hostname, and a realm, that yields a different principal, which is why authentication failed. The sketch below shows the relevant settings.
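A hedged sketch of the hdfs-site.xml entries involved, assuming a Hadoop 1.x-era secure configuration; the hostname is a placeholder and the realm is taken from the log above:

<property>
  <name>dfs.secondary.http.address</name>
  <value>secondarynamenode.my.domain.com:50090</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hadoop/_HOST@MY.DOMAIN.COM</value>
</property>

With dfs.secondary.http.address set to the secondary namenode's own address, _HOST in its principal can be resolved to the correct host instead of to the machine doing the lookup.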

Cassandra Upgrade 0.8.2->0.8.4 get error "failed connecting to all endpoints"

After upgrading Cassandra from 0.8.2 to 0.8.4, I got this error.
I have restarted Cassandra, removed data, etc.; nothing helps.
I have 6 identical machines in the cloud; before the upgrade it was working fine.
If I run netstat, it shows port 9160 listening.
nodetool ... ring responds with all 6 machines UP.
What could be the problem? :(
Exception in thread "main" java.io.IOException: Could not get input splits
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at WordCount.run(Unknown Source)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: failed connecting to all endpoints slave1/98.188.69.242
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
... 7 more
Caused by: java.io.IOException: failed connecting to all endpoints slave1/98.188.69.242
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:234)
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I don't know whether you have solved the problem. I ran into the same problem as yours (same configuration as yours too) and tried to solve it.
Problem location:
public List call() throws Exception {
    ...
    List tokens = getSubSplits(keyspace, cfName, range, conf);
    ...
In the method getSubSplits, when calling the method createConnection(host, ConfigHelper.getRpcPort(conf), true), the format of host is not right: it is sometimes hostname/10.197.34.111 (hostname plus IP address), so createConnection will fail. We need to extract the IP address and then call createConnection, roughly as in the sketch below.
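A minimal sketch of that workaround; the class and helper name are mine, not part of the Cassandra source. It strips the hostname/ prefix that InetAddress.toString() produces, so that only the IP address is passed on:

public class HostAddressFix {

    // "slave1/98.188.69.242" -> "98.188.69.242"
    static String extractIp(String host) {
        int slash = host.indexOf('/');
        return slash >= 0 ? host.substring(slash + 1) : host;
    }

    public static void main(String[] args) {
        System.out.println(extractIp("slave1/98.188.69.242")); // prints 98.188.69.242
    }
}

Inside getSubSplits, the call would then look roughly like createConnection(extractIp(host), ConfigHelper.getRpcPort(conf), true).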
You can try changing the code along these lines and running your Hadoop job again.
Good luck!
