I am using Ubuntu 12.04, Hadoop 1.0.2, and Hive 0.10.0.
While reading about 1 million records from Hive, I got the error below for this query:
select * from raw_pos limit 10000;
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
So I installed Snappy for Hadoop in the $HADOOP_HOME/lib folder, which produced the files libsnappy.a, libsnappy.la, libsnappy.so, libsnappy.so.1, and libsnappy.so.1.1.4.
I also added hadoop-lzo-0.4.3.jar to $HADOOP_HOME/lib/ and made changes in core-site.xml and mapred-site.xml as follows.
core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/apache/hadoop-1.0.4/hadoop_temp/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:54311</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
But when I started Hive and ran show databases, it gave this error:
Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
Modify your core-site.xml to this and see if it helps:
<property>
<name>io.compression.codecs</name>
<value>com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
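If the codecs still fail to load, it may also help to make sure the Snappy native libraries are where the Hadoop 1.x native loader expects them. A minimal sketch, assuming a 64-bit Linux box and that the libraries were built into /usr/local/lib (adjust the paths to your install):
# Copy the Snappy shared libraries into the platform-specific native directory
cp /usr/local/lib/libsnappy.so* $HADOOP_HOME/lib/native/Linux-amd64-64/
# In $HADOOP_HOME/conf/hadoop-env.sh, keep that directory on java.library.path
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64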
Related
I'm trying to integrate Hadoop with Intel Lustre. I have added hadoop-lustre-plugin-3.1.0 to the hadoop-2.7.3/lib/native folder. Lustre is mounted at /mnt/lustre. I'm getting the following error when I start Hadoop using start-all.sh:
[root@master hadoop]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/04/06 17:36:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on [ ]
...
core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>lustre:///</value>
</property>
<property>
<name>fs.lustre.impl</name>
<value>org.apache.hadoop.fs.LustreFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.lustre.impl</name>
<value>org.apache.hadoop.fs.LustreFileSystemlustre</value>
</property>
<property>
<name>fs.lustrefs.mount</name>
<value>/mnt/lustre/hadoop</value>
<description>This is the directory on Lustre that acts as the root level for Hadoop services</description>
</property>
<property>
<name>lustre.stripe.count</name>
<value>1</value>
</property>
<property>
<name>lustre.stripe.size</name>
<value>4194304</value>
</property>
<property>
<name>fs.block.size</name>
<value>1073741824</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.job.map.output.collector.class</name>
<value>org.apache.hadoop.mapred.SharedFsPlugins$MapOutputBuffer</value>
</property>
<property>
<name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
<value>org.apache.hadoop.mapred.SharedFsPlugins$Shuffle</value>
</property>
hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/mnt/lustre/hadoop/hadoop_tmp/namenode</value>
<description>true</description>
</property>
Is there any configuration that I have missed in configuration files?
Since fs.defaultFS holds a Lustre-specific URI, the startup script cannot determine the host on which the NameNode has to be started.
Add this property to hdfs-site.xml:
<property>
<name>dfs.namenode.rpc-address</name>
<value>namenode_host:port</value>
</property>
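For example, if the NameNode is meant to run on a host called master (a hypothetical hostname here; 8020 is the conventional NameNode RPC port), the entry would be:
<property>
<name>dfs.namenode.rpc-address</name>
<value>master:8020</value>
</property>
With this set, start-dfs.sh should report the host instead of the empty list in "Starting namenodes on [ ]".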
I configured short-circuit read settings in both hdfs-site.xml and hbase-site.xml. Then I ran importtsv on HBase to import data from HDFS into HBase on the HBase cluster. I looked over the log on each datanode, and all datanodes have the ConnectException mentioned in the title.
2017-03-31 21:59:01,273 WARN [main] org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory: error creating DomainSocket
java.net.ConnectException: connect(2) error: No such file or directory when trying to connect to '50010'
at org.apache.hadoop.net.unix.DomainSocket.connect0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.connect(DomainSocket.java:250)
at org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.createSocket(DomainSocketFactory.java:164)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextDomainPeer(BlockReaderFactory.java:753)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:469)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:783)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:717)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:421)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:332)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:617)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:841)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:889)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:696)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
at org.apache.hadoop.io.Text.readString(Text.java:471)
at org.apache.hadoop.io.Text.readString(Text.java:464)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:751)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-03-31 21:59:01,277 WARN [main] org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: ShortCircuitCache(0x34f7234e): failed to load 1073750370_BP-642933002-"IP_ADDRESS"-1490774107737
EDIT
hadoop 2.6.4
hbase 1.2.3
hdfs-site.xml
<property>
<name>dfs.namenode.dir</name>
<value>/home/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/home/hadoop/hdfs/snn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/dn</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:50090</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>hadoop1:8020</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>50</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>50</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.block.local-path-access.user</name>
<value>hbase</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>775</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>_PORT</value>
</property>
<property>
<name>dfs.client.domain.socket.traffic</name>
<value>true</value>
</property>
hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop1/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>50</value>
</property>
<property>
<name>hfile.block.cache.size</name>
<value>0.5</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size</name>
<value>0.3</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size.lower.limit</name>
<value>0.65</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>_PORT</value>
</property>
Short-circuit reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNodes to communicate. You will need to set a path (not a port) for this socket, and the DataNode must be able to create it.
The parent directory of the path value (for example, /var/lib/hadoop-hdfs/) must exist and should be owned by the Hadoop superuser. Also make sure that no user other than the HDFS user or root has access to this path.
mkdir /var/lib/hadoop-hdfs/
chown hdfs_user:hdfs_user /var/lib/hadoop-hdfs/
chmod 750 /var/lib/hadoop-hdfs/
Add this property to hdfs-site.xml on all datanodes and clients.
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
Restart the services after making the changes.
Note: Paths under /var/run or /var/lib are commonly used.
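As a quick sanity check (assuming the path above), you can confirm that the DataNode actually created the socket after the restart:
# A UNIX domain socket (file type "s") should exist at the configured path
ls -l /var/lib/hadoop-hdfs/dn_socket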
Hi everyone,
I want to use GridGain with Hadoop 2.4.0. My Hadoop configuration is below.
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop-data</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>ggfs://ggfs#R</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/hadoop-data/journal</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>r,host002,host004</value>
</property>
<property>
<name>fs.AbstractFileSystem.ggfs.impl</name>
<value>org.gridgain.grid.ggfs.hadoop.v2.GridGgfsHadoopFileSystem</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>NEVER</value>
</property>
</configuration>
After finishing the setup and starting HDFS, I run
hadoop fs -ls /
and get this error:
ls: No FileSystem for scheme: ggfs
What should I do?
Thanks
Add the following to core-site.xml:
<property>
<name>fs.ggfs.impl</name>
<value>org.gridgain.grid.ggfs.hadoop.v1.GridGgfsHadoopFileSystem</value>
</property>
The second version of the Hadoop FileSystem API is rarely used; most of the Hadoop ecosystem works through the first version of the API.
Also, if you want to use GGFS only, you don't need to start the HDFS services.
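Once the v1 implementation is registered and the GridGain node hosting GGFS is running, the scheme should resolve. A minimal check, relying on the fs.defaultFS from the question:
# Should now list the GGFS root instead of failing with "No FileSystem for scheme: ggfs"
hadoop fs -ls /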
Hi, currently I'm running Hadoop 2.4.1. I have created a simple Java program, DefaultMapperClass.java, using Eclipse and packaged it into ex1.jar.
When I try to invoke this program via the Hadoop shell using the command
hadoop jar /home/Maddy/ex1.jar DefaultMapperClass hdfs://localhost/users/root/input/Hadoop.txt hdfs://localhost/users/root/output
I get the below output in the Hadoop shell:
[root@localhost Maddy]# hadoop jar /home/Maddy/ex1.jar DefaultMapperClass hdfs://localhost/users/root/input/Hadoop.txt hdfs://localhost/users/root/output
14/09/05 19:26:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Job started: Fri Sep 05 19:26:35 CDT 2014
14/09/05 19:26:35 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
[root@localhost Maddy]#
It seems like the Hadoop shell is trying to connect to the ResourceManager but is unsuccessful, yet there is no error message.
mapred-site.xml file:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
</configuration>
What is missing here? Why does execution terminate after attempting to connect to the ResourceManager?
I would suggest removing the following configurations from yarn-site.xml, as they are unnecessary:
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
You can access the ResourceManager web UI at localhost:8088.
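It may also help to confirm that the ResourceManager daemon is actually up before re-submitting the job. A quick check, assuming the default web UI port:
# The ResourceManager JVM should appear after start-yarn.sh
jps | grep ResourceManager
# The web UI on port 8088 should respond as well
curl -s http://localhost:8088/cluster > /dev/null && echo "RM web UI is up"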
I'm quite new to HBase clusters. I set up HBase in distributed mode and it starts fine, but when I run the hbase shell I can't create a table; an error is shown.
My hbase-site.xml configuration is:
<property>
<name>hbase.master</name>
<value>matser:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-namnode:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>hbase.zookeeper.property.clientport</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>usr/local/hbase/temp</value>
</property>
Could you please help me? Thanks in advance.
The version of HBase should be compatible with the Hadoop version. Downgrade HBase and it will work fine.
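To see exactly which versions are in play before picking a compatible HBase release, you can print them with the standard commands:
# Print the running Hadoop and HBase versions
hadoop version
hbase version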