Can't copy file into HDFS - hadoop

I have problem with HDSF.
I can't copy any files into it, but I have ample space in DataNodes?
Maybe I have some bad configuration?

You should provide specific details like the exception you get, steps you follow etc, Since you have not specified any information at all, i would say check for the config files to make sure you have all the required entries in corresponding files :
In core-site.xml you should have
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipaddress:port</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/$user/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
Similarly hdfs-site.xml should have
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/$user/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/$user/hdfs/name</value>
</property>
</configuration>
And finally the mapred-site.xml should have
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ip:port</value>
</property>
</configuration>
Hope this helps.

Related

Hadoop can't execute a basic Example

The software I'm using:
System:macOS Mojave 10.14.2
Hadoop:3.1.1
JDK:10.0.2
I execute this command:hadoop jar /usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 5, it failed:
I need help, thank you!!!
In hadoop-env.sh, I just add the sentence:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-10.0.2.jdk/Contents/Home
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
I solved it.
It's the reason of java version.
When I added the two lines of code to yarn-env.sh, it didnt't work for me.
export YARN_RESOURCEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
export YARN_NODEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
In the end, I change the java version, set it to java8 and deleted above two lines of code, it worked for me.
You can set it in hadoop-env.sh.
Thx

Apache Kylin not able to load models/configuration

I'm new to hadoop,hive, hbase and kylin. I tried to install thoose first three, and it's seems to be working.
After that I tried to install apache kylin, run the sample.sh and success.
After running the script I restart and open the web interface. Some page cannot be opened ex: /cube, /models, /admin/config
The problem is: I can see there are 5 tables created in hive, and also 2 cubes created. But when I open in web gui, the models is in loading-state and I cannot build the cube.
When I try to build the cube
I cannot find any infomative log (Or maybe there is one, but I don't know about it)
kylin.log
https://pastebin.com/TUZkQepa
hadoop-hadoop-namenode-master.log
https://pastebin.com/T8eNt3PY
hadoop-hadoop-secondarynamenode-master.log
https://pastebin.com/iMJDNFfU
yarn-hadoop-resourcemanager-master.log
https://pastebin.com/TGwJWTRF
hbase-hadoop-zookeeper-master.log
https://pastebin.com/Ym6eky5h
hbase-hadoop-master-master.log
https://pastebin.com/p1ygfw4W
Here is the configuration for hadoop
(yarn-site.xml)
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configuration for hbase
regionservers
slave2
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/datadir</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave2</value>
</property>
</configuration>
Configuration for hive
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>gwudainget</value>
<description>password for connecting to mysql server</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>
</configuration>
For kylin, I use default configuration, because I don't really know what to do with the kylin configuration.
What i use:
hadoop 2.7.5 binary
hbase 1.2.6 binary
hive 1.2.2 binary
kylin 2.2.0 source (I just add logs)

Unable to download and upload files from hdfs in namenode ui

I am getting the following error while uploading a file in user interface of http://awsip:50070/explorer.html#
Error:
Couldn't upload the file abc.zip.
core-site.xml :
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:8020</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.
</description>
</property>
</configuration>
hdfs-site.xml :
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/app/hadoop/tmp/namenode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/app/hadoop/tmp/datanode</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>0.0.0.0:8020</value>
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value>0.0.0.0</value>
</property>
</configuration>

Hadoop : DataNode change directory not taking effect

We are using hadoop 2.7.3 changed the hdfs-site.xml to point to new directory provided permissions on new directory too ...and ran start-dfs.sh and stop-dfs.sh ..on name node ...but changes are not taking effect it still points to the old directory ...
Am I missing anything while doing the configuration changes? And how can we make sure to use the new directory?
it's a multi node cluster
this is the hdfs-site.xml on name node
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///test/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///tmp/hadoop/data</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>2368709120</value>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>1.0</value>
</property>
</configuration>
this is the hdfs-site.xml under data node
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///test/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///tmp/hadoop/data</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>2368709120</value>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>1.0</value>
</property>
</configuration>

SecondaryNameNode Error - Lock on /app/hadoop/tmp/dfs/namesecondary/in_use.lock acquired by nodename

I'm just starting with Hadoop, facing issues in starting SecondaryNameNode(SNN). I could see below error from the logs
Error:
2015-10-28 00:26:58,495 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /app/hadoop/tmp/dfs/namesecondary/in_use.lock acquired by nodename 10496#sam-Notebook
Below are my conf files, is this because both NameNode and SNN tries to access/use same tmp directory?
hdfs-site.xml
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/sam/hadoop/dfs/data/</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/sam/hadoop/dfs/name/</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<!-- <value>localhost:54311</value> -->
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
PS: I used article as a reference to install hadoop
over looked, the actual error was "2015-10-27 23:34:21,320 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint java.io.IOException: Inconsistent checkpoint fields." To fix it, delete the namesecondary directory under /app/hadoop/tmp/dfs and restarted hadoop

Resources