parquet version used to write a file - hadoop

is there a way to find out what parquet version was used to write a parquet file in HDFS?
I'm trying to see if various files were written using the same parquet version or different versions.

$ hadoop jar parquet-tools-1.9.0.jar meta my-parquet-file.parquet |grep "parquet-mr version"
creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

Another option, using parquet-tools:
parquet-tools meta --debug file.parquet
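To compare many files at once, the same parquet-tools call can be looped over a directory listing. A sketch, assuming parquet-tools-1.9.0.jar sits in the working directory and /data is a placeholder HDFS path (`hadoop fs -ls -C`, which prints bare paths, needs a reasonably recent Hadoop):

```shell
# Print the writer ("creator") of every Parquet file under an HDFS path.
# /data and the jar name are examples — adjust to your environment.
if command -v hadoop >/dev/null; then
  for f in $(hadoop fs -ls -C /data/*.parquet); do
    printf '%s  ' "$f"
    hadoop jar parquet-tools-1.9.0.jar meta "$f" | grep 'creator:'
  done
fi
```

Files written by different versions then show different creator lines, e.g. parquet-mr version 1.8.1 vs 1.9.0.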

Related

Where is the bin directory in Hadoop project dir?

I have been following the official documentation as well as DigitalOcean's tutorial, but I could not follow their lead. Each of them suggests editing a file and then running Hadoop:
etc/hadoop/hadoop-env.sh in the dist directory.
I am unable to find such a file anywhere in the extracted directory, in the latest stable release as well as in the 2.7.7 release.
Where is the etc/hadoop/hadoop-env.sh ?
The paths in the guides may not match the actual bundle layout. Run find over the extracted tarball and see where the file actually lives.
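For example, from the top of the extracted directory you can let find answer the question instead of trusting the guide's paths (a sketch; run it wherever the tarball was unpacked):

```shell
# Search the extracted distribution for hadoop-env.sh; in a binary
# distribution it turns up under etc/hadoop/.
find . -name hadoop-env.sh
```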

When import file, it reads it as CSV and garbles the data

Running sparkling-shell (tried versions 2.2.2 through 2.2.6) with Spark 2 (under CDH 5.13 on Linux 7.2). CSV and ZIP files import fine, but when I try to import a Parquet file, it is read as CSV and the data is garbled.
Does anyone have any suggestions?
Shankar
Sparkling Water 2.2.7 seems to work better. However, it was looking for the h2o.jar file for its parsers, so I installed the latest version of h2o and then modified the sparkling-shell script to include the h2o.jar file when launching spark-shell.
Shankar.
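A sketch of that workaround, with hypothetical paths (the h2o.jar location and the launcher path depend on your install; spark-shell, which sparkling-shell wraps, accepts --jars):

```shell
# Pass a locally installed h2o.jar to the shell instead of editing the
# launcher by hand. Both paths below are examples.
H2O_JAR=/opt/h2o-3/h2o.jar
if [ -x bin/sparkling-shell ]; then
  bin/sparkling-shell --jars "$H2O_JAR"
fi
```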

Hadoop Installation - Directory structure - /etc/hadoop

I downloaded Hadoop 2.7, and every installation guide I have found mentions an /etc/hadoop/.. directory, but the distribution I downloaded doesn't have it.
I tried with Hadoop 2.6 as well and it doesn't have this directory either.
Should I create these directories ?
Caveat: I am a complete newbie!
Thanks in advance.
It seems you have downloaded the source. Build the Hadoop source and then you will get that folder.
To build the Hadoop source, refer to the BUILDING.txt file available in the Hadoop package that you downloaded.
Try downloading hadoop-2.6.0.tar.gz instead of hadoop-2.6.0-src.tar.gz from the Hadoop archive. As Kumar mentioned, you probably have the source distribution.
If you don't want to compile Hadoop from source, download hadoop-2.6.0.tar.gz from the link given above.
Try downloading the compiled Hadoop from the link below instead of the source:
http://a.mbbsindia.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
You will find the /etc/hadoop/.. path in it.
Download it from the Apache website (Apache Hadoop link).
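A quick way to tell the two tarballs apart before unpacking, sketched with an example file name: a binary distribution lists etc/hadoop/ near the top level, while a source distribution ships BUILDING.txt and pom.xml instead.

```shell
# Report whether hadoop-2.6.0.tar.gz (example name) is a binary or a
# source distribution by looking for etc/hadoop/ in its listing.
tar -tzf hadoop-2.6.0.tar.gz | grep -m1 'etc/hadoop/' \
  && echo "binary distribution" \
  || echo "source distribution (build it first)"
```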

Sqoop Installation with hadoop 2.2.0?

I am trying to install all the Apache Hadoop components on my system. I have installed hadoop-2.2.0, hive-0.11.0, pig-0.12.0, and hbase-0.96.0, and now it's time to install Sqoop. Please suggest installation steps for a Sqoop version that is compatible with hadoop-2.2.0 and HBase.
Hoping for a reply soon; thanks in advance.
@Naveen: the link you have provided is for Sqoop2, which is not specific to the Hadoop 2.0 branch. Sqoop2 tries to enhance Sqoop by moving to a client-server design (its major promises include ease of use, ease of extension, and security). For more details, see this video on Sqoop2: https://www.youtube.com/watch?v=hg683-GOWP4.
You can use the latest Sqoop from the ASF (version 1.4.4, which has a library compiled for Hadoop 2.0, or 1.4.5). Just download the build for the Hadoop 2.0 branch. For example, sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz can be downloaded and used without any issue on Hadoop 2.0+ versions.
If you can't find a Sqoop build for Hadoop 2.0+ on the ASF site (I assume you are using a version earlier than 1.4.4), you would have to recompile the Sqoop source for the Hadoop 2.0 branch. That isn't necessary, though, since you can just use the latest Sqoop version, which supports Hadoop 2.0. (Don't expect a production-ready Sqoop for Hadoop 2 yet; the recent builds are still in the alpha phase.)
I haven't tried Sqoop2 yet. With its new enhancements, it should also help across Hadoop versions 1.0 and 2.0.
Thank you
Try these steps for installing Sqoop with Hadoop 2.2.0:
https://sqoop.apache.org/docs/1.99.1/Installation.html
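The Sqoop 1.x setup itself is mostly unpacking a tarball and exporting environment variables. A sketch, with example paths and the 1.4.5 tarball name mentioned above (HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME are the variables Sqoop 1.x reads to find Hadoop):

```shell
# Unpack a Sqoop build compiled for the Hadoop 2.x branch and wire it
# to a local Hadoop install. All paths here are examples.
SQOOP_TGZ=sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
if [ -f "$SQOOP_TGZ" ]; then
  tar -xzf "$SQOOP_TGZ"
fi
export SQOOP_HOME=$PWD/sqoop-1.4.5.bin__hadoop-2.0.4-alpha
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export PATH="$PATH:$SQOOP_HOME/bin"
```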

Java error while installing Hive

I want to install Hive and Hadoop on my Ubuntu machine. I followed this article and everything seemed fine until the last step, when running the command produced a Java error like this:
/home/babak/Downloads/hadoop/bin/../bin/hadoop: line 258: /usr/lib/j2sdk1.5-sun/bin/java: No such file or directory
What should I do to solve this problem?
You need to find where Java is installed on your machine:
which java
and then from there follow any symlinks or wrapper scripts to the actual location of the java executable.
An easier way is to run the file indexer and then locate the file (here I use the jps executable, which is in the same folder as java):
#> sudo updatedb
#> locate jps
Whatever you get back, trim off the bin/jps suffix, and that's your JAVA_HOME value. If you can't find the executable, then you'll need to install Java.
Hadoop requires Java version 1.6 or higher, but it seems your Hadoop is looking for Java 1.5. Also, make sure the JAVA_HOME variable is set in the conf/hadoop-env.sh file.
I have a line like the following in mine:
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
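The steps above (which java, follow the symlinks, trim the bin suffix) can be collapsed into a few lines; a sketch assuming a Linux readlink that supports -f:

```shell
# Resolve the real java binary behind any symlink chain and derive
# JAVA_HOME from it; the echoed value is what goes in conf/hadoop-env.sh.
if command -v java >/dev/null; then
  JAVA_BIN=$(readlink -f "$(command -v java)")
  export JAVA_HOME=${JAVA_BIN%/bin/java}
  echo "JAVA_HOME=$JAVA_HOME"
fi
```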
