How to write log4j log files directly to HDFS? - hadoop

How can I write log4j log files directly to the Hadoop Distributed File System, without using Flume, Scribe, or Kafka? Is there any other way?
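One possible approach (an untested sketch, not a stock log4j feature) is a custom Log4j 1.x appender that pushes each formatted event to HDFS through the Hadoop FileSystem API. The class name, namenode URI, and output path below are placeholders.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Hypothetical custom appender: each log event is formatted with the
// configured layout and written straight to a file in HDFS.
public class HdfsAppender extends AppenderSkeleton {

    private FSDataOutputStream out;

    @Override
    public void activateOptions() {
        try {
            // Placeholder namenode URI and target path.
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:8020"), new Configuration());
            out = fs.create(new Path("/logs/app.log"), true /* overwrite */);
        } catch (IOException e) {
            errorHandler.error("Cannot open HDFS log file", e, 0);
        }
    }

    @Override
    protected void append(LoggingEvent event) {
        try {
            out.writeBytes(layout.format(event));
            out.hflush(); // push the bytes out to the datanodes
        } catch (IOException e) {
            errorHandler.error("Cannot write to HDFS", e, 0);
        }
    }

    @Override
    public void close() {
        try {
            if (out != null) out.close();
        } catch (IOException e) {
            errorHandler.error("Cannot close HDFS log file", e, 0);
        }
    }

    @Override
    public boolean requiresLayout() {
        return true;
    }
}

You would point log4j at this class in log4j.properties like any other appender. Keep in mind that HDFS favors large sequential writes, so in practice you would want to buffer events and roll files periodically rather than flushing every line.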

Related

Is there any way to log to a file directly in Parquet format using log4j?

I need to write log output directly in Parquet format using Log4j for some analytics requirements. Is there any way we can do that directly?
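There is no built-in Parquet appender in Log4j as far as I know, but as a rough sketch you could adapt the custom-appender idea using parquet-avro's AvroParquetWriter. The schema, class name, and path below are illustrative, and a real implementation would need to roll files, since Parquet only writes its footer when a file is closed.

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// Hypothetical appender that turns each log event into an Avro record
// and appends it to a Parquet file via parquet-avro.
public class ParquetAppender extends AppenderSkeleton {

    // Illustrative schema: timestamp, level, message.
    private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"LogEvent\",\"fields\":["
          + "{\"name\":\"ts\",\"type\":\"long\"},"
          + "{\"name\":\"level\",\"type\":\"string\"},"
          + "{\"name\":\"message\",\"type\":\"string\"}]}");

    private ParquetWriter<GenericRecord> writer;

    @Override
    public void activateOptions() {
        try {
            writer = AvroParquetWriter.<GenericRecord>builder(
                        new Path("hdfs://namenode:8020/logs/app.parquet")) // placeholder path
                    .withSchema(SCHEMA)
                    .withConf(new Configuration())
                    .build();
        } catch (IOException e) {
            errorHandler.error("Cannot open Parquet log file", e, 0);
        }
    }

    @Override
    protected void append(LoggingEvent event) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("ts", event.getTimeStamp());
        record.put("level", event.getLevel().toString());
        record.put("message", event.getRenderedMessage());
        try {
            writer.write(record);
        } catch (IOException e) {
            errorHandler.error("Cannot write Parquet record", e, 0);
        }
    }

    @Override
    public void close() {
        try {
            if (writer != null) writer.close(); // footer is written here
        } catch (IOException e) {
            errorHandler.error("Cannot close Parquet log file", e, 0);
        }
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }
}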

How HBase add its dependency jars and use HADOOP_CLASSPATH

48. HBase, MapReduce, and the CLASSPATH
By default, MapReduce jobs deployed to a MapReduce cluster do not have access to either the HBase configuration under $HBASE_CONF_DIR or the HBase classes.
To give the MapReduce jobs the access they need, you could add hbase-site.xml to $HADOOP_HOME/conf and add HBase jars to the $HADOOP_HOME/lib directory. You would then need to copy these changes across your cluster. Or you could edit $HADOOP_HOME/conf/hadoop-env.sh and add HBase dependencies to the HADOOP_CLASSPATH variable. Neither of these approaches is recommended because it will pollute your Hadoop install with HBase references. It also requires you to restart the Hadoop cluster before Hadoop can use the HBase data.
The recommended approach is to let HBase add its dependency jars and use HADOOP_CLASSPATH or -libjars.
I'm learning how HBase interacts with MapReduce.
I know what the two discouraged approaches above mean, but I don't know how to configure the recommended one.
Could anyone tell me how to set it up in the recommended way?
As the docs show, prior to running hadoop jar you can export HADOOP_CLASSPATH=$(hbase classpath), and you can use hadoop jar ... -libjars [...]
The truly recommended way, though, is to bundle your HBase dependencies as an uber JAR inside your MapReduce application.
The only caveat is that you need to ensure your project uses the same (or a compatible) hbase-mapreduce client version as the server.
That way you don't need any extra configuration, except maybe specifying the hbase-site.xml.
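For illustration, here is a minimal driver sketch of that approach; TableMapReduceUtil.initTableMapperJob calls addDependencyJars under the hood, so the HBase client jars travel with the job via the distributed cache. The table name, mapper, and output path are placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyHBaseJob {

    // Placeholder mapper: emits one count per scanned row.
    static class RowCountMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text("rows"), new IntWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the application classpath.
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "my-hbase-job");
        job.setJarByClass(MyHBaseJob.class);

        // Configures the scan over the table and ships the HBase jars
        // with the job (addDependencyJars is applied for you), so no
        // HADOOP_CLASSPATH or lib/ edits are needed on the cluster nodes.
        TableMapReduceUtil.initTableMapperJob(
                "my_table",          // placeholder table name
                new Scan(),
                RowCountMapper.class,
                Text.class,
                IntWritable.class,
                job);

        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0])); // placeholder output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The JVM that submits the job still needs the HBase classes on its own classpath (for example via the uber JAR), but the cluster nodes receive them through the distributed cache.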

Spring batch read from parquet file

I am trying to read a Parquet file in a Spring Batch job and write it to JDBC. Is there any sample code for a reader bean which can be used with the Spring Batch StepBuilderFactory?
Spring for Apache Hadoop has capabilities for reading and writing Parquet files. You can read more about that project here: https://spring.io/projects/spring-hadoop
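If Spring for Apache Hadoop is not an option, another approach (just a sketch, not an official API) is a small custom ItemReader that wraps parquet-avro's AvroParquetReader. The class name and constructor argument below are illustrative.

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStreamException;

// Hypothetical reader bean: streams GenericRecords out of a Parquet file,
// one per read() call, until the file is exhausted.
public class ParquetItemReader implements ItemReader<GenericRecord> {

    private final ParquetReader<GenericRecord> reader;

    public ParquetItemReader(String path) {
        try {
            reader = AvroParquetReader.<GenericRecord>builder(new Path(path)).build();
        } catch (IOException e) {
            throw new ItemStreamException("Cannot open Parquet file " + path, e);
        }
    }

    @Override
    public GenericRecord read() throws IOException {
        // Spring Batch treats a null return as end of input.
        return reader.read();
    }
}

You could then plug it into a chunk-oriented step built with StepBuilderFactory and pair it with a JdbcBatchItemWriter for the JDBC side.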

Avro Schema Generation in HDFS

I have a set of Avro files in HDFS, and I need to generate Avro schema files for those data files. I tried researching a Spark-based approach (https://github.com/databricks/spark-avro/blob/master/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala).
Is there any other way than bringing an Avro data file to the local filesystem and doing an HDFS put?
Any suggestions are welcome. Thanks!
Every Avro file embeds the Avro schema it was written with. You can extract this schema using avro-tools.jar (download it from Maven). You can download just one part file (assuming all the other files were written with the same schema) and use avro-tools (java -jar ~/workspace/avro-tools-1.7.7.jar getschema xxx.avro) to extract it.
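If you would rather not pull a part file down at all, the schema can also be read straight from HDFS, since it lives in each file's header. A small sketch using Avro's FsInput (the path is a placeholder):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PrintAvroSchema {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Placeholder path to one part file; all parts share the same schema.
        Path path = new Path("hdfs://namenode:8020/data/part-00000.avro");

        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                new FsInput(path, conf), new GenericDatumReader<GenericRecord>())) {
            Schema schema = reader.getSchema();        // schema stored in the file header
            System.out.println(schema.toString(true)); // pretty-printed JSON
        }
    }
}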

Pig accessing HBase using Spring Data Hadoop

Has anyone got experience of using Spring Data Hadoop to run a Pig script that connects to HBase using Elephant Bird's HBaseLoader?
I'm new to all of the above, but I need to take some existing Pig scripts that were executed via a shell script and instead wrap them up in a self-contained Java application. Currently the scripts are run from a specific server that has Hadoop, HBase and Pig installed, with the config for all of the above under /etc/. Pig has the HBase config on its classpath, so I'm guessing this is how it knows how to connect to HBase.
I want to have all configuration in Spring. Is this possible if I need Pig to connect to HBase? How do I configure HBase such that the Pig script and the Elephant Bird library will know how to connect to it?
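I have not tried this through Spring Data Hadoop specifically, but the underlying idea is that Pig (and Elephant Bird's HBaseLoader) pick up the HBase connection from the job configuration, so you can supply the ZooKeeper quorum as explicit properties instead of relying on files under /etc. Below is a rough plain-Pig sketch (host names and script path are placeholders); Spring for Apache Hadoop's Pig support should accept the same properties when you configure its Pig factory.

import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigHBaseRunner {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // The same settings that would normally come from hbase-site.xml
        // and core-site.xml on the classpath.
        props.setProperty("hbase.zookeeper.quorum", "zk1,zk2,zk3");       // placeholder hosts
        props.setProperty("hbase.zookeeper.property.clientPort", "2181");
        props.setProperty("fs.defaultFS", "hdfs://namenode:8020");        // placeholder namenode

        PigServer pig = new PigServer(ExecType.MAPREDUCE, props);
        try {
            pig.registerScript("/path/to/script.pig"); // placeholder script path
        } finally {
            pig.shutdown();
        }
    }
}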
