Unix Shell Script as UDF for Pig and Hive - shell

Can we use a Unix shell script instead of Java or Python for user-defined functions (UDFs) in Apache Pig and Hive?
If it is possible, how do we reference it in a Hive query or a Pig script?

No, you can't use a Unix shell script as a Pig UDF. Pig UDFs can currently be written in six languages: Java, Jython, Python, JavaScript, Ruby and Groovy.
Please refer to this link for more details:
http://pig.apache.org/docs/r0.14.0/udf.html
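For reference, here is a minimal sketch of a Pig UDF in one of the supported languages (Java), closely following the style of the example in the Pig documentation; the package, class and relation names are illustrative only:

package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// A trivial EvalFunc UDF that upper-cases its first argument.
public class Upper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}

// In the Pig script, after packaging the class into myudfs.jar:
//   REGISTER myudfs.jar;
//   B = FOREACH A GENERATE myudfs.Upper(name);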

Related

Is Hive, Pig or Impala used from the command line only?

I am new to Hadoop and have this confusion. Can you please help?
Q. How are Hive, Pig and Impala used in practical projects? Are they used from the command line only, or from within Java, Scala, etc.?
One can use Hive and Pig from the command line, or run scripts written in their respective languages.
Of course it is possible to call (or build) these scripts in any way you like, so you could have a Java program build a Pig command on the fly and execute it.
The Hive (and Pig) languages are typically used to talk to a Hive database. Besides this, it is also possible to talk to the Hive database over a JDBC/ODBC connection. This can be done from virtually anywhere, so you could let a Java program make a JDBC connection to your Hive tables.
Within the context of this answer, I believe everything I said about the Hive language also applies to Impala.
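For illustration, a minimal sketch of the JDBC route mentioned above, assuming HiveServer2 is listening on its default port 10000 and the Hive JDBC driver is on the classpath; the host, user and table names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Older driver versions need to be loaded explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<database>
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();
        // Query a Hive table just like any other SQL database.
        ResultSet rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}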

Spark & HCatalog?

I feel comfortable loading HCatalog using Pig and was wondering if it's possible to use Spark instead of Pig. Unfortunately, I'm quite new to Spark...
Can you provide any materials on how to start? Are there any Spark libraries to use?
Any examples? I've done all the exercises on http://spark.apache.org/, but they focus on RDDs and don't go any further.
I will be grateful for any help...
Regards
Pawel
You can use Spark SQL to read from the Hive table instead of HCatalog.
https://spark.apache.org/sql/
You can apply the same transformations as in Pig using Spark's Java/Scala/Python APIs: filter, join, group by, and so on.
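A minimal sketch of that approach in Java, assuming a recent Spark version (SparkSession) built with Hive support, and illustrative table/column names:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveTableWithSparkSql {
    public static void main(String[] args) {
        // enableHiveSupport() lets Spark SQL read the Hive metastore directly,
        // so no HCatalog loader is needed.
        SparkSession spark = SparkSession.builder()
                .appName("hive-table-example")
                .enableHiveSupport()
                .getOrCreate();

        // Read a Hive table and apply Pig-style transformations.
        Dataset<Row> orders = spark.sql("SELECT * FROM default.orders");
        Dataset<Row> summary = orders
                .filter("amount > 100")    // like FILTER in Pig
                .groupBy("customer_id")    // like GROUP in Pig
                .count();
        summary.show();

        spark.stop();
    }
}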
You can reference the following link for an HCatalog InputFormat wrapper for Spark, which was written prior to Spark SQL.
https://gist.github.com/granturing/7201912
Our systems have both loaded and we can use either. Spark takes on traits of the language you are using (Scala, Python, ...). For example, using Spark with Python you can utilize many Python libraries within Spark.

Process a .pst file from Apache Pig

I am trying to process a .pst file from Apache Pig. Do I need to write a user-defined function to read the .pst? How can I process/extract a .pst the Pig way? Is there any UDF or Java library available which I can use in my Pig script? Any help would be greatly appreciated.
Thanks,
Vamshikrishna
I'm not aware of any existing support for .pst files, so probably yes, you need to write a UDF, and more precisely a load UDF (a LoadFunc).
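Here is a rough skeleton of what such a load UDF could look like, only to show where the .pst parsing would plug in; the InputFormat choice and the parsing itself are placeholders, and the class name is illustrative:

import java.io.IOException;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;

public class PstLoader extends LoadFunc {
    private RecordReader reader;

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        // A real implementation needs an InputFormat that understands the binary
        // .pst layout (or treats each file as one unsplittable record).
        return new TextInputFormat(); // placeholder only
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        // Placeholder: parse the next message out of the .pst here (for example
        // with a Java PST parsing library) and return its fields as a Tuple.
        return null; // returning null tells Pig there is no more data
    }
}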

Can PL/SQL Reliably Be Converted to Pig Latin or an Oozie Pipeline with Pig Latin and Hive

I am curious about replacing my Oracle db with Hadoop and am learning about the Hadoop ecosystem.
I have many PL/SQL scripts that would require replacement if I were to go this route.
I am under the impression that with some hard work I would be able to convert/translate any PL/SQL script into an analogous Pig Latin script. If not only Pig Latin, then a combination of Hive and Pig via Oozie.
Is this correct?
While most SQL statements can be translated into equivalent Pig and/or Hive statements, there are several limitations inherent to the Hadoop filesystem that get passed down to the languages. The primary limitation is that HDFS is a write-once, read-many system. This means that a statement that includes something like an UPDATE or DELETE SQL command will not work, because both would require the language to change the contents of an already existing file, which would contradict the write-once paradigm of Hadoop.
There are, however, workarounds. Both commands can be simulated by copying the file in question and applying the changes while writing the copy, deleting the original, and moving the copy into the original's location. Neither Pig nor Hive has this functionality, so you would have to branch slightly out of these languages to do so. For instance, a few lines of bash could probably handle the deletion and movement of the copy once the Pig script has executed; given that you can use bash to call the Pig script in the first place, this allows for a fairly simple solution. Or you could look into HBase, which provides the ability to do something similar. However, both solutions involve things outside of Pig/Hive, so if you absolutely cannot go outside of those languages the answer is no.
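As an illustration of that copy-and-swap step, the same idea sketched with the HDFS Java API instead of bash; the paths are placeholders, and the Pig script is assumed to have already written the modified copy:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SwapAfterRewrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path original = new Path("/data/mytable");          // placeholder paths
        Path rewritten = new Path("/data/mytable_rewrite");

        // Simulate UPDATE/DELETE under HDFS's write-once model:
        // drop the original and move the rewritten copy into its place.
        fs.delete(original, true);
        fs.rename(rewritten, original);
        fs.close();
    }
}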
You can use PL/HQL - Procedural SQL on Hadoop, an open-source tool (Apache License 2.0) that implements a procedural SQL language for Apache Hive and other SQL-on-Hadoop implementations.
The PL/HQL language is compatible to a large extent with Oracle PL/SQL, ANSI/ISO SQL/PSM (IBM DB2, MySQL, Teradata, etc.), Teradata BTEQ, PostgreSQL PL/pgSQL (Netezza) and Transact-SQL (Microsoft SQL Server and Sybase), which allows you to leverage existing SQL/DWH skills and a familiar approach to implement data warehouse solutions on Hadoop. It also facilitates migration of existing business logic to Hadoop.

FastExport script for loading data into Hadoop?

I am new to the FastExport script.
Can anyone give me an example FastExport script for loading data from Teradata into Hadoop?
I need the entire script.
Thanks in advance.
AFAIK you can't FastExport directly from Teradata to Hadoop.
But there's a new Teradata Connector for Hadoop which supports both import and export.
http://developer.teradata.com/connectivity/articles/teradata-connector-for-hadoop-now-available
The tutorial shows sample scripts to export to Hadoop:
https://developer.teradata.com/sites/all/files/Teradata%20Connector%20for%20Hadoop%20Tutorial%20v1%200%20final.pdf

Resources