hadoop hive udf fails

I've written the following UDF:
ISO8601ToHiveFormat.java:
package hiveudfs;
import org.apache.hadoop.hive.ql.exec.UDF;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
public class ISO8601ToHiveFormat extends UDF {
    public String hourFromISO8601(final String d) {
        try {
            if (d == null)
                return null;
            SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            return sdf2.format(sdf1.parse(d));
        } catch (ParseException pe) {
            return null;
        }
    }
}
In the src folder of my project I ran the following command to compile it:
javac -cp /usr/lib/hive/lib/hive-exec-0.10.0-cdh4.3.0.jar ISO8601ToHiveFormat.java
and subsequently I packed it into a jar:
jar cf ../../HiveUDFs.jar hiveudfs/ISO8601ToHiveFormat.*
So, then I started hive and did:
hive> add jar /home/tom/Java/HiveUDFs.jar;
Added /home/tom/Java/HiveUDFs.jar to class path
Added resource: /home/tom/Java/HiveUDFs.jar
hive> create temporary function hourFromISO8601 as 'hiveudfs.ISO8601ToHiveFormat';
OK
Time taken: 0.083 seconds
hive> SELECT hourFromISO8601(logtimestamp) FROM mytable LIMIT 10;
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'logtimestamp': No matching method for class hiveudfs.ISO8601ToHiveFormat with (string). Possible choices:
hive>
The output of
hive> describe mytable;
OK
...
logtimestamp string
...
What am I doing wrong here?

toom - you have to implement the evaluate() method; only then does the UDF work:
public class yourclassname extends UDF {
    public String evaluate(your args) {
        // your computation logic
        return your_result;
    }
}

As ramisetty.vijay says, you need to override the evaluate() method. Note that you can provide multiple implementations of evaluate, with differing input parameters as well as differing return types.
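Applied to the UDF from the question, that means renaming hourFromISO8601 to evaluate (a minimal sketch based on the original code; everything else stays the same):

package hiveudfs;

import org.apache.hadoop.hive.ql.exec.UDF;
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ISO8601ToHiveFormat extends UDF {
    // Hive looks up a method named evaluate() by reflection
    public String evaluate(final String d) {
        if (d == null)
            return null;
        try {
            SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            return sdf2.format(sdf1.parse(d));
        } catch (ParseException pe) {
            return null;
        }
    }
}

After rebuilding the jar and re-adding it in Hive, the original query should then resolve the method:

hive> SELECT hourFromISO8601(logtimestamp) FROM mytable LIMIT 10;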

Related

Add dependence of the project into my custom gradle plugin

I'm new to the Gradle world and I'm writing a personal plugin for executing operations on a database, for example: create a database, delete a database, create a table, and insert a value into a database. My problem is importing a dependency from the project that uses my plugin: for example, to create a database via JDBC I need the JDBC driver for that database, and this driver lives in the main project.
My question is: how do I get the database's dependency jar inside my Gradle plugin?
This is my code
package io.vincentpalazzo.gradledatabase.task;
import io.vincentpalazzo.gradledatabase.exstension.GradleDatabaseExstension;
import io.vincentpalazzo.gradledatabase.persistence.DataSurce;
import org.gradle.api.DefaultTask;
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.tasks.TaskAction;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Iterator;
import java.io.File;
import java.util.Set;
/**
 * @author https://github.com/vincenzopalazzo
 */
public class CreateDatabaseTask extends DefaultTask {

    @TaskAction
    public void createAction() {
        GradleDatabaseExstension project = getProject().getExtensions().findByType(GradleDatabaseExstension.class);
        String url = project.getUrl();
        String driverClass = project.getDriver(); // The driver class name differs per database
        String username = project.getUsername();
        String password = project.getPassword();
        String nameDatabase = project.getNameDatabase();
        String nameJar = project.getNameJar();
        if (findDependecyFileJarForDriver(nameJar)) {
            System.out.println("Jar found");
        } else {
            System.out.println("Jar not found");
        }
        DataSurce dataSource = new DataSurce();
        if (dataSource.connectionDatabase(driverClass, url, username, password)) {
            if (dataSource.createDatabese(nameDatabase)) {
                System.out.println("Database " + nameDatabase + " created");
            }
        }
    }

    private boolean findDependecyFileJarForDriver(String nameJar) {
        if (nameJar == null || nameJar.isEmpty()) {
            throw new IllegalArgumentException("The input parameter is null");
        }
        Iterator<Configuration> iterable = getProject().getConfigurations().iterator();
        boolean finded = false;
        while (!finded && iterable.hasNext()) {
            Configuration configuration = iterable.next();
            Set<File> filesSet = configuration.resolve();
            for (File file : filesSet) {
                String nameFile = file.getName();
                if (nameFile.contains(nameJar)) {
                    //Now?;
                    finded = true;
                }
            }
        }
        return finded;
    }
}
And this is my project, and this is the reference for my post on the Gradle forum.
Sorry for my terrible English, but I'm learning.
I want to add an answer to this post.
The best solution I found is to use this plugin.
I used the plugin inside my code; here is an example:
public abstract class AbstractTaskGradleDatabase extends DefaultTask {

    protected JarHelper jarHelper;
    protected Optional<File> jar;

    protected void init() {
        jarHelper = new JarHelper(getProject());
        jar = jarHelper.fetch("nameDependence");
    }
}
inside the build.gradle:
dependencies {
    implementation gradleApi()
    implementation 'com.lingocoder:jarexec.plugin:0.3'
}
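For completeness, a concrete task could use that base class roughly like this (a sketch under the same assumptions; the task class name below is only illustrative):

import org.gradle.api.tasks.TaskAction;

public class CreateDatabaseGradleTask extends AbstractTaskGradleDatabase {

    @TaskAction
    public void createAction() {
        // init() comes from AbstractTaskGradleDatabase and resolves the dependency jar via JarHelper
        init();
        if (jar.isPresent()) {
            System.out.println("Driver jar found at " + jar.get().getAbsolutePath());
        } else {
            System.out.println("Driver jar not found");
        }
    }
}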
PS: this answer may change over time, because the plugin is still in beta.

IllegalStateException: Cannot convert value of type 'java.sql.Timestamp' to required type 'java.time.LocalDateTime' for property

I'm working on a Spring Boot/JPA/MySQL project. So far everything has worked with DateTime objects when fetching/storing objects with the repository.
The problem occurred when I used the JdbcTemplate to execute a custom SQL query:
org.springframework.beans.ConversionNotSupportedException: Failed to convert property
value of type 'java.sql.Timestamp' to required type 'java.time.LocalDateTime' for
property 'from_time': no matching editors or conversion strategy found
The idea is to fetch time slots (each with a start time and a duration in minutes) that overlap with a new incoming entry.
To get my objects back I first used a BeanPropertyMapper and then switched to a custom NestedRowMapper.
The resulting conflicting time slots I want to get look like this:
{
id: 1
comment: "i worked 60minutes"
from_time: "2018-06-16 13:00"
duration_minutes: 60
task: {
name: "My task"
...
}
}
This is the method where I run into the issue:
public List<TimeSlot> getOverlappingEntries(TimeSlot timeslot) throws SQLException {
    String sql = "SELECT time_slot.comment, time_slot.from_time,"
            + "DATE_ADD(from_time, INTERVAL duration_minutes MINUTE) AS end_time, "
            + " task.name as `task.name`, task.category as `task.category` "
            + " FROM `time_slot` " + " INNER JOIN task on task.id = time_slot.task_id "
            + " WHERE person_id = ? "
            + " HAVING ? < end_time AND DATE_ADD(? ,INTERVAL ? MINUTE) > from_time;";
    PreparedStatementCreator prepared = (con) -> {
        PreparedStatement prep = con.prepareStatement(sql);
        prep.setObject(1, timeslot.person.id);
        prep.setObject(2, timeslot.from_time);
        prep.setObject(3, timeslot.from_time);
        prep.setObject(4, timeslot.durationMinutes);
        logger.info(prep.toString());
        return prep;
    };
    return this.connector.query(prepared, NestedRowMapper.get(TimeSlot.class));
}
Now I would imagine Spring is capable of converting those objects easily, and in any case there is the simple timestamp.toLocalDateTime() call to do it. The problem seems to be more about how to register this as a conversion service, or how to fix the Spring Boot configuration so that it is picked up.
I already tried a custom converter, but that didn't help:
@javax.persistence.Converter
public class SqlTimestampToLocalDateTimeConverter implements Converter<Timestamp, LocalDateTime>,
        AttributeConverter<Timestamp, LocalDateTime> {

    @Convert
    @Override
    public LocalDateTime convert(Timestamp source) {
        return source.toLocalDateTime();
    }

    @Override
    public LocalDateTime convertToDatabaseColumn(Timestamp attribute) {
        return attribute.toLocalDateTime();
    }

    @Override
    public Timestamp convertToEntityAttribute(LocalDateTime dbData) {
        return Timestamp.valueOf(dbData);
    }
}
Many other answers on the internet also mention that this was already implemented in Spring Framework 4.x.
The dependencies in the project look like this (build.gradle):
dependencies {
    compile "org.springframework.boot:spring-boot-starter-thymeleaf:2.0.2.RELEASE"
    compile "org.springframework.boot:spring-boot-starter-web:2.0.2.RELEASE"
    compile "org.springframework.boot:spring-boot-starter-security:2.0.2.RELEASE"
    compile "org.springframework.boot:spring-boot-starter-data-jpa:2.0.2.RELEASE"
    compile "mysql:mysql-connector-java:5.1.46"
    compileOnly "org.springframework.boot:spring-boot-devtools:2.0.2.RELEASE"
    compile 'org.springframework.data:spring-data-rest-webmvc:3.0.7.RELEASE'
    compile 'com.querydsl:querydsl-jpa:4.1.4'
    compile 'com.querydsl:querydsl-apt:4.1.4:jpa'
    testCompile("junit:junit")
    testCompile("org.springframework.boot:spring-boot-starter-test")
    testCompile("org.springframework.security:spring-security-test")
}
Thank you for any hints on how to solve this!
/edit:
I think I see a possible workaround now: I could fetch only the ids of all time slots and then use the repository to fetch the actual objects with their data (including their task objects).
But that definitely doesn't feel like the optimal solution...
This is the NestedRowMapper I use:
import org.springframework.beans.*;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.jdbc.support.JdbcUtils;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
public class NestedRowMapper<T> implements RowMapper<T> {

    private Class<T> mappedClass;

    public static <T> NestedRowMapper<T> get(Class<T> mappedClass) {
        return new NestedRowMapper<>(mappedClass);
    }

    public NestedRowMapper(Class<T> mappedClass) {
        this.mappedClass = mappedClass;
    }

    @Override
    public T mapRow(ResultSet rs, int rowNum) throws SQLException {
        try {
            T mappedObject = this.mappedClass.newInstance();
            BeanWrapper bw = PropertyAccessorFactory.forBeanPropertyAccess(mappedObject);
            bw.setAutoGrowNestedPaths(true);
            ResultSetMetaData meta_data = rs.getMetaData();
            int columnCount = meta_data.getColumnCount();
            for (int index = 1; index <= columnCount; index++) {
                try {
                    String column = JdbcUtils.lookupColumnName(meta_data, index);
                    Object value = JdbcUtils.getResultSetValue(rs, index,
                            Class.forName(meta_data.getColumnClassName(index)));
                    bw.setPropertyValue(column, value);
                } catch (TypeMismatchException | NotWritablePropertyException
                        | ClassNotFoundException e) {
                    e.printStackTrace();
                }
            }
            return mappedObject;
        } catch (InstantiationException | IllegalAccessException e1) {
            throw new RuntimeException(e1);
        }
    }
}
You're on the right lines: you can define a RowMapper that tells your app what type of object each column needs to be mapped to. I would recommend trying the JdbcTemplate.query method: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/core/JdbcTemplate.html#query-java.lang.String-java.lang.Object:A-org.springframework.jdbc.core.RowMapper-
You will need to define a RowMapper (not necessarily a NestedRowMapper; you could try ParameterizedRowMapper), then pass it into query together with your SQL, with the WHERE conditions mapped as args.
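A minimal sketch of that approach, assuming a JdbcTemplate is available and that TimeSlot exposes the comment, from_time and durationMinutes fields used elsewhere in the question (the Timestamp-to-LocalDateTime conversion happens inside the lambda RowMapper):

import java.sql.Timestamp;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class TimeSlotQueries {

    private final JdbcTemplate jdbcTemplate;

    public TimeSlotQueries(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public List<TimeSlot> findForPerson(long personId) {
        String sql = "SELECT comment, from_time, duration_minutes FROM time_slot WHERE person_id = ?";
        return jdbcTemplate.query(sql, (rs, rowNum) -> {
            TimeSlot slot = new TimeSlot();
            slot.comment = rs.getString("comment");
            // convert java.sql.Timestamp to java.time.LocalDateTime by hand
            Timestamp ts = rs.getTimestamp("from_time");
            slot.from_time = (ts != null) ? ts.toLocalDateTime() : null;
            slot.durationMinutes = rs.getInt("duration_minutes");
            return slot;
        }, personId);
    }
}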
I think the best way is to use BeanPropertyRowMapper.newInstance(TimeSlot.class) in your getOverlappingEntries method.
Try this in NestedRowMapper.mapRow:
if (value instanceof Timestamp) value = ((Timestamp) value).toLocalDateTime();
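Placed in context, the relevant part of the column loop from the NestedRowMapper above would look like this (a sketch; java.sql.Timestamp has to be imported in that class):

Object value = JdbcUtils.getResultSetValue(rs, index,
        Class.forName(meta_data.getColumnClassName(index)));
// unwrap java.sql.Timestamp into java.time.LocalDateTime so the BeanWrapper can set the LocalDateTime property
if (value instanceof Timestamp) {
    value = ((Timestamp) value).toLocalDateTime();
}
bw.setPropertyValue(column, value);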

Passing a bag as an input for UDF in PIG

I'm trying to pass a databag (final) as an input.
dump final;
gives:
(4,john,john,David,Banking ,4,M,20-01-1994,78.65,345000,Arkansasdest1,Destination)
(4,john,john,David,Banking ,4,M,20-01-1994,78.65,345000,Arkanssdest2,Destination)
(4,johns,johns,David,Banking ,4,M,20-01-1994,78.65,345000,ArkansasSrc1,source)
(4,johns,johns,David,Banking ,4,M,20-01-1994,78.65,345000,ArkansaSrc2,source)
I'm about to write a UDF for processing the above databag and finding mismatches between Source and Destination. In order to do that I have to check whether my UDF accepts a databag or not, so I wrote the sample UDF below:
package PigUDFpck;
import java.io.IOException;
import java.util.Iterator;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
public class databag extends EvalFunc<DataBag> {

    TupleFactory mTupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();

    public DataBag exec(Tuple input) throws IOException { // different return type
        DataBag result = mBagFactory.newDefaultBag(); // change here
        DataBag values = (DataBag) input.get(0);
        for (Iterator<Tuple> iterator = values.iterator(); iterator.hasNext();) {
            Tuple tuple = iterator.next();
            //logic
            Tuple t = mTupleFactory.getInstance().newTuple();
            t.append(tuple);
            result.add(t);
        }
        return result; // change here
    }
}
After that I registered the jar using:
REGISTER /usr/local/pig/UDF/UDFBAG.jar;
DEFINE Databag Databag(); // not sure how to define it
2017-02-16 19:07:05,875 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s). //got this warning after defining.
final1 = FOREACH final GENERATE(Databag(final));
ERROR 1200: Pig script failed to parse:
Invalid scalar projection: final : A column needs to be projected from a relation for it to be used as a scalar
Please help me with defining the UDF and with how to pass a DataBag to it.
Thanks
Try
final1 = FOREACH final GENERATE(Databag(*));
Though as far as I can see, your final contains tuples, not bags of tuples, so you'll probably need to group it by some key first. In that case it will be something like:
final1 = FOREACH (group final [by key or all]) GENERATE(Databag(final));

how to solve "Error during parsing. could not instantiate" in pig?

Hello everyone, I am new to Pig. I am trying the following Pig script, and it shows this error:
ERROR 1000: Error during parsing. could not instantiate 'UPER' with arguments 'null' Details at logfile: /home/training/pig_1371303109105.log
My Pig script:
register udf.jar;
A = LOAD 'data1.txt' USING PigStorage(',') AS (name:chararray, class:chararray, age:int);
B = foreach A generate UPER(class);
I followed this tutorial.
My Java class is:
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import java.io.*;
public class UPER extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
I found the following information in your error log:
Caused by: java.lang.Error: Unresolved compilation problem:
The type org.apache.commons.logging.Log cannot be resolved. It is indirectly referenced from required .class files
at UPER.<init>(UPER.java:1)
I guess that org.apache.commons.logging.Log is not in your environment. How did you run your Pig script? This class should already be in the Pig environment; org.apache.commons.logging.Log is in commons-logging-*.*.*.jar.

Passing a filename to Java UDF from Pig using distributed cache

I am using a small map file in my Java UDF function and I want to pass the filename of this file from Pig through the constructor.
The following is the relevant part of my UDF:
public GenerateXML() throws IOException {
    this(null);
}

public GenerateXML(String mapFilename) throws IOException {
    if (mapFilename != null) {
        // do processing
    }
}
In the Pig script I have the following line
DEFINE GenerateXML com.domain.GenerateXML('typemap.tsv');
This works in local mode, but not in distributed mode. I am passing the following parameters to Pig in command line
pig -Dmapred.cache.files="/path/to/typemap.tsv#typemap.tsv" -Dmapred.create.symlink=yes -f generate-xml.pig
And I am getting the following exception
2013-01-11 10:39:42,002 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file generate-xml.pig, line 16, column 42> Failed to generate logical plan. Nested exception: java.lang.RuntimeException: could not instantiate 'com.domain.GenerateXML' with arguments '[typemap.tsv]'
Any idea what I need to change to make it work?
The problem is solved now.
It seems that when I run the Pig script with the following parameters
pig -Dmapred.cache.files="/path/to/typemap.tsv#typemap.tsv" -Dmapred.create.symlink=yes -f generate-xml.pig
the /path/to/typemap.tsv should be a local path, not a path in HDFS.
You can use the getCacheFiles function in a Pig UDF and that will be enough; you don't have to use any additional properties like mapred.cache.files. Your case can be implemented like this:
public class UdfCacheExample extends EvalFunc<Tuple> {

    private Dictionary dictionary;
    private String pathToDictionary;

    public UdfCacheExample(String pathToDictionary) {
        this.pathToDictionary = pathToDictionary;
    }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        Dictionary dictionary = getDictionary();
        return createSomething(input);
    }

    @Override
    public List<String> getCacheFiles() {
        return Arrays.asList(pathToDictionary);
    }

    private Dictionary getDictionary() {
        // lazy initialization here
    }
}
