Hello everyone, I am new to Pig and I am trying the following Pig script; it fails with this error:
ERROR 1000: Error during parsing. could not instantiate 'UPER' with arguments 'null' Details at logfile: /home/training/pig_1371303109105.log
My Pig script:
register udf.jar;
A = LOAD 'data1.txt' USING PigStorage(',') AS (name:chararray, class:chararray, age:int);
B = foreach A generate UPER(class);
I followed this tutorial.
My Java class is:
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UPER extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
I found the following in your error log:
Caused by: java.lang.Error: Unresolved compilation problem:
The type org.apache.commons.logging.Log cannot be resolved. It is indirectly referenced from required .class files
at UPER.<init>(UPER.java:1)
I guess that org.apache.commons.logging.Log is not on your classpath. How did you run your Pig script? This class should already be in the Pig environment; org.apache.commons.logging.Log lives in commons-logging-*.*.*.jar.
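If it is missing on the client side, one workaround (the jar path below is only a guess; adjust it to wherever commons-logging lives in your Hadoop/Pig installation) is to register it alongside your UDF jar:
register /usr/lib/hadoop/lib/commons-logging-1.1.1.jar; -- path is an assumption, check your install
register udf.jar;
A = LOAD 'data1.txt' USING PigStorage(',') AS (name:chararray, class:chararray, age:int);
B = FOREACH A GENERATE UPER(class);
DUMP B;
Alternatively, putting the jar on PIG_CLASSPATH before launching Pig should have the same effect.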
I have three projects and three jar files.
My Spring Boot project depends on the two other projects.
When I run the application within Eclipse, by launching the main class of the Spring Boot application, everything is fine.
When I try to run the app with
java -jar target/springbootproj.jar
it fails with
java.lang.IllegalArgumentException: illegal path sql/h2/h2_full.sql
The offending code is:
public static String getClassLoaderResourceAsString(String path) {
    InputStream is = getClassLoaderResourceAsStream(path);
    String retval;
    try {
        retval = new String(is.readAllBytes());
    } catch (IOException e) {
        throw new RuntimeException("unable to read resource '" + path + "' " + e.getMessage());
    }
    return retval;
}
I can see the resource file in my Spring Boot jar if I extract the dependent jar file with
jar xvf target/vend-web-22.1.0-SNAPSHOT.jar BOOT-INF/lib/core
and then look inside the extracted jar. This inter-project resource loading works very well for me until I build a Spring Boot jar.
It runs just fine within Eclipse but fails when I run the Spring Boot jar, built with Maven, from the command line with
java -jar springboot-webapp.jar
Trying this:
@Component
@SpringBootApplication
@EnableConfigurationProperties(StorageProperties.class)
public class UploadingFilesApplication {

    @Autowired
    private ResourceLoader resourceLoader;

    @Value("classpath:joblogger/postgresql/joblog_schema.sr.sql")
    private Resource joblogResource;

    public static void main(String[] args) {
        SpringApplication.run(UploadingFilesApplication.class, args);
    }

    @Bean
    CommandLineRunner init(StorageService storageService) {
        System.out.println("resourceLoader is " + resourceLoader);
        System.out.println("joblogResource is " + joblogResource);
        try {
            String ddl = Files.readString(Paths.get(joblogResource.getURI()), StandardCharsets.UTF_8);
results in
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.boot.CommandLineRunner]: Factory method 'init' threw exception; nested exception is java.nio.file.FileSystemNotFoundException
Getting closer
I wrote the following:
package org.webapp;

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.springframework.util.ResourceUtils;

public class ResourceGetter {

    public static String loadResource(String resourceName) {
        try {
            File file = ResourceUtils.getFile("classpath:" + resourceName);
            return FileUtils.readFileToString(file, "UTF-8");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
And I get
Caused by: java.io.FileNotFoundException: class path resource [/joblogger/postgresql/joblog_schema.sr.sql] cannot be resolved to absolute file path because it does not reside in the file system: jar:file:/home/jjs/git/diamond-9.5/javautil/vendweb/target/vend-web-22.1.0-SNAPSHOT.jar!/BOOT-INF/lib/javautil-core-22.1.0-SNAPSHOT.jar!/joblogger/postgresql/joblog_schema.sr.sql
But it appears to be there.
jar xvf target/vend-web-22.1.0-SNAPSHOT.jar BOOT-INF/lib/javautil-core-22.1.0-SNAPSHOT.jar
extracted: BOOT-INF/lib/javautil-core-22.1.0-SNAPSHOT.jar
jar tvf BOOT-INF/lib/javautil-core-22.1.0-SNAPSHOT.jar joblogger/postgresql/joblog_schema.sr.sql
4107 Tue Mar 29 16:33:22 EDT 2022 joblogger/postgresql/joblog_schema.sr.sql
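For what it's worth, a resource packed inside a nested Spring Boot jar can never be resolved to a java.io.File, which is exactly what ResourceUtils.getFile and Paths.get(resource.getURI()) try to do. A stream-based loader along these lines (a sketch reusing the class and resource names from above) avoids the problem:
package org.webapp;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.springframework.core.io.ClassPathResource;
import org.springframework.util.StreamUtils;

public class ResourceGetter {

    // Reads a classpath resource through an InputStream, which works even when
    // the resource sits inside BOOT-INF/lib/... of a fat jar.
    public static String loadResource(String resourceName) {
        try (InputStream in = new ClassPathResource(resourceName).getInputStream()) {
            return StreamUtils.copyToString(in, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException("unable to read resource '" + resourceName + "'", e);
        }
    }
}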
I am trying to execute a pipeline using Apache Beam, but I get an error when trying to use additional output tags:
import com.google.cloud.Tuple;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.lang.reflect.Type;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * The Transformer.
 */
class Transformer {

    // LOG was not defined in the original snippet; a standard SLF4J logger is assumed here.
    private static final Logger LOG = LoggerFactory.getLogger(Transformer.class);

    final static TupleTag<Map<String, String>> successfulTransformation = new TupleTag<>();
    final static TupleTag<Tuple<String, String>> failedTransformation = new TupleTag<>();

    /**
     * The entry point of the application.
     *
     * @param args the input arguments
     */
    public static void main(String... args) {
        TransformerOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(TransformerOptions.class);
        Pipeline p = Pipeline.create(options);
        p.apply("Input", PubsubIO
                .readMessagesWithAttributes()
                .withIdAttribute("id")
                .fromTopic(options.getTopicName()))
         .apply(Window.<PubsubMessage>into(FixedWindows
                .of(Duration.standardSeconds(60))))
         .apply("Transform",
                ParDo.of(new JsonTransformer())
                     .withOutputTags(successfulTransformation,
                             TupleTagList.of(failedTransformation)));
        p.run().waitUntilFinish();
    }

    /**
     * Deserialize the input and convert it to a key-value pairs map.
     */
    static class JsonTransformer extends DoFn<PubsubMessage, Map<String, String>> {

        /**
         * Process each element.
         *
         * @param c the processing context
         */
        @ProcessElement
        public void processElement(ProcessContext c) {
            String messagePayload = new String(c.element().getPayload());
            try {
                Type type = new TypeToken<Map<String, String>>() {
                }.getType();
                Gson gson = new Gson();
                Map<String, String> map = gson.fromJson(messagePayload, type);
                c.output(map);
            } catch (Exception e) {
                LOG.error("Failed to process input {} -- adding to dead letter file", c.element(), e);
                String attributes = c.element()
                        .getAttributeMap()
                        .entrySet().stream().map((entry) ->
                                String.format("%s -> %s\n", entry.getKey(), entry.getValue()))
                        .collect(Collectors.joining());
                c.output(failedTransformation, Tuple.of(attributes, messagePayload));
            }
        }
    }
}
The error shown is:
Exception in thread "main" java.lang.IllegalStateException: Unable to
return a default Coder for Transform.out1 [PCollection]. Correct one
of the following root causes: No Coder has been manually specified;
you may do so using .setCoder(). Inferring a Coder from the
CoderRegistry failed: Unable to provide a Coder for V. Building a
Coder using a registered CoderProvider failed. See suppressed
exceptions for detailed failures. Using the default output Coder from
the producing PTransform failed: Unable to provide a Coder for V.
Building a Coder using a registered CoderProvider failed.
I tried different ways to fix the issue, but I think I just do not understand what the problem is. I know that this line causes the error:
.withOutputTags(successfulTransformation,TupleTagList.of(failedTransformation))
but I do not understand which part of it needs a specific Coder, or what "V" refers to in the error ("Unable to provide a Coder for V").
Why is the error happening? I also looked at Apache Beam's docs, but they do not seem to explain this usage, and I did not understand much from the section on coders.
Thanks
First, I would suggest the following change. Turn:
final static TupleTag<Map<String, String>> successfulTransformation =
new TupleTag<>();
final static TupleTag<Tuple<String, String>> failedTransformation =
new TupleTag<>();
into this:
final static TupleTag<Map<String, String>> successfulTransformation =
new TupleTag<Map<String, String>>() {};
final static TupleTag<Tuple<String, String>> failedTransformation =
new TupleTag<Tuple<String, String>>() {};
That should help the coder inference determine the type of the side output. Also, have you properly registered a CoderProvider for Tuple?
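If inference still fails, another option is to set the coders explicitly on each tagged output. A rough sketch against the pipeline above (the windowed variable stands for the PCollection<PubsubMessage> after the Window transform, and swapping com.google.cloud.Tuple for Beam's own KV<String, String> is my assumption so the built-in KvCoder can be used):
PCollectionTuple results = windowed.apply("Transform",
        ParDo.of(new JsonTransformer())
             .withOutputTags(successfulTransformation,
                     TupleTagList.of(failedTransformation)));

// Main output: Map<String, String> can be encoded with Beam's MapCoder.
results.get(successfulTransformation)
       .setCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));

// Side output: assumes failedTransformation is redeclared as TupleTag<KV<String, String>>,
// for which KvCoder works out of the box; otherwise register a CoderProvider for Tuple.
results.get(failedTransformation)
       .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));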
Thanks to @Ben Chambers' answer, the Kotlin equivalent is:
val successTag = object : TupleTag<MyObj>() {}
val deadLetterTag = object : TupleTag<String>() {}
I am trying to execute a simple Hadoop program that reads the contents of a file and prints it to the console.
I am following the URLCat example from Hadoop: The Definitive Guide.
I am getting a MalformedURLException: no protocol.
When I use -cat with hdfs://localhost/user/training/test.txt the contents are printed out, but when I use the same path while executing the jar I get the mentioned exception.
I have added the static block that sets the URLStreamHandlerFactory.
EDIT:
My program:
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {

    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
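"no protocol" means the string handed to new URL(...) has no scheme, so it is worth printing args[0] to see what actually arrives. For reference, the book's example is usually launched roughly like this (the jar name is a placeholder):
hadoop jar urlcat.jar URLCat hdfs://localhost/user/training/test.txt
If the jar's manifest already declares URLCat as its Main-Class, drop the class name from the command; otherwise it ends up in args[0] and produces exactly this exception.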
I've written the following UDF:
ISO8601ToHiveFormat.java:
package hiveudfs;

import org.apache.hadoop.hive.ql.exec.UDF;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ISO8601ToHiveFormat extends UDF {

    public String hourFromISO8601(final String d) {
        try {
            if (d == null)
                return null;
            SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            return sdf2.format(sdf1.parse(d));
        } catch (ParseException pe) {
            return null;
        }
    }
}
In the src folder of my project I ran the following command to compile it:
javac -cp /usr/lib/hive/lib/hive-exec-0.10.0-cdh4.3.0.jar ISO8601ToHiveFormat.java
and subsequently packed it into a jar:
jar cf ../../HiveUDFs.jar hiveudfs/ISO8601ToHiveFormat.*
So, then I started hive and did:
hive> add jar /home/tom/Java/HiveUDFs.jar;
Added /home/tom/Java/HiveUDFs.jar to class path
Added resource: /home/tom/Java/HiveUDFs.jar
hive> create temporary function hourFromISO8601 as 'hiveudfs.ISO8601ToHiveFormat';
OK
Time taken: 0.083 seconds
hive> SELECT hourFromISO8601(logtimestamp) FROM mytable LIMIT 10;
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'logtimestamp': No matching method for class hiveudfs.ISO8601ToHiveFormat with (string). Possible choices:
hive>
The output of
hive> describe mytable;
OK
...
logtimestamp string
...
What am I doing wrong here?
toom - you have to implement the evaluate() method; only then does the UDF work:
public class YourClassName extends UDF {

    public String evaluate(your args) {
        // your computation logic
        return your_result;
    }
}
As ramisetty.vijay says, you need to implement the evaluate() method. Note that you can provide multiple evaluate() implementations, with differing input parameters as well as return types.
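Applied to the UDF in the question, the only change that should be needed is renaming hourFromISO8601 to evaluate (a sketch; the body is taken unchanged from the question):
package hiveudfs;

import org.apache.hadoop.hive.ql.exec.UDF;

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ISO8601ToHiveFormat extends UDF {

    // Hive resolves UDF calls by looking for methods named evaluate().
    public String evaluate(final String d) {
        if (d == null)
            return null;
        try {
            SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            return sdf2.format(sdf1.parse(d));
        } catch (ParseException pe) {
            return null;
        }
    }
}
After recompiling, repackaging the jar, and re-running add jar and create temporary function, the SELECT from the question should resolve the method.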
I am using a small map file in my Java UDF function and I want to pass the filename of this file from Pig through the constructor.
The following is the relevant part of my UDF:
public GenerateXML() throws IOException {
    this(null);
}

public GenerateXML(String mapFilename) throws IOException {
    if (mapFilename != null) {
        // do processing
    }
}
In the Pig script I have the following line
DEFINE GenerateXML com.domain.GenerateXML('typemap.tsv');
This works in local mode, but not in distributed mode. I am passing the following parameters to Pig on the command line:
pig -Dmapred.cache.files="/path/to/typemap.tsv#typemap.tsv" -Dmapred.create.symlink=yes -f generate-xml.pig
And I am getting the following exception
2013-01-11 10:39:42,002 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file generate-xml.pig, line 16, column 42> Failed to generate logical plan. Nested exception: java.lang.RuntimeException: could not instantiate 'com.domain.GenerateXML' with arguments '[typemap.tsv]'
Any idea what I need to change to make it work?
The problem is solved now.
It seems that when I run the Pig script with the following parameters,
pig -Dmapred.cache.files="/path/to/typemap.tsv#typemap.tsv" -Dmapred.create.symlink=yes -f generate-xml.pig
the /path/to/typemap.tsv should be a local path, not a path in HDFS.
You can use the getCacheFiles function in a Pig UDF and it will be enough; you don't have to use any additional properties like mapred.cache.files. Your case can be implemented like this:
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UdfCacheExample extends EvalFunc<Tuple> {

    private Dictionary dictionary;
    private String pathToDictionary;

    public UdfCacheExample(String pathToDictionary) {
        this.pathToDictionary = pathToDictionary;
    }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        Dictionary dictionary = getDictionary();
        return createSomething(input);
    }

    @Override
    public List<String> getCacheFiles() {
        return Arrays.asList(pathToDictionary);
    }

    private Dictionary getDictionary() {
        // lazy initialization here
    }
}
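For completeness, with getCacheFiles in place the constructor argument carries the cache-file spec itself, so the DEFINE from the question would look something like this (the path is illustrative, and the '#typemap.tsv' fragment names the local symlink the UDF opens; check the EvalFunc.getCacheFiles javadoc for your Pig version):
DEFINE GenerateXML com.domain.GenerateXML('/path/to/typemap.tsv#typemap.tsv');
The UDF can then read 'typemap.tsv' as an ordinary local file in each task's working directory, with no -Dmapred.cache.files or -Dmapred.create.symlink flags on the pig command line.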