Cannot get schema from loadFunc org.apache.pig.builtin.AvroStorage - hadoop

I am getting the following error while running this Pig script:
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/avro.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/json-simple-1.1.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/jackson-core-asl-1.8.8.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/jackson-mapper-asl-1.8.8.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar
list_cookies = LOAD '/user/xyz/testbed/llama-2014-Oct-12d/abc'
USING org.apache.pig.piggybank.storage.avro.AvroStorage();
It fails with the following error:
2014-10-22 11:51:14,705 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc org.apache.pig.builtin.AvroStorage
Details at logfile: /home/xyz/pig_1413991623605.log

In my case, the input folder simply did not exist. Pig's error message here is misleading and says nothing about the missing path. After I changed the input folder to one that existed, the error went away. So check that the input path exists before spending a lot of time on more difficult debugging!
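A quick pre-flight check can rule this cause out before digging into Pig itself. The sketch below is in Python and assumes the `hdfs` command-line client is on the PATH; `hdfs dfs -test -e` exits with status 0 when the path exists. The helper name `hdfs_path_exists` and its injectable `runner` parameter are illustrative only and are not part of any Pig or Hadoop API.

```python
import subprocess


def hdfs_path_exists(path, runner=subprocess.run):
    """Return True if `hdfs dfs -test -e <path>` exits with status 0.

    `runner` is injectable so the check can be exercised without a cluster.
    """
    result = runner(["hdfs", "dfs", "-test", "-e", path])
    return result.returncode == 0
```

For the script in the question, one would call `hdfs_path_exists('/user/xyz/testbed/llama-2014-Oct-12d/abc')` and only launch Pig when it returns True.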

Related

Can't use 'put'() to add data to hbase with happybase

My Python version is 3.7. After running pip3 install happybase, I started the Thrift server with hbase thrift start and wrote a short .py file as follows:
import happybase
connection = happybase.Connection('master')
table = connection.table('jmlr') #'jmlr' is a table in hbase
for i in table.scan():
    print(i)
table.put('001', {'title':'dasds'}) #error here
connection.close()
When it reaches table.put(), it reports this error:
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')
At the same time, the Thrift server reported an error:
ERROR [thrift-worker-1] thrift.TBoundedThreadPoolServer: Error occurred during processing of message. java.lang.IllegalArgumentException: Invalid famAndQf provided.
When I ran the Python file again just now, the Thrift server gave a different error:
thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Bad version in readMessageBegin
I have tried adding parameters such as protocol='compact' and transport='framed', but that didn't work; with them, even table.scan() failed.
Everything in the hbase shell is OK, so I can't figure out what went wrong. I'm about to collapse.
I ran into the same issue and found this solution. You need to include a column qualifier, even an empty one, in the column name passed to put(); the ':' symbol is the delimiter between column family and column qualifier:
table.put('001', {'title:': 'dasds'})
Also, you get a different error message on the second run of the script because the Thrift server has already failed.
I hope this helps.
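To guard against the same mistake elsewhere, the column names can be normalized before they reach put(). This is only a sketch: `qualify_columns` is a hypothetical helper, not part of happybase, and it assumes the keys are `str` and that a key without ':' is meant as a bare column family.

```python
def qualify_columns(data):
    """Return a copy of `data` whose keys are all 'family:qualifier' names.

    A key without ':' is treated as a bare column family and gets an
    empty qualifier appended ('title' -> 'title:').
    """
    qualified = {}
    for column, value in data.items():
        if ":" not in column:
            column = column + ":"
        qualified[column] = value
    return qualified
```

With it, the failing call from the question would become `table.put('001', qualify_columns({'title': 'dasds'}))`.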

Error: command "bash" failed with no error message?

I am using Terraform on my Mac, and terraform apply fails with the error below:
Error: command "bash" failed with no error message
on ssm.tf line 7, in data "external" "ssm-dynamic-general":
7: data "external" "ssm-dynamic-general" {
However, there is nothing wrong with the ssm.tf file; the same configuration runs perfectly fine on my other system.
Can someone please tell me what I am missing here?
You might have done what I accidentally did: not follow the external program protocol:
https://www.terraform.io/docs/providers/external/data_source.html#external-program-protocol
In my particular case, I failed to send the errors that were coming from my program to standard error. Instead, those errors were going to standard out.
That's why Terraform wasn't able to report on those errors.
So if you send any and all errors from your program to standard error using >&2, you should be able to see those errors when you run terraform plan.
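For reference, the external program protocol expects the program to read a JSON query on standard input, print a single JSON object of string values on standard output, and, on failure, write the message to standard error and exit with a non-zero status. The sketch below shows that shape in Python rather than bash; the query key `name` and the returned `value` are hypothetical placeholders and are not taken from the ssm.tf in the question.

```python
import json
import sys


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Follow the external program protocol; return the process exit code."""
    try:
        query = json.load(stdin)   # Terraform sends the query as JSON
        name = query["name"]       # hypothetical query key
    except (ValueError, KeyError) as exc:
        # Errors go to stderr, paired with a non-zero exit status.
        print(f"bad query: {exc!r}", file=stderr)
        return 1
    # On success, stdout carries exactly one JSON object of string values.
    json.dump({"value": f"resolved-{name}"}, stdout)
    return 0

# In a real script, finish with: sys.exit(main())
```

Keeping stdout reserved for the JSON result and routing everything else to stderr is what lets Terraform show a real message instead of "failed with no error message".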

Required field 'uncompressed_page_size' was not found in serialized data! Parquet

I get the error below when trying to save a Parquet file from a local directory using PySpark.
I tried Spark 1.6 and 2.2; both give the same error.
It displays the schema properly but fails when writing the file.
base_path = "file:/Users/xyz/Documents/Temp/parquet"
reg_path = "file:/Users/xyz/Documents/Temp/parquet/ds_id=48"
df = sqlContext.read.option( "basePath",base_path).parquet(reg_path)
out_path = "file:/Users/xyz/Documents/Temp/parquet/out"
df2 = df.coalesce(5)
df2.printSchema()
df2.write.mode('append').parquet(out_path)
org.apache.spark.SparkException: Task failed while writing rows
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
In my own case, I was writing a custom Parquet parser for Apache Tika when I hit this error. It turned out that if the file is being used by another process, the ParquetReader cannot read uncompressed_page_size, which causes the error.
Verify that no other process is holding on to the file.
I temporarily resolved it with this Spark config:
"spark.sql.hive.convertMetastoreParquet": "false"
It has an extra cost, but it is a workable workaround for now.
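One cheap way to spot a local file that another process may still be writing is to check the Parquet magic bytes: an unencrypted Parquet file begins and ends with the four bytes `PAR1`, and the trailing copy is only written when the footer is finished. The sketch below is a rough diagnostic under that assumption; `looks_complete` is a hypothetical helper name, and passing the check does not prove the pages inside the file are intact.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_complete(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Running it over every part file under the input directory before the read narrows down which file, if any, was caught mid-write.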

Pig register jar, file does not exist error

I'm using the Hortonworks sandbox and trying to run a simple Pig script. I keep hitting an annoying "file does not exist" error.
Below is the script:
REGISTER '/piggybank.jar';
inp = load '/my.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage..
ERROR 2997: Encountered IOException. File does not exist:
hdfs://sandbox.hortonworks.com:8020/tmp/udfs/ '/piggybank.jar'
However, my jar is present at the root (/) and I have given it proper permissions as well. I don't know why the path points to /tmp/udfs.
Can anyone offer a suggestion?
Do not place the path within quotes. Also provide the full URI of the jar file location.
REGISTER hdfs://sandbox.hortonworks.com:8020/piggybank.jar;
Refer to the Pig documentation for REGISTER (a jar/script).

PIG Cassandra ERROR 2118 Could not get input splits

I started off trying a simple Pig + Cassandra integration with this tutorial from DataStax: http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/ana/anaPigExRel.html
but when I try to store the result into CQL, I get this error:
Message: org.apache.pig.backend.executionengine.ExecException: ERROR
2118: Could not get input splits
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
Any ideas what's happening? I read some answers here that suggest changing my PIG_PARTITIONER to Murmur3Partitioner,
which I already did, and it still happens. Is it a configuration issue?
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
I found out that after doing:
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
I need to run source ~/.bashrc and start pig from that same console.
I now get a different error, but I think this case is solved.
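The underlying point is that the variable has to be present in the environment of the process that actually launches pig. When Pig is started from a Python wrapper rather than an interactive shell, one way to guarantee that is to build the environment explicitly. This is a sketch only; `pig_environment` is a hypothetical helper name.

```python
import os


def pig_environment(base_env=None):
    """Return a copy of the environment with PIG_PARTITIONER set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PIG_PARTITIONER"] = "org.apache.cassandra.dht.Murmur3Partitioner"
    return env
```

The result can be passed as the `env` argument of `subprocess.run` when invoking pig, so the setting no longer depends on which console sourced ~/.bashrc.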
