I started off trying to do a simple Pig + Cassandra integration with this tutorial from DataStax: http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/ana/anaPigExRel.html
But when I try to store the result into CQL, I get this error:
Message: org.apache.pig.backend.executionengine.ExecException: ERROR
2118: Could not get input splits
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
Any ideas what's happening? I read some answers here referring to changing my PIG_PARTITIONER to Murmur3Partitioner, which I already did, and it still happens. Is it a configuration issue?
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
I found out that after doing:
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
I need to run source ~/.bashrc and start pig from that same console.
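In full, the sequence that worked looks something like this (a sketch, assuming the export line was added to ~/.bashrc; adjust if you keep it elsewhere):
# Append the partitioner setting to ~/.bashrc (hypothetical location)
echo 'export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner' >> ~/.bashrc
# Reload the shell configuration, then start pig from this same console
source ~/.bashrc
pig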
I do get another error afterwards, but I think this case is solved.
I am trying to generate the extracted_queries.json file using the persistgraphql CLI utility, but when I run the following command
persistgraphql src/app/ --add_typename
it results in this error:
Unable to process input path src/app/. Error message:
Syntax Error GraphQL request (1:1) Unexpected <EOF>
1:
^
How can I fix this?
After looking for answers for a while, I was able to make it work.
In my case, our queries and mutations are written in .js files, so the command below worked for me:
persistgraphql src/app --js --extension=js --add_typename
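For reference, this is the kind of file the command picks up (a hypothetical example; with --js and --extension=js, persistgraphql extracts gql-tagged template literals from .js files):
// src/app/queries.js (hypothetical file name)
import gql from 'graphql-tag';

export const GET_USERS = gql`
  query GetUsers {
    users {
      id
      name
    }
  }
`;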
My Python version is 3.7. After I ran pip3 install happybase, I started the Thrift server with the command hbase thrift start and tried to write a brief .py file as follows:
import happybase

connection = happybase.Connection('master')
table = connection.table('jmlr')  # 'jmlr' is a table in HBase
for i in table.scan():
    print(i)
table.put('001', {'title': 'dasds'})  # error here
connection.close()
When it gets to table.put(), it reports this error:
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')
At the same time, the Thrift server reported an error:
ERROR [thrift-worker-1] thrift.TBoundedThreadPoolServer: Error occurred during processing of message. java.lang.IllegalArgumentException: Invalid famAndQf provided.
But when I ran this Python file again just now, it gave me a different error from Thrift:
thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Bad version in readMessageBegin
I have tried adding parameters like protocol='compact' and transport='framed', but this didn't work; even table.scan() failed.
Everything in the HBase shell is OK, so I can't figure out what went wrong. I'm about to collapse.
I ran into the same issue and found this solution. You need to include a column qualifier, even an empty one, in the column name passed to put() (':' is the delimiter between column family and column qualifier):
table.put('001', {'title:': 'dasds'})
Also, you get a different error message on the second run of the script because the Thrift server has already crashed by then.
I hope this helps.
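For completeness, a minimal sketch of the corrected script (same host and table as in the question):
import happybase

# Connect to the Thrift server started with `hbase thrift start`
connection = happybase.Connection('master')
table = connection.table('jmlr')

# scan() yields (row_key, data) tuples
for row_key, data in table.scan():
    print(row_key, data)

# Column names are 'family:qualifier'; an empty qualifier still needs the trailing ':'
table.put('001', {'title:': 'dasds'})
connection.close()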
I'm using the Hortonworks sandbox and trying to run a simple Pig script. There is an annoying error related to "file does not exist".
Below is the script:
REGISTER '/piggybank.jar';
inp = load '/my.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage..
ERROR 2997: Encountered IOException. File does not exist:
hdfs://sandbox.hortonworks.com:8020/tmp/udfs/ '/piggybank.jar'
However, my jar is present at the root (/) and I have given it proper permissions as well. I don't know why the path is pointing to /tmp/udfs....
Can anyone provide a suggestion?
Do not place the path within quotes. Also, provide the full URI of the jar file's location.
REGISTER hdfs://sandbox.hortonworks.com:8020/piggybank.jar;
Refer to REGISTER (a jar/script) in the Pig documentation.
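Putting it together, the load would look something like this (a sketch; the CSVExcelStorage arguments were truncated in the question, so the default constructor is assumed):
REGISTER hdfs://sandbox.hortonworks.com:8020/piggybank.jar;
inp = LOAD '/my.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage();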
I am getting the following error while running this Pig script:
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/avro.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/json-simple-1.1.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/jackson-core-asl-1.8.8.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/jackson-mapper-asl-1.8.8.jar
REGISTER /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar
list_cookies = LOAD '/user/xyz/testbed/llama-2014-Oct-12d/abc'
USING org.apache.pig.piggybank.storage.avro.AvroStorage();
I got the following error:
2014-10-22 11:51:14,705 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc org.apache.pig.builtin.AvroStorage
Details at logfile: /home/xyz/pig_1413991623605.log
In my case, it was simply that the input folder did not exist. Pig's error messages are off the mark and not at all helpful here. After changing the input folder to one that existed, this error went away. So be sure to check that before spending a lot of time on more difficult debugging!
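A quick way to verify the input path before digging deeper (using the path from the question above):
hadoop fs -ls /user/xyz/testbed/llama-2014-Oct-12d/abc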
I am following the steps mentioned on AWS to use an interactive Hive session via SSH.
I used the following resources
https://github.com/ucbtwitter/getting-started/wiki/Using-Elastic-Map-Reduce-via-Command-Line
http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/SignUp.html
I was initially getting the error "Error: Missing key access-id", and then I fixed my JSON file. The JSON file is in the same format as mentioned in the links above.
When I run this command
./elastic-mapreduce
I am getting the following error:
Error: Unable to parse credentials.json: can't convert String into Integer.
I checked the values required in the JSON at AWS as well.
Does anyone have an idea why I am getting this error?
The region value in the credentials.json must be of int type.
{
  ......
  ......
  "region": 1
}
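Separately from the type issue, it can help to confirm the file is at least valid JSON before rerunning the CLI; Python's built-in validator works for this (assuming the file is named credentials.json in the current directory):
python -m json.tool credentials.json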