Unable to insert valid JSON in Hive, getting a MismatchedTokenException - hadoop

I get a MismatchedTokenException when executing the query below:
0: jdbc:hive2://localhost:10000> INSERT INTO TABLE test_data
. . > VALUES ('s92bd2d2u922432c43', 'd93d2e03422f234',
. . > '{"Foo": "ABC","Bar": "20090101100000","Quux": {"QuuxId": 1234,"QuuxName":
. . > "Sam it doen't matter"}}');
Error: Error while compiling statement: FAILED: ParseException line 3:88 mismatched
input 't' expecting ) near ''{"Foo": "ABC","Bar": "20090101100000","Quux": {"QuuxId":
1234,"QuuxName": "Sam it doen'' in statement (state=42000,code=40000)
It seems to fail because of the extra ' in the string "Sam it doen't matter".
But this is valid JSON. How can this be resolved?

It looks like that extra ' terminates the string from Hive's perspective, so it doesn't matter that the JSON is valid: Hive never gets the chance to pass it along to whatever is going to parse the JSON. You can escape the ' for the Hive command parser with a \, for example:
select get_json_object('{"Test":"This isn\'t a test"}','$');
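If the statement is built in client code rather than typed at the Beeline prompt, the same escaping can be applied programmatically. Below is a minimal Python sketch, assuming Hive's backslash escaping inside single-quoted string literals. The helper name hive_string_literal is made up for this example, and a parameterized statement from your driver is usually the safer choice where one is available.

```python
import json


def hive_string_literal(value):
    """Wrap value in single quotes, escaping backslashes and single quotes."""
    # Escape backslashes first so the backslashes added for the quotes
    # are not doubled by the second replacement.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


# Hypothetical payload, similar in shape to the one in the question.
payload = json.dumps({"Foo": "ABC", "Quux": {"QuuxId": 1234, "QuuxName": "it isn't a test"}})
statement = "SELECT get_json_object(" + hive_string_literal(payload) + ", '$')"
print(statement)
```

Backslashes are escaped as well because JSON itself may contain them (for example \" inside a value), and Hive would otherwise consume them before the JSON parser sees the string.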


PIG: Unable to open iterator for alias AliasName. Scalar has more than one row in the output

I am new to Pig and learning on my own.
I have written a script that appends the epoch time to each word read from the words.txt file.
Here is the script:
A = LOAD 'words.txt' AS word:chararray;
B = FOREACH A GENERATE CONCAT(CONCAT(A.word,'_'),(chararray)ToUnixTime(CurrentTime()));
dump B;
The issue is that if words.txt contains only one word, the output is correct.
If it contains multiple words, such as
word1
word2
word3
word4
then it gives the following error:
ERROR 1066: Unable to open iterator for alias B
java.lang.Exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 0:
Scalar has more than one row in the output. 1st : (word1 ), 2nd :(word2) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar"
should be "foo::bar" ) at
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
0: Scalar has more than one row in the output. 1st : (word1 ), 2nd
:(word2) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar"
should be "foo::bar" ) at
org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:122) at
o
Please suggest how to solve this issue.
Thank you.
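Solved it on my own.
I just removed the A. from the inner CONCAT, and it worked. A.word treats the relation A as a scalar, which only works when A has exactly one row; plain word refers to the field of the current row.
Script: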
A = LOAD 'words.txt' AS word:chararray;
B = FOREACH A GENERATE CONCAT(CONCAT(word,'_'),(chararray)ToUnixTime(CurrentTime()));
dump B;

remove some lines in log file

I have a big log file.
After removing the timestamp from each line, I sorted it with sort -u logfile -o logfile, so that the logs are clean and organized as
failed to correct PL.ASBF..HHZ.2011.348 because of divided by zero
failed to correct PL.ASBF..HHZ.2011.349 because of divided by zero
failed to correct PL.ASBF..HHZ.2011.350 because of divided by zero
.
. (lines not shown here)
.
failed to correct PL.ASBF..HHZ.2015.364 because of divided by zero
failed to correct PL.ASBF..HHZ.2015.365 because of divided by zero
.
.
. (lines not shown here)
.
.
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.129 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Illegal format
.
. (lines not shown here)
.
failed to correct PL.HSPB..HHZ.2014.364 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
I can get the logged items (e.g. PL.HSPB in the example above) with
grep -oE " [0-9A-Z]*\.[0-9A-Z]*" logfile | sort -u
However, I also want to know the date info, and to make it clearer I want to remove the intermediate lines. For example,
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.129 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Illegal format
.
. (lines not shown here)
.
failed to correct PL.HSPB..HHZ.2014.364 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
after removal becomes
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
i.e., for each item, only the first and last lines are kept (the digits are the year and Julian day).
Is there a shell command that does this easily?
Script:
$ cat hhz.py
#!/usr/bin/env python
import re
import sys
from collections import OrderedDict

# Split each line around the date: (text up to "HHZ.")(YYYY.DDD)(rest of line).
pattern = re.compile(r"(.*HHZ\.)([0-9]{4}\.[0-9]+)( .*)")

firsts = OrderedDict()  # first line seen for each undated key
lasts = OrderedDict()   # last line seen for each undated key
for line in sys.stdin:
    line = line.rstrip("\n")
    x = pattern.match(line)
    if x is None:
        continue  # skip lines that carry no date
    # The line with its date removed identifies the group.
    undated = x.group(1) + x.group(3)
    if undated not in firsts:
        firsts[undated] = line
    lasts[undated] = line
for undated in firsts:
    first = firsts[undated]
    last = lasts[undated]
    print(first)
    if first != last:
        print(last)
Input:
$ cat hhz.dat
failed to correct PL.ASBF..HHZ.2011.348 because of divided by zero
failed to correct PL.ASBF..HHZ.2011.349 because of divided by zero
failed to correct PL.ASBF..HHZ.2011.350 because of divided by zero
failed to correct PL.ASBF..HHZ.2015.364 because of divided by zero
failed to correct PL.ASBF..HHZ.2015.365 because of divided by zero
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.129 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Something else
failed to correct PL.HSPB..HHZ.2014.364 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
Output:
$ hhz.py < hhz.dat
failed to correct PL.ASBF..HHZ.2011.348 because of divided by zero
failed to correct PL.ASBF..HHZ.2015.365 because of divided by zero
failed to correct PL.HSPB..HHZ.2011.128 because of Illegal format
failed to correct PL.HSPB..HHZ.2014.365 because of Illegal format
failed to correct PL.HSPB..HHZ.2011.130 because of Something else
Group lines by stripping out the date part with a regex. The undated line is the unique key for the group.
Get the first line in a group by doing an ordered-dict put only if the key is not already set.
Get the last line in a group by doing an ordered-dict put unconditionally.
Use OrderedDict to preserve input-file ordering (use dict if you don't need that).
Check first != last to avoid printing the same line twice when a group has only one item.

How to process a multi-delimiter file in Pig 0.8

I have an input text file (named multidelimiter) with the following records:
1,Mical,2000;10
2,Smith,3000;20
I have written Pig code as follows:
A =LOAD '/user/input/multidelimiter' AS line;
B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)[,](.*)[,](.*)[;]')) AS (f1,f2,f3,f4);
But this code does not work and gives the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 78. Encountered: <EOF> after : "\'(.*)[,](.*)[,](.*)[;"
I referred to the following link but was not able to resolve my error:
how to load files with different delimiter each time in piglatin
Please help me resolve this error.
Thanks.
Solution for your input example:
LOAD as comma-separated, then STRSPLIT by ';' and FLATTEN.
Finally got a solution.
Here is my solution:
A =LOAD '/user/input/multidelimiter' using PigStorage(',') as (empid,ename,line);
B = FOREACH A GENERATE empid,ename, FLATTEN( REGEX_EXTRACT_ALL( line,'(.*)\\u003B(.*)')) AS (sal:int,deptno:int);
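To see what the two-step approach does to each record, here is a small Python illustration (not Pig) of the same logic: split on the comma first, then pull the salary and department number out of the last field with the (.*);(.*) pattern. The function name parse_record is made up for this sketch.

```python
import re


def parse_record(line):
    """Split a record like '1,Mical,2000;10' into (empid, ename, sal, deptno)."""
    # Step 1: the comma-delimited part, comparable to loading with PigStorage(',').
    empid, ename, rest = line.split(",", 2)
    # Step 2: the remaining field holds two values separated by ';'.
    match = re.fullmatch(r"(.*);(.*)", rest)
    if match is None:
        raise ValueError("expected 'sal;deptno' in: " + rest)
    sal, deptno = match.groups()
    return empid, ename, int(sal), int(deptno)


for record in ["1,Mical,2000;10", "2,Smith,3000;20"]:
    print(parse_record(record))
```

Handling one delimiter per step keeps each pattern simple, which is also why the Pig version above avoids a single regex covering both ',' and ';'.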

Can execute my query in the database, but an exception arises when trying it in BIRT

When I try to execute this query in BIRT:
select a.ag_code , COUNT(distinct(a.usr_id)),b.AG_NAME
from photo a,
(select ag_code,AG_NAME from agent
WHERE AG_TYPE = 'AS' AND AG_USEFLAG = 'Y' AND AG_NAME LIKE 'M%')b
where a.upload_time BETWEEN TO_DATE('20131116000000','yyyymmddhh24miss')
AND TO_DATE('20131129235959','yyyymmddhh24miss')
and a.status = 'S'
and a.ag_code = b.ag_code
group by a.ag_code,b.AG_NAME
order by a.Ag_CODE,b.AG_NAME;
This exception arises:
org.eclipse.birt.report.engine.api.EngineException: Error happened while running the report.
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.doRun(DatasetPreviewTask.java:318)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.runDataset(DatasetPreviewTask.java:280)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.execute(DatasetPreviewTask.java:91)
at org.eclipse.birt.report.designer.data.ui.dataset.DataSetPreviewer.preview(DataSetPreviewer.java:68)
at org.eclipse.birt.report.designer.data.ui.dataset.ResultSetPreviewPage$5.run(ResultSetPreviewPage.java:366)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:121)
Caused by: org.eclipse.birt.data.engine.odaconsumer.OdaDataException: Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:ORA-00911: invalid character
;
java.sql.SQLException: ORA-00911: invalid character
at org.eclipse.birt.data.engine.odaconsumer.ExceptionHandler.newException(ExceptionHandler.java:52)
at org.eclipse.birt.data.engine.odaconsumer.ExceptionHandler.throwException(ExceptionHandler.java:108)
at org.eclipse.birt.data.engine.odaconsumer.ExceptionHandler.throwException(ExceptionHandler.java:84)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.getRuntimeMetaData(PreparedStatement.java:414)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.getProjectedColumns(PreparedStatement.java:377)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.doGetMetaData(PreparedStatement.java:347)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.execute(PreparedStatement.java:563)
at org.eclipse.birt.data.engine.executor.DataSourceQuery.execute(DataSourceQuery.java:972)
at org.eclipse.birt.data.engine.impl.PreparedOdaDSQuery$OdaDSQueryExecutor.executeOdiQuery(PreparedOdaDSQuery.java:503)
at org.eclipse.birt.data.engine.impl.QueryExecutor.execute(QueryExecutor.java:1208)
at org.eclipse.birt.data.engine.impl.ServiceForQueryResults.executeQuery(ServiceForQueryResults.java:233)
at org.eclipse.birt.data.engine.impl.QueryResults.getResultIterator(QueryResults.java:178)
at org.eclipse.birt.data.engine.impl.QueryResults.getResultMetaData(QueryResults.java:132)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.extractQuery(DatasetPreviewTask.java:352)
at org.eclipse.birt.report.engine.api.impl.DatasetPreviewTask.doRun(DatasetPreviewTask.java:309)
... 5 more
Caused by: org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:ORA-00911: invalid character
;
java.sql.SQLException: ORA-00911: invalid character
at org.eclipse.birt.report.data.oda.jdbc.Statement.executeQuery(Statement.java:481)
at org.eclipse.birt.report.data.oda.jdbc.Statement.getMetaUsingPolicy1(Statement.java:420)
at org.eclipse.birt.report.data.oda.jdbc.Statement.getMetaData(Statement.java:316)
at org.eclipse.birt.report.data.oda.jdbc.bidi.BidiStatement.getMetaData(BidiStatement.java:56)
at org.eclipse.datatools.connectivity.oda.consumer.helper.OdaQuery.doGetMetaData(OdaQuery.java:423)
at org.eclipse.datatools.connectivity.oda.consumer.helper.OdaQuery.getMetaData(OdaQuery.java:390)
at org.eclipse.birt.data.engine.odaconsumer.PreparedStatement.getRuntimeMetaData(PreparedStatement.java:407)
... 16 more
Caused by: java.sql.SQLException: ORA-00911: invalid character
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:330)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:287)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:744)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:218)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:812)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1048)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:853)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1153)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3369)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3414)
at org.eclipse.birt.report.data.oda.jdbc.Statement.executeQuery(Statement.java:477)
... 22 more
Try aliasing your distinct count column - like so:
select a.ag_code , COUNT(distinct(a.usr_id)) as distinct_users, b.AG_NAME
...
You need to remove the ";" at the end of your query. Adding the ";" might work in most tools (like PL/SQL Developer), but those tools remove the ";" before sending the statement to Oracle.
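If the query text comes from a file or another tool, the terminator can be stripped before the statement is handed to the driver. The following Python sketch shows roughly what such tools do; the function name strip_statement_terminator is hypothetical, and it only handles a single ";" at the very end of the text.

```python
def strip_statement_terminator(sql):
    """Return sql without trailing whitespace and without one trailing ';'."""
    trimmed = sql.rstrip()
    if trimmed.endswith(";"):
        # Drop the terminator, then any whitespace that preceded it.
        trimmed = trimmed[:-1].rstrip()
    return trimmed


query = "select ag_code, ag_name from agent order by ag_code;\n"
print(strip_statement_terminator(query))
```

Only the end of the text is touched, so a ";" inside a string literal elsewhere in the query is left alone.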
This section of your error message
Caused by: org.eclipse.birt.data.engine.odaconsumer.OdaDataException: Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:ORA-00911: invalid character
indicates that your SQL is bad; Mark Bannister has suggested a solution. Depending on how bad your SQL is, this part of the error message can be more helpful, calling out specific areas to review.

How to properly use Alias in Codeigniter

Here is my code:
$this->db->select('course_name AS Course Name,course_desc AS Course Description,display_public AS Display Status',FALSE);
$this->db->from('courses');
$this->db->where('tennant_id',$tennant_id);
$this->db->order_by('course_name','ASC');
$query = $this->db->get();
and I got an error:
A Database Error Occurred
Error Number: 1064
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Name, course_desc AS Course Description, display_public AS Display Status FROM (' at line 1
SELECT course_name AS Course Name,
course_desc AS Course Description,
display_public AS Display Status
FROM (`courses`) WHERE `tennant_id` = 'elicuarto#apploma.com'
ORDER BY `course_name` ASC
Filename: C:\wamp\www\coursebooking\system\database\DB_driver.php
Line Number: 330
Try
$this->db->select('course_name AS `Course Name`, course_desc AS `Course Description`, display_public AS `Display Status`', FALSE);
It's the space in your alias that is messing with you.
UPDATE
I'm not sure why you would want to, but I see nothing preventing you from writing
$this->db->select("course_name AS `{$variable}`", FALSE);
(showing just one field for simplicity)
UPDATE 2
That should be standard string interpolation, so I don't know why it doesn't work for you. You can always concatenate the strings instead:
$this->db->select('course_name AS `' . $variable . '`', FALSE);
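When the alias comes from a variable, it is worth quoting it in one place so that spaces, and any backtick inside the alias, are handled consistently. Here is a language-neutral sketch in Python, assuming MySQL's rule that a backtick inside a quoted identifier is written as two backticks. The names quote_alias and columns are made up for illustration; the resulting string is what you would pass to select() with escaping turned off.

```python
def quote_alias(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


# Column-to-alias mapping taken from the question.
columns = {
    "course_name": "Course Name",
    "course_desc": "Course Description",
    "display_public": "Display Status",
}
select_list = ", ".join(
    column + " AS " + quote_alias(alias) for column, alias in columns.items()
)
print(select_list)
```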
