Creating index in Hive 0.9 - Hadoop

I am trying to create an index on tables in Hive 0.9. One table has 1 billion rows, another has 30 million rows. The commands I used are (other than creating the tables and so on):
CREATE INDEX DEAL_IDX_1 ON TABLE DEAL (ID) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
alter index DEAL_IDX_1 ON DEAL rebuild;
set hive.optimize.autoindex=true;
set hive.optimize.index.filter=true;
For the 30 million row table, the rebuild process looks fine (both mapper and reducer finish), but at the very end it prints
Invalid alter operation: Unable to alter index.
FAILED: Execution Error, return code 1
from org.apache.hadoop.hive.ql.exec.DDLTask
Checking the log, I found the error
java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
I'm not sure why this error occurred, but anyway, I added the Derby jar:
add jar /path/derby-version.jar
That resolved the reported error, but I still got another one:
org.apache.hadoop.hive.ql.exec.FileSinkOperator:
StatsPublishing error: cannot connect to database
I'm not sure how to solve this problem. I do see the created index table under hive/warehouse, though.
For the 1 billion row table, it is another story. The mappers just get stuck at 2% or so, and the error shows
FATAL org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: Java heap space
I attempted to increase the max heap size as well as the max mapred memory (settings I found mentioned elsewhere, not in Hive's configuration documentation):
set mapred.child.java.opts = -Xmx6024m
set mapred.job.map.memory.mb=6000;
set mapred.job.reduce.memory.mb=4000;
However, this did not help. The mappers still got stuck at 2% with the same error.

I had a similar problem where the index was created and showed up under hive/warehouse, but the process as a whole failed. My index name was TypeTarget (yours is DEAL_IDX_1), and after many days of trying different approaches, making the index name all lowercase (typetarget) fixed the issue. My problem was on Hive 0.10.0.
Also, the ClassNotFoundException and StatsPublishing issues occur because hive.stats.autogather is turned on by default. Turning it off (false) in hive-site.xml should get rid of them.
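A minimal sketch of the combined workaround, assuming the asker's DEAL table and ID column, and assuming a per-session override of hive.stats.autogather is acceptable (it can equally be set in hive-site.xml):
-- disable stats auto-gathering for this session
set hive.stats.autogather=false;
-- recreate the index with an all-lowercase name
CREATE INDEX deal_idx_1 ON TABLE DEAL (ID) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
ALTER INDEX deal_idx_1 ON DEAL REBUILD;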
Hopefully this helps anyone looking for a quick fix.

Related

ODI-1228: Task Load data-LKM SQL to Oracle- fails on the target connection

I'm working with Oracle Data Integrator, inserting information from the original source into a temp table (BI_DSA.TMP_TABLE).
ODI-1228: Task Load data-LKM SQL to Oracle- fails on the target
connection BI_DSA. Caused By: java.sql.BatchUpdateException:
ORA-12899: value too large for column
"BI_DSA"."C$_0DELTA_TABLE"."FIELD" (actual: 11, maximum: 10)
I tried changing the length of 'FIELD' to more than 10 and reverse engineering, but it didn't work.
Is this error coming from the original source? I'm doing a replica, so I only have view privileges on it, and I believe so because the error comes from the C$ table.
Thanks for the help!
Solution: I had tried the length option before, as the answers suggested, but it didn't work. Then I noticed the original source had modified their field length, so I reverse engineered the source table and the problem was solved.
Greetings!
As Bobby mentioned in the comment, it might come from byte/char semantics.
The C$ tables created by the LKMs usually copy the structure of the source data, so a workaround would be to go into the model and manually increase the size of the FIELD column in the source datastore (even if it doesn't represent what is in the database). The C$ table will be created with that size on the next run.
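For illustration, a minimal sketch of the byte/char semantics issue; the table and data below are hypothetical, not taken from the question:
-- In an AL32UTF8 database, VARCHAR2(10 BYTE) holds 10 bytes, not 10 characters.
CREATE TABLE demo_semantics (
  field_byte VARCHAR2(10 BYTE),
  field_char VARCHAR2(10 CHAR)
);
-- Ten accented characters take 20 bytes in UTF-8:
-- the first insert fails with ORA-12899, the second succeeds.
INSERT INTO demo_semantics (field_byte) VALUES ('éééééééééé');
INSERT INTO demo_semantics (field_char) VALUES ('éééééééééé');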

Duplicate Indexes error using Informatica 10.4

While running an Informatica mapping in v10.4 I'm getting the following error.
The mapping essentially calls a complex stored procedure in Oracle to "swap out" a temporary file to a partitioned fact table.
CMN_1022 [
ORA-20014: FINISH_SP: ORA-20010: Duplicate Indexes: ORA-12801: error signaled in parallel query server P00I
ORA-06512: at "DIMDW.FACT_EXCHANGE_PARTITION_PKG", line 1650
ORA-20010: Duplicate Indexes: ORA-12801: error signaled in parallel query server P00I
ORA-06512: at "DIMDW.FACT_EXCHANGE_PARTITION_PKG", line 1292
ORA-12801: error signaled in parallel query server P00I
ORA-28604: table too fragmented to build bitmap index (172073921,57,56)
ORA-06512: at "DIMDW.FACT_EXCHANGE_PARTITION_PKG", line 1277
ORA-06512: at "DIMDW.FACT_EXCHANGE_PARTITION_PKG", line 1277
ORA-06512: at "DIMDW.FACT_EXCHANGE_PARTITION_PKG", line 1593
I do not know what this error means to Informatica.
Can anyone help me decipher it, SPECIFIC TO INFORMATICA?
The problem is specific to Oracle, so I'm not sure how to make the answer specific to Informatica, especially without being able to see the details of what the workflow is trying to do.
The ORA-20014: FINISH_SP: ORA-20010: Duplicate Indexes: error is a custom message from the application code. The real key appears to be here: "ORA-28604: table too fragmented to build bitmap index (172073921,57,56)"
It looks like Informatica is attempting to build an index - indirectly through the DIMDW.FACT_EXCHANGE_PARTITION_PKG package - and the process is throwing an error. A simple Google search on ORA-28604 yields the following:
ORA-28604: table too fragmented to build bitmap index (%s,%s,%s)
*Cause: The table has one or more blocks that exceed the maximum number
of rows expected when creating a bitmap index. This is probably
due to deleted rows. The values in the message are:
(data block address, slot number found, maximum slot allowed)
*Action: Defragment the table or block(s). Use the values in the message
to determine the FIRST block affected. (There may be others).
Since this involves the physical fragmentation of the data in the Oracle database, you will almost certainly need to get the DBA involved to troubleshoot this further. Your Informatica workflow likely isn't going anywhere until this is corrected in the database.
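As a hedged sketch of what the DBA might look at (the object names are placeholders, and the right remediation depends on the table):
-- Decode the data block address reported in ORA-28604 (first value in the message):
SELECT DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(172073921)  AS file_id,
       DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(172073921) AS block_id
FROM dual;
-- One common way to defragment the table, then repair its indexes
-- (a MOVE leaves existing indexes UNUSABLE until they are rebuilt):
ALTER TABLE dimdw.some_fact_table MOVE;
ALTER INDEX dimdw.some_fact_bix REBUILD;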

Insufficient memory error in proc sort

My data is stored in an Oracle table, MY_DATA. The table contains only 2 rows with 7 columns. But when I execute this step:
proc sort data=oraclelib.MY_DATA nodupkey out=SORTED_DATA;
by client_number;
run;
the following error appears:
ERROR: The SAS System stopped processing this step because of insufficient memory.
If I comment out the nodupkey option, the error disappears. If I copy the dataset into the work library and run proc sort on it, everything is OK too.
My memory options:
SORTSIZE=1073741824
SUMSIZE=0
MAXMEMQUERY=268435456
LOADMEMSIZE=0
MEMSIZE=31565617920
REALMEMSIZE=0
What can be the root of the problem and how can I fix it?
My Oracle password was in its grace period, and once I changed it the issue disappeared.
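For anyone hitting the same thing, a hedged way to confirm the account is in its grace period (the username and password below are placeholders):
-- An account in its password grace period shows EXPIRED(GRACE):
SELECT username, account_status, expiry_date
FROM   dba_users
WHERE  username = 'MY_SAS_USER';
-- Changing the password clears the state:
ALTER USER my_sas_user IDENTIFIED BY "NewPassword#123";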

Hive out of memory even with two rows

I tested Hive with the following queries:
create table test (key string, value string) stored as orc;
insert into table test values ('a','a'), ('b','b');
select key, count(*) from test group by key;
And I got the out-of-memory error:
Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
I have searched online, but people usually got this error when working with much bigger files. In my case, the table only has two rows, and my computer has 14 GB of memory.
I have set HADOOP_HEAPSIZE to 1024 in /etc/hadoop/conf/hadoop-env.sh. It does not work.
First I increased tez.runtime.io.sort.mb, but then I got this error instead: tez.runtime.io.sort.mb should be larger than 0 and should be less than the available task memory.
Then I increased hive.tez.java.opts (and some other parameters) as suggested by @Hellmar Becker. That fixed the problem.
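A minimal sketch of the kind of settings involved; the values below are illustrative assumptions, not the exact ones used in this answer:
-- Give the Tez task container more memory and size the JVM heap and sort buffer under it:
set hive.tez.container.size=2048;            -- MB for the Tez task container
set hive.tez.java.opts=-Xmx1640m;            -- roughly 80% of the container size
set tez.runtime.io.sort.mb=512;              -- must stay below the available task memory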
I got the same error while creating the truck table as ORC in this Hadoop Hello World tutorial. You can try to compress the ORC storage using:
CREATE TABLE XXX STORED AS ORC TBLPROPERTIES ("orc.compress.size"="1024");
I hope this helps (for me, it worked).
They also agree that it's an issue in the Sandbox...
https://community.hortonworks.com/questions/34426/failure-to-execute-hive-query-from-lab-2.html#comment-35900
I tried many solutions, but none worked. For the time being I am using this workaround:
CREATE TABLE avg_mileage (truckid STRING,avgmpg BIGINT ) STORED AS ORC;

Can someone explain the ORA-29861 error in plain English and its possible cause?

I have an application implemented in the Grails framework using Hibernate underneath. After it had run for a while, I got an Oracle DB error and resolved it by rebuilding the offending index. I wonder if anyone can propose the possible cause(s) and ways to prevent it from happening.
Caused by:
org.springframework.jdbc.UncategorizedSQLException:
Hibernate operation: Could not execute JDBC batch update;
uncategorized SQLException for SQL [update RSS_ITEM set guid=?,
pubdate=?, link=?, rss_source_id=?, title=?, description=?,
rating_raw=?, rating_tuned=?, date_created=?, date_locked=? where
RSS_ITEM_ID=?]; SQL state [99999]; error code [29861]; ORA-29861:
domain index is marked LOADING/FAILED/UNUSABLE
; nested exception is java.sql.BatchUpdateException:
ORA-29861:
domain index is marked LOADING/FAILED/UNUSABLE
To locate the broken index, use:
select index_name, index_type, status, domidx_status, domidx_opstatus
from user_indexes
where index_type like '%DOMAIN%'
  and (domidx_status <> 'VALID' or domidx_opstatus <> 'VALID');
To rebuild the index, use:
alter index INDEX_NAME rebuild;
Domain indexes are a special type of index. It is possible to build your own using OCI, but the chances are you're using one of the index types offered by Oracle Text. I say this because your table seems to include free text columns.
The most commonly used Text index is the CTXSYS.CONTEXT index type. The point about this index type is that it is not maintained transactionally, so as to minimize the effort involved in indexing large documents. This means that when you insert or update a document in your table, it is not indexed immediately. Instead, a background process, such as a database job, kicks off the index synchronization on a regular basis. The index is unusable while it is being synchronized. If the resync fails for any reason, you will need to drop and recreate the index.
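For illustration, a hedged sketch of a CONTEXT index with a scheduled background sync; the index and column names are hypothetical, modelled on the RSS_ITEM table in the question:
-- Text index that resynchronizes itself roughly every hour instead of per transaction:
CREATE INDEX rss_item_desc_idx ON rss_item (description)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (EVERY "SYSDATE + 1/24")');
-- Or trigger the sync manually / from a job:
EXEC CTX_DDL.SYNC_INDEX('rss_item_desc_idx');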
Is this a regular occurrence for you? If so, you may need to reappraise your application. Perhaps a different sort of index (such as CTXSYS.CTXCAT) might be more appropriate. One thing that strikes me about your error message is that your UPDATE statement touches a lot of columns, including what looks like the primary key. This makes me think you have a single generic update statement that sets every column regardless of whether it has actually changed. That is bad practice with normal indexes; it will kill your application if you are using text indexes.
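To illustrate that last point, a hypothetical comparison (the column names follow the UPDATE in the stack trace):
-- Generic form: rewrites every column, so the text index on description
-- gets resynchronized even when only the rating changed.
UPDATE rss_item
   SET guid = :guid, pubdate = :pubdate, link = :link, title = :title,
       description = :description, rating_raw = :rating_raw
 WHERE rss_item_id = :id;
-- Targeted form: leaves the indexed column untouched.
UPDATE rss_item
   SET rating_raw = :rating_raw
 WHERE rss_item_id = :id;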
http://ora-29861.ora-code.com/
Cause: An attempt has been made to access a domain index that is being built, or is marked FAILED by an unsuccessful DDL, or is marked UNUSABLE by a DDL operation.
Action: Wait if the specified index is marked LOADING. Drop the specified index if it is marked FAILED. Drop or rebuild the specified index if it is marked UNUSABLE.
That should hopefully be enough context. Can you figure out the problem from that?
