oozie-db > why sa.wf_jobs contains only a few rows - derby

I am trying to access Oozie's provided Derby database with ij. According to my understanding, all the workflow info is stored in the table SA.WF_JOBS, but when I try to select all rows from SA.WF_JOBS there are only a few rows, even though thousands of workflows have already been generated. Does anyone have any idea why that is?

Related

Creating Hive View - Turn off metadata lookup from Hive Metastore

Is it possible to create a Hive view on top of a nonexistent Hive table or view? This ability would help us deploy the Hive DDL in no particular order at the time of a refresh (migrating tables or views from one environment to another). In our environment we have views built on top of other views. If we deploy them in an arbitrary order, with the default setup some of the views may fail saying the underlying table/view doesn't exist. We are looking to see if we can turn off the metadata lookup from the Hive metastore so that the type checks are not done at view creation time. They can be enforced after the deployment, or when the view is queried for data, because by that time all the views/tables will be completely deployed and there won't be any type-checking errors.
I checked on the internet for pointers but I couldn't find any. Any suggestions in this regard will be helpful to us.
Thanks in advance.
Add IF NOT EXISTS to all create statements and run them all several times until the errors disappear.
If the statements are executed twice in the wrong order, like this, the second run will succeed without any error:
drop view if exists my_view;
create view if not exists my_view as select * from table1; --fails on the first run, succeeds on the second
drop table if exists table1;
create table if not exists table1(id int);

Oracle Data Integrator- ODI 12.2.1--Loadplan Issue no of records count issue

I have come across a scenario in my project. I am loading data from a file to a table using ODI, and I am running my interfaces through a load plan. I have 1000 records in my source file and I am also getting 1000 records in the target, but when I check the ODI load plan execution log it shows the number of inserts as 2000. Can anyone please help, or is this an ODI bug?
The number of inserts does not only count the inserts into the target table but also all the inserts happening in temporary tables. Depending on the knowledge modules (KMs) used in an interface, ODI might load data into a C$_ table (LKM) or an I$_ table (IKM/CKM). The rows loaded into these tables are also counted.
You can look at the code generated in the Operator to check whether your KMs use these temporary tables. You can also simulate an execution to see the code that would be generated.
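As an illustration, the generated steps often look something like the following sketch (the table and column names here are purely hypothetical):

-- LKM step: stage the 1000 source rows into a C$_ work table
INSERT INTO C$_0CUSTOMER (CUST_ID, CUST_NAME)
SELECT CUST_ID, CUST_NAME FROM FILE_SOURCE;   -- counted as 1000 inserts

-- IKM step: load the same 1000 rows from the work table into the target
INSERT INTO TARGET_CUSTOMER (CUST_ID, CUST_NAME)
SELECT CUST_ID, CUST_NAME FROM C$_0CUSTOMER;  -- counted as another 1000 inserts

-- so the execution log reports 2000 inserts for 1000 target rows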

Oracle Table Queried or Modified Date

I’ve been tasked with doing some housekeeping on an Oracle schema I have access to. In a nutshell, I’d like to drop any tables that have not been ‘used’ in the last 3 months (tables that haven’t been queried or had data manipulated in the last 3 months). I have read/write access to the schema but I’m not a DBA; I run relatively basic DML/DDL queries in Oracle.
I’m trying to figure out if there’s a way for me to identify old/redundant tables; here’s what I’ve tried (mostly unsuccessfully)
USER_TABLES was my first port of call, but the LAST_ANALYZED date in this table doesn’t seem to be the last modified/queried date I’m looking for
Googling has brought the DBA_HIST tables to my attention; I've tried querying some of these (e.g. DBA_HIST_SYSSTAT) but I'm confronted with ORA-00942: table or view does not exist
I’ve also tried querying V$SESSION_WAIT, V$ACTIVE_SESSION_HISTORY and V$SEGMENT_STATISTICS, but I get the same ORA-00942 error
I’d be grateful for any advice about whether the options above actually offer the sort of information I need about tables, and if so what I can do to work around the errors I’m getting. Alternatively, are there any other options that I could explore?
Probably the easiest thing to do, to be 100% sure, is to enable auditing on the Oracle tables that you're interested in (possibly all of them). Once enabled, Oracle has an audit table (dba_audit_trail) that you can query to find out whether the table(s) have been accessed. You can enable auditing by issuing: AUDIT <operation> ON <schema>.<table> BY SESSION;
I chose "by session" so that you only get a single record per session, no matter how many times the session performs the operation (to minimize the records in the audit table).
Example:
audit select on bob.inventory by session;
Then you can query the dba_audit_trail after some time passes to see if any records show up for that table.
You can disable auditing by issuing the "noaudit" command.
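For example, assuming the bob.inventory audit above and a database audit trail, you could check for activity and later switch the audit off like this:

select username, obj_name, action_name, timestamp
from dba_audit_trail
where owner = 'BOB' and obj_name = 'INVENTORY'
order by timestamp;

noaudit select on bob.inventory;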
Hope that helps.
-Jim

Why do we need to move an external table to a managed Hive table?

I am new to Hadoop and learning Hive.
In Hadoop: The Definitive Guide, 3rd edition, page 428, last paragraph,
I don't understand the following paragraph regarding external tables in Hive.
"A common pattern is to use an external table to access an initial dataset stored in HDFS (created by another process), then use a Hive transform to move the data into a managed Hive table."
Can anybody briefly explain what the above passage says?
Usually the data in the initial dataset is not constructed in the optimal way for queries.
You may want to modify the data (modify some columns, add columns, do aggregations, etc.) and store it in a specific way (partitions / buckets / sorted, etc.) so that the queries benefit from these optimizations.
The key difference between external and managed table in Hive is that data in the external table is not managed by Hive.
When you create an external table you define an HDFS directory for that table, and Hive simply "looks" into it and can read data from it, but Hive can't delete or change the data in that folder. When you drop an external table, Hive only deletes the metadata from its metastore and the data in HDFS remains unchanged.
A managed table is basically a directory in HDFS that is created and managed by Hive. Moreover, all operations that remove or change partitions, raw data or the table itself MUST be done through Hive, otherwise the metadata in the Hive metastore may become incorrect (e.g. you manually delete a partition from HDFS but the Hive metastore still contains the info that the partition exists).
In Hadoop: The Definitive Guide, I think the author meant that it is a common practice to write an MR job that produces some raw data and keeps it in some folder. Then you create a Hive external table which looks into that folder, and you can safely run queries without the risk of dropping the table, etc.
In other words, you can write an MR job that produces some generic data and then use a Hive external table as a source of data for inserts into managed tables. This helps you avoid writing lots of similar boring MR jobs and delegate this task to Hive queries: you create a query that takes data from the external table, aggregates/processes it however you want and puts the result into managed tables.
Another use of an external table is as a source for data coming from remote servers, e.g. in CSV format.
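A minimal HiveQL sketch of the external-to-managed pattern described above (the table names, columns and HDFS path are made up for illustration):

-- external table over the directory written by the other process
CREATE EXTERNAL TABLE raw_events (user_id BIGINT, amount DOUBLE, event_time STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/incoming/events';

-- managed table with a query-friendly layout
CREATE TABLE events_summary (user_id BIGINT, total_amount DOUBLE)
STORED AS ORC;

-- "move" the data: transform it from the external table into the managed one
INSERT OVERWRITE TABLE events_summary
SELECT user_id, SUM(amount)
FROM raw_events
GROUP BY user_id;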
There is no reason to move table to managed unless you are going to enable ACID or other features supported only for managed tables.
The list of differences in features supported by managed/external tables may change in the future, so it is better to check the current documentation. Currently these features are:
ARCHIVE/UNARCHIVE/TRUNCATE/MERGE/CONCATENATE only work for managed tables
DROP deletes data for managed tables while it only deletes metadata for external ones
ACID/Transactional only works for managed tables
Query Results Caching only works for managed tables
Only the RELY constraint is allowed on external tables
Some Materialized View features only work on managed tables
You can create both EXTERNAL and MANAGED tables on top of the same location, see this answer with more details and tests: https://stackoverflow.com/a/54038932/2700344
Data structure has nothing to do with the external/managed table type. If you want to change the structure, you do not necessarily need to change the table between managed and external.
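To illustrate the same-location point from the list above, here is a small sketch (names and path are hypothetical):

CREATE EXTERNAL TABLE clicks_ext (id BIGINT, url STRING)
STORED AS TEXTFILE
LOCATION '/data/clicks';

CREATE TABLE clicks_managed (id BIGINT, url STRING)
STORED AS TEXTFILE
LOCATION '/data/clicks';

-- DROP TABLE clicks_ext removes only the metadata,
-- while DROP TABLE clicks_managed would also delete the files under /data/clicks.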
It is also mentioned in the book:
When your table is an external table, you can use other technologies like Pig, Cascading or MapReduce to process it. You can also use multiple schemas for that dataset, and you can create the data lazily if it is an external table.
When you decide that the dataset should be used only by Hive, make it a Hive managed table.

Attempting to use SQL-Developer to analyze a system table dump created with 'exp'

I'm attempting to recover the data from a specific table that exists in a system table dump I performed earlier. I would like to append the rows existing in the dump to any rows that may exist in the active table. The problem is that the name of the table in the dump is likely not the same as what exists in the database currently (they're dynamically created with a prefix of ARC_TREND_). In addition, I don't know the name of the table as it exists in the dump; I was hoping to use SQL Developer to analyze the dump file, as I can recognize the correct table by its columns and its existing rows.
While I'm going on blind faith that SQL Developer can work with my dump file, when attempting to open it I get a Java heap OutOfMemory exception. I've adjusted the maximum heap size from 640m to 1024m in both sqldeveloper.bat and sqldeveloper.conf, but to no avail.
Can someone recommend a course of action to recover the data from a table which exists in an exp-created dump file? A graphical tool would be nice, but I'm no stranger to the command line. I need to analyze the tables that exist in the dump in order to pick the correct one out. Then I assume I can use imp with the TABLES= parameter to bring it back into the active instance. It likely won't match the existing table name, so I will use SQL Developer to copy the rows from the imported table to the table where I need them to be.
The dump was taken from a Linux server running 10g, and will be imported to (the same server & database instance, upgraded) an 11g instance of the same database.
Thanks
Since you're referring to imp rather than impdp, I assume this wasn't exported with data pump. Either way, I doubt you'll get anything useful through SQL Developer.
Fortunately most of what you're trying to do is quite easy from the command line; just run imp with the INDEXFILE parameter, which will give you a text file containing all the table (commented out with REM) and index creation commands. From that you should be able to spot the table from its column names.
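For example, something along these lines should produce that listing (credentials and file names are placeholders):

imp system/password file=system_dump.dmp full=y indexfile=dump_ddl.sql

You can then search dump_ddl.sql for the ARC_TREND_ prefix to find the candidate tables.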
You can't really see any row data though, so if there's more than one possible match you might need to import several tables and inspect the data in them in the database to see which one you really want.
