Hive: enable DESCRIBE statement on restricted tables (Hadoop)

When using a big-data tool like Hive, a select * from mytable usually works. However, for a user who is only allowed to view specific columns, the DESCRIBE TABLE statement, and with it the integration of tools like Tableau, is broken: these can no longer list all (or all allowed) columns because DESCRIBE TABLE is no longer possible, i.e. it is denied in Ranger.
Is there a workaround to re-enable the DESCRIBE statement?
Currently, the only workaround I see is manually creating a masked view.

It turns out that masking the tables in Ranger is sufficient to restrict access while retaining the ability to execute the DESCRIBE statement.
Currently, this is a bit tedious as all columns need to be specified manually; Atlas and tag-based policies would probably be a more efficient solution.
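For reference, the manual masked-view workaround might look like the sketch below. The table name, column names, and the choice of Hive's mask_hash() function for the sensitive column are assumptions for illustration, not taken from the question.
CREATE VIEW mytable_masked AS
SELECT customer_id,                -- allowed column, exposed as-is
       order_date,                 -- allowed column, exposed as-is
       mask_hash(email) AS email   -- sensitive column, replaced by its hash
FROM mytable;
-- Restricted users (and tools like Tableau) then query and describe the view:
-- DESCRIBE mytable_masked;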

Related

How to write a generalized trigger for a set of tables?

My aim is to create triggers on a few tables that fire upon update or deletion of entries in these tables.
The trigger should write the name and columns of the corresponding updated/deleted table to another user table.
Instead of writing individual triggers for each table, is it possible to write a single trigger?
Instead of writing individual triggers for each table is it possible to write a single trigger?
[TL;DR] No
The CREATE TRIGGER syntax is defined in the Oracle documentation by a set of syntax diagrams (create_trigger, plsql_trigger_source, simple_dml_trigger and dml_event_clause).
As you can see from those diagrams, a CREATE TRIGGER for simple DML will be in the format:
CREATE TRIGGER trigger_name
AFTER UPDATE OR DELETE ON table_name
The syntax requires a single table/view identifier to be specified for each trigger.
As @MT0 points out, a trigger is tied to a single table, so you'd need to create separate triggers for each table.
On the other hand, if you really want to do this, you can write code that dynamically generates the triggers you want for a number of different tables. This is generally a lot more work initially but if you really want to create a bunch of different triggers with the same basic logic, that initial upfront investment may be reasonable.
You could, for example, write a bit of dynamic SQL that creates the triggers you want based on the data dictionary. See this fiddle for an example. I haven't spent any time on niceties like making the generated trigger code particularly easy for a human to read, or on fleshing out the requirements for what you actually want to write to the log table. For example, most people would want to check whether the :new value of a column differs from the :old value to see whether it was actually changed, rather than relying on the UPDATING function. I'd also guess that your actual requirements involve writing the actual values that were updated or deleted to the log table, which gets complicated depending on things like the data types you support and the structure of your log table.
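As a rough sketch of that approach (not the fiddle itself), the following anonymous block generates one trigger per listed table; the table list, the trigger naming convention, and the audit_log table are assumptions for illustration:
BEGIN
  FOR t IN (SELECT table_name
              FROM user_tables
             WHERE table_name IN ('EMP', 'DEPT'))  -- tables to track (assumed)
  LOOP
    EXECUTE IMMEDIATE
      'CREATE OR REPLACE TRIGGER trg_' || t.table_name || '_audit'
      || ' AFTER UPDATE OR DELETE ON ' || t.table_name
      || ' FOR EACH ROW'
      || ' BEGIN'
      || '   INSERT INTO audit_log (table_name, change_time, action)'
      || '   VALUES (''' || t.table_name || ''', SYSDATE,'
      || '           CASE WHEN UPDATING THEN ''UPDATE'' ELSE ''DELETE'' END);'
      || ' END;';
  END LOOP;
END;
/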
Personally, I always look a bit askance at this sort of requirement. Oracle has lots of built-in functionality that make repetitive triggers less than ideal. For example, I'd much rather enable flashback data archive for whatever tables you want to track than to deal with the overhead of a bunch of triggers.
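For comparison, enabling flashback data archive is only a couple of statements; the archive name, tablespace, quota, retention and table name below are assumptions for illustration:
CREATE FLASHBACK ARCHIVE audit_fda TABLESPACE users QUOTA 1G RETENTION 1 YEAR;
ALTER TABLE emp FLASHBACK ARCHIVE audit_fda;
-- Historical row versions can then be queried without any custom triggers:
SELECT * FROM emp
  VERSIONS BETWEEN TIMESTAMP SYSTIMESTAMP - INTERVAL '1' HOUR AND SYSTIMESTAMP;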

Invoking a SELECT script through ODI (Oracle Data Integrator)

May I have your opinion on the queries below, please?
Option 1:
I have a SELECT script handy that fetches data by joining many source tables and performs transformations such as aggregations (GROUP BY), data conversions, sub-strings, etc.
Can I invoke this script through an ODI mapping so that the results (the transformed data) are inserted into the target of the ODI mapping?
Option 2:
Convert the SELECT script into an equivalent ODI mapping by using equivalent ODI transformations, functions, lookups, etc., and use the various tables (the tables in the join clause) as sources of the mapping.
Basically, develop an ODI mapping that is equivalent to the provided SELECT script, plus a target table to insert records into.
I need to know the pros and cons of both options above (if option 1 is possible).
Is it still possible to track transformation errors, source-table join errors, WHERE-clause condition errors, etc. through ODI with option 1?
Will the log file for a mapping failure have details at as granular a level as offered by option 2?
Can I still enable Flow Control in the Knowledge Module and redirect SELECT script errors into the E$_ error tables provided by ODI?
Thanks,
Rajneesh
Option 1: ODI 12c includes that concept out of the box. On the physical tab of a mapping, click on the source node (datastore). Then, in the properties pane, there is the CUSTOM_TEMPLATE option under the "Extract Options" menu. This allows you to enter a custom SQL statement that will be used instead of the code generated by ODI.
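For illustration only, the statement pasted into CUSTOM_TEMPLATE would simply be the existing SELECT, for example something of this shape (table and column names are assumptions, not taken from the question):
SELECT o.customer_id,
       SUBSTR(c.customer_name, 1, 30) AS customer_name,   -- sub-string transformation
       SUM(o.amount)                  AS total_amount     -- aggregation
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
 GROUP BY o.customer_id, SUBSTR(c.customer_name, 1, 30)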
However, it is probably less maintainable over time than option 2. SQL is less visual than mapping components. Also, if you need to bulk-change it, it will be trickier: changing a component in several mappings can be done with the SDK, whereas changing SQL code would require parsing it. You might indeed have less information in your operator logs, as the SQL would be seen as just one block of code. It also wouldn't provide any lineage.
I believe using Flow Control would work but I haven't tested it.
Option 2 would take more time to complete but with that you would benefit from all the functionalities of ODI.
My own preference would be to occasionally use option 1 for really complex SQL queries but to use option 2 for most of the normal use cases.

Dynamically list the contents of a database table that continuously updates

It's kind of a real-world problem, and I believe a solution exists but I couldn't find one.
We have a database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on, updated continuously, every second, whenever a new transaction happens. For the time being, we have replicated the master database Transactions to a new database named TRN, on which we do all the querying and updating.
We want a sort of monitoring system (like the htop process viewer in Linux) for the database that dynamically lists the updated rows in its tables at any time.
TL;DR Is there any way to get a continuously updating list of rows in any table in the database?
Currently we are working with Sybase and Oracle DBMSs on a Linux (Ubuntu) platform, but we would like to receive generic answers that apply to most platforms and DBMSs (including MySQL), plus any tools, utilities or scripts that can do this, so it will be easy for us to migrate to other platforms and/or DBMSs in the future.
To list updated rows, you conceptually need one of two things:
The updating statement's effect on the table.
A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The 1st option allows you to list updates with statement granularity while the 2nd is more suitable for time-based granularity.
Some options off the top of my head:
Write to a temporary table
Add a field with a transaction id/timestamp (see the sketch below)
Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
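As a minimal sketch of the "field with transaction id/timestamp" option, assuming an Oracle table named POSITIONS (all names are illustrative):
ALTER TABLE positions ADD (last_modified TIMESTAMP DEFAULT SYSTIMESTAMP);
CREATE OR REPLACE TRIGGER trg_positions_touch
  BEFORE INSERT OR UPDATE ON positions
  FOR EACH ROW
BEGIN
  :NEW.last_modified := SYSTIMESTAMP;  -- stamp every inserted/updated row
END;
/
-- A monitor can then poll for recently changed rows:
SELECT * FROM positions WHERE last_modified > SYSTIMESTAMP - INTERVAL '10' SECOND;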
Not a lot of details in the question so not sure how much of this will be of use ...
'Sybase' is mentioned but nothing is said about which Sybase RDBMS product (ASE? SQLAnywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (e.g., insert/update/delete) over a given time period (a sample query follows this list)
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands, but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
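A sample MDA query of the kind referred to above might look like the following; it assumes the MDA tables are enabled, and the exact column set of monOpenObjectActivity varies by ASE version:
SELECT DBName, ObjectName, RowsInserted, RowsUpdated, RowsDeleted
  FROM master..monOpenObjectActivity
 WHERE DBName = 'TRN'
 ORDER BY RowsUpdated + RowsDeleted DESC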
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, an additional li$ence is required), which is specifically designed for scenarios like yours, i.e., pulling transactions off the repserver queues and formatting them for use in downstream systems (e.g., tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.

IBM Data Stage - How to find database tables used in jobs

For a project we need to investigate an existing installation of IBM Data Stage, doing a whole lot of ETL in loads of jobs.
The job flow diagrams contain lots of tables being used as a source (both in MSSQL and in Oracle), as well as a target (mostly in Oracle).
My question is now
How can I find all database tables used by all jobs in a certain Data Stage project?
I looked in Tools - Advanced Find, and there I can see all "table definitions". BUT, most of the tables actually used in jobs do not show up there, as they are used inside what Data Stage calls "Parallel Jobs", which in effect run SQL queries against database tables.
I am particularly interested in locating TARGET tables which are being loaded by a job.
So to put it bluntly, I want to be able to answer the question "Which job loads table XY?".
If that is not possible, an automated means of extracting all the SQL statements used by the jobs would be an alternative.
We have access to IBM Websphere Data Stage and Quality Stage Designer 8.1
Exporting the jobs creates a text file that details what each job does. Open the export file in a text editor and you should be able to find SQL inserts with a simple search. Start by searching for SQL keywords like 'INTO' and 'FROM'.
Edit: Alternatively, if every table that was used was defined by importing table definitions, you should be able to find each table definition in the folder for its type. This, however, will not make it apparent where and how the table was used (which job, insert or select from?), so I would recommend the first method of searching the export files.

Forcing Oracle to use Primary Key Index without using Hints

We have an application that generates some temporary tables and then processes the data. I don't really have control over the way the application creates these tables or over the subsequent queries involved. What we have noticed is that Oracle uses a full table scan instead of the index, which is the primary key of the tables. If it used the primary key index, the process would run a whole lot faster.
Since I do not have control over the SELECT queries generated by the application, I cannot use hints to force Oracle to use the primary key index. Is there any other setting I could change somewhere that would force Oracle to use the primary key index for the temporary tables?
The two most common reasons for a query not using indexes are:
It's quicker to do a full table scan.
Poor statistics.
If your queries are selecting all of the table, or doing joins without mentioning a primary key in the WHERE clause, etc., chances are it's quicker to do a full scan. Without the query and indexes, and preferably an explain plan as well, it's impossible to tell for certain.
I would, however, recommend that you ask your DBA to re-gather (or, I suspect, gather for the first time) statistics on the table, using dbms_stats.gather_table_stats with an estimate percentage of 25% or more.
If the tables are re-created each time the application is run, then try to gather statistics after creation and primary key generation. If they are truncated and re-filled each time, then ask your DBA to rebuild them and the PK and then gather statistics, as this could significantly improve query performance.
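A hedged example of the gather call, with placeholder schema and table names:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APP_SCHEMA',     -- owner of the temporary table (assumed)
    tabname          => 'MY_TEMP_TABLE',  -- table name (assumed)
    estimate_percent => 25,
    cascade          => TRUE);            -- also gathers index (PK) statistics
END;
/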
With no control over anything I don't see how you can improve the query time any other way.
You can use hints without changing the SQL by leveraging SQL Profiles. Wrap your hint(s) into a SQL Profile that takes effect for that particular SQL ID.
I understand you don't have control over the SQL; I have many apps where I encounter the same restriction. After checking the query structure and statistics as in Ben's post, and once you have proved that hinting the query to use the index would improve performance, why not try a manually created SQL Profile?
Christian Antognini has a great paper here about SQL Profiles and creating them manually. The paper mentions that creating SQL Profiles manually is undocumented. I would agree it is undocumented, but that doesn't necessarily mean unsupported. I would say there is little documentation out there, but if you want proof that Oracle allows manual creation, check the API or look at the coe_xfr_sql_profile.sql file in the SQLT utility directory.
I also posted a cheatsheet on how to quickly manually create a SQL Profile here.
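As a sketch of the manual approach (along the lines of coe_xfr_sql_profile.sql), the statement text, hint, index and profile names below are placeholders, not taken from the question:
DECLARE
  l_sql_text CLOB := 'SELECT * FROM my_temp_table WHERE id = :1';
BEGIN
  DBMS_SQLTUNE.IMPORT_SQL_PROFILE(
    sql_text    => l_sql_text,
    profile     => sqlprof_attr('INDEX(@"SEL$1" "MY_TEMP_TABLE"@"SEL$1" "MY_TEMP_TABLE_PK")'),
    name        => 'force_pk_profile',
    force_match => TRUE);   -- apply to all literal variations of the statement
END;
/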
