How do I search for all the columns/field names starting with "XYZ" in Azure Databricks

I would like to do a broad search across all field/column names that contain "XYZ".
I tried the SQL below, but it gives me an error.
SELECT
table_name
,column_name
FROM information_schema.columns
WHERE column_name like '%account%'
order by table_name, column_name
The error states: "Table or view not found: information_schema.columns; line 4, pos 5"

information_schema.columns is not supported in Databricks SQL. There are no built-in views that return the complete details of all tables along with their columns; there are only SHOW TABLES (a database must be given) and SHOW COLUMNS (a table name must be given).
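For reference, those two commands look like this (a minimal sketch; default and my_table are placeholder names):
SHOW TABLES IN default;
SHOW COLUMNS IN my_table IN default;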
You might have to use PySpark to get the required result. First, use the following code to collect the details of all tables and their respective columns:
from pyspark.sql.functions import lit

# List all tables in the target database
db_tables = spark.sql("SHOW TABLES IN default")

final_df = None
for row in db_tables.collect():
    # DESCRIBE TABLE returns one row per column (col_name, data_type, comment)
    table_cols = spark.sql(f"DESCRIBE TABLE {row.database}.{row.tableName}") \
        .withColumn('database', lit(row.database)) \
        .withColumn('tablename', lit(row.tableName)) \
        .select('database', 'tablename', 'col_name')
    final_df = table_cols if final_df is None else final_df.union(table_cols)

# display(final_df)
final_df.createOrReplaceTempView('req')
The last line above registers a temporary view named req; now apply the following query against it:
%sql
SELECT tablename,col_name FROM req WHERE col_name like '%id%' order by tablename, col_name
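If you need to search every database rather than just default, the same logic can be wrapped in a loop over spark.catalog.listDatabases() — a rough sketch along the same lines, not tested against your workspace:

from pyspark.sql.functions import lit

final_df = None
for db in spark.catalog.listDatabases():
    for row in spark.sql(f"SHOW TABLES IN {db.name}").collect():
        # One DataFrame of (database, tablename, col_name) per table
        cols = spark.sql(f"DESCRIBE TABLE {db.name}.{row.tableName}") \
            .withColumn('database', lit(db.name)) \
            .withColumn('tablename', lit(row.tableName)) \
            .select('database', 'tablename', 'col_name')
        final_df = cols if final_df is None else final_df.union(cols)

final_df.createOrReplaceTempView('req')

(If your workspace is enabled for Unity Catalog, each catalog should expose its own information_schema, in which case a query like the original one against <catalog>.information_schema.columns may work directly.)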

Related

Query to get Unique Indexes having NOT NULL columns - Oracle

Currently I am trying to find all the unique indexes defined on a table whose columns are all NOT NULL, for an Oracle database. What I mean by that is that Oracle allows creating unique indexes on columns that are defined as nullable.
So if my table has two unique indexes, I want to retrieve the particular unique index whose columns all have NOT NULL constraints.
I did come up with this query:
select ind.index_name, ind_col.column_name, ind.index_type, ind.uniqueness
from sys.dba_indexes ind
inner join sys.dba_ind_columns ind_col on ind.owner = ind_col.index_owner and ind.index_name = ind_col.index_name
where ind.owner in ('ISADRM') and ind.table_name in ('TH_RHELOR') and ind.uniqueness IN ('UNIQUE')
The above query gives me all the unique indexes with their associated columns, but I am not sure how to join it with ALL_TAB_COLS, which holds the nullability data for all of a table's columns.
I tried joining that view with the indexes, and tried a subquery as well, but I am not getting appropriate results.
Any suggestions would be appreciated.
Analytic functions and inline views can help.
The analytic functions let you return detailed data but also create a summary on that data, based on separate windows. The detailed results include index owner, index name, and column name, but the counts are only per index owner and index name.
The first inline view joins the three tables, returns the detailed information, and has analytic functions to generate the count of all columns and the count of all nullable columns. The second inline view only selects rows where those two counts are equal.
--Unique indexes and columns where every column is NOT NULL.
select owner, index_name, column_name
from
(
    --All relevant columns and counts of columns and not null columns.
    select
        dba_indexes.owner,
        dba_indexes.index_name,
        dba_tab_columns.column_name,
        dba_tab_columns.nullable,
        count(*) over (partition by dba_indexes.owner, dba_indexes.index_name) total_columns,
        sum(case when nullable = 'N' then 1 else 0 end)
            over (partition by dba_indexes.owner, dba_indexes.index_name) total_not_null_columns
    from dba_indexes
    join dba_ind_columns
        on dba_indexes.owner = dba_ind_columns.index_owner
        and dba_indexes.index_name = dba_ind_columns.index_name
    join dba_tab_columns
        on dba_ind_columns.table_name = dba_tab_columns.table_name
        and dba_ind_columns.column_name = dba_tab_columns.column_name
    where dba_indexes.owner = user
        and dba_indexes.uniqueness = 'UNIQUE'
    order by 1,2,3
)
where total_columns = total_not_null_columns
order by 1,2,3;
Analytic functions and inline views are tricky but they're very powerful once you learn how to use them.
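If the analytic functions feel opaque, roughly the same result can be had with a plain GROUP BY and a HAVING filter on the two counts (a sketch along the same lines, untested; it returns only the index names, which you can then join back to dba_ind_columns for the column list):

--Unique indexes where every indexed column is NOT NULL (index names only).
select dba_indexes.owner, dba_indexes.index_name
from dba_indexes
join dba_ind_columns
  on dba_indexes.owner = dba_ind_columns.index_owner
  and dba_indexes.index_name = dba_ind_columns.index_name
join dba_tab_columns
  on dba_ind_columns.table_name = dba_tab_columns.table_name
  and dba_ind_columns.column_name = dba_tab_columns.column_name
where dba_indexes.owner = user
  and dba_indexes.uniqueness = 'UNIQUE'
group by dba_indexes.owner, dba_indexes.index_name
having count(*) = sum(case when dba_tab_columns.nullable = 'N' then 1 else 0 end)
order by 1, 2;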

Updating rows with values from another table in ClickHouse

I have two tables, one with data about counties and another with data about states. Different states can sometimes have counties with the same exact name, so I am trying to populate a unique_name column in my counties table that is the concatenation of a county name and the abbreviation of the state where that county is located (e.g.: Honolulu County, HI).
I have come up with the following query:
ALTER TABLE counties
UPDATE unique_name =
(
SELECT concat(counties.name, ', ', states.name_abbr)
FROM counties
INNER JOIN states
ON counties.statefp = states.statefp
) WHERE unique_name = ''
However, I keep getting the following error:
DB::Exception: Unknown identifier: states.statefp, context: required_names: 'states.statefp' source_tables: table_aliases: private_aliases: column_aliases: public_columns: masked_columns: array_join_columns: source_columns: .
The inner query works perfectly fine on its own, but I don't know why this error comes up when I try to do the update. Any ideas?
ClickHouse does not support dependent joins for ALTER TABLE UPDATE. Fortunately, there is a workaround. You have to create a special Join engine table for the update. Something like this:
CREATE TABLE states_join AS states ENGINE = Join(ANY, LEFT, statefp);
INSERT INTO states_join SELECT * FROM states;
ALTER TABLE counties
UPDATE unique_name = concat(name, ', ', joinGet('states_join', 'name_abbr', statefp))
WHERE unique_name = '';
DROP TABLE states_join;
Note that this only works on 19.x versions.
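Before running the mutation it may be worth sanity-checking the lookup with a plain SELECT (a small sketch using the same table names as above):

SELECT name,
       concat(name, ', ', joinGet('states_join', 'name_abbr', statefp)) AS unique_name
FROM counties
LIMIT 10;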

Is there a way to list tables and columns used in a ORACLE query? [duplicate]

I have a huge Oracle query (~20k lines) and I need to list all the tables and columns used in that query. I've googled and found a few SQL parser tools and plugins, but they didn't work out for me.
I found the Perl library below, but it does not process TO_DATE and TO_CHAR columns; it throws an error.
http://search.cpan.org/~rehsack/SQL-Statement-1.412/lib/SQL/Parser.pm
Is there any other way in which I can list all the tables and columns used in the query?
Not sure if this is the answer but you can create a VIEW using your query.
CREATE VIEW your_view_name
AS SELECT * FROM your_table_or_complex_query;
Then list the tables using the SYS.USER_DEPENDENCIES view.
select *
from SYS.USER_DEPENDENCIES
where type = 'VIEW'
AND REFERENCED_TYPE IN ('VIEW', 'TABLE', 'SYNONYM')
AND name = '<view_name>';
To list the columns, you can use the query below. (The problem with this query: if a column, for example COLUMN1, exists in three tables TAB1, TAB2, and TAB3 but in your query it only came from TAB1, all three tables will still be shown.)
SELECT a.referenced_name, b.column_name
FROM SYS.USER_DEPENDENCIES A, USER_TAB_COLS B,
(SELECT dbms_metadata.get_ddl('VIEW','<view_name>','<schema_name>') ddl_text FROM DUAL) c
WHERE TYPE = 'VIEW'
AND REFERENCED_TYPE IN ('VIEW', 'TABLE', 'SYNONYM')
AND name = '<view_name>'
AND b.table_name = a.name
AND INSTR(lower(c.ddl_text), lower(b.column_name)) > 0
ORDER BY referenced_name;
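Two practical notes: the data dictionary stores unquoted identifiers in upper case, so pass the view name in upper case (or wrap it in UPPER), and remember to drop the scratch view when you are done. A small sketch with the same placeholder name:

SELECT referenced_name
FROM sys.user_dependencies
WHERE type = 'VIEW'
  AND referenced_type IN ('VIEW', 'TABLE', 'SYNONYM')
  AND name = UPPER('your_view_name');

DROP VIEW your_view_name;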

Hive LATERAL VIEW and WHERE Clause using Sub query

I'm looking for a way to optimize my query.
We have a table of events called lea, with a column app_properties containing tags stored as a comma-separated string.
I would like to select all the events that match the result of a query that select the desired tags.
My first try:
SELECT uuid, app_properties, tag
FROM events
LATERAL VIEW explode(split(app_properties, '(, |,)')) tag_table AS tag
WHERE tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage')
But Hive will not allow this...
FAILED: SemanticException [Error 10249]: Line 4:6 Unsupported SubQuery Expression 'tag': Correlating expression cannot contain unqualified column references.
I gave it another try by replacing WHERE tag IN with WHERE tag_table.tag IN, but no luck...
FAILED: SemanticException Line 4:6 Invalid table alias tag_table' in definition of SubQuery sq_1 [tag_table.tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage')] used as sq_1 at Line 4:20.
In the end, the query below gives the desired result, but I have a feeling that this is not the most optimized way of solving this use case. Has anyone run into the same use case where you need to select from a LATERAL VIEW using a subquery?
SELECT to_date(substring(events.time, 0, 10)) as date, t2.code, t2.indicator, count(1) as total
FROM events
LEFT JOIN (
    SELECT distinct t.uuid, im.code, im.indicator
    FROM mapping im
    RIGHT JOIN (
        SELECT tag, uuid
        FROM events
        LATERAL VIEW explode(split(app_properties, '(, |,)')) tag_table AS tag
    ) t
    ON im.source_value = t.tag AND im.indicator = 'Bandwidth Usage'
    WHERE im.source_value IS NOT NULL
) t2 ON (events.uuid = t2.uuid)
WHERE t2.code IS NOT NULL
GROUP BY to_date(substring(events.time, 0, 10)), t2.code, t2.indicator;
A Hive subquery in the WHERE clause can be used with IN, NOT IN, EXISTS, or NOT EXISTS. If an alias is not specified before the columns in the WHERE condition, Hive reports the error "Correlating expression cannot contain unqualified column references." This is a limitation of Hive subqueries.
From Apache Hive Essentials.
I guess this problem is also caused by the subquery: events should have an alias.
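Following that advice, an aliased version of the first attempt would look like the sketch below (untested; whether it is accepted still depends on your Hive version's subquery support, so the join-based rewrite above remains the safe fallback):

SELECT e.uuid, e.app_properties, t.tag
FROM events e
LATERAL VIEW explode(split(e.app_properties, '(, |,)')) t AS tag
WHERE t.tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage');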

Finding sequences and triggers associated with an Oracle table

I have used this query to fetch the list of sequences belonging to an Oracle database user:
SELECT * FROM all_sequences x,all_tables B
WHERE x.sequence_owner=B.owner AND B.TABLE_NAME='my_table';
But that database user has many other sequences as well, so the query returns all of the user's sequences. Can anybody help me find the particular sequence for my_table, so that I can get the auto-increment ID in my application?
I want a query that fetches the list of my database user's tables along with the sequences and triggers each table uses.
You can get the triggers associated with your tables from the user_triggers view. You can then look for any dependencies recorded for those triggers in user_dependencies, which may include objects other than sequences (packages etc.), so joining those dependencies to the user_sequences view will only show you the ones you are interested in.
Something like this, assuming you are looking at your own schema, and you're only interested in triggers that reference sequences (which aren't necessarily doing 'auto increment', but are likely to be):
select tabs.table_name,
       trigs.trigger_name,
       seqs.sequence_name
from user_tables tabs
join user_triggers trigs
  on trigs.table_name = tabs.table_name
join user_dependencies deps
  on deps.name = trigs.trigger_name
join user_sequences seqs
  on seqs.sequence_name = deps.referenced_name;
SQL Fiddle demo.
If you're actually looking at a different schema then you'll need to use all_tables etc. and filter and join on the owner column for the user you're looking for. And if you want to include tables which don't have triggers, or triggers which don't refer to sequences, you can use outer joins (see the sketch below).
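A sketch of that outer-join variant for your own schema (untested; tables without triggers, and triggers that don't touch a sequence, come back with NULLs; the referenced_type filter keeps package and other non-sequence dependencies from multiplying rows):

select tabs.table_name,
       trigs.trigger_name,
       seqs.sequence_name
from user_tables tabs
left join user_triggers trigs
  on trigs.table_name = tabs.table_name
left join user_dependencies deps
  on deps.name = trigs.trigger_name
  and deps.referenced_type = 'SEQUENCE'
left join user_sequences seqs
  on seqs.sequence_name = deps.referenced_name;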
Version looking for a different schema, though this assumes you have the privs necessary to access the data dictionary information - that the tables etc. are visible to you, which they may not be:
select tabs.table_name,
       trigs.trigger_name,
       seqs.sequence_name
from all_tables tabs
join all_triggers trigs
  on trigs.table_owner = tabs.owner
  and trigs.table_name = tabs.table_name
join all_dependencies deps
  on deps.owner = trigs.owner
  and deps.name = trigs.trigger_name
join all_sequences seqs
  on seqs.sequence_owner = deps.referenced_owner
  and seqs.sequence_name = deps.referenced_name
where tabs.owner = '<owner>';
If that can't see them then you might need to look at the DBA views, again if you have sufficient privs:
select tabs.table_name,
       trigs.trigger_name,
       seqs.sequence_name
from dba_tables tabs
join dba_triggers trigs
  on trigs.table_owner = tabs.owner
  and trigs.table_name = tabs.table_name
join dba_dependencies deps
  on deps.owner = trigs.owner
  and deps.name = trigs.trigger_name
join dba_sequences seqs
  on seqs.sequence_owner = deps.referenced_owner
  and seqs.sequence_name = deps.referenced_name
where tabs.owner = '<owner>';
One way would be to run these queries to check whether any sequence pseudocolumns (NEXTVAL and CURRVAL) are used in your functions, procedures, packages, triggers, or PL/SQL/Java source.
select * from user_source where UPPER(TEXT) LIKE '%NEXTVAL%';
select * from all_source where UPPER(TEXT) LIKE '%NEXTVAL%';
Then go to the specific Procedure, Function or Trigger to check which column/table gets populated by a sequence.
The query could also be used with '%CURRVAL%'
This might not help if you are running inserts from JDBC or other external applications using a sequence.
Oracle 12c introduced IDENTITY columns, with which you can create a table whose column value is generated by default.
CREATE TABLE t1 (c1 NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
c2 VARCHAR2(10));
This will internally create a sequence that auto-generates the value for the table's column. So, if you would like to know which sequence generates the value for which table, you may query ALL_TAB_COLUMNS:
SELECT data_default AS sequence_val
,table_name
,column_name
FROM all_tab_columns
WHERE OWNER = 'HR'
AND identity_column = 'YES';
SEQUENCE_VAL                 |TABLE_NAME |COLUMN_NAME
-----------------------------|-----------|-----------
"HR"."ISEQ$$_78160".nextval  |T1         |C1
I found a solution to this problem: guess the sequence belonging to a particular table.
select *
from SYS.ALL_SEQUENCES
where SEQUENCE_OWNER = 'OWNER_NAME'
  and LAST_NUMBER between (select max(FIELD_NAME) from TABLE_NAME)
                      and (select max(FIELD_NAME) + 40 from TABLE_NAME);
This query guesses by searching for a sequence whose LAST_NUMBER lies between the maximum value of the field populated by the sequence and that maximum plus 40 (in my case the cache value is 20, so I used 40).
For identity columns (12c and later), the backing sequence can also be looked up directly:
select SEQUENCE_NAME from sys.ALL_TAB_IDENTITY_COLS where owner = 'SCHEMA_NAME' and table_name = 'TABLE_NAME';
