How to join two tables in Oracle on a BLOB column?

How can I join two tables in Oracle on a BLOB column? When this query is executed, the error "SQL command not properly ended" appears:
select name,photo
from tbl1 join tbl2 on tbl1.photo = tbl2.photo

First, it seems very odd to have a design where you store the same BLOB in two different tables, and very odd that you would want to join on an image. That doesn't seem like a sensible design.
You've tagged this for Oracle 8i. That is an ancient version of Oracle that didn't support the SQL 99 join syntax, so you need to do the join in the WHERE clause instead. You also can't directly test two BLOB values for equality, but you can use dbms_lob.compare:
select tbl1.name, tbl1.photo
from tbl1,
     tbl2
where dbms_lob.compare(tbl1.photo, tbl2.photo) = 0
This will be rather hideous from a performance perspective: you'll have to compare every photo from tbl1 against every photo from tbl2, and comparing two LOBs isn't particularly quick. If you are really intent on comparing images, you are probably better off computing a hash, storing that in a separate column that is indexed, and then comparing the hashes rather than comparing the images directly.
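A minimal sketch of that hash approach on a more recent Oracle release (DBMS_CRYPTO did not exist in 8i, and the photo_hash column and index names are hypothetical):

alter table tbl1 add (photo_hash raw(20));
alter table tbl2 add (photo_hash raw(20));

-- SHA-1 of the image bytes; the 20-byte digest fits in RAW(20)
update tbl1 set photo_hash = dbms_crypto.hash(photo, dbms_crypto.hash_sh1);
update tbl2 set photo_hash = dbms_crypto.hash(photo, dbms_crypto.hash_sh1);

create index tbl1_photo_hash_ix on tbl1 (photo_hash);
create index tbl2_photo_hash_ix on tbl2 (photo_hash);

-- The join now compares short, indexed hashes instead of the LOBs themselves
select tbl1.name, tbl1.photo
from tbl1,
     tbl2
where tbl1.photo_hash = tbl2.photo_hash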

The code:
SELECT T1.name, T1.photo
FROM tbl1 T1
INNER JOIN tbl2 T2
    ON T1.photo = T2.photo
If that does not run, you would have to make a few changes to your table structure (see the sketch below):
1. Add a new table named IMAGES with columns (image_id, image_blob).
2. Change tbl1's and tbl2's BLOB columns to image_id.
3. Perform the JOIN on the image_id column.
NOTE: You cannot perform GROUP BY, JOIN (of any kind), or CONCAT operations on the BLOB datatype.
SUGGESTION: save the paths to the images in the database and keep the image files in a directory on the server, as storing images as BLOBs in the database is generally not considered good practice.
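If you do keep the images in the database, a minimal sketch of the restructuring from steps 1-3 above (all table and column names are hypothetical):

create table images (
    image_id   number primary key,
    image_blob blob
);

alter table tbl1 add (image_id number references images (image_id));
alter table tbl2 add (image_id number references images (image_id));

-- After migrating the data, the join becomes a plain numeric join
select t1.name, i.image_blob as photo
from tbl1 t1
join tbl2 t2 on t1.image_id = t2.image_id
join images i on i.image_id = t1.image_id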

Related

Chained CTEs in Redshift - How do I know which DIST KEY the CTE will inherit?

I have a view in Redshift which consists of lots of CTEs that are joined (chained) to each other. Inside these CTEs there are joins between multiple tables. If I then join to a CTE that itself contains a join of multiple tables, where do the SORT KEY and DIST KEY for that join come from? How does Redshift decide which table in the join inside the CTE the CTE should inherit its DIST KEY and SORT KEY from, if at all?
For example, tbl1 has a DIST KEY on tbl_key, tbl2 has a DIST KEY on tbl_id, and tbl3 has a DIST KEY on tbl_key.
First, I create a CTE which is the join of tbl1 and tbl2.
With cte1 as (
    Select tbl1.col1, tbl2.col2, tbl1.tbl_key
    From tbl1
    Join tbl2 on tbl1.job_no = tbl2.job_id )
Second, I create a CTE, chained in the same WITH clause, that joins to the first CTE:
, cte2 as (
    Select cte1.*, tbl3.col3
    From cte1
    Join tbl3 using (tbl_key))
Now my question is: does cte1 have a DIST KEY on tbl1's DIST KEY of tbl_key, or tbl2's DIST KEY of tbl_id? Or both? Or neither?
In Redshift, CTEs are just used to simplify the reading of SQL. They are processed just the same as subqueries, i.e. they are not made physical and therefore do not have their own dist/sort keys.
You could rewrite your code as
Select cte1.*, tbl3.col3
From (Select tbl1.col1, tbl2.col2, tbl1.tbl_key
From tbl1
Join tbl2 on tbl1.job_no = tbl2.job_id
) as cte1
Join tbl3 using (tbl_key)
which can be simplified further as
Select tbl1.col1, tbl2.col2, tbl3.col3
from tbl1
join tbl2 on tbl1.job_no = tbl2.job_id
join tbl3 using (tbl_key)
If you are able to choose your dist/sort keys then you should consider which tables are the biggest and prioritise those accordingly.
For example, if tbl1 and tbl2 are large then it may make sense to have them distributed as you described.
However, if tbl2 and tbl3 are both large, it may make sense to distribute both on tbl_key.
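For illustration, a sketch of how such a choice looks in Redshift DDL (the column types, and tbl2 having a tbl_key column at all, are assumptions):

create table tbl2 (
    job_id  bigint,
    tbl_key bigint,
    col2    varchar(64)
)
distkey (tbl_key)
sortkey (tbl_key);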
When you issue a query Redshift will compile and optimize that query as it sees fit to achieve the best performance and be logically equivalent. Your CTEs look like subqueries to the compile / optimization process and the order in which the joins are performed may have no relation to how you wrote the query.
Redshift makes these optimization choices based on the table metadata that is created / updated by ANALYZE. If you want Redshift to make smart choices on how to join your tables together you will want your table metadata to be up to date. The query plan (including join order and data distribution) is set at query compile, it is not dynamically determined during execution.
One of the choices Redshift makes is how the intermediate data of the query is distributed (your question) but remember that these intermediate results can be for a modified join order. To see what order that Redshift plans to join your tables look at the EXPLAIN plan for the query. The more tables you are joining and the more complex your query, the more choices Redshift has and the less likely it is that the EXPLAIN plan will join in the order you specified. I've worked on clients' queries with dozens of joins and many nested levels of subquery and the EXPLAIN plan is often very different than the original query as written.
So Redshift is trying to make smart choices about the join order and intermediate result distribution. For example, it will usually join small tables to large tables first and keep the distribution of the large table. But here "large" and "small" are based on post-WHERE-clause filtering and the guesses Redshift can make from metadata. The further a join is from the source table metadata (deep in the join tree), the more uncertain Redshift is about what the incoming and outgoing data of the join will look like.
Here the EXPLAIN plan can give you hints about what Redshift is "thinking": if you see a DIST INNER join, Redshift is moving the data of one table (or intermediate result set) to match the other. If you see DIST BOTH, Redshift is redistributing both sets of data to some new distribution (usually one of the join-on columns). It does this to avoid having only one slice with data and all the others with nothing to do, as this would be very inefficient.
To sum up: to see what Redshift is planning to do with your joins, look at the EXPLAIN plan. You can also infer some information about intermediate result distribution from the plan, but it doesn't provide a complete map of what Redshift will do.
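For example, running EXPLAIN on the fully chained query (the DS_* labels in the comment are how these distribution choices actually appear in Redshift plans):

explain
select tbl1.col1, tbl2.col2, tbl3.col3
from tbl1
join tbl2 on tbl1.job_no = tbl2.job_id
join tbl3 using (tbl_key);

-- Look for DS_DIST_NONE, DS_DIST_INNER, DS_DIST_BOTH or DS_BCAST_INNER
-- next to each join step in the output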

How can I merge two tables using ROWID in Oracle?

I know that ROWID is distinct for each row in different tables. But I saw somewhere that two tables were being merged using rowid, so I tried it myself, and I am getting blank output.
I have a person table, in which scrowid is the column that contains the rowid:
alter table ot.person
add scrowid VARCHAR2(200) PRIMARY KEY;
I populated this person table as:
insert into ot.person(id,name,age,scrowid)
select id,name, age,a.rowid from ot.per a;
After this I also created another table, ot.temp_person, by the same steps. Both tables have the same structure and datatypes. So I wanted to compare them using an inner join, which I tried as:
select * from ot.person p inner join ot.temp_person tp ON p.scrowid=tp.scrowid
I got my output as an empty table.
Is there any possible way I can merge two tables using rowid, or have I forgotten some steps? If there is any way to join these two tables using rowid, please suggest it.
Define scrowid with datatype ROWID or UROWID; then it may work.
However, in general the ROWID of a record may change at any time unless you lock it, so it would be a poor key to join your tables on.
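A minimal sketch of that change (assuming the column can simply be rebuilt, and that id is what correlates ot.person back to ot.per):

alter table ot.person drop column scrowid cascade constraints;
alter table ot.person add (scrowid rowid);

update ot.person p
set p.scrowid = (select a.rowid from ot.per a where a.id = p.id);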
I think perhaps you misunderstood the merging of two tables via rowid, unless what you actually saw was a UNION, CROSS JOIN, or FULL OUTER JOIN. Any attempt to match rowids, regardless of how you define them, is doomed to fail. This follows from rowid being an internal structure, not just a data type (that is an older version of the description, but Oracle doesn't link documentation versions). Its fields are basically:
- The data object number of the object
- The data block in the datafile in which the row resides
- The position of the row in the data block (first row is 0)
- The datafile in which the row resides (first file is 1); the file number is relative to the tablespace
So while it's possible for rows in different tables to have the same rowid, it would be extremely unlikely, and thus an inner join on them returns no rows.
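You can inspect those pieces yourself with the DBMS_ROWID package; the data object number alone differs between any two tables, which is why the equality essentially never holds:

select rowid,
       dbms_rowid.rowid_object(rowid)       as data_object_id,
       dbms_rowid.rowid_relative_fno(rowid) as file_no,
       dbms_rowid.rowid_block_number(rowid) as block_no,
       dbms_rowid.rowid_row_number(rowid)   as row_no
from ot.person;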

How to populate columns of a new hive table from multiple existing tables?

I have created a new table in Hive (T1) with columns c1, c2, c3, c4. I want to populate this table by querying other existing tables (T2, T3).
E.g. c1 and c2 come from a query run on T2, and the other columns c3 and c4 come from a query run on T3.
Is this possible in Hive? I have done immense research but am still unable to find a solution.
Didn't something like this work?
create table T1 as
select t2.c1, t2.c2, t3.c3, t3.c4
from (some query against T2) t2
JOIN (some query against T3) t3
Obviously, replace JOIN with whatever join is needed. I assume some join between T2 and T3 is possible, or else you wouldn't be putting their columns alongside each other in T1.
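A sketch with the placeholders filled in (the shared key id is purely hypothetical, and CTAS assumes T1 does not already exist):

create table T1 as
select t2.c1, t2.c2, t3.c3, t3.c4
from (select c1, c2, id from T2) t2
join (select c3, c4, id from T3) t3
  on t2.id = t3.id;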
According to the Hive documentation, you can use the following syntax to insert data:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Be careful that:
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
So, I would make a JOIN between the two existing tables, and then insert only the needed values into the target table, playing around with the SELECT. Or maybe creating a temporary table would allow you to have more control over the data. Just remember to handle the problem with NULL, as stated in the official documentation. This is just an idea; I guess there are other ways to achieve what you need, but it could be a good place to start.
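Following the documented syntax, a sketch that fills the already-created T1 (again assuming a hypothetical join key id):

insert into table T1
select t2.c1, t2.c2, t3.c3, t3.c4
from T2 t2
join T3 t3 on t2.id = t3.id;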

How to use an Oracle Materialized View in a Dimensional Model

I have a dimensional model with a large fact table (millions of rows) which is range partitioned by date and smaller dimensional tables that are not partitioned. I came across materialized views which is often used in these scenarios to improve query performance.
Now, I want to know which way is better of the following two to utilize these materialized views to get aggregated reports.
A. Create one by joining the whole fact table with each of the dimension tables required:
create materialized view my_mview
build immediate
enable query rewrite
as
select fact.col1, dim1.col2, dim2.col3, sum(fact.col4)
from my_fact fact
inner join my_dim1 dim1
    on fact.dim1_key = dim1.dim1_key
inner join my_dim2 dim2
    on fact.dim2_key = dim2.dim2_key
group by fact.col1, dim1.col2, dim2.col3
This seems like the most basic way of using them, but it seems rather limiting: I would require a new materialized view for each variation of the query I want to create.
B. Create it over the aggregation of the fact table and utilize query rewrite when doing a dimensional join back:
create materialized view my_mview
build immediate
enable query rewrite
as
select col1, dim1.dim2_key, dim2.dim_key, sum(fact.col4)
from my_fact fact
And do the join as above in case A, which will use this aggregated materialized view for the join and not the whole fact table.
Can anyone tell me when I would use one case over the other?
Your first example works exactly as you described.
For the second example the query should be:
create materialized view my_mview
build immediate
enable query rewrite
as
select col1, fact.dim2_key, fact.dim_key, sum(fact.col4)
from my_fact fact
group by col1, fact.dim2_key, fact.dim_key
This will automatically speed up aggregates such as:
select sum(fact.col4)
from my_fact fact;

select fact.dim_key, sum(fact.col4)
from my_fact fact
group by fact.dim_key;

select fact.dim2_key, sum(fact.col4)
from my_fact fact
group by fact.dim2_key;
I don't think Oracle will rewrite your first type of query to use this MV automatically, because in the MV the join columns are already grouped (they should also be grouped in your second example); it never happened for us. This may, however, also depend on whether there are relationships defined between the dimension and fact tables, and on the value of the QUERY_REWRITE_INTEGRITY parameter, so there is still some room for testing here.
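If you experiment with this, these are the session parameters that gate query rewrite (the values shown are just illustrative):

alter session set query_rewrite_enabled = true;
alter session set query_rewrite_integrity = trusted;  -- or enforced / stale_tolerated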
You may still get a performance gain by writing the query in a specific way:
WITH preaggr as (
    select col1, fact.dim2_key, fact.dim_key, sum(fact.col4) as col4
    from my_fact fact
    group by col1, fact.dim2_key, fact.dim_key
)
select dim2.col1, sum(preaggr.col4)
from preaggr
join my_dim2 dim2
    on preaggr.dim2_key = dim2.dim2_key
group by dim2.col1

How to select row data as columns in Oracle

I have two tables, shown in the figures below.
I need to select records as in the required-output figure: join the second table on AH_ID, use each ATT_ID as a column header, and take ATT_DTL_STR_VALUE as that column's value.
Required output:
Sounds like you have an Entity-Attribute-Value data model, which relational DBs aren't the best at modeling. You may want to look into a key-value store.
However, as Justin suggested, if you're using 11g you can use the PIVOT clause as follows:
SELECT *
FROM (
SELECT T1.AH_ID, T1.AH_DESCRIPTION, T2.ATT_ID, T2.ATT_DTL_STR_VALUE
FROM T1
LEFT OUTER JOIN T2 ON T1.AH_ID = T2.AH_ID
)
PIVOT (MAX(ATT_DTL_STR_VALUE) FOR (ATT_ID) IN (1));
This statement requires you to hard-code the ATT_ID values; however, there are ways to do it dynamically (for example with dynamic SQL, or the PIVOT XML clause).
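For example, with several hard-coded attribute IDs and column aliases (the values 1, 2, 3 and the alias names are hypothetical):

SELECT *
FROM (
    SELECT T1.AH_ID, T1.AH_DESCRIPTION, T2.ATT_ID, T2.ATT_DTL_STR_VALUE
    FROM T1
    LEFT OUTER JOIN T2 ON T1.AH_ID = T2.AH_ID
)
PIVOT (MAX(ATT_DTL_STR_VALUE) FOR ATT_ID IN (1 AS att_1, 2 AS att_2, 3 AS att_3));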
