update table based on concatenated column value - oracle

I have a table with only 4 columns
First column - the concatenated column values for each row from another table. The columns are concatenated based on the column ids from the metadata table, in the same order as those column ids.
Second column - the comma-separated primary key columns.
Third column - the one I need to update: based on the primary key columns listed in the second column, it should hold the corresponding values extracted from the concatenated string in the first column.
Fourth column - the table name.
I am using a cursor and string functions and it works perfectly fine, but when I tested it against millions of rows it failed and the performance was very poor.
Could anyone please give me a single update query for this?
There is a comparison tool which compares the data between two tables in different databases with the same structure, and it dumps the mismatched rows into a table with all the columns concatenated (pipe-separated). The columns are in the same order as the column ids, and I know the primary key columns for that table (also concatenated, pipe-separated). So, based on this data, I need to extract the primary key values of the rows that have a data mismatch.
I need to do something like
Update column4 (primary key values, pipe-separated, extracted from column2)
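For illustration, a set-based version of such an update could look something like the sketch below. All table and column names (mismatch_dump, concat_row, pk_values, table_name) are made up, and it assumes you already know which pipe-separated positions hold the primary key for a given table; driving those positions from the metadata generically would need a join against the metadata table or dynamic SQL.
-- Sketch only: assumes the primary key of SOME_TABLE sits in the
-- 1st and 3rd pipe-separated fields of the concatenated row value.
UPDATE mismatch_dump
   SET pk_values = REGEXP_SUBSTR(concat_row, '[^|]+', 1, 1)   -- 1st field
                   || '|' ||
                   REGEXP_SUBSTR(concat_row, '[^|]+', 1, 3)   -- 3rd field
 WHERE table_name = 'SOME_TABLE';
-- Caveat: the '[^|]+' pattern skips empty fields; if fields can be empty,
-- use something like REGEXP_SUBSTR(concat_row, '([^|]*)(\||$)', 1, 3, NULL, 1).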

Check this LINK, it may be useful. With that query you can concatenate values with whatever separator character you need (this works on 11gR2; for earlier versions use the XMLAGG, XMLELEMENT, EXTRACT method).
CREATE TABLE TEST(
FIELD INT);
INSERT INTO TEST VALUES(1);
INSERT INTO TEST VALUES(2);
INSERT INTO TEST VALUES(3);
INSERT INTO TEST VALUES(4);
SELECT LISTAGG(FIELD, ',') WITHIN GROUP (ORDER BY FIELD)
FROM TEST;
This returns '1,2,3,4'.
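For reference, the pre-11gR2 idiom mentioned above (XMLAGG / XMLELEMENT / EXTRACT) could look roughly like this against the same TEST table; treat it as a sketch rather than the exact query behind the link:
SELECT rtrim(
         xmlagg(xmlelement(e, FIELD, ',').extract('//text()') ORDER BY FIELD)
           .getstringval(),
         ',') AS fields
FROM TEST;
This also returns '1,2,3,4'.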

Related

Why does CSVREAD not work as expected when it is supposed to read the column names from the csv file?

According to the H2 documentation for CSVREAD
If the column names are specified (a list of column names separated with the fieldSeparator), those are used, otherwise (or if they are set to NULL) the first line of the file is interpreted as the column names.
I'd expect reading the csv file
id,name,label,origin,destination,length
81,foobar,,19,11,27.4
like this
insert into route select * from csvread ('routes.csv',null,'charset=UTF-8')
would work. However, a JdbcSQLIntegrityConstraintViolationException is actually thrown, saying NULL not allowed for column "ORIGIN" and indicating error code 23502.
If I explicitly add the column names to the insert statement like so,
insert into route (id,name,label,origin,destination,length) select * from csvread ('routes.csv',null,'charset=UTF-8')
it works fine. However, I'd prefer not to repeat myself - following the DRY principle :)
Using version 2.1.212.
The CSVREAD function produces a virtual table. Its column names can be specified in parameters or in the CSV file.
The INSERT command with a query doesn't map the column names of that query to the column names of the target table; it uses their ordinal positions instead. The value from the first column of the query is inserted into the first column named in the insert column list (or into the first column of the target table if no column list is specified), the second into the second column, and so on.
You can omit the insert column list only if your table was defined with the same columns in the same order as in the source query (in your case, the column order in the CSV file). If your table has its columns declared in a different order, or it has some additional columns, you need to specify the list.
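As an illustration (the actual ROUTE definition isn't shown in the question, so the declaration below is hypothetical), a table whose columns match the CSV header in order and count lets the short form work, because the positional mapping then lines up:
CREATE TABLE route (
    id          INT PRIMARY KEY,
    name        VARCHAR(255),
    label       VARCHAR(255),
    origin      INT NOT NULL,
    destination INT,
    length      DOUBLE PRECISION
);
-- Positional mapping now matches the CSV column order, so this works:
INSERT INTO route SELECT * FROM CSVREAD('routes.csv', NULL, 'charset=UTF-8');
If ROUTE has its columns in a different order, or has extra columns, the explicit column list from the question is the way to go.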

How to find the position of a row in Oracle by its primary key, a varchar GUID generated by the application

My use case: I have to find the position of a value in the primary key column so that I can write a query like select * from my_table where ID <= '00000536-37ee-471c-a8e0-3d233b8102f5'.
My table has a primary key of varchar type whose values are GUIDs generated by an application.
Here are some example primary key values:
000000bd-104e-4fd6-a791-c5422f29e1b5
0000016e-7e68-4453-b360-7ffd1627dc22
00000196-2dba-4532-8cba-1e853c466697
0000025a-cfae-41b4-b8e7-ef854d49e54a
00000260-8bdb-4b30-acdb-5a67efd4dbfe
00000366-552d-48a0-b8a1-20190ccd087c
000003f2-d6d8-4a51-96cc-407063bc568b
000003ff-3d16-4e88-9cf3-bcdf01c39a2b
00000487-1e6c-4d6d-a683-6f11d517962c
000004cc-6359-4a9a-aa2a-70a6b73a06b1
00000536-37ee-471c-a8e0-3d233b8102f5
Now I need to use this table in AWS DMS, which only accepts queries of the form select * from table where column =, <=, >= some value.
My use case is to find the exact position of GUIDs among millions of rows so that I can split the table into multiple queries, each selecting a range of GUIDs.
For example, if the 100th GUID is 00000536-37ee-471c-a8e0-3d233b8102f5, then I can write a query like select * from my_table where ID <= '00000536-37ee-471c-a8e0-3d233b8102f5'.
The limitation is that I cannot add any new columns to the existing table, because the application impact would be huge.
How can I do this?
One option that I thought of, but want to confirm, is below:
Create a temp table.
The temp table will have an auto-generated sequence column and the ID column.
Insert into the temp table only the GUIDs from the main table, ordered by GUID.
That way the values are stored in order, so I can first select the GUID at position 100 and then plug that GUID into my original query.
But I am not sure whether this will work or not.
Can someone comment on this or suggest another option?
Let me explain what I want.
I want DMS to read my main table in parallel and migrate it.
So one DMS task could read and migrate rows 1 to 100, another 100 to 200, another > 200, and so on.
Currently I cannot do this because I don't know the positions of the primary key values needed to write those queries.
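The temp-table idea can work. As a sketch (table and column names follow the question, and the identity column assumes Oracle 12c or later), you can either stage an ordered copy or use an analytic function to pick every Nth GUID as a chunk boundary without any staging table:
-- Option 1: staging table with a generated position (the proposal above)
CREATE TABLE guid_positions (
    pos NUMBER GENERATED ALWAYS AS IDENTITY,
    id  VARCHAR2(36)
);
INSERT INTO guid_positions (id)
SELECT id FROM my_table ORDER BY id;          -- relies on a serial, ordered insert
SELECT id FROM guid_positions WHERE pos = 100; -- GUID at position 100

-- Option 2: no staging table; take every 100th GUID as a boundary
SELECT id
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM my_table)
WHERE MOD(rn, 100) = 0;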
If you want to divide your table into chunks of equal sizes, I would take advantage of the hexadecimal nature of the GUIDs. It will be 256 instead of 100 chunks, but this might be acceptable.
CREATE TABLE t (pk VARCHAR2(36) PRIMARY KEY);
INSERT INTO t VALUES ('000000bd-104e-4fd6-a791-c5422f29e1b5');
The easiest option would be
SELECT * FROM t WHERE pk LIKE '%b5';
A bit more advanced:
SELECT pk, to_number(substr(pk, -2),'xx') FROM t;
If you have millions of rows, this is probably faster:
ALTER TABLE t ADD (mycol GENERATED ALWAYS AS (to_number(substr(pk, -2),'xx')));
CREATE INDEX i ON t(mycol);
SELECT * FROM t WHERE mycol=181;
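Tying this back to the DMS constraint of simple = / <= / >= filters: the bucket value is just the decimal form of the last two hex digits of the GUID (181 is hex b5 here), so each parallel task can be given one bucket:
SELECT * FROM t WHERE mycol = 0;     -- task 1
SELECT * FROM t WHERE mycol = 1;     -- task 2
-- ... one query per bucket, up to
SELECT * FROM t WHERE mycol = 255;   -- task 256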
Once your migration is done, you can undo the additional virtual column:
DROP INDEX i;
ALTER TABLE t DROP (mycol);

HIVE partitioned by column becomes all 0 after inserting data from another table

I am using Hortonworks to create a partitioned table in Hive and insert data into it from another Hive table. The problem is, after I insert the data, all values in the partition column (passenger_count) of the resulting table show 0, even though none of the values in the original table are 0.
Below are the steps I have taken to create the partitioned table and insert data into it:
Run the following query to create table called 'date_partitioned':
create table date_partitioned
(tpep_dropoff_datetime string, trip_distance double)
partitioned by (passenger_count int);
Run the following query to insert data into 'date_partitioned' table, from another existing table:
INSERT INTO TABLE date_partitioned
PARTITION (passenger_count)
SELECT tpep_dropoff_datetime, trip_distance, passenger_count
FROM trips_raw;
The 'passenger_count' column in 'trips_raw' is of int type and contains non-zero values.
But when I look at the results in the 'date_partitioned' table, the values in the 'passenger_count' column all show 0. The resulting table also ended up with a duplicate 'passenger_count' column (so it has two 'passenger_count' columns, one of which is empty).
Any advice would be greatly appreciated. I am curious why 'passenger_count' shows 0 in the resulting table when the original column has no 0s, and why there is an additional 'passenger_count' column in the resulting table.
Are you sure that all of the loaded rows have passenger_count = 0? Can you do a COUNT grouped by passenger_count on both tables? Maybe you're just sampling rows that happen to all be zero.
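Concretely, the check could look like this (using the table and column names from the question):
-- Value distribution in the source table
SELECT passenger_count, COUNT(*) AS cnt
FROM trips_raw
GROUP BY passenger_count;

-- Value distribution in the partitioned table
SELECT passenger_count, COUNT(*) AS cnt
FROM date_partitioned
GROUP BY passenger_count;
If the first query shows non-zero groups while the second puts everything in the 0 partition, the problem is in the insert; if both distributions match, it was only the sampling.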

Replace specific junk characters from column in hive

I have an issue where one of the columns loaded into a Hive table contains a junk character ("~) appended to the actual value (ABC). So the value that is visible in this column is (ABC"~).
This column can contain either ABC (or any such string) or NULL. The table is huge and an UPDATE is not an option here.
My idea is to create a temp table in which this column contains either the string (ABC) or NULL, i.e. to remove the junk character ("~) completely while copying the data from the original table to the temp table.
Any help on how I can remove this junk? I tried using the regexp functions, but with no success. Any suggestions?
I was not using regexp properly; my fault.
The data initially loaded into the table had the extra characters attached to the column's values. For example, if the column's actual value was Adf452, then the cell contained Adf452"~.
So I loaded the data into a temp table like this:
insert overwrite table tempTable select colA, colB, colC, regexp_replace(colC,"\"~",""), partitionedCol from origTable;
This simply loaded the data in tempTable without those junk characters.

Difference Between Insert and Append statement in SQL Loader?

Can anyone tell me the difference between the INSERT and APPEND options in SQL*Loader? Consider the example below:
Here is my control file
load_1.ctl
load data
infile 'load_1.dat' "str '\r\n'"
insert into table sql_loader_1
-- (or: append into table sql_loader_1)
(
load_time sysdate,
field_2 position( 1:10),
field_1 position(11:20)
)
Here is my data file
load_1.dat
0123456789abcdefghij
**********##########
foo bar
here comes a very long line
and the next is
short
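For context, the control file above would be run with something like the following (credentials and connect string are placeholders):
sqlldr userid=scott/tiger@orcl control=load_1.ctl log=load_1.log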
The documentation is fairly clear: use INSERT when you're loading into an empty table, and APPEND when adding rows to a table that might already contain data (that you want to keep).
APPEND will still work if your table is empty. INSERT might be safer if you're expecting the table to be empty, as it will error if that isn't true, possibly avoiding unexpected results (particularly if you don't notice and don't get other errors like unique index constraint violations) and/or a post-load data cleanse.
The difference is clear on two points:
APPEND only adds the records at the end of the table.
INSERT lets you insert wherever you want, i.e. if your table has 10 columns you can insert into only 5 of them, but with APPEND you can't.
With APPEND, both your data and the table should have the same columns, meaning the data is inserted at row level rather than at column level.
It is also true that you cannot use INSERT if your table already has data; only if it is empty can you use INSERT.
Hope it helps.
