I am creating pages that load data from a CSV file through the Data Load Wizard. The CSV file does not contain data for all fields, but those fields are required.
How can I populate the fields that are not present in the CSV, for example, by using queries against other tables?
My APEX version is 4.1.1.00.23.
You can do that using Oracle SQL.
If there is a unique ID for the table you are loading into (let's say you have most of the employee data in a CSV), you can use that ID to merge or update data from other tables.
For a table EMP like this...
empid,
ename,
zip_code,
state_code
you can load these columns from the CSV:
empid,
ename,
zip_code
and then update the state code using this query:
update emp tgt
set state_code = (select src.state
                  from zip_code_lookup src
                  where src.zip_code = tgt.zip_code
                 );
If there is more than one column that needs to be updated, I prefer using the merge command.
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_9016.htm#SQLRF01606
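A minimal merge sketch along those lines (the county column is hypothetical, included only to show several columns being updated at once):
merge into emp tgt
using zip_code_lookup src
on (src.zip_code = tgt.zip_code)
when matched then update set
    tgt.state_code = src.state,
    tgt.county = src.county; -- county is a hypothetical extra lookup column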
You can refer to this link for more information about loading data into multiple tables using the Data Load Wizard: http://vincentdeelen.blogspot.in/2014/03/data-load-for-multiple-tables.html
I'm using Talend Open Studio for Data Integration.
I have tables that are generated every day, and the table names are suffixed with a date, like so:
dailystats20220127
dailystats20220126
dailystats20220125
dailystats20220124
I have a two-part question.
I want to look at the table whose name contains yesterday's date (sysdate - 1) and fetch yesterday's data from it:
select 'dailystats' ||to_char(sysdate - 1,'YYYYMMDD') TableName
from dual;
How do I retrieve the schema for a dynamic table name?
How do I pull data from that table?
I've worked with static table names, and that is a straightforward process.
If the schema is always the same, you just define it once in your input component.
In your input component, set the SQL as:
"select [fields] from dailystats"+ TalendDate.formatDate("yyyyMMdd", TalendDate.addDate(TalendDate.getCurrentDate(), -1, "dd"))
I have a use case wherein I need to implement SQL-based data warehousing activities using Hive.
The software generates a bunch of CSV files. When they are transformed into a SQL table, a unique ID called a session is assigned to each CSV file and loaded with it into the table. Let's say the CSV files have 3 columns; the SQL table will then have four columns, where the first column represents the session. This means that values from the first CSV file are written to the SQL table with session ID '1', values from the second CSV file are appended with session ID '2', and so on.
In Hive, I have stored these CSV files in an HDFS directory and want to create one Hive table with an additional column that represents the session ID. I am not sure how to do it. Any help or clue would be highly appreciated.
Try the approaches below.
Using a random session id:
Create an external table on top of the source dataset:
create external table staging (a string, b string, c string) location 'xyz';
Assign a unique id to each row:
insert into table destination select reflect("java.util.UUID", "randomUUID") as session_id, s.* from staging s;
Using a sequence number as the session id:
Create an external table on top of the source dataset:
create external table staging (a string, b string, c string) location 'xyz';
First-time data load, create and seed the session-tracking table so the join below returns a row:
CREATE TABLE IF NOT EXISTS max_session_id (session_id int);
INSERT INTO max_session_id VALUES (0);
Append a sequence id to each record:
insert into table destination
select cast(coalesce(t.session_id,0) + row_number() over () as INT) as session_id, t1.*
from max_session_id t join staging t1 on 1=1;
After each load, maintain the max session id in the separate table:
DROP TABLE IF EXISTS tmp_max_session_id;
CREATE TABLE tmp_max_session_id AS SELECT COALESCE(MAX(session_id), 0) AS session_id FROM destination;
INSERT OVERWRITE TABLE max_session_id SELECT * FROM tmp_max_session_id;
If you want to tag the same session id for every row from a given file, add each file as its own partition. You can store reflect("java.util.UUID", "randomUUID") or the max_session_id value in a separate table while adding the partition, and use the newly generated session_id as the partition value, as sketched below.
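A minimal sketch of that partition-per-file variant, with hypothetical table and path names:
create external table destination_by_file (a string, b string, c string)
partitioned by (session_id string)
location 'xyz_by_file';
-- register each csv file as its own partition, using a generated id as the partition value
alter table destination_by_file add partition (session_id='1') location 'xyz_by_file/file1';
alter table destination_by_file add partition (session_id='2') location 'xyz_by_file/file2';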
I am porting my Pig script to Hive. I need to add a status column to a table that is imported from an Oracle database.
My Pig script looks like this:
user_data = LOAD 'USER_DATA' USING PigStorage(',') AS (USER_ID:int,MANAGER_ID:int,USER_NAME:int);
user_data_status = FOREACH user_data GENERATE
USER_ID,
MANAGER_ID,
USER_NAME,
'active' AS STATUS;
Here I am adding the STATUS column with the value 'active' to the user_data table.
How can I add such a column while importing the table via HiveQL?
As far as I know, you will have to reload the data, as you did in Pig.
For example, if you already have the table user_data with columns USER_ID:int, MANAGER_ID:int, USER_NAME:int and you are looking for USER_ID:int, MANAGER_ID:int, USER_NAME:int, STATUS:'active',
you can re-load the table user_data_status using something like this:
INSERT OVERWRITE TABLE user_data_status SELECT u.*, 'active' AS STATUS FROM user_data u;
Though there are options to add columns to an existing table, that would only update the metadata in the metastore, and the values would default to NULL.
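For reference, that metadata-only route is a one-liner; existing rows simply read NULL for the new column until they are rewritten:
ALTER TABLE user_data ADD COLUMNS (status string); -- metadata only; no data files are touched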
If I were you, I would rather re-load the complete data than try to update the whole table with an UPDATE command after altering the column structure. Hope this helps!
I am interested in loading specific columns into a table created in Hive.
Is it possible to load the specific columns directly or I should load all the data and create a second table to SELECT the specific columns?
Thanks
Yes, you have to load all the data, like this:
LOAD DATA [LOCAL] INPATH '/Your/Path' [OVERWRITE] INTO TABLE yourTable;
LOCAL means that your file is on your local file system and not in HDFS; OVERWRITE means that the current data in the table will be deleted.
Then you create a second table with only the fields you need and execute this query:
INSERT OVERWRITE TABLE yourNewTable
SELECT yourColumn1, yourColumn2
FROM yourOldTable;
It is suggested to create an external table in Hive, map your data onto it, and then create a new table with only the specific columns using the CREATE TABLE AS command:
create table <new_table_name> as <select statement>;
For example, the statement looks like this:
create table employee as select id, emp_name as name from emp;
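The external-table step that precedes it might look like this sketch, assuming a comma-delimited source file and hypothetical column names and location:
create external table emp (id int, emp_name string, dept string)
row format delimited fields terminated by ','
location '/data/emp';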
Try this:
INSERT INTO table_name
(column1, column2) -- the columns you want to populate, in lowercase
SELECT columns_you_need FROM source_table;
I am new to Hadoop and big data concepts. I am using the Hortonworks sandbox and trying to manipulate the values of a CSV file. I imported the file using the file browser and created a table in Hive to run some queries. What I actually want is an "insert into values" style query that selects some rows, changes the values of some columns (for example, turning a string into binary 0 or 1), and inserts them into a new table. The SQL-like query would be something like this:
Insert into table1 (id, name, '01')
select id, name, graduated
from table2
where university = 'aaa'
Unfortunately, Hive cannot insert constant values like this (without importing them from a file), and I don't know how to solve this problem using Hive, Pig, or even MapReduce scripts.
Please help me find the solution; I really need it.
Thanks in advance.
In Hive,
CREATE TABLE table1 AS SELECT id, name, graduated FROM table2
WHERE university = 'aaa'
should create a new table with the results of the query.
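Hive also accepts literals directly in the SELECT list, which covers the constant '01' from the question, e.g.:
CREATE TABLE table1 AS
SELECT id, name, '01' AS graduated
FROM table2
WHERE university = 'aaa';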