I'm loading a large amount of data with SQL*Loader.
The target table has a unique, system-generated PK.
When the table is populated by the business application, the key is generated programmatically.
The extract file for the bulk upload doesn't include a key in the record. Also, because of the extremely large volumes, the upload runs in multiple threads and in stages, one file a day.
Is there a way to populate a column with a random char(14) key directly in SQL*Loader? In other words, can I have something like this in the control file:
ID EXPRESSION (random number creation expression),
name char(10),
age number
so from the data file
Joe, 10
Mary, 5
I'll create data:
719287398 Joe 10
645743657 Mary 5
Something like ID EXPRESSION "dbms_random.string('l',14)" can be used.
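A minimal control-file sketch of this idea (assuming a table named MY_TABLE with columns ID, NAME, and AGE; the EXPRESSION keyword tells SQL*Loader the column's value comes from a SQL expression rather than from the data file):

```
LOAD DATA
INFILE 'people.dat'
APPEND
INTO TABLE my_table
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
  -- server-side random key, not read from the file
  id   EXPRESSION "dbms_random.string('x', 14)",
  name CHAR(10),
  age  INTEGER EXTERNAL
)
```

Note that random strings can collide at very large volumes; SYS_GUID() or a sequence (e.g. `id EXPRESSION "my_seq.nextval"`) is collision-free.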
I have a data table in an Oracle database in this format that keeps all the transactions in my system:
Customer ID   Transaction ID
001           trans_id_01
001           trans_id_02
002           trans_id_03
003           trans_id_04
As you see, each customer ID can generate many transactions in this table.
Now I need to export each day's data into CSV files with Apache NiFi.
But the requirement is that I need around 10k transactions in each file (this is not fixed; a bit more or less is fine), with rows sorted by Customer ID. That should be simple, and I have done it with
this processor:
But there is an additional requirement: all of a Customer ID's transactions must be in the same file. There should be no case where customer ID 005 has some transactions in file no. 1 and other transactions in file no. 2.
If I had to write this logic in pure code, I could query the DB with pagination and check the trailing rows at the end of each page against the next page before writing each file. But I still have no idea how to implement this in NiFi.
Try ExecuteSQLRecord with a custom SELECT that gets exactly what you want from Oracle, then use PartitionRecord configured with the customer ID as the partition column. That will break up the record set.
I don't know how Oracle does it, but this would be the way I'd do it in Postgres:
SELECT CUSTOMER_ID, ARRAY_AGG(TRANSACTION_ID) FROM TRANSACTIONS GROUP BY CUSTOMER_ID
That would create: 001, {trans_id_01, trans_id_02, ...} and ensure that each result row from the database has exactly one customer, with all of that customer's transactions enumerated in a single list.
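The Oracle equivalent of ARRAY_AGG is LISTAGG. A sketch, assuming the table and column names from the question:

```sql
SELECT customer_id,
       LISTAGG(transaction_id, ',') WITHIN GROUP (ORDER BY transaction_id) AS transactions
FROM   transactions
GROUP  BY customer_id;
```

Be aware that LISTAGG's result is capped at the VARCHAR2 limit, so customers with very many transactions can overflow it.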
I found a solution using the loop-flow idea from https://gist.github.com/ijokarumawak/01c4fd2d9291d3e74ec424a581659ca8
So I created a loop flow like in the image below:
This queries about 40k records per iteration with this SQL query:
SELECT * FROM TRANSACTIONS WHERE <some filtering> AND CUSTOMER_ID > '${max}' ORDER BY CUSTOMER_ID FETCH FIRST 40000 ROWS WITH TIES
The WITH TIES keyword also fetches the rows tied with the last one, ensuring that all records with the same CUSTOMER_ID end up in the same file. Each success flow file goes to the right side to write the data into a CSV file, while it also flows downward to extract data for the next iteration. ${max} is set to the largest CUSTOMER_ID of the current result set, using a QueryRecord processor with the query below:
SELECT CUSTOMER_ID FROM FLOWFILE ORDER BY CUSTOMER_ID DESC FETCH FIRST 1 ROW ONLY
Then it goes to the next iteration of the loop, until there is no data left for the current criteria.
I am pretty new to DAX/PowerBI.
I have 3 separate tables, each contains Account Name and Account Number columns.
I already created a 4th table (a derived table of the other 3 tables) showing Account Numbers only once by using DISTINCT/UNION/VALUES. This worked well.
Now I want to bring in Account Name for each unique Account Number onto this 4th table.
I was thinking of using LOOKUPVALUE, but I need it to somehow look up Account Name:
1. in the union of the 3 separate tables' Account Name columns
2. in the union of the 3 separate tables' Account Number columns
3. per the Account Number shown in this 4th table
Can this be done? I am struggling to write the criteria for 1. and 2.
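One way to sidestep the LOOKUPVALUE criteria entirely is to build the 4th table from both columns at once. A sketch, assuming the three source tables are named Table1, Table2, and Table3, each with columns [Account Number] and [Account Name]:

```
Accounts =
DISTINCT (
    UNION (
        SELECTCOLUMNS ( Table1, "Account Number", Table1[Account Number], "Account Name", Table1[Account Name] ),
        SELECTCOLUMNS ( Table2, "Account Number", Table2[Account Number], "Account Name", Table2[Account Name] ),
        SELECTCOLUMNS ( Table3, "Account Number", Table3[Account Number], "Account Name", Table3[Account Name] )
    )
)
```

Note that if the same Account Number appears with differently spelled names across the three tables, DISTINCT keeps each spelling as a separate row.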
I want to get the row count for all tables under a folder called "planning" in a Hadoop Hive database, but I couldn't figure out a way to do so. Most of these tables are not inter-linkable, and hence a full join with a common key can't be used.
Is there a way to count the rows and output them to one table, with each row of the result representing one table name?
Table names that I have:
add_on
sales
ppu
ssu
car
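Since each count is independent, no join key is needed; a UNION ALL of per-table counts does it. A sketch, assuming the tables live in a Hive database called planning:

```sql
SELECT 'add_on' AS table_name, COUNT(*) AS row_count FROM planning.add_on
UNION ALL
SELECT 'sales',  COUNT(*) FROM planning.sales
UNION ALL
SELECT 'ppu',    COUNT(*) FROM planning.ppu
UNION ALL
SELECT 'ssu',    COUNT(*) FROM planning.ssu
UNION ALL
SELECT 'car',    COUNT(*) FROM planning.car;
```

The result is one row per table, with its name and row count.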
Secondly, I am a SAS developer. Is the above process doable in SAS? I tried the data dictionary, but "nobs" is completely blank for this library, while all other SAS libraries display "nobs" properly. I wonder why, and how to work around it.
I want to run a report daily and store the report's date as one of the column headers. Is this possible?
Example output (Counting the activities of employees for that day):
SELECT EMPLOYEE_NAME AS EMPLOYEE, COUNT(ACTIVITY) AS "Activity_On_SYSDATE" FROM EMPLOYEE_ACCESS GROUP BY EMPLOYEE_NAME;
Employee   Activity_On_17042016
Jane       5
Martha     8
Sam        11
You are looking to do a reporting job with a data-storage tool. A database (and SQL) is for storing and retrieving data, not for creating reports; there are dedicated tools for that.
In database design, it is very unhealthy to encode actual data in a table or column name. Neither a table name nor a column name should contain, as part of the name (and of the way it is used), an employee id, a date, or any other bit of actual data. Actual data belongs in fields, which in turn are in columns of tables.
From what you describe, your base table should have columns for employee, activity and date. Then on any given day, if you want the count for the "current" day, you can query with
select employee, count(activity) ct
from table_name
where trunc(activity_date) = trunc(sysdate)
group by employee
(TRUNC strips the time-of-day component from both sides; a bare activity_date = SYSDATE would almost never match, because SYSDATE carries the current time.)
If you want, you can also include the "activity_date" column in the output, that will show for which date the report was run.
Note that I assumed the column name for the date is "activity_date". And in the output I used "ct" as the column alias, not "count". DATE and COUNT are reserved words, like SYSDATE, and you should NOT use them as table or column names. You could use them as aliases, as long as you never need to refer to those aliases anywhere else in the SQL, but it is still a very bad idea. Imagine you ever need to refer to a column (by name or by alias) and the name or alias is SYSDATE. What would a where clause like this mean?
where sysdate = sysdate
Do you see the problem?
Also, I can't tell from your question - were you thinking of storing these reports back in the database? To what end? It is better to store just one query and run it whenever needed, making the "activity_date" for which you want the counts an input parameter, so you can run the query for any date, at any time in the future. There is no need to store the actual daily reports in the database, as long as the base table is properly maintained.
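A parameterized version of the query might look like this (a sketch; :report_date is a bind variable supplied at run time, assumed to be a date at midnight, and the range form lets an index on activity_date be used):

```sql
SELECT employee,
       TRUNC(activity_date) AS activity_date,  -- shows which day the report covers
       COUNT(activity)      AS ct
FROM   employee_access
WHERE  activity_date >= :report_date
AND    activity_date <  :report_date + 1      -- Oracle date arithmetic: + 1 is one day
GROUP  BY employee, TRUNC(activity_date);
```

Running it with a different :report_date reproduces any past day's report without storing anything.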
Good luck!
I have two tables in my database, i.e. Columns and Data. The data in these tables looks like:
Columns:
ID
Name
Data:
1 John
2 Steve
Now I want to create a package which will create a csv file like:
ID NAME
------------
1 John
2 Steve
Can we achieve this output? I have searched Google but haven't found any solution.
Please help.
You can achieve this through a Script Task, or you can create a temporary dataset in SQL Server that combines a header row built from your Columns table with the rows appended from the Data table. My guess is you would have to fight metadata issues with either approach. Another option is to dump the combination to a flat file, but again you will have to take care of the metadata conversion.
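The temporary-dataset idea can be sketched in T-SQL (assuming SQL Server 2017+ for STRING_AGG, a hypothetical ColumnName column in the Columns table, and that every value is emitted as text):

```sql
-- One text column: header row first, then one CSV line per data row
SELECT csv_line
FROM (
    SELECT 0 AS ord, STRING_AGG(ColumnName, ',') AS csv_line
    FROM   Columns
    UNION ALL
    SELECT 1, CONCAT(ID, ',', Name)
    FROM   Data
) AS lines
ORDER BY ord;
```

An OLE DB Source running this query, feeding a Flat File Destination configured without a header row, then produces the desired file.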