Are Oracle External Tables per file - oracle

My question is: if I have a directory with, for instance, 4 files with the same structure, do I need to create one external table for each file, or can I create the table on top of the directory and have 1 external table for the 4 files?
Thanks

As the external tables concepts and examples show, you can specify multiple files in the LOCATION clause:
...
LOCATION ('file1.csv', 'file2.csv', 'file3.csv', 'file4.csv')
...
If they are in different directories, you can prefix each file name with the relevant directory name:
LOCATION — specifies the data files for the external table.
For ORACLE_LOADER and ORACLE_DATAPUMP, the files are named in the form directory:file. The directory portion is optional. If it is missing, then the default directory is used as the directory for the file. If you are using the ORACLE_LOADER access driver, then you can use wildcards in the file name: an asterisk (*) signifies multiple characters, a question mark (?) signifies a single character.
... but that doesn't seem to be relevant to your situation.
The number and names of the files are fixed though, unless you use wildcards.
If you will always have the same number of files but with different names, you could potentially add a PREPROCESSOR clause to rename the incoming files to match the expected names; or, probably more practically, just have a single dummy expected name and use a preprocessor to combine all the files into one standard output stream that is actually read by the access driver.
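As a rough illustration of that second idea, here is a minimal preprocessor sketch in Python (a shell or batch script would do just as well); the directory path, script name and *.csv pattern are assumptions rather than anything from the question. When a PREPROCESSOR clause is present, the access driver loads whatever the program writes to standard output, so the dummy location file only needs to exist.

#!/usr/bin/env python3
# combine_csvs.py - hypothetical preprocessor for the external table.
# Oracle passes the (dummy) location file path as argv[1]; whatever this
# script writes to stdout is what the access driver actually loads.
import glob
import sys

DATA_DIR = "/data/ext"   # assumption: directory holding the CSV files

def main():
    # The dummy location file name in argv[1] is ignored; every CSV in the
    # directory is streamed out in a predictable order instead.
    for path in sorted(glob.glob(DATA_DIR + "/*.csv")):
        with open(path, "r") as f:
            sys.stdout.write(f.read())

if __name__ == "__main__":
    main()

The external table would then list only the dummy file in its LOCATION clause and point its PREPROCESSOR clause at this script.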

Related

Transfer CSV files from azure blob storage to azure SQL database using azure data factory

I need to transfer around 20 CSV files inside a folder named ActivityPointer in an Azure blob storage container to an Azure SQL database in a single Data Factory pipeline, but ActivityPointer contains the 20 CSV files and another folder named snapshots inside it. So when I try to create a pipeline and give * to select all the CSV files inside ActivityPointer, it includes the snapshots folder too, which should not be included. Is there any possibility to complete this task? Also, I can't create another folder to move the snapshots folder into. What can I do now? Can anyone please help me out?
Assuming you want to copy all CSV files within the ActivityPointer folder, you can use a wildcard expression: provide the path up to the ActivityPointer folder and then *.csv.
The Copy data activity also picks up the inner folder when using wildcards (even if we use *.csv in the wildcard file path), so we have to validate whether each child item is a file or a folder. Please look at the following demonstration.
First, use a Get Metadata activity on the required folder with the field list set to Child items; its debug output lists every child item (the files and the snapshots folder).
Now use this to iterate through the child items with a ForEach activity, setting its items to:
@activity('Get Metadata1').output.childItems
Inside the ForEach, use an If Condition activity to check whether the current item is a file or not, using the following condition:
@equals(item().type, 'File')
When this is true, you can use a Copy data activity to copy the file to the target table (ignore the false case). I have created a file_name parameter in my source dataset and pass its value as @item().name.
This will help you achieve your requirement. In my debug run I have 4 files and 1 folder; the folder is ignored, and the rest are copied into the target table.

How do I add multiple csv files to the catalog in kedro

I have 4 CSV files in Azure blob storage, with the same metadata, that I want to process. How can I add them to the data catalog under a single name in Kedro?
I checked this question:
https://stackoverflow.com/questions/61645397/how-do-i-add-many-csv-files-to-the-catalog-in-kedro
but that seems to load all the files in the given folder, whereas my requirement is to read only the given 4 out of the many files in the Azure container.
Example:
I have many files in the Azure container, among which are 4 transaction CSV files named sales_<date_from>_<date_to>.csv. I want to load these 4 transaction CSV files into the Kedro data catalog under one dataset.
For starters, PartitionedDataSet is lazy, meaning that the files are not actually loaded until you explicitly call each partition's load function. Even if you have 100 CSV files that get picked up by the PartitionedDataSet, you can select which partitions you actually load and work with.
Second, what distinguishes these 4 files from the others? If they have a unique suffix, you can use the filename_suffix option to just select them. For example, if you have:
file_i_dont_care_about.csv
first_file_i_care_about.csv
second_file_i_care_about.csv
third_file_i_care_about.csv
fourth_file_i_care_about.csv
you can specify filename_suffix: _file_i_care_about.csv.
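To make the lazy-loading point concrete, here is a minimal sketch of a node that receives such a partitioned dataset and only materialises the transaction files; the function name, the sales_ prefix and the pandas usage are illustrative assumptions based on the example in the question.

import re
from typing import Callable, Dict

import pandas as pd

def combine_sales_partitions(partitions: Dict[str, Callable[[], pd.DataFrame]]) -> pd.DataFrame:
    # A PartitionedDataSet is passed to the node as a dict mapping each
    # partition name to a load function; nothing is read until we call it.
    frames = []
    for name, load in sorted(partitions.items()):
        if re.match(r"sales_.*", name):   # only the sales_<from>_<to> files
            frames.append(load())
    return pd.concat(frames, ignore_index=True)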
I don't think there's a direct way to do this, but you can add another subdirectory inside the blob storage with the 4 files and then use:
my_partitioned_dataset:
  type: "PartitionedDataSet"
  path: "data/01_raw/subdirectory/"
  dataset: "pandas.CSVDataSet"
Or, in case the requirement of using only 4 files is not going to change anytime soon, you might as well declare the 4 files separately in catalog.yml to avoid over-engineering it.

Find Replace in .txt files in a directory

I have a lot of script files (all containing SQL or Transact-SQL), around 2000 files that we have accumulated over time. Now the problem is that in most of the scripts the owner of the database object is not defined, and due to my current requirement I have to set the owner of the database object. How can I update all the script files easily?
One approach is to open each file, look for the CREATE keyword, and add or replace the owner name on the object that follows it. I want to do this for all the script files in the directory.
Any ideas?
I want something like find and replace.
Many thanks in advance.
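A minimal sketch of that approach in Python, assuming the scripts sit as .txt files in a single directory, the owner to set is dbo, and only unqualified CREATE TABLE/PROCEDURE/VIEW/FUNCTION names need the prefix; the path, owner and pattern are placeholders to adapt.

#!/usr/bin/env python3
# add_owner.py - hypothetical helper: prefix unqualified object names that
# follow CREATE <type> with an owner, across every .txt script in a directory.
import pathlib
import re

SCRIPT_DIR = pathlib.Path(r"C:\scripts")   # assumption: where the ~2000 files live
OWNER = "dbo"                              # assumption: owner to set

# CREATE TABLE Foo -> CREATE TABLE dbo.Foo; names already qualified are skipped.
PATTERN = re.compile(
    r"\b(CREATE\s+(?:TABLE|PROCEDURE|VIEW|FUNCTION)\s+)(?!\[?\w+\]?\s*\.)(\[?\w+\]?)",
    re.IGNORECASE,
)

for path in SCRIPT_DIR.glob("*.txt"):
    text = path.read_text()
    new_text = PATTERN.sub(rf"\g<1>{OWNER}.\g<2>", text)
    if new_text != text:
        path.write_text(new_text)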

oracle external table iterating files

I'm loading files into Oracle via an external table. Files will be dropped into the directory at a rate of about one per minute.
What is the best method to iterate through these files to load them?
Once a file is loaded, I don't need a connection to that file any longer, but need to move on to a new file.
Preprocessor!
You can copy your new data into a common file (overwrite or append, depending on your needs):
ORGANIZATION EXTERNAL
(
  ACCESS PARAMETERS
  (
    ...
    PREPROCESSOR bin_dir: 'copy_my_file_to_common_file.bat'
  )
  LOCATION ('common_file.txt')
)
It will be called every time the table is accessed.
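The batch file itself isn't shown in the answer; as a rough illustration of what it might do (sketched in Python rather than a .bat, with made-up paths), the script below appends files it hasn't seen before to the common file and then streams that file to standard output, since the access driver reads the preprocessor's stdout rather than the location file directly.

#!/usr/bin/env python3
# copy_my_file_to_common_file.py - illustrative stand-in for the .bat above.
import glob
import os
import shutil
import sys

INCOMING_DIR = "/data/incoming"             # assumption: where new files land
PROCESSED_DIR = "/data/incoming/processed"  # assumption: archive for loaded files

def main():
    common_file = sys.argv[1]   # Oracle passes the LOCATION file path as argv[1]
    os.makedirs(PROCESSED_DIR, exist_ok=True)
    with open(common_file, "a") as out:
        for path in sorted(glob.glob(INCOMING_DIR + "/*.txt")):
            with open(path, "r") as f:
                shutil.copyfileobj(f, out)
            # move the file aside so it is only appended once
            shutil.move(path, os.path.join(PROCESSED_DIR, os.path.basename(path)))
    with open(common_file, "r") as f:
        shutil.copyfileobj(f, sys.stdout)

if __name__ == "__main__":
    main()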

Processing timestamped (or changing) filenames in ODI package

I am working on Oracle Data Integrator 11g
I have to create an ODI package, where I need to process an incoming file. The file name is not a constant string, as it has a timestamp entry appended to it, something like this: FILTER_DATA_011413.TXT
Due to the MMDDYY part, I can't hardcode the filename in my package. The way we're handling it right now is: a shell script lists the files in the directory and loads the filename into a table (using a control file). This table is then queried to get the filename, and that value is passed to the variable which stores the filename for processing.
I am looking for another way, where I can avoid having this temporary table to store the file name.
Can someone suggest an alternative?
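One alternative, in the same spirit as the rename idea in the first answer above: run a small script step before the interface (an OS command or similar), so the package always reads a constant file name and the staging table disappears. Sketched in Python with illustrative paths and names, not ODI-specific code:

#!/usr/bin/env python
# rename_incoming.py - copy the latest timestamped file to a constant name
# so the ODI package can simply reference FILTER_DATA.TXT.
import glob
import os
import shutil

IN_DIR = "/data/odi/in"   # assumption: incoming directory
FIXED_NAME = os.path.join(IN_DIR, "FILTER_DATA.TXT")

matches = glob.glob(os.path.join(IN_DIR, "FILTER_DATA_*.TXT"))
if matches:
    # if several files are waiting, take the most recently modified one
    latest = max(matches, key=os.path.getmtime)
    shutil.copy(latest, FIXED_NAME)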
