The REJECTMAX parameter lets a COPY command succeed even when the CSV contains invalid records, provided the number of rejected records stays under the limit
(so if I have 100 records, 9 of them are invalid, and the maximum is 10, the file will still load).
Is there a way to get the rejected records, which are written to the rejected-data file, as text so that I can write them to the application error log?
Here is an example of how to use REJECTED DATA. Suppose you have a table like this:
SQL> CREATE TABLE public.mydata ( id INTEGER ) ;
CREATE TABLE
and an input file containing:
$ cat /tmp/mydata
1
2
3
ABC
4
5
Clearly ABC won't fit into an integer...
So we run:
SQL> COPY public.mydata FROM '/tmp/mydata' REJECTMAX 2 REJECTED DATA '/tmp/mydata.rejected' ;
NOTICE 7850: In a multi-threaded load, rejected record data may be written to additional files
HINT: Rejected data may be written to files [/tmp/mydata.rejected], [/tmp/mydata.rejected.1], etc
Rows Loaded
-------------
5
And now...
$ cat /tmp/mydata.rejected
ABC
Is this what you were looking for?
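To get those records into an application error log, one option is to read the rejected-data files back after the load. The following is a minimal sketch in Python, assuming the application can read the path given to REJECTED DATA and that numbered siblings such as /tmp/mydata.rejected.1 may exist, as the HINT above suggests. The logger name and function names are hypothetical.

```python
import glob
import logging
import os

logger = logging.getLogger("copy_rejects")  # hypothetical logger name


def rejected_files(base_path):
    """Return base_path plus base_path.1, base_path.2, ... that exist, in order."""
    numbered = []
    for path in glob.glob(glob.escape(base_path) + ".*"):
        suffix = path[len(base_path) + 1:]
        if suffix.isdecimal():  # ignore unrelated files such as base_path.bak
            numbered.append((int(suffix), path))
    files = [base_path] if os.path.isfile(base_path) else []
    return files + [path for _, path in sorted(numbered)]


def log_rejected_records(base_path):
    """Log every non-empty line of the rejected-data files; return them as a list."""
    records = []
    for path in rejected_files(base_path):
        with open(path, encoding="utf-8", errors="replace") as fh:
            records.extend(line.rstrip("\r\n") for line in fh if line.strip())
    for record in records:
        logger.error("COPY rejected record: %r", record)
    return records
```

Calling log_rejected_records('/tmp/mydata.rejected') after the COPY would emit one error-level log entry per rejected line and return the lines for any further handling.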
I want to import a csv file using SQLLDR, but I only want specific records. I have solved this with "WHEN record_type = 1" in my control file.
This works but the log file is getting flooded by "Record xxx: Discarded - failed all WHEN clauses." The input files contain millions of records but only a few percent satisfy the condition, so I end up with a log file with the same size as the input file :)
Am I doing this incorrectly?
Is there another way to discard/filter records when using SQLLDR?
Example Data:
record_type;a;b;c
24;a1;b1;c1
17;a2;b2;c2
22;an;bn;cn
1;a1;b1;c1
1;a2;b2;c2
1;an;bn;cn
Control file
load data
truncate
into table my_table_t
WHEN record_type = 1
(...
)
What you are doing is right, in my opinion.
SQL*Loader logs the load at the finest level of detail. You can opt out of some of that output.
You can disable logging of discarded records by adding
SILENT=(DISCARDS) to your SQL*Loader command line.
Refer to the documentation for further details.
If you just want to get rid of the log, you can send it to /dev/null on Linux/Unix or to NUL on Windows.
Example
Data File.
[oracle@ora12c Desktop]$ cat sample.txt
record_type;a;b;c
24;a1;b1;c1
17;a2;b2;c2
22;an;bn;cn
1;a1;b1;c1
1;a2;b2;c2
1;an;bn;cn
Control file.
[oracle@ora12c Desktop]$ cat control.ctl
load data
infile 'sample.txt'
insert
into table table_1 when record_type = '1'
fields terminated by ";"
(record_type, a, b, c)
Let's try to load the records.
[oracle@ora12c Desktop]$ sqlldr jay/password@orapdb1 control=control.ctl data=sample.txt log=/dev/null
SQL*Loader: Release 12.1.0.2.0 - Production on Fri Feb 10 16:05:10 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Path used: Conventional
Commit point reached - logical record count 7
Table TABLE_1:
3 Rows successfully loaded.
Check the log file:
/dev/null
for more information about the load.
There was no log file.
Now only the selected records have been loaded.
SQL> select * from table_1;
RECORD_TYPE A B C
----------- -------------------- -------------------- --------------------
1 a1 b1 c1
1 a2 b2 c2
1 an bn cn
Using the external table, you can then use simple SQL to load your table...
insert into my_table_t( record_type, a, b, c )
select record_type, a, b, c
from my_external_table
where record_type = 1
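Another option, which is not taken from the answers above, is to filter the file before SQL*Loader sees it, so that nothing is discarded and nothing is logged per record. The following is a minimal sketch in Python, assuming semicolon-delimited input with record_type in the first field, as in the example data. The function name and its parameters are hypothetical.

```python
import csv


def filter_records(src_path, dst_path, wanted_type="1", delimiter=";"):
    """Write only rows whose first field equals wanted_type; return how many were kept.

    The header row is dropped as well, because its first field is the
    literal text "record_type" rather than the wanted value.
    """
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            if row and row[0] == wanted_type:
                writer.writerow(row)
                kept += 1
    return kept
```

The filtered file can then be loaded with a control file that has no WHEN clause. The trade-off is an extra pass over the input and a second file on disk.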
I put some files into HDFS (/path/to/directory/) that contain data like the following:
63 EB44863EA74AA0C5D3ECF3D678A7DF59
62 FABBC9ED9719A5030B2F6A4591EDB180
59 6BF6D40AF15DE2D7E295EAFB9574BBF8
All of them have names like _user_hive_warehouse_file_name_000XYZ_A. These files were downloaded from another HDFS.
I'm trying to create an external table in Hive:
CREATE EXTERNAL TABLE users(
id int,
user string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path/to/directory/';
It says;
OK
Time taken: 0.098 seconds
select * from users; returns empty.
select count(1) from users; returns 0.
Hive creates the table successfully, but it's always empty. If I add another file, such as another.txt, that contains the sample data shown above, select count(1) from users; returns 3.
What am I missing? Why is the table empty?
Environment:
JDK 7
Hadoop 2.6.0
Hive 0.14.0
Ubuntu 14.04
I think you are encountering an issue that is peripherally discussed in HIVE-6431. In particular, this comment is the important one:
By default, FileInputFormat(which is the super class of various formats) in hadoop ignores file name starts with "_" or ".", and hard to walk around this in hive codebase.
The workaround is probably to avoid using filenames that begin with _ or .
When you run a query in Hive, it is typically run internally as a MapReduce job over the HDFS path where the files are stored. The job uses FileInputFormat to read the HDFS files, and FileInputFormat has a hiddenFileFilter that ignores any file whose name starts with an underscore ("_") or a dot ("."). You can exclude additional files by passing a custom PathFilter to FileInputFormat.setInputPathFilter. Hadoop treats files with a leading underscore as "special" files that hold job output markers and logs, which is probably why they are ignored.
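The rule described above can be illustrated with a small sketch in Python. It only mimics the name check (a leading "_" or "."), it is not Hadoop code, and the visible_name helper is a hypothetical renaming scheme.

```python
import posixpath


def is_hidden_to_hadoop(path):
    """Mimic the default hidden-file rule: names starting with '_' or '.' are skipped."""
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith(("_", "."))


def visible_name(name, fallback="part"):
    """Suggest a readable name by stripping leading '_' and '.' (hypothetical scheme)."""
    stripped = name.lstrip("_.")
    return stripped if stripped else fallback


files = [
    "/path/to/directory/_user_hive_warehouse_file_name_000XYZ_A",
    "/path/to/directory/another.txt",
]
for f in files:
    status = "ignored" if is_hidden_to_hadoop(f) else "read"
    print(f, "->", status)
```

Under this rule the downloaded files are skipped while another.txt is read, which matches the counts in the question. Renaming the files in HDFS (for example with hdfs dfs -mv) so that they no longer start with an underscore should make them visible to the table.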
In a Unix shell script, I bcp out from a table on Server1 in native format to a file, XXXX.bcpdat, and then bcp that file into a table with the same structure on Server2.
The bcp commands we have are
bcp "$dbname".."$tablename" out XXXX.bcpdat -n
bcp "$dbname".."$tablename" in XXXX.bcpdat -n -b10000
The bcp out and bcp in work as expected from and into the tables.
But I need to make an urgent change here:
I want to get the total number of rows (a row may have 120 or 30 or 40 records) in the bcp data file (XXXX.bcpdat).
With the file in native format, I can't tell the rows apart or see how they are separated. Running head -10 XXXX.bcpdat or tail -10 XXXX.bcpdat prints everything in the file, and "wc -l", "awk" and "cut" do not help me count the rows. Nothing marks where a row ends, unlike in a character-mode bcp load. How can I get the total number of rows (not records) in the bcpdat file? Thanks a lot in advance.
I have an Oracle database backup file (.dmp) that was created with expdp.
The .dmp file was an export of an entire database.
I need to restore 1 of the schemas from within this dump file.
I don't know the names of the schemas inside this dump file.
To use impdp to import the data I need the name of the schema to load.
So, I need to inspect the .dmp file and list all of the schemas in it, how do I do that?
Update (2008-09-18 13:02) - More detailed information:
The impdp command I'm currently using is:
impdp user/password@database directory=DPUMP_DIR
dumpfile=EXPORT.DMP logfile=IMPORT.LOG
And the DPUMP_DIR is correctly configured.
SQL> SELECT directory_path
2 FROM dba_directories
3 WHERE directory_name = 'DPUMP_DIR';
DIRECTORY_PATH
-------------------------
D:\directory_path\dpump_dir\
And yes, the EXPORT.DMP file is in fact in that folder.
The error message I get when I run the impdp command is:
Connected to: Oracle Database 10g Enterprise Edition ...
ORA-31655: no data or metadata objects selected for the job
ORA-39154: Objects from foreign schemas have been removed from import
This error message is mostly expected. I need the impdp command to be:
impdp user/password@database directory=DPUMP_DIR dumpfile=EXPORT.DMP
SCHEMAS=SOURCE_SCHEMA REMAP_SCHEMA=SOURCE_SCHEMA:MY_SCHEMA
But to do that, I need the source schema.
impdp exports the DDL of a dmp backup to a file if you use the SQLFILE parameter. For example, put this into a text file
impdp '/ as sysdba' dumpfile=<your .dmp file> logfile=import_log.txt sqlfile=ddl_dump.txt
Then check ddl_dump.txt for the tablespaces, users, and schemas in the backup.
According to the documentation, this does not actually modify the database:
The SQL is not actually executed, and the target system remains unchanged.
If you open the DMP file with an editor that can handle big files, you might be able to locate the areas where the schema names are mentioned. Just be sure not to change anything. It would be better if you opened a copy of the original dump.
Update (2008-09-19 10:05) - Solution:
My Solution: Social engineering, I dug real hard and found someone who knew the schema name.
Technical Solution: Searching the .dmp file did yield the schema name.
Once I knew the schema name, I searched the dump file and learned where to find it.
Places where the schema name was seen in the .dmp file:
<OWNER_NAME>SOURCE_SCHEMA</OWNER_NAME>
This was seen before each table name/definition.
SCHEMA_LIST 'SOURCE_SCHEMA'
This was seen near the end of the .dmp.
Interestingly enough, around the SCHEMA_LIST 'SOURCE_SCHEMA' section, it also had the command line used to create the dump, directories used, par files used, windows version it was run on, and export session settings (language, date formats).
So, problem solved :)
Assuming that you do not have the log file from the expdp job that generated the file in the first place, the easiest option would probably be to use the SQLFILE parameter to have impdp generate a file of DDL (based on a full import). Then you can grab the schema names from that file. Not ideal, of course, since impdp has to read the entire dump file to extract the DDL and then again to get to the schema you're interested in, and you have to do a bit of text file searching for the various CREATE USER statements, but it should be doable.
When running the impdp command to produce a SQL file, you will need to run it as a user that has the DATAPUMP_IMP_FULL_DATABASE role.
Or... run it as a low privileged user and use the MASTER_ONLY=YES option, then inspect the master table. e.g.
select value_t
from SYS_IMPORT_TABLE_01
where name = 'CLIENT_COMMAND'
and process_order = -59;
col object_name for a30
col processing_status head STATUS for a6
col processing_state head STATE for a5
select distinct
object_schema,
object_name,
object_type,
object_tablespace,
process_order,
duplicate,
processing_status,
processing_state
from sys_import_table_01
where process_order > 0
and object_name is not null
order by object_schema, object_name
/
http://download.oracle.com/otndocs/products/database/enterprise_edition/utilities/pdf/oow2011_dp_mastering.pdf
Step 1: Here is one simple example. You have to create a SQL file from the dump file using SQLFILE option.
Step 2: Grep for CREATE USER in the generated SQL file (here tables.sql)
Example here:
$ impdp directory=exp_dir dumpfile=exp_user1_all_tab.dmp logfile=imp_exp_user1_tab sqlfile=tables.sql
Import: Release 11.2.0.3.0 - Production on Fri Apr 26 08:29:06 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Username: / as sysdba
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Job "SYS"."SYS_SQL_FILE_FULL_01" successfully completed at 08:29:12
$ grep "CREATE USER" tables.sql
CREATE USER "USER1" IDENTIFIED BY VALUES 'S:270D559F9B97C05EA50F78507CD6EAC6AD63969E5E;BBE7786A5F9103'
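If the generated SQL file is large, the grep step can also be scripted. The following is a minimal sketch in Python, assuming the file contains CREATE USER statements shaped like the line above. The function name is hypothetical.

```python
import re

# Matches: CREATE USER "NAME" ...  (the quotes around the name are optional)
CREATE_USER_RE = re.compile(
    r'^\s*CREATE\s+USER\s+"?([A-Za-z0-9_$#]+)"?',
    re.IGNORECASE | re.MULTILINE,
)


def schema_names_from_sql(sql_text):
    """Return the distinct user names from CREATE USER statements, in first-seen order."""
    names = []
    for name in CREATE_USER_RE.findall(sql_text):
        if name not in names:
            names.append(name)
    return names


sample = (
    "-- CONNECT SYS\n"
    "CREATE USER \"USER1\" IDENTIFIED BY VALUES 'S:270D559F9B97C05EA50F78507CD6EAC6AD63969E5E;BBE7786A5F9103'\n"
)
print(schema_names_from_sql(sample))
```

For the real file, read tables.sql into a string and pass it to schema_names_from_sql.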
Many Data Pump options are explained here: http://www.acehints.com/p/site-map.html
You need to search for OWNER_NAME.
cat -v dumpfile.dmp | grep -o '<OWNER_NAME>[^<]*</OWNER_NAME>' | sort -u
cat -v turns the dump file into visible text.
grep -o shows only the match, so we don't see really long lines.
sort -u removes duplicate lines, so you see less output (uniq -u would instead hide every name that occurs more than once in a row).
This works pretty well, even on large dump files, and could be tweaked for usage in a script.
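The same search can be sketched in Python for use inside a script. This is a minimal sketch, assuming the tags are stored as plain single-byte text in the dump, as the cat -v output suggests, and that names are at most 128 bytes long. The function name and its parameters are hypothetical.

```python
import re

# Schema names appear as <OWNER_NAME>NAME</OWNER_NAME> in the dump file.
OWNER_RE = re.compile(rb"<OWNER_NAME>([^<]{1,128})</OWNER_NAME>")


def owner_names(dump_path, chunk_size=1 << 20, overlap=256):
    """Return the distinct OWNER_NAME values found in a dump file, sorted.

    The file is read in chunks so large dumps do not need to fit in memory.
    The last `overlap` bytes are carried over so a tag split across two
    reads is still matched (a full tag is at most 153 bytes long).
    """
    found = set()
    tail = b""
    with open(dump_path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            for match in OWNER_RE.finditer(data):
                found.add(match.group(1).decode("ascii", "replace"))
            tail = data[-overlap:]
    return sorted(found)
```

Opening the file in binary read mode also means the dump cannot be modified by accident.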
My solution (similar to KyleLanser's answer) (on a Unix box):
strings dumpfile.dmp | grep SCHEMA_LIST
In my case, based on Aldur's and slafs' answers I came up with this expression that should tell you just the name of the original schema:
cat -v file.dmp | grep 'SCHEMA_LIST' | sort -u | grep -o -P '(?<=SCHEMAS\=).*(?=content)'
Tested on a DMP file from Oracle 19.8.