Text Fields Acceptable in SQL Loader - oracle

Are there any reserved text characters in SQL*Loader?
Are there any special characters, like &, _, ", etc., which cannot be loaded into Oracle table columns?
My file's column separator is a pipe (|) character, and I will escape it so that it can appear in my text columns too, but are there any other reserved characters which I cannot use in the data fields to be interfaced?

There are none, as far as I can tell.
However, I'd suggest choosing delimiters wisely, because if the text you're loading contains the delimiter, you'll have problems figuring out whether e.g. a pipe sign is a delimiter or part of the text to be loaded.
If you can prepare the input data so that values are optionally enclosed in double quotes, you'll be able to avoid such problems. Then again, why make it complicated if it can be simple?
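For what it's worth, SQL*Loader expresses this with `FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'` in the control file. A minimal Python sketch of the same idea, using made-up values, shows how optional quoting protects an embedded delimiter:

```python
import csv
import io

# One row whose middle value contains the delimiter itself
rows = [["1", "a|b", "plain"]]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only fields that contain the delimiter,
# mirroring SQL*Loader's OPTIONALLY ENCLOSED BY '"'
writer = csv.writer(buf, delimiter="|", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

text = buf.getvalue()
# The embedded pipe survives because the field was quoted: 1|"a|b"|plain
parsed = list(csv.reader(io.StringIO(text), delimiter="|", quotechar='"'))
```

Round-tripping `parsed` back to the original `rows` is exactly the guarantee the optional enclosure buys you.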

Related

Does a single column CSV file have commas?

When I open my CSV file in Excel it looks like this -
Header
Value1
Value2
Value3
Value4
Value5
I want to know whether this file actually has commas in it. I am aware that if I have multiple columns I will see the commas.
You can easily test that by opening the file in a text editor (e.g. Notepad on Windows). It will show the file as it is in text format, i.e., with commas present (if they are in the file). I would say that if it is single column, it won't have commas (but rather line breaks between the rows), but if you need to be sure just open it with a text editor.
https://www.ietf.org/rfc/rfc4180.txt
Given there is only one value in each record, it would not have a comma, per the spec:
Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma. For example:
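You can also check programmatically rather than by eye; this Python sketch uses the values from the question to show that a one-column file contains no commas yet still parses as valid CSV:

```python
import csv
import io

# A single-column CSV as described in the question: one value per line
content = "Header\nValue1\nValue2\nValue3\n"

# Per RFC 4180, commas only appear between fields, so a one-column
# file has none -- rows are separated by line breaks instead.
assert "," not in content

# csv.reader still parses it fine: each row is a one-element list
rows = list(csv.reader(io.StringIO(content)))
```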

data factory special character in column headers

I have a file I am reading into a blob via Data Factory.
It's formatted in Excel. Some of the column headers have special characters and spaces, which isn't good if you want to take it to CSV or Parquet and then SQL.
Is there a way to correct this in the pipeline?
Example
"Activations in last 15 seconds high+Low" "first entry speed (serial T/a)"
Thanks
Normally, Data Flow can handle this for you by adding a Select transformation with a Rule:
1. Uncheck "Auto mapping".
2. Click "+ Add mapping".
3. For the column name, enter "true()" to process all columns.
4. Enter an appropriate expression to rename the columns. This example uses regular expressions to remove any character that is not a letter.
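The renaming expression itself is written in the Data Flow expression language, but the effect of a "letters only" rule can be sketched in Python, using the header strings from the question:

```python
import re

headers = ["Activations in last 15 seconds high+Low",
           "first entry speed (serial T/a)"]

# Same idea as the Data Flow rule: drop every character that is not a letter
# (note that digits and spaces disappear along with + / ( ) characters)
cleaned = [re.sub(r"[^A-Za-z]", "", h) for h in headers]
```

If you want to keep digits or replace runs of junk with underscores instead, the character class is the only thing to change.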
SPECIAL CASE
There may be an issue with this if the column name contains forward slashes ("/"). I accidentally came across this in my testing:
Every one of the columns not mapped contains forward slashes. Unfortunately, I cannot explain why this would be the case as Data Flow is clearly aware of the column name. It can be addressed manually by adding a Fixed rule for EACH offending column, which is obviously less than ideal:
ANOTHER OPTION
The other thing you could try is to pre-process the text file with another Data Flow using a Source dataset that has no delimiters. This would give you the contents of each row as a single column. If you could get a handle on just the first row, you could remove the special characters.

Pentaho Load Plain Text File w/ ASCII separator

I'm trying to use Spoon / Kettle to upload a plain text file that is separated by ASCII characters. I can see all the data when I preview the content of the file in Kettle, but no records load when I try to preview rows on the "Content" tab.
According to my research, Kettle should understand my field separator when typed as "$[value]" which in my case is "$[01]". Here's a description of the file structure:
Each file in the feed is in plain text format, separated into columns and rows. Each record has the same set of fields. The following are the delimiters for
each field and record:
Field Separator (FS): SOH (ASCII character 1)
Record Separator (RS): STX (ASCII character 2) + "n"
Any record starting with a "#" and ending with the RS should be treated as a comment by the ingester and ignored. The data provider has also generated a column header line at the beginning of the file, listing field data types.
So my input parameters are:
Filetype: Fixed
Separator: $[01]
Enclosure:
Escape:
...
Format: DOS
Encoding: US-ASCII
Length: Characters
I'm unable to read any records, and I'm not sure if this is the correct approach. Would ingesting this data with Java inside of Kettle be a better method?
Any help with this would be much appreciated. Thanks!
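Outside of Kettle, the described layout is simple to parse directly, which can help verify the file really matches the spec. This Python sketch assumes the record separator is STX followed by a newline (my reading of the spec above), and the sample records are made up:

```python
FS = "\x01"          # SOH field separator, per the feed description
RS = "\x02\n"        # assumed record separator: STX followed by a newline

raw = ("#comment record" + RS +
       "id" + FS + "name" + RS +
       "1" + FS + "alice" + RS)

# Split into records, dropping empties and '#' comment records
records = [r for r in raw.split(RS) if r and not r.startswith("#")]
rows = [rec.split(FS) for rec in records]
```

If the preview shows data but "Content" yields no rows, dumping a few raw bytes like this is a quick way to confirm which separator bytes are actually present.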

searching in CLOB for words in a list/table

I have a large table with a clob column (+100,000 rows) from which I need to search for specific words within a certain timeframe.
select id, clob_field,
       dbms_lob.instr(clob_field, '.doc',  1, 1) as doc,   -- ideally want .doc
       dbms_lob.instr(clob_field, '.docx', 1, 1) as docx,  -- ideally want .docx
       dbms_lob.instr(clob_field, '.DOC',  1, 1) as DOC,   -- ideally want .DOC
       dbms_lob.instr(clob_field, '.DOCX', 1, 1) as DOCX   -- ideally want .DOCX
  from clob_table, search_words s
 where date_entered between to_date('01-SEP-2018', 'DD-MON-YYYY')
                        and to_date('30-SEP-2018', 'DD-MON-YYYY')
   and contains(clob_field, s.words) > 0;
The set of words is '.doc', '.DOC', '.docx', and '.DOCX'. When I use
CONTAINS() it seems to ignore the dot and so gives me lots of rows, but not with the document extensions in them. It finds emails with .doc as part of the address, so the doc will have a period on either side of it.
i.e. mail.doc.george@here.com
I don't want those occurrences. I have tried it with a space at the end of the word and it ignores the spaces. I have put these in a search table I created, as shown above, and it still ignores the spaces. Any suggestions?
Thanks!!
Here are two suggestions.
The simple, inefficient way is to use something besides CONTAINS. Context indexes are notoriously tricky to get right. So instead of the last line, you could do:
AND regexp_instr(clob_field, '\.docx', 1,1,0,'i') > 0
I think that should work, but it might be very slow. Which is when you'd use an index. But Oracle Text indexes are more complicated than normal indexes. This old doc explains that punctuation characters (as defined in the index parameters) are not indexed, because the point of Oracle Text is to index words. If you want special characters to be indexed as part of the word, you need to add it to the set of printjoin characters. This doc explains how, but I'll paste it here. You need to drop your existing CONTEXT index and re-create it with this preference:
begin
ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
ctx_ddl.set_attribute('mylex', 'printjoins', '._-'); -- periods, underscores, dashes can be parts of words
end;
/
CREATE INDEX myindex on clob_table(clob_field) INDEXTYPE IS CTXSYS.CONTEXT
parameters ('LEXER mylex');
Keep in mind that CONTEXT indexes are case-insensitive by default; I think that's what you want, but FYI you can change it by setting the 'mixed_case' attribute to 'Y' on the lexer, right below where you set the printjoins attribute above.
Also it seems like you're trying to search for words which end in .docx, but CONTAINS isn't INSTR - by default it matches entire words, not strings of characters. You'd probably want to modify your query to do AND contains(clob_field, '%.docx')>0
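If you go the regexp route, the trick is to anchor the extension to the end of a token so that "mail.doc.george" doesn't match. A small Python sketch of the same pattern idea (the sample strings are made up), using a lookahead for whitespace-or-end:

```python
import re

# Match .doc/.docx only when followed by whitespace or end of text,
# so an extension embedded mid-address does not count
pat = re.compile(r"\.docx?(?=\s|$)", re.IGNORECASE)

assert pat.search("see report.DOCX")               # filename at end of text
assert pat.search("file.doc attached")             # followed by a space
assert not pat.search("mail.doc.george@here.com")  # '.doc' inside an address
```

The equivalent anchoring can be expressed in Oracle's regexp_instr pattern as well; the key point is rejecting a match that is immediately followed by more of the token.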

Storing issue for special character in table in rhomobile

Using CGI escape I am able to save some special characters in the DB. But I faced a critical issue related to column size.
In one case the column size is 12 characters. A user entered 11 special characters in a view form; after escaping those special characters, the whole string became longer than the table's column size (12 characters), so saving it to the DB raised an error.
How can I solve this type of error?
Here is the documentation for storing in the database with RhoMobile:
Using the Local Database with Rhom
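The underlying problem is easy to demonstrate: percent-style escaping inflates the string well past its original length, so the column must be sized for the escaped form, not the raw input. A Python sketch, using urllib's quote as a stand-in for whatever escaping RhoMobile applies:

```python
from urllib.parse import quote

# 11 "special" characters, as in the question
raw = "&" * 11
assert len(raw) == 11

escaped = quote(raw)   # each '&' becomes '%26', tripling the length
# 33 characters no longer fits a 12-character column
```

Practically, that means either enlarging the column to hold the worst-case escaped length (roughly 3x the raw limit for single-byte characters) or validating the escaped length before the insert.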
