XPath: Limit cell content, if delimiter exists - xpath

I need to get the content of a cell, which occasionally contains a ',' character. If so, I need to isolate the content to the portion before the ',' character.
substring-before(//td[contains(text(),'Dokumentnummer')]/following-sibling::td[1], ",")
This gives me the desired substring, but only if a ',' exists. How can I make it return the whole string if it doesn't exist?

You can append a ',' before calling substring-before, which guarantees there is always at least one comma:
substring-before(concat(//td[contains(text(),'Dokumentnummer')]/following-sibling::td[1],
','),
',')
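
The same sentinel idea can be sketched outside XPath. The Python below is only an illustration: substring_before is a hypothetical helper that mimics XPath 1.0 substring-before() (empty string when the delimiter is missing), and before_comma mirrors the concat() call above.

```python
def substring_before(text: str, delimiter: str) -> str:
    """Mimic XPath 1.0 substring-before(): '' when the delimiter is absent."""
    head, found, _ = text.partition(delimiter)
    return head if found else ""


def before_comma(cell: str) -> str:
    # Appending a sentinel comma guarantees a match, so a cell without a
    # comma comes back whole instead of as an empty string.
    return substring_before(cell + ",", ",")


print(repr(substring_before("12345", ",")))  # ''
print(repr(before_comma("12345")))           # '12345'
print(repr(before_comma("12345, rev. B")))   # '12345'
```

The sentinel is never part of the result, because everything from the first comma onward is discarded.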

Related

Find table by column header using xpath

I have the HTML in the screenshot, I can get the table using:
//table[contains('@class','table')]
but there are several similar tables on the page. Now I want to make sure I have the right table by checking that its th elements include a specific column header (in this case 'Sqft').
I tried:
//table[contains('@class','table')]//th[contains(text(),'Sqft")
but this is failing. How do I get this working?
//table[contains(@class, 'table') and .//th[contains(., 'Sqft')]]
or the other way around
//th[contains(., 'Sqft')]/ancestor::table[contains(@class, 'table')][1]
On a general note, in order to prevent partial attribute matches, include the token delimiter in the search. For CSS class names, the delimiter is a space:
//table[contains(concat(' ', @class, ' '), ' table ')]
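
To see why the padded form matters, here is a small Python sketch of the two tests. The function names are made up for illustration; naive_match corresponds to a plain contains() on the class attribute, and token_match to the concat() form with spaces on both sides.

```python
def naive_match(class_attr: str, token: str) -> bool:
    # Plain substring test, like contains(@class, 'table').
    return token in class_attr


def token_match(class_attr: str, token: str) -> bool:
    # Pad both the attribute and the token with spaces so that only
    # whole, space-separated class names can match.
    return f" {token} " in f" {class_attr} "


for value in ["table striped", "datatable", "table-responsive"]:
    print(value, naive_match(value, "table"), token_match(value, "table"))
# table striped True True
# datatable True False
# table-responsive True False
```

Like the XPath expression, this sketch only treats the space character as a separator.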

Hive remove multiple space between the string

In Hive, how can I replace multiple spaces between words in a string with a single space?
select regexp_replace('foot     ball',' ',' ')
Expected output is a string with only one space between them
foot ball
Any help is appreciated
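
One common approach is to match a whole run of spaces with a quantifier and replace the run with a single space. The sketch below uses Python's re module purely to illustrate the pattern; collapse_spaces is a hypothetical helper. Hive's regexp_replace takes a Java regular expression, and the assumption here (not tested against Hive) is that the same ' +' pattern behaves the same way there.

```python
import re


def collapse_spaces(text: str) -> str:
    # ' +' matches one or more consecutive spaces; each run becomes one space.
    return re.sub(r" +", " ", text)


print(collapse_spaces("foot     ball"))  # foot ball
```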

Oracle SQL-Loader handling efficiently internal Double Quotes in values

I have some Oracle SQL*Loader challenges and am looking for an efficient and simple solution.
My source files are pipe (|) delimited, and values are enclosed in double quotes (").
The problem seems to be that some of the values contain internal double quotes,
e.g.: ..."|"a":"b"|"...
This causes my records to be rejected with:
no terminator found after TERMINATED and ENCLOSED field
There are various solutions on the web, but none seems to fit:
[1]
I have tried to escape all internal double quotes by doubling them,
but it seems that when this function is applied to too many fields in the control file
(I have ~2000+ fields and use FILLER to load only a subset)
the loader complains again:
SQL*Loader-350: Syntax error at line 7.
Expecting "," or ")", found ",".
field1 char(36) "replace(:field1,'"','""')",
(I do not know why, but when this solution is applied to a narrow subset of columns it does seem to work.)
The thing is that potentially all fields may include internal double quotes.
[2]
I'm able to load all data when omitting the global optionally enclosed by '"', but then all enclosing quotes become part of the data in the target table.
[3]
I can omit the global optionally enclosed by '"' clause and place it only on selected fields,
while applying the "replace(:field1,'"','""')" expression to the remainder, but this is difficult to implement,
as I cannot know ahead of time which fields are likely to include internal double quotes.
Here are my questions:
Is there no simple way to convince the loader to handle internal double quotes with care (when values are enclosed by them)?
If I'm forced to fix the data ad hoc, is there a one-line Linux command to convert only internal double quotes to another string/char,
say, single quotes?
If I'm forced to load the data with the quotes into the target table, is there a simple way to remove the enclosing double quotes from all fields,
all at once (the table has ~1000 columns)? Is that solution practical performance-wise for very large tables?
If you never had pipes in the enclosed fields you could do it from the control file. If you can have both pipes and double-quotes within a field then I think you have no choice but to preprocess the files, unfortunately.
Your solution [1], to replace double-quotes with an SQL operator, is happening too late to be useful; the delimiters and enclosures have already been interpreted by SQL*Loader before it does the SQL step. Your solution [2], to ignore the enclosure, would work in combination with [1] - until one of the fields did contain a pipe character. And solution [3] has the same problems as using [1] and/or [2] globally.
The documentation for specifying delimiters mentions that:
Sometimes the punctuation mark that is a delimiter must also be included in the data. To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data.
In other words, if you repeated the double-quotes inside the fields then they would be escaped and would appear in the table data. As you can't control the data generation, you could preprocess the files you get to replace all the double-quotes with escaped double quotes. Except you don't want to replace all of them - the ones that are actually real enclosures should not be escaped.
You could use a regular expression to target the relevant characters while skipping others. Not my strong area, but I think you can do this with lookahead and lookbehind assertions.
If you had a file called orig.txt containing:
"1"|A|"B"|"C|D"
"2"|A|"B"|"C"D"
3|A|""B""|"C|D"
4|A|"B"|"C"D|E"F"G|H""
you could do:
perl -pe 's/(?<!^)(?<!\|)"(?!\|)(?!$)/""/g' orig.txt > new.txt
That looks for a double-quote which is not preceded by the line-start anchor or a pipe character; and is not followed by a pipe character or line end anchor; and replaces only those with escaped (doubled) double-quotes. Which would make new.txt contain:
"1"|A|"B"|"C|D"
"2"|A|"B"|"C""D"
3|A|"""B"""|"C|D"
4|A|"B"|"C""D|E""F""G|H"""
The double-quotes at the start and end of fields are not modified, but those in the middle are now escaped. If you then loaded that with a control file with double-quote enclosures:
load data
truncate
into table t42
fields terminated by '|' optionally enclosed by '"'
(
col1,
col2,
col3,
col4
)
Then you would end up with:
select * from t42 order by col1;
COL1       COL2       COL3       COL4
---------- ---------- ---------- --------------------
1          A          B          C|D
2          A          B          C"D
3          A          "B"        C|D
4          A          B          C"D|E"F"G|H"
which hopefully matches your original data. There may be edge cases that don't work (like a double-quote followed by a pipe within a field) but there's a limit to what you can do to attempt to interpret someone else's data... There may also be (much) better regular expression patterns, of course.
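
If perl is not available, the same substitution can be sketched in Python. This is only an illustration of the lookaround pattern from the one-liner above, applied to the sample lines from orig.txt; escape_inner_quotes is a hypothetical helper name, and each line is assumed to have had its trailing newline removed already.

```python
import re

# A double quote that is not at the start or end of the line and is not
# directly next to a pipe delimiter (same assertions as the perl pattern).
INNER_QUOTE = re.compile(r'(?<!^)(?<!\|)"(?!\|)(?!$)')


def escape_inner_quotes(line: str) -> str:
    # Double each inner quote so SQL*Loader reads it as a literal quote.
    return INNER_QUOTE.sub('""', line)


sample = [
    '"1"|A|"B"|"C|D"',
    '"2"|A|"B"|"C"D"',
    '3|A|""B""|"C|D"',
    '4|A|"B"|"C"D|E"F"G|H""',
]
for line in sample:
    print(escape_inner_quotes(line))
# "1"|A|"B"|"C|D"
# "2"|A|"B"|"C""D"
# 3|A|"""B"""|"C|D"
# 4|A|"B"|"C""D|E""F""G|H"""
```

The same edge case applies: a double quote that sits directly next to a pipe inside a field is indistinguishable from an enclosure and is left alone.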
You could also consider using an external table instead of SQL*Loader, if the data file is (or can be) in an Oracle directory and you have the right permissions. You still have to modify the file, but you could do it automatically with the preprocessor directive, rather than needing to do that explicitly before calling SQL*Loader.

Multiple enclosed-by symbols in SQL*Loader

I am loading a CSV file into an Oracle table. One field in some records is enclosed as "abc#xyz.com" and the same field in other records is enclosed as "'abc#xyz.com'". I need to load only abc#xyz.com.
I used OPTIONALLY ENCLOSED BY '"' but it does not help in the second case. Is there a way to specify two symbols in the OPTIONALLY ENCLOSED BY clause? Or what are the other ways of achieving this?
You can apply an SQL operator to trim the leading and trailing single quotes. For example, with a data file containing:
"abc#xyz.com"
"'abc#xyz.com'"
'abc#xyz.com'
And a control file for a dummy table:
LOAD DATA
TRUNCATE INTO TABLE t42
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
EMAIL CHAR(30) "TRIM(BOTH '''' FROM :EMAIL)"
)
This loads the stripped values:
select * from t42;
EMAIL
------------------------------
abc#xyz.com
abc#xyz.com
abc#xyz.com
As you can see, this will load values which are enclosed in single quotes, double quotes, or both - as long as the singles are within the doubles and not the other way around.
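
The order of the two steps explains that restriction. The Python sketch below is a rough simulation, not SQL*Loader itself: load_email is a hypothetical function that first removes one pair of enclosing double quotes (standing in for OPTIONALLY ENCLOSED BY '"') and then strips leading and trailing single quotes (standing in for the TRIM expression).

```python
def load_email(raw: str) -> str:
    # Step 1: optional enclosure, drop one pair of surrounding double quotes.
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        raw = raw[1:-1]
    # Step 2: like TRIM(BOTH '''' FROM :EMAIL), strip single quotes at both ends.
    return raw.strip("'")


for value in ['"abc#xyz.com"', '"\'abc#xyz.com\'"', "'abc#xyz.com'", '\'"abc#xyz.com"\'']:
    print(load_email(value))
# abc#xyz.com
# abc#xyz.com
# abc#xyz.com
# "abc#xyz.com"
```

In the last input the single quotes are on the outside, so the enclosure step does not apply and the double quotes survive the trim.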

Showing only actual column data in SQL*Plus

I'm spooling out delimited text files from SQL*Plus, but every column is printed as the full size per its definition, rather than the data actually in that row.
For instance, a column defined as 10 characters, with a row value of "test", is printing out as "test      " instead of "test". I can confirm this by selecting the column along with the value of its LENGTH function. It prints "test      |4".
It kind of defeats the purpose of a delimiter if it forces me into fixed-width. Is there a SET option that will fix this, or some other way to make it print only the actual column data?
I don't want to add TRIM to every column, because if a value is actually stored with spaces I want to be able to keep them.
Thanks
I have seen many SQL*Plus scripts that create text files like this:
select A || ';' || B || ';' || C || ';' || D
from T
where ...
It's a strong indication to me that you can't just switch to variable length output with a SET command.
Instead of ';' you can of course use any other delimiter. And it's up to your query to properly escape any characters that could be confused with a delimiter or a line feed.
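
As an illustration of what "properly escape" involves, here is a minimal Python sketch using the standard csv module. It is not an Oracle unloader; the rows are made-up sample data and to_delimited is a hypothetical helper. It shows values written at their actual length, stored trailing spaces kept, and fields that contain the delimiter or a quote being quoted.

```python
import csv
import io


def to_delimited(rows, delimiter=";"):
    # QUOTE_MINIMAL quotes only fields that contain the delimiter,
    # the quote character, or a line break; embedded quotes are doubled.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ("test", "a;b", "plain"),     # second value contains the delimiter
    ("pad  ", 'say "hi"', "x"),   # trailing spaces and embedded quotes
]
print(to_delimited(rows), end="")
# test;"a;b";plain
# pad  ;"say ""hi""";x
```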
Generally, I'd forget SQL*Plus as a method for getting CSV out of Oracle.
Tom Kyte has written a nice little Pro*C unloader.
Personally, I've written a utility that does something similar, but in Perl.
