multiline text in varchar2 field - oracle

I have multiline data that I'd like to insert into a table, and then of course retrieve it while preserving the positions of the carriage returns.
For example. I have data like this in text file
-------------------------------
| ID | text                   |
|    |                        |
| 01 | This is headline.      |
| 02 | This is all the text.  |
|    | ¤                      |
|    | Of great story once    |
| 03 | Great weather          |
-------------------------------
The ¤ is the indicator of a carriage return. When I run the query, the data comes out like this:
-------------------------------
| ID | text                   |
|    |                        |
| 01 | This is headline.      |
| 02 | This is all the text.  |
| 03 | Great weather          |
-------------------------------
What I'd like to have in the table (I have no idea how to show a carriage return in the example below):
-----------------------------------------------------
| ID | text                                         |
|    |                                              |
| 01 | This is headline.                            |
| 02 | This is all the text. Of great story once    |
| 03 | Great weather                                |
-----------------------------------------------------
Which is, of course, wrong, as the data for ID 02 wasn't imported completely.
Here is my script:
LOAD DATA
INFILE "file.txt" BADFILE "file.bad" DISCARDFILE "file.dsc"
APPEND
INTO TABLE text_table
FIELDS TERMINATED BY X'7C' TRAILING NULLCOLS
(
employee_id,
exp_pro CHAR(4000)
)
Any ideas?

First, make sure the issue isn't with how you're viewing the data (or the IDE you use). Some viewers simply stop at a line feed (or carriage return, or another binary character).
Try dumping a hex representation of some data first, for example:
with txt as (
  select 'This is line 1.' || chr(13) || chr(10) || 'Line 2.' as lines
  from dual
)
select dump(txt.lines, 16) from txt;
You should be able to see the 0d 0a (CR/LF) characters, or whatever other "non-printable" characters exist, if any.
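If the hex dump shows the line breaks really are missing from the table, the likely cause is that SQL*Loader treats every newline as the end of a record. As a sketch (the '|\n' terminator below is an assumption; use whatever byte sequence actually ends a logical record in your file), the "str" clause on INFILE lets you declare an alternative record terminator, so embedded newlines are loaded as data instead of splitting the record:

LOAD DATA
INFILE "file.txt" "str '|\n'" BADFILE "file.bad" DISCARDFILE "file.dsc"
APPEND
INTO TABLE text_table
FIELDS TERMINATED BY X'7C' TRAILING NULLCOLS
(
employee_id,
exp_pro CHAR(4000)
)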

Related

Sqlite3 shell output: word wrap single columns

I am interacting with a sqlite3 database on Linux with bash scripts; it is a small tool I use for myself, and I display the data in the terminal. Some of the columns contain a lot of text, too much to show on a single line. Is there a way to word-wrap the output of the SELECT query? The output I am looking for should look something like this:
rowid | column1  | column2   | column3
------------------------------------------------
1     | value 11 | value 21  | value 31
------------------------------------------------
2     | value 12 | This is a | value 32
      |          | very long |
      |          | text      |
------------------------------------------------
3     | value 13 | value 23  | value 33
------------------------------------------------
4     | value 14 | value 24  | value 34
Is there a way to do this? I was not able to find a solution to this problem. Thanks in advance and BR!
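No bash post-processing may be needed: newer versions of the sqlite3 shell (roughly 3.33+, which introduced the box output mode, with the wrap options following shortly after) can wrap long cells themselves. A sketch of a session (the database and table names here are placeholders):

$ sqlite3 mydb.sqlite
sqlite> .mode box --wrap 30 --wordwrap on
sqlite> SELECT rowid, column1, column2, column3 FROM mytable;

Each cell is folded at word boundaries after roughly 30 characters, which matches the layout sketched above.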

Hive function to retrieve particular array element

I have a table which stores strings in an array. I couldn't figure out why, but a simple example looks like this:
+--------+----------------------------------+
| reason | string                           |
+--------+----------------------------------+
| \N     | \N\N\N\NXXX - ABCDEFGH\N\N       |
| \N     | \N\N\N\NXXX - ABCDEFGH           |
| \N     | \N\N\N\N                         |
| \N     | \N\N\N\NXXX - ABCDEFGH\N         |
| \N     | \N\N                             |
| \N     | \N\N\N                           |
| \N     | \N                               |
+--------+----------------------------------+
We can't see it in the table above, but the true format of the first string contains extra separator characters (an image in the original question showed them).
Basically, what I would like to retrieve is:
+--------+----------------------------------+
| reason | string                           |
+--------+----------------------------------+
| \N     | XXX - ABCDEFGH                   |
+--------+----------------------------------+
XXX always remains the same, but ABCDEFGH may be any string.
The problem is that I can't use path.path.path_path[4], because the string XXX - ABCDEFGH may be the 4th element of the array or any other (even the 20th).
I tried to use where lower(path.path.string) like ('xxx - %') but received an error.
Select
path.path.reason,
path.path.string
From table_name
Where path.id = '123'
And datestr = '2018-07-21'
This regular expression will do the job for you: ([^\N$])+.
(This assumes the character shown in the image in the question is a $.)
First, you can use regexp_extract() to retrieve a particular array element. It has the following syntax:
regexp_extract(string subject, string pattern, int index)
Second, you can use regexp_replace which has the following syntax:
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
Test Data
WITH string_column AS (
  SELECT explode(array(
           'XXX - ABCSSSSSSSSSSSGH\N\N',
           '\N$\N$\N$\N$XXX - ABCDEFGH$\N\N',
           '\N\N\N\N',
           '\N\N\N\NXXX - ABCDEFGH\N')) AS str_column
)
SELECT regexp_replace(regexp_extract(str_column, '([^\N$])+', 0), "$", " ") AS string_col
FROM string_column
Will result in
------------------------------
| string_col                 |
------------------------------
| XXX - ABCSSSSSSSSSSSGH     |
------------------------------
| XXX - ABCDEFGH             |
------------------------------
|                            |
------------------------------
| XXX - ABCDEFGH             |
------------------------------
Note: an index of 0 makes regexp_extract() return the entire text matched by the pattern.
regexp_extract(str_column, '(,|[^\N$])+', 0)
The following statement then replaces any occurrence of '$':
regexp_replace(regexp_extract(str_column, '([^\N$])+', 0), "$", " ")
For more information on regexp_replace() and regexp_extract(), see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

Automatically generating documentation about the structure of the database

There is a database that contains several views and tables.
I need to create a report (documentation of the database) with a list of all the fields in these tables, indicating the type and, if possible, the minimum/maximum values and the values from the first row. For example:
.------------.--------.--------.--------------.--------------.--------------.
| Table name | Column | Type   | MinValue     | MaxValue     | FirstRow     |
:------------+--------+--------+--------------+--------------+--------------:
| Table1     | day    | date   | '2010-09-17' | '2016-12-10' | '2016-12-10' |
:------------+--------+--------+--------------+--------------+--------------:
| Table1     | price  | double | 1030.8       | 29485.7      | 6023.8       |
:------------+--------+--------+--------------+--------------+--------------:
| …          |        |        |              |              |              |
:------------+--------+--------+--------------+--------------+--------------:
| TableN     | day    | date   | '2014-06-20' | '2016-11-28' | '2016-11-16' |
:------------+--------+--------+--------------+--------------+--------------:
| TableN     | owner  | string | NULL         | NULL         | 'Joe'        |
'------------'--------'--------'--------------'--------------'--------------'
I think that executing many queries like
SELECT MAX(column_name) AS max_value, MIN(column_name) AS min_value
FROM table_name
will be inefficient on the huge tables stored in Hadoop.
After reading the documentation, I found an article about "Statistics in Hive".
It seems I must use a statement like this:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
But this command ended with an error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
Do I understand correctly that this statement adds information to the description of the table rather than displaying a result? Will it work with views?
Please suggest how to effectively and automatically create documentation for a database in Hive.
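For what it's worth, once the ANALYZE statement can be made to succeed (the ColumnStatsTask error is often environment-specific), the computed statistics are stored in the metastore rather than printed; they can be read back per column with DESCRIBE FORMATTED. A sketch (tablename and column_name are placeholders):

-- compute and store column statistics in the metastore
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;

-- display the stored statistics (min, max, number of nulls, distinct count) for one column
DESCRIBE FORMATTED tablename column_name;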

LISTAGG function with two columns

I have one table like this (report)
--------------------------------------------------
| user_id | Department | Position  | Record_id |
--------------------------------------------------
| 1       | Science    | Professor | 1001      |
| 1       | Maths      |           | 1002      |
| 1       | History    | Teacher   | 1003      |
| 2       | Science    | Professor | 1004      |
| 2       | Chemistry  | Assistant | 1005      |
--------------------------------------------------
I'd like to have the following result
---------------------------------------------------------
| user_id | Department+Position                         |
---------------------------------------------------------
| 1       | Science,Professor; Maths, ; History,Teacher |
| 2       | Science,Professor; Chemistry,Assistant      |
---------------------------------------------------------
That means I need to preserve the empty Position as ' ' (a space), as you can see in the result table.
Now, I know how to use the LISTAGG function, but only for one column. I can't figure out how to do it for two columns at the same time. Here is my query:
SELECT user_id, LISTAGG(department, ';') WITHIN GROUP (ORDER BY record_id)
FROM report
Thanks in advance :-)
It just requires judicious use of concatenation within the aggregation:
select user_id
, listagg(department || ',' || coalesce(position, ' '), '; ')
within group ( order by record_id )
from report
group by user_id
i.e. aggregate the concatenation of department, a comma, and position, replacing position with a space when it is NULL.

How to remove repeated columns using ruby FasterCSV

I'm using Ruby 1.8 and FasterCSV.
The csv file I'm reading in has several repeated columns.
| acct_id | amount | acct_num | color | acct_id | acct_type | acct_num |
| 345 | 12.34 | 123 | red | 345 | 'savings' | 123 |
| 678 | 11.34 | 432 | green | 678 | 'savings' | 432 |
...etc
I'd like to condense it to:
| acct_id | amount | acct_num | color | acct_type |
| 345 | 12.34 | 123 | red | 'savings' |
| 678 | 11.34 | 432 | green | 'savings' |
Is there a general purpose way to do this?
Currently my solution is something like:
headers = FasterCSV.parse_line(file.readline)
file.readline # skip the garbage line between headers and data
FasterCSV.filter(file, :headers => headers) do |row|
  row.delete(6) # delete second acct_num field
  row.delete(4) # delete second acct_id field
  # additional processing on the data
  row['color'] = color_to_number(row['color'])
  row['acct_type'] = acct_type_to_number(row['acct_type'])
end
Assuming you want to get rid of the hardcoded deletions
row.delete(6) # delete second acct_num field
row.delete(4) # delete second acct_id field
they can be replaced by
row = row.to_hash
This will clobber the duplicates. The rest of the posted code will keep working.
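For readers on current Ruby, FasterCSV was merged into the standard csv library, and CSV::Row#to_h shows the same clobbering behaviour. A minimal sketch using the headers from the question (the duplicate columns carry identical values, so last-value-wins is harmless here):

```ruby
require 'csv'

data = <<CSV
acct_id,amount,acct_num,color,acct_id,acct_type,acct_num
345,12.34,123,red,345,savings,123
678,11.34,432,green,678,savings,432
CSV

rows = CSV.parse(data, headers: true).map do |row|
  # Hash keys are unique, so to_h keeps one entry per header name;
  # for duplicate headers the last value read wins.
  row.to_h
end

# Each hash now has the five unique columns:
# acct_id, amount, acct_num, color, acct_type
rows.each { |r| puts r.keys.join(' | ') }
```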
