HIVE splitting string

In Hive, I have a string column changeContext with values like "A345|Fq*A|2017-05-01|2017-05-01", from which I need to extract A345 as another column. Any suggestions? P.S. I have tried regexp_extract (it ran into a vertex failure), so any other solution would be perfect.

with t as (select "A345|Fq*A|2017-05-01|2017-05-01" as changeContext)
select substring_index(changeContext,'|',1) option_1
,split(changeContext,'\\|')[0] option_2
,substr(changeContext,1,instr(changeContext,'|')-1) option_3
,regexp_extract(changeContext,'[^|]*',0) option_4
,regexp_replace(changeContext,'\\|.*','') option_5
from t
+----------+----------+----------+----------+----------+
| option_1 | option_2 | option_3 | option_4 | option_5 |
+----------+----------+----------+----------+----------+
| A345 | A345 | A345 | A345 | A345 |
+----------+----------+----------+----------+----------+

Related

Redirect the table generated from Beeline to text file without the grid (Shell Script)

I am trying to find a way to redirect the standard output of the beeline shell to a text file without the grid. The biggest problem I am facing is that my columns contain negative values, and when I use a regex to remove the '-' characters of the grid, it also strips the minus signs from the column values.
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+
Here's what I'm doing:
beeline -u jdbc:hive2://localhost:10000/default \
-e "SELECT * FROM $db.$tbl;" | sed 's/\+//g' | sed 's/\-//g' | sed 's/\|//g' > table.txt
I am trying to clean this file so I can read all the data into a variable.

Assuming all your data follows the same pattern, where the insignificant '-' characters are the ones enclosed between '+' characters:
[root@machine]# cat boo
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+
[root@machine]# cat boo | sed 's/\+-*+//g' | sed 's/\--//g' | sed 's/|//g'
col
-100
22
-120
-190
-800
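
An alternative sketch that avoids touching '-' at all: drop the border lines and split the remaining rows on '|' with awk, so negative values pass through unchanged. The function name strip_grid is hypothetical, and the sketch assumes cell values never contain '|' themselves.

```shell
#!/usr/bin/env bash
# strip_grid (hypothetical helper): read a beeline-style grid on stdin and
# print the trimmed cells of each row, tab-separated. The header row is kept.
strip_grid() {
  awk -F'|' '
    /^\+/ { next }                          # skip +-----+ border lines
    NF >= 3 {
      # fields 2..NF-1 are the cells; the first and last fields are the empty edges
      out = ""
      for (i = 2; i < NF; i++) {
        cell = $i
        gsub(/^[ \t]+|[ \t]+$/, "", cell)   # trim padding only, minus signs stay
        out = out (i > 2 ? "\t" : "") cell
      }
      print out
    }
  '
}

# Example with a shortened version of the table from the question
printf '%s\n' \
  '+-------------------+' \
  '|        col        |' \
  '+-------------------+' \
  '| -100              |' \
  '| 22                |' \
  '+-------------------+' | strip_grid
```

In the original pipeline this would replace the three sed calls, for example `beeline ... -e "SELECT * FROM $db.$tbl;" | strip_grid > table.txt`.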

ORACLE: Identify and update invalid duplicate records

I need some help.
I have a table as below.
+---------------+------------------+---------------+-------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+-------+
| TestItem10001 | TestItem10001 | Cat1 | |
| TestItem10001 | TestItem10001 | Cat2 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10003 | TestItem10003 | Cat3 | |
+---------------+------------------+---------------+-------+
My requirement is: the same ITEM_NO cannot have different ITEM_CATEGORY values.
In the table above, TestItem10001 has two different categories, Cat1 and Cat2, which is invalid. In such a case, I want to update the ERROR column with an error string like:
+---------------+------------------+---------------+------------------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+------------------+
| TestItem10001 | TestItem10001 | Cat1 | |
| TestItem10001 | TestItem10001 | Cat2 | INVALID CATEGORY |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10003 | TestItem10003 | Cat3 | |
+---------------+------------------+---------------+------------------+
Please suggest how this can be achieved cleanly and inexpensively, as the real table will have millions of records.
Thank you in advance.
EDIT1:
Create and inserts as requested in comments.
CREATE TABLE STAGING_TABLE
(
"ITEM_NO" VARCHAR2(1000 BYTE),
"ITEM_DESCRIPTION" VARCHAR2(1000 BYTE),
"ITEM_CATEGORY" VARCHAR2(1000 BYTE),
"ERROR" VARCHAR2(1000 BYTE)
);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10001','TestItem10001','Cat1',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10001','TestItem10001','Cat2',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10002','TestItem10002','Cat3',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10002','TestItem10002','Cat3',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10003','TestItem10003','Cat3',null);

Since it does not matter which row of the duplicates is marked as invalid, you can use this query:
SELECT ITEM_NO, MIN(ITEM_CATEGORY) MIN_ITEM_CATEGORY
FROM STAGING_TABLE
GROUP BY ITEM_NO
which returns the minimum ITEM_CATEGORY for each ITEM_NO with a MERGE INTO statement:
MERGE INTO STAGING_TABLE s
USING (
SELECT ITEM_NO, MIN(ITEM_CATEGORY) MIN_ITEM_CATEGORY
FROM STAGING_TABLE
GROUP BY ITEM_NO
) t
ON (t.ITEM_NO = s.ITEM_NO AND t.MIN_ITEM_CATEGORY <> s.ITEM_CATEGORY)
WHEN MATCHED THEN
UPDATE SET s.ERROR = 'INVALID CATEGORY'
See the demo.
Results:
> ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR
> :------------ | :--------------- | :------------ | :---------------
> TestItem10001 | TestItem10001 | Cat1 | null
> TestItem10001 | TestItem10001 | Cat2 | INVALID CATEGORY
> TestItem10002 | TestItem10002 | Cat3 | null
> TestItem10002 | TestItem10002 | Cat3 | null
> TestItem10003 | TestItem10003 | Cat3 | null

You can use the SQL below to select the rows that should carry the error string:
select item_no,item_description,item_category,'INVALID CATEGORY' from (
select count(item_no) over (partition by item_no order by item_category)item_cnt,
count(item_no) over (partition by item_no,item_category order by item_no)
category_cnt,
st.* from staging_table st)
where item_cnt<> category_cnt

Fetch particular column value from rows with specified condition using shell script

I have a sample output from a command
+--------------------------------------+------------------+---------------------+-------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+-------------------------------------+
| 04584e8a-c210-430b-8028-79dbf741797c | | 99.99.99.91 | |
| 12d2257c-c02b-4295-b910-2069f583bee5 | 20.0.0.92 | 99.99.99.92 | 37ebfa4c-c0f9-459a-a63b-fb2e84ab7f92 |
| 98c5a929-e125-411d-8a18-89877d3c932b | | 99.99.99.93 | |
| f55e54fb-e50a-4800-9a6e-1d75004a2541 | 20.0.0.94 | 99.99.99.94 | fe996e76-ffdb-4687-91a0-9b4df2631b4e |
+--------------------------------------+------------------+---------------------+-------------------------------------+
Now I want to fetch every "floating_ip_address" for which the "port_id" and "fixed_ip_address" fields are blank/empty (in the sample above, 99.99.99.91 and 99.99.99.93).
How can I do it with shell scripting?

You can use sed:
fl_ips=($(sed -nE 's/\|.*\|.*\|(.*)\|\s*\|/\1/p' inputfile))
Here inputfile is the table provided in the question. The array fl_ips contains the output of sed:
>echo ${#fl_ips[@]}
2 # Array has two elements
>echo ${fl_ips[0]}
99.99.99.91
>echo ${fl_ips[1]}
99.99.99.93
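
As a hedged alternative, the same selection can be sketched with awk, which checks both fixed_ip_address and port_id explicitly instead of relying on the position of the empty cell. The function name blank_float_ips is hypothetical, and the sketch assumes the four-column layout shown in the question.

```shell
#!/usr/bin/env bash
# blank_float_ips (hypothetical helper): print floating_ip_address for rows
# whose fixed_ip_address and port_id cells are both empty.
# Reads the files given as arguments, or stdin when none are given.
blank_float_ips() {
  awk -F'|' '
    NF == 6 {                       # data and header rows: 4 cells plus 2 empty edges
      fixed = $3; floating = $4; port = $5
      gsub(/[ \t]/, "", fixed)      # remove cell padding
      gsub(/[ \t]/, "", floating)
      gsub(/[ \t]/, "", port)
      if (fixed == "" && port == "" && floating != "") print floating
    }
  ' "$@"
}

# Example with shortened ids; border lines contain no "|" and are ignored
printf '%s\n' \
  '+----+------------------+---------------------+---------+' \
  '| id | fixed_ip_address | floating_ip_address | port_id |' \
  '| a1 |                  | 99.99.99.91         |         |' \
  '| b2 | 20.0.0.92        | 99.99.99.92         | p2      |' \
  '| c3 |                  | 99.99.99.93         |         |' | blank_float_ips
```

With bash 4 or later, the result can be loaded into an array with `mapfile -t fl_ips < <(blank_float_ips inputfile)`.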

Update in oracle with joining two table

I have the two tables below. I need to set Table1.Active_flag to Y where Table2.Reprocess_Flag is N.
Table1
+--------+--------------+--------------+--------------+-------------+
| Source | Subject_area | Source_table | Target_table | Active_flag |
+--------+--------------+--------------+--------------+-------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | N |
| d | PRODUCT | PD_PD1 | PD_PD1 | N |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+-------------+
Table2
+--------+--------------+--------------+--------------+----------------+
| Source | Subject_area | Source_table | Target_table | Reprocess_Flag |
+--------+--------------+--------------+--------------+----------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | Y |
| d | PRODUCT | PD_PD1 | PD_PD1 | Y |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+----------------+

Use all three columns in a single select statement.
UPDATE hdfs_cntrl SET active_flag = 'Y'
where (source,subject_area ,source_table ) in ( select source,subject_area ,source_table from proc_cntrl where Reprocess_Flag = 'N');

Updating one table based on data in another table is almost always best done with the MERGE statement.
Assuming source is a unique key in table2:
merge into table1 t1
using table2 t2
on (t1.source = t2.source)
when matched
then update set t1.active_flag = 'Y'
where t2.reprocess_flag = 'N'
;
If you are not familiar with the MERGE statement, read about it - it's just as easy to learn as UPDATE and INSERT and DELETE, it can do all three types of operations in a single statement, it is much more flexible and, in some cases, more efficient (faster).

merge into table1 t1
using table2 t2
on (t1.source = t2.source and t1.subject_area = t2.subject_area and t1.source_table = t2.source_table and t1.target_table = t2.target_table and t2.reprocess_flag = 'N')
when matched then update set
t1.active_flag = 'Y';

UPDATE hdfs_cntrl SET active_flag = 'Y' where source in ( select source from proc_cntrl where Reprocess_Flag = 'N') and subject_area in (select subject_area from proc_cntrl where Reprocess_Flag = 'N') and source_table in (select target_table from proc_cntrl where Reprocess_Flag = 'N')

insert id number only in sql

I have a SQL Server table like this
+----+-----------+------------+
| id | acoount | date |
+----+-----------+------------+
| | John | 2/6/2016 |
| | John | 2/6/2016 |
| | John | 4/6/2016 |
| | John | 4/6/2016 |
| | Andi | 5/6/2016 |
| | Steve | 4/6/2016 |
+----+-----------+------------+
I want to populate the id column like this:
+-----------+-----------+------------+
| id | acoount | date |
+-----------+-----------+------------+
| 020616001 | John | 2/6/2016 |
| 020616002 | John | 2/6/2016 |
| 040616001 | John | 4/6/2016 |
| 040616002 | John | 4/6/2016 |
| 050616001 | Andi | 5/6/2016 |
| 040616003 | Steve | 4/6/2016 |
+-----------+-----------+------------+
I want to generate the id from the date like this: 02 + 06 + 16 (from the date) + 001 = 020616001. If several rows share the same date, the counter is incremented by 1.
I have tried but have not succeeded so far.
I want to do this in Oracle SQL Developer.
Can someone help me?
Thanks.

Try the SQL below against the given data (written for SQL Server 2012). It assumes the dates are dd/mm/yyyy strings and numbers the rows per date, as in the expected output:
select replace(convert(varchar(8), convert(date, t.[date], 103), 3), '/', '')
+ right('00' + convert(varchar(3), row_number() over (partition by t.[date] order by t.account)), 3) as ID,
t.account,
t.[date]
from (values ('John','2/6/2016'),
('John','2/6/2016'),
('John','4/6/2016'),
('John','4/6/2016'),
('Andi','5/6/2016'),
('Steve','4/6/2016')) T(account,[date])

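Update your table with a statement like the following. Here your_table is a placeholder name, and the row number is computed in a CTE because SQL Server does not allow window functions directly in the SET clause:
with numbered as (
select id, replace(convert(varchar(8), convert(datetime, [date], 103), 3), '/', '')
+ right('00' + convert(varchar(3), row_number() over (partition by [date] order by account)), 3) as new_id
from your_table
)
update numbered set id = new_id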

MySQL
For now I can only give you the logic for the date part (020616) with a fixed 001 suffix.
I still have to work out the +1 for rows that share the same date and will follow up once I have it.
insert into table_name(id)
select concat
(
if(length (day(current_date))>1,day(current_date),Concat(0,day(current_date))),
if(length (month(current_date))>1,month(current_date),Concat(0,month(current_date))),
(right(year(current_date),2)),'001'
)as id

You cannot convert your dates column to the datetime type in the normal way because it is in dd/mm/yyyy format.
Try this:
declare @t table(acoount varchar(50),dates varchar(20))
insert into @t values
('John','2/6/2016')
,('John','2/6/2016')
,('John','4/6/2016')
,('John','4/6/2016')
,('Andi','5/6/2016')
,('Steve','4/6/2016')
;With CTE as
(select * , SUBSTRING(dates,0,charindex('/',dates)) dd
,SUBSTRING(stuff(dates,1,charindex('/',dates),''),0, charindex('/',stuff(dates,1,charindex('/',dates),''))) MM
,right(dates,2) yy
from @t
)
,CTE1 as
(
select *
,ROW_NUMBER()over(partition by yy,mm,dd order by yy,mm,dd)rn from cte c
)
select *, REPLICATE('0',2-len(dd))+cast(dd as varchar(2))
+REPLICATE('0',2-len(MM))+cast(MM as varchar(2))
+yy+REPLICATE('0',3-len(rn))+cast(rn as varchar(2))
from cte1
