HIVE splitting string - hadoop

In Hive, I have a string column changeContext with values like "A345|Fq*A|2017-05-01|2017-05-01", and I need to extract A345 into a separate column. Any suggestions? P.S. I have tried regexp_extract (it runs into a vertex failure), so any other solution would be perfect.

with t as (select "A345|Fq*A|2017-05-01|2017-05-01" as changeContext)
select substring_index(changeContext,'|',1) option_1
,split(changeContext,'\\|')[0] option_2
,substr(changeContext,1,instr(changeContext,'|')-1) option_3
,regexp_extract(changeContext,'[^|]*',0) option_4
,regexp_replace(changeContext,'\\|.*','') option_5
from t
+----------+----------+----------+----------+----------+
| option_1 | option_2 | option_3 | option_4 | option_5 |
+----------+----------+----------+----------+----------+
| A345 | A345 | A345 | A345 | A345 |
+----------+----------+----------+----------+----------+
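To sanity-check what each Hive option returns, here is a rough Python analogue of the five expressions. This is only a sketch: it mirrors the string logic, not Hive's exact function semantics (for example, edge cases with no '|' in the value), and the variable names are illustrative.

```python
import re

# Sample value from the question.
change_context = "A345|Fq*A|2017-05-01|2017-05-01"

# Rough Python analogues of the Hive expressions above (illustrative only).
option_1 = change_context.partition("|")[0]             # substring_index(changeContext, '|', 1)
option_2 = change_context.split("|")[0]                 # split(changeContext, '\\|')[0]
# substr/instr analogue; assumes at least one '|' (index() raises ValueError otherwise).
option_3 = change_context[:change_context.index("|")]
option_4 = re.match(r"[^|]*", change_context).group(0)  # regexp_extract(changeContext, '[^|]*', 0)
option_5 = re.sub(r"\|.*", "", change_context)          # regexp_replace(changeContext, '\\|.*', '')

print(option_1, option_2, option_3, option_4, option_5)
```

All five evaluate to A345 for the sample value.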

Related

Redirect the table generated from Beeline to text file without the grid (Shell Script)

I am currently trying to find a way to redirect the standard output of the beeline shell to a text file without the grid. The biggest problem I am facing right now is that my columns contain negative values, so when I use a regex to remove the '-' characters of the grid, it also strips the minus signs from the column values.
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+
Here's what I'm doing:
beeline -u jdbc:hive2://localhost:10000/default \
-e "SELECT * FROM $db.$tbl;" | sed 's/\+//g' | sed 's/\-//g' | sed 's/\|//g' > table.txt
I am trying to clean this file so I can read all the data into a variable.
Assuming all your data follows the same pattern, where no significant '-' characters are wrapped in '+':
[root@machine]# cat boo
+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+
[root@machine]# cat boo | sed 's/\+-*+//g' | sed 's/\--//g' | sed 's/|//g'
col
-100
22
-120
-190
-800
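The key idea in the answer is to treat the +---+ border lines differently from the data lines. A Python sketch of the same idea (strip_beeline_grid is a hypothetical name) skips every line that does not start with '|' and then removes only the '|' separators and cell padding, so minus signs inside values survive. Separately, beeline's --outputformat option (for example tsv2 or csv2), together with --showHeader=false, can avoid drawing the grid in the first place; check which options your beeline version supports.

```python
from typing import List


def strip_beeline_grid(text: str) -> List[str]:
    """Return one tab-separated string per data line of beeline's table output."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        # Border lines (+-----+) and blank lines do not start with '|', so they are
        # skipped whole; this is what keeps the '-' of negative numbers intact.
        if not line.startswith("|"):
            continue
        # Drop the outer bars, split on the inner ones, and trim cell padding.
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        rows.append("\t".join(cells))
    return rows


sample = """+-------------------+
| col |
+-------------------+
| -100 |
| 22 |
| -120 |
| -190 |
| -800 |
+-------------------+"""

for row in strip_beeline_grid(sample):
    print(row)
```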

ORACLE: Identify and update invalid duplicate records

I need some help.
I have a table as below.
+---------------+------------------+---------------+-------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+-------+
| TestItem10001 | TestItem10001 | Cat1 | |
| TestItem10001 | TestItem10001 | Cat2 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10003 | TestItem10003 | Cat3 | |
+---------------+------------------+---------------+-------+
My requirement is that the same ITEM_NO cannot have different ITEM_CATEGORY values.
So in the table above, TestItem10001 has two different categories, Cat1 and Cat2, which is invalid. In such cases, I want to update the ERROR column with an error string such as:
+---------------+------------------+---------------+------------------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+------------------+
| TestItem10001 | TestItem10001 | Cat1 | |
| TestItem10001 | TestItem10001 | Cat2 | INVALID CATEGORY |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10002 | TestItem10002 | Cat3 | |
| TestItem10003 | TestItem10003 | Cat3 | |
+---------------+------------------+---------------+------------------+
Please suggest a cleaner, less expensive way to achieve this, as the real table will have millions of records.
Thank you in advance.
EDIT1:
CREATE and INSERT statements, as requested in the comments:
CREATE TABLE STAGING_TABLE
(
"ITEM_NO" VARCHAR2(1000 BYTE),
"ITEM_DESCRIPTION" VARCHAR2(1000 BYTE),
"ITEM_CATEGORY" VARCHAR2(1000 BYTE),
"ERROR" VARCHAR2(1000 BYTE)
);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10001','TestItem10001','Cat1',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10001','TestItem10001','Cat2',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10002','TestItem10002','Cat3',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10002','TestItem10002','Cat3',null);
Insert into STAGING_TABLE (ITEM_NO,ITEM_DESCRIPTION,ITEM_CATEGORY,ERROR) values ('TestItem10003','TestItem10003','Cat3',null);
Since it does not matter which of the duplicate rows is marked as invalid, you can use this query:
SELECT ITEM_NO, MIN(ITEM_CATEGORY) MIN_ITEM_CATEGORY
FROM STAGING_TABLE
GROUP BY ITEM_NO
which returns the minimum ITEM_CATEGORY for each ITEM_NO, as the source of a MERGE INTO statement that flags every row whose category differs from that minimum:
MERGE INTO STAGING_TABLE s
USING (
SELECT ITEM_NO, MIN(ITEM_CATEGORY) MIN_ITEM_CATEGORY
FROM STAGING_TABLE
GROUP BY ITEM_NO
) t
ON (t.ITEM_NO = s.ITEM_NO AND t.MIN_ITEM_CATEGORY <> s.ITEM_CATEGORY)
WHEN MATCHED THEN
UPDATE SET s.ERROR = 'INVALID CATEGORY'
Results:
+---------------+------------------+---------------+------------------+
| ITEM_NO | ITEM_DESCRIPTION | ITEM_CATEGORY | ERROR |
+---------------+------------------+---------------+------------------+
| TestItem10001 | TestItem10001 | Cat1 | null |
| TestItem10001 | TestItem10001 | Cat2 | INVALID CATEGORY |
| TestItem10002 | TestItem10002 | Cat3 | null |
| TestItem10002 | TestItem10002 | Cat3 | null |
| TestItem10003 | TestItem10003 | Cat3 | null |
+---------------+------------------+---------------+------------------+
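The MERGE works in two steps: compute the minimum category per item, then flag every row whose category is different. The same two passes in Python look like this. It is a sketch with a hypothetical function name; Python compares strings by code point, which may not match the database's collation for MIN.

```python
from typing import Dict, List, Optional, Tuple

# (ITEM_NO, ITEM_CATEGORY) pairs from the question's sample data.
rows: List[Tuple[str, str]] = [
    ("TestItem10001", "Cat1"),
    ("TestItem10001", "Cat2"),
    ("TestItem10002", "Cat3"),
    ("TestItem10002", "Cat3"),
    ("TestItem10003", "Cat3"),
]


def flag_invalid_categories(rows: List[Tuple[str, str]]) -> List[Optional[str]]:
    # Pass 1: the equivalent of SELECT ITEM_NO, MIN(ITEM_CATEGORY) ... GROUP BY ITEM_NO.
    min_category: Dict[str, str] = {}
    for item_no, category in rows:
        if item_no not in min_category or category < min_category[item_no]:
            min_category[item_no] = category
    # Pass 2: the MERGE's ON condition; a row is flagged when its category
    # differs from its item's minimum category.
    return ["INVALID CATEGORY" if category != min_category[item_no] else None
            for item_no, category in rows]


for (item_no, category), error in zip(rows, flag_invalid_categories(rows)):
    print(item_no, category, error)
```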
Alternatively, you can use the SQL below to list the invalid rows:
select item_no,item_description,item_category,'INVALID CATEGORY' from (
select count(item_no) over (partition by item_no order by item_category)item_cnt,
count(item_no) over (partition by item_no,item_category order by item_no)
category_cnt,
st.* from staging_table st)
where item_cnt<> category_cnt
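Why this window-function query works may not be obvious. With ORDER BY item_category and Oracle's default window frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), item_cnt is a running count of the item's rows whose category sorts at or before the current one, while category_cnt counts only the rows that share the current category. The two are equal only for the smallest category of each item. A deliberately naive Python sketch of that comparison (hypothetical names, quadratic time, for illustration only):

```python
from collections import Counter
from typing import List, Tuple

# (ITEM_NO, ITEM_CATEGORY) pairs from the question's sample data.
rows: List[Tuple[str, str]] = [
    ("TestItem10001", "Cat1"),
    ("TestItem10001", "Cat2"),
    ("TestItem10002", "Cat3"),
    ("TestItem10002", "Cat3"),
    ("TestItem10003", "Cat3"),
]


def invalid_rows(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # category_cnt: COUNT(...) OVER (PARTITION BY item_no, item_category ...)
    category_cnt = Counter(rows)
    flagged = []
    for item_no, category in rows:
        # item_cnt: running count over the item's rows with category <= this one;
        # the default RANGE frame includes ties (peer rows) as well.
        item_cnt = sum(1 for other_no, other_cat in rows
                       if other_no == item_no and other_cat <= category)
        if item_cnt != category_cnt[(item_no, category)]:
            flagged.append((item_no, category))
    return flagged


print(invalid_rows(rows))
```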

Fetch particular column value from rows with specified condition using shell script

I have sample output from a command:
+--------------------------------------+------------------+---------------------+-------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+-------------------------------------+
| 04584e8a-c210-430b-8028-79dbf741797c | | 99.99.99.91 | |
| 12d2257c-c02b-4295-b910-2069f583bee5 | 20.0.0.92 | 99.99.99.92 | 37ebfa4c-c0f9-459a-a63b-fb2e84ab7f92 |
| 98c5a929-e125-411d-8a18-89877d3c932b | | 99.99.99.93 | |
| f55e54fb-e50a-4800-9a6e-1d75004a2541 | 20.0.0.94 | 99.99.99.94 | fe996e76-ffdb-4687-91a0-9b4df2631b4e |
+--------------------------------------+------------------+---------------------+-------------------------------------+
Now I want to fetch every floating_ip_address for which the port_id and fixed_ip_address fields are blank/empty (in the sample above, 99.99.99.91 and 99.99.99.93).
How can I do it with shell scripting?
You can use sed:
fl_ips=($(sed -nE 's/\|.*\|.*\|(.*)\|\s*\|/\1/p' inputfile))
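Here inputfile is the table provided in the question. Note that the pattern only requires the last column (port_id) to be empty; in the sample, fixed_ip_address happens to be empty on exactly the same rows. The array fl_ips contains the output of sed:
>echo ${#fl_ips[@]}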
2 # Array has two elements
>echo ${fl_ips[0]}
99.99.99.91
>echo ${fl_ips[1]}
99.99.99.93
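If you would rather check both columns explicitly instead of relying on the last column being empty, here is a Python sketch (floating_ips_without_port is a hypothetical name) that reads the header row to locate the columns by name and keeps floating_ip_address only when fixed_ip_address and port_id are both blank:

```python
from typing import List


def floating_ips_without_port(table: str) -> List[str]:
    header = None
    result = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first '|' line holds the column names
            continue
        row = dict(zip(header, cells))
        if not row["fixed_ip_address"] and not row["port_id"]:
            result.append(row["floating_ip_address"])
    return result


# The question's table, with the border lines shortened.
sample = """+------+------+------+------+
| id | fixed_ip_address | floating_ip_address | port_id |
+------+------+------+------+
| 04584e8a-c210-430b-8028-79dbf741797c | | 99.99.99.91 | |
| 12d2257c-c02b-4295-b910-2069f583bee5 | 20.0.0.92 | 99.99.99.92 | 37ebfa4c-c0f9-459a-a63b-fb2e84ab7f92 |
| 98c5a929-e125-411d-8a18-89877d3c932b | | 99.99.99.93 | |
| f55e54fb-e50a-4800-9a6e-1d75004a2541 | 20.0.0.94 | 99.99.99.94 | fe996e76-ffdb-4687-91a0-9b4df2631b4e |
+------+------+------+------+"""

print(floating_ips_without_port(sample))
```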

Update in oracle with joining two table

I have the two tables below. I need to update Table1.Active_flag to Y where Table2.Reprocess_Flag is N.
Table1
+--------+--------------+--------------+--------------+-------------+
| Source | Subject_area | Source_table | Target_table | Active_flag |
+--------+--------------+--------------+--------------+-------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | N |
| d | PRODUCT | PD_PD1 | PD_PD1 | N |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+-------------+
Table2
+--------+--------------+--------------+--------------+----------------+
| Source | Subject_area | Source_table | Target_table | Reprocess_Flag |
+--------+--------------+--------------+--------------+----------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | Y |
| d | PRODUCT | PD_PD1 | PD_PD1 | Y |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+----------------+
Use all three columns in a single subquery (a multi-column IN condition):
UPDATE hdfs_cntrl SET active_flag = 'Y'
where (source,subject_area ,source_table ) in ( select source,subject_area ,source_table from proc_cntrl where Reprocess_Flag = 'N');
Updating one table based on data in another table is almost always best done with the MERGE statement.
Assuming source is a unique key in table2:
merge into table1 t1
using table2 t2
on (t1.source = t2.source)
when matched
then update set t1.active_flag = 'Y'
where t2.reprocess_flag = 'N'
;
If you are not familiar with the MERGE statement, read about it. It is just as easy to learn as UPDATE, INSERT and DELETE, it can perform all three types of operations in a single statement, and it is much more flexible and, in some cases, more efficient (faster).
merge into table1 t1
using table2 t2
on (t1.source = t2.source and t1.Subject_area = t2.Subject_area and t1.Source_table = t2.Source_table and t1.Target_table = t2.Target_table and t2.reprocess_flag = 'N')
when matched then update set
t1.active_flag = 'Y';
UPDATE hdfs_cntrl SET active_flag = 'Y'
where source in (select source from proc_cntrl where Reprocess_Flag = 'N')
and subject_area in (select subject_area from proc_cntrl where Reprocess_Flag = 'N')
and source_table in (select target_table from proc_cntrl where Reprocess_Flag = 'N')
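These answers implement roughly the same rule: a Table1 row is activated when Table2 has a row with the same key and Reprocess_Flag = 'N' (they differ in which columns form the key). A Python sketch of that rule, using the three-column key (source, subject_area, source_table) from the multi-column IN answer, is below; make_rows and apply_update are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

KEY_COLUMNS = ("source", "subject_area", "source_table")


def make_rows(data: Sequence[Tuple[str, ...]], flag_name: str) -> List[Dict[str, str]]:
    # Build dict rows from (source, subject_area, source_table, flag) tuples.
    return [dict(zip(KEY_COLUMNS + (flag_name,), values)) for values in data]


table1 = make_rows([
    ("a", "CUSTOMER", "ADS_SALES", "N"), ("b", "CUSTOMER", "ADS_PROD", "N"),
    ("CDW", "SALES", "CD_SALES", "N"), ("c", "PRODUCT", "PD_PRODUCT", "N"),
    ("d", "PRODUCT", "PD_PD1", "N"), ("e", "ad", "IR_PLNK", "N"),
], "active_flag")

table2 = make_rows([
    ("a", "CUSTOMER", "ADS_SALES", "N"), ("b", "CUSTOMER", "ADS_PROD", "N"),
    ("CDW", "SALES", "CD_SALES", "N"), ("c", "PRODUCT", "PD_PRODUCT", "Y"),
    ("d", "PRODUCT", "PD_PD1", "Y"), ("e", "ad", "IR_PLNK", "N"),
], "reprocess_flag")


def apply_update(table1: List[Dict[str, str]], table2: List[Dict[str, str]]) -> int:
    # Keys of Table2 rows with reprocess_flag = 'N' (the IN subquery / MERGE source).
    keys_to_activate = {
        tuple(row[c] for c in KEY_COLUMNS)
        for row in table2 if row["reprocess_flag"] == "N"
    }
    updated = 0
    for row in table1:
        if tuple(row[c] for c in KEY_COLUMNS) in keys_to_activate:
            row["active_flag"] = "Y"  # the SET active_flag = 'Y' part
            updated += 1
    return updated


updated = apply_update(table1, table2)
print(updated, [row["active_flag"] for row in table1])
```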

insert id number only in sql

I have a SQL Server table like this
+----+-----------+------------+
| id | acoount | date |
+----+-----------+------------+
| | John | 2/6/2016 |
| | John | 2/6/2016 |
| | John | 4/6/2016 |
| | John | 4/6/2016 |
| | Andi | 5/6/2016 |
| | Steve | 4/6/2016 |
+----+-----------+------------+
I want to populate the id column like this:
+-----------+-----------+------------+
| id | acoount | date |
+-----------+-----------+------------+
| 020616001 | John | 2/6/2016 |
| 020616002 | John | 2/6/2016 |
| 040616001 | John | 4/6/2016 |
| 040616002 | John | 4/6/2016 |
| 050616001 | Andi | 5/6/2016 |
| 040616003 | Steve | 4/6/2016 |
+-----------+-----------+------------+
I want to generate the id number from the date provided, like this: 02 + 06 + 16 (from the date) + 001 = 020616001. If rows have the same date, the id increments by 1.
I have tried, but it still fails.
I want to do this in Oracle SQL Developer.
Can someone help me?
Thanks.
Try the SQL below with the given data. It is written for SQL Server 2012:
select REPLACE(CONVERT(VARCHAR(8), CONVERT(date, t.[date], 103), 3), '/', '')
+ RIGHT('00' + convert(varchar(3), row_number() over (partition by t.[date] order by t.account)), 3) as ID,
t.account,
t.date
from (values ('John','2/6/2016'),
('John','2/6/2016'),
('John','4/6/2016'),
('John','4/6/2016'),
('Andi','5/6/2016'),
('Steve','4/6/2016'))T(account,[date])
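ROW_NUMBER() cannot be used directly in the SET clause of an UPDATE, so to update your table, number the rows in a CTE and update through it (your_table is a placeholder for the real table name):
;with cte as (select id, [date], row_number() over (partition by [date] order by account) rn from your_table)
update cte set id = replace(convert(varchar(8), convert(datetime, [date], 103), 3), '/', '') + right('00' + convert(varchar(3), rn), 3)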
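MySQL:
I can give you the logic for the 020616001 part right now (note that the query below builds it from current_date, not from each row's date).
For the +1 on rows with the same date, I still have to work on it; I will let you know once I have it.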
insert into table_name(id)
select concat
(
if(length (day(current_date))>1,day(current_date),Concat(0,day(current_date))),
if(length (month(current_date))>1,month(current_date),Concat(0,month(current_date))),
(right(year(current_date),2)),'001'
)as id
You cannot convert your dates column to datetime in the default way, because it is in dd/mm/yyyy format.
Try this:
declare @t table(acoount varchar(50),dates varchar(20))
insert into @t values
('John','2/6/2016')
,('John','2/6/2016')
,('John','4/6/2016')
,('John','4/6/2016')
,('Andi','5/6/2016')
,('Steve','4/6/2016')
;With CTE as
(select * , SUBSTRING(dates,0,charindex('/',dates)) dd
,SUBSTRING(stuff(dates,1,charindex('/',dates),''),0, charindex('/',stuff(dates,1,charindex('/',dates),''))) MM
,right(dates,2) yy
from @t
)
,CTE1 as
(
select *
,ROW_NUMBER()over(partition by yy,mm,dd order by acoount)rn from cte c
)
select *, REPLICATE('0',2-len(dd))+cast(dd as varchar(2))
+REPLICATE('0',2-len(MM))+cast(MM as varchar(2))
+yy+REPLICATE('0',3-len(rn))+cast(rn as varchar(3)) as id
from cte1
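The id rule from the question (ddmmyy of the row's date followed by a three-digit sequence that restarts for each date) can be sketched in Python as below. This is an illustration with a hypothetical generate_ids function; it numbers rows in the order given, whereas in SQL the order within a date has to be made explicit in ROW_NUMBER's ORDER BY.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# (account, date) rows from the question; dates are dd/mm/yyyy.
rows: List[Tuple[str, str]] = [
    ("John", "2/6/2016"), ("John", "2/6/2016"), ("John", "4/6/2016"),
    ("John", "4/6/2016"), ("Andi", "5/6/2016"), ("Steve", "4/6/2016"),
]


def generate_ids(rows: List[Tuple[str, str]]) -> List[str]:
    counters: Dict[datetime, int] = defaultdict(int)  # last sequence number used per date
    ids = []
    for _account, date_text in rows:
        day = datetime.strptime(date_text, "%d/%m/%Y")
        counters[day] += 1
        # ddmmyy + zero-padded three-digit sequence, e.g. 020616 + 001.
        ids.append(day.strftime("%d%m%y") + f"{counters[day]:03d}")
    return ids


for (account, date_text), new_id in zip(rows, generate_ids(rows)):
    print(new_id, account, date_text)
```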
