Split last column into two equal halves in unix - shell

I need to split the last column into two separate columns and delete part of it.
Currently all the values in the last column have 6 digits. I need to split them into two separate columns.
The first column should hold the first three digits and the second column the next three.
Ultimately I want to delete the newly created second column.
Data -
ID c1 c2 c3 c4 c5
12 A XY 123 456 657098
The new file should be created as below -
Data 2
ID c1 c2 c3 c4 c5
12 A XY 123 456 657
Thanks

You can use this awk command, which checks the length of the last column in each row and keeps only its first three characters when that length is 6:
awk 'length($NF) == 6 { $NF = substr($NF, 1, 3) } 1' file
Data -
ID c1 c2 c3 c4 c5
12 A XY 123 456 657
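For comparison, here is a minimal Python sketch of the same idea, assuming whitespace-separated fields as in the sample. The function name `truncate_last_column` and its parameters are made up for illustration and are not part of the awk answer.

```python
def truncate_last_column(line, full_len=6, keep=3):
    """Keep only the first `keep` characters of the last field when it is `full_len` long."""
    fields = line.split()                      # split on runs of whitespace, like awk's default FS
    if fields and len(fields[-1]) == full_len:
        fields[-1] = fields[-1][:keep]         # same effect as substr($NF, 1, 3)
    return " ".join(fields)                    # rejoin with single spaces, like awk's default OFS


rows = ["ID c1 c2 c3 c4 c5", "12 A XY 123 456 657098"]
for row in rows:
    print(truncate_last_column(row))
```

The header row passes through unchanged because its last field (`c5`) is not 6 characters long.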

Related

Merge header columns in a matrix in bash but keeping columns that have value in same row separate

I want to merge the headers of the matrix (FS is tab):
12 12 12 13
bb 2
cc 8 3
aa 5
ee 6
like this:
12 12 13
bb 2
cc 8 3
aa 5
ee 6
I tried this
awk 'BEGIN { FS = OFS = "\t"; maxcolno = 1 }
{
    printf "%s", $1
    if (NR == 1) {
        for (oldi = 2; oldi <= NF; oldi++) {
            if (!($oldi in newcolno)) {
                printf "%s%s", OFS, $oldi
                newcolno[$oldi] = ++maxcolno
            }
            old2new[oldi] = newcolno[$oldi]
        }
    } else {
        delete row
        for (oldi = 2; oldi <= NF; oldi++)
            row[old2new[oldi]] = row[old2new[oldi]] $oldi
        for (newi = 2; newi <= maxcolno; newi++)
            printf "%s%s", OFS, row[newi]
    }
    print ""
}' unmerge.txt > merge.txt
but it forms the following table which is not desired:
12 13
bb 2
cc 83
aa 5
ee 6
Assumptions:
values are to be left-shifted within a row when there is an empty space (to the left) in a column of the same label; this means the aa / 5 value should be shifted to the 1st 12 column (as opposed to the 2nd 12 column as in OP's expected output)
General design:
populate a matrix with the input data
as we process a data row we determine the left-most column in which to shift/place a value
in the END{} block we remove empty columns and then print the remaining matrix
One awk idea:
awk '
BEGIN { FS=OFS="\t" }
NR==1 {
    ncols=NF                              # remember the header width; in END{} NF only reflects the last row read
    matrix[NR][1]=$1
    for (i=2;i<=NF;i++) {
        matrix[NR][i]=$i
        lab2col[$i][++labcnt[$i]]=i       # keep track of list of physical columns that a particular label is associated with
    }
    next
}
{
    matrix[NR][1]=$1
    delete labcnt
    for (i=2;i<=NF;i++)                   # loop through input fields and ...
        if ($i != "") {                   # if non-empty (a literal 0 counts as a value) then shift to the left-most column with the same header/label
            matrix[NR][lab2col[matrix[1][i]][++labcnt[matrix[1][i]]]]=$i
#                              ^^^^^^^^^^^^ - label at top of current field
#                                            ^^^^^^^^^^^^^^^^^^^^^^ - number of times we have seen this label in this line
#                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - physical column to map this nth occurrence (of this label) to
        }
}
END {                                     # find/remove empty columns
    for (j=2;j<=ncols;j++) {              # loop through list of data columns
        valcnt=0                          # initialize non-empty counter
        for (i=2;i<=NR;i++)               # loop through data rows
            valcnt+= (matrix[i][j] != "" ? 1 : 0)   # keep count of non-empty matrix values
        if (valcnt==0)                    # if all rows in this column are empty then ...
            delete matrix[1][j]           # delete the column index from the header/1st row of the matrix
    }
    PROCINFO["sorted_in"]="@ind_num_asc"  # make sure we process indices in ascending numerical order
    for (i=1;i<=NR;i++) {                 # loop through rows
        for (j in matrix[1])              # loop through columns (that still exist in the 1st row of the matrix)
            printf "%s%s", (j==1 ? "" : OFS), matrix[i][j]   # print matrix entry
        print ""                          # terminate current line of output
    }
}
' unmerge.txt
NOTE: requires GNU awk for:
multi-dimensional arrays (aka array of arrays)
the PROCINFO["sorted_in"] feature
This generates:
12 12 13
bb 2
cc 8 3
aa 5
ee 6
Expanding the input a bit:
$ cat unmerge2.txt
12 12 12 13 12 13
bb 2
cc 8 3
aa 5
ee 6
ff 17 87
gg 100 -3
The awk script generates:
12 12 13 13
bb 2
cc 8 3
aa 5
ee 6
ff 87 17
gg 100 -3
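The general design above (fill a matrix, left-shift each value to the left-most column with the same label, then drop empty columns) can also be sketched in Python. This is only an illustration: `merge_columns` is a hypothetical helper, and the cell positions in the sample rows are an assumed layout, since the exact tab positions are not visible in the listings above.

```python
from collections import defaultdict


def merge_columns(header, rows):
    """header: labels of the data columns; rows: (name, values) with values aligned to header."""
    # Physical columns occupied by each label, left to right.
    label_cols = defaultdict(list)
    for col, label in enumerate(header):
        label_cols[label].append(col)

    shifted = []
    for name, values in rows:
        out = [""] * len(header)
        seen = defaultdict(int)            # values already placed under each label in this row
        for col, value in enumerate(values):
            if value != "":
                label = header[col]
                out[label_cols[label][seen[label]]] = value   # nth value goes to nth column of that label
                seen[label] += 1
        shifted.append((name, out))

    # Keep only the columns that hold at least one value.
    keep = [c for c in range(len(header)) if any(out[c] != "" for _, out in shifted)]
    return [header[c] for c in keep], [(name, [out[c] for c in keep]) for name, out in shifted]


# Assumed layout of the sample input (empty string = empty cell).
header = ["12", "12", "12", "13"]
rows = [
    ("bb", ["2", "", "", ""]),
    ("cc", ["", "8", "3", ""]),
    ("aa", ["", "", "5", ""]),
    ("ee", ["", "", "", "6"]),
]
new_header, new_rows = merge_columns(header, rows)
print("\t".join([""] + new_header))
for name, values in new_rows:
    print("\t".join([name] + values))
```

Under that assumed layout the third `12` column ends up empty and is dropped, leaving the `12 12 13` header.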

Subtract Duplicates between 2 Arrays

Say I have column A which contains 20 unique alphabetical names, and column B which contains 5 alphabetical names. I want to write a formula that counts the unique names in column A and subtracts matching names that exist in column B. For example, if I have A2 = Tom, A3 = Mike, A4 = Ben, A5 = Sam; B2 = Ben then it takes 4 unique names from column A and subtracts the 1 matching name in column B to equal 3. I also want this formula to ignore blank cells across both column ranges.
=COUNTA(IFERROR(UNIQUE(FILTER(A:A, NOT(COUNTIF(B:B, A:A)), LEN(A:A)))))
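The counting logic itself (unique non-blank names in column A, minus any that also appear in column B) can be checked with a small Python sketch. The function name `count_unique_not_in` is made up for illustration, and this version compares names case-sensitively, which is an assumption rather than a description of how the spreadsheet functions behave.

```python
def count_unique_not_in(col_a, col_b):
    """Count distinct non-blank names in col_a that do not appear in col_b."""
    exclude = {name for name in col_b if name.strip()}    # ignore blank cells in B
    unique_a = {name for name in col_a if name.strip()}   # ignore blank cells in A, deduplicate
    return len(unique_a - exclude)


col_a = ["Tom", "Mike", "Ben", "Sam", ""]
col_b = ["Ben", ""]
print(count_unique_not_in(col_a, col_b))
```

With the names from the question this gives 4 unique names minus the 1 match, so 3.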

Oracle LEAD & LAG analytics functions

I have a temp table that I am using for testing, and I need some direction with analytic functions. I am still trying to figure out my real solution, so any help pointing me in the right direction would be appreciated.
A1 B1
40 5
50 4
60 3
70 2
90 1
I am trying to find the previous value, subtract, and add the column:
SELECT A1, B1,
(A1-B1) AS C1,
(A1-B1) + LEAD((A1-B1),1,0) OVER (ORDER BY ROWNUM) AS G1
FROM TEST;
The output is not what I expect
A1 B1 C1
40 5 35
50 4 46
60 3 57
70 2 68
90 1 89
Starting from the last row (5th row), first compute A1 - B1 to get C1 (90 - 1 = 89). Then add the previous row's A1 and subtract the previous row's B1: 89 + 70 - 2 = 157, and save that result as the previous row's C1.
Continuing from the 4th row: 157 + 60 - 3 = 214 (saved as the 3rd row's C1).
Repeat until the first row.
The expected final output is:
A1 B1 C1
40 5 295
50 4 260
60 3 214
70 2 157
90 1 89
LAG and LEAD only fetch a single row's value, not an aggregation over multiple rows, and they are not applied recursively.
You want:
SELECT A1,
B1,
SUM( A1 - B1 ) OVER ( ORDER BY ROWNUM
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS C1
FROM test;
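To see why a sum over the current row and all following rows gives the expected numbers, here is a small Python sketch of the same arithmetic on the sample data. The function name `reverse_running_total` is hypothetical, and the rows are assumed to be in the order shown in the question.

```python
from itertools import accumulate


def reverse_running_total(a1, b1):
    """For each row, sum (A1 - B1) over that row and every row after it."""
    diffs = [a - b for a, b in zip(a1, b1)]
    # Accumulate from the last row upwards, then flip back to the original row order.
    return list(accumulate(reversed(diffs)))[::-1]


a1 = [40, 50, 60, 70, 90]
b1 = [5, 4, 3, 2, 1]
for a, b, c in zip(a1, b1, reverse_running_total(a1, b1)):
    print(a, b, c)
```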

Max value: Max of a given column in the CSV file

Input: file1. The job should take a file like the one below as input.
C1 - ABC,DEF,GHI,JKL.
C2 - 10,15,20,30.
C3 - B1,B2,B3,B4.
C4 - 5,2,6,9.
Input parameter: column number (e.g. 2).

Query in Oracle for running sum

I need to pull a result set with a running sum of the previous records and the current record.
Logic
My table has one key column, C1, and a numeric column, C2. I need a result like the example below: three columns in the output, one of which is a running sum. The first two columns are the same as in the source, and the third column is computed as follows:
The first record of C3 = first record C2.
Second record C3 = "First Record C2 + Second Record C2";
Third record C3 = "First Record C2 + Second Record C2 + Third Record C2"
and it should continue for all the records.
Ex.
I have one source table like
C1 C2
---------
a 1
b 2
c 3
I need output like below
C1 C2 C3
-------------
a 1 1
b 2 3
c 3 6
select c1, c2, sum(c2) over (order by c2) c3
from table_name
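The running-sum logic described in the question can be reproduced in a few lines of Python to check the expected C3 values. This is just a sketch: `running_sum` is a made-up helper, and it keeps the rows in the order given rather than sorting them.

```python
from itertools import accumulate


def running_sum(rows):
    """rows: list of (c1, c2) pairs; returns (c1, c2, c3) where c3 is the cumulative sum of c2."""
    totals = accumulate(c2 for _, c2 in rows)
    return [(c1, c2, c3) for (c1, c2), c3 in zip(rows, totals)]


for row in running_sum([("a", 1), ("b", 2), ("c", 3)]):
    print(*row)
```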
