Split specific column(s) - bash

I have this kind of recrods:
1 2 12345
2 4 98231
...
I need to split the third column into sub-columns to get this (separated by single-space for example):
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Can anybody offer me a nice solution in sed, awk, ... etc ? Thanks!
EDIT: the size of the original third column may vary record by record.

Awk
% echo '1 2 12345
2 4 98231
...' | awk '{
gsub(/./, "& ", $3)
print
}
'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
...
[Tested with GNU Awk 3.1.7]
This takes every character (/./) in the third column ($3) and replaces (gsub()) it with itself followed by a space ("& ") before printing the entire line.

Sed solution:
sed -e 's/\([0-9]\)/\1 /g' -e 's/ \+/ /g'
The first sed expression replaces every digit with the same digit followed by a space. The second expression replaces every block of spaces with a single space, thus handling the double spaces introduced by the previous expression. With non-GNU seds you may need to use two sed invocations (one for each -e).

Using awk substr and printf:
[srikanth#myhost ~]$ cat records.log
1 2 12345 6 7
2 4 98231 8 0
[srikanth#myhost ~]$ awk '{ len=length($3); for(i=1; i<=NF; i++) { if(i==3) { for(j = 1; j <= len; j++){ printf substr($3,j,1) " "; } } else { printf $i " "; } } printf("\n"); }' records.log
1 2 1 2 3 4 5 6 7
2 4 9 8 2 3 1 8 0
You can use this for more than three column records as well.

Using perl:
perl -pe 's/([0-9])(?! )/\1 /g' INPUT_FILE
Test:
[jaypal:~/Temp] cat tmp
1 2 12345
2 4 98231
[jaypal:~/Temp] perl -pe 's/([0-9])(?! )/\1 /g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu sed:
sed 's/\d/& /3g' INPUT_FILE
Test:
[jaypal:~/Temp] sed 's/[0-9]/& /3g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu awk:
gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' INPUT_FILE
Test:
[jaypal:~/Temp] gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1

If you don't care about spaces, this is a succinct version:
sed 's/[0-9]/& /g'
but if you need to remove spaces, we just chain another regexp:
sed 's/[0-9]/& /g;s/ */ /g'
Note this is compatible with the original sed, thus will run on any UNIX-like.

$ awk -F '' '$1=$1' data.txt | tr -s ' '
1 2 1 2 3 4 5
2 4 9 8 2 3 1

This might work for you:
echo -e "1 2 12345\n2 4 98231" | sed 's/\B\s*/ /g'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Most probably GNU sed only.

Related

C-shell: print result from row to column

I have outputs:
1 rpt 4 qor 5 are 6 oip
I want it to become :
1 rpt
4 qor
5 are
6 oip
This is my code:
set d = `sort "04.txt" | uniq -c`
echo $d
With your shown samples, please try following.
xargs -n 2 < Input_file
From man xargs:
-n max-args, --max-args=max-args Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size
(see the -s option) is exceeded, unless the -x option is given, in
which case xargs will exit.
akshay#sys:~$ cat file
1 rpt 4 qor 5 are 6 oip
akshay#sys:~$ sed 's/ /\n/2; P; D' file
1 rpt
4 qor
5 are
6 oip
akshay#sys:~$ awk -v n=2 '{for (i=n+1; i<=NF; i+=n) $i = "\n" $i}1' file
1 rpt
4 qor
5 are
6 oip
akshay#sys:~$ awk -v RS=" " '{$1=$1; ORS=NR%2?FS:"\n" }1' file
1 rpt
4 qor
5 are
6 oip

AWK assign upper value for rank assignment during tie

I'm working on rank assignment to a list of values that is sorted in a file.
A miniature example is
Input:
1
2
2
2
3
4
Instead of normal ranking when there is a tie, I need to assign the upper value. So the required output is
1 1
2 4 #Note that it is not 2, since we have three 2's the upper bound is 4
2 4
2 4
3 5
4 6
I tried something like below, but it is not consistent.
$ awk ' BEGIN{t=0} NR==FNR { a[$1]++; next } { print $1,a[$1]+t; t=a[$1] } ' rank_in.txt rank_in.txt
1 1
2 4
2 6
2 6
3 4
4 2
This answer does normal ranking, so this question is not duplicate.
Instead of doing a double pass or keeping track of memory, we just use a uniq and reconstruct everything:
uniq -c file | awk '{n=n+$1;for(i=1;i<=$1;++i) print $2,n}' -
Two passes with just awk:
$ awk 'NR==FNR{rank[$1]=NR; next} {print $1, rank[$1]}' file file
1 1
2 4
2 4
2 4
3 5
4 6
or one pass with a pipe:
$ nl file | sort -k2,2 -k1,1nr | awk '$2!=prev{rank=$1; prev=$2} {print $2, rank}'
1 1
2 4
2 4
2 4
3 5
4 6
If you don't have nl on your system you could use cat -n or awk '{print NR, $0}' to generate the line numbers.
Try this awk:
awk 'FNR==NR {++fq[$1]; next} p != $1{s+=fq[$1]} {print p=$1, s}' file file
1 1
2 4
2 4
2 4
3 5
4 6
Assumptions:
input data is already sorted
Sample data:
$ cat rank.dat
1
2
2
2
3
4
One awk idea requiring a single pass through the file:
awk '
function print_rank() {
for ( i=1 ; i<=cnt ; i++ )
print id,rank
}
$1 != id { print_rank() # if we have a new id, print last id
cnt=0 # reset counter
}
{ id=$1 # keep track of current id
rank++ # increment rank by 1 for each new row processed
cnt++ # keep track of number of times we see this id
}
END { print_rank() } # flush last id to stdout
' rank.dat
This generates:
1 1
2 4
2 4
2 4
3 5
4 6
Another awk
$ awk ' NR==FNR { a[$1]++; next } { print $1, FNR + --a[$1] } ' rank_in.txt rank_in.txt
1 1
2 4
2 4
2 4
3 5
4 6
$

awk: print first column, then some values, and then all other columns

I want to print the first column, then a couple of columns with fixed values, like this command would do:
awk '{print $1,"1","2","1"}'
and then print all columns except the first after that...
I know this command prints all but the first column:
awk '{$1=""; print $0}'
But that gets rid of the first column.
In other words, this:
3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2
Needs to become this:
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Any ideas?
use a loop to iterate through rest of the columns like this:
awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
As an example:
$echo "3 5 2 2" | awk 'BEGIN{ORS=""}{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1 5 2 2
$
Edit1 :
$ echo "3 5 2 2" | awk 'BEGIN{ORS="\n";OFS="\n"}{print $1,"1","2","1 ";for(i=2;i<=NF;i++) print $i" "}'
3
1
2
1
5
2
2
$
Edit2:
$ echo "3 5 2 2" | awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1
5
2
2
$
Edit3:
$ echo "3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2" | awk '{printf("%s %s ", $1,"1 2 1");for(i=2;i<=NF;i++) printf("%s ", $i); printf "\n"}'
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
You are almost there, you just need to store the first column in a temporary variable:
{
head=$1; # Store $1 in head, used later in printf
$1=""; # Empty $1, so that $0 will not contain first column
printf "%s 1 2 1%s\n", head, $0
}
And a full script:
echo "3 5 2 2" | awk '{head=$1;$1="";printf "%s 1 2 1%s\n", head, $0}'
Another solution with awk:
awk '{sub(/.*/, "1 2 1 "$2, $2)}1' File
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Substitute the 2nd field with "1 2 1" followed by 2nd field itself.
You can do this using sed by replacing the first space by the string you want.
sed 's/ / 1 2 1 /' file
(OR)
With awk by replacing the first field($1):
awk '{$1=$1 " 1 2 1"}1' file
(I prefer the sed solution since it has less characters).

Add column to csv file

I have two files and I need catch the last column of a file and append to other file.
file1
1 2 3
1 2 3
1 2 3
file2
5 5
5 5
5 5
Initial proposal
#!/usr/bin/env bash
column=$(awk '{print $(NF)}' $file1)
paste -d',' $file2 < $column
Expected result
file2
5 5 3
5 5 3
5 5 3
But, This script does not work yet
OBS: I do not know how many columns have in the file. I need more generic solution.
You can use this paste command:
paste -d " " file2 <(awk '{print $NF}' file1)
5 5 3
5 5 3
5 5 3
To append last column of file1 to file2:
paste -d " " file2 <(rev file1 | cut -d " " -f 1 | rev)
Output:
5 5 3
5 5 3
5 5 3
To paste the second column of file 1 to file 2:
while read line; do
read -u 3 c1 c2 c3;
echo $line $c2;
done < file2 3< file1
You can use Perl too:
$ paste -d ' ' file2.txt <(perl -lne 'print $1 if m/(\S+)\s*$/' file1.txt)
5 5 3
5 5 3
5 5 3
Or grep:
$ paste -d ' ' file2.txt <(grep -Eo '(\S+)\s*$' file1.txt)
5 5 3
5 5 3
5 5 3

Search replace string in a file based on column in other file

If we have the first file like below:
(a.txt)
1 asm
2 assert
3 bio
4 Bootasm
5 bootmain
6 buf
7 cat
8 console
9 defs
10 echo
and the second like:
(b.txt)
bio cat BIO bootasm
bio defs cat
Bio console
bio BiO
bIo assert
bootasm asm
bootasm echo
bootasm console
bootmain buf
bootmain bio
bootmain bootmain
bootmain defs
cat cat
cat assert
cat assert
and we want the output will be like this:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
we read each second column in each file in the first file, we search if it exist in each column in each line in the second file if yes we replace it with the the number in the first column in the first file. i did it in only the fist column, i couldn't do it for the rest.
Here the command i use
awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1];}1' a.txt b.txt
3 cat bio bootasm
3 defs cat
3 console
3 bio
3 assert
4 asm
4 echo
4 console
5 buf
5 bio
5 bootmain
5 defs
7 cat
7 assert
7 assert
how should i do to the other columns ?
Thankyou
awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF;i++) $i=h[$i];}1' a.txt b.txt
NR is the global record number (line number default) across all files. FNR is the line number for the current file. The NR==FNR block specifies what action to take when global line number is equal to the current number, which is only true for the first file, i.e., a.txt. The next statement in this block skips the rest of the code so the for loop is only available to the second file, e.i., b.txt.
First, we process the first file in order to store the word ids in an associative array: NR==FNR{h[$2]=$1;next}. After which, we can use these ids to map the words in the second file. The for loop (for (i=1; i<=NF;i++) $i=h[$i];) iterates over all columns and sets each column to a number instead of the string, so $i=h[$i] actually replaces the word at the ith column with its id. Finally the 1 at the end of the scripts causes all lines to be printed out.
Produces:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
To make the script case-insensitive, add tolower calls into the array indices:
awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' a.txt b.txt
divide and conquer!, a bit archaic but does the job =)
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$1];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 1
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$2];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 2
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$3];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 3
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$4];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 4
paste 1 2 3 4 | tr '\t' ' '
gives:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
in this case I just changed the number of columns and paste the results together with a bit of edition in between.
{
cat a.txt; echo "--EndA--";cat b.txt
} | sed -n '1 h
1 !H
$ {
x
: loop
s/^ *\([[:digit:]]\{1,\}\) *\([^[:cntrl:]]*\)\(\n\)\(.*\)\2/\1 \2\3\4\1/
t loop
s/^ *[[:digit:]]\{1,\} *[^[:cntrl:]]*\n//
t loop
s/^[[:space:]]*--EndA--\n//
p
}
'
"--EndA--" could be something else if chance that it will present in one of the file (a.txt mainly)

Resources