How to replace the last occurrence of a specific character in each row - shell

cat test1
a a 1 a aa 1 1 111 bb b
a1b a 11 b b b
1 asd fdg 1 bb b
I want to replace the last "1" in each row with #, keeping all other data the same.
My expected result:
cat expected_result
a a 1 a aa 1 1 11# bb b
a1b a 1# b b b
1 asd fdg # bb b
Can this be solved with "sed"? I don't know how to select the last "1" in each row. Thanks.

Method 1:
1([^1]*)$ matches the last 1 on the line and everything after:
$ sed -E 's/1([^1]*)$/#\1/' test1
a a 1 a aa 1 1 11# bb b
a1b a 1# b b b
1 asd fdg # bb b
Method 2:
(.*)1 matches everything on the line up to and including the last 1:
$ sed -E 's/(.*)1/\1#/' test1
a a 1 a aa 1 1 11# bb b
a1b a 1# b b b
1 asd fdg # bb b
This works because sed's regular expressions are greedy (more precisely, leftmost-longest). The leftmost-longest match of (.*)1 will match from the beginning of the line through the last 1 on the line.
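To see the difference greediness makes, here is a small sketch on a made-up line (the sample string a1b1c1d is an assumption for illustration, not data from the question). A plain substitution hits the first 1, while the greedy prefix pushes the match to the last one:

```shell
line='a1b1c1d'

# Plain substitution: the leftmost 1 is replaced.
first=$(printf '%s\n' "$line" | sed 's/1/#/')

# Greedy prefix: (.*) takes as much as it can, so the 1 after it is the last one.
last=$(printf '%s\n' "$line" | sed -E 's/(.*)1/\1#/')

printf 'first: %s\nlast:  %s\n' "$first" "$last"
# first: a#b1c1d
# last:  a1b1c#d
```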

You can try this. .* is greedy and matches as much as possible, so \(.*\)1 matches everything up to and including the last occurrence of 1 on each line, and \1# puts the captured text back followed by #. To replace the last occurrence of some other character, say x, with y, use s/\(.*\)x/\1y/ instead.
sed 's/\(.*\)1/\1#/' filename
Output:
a a 1 a aa 1 1 11# bb b
a1b a 1# b b b
1 asd fdg # bb b
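The generalization mentioned above (swap in another character and another replacement) could be wrapped in a small function. This is only a sketch: replace_last is a hypothetical name, and it assumes the search character is not special in a regular expression and that the replacement contains no /, & or backslash.

```shell
# replace_last CHAR REPLACEMENT FILE   (hypothetical helper)
# Replaces the last occurrence of CHAR on every line of FILE.
replace_last() {
    sed "s/\(.*\)$1/\1$2/" "$3"
}

# Recreate the sample file from the question.
printf '%s\n' 'a a 1 a aa 1 1 111 bb b' 'a1b a 11 b b b' '1 asd fdg 1 bb b' > test1

replace_last 1 '#' test1
# a a 1 a aa 1 1 11# bb b
# a1b a 1# b b b
# 1 asd fdg # bb b
```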

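Here is a solution using rev and awk as well. Reversing each line turns the last 1 into the first one, which sub() replaces before the line is reversed back.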
rev Input_file | awk '{sub(/1/,"#");print}' | rev
Output will be as follows.
a a 1 a aa 1 1 11# bb b
a1b a 1# b b b
1 asd fdg # bb b

1([^1]*$) matches the last 1 on the line and everything after it.
sed -r 's/1([^1]*$)/#\1/' v1
cat test#
a a 1 a aa 1 1 11# bb b
a1b a 1# b b b
1 asd fdg # bb b
cat v1
cat test1
a a 1 a aa 1 1 111 bb b
a1b a 11 b b b
1 asd fdg 1 bb b

Related

Finding a pattern, then executing a line change only after the pattern

I have a file like this:
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
I want to remove the column "1" only from the lines after the $EFRAG line, and also add a label to the O, H, H lines. My expected output is:
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
I'm new to coding in bash, and I'm not quite sure where to start.
I was thinking of piping a grep command to a sed command, but I'm not sure how that syntax would look. I am also trying to learn awk, but that syntax is even more confusing to me. I am currently reading a book on its capabilities.
Any thoughts or ideas would be greatly appreciated!
Use the following awk processing:
awk '$0~/\$EFRAG/ {
start = 1; # marker denoting the needed block
split("a b c", suf); # auxiliary array of suffixes
}
start {
if (/^FRAGNAME/) idx = 1; # encountering subblock
if (/^[OH]/) { # if starting with O or H char
$2 = "";
$1 = $1 suf[idx++];
}
}1' test.txt
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
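For readers new to awk, the same idea can be spelled out more verbosely. The following is a sketch rather than a drop-in replacement: relabel_efrag and the file name sample.txt are hypothetical, and it assumes fields are separated by single spaces as in the question. It rebuilds each line from its remaining text instead of blanking $2, which avoids leaving an empty field behind.

```shell
# relabel_efrag FILE   (hypothetical helper)
relabel_efrag() {
    awk '
    /\$EFRAG/ { in_block = 1 }             # lines after this marker are eligible
    in_block && /^FRAGNAME/ { idx = 0 }    # every fragment restarts the a, b, c labels
    in_block && /^[OH] / {
        label = substr("abc", ++idx, 1)    # next label for this fragment
        rest = $0
        sub(/^[^ ]+ +[^ ]+ +/, "", rest)   # strip the first two fields
        $0 = $1 label " " rest             # element name + label, then remaining columns
    }
    { print }
    ' "$1"
}

# Sample input copied from the question.
printf '%s\n' 'H 1 2 3 4' 'H 1 2 3 4' 'C 1 2 3 4' '$END' '$EFRAG' 'COORD=CART' \
    'FRAGNAME=H2ODFT' 'O 1 2 3 4' 'H 1 2 3 4' 'H 1 2 3 4' \
    'FRAGNAME=H2ODFT' 'O 1 2 3 4' 'H 1 2 3 4' 'H 1 2 3 4' > sample.txt

relabel_efrag sample.txt
```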
If ed is available/acceptable, a script.ed (name it whatever you like) could look something like:
/\$EFRAG$/;$g/^O /s/^\([^ ]*\) [^ ]* \(.*\)$/\1a \2/\
/^H /s/^\([^ ]*\) [^ ]* \(.*\)$/\1b \2/\
/^H /s/^\([^ ]*\) [^ ]* \(.*\)$/\1c \2/
,p
Q
Now run
ed -s file.txt < script.ed
Change Q to w if in-place editing is required.
Remove the ,p to silence the output.
This might work for you (GNU sed):
sed -E '1,/\$EFRAG/b;/^O/{N;N;s/^(O) \S+(.*\nH) \S+(.*\nH) \S+/\1a\2b\3c/}' file
Do not process lines from the start of the file until after encountering one containing $EFRAG.
If a line begins O, append the next two lines and then using pattern matching and back references, format those lines accordingly.

Transpose from long file to wide file using linux commands (awk)

I have a long file and I need a wide file. I know how to do that in R, but I want to do it with Linux commands because it is faster. The field separator is a tab.
Input file
1 C C
1 G G
1 C G
2 G G
2 C C
2 C G
3 G G
3 C C
3 C C
Output file:
1 2 3
CC GG GG
GG GG CC
CG CG CC
Thank you
awk to the rescue!
assuming consistent data (same number of elements for each key)
$ awk '{k[$1]; a[$1,++c[$1]]=$2$3}
END{for(x in k) printf "%s",x OFS;
print "";
for(i=1;i<=c[$1];i++)
{for(x in k) printf "%s", a[x,i] OFS;
print ""}}' file
1 2 3
CC GG GG
GG CC CC
CG CG CC
the order of columns not guaranteed though...
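If the column order matters, one option is to record each key the first time it is seen and loop over that list instead of using for (x in k), whose order is unspecified. A sketch under the question's assumptions (tab-separated input, grouped by the first column); long_to_wide and long.txt are hypothetical names:

```shell
# long_to_wide FILE   (hypothetical helper)
long_to_wide() {
    awk -F'\t' -v OFS='\t' '
    !($1 in count) { keys[++nkeys] = $1 }          # remember keys in order of first appearance
    {
        cell[$1, ++count[$1]] = $2 $3              # store the joined pair under (key, row)
        if (count[$1] > rows) rows = count[$1]     # track the longest column
    }
    END {
        for (j = 1; j <= nkeys; j++)
            printf "%s%s", keys[j], (j < nkeys ? OFS : ORS)
        for (i = 1; i <= rows; i++)
            for (j = 1; j <= nkeys; j++)
                printf "%s%s", cell[keys[j], i], (j < nkeys ? OFS : ORS)
    }' "$1"
}

# Sample input from the question, tab-separated.
printf '1\tC\tC\n1\tG\tG\n1\tC\tG\n2\tG\tG\n2\tC\tC\n2\tC\tG\n3\tG\tG\n3\tC\tC\n3\tC\tC\n' > long.txt

long_to_wide long.txt
```

A key with fewer rows than the others simply yields empty cells in its column.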

combine two variables in the same order in for loop unix

It sounds simple, but I can't find a simple way to do it. In a shell for loop, I have two variables,
A=" 1 2 3 4"
B=" a b c d"
and I am trying to print 1a 2b 3c 4d. I tried:
A=" 1 2 3 4"
B=" a b c d"
for m in $A
for n in $B;
do echo $m$n done.
The output is
1
2
3
4
5
for
l
in
a
b
c
d
e
Can anyone help with this?
Here's one way to do it:
$ A=(1 2 3 4); B=(a b c d); for i in $(seq 0 3); do echo ${A[$i]}${B[$i]}; done
1a
2b
3c
4d
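In your attempt, the word list of each for statement isn't terminated with a ; (or a newline) before do, so the shell keeps reading the words of your second for statement as items in the list of the first one.
Syntactically correct nested loops look like this (note that nesting prints every combination of m and n, not only the matching pairs):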
A="1 2 3 4"
B="a b c d"
for m in $A ; do
for n in $B ; do
echo $m$n
done
done
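Since nested loops pair every element of A with every element of B, getting exactly 1a 2b 3c 4d needs the two lists walked in step. One way to sketch that in plain POSIX sh, without arrays, is to load the second list into the positional parameters and shift once per iteration. pair_up is a hypothetical name, and the sketch assumes both lists have the same number of words:

```shell
# pair_up LIST_A LIST_B   (hypothetical helper)
pair_up() {
    a=$1
    set -- $2               # unquoted on purpose: split LIST_B into $1, $2, ...
    for m in $a; do         # unquoted on purpose: split LIST_A into words
        echo "$m$1"         # $1 is the current word of LIST_B
        shift               # advance to the next word of LIST_B
    done
}

pair_up "1 2 3 4" "a b c d"
# 1a
# 2b
# 3c
# 4d
```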

Get n last records and change particular columns on them

I have a file like this:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
* a
0 b
I want to delete a and b from the last two records, in the END{} section.
Result:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
How can I get the last n lines and change fields on them with awk?
Here's one way using any awk:
awk -v count=$(wc -l <file.txt) 'NR > count - 2 { $2 = "" }1' file.txt
Results:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
Or, to apply the awk operation to all records except the last two lines of the input file, as a shell script, try ./script.sh file.txt. Contents of script.sh:
command=$(awk -v count=$(wc -l <"$1") 'NR <= count - 2 { $2 = "" }1' "$1")
echo -e "$command"
Results:
1 "45554323" p b
2 "34534567" f a
3 "76546787" u b
2 "56765435" f a
* a
0 b
If you know the value of n, the line number after which you want to delete the last item (column) on each line (here 4), this will work:
awk '{if (NR>4) NF=NF-1}1' data.txt
will give:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
NF = NF -1 makes awk think there is one less field on the line than there is, which is how it doesn't display the last column/item on the line once that condition is met. NR refers to the current line number in the file being read.
awk can't know the number of lines in a file unless it goes through it once, or is given that information (e.g., wc -l). An alternative approach would be to save the last n lines in a buffer (sort of a sliding window/tape-delay type analogy, you are always printing n lines behind) and then process the final n lines in the END block.
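The buffering approach described above could look roughly like this. It is a sketch, not a tested production script: trim_tail_fields and file.txt are hypothetical names, n must be at least 1, and "last field" is taken to mean the last whitespace-separated word on the line. Lines are printed n records late, and whatever is still in the buffer at END is the tail that gets edited.

```shell
# trim_tail_fields N FILE   (hypothetical helper)
# Removes the last field from the final N lines of FILE in a single pass.
trim_tail_fields() {
    awk -v n="$1" '
    NR > n { print buf[NR % n] }    # the line read n records ago cannot be in the tail
    { buf[NR % n] = $0 }            # reuse that slot for the current line
    END {
        first = (NR > n) ? NR - n + 1 : 1
        for (i = first; i <= NR; i++) {
            line = buf[i % n]
            sub(/[ \t]+[^ \t]+[ \t]*$/, "", line)   # cut the last field
            print line
        }
    }' "$2"
}

# Sample input copied from the question.
printf '%s\n' '1 2 "45554323" p b' '2 2 "34534567" f a' '3 3 "76546787" u b' \
    '2 4 "56765435" f a' '* a' '0 b' > file.txt

trim_tail_fields 2 file.txt
```

Because only n lines are held in memory, this also works when the input comes from a pipe and cannot be read twice.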
This doesn't exactly answer your question but it produces the output you require:
$ gawk '{if (NF < 3) print $1; else print}' input.txt
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
$ cat file
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
* a
0 b
$ awk 'BEGIN{ARGV[ARGC++]=ARGV[ARGC-1]} NR==FNR{nr++; next} FNR>(nr-2) {NF--} 1' file
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
or if you don't mind manually specifying the file name twice:
awk 'NR==FNR{nr++; next} FNR>(nr-2) {NF--} 1' file file

Compare two file columns (unsorted files)

Input File 1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
Input File 2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
Output File
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
How the output file is made:
We compare the 3rd column of Input File 1 with the 1st column of Input File 2.
If they match, we print the combined line to the output file.
Note:
The files are not sorted.
I tried:
join -t, A B | awk -F "\t" 'BEGIN{OFS="\t"} {if ($3==$4) print $1,$2,$3,$4,$6}'
But this does not work because the files are unsorted, so the condition ($3==$4) won't hold all the time. Please help.
nawk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
tested below:
> cat file1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
> cat file2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
> awk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
D1 12 DD1 124f2 2
B1 123 BB 124sdf 4
A1 123 AA 1asrf 5
>
You can use join, but you need to sort on the key field first and tell join that the key in the first file is column 3 (-1 3):
join -1 3 <(sort -k 3,3 file1) <(sort file2)
Will get you the correct fields, output (with column -t for output formatting):
AA A1 123 1asrf 5
BB B1 123 124sdf 4
DD1 D1 12 124f2 2
To get the same column ordering listed in the question, you need to specify the output format:
join -1 3 -o 1.1,1.2,1.3,2.2,2.3 <(sort -k 3,3 file1) <(sort file2)
i.e. file 1 fields 1 through 3 then file 2 fields 2 and 3. Output (again with column -t):
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
perl -F'/\t/' -anle 'BEGIN{$f=1}if($f==1){$H{$F[2]}=$_;$f++ if eof}else{$l=$H{$F[0]};print join("\t",$l,@F[1..$#F]) if defined$l}' f1.txt f2.txt
or shorter
perl -F'/\t/' -anle'$f?($l=$H{$F[0]})&&print(join"\t",$l,@F[1..$#F]):($H{$F[2]}=$_);eof&&$f++' f1.txt f2.txt
One way using awk:
awk 'BEGIN { FS=OFS="\t" } FNR==NR { array[$1]=$2 OFS $3; next } { if ($3 in array) print $0, array[$3] }' file2.txt file1.txt
Results:
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
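The awk answers here all rely on the same two-file idiom: while FNR==NR holds, awk is still reading the first file, so its lines can be indexed by the key column and looked up while the second file is read. A commented sketch of that idiom, assuming whitespace-separated fields and a non-empty first file (match_columns, file1 and file2 are hypothetical names):

```shell
# match_columns FILE1 FILE2   (hypothetical helper)
# Prints FILE1 lines whose 3rd column equals the 1st column of a FILE2 line,
# followed by the remaining two columns of that FILE2 line.
match_columns() {
    awk '
    FNR == NR { left[$3] = $0; next }        # first file: index whole lines by column 3
    $1 in left { print left[$1], $2, $3 }    # second file: emit a row on a key match
    ' "$1" "$2"
}

# Sample inputs copied from the question.
printf '%s\n' 'A1 123 AA' 'B1 123 BB' 'C2 44 CC1' 'D1 12 DD1' 'E1 11 EE1' > file1
printf '%s\n' 'A sad21 1' 'DD1 124f2 2' 'CC 123tges 3' 'BB 124sdf 4' 'AA 1asrf 5' > file2

match_columns file1 file2
# D1 12 DD1 124f2 2
# B1 123 BB 124sdf 4
# A1 123 AA 1asrf 5
```

The rows come out in the order of the second file; pipe the result through sort if the order of the first file is wanted.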
This might work for you (GNU sed):
sed 's|\(\S*\)\(.*\)|/\\s\1$/s/$/\2/p|' file2 | sed -nf - file1

Resources