Use shell to convert three line formats in one file into another format at once

cat file1.txt
set A B 1
set C D E 2
set E F 3 3 3 3 3 3
cat file2.txt
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
Please help me convert the format in file1.txt into the format of file2.txt; file2.txt is the desired output. I only put 3 lines in file1.txt as an example, but in fact there are many lines in these same 3 formats, so the shell command should work for any content made up of the 3 formats shown in file1.txt.

With sed (-r enables extended regular expressions in GNU sed):
echo "set A B 1
set C D E 2
set E F 3 3 3 3 3 3 " | sed -r 's/set (.) /\1;/;s/([A-Z])*( ([A-Z]))/\1.\3/g;s/([A-Z]) ([0-9])/\1;\2/;s/ ?$/;/'
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
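For comparison, here is a minimal awk sketch that spells out the rule instead of encoding it in regexes. It assumes every line to convert starts with `set`, followed by letter fields and then number fields: the first letter is followed by `;`, any further letters are joined with `.`, and the numbers keep their spaces. The function name `convert_sets` is hypothetical, and lines that do not start with `set` are dropped.

```shell
# convert_sets: hypothetical helper; reads "set ..." lines on stdin.
# Assumption: letters come before numbers on each line.
convert_sets() {
  awk '$1 == "set" {
    letters = ""; nums = ""; nl = 0
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[A-Za-z]+$/) {
        nl++
        if (nl == 1) letters = $i ";"          # first letter ends with ";"
        else if (nl == 2) letters = letters $i # second letter follows it
        else letters = letters "." $i          # later letters joined by "."
      } else if (nums == "") {
        nums = $i                              # first number
      } else {
        nums = nums " " $i                     # keep spaces between numbers
      }
    }
    print letters ";" nums ";"
  }'
}
```

Usage: `convert_sets < file1.txt > file2.txt`. Because awk splits on runs of blanks, a trailing space (as on the third line of the echo above) does not matter.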


Finding a pattern, then executing a line change only after the pattern

I have a file like this:
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
I want to remove the "1" column (the second field) only from the lines after the $EFRAG line, and also add a label (a, b, c) to each O, H, H group. My expected output is:
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
I'm new to coding in bash, and I'm not quite sure where to start.
I was thinking of piping a grep command into a sed command, but I'm not sure what that syntax would look like. I am also trying to learn awk, but its syntax is even more confusing to me. I'm currently reading a book on its capabilities.
Any thoughts or ideas would be greatly appreciated!
Use the following awk processing:
awk '$0 ~ /\$EFRAG/ {
start = 1;                  # marker denoting the needed block
split("a b c", suf);        # auxiliary array of suffixes
}
start {
if (/^FRAGNAME/) idx = 1;   # encountering subblock
if (/^[OH]/) {              # if starting with O or H char
$2 = "";                    # blank the "1" column
$1 = $1 suf[idx++];         # append the a/b/c label
$0 = $0; $1 = $1;           # re-split and rebuild to drop the double space left by $2 = ""
}
}1' test.txt
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
If ed is available and acceptable:
Create script.ed (name it whatever you like) with something like:
/\$EFRAG$/;$g/^O /s/^\([^ ]*\) [^ ]* \(.*\)$/\1a \2/\
/^H /s/^\([^ ]*\) [^ ]* \(.*\)$/\1b \2/\
/^H /s/^\([^ ]*\) [^ ]* \(.*\)$/\1c \2/
,p
Q
Now run
ed -s file.txt < script.ed
Change Q to w if in-place editing is required.
Remove the ,p to silence the output.
This might work for you (GNU sed):
sed -E '1,/\$EFRAG/b;/^O/{N;N;s/^(O) \S+(.*\nH) \S+(.*\nH) \S+/\1a\2b\3c/}' file
Lines from the start of the file up to and including the one containing $EFRAG are printed unchanged (b branches to the end of the script).
If a line begins with O, append the next two lines to the pattern space and then, using pattern matching and back references, format those three lines accordingly.
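Since the question mentions being new to shell, here is a plain shell loop doing the same job without awk or sed, as a minimal sketch. It assumes fields are separated by single spaces (as in the sample), that every block starts with a FRAGNAME line, and that the atom lines start with "O " or "H ". The function name `label_efrag` is hypothetical, and it uses ordinary (global) variables so it runs in POSIX sh as well as bash.

```shell
# label_efrag: hypothetical helper; reads the file on stdin, writes to stdout.
label_efrag() {
  in_block=0 idx=0
  # "|| [ -n "$line" ]" keeps a last line that has no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '$EFRAG'*) in_block=1 ;;   # everything after this line is processed
      FRAGNAME*) idx=0 ;;        # each fragment restarts the a/b/c labels
    esac
    case $in_block:$line in
      '1:O '*|'1:H '*)
        rest=${line#* }          # drop the symbol:   "1 2 3 4"
        rest=${rest#* }          # drop the "1":      "2 3 4"
        case $idx in 0) suf=a ;; 1) suf=b ;; 2) suf=c ;; *) suf= ;; esac
        printf '%s%s %s\n' "${line%% *}" "$suf" "$rest"
        idx=$((idx + 1)) ;;
      *) printf '%s\n' "$line" ;;  # every other line passes through unchanged
    esac
  done
}
```

Usage: `label_efrag < file.txt > new.txt`. For large files the awk version above will be much faster; the loop is mainly useful for seeing the logic step by step.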

Find and replace entries in one csv file using another with bash

Main file:
A B
C D
D A
G H
Ref file:
1 A
2 B
3 C
4 D
5 G
6 H
New file:
1 2
3 4
4 1
5 6
I want to do the replacement above. How can I do that using awk or some simple command line?
awk solution:
awk 'NR==FNR{ a[$2]=$1; next }{ $1=a[$1]; $2=a[$2] }1' reffile mainfile
The output:
1 2
3 4
4 1
5 6
a[$2]=$1 - capturing numbers from reffile into array indexed by letters (e.g. a["A"]=1)
$1=a[$1]; $2=a[$2] - replacing letters in mainfile with respective numbers
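A variation on the same NR==FNR idea, as a sketch: it replaces every column rather than only $1 and $2, and leaves a value unchanged when it has no entry in the reference file (the answer above would turn it into an empty string). The function name `replace_from_ref` is hypothetical. Note that NR==FNR misbehaves if the reference file is empty.

```shell
# replace_from_ref REF MAIN: hypothetical helper.
replace_from_ref() {
  awk '
    NR == FNR { map[$2] = $1; next }   # first file: letter -> number
    {
      for (i = 1; i <= NF; i++)        # every column, not just $1 and $2
        if ($i in map) $i = map[$i]    # unmapped values are kept as-is
      print
    }
  ' "$1" "$2"
}
```

Usage: `replace_from_ref reffile mainfile > newfile`.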

bash print complete lines where just the first n characters match

I have created a sorted list of hashes for certain files
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/camera/London 170713/P9110042.JPG
ffb01af8fda1e5c3b74d1eb384d021be1f1577c3 *./Pictures/london/P9110042.JPG
Where there are duplicate hashes (comparing only the hashes), I want to print the whole line of every match.
So, say there were hashes A, B, and C:
A 1
B 2
B 3
C 4
C 5
C 6
In this example, all lines except the first one should be printed:
B 2
B 3
C 4
C 5
C 6
Before you continue, look up fdupes.
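If you don't want to use a robust tool specifically intended for finding duplicate files, you can use sort | uniq. With GNU uniq, -w N compares only the first N characters of each line and -D prints every line of each duplicate group; -w 1 fits this one-letter example, but your 40-character hashes need -w 40: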
$ cat file
A 1
B 2
B 3
C 4
C 5
C 6
$ sort file | uniq -w 1 -D
B 2
B 3
C 4
C 5
C 6
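Using awk (this also works with an unsorted file, because the file is read twice: the first pass counts each hash, the second prints the lines whose hash occurs more than once):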
awk 'FNR==NR{seen[$1]++; next} seen[$1]>1' file file
B 2
B 3
C 4
C 5
C 6
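If the list is already sorted by hash (as in the question), a single pass is enough: remember the previous line and print it when the next line has the same first field. This is a minimal sketch under that sorted-input assumption; the function name `print_dup_groups` is hypothetical. Because only $1 is compared, file names containing spaces are fine.

```shell
# print_dup_groups [FILE...]: hypothetical helper; input must be sorted by $1.
print_dup_groups() {
  awk '
    NR > 1 && $1 == prev {           # same hash as the previous line
      if (!printed) print prevline   # emit the first line of the group once
      print
      printed = 1
      next
    }
    { prev = $1; prevline = $0; printed = 0 }  # start a new group
  ' "$@"
}
```

Usage: `sort hashes.txt | print_dup_groups`.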

Paste side by side multiple files by numerical order

I have many files in a directory with similar names: file1, file2, file3, file4, file5, ..., file1000. They all have the same dimensions: each has 5 columns and 2000 lines. I want to paste them all together side by side, in numerical order, into one large file, so the final file should have 5000 columns and 2000 lines.
I tried
for x in $(seq 1 1000); do
paste `echo -n "file$x "` > largefile
done
Instead of writing all the file names on the command line, is there a way to paste those files in numerical order (file1, file2, file3, file4, file5, ..., file10, file11, ..., file1000)?
for example:
file1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
...
file2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
....
file3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
....
paste file1 file2 file3 .... file1000 > largefile
largefile
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
....
Thanks.
If your current shell is bash, brace expansion generates the names in numerical order: paste -d " " file{1..1000} > largefile
Alternatively, rename the files with leading zeros so that they sort correctly, for example:
paste <(ls -1 file* | sort -te -k2.1n) <(seq -f "file%04g" 1000) | xargs -n2 echo mv
The above is a dry run; remove the echo once you are satisfied with its output.
Or you can use, e.g., Perl:
ls file* | perl -nlE 'm/file(\d+)/; rename $_, sprintf("file%04d", $1);'
and afterwards you can run:
paste file*
With zsh:
setopt extendedglob
paste -d ' ' file<->(n)
<x-y> matches decimal integers from x to y inclusive. Either x or y can be omitted, so <-> matches any decimal integer (any run of digits). It could also be written [0-9]## (## being the zsh equivalent of regex +; it requires extendedglob).
(n) is a glob qualifier. The n qualifier turns on numeric sorting, which sorts on all sequences of decimal digits appearing in the file names.
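Another option, sketched here under the assumption that the files are numbered contiguously starting from 1: count upward until a name no longer exists, then hand the whole list to paste in that order. No renaming or zsh is needed, and the total does not have to be known in advance. The function name `paste_numbered` is hypothetical; it reuses the function's positional parameters as the file list so it runs in POSIX sh as well as bash.

```shell
# paste_numbered PREFIX: hypothetical helper; pastes PREFIX1, PREFIX2, ...
paste_numbered() {
  prefix=$1
  set --                        # reuse "$@" as the list of file names
  i=1
  while [ -e "$prefix$i" ]; do  # stop at the first missing number
    set -- "$@" "$prefix$i"
    i=$((i + 1))
  done
  [ "$#" -gt 0 ] || return 1    # no matching files at all
  paste -d ' ' "$@"
}
```

Usage: `paste_numbered file > largefile`. A gap in the numbering (say file500 is missing) silently stops the list there, so check the column count of the result.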

Add to the end of a predetermined line using sed in bash

I have a file in the format:
C 1 1 2
H 2 2 1
C 3 1 2
C 3 3 2
H 2 3 1
I need to add " f" to the end of specific lines, for example the third line, so the output would be:
C 1 1 2
H 2 2 1
C 3 1 2 f
C 3 3 2
H 2 3 1
From Googling, it seems that I need to use sed, but I couldn't find any examples on how to do specifically what I want.
Thanks in advance.
sed can restrict a command to a specific line number (an "address"). An example:
sed '3 s/$/ f/' < yourFile
Or with awk:
awk 'NR==3{$0=$0" f"}1' your_file
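Since the question asks about "specific lines" in general, here is a sketch that generalizes the awk answer to several line numbers at once. The function name `append_to_lines` and its argument order (suffix first, then line numbers) are hypothetical. The suffix is passed with -v, so any backslashes in it are interpreted as escape sequences.

```shell
# append_to_lines SUFFIX LINENO...: hypothetical helper; filters stdin.
append_to_lines() {
  suffix=$1; shift
  # " $* " becomes e.g. " 3 5 "; surrounding NR with spaces avoids 3 matching 13
  awk -v suffix="$suffix" -v lines=" $* " '
    index(lines, " " NR " ") { $0 = $0 suffix }
    1
  '
}
```

Usage: `append_to_lines ' f' 3 5 < yourFile`. The sed equivalent for fixed lines is simply `sed '3s/$/ f/;5s/$/ f/'`.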
