Replacing numbers in a long list by other numbers

I am trying to change numbers in a long list by other numbers. For example,
cat inputfile.txt
A 254 B 456 C 546
D 548 E 548 F 458
A 244 B 416 C 566
D 148 E 558 F 428
I want to change B's value by adding a percentage to it. For example, I want to increase B in the first block by 3% and B in the next one by 2%, as follows:
cat inputfile.txt
A 254 B 469.68 C 546
D 548 E 548 F 458
A 244 B 424.32 C 566
D 148 E 558 F 428
I tried the following but it didn't work.
a=(456 416)
b= (469.68 424.32)
for i in ${a[@]};
for j in ${b[@]}; do
sed -i -- "s/${i}/${j}" inputfile.txt
done

There are multiple problems.
a) Your assignment to b fails. No space is allowed around the = in an assignment.
b) The substitution pattern isn't closed:
sed -i -- "s/${i}/${j}/" inputfile.txt
c) Now it runs with non-empty values, but the nested loops try to replace 456 first with 469.68 and then with 424.32. The second substitution cannot match, because the value has already been changed to 469.68. The same then happens for 416 with both values.
a=(456 416)
b=(469.68 424.32)
for i in "${a[@]}"; do
for j in "${b[@]}"; do
sed "s/${i}/${j}/" inputfile.txt
done
done
You have two corresponding values which need to be in sync, because you want to replace the first by the second. So you have to iterate once, and by the index:
max=${#a[@]}
for i in $(seq 0 $((max - 1))); do
sed "s/${a[$i]}/${b[$i]}/" inputfile.txt
done
I removed the -i from sed for testing.
The last problem is that there might be number collisions, for example replacing the 456 in 1456 with 469.68, or the 416 in 416.02 with 424.32.
To reduce the risk, we can put a blank before the number to match and a word boundary after it:
sed "s/ ${a[$i]}\b/ ${b[$i]}/" inputfile.txt
The \b notation (a GNU sed extension) has the advantage over a bracket expression such as [ $] that it is zero-width: it consumes no character, so nothing has to be captured and put back in the replacement. Note that \b also matches between a digit and a '.', so a value like 416.02 would still be hit; if such values can occur, the pattern has to check the following character explicitly.
a=(456 416)
b=(469.68 424.32)
max=${#a[@]}
for i in $(seq 0 $((max - 1))); do
sed -i "s/ ${a[$i]}\b/ ${b[$i]}/" inputfile.txt
done
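As a hedged variation on the loop above, bash can iterate over the array indices directly with "${!a[@]}", which avoids the seq call. This is only a sketch: it works on a scratch copy created with mktemp (an assumption made here so that the real inputfile.txt is left alone) and it relies on GNU sed for -i and \b.

```shell
#!/usr/bin/env bash
a=(456 416)
b=(469.68 424.32)

# Scratch copy of the sample data, so nothing real is modified.
demo=$(mktemp)
printf '%s\n' 'A 254 B 456 C 546' 'D 548 E 548 F 458' \
              'A 244 B 416 C 566' 'D 148 E 558 F 428' > "$demo"

# "${!a[@]}" expands to the indices of a (0 and 1 here).
for i in "${!a[@]}"; do
  sed -i "s/ ${a[$i]}\b/ ${b[$i]}/" "$demo"
done

cat "$demo"
```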
I don't know the source of your a and b values, so the following reasoning may not apply, but storing the originals and the replacements in two separate arrays seems less than optimal. The arrays have to be of the same length, but nothing guarantees that. You can test for it, but if one value gets lost, it is hard to find out where it was in order to remove the corresponding value from the other array.
Instead of collecting all originals in one array and all replacements in another, binding each pair together seems the better idea:
a=(456 469.68)
b=(416 424.32)
But that's not far from the final sed-expression, which would be:
a="s/ 456\b/ 469.68/"
b="s/ 416\b/ 424.32/"
Now that's a bit more verbose, but it saves the loop entirely:
sed -i "${a};${b}" inputfile.txt
The input file only has to be read once, and the whole command can be tested without -i.
If you have a large number of replacements, you can generate a file like this:
s/ 456\b/ 469.68/
s/ 416\b/ 424.32/
and name it numcorrect.sed, and call it by:
sed -f numcorrect.sed inputfile.txt
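One possible way to generate such a file is sketched below. It assumes a hypothetical two-column file pairs.txt (old value, new value) that is not part of the original question, and it assumes the old values are plain integers, so they are safe to use as regular expressions. Everything is written to a temporary directory.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Hypothetical pairs file: one "old new" pair per line.
printf '%s\n' '456 469.68' '416 424.32' > "$tmp/pairs.txt"

# Sample input from the question.
printf '%s\n' 'A 254 B 456 C 546' 'D 548 E 548 F 458' \
              'A 244 B 416 C 566' 'D 148 E 558 F 428' > "$tmp/inputfile.txt"

# Turn every pair into one substitution command of the form s/ OLD\b/ NEW/
awk '{ printf "s/ %s\\b/ %s/\n", $1, $2 }' "$tmp/pairs.txt" > "$tmp/numcorrect.sed"

cat "$tmp/numcorrect.sed"
sed -f "$tmp/numcorrect.sed" "$tmp/inputfile.txt"
```

Keeping each pair on one line of pairs.txt means an original value cannot lose its replacement, which is the concern raised above about two separate arrays.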

Related

What is the "\0" escape and xargs 0n1 do here?

itsmejitu@itsmejitu:~$ numbers=(47 -78 12 45 6)
itsmejitu@itsmejitu:~$ printf "%d \n" ${numbers[@]} | sort -n
-78
6
12
45
47
itsmejitu@itsmejitu:~$ declare -a letters
itsmejitu@itsmejitu:~$ letters=(a c e z l s q a d c v)
itsmejitu@itsmejitu:~$ printf "%s \0" ${letters[@]} | sort -z | xargs -0n1
a
a
c
c
d
e
l
q
s
v
z
itsmejitu@itsmejitu:~$ printf "%s \n" ${letters[@]} | sort -z | xargs -0n1
a
c
e
z
l
s
q
a
d
c
v
Sorting integers is straightforward.
I tried to sort letters in bash and couldn't do it, so a friend sent me this, but he couldn't explain it. I looked through the printf and xargs manuals, but the terms used there are beyond my understanding (I'm not a CS student).
Is there a simpler way to understand this?
Thanks!

In the first example, sort sees 5 different numbers separated by line feeds.
In the second example, sort and xargs see 11 different two-character strings (each has a trailing space) separated by null characters.
In the third example, sort and xargs see a single string (containing embedded line feeds and spaces) "separated" by a null character.
It might help to pipe the output of printf through hexdump -C or od to see what sort sees in each case.
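The difference between the second and third examples can be made visible by counting the null characters that printf emits. The following is a small sketch, assuming bash and the GNU versions of sort, tr and xargs; the variable names are chosen here for illustration.

```shell
#!/usr/bin/env bash
letters=(a c e z l s q a d c v)

# With a \0 after every letter, sort -z sees 11 records and can reorder them.
nul_count=$(printf '%s\0' "${letters[@]}" | tr -cd '\0' | wc -c)

# With \n instead, there is no null character at all, so sort -z sees one record.
nul_count_newline=$(printf '%s\n' "${letters[@]}" | tr -cd '\0' | wc -c)

# xargs -0 -n1 splits on \0 and runs echo once per record.
sorted=$(printf '%s\0' "${letters[@]}" | sort -z | xargs -0 -n1)

echo "null characters with \\0 format: $nul_count"
echo "null characters with \\n format: $nul_count_newline"
echo "$sorted"
```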

how to use sed to replace the specific line/lines in a file with the contents from another file

I want to replace several lines in one of my files with the contents (which consists of the same lines) from another file which is located in another folder with the sed command.
For example: file1.txt is in /storage/file folder, and it looks like this:
'ABC'
'EFG' 001
HJK
file2.txt is located in the /storage folder, and it looks like this:
'kkk' 123456789
yyy
So I want to use the content of file2.txt to replace the 2nd and 3rd lines of file1.txt, and file1.txt should become like this:
'ABC'
'kkk' 123456789
yyy
I should probably make my question clearer. I'm trying to write a shell script that replaces several lines of a file (let's call it old.txt) with new contents that I supply in other files (each contains only the contents to be written into the old file; for example, dataA.txt, dataB.txt, ...).
Let's say, I want to replace the 3rd line of old.txt which is:
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 100 77760 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
with the new data that I supplied in dataA.txt which is:
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
and to replace the 15th to 18th lines of the old.txt file which looks like:
100 0 1
101 1 2
102 2 1.5
103 4 52
with the supplied dataB.txt file, which looks like this (it also contains 4 lines):
-100
-101
-102
-103
As I'm totally new to shell script programming and have only used sed before, I tried the following command:
To change the 3rd line, I ran sed -i '3c r ../../dataA.txt' old.txt, where r ../../dataA.txt was meant to locate dataA.txt. However, c needs to be followed by the replacement text itself rather than the path of a file that contains it, so I'm not sure how to use sed correctly. Another idea is to insert dataA.txt, dataB.txt, ... in front of the lines I want to modify and then delete the old lines, but I still haven't figured out how to do that after a lot of searching.

To replace a range of lines with entire contents of another file:
sed -e '15r file2' -e '15,18d' file1
To replace a single line with entire contents of another file:
sed -e '2{r file2' -e 'd}' file1
If you don't know whether file2 ends in newline or not, you can use the below trick (see What does this mean in Linux sed '$a\' a.txt):
sed '$ a\' file2 | sed -e '3{r /dev/stdin' -e 'd}' file1
The main trick is to use r command to add contents from the other file for the starting line address. And then delete the line(s) to be replaced. The -e option is needed because everything after r will be treated as filename.
Note that these have been tested with GNU sed, I'm not sure if it will vary for other implementations.
See my github repo for more examples, such as matching lines based on regex instead of line numbers.
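As a quick check of the range form, here is a sketch that applies it to the file1.txt and file2.txt contents from the question. The files are recreated in a temporary directory (an assumption made here for a self-contained demo), and GNU sed is assumed, as in the note above.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Sample files from the question, recreated in a scratch directory.
printf '%s\n' "'ABC'" "'EFG' 001" 'HJK' > "$tmp/file1.txt"
printf '%s\n' "'kkk' 123456789" 'yyy'   > "$tmp/file2.txt"

# Queue file2.txt for output at line 2, then delete lines 2-3 of file1.txt.
result=$(sed -e "2r $tmp/file2.txt" -e '2,3d' "$tmp/file1.txt")
printf '%s\n' "$result"
```

The r command is given in its own -e expression because everything after r up to the end of that expression is taken as the file name.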

It is trivial with ed
printf '%s\n' '2,$d' 'r /storage/file2.txt' ,p Q | ed -s /storage/file/file1.txt
A syntax that should work in a wider variety of Unix shells:
printf '2,$d\nr /storage/file2.txt\n,p\nQ\n' | ed -s /storage/file/file1.txt
In 2,$d, 2 and $ are line addresses (line 2 and the last line in the buffer) and d means delete.
,p means print everything to stdout, which is your screen.
Q quits without complaining about unsaved changes, which q would not.
To replace line 3 of a file with the content of another file using ed, without shell variables:
First delete the content of line 3 of the file.
printf '%s\n' '3d' ,p Q | ed -s file1.txt
Then add the content of the other file, say file2.txt at line 3.
printf '%s\n' '2r file2.txt' ,p Q | ed -s file1.txt
To replace a group/set of lines in a file with the content of another file.
First delete the lines, say 15 to 18 from say file1.txt
printf '%s\n' '15,18d' ,p Q | ed -s file1.txt
Then add the content of say file2.txt to line 15 of file1.txt
printf '%s\n' '14r file2.txt' ,p Q | ed -s file1.txt
Q does not save anything; replace it with w to actually edit the file.
The r command appends, so 14r means append the content of the other file after line 14, which makes it start at line 15. The same is true of 2r: it appends after line 2, so the new content becomes line 3.
All of that can also be done in one line; this code was adapted to your data and file names. It assumes that all the text files are in the directory where you run the code below; otherwise use the absolute paths of the files in question.
printf '%s\n' '3d' '2r dataA.txt' '15,18d' '14r dataB.txt' ,n Q | ed -s old.txt
Replace the Q with w if you're satisfied with the output and want to actually edit old.txt.
The ,n prints everything to stdout, which is your screen, but with a line number at the front of each line.
To see the actual script that is being piped to ed, remove or comment out the pipe | and all the code after it.
See info ed or man ed for more information about ed.
An example of that ed script.
Create a new directory and cd into it.
mkdir temp && cd temp
cat dataA.txt
Output
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
cat dataB.txt
Output
-100
-101
-102
-103
cat old.txt
Output
foo
bar
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 100 77760 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
a
b
c
d
e
f
g
h
i
j
k
100 0 1
101 1 2
102 2 1.5
103 4 52
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
The script.
printf '%s\n' '3d' '2r dataA.txt' '15,18d' '14r dataB.txt' ,n w | ed -s old.txt
Output
1 foo
2 bar
3 'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
4 a
5 b
6 c
7 d
8 e
9 f
10 g
11 h
12 i
13 j
14 k
15 -100
16 -101
17 -102
18 -103
19 l
20 m
21 n
22 o
23 p
24 q
25 r
26 s
27 t
28 u
29 v
30 w
31 x
32 y
33 z
The actual old.txt
cat old.txt
Output
foo
bar
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
a
b
c
d
e
f
g
h
i
j
k
-100
-101
-102
-103
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

How to divide numbers stored as text into many parts in awk or maybe sed or other?

I need to divide my text file. In my text file, I have numbers. This is a small fragment of my input file. In my text file, I have numbers from 29026 to 58050.
29026 29027 29028 29029 29030 29031 29032 29033 29034 29035 29036 29037 29038 29039 29040
29041 29042 29043 29044 29045 ...........................................................
................................................58029 58030 58031 58032 58033 58034 58035
58036 58037 58038 58039 58040 58041 58042 58043 58044 58045 58046 58047 58048 58049 58050
I must create 225 index groups. Every group must have 129 numbers. So my output will look like
[ Lipid 1 ]
29026 29027 29028 29029 ...................................
...............
...........................29150 29151 29152 29153 29154
[ Lipid 2 ]
...
...
[ Lipid 225 ]
57921 57922 57923 57924 57925 57926......
.....
.......................
58044 58045 58046 58047 58048 58049 58050
Do you have any idea?
Edit
My text file
29026 29027 29028 29029 29030 29031 29032 29033 29034 29035 29036 29037 29038 29039 29040
29041 29042 29043 29044 29045 29046 29047 29048 29049 29050 29051 29052 29053 29054 29055
29056 29057 29058 29059 29060 29061 29062 29063 29064 29065 29066 29067 29068 29069 29070
29071 29072 29073 29074 29075 29076 29077 29078 29079 29080 29081 29082 29083 29084 29085
29086 29087 29088 29089 29090 29091 29092 29093 29094 29095 29096 29097 29098 29099 29100
29101 29102 29103 29104 29105 29106 29107 29108 29109 29110 29111 29112 29113 29114 29115
29116 29117 29118 29119 29120 29121 29122 29123 29124 29125 29126 29127 29128 29129 29130
29131 29132 29133 29134 29135 29136 29137 29138 29139 29140 29141 29142 29143 29144 29145
29146 29147 29148 29149 29150 29151 29152 29153 29154 29155 29156 29157 29158 29159 29160
29161 29162 29163 29164 29165 29166 29167 29168 29169 29170 29171 29172 29173 29174 29175
29176 29177 29178 29179 29180 29181 29182 29183 29184 29185 29186 29187 29188 29189 29190
29191 29192 29193 29194 29195 29196 29197 29198 29199 29200 29201 29202 29203 29204 29205
29206 29207 29208 29209 29210 29211 29212 29213 29214 29215 29216 29217 29218 29219 29220
29221 29222 29223 29224 29225 29226 29227 29228 29229 29230 29231 29232 29233 29234 29235
29236 29237 29238 29239 29240 29241 29242 29243 29244 29245 29246 29247 29248 29249 29250
29251 29252 29253 29254 29255 29256 29257 29258 29259 29260 29261 29262 29263 29264 29265
29266 29267 29268 29269 29270 29271 29272 29273 29274 29275 29276 29277 29278 29279 29280
29281 29282 29283 29284 29285 29286 29287 29288 29289 29290 29291 29292 29293 29294 29295
29296 29297 29298 29299 29300 29301 29302 29303 29304 29305 29306 29307 29308 29309 29310
29311 29312 29313 29314 29315 29316 29317 29318 29319 29320 29321 29322 29323 29324 29325
29326 29327 29328 29329 29330 29331 29332 29333 29334 29335 29336 29337 29338 29339 29340
29341 29342 29343 29344 29345 29346 29347 29348 29349 29350 29351 29352 29353 29354 29355
29356 29357 29358 29359 29360 29361 29362 29363 29364 29365 29366 29367 29368 29369 29370
29371 29372 29373 29374 29375 29376 29377 29378 29379 29380 29381 29382 29383 29384 29385
29386 29387 29388 29389 29390 29391 29392 29393 29394 29395 29396 29397 29398 29399 29400
29401 29402 29403 29404 29405 29406 29407 29408 29409 29410 29411 29412 29413 29414 29415
29416 29417 29418 29419 29420 29421 29422 29423 29424 29425 29426 29427 29428 29429 29430
here I have thousands of lines, but I will not paste all of this text
57736 57737 57738 57739 57740 57741 57742 57743 57744 57745 57746 57747 57748 57749 57750
57751 57752 57753 57754 57755 57756 57757 57758 57759 57760 57761 57762 57763 57764 57765
57766 57767 57768 57769 57770 57771 57772 57773 57774 57775 57776 57777 57778 57779 57780
57781 57782 57783 57784 57785 57786 57787 57788 57789 57790 57791 57792 57793 57794 57795
57796 57797 57798 57799 57800 57801 57802 57803 57804 57805 57806 57807 57808 57809 57810
57811 57812 57813 57814 57815 57816 57817 57818 57819 57820 57821 57822 57823 57824 57825
57826 57827 57828 57829 57830 57831 57832 57833 57834 57835 57836 57837 57838 57839 57840
57841 57842 57843 57844 57845 57846 57847 57848 57849 57850 57851 57852 57853 57854 57855
57856 57857 57858 57859 57860 57861 57862 57863 57864 57865 57866 57867 57868 57869 57870
57871 57872 57873 57874 57875 57876 57877 57878 57879 57880 57881 57882 57883 57884 57885
57886 57887 57888 57889 57890 57891 57892 57893 57894 57895 57896 57897 57898 57899 57900
57901 57902 57903 57904 57905 57906 57907 57908 57909 57910 57911 57912 57913 57914 57915
57916 57917 57918 57919 57920 57921 57922 57923 57924 57925 57926 57927 57928 57929 57930
57931 57932 57933 57934 57935 57936 57937 57938 57939 57940 57941 57942 57943 57944 57945
57946 57947 57948 57949 57950 57951 57952 57953 57954 57955 57956 57957 57958 57959 57960
57961 57962 57963 57964 57965 57966 57967 57968 57969 57970 57971 57972 57973 57974 57975
57976 57977 57978 57979 57980 57981 57982 57983 57984 57985 57986 57987 57988 57989 57990
57991 57992 57993 57994 57995 57996 57997 57998 57999 58000 58001 58002 58003 58004 58005
58006 58007 58008 58009 58010 58011 58012 58013 58014 58015 58016 58017 58018 58019 58020
58021 58022 58023 58024 58025 58026 58027 58028 58029 58030 58031 58032 58033 58034 58035
58036 58037 58038 58039 58040 58041 58042 58043 58044 58045 58046 58047 58048 58049 58050

Here is how I understood your problem:
The input is a text file in several lines, with fifteen numbers on each line, separated by spaces or tabs. Some lines (perhaps the last one) may have fewer than fifteen numbers. (In fact in the solution below it doesn't matter how many numbers are on each line.)
You must group the numbers into sets of 129 numbers each, in sequence. The last group may have less than 129 numbers, if the input cardinality is not an exact multiple of 129. In the solution below, it doesn't matter how many input numbers there are (and therefore how many groups there will be in the output).
For each group of 129 numbers, you must get a few lines in the output. First, a title or label that says [Lipid n] where n is the group number, and then the numbers in that group, shown fifteen per line (so, there will be eight full lines and a ninth line with only 9 numbers on it: 129 = 15 * 8 + 9).
Here's how you can do this. First let's start with a small example, and then we can look at what must be changed for a more general solution.
I will assume that your inputs can be arbitrary numbers of any length; of course, if they are consecutive numbers like you showed in your sample data, then the problem is trivial and completely uninteresting. So let's assume your numbers are in fact any numbers at all. (Not really; I wrote the solution for non-negative integers; but it can be re-written for "tokens" of non-blank characters separated by blanks.)
I start with the following input file:
$ cat lipid-inputs
124 150 178 111 143 177 116
154 194 139 183 132 180 133
185 142 101 159 122 184 151
120 188 161 136 113 189 170
We want to group the 28 input numbers into sets of ten numbers each, and present the output with (at most) seven numbers per line. So: There will be two full groups, and a third group with only eight member numbers (since we have only 28 inputs). The desired output looks like this:
[Lipid 1]
124 150 178 111 143 177 116
154 194 139
[Lipid 2]
183 132 180 133 185 142 101
159 122 184
[Lipid 3]
151 120 188 161 136 113 189
170
Strategy: First write the input numbers one per line, so we can then arrange them ten per line (ten: cardinality of desired groups in the output). Then add line numbers (which will go into the label lines). Then edit the "line number" lines to add the "lipid" stuff, and break the data lines into shorter lines, showing seven tokens each (possibly fewer on the last line in each group).
Implementation: tr to break up the tokens one per line; paste reading repeatedly from standard input, ten stdin lines for each output line; then sed = to add the line numbers (on separate lines); and finally a standard sed for the final editing. The command looks like this:
$ tr -s ' ' '\n' < lipid-inputs | paste -d ' ' - - - - - - - - - - |
> sed = | sed -E 's/^[[:digit:]]+$/[Lipid &]/ ;
> s/(([[:blank:]]*[[:digit:]]+){7}) /\1\n/g'
The output is the one I showed already.
To generalize (so you can apply this to your problem): The number of tokens per line in the input file is irrelevant. To get 15 tokens per line in the output, change the hard-coded number 7 to 15 on the last line in the command shown above. And to allocate 129 tokens per group, instead of 10, what needs to be changed is the paste command: I show it reading ten times from stdin. You need 129. So it would be better to create a string of 129 dashes separated by spaces, in a simple command rather than hard-coding it, and to use that string as an input to paste. I show how to do this for my example; you will adapt it for yours.
Define variables to hold your relevant values: how many tokens per lipid (129 in your case, 10 in mine) and how many tokens per line in the output (15 in your case, 7 in mine).
$ tokens_per_lipid=10
$ tokens_per_line=7
Then create a variable to hold the string - - - - [...] needed in the paste command. There are several ways to do this, here's just one:
$ paste_arg=$(yes '-' | head -n $tokens_per_lipid | tr '\n' ' ')
Let's check it:
$ echo $paste_arg
- - - - - - - - - -
OK, so let's re-write the command that does what you need. We must use double-quotes for the argument to sed to allow variable expansion.
$ tr -s ' ' '\n' < lipid-inputs | paste -d ' ' $paste_arg |
> sed = | sed -E "s/^[[:digit:]]+$/[Lipid &]/ ;
> s/(([[:blank:]]*[[:digit:]]+){$tokens_per_line}) /\1\n/g"
[Lipid 1]
124 150 178 111 143 177 116
154 194 139
[Lipid 2]
183 132 180 133 185 142 101
159 122 184
[Lipid 3]
151 120 188 161 136 113 189
170
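For comparison, the same grouping can also be sketched in a single awk program instead of the tr, paste and sed pipeline. This is a different technique from the one above, not a restatement of it; the function name group_tokens and its argument order are made up for this illustration.

```shell
#!/usr/bin/env bash
# group_tokens PER_GROUP PER_LINE FILE
# Prints a "[Lipid n]" label before every PER_GROUP tokens and
# at most PER_LINE tokens on each output line.
group_tokens() {
  awk -v per_group="$1" -v per_line="$2" '
    {
      for (i = 1; i <= NF; i++) {
        if (count % per_group == 0) {
          # Close an unfinished line of the previous group, if any.
          if (count > 0 && col > 0) printf "\n"
          printf "[Lipid %d]\n", ++group
          col = 0
        }
        printf "%s%s", (col ? " " : ""), $i
        count++
        if (++col == per_line) { printf "\n"; col = 0 }
      }
    }
    END { if (col > 0) printf "\n" }
  ' "$3"
}

# Usage with the small example: group_tokens 10 7 lipid-inputs
# For the real data it would be: group_tokens 129 15 yourfile
```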

I have no clue what you really are trying to do, but maybe this does what you want
< input sed -zE 's/(([0-9]+[^0-9]+){129})/[ Lipid # ]\n\1\n/g' | awk 'BEGIN { RS = ORS = "]" } { sub("#", NR) } 1' | sed '$d'
It uses Sed to insert the [ Lipid # ] string (with some newline) every 129 occurrences of [0-9]+[^0-9]+ (which is 1 or more digits followed by 1 or more non-digits); then it uses Awk to substitute # with numbers from one (to do so, it interprets the ] as the record separator, and so it can change # to the number of the record NR); finally it uses Sed again to remove the last line which appears as the last record separator from the Awk processing.
I used Awk for inserting the increasing numbers as there's no easy way to do maths in Sed; I used Sed to break the file and insert text in between as requested as I find it easier than doing it in Awk.
If you need to have all numbers on one line in the output, you can do
< input sed -zE 's/[^0-9]+/ /g;s/(([0-9]+[^0-9]+){129})/[ Lipid # ]\n\1\n/g' | awk 'BEGIN { RS = ORS = "]" } { sub("#", NR) } 1' | sed '$d'
where I have just added s/[^0-9]+/ /g; to collapse whatever happens to be between numbers to a single whitespace.

Faster way to extract data from large file

I have file containing about 40000 frames of Cartesian coordinates of 28 atoms. I need to extract coordinates of atom 21 to 27 from each frame.
I tried using bash script with for-loop.
for i in {0..39999}
do
cat $1 | grep -A 27 "frame $i " | tail -n 6 | awk '{print $2, $3, $4}' >> new_coors.xyz
done
Data have following form:
28
-1373.82296 frame 0 xyz file generated by terachem
Re 1.6345663991 0.9571586961 0.3920887712
N 0.7107677071 -1.0248027788 0.5007181135
N -0.3626961076 1.1948218124 -0.4621264246
C -1.1299268126 0.0792071086 -0.5595954110
C -0.5157993503 -1.1509115191 -0.0469223696
C 1.3354467762 -2.1017253883 1.0125736017
C 0.7611763218 -3.3742177216 0.9821756556
C -1.1378354025 -2.4089069492 -0.1199253156
C -0.4944655989 -3.5108477831 0.4043826684
C -0.8597552614 2.3604180994 -0.9043060625
C -2.1340008843 2.4846545826 -1.4451933224
C -2.4023114639 0.1449111237 -1.0888703147
C -2.9292779079 1.3528434658 -1.5302429615
H 2.3226814021 -1.9233467458 1.4602019023
H 1.3128699342 -4.2076373780 1.3768411246
H -2.1105470176 -2.5059031902 -0.5582958817
H -0.9564415355 -4.4988963635 0.3544299401
H -0.1913951275 3.2219343258 -0.8231465989
H -2.4436044324 3.4620639189 -1.7693069306
H -3.0306593902 -0.7362803011 -1.1626515622
H -3.9523215784 1.4136948699 -1.9142814745
C 3.3621999538 0.4972227756 1.1031860016
O 4.3763020637 0.2022266109 1.5735343064
C 2.2906331057 2.7428149541 0.0483795630
O 2.6669163864 3.8206298898 -0.1683800650
C 1.0351398442 1.4995168190 2.1137684156
O 0.6510904387 1.8559680025 3.1601927094
Cl 2.2433490373 0.2064711824 -1.9226174036
It works, but it takes an enormous amount of time.
In the future I will be working with larger files. Is there a faster way to do this?

The reason why your program is slow is that you keep on re-reading your input file over and over in your for-loop. You can do everything with reading your file a single time and use awk instead:
awk '/frame/{c=0;next}{c++}(c>20 && c<27){ print $2,$3,$4 }' input > output
This answer assumes the following form of data:
frame ???
??? x y z ???
??? x y z ???
...
frame ???
??? x y z ???
??? x y z ???
...
The solution checks whether a line contains the word frame. If so, it sets the atom counter c to zero and skips to the next line. From that point on, it increases the counter for every line it reads. If the counter is strictly between 20 and 27, that is, for atoms 21 through 26, it prints the coordinates; use c<=27 if atom 27 should be included as well.
You can now easily expand on this: Assume you want the same atoms but only from frame 1000 till 1500. You can do this by introducing a frame-counter fc
awk '/frame/{fc++;c=0;next}{c++}(fc>=1000 && fc <=1500) && (c>20 && c<27){ print $2,$3,$4 }' input > output
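If the atom range changes often, the same single-pass idea can be wrapped in a small function that takes the first and last atom as arguments. This is only a sketch: the name extract_atoms and its parameters are hypothetical, and both bounds are treated as inclusive here, unlike the c<27 test above.

```shell
#!/usr/bin/env bash
# extract_atoms FIRST LAST FILE
# Prints columns 2-4 (x y z) of atoms FIRST..LAST (inclusive) of every frame.
extract_atoms() {
  awk -v first="$1" -v last="$2" '
    /frame/ { c = 0; next }          # header line of a frame: restart the atom counter
    { c++ }                          # every other line counts as one atom line
    c >= first && c <= last { print $2, $3, $4 }
  ' "$3"
}

# Usage: extract_atoms 21 27 input > new_coors.xyz
```

The atom-count line (28) that precedes each frame header is counted as line 29 of the previous frame, so it falls outside any range that stays within 1 to 28.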

If the frame numbers in the file are already in sorted order, e.g. 0 to 39999 in this order, then maybe something like this could do the job (not tested, since we don't have a sample input file, as Jepessen suggested):
cat $1 | grep -A 27 -E "frame [0-9]+ " | \
awk '{if ($2 == "frame") n = 0; else if (++n > 20 && n <= 27) print $2, $3, $4}' > new_coors.xyz
(The code above is deliberately verbose so that it is easier to understand and closer to your existing script. If you need a more compact solution, check kvantour's answer.)

You could perhaps use 2 passes of grep, rather than thousands?
Assuming you want the lines 21-27 after every frame, and you don't want to record the frame number itself, the following phrase should get the lines you want, which you can then 'tidy' with awk:
grep -A27 ' frame ' input | grep -B6 -e '^--$'
If you also wanted the frame numbers (I see no evidence), or you really want to restrict the range of frame numbers, you could do that with tee and >( grep 'frame') to generate a second file that you would then need to re-merge. If you added -n to grep then you could easily merge sort the files on line number.
Another way to restrict the frame number without doing multiple passes would be a more complex grep expression that describes the range of numbers (-E because life is too short for backslashes):
-E ' frame ([0-9]{1,4}|[0-3][0-9]{1,4}) '

check continuity of a number series using if-else

I have a file which contains numbers, say from 1 to 300, but the numbers are not continuous. A sample file looks like this:
042
043
044
045
078
198
199
200
201
202
203
212
213
214
215
238
239
240
241
242
256
257
258
Now I need to check the continuity of the number series and accordingly write out the output. For example the first 4 numbers are in series, so the output should be
042-045
Next, 078 is a lone number, so the output should be
078
for convenience it can be made to look like
078-078
Then 198 to 203 are continuous. So, next output should be
198-203
and so on. The final output should be like
042-045
078-078
198-203
212-215
238-242
256-258
I just need to know the first and end member of the continuous series and jump on the next series when discontinuity is encountered; The output can be manipulated. I am inclined to use the if statement and can think of a complicated thing like this
num=`cat file | wc -l`
out1=`head -1 file`
for ((i=2;i<=$num;i++))
do
j=`echo $i-1 | bc`
var1=`cat file | awk 'NR=='$j'{print}'`
var2=`cat file | awk 'NR=='$i'{print}'`
var3=`echo $var2 - $var1 | bc`
if [ $var3 -gt 1 ]
then
out2=$var1
echo $out1-$out2
out1=$var2
fi
done
which works but is too lengthy. I am sure there is a shorter way of doing this.
I am also open to other straight-forward command (or few commands) in shell, awk or a few lines of fortran code that can do it.
Thanking you in anticipation.

This awk one-liner works for the given example:
awk 'p+1!=$1{printf "%s%s--",NR==1?"":p"\n",$1}{p=$1}END{print $1}' file
It gives the output for your data as input:
042--045
078--078
198--203
212--215
238--242
256--258
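A more spelled-out variant of the same idea, printing a single hyphen as in the requested output, might look like the sketch below. The function name ranges is made up for this illustration, and the input is assumed to hold one non-negative integer per line in ascending order.

```shell
#!/usr/bin/env bash
# ranges FILE
# Prints "first-last" for every run of consecutive numbers in FILE.
ranges() {
  awk '
    NR == 1        { start = prev = $1; next }        # first number opens the first run
    $1 != prev + 1 { print start "-" prev; start = $1 } # gap found: close the run, open a new one
                   { prev = $1 }
    END            { if (NR) print start "-" prev }    # close the last run
  ' "$1"
}

# Usage: ranges file
```

Because start and prev keep the original field text, leading zeros such as 042 are preserved in the output, while the comparison with prev + 1 is done numerically.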

Here is a simple program in Fortran:
program test
implicit none
integer :: first, last, uFile, i, stat
open( file='numbers.txt', newunit=uFile, action='read', status='old' )
read(uFile,*,iostat=stat) i
if ( stat /= 0 ) stop
first = i ; last = i
do
read(uFile,*,iostat=stat) i
if ( stat /= 0 ) exit
if ( i == last+1 ) then
last = i
else
! gap found: print the finished range and start a new one at i
write(*,'(i3.3,a,i3.3)') first,'-',last
first = i ; last = i
endif
enddo
write(*,'(i3.3,a,i3.3)') first,'-',last
end program
The output is
042-045
078-078
198-203
212-215
238-242
256-258
