I have a number of pdb files and I want to keep only those lines that starts with ^FORMUL and if line has C followed by the number that is larger then (C3,C4,C5,C6..100 etc) then I should not print it. Second condition is that within every line sum of C, H and N should be >= 6, then delete it
So overall delete the lines in which C is followed by number more then 2> and sum of C+O+N is >= then 6.
FORMUL 3 HOH *207(H2 O) (print it)
FORMUL 2 SF4 FE4 S4 (print it)
FORMUL 3 NIC C5 H7 N O7 (don't print, there is C5 and sum is more then 6)
FORMUL 4 HOH *321(H2 O) (print it)
FORMUL 3 HEM 2(C34 H32 FE N4 O4) (don't print, there is C34)
I have tried to do it in perl but lines are soo diverse from each other so Im not sure if it is possible to do.
Over all these conditions chould be included together, meaning that all lines in which C>2 and sum>=6 should be deleted.
C1 O5 N3 should be deleted; C3 N1 01 should not be deleted although C is 3.
In perl I don't know how to assign these two conditions. Here I wrote opposite situation not to delete but to print these lines if these two conditions are not met.
#!/usr/bin/perl
use strict;
use warnings;
my #lines;
my $file;
my $line;
open ($file, '<', '5PCZ.pdb') or die $!;
while (my $line = <$file>)
{
if ($line =~ m/^FORMUL/)
{
push (#lines, $line);
}
}
close $file;
#print "#lines\n";
foreach $line(#lines)
{
if ($line eq /"C"(?=([0-2]))/ )
{
elsif ($line eq "Sum of O,N & C is lt 6")
print #lines
}
}
As you've seen, it's probably easier to write this as a filter that prints the lines that you want to keep. I've also written this following the Unix Filter Model (reads from STDIN, writes to STDOUT) as that makes the program far more flexible (and, interestingly, easier to write!)
Assuming that you're running the program on Linux (or similar) and that your code is in an executable file called my_filter (I recommend a more descriptive name!) then you would call it like this:
$ my_filter < 5PCZ.pdb > 5PCZ.pdb.new
The code would look like this:
#!/usr/bin/perl
use strict;
use warnings;
while (<>) { # read from STDIN a line at a time
# Split data on whitespace, but only into four columns
my #cols = split /\s+/, $_, 4;
next unless $cols[0] eq 'FORMUL';
# Now extract the letter stuff into a hash for easy access.
# We extract letters from the final column in the record.
my %letters = $cols[-1] =~ m/([A-Z])(\d+)/g;
# Give the values we're interested in, a default of 0
$letters{$_} //= 0 for (qw[C O N]);
next if $letters{C} > 2
and $letters{C} + $letters{O} + $letters{N} >= 6;
# I think we can then print the line;
print;
}
This seems to give the correct output for your sample data. And I hope the comments make it obvious how to tweak the conditions.
Extended Awk solution:
awk -F'[[:space:]][[:space:]]+' \
'/^FORMUL/{
if ($4 !~ /\<C/) print;
else {
match($4, /\<C[0-9]+/);
c=substr($4, RSTART+1, RLENGTH);
if (c > 2) next;
else {
match($4, /\<O[0-9]+/);
o=substr($4, RSTART+1, RLENGTH);
match($4, /\<N[0-9]+/);
n=substr($4, RSTART+1, RLENGTH);
if (c+o+n < 6) print
}
}
}' 5PCZ.pdb
The output:
FORMUL 3 HOH *207(H2 O)
FORMUL 2 SF4 FE4 S4
FORMUL 4 HOH *321(H2 O)
This is a question on combinatorics from a non-mathematician, so please try to bear with me!
Given an array of n distinct characters, I want to generate subsets of k characters in a minimal-change order, i.e. an order in which generation i+1 contains exactly one character that was not in generation i. That's not too hard in itself. However, I also want to maximise the number of cases in which the character that is swapped out in generation i+1 is the same character that was swapped in in generation i. To illustrate, for n=7, k=3:
abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade
bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg
The asterisked strings indicate the case I want to maximise; e.g. the e that is new in generation 3, abe, replaces a d that was new in generation 2, abd. It doesn't seem possible to have this happen in every generation, but I want it to happen as often as possible.
Typical array sizes that I use are 20-30 and subset sizes around 5-8.
I'm using an odd language, Icon (or actually its derivative Unicon), so I don't expect anyone to post code that I can used directly. But I will be grateful for answers or hints in pseudo-code, and will do my best to translate C etc. Also, I have noticed that problems of this kind are often discussed in terms of arrays of integers, and I can certainly apply solutions posted in such terms to my own problem.
Thanks
Kim Bastin
Edit 15 June 2010:
I do seem to have got into deeper water than I thought, and while I'm grateful for all answers, not all of them have been relevant. As an example of a solution which is NOT adequate, let me post my own Unicon procedure for generating k-ary subsets of a character set s in a minimal change order. Things you need to know to understand the code are: a preposed * means the size of a structure, so if s is a string, *s means the size of s (the number of characters it contains). || is a string concatenation operation. A preposed ! produces each element of a structure, e.g. each character of a string, in turn on successive passes. And the 'suspend' control structure returns a result from a procedure, but leaves the procedure 'in suspense', with all local variables in place, so that new results can be produced if the procedure is called in a loop.
procedure revdoor(s, k)
# Produces all k-subsets of a string or character set s in a 'revolving
# door' order. Each column except the first traverses the characters
# available to it in alphabetical and reverse alphabetical order
# alternately. The order of the input string is preserved.
# If called in a loop as revdoor("abcdefg", 3),
# the order of production is: abc, abd, abe, abf, abg, acg, acf, ace, acd,
# ade, adf, adg, aeg, aef, afg, bfg, bef, beg, bdg, bdf, bde, bcd, bce,
# bcf, bcg, cdg, cdf, cde, cef, ceg, cfg, dfg, deg, def, efg
local i
static Ctl
if /Ctl then { # this means 'if Ctl doesn't exist'
if k = 0 then return ""
Ctl := list(k, 1) # a list of k elements, each initialised to 1.
}
if Ctl[k] = 1 then {
if k = 1 then suspend !s else
every i := 1 to *s-k+1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
} else {
if k = 1 then suspend !reverse(s) else
every i := -k to -*s by -1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
}
# the following line multiplies element k of Ctl by -1 if k < size of Ctl
# (this controls the order of generation of characters),
# and destroys Ctl on final exit from the procedure.
if k < *Ctl then Ctl[k] *:= -1 else Ctl := &null
end
Note that the output of the above procedure is not optimal in my sense. One result of my investigations so far is that the maximum 'swapping score' for a list of k-ary subsets of n elements is not less than comb(n-1, k), or in the case of n=7, k=3, the maximum score is at least comb(6, 3) = 20. I define the 'swapping score' of a list as the number of items in the list whose new element replaces an element in the previous item which was itself new. I haven't got the mathematical equipment to prove this, but it is easy to see with k=1 or k=2. For certain (n,k) a slightly higher score is possible, as in the case of n=7, k=3:
abc abd abe abf abg
acg adg aeg afg
efg dfg cfg bfg
beg bdg bcg
bcd bce bcf
bdf bef
def cef aef
adf acf
acd ace
ade
bde cde
cdf cdg
ceg
deg (swapping score = 21)
It may be noted that the above list is in 'strong minimal change order' (like word golf: the new character is always in the same position as the character it replaces), which may indicate the direction my own work is taking. I hope to post something more in a few days.
Kim
It's fairly straightforward. In order to maximise replacement just think of the characters as numbers and increment the string by one till you have reached the upper limit.
Then check to see that you don't use the same character twice in the string.
I think this would work:
char c[] = {'a', 'b', 'c', 'd', 'e'};
const int n = 5;
const int k = 3;
char s[k];
void print()
{
for( int i = 0; i < k; ++i )
putchar(c[s[i]]);
putchar('\n');
}
bool increment( int m )
{
// reached the limit?
if( ++s[m] == n && m == 0 )
return false;
next:
for( int i = 0; i < m; ++i )
{
if( s[m] == n )
{
// carry
s[m] = 0;
if( !increment( m-1 ))
return false;
goto next;
}
else if( s[i] == s[m] )
{
// the character is already used
++s[m];
goto next;
}
}
return true;
}
int main(int, char**)
{
// initialise
for( int i = 0; i < k; ++i )
s[i] = i;
// enumerate all combinations
do
print();
while(increment(k-1));
}
Kim, your problem description sounds very much like a (homework) attempt to describe the simplest solution for enumerating all k-combinations of a set of n elements - without giving the actual solution away too easily. Anyway, see below for my shot. I used Java but the important parts are not different from C.
public class Homework
{
/**
* Prints all k-combinations of a set of n elements. Answer to this
* question: http://stackoverflow.com/questions/2698551
*/
public static void main(String[] args)
{
Combinations combinations = new Combinations(7, 3);
System.out.printf(
"Printing all %d %d-combinations of a set with %d elements:\n",
combinations.size(), combinations.k, combinations.n);
for (int[] c : combinations)
System.out.println(Arrays.toString(c));
}
/**
* Provides an iterator for all k-combinations of a set of n elements.
*/
static class Combinations implements Iterable<int[]>
{
public final int n, k;
public Combinations(int n, int k)
{
if (n < 1 || n < k)
throw new IllegalArgumentException();
this.n = n;
this.k = k;
}
#Override
public Iterator<int[]> iterator()
{
return new Iterator<int[]>()
{
private int[] c;
#Override
public void remove() { throw new UnsupportedOperationException(); }
#Override
public int[] next()
{
if (c == null)
{
c = new int[k];
for (int i = 0; i < k; i++)
c[i] = i;
}
else
{
int i = c.length - 1;
while (i >= 0 && c[i] == n - k + i)
i--;
if (i < 0)
throw new NoSuchElementException();
c[i]++;
for (int j = i + 1; j < c.length; j++)
c[j] = c[i] + j - i;
}
return c.clone(); // remove defensive copy if performance is more important
}
#Override
public boolean hasNext() { return c == null || c[0] < n - k; }
};
}
/**
* Returns number of combinations: n! / (k! * (n - k)!).
*/
public BigInteger size()
{
BigInteger s = BigInteger.valueOf(n);
for (int i = n - 1; i > n - k; i--)
s = s.multiply(BigInteger.valueOf(i));
for (int i = k; i > 1; i--)
s = s.divide(BigInteger.valueOf(i));
return s;
}
}
}
Here is the output for your example:
Printing all 35 3-combinations of a set with 7 elements:
[0, 1, 2] [0, 1, 3] [0, 1, 4] [0, 1, 5] [0, 1, 6] [0, 2, 3] [0, 2, 4] [0, 2, 5] [0, 2, 6] [0, 3, 4]
[0, 3, 5] [0, 3, 6] [0, 4, 5] [0, 4, 6] [0, 5, 6] [1, 2, 3] [1, 2, 4] [1, 2, 5] [1, 2, 6] [1, 3, 4]
[1, 3, 5] [1, 3, 6] [1, 4, 5] [1, 4, 6] [1, 5, 6] [2, 3, 4] [2, 3, 5] [2, 3, 6] [2, 4, 5] [2, 4, 6]
[2, 5, 6] [3, 4, 5] [3, 4, 6] [3, 5, 6] [4, 5, 6]
Rather than start with an algorithm, I've tried to think of a way to find a form for the maximum "swapping score", so that you know what to shoot for. Often an algorithm for producing the desired structure can emerge from such a proof.
It's been a long time since university, but I've tried to think of a combinatorial model that will help to figure this out, without very much luck.
I started by imagining the set of combinations as vertices in a graph, with a edges corresponding to the "adjacency" (only one element difference) of the combinations. So:
"n choose k" vertices
each vertex has degree k(n-k)
number of edges = "n choose k" * k(n-k) / 2 = "n choose 2" * "n-2 choose k-1"
There's a lot of symmetry to these graphs. The graph is the same for any given {n,k} as it is for {n,n-k}. If k=1 or k=n-1 it's the complete graph on n vertices (each combinations differs from all the others by only one character). I can't see an obvious algorithm from this, though.
Edit: My next thought was to conceive the graph with a slightly different interpretation. You can think of each {n,k}-combination as a sequence of n bits where there are k 1s. The position of the 1s corresponds to which of the n characters is present in the combination. So for n=7 k=3, abc is 1110000, adg is 1001001, efg is 0000111. With this you can also imagine the points lying at the corners of an n-dimensional hypercube. So for a given subsequence, the edges match your "minimal swapping" criteria if they are co-planar: I think of them as "cutting planes" through the hypercube.
You are looking for a Hamiltonian path through this graph of combinations, one that meets your special criteria.
Another way to think of your problem is to minimize the number of times in the sequence that you do change which character in the combination is being altered.
For a good answer, would computing the list of combinations all at once be acceptable, or do you need to compute them one at a time? In other words, do you need a function:
Combination nextCombo();
or would
vector<Combination> allCombinations();
be acceptable?
If computing the combinations in batch is permissible, it is possible that an iterative-deepening A* search (or just an A* search but I suspect it'd run out of memory) would work. With an admissible heuristic, A* is guaranteed to give the optimum. I'm short of time, so I decided to post this partial answer and edit the post if I get time to write code.
A* is a graph search algorithm. In this case, the nodes are lists of combinations used so far (no duplicates allowed in the list). My plan was to use a bit-string representation for the nodes. n=30 would fit into a 32 bit integer. We can arbitrarily permute any solution so that the first combination begins with 0's and ends in 1's, i.e. 000...1111. A node with a shorter list is connected to a longer one if the two lists are the same up until the last element and the last element differs only in having one 0'bit flipped to a 1 and one 1 bit flipped to a 0. The distance between the two is 0 if there was a "swap" and 1 if there was a swap.
A second representation for each combination is a sorted list of the bits that are turned on. One possible admissible heuristic for this graph is to use this index list. Each time (in the list of combinations) an index is used at a particular position in the index list, mark it off. The number of positions with un-used indices - the current last changed index is (I believe) the minimal number of swaps that need to happen.
To illustrate this heuristic, consider the sequence abc abd abe* abf* abg afg from above. (the letters would be numbers in my treatment, but that is a minor difference). This sequence (which would be one node in the search-graph) would have the following places marked:
1 2 3
*a
b *b
c c *c
d d *d
e e *e
*f *f
*g
Thus the heuristic would predict that there is at least one swap required (since there are no unmarked elements in position 3 and the current position is 2).
If I get the time, I'll edit this to post code and performance of the algorithm.
Re: the NP completeness result (in a comment to Zac Thompson's answer). The graph on which we are searching for a minimal cost Hamiltonian path has a very special structure. For example, the normally NP-complete Hamiltonian Path problem can be solved in O(n) time with the "enumerate all combinations" algorithm - with n being the number of nodes in the graph. This structure makes it possible that, though on a general graph, vertex cover is hard, on your graph it may be polynomial (even linear or quadratic). Of course, since the graph has a lot of nodes for e.g. n=30, k=8 you may still have a lot of computation ahead of you.
I worked on this problem in 2010 but failed to find a solution then. A few days ago I had another look at some of my notes from that time and suspected I had been very close to a solution. A few minutes later I had the key.
To recapitalute: the requirement is a strict minimal change ordering of the k-subsets of a string s such that LIFO (last in first out) is maximised. I refer to this as maximised ‘swapping’ in earlier posts.
I call the algorithm maxlifo (maximised LIFO) after the key requirement. It takes two parameters, a string s, which must not contain duplicated characters, and a positive integer k not greater than the size of s. The algorithm is recursive, i.e. maxlifo(s, k) uses the output of maxlifo(s, k-1) down to k=1. Output is returned as a list.
Below I give an informal explanation, with examples, using the string "abcdefg" and various values of k. This is followed by an example of implementation as a Unicon procedure. (I’m not fluent in any of the more commonly used languages.)
The case k=1 is trivial — it returns the elements of s in order from first to last.
For k>1, apply the following rules to the output of maxlifo(s, k-1):
(1) For each element of the output of maxlifo(s, k-1), list in a row the k-subsets built from that element with each missing character of s in turn. The order of characters in the subsets is as in s.
(2) Working from the second row down, substitute blank ‘placeholders’ for all occurrences of subsets that appear in an earlier row. Each k-subset of s now appears just once.
(3) In each non-blank row, mark with an initial ! each subset such that there is a placeholder at the same position in the next row. This marking means ‘first’. Exactly one subset will be so marked in each non-blank row.
(4) Delete all rows that are completely blank (contain only placeholders).
(5) In each remaining row except the last, mark with a final ! the subset in the position corresponding to the subset marked ‘first’ in the next lower row. This marking means ‘last’.
Now we can list the final maxlifo ordering of the subsets. Each row from top to bottom is ordered and its elements added in that order to the output list.
(6) In each row from the top down:
(6.1) Remove all blank placeholders.
(6.2) Add to the output list the subset marked ‘first’ (initial !) and remove it from the row.
(6.3) If there are still subsets remaining in the row, either the leftmost or the rightmost subset will be marked ‘last’ (final !). If the rightmost subset is marked ‘last’, add the subsets to the output list in order from left to right, otherwise in order from right to left.
(7) After processing all rows, return the output list.
Example using maxlifo("abcdefg", 2):
Col1 contains the output of maxlifo("abcdefg", 1). The rows of Col2 contain the cliques formed with the remaining characters of s:
Col1 Col2
---- ----------------------------
a ab ac ad ae af ag
b ab bc bd be bf bg
c ac bc cd ce cf cg
d ad bd cd de df dg
e ae be ce de ef eg
f af bf cf df ef fg
g ag bg cg dg eg fg
Blank out subsets that appear in an earlier row:
a ab ac ad ae af ag
b bc bd be bf bg
c cd ce cf cg
d de df dg
e ef eg
f fg
g
Mark the ‘first’ subset in each row (the one with a blank below it):
a !ab ac ad ae af ag
b !bc bd be bf bg
c !cd ce cf cg
d !de df dg
e !ef eg
f !fg
g
Delete all completely blank rows (only one in this case):
a !ab ac ad ae af ag
b !bc bd be bf bg
c !cd ce cf cg
d !de df dg
e !ef eg
f !fg
Mark the ’last’ subset in each row (the one with a ‘first’ subset below it).
a !ab ac! ad ae af ag
b !bc bd! be bf bg
c !cd ce! cf cg
d !de df! dg
e !ef eg!
f !fg
Output each row in the order described above: ‘first’, unmarked, ’last’:
Ordered rows:
a !ab ac! ad ae af ag ab ag af ae ad ac
b !bc bd! be bf bg bc bg bf be bd
c !cd ce! cf cg cd cg cf ce
d !de df! dg de dg df
e !ef eg! ef eg
f !fg fg
Output: [ab ag af ae ad ac bc bg bf be bd cd cg cf ce df dg de ef eg fg]
Examples for 3 <= k <= 6 are given in less detail. The blank rows deleted in step 4 are left in place.
maxlifo("abcdefg", 3):
Ordered rows:
ab !abc abd abe abf abg! abc abd abe abf abg
ag acg adg aeg! !afg afg acg adg aeg
af acf adf! !aef aef acf adf
ae ace! !ade ade ace
ad !acd! acd
ac
bc !bcd bce bcf bcg! bcd bce bcf bcg
bg bdg beg! !bfg bfg bdg beg
bf bdf! !bef bef bdf
be !bde! bde
bd
cd !cde cdf cdg! cde cdf cdg
cg ceg! !cfg cfg ceg
cf !cef! cef
ce
de !def deg! def deg
dg !dfg! dfg
df
ef !efg efg
eg
fg
Output: [abc abd abe abf abg afg acg adg aeg aef acf adf ade ace acd
bcd bce bcf bcg bfg bdg beg bef bdf bde
cde cdf cdg cfg ceg cef
def deg dfg
efg]
maxlifo("abcdefg", 4):
Ordered rows:
abc !abcd abce! abcf abcg abcd abcg abcf abce
abd !abde abdf! abdg abde abdg abdf
abe !abef abeg! abef abeg
abf !abfg! abfg
abg
afg acfg! adfg !aefg aefg adfg acfg
acg !acdg aceg! acdg aceg
adg !adeg! adeg
aeg
aef acef! !adef adef acef
acf !acdf! acdf
adf
ade !acde! acde
ace
acd
bcd !bcde bcdf! bcdg bcde bcdg bcdf
bce !bcef bceg! bcef bceg
bcf !bcfg! bcfg
bcg
bfg bdfg! !befg befg bdfg
bdg !bdeg! bdeg
beg
bef !bdef! bdef
bdf
bde
cde !cdef cdeg! cdef cdeg
cdf !cdfg! cdfg
cdg
cfg !cefg! cefg
ceg
cef
def !defg defg
deg
dfg
efg
Output: [abcd abcg abcf abce abde abdg abdf abef abeg abfg aefg adfg acfg acdg aceg adeg adef acef acdf acde
bcde bcdg bcdf bcef bceg bcfg befg bdfg bdeg bdef
cdef cdeg cdfg cefg
defg]
maxlifo("abcdefg", 5):
Ordered rows:
abcd !abcde abcdf abcdg! abcde abcdf abcdg
abcg abceg! !abcfg abcfg abceg
abcf !abcef! abcef
abce
abde !abdef abdeg! abdef abdeg
abdg !abdfg! abdfg
abdf
abef !abefg! abefg
abeg
abfg
aefg acefg! !adefg adefg acefg
adfg !acdfg! acdfg
acfg
acdg !acdeg! acdeg
aceg
adeg
adef !acdef! acdef
acef
acdf
acde
bcde !bcdef bcdeg! bcdef bcdeg
bcdg !bcdfg! bcdfg
bcdf
bcef !bcefg! bcefg
bceg
bcfg
befg !bdefg! bdefg
bdfg
bdeg
bdef
cdef !cdefg cdefg
cdeg
cdfg
cefg
defg
Output: [abcde abcdf abcdg abcfg abceg abcef abdef abdeg abdfg abefg adefg acefg acdfg acdeg acdef
bcdef bcdeg bcdfg bcefg bdefg
cdefg]
maxlifo("abcdefg", 6):
Ordered rows:
abcde !abcdef abcdeg! abcdef abcdeg
abcdf !abcdfg! abcdfg
abcdg
abcfg !abcefg! abcefg
abceg
abcef
abdef !abdefg! abdefg
abdeg
abdfg
abefg
adefg
acefg !acdefg! acdefg
acdfg
acdeg
acdef
bcdef !bcdefg bcdefg
bcdeg
bcdfg
bcefg
bdefg
cdefg
Output: [abcdef abcdeg abcdfg abcefg abdefg acdefg bcdefg]
Unicon implementation:
procedure maxlifo(s:string, k:integer)
# A solution to my combinatorics problem from 2010.
# Return a list of the k subsets of the characters of a string s
# in a minimal change order such that last-in first-out is maximised.
# String s must not contain duplicate characters and in the present
# implementation must not contain "!", which is used as a marker.
local ch, cand, Hit, inps, i, j, K, L, Outp, R, S
# Errors
if *cset(s) ~= *s then
stop("Duplicate characters in set in maxlifo(", s, ", ", k, ")")
if find("!", s) then
stop("Illegal character in set in maxlifo(", s, ", ", k, ")")
if k > *s then
stop("Subset size larger than set size in maxlifo(", s, ", ", k, ")")
# Special cases
if k = 0 then return []
if k = *s then return [s]
Outp := []
if k = 1 then {
every put(Outp, !s)
return Outp
}
# Default case
S := set()
K := []
# Build cliques from output of maxlifo(s, k-1) with the remaining
# characters in s, substituting empty strings as placeholders for
# subsets already listed.
every inps := !maxlifo(s, k-1) do {
R := []
every ch := !s do
if not find(ch, inps) then {
cand := reorder(inps ++ ch, s)
if member(S, cand) then cand := "" else insert(S, cand)
put(R, cand)
}
put(K, R)
}
# Mark ‘first’ subset in each row with initial "!"
every i := 1 to *K - 1 do {
every j := 1 to *K[i] do
if K[i, j] ~== "" & K[i+1, j] == "" then {
K[i, j] := "!" || K[i, j]
break
}
}
# Remove rows containing only placeholders
every i := *K to 1 by -1 do {
every if !K[i] ~== "" then break next
delete(K, i)
}
# Mark ‘last’ subset in each row with final "!"
every i := 1 to *K - 1 do
every j := 1 to *K[i] do
if K[i+1, j][1] == "!" then {
K[i, j] ||:= "!"
break
}
# Build output list
every R := !K do {
# Delete placeholders from row (no longer needed and in the way)
every j := *R to 1 by -1 do if R[j] == "" then delete(R, j)
# Handle ‘first’ subset and remove from row
# N.B. ‘First’ subset will be leftmost or rightmost in row
if R[1][1] == "!" then
put(Outp, trim(get(R), '!', 0))
else put(Outp, trim(pull(R), '!', 0))
# Handle any remaining subsets, ‘last’ subset last, stripping '!' markers
# N.B. ‘Last’ subset will be leftmost or rightmost in row after removal
# of ‘first’ subset.
if R[-1][-1] == "!" then while put(Outp, trim(get(R), '!', 0)) else
while put(Outp, trim(pull(R), '!', 0))
}
return Outp
end
procedure reorder(cs:cset, s:string)
# Reorder cset cs according to string s
local r
# If no s, return set in alphabetical order
if /s then return string(cs)
r := ""
s ? while tab(upto(cs)) do r ||:= move(1)
return r
end