Find linear trend up to the maximum value using awk - shell

I have a datafile as below:
ifile.txt
-10 /
-9 /
-8 /
-7 3
-6 4
-5 13
-4 16
-3 17
-2 23
-1 26
0 29
1 32
2 35
3 38
4 41
5 40
6 35
7 30
8 25
9 /
10 /
Here "/" are the missing values. I would like to compute the linear trend up to the maximum value in the y-axis (i.e. up to the value "41" in 2nd column). So it should calculate the trend from the following data:
-7 3
-6 4
-5 13
-4 16
-3 17
-2 23
-1 26
0 29
1 32
2 35
3 38
4 41
Other (x, y) won't be consider because the y values are less than 41 after (4, 41)
The following script is working fine for all values:
awk '!/\//{sx+=$1; sy+=$2; c++;
sxx+=$1*$1; sxy+=$1*$2}
END {det=c*sxx-sx*sx;
print (det?(c*sxy-sx*sy)/det:"DIV0")}' ifile.txt
But I can't able to do it for maximum value
For the given example the result will be 3.486

Updated based on your comments. I assumed your trend calculations were good and used them:
$ awk '
$2!="/" {
b1[++j]=$1 # buffer them up until or if used
b2[j]=$2
if(max=="" || $2>max) { # once a bigger than current max found
max=$2 # new champion
for(i=1;i<=j;i++) { # use all so far buffered values
# print b1[i], b2[i] # debug to see values used
sx+=b1[i] # Your code from here on
sy+=b2[i]
c++
sxx+=b1[i]*b1[i]
sxy+=b1[i]*b2[i]
}
j=0 # buffer reset
delete b1
delete b2
}
}
END {
det=c*sxx-sx*sx
print (det?(c*sxy-sx*sy)/det:"DIV0")
}' file
For data:
0 /
1 1
2 2
3 4
4 3
5 5
6 10
7 7
8 8
with debug print uncommented program would output:
1 1
2 2
3 4
4 3
5 5
6 10
1.51429

You can do the update of the concerned rows only when $2 > max and save the intermediate rows into variables. for example using associate arrays:
awk '
$2 == "/" {next}
$2 > max {
# update max if $2 > max
max = $2;
# add all elemenet of a1 to a and b1 to b
for (k in a1) {
a[k] = a1[k]; b[k] = b1[k]
}
# add the current row to a, b
a[NR] = $1; b[NR] = $2;
# reset a1, b1
delete a1; delete b1;
next;
}
# if $2 <= max, then set a1, b1
{ a1[NR] = $1; b1[NR] = $2 }
END{
for (k in a) {
#print k, a[k], b[k]
sx += a[k]; sy += b[k]; sxx += a[k]*a[k]; sxy += a[k]*b[k]; c++
}
det=c*sxx-sx*sx;
print (det?(c*sxy-sx*sy)/det:"DIV0")
}
' ifile.txt
#3.48601
Or calculate sx, sy etc directly instead of using arrays:
awk '
$2 == "/" {next}
$2 > max {
# update max if $2 > max
max = $2;
# add the current Row plus the cached values
sx += $1+sx1; sy += $2+sy1; sxx += $1*$1+sxx1; sxy += $1*$2+sxy1; c += 1+c1
# reset the cached variables
sx1 = 0; sy1 = 0; sxx1 = 0; sxy1 = 0; c1 = 0;
next;
}
# if $2 <= max, then calculate and cache the values
{ sx1 += $1; sy1 += $2; sxx1 += $1*$1; sxy1 += $1*$2; c1++ }
END{
det=c*sxx-sx*sx;
print (det?(c*sxy-sx*sy)/det:"DIV0")
}
' ifile.txt

Related

Count the occurences of a number in all the columns in bash

I have a data set like this:
1 3 3 4 5 2 3 3
2 2 2 1 2 2 2 2
1 3 3 3 3 3 3 3
1 4 4 4 4 4 4 3
I would like to count the number of times that the number "one" appears per column, so I would like the output like:
3 0 0 1 0 0 0 0
Does anyone know how to do it in bash?
Thank you very much!
Ana
Do it in awk. Iterate over number of fields and if the field is equal to 1 increment the array. Then on the end print the array.
awk '{ for (i = 1; i <= NF; ++i) { if($i == 1) { ++c[i]; } }
END{ for (i = 1; i <= NF; ++i) { printf "%d%s", c[i], i!=NF ? OFS : ORS; } }

awk skipping records. getline command

this is a task related to data compression using fibonacci binary representation.
what i have is this text file:
result.txt
a 20
b 18
c 18
d 15
e 7
this file is a result of scanning a text file and counting the appearances of each char on the file using awk.
now i need to give each char its fibonacci-binary representation length.
since i'm new to ubuntu and teminal, i've done a program in java that receives a number and prints all the fibonacci codewords length up to the number and it's working.
this is exactly what i'm trying to do here. the problem is that it doesn't work...
the length of fibonacci codewords is also work as fibonnaci.
these are the rules:
f(1)=1 - there is 1 codeword of length 1.
f(2)=1 - there is 1 codeword of length 2.
f(3)=2 - there is 2 codeword of length 3.
f(4)=3 - there is 3 codeword of length 4.
and so on...
(i'm adding on more bit to each codeword so the first two lengths will be 2 and 3)
this is the code i've made: its name is scr5
{
a=1;
b=1;
len=2
print $1 , $2, len;
getline;
print $1 ,$2, len+1;
getline;
len=4;
for(i=1; i< num; i++){
c= a+b;
g=c;
while (c >= 1){
print $1 ,$2, len ;
if (getline<=0){
print "EOF"
exit;
}
c--;
i++;
}
a=b;
b=c;
len++;
}}
now i write on terminal:
n=5
awk -v num=$n -f scr5 a
and there are two problems:
1. it skips the third letter c.
2. on the forth letter d, it prints the length of the first letter, 2, instead of length 3.
i guess that there is a problem in the getline command.
thank u very much!
Search Google for getline and awk and you'll mostly find reasons to avoid getline completely! Often it's a sign you're not really doing things the "awk" way. Find an awk tutorial and work through the basics and I'm sure you'll see quickly why your attempt using getlines is not getting you off in the right direction.
In the script below, the BEGIN block is run once at the beginning before any input is read, and then the next block is automatically run once for each line of input --- without any need for getline.
Good luck!
$ cat fib.awk
BEGIN { prior_count = 0; count = 1; len = 1; remaining = count; }
{
if (remaining == 0) {
temp = count;
count += prior_count;
prior_count = temp;
remaining = count;
++len;
}
print $1, $2, len;
--remaining;
}
$ cat fib.txt
a 20
b 18
c 18
d 15
e 7
f 0
g 0
h 0
i 0
j 0
k 0
l 0
m 0
$ awk -f fib.awk fib.txt
a 20 1
b 18 2
c 18 3
d 15 3
e 7 4
f 0 4
g 0 4
h 0 5
i 0 5
j 0 5
k 0 5
l 0 5
m 0 6
The above solution, compressed form :
mawk 'BEGIN{ ___= __= _^=____=+_ } !_ { __+=(\
____=___+_*(_=___+=____))^!_ } $++NF = (_--<_)+__' fib.txt
a 20 1
b 18 2
c 18 3
d 15 3
e 7 4
f 0 4
g 0 4
h 0 5
i 0 5
j 0 5
k 0 5
l 0 5
m 0 6

Finding a range of numbers of a file in another file using awk

I have lots of files like this:
3
10
23
.
.
.
720
810
980
And a much bigger file like this:
2 0.004
4 0.003
6 0.034
.
.
.
996 0.01
998 0.02
1000 0.23
What I want to do is find in which range of the second file my first file falls and then estimate the mean of the values in the 2nd column of that range.
Thanks in advance.
NOTE
The numbers in the files do not necessarily follow an easy pattern like 2,4,6...
Since your smaller files are sorted you can pull out the first row and the last row to get the min and max. Then you just need go through the bigfile with an awk script to compute the mean.
So for each smallfile small you would run the script
awk -v start=$(head -n 1 small) -v end=$(tail -n 1 small) -f script bigfile
Where script can be something simple like
BEGIN {
sum = 0;
count = 0;
range_start = -1;
range_end = -1;
}
{
irow = int($1)
ival = $2 + 0.0
if (irow >= start && end >= irow) {
if (range_start == -1) {
range_start = NR;
}
sum = sum + ival;
count++;
}
else if (irow > end) {
if (range_end == -1) {
range_end = NR - 1;
}
}
}
END {
print "start =", range_start, "end =", range_end, "mean =", sum / count
}
You can try below:
for r in *; do
awk -v r=$r -F' ' \
'NR==1{b=$2;v=$4;next}{if(r >= b && r <= $2){m=(v+$4)/2; print m; exit}; b=$2;v=$4}' bigfile.txt
done
Explanation:
First pass it saves column 2 & 4 into temp variables. For all other passes it checks if filename r is between the begin range (previous coluimn 2) and end range (current column 2).
It then works out the mean and prints the result.

how to write bash script in ubuntu to normalize the index of text comparison

I had a input which is a result from text comparison. It is in a very simple format. It has 3 columns, position, original texts and new texts.
But some of the records looks like this
4 ATCG ATCGC
10 1234 123
How to write the short script to normalize it to
7 G GC
12 34 3
probably, the whole original texts and the whole new text is like below respectively
ACCATCGGA1234
ACCATCGCGA123
"Normalize" means "trying to move the position in the first column to the position that changes gonna occur", or "we would remove the common prefix ATG, add its length 3 to the first field; similarly on line 2 the prefix we remove is length 2"
This script
awk '
BEGIN {OFS = "\t"}
function common_prefix_length(str1, str2, max_len, idx) {
idx = 1
if (length(str1) < length(str2))
max_len = length(str1)
else
max_len = length(str2)
while (substr(str1, idx, 1) == substr(str2, idx, 1) && idx < max_len)
idx++
return idx - 1
}
{
len = common_prefix_length($2, $3)
print $1 + len, substr($2, len + 1), substr($3, len + 1)
}
' << END
4 ATCG ATCGC
10 1234 123
END
outputs
7 G GC
12 34 3

Adjacent number algorithm grouper

By which I mean this:
Given the input set of numbers:
1,2,3,4,5 becomes "1-5".
1,2,3,5,7,9,10,11,12,14 becomes "1-3, 5, 7, 9-12, 14"
This is the best I managed to come up with: [C#]
Which feels a little sloppy to me, so the question is, is there somehow more readable and/or elegant solution to this?
public static string[] FormatInts(int[] ints)
{
if (ints == null)
throw new ArgumentNullException("ints"); // hey what are you doing?
if (ints.Length == 0)
return new string[] { "" }; // nothing to process
if (ints.Length == 1)
return new string[] { ints[0].ToString() }; // nothing to process
Array.Sort<int>(ints); // need to sort these lil' babies
List<string> values = new List<string>();
int lastNumber = ints[0]; // start with the first number
int firstNumber = ints[0]; // same as above
for (int i = 1; i < ints.Length; i++)
{
int current = ints[i];
int difference = (lastNumber - current ); // compute difference between last number and current number
if (difference == -1) // the numbers are adjacent
{
if (firstNumber == 0) // this is the first of the adjacent numbers
{
firstNumber = lastNumber;
}
else // we're somehow in the middle or at the end of the adjacent number set
{
lastNumber = current;
continue;
}
}
else
{
if (firstNumber > 0 && firstNumber != lastNumber) // get ready to print a set of numbers
{
values.Add(string.Format("{0}-{1}", firstNumber, lastNumber));
firstNumber = 0; // reset
}
else // print a single value
{
values.Add(string.Format("{0}", lastNumber));
}
}
lastNumber = current;
}
if (firstNumber > 0) // if theres anything left, print it out
{
values.Add(string.Format("{0}-{1}", firstNumber, lastNumber));
}
return values.ToArray();
}
I've rewritten your code like this:
public static string[] FormatInts(int[] ints)
{
Array.Sort<int>(ints);
List<string> values = new List<string>();
for (int i = 0; i < ints.Length; i++)
{
int groupStart = ints[i];
int groupEnd = groupStart;
while (i < ints.Length - 1 && ints[i] - ints[i + 1] == -1)
{
groupEnd = ints[i + 1];
i++;
}
values.Add(string.Format(groupEnd == groupStart ? "{0}":"{0} - {1}", groupStart, groupEnd));
}
return values.ToArray();
}
And then:
/////////////////
int[] myInts = { 1,2,3,5,7,9,10,11,12,14 };
string[] result = FormatInts(myInts); // now result haves "1-3", "5", "7", "9-12", "14"
See How would you display an array of integers as a set of ranges? (algorithm)
My answer to the above question:
void ranges(int n; int a[n], int n)
{
qsort(a, n, sizeof(*a), intcmp);
for (int i = 0; i < n; ++i) {
const int start = i;
while(i < n-1 and a[i] >= a[i+1]-1)
++i;
printf("%d", a[start]);
if (a[start] != a[i])
printf("-%d", a[i]);
if (i < n-1)
printf(",");
}
printf("\n");
}
Pure functional Python:
#!/bin/env python
def group(nums):
def collect((acc, i_s, i_e), n):
if n == i_e + 1: return acc, i_s, n
return acc + ["%d"%i_s + ("-%d"%i_e)*(i_s!=i_e)], n, n
s = sorted(nums)+[None]
acc, _, __ = reduce(collect, s[1:], ([], s[0], s[0]))
return ", ".join(acc)
assert group([1,2,3,5,7,9,10,11,12,14]) == "1-3, 5, 7, 9-12, 14"
I'm a bit late to the party, but anyway, here is my version using Linq:
public static string[] FormatInts(IEnumerable<int> ints)
{
var intGroups = ints
.OrderBy(i => i)
.Aggregate(new List<List<int>>(), (acc, i) =>
{
if (acc.Count > 0 && acc.Last().Last() == i - 1) acc.Last().Add(i);
else acc.Add(new List<int> { i });
return acc;
});
return intGroups
.Select(g => g.First().ToString() + (g.Count == 1 ? "" : "-" + g.Last().ToString()))
.ToArray();
}
Looks clear and straightforward to me. You can simplify a bit if you either assume the input array is sorted, or sort it yourself before further processing.
The only tweak I'd suggest would be to reverse the subtraction:
int difference = (current - lastNumber);
... simply because I find it easier to work with positive differences. But your code is a pleasure to read!
As I wrote in comment, I am not fan of the use of value 0 as flag, making firstNumber both a value and a flag.
I did a quick implementation of the algorithm in Java, boldly skipping the validity tests you already correctly covered...
public class IntListToRanges
{
// Assumes all numbers are above 0
public static String[] MakeRanges(int[] numbers)
{
ArrayList<String> ranges = new ArrayList<String>();
Arrays.sort(numbers);
int rangeStart = 0;
boolean bInRange = false;
for (int i = 1; i <= numbers.length; i++)
{
if (i < numbers.length && numbers[i] - numbers[i - 1] == 1)
{
if (!bInRange)
{
rangeStart = numbers[i - 1];
bInRange = true;
}
}
else
{
if (bInRange)
{
ranges.add(rangeStart + "-" + numbers[i - 1]);
bInRange = false;
}
else
{
ranges.add(String.valueOf(numbers[i - 1]));
}
}
}
return ranges.toArray(new String[ranges.size()]);
}
public static void ShowRanges(String[] ranges)
{
for (String range : ranges)
{
System.out.print(range + ","); // Inelegant but quickly coded...
}
System.out.println();
}
/**
* #param args
*/
public static void main(String[] args)
{
int[] an1 = { 1,2,3,5,7,9,10,11,12,14,15,16,22,23,27 };
int[] an2 = { 1,2 };
int[] an3 = { 1,3,5,7,8,9,11,12,13,14,15 };
ShowRanges(MakeRanges(an1));
ShowRanges(MakeRanges(an2));
ShowRanges(MakeRanges(an3));
int L = 100;
int[] anr = new int[L];
for (int i = 0, c = 1; i < L; i++)
{
int incr = Math.random() > 0.2 ? 1 : (int) Math.random() * 3 + 2;
c += incr;
anr[i] = c;
}
ShowRanges(MakeRanges(anr));
}
}
I won't say it is more elegant/efficient than your algorithm, of course... Just something different.
Note that 1,5,6,9 can be written either 1,5-6,9 or 1,5,6,9, not sure what is better (if any).
I remember having done something similar (in C) to group message numbers to Imap ranges, as it is more efficient. A useful algorithm.
Perl
With input validation/pre-sorting
You can easily get the result as a LoL if you need to do something more fancy than
just return a string.
#!/usr/bin/perl -w
use strict;
use warnings;
use Scalar::Util qw/looks_like_number/;
sub adjacenify {
my #input = #_;
# Validate and sort
looks_like_number $_ or
die "Saw '$_' which doesn't look like a number" for #input;
#input = sort { $a <=> $b } #input;
my (#output, #range);
#range = (shift #input);
for (#input) {
if ($_ - $range[-1] <= 1) {
push #range, $_ unless $range[-1] == $_; # Prevent repetition
}
else {
push #output, [ #range ];
#range = ($_);
}
}
push #output, [ #range ] if #range;
# Return the result as a string. If a sequence is size 1, then it's just that number.
# Otherwise, it's the first and last number joined by '-'
return join ', ', map { 1 == #$_ ? #$_ : join ' - ', $_->[0], $_->[-1] } #output;
}
print adjacenify( qw/1 2 3 5 7 9 10 11 12 14/ ), "\n";
print adjacenify( 1 .. 5 ), "\n";
print adjacenify( qw/-10 -9 -8 -1 0 1 2 3 5 7 9 10 11 12 14/ ), "\n";
print adjacenify( qw/1 2 4 5 6 7 100 101/), "\n";
print adjacenify( qw/1 62/), "\n";
print adjacenify( qw/1/), "\n";
print adjacenify( qw/1 2/), "\n";
print adjacenify( qw/1 62 63/), "\n";
print adjacenify( qw/-2 0 0 2/), "\n";
print adjacenify( qw/-2 0 0 1/), "\n";
print adjacenify( qw/-2 0 0 1 2/), "\n";
Output:
1 - 3, 5, 7, 9 - 12, 14
1 - 5
-10 - -8, -1 - 3, 5, 7, 9 - 12, 14
1 - 2, 4 - 7, 100 - 101
1, 62
1
1 - 2
1, 62 - 63
-2, 0, 2
-2, 0 - 1
-2, 0 - 2
-2, 0 - 2
And a nice recursive solution:
sub _recursive_adjacenify($$);
sub _recursive_adjacenify($$) {
my ($input, $range) = #_;
return $range if ! #$input;
my $number = shift #$input;
if ($number - $range->[-1] <= 1) {
return _recursive_adjacenify $input, [ #$range, $number ];
}
else {
return $range, _recursive_adjacenify $input, [ $number ];
}
}
sub recursive_adjacenify {
my #input = #_;
# Validate and sort
looks_like_number $_ or
die "Saw '$_' which doesn't look like a number" for #input;
#input = sort { $a <=> $b } #input;
my #output = _recursive_adjacenify \#input, [ shift #input ];
# Return the result as a string. If a sequence is size 1,
# then it's just that number.
# Otherwise, it's the first and last number joined by '-'
return join ', ', map { 2 == #$_ && $_->[0] == $_->[1] ? $_->[0] :
1 == #$_ ? #$_ :
join ' - ', $_->[0], $_->[-1] } #output;
}
Short and sweet Ruby
def range_to_s(range)
return range.first.to_s if range.size == 1
return range.first.to_s + "-" + range.last.to_s
end
def format_ints(ints)
range = []
0.upto(ints.size-1) do |i|
range << ints[i]
unless (range.first..range.last).to_a == range
return range_to_s(range[0,range.length-1]) + "," + format_ints(ints[i,ints.length-1])
end
end
range_to_s(range)
end
My first thought, in Python:
def seq_to_ranges(seq):
first, last = None, None
for x in sorted(seq):
if last != None and last + 1 != x:
yield (first, last)
first = x
if first == None: first = x
last = x
if last != None: yield (first, last)
def seq_to_ranges_str(seq):
return ", ".join("%d-%d" % (first, last) if first != last else str(first) for (first, last) in seq_to_ranges(seq))
Possibly could be cleaner, but it's still waaay easy.
Plain translation to Haskell:
import Data.List
seq_to_ranges :: (Enum a, Ord a) => [a] -> [(a, a)]
seq_to_ranges = merge . foldl accum (id, Nothing) . sort where
accum (k, Nothing) x = (k, Just (x, x))
accum (k, Just (a, b)) x | succ b == x = (k, Just (a, x))
| otherwise = (k . ((a, b):), Just (x, x))
merge (k, m) = k $ maybe [] (:[]) m
seq_to_ranges_str :: (Enum a, Ord a, Show a) => [a] -> String
seq_to_ranges_str = drop 2 . concatMap r2s . seq_to_ranges where
r2s (a, b) | a /= b = ", " ++ show a ++ "-" ++ show b
| otherwise = ", " ++ show a
About the same.
Transcript of an interactive J session (user input is indented 3 spaces, text in ASCII boxes is J output):
g =: 3 : '<#~."1((y~:1+({.,}:)y)#y),.(y~:(}.y,{:y)-1)#y'#/:~"1
g 1 2 3 4 5
+---+
|1 5|
+---+
g 1 2 3 5 7 9 10 11 12 14
+---+-+-+----+--+
|1 3|5|7|9 12|14|
+---+-+-+----+--+
g 12 2 14 9 1 3 10 5 11 7
+---+-+-+----+--+
|1 3|5|7|9 12|14|
+---+-+-+----+--+
g2 =: 4 : '<(>x),'' '',>y'/#:>#:(4 :'<(>x),''-'',>y'/&.>)#((<#":)"0&.>#g)
g2 12 2 14 9 1 3 10 5 11 7
+---------------+
|1-3 5 7 9-12 14|
+---------------+
(;g2) 5 1 20 $ (i.100) /: ? 100 $ 100
+-----------------------------------------------------------+
|20 39 82 33 72 93 15 30 85 24 97 60 87 44 77 29 58 69 78 43|
| |
|67 89 17 63 34 41 53 37 61 18 88 70 91 13 19 65 99 81 3 62|
| |
|31 32 6 11 23 94 16 73 76 7 0 75 98 27 66 28 50 9 22 38|
| |
|25 42 86 5 55 64 79 35 36 14 52 2 57 12 46 80 83 84 90 56|
| |
| 8 96 4 10 49 71 21 54 48 51 26 40 95 1 68 47 59 74 92 45|
+-----------------------------------------------------------+
|15 20 24 29-30 33 39 43-44 58 60 69 72 77-78 82 85 87 93 97|
+-----------------------------------------------------------+
|3 13 17-19 34 37 41 53 61-63 65 67 70 81 88-89 91 99 |
+-----------------------------------------------------------+
|0 6-7 9 11 16 22-23 27-28 31-32 38 50 66 73 75-76 94 98 |
+-----------------------------------------------------------+
|2 5 12 14 25 35-36 42 46 52 55-57 64 79-80 83-84 86 90 |
+-----------------------------------------------------------+
|1 4 8 10 21 26 40 45 47-49 51 54 59 68 71 74 92 95-96 |
+-----------------------------------------------------------+
Readable and elegant are in the eye of the beholder :D
That was a good exercise! It suggests the following segment of Perl:
sub g {
my ($i, #r, #s) = 0, local #_ = sort {$a<=>$b} #_;
$_ && $_[$_-1]+1 == $_[$_] || push(#r, $_[$_]),
$_<$#_ && $_[$_+1]-1 == $_[$_] || push(#s, $_[$_]) for 0..$#_;
join ' ', map {$_ == $s[$i++] ? $_ : "$_-$s[$i-1]"} #r;
}
Addendum
In plain English, this algorithm finds all items where the previous item is not one less, uses them for the lower bounds; finds all items where the next item is not one greater, uses them for the upper bounds; and combines the two lists together item-by-item.
Since J is pretty obscure, here's a short explanation of how that code works:
x /: y sorts the array x on y. ~ can make a dyadic verb into a reflexive monad, so /:~ means "sort an array on itself".
3 : '...' declares a monadic verb (J's way of saying "function taking one argument"). # means function composition, so g =: 3 : '...' # /:~ means "g is set to the function we're defining, but with its argument sorted first". "1 says that we operate on arrays, not tables or anything of higher dimensionality.
Note: y is always the name of the only argument to a monadic verb.
{. takes the first element of an array (head) and }: takes all but the last (curtail). ({.,}:)y effectively duplicates the first element of y and lops off the last element. 1+({.,}:)y adds 1 to it all, and ~: compares two arrays, true wherever they are different and false wherever they are the same, so y~:1+({.,}:)y is an array that is true in all the indices of y where an element is not equal to one more than the element that preceded it. (y~:1+({.,}:)y)#y selects all elements of y where the property stated in the previous sentence is true.
Similarly, }. takes all but the first element of an array (behead) and {: takes the last (tail), so }.y,{:y is all but the first element of y, with the last element duplicated. (}.y,{:y)-1 subtracts 1 to it all, and again ~: compares two arrays item-wise for non-equality while # picks.
,. zips the two arrays together, into an array of two-element arrays. ~. nubs a list (eliminates duplicates), and is given the "1 rank, so it operates on the inner two-element arrays rather than the top-level array. This is # composed with <, which puts each subarray into a box (otherwise J will extend each subarray again to form a 2D table).
g2 is a mess of boxing and unboxing (otherwise J will pad strings to equal length), and is pretty uninteresting.
Here's my Haskell entry:
runs lst = map showRun $ runs' lst
runs' l = reverse $ map reverse $ foldl newOrGlue [[]] l
showRun [s] = show s
showRun lst = show (head lst) ++ "-" ++ (show $ last lst)
newOrGlue [[]] e = [[e]]
newOrGlue (curr:other) e | e == (1 + (head curr)) = ((e:curr):other)
newOrGlue (curr:other) e | otherwise = [e]:(curr:other)
and a sample run:
T> runs [1,2,3,5,7,9,10,11,12,14]
["1-3","5","7","9-12","14"]
Erlang , perform also sort and unique on input and can generate programmatically reusable pair and also a string representation.
group(List) ->
[First|_] = USList = lists:usort(List),
getnext(USList, First, 0).
getnext([Head|Tail] = List, First, N) when First+N == Head ->
getnext(Tail, First, N+1);
getnext([Head|Tail] = List, First, N) ->
[ {First, First+N-1} | getnext(List, Head, 0) ];
getnext([], First, N) -> [{First, First+N-1}].
%%%%%% pretty printer
group_to_string({X,X}) -> integer_to_list(X);
group_to_string({X,Y}) -> integer_to_list(X) ++ "-" ++ integer_to_list(Y);
group_to_string(List) -> [group_to_string(X) || X <- group(List)].
Test getting programmatically reusable pairs:
shell> testing:group([34,3415,56,58,57,11,12,13,1,2,3,3,4,5]).
result> [{1,5},{11,13},{34,34},{56,58},{3415,3415}]
Test getting "pretty" string:
shell> testing:group_to_string([34,3415,56,58,57,11,12,13,1,2,3,3,4,5]).
result> ["1-5","11-13","34","56-58","3415"]
hope it helps
bye
VBA
Public Function convertListToRange(lst As String) As String
Dim splLst() As String
splLst = Split(lst, ",")
Dim x As Long
For x = 0 To UBound(splLst)
Dim groupStart As Integer
groupStart = splLst(x)
Dim groupEnd As Integer
groupEnd = groupStart
Do While (x <= UBound(splLst) - 1)
If splLst(x) - splLst(x + 1) <> -1 Then Exit Do
groupEnd = splLst(x + 1)
x = x + 1
Loop
convertListToRange = convertListToRange & IIf(groupStart = groupEnd, groupStart & ",", groupStart & "-" & groupEnd & ",")
Next x
convertListToRange = Left(convertListToRange, Len(convertListToRange) - 1)
End Function
convertListToRange("1,2,3,7,8,9,11,12,99,100,101")
Return: "1-3,7-9,11-12,99-101"

Resources