Count consecutive repeated elements in a sequence using XQuery - xpath

Given the sequence [a a b c c c a d d e e e f g h h],
the output should be [1 2 1 1 2 3 1 1 2 1 2 3 1 1 1 2].
I have tried to use recursion, but with no luck. Please help; thanks in anticipation.
Note: I am using an XQuery 1.0 implementation.
One of my failed implementations looks like this:
declare function local:test($sequence,$count){
for $counter in (1 to count($sequence))
let $maxIndex := count($sequence)
return
if (matches(subsequence($sequence,1,$maxIndex)[$counter],subsequence($sequence,1,$maxIndex)[$counter + +1])) then let $count := $count + 1 return $count[last()]
else let $count := 1 return $count[last()]
};

You are correct, recursion is a viable way to go here. The following function walks the sequence from end to start. For each position, local:count() checks whether the preceding element equals the current one. If so, it calls itself recursively on the preceding position and adds 1; otherwise the run of repeated elements starts here and 1 is returned.
Finally, the resulting sequence is reversed once more so that it matches the order of the incoming sequence.
declare function local:count($sequence, $pos) {
  if ($sequence[$pos - 1] = $sequence[$pos])
  then 1 + local:count($sequence, $pos - 1)
  else 1
};
declare function local:test($sequence){
  reverse(
    for $pos in reverse(1 to count($sequence))
    return local:count($sequence, $pos)
  )
};
let $test := ("a","a", "b", "c", "c", "c", "a", "d", "d", "e", "e", "e", "f", "g", "h", "h")
return local:test($test)
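For readers who want to check the expected output outside XQuery, here is a minimal Python sketch of the same look-at-the-previous-element idea, done as a single forward pass instead of recursion. The function name run_positions is made up for this illustration and does not appear in the question.

```python
def run_positions(seq):
    """For each element, return its 1-based position within its run of equal neighbours."""
    out = []
    for i, item in enumerate(seq):
        if i > 0 and seq[i - 1] == item:
            out.append(out[-1] + 1)  # same as the previous element: extend the run
        else:
            out.append(1)            # first element or a new value: restart at 1
    return out


print(run_positions("a a b c c c a d d e e e f g h h".split()))
# [1, 2, 1, 1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 1, 1, 2]
```

Because each result only depends on the previous result, no reversal step is needed in this form.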

I got a working solution for my question. Credits: odie_63 @ http://odieweblog.wordpress.com/
declare namespace xf = "http://tempuri.org/OSBTestProject/Resources/XQuery/test/";
declare function local:sequence-group($seq as item()*) as item()*
{
let $start-of-group :=
fn:index-of(
for $i in 1 to count($seq)
let $prev := $seq[$i - 1]
return if ($prev != $seq[$i] or not($prev)) then 1 else 0
, 1
)
return
for $i in 1 to count($seq)
return $i - $start-of-group[. le $i][last()] + 1
};
declare function xf:test($test as xs:string) as xs:integer*
{
let $test1 := tokenize($test, ',')
return local:sequence-group($test1)
};
declare variable $test as xs:string external;
xf:test($test)
Input: a,a,b,c,c,c,a,d,d,e,e,e,f,g,h,h
Output: 1 2 1 1 2 3 1 1 2 1 2 3 1 1 1 2

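Not tested, but this should work, and is fairly simple. For each item it finds the position of the last preceding item that differs (0 if there is none) and subtracts that from the current position.
declare function local:test($sequence)
{
  for $item at $current-pos in $sequence
  let $different-pos :=
    (0, for $p in 1 to $current-pos - 1
        where $sequence[$p] != $item
        return $p)[last()]
  return $current-pos - $different-pos
};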

Related

pseudocode for this program (Matlab)

I have three sets, say:
a=[1 1 1 1];
b=[2 2 2];
c=[3 3];
Now, I have to find all unique combinations of 3 elements taken from all the sets.
In MATLAB, I can do it like this:
>> a=[1 1 1 1];
>> b=[2 2 2];
>> c=[3 3];
>> all=[a b c];
>> nchoosek(all,3)
>> unique(nchoosek(all,3),'rows')
The output is:
1 1 1
1 1 2
1 1 3
1 2 2
1 2 3
1 3 3
2 2 2
2 2 3
2 3 3
How would I write the logic behind this program in pseudocode?
Here's how I would do it:
Create a dictionary of item counts.
Recurse on this dictionary k times, taking care not to pick items that are not or no longer in the pool.
When recursing, skip items that are smaller (by some criterion) than the current item in order to get a unique list.
In pseudocode:
function ucombok_rec(count, k, lowest)
{
    if (k == 0) return [[]];
    var res = [];
    for (item in count) {
        if (item >= lowest && count[item] > 0) {
            count[item]--;
            var combo = ucombok_rec(count, k - 1, item);
            for (c in combo) res ~= [[item] ~ c];
            count[item]++;
        }
    }
    return res;
}
function ucombok(s, k)
{
if (!s) return []; // nothing to do
var count = {};
var lowest = min(s); // min. value in set
for (item in s) count[item]++; // create item counts
return ucombok_rec(count, k, lowest); // recurse
}
In this code, [] denotes a list or vector, {} a dictionary or map and the tilde ~ means list concatenation. The count decrements and increments around the recursion remove an item temporarily from the item pool.
In your example, where the pool is made up of three lists, you'd call the function like this:
c = ucombok(a ~ b ~ c, 3)
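The pseudocode above can be sketched as runnable Python. This is one possible rendering under a few assumptions: the item counts live in a collections.Counter, items are compared by their natural sort order, and the name unique_combinations is invented for this example.

```python
from collections import Counter


def unique_combinations(pool, k):
    """Return the distinct k-element combinations of a multiset, in sorted order."""
    counts = Counter(pool)
    items = sorted(counts)

    def rec(start, k):
        if k == 0:
            return [[]]
        res = []
        for idx in range(start, len(items)):
            item = items[idx]
            if counts[item] > 0:
                counts[item] -= 1             # take the item out of the pool
                for tail in rec(idx, k - 1):  # only items >= current keep results unique
                    res.append([item] + tail)
                counts[item] += 1             # put it back
        return res

    return rec(0, k)


a, b, c = [1, 1, 1, 1], [2, 2, 2], [3, 3]
for combo in unique_combinations(a + b + c, 3):
    print(*combo)
```

For the three lists from the question this prints the same nine rows as the MATLAB call; 3 3 3 is absent because the pool holds only two 3s.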

Sorting a table by nested value in Lua [duplicate]

I have a program which aggregates, for every user, the total number of downloads performed together with the total amount of downloaded data in kB.
local table = {}
table[userID] = {5, 23498502}
My aim is for the printTable function to output the entire list of users in descending order of the amount of kB downloaded, v[2]:
local aUsers = {}
...
function topUsers(key, nDownloads, totalSize)
if aUsers[key] then
aUsers[key][1] = aUsers[key][1] + nDownloads
aUsers[key][2] = aUsers[key][2] + totalSize
else
aUsers[key] = {nDownloads, totalSize}
end
end
function printTable(t)
local str = ""
-- How to sort 't' so that it prints in v[2] descending order?
for k,v in pairs(t) do
str = str .. k .. ", " .. v[1] .. ", " .. v[2] .. "\n"
end
return str
end
...
Any ideas how I could do that?
You can get the keys into a separate table and then sort that table using the criteria you need:
local t = {
a = {1,2},
b = {2,3},
c = {4,1},
d = {9,9},
}
local keys = {}
for k in pairs(t) do table.insert(keys, k) end
table.sort(keys, function(a, b) return t[a][2] > t[b][2] end)
for _, k in ipairs(keys) do print(k, t[k][1], t[k][2]) end
will print:
d 9 9
b 2 3
a 1 2
c 4 1
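For comparison, here is the same collect-the-keys-then-sort approach sketched in Python rather than Lua. The dictionary name users is an assumption for this example; the data mirrors the table t above.

```python
users = {"a": (1, 2), "b": (2, 3), "c": (4, 1), "d": (9, 9)}

# Sort the keys by the second field of each value, largest first.
keys = sorted(users, key=lambda k: users[k][1], reverse=True)

for k in keys:
    print(k, *users[k])
```

As in the Lua version, the mapping itself stays unordered; only the separate list of keys is sorted.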

convert 1mio Perl bitvector to .5 mio integer array

I have a bit vector of length 1 million coming from the database, with each pair of bits
representing one integer for compressed storage:
the bit string: 10110001 from the database
the array: 2 3 0 1 needed for further processing
The current solution is:
my $bitstring =
    $sth->fetchrow_array();    # has 2 bits / snp, need 2 convert to I
my $snp_no = 1000000;
my $j = 0;
for ( my $i = 0; $i <= $snp_no - 1; $i++ ) {
    my $A2 = substr( $bitstring, $j, 2 );
    $j = $j + 2;
    my $vec = Bit::Vector->new_Bin( 32, $A2 );
    $bitArray[$i] = $vec->to_Dec();
}
This does work, but it is way too slow: processing one such vector takes a second,
and with thousands of them the processing will take hours.
Does someone have an idea how this can be made faster?
If you start with the data "packed", use the following:
my @decode =
    map [
        ($_ >> 6) & 3,
        ($_ >> 4) & 3,
        ($_ >> 2) & 3,
        ($_ >> 0) & 3,
    ],
    0x00..0xFF;
my @nums = map @{ $decode[$_] }, unpack 'C*', $bytes;
For me, this takes roughly 1.1 s for 1,000,000 bytes, which is to say 1.1 microseconds per byte.
A specialized pure C solution takes about half the time.
use Inline C => <<'__EOI__';
void decode(AV* av, SV* sv) {
STRLEN len;
U8* p = (U8*)SvPVbyte(sv, len);
av_fill(av, len*4);
av_clear(av);
while (len--) {
av_push(av, newSViv(*p >> 6 ));
av_push(av, newSViv(*p >> 4 & 3));
av_push(av, newSViv(*p >> 2 & 3));
av_push(av, newSViv(*(p++) & 3));
}
}
__EOI__
decode(\my @nums, $bytes);
If you start with the binary representation of the bits, use the following first:
my $bytes = pack('B*', $bits);
(This assumes the number of bits is divisible by 8. Left-pad with zeroes if it isn't, and don't forget to remove the extra entries this creates in @nums.)
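The lookup-table idea is not specific to Perl. Below is a minimal Python sketch of it, assuming the input is a bytes object holding four 2-bit values per byte, high bits first. The names DECODE, decode_packed and pack_bits are invented for this illustration, and no timing claim is made for it.

```python
# One table entry per possible byte value: the four 2-bit fields, high bits first.
DECODE = [((b >> 6) & 3, (b >> 4) & 3, (b >> 2) & 3, b & 3) for b in range(256)]


def decode_packed(data):
    """Expand each byte of `data` into four integers in the range 0..3."""
    out = []
    for byte in data:
        out.extend(DECODE[byte])
    return out


def pack_bits(bits):
    """Turn a non-empty string of '0'/'1' characters (length divisible by 8) into bytes."""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


print(decode_packed(pack_bits("10110001")))  # [2, 3, 0, 1]
```

The table is built once, so the per-byte work is a single index operation plus a list extend.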
Is this any faster?
#!/usr/bin/env perl
use warnings;
use strict;
my %bin2dec = (
'0' => 0,
'1' => 1,
'00' => 0,
'01' => 1,
'10' => 2,
'11' => 3
);
#warn "$_ => $bin2dec{$_}\n" for sort keys %bin2dec;
my @results;
while (<>)
{
    foreach my $bitstring (/([01]+)/g)
    {
        my @result;
        #warn "bitstring is $bitstring\n";
        for ( my $i = 0 ; $i < length($bitstring) ; $i += 2 )
        {
            #warn "value is ", substr( $bitstring, $i, 2 ), "\n";
            push( @result, $bin2dec{ substr( $bitstring, $i, 2 ) } );
        }
        push( @results, \@result );
    }
}
foreach my $result (@results)
{
    print join( ' ', @$result ), "\n";
}
saved to file b2dec. Example output:
$ echo 10010101010010010101001111001011010101w00101010 | b2dec
2 1 1 1 1 0 2 1 1 1 0 3 3 0 2 3 1 1 1
0 2 2 2
$ b2dec b2dec
0
0
1
1
0
0
1
1
2
3
1
0

How can I generate this pattern of numbers?

Given inputs 1-32 how can I generate the below output?
in  out
1   1
2   1
3   1
4   1
5   2
6   2
7   2
8   2
9   1
10  1
11  1
12  1
13  2
14  2
15  2
16  2
...
Edit: Not homework, just lack of sleep.
I am working in C#, but I was looking for a language-agnostic algorithm.
Edit 2: To provide a bit more background: I have an array of 32 items that represents a two-dimensional checkerboard. I needed the last part of this algorithm to convert between the vector and the graph, where the index aligns with the black squares on the checkerboard.
Final Code:
--Index;
int row = Index >> 2;
int col = 2 * Index - (((Index & 0x04) >> 2 == 1) ? 2 : 1);
Assuming that you can use bitwise operators, you can check what the inputs with the same output have in common. In this case I preferred using inputs 0-31 because it's simpler (you can just subtract 1 from the actual values).
What do you have? (inputs written in binary)
0b0000 -> 1
0b0001 -> 1
0b0010 -> 1
0b0011 -> 1
0b0100 -> 2
0b0101 -> 2
0b0110 -> 2
0b0111 -> 2
0b1000 -> 1
0b1001 -> 1
0b1010 -> 1
0b1011 -> 1
0b1100 -> 2
...
It's quite easy once you notice that the third bit (the one with value 4) is always 0 when the output should be 1 and, vice versa, always 1 when the output should be 2,
so:
char codify(char input)
{
return ((((input-1)&0x04)>>2 == 1)?(2):(1));
}
EDIT
As suggested by comment it should work also with
char codify(char input)
{
return ((input-1 & 0x04)?(2):(1));
}
because in some languages (like C) 0 will evaluate to false and any other value to true. I'm not sure if it works in C# too because I've never programmed in that language. Of course this is not a language-agnostic answer but it's more C-elegant!
in C:
char output = "11112222"[input-1 & 7];
or
char output = (input-1 >> 2 & 1) + '1';
or after an idea of FogleBird:
char output = input - 1 & 4 ? '2' : '1';
or after an idea of Steve Jessop:
char output = '2' - (0x1e1e1e1e >> input & 1);
or
char output = "12"[input-1>>2&1];
C operator precedence is evil. Do use my code as bad examples :-)
You could use a combination of integer division and modulo 2 (even-odd): There are blocks of four, and the 1st, 3rd, 5th block and so on should result in 1, the 2nd, 4th, 6th and so on in 2.
s := ((n-1) div 4) mod 2;
return s + 1;
div is supposed to be integer division.
EDIT: Turned first mod into a div, of course
Just for laughs, here's a technique that maps inputs 1..32 to two possible outputs, in any arbitrary way known at compile time:
// binary 1111 0000 1111 0000 1111 0000 1111 0000
const uint32_t lu_table = 0xF0F0F0F0;
// select 1 bit out of the table
if (((1 << (input-1)) & lu_table) == 0) {
return 1;
} else {
return 2;
}
By changing the constant, you can handle whatever pattern of outputs you want. Obviously in your case there's a pattern which means it can probably be done faster (since no shift is needed), but everyone else already did that. Also, it's more common for a lookup table to be an array, but that's not necessary here.
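The single-word lookup table can be tried out quickly in Python. This is only a sketch of the technique described above; the names LU_TABLE and codify are reused for readability and the range check is left to the caller.

```python
LU_TABLE = 0xF0F0F0F0  # bit i (counting from 0) is 1 where input i + 1 should map to 2


def codify(n):
    """Map an input in 1..32 to 1 or 2 by selecting one bit of LU_TABLE."""
    return 2 if (LU_TABLE >> (n - 1)) & 1 else 1


print([codify(n) for n in range(1, 17)])
# [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2]
```

Swapping in a different constant changes the mapping without touching the function, which is the point of the technique.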
The accepted answer, return ((((input-1)&0x04)>>2 == 1)?(2):(1));, uses a branch, while I would have just written the following (note the parentheses around the shift, since + binds more tightly than >>):
return 1 + (((input-1) & 0x04) >> 2);
Python
def f(x):
    return int((x - 1) % 8 > 3) + 1
Or:
def f(x):
    return 2 if (x - 1) & 4 else 1
Or:
def f(x):
    return (((x - 1) & 4) >> 2) + 1
In Perl:
#!/usr/bin/perl
use strict; use warnings;
sub it {
    return sub {
        my ($n) = @_;
        return 1 if 4 > ($n - 1) % 8;
        return 2;
    }
}
my $it = it();
for my $x (1 .. 32) {
    printf "%2d:%d\n", $x, $it->($x);
}
Or:
sub it {
    return sub {
        my ($n) = @_;
        use integer;
        return 1 + ( (($n - 1) / 4) % 2 );
    }
}
In Haskell:
vec2graph :: Int -> Char
vec2graph n = (cycle "11112222") !! (n-1)
That's pretty straightforward:
if (input == "1") { Console.WriteLine(1); }
if (input == "2") { Console.WriteLine(1); }
if (input == "3") { Console.WriteLine(1); }
if (input == "4") { Console.WriteLine(1); }
if (input == "5") { Console.WriteLine(2); }
if (input == "6") { Console.WriteLine(2); }
if (input == "7") { Console.WriteLine(2); }
if (input == "8") { Console.WriteLine(2); }
etc...
HTH
It depends on the language you are using.
In VB.NET, you could do something like this:
for i as integer = 1 to 32
dim intAnswer as integer = 1 + (Math.Floor((i-1) / 4) mod 2)
' Do whatever you need to do with it
next
It might look complicated, but that's only because I put it on a single line.
In Groovy:
def codify = { i ->
return (((((i-1)/4).intValue()) %2 ) + 1)
}
Then:
def list = 1..16
list.each {
println "${it}: ${codify(it)}"
}
char codify(char input)
{
return (((input-1) & 0x04)>>2) + 1;
}
Using Python:
output = 1
for i in range(1, 32+1):
    print("%d. %d" % (i, output))
    if i % 4 == 0:
        output = 2 if output == 1 else 1
JavaScript
My first thought was
output = ((input - 1 & 4) >> 2) + 1;
but drhirsch's code works fine in JavaScript:
output = input - 1 & 4 ? 2 : 1;
and the ridiculous (related to FogleBird's answer):
output = -~((input - 1) % 8 > 3);
Java, using the modulo operation ('%') to give the cyclic behaviour (0, 1, 2, ..., 7) and then a ternary to map the result to 1 (?) or 2 (:).
...
public static void main(String[] args) {
    for (int i = 1; i <= 32; i++) {
        // subtract 1 first so that inputs 1-4 fall in the first half of each cycle
        System.out.println(i + "=" + ((i - 1) % 8 < 4 ? 1 : 2));
    }
}
Produces:
1=1 2=1 3=1 4=1 5=2 6=2 7=2 8=2 9=1
10=1 11=1 12=1 13=2 14=2 15=2 16=2
17=1 18=1 19=1 20=1 21=2 22=2 23=2
24=2 25=1 26=1 27=1 28=1 29=2 30=2
31=2 32=2

What's a good, non-recursive algorithm to calculate a Cartesian product?

Note
This is not a REBOL-specific question. You can answer it in any language.
Background
The REBOL language supports the creation of domain-specific languages known as "dialects" in REBOL parlance. I've created such a dialect for list comprehensions, which aren't natively supported in REBOL.
A good cartesian product algorithm is needed for list comprehensions.
The Problem
I've used meta-programming to solve this, by dynamically creating and then executing a sequence of nested foreach statements. It works beautifully. However, because it's dynamic, the code is not very readable. REBOL doesn't do recursion well. It rapidly runs out of stack space and crashes. So a recursive solution is out of the question.
In sum, I want to replace my meta-programming with a readable, non-recursive, "inline" algorithm, if possible. The solution can be in any language, as long as I can reproduce it in REBOL. (I can read just about any programming language: C#, C, C++, Perl, Oz, Haskell, Erlang, whatever.)
I should stress that this algorithm needs to support an arbitrary number of sets to be "joined", since list comprehension can involve any number of sets.
Three times faster, and less memory used (fewer recycles).
cartesian: func [
d [block! ]
/local len set i res
][
d: copy d
len: 1
res: make block! foreach d d [len: len * length? d]
len: length? d
until [
set: clear []
loop i: len [insert set d/:i/1 i: i - 1]
res: change/only res copy set
loop i: len [
unless tail? d/:i: next d/:i [break]
if i = 1 [break]
d/:i: head d/:i
i: i - 1
]
tail? d/1
]
head res
]
How about something like this:
#!/usr/bin/perl
use strict;
use warnings;
my @list1 = qw(1 2);
my @list2 = qw(3 4);
my @list3 = qw(5 6);
# Calculate the Cartesian Product
my @cp = cart_prod(\@list1, \@list2, \@list3);
# Print the result
foreach my $elem (@cp) {
    print join(' ', @$elem), "\n";
}
sub cart_prod {
    my @sets = @_;
    my @result;
    my $result_elems = 1;
    # Calculate the number of elements needed in the result
    map { $result_elems *= scalar @$_ } @sets;
    return undef if $result_elems == 0;
    # Go through each set and add the appropriate element
    # to each element of the result
    my $scale_factor = $result_elems;
    foreach my $set (@sets)
    {
        my $set_elems = scalar @$set; # Elements in this set
        $scale_factor /= $set_elems;
        foreach my $i (0 .. $result_elems - 1) {
            # Calculate the set element to place in this position
            # of the result set.
            my $pos = $i / $scale_factor % $set_elems;
            push @{$result[$i]}, $$set[ $pos ];
        }
    }
    return @result;
}
Which produces the following output:
1 3 5
1 3 6
1 4 5
1 4 6
2 3 5
2 3 6
2 4 5
2 4 6
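The index arithmetic in the Perl answer can be read as decoding each row number as a mixed-radix number, one digit per input set. Here is a minimal Python sketch of that reading; it builds rows one at a time rather than column by column, and the function name cart_prod is simply reused from above.

```python
def cart_prod(*sets):
    """Non-recursive Cartesian product: decode each row number as a mixed-radix index."""
    total = 1
    for s in sets:
        total *= len(s)

    result = []
    for n in range(total):        # empty range if any input set is empty
        row = []
        scale = total
        for s in sets:
            scale //= len(s)      # how many consecutive rows share one element of this set
            row.append(s[(n // scale) % len(s)])
        result.append(row)
    return result


for row in cart_prod([1, 2], [3, 4], [5, 6]):
    print(*row)
```

There is no recursion and no generated code, only two nested loops whose depth does not depend on the number of sets.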
For the sake of completeness, here's Robert Gamble's answer translated into REBOL:
REBOL []
cartesian: func [
{Given a block of sets, returns the Cartesian product of said sets.}
sets [block!] {A block containing one or more series! values}
/local
elems
result
row
][
result: copy []
elems: 1
foreach set sets [
elems: elems * (length? set)
]
for n 0 (elems - 1) 1 [
row: copy []
skip: elems
foreach set sets [
skip: skip / length? set
index: (mod to-integer (n / skip) length? set) + 1 ; REBOL is 1-based, not 0-based
append row set/(index)
]
append/only result row
]
result
]
foreach set cartesian [[1 2] [3 4] [5 6]] [
print set
]
; This returns the same thing Robert Gamble's solution did:
1 3 5
1 3 6
1 4 5
1 4 6
2 3 5
2 3 6
2 4 5
2 4 6
Here is Java code to generate the Cartesian product of an arbitrary number of sets, each with an arbitrary number of elements.
In this sample the list "ls" contains 4 sets (ls1, ls2, ls3 and ls4); as you can see, "ls" can contain any number of sets with any number of elements.
import java.util.*;
public class CartesianProduct {
private List <List <String>> ls = new ArrayList <List <String>> ();
private List <String> ls1 = new ArrayList <String> ();
private List <String> ls2 = new ArrayList <String> ();
private List <String> ls3 = new ArrayList <String> ();
private List <String> ls4 = new ArrayList <String> ();
public List <String> generateCartesianProduct () {
List <String> set1 = null;
List <String> set2 = null;
ls1.add ("a");
ls1.add ("b");
ls1.add ("c");
ls2.add ("a2");
ls2.add ("b2");
ls2.add ("c2");
ls3.add ("a3");
ls3.add ("b3");
ls3.add ("c3");
ls3.add ("d3");
ls4.add ("a4");
ls4.add ("b4");
ls.add (ls1);
ls.add (ls2);
ls.add (ls3);
ls.add (ls4);
boolean subsetAvailabe = true;
int setCount = 0;
try{
set1 = augmentSet (ls.get (setCount++), ls.get (setCount));
} catch (IndexOutOfBoundsException ex) {
if (set1 == null) {
set1 = ls.get(0);
}
return set1;
}
do {
try {
setCount++;
set1 = augmentSet(set1,ls.get(setCount));
} catch (IndexOutOfBoundsException ex) {
subsetAvailabe = false;
}
} while (subsetAvailabe);
return set1;
}
public List <String> augmentSet (List <String> set1, List <String> set2) {
List <String> augmentedSet = new ArrayList <String> (set1.size () * set2.size ());
for (String elem1 : set1) {
for(String elem2 : set2) {
augmentedSet.add (elem1 + "," + elem2);
}
}
set1 = null; set2 = null;
return augmentedSet;
}
public static void main (String [] arg) {
CartesianProduct cp = new CartesianProduct ();
List<String> cartesionProduct = cp.generateCartesianProduct ();
for (String val : cartesionProduct) {
System.out.println (val);
}
}
}
use strict;
print "@$_\n" for getCartesian(
    [qw(1 2)],
    [qw(3 4)],
    [qw(5 6)],
);
sub getCartesian {
    #
    my @input = @_;
    my @ret = map [$_], @{ shift @input };
    for my $a2 (@input) {
        @ret = map {
            my $v = $_;
            map [@$v, $_], @$a2;
        }
        @ret;
    }
    return @ret;
}
output
1 3 5
1 3 6
1 4 5
1 4 6
2 3 5
2 3 6
2 4 5
2 4 6
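The nested map in this answer grows the result one input set at a time: every partial row is extended with every element of the next set. A short Python sketch of the same loop, with the name get_cartesian chosen to echo the Perl sub:

```python
def get_cartesian(*sets):
    """Build the product one input set at a time, extending every partial row."""
    rows = [[]]
    for s in sets:
        rows = [row + [item] for row in rows for item in s]
    return rows


for row in get_cartesian([1, 2], [3, 4], [5, 6]):
    print(*row)
```

The only state carried between iterations is the list of partial rows, so the loop works for any number of sets.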
EDIT: This solution doesn't work. Robert Gamble's is the correct solution.
I brainstormed a bit and came up with this solution:
(I know most of you won't know REBOL, but it's a fairly readable language.)
REBOL []
sets: [[1 2 3] [4 5] [6]] ; Here's a set of sets
elems: 1
result: copy []
foreach set sets [elems: elems * (length? set)]
for n 1 elems 1 [
row: copy []
foreach set sets [
index: 1 + (mod (n - 1) length? set)
append row set/(index)
]
append/only result row
]
foreach row result [
    print row
]
This code produces:
1 4 6
2 5 6
3 4 6
1 5 6
2 4 6
3 5 6
(Upon first reading the numbers above, you may think there are duplicates. I did. But there aren't.)
Interestingly, this code uses almost the very same algorithm (1 + ((n - 1) % 9)) that torpedoed my Digital Root question.
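One plausible explanation for why this approach fails in general, offered here as an observation rather than something the author stated: indexing every set with the same counter modulo its length only yields distinct rows when the set lengths share no common factor. The sample sets have lengths 3, 2 and 1, so they happen to work. The Python sketch below, with made-up helper names, shows a case that does not.

```python
def naive_rows(sets):
    """The approach above: index every set with the same counter, modulo its length."""
    total = 1
    for s in sets:
        total *= len(s)
    return [[s[n % len(s)] for s in sets] for n in range(total)]


def all_distinct(rows):
    """True if no row occurs more than once."""
    return len({tuple(r) for r in rows}) == len(rows)


print(all_distinct(naive_rows([[1, 2, 3], [4, 5], [6]])))  # True: lengths 3, 2, 1
print(all_distinct(naive_rows([[1, 2], [3, 4]])))          # False: lengths 2 and 2
```

With two sets of length 2 the counter produces 1 3, 2 4, 1 3, 2 4, so half of the product is missing.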

Resources