Reading a file backwards in Mathematica -- How?

I have a large file (written in Mathematica) that contains n "records" and each of these records is a list of fixed length m, where n > 10,000 and 500 < m < 600 (bytes). Note, my system does not have the capacity to hold all records in memory --- the reason for writing them to a file. I have an application (in Mathematica) that needs to process these records in reverse order; i.e. the last record written out is the first record to be processed. How can I read these records from the file in reverse order?
Meanwhile (after some trial and error with Mathematica I/O) I found one solution. Note, this is a stripped down example of a possible solution.
fname = "testfile";
strm = OpenWrite[fname];
n = 10; (* In general, n could be very large *)
For[k = 1, k <= n, k++,
  (* Create list on each pass through this loop ... *)
  POt = {{k, k + 1}, {k + 2, k + 3}};
  Print[POt];
  (* Save to a file *)
  Write[strm, POt];
];
Close[strm];
(* 2nd pass to get byte offsets of each record written to file *)
strm = OpenRead[fname];
ByteIndx = {0};
For[i = 1, i <= n, i++,
  PIn = Read[strm];
  AppendTo[ByteIndx, StreamPosition[strm]];
];
(* Drop the final position (the end of the last record); ByteIndx[[i]] is now the byte offset where record i begins *)
ByteIndx = Drop[ByteIndx, -1];
(* Read records in reverse order *)
For[i = n, i >= 1, i--,
  SetStreamPosition[strm, ByteIndx[[i]]];
  PIn = Read[strm];
  Print[PIn];
  (* Process PIn ... *)
];
Close[strm];
It would be nice if the 2nd pass (to get the byte offsets) could be eliminated but I have not found how to do this yet... Also, these byte offsets could be written to a file (similar to how the records are handled) and then read back in one at a time, should there still be a memory problem.
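The second pass can also be avoided entirely by recording each record's byte offset during the write pass. Here is a minimal sketch of that idea in Python (for illustration only; the file name and record format are placeholders):

offsets = []
with open("testfile", "w") as f:
    for k in range(1, 11):                      # n = 10 records, as above
        offsets.append(f.tell())                # byte offset where record k starts
        f.write("{}\n".format([[k, k + 1], [k + 2, k + 3]]))

with open("testfile") as f:
    for pos in reversed(offsets):               # last record first
        f.seek(pos)
        record = f.readline().rstrip()
        print(record)                           # process the record here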

For the sake of posting an answer: your second pass can be written concisely as
strm = OpenRead[fname];
ByteIndx = Reap[While[Sow[StreamPosition[strm]]; !TrueQ[Read[strm] == EndOfFile]]][[2, 1, ;; -2]]
n = Length[ByteIndx]

Related

Making a matrix in a for loop

I am currently working with Mathematica and I got stuck on some technicalities.
Rvec[R_] := UnitVector[Length[R], RandomInteger[{1, Length[R]}]]
Fvec[R_] := R - Rvec[R] + Rvec[R]
vec[R_] := Module[{S = Fvec[R]}, If[Count[S, -1] > 0, R, S]]
Loop[R_, n_] := For[i = 1; L = R, i < n + 1, i++, L = vec[L]; Print[L]]
The idea is that I now have a loop going that randomly subtracts 1 from one entry of the list and adds it to another on each iteration, with the catch that no entry can drop below zero. The output I then get is a series of lists printed beneath each other.
Having done that I would like to know how I could put the entire output in the form of one matrix:
https://i.gyazo.com/a4ef70ba5670fd53003e0ac5ec1e434e.png
Instead of having the output like that, I would like to have it in matrix form, as in having this set of outputs placed in a larger set containing those sets as elements. This would greatly help me, as I would be able to manipulate and work with the entire output.
If you need to build a matrix by appending one vector at a time, you can do it like this:
vector = {1, 2, 3, 4, 5};
matrix = {}; (* Initialize matrix *)
Do[matrix = Append[matrix, vector], 5]; (* Construct matrix by adding line by line*)
MatrixForm[matrix] (* Print matrix *)
Please tell me if I didn't understand your problem properly.
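For comparison, here is a rough sketch of the same bookkeeping in Python (not Mathematica), showing the general pattern of collecting every intermediate state of the random-transfer step into one matrix instead of only printing it; the function names are mine:

import random

def step(state):
    # subtract 1 from a random entry and add 1 to another; reject the move
    # if any entry would drop below zero
    out = state[:]
    out[random.randrange(len(out))] -= 1
    out[random.randrange(len(out))] += 1
    return out if min(out) >= 0 else state

def run(state, n):
    matrix = [state]                 # collect states instead of printing them
    for _ in range(n):
        state = step(state)
        matrix.append(state)
    return matrix

for row in run([3, 3, 3, 3], 5):
    print(row)

In Mathematica, this kind of accumulation is exactly what NestList gives you directly.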

Select one number at a time between 0 & 10 billion in random order

Problem
I have a need to pick one unique random number at a time between 0 and 10,000,000,000 and do it till all numbers are selected. Essentially the behavior I need is a pre-built stack/queue with 10 billion numbers in random order, with no ability to push new items into it.
Not so good ways to solve:
There's no shortage of inefficient ways in my brain. Such as,
Persist generated numbers and check whether each newly generated random number has already been used; at some point this leads to an indefinite wait before a usable number is produced.
Persist all possible numbers in a table, pop a random row, and maintain the new row count for the next pick, etc. Not sure if this is good or bad.
Questions:
Are there other deterministic ways besides storing all possible combinations and using random?
For example, maintaining windows of available numbers, randomly selecting a window first, and then randomly selecting a number within that window, etc. (e.g. like this).
If not, what is the best data type for storing the numbers in a reasonably small amount of space?
More than 50% of the numbers won't fit in a 32-bit int, and a 64-bit long is wasteful: the largest number fits in 34 bits, so 30 bits per number would be wasted (>37 GB total).
Assuming this problem hasn't been solved already: what is a good data structure for storing the numbers, picking a random spot, and quickly adjusting the structure so that the next pick is fast?
Edit: Sorry for the ambiguity. The largest selectable number is 9,999,999,999 and the smallest is 1.
You ask: "Are there other deterministic ways besides storing all possible combinations and using random?"
Yes there is: encryption. Encryption with a given key guarantees a unique result for unique inputs, since it is reversible. Each key defines a one-to-one permutation of the possible inputs. You need an encryption of inputs in the range [1..10^10]. To deal with something that big you need 34-bit numbers, which go up to 17,179,869,183.
There is no standard 34-bit encryption. Depending on how much security you need, and how fast you need the numbers, you can either write your own simple, fast, insecure four-round Feistel cipher, or else, for something slower and more secure, use the Hasty Pudding cipher in 34-bit mode.
With either solution, if the first encryption gives a result outside the range, just encrypt the result again until the new result is within the range you want. The one-to-one property ensures that the final result of the chain of encryptions will be unique.
To generate a sequence of unique random-seeming numbers just encrypt 0, 1, 2, 3, 4, ... in order with the same key. Encryption guarantees that the results will be unique for that key. If you record how far you have got, then you can generate more unique numbers later, up to your 10 billion limit.
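A minimal sketch of this idea in Python, assuming a home-made (insecure) four-round Feistel permutation on 34-bit values with cycle-walking to stay inside the range; the SHA-256-based round function is just an illustrative choice, not a vetted cipher:

import hashlib

HALF_BITS = 17                       # a 34-bit block split into two 17-bit halves
HALF_MASK = (1 << HALF_BITS) - 1
LIMIT = 10**10                       # largest selectable value

def round_fn(half, rnd, key):
    # any keyed pseudo-random function of one half will do
    data = "{}:{}:{}".format(key, rnd, half).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & HALF_MASK

def feistel34(x, key, rounds=4):
    # balanced Feistel network: a bijection on [0, 2**34)
    left, right = x >> HALF_BITS, x & HALF_MASK
    for rnd in range(rounds):
        left, right = right, left ^ round_fn(right, rnd, key)
    return (left << HALF_BITS) | right

def unique_number(counter, key):
    # map counters 1..LIMIT to unique values in [1, LIMIT];
    # re-encrypt (cycle-walk) whenever the result falls outside the range
    x = feistel34(counter, key)
    while not 1 <= x <= LIMIT:
        x = feistel34(x, key)
    return x

print([unique_number(c, key=12345) for c in range(1, 11)])   # first ten values

Because the underlying permutation is one-to-one, distinct counters can never produce the same output, so no bookkeeping of already-used numbers is required.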
As mentioned by AChampion in the comments, you could use a Linear Congruential generator.
Your modulo (m) value will be 10 billion. In order to get a full period (all values in the range appear before the series repeats) you need to choose the a and c constants to satisfy certain criteria. m and c need to be relatively prime and a - 1 needs to be divisible by the prime factors of m (which are just 2 and 5) and also by 4 (since 10 billion is divisible by 4).
If you just come up with a single set of constants, you will only have one possible series and the numbers will always occur in the same order. However, you can easily generate random constants that satisfy the criteria. To test for relative primality of c and m, just check that c is divisible by neither 2 nor 5, since these are the only prime factors of m (see the first condition of the coprimality test here).
Simple sketch in Python:
import random

m = 10000000000
a = 0
c = 0
r = 0

def setupLCG():
    global a, c, r
    # choose a value of c with 0 < c < m that is relatively prime to m
    c = 5
    while (c % 5 == 0) or (c % 2 == 0):
        c = random.randint(1, m - 1)
    # choose a value of a with 0 < a <= m such that a - 1 is divisible by
    # the prime factors of m, and by 4
    a = 4
    while ((a - 1) % 4 != 0) or ((a - 1) % 5 != 0):
        a = random.randint(1, m)
    r = random.randint(0, m - 1)

def rand():
    global m, a, c, r
    r = (a*r + c) % m
    return r

random.seed()
setupLCG()
for i in range(1000):
    print(rand() + 1)
This approach won't give the full 10000000000! possible orderings, but it will still be on the order of 10^19, which is quite a lot. It does have a few other issues (e.g. it alternates even and odd values). You could mix it up a bit by having a small pool of numbers, adding a number from the sequence to it each time and randomly drawing one out.
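A rough sketch of that small-pool mix-up in Python (the names and pool size are my own choices): keep a buffer of values from the underlying sequence and hand back a random member each time, which breaks up patterns such as the even/odd alternation.

import random

def shuffled(sequence, pool_size=1000):
    pool = [next(sequence) for _ in range(pool_size)]
    for value in sequence:
        i = random.randrange(pool_size)
        yield pool[i]                # hand out a random buffered value
        pool[i] = value              # and buffer the new one in its place
    random.shuffle(pool)             # drain whatever is left at the end
    yield from pool

# each input value still comes out exactly once, just in a scrambled order
print(list(shuffled(iter(range(10)), pool_size=4)))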
Similar to what rossum has suggested, you can use an invertible integer hash function, which uniquely maps an integer in [0,2^k) to another integer in the same range. For your particular problem, you choose k=34 (2^34 is about 17 billion) and reject any number above 10 billion. Here is a complete implementation:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

uint64_t hash_64(uint64_t key, uint64_t mask)
{
    key = (~key + (key << 21)) & mask; // key = (key << 21) - key - 1;
    key = key ^ key >> 24;
    key = ((key + (key << 3)) + (key << 8)) & mask; // key * 265
    key = key ^ key >> 14;
    key = ((key + (key << 2)) + (key << 4)) & mask; // key * 21
    key = key ^ key >> 28;
    key = (key + (key << 31)) & mask;
    return key;
}

int main(int argc, char *argv[])
{
    uint64_t i, shift, mask, max = 10000ULL;
    char *dummy;
    if (argc > 1) max = strtoull(argv[1], &dummy, 10);
    for (shift = 0; 1ULL << shift <= max; ++shift) {}
    mask = (1ULL << shift) - 1;
    for (i = 0; i <= mask; ++i) {
        uint64_t x = hash_64(i, mask);
        x = hash_64(x, mask);
        x = hash_64(x, mask); // apply multiple times to increase randomness
        if (x > max || x == 0) continue;
        printf("%llu\n", (unsigned long long)x);
    }
    return 0;
}
This should give you the numbers in [1,10000000000] in random order (pass the upper bound as a command-line argument; it defaults to 10000 for testing).
The range 1 to 999,999,999,999 is equivalent to 0 to 999,999,999,998 (just add 1). Given the definition of an LCG, you can then implement this:
import functools as ft
import itertools as it
import operator as op
from sympy import primefactors, nextprime
def LCG(m, seed=0):
    factors = set(primefactors(m))
    a = ft.reduce(op.mul, factors) + 1
    # Hull-Dobell: if 4 divides m, then a - 1 must also be divisible by 4
    assert(m % 4 != 0 or (a - 1) % 4 == 0)
    c = nextprime(max(factors) + 1)
    assert(c < m)
    x = seed
    while True:
        x = (a * x + c) % m
        yield x
# Check the first 10,000,000 for duplicates
>>> x = list(it.islice(LCG(999999999999), 10000000))
>>> len(x) == len(set(x))
True
# Last 10 numbers
>>> x[-10:]
[99069910838, 876847698522, 765736597318, 99069940559, 210181061577,
432403293706, 99069970280, 543514424631, 99069990094, 99070000001]
I've taken a couple of shortcuts for the context of this question; the asserts should be replaced with proper handling code, as currently the generator would just fail if those asserts were False.
I'm not aware of any truly random methods of picking the numbers without storing a list of the numbers already picked. You could do some sort of linear hashing algorithm, and then pass the numbers 0 to n through it (repeating when your hash returns a value above 10000000000), but this wouldn't be truly random.
If you are to store the numbers, you might consider doing it via a bitmask. To pick quickly from the bitmask, you would likely keep a tree, where each leaf records the number of free bits in the corresponding 32 bytes, the branches above that record the number of free bits in the corresponding 2K entries, and so forth. You then have O(log(n)) time to find your next entry, and O(log(n)) time to claim a bit (as you have to update the tree). It would require something on the order of 2n bits to store as well.
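A scaled-down sketch of that bitmap-plus-summary-tree idea in Python (here with one counter per value rather than bit-packed 32-byte blocks, purely for clarity): a complete binary tree whose nodes count the unused values beneath them lets you pick and claim the k-th remaining number in O(log n).

import random

class RandomPicker:
    def __init__(self, n):
        self.n = n
        self.size = 1
        while self.size < n:
            self.size *= 2
        # tree[i] = number of still-unused values in the subtree rooted at i
        self.tree = [0] * (2 * self.size)
        for i in range(n):
            self.tree[self.size + i] = 1
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def pick(self):
        # remove and return a uniformly random unused value in [0, n)
        remaining = self.tree[1]
        if remaining == 0:
            raise IndexError("all values have been picked")
        k = random.randrange(remaining)     # pick the k-th free slot, 0-based
        node = 1
        while node < self.size:             # descend toward the k-th free leaf
            if k < self.tree[2 * node]:
                node = 2 * node
            else:
                k -= self.tree[2 * node]
                node = 2 * node + 1
        value = node - self.size
        while node:                         # mark it used and update the counts
            self.tree[node] -= 1
            node //= 2
        return value

picker = RandomPicker(20)
print([picker.pick() for _ in range(20)])   # a random permutation of 0..19

At the 10-billion scale the leaves would have to be bit-packed blocks as described above, but the descent-and-update logic stays the same.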
You definitely don't need to store all the numbers.
If you want a perfect set of the numbers from 1 to 10B, each exactly once, there are a couple of options that I see: as hinted at by the others, use a 34-bit LCG, Galois LFSR, or xorshift generator that produces a sequence of numbers from 1 to 17B or so, then throw out the ones over 10B. I am not aware of any specifically 34-bit functions for this, but I'm sure someone has one.
Option 2, if you can spare 1.25 GB of memory, is to create a bitmap that stores only the information that a certain number has been chosen, then use Floyd's Algorithm to get the numbers, which would be fast and give you much better quality numbers (in fact, it would work just fine with hardware RNGs).
Option 3, if you can live with a rare but occasional mistake (duplicate or never-selected number), replace the bitmap with a Bloom filter and save memory.
If predictability is not a concern, you can generate quickly using XOR operations. Suppose you want to generate a random sequence of unique numbers with n bits (34 in your case):
1. Take a seed number K of n bits. K can be changed each time you run a new experiment.
2. Use a counter running from 0 upward.
3. Each time, XOR the counter with K: next = counter xor K; counter++;
To limit the range to 10 billion, which is not a power of two, you will need to do rejection (see the sketch below).
The obvious drawback is predictability. In step 3, you can apply a prior transposition to the bytes of the counter, for example reversing their order (as when converting from little-endian to big-endian). This would somewhat improve the unpredictability of the next number.
Finally, I have to admit that this answer can be considered a particular implementation of the encryption approach mentioned in the answer by @rossum, but it is more specific and probably the fastest.
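A minimal sketch of the XOR-counter idea in Python (names are mine): counter ^ key is a bijection on [0, 2**34), so distinct counters can never collide, and values outside the wanted range are simply skipped.

import random

BITS = 34
LIMIT = 10**10

def xor_sequence(key):
    for counter in range(1 << BITS):
        value = counter ^ key            # bijective, so never repeats
        if 1 <= value <= LIMIT:          # rejection for the non-power-of-two range
            yield value

key = random.getrandbits(BITS)
gen = xor_sequence(key)
print([next(gen) for _ in range(5)])     # first few unique values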
Incredibly slow but it should work. Completely random
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

namespace ConsoleApplication1
{
    class Program
    {
        static Random random = new Random();

        static void Main()
        {
            const long start = 1;
            const long NumData = 10000000000;
            const long RandomNess = NumData;
            var sz = Marshal.SizeOf(typeof(long));
            var numBytes = NumData * sz;
            var filePath = Path.GetTempFileName();
            using (var stream = new FileStream(filePath, FileMode.Create))
            {
                // create file with numbers in order
                stream.Seek(0, SeekOrigin.Begin);
                for (var index = start; index < NumData; index++)
                {
                    var bytes = BitConverter.GetBytes(index);
                    stream.Write(bytes, 0, sz);
                }
                for (var iteration = 0L; iteration < RandomNess; iteration++)
                {
                    // get 2 random longs
                    var item1Index = LongRandom(0, NumData - 1, random);
                    var item2Index = LongRandom(0, NumData - 1, random);
                    // allocate room for data
                    var data1ByteArray = new byte[sz];
                    var data2ByteArray = new byte[sz];
                    // read the first value
                    stream.Seek(item1Index * sz, SeekOrigin.Begin);
                    stream.Read(data1ByteArray, 0, sz);
                    // read the second value
                    stream.Seek(item2Index * sz, SeekOrigin.Begin);
                    stream.Read(data2ByteArray, 0, sz);
                    var item1 = BitConverter.ToInt64(data1ByteArray, 0);
                    var item2 = BitConverter.ToInt64(data2ByteArray, 0);
                    Debug.Assert(item1 < NumData);
                    Debug.Assert(item2 < NumData);
                    // swap the values
                    stream.Seek(item1Index * sz, SeekOrigin.Begin);
                    stream.Write(data2ByteArray, 0, sz);
                    stream.Seek(item2Index * sz, SeekOrigin.Begin);
                    stream.Write(data1ByteArray, 0, sz);
                }
            }
            File.Delete(filePath);
            Console.WriteLine($"{numBytes}");
        }

        static long LongRandom(long min, long max, Random rand)
        {
            long result = rand.Next((int)(min >> 32), (int)(max >> 32));
            result = (result << 32);
            result = result | rand.Next((int)min, (int)max);
            return result;
        }
    }
}

smallest integer not obtainable from {2,3,4,5,6,7,8} (Mathematica)

I'm trying to solve the following problem using Mathematica:
What is the smallest positive integer not obtainable from the set {2,3,4,5,6,7,8} via the arithmetic operations {+,-,*,/}, exponentiation, and parentheses? Each number in the set must be used exactly once. Unary operations are NOT allowed (1 cannot be converted to -1 without using a 0, for example).
For example, the number 1073741824000000000000000 is obtainable via (((3+2)*(5+4))/6)^(8+7).
I am a beginner with Mathematica. I have written code that I believe solves the problems for the set {2,3,4,5,6,7} (I obtained 2249 as my answer), but my code is not efficient enough to work with the set {2,3,4,5,6,7,8}. (My code already takes 71 seconds to run on the set {2,3,4,5,6,7})
I would very much appreciate any tips or solutions for solving this harder problem with Mathematica, or general insights as to how I could speed up my existing code.
My existing code uses a brute force, recursive approach:
(* this defines combinations for a set of 1 number as the set of that 1 number *)
combinations[list_ /; Length[list] == 1] := list
(* this tests whether it's ok to exponentiate two numbers including (somewhat) arbitrary restrictions to prevent overflow *)
oktoexponent[number1_, number2_] :=
If[number1 == 0, number2 >= 0,
If[number1 < 0,
(-number1)^number2 < 10000 \[And] IntegerQ[number2],
number1^number2 < 10000 \[And] IntegerQ[number2]]]
(* this takes a list and removes fractions with denominators greater than 100000 *)
cleanup[list_] := Select[list, Denominator[#] < 100000 &]
(* this defines combinations for a set of 2 numbers - and returns a set of all possible numbers obtained via applications of + - * / filtered by oktoexponent and cleanup rules *)
combinations[list_ /; Length[list] == 2 && Depth[list] == 2] :=
cleanup[DeleteCases[#, Null] &@DeleteDuplicates@
{list[[1]] + list[[2]],
list[[1]] - list[[2]],
list[[2]] - list[[1]],
list[[1]]*list[[2]],
If[oktoexponent[list[[1]], list[[2]]], list[[1]]^list[[2]],],
If[oktoexponent[list[[2]], list[[1]]], list[[2]]^list[[1]],],
If[list[[2]] != 0, list[[1]]/list[[2]],],
If[list[[1]] != 0, list[[2]]/list[[1]],]}]
(* this extends combinations to work with sets of sets *)
combinations[
list_ /; Length[list] == 2 && Depth[list] == 3] :=
Module[{m, n, list1, list2},
list1 = list[[1]];
list2 = list[[2]];
m = Length[list1]; n = Length[list2];
cleanup[
DeleteDuplicates@
Flatten@Table[
combinations[{list1[[i]], list2[[j]]}], {i, m}, {j, n}]]]
(* for a given set, partition returns the set of all partitions into two non-empty subsets *)
partition[list_] := Module[{subsets},
subsets = Select[Subsets[list], # != {} && # != list &];
DeleteDuplicates@
Table[Sort@{subsets[[i]], Complement[list, subsets[[i]]]}, {i,
Length[subsets]}]]
(* this finally extends combinations to work with sets of any size *)
combinations[list_ /; Length[list] > 2] :=
Module[{partitions, k},
partitions = partition[list];
k = Length[partitions];
cleanup[Sort@
DeleteDuplicates@
Flatten@(combinations /@
Table[{combinations[partitions[[i]][[1]]],
combinations[partitions[[i]][[2]]]}, {i, k}])]]
Timing[desiredset = combinations[{2, 3, 4, 5, 6, 7}];]
{71.5454, Null}
Complement[
Range[1, 3000], #] &@(Cases[#, x_Integer /; x > 0 && x <= 3000] &@
desiredset)
{2249, 2258, 2327, 2509, 2517, 2654, 2789, 2817, 2841, 2857, 2990, 2998}
This is unhelpful, but I'm under my quota for useless babbling today:
(* it turns out the symbolizing + * is not that useful after all *)
f[x_,y_] = x+y
fm[x_,y_] = x-y
g[x_,y_] = x*y
gd[x_,y_] = x/y
(* power properties *)
h[h[a_,b_],c_] = h[a,b*c]
h[a_/b_,n_] = h[a,n]/h[b,n]
h[1,n_] = 1
(* expand simple powers only! *)
(* does this make things worse? *)
h[a_,2] = a*a
h[a_,3] = a*a*a
(* all symbols for two numbers *)
allsyms[x_,y_] := allsyms[x,y] =
DeleteDuplicates[Flatten[{f[x,y], fm[x,y], fm[y,x],
g[x,y], gd[x,y], gd[y,x], h[x,y], h[y,x]}]]
allsymops[s_,t_] := allsymops[s,t] =
DeleteDuplicates[Flatten[Outer[allsyms[#1,#2]&,s,t]]]
Clear[reach];
reach[{}] = {}
reach[{n_}] := reach[n] = {n}
reach[s_] := reach[s] = DeleteDuplicates[Flatten[
Table[allsymops[reach[i],reach[Complement[s,i]]],
{i,Complement[Subsets[s],{ {},s}]}]]]
The general idea here is to avoid calculating powers (which are expensive and non-commutative), while at the same time using the commutativity/associativity of addition/multiplication to reduce the cardinality of reach[].
Code above also available at:
https://github.com/barrycarter/bcapps/blob/master/playground.m#L20
along with literally gigabytes of other useless code, data, and humor.
I think the answer to your question lies in the command Groupings. This allows you to create binary trees from a list. The binary tree is very useful here, as each of the operations you allow (Plus, Subtract, Times, Divide, Power) takes two arguments. E.g.
In> Groupings[3,2]
Out> {List[List[1,2],3],List[1,List[2,3]]}
Thus all we need to do is replace List with any combination of the allowed operations.
However, Groupings seems to be almighty, as it has an option to do exactly this. Imagine you have two functions foo and bar that both take 2 arguments; then you can make all combinations as:
In> Groupings[3,{foo->2,bar->2}]
Out> {foo[foo[1,2],3],foo[1,foo[2,3]],foo[bar[1,2],3],foo[1,bar[2,3]],
bar[foo[1,2],3],bar[1,foo[2,3]],bar[bar[1,2],3],bar[1,bar[2,3]]}
Now it is possible to count the number of combinations we have:
In> Groupings[Permutations[#],
{Plus->2,Subtract->2,Times->2,Divide->2,Power->2}
] &@ {a,b,c,d,e};
In> Length@%
In> DeleteDuplicates@%%
In> Length@%
Out> 1050000
Out> 219352
This means that for 5 distinct numbers, we have 219352 unique combinations.
Sadly, many of these combinations cannot be evaluated due to overflow, division by zero or underflow. However, it is not evident which ones to remove. The value a^(b^(c^(d^e))) could be humongous, or just small. Fractional powers could result in perfect roots and divisions by large numbers can become perfect.
In> Groupings[Permutations[#],
{Plus->2,Subtract->2,Times->2,Divide->2,Power->2}
] &@ {2, 3, 4};
In> Union[Cases[%, _?(IntegerQ[#] && # >= 0 &)]];
In> Split[%, #2 - #1 <= 1 &][[1]]
Out> {1, 2, 3, 4, 5, 6}
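For comparison, here is a rough Python transcription of the question's partition-and-combine recursion, using exact rationals and the same kind of ad-hoc caps on exponents and denominators (plus a slightly tighter exponent limit); it is only a sketch, not a drop-in replacement for the Mathematica code above.

from fractions import Fraction
from functools import lru_cache

MAX_POW = 10000       # cap on powers, like oktoexponent above
MAX_DEN = 100000      # cap on denominators, like cleanup above

def combine(a, b):
    results = {a + b, a - b, b - a, a * b}
    if b != 0:
        results.add(a / b)
    if a != 0:
        results.add(b / a)
    for base, exp in ((a, b), (b, a)):
        # only small non-negative integer exponents, to keep the search finite
        if exp.denominator == 1 and 0 <= exp <= 64 and abs(base) ** exp < MAX_POW:
            results.add(base ** exp)
    return {r for r in results if r.denominator < MAX_DEN}

@lru_cache(maxsize=None)
def reachable(nums):
    # nums is a tuple of integers: split it into two non-empty parts in every
    # possible way, recurse on both parts, and combine the two sides
    if len(nums) == 1:
        return frozenset({Fraction(nums[0])})
    out = set()
    for split in range(1, 2 ** len(nums) - 1):
        left = tuple(v for i, v in enumerate(nums) if split >> i & 1)
        right = tuple(v for i, v in enumerate(nums) if not split >> i & 1)
        for a in reachable(left):
            for b in reachable(right):
                out |= combine(a, b)
    return frozenset(out)

values = reachable((2, 3, 4, 5))
integers = {int(v) for v in values if v.denominator == 1 and v > 0}
print(min(set(range(1, 10000)) - integers))   # smallest unreachable positive integer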

Given an array that contains all elements thrice except one. Find the element which occurs once. [duplicate]

There are many numbers in an array, and each number appears three times except for one special number, which appears only once. Here is the question: how can I find the special number in the array?
So far I can only come up with methods based on radix sort or quicksort, which do not take advantage of this special property of the array. So I need some other algorithm.
Thanks for your help.
Add the numbers bitwise mod 3, e.g.
def special(lst):
    ones = 0
    twos = 0
    for x in lst:
        twos |= ones & x
        ones ^= x
        not_threes = ~(ones & twos)
        ones &= not_threes
        twos &= not_threes
    return ones
Since nobody's saying it, I will: hash table.
You can count how many times each element occurs in the array in O(n) with a simple hash table (or hash map).
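A minimal sketch of this in Python, using collections.Counter as the hash table:

from collections import Counter

def special(lst):
    # count occurrences in one pass, then return the value seen exactly once
    counts = Counter(lst)
    return next(x for x, count in counts.items() if count == 1)

print(special([12, 1, 12, 3, 12, 1, 1, 2, 3, 2, 2, 3, 7]))   # -> 7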
If the array is sorted, the problem is trivial: you just loop through the list three items at a time and check whether the third item of each group is the same as the first.
If the array is not sorted, you can use a hash table to count the number of occurrences of each number.
A possible algorithm (very generic, not tested) :
function findMagicNumber(arr[0...n-1])
    magic_n := NaN
    if n = 1 then
        magic_n := arr[0]
    else if n > 1 then
        quicksort(arr)
        old_n := arr[0]
        repeat := 0
        for i := 1 to n-1
            cur_n := arr[i]
            repeat := repeat + 1
            if cur_n ≠ old_n then
                if repeat = 1 then
                    magic_n := old_n
                old_n := cur_n
                repeat := 0
        if repeat = 0 then          // the magic number was the last element
            magic_n := old_n
    return magic_n
Following is another method with O(n) time complexity and O(1) extra space, suggested by aj. We can sum the bits in the same positions for all the numbers and take that sum modulo 3.
The bits for which the sum is not a multiple of 3 are the bits of the number with a single occurrence.
Let us consider the example array {5, 5, 5, 8}, which in binary is 101, 101, 101, 1000.
Sum of first bits % 3 = (1 + 1 + 1 + 0) % 3 = 0
Sum of second bits % 3 = (0 + 0 + 0 + 0) % 3 = 0
Sum of third bits % 3 = (1 + 1 + 1 + 0) % 3 = 0
Sum of fourth bits % 3 = (0 + 0 + 0 + 1) % 3 = 1
Hence the number which appears once is 1000 (i.e. 8).
#include <stdio.h>
#define INT_SIZE 32

int getSingle(int arr[], int n)
{
    // Initialize result
    int result = 0;
    int x, sum;
    // Iterate through every bit
    for (int i = 0; i < INT_SIZE; i++)
    {
        // Find sum of set bits at ith position in all
        // array elements
        sum = 0;
        x = (1 << i);
        for (int j = 0; j < n; j++)
        {
            if (arr[j] & x)
                sum++;
        }
        // The bits with sum not multiple of 3, are the
        // bits of element with single occurrence.
        if (sum % 3)
            result |= x;
    }
    return result;
}

// Driver program to test above function
int main()
{
    int arr[] = {12, 1, 12, 3, 12, 1, 1, 2, 3, 2, 2, 3, 7};
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("The element with single occurrence is %d ", getSingle(arr, n));
    return 0;
}
How about the following?
If we assume that you know the maximum and minimum values of all the numbers in the array (or can at least limit them to some range of size max - min + 1), then create an auxiliary array of that size, initialized to all zeros, say AuxArray[].
Now scan your original array, say MyArray[], and for each element MyArray[i], increment AuxArray[MyArray[i]] by one. After your scan is complete, there will be exactly one element in AuxArray[] that equals one, and the index of that element in AuxArray[] will be the value of the special number.
No complicated search here. Just a linear order of complexity.
Hope I've made sense.
John Doner
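A minimal sketch of this direct-addressing idea in Python (the function name and range arguments are mine):

def special_direct(arr, lo, hi):
    aux = [0] * (hi - lo + 1)        # one counter per possible value
    for v in arr:
        aux[v - lo] += 1
    return next(i + lo for i, count in enumerate(aux) if count == 1)

print(special_direct([12, 1, 12, 3, 12, 1, 1, 2, 3, 2, 2, 3, 7], 1, 12))   # -> 7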
I didn't find the implementation of bitwise mod 3 very intuitive, so I wrote a more intuitive version of the code and tested it with various examples, and it worked.
Here is the code inside the loop:
threes = twos & x   // find all bits occurring for the third time
x &= ~threes        // remove the thrice-occurring bits from x ...
twos &= ~threes     // ... and from twos
twos |= ones & x    // find all bits occurring for the second time
x &= ~twos          // remove the twice-occurring bits from the modified x ...
ones &= ~twos       // ... and from ones
ones |= x           // record the remaining first-time bits in ones
Hope you guys find it easy to understand this version of code.
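A self-contained Python version of this variant, to make it easy to try out (the wrapper function is mine):

def special_alt(lst):
    ones = twos = 0
    for x in lst:
        threes = twos & x        # bits now seen a third time
        x &= ~threes             # drop them from x and from twos
        twos &= ~threes
        twos |= ones & x         # bits now seen a second time
        x &= ~twos               # keep only first-time bits in x and ones
        ones &= ~twos
        ones |= x                # record the first-time bits
    return ones

print(special_alt([2, 2, 2, 4]))   # -> 4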
I got a solution. It's O(n) time and O(1) space.
n = list(map(int, input().split()))
l = [0] * 64
for x in n:
    b = bin(x)[2:]
    b = '0' * (64 - len(b)) + b
    i = 0
    while i < len(l):
        l[i] += int(b[i])
        i += 1
i = 0
while i < len(l):
    l[i] %= 3
    i += 1
s = ''
for x in l:
    s += str(x)
print(int(s, 2))
#include <stdio.h>

int main()
{
    int B[] = {1,1,1,3,3,3,20,4,4,4};
    int ones = 0;
    int twos = 0;
    int not_threes;
    int x;
    int i;
    for (i = 0; i < 10; i++)
    {
        x = B[i];
        twos |= ones & x;
        ones ^= x;
        not_threes = ~(ones & twos);
        ones &= not_threes;
        twos &= not_threes;
    }
    printf("\n unique element = %d \n", ones);
    return 0;
}
The code works along similar lines to the question of "finding the element which appears once in an array where every other element appears twice". The solution there is to XOR all the elements, and you get the answer.
Basically, it makes use of the fact that x^x = 0. So all paired elements get XOR'd and vanish, leaving the lonely element.
Since the XOR operation is associative and commutative, it does not matter in what order the elements appear in the array; we still get the answer.
Now, in the current question, that idea alone will not work, because it requires every repeated element to appear an even number of times. Here the repeated elements appear three times, so instead of getting the answer we would end up with the XOR of all distinct elements, which is not what we want.
To rectify this, the code makes use of 2 variables:
ones - at any point of time, this variable holds the XOR of all the elements which have appeared "only" once.
twos - at any point of time, this variable holds the XOR of all the elements which have appeared "only" twice.
So if at any point of time:
1. A new number appears - it gets XOR'd into the variable "ones".
2. A number appears a second time - it is removed from "ones" and XOR'd into the variable "twos".
3. A number appears for the third time - it gets removed from both "ones" and "twos".
The final answer we want is the value present in "ones", because it holds the unique element.
So if we explain how steps 1 to 3 happen in the code, we are done.
Before explaining the above 3 steps, let's look at the last three lines of the code:
not_threes = ~(ones & twos)
ones &= not_threes
twos &= not_threes
All they do is convert the common 1 bits between "ones" and "twos" to zeros.
For simplicity, in all the explanations below, consider that we have only 4 elements in the array (one unique element and one element repeated three times, in any order).
Explanation for step 1
------------------------
Let's say a new element (x) appears.
CURRENT SITUATION - neither "ones" nor "twos" has recorded "x".
Observe the statement "twos |= ones & x".
Since the bit representation of "x" is not present in "ones", the AND yields nothing, so "twos" does not pick up the bits of "x".
But in the next step, "ones ^= x", "ones" ends up adding the bits of "x". Thus the new element gets recorded in "ones" but not in "twos".
The last 3 lines of code, as explained already, convert the common 1 bits between "ones" and "twos" to zeros.
Since at this point only "ones" has "x" and "twos" does not, the last 3 lines do nothing.
Explanation for step 2.
------------------------
Let's say an element (x) appears a second time.
CURRENT SITUATION - "ones" has recorded "x" but "twos" has not.
Now, due to the statement "twos |= ones & x", "twos" ends up getting the bits of "x".
But due to the statement "ones ^= x", "ones" removes "x" from its binary representation.
Again, the last 3 lines of code do nothing.
So ultimately, "twos" ends up getting the bits of "x" and "ones" ends up losing the bits of "x".
Explanation for step 3.
-------------------------
Let's say an element (x) appears for the third time.
CURRENT SITUATION - "ones" does not have the bit representation of "x", but "twos" does.
Though "ones & x" yields nothing, "twos" by itself already has the bits of "x", so after this statement "twos" still has them.
Due to "ones ^= x", after this step "ones" also ends up getting the bits of "x".
Now the last 3 lines of code remove the common 1 bits of "ones" and "twos", which are exactly the bits of "x".
Thus both "ones" and "twos" end up losing the bits of "x".
1st example
------------
2, 2, 2, 4
After first iteration,
ones = 2, twos = 0
After second iteration,
ones = 0, twos = 2
After third iteration,
ones = 0, twos = 0
After fourth iteration,
ones = 4, twos = 0
2nd example
------------
4, 2, 2, 2
After first iteration,
ones = 4, twos = 0
After second iteration,
ones = 6, twos = 0
After third iteration,
ones = 4, twos = 2
After fourth iteration,
ones = 4, twos = 0
The explanation becomes much more complicated when there are more elements in the array in mixed-up order. But again, because XOR is associative and commutative, we still end up getting the answer.

Resources