Mathematica Generate Binary Numbers with Locked Bits - algorithm

I have a very specific Mathematica question. I am trying to generate all the binary numbers around certain 'locked' bits. I am using a list of string values to denote which bits are locked, e.g. {"U","U","L","U"}, where U is an "unlocked" mutable bit and L is a "locked" immutable bit. I start with a temporary list of random binary digits that matches the previous list, e.g. {0, 1, 1, 0}, where the third bit (a 1) is the locked bit. I need to find all the remaining binary numbers in which the locked bit stays constant. I've approached this problem recursively, iteratively, and with a combination of both, with no results. This is for research I am doing at my university.
I am building a list of base 10 forms of the binary numbers. I realize that this code is completely wrong. This is just one attempt.
Do[
If[bits[[pos]] == "U",
AppendTo[returnList, myFunction[bits, temp, pos, returnList]]; ],
{pos, 8, 1}]
myFunction[bits_, bin_, pos_, rList_] :=
Module[{binary = bin, current = Length[bin], returnList = rList},
If[pos == current,
Return[returnList],
If[bits[[current]] == "U",
(*If true*)
If[! MemberQ[returnList, FromDigits[binary, 2]],
(*If true*)
AppendTo[returnList, FromDigits[binary, 2]];
binary[[current]] = Abs[binary[[current]] - 1],
(*If false*)
binary[[current]] = 0;
current = current - 1]; ,
(*If false*)
current = current - 1];
returnList = myFunction[bits, binary, pos, returnList];
Return[returnList]]]

You can use Tuples and Fold to generate only bit sets that you are interested in.
bits = {"U", "U", "L", "U"};
Fold[
  Function[{running, next},
   Insert[running, 1, next]], #, Position[bits, "L"]] & /@ Tuples[{0, 1}, Count["U"]@bits]
(*
{{0, 0, 1, 0}, {0, 0, 1, 1}, {0, 1, 1, 0}, {0, 1, 1, 1},
{1, 0, 1, 0}, {1, 0, 1, 1}, {1, 1, 1, 0}, {1, 1, 1, 1}}
*)
Hope this helps.
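For readers more comfortable outside Mathematica, the same idea (enumerate every 0/1 combination for the unlocked positions only, leaving locked positions untouched) can be sketched in Python. This is only an illustration under assumptions: the function name fill_unlocked and the template argument are hypothetical and not part of the answers here, and unlike the Fold version it keeps whatever value the template holds at each locked position rather than always inserting 1.

```python
from itertools import product

def fill_unlocked(bits, template):
    """Yield every bit list that keeps template's values at the "L" positions."""
    # Indices of the mutable ("U") positions.
    unlocked = [i for i, b in enumerate(bits) if b == "U"]
    # One 0/1 choice per unlocked position, in counting order.
    for combo in product((0, 1), repeat=len(unlocked)):
        candidate = list(template)
        for pos, value in zip(unlocked, combo):
            candidate[pos] = value
        yield candidate

bits = ["U", "U", "L", "U"]
template = [0, 1, 1, 0]          # the locked third bit is 1
results = list(fill_unlocked(bits, template))
# Base 10 forms, as the question asks for.
decimals = [int("".join(map(str, r)), 2) for r in results]
print(decimals)
```

With n unlocked bits this produces 2^n candidates, so it never generates (and then discards) numbers that violate the locked bits.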

in = IntegerDigits[Round[ Pi 10^9 ], 2];
mask = RandomSample[ConstantArray["L", 28]~Join~ConstantArray["U", 4],32];
subs[in_, mask_] := Module[ {p = Position[mask, "U"]} ,
  ReplacePart[in, Rule @@@ Transpose[{p, #}]] & /@
   Tuples[{0, 1}, Length@p]]
subs[in, mask]
{{1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1,
0, 0, 1, 0, 0, 1, 0, 1, 0}, {1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0}, {1, 0, 1,
1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
1, 0, 0, 1, 0, 1, 0}, {1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0}, ...
FromDigits[#, 2] & /@ %
{3108030026, 3108030030, 3108038218, 3108038222, 3108095562,
3108095566, 3108103754, 3108103758, 3141584458, 3141584462,
3141592650, 3141592654, 3141649994, 3141649998, 3141658186,
3141658190}

myFunction[bits_] := Module[{length, num, range, all, pattern},
  length = Length[bits];
  num = 2^length;
  range = Range[0, num - 1];
  all = PadLeft[IntegerDigits[#, 2], length] & /@ range;
  pattern = bits /. {"U" -> _, "L" -> 1};
  Cases[all, pattern]]
bits = {"U", "U", "L", "U"};
myFunction[bits]
{{0, 0, 1, 0}, {0, 0, 1, 1}, {0, 1, 1, 0}, {0, 1, 1, 1},
{1, 0, 1, 0}, {1, 0, 1, 1}, {1, 1, 1, 0}, {1, 1, 1, 1}}


Fast approximation of simple cases of relaxed bipartite dimension of graph problem

Given boolean matrix M, I need to find a set of submatrices A = {A1, ..., An} such that matrices in A contain all True values in matrix M and only them. Submatrices don't have to be continuous, i.e. each submatrix is defined by the two sets of indices {i1, ..., ik}, {j1, ..., jt} of M. (For example submatrix could be something like [{1, 2, 5}, {4, 7, 9, 13}] and it is all cells in intersection of these rows and columns.) Optionally submatrices can intersect if this results in better solution. The total number of submatrices n should be minimal.
Size of the matrix M can be up to 10^4 x 10^4, so I need an efficient algorithm. I suppose that this problem may not have an efficient exact algorithm, because it reminds me of some NP-hard problems. If this is true, then any good and fast approximation is OK. We can also assume that the number of true values is not very big, i.e. < 1/10 of all values, but to avoid an accidental DoS in production, a solution that does not rely on this fact is better.
I don't need any code, just a general idea of the algorithm and justification of its properties, if it's not obvious.
Background
We are calculating some expensive distance matrices for logistics applications. Points in these requests often overlap, so we are trying to develop a caching algorithm to avoid recalculating parts of some requests, and to split big requests into smaller ones containing only the unknown submatrices. Additionally, some distances in the matrix may not be needed by the algorithm. On the one hand, a small number of big groups calculates faster; on the other hand, if we include a lot of "False" values and our submatrices are unreasonably big, this can slow down the calculation. The exact criterion is intricate, and the time complexity of "expensive" matrix requests is hard to estimate. As far as I know, for square matrices it is something like C*n^2.5 with quite a big C. So it's hard to formulate a good optimization criterion, but any ideas are welcome.
About data
A True value in the matrix means that the distance between these two points has never been calculated before. Most of the requests (but not all) are square matrices with the same points on both axes, so most M are expected to be almost symmetric. There is also a simple case in which several points are completely new and the other distances are cached; I deal with these cases at the preprocessing stage. All the other values can be quite random. If they are too random, we can give up on the cache and calculate the full matrix M. But sometimes there are useful patterns. I think that, because of the nature of the data, it is expected to contain more big submatrices than random data would. True values are mostly occasional, but they form submatrix patterns that we need to find. We cannot rely on this completely, though: if the algorithm gets a matrix that is too random, it should at least be able to detect this and avoid long and complex calculations.
Update
As stated in Wikipedia, this problem is called the bipartite dimension of a graph and is known to be NP-hard. So we can reformulate it into finding fast relaxed approximations for the simple cases of the problem. We can allow some percentage of false values, and we can adapt some simple but mostly effective greedy heuristic.
I started working on the algorithm below before you provided the update.
Also, in doing so I realised that while one is looking for blocks of true values, the problem is not one of a block transformation, as you have also now updated.
The algorithm is as follows:
1. Count the trues in each row.
2. For any row with the maximum count of trues, sort the columns in the matrix so that the row's trues all move to the left.
3. Sort the matrix rows in descending order of congruent trues on the left (there will now be a rough upper-left triangle of congruent trues).
4. Get the biggest rectangle of trues cornered at the upper left.
5. Store the row ids and column ids for that rectangle (this is a sub-matrix definition).
6. Change the sub-matrix's trues to falses.
7. Repeat from the top until the upper-left triangle has no trues.
This algorithm will produce a complete cover of the boolean matrix consisting of row-column intersection sub-matrices containing only true values.
I am not sure if allowing some falses in a sub-matrix will help. While it will allow bigger sub-matrices to be found and hence reduce the number of passes of the boolean matrix to find a cover, it will presumably take longer to find the biggest such sub-matrices because there will be more combinations to check. Also, I am not sure how one might stop falsey sub-matrices from overlapping. It might need the maintenance of a separate mask matrix rather than using the boolean matrix as its own mask, in order to ensure disjoint sub-matrices.
Below is a first cut implementation of the above algorithm in python.
I ran it on Windows 10 on an Intel Pentium N3700 @ 1.60GHz with 4GB RAM.
As is, it will do, with randomly generated ~10% trues:
100 rows x 1000 columns < 7 secs
1000 rows x 100 columns < 6 secs
300 rows x 300 columns < 14 secs
3000 rows x 300 columns < 3 mins
300 rows x 3000 columns < 15 mins
1000 rows x 1000 columns < 8 mins
I have not tested it on approximately symmetric matrices, nor have I tested it on matrices with relatively large sub-matrices. It might perform well with relatively large sub-matrices; e.g. in the extreme case where the entire boolean matrix is true, only two passes of the algorithm loop are required.
One area where I think there can be considerable optimisation is the row sorting. The implementation below uses the built-in Python sort with a comparator function. A custom-crafted sort function will probably do much better, possibly especially so if it is a virtual sort similar to the column sorting.
If you can try it on some real data, i.e. a square, approximately symmetric matrix with relatively large sub-matrices, it would be good to know how it goes.
Please advise if you would like me to try some optimisation of the Python. I presume that to handle 10^4 x 10^4 boolean matrices it will need to be a lot faster.
from functools import cmp_to_key
booleanMatrix0 = [
( 0, 0, 0, 0, 1, 1 ),
( 0, 1, 1, 0, 1, 1 ),
( 0, 1, 0, 1, 0, 1 ),
( 1, 1, 1, 0, 0, 0 ),
( 0, 1, 1, 1, 0, 0 ),
( 1, 1, 0, 1, 0, 0 ),
( 0, 0, 0, 0, 0, 0 )
]
booleanMatrix1 = [
( 0, )
]
booleanMatrix2 = [
( 1, )
]
booleanMatrix3 = [
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0 )
]
booleanMatrix4 = [
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1 )
]
booleanMatrix14 = [
( 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0 ),
( 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1 ),
( 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0 ),
( 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0 ),
( 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1 ),
( 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1 ),
( 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1 ),
( 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1 ),
( 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1 ),
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0 ),
( 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 )
]
booleanMatrix15 = [
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 ),
( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 ),
]
booleanMatrix16 = [
( 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1 ),
( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1 ),
]
import random
booleanMatrix17 = [
]
for r in range(11):
    row = []
    for c in range(21):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix17.append(tuple(row))
booleanMatrix18 = [
]
for r in range(21):
    row = []
    for c in range(11):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix18.append(tuple(row))
booleanMatrix5 = [
]
for r in range(50):
    row = []
    for c in range(200):
        row.append(random.randrange(2))
    booleanMatrix5.append(tuple(row))
booleanMatrix6 = [
]
for r in range(200):
    row = []
    for c in range(50):
        row.append(random.randrange(2))
    booleanMatrix6.append(tuple(row))
booleanMatrix7 = [
]
for r in range(100):
    row = []
    for c in range(100):
        row.append(random.randrange(2))
    booleanMatrix7.append(tuple(row))
booleanMatrix8 = [
]
for r in range(100):
    row = []
    for c in range(1000):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix8.append(tuple(row))
booleanMatrix9 = [
]
for r in range(1000):
    row = []
    for c in range(100):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix9.append(tuple(row))
booleanMatrix10 = [
]
for r in range(317):
    row = []
    for c in range(316):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix10.append(tuple(row))
booleanMatrix11 = [
]
for r in range(3162):
    row = []
    for c in range(316):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix11.append(tuple(row))
booleanMatrix12 = [
]
for r in range(316):
    row = []
    for c in range(3162):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix12.append(tuple(row))
booleanMatrix13 = [
]
for r in range(1000):
    row = []
    for c in range(1000):
        if random.randrange(5) == 1:
            row.append(random.randrange(2))
        else:
            row.append(0)
    booleanMatrix13.append(tuple(row))
booleanMatrices = [ booleanMatrix0, booleanMatrix1, booleanMatrix2, booleanMatrix3, booleanMatrix4, booleanMatrix14, booleanMatrix15, booleanMatrix16, booleanMatrix17, booleanMatrix18, booleanMatrix6, booleanMatrix5, booleanMatrix7, booleanMatrix8, booleanMatrix9, booleanMatrix10, booleanMatrix11, booleanMatrix12, booleanMatrix13 ]
def printMatrix(matrix, colOrder):
    # Print the matrix with its columns in the current (virtual) column order
    for r in range(rows):
        row = ""
        for c in range(cols):
            row += str(matrix[r][0][colOrder[c]])
        print(row)
    print()
def rowUp(matrix):
    # Count the trues in each row and remember a row with the highest count
    rowCount = []
    maxRow = [ 0, 0 ]
    for r in range(rows):
        rowCount.append([ r, sum(matrix[r][0]) ])
        if rowCount[-1][1] > maxRow[1]:
            maxRow = rowCount[-1]
    return rowCount, maxRow
def colSort(matrix):
    # For a row with the highest number of trues, sort the true columns to the left
    newColOrder = []
    otherCols = []
    for c in range(cols):
        if matrix[maxRow[0]][0][colOrder[c]]:
            newColOrder.append(colOrder[c])
        else:
            otherCols.append(colOrder[c])
    newColOrder += otherCols
    return newColOrder
def sorter(a, b):
    # Sort rows according to leading trues
    # Note: a is a (row, id) pair, so len(a) is 2 and only the first two
    # ordered columns are compared; len(a[0]) would compare every column.
    length = len(a)
    c = 0
    while c < length:
        if a[0][colOrder[c]] == 1 and b[0][colOrder[c]] == 0:
            return -1
        if b[0][colOrder[c]] == 1 and a[0][colOrder[c]] == 0:
            return 1
        c += 1
    return 0
def allTrues(rdx, cdx, matrix):
    # Return (rdx, cdx, count) if the rectangle (0, 0)..(rdx, cdx) is all trues, else None
    count = 0
    for r in range(rdx+1):
        for c in range(cdx+1):
            if matrix[r][0][colOrder[c]]:
                count += 1
            else:
                return
    return rdx, cdx, count
def getBiggestField(matrix):
    # Starting at (0, 0) find biggest rectangular field of 1s
    biggestField = (None, None, 0)
    cStop = cols
    for r in range(rows):
        for c in range(cStop):
            rtn = allTrues(r, c, matrix)
            if rtn:
                if rtn[2] > biggestField[2]:
                    biggestField = rtn
            else:
                cStop = c
                break
        if cStop == 0:
            break
    return biggestField
def mask(matrix):
    # Copy the matrix, then set the trues of the biggest field to false in the copy
    maskMatrix = []
    for r in range(rows):
        row = []
        for c in range(cols):
            row.append(matrix[r][0][c])
        maskMatrix.append([ row, matrix[r][1] ])
    maskRows = []
    for r in range(biggestField[0]+1):
        maskRows.append(maskMatrix[r][1])
        for c in range(biggestField[1]+1):
            maskMatrix[r][0][colOrder[c]] = 0
    maskCols= []
    for c in range(biggestField[1]+1):
        maskCols.append(colOrder[c])
    return maskMatrix, maskRows, maskCols
# Add a row id to each row to keep track of rearranged rows
rowIdedMatrices = []
for matrix in booleanMatrices:
    rowIdedMatrix = []
    for r in range(len(matrix)):
        rowIdedMatrix.append((matrix[r], r))
    rowIdedMatrices.append(rowIdedMatrix)
import time
for matrix in rowIdedMatrices:
    rows = len(matrix)
    cols = len(matrix[0][0])
    colOrder = []
    for c in range(cols):
        colOrder.append(c)
    subMatrices = []
    startTime = time.thread_time()
    loopStart = time.thread_time()
    loop = 1
    rowCount, maxRow = rowUp(matrix)
    ones = 0
    for row in rowCount:
        ones += row[1]
    print( "_________________________\n", "Rows", rows, "Columns", cols, "Ones", str(int(ones * 10000 / rows / cols) / 100) +"%")
    colOrder = colSort(matrix)
    matrix.sort(key=cmp_to_key(sorter))
    biggestField = getBiggestField(matrix)
    if biggestField[2] > 0:
        maskMatrix, maskRows, maskCols = mask(matrix)
        subMatrices.append(( maskRows, maskCols ))
    while biggestField[2] > 0:
        loop += 1
        rowCount, maxRow = rowUp(maskMatrix)
        colOrder = colSort(maskMatrix)
        maskMatrix.sort(key=cmp_to_key(sorter))
        biggestField = getBiggestField(maskMatrix)
        if biggestField[2] > 0:
            maskMatrix, maskRows, maskCols = mask(maskMatrix)
            subMatrices.append(( maskRows, maskCols) )
        if loop % 100 == 0:
            print(loop, time.thread_time() - loopStart)
            loopStart = time.thread_time()
    endTime = time.thread_time()
    print("Sub-matrices:", len(subMatrices), endTime - startTime)
    for sm in subMatrices:
        print(sm)
    print()
    input("Next matrix")
LOOP over true values
Can you grow the submatrix containing the true value in any direction
( i.e can you go from
t
to
tt
tt
)
Keep growing for as long as possible
Set all cells in M that are in the new submatrix to false
Repeat until every cell in M is false.
Here is a simple example of how it works
The top picture shows the large Matrix M containing a few true values
The bottom rows show the first few iterations, with the blue submatrix growing as it finds more adjacent cells with true values. In this case I have stopped because it cannot grow any further without including false cells. If a few cells in a submatrix can be false, then you could continue a bit further.
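The growth loop described above can be sketched roughly as follows in Python. This is one possible reading, not the answerer's code: it assumes the submatrices are contiguous rectangles, tries one-step extensions right, down, left and up in that fixed order, and allows no false cells. The name grow_cover is hypothetical.

```python
def grow_cover(m):
    """Greedily cover the true cells of m with contiguous all-true rectangles."""
    grid = [list(row) for row in m]   # working copy; covered cells are set to 0
    rows, cols = len(grid), len(grid[0])
    rects = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            top, bottom, left, right = r, r, c, c
            grown = True
            while grown:
                grown = False
                # Extend one step in a direction only if the new strip is all true.
                if right + 1 < cols and all(grid[i][right + 1] for i in range(top, bottom + 1)):
                    right += 1
                    grown = True
                if bottom + 1 < rows and all(grid[bottom + 1][j] for j in range(left, right + 1)):
                    bottom += 1
                    grown = True
                if left > 0 and all(grid[i][left - 1] for i in range(top, bottom + 1)):
                    left -= 1
                    grown = True
                if top > 0 and all(grid[top - 1][j] for j in range(left, right + 1)):
                    top -= 1
                    grown = True
            # Set all cells of the new submatrix to false so they are not covered twice.
            for i in range(top, bottom + 1):
                for j in range(left, right + 1):
                    grid[i][j] = 0
            rects.append((top, bottom, left, right))
    return rects

print(grow_cover([(1, 1, 0), (1, 1, 0), (0, 0, 1)]))
```

Each rectangle is reported as (top, bottom, left, right) with inclusive bounds. Because covered cells are cleared, the rectangles are disjoint; the result depends on the scan order and is not guaranteed to be minimal.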
Let's say M is an s by t matrix. The trivial (but possibly useful) solution is just to take all the non-empty columns (or rows) as your submatrices. This will result in at most min(s,t) submatrices.
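A minimal Python sketch of this trivial solution, assuming M is a non-empty list of equal-length rows of 0/1 values. The function name trivial_cover is hypothetical; it builds both the per-row and the per-column cover and returns whichever has fewer submatrices.

```python
def trivial_cover(m):
    """Cover the true cells with one submatrix per non-empty row or column."""
    rows, cols = len(m), len(m[0])
    # One submatrix per column: (rows holding a true in that column, [column]).
    by_col = [([r for r in range(rows) if m[r][c]], [c]) for c in range(cols)]
    by_col = [s for s in by_col if s[0]]
    # One submatrix per row: ([row], columns holding a true in that row).
    by_row = [([r], [c for c in range(cols) if m[r][c]]) for r in range(rows)]
    by_row = [s for s in by_row if s[1]]
    return by_row if len(by_row) <= len(by_col) else by_col

m = [(0, 1, 1), (0, 1, 0)]
cover = trivial_cover(m)
print(cover)
```

Each submatrix is a (row indices, column indices) pair containing only true cells, so the cover is exact; its size is a useful upper bound against which any smarter heuristic can be compared.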

Pandas Series correlation against a single vector

I have a DataFrame with a list of arrays as one column.
import pandas as pd
v = [1, 2, 3, 4, 5, 6, 7]
v1 = [1, 0, 0, 0, 0, 0, 0]
v2 = [0, 1, 0, 0, 1, 0, 0]
v3 = [1, 1, 0, 0, 0, 0, 1]
df = pd.DataFrame({'A': [v1, v2, v3]})
print(df)
Output:
A
0 [1, 0, 0, 0, 0, 0, 0]
1 [0, 1, 0, 0, 1, 0, 0]
2 [1, 1, 0, 0, 0, 0, 1]
I want to do a pd.Series.corr for each row of df.A against the single vector v.
I'm currently doing a loop on df.A and achieving it. It is very slow.
Expected Output:
A B
0 [1, 0, 0, 0, 0, 0, 0] -0.612372
1 [0, 1, 0, 0, 1, 0, 0] -0.158114
2 [1, 1, 0, 0, 0, 0, 1] -0.288675
Here's one using the correlation definition with NumPy tools meant for performance with corr2_coeff_rowwise -
a = np.array(df.A.tolist()) # or np.vstack(df.A.values)
df['B'] = corr2_coeff_rowwise(a, np.asarray(v)[None])
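The definition of corr2_coeff_rowwise is not reproduced in this thread. As a rough sketch of what a row-wise Pearson correlation against a single vector could look like with NumPy (the name rowwise_corr and its signature are assumptions for illustration, not the referenced helper):

```python
import numpy as np

def rowwise_corr(a, v):
    """Pearson correlation of each row of a (n x k) against the vector v (length k)."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Center each row and the vector around their own means.
    a_c = a - a.mean(axis=1, keepdims=True)
    v_c = v - v.mean()
    # Numerator: one dot product per row; denominator: product of the norms.
    num = a_c @ v_c
    den = np.sqrt((a_c ** 2).sum(axis=1) * (v_c ** 2).sum())
    return num / den

v = [1, 2, 3, 4, 5, 6, 7]
rows = [[1, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0, 0],
        [1, 1, 0, 0, 0, 0, 1]]
print(rowwise_corr(rows, v))
```

On the question's data this should reproduce the expected column B. A constant row has zero variance, so the division yields nan for it, just as a correlation is undefined there.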
Runtime test -
Case #1 : 1000 rows
In [59]: df = pd.DataFrame({'A': [np.random.randint(0,9,(7)) for i in range(1000)]})
In [60]: v = np.random.randint(0,9,(7)).tolist()
# @jezrael's soln
In [61]: %timeit df['new'] = pd.DataFrame(df['A'].values.tolist()).corrwith(pd.Series(v), axis=1)
10 loops, best of 3: 142 ms per loop
In [62]: %timeit df['B'] = corr2_coeff_rowwise(np.array(df.A.tolist()), np.asarray(v)[None])
1000 loops, best of 3: 461 µs per loop
Case #2 : 10000 rows
In [63]: df = pd.DataFrame({'A': [np.random.randint(0,9,(7)) for i in range(10000)]})
In [64]: v = np.random.randint(0,9,(7)).tolist()
# @jezrael's soln
In [65]: %timeit df['new'] = pd.DataFrame(df['A'].values.tolist()).corrwith(pd.Series(v), axis=1)
1 loop, best of 3: 1.38 s per loop
In [66]: %timeit df['B'] = corr2_coeff_rowwise(np.array(df.A.tolist()), np.asarray(v)[None])
100 loops, best of 3: 3.05 ms per loop
Use corrwith, but if performance is important, Divakar's answer should be faster:
df['new'] = pd.DataFrame(df['A'].values.tolist()).corrwith(pd.Series(v), axis=1)
print (df)
A new
0 [1, 0, 0, 0, 0, 0, 0] -0.612372
1 [0, 1, 0, 0, 1, 0, 0] -0.158114
2 [1, 1, 0, 0, 0, 0, 1] -0.288675

Use Ruby to Truncate duplicate patterns in an Array

For example, I have
tt = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
and I would like to slim it down to
tt_out = [0, 1, 1, 2, 2, 1, 1, 0, 0]
I'd also like to know where each repetition begins and ends, hence I'd like to have the following tip
tip = '0','1.','.5','6.','.11','12.','.15','16.','.20'
tt = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
tip = []
tt_out = tt.map.with_index{|t, i|
  start_range = (i==0 || tt[i-1] != tt[i])
  end_range = (tt[i+1] != tt[i])
  if start_range && end_range
    tip << "#{i}"
  elsif start_range
    tip << "#{i}."
  elsif end_range
    tip << ".#{i}"
  end
  t if start_range || end_range
}.compact
tip
=> ["0", "1.", ".5", "6.", ".11", "12.", ".15", "16.", ".20"]
tt_out
=> [0, 1, 1, 2, 2, 1, 1, 0, 0]
P.S: You've got an error in your example, the last element of tip should be '.20'

Select Range of SQLDateTimes

table = {{ID1, SQLDateTime[{1978, 1, 10, 0, 0, 0.`}]},
{ID2, SQLDateTime[{1999, 1, 10, 0, 0, 0.`}]},
{ID3, SQLDateTime[{2010, 9, 10, 0, 0, 0.`}]},
{ID4, SQLDateTime[{2011, 1, 10, 0, 0, 0.`}]}}
I'd like to return all cases in table in which the SQLDateTime is within the last year (DatePlus[{-1, "Year"}]). How do I specify a search for those cases?
You could also use DateDifference:
Cases[table, {a_, SQLDateTime[b_]} /;
DateDifference[b, DateList[], "Year"][[1]] <= 1]
Select[table, (AbsoluteTime[ DatePlus[{-1, "Year"}]] <=
AbsoluteTime[ #[[2, 1]]] <= AbsoluteTime[ ] &)]
(* ==> {{ID3, SQLDateTime[{2010, 9, 10, 0, 0, 0.}]},
{ID4,SQLDateTime[{2011, 1, 10, 0, 0, 0.}]}
}
*)
Small update (pre-caching of Date[], based on Leonid's comments):
With[
{date = Date[]},
Select[table,
(AbsoluteTime[ DatePlus[date, {-1, "Year"}]] <=
AbsoluteTime[ #[[2, 1]]] <= AbsoluteTime[date ] &)]
]
This also removes a problem with the original DatePlus[{-1, "Year"}] which only takes today's date into account and not the current time.

how many ways are there to see if a number is even, and which one is the fastest and clearest?

Given any number, what's the best way to determine whether it is even? How many methods can you think of, and which is the fastest and which is the clearest?
bool isEven = ((number & 0x01) == 0)
The question said "any number", so one could either discard floats or handle them in another manner, perhaps by scaling them up to an integral value first (watching out for overflow), i.e. change 2.1 to 21 (multiply by 10 and convert to int) and then test. It may be reasonable to assume, however, that by mentioning "any number" the person who posed the question is actually referring to integral values.
bool isEven = number % 2 == 0;
isEven(n) = ((-1) ^ n) == 1
where ^ is the exponentiation/pow function of your language.
I didn't say it was fast or clear, but it has novelty value.
The answer depends on the position being applied for. If you're applying for an Enterprise Architect position, then the following may be suitable:
First, you should create a proper Service-Oriented Architecture, as certainly the even-odd service won't be the only reusable component in your enterprise. An SOA consists of a service, interface, and service consumers. The service is a function which can be invoked over the network. It exposes an interface contract and is typically registered with a Directory Service.
You can then create a Simple Object Access Protocol (SOAP) HTTP Web Service to expose your service.
Next, you should prevent clients from directly calling your Web Service. If you allow this, then you will end up with a mess of point-to-point communication, which is very hard to maintain. Clients should access the Web Service through an Enterprise Service Bus (ESB).
In addition to providing a standard plug-able architecture, additional components like service orchestration can occur on the bus.
Generally, writing a bespoke even/odd service should be avoided. You should write a Request for Proposal (RFP), and get several vendors to show you their even/odd service. The vendor's product should be able to plug into your ESB, and also provide you with a Service Level Agreement (SLA).
This is even easier in ruby:
isEven = number.even?
Yes.. The fastest way is to check the 1 bit, because it is set for all odd numbers and unset for all even numbers..
Bitwise ANDs are pretty fast.
If your type 'a' is an integral type, then we can define,
even :: Integral a => a -> Bool
even n = n `rem` 2 == 0
according to the Haskell Prelude.
For floating points, of course within a reasonable bound.
fracpart = modf(n/2.0, &intpart);
return fracpart == 0.0;
With some other random math functions:
return gcd(n,2) == 2
return lcm(n,2) == n
return cos(n*pi) == 1.0
If int is 32 bits then you could do this:
bool is_even = ((number << 31) >> 31) == 0;
With using bit shifts you'll shift the right-most bit to the left-most position and then back again, thus making all other bits 0's. Then the number you're left with is either 0 or 1. This method is somewhat similar to "number & 1" method where you again make all bits 0's except the first one.
Another approach, similar to this one is this:
bool is_even = (number << 31) == 0;
or
bool is_odd = (number << 31) < 0;
If the number is even (the right-most bit is 0), then shifting it 31 positions will make the whole number 0. If the bit is 1, i.e. the number is odd, then the resulting number would be negative (every integer with left-most bit 1 is negative except if the number is of type unsigned, where it won't work). To fix signed/unsigned bug, you can just test:
bool is_odd = (number << 31) != 0;
Actually I think (n % 2 == 0) is enough, which is easy to understand and most compilers will convert it to bit operations as well.
I compiled this program with gcc -O2 flag:
#include <stdio.h>
int main()
{
    volatile int x = 310;
    printf("%d\n", x % 2);
    return 0;
}
and the generated assembly code is
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl $310, 28(%esp)
movl 28(%esp), %eax
movl $.LC0, (%esp)
movl %eax, %edx
shrl $31, %edx
addl %edx, %eax
andl $1, %eax
subl %edx, %eax
movl %eax, 4(%esp)
call printf
xorl %eax, %eax
leave
ret
where we can see that the % 2 operation has already been converted to an andl instruction.
Similar to DeadHead's comment, but more efficient:
#include <limits.h>
bool isEven(int num)
{
bool arr[UINT_MAX] = { 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
// ...and so on
};
return arr[num];
}
As fast as an array index, which may or may not be faster than bitwise computations (it's difficult to test because I don't want to write the full version of this function). For what it's worth, that function above only has enough filled in to find even numbers up to 442, but would have to go to 4294967295 to work on my system.
With reservations for limited stack space. ;) (Is this perhaps a candidate for tail calls?)
public static bool IsEven(int num) {
    if (num < 0)
        return !IsEven(-num - 1);
    if (num == 0)
        return true;
    return IsEven(-num);
}
a % 2.
It's clear
It's fast on every decent compiler.
Everyone who cries "But! But! What if compiler doesn't optimize it" should find normal compiler, shut up and read about premature optimization, read again, read again.
If it's low level check if the last (LSB) bit is 0 or 1 :)
0 = Even
1 = Odd
Otherwise, +1 @sipwiz: "bool isEven = number % 2 == 0;"
Assuming that you are dealing with an integer, the following will work:
if ((testnumber & -2)==testnumber) then testnumber is even.
Basically, -2 in hex is FFFE (for 16 bits); if the number is even, then ANDing it with -2 will leave it unchanged.
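As a quick check of the masking idea, here is a small Python sketch. The function name is_even_by_mask is hypothetical. Python integers behave like two's complement values of unlimited width, so -2 has every bit set except the lowest.

```python
def is_even_by_mask(n: int) -> bool:
    # ANDing with -2 clears only bit 0, so the value is unchanged exactly when n is even.
    return (n & -2) == n

for n in (-4, -3, 0, 7, 10):
    print(n, is_even_by_mask(n))
```

In a fixed-width language the same test works for two's complement integers; the result for other signed representations is not covered by this sketch.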
You can either use integer division to divide it by two and inspect the remainder, or use a modulus operator to mod it by two and inspect the remainder. The "fastest" way depends on the language, compiler, and other factors, but I doubt there are many platforms for which there is a significant difference.
Recursion!
function is_even (n number) returns boolean is
    if n = 0 then
        return true
    elsif n = 1 then
        return false
    elsif n < 0 then
        return is_even(n * -1)
    else
        return is_even(n - 2)
    end if
end
Continuing the spirit of "how many ways are there...":
function is_even (n positive_integer) returns boolean is
    i := 0
    j := 0
    loop
        if n = i then
            return (j = 0)
        end if;
        i := i + 1
        j := 1 - j
    end loop
end
In response to Chris Lutz, an array lookup is significantly slower than a BITWISE_AND operation. In an array lookup you're doing a memory lookup which will always be slower than a bitwise operation because of memory latency. This of course doesn't even factor in the problem of putting all possible int values into your array which has a memory complexity of O(2^n) where n is your bus size (8,16,32,64).
The odd/even property is only defined in integers. So any answer dealing with floating point is invalid. The abstract representation of this problem is Int -> bool (to use Haskell notation).
Another useless novelty solution:
if (2 * (n/2) == n)
return true;
else
return false;
Only with integers, and it depends on how the language handles integer division.
n/2 == n/2 if it's even or n/2-.5 if it's odd.
So 2*(n/2) == n if it's even or n - 1 if it's odd.
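To make the dependence on integer division concrete, here is how the same test might look in Python, where // is floor division. The function name is_even_by_halving is hypothetical.

```python
def is_even_by_halving(n: int) -> bool:
    # n // 2 rounds toward negative infinity, so doubling it recovers n only for even n.
    return 2 * (n // 2) == n

for n in (-4, -3, 0, 7, 10):
    print(n, is_even_by_halving(n))
```

For odd n the doubled quotient is n - 1, matching the reasoning above, and this holds for negative n as well because of the flooring. Using true division (/) instead would return a float and make the test pass for every n.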
Here's a recursive way to do it in python:
def is_even(n: int) -> bool:
    if n == 0:
        return True
    else:
        return is_odd(n-1)
def is_odd(n: int) -> bool:
    if n == 0:
        return False
    else:
        return is_even(n-1)
Of course, you can add in logic to check if n is negative as well.
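One way the negative case might be handled, as a sketch rather than the answerer's code, is to take the absolute value before each recursive step:

```python
def is_even(n: int) -> bool:
    # abs() folds negative inputs onto the non-negative case before stepping down.
    return True if n == 0 else is_odd(abs(n) - 1)

def is_odd(n: int) -> bool:
    return False if n == 0 else is_even(abs(n) - 1)

print(is_even(-4), is_even(-3))
```

Like the original, this recurses once per unit of |n|, so large inputs will hit Python's recursion limit; it is a novelty, not a practical test.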
