How do I use groupby and mean in a sparse matrix? - matrix

Hi everyone, I have a question that has puzzled me. I hope you can help. Thank you in advance.
I have a sparse matrix; it looks like this:
(2, 0) 1.3862943611198906
(7, 0) 1.0986122886681096
(27, 0) 1.0986122886681096
(29, 0) 1.0986122886681096
(31, 0) 0.6931471805599453
(37, 0) 0.6931471805599453
(41, 0) 3.1780538303479458
(42, 0) 1.0986122886681096
(45, 0) 1.3862943611198906
(47, 0) 0.6931471805599453
(51, 0) 0.6931471805599453
(52, 0) 0.6931471805599453
(55, 0) 1.0986122886681096
(56, 0) 0.6931471805599453
(60, 0) 2.0794415416798357
(62, 0) 2.639057329615259
(64, 0) 2.0794415416798357
(65, 0) 2.3978952727983707
(66, 0) 1.3862943611198906
(69, 0) 1.0986122886681096
(70, 0) 0.6931471805599453
(72, 0) 0.6931471805599453
(73, 0) 1.6094379124341003
(77, 0) 0.6931471805599453
(78, 0) 1.3862943611198906
: :
(19669, 65535) 0.6931471805599453
(19670, 65535) 1.0986122886681096
(19671, 65535) 0.6931471805599453
(19675, 65535) 0.6931471805599453
(19677, 65535) 0.6931471805599453
(19678, 65535) 1.0986122886681096
(19686, 65535) 1.3862943611198906
(19687, 65535) 0.6931471805599453
(19688, 65535) 2.0794415416798357
(19689, 65535) 1.6094379124341003
(19690, 65535) 2.1972245773362196
(19691, 65535) 0.6931471805599453
(19692, 65535) 0.6931471805599453
(19693, 65535) 0.6931471805599453
(19694, 65535) 0.6931471805599453
(19695, 65535) 2.5649493574615367
(19696, 65535) 2.639057329615259
(19697, 65535) 0.6931471805599453
(19698, 65535) 2.3978952727983707
(19699, 65535) 1.6094379124341003
(19700, 65535) 1.791759469228055
(19701, 65535) 1.791759469228055
(19702, 65535) 1.0986122886681096
(19703, 65535) 1.6094379124341003
(19704, 65535) 3.5263605246161616
(19706, 65536)
Now I have another column, which I call "A"; it looks like this:
gene_id
0 ENSMUSG00000031144
1 ENSMUSG00000031144
2 ENSMUSG00000031144
3 ENSMUSG00000031144
4 ENSMUSG00000031144
5 ENSMUSG00000031144
6 ENSMUSG00000031155
7 ENSMUSG00000031155
8 ENSMUSG00000031155
9 ENSMUSG00000031155
10 ENSMUSG00000031155
11 ENSMUSG00000031161
12 ENSMUSG00000031161
13 ENSMUSG00000031161
14 ENSMUSG00000031161
15 ENSMUSG00000031161
16 ENSMUSG00000031161
17 ENSMUSG00000031161
18 ENSMUSG00000031161
19 ENSMUSG00000031161
20 ENSMUSG00000031161
21 ENSMUSG00000031161
22 ENSMUSG00000031161
23 ENSMUSG00000031161
24 ENSMUSG00000031161
25 ENSMUSG00000031161
26 ENSMUSG00000031161
27 ENSMUSG00000031161
28 ENSMUSG00000031161
29 ENSMUSG00000031161
... ...
19676 ENSMUSG00000042532
19677 ENSMUSG00000042532
19678 ENSMUSG00000042532
19679 ENSMUSG00000042532
19680 ENSMUSG00000042532
19681 ENSMUSG00000025196
19682 ENSMUSG00000025196
19683 ENSMUSG00000025196
19684 ENSMUSG00000025196
19685 ENSMUSG00000025196
19686 ENSMUSG00000025196
19687 ENSMUSG00000025036
19688 ENSMUSG00000025025
19689 ENSMUSG00000025025
19690 ENSMUSG00000025025
19691 ENSMUSG00000025025
19692 ENSMUSG00000025025
19693 ENSMUSG00000025025
19694 ENSMUSG00000025025
19695 ENSMUSG00000024985
19696 ENSMUSG00000024985
19697 ENSMUSG00000024985
19698 ENSMUSG00000024985
19699 ENSMUSG00000024985
19700 ENSMUSG00000024985
19701 ENSMUSG00000024985
19702 ENSMUSG00000025075
19703 ENSMUSG00000025089
19704 ENSMUSG00000025089
19705 ENSMUSG00000025089
[19706 rows x 1 columns]
They have the same number of rows. I want to group the rows by "A" and then get the mean of every column of the sparse matrix within each group. How can I do that? Thank you in advance.
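One possible approach, sketched here under assumptions rather than taken from the thread: build a sparse group-indicator matrix from the labels and multiply it by the data, so the data never has to be made dense. The function name `group_mean` is hypothetical, the matrix is assumed to be a SciPy sparse matrix, and the labels are assumed to be a 1-D sequence with one entry per row (for example the `gene_id` column converted to an array).

```python
import numpy as np
from scipy import sparse


def group_mean(X, labels):
    """Column-wise mean of sparse X within each group of rows.

    Implicit zeros count as zeros, as they would in a dense groupby-mean.
    """
    groups, codes = np.unique(np.asarray(labels), return_inverse=True)
    codes = codes.ravel()
    n_rows = X.shape[0]
    # Indicator matrix G: G[g, i] = 1 when row i belongs to group g.
    G = sparse.csr_matrix(
        (np.ones(n_rows), (codes, np.arange(n_rows))),
        shape=(len(groups), n_rows),
    )
    counts = np.asarray(G.sum(axis=1)).ravel()
    # Scaling each row of G by 1/count turns the group sums into means.
    scale = sparse.diags(1.0 / counts)
    return groups, (scale @ G @ X).tocsr()


# Tiny illustrative input (made-up values, not the data from the question).
X = sparse.csr_matrix(np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]))
groups, means = group_mean(X, ["a", "a", "b"])
print(groups)
print(means.toarray())
```

The result has one row per distinct label, in the sorted order returned by `np.unique`, and stays sparse throughout.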

Related

Algorithm to generate incremental numbers up to a fixed number using a set of given numbers

We have a requirement where we want to generate incremental numbers until a target number using a given set of numbers.
(The given set of numbers may vary, and the algorithm needs to work for any set of numbers. Though, practically we are expecting the set to contain up to 15 numbers.)
Assumptions: All the numbers in the given set are assumed to be integers.
By incremental numbers I mean all the numbers possible by adding multiples of a given set of numbers.
E.g. Let's say we have a set {12, 13, 17} and the target number is 50. Then the incremental numbers are {0, 12, 13, 17, 24, 25, 26, 29, 30, 34, 36, 37, 38, 39, 41, 42, 43, 46, 47, 48, 49, 50}
(Explanation for the above incremental numbers:
0 = (12 * 0) + (13 * 0) + (17 * 0)
12 = (12 * 1) + (13 * 0) + (17 * 0)
13 = (12 * 0) + (13 * 1) + (17 * 0)
17 = (12 * 0) + (13 * 0) + (17 * 1)
24 = (12 * 2) + (13 * 0) + (17 * 0)
25 = (12 * 1) + (13 * 1) + (17 * 0)
26 = (12 * 0) + (13 * 2) + (17 * 0)
29 = (12 * 1) + (13 * 0) + (17 * 1)
30 = (12 * 0) + (13 * 1) + (17 * 1)
34 = (12 * 0) + (13 * 0) + (17 * 2)
36 = (12 * 3) + (13 * 0) + (17 * 0)
37 = (12 * 2) + (13 * 1) + (17 * 0)
38 = (12 * 1) + (13 * 2) + (17 * 0)
39 = (12 * 0) + (13 * 3) + (17 * 0)
41 = (12 * 2) + (13 * 0) + (17 * 1)
42 = (12 * 1) + (13 * 1) + (17 * 1)
43 = (12 * 0) + (13 * 2) + (17 * 1)
46 = (12 * 1) + (13 * 0) + (17 * 2)
47 = (12 * 0) + (13 * 1) + (17 * 2)
48 = (12 * 4) + (13 * 0) + (17 * 0)
49 = (12 * 3) + (13 * 1) + (17 * 0)
50 = (12 * 2) + (13 * 2) + (17 * 0)
)
Could anybody help me with an optimized solution for this?
It's hard to do much better than a brute-force approach here. You can discard any number that is divisible by another number in the list. Then, you can use the following algorithm:
lst = [12, 13, 17]
target = 50
sums = {0}
for i in lst:
    new_sums = set()
    for j in sums:
        x = j + i
        while x <= target and x not in new_sums:
            new_sums.add(x)
            x += i
    sums = sums.union(new_sums)
print(sums)
# {0, 12, 13, 17, 24, 25, 26, 29, 30, 34, 36, 37, 38, 39, 41, 42, 43, 46, 47, 48, 49, 50}
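An alternative formulation, offered only as a sketch and not as the answerer's method, is a reachability table indexed by sum: a value s is reachable when s minus some number in the set is reachable. The function name `reachable_sums` is hypothetical, and the numbers are assumed to be positive integers.

```python
def reachable_sums(numbers, target):
    """Return every sum <= target that is a non-negative integer
    combination of `numbers` (assumed to be positive integers)."""
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty combination
    for s in range(1, target + 1):
        # s is reachable if removing one of the numbers leaves a reachable sum
        reachable[s] = any(n <= s and reachable[s - n] for n in numbers)
    return [s for s, ok in enumerate(reachable) if ok]


print(reachable_sums([12, 13, 17], 50))
```

This does work proportional to target times the size of the set and returns the sums already sorted.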

Does Spark support subqueries? [duplicate]

This question already has answers here:
Does SparkSQL support subquery?
(2 answers)
Closed 6 years ago.
When I run this query, I get the following error:
select * from raw_2 where ip NOT IN (select * from raw_1);
org.apache.spark.sql.AnalysisException:
Unsupported language features in query:
select * from raw_2 where ip NOT IN (select * from raw_1)
TOK_QUERY 1, 0,24, 14
TOK_FROM 1, 4,6, 14
TOK_TABREF 1, 6,6, 14
TOK_TABNAME 1, 6,6, 14
raw_2 1, 6,6, 14
TOK_INSERT 0, -1,24, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 0, 0,2, 0
TOK_SELEXPR 0, 2,2, 0
TOK_ALLCOLREF 0, 2,2, 0
TOK_WHERE 1, 8,24, 29
NOT 1, 10,24, 29
TOK_SUBQUERY_EXPR 1, 14,10, 33
TOK_SUBQUERY_OP 1, 14,14, 33
IN 1, 14,14, 33
TOK_QUERY 1, 16,24, 51
TOK_FROM 1, 21,23, 51
TOK_TABREF 1, 23,23, 51
TOK_TABNAME 1, 23,23, 51
raw_1 1, 23,23, 51
TOK_INSERT 0, -1,19, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 0, 17,19, 0
TOK_SELEXPR 0, 19,19, 0
TOK_ALLCOLREF 0, 19,19, 0
TOK_TABLE_OR_COL 1, 10,10, 26
ip 1, 10,10, 26
scala.NotImplementedError: No parse rules for ASTNode type: 817, text:
TOK_SUBQUERY_EXPR :
TOK_SUBQUERY_EXPR 1, 14,10, 33
TOK_SUBQUERY_OP 1, 14,14, 33
IN 1, 14,14, 33
TOK_QUERY 1, 16,24, 51
TOK_FROM 1, 21,23, 51
TOK_
Spark 2.0.0+:
Since 2.0.0, Spark supports a full range of subqueries. See Does SparkSQL support subquery? for details.
Spark < 2.0.0:
Does Spark support subqueries?
Generally speaking, it does. Constructs like SELECT * FROM (SELECT * FROM foo WHERE bar = 1) AS tmp are perfectly valid queries in Spark SQL.
As far as I can tell from the Catalyst parser source it doesn't support inner queries in a NOT IN clause:
| termExpression ~ (NOT ~ IN ~ "(" ~> rep1sep(termExpression, ",")) <~ ")" ^^ {
    case e1 ~ e2 => Not(In(e1, e2))
  }
It is still possible to use an outer join followed by a filter to obtain the same effect.
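As a rough illustration of that rewrite, the sketch below runs the join-plus-filter form of the query against an in-memory SQLite database. SQLite is only a stand-in so the example is runnable without a cluster; it is not Spark. The column name `ip` in raw_1 and the sample rows are assumptions, since the question does not show the table schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_1 (ip TEXT);
    CREATE TABLE raw_2 (ip TEXT);
    INSERT INTO raw_1 VALUES ('10.0.0.1'), ('10.0.0.2');
    INSERT INTO raw_2 VALUES ('10.0.0.1'), ('10.0.0.3'), ('10.0.0.4');
""")

# Equivalent of: SELECT * FROM raw_2 WHERE ip NOT IN (SELECT ip FROM raw_1)
# Rows of raw_2 with no match in raw_1 get NULL for raw_1.ip after the
# outer join, so the filter keeps exactly those rows.
anti_join = """
    SELECT raw_2.*
    FROM raw_2
    LEFT OUTER JOIN raw_1 ON raw_2.ip = raw_1.ip
    WHERE raw_1.ip IS NULL
    ORDER BY raw_2.ip
"""
rows = [r[0] for r in conn.execute(anti_join)]
print(rows)
```

Note that the two forms differ when raw_1.ip contains NULLs: in standard SQL, NOT IN then matches no rows, while the join form still returns the unmatched ones.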

Wumpus game's make-city-edges function causes heap overflow

Going through the Land of Lisp book, I got to the Grand Theft Wumpus game, which has me define a make-city-edges function. When I try to run it, however, SBCL hangs for a while before giving me a very nasty error saying
Heap exhausted during garbage collection: 0 bytes available, 16 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 10737418 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 10737418 0 0 0.0000
2: 27757 0 0 0 19204 70 0 10 54 631392704 505408 2000000 0 0 0.9800
3: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
4: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
6: 0 0 0 0 1638 251 0 0 0 61898752 0 2000000 1523 0 0.0000
Total bytes allocated = 1073069936
Dynamic-space-size bytes = 1073741824
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 85448(tid 140735276667664):
Heap exhausted, game over.
Error opening /dev/tty: Device not configured
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
I've triple-checked to see if I made any mistake, but I couldn't find any.
Here's the function causing the problem:
(defun make-city-edges ()
  (let* ((nodes (loop for i from 1 to *node-num*
                      collect i))
         (edge-list (connect-all-islands nodes (make-edge-list)))
         (cops (remove-if-not (lambda (x)
                                (zerop (random *cop-odds*)))
                              edge-list)))
    (add-cops (edges-to-alist edge-list) cops)))
[here] is the rest of the code if you want to have a look at the other functions, I added it to a GitHub Gist page since it would take up too much space in the question.
What can I do to resolve this? I'm using Emacs 24.4 (9.0) on OSX 10.9 with SLIME and SBCL 1.2.10 for the project.
In the linked code,
(defun find-islands (nodes edge-list)
  "returns a list of nodes that aren't interconnected"
  (let ((islands nil))
    (labels ((find-island (nodes)
               (let* ((connected (get-connected (car nodes) edge-list))
                      (unconnected (set-difference nodes connected)))
                 (push connected islands)
                 (when connected
                   (find-island unconnected)))))
      (find-island nodes))
    islands))
(when connected should be (when unconnected.
A few tips for debugging heap exhaustion:
Check that your loops and recursions actually terminate. (That's what led us to this solution -- get-connected never returns nil, so find-island would recurse forever.)
CL's trace can be useful, as well as the traditional adding of print statements.
C-c C-c in SLIME after the program has run for a bit but before heap exhaustion might provide a useful backtrace.
E.g. of the backtrace:
0: ((:INTERNAL TRAVERSE GET-CONNECTED) NIL)
Locals:
NODE = NIL
#:G11908 = ((2 . 21) (20 . 22) (22 . 20) (9 . 28) (28 . 9) (2 . 7) ...)
EDGE-LIST = ((8 . 3) (3 . 8) (18 . 7) (7 . 18) (26 . 23) (23 . 26) ...)
VISITED = (NIL)
1: (GET-CONNECTED NIL ((8 . 3) (3 . 8) (18 . 7) (7 . 18) (26 . 23) (23 . 26) ...))
Locals:
NODE = NIL
EDGE-LIST = ((8 . 3) (3 . 8) (18 . 7) (7 . 18) (26 . 23) (23 . 26) ...)
VISITED = (NIL)
2: ((:INTERNAL FIND-ISLAND FIND-ISLANDS) NIL)
Locals:
NODES = NIL
ISLANDS = ((NIL) (NIL) (NIL) (NIL) (NIL) (NIL) ...)
EDGE-LIST = ((8 . 3) (3 . 8) (18 . 7) (7 . 18) (26 . 23) (23 . 26) ...)
3: (FIND-ISLANDS (1 2 3 4 5 6 ...) ((8 . 3) (3 . 8) (18 . 7) (7 . 18) (26 . 23) (23 . 26) ...))
Locals:
NODES = (1 2 3 4 5 6 ...)
EDGE-LIST = ((8 . 3) (3 . 8) (18 . 7) (7 . 18) (26 . 23) (23 . 26) ...)
ISLANDS = ((NIL) (NIL) (NIL) (NIL) (NIL) (NIL) ...)
That might lead us to say "I didn't think a node would ever be nil, and islands being ((nil) (nil) (nil) ...) seems broken."
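To make the termination condition concrete, here is a small Python sketch of the same island-finding idea. It is an illustration only, not the book's code: the function names mirror the Lisp ones, and edges are assumed to be (from, to) pairs listed in both directions. The loop continues while unvisited nodes remain, which plays the role of the corrected (when unconnected test.

```python
def get_connected(node, edges):
    """All nodes reachable from `node`, including `node` itself,
    so the result is never empty."""
    visited = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n in visited:
            continue
        visited.add(n)
        stack.extend(b for a, b in edges if a == n)
    return visited


def find_islands(nodes, edges):
    """Split `nodes` into groups that are not connected to each other."""
    islands = []
    remaining = set(nodes)
    # Stop when no nodes are left. Testing `connected` instead would never
    # stop, because get_connected always returns at least one node.
    while remaining:
        connected = get_connected(next(iter(remaining)), edges)
        islands.append(connected)
        remaining -= connected
    return islands


edges = [(1, 2), (2, 1), (3, 4), (4, 3)]
print(sorted(sorted(island) for island in find_islands(range(1, 6), edges)))
```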

Search for the first occurrence of "OK" in a text file and extract the first 2 characters of the same line to save as a variable in UFT/VB scripting

I've been trying to find a proper solution to my problem for several days now, looking everywhere. Hopefully some of you can point me in the right direction.
I need to find the string "OK" in a text file and, if I find it, extract the first 2 characters of the same line and save them in a variable.
Here is an example of the lines you can find in this text file:
Debugger
--------------
>h state 2
Health thread state is: POLLING
Health Devices:
Sensor Name State Eval RED Value ( D , M ) Link Active Grp Description
11 ( 2) TEMP ( 1) OK 1 1 21 ( 1, 0) 0xff 0000 0 01-Inlet Ambient (X:1 y:1)
12 ( 2) TEMP ( 1) OK 0 1 40 ( 1, 0) 0xff 0000 0 02-CPU 1 (X:11 y:5)
13 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 1) 0xff 0000 0 04-P1 DIMM 1-6 (X:14 y:5)
14 ( 2) TEMP ( 1) OK 0 1 24 ( 1, 0) 0xff r0000 0 05-P1 DIMM 7-12 (X:9 y:5)
15 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 1) 0xff 0000 0 06-P2 DIMM 1-6 (X:6 y:5)
16 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 0) 0xff 0000 0 07-P2 DIMM 7-12 (X:1 y:5)
17 ( 2) TEMP ( 1) OK 0 1 35 ( 1, 0) 0xff 0000 0 08-HD Max (X:2 y:3)
18 ( 2) TEMP ( 1) OK 0 1 38 ( 1, 0) 0xff 0000 0 10-Chipset (X:13 y:10)
19 ( 2) TEMP ( 1) OK 0 1 24 ( 1, 0) 0xff 0000 0 11-PS 1 Inlet (X:1 y:14)
20 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 0) 0xff 0000 0 12-PS 2 Inlet (X:4 y:14)
21 ( 2) TEMP ( 1) OK 0 1 32 ( 1, 0) 0xff 0000 0 13-VR P1 (X:10 y:1)
22 ( 2) TEMP ( 1) OK 0 1 28 ( 1, 0) 0xff 0000 0 15-VR P1 Mem (X:13 y:1)
23 ( 2) TEMP ( 1) OK 0 1 27 ( 1, 0) 0xff 0000 0 16-VR P1 Mem (X:9 y:1)
24 ( 2) TEMP ( 1) OK 0 1 40 ( 1, 0) 0xff 0000 0 19-PS 1 Internal (X:8 y:1)
25 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 0) 0xff 0000 0 20-PS 2 Internal (X:1 y:8)
26 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 0) 0xff 0000 0 21-PCI 1 (X:5 y:12)
27 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 0) 0xff 0000 0 22-PCI 2 (X:11 y:12)
28 ( 2) TEMP ( 2) MISSING 0 1 0 ( 0, 0) 0xff 0000 0
Using the following code I can extract the first 2 characters of a line, but I need to extract them from the line where "OK" first occurs:
strLine = objTextFile.ReadLine
objTextFile.Close
'Gets first 2 chars
SerNum = Left(strLine, 2)
Looking for help with this. Thanks in advance.
My unfinished vbscript:
Const ForReading = 1
Dim strSearchFor
strSearchFor = "OK"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile("C:\myFile.txt", ForReading)
For i = 0 to 20
    strLine = objTextFile.ReadLine()
    If InStr(strLine, strSearchFor) > 0 Then
        SensorNumb = Left(strLine, 2)
        Exit For
    End If
Next
Final Code :
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile("C:\myFile.txt", ForReading)
' Read until the first line containing "OK" or the end of the file
Do Until objTextFile.AtEndOfStream
    strLine = objTextFile.ReadLine()
    If InStr(strLine, "OK") > 0 Then
        SensorNumb = Left(strLine, 2)
        Exit Do
    End If
Loop
' Close the file whether or not a match was found
objTextFile.Close
Basically what you want to do is:
read the file line by line
Find a substring using InStr
Print the first two chars Mid(str, 1, 2)
You should be able to chain these together yourself.
This Python script should work for you:
with open("MY_FILENAME") as f:
    for line in f:
        if "OK" in line:
            # first two characters of the first line containing "OK"
            print(line[:2])
            break
Save this as a *.py file and execute it with python path/to/python/file

matrix enlargement in vhdl

I have a matrix (size n x m) in MATLAB, and I want to enlarge it to size (n+2x, m+2y) by adding zeros around the original n x m matrix.
Example:
original 2x2 matrix
1 2
3 4
New 4x4 matrix [0 0 0 0; 0 1 2 0; 0 3 4 0; 0 0 0 0]
0 0 0 0
0 1 2 0
0 3 4 0
0 0 0 0
How can I do it in vhdl?
type matrix is array(natural range <>, natural range <>) of natural;
signal matrix2x2 : matrix(0 to 1, 0 to 1);
signal matrix6x4 : matrix(0 to 5, 0 to 3);
function expand_matrix(m : matrix; x : natural; y : natural) return matrix is
  constant ROWS : natural := m'length(1) + 2 * y;
  constant COLS : natural := m'length(2) + 2 * x;
  variable mx : matrix(0 to ROWS-1, 0 to COLS-1) := (others => (others => 0));
begin
  for r in y to y + m'length(1) - 1 loop
    for c in x to x + m'length(2) - 1 loop
      mx(r, c) := m(r-y, c-x);
    end loop;
  end loop;
  return mx;
end function;
begin
  matrix2x2 <= ((1, 2), (3, 4));
  matrix6x4 <= expand_matrix(matrix2x2, 1, 2);
For simplicity I made the matrix type a 2D array of natural. If you intend to eventually synthesize this, you will be better off changing to a more constrained integer type or using the signed/unsigned vector types with only the number of bits you need.
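For checking simulation results, a software reference model of the same padding can be handy. The following Python sketch mirrors the expand_matrix logic above with nested lists; it is an illustrative model under the same convention (x zero columns added on each side, y zero rows added above and below), not part of the VHDL answer.

```python
def expand_matrix(m, x, y):
    """Surround matrix `m` (a list of equal-length rows) with zeros:
    x columns on the left and right, y rows above and below."""
    rows, cols = len(m), len(m[0])
    out = [[0] * (cols + 2 * x) for _ in range(rows + 2 * y)]
    for r in range(rows):
        for c in range(cols):
            # shift each original element by the padding offsets
            out[r + y][c + x] = m[r][c]
    return out


for row in expand_matrix([[1, 2], [3, 4]], 1, 1):
    print(row)
```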
