Access Values of Cell Array deeply nested within Structure Array - data-structures

I have a nested structure array / cell array / structure array of character values, which is the result of a web query that returns a converted JSON object. I can access the numeric value(s) I need in loops, thus:
for ix = 1 : size( S.orderBook.buckets , 2 )
  if ( str2double( S.orderBook.buckets{ ix }.price ) >= str2double( S.orderBook.price ) )
    mid_ix = ix ;
    break ;
  endif
endfor
The above loop gets the index, mid_ix, of the cell in the middle of the region of interest, and
orderbook_begin_ix = mid_ix - 20 ; orderbook_end_ix = mid_ix + 20 ;
jj = 0 ;
for ix = orderbook_begin_ix : orderbook_end_ix
  jj = jj + 1 ;
  new_orderbook_data( 1 , jj ) = str2double( S.orderBook.buckets{ ix }.longCountPercent ) ;
endfor
this second loop fills the pre-initialised matrix, new_orderbook_data, with the values of interest.
However, I was wondering whether there is a quicker/more elegant way of getting these values? At the moment, as can be seen above, I am having to run a "look up" for loop that encloses an "if statement" to get in the ballpark of the required numeric values, and then run a second for loop in the region of the ballpark to extract these required values.
Note: cross posted at Octave forum

I think I have solved this by using the syntax below:
prices = cellfun( @str2double , { [ S.orderBook.buckets{:} ].price } ) ;
which gives me a matrix "prices" to which I can further apply vectorised code.
Explanation:
the { : } expands the contents of the cell array (the bucket structures) into a comma separated list,
the enclosing [ ] concatenates this list into a structure array,
the [ ].price extracts just the price fields, which are then put back into a cell array by the outermost enclosing { },
the string values are then converted to numeric by applying cellfun with str2double to this cell array of prices,
and the results are finally assigned to the "prices" matrix.

Related

Attempt to index nested table and insert numeric character number

local file = assert(io.open("E:\\text.txt","r"))
local Table = {}
local function Sort()
  for c in file:lines() do
    Table[#Table + 1] = {}
    print(c)
    for i = 1,#c do
      Table[#Table][i] = string.byte(c,i,i)
      Table[#Table] = table.concat(Table[#Table])
    end
    print("hi")
    print(table.concat(table))
  end
end
Sort()
-- error:8: attempt to index a string value(field '?')
This Lua code is supposed to traverse through the lines of the file and create a table with the numeric representation of all its characters.
The first time through your outer loop you set Table[1] = {}. The first time through your inner loop you then replace Table[1] with the result of table.concat, which is a string. The next time through the inner loop, when i = 2, you attempt to index Table[1][2], but Table[1] is now a string, hence the error. Moving the table.concat call out of the inner loop, so that it runs only after all the bytes of the line have been stored, avoids this.

Passing & returning a list/array as a parameter/ return type to a UDF in Redshift

I have a bunch of metrics that consume the entire list of float values of a column (think of a series of order values on which I am doing some outlier analysis, hence needing the entire array of values).
Can I pass the entire list as a parameter? It would be too much data munging if I were to do this entirely in Python. Thoughts?
# Redshift UDF - the red part is invalid signature & needs a fill
create function Median_absolute_deviation(y <Pass a list, but how? >,threshold float)
--INPUTS:
--a list of order values, -- a threshold
RETURNS <return a list, but how? >
STABLE
AS $$
import numpy as np
m = np.median(y)
abs_dev = np.abs(y - m)
left_mad = np.median(abs_dev[y<=m])
right_mad = np.median(abs_dev[y>=m])
y_mad = np.zeros(len(y))
y_mad[y < m] = left_mad
y_mad[y > m] = right_mad
modified_z_score = 0.6745 * abs_dev / y_mad
modified_z_score[y == m] = 0
return modified_z_score > threshold
$$ LANGUAGE plpythonu
I can pass m = np.median(y) in from another function (using a select statement on the DB), but calculating abs_dev, left_mad and right_mad still needs the entire series.
Can I use the anyelement data type here? AWS reference: http://docs.aws.amazon.com/redshift/latest/dg/udf-data-types.html
This is what I tried. Also, I would like to return the value of that column if the flag was "0", but I guess I can do that on a second pass?
create or replace function Median_absolute_deviation(y anyelement ,thresh int)
--INPUTS:
--a list of order values, -- a threshold
-- I tried both float & anyelement return type, but same error
RETURNS float
--OUTPUT:
-- returns the value of order amount if not outlier, else returns 0
STABLE
AS $$
import numpy as np
m = np.median(y)
abs_dev = np.abs(y - m)
left_mad = np.median(abs_dev[y<=m])
right_mad = np.median(abs_dev[y>=m])
y_mad = np.zeros(len(y))
y_mad[y < m] = left_mad
y_mad[y > m] = right_mad
modified_z_score = 0.6745 * abs_dev / y_mad
modified_z_score[y == m] = 0
flag= 1 if (modified_z_score > thresh ) else 0
return flag
$$LANGUAGE plpythonu
select Median_absolute_deviation(price,3) from my_table where price >0 limit 5;
An error occurred when executing the SQL command:
select Median_absolute_deviation(price,3) from my_table where price >0 limit 5
ERROR: IndexError: invalid index to scalar variable.. Please look at svl_udf_log for more information
Detail:
-----------------------------------------------
error: IndexError: invalid index to scalar variable.. Please look at svl_udf_log for more information
code: 10000
context: UDF
query: 47544645
location: udf_client.cpp:298
process: query6_41 [pid=24744]
-----------------------------------------------
Execution time: 0.73s
1 statement failed.
My end goal is to populate Tableau views using these computations made via UDFs, so I need something that can interact with Tableau and do computations on the fly using a function. Suggestions?
Redshift only supports scalar UDFs for the time being, which means that you basically CANNOT pass a list as a parameter.
That being said, you can be creative and pass it as a string of numbers separated by a special character, and then convert it back to a list in your UDF, e.g.:
list = [1, 2, 3.5] can be passed as
string_list = "1|2|3.5"
For this to work you need to pre-decide the precision of your numbers and the maximum size of your list, so as to define a varchar of the appropriate length.
It is not the best practice, but it will work.
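As a rough sketch of the string workaround (not a tested Redshift UDF): the body below is plain Python, so it can be tried outside Redshift first. The names parse_list, pack_list and mad_outlier_flags are made up for illustration, the "|" separator is just the one from the example above, and the median is computed by hand rather than with numpy so that the snippet is self-contained. Since a scalar UDF can only return a scalar as well, the flags are packed back into a string the same way.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def parse_list(packed, sep="|"):
    """Turn '1|2|3.5' back into [1.0, 2.0, 3.5]."""
    return [float(tok) for tok in packed.split(sep) if tok.strip()]


def pack_list(values, sep="|"):
    """Turn [0, 1] into '0|1' so it can be returned as a varchar."""
    return sep.join(str(v) for v in values)


def mad_outlier_flags(packed, threshold, sep="|"):
    """Flag outliers (1) using the left/right MAD logic from the question."""
    y = parse_list(packed, sep)
    if not y:
        return ""
    m = median(y)
    abs_dev = [abs(v - m) for v in y]
    left_mad = median([d for v, d in zip(y, abs_dev) if v <= m])
    right_mad = median([d for v, d in zip(y, abs_dev) if v >= m])
    flags = []
    for v, d in zip(y, abs_dev):
        if v == m:
            flags.append(0)  # a value equal to the median is never an outlier
            continue
        mad = left_mad if v < m else right_mad
        if mad == 0:
            flags.append(1)  # assumption: non-zero deviation with zero MAD counts as an outlier
        else:
            flags.append(1 if 0.6745 * d / mad > threshold else 0)
    return pack_list(flags, sep)


print(mad_outlier_flags("1|2|3|4|100", 3))  # prints 0|0|0|0|1
```

Inside Redshift the packed input would have to be built before calling the UDF, for instance with something like LISTAGG, and the varchar length limit mentioned above still applies.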

Creating matrix of "concord" results

I have a matrix with 400 rows and 40 columns.
I would like to create a new matrix from this data where I calculate the concordance between 2 variables, i.e., concord [A1,B1]=number1; concord [A1,B2]=number2; ...; concord [A1,B39]=number39. So number1 should be the first number in the first row of the new matrix, number2 the second number in the first row, and so on.
The end result is a new matrix that shows the rho_c for each pair of variables in the original data matrix.
The original matrix has a lot of empty cells. I could also create multiple matrices for subsections of the concordance calculations; it doesn't matter much. However, I don't quite understand how to write this command in Mata.
I've searched here: http://jasoneichorst.com/wp-content/uploads/2012/01/BeginMatrix.pdf
EDIT: The data looks like this (variable "Score1" is a rater). Not all raters rate the same item.
[image not reproduced]
Assuming I fully understand the question, there are methods to do this. One which comes to mind involves the use of concord available from SSC (ssc install concord) along with some local macros and loops.
/* Clear and set up sample data */
clear *
set obs 60
forvalues i = 1/6 {
gen A`i' = runiform()
}
replace A2 = . in 10/L
replace A3 = . in 1/5
replace A3 = . in 20/L
replace A4 = . in 1/20
replace A4 = . in 30/L
replace A5 = . in 1/15
replace A5 = . in 40/L
replace A6 = . in 1/40
/* End data set-up */
* describe, varlist will allow you to store your variables in a local macro
qui describe, varlist
local vars `r(varlist)'
* get number of variables in local macro vars
local varcount : word count `vars'
* Create a matrix to hold rho_c
mat rho = J(`varcount',`varcount',.)
mat rownames rho = `vars'
mat colnames rho = `vars'
* Loop through vars to run concord on all unique combinations of A1-A6
* using the position of each variable in local vars to assign the var name
* to local x and local y
* concord is executed only for j >= i so that you don't end up with the same
* pair of variables being run twice (e.g., A1,A2 and A2,A1)
forvalues i = 1/`varcount' {
    local y `: word `i' of `vars''
    forvalues j = 1/`varcount' {
        local x `: word `j' of `vars''
        if `j' >= `i' {
            capture noisily concord `y' `x'
            mat rho[`i',`j'] = r(rho_c)
        }
    }
}
* Display the results stored in the matrix, rho.
mat list rho
The above code should get you started, but there may need to be changes made depending on exactly what you want to do.
You will notice that inside the loop I have included capture noisily before concord. The reason is that, in the image you linked to, your variables were missing values across entire sections of observations. This will likely result in an error message being thrown (specifically, r(2000): no observations). The capture piece forces Stata to continue executing the loop if an error occurs there. The noisily piece tells Stata to display the output from concord even though capture was specified.
Also, if you search help concord in Stata, you will be directed to the help page which indicates that the concordance correlation coefficient is stored in r(rho_c). You can store these as individual scalars inside the loop or do as in the example and create a kxk matrix of values.

How to create the equivalent of a HashMap<Int, Int[]> in Lua

I would like to have a simple data structure in lua resembling a Java HashMap equivalent.
The purpose of this is that I wish to maintain a unique key 'userID' mapped against a set of two values which get constantly updated, for example;
'77777', {254, 24992}
Any suggestions as to how I can achieve this?
-- Individual Aggregations
local dictionary = ?
-- Other Vars
local sumCount = 0
local sumSize = 0
local matches = redis.call('KEYS', query)
for _,key in ipairs(matches) do
  local val = redis.call('GET', key)
  local count, size = val:match('([^:]+):([^:]+)')
  topUsers(string.sub(key, 11, 15), sumCount, sumSize)
  -- Global Count and Size for the Query
  sumCount = sumCount + tonumber(count)
  sumSize = sumSize + tonumber(size)
end
local result = string.format('%s:%s', sumCount, sumSize)
return result;
-- Users Total Data Aggregations
function topUsers()
  -- Do sums for each user
end
Assuming that dictionary is what you are asking about:
local dictionary = {
['77777'] = {254, 24992},
['88888'] = {253, 24991},
['99999'] = {252, 24990},
}
The tricky part is that the key is a string that can't be used as a Lua variable name, so you must surround each key with []. I can't find a clear description of the rule for this in the Lua 5.1 reference manual, but the Lua wiki says that only if a key "consists of underscores, letters, and numbers, but doesn't start with a number" can the [] be omitted when the table is defined in the above manner; otherwise the square brackets are required.
Just use a Lua table indexed by userID, whose values are themselves Lua tables with two entries:
T['77777']={254, 24992}
This is a possible implementation of the solution.
local usersTable = {}
function topUsers(key, count, size)
  if usersTable[key] then
    usersTable[key][1] = usersTable[key][1] + count
    usersTable[key][2] = usersTable[key][2] + size
  else
    usersTable[key] = {count, size}
  end
end
function printTable(t)
  for key,value in pairs(t) do
    print(key, value[1], value[2])
  end
end

Lotus Notes - @Subset function - get the last element

I'm trying to find out how to get the last element of a list obtained from
@Unique ( @DbLookup( "" : "NoCache" ; @DbName ; _view ; field1+field2 ; 2 ));
This gives me a list containing, let's say, 5 elements (I don't always know how many elements there are in it).
I just want to get the last element (the one in the last position). Thanks in advance.
Actually, the answer is in your question's title itself. You can use the @Subset function to do that. So your code would be:
list := @Unique ( @DbLookup( "" : "NoCache" ; @DbName ; _view ; field1+field2 ; 2 ));
lastElement := @Subset(list; -1);
The help documentation says: if you specify a negative number, @Subset searches the list from right to left, but the result is ordered as from the beginning of the list.
