I'm trying to create an efficient table() calculation (get the frequency of each value in a vector). The difference from the ordinary table() function is that it needs to support adding and removing values without recalculating the whole table.
I thought of using a hash table. For add: look up the key; if found, add 1 to its value; if not found, insert the key with value 1. For remove: look up the key; if found, subtract 1 from its value (and drop the key when the count reaches 0).
I was wondering if any of you have other ideas.
Example:
X
key freq
1 3
2 5
3 2
8 1
remove(8)
key freq
1 3
2 5
3 2
add(2)
key freq
1 3
2 6
3 2
Any ideas for an efficient implementation?
Thanks in advance!
--EDIT--
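My current code, if anyone is interested (it also calculates the Shannon entropy):
library(hash)  # provides hash(), .set(), del(), values(), is.empty()

# Build the frequency hash from a vector.
create.freq.hash <- function(x)
{
  t <- table(x)
  h <- hash(names(t), as.numeric(t))
  return(h)
}

# Increment the count of key, inserting it with count 1 if absent.
freq.hash.add <- function(hash, key)
{
  if (is.null(hash[[key]]))
  {
    .set(hash, key, 1)
  }
  else
  {
    .set(hash, key, hash[[key]] + 1)
  }
}

# Decrement the count of key, deleting it when the count would reach 0.
freq.hash.remove <- function(hash, key)
{
  if (!is.null(hash[[key]]))
  {
    if (hash[[key]] == 1)
      del(key, hash)
    else
      .set(hash, key, hash[[key]] - 1)
  }
}

hash.entropy <- function(hash)
{
  if (is.empty(hash))
    return(NULL)  # a bare `return;` does not exit the function
  v <- values(hash)
  v.prob <- v / sum(v)
  entropy <- (-1) * (v.prob %*% log2(v.prob))
  return(entropy)
}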
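For comparison, here is a minimal sketch of the same idea in Python rather than R, assuming a plain in-memory counter is enough. The class name `FrequencyTable` and its methods are made up for illustration; only `collections.Counter` and `math.log2` come from the standard library.

```python
import math
from collections import Counter


class FrequencyTable:
    """Frequency table that supports add/remove without a full recount."""

    def __init__(self, values=()):
        # Counter does the initial table() pass over the input.
        self.counts = Counter(values)

    def add(self, key):
        # Missing keys start at 0 in a Counter, so this also inserts.
        self.counts[key] += 1

    def remove(self, key):
        # Unknown keys are ignored; a key is dropped when its count hits 0.
        if key not in self.counts:
            return
        if self.counts[key] == 1:
            del self.counts[key]
        else:
            self.counts[key] -= 1

    def entropy(self):
        # Shannon entropy in bits; None for an empty table.
        total = sum(self.counts.values())
        if total == 0:
            return None
        return -sum((c / total) * math.log2(c / total)
                    for c in self.counts.values())
```

Both `add` and `remove` are constant-time on average. The entropy is still recomputed over all distinct keys on each call, which is the same cost as in the R version above.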
I have 3 tables in my database.
The first two tables are just normal tables with an ID and some other columns like:
Table 1
ID  col01
1   ...
2   ...
Table 2
ID  col01
1   ...
2   ...
The third table is some kind of a relation/assignment table:
Table 3
ID  table1_id  table2_id  text
1   1          1          ..
2   1          2          ..
3   1          3          ..
4   2          1          ..
5   3          3          ..
Now I do have a SQL statement which does exactly what I want:
SELECT * FROM table_3 where (table1_id, table2_id) in ( (1, 1), (2, 1), (3, 3));
So I'm sending the following request body to the API:
{
"assignments": [
{
"table1_id": 1,
"table2_id": 1
},
{
"table1_id": 2,
"table2_id": 1
},
{
"table1_id": 3,
"table2_id": 3
}
]
}
I validate the request with
->validate($request,
    [
        'assignments' => 'required|array',
        'assignments.*.table1_id' => 'required|integer|min:1|max:20',
        'assignments.*.table2_id' => 'required|integer|min:1|max:20'
    ]
);
Now I'm stuck on how to use the Eloquent methods (e.g. whereIn) to get my desired output.
Thanks in advance!
EDIT
So I took the workaround of arcanedev-maroc mentioned here: https://github.com/laravel/ideas/issues/1021
and edited it to fit my Request.
Works like a charm.
Laravel does not provide a built-in function for this by default. The core team said that they would not maintain this feature. You can read the post here.
But you can create your own query to accomplish this. I am providing a function that you can use as per your specification:
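public function test(Request $request)
{
    // The request body from the question carries the pairs under "assignments".
    $assignments = $request->input('assignments', []);
    $placeholders = [];
    $bindings = [];
    foreach ($assignments as $assignment) {
        // One "(?, ?)" tuple per pair; the values are bound instead of being
        // concatenated into the SQL string, which avoids SQL injection.
        $placeholders[] = '(?, ?)';
        $bindings[] = $assignment['table1_id'];
        $bindings[] = $assignment['table2_id'];
    }
    $query = '(table1_id, table2_id) in (' . implode(', ', $placeholders) . ')';
    $result = DB::table('table3')->whereRaw($query, $bindings)->get();
    return $result;
}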
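The query-building step on its own can be sketched independently of Laravel. The following Python sketch only assembles the SQL fragment and its parameter list; the function name `build_pair_filter` and the `?` placeholder style are assumptions for illustration, and whether the row-value `in` syntax is accepted depends on the database.

```python
def build_pair_filter(assignments):
    """Return (sql_fragment, params) for a tuple-style IN filter.

    `assignments` is a list of dicts with "table1_id" and "table2_id",
    shaped like the request body in the question.
    """
    if not assignments:
        # An empty "in ()" list is not valid SQL, so refuse early.
        raise ValueError("at least one assignment is required")

    # One "(?, ?)" tuple per pair; values travel separately as parameters.
    placeholders = ", ".join("(?, ?)" for _ in assignments)
    params = []
    for assignment in assignments:
        params.extend([assignment["table1_id"], assignment["table2_id"]])

    return f"(table1_id, table2_id) in ({placeholders})", params
```

Keeping the values out of the SQL string is the main design point: the fragment is the same for any three pairs, and the driver handles quoting.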
I have a method called getAllEmployees() which returns a paged result.
Let's say the page size is 1 and we start with page 0:
Pageable pageable = PageRequest.of(0, 1);
while (true) {
    // fetch the page for the current pageable on every iteration
    Page<String> page = service.getAllEmployeeNames(pageable);
    for (String name : page.getContent()) {
        // check whether the employee needs to be deleted or not based on certain conditions
        boolean isDelete = service.checkEmployeeToBeDeleted();
        if (isDelete) {
            EmployeeEntity entity = service.findByName(name);
            service.delete(entity);
        }
    }
    if (!page.hasNext()) {
        break;
    }
    pageable = page.nextPageable();
}
In this scenario, not all employees are deleted; only those matching the condition are.
The page size is 1.
Let's say there are 6 employees in total:
emp pageable(pageNumber, pageSize)
1 (0 ,1)
2 (1 ,1)
3 (2 ,1)
4 (3 ,1)
5 (4 ,1)
6 (5 ,1)
When employee 1 gets deleted, the paging shifts to:
2 (0 ,1)
3 (1 ,1)
4 (2 ,1)
5 (3 ,1)
6 (4 ,1)
But since we advance with page.nextPageable(), the next request is (1, 1).
So the next fetch picks emp 3, and emp 2 is missed.
I had a similar problem. The problem is that you are doing the 'search' and the 'deletion' in one step while you are iterating. The deletion modifies the search results, but your iteration is already one step further along the initial search. (I think this is the main problem.)
My solution was to first search for ALL objects that need to be deleted and put them into a list, and to delete those objects only after the search is done. So your code could be something like:
Pageable pageable = PageRequest.of(0, 1);
List<EmployeeEntity> deleteList = new ArrayList<>();
while (true) {
    Page<String> page = service.getAllEmployeeNames(pageable);
    for (String name : page.getContent()) {
        // check whether the employee needs to be deleted or not based on certain conditions
        boolean isDelete = service.checkEmployeeToBeDeleted();
        if (isDelete) {
            EmployeeEntity entity = service.findByName(name);
            deleteList.add(entity);
        }
    }
    if (!page.hasNext()) {
        break;
    }
    pageable = page.nextPageable();
}
// the search is finished; now iterate over the delete list
for (EmployeeEntity entity : deleteList) {
    service.delete(entity);
}
You only need to call nextPageable() when you did not delete an employee. Just add a condition like this:
if (!isDelete) {
pageable = page.nextPageable();
}
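The skipping effect, and the "advance only when nothing was deleted" fix, can be reproduced with a small in-memory simulation. This is a Python sketch, not Spring code: `fetch_page` stands in for the repository call, and the flag `stay_on_page_after_delete` is a made-up name for the condition above.

```python
def fetch_page(items, page_number, page_size):
    """Return the slice a paged query would return for this page."""
    start = page_number * page_size
    return items[start:start + page_size]


def delete_while_paging(employees, should_delete, page_size=1,
                        stay_on_page_after_delete=False):
    """Delete matching employees while paging; return what is left."""
    remaining = list(employees)
    page_number = 0
    while True:
        content = fetch_page(remaining, page_number, page_size)
        if not content:
            break
        deleted_any = False
        for name in content:
            if should_delete(name):
                remaining.remove(name)
                deleted_any = True
        # After a deletion the later rows shift forward, so the same page
        # number now holds rows that have not been looked at yet.
        if not (stay_on_page_after_delete and deleted_any):
            page_number += 1
    return remaining
```

With employees 1 to 6, a page size of 1 and a condition matching 1 and 2, the default mode deletes 1 but never looks at 2, as described in the question. With `stay_on_page_after_delete=True` both are removed.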
I have the following problem: I want to make a boxplot (with dc.js) per service (A, B, C, D) to represent (q1, q2, q3, q4 and outliers) the time each is delayed.
My data contains an id, a category, the time it takes, and other data. The problem is that I have repeated rows, due to the additional data that is important to have for other graphics.
For example,
Id / category / time / other data
1 / B / 2 / ...
155 / A / 51 / ..
155 / A / 51 / ..
156 / A / "NaN" / ..
157 / C / 10 / ..
etc
Before adding the additional data, I had no problem with the repeated data and used the following code.
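var categorydim = ndx.dimension(function(d) { return d["category"]; });
// `var categorydim.group()` is a syntax error; the group needs its own
// variable name (categorygroup is used here).
var categorygroup = categorydim.group().reduce(
    function(p, v) {
        if (v["time"] > 0) {
            p.push(v["time"]);
        }
        return p;
    },
    function(p, v) {
        if (v["time"] > 0) {
            p.splice(p.indexOf(v["time"]), 1);
        }
        return p;
    },
    function() {
        return [];
    }
);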
But now I must, for example, keep only a single value for id 155. Do you have any idea how to do this in crossfilter? Or with reductio.js?
How can I exclude the repeated data?
Assuming I've understood the problem, you need to track the unique IDs you've already seen. Reductio does this for exception aggregation with sum and count, but not for your scenario, I believe. This or something like it should work. If you can put together a working example, I'll be happy to verify this code:
var categorydim = ndx.dimension(function(d) { return d["category"]; });
// The group needs its own variable name (categorygroup is used here).
var categorygroup = categorydim.group().reduce(
    function(p, v) {
        // Ignore record if time is invalid or key has already been added.
        if (v["time"] > 0 && !p.keys[v['Id']]) {
            p.values.push(v["time"]);
            p.keys[v['Id']] = 1;
        } else if (v["time"] > 0) {
            // Time is valid and key has shown up 1 or more times already
            p.keys[v['Id']]++;
        }
        return p;
    },
    function(p, v) {
        // Ignore record if time is invalid or key is the "last" of this key.
        if (v["time"] > 0 && p.keys[v['Id']] === 1) {
            p.values.splice(p.values.indexOf(v["time"]), 1);
            p.keys[v['Id']] = 0;
        } else if (v["time"] > 0) {
            // Key is greater than 1, so decrement
            p.keys[v['Id']]--;
        }
        return p;
    },
    function() {
        return {
            keys: {},
            values: []
        };
    }
);
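The reference-counting logic of these reducers can be checked outside crossfilter. Below is a Python sketch of the same three functions (initial, add, remove), assuming rows are dicts with "Id" and "time" keys as in the sample data. The function names are made up, and nothing here calls crossfilter itself.

```python
def is_valid_time(value):
    # Mirrors `v["time"] > 0`: strings such as "NaN" and non-positive
    # numbers are rejected.
    return isinstance(value, (int, float)) and value > 0


def reduce_initial():
    # keys: how many rows were seen per Id; values: one time per Id.
    return {"keys": {}, "values": []}


def reduce_add(p, row):
    if not is_valid_time(row["time"]):
        return p
    seen = p["keys"].get(row["Id"], 0)
    if seen == 0:
        # First row for this Id: record its time exactly once.
        p["values"].append(row["time"])
    p["keys"][row["Id"]] = seen + 1
    return p


def reduce_remove(p, row):
    if not is_valid_time(row["time"]):
        return p
    seen = p["keys"].get(row["Id"], 0)
    if seen == 1:
        # Last remaining row for this Id: drop its time.
        p["values"].remove(row["time"])
    if seen > 0:
        p["keys"][row["Id"]] = seen - 1
    return p
```

Adding the two rows for id 155 leaves a single 51 in `values`. It only disappears once both rows have been removed again, which is the behaviour a filter change needs.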
I have an index with the following data:
{
"_index":"businesses",
"_type":"business",
"_id":"1",
"_version":1,
"found":true,
"_source":{
"business":{
"account_level_id":"2",
"business_city":"Abington",
"business_country":"United States of America"
}
}
}
When I query the index, I want to sort by account_level_id (which is a digit between 1-5). The problem is, I don't want to sort in ASC or DESC order, but by the following: 4..3..5..2..1. This was caused by bad practice a couple years ago, where the account level maxed out at level 4, but then a lower level account was added with the value of 5. Is there a way to tell ES that I want the results returned in that specific order?
You could write a script-based sort, something like this (not tested), and sort on its result in descending order:
doc['account_level_id'].value == "5" ? 3 : doc['account_level_id'].value == "4" ? 5 : doc['account_level_id'].value == "3" ? 4 : doc['account_level_id'].value == "2" ? 2 : 1;
Or if possible you could create another field sort_level that maps account_level_id to sensible values that you can sort on.
{
"_index":"businesses",
"_type":"business",
"_id":"1",
"_version":1,
"found":true,
"_source":{
"business":{
"account_level_id":"4",
"business_city":"Abington",
"business_country":"United States of America",
"sort_level": 5
}
}
}
If you can sort in DESC order, you can create a function that maps the integers and sort using it.
DESC normally sorts them as (5 4 3 2 1); in that order, 5 is replaced by 4, 4 by 3, and 3 by 5, giving (4 3 5 2 1).
int map_to(int x){
    switch(x){
        case 1: case 2: return x;
        case 3: return 4;
        case 4: return 5;
        case 5: return 3;
        default: return x; /* values outside 1-5 are left unchanged */
    }
}
Then use it in your sorting algorithm: whenever the algorithm has to compare x and y, it should compare map_to(x) and map_to(y) instead. This makes 4 come before 3, and 3 before 5, as you want.
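The same mapping idea can be written as a sort key. This is a Python sketch over in-memory records rather than an Elasticsearch query; `CUSTOM_ORDER`, `RANK` and `sort_by_account_level` are names made up for illustration. It ranks by position in the wanted order, so a plain ascending sort is enough.

```python
# The desired output order of account_level_id, taken from the question.
CUSTOM_ORDER = [4, 3, 5, 2, 1]

# Position of each level in that order: level 4 ranks first, level 1 last.
RANK = {level: position for position, level in enumerate(CUSTOM_ORDER)}


def sort_by_account_level(businesses):
    """Sort records by the custom order of their account_level_id."""
    # account_level_id is stored as a string in the sample document,
    # so it is converted before the lookup.
    return sorted(businesses, key=lambda b: RANK[int(b["account_level_id"])])
```

Python's `sorted` is stable, so records with the same level keep their original relative order.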
Well, I know how to create a table/metatable with initial values, but I don't know how to insert or remove an element after its creation. How can I do this using best practice in Lua? Is there any kind of standard function to do this?
Here's just about every way of inserting and removing from Lua tables; firstly, for array-style tables:
local t = { 1, 2, 3 }
-- add an item at the end of the table
table.insert(t, "four")
t[#t+1] = 5 -- this is faster
-- insert an item at position two, moving subsequent entries up
table.insert(t, 2, "one and a half")
-- replace the item at position two
t[2] = "two"
-- remove the item at position two, moving subsequent entries down
table.remove(t, 2)
And for hash-style tables:
local t = { a = 1, b = 2, c = 3 }
-- add an item to the table
t["d"] = 4
t.e = 5
-- remove an item from the table
t.e = nil