Sort by analysis fields in Elasticsearch


Consider indexes
Foo {id, bazs_count}
Bar {id, foo_id}
Baz {bar_id}
Foo (1) - (0..*) Bar (1) - (0..*) Baz
I need a way to sort foos by their bazs count (the number of bazs linked to bars that are linked to the foo). The Baz index grows over time, so the bazs count changes dynamically. The only way I have found is to build a separate analytics app that continuously updates bazs_count in the Foo documents. Is there a better way?
Clarification about how the data is produced: I index bars together with their bazs. After that I process all bazs and link each one to some bar. So I don't know the foo_id of a bar or baz at the time I index them.

Either update the bazs_count field of the Foo index with a batch process, OR consolidate Foo, Bar and Baz into a single Foo index using Elasticsearch's nested document feature and use the aggregations API to sort your Foo items by a simple count.
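A minimal sketch of the batch-process option, assuming bars and bazs can be read back as plain dicts with the fields listed in the question. It only computes the counts and builds partial-update actions in the dict format accepted by the Python client's `elasticsearch.helpers.bulk(client, actions)`; the index name `"foo"` is a hypothetical placeholder, and actually sending the actions is left out.

```python
from collections import Counter


def bazs_count_updates(bars, bazs, index="foo"):
    """Build bulk partial-update actions that set bazs_count on each Foo.

    bars: iterable of dicts like {"id": ..., "foo_id": ...}
    bazs: iterable of dicts like {"bar_id": ...}
    index: hypothetical name of the Foo index
    """
    # Map each bar to the foo it belongs to.
    foo_of_bar = {bar["id"]: bar["foo_id"] for bar in bars}

    counts = Counter()
    for baz in bazs:
        foo_id = foo_of_bar.get(baz["bar_id"])
        if foo_id is not None:  # skip bazs whose bar is not known yet
            counts[foo_id] += 1

    # Every foo referenced by a bar gets an action, so a foo whose bazs
    # disappeared is reset to 0. Foos with no bars at all are not touched.
    return [
        {
            "_op_type": "update",
            "_index": index,
            "_id": foo_id,
            "doc": {"bazs_count": counts[foo_id]},
        }
        for foo_id in sorted(set(foo_of_bar.values()))
    ]
```

Once bazs_count is stored this way, Foos can be sorted with an ordinary field sort on it; the trade-off is that the count is only as fresh as the last batch run.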

Related

lua nested for loop and table.insert creating multiple entries

I'm still learning, so there is most likely a less complicated way to achieve my goals. This is a snippet from my code that works about 90% as intended, but I'm failing on the last step.
-- build inv[bag][slot] = {Item=..., bag=..., slot=...} for every bag slot
inv = {}
for i = 23, mq.TLO.Me.NumBagSlots() + 22 do
    inv[i] = {}
    for j = 1, mq.TLO.Me.Inventory(i).Container() do
        inv[i][j] = {Item = mq.TLO.Me.Inventory(i).Item(j), bag = i, slot = j}
    end
end
-- flatten into "name-bag_slot" strings, skipping empty (NULL) slots
local sortthatsucker = {}
for _, bags in pairs(inv) do
    for _, invcontainer in pairs(bags) do
        for _, _ in pairs(invcontainer) do
            local compactstring = tostring(invcontainer.Item) .. "-" .. tostring(invcontainer.bag) .. "_" .. tostring(invcontainer.slot)
            if tostring(invcontainer.Item) ~= "NULL" then
                table.insert(sortthatsucker, compactstring)
            end
        end
    end
end
-- sort alphabetically and print
table.sort(sortthatsucker)
for k, v in pairs(sortthatsucker) do
    print(k, "-", v)
end
It currently extracts, filters, and sorts all items the way I want, building a compound string of the item name and its current "coordinates" (bag_slot) so that the resulting simple k,v table can be sorted by name. However, because of the three nested for loops I use to pull the data out (I believe), table.insert creates three identical entries in the final table, each with its own key.
For example:
163-shovel-23_10
164-shovel-23_10
165-shovel-23_10
166-tonsils-18_5
167-tonsils-18_5
168-tonsils-18_5
169-withers-6_12
170-withers-6_12
171-withers-6_12
etc...
My goal is to have this table contain a single entry for each item found, and then use a new set of nested for loops to pull the x and y values out of that string, so I can interact with the item at that given coordinate and move it to a new x and y. If there is a way to work with the initial table directly, rather than building a new one, or to make it a nested table that achieves the same result, that would also help, as I have not been able to figure that out after two weeks of trying and searching.
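The triplicates come from the innermost `for _, _ in pairs(invcontainer)` loop: each slot table has three fields (Item, bag, slot), so its body runs three times per slot. Dropping that loop and keeping its body gives one entry per item. The sketch below shows the same collection step in Python (not Lua), keeping structured records instead of strings so nothing has to be parsed back out later; the `inventory` dict layout is an assumption standing in for the `mq.TLO` calls. In Lua the equivalent is a flat list of `{Item=..., bag=..., slot=...}` tables sorted with `table.sort(list, function(a, b) return tostring(a.Item) < tostring(b.Item) end)`.

```python
from typing import NamedTuple


class ItemLocation(NamedTuple):
    item: str
    bag: int
    slot: int


def collect_items(inventory):
    """inventory (assumed layout): bag number -> {slot number: item name or "NULL"}."""
    found = []
    for bag, slots in inventory.items():
        for slot, item in slots.items():
            # exactly one pass per slot, so each item is recorded once
            if item != "NULL":
                found.append(ItemLocation(item, bag, slot))
    found.sort()  # tuples sort by item name, then bag, then slot
    return found


def compact(loc):
    """The same "name-bag_slot" string the original code builds, if still needed."""
    return f"{loc.item}-{loc.bag}_{loc.slot}"
```

Because each record keeps `bag` and `slot` as separate fields, the later "move this item" step can read `loc.bag` and `loc.slot` directly instead of splitting the string.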

What's better for performance, cell arrays of objects or heterogeneous arrays?

Suppose I have some classes foo < handle, and bar < foo, baz < foo, and maybe qux < foo. There are a couple ways I can store an array of these objects:
As a cell array: A = {foo bar baz qux} % A(1) would be a cell, A{1} gives me a foo object
Starting with R2011a, I can make foo < matlab.mixin.Heterogeneous and then build an array directly: A = [foo bar baz qux] % A(1) directly gives me a foo object
The way I see it, from a maintenance perspective the second method is better than the first, because it removes ambiguity about how to access A: with a cell array we have to dereference elements explicitly (the cell A(1) vs the foo object A{1}, which lives inside A(1)).
But is there any kind of memory or performance penalty (or benefit) to using one syntax vs the other?

I did a small experiment (source) on the memory and running time of the cell array, containers.Map and a Heterogeneous array.
In my method I preallocated each array with N=65535 elements (the max array size for Map and Heterogeneous array), then began assigning each element a uint32, and measured the time and memory.
My Heterogeneous Class was a simple class with a single public property, and a constructor which assigned that property.
The containers.Map had uint32 key/value pairs.
Maps took 9.17917e-01 seconds.
Cells took 5.81220e-02 seconds.
Heterogeneous array took 4.95336e+00 seconds.
| **Name** | **Size** | **Bytes** | **Class** |
|---|---|---|---|
| map | 65535x1 | 112 | containers.Map |
| cellArr | 65535x1 | 7602060 | cell |
| hArr | 1x65535 | 262244 | SomeHeterogeneousClass |
Note immediately that the reported size of map is not accurate. The data is hidden behind the containers.Map implementation; most likely the 112 bytes reported is the memory of the map object itself, excluding its contents. I estimate the true size to be at least 112 + 65535*(sizeof(uint32)*2) = 524392 bytes. That is almost exactly double the hArr size, which makes me think the estimate is quite accurate, since the map has to store twice as much data (key AND value) as hArr.
The results are straightforward:
Time: cell Array < Map < Heterogeneous Array
Memory: Heterogeneous Array < Map < cell Array
I repeated the experiment with N=30 to test for small arrays, the results were similar.
God only knows why cells take up so much memory and Heterogeneous arrays are so slow.
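Dividing the whos figures by N makes the memory gap more concrete. This is only arithmetic on the numbers reported above; reading the per-element surplus of the cell array as per-cell array header overhead is an interpretation, not something the experiment measured.

```python
N = 65535          # elements in each array
UINT32 = 4         # bytes per uint32 payload

cell_bytes = 7602060
harr_bytes = 262244
map_reported = 112

# Cell array: bytes per element, and surplus beyond the 4-byte payload.
per_cell = cell_bytes / N
cell_overhead = per_cell - UINT32

# Heterogeneous array: total bytes beyond N packed uint32 values.
harr_overhead = harr_bytes - N * UINT32

# The answer's lower-bound estimate for the map (key + value per entry).
map_estimate = map_reported + N * 2 * UINT32
```

So every cell element costs exactly 116 bytes, i.e. 112 bytes on top of its 4-byte value, whereas the heterogeneous array stores the values packed with only about 104 bytes of fixed overhead for the whole array.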

Redis Hash: How to Query on both Key and Value

I want to store key-value pairs (T1, T2) in Redis. Both keys and values are unique.
I want to be able to query on both key and value, i.e. HGET(Key) should return the corresponding Value and HGET(Value) should return the corresponding Key.
A trivial approach would be to create two hashes in Redis, (T1,T2) and (T2,T1), and query the appropriate one. The problem with this approach is that every insertion, update or deletion of a pair has to update both hashes.
Is there a better way to meet this requirement?

If one of T1, T2 has an integer type you could use a combo like:
1->foo
2->bar
ZADD myset 1 foo
ZADD myset 2 bar
ZSCORE myset foo //returns 1 in O(1)
ZSCORE myset bar //returns 2 in O(1)
ZRANGEBYSCORE myset 1 1 //returns "foo" in O(log(N)+M)
source
If that is not the case, it makes sense to maintain two separate hashes, preferably updating both from a single Lua script so the changes are applied atomically.
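The subtle part of the two-hash approach is keeping both directions consistent when a key or value is reused. A minimal in-memory sketch in Python, where two dicts stand in for the (T1,T2) and (T2,T1) hashes; the class and method names are hypothetical. In Redis, the steps inside `put` would correspond to HGET, HDEL and HSET calls on the two hashes, run together in one Lua script via EVAL.

```python
class BiMap:
    """Two dicts kept in sync, modelling the (T1 -> T2) and (T2 -> T1) hashes."""

    def __init__(self):
        self.forward = {}   # plays the role of the (T1, T2) hash
        self.backward = {}  # plays the role of the (T2, T1) hash

    def put(self, key, value):
        # Drop any stale pairing on either side first, so keys and values stay unique.
        if key in self.forward:
            del self.backward[self.forward.pop(key)]
        if value in self.backward:
            del self.forward[self.backward.pop(value)]
        self.forward[key] = value
        self.backward[value] = key

    def delete_key(self, key):
        # Remove the pair from both directions.
        if key in self.forward:
            del self.backward[self.forward.pop(key)]

    def get_by_key(self, key):
        return self.forward.get(key)

    def get_by_value(self, value):
        return self.backward.get(value)
```

Without the two cleanup checks in `put`, re-pointing an existing key would leave its old value still mapped back to it in the reverse hash.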

What is a HASH TABLE when doing HASH JOIN?

In Oracle's HASH JOIN method, a hash table is built on one of the tables, and the other table is joined by looking up its values in that hash table.
Could you please explain what a hash table is? What is its structure, and how is it created?

A hash table is a table where you store things by key. It is like an array, but it stores things differently:
a('CanBeVarchar') := 1; -- A hash table
In Oracle PL/SQL, they are called associative arrays or index-by tables, and you declare one like this:
TYPE aHashTable IS TABLE OF VARCHAR2(100) INDEX BY VARCHAR2(30); -- the element type can also be NUMBER or a user-defined type
myTable aHashTable;
So what is it? It's just a collection of key-value pairs. In this simplified picture, the data is stored as linked lists hanging off head nodes, and a value called a hash code, computed from each key, decides which head node an entry belongs to, so things can be found faster. Something like this:
a ------> b ------> c
Any       Bitter    Class
Array     Bold      Count
Say you are storing words and their meanings (a dictionary). When you store a word that begins with a, it goes into the 'a' group. So if you do myTable('Albatroz') := 'It''s a bird'; the hash code is calculated and the entry is placed under the 'a' head node, where it belongs, just above 'Any'. The 'a' node links to Any, which links to Array, and so on.
The nice thing about this is fast data retrieval. Say you want the meaning of Count: you write definition := myTable('Count'); and the lookup skips Any, Array, Bitter and Bold entirely. It goes directly to the 'c' head node and walks through Class and finally Count. That is fast!
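The head-node-plus-linked-list picture above is usually called separate chaining. A toy sketch in Python, where each bucket is a list of (key, value) pairs and Python's built-in `hash()` takes the place of the first-letter grouping in the example. The growth rule (double the bucket count once entries per bucket exceed 0.75) is a common textbook choice assumed here, not how Oracle or any particular library necessarily does it.

```python
class ChainedHashTable:
    """Toy hash table with separate chaining; buckets hold (key, value) pairs."""

    def __init__(self, buckets=4, max_load=0.75):
        self.buckets = [[] for _ in range(buckets)]
        self.size = 0
        self.max_load = max_load  # assumed threshold for growing the table

    def _bucket(self, key):
        # The hash code picks the bucket ("head node") for this key.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.size += 1
        # Load factor = entries per bucket; grow when chains get too long.
        if self.size / len(self.buckets) > self.max_load:
            self._resize(2 * len(self.buckets))

    def get(self, key, default=None):
        # Only the single bucket chosen by the hash is scanned.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, new_count):
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(new_count)]
        for k, v in old_items:
            self._bucket(k).append((k, v))
```

Usage: `t.put("Count", "a number of things")` followed by `t.get("Count")` inspects one bucket rather than every stored word, which is the "skip Any, Array, Bitter, Bold" effect described above.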
Here is a Wikipedia link: http://en.wikipedia.org/wiki/Hash_table
Note that my example is oversimplified; the link covers it in more detail.
Read up on details like the load factor. What happens if I get a LOT of elements in the 'a' group and only a few in 'b' and 'c'? Searching for a word that begins with a is no longer very efficient, is it? The hash table uses the load factor to decide when to reorganize and redistribute the entries across nodes; for example, the table can be split into subgroups:
From this:
a ------> b ------> c
Any       Bitter    Class
Anode     Bold      Count
Anti
Array
Arrays
Arrow
To this:
an -----> ar -----> b ------> c
Any       Array     Bitter    Class
Anode     Arrays    Bold      Count
Anti      Arrow
Now looking for words like Arrow will be faster.
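Tying this back to the question: a hash join builds such a table on one input, keyed by the join column, and then probes it with each row of the other input. A conceptual Python sketch, using a dict (itself a hash table) with lists as buckets for duplicate keys; the row format and function name are hypothetical, and Oracle's real implementation additionally deals with memory limits and spilling to disk, which this ignores.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows on build_row[build_key] == probe_row[probe_key]."""
    # Build phase: hash the (ideally smaller) input on its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)

    # Probe phase: each row of the other input does one hash lookup
    # instead of scanning the whole build input.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}
```

Rows whose key has no entry in the table (dept_id 30 below) simply produce nothing, which is inner-join behavior.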

matching array items in rails

I have two arrays and I want to count how many individual items they have in common.
For example arrays with:
1 -- House, Dog, Cat, Car
2 -- Cat, Book, Box, Car
Would return 2.
Any ideas? Thanks!
EDIT:
Basically, I have two forms (for two different types of users) that use nested attributes to store the skills each user has. I can print out the skills via
current_user.skills.each { |skill| puts skill.name }
other_user.skills.each { |skill| puts skill.name }
When I print out the array itself, I get: #<Skill:0x1037e4948>#<Skill:0x1037e2800>#<Skill:0x1037e21e8>#<Skill:0x1037e1090>#<Skill:0x1037e0848>
So yes, I want to compare the two users' skills and return the number that match. Thanks for your help.

This works:
a = %w{house dog cat car}
b = %w{cat book box car}
(a & b).size
Documentation: http://www.ruby-doc.org/core/classes/Array.html#M000274
To compare objects such as your Skill records by name, map each array to its names first, for example:
class X
  def name
    "name"
  end
end
a = [X.new]
b = [X.new]
(a.map { |x| x.name } & b.map { |x| x.name }).size # => 1
In your example, a would be current_user.skills and b would be other_user.skills. x is simply the current element of the array as map iterates over it. The method is documented in the link I provided.
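One detail worth knowing: Ruby's `&` returns the common elements without duplicates, so `(a & b).size` counts distinct matching names. The Python sketch below shows that behavior next to a multiset count, in case a skill name can appear more than once per user; `Skill` here is a hypothetical stand-in for the Rails model, and the function names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:  # hypothetical stand-in for the Rails Skill model
    name: str


def count_matching_names(skills_a, skills_b):
    """Distinct names present in both lists, like Ruby's (a & b).size."""
    return len({s.name for s in skills_a} & {s.name for s in skills_b})


def count_matching_with_duplicates(skills_a, skills_b):
    """Multiset overlap: a name listed twice on both sides counts twice."""
    common = Counter(s.name for s in skills_a) & Counter(s.name for s in skills_b)
    return sum(common.values())
```

For the example in the question (House, Dog, Cat, Car vs Cat, Book, Box, Car) both functions give 2; they only differ when names repeat.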
