How to reconcile two collections - linq

I have 2 collections of the same type elements. Let's call those elements lettersOfAlphabet. Each letter has ID as int and Name as string.
So
ID Letter
0 A
1 B
2 C
3 D
and so on.
First collection contains the alphabet, second contains selected letters.
In first step I create new selectedLettersCollection and this is easy. I simply add element from alphabet collection to selectedLettersColletion and immediately renove it from source collection (alphabet). So let's say I created collection of first 5 letters of the alphabet and saved it to SQL table. Now the alphabet collection starts at F and contains letters through Z, selectedlettersCollection contains letters A,B,C,D and E.
Now let's say I want to remove letter C form selectedLettersCollection and move it back to alphabet and get letter G from alphabet and move it to selectedLettersColletion.
What is the most efficient way to perform this opetation in LINQ, generic collections and/or T-SQL?
I would certainly create a new temporary collection of selected items and load selected elements into it. I would then perform my add remove operations. But so far the only thing that comes to my mind to reconcile those collections would be to iterate through selectedLettersColletion and new temporary collection and move elements accordingly but I was wondering if there is a method that would not require iteration akin TSQL's joins looking for NULLs.

I'll use A and B for the table names.
To move the row with Id = 1 from table A to table B use the following:
DELETE A
OUTPUT deleted.Id, deleted.Letter INTO B
WHERE Id = 1;

Related

I'm trying to add a value to individual characters

I'm working on a Scrabble assignment and I'm trying to assign values to letters. Like in Scrabble, A, E, I, O, U, L, N, S, T, R are all equal to 1. I had some help in figuring out how to add the score up once I assign values, but now I'm trying to figure out how to assign values. Is there a way to create one variable for all the values? That doesn't really make sense to me.
I was also thinking I could do an if-else statement. Like if the letter equals any of those letters, value = 1, else if the letter equals D or G, value = 2 and so on. There are 7 different scores so it's kind of annoying and not efficient, but I'm not really sure what a better way might be. I'm new to programming, a novice, so I'm looking for advice that takes my level into account.
I have started my program by reading words from a text file into an arraylist. I successfully printed the arraylist, so I know that part worked. Next I'm working on how to read each character of each word and assign a value. Last, I will figure out how to sort it.
it's me from the other question again. You can definitely do an if-statement, but if I'm not wrong Scrabble has 8 different values for letters, so you would need 8 “if”s and also since there are around 25 letters (depending on language) you would have to handle all 25 some way in the if-statements which would be quite clunky in my opinion.
I think the best option is to use a Hash-table. A hash-table is basically like a dictionary where you look up a key and get a value. So I would add each letter as a key and keep the corresponding value as the value. It would look like this:
//initialize empty hash map
Hashtable<String, Integer> letterScores = new Hashtable<>();
//now we can add values with "put"
letterScores.put("A",1)
letterScores.put("B",3)
letterScores.put("X",8)
//etc
To access an element from the hash table we can use the "get"-method.
//returns 1
letterScores.get("A")
So when looping through our word we would essentially get something like this to calculate the value of the word:
int sumValue = 0;
for(int i =0; i < word.length(); i++)}
sumValue += letterScores.get(word.charAt(i))
}
For each character we grab the value entry from the letterScores hash table where we have saved all our letter's corresponding values.

Most common "denominators" in a two column list in Google Sheets

How can I find the most commonly found 'Code' (Col B) associated with each unique 'Name' in (Col A) and find the closest value if the 'Code' in Col B is unique?
The image below shows the shared google sheet with Starting data in Columns A & B and the desired output columns in columns C and D. Each Unique Name has associated codes. Column D displays the most commonly occuring Code for each unique name. For example, Buick La Sabre 1 has 3 associated codes in B3,B4,B5 but in D3 only 98761 because it appears more frequently than the other 2 codes do in B2:B. I will explain what I mean by the closest value below.
The Codes that have a count = 1 are unique so the output in column D tries to find the closest match.
However, when the count of the code in B2:B > 1, then the output in column D = to the most frequent code associated with the Name.
Approach when there is 2 or more of the same values in column B
Query
I thought I might use a QUERY with a ORDER BY count(B) DESC LIMIT 2 in a fashion similar to this working equation:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the Count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in a descending order.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a Vlookup or Index using multiple criteria would just pull the first value it finds, you would just end up with the first matching value, we would then get the most frequent value.
Approach when there is < 2 of the same values in column B
This is a little more complicated since the values can be numbers and letters.
A solution like that seen in the image below could be used if everything were a number. In our case there will usually be between 3 - 5 character alphanumeric code starting with 0 - 1 letters numbers and followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off letters and trying to match those first. For example A1234 would be split into A | 1234, then matching the closest letter and then the closest number. But I really am not sure what the best solution to this might be that works within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferrentially match to A since it comes before C.
In summary, looking for a way to find the most frequently associated code in col B that is associated with unique names in col A in this sheet and; In the event where there are none of the same codes in B2:B, a formula that will find the closest match for a number or alphanumeric code.
You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator
Count
Den 1
Count 1
Den 2
Count 2
use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)

What is a HASH TABLE when doing HASH JOIN?

In HASH JOIN method of oracle, HASH TABLE will be built on one of the tables and other will be joined depending on the values in the hash table.
Could you please let me know what is Hash table? What is the structure of hash table? how will it be created?
A hash table is a table where you can store stuff by the use of a key. It is like an array but stores things differently
a('CanBeVarchar') := 1; -- A hash table
In oracle, they are called associative arrays or index by tables. and you make one like this:
TYPE aHashTable IS TABLE OF [number|varchar2|user-defined-types] INDEX BY VARCHAR2(30);
myTable aHashTable;
So, what is it? it's just a bunch of key-value pairs. The data is stored as a linked list with head nodes that group the data by the use of something called HashCode to find things faster. Something like this:
a -> b -> c
Any Bitter Class
Array Bold Count
Say you are storing random words and it's meaning (a dictionary); when you store a word that begins with a, it is stored in the 'a' group. So, say you want this myTable('Albatroz') := 'It's a bird', the hash code will be calculated and put in the A head node, where it belongs: just above the 'Any'. a, has a link to Any, which has a link to Array and so on.
Now, the cool thing about it is that you get fast data retreival, say you want the meaning of Count, you do this definition := myTable('Count'); It will ignore searching for Any, Array, Bitter, Bold. Will search directly in the C head node, going trhough Class and finally Count; that is fast!
Here a wikipedia Link: http://en.wikipedia.org/wiki/Hash_table
Note that my example is oversimplified read with a little bit of more detail in the link.
Read more details like the load factor: What happens if i get a LOT of elements in the a group and few in the b and c; now searching for a word that begins with a is not very optinmal, is it? the hash table uses the load factor to reorganize and distribute the load of each node, for example, the table can be converted to subgroups:
From this
a b -> c
Any Bitter Class
Anode Bold Count
Anti
Array
Arrays
Arrow
To this
an -> ar b -> c
Any Array Bitter Class
Anode Arrays Bold Count
Anti Arrow
Now looking for words like Arrow will be faster.

Join of two datasets in Mapreduce/Hadoop

Does anyone know how to implement the Natural-Join operation between two datasets in Hadoop?
More specifically, here's what I exactly need to do:
I am having two sets of data:
point information which is stored as (tile_number, point_id:point_info) , this is a 1:n key-value pairs. This means for every tile_number, there might be several point_id:point_info
Line information which is stored as (tile_number, line_id:line_info) , this is again a 1:m key-value pairs and for every tile_number, there might be more than one line_id:line_info
As you can see the tile_numbers are the same between the two datasets. now what I really need is to join these two datasets based on each tile_number. In other words for every tile_number, we have n point_id:point_info and m line_id:line_info. What I want to do is to join all pairs of point_id:point_info with all pairs of line_id:line_info for every tile_number
In order to clarify, here's an example:
For point pairs:
(tile0, point0)
(tile0, point1)
(tile1, point1)
(tile1, point2)
for line pairs:
(tile0, line0)
(tile0, line1)
(tile1, line2)
(tile1, line3)
what I want is as following:
for tile 0:
(tile0, point0:line0)
(tile0, point0:line1)
(tile0, point1:line0)
(tile0, point1:line1)
for tile 1:
(tile1, point1:line2)
(tile1, point1:line3)
(tile1, point2:line2)
(tile1, point2:line3)
Use a mapper that outputs titles as keys and points/lines as values. You have to differentiate between the point output values and line output values. For instance you can use a special character (even though a binary approach would be much better).
So the map output will be something like:
tile0, _point0
tile1, _point0
tile2, _point1
...
tileX, *lineL
tileY, *lineK
...
Then, at the reducer, your input will have this structure:
tileX, [*lineK, ... , _pointP, ...., *lineM, ..., _pointR]
and you will have to take the values separate the points and the lines, do a cross product and output each pair of the cross-product , like this:
tileX (lineK, pointP)
tileX (lineK, pointR)
...
If you can already easily differentiate between the point values and the line values (depending on your application specifications) you don't need the special characters (*,_)
Regarding the cross-product which you have to do in the reducer:
You first iterate through the entire values List, separate them into 2 list:
List<String> points;
List<String> lines;
Then do the cross-product using 2 nested for loops.
Then iterate through the resulting list and for each element output:
tile(current key), element_of_the_resulting_cross_product_list
So basically you have two options here.Reduce side join or Map Side Join .
Here your group key is "tile". In a single reducer you are going to get all the output from point pair and line pair. But you you will have to either cache point pair or line pair in the array. If either of the pairs(point or line) are very large that neither can fit in your temporary array memory for single group key(each unique tile) then this method will not work for you. Remember you don't have to hold both of key pairs for single group key("tile") in memory, one will be sufficient.
If both key pairs for single group key are large , then you will have to try map-side join.But it has some peculiar requirements. However you can fulfill those requirement by doing some pre-processing your data through some map/reduce jobs running equal number of reducers for both data.

Can't sort table with associative indexes

Why I can't use table.sort to sort tables with associative indexes?
In general, Lua tables are pure associative arrays. There is no "natural" order other than the as a side effect of the particular hash table implementation used in the Lua core. This makes sense because values of any Lua data type (other than nil) can be used as both keys and values; but only strings and numbers have any kind of sensible ordering, and then only between values of like type.
For example, what should the sorted order of this table be:
unsortable = {
answer=42,
true="Beauty",
[function() return 17 end] = function() return 42 end,
[math.pi] = "pi",
[ {} ] = {},
12, 11, 10, 9, 8
}
It has one string key, one boolean key, one function key, one non-integral key, one table key, and five integer keys. Should the function sort ahead of the string? How do you compare the string to a number? Where should the table sort? And what about userdata and thread values which don't happen to appear in this table?
By convention, values indexed by sequential integers beginning with 1 are commonly used as lists. Several functions and common idioms follow this convention, and table.sort is one example. Functions that operate over lists usually ignore any values stored at keys that are not part of the list. Again, table.sort is an example: it sorts only those elements that are stored at keys that are part of the list.
Another example is the # operator. For the above table, #unsortable is 5 because unsortable[5] ~= nil and unsortable[6] == nil. Notice that the value stored at the numeric index math.pi is not counted even though pi is between 3 and 4 because it is not an integer. Furthermore, none of the other non-integer keys are counted either. This means that a simple for loop can iterate over the entire list:
for i in 1,#unsortable do
print(i,unsortable[i])
end
Although that is often written as
for i,v in ipairs(unsortable) do
print(i,v)
end
In short, Lua tables are unordered collections of values, each indexed by a key; but there is a special convention for sequential integer keys beginning at 1.
Edit: For the special case of non-integral keys with a suitable partial ordering, there is a work-around involving a separate index table. The described content of tables keyed by string values is a suitable example for this trick.
First, collect the keys in a new table, in the form of a list. That is, make a table indexed by consecutive integers beginning at 1 with keys as values and sort that. Then, use that index to iterate over the original table in the desired order.
For example, here is foreachinorder(), which uses this technique to iterate over all values of a table, calling a function for each key/value pair, in an order determined by a comparison function.
function foreachinorder(t, f, cmp)
-- first extract a list of the keys from t
local keys = {}
for k,_ in pairs(t) do
keys[#keys+1] = k
end
-- sort the keys according to the function cmp. If cmp
-- is omitted, table.sort() defaults to the < operator
table.sort(keys,cmp)
-- finally, loop over the keys in sorted order, and operate
-- on elements of t
for _,k in ipairs(keys) do
f(k,t[k])
end
end
It constructs an index, sorts it with table.sort(), then loops over each element in the sorted index and calls the function f for each one. The function f is passed the key and value. The sort order is determined by an optional comparison function which is passed to table.sort. It is called with two elements to compare (the keys to the table t in this case) and must return true if the first is less than the second. If omitted, table.sort uses the built-in < operator.
For example, given the following table:
t1 = {
a = 1,
b = 2,
c = 3,
}
then foreachinorder(t1,print) prints:
a 1
b 2
c 3
and foreachinorder(t1,print,function(a,b) return a>b end) prints:
c 3
b 2
a 1
You can only sort tables with consecutive integer keys starting at 1, i.e., lists. If you have another table of key-value pairs, you can make a list of pairs and sort that:
function sortpairs(t, lt)
local u = { }
for k, v in pairs(t) do table.insert(u, { key = k, value = v }) end
table.sort(u, lt)
return u
end
Of course this is useful only if you provide a custom ordering (lt) which expects as arguments key/value pairs.
This issue is discussed at greater length in a related question about sorting Lua tables.
Because they don't have any order in the first place. It's like trying to sort a garbage bag full of bananas.

Resources