Why is my nested Lua table printing out of order? - for-loop

Beginner Lua quesiton - I'm just learning lua, and I wrote some code, a nested table to create something like a table with rows and columns.
However, when I iterate through the table using pairs(), it doesn't output in the same order I put it in. I put it in a Serial, Service Days, Connected, and it's coming out as Service Days, Serial, Connected. I am at a loss to figuring out why. I intentionally created the three rows different ways, since I'm just learning and trying to get comfortable with the different ways of dealing with Lua tables...
The code:
myTable = {}
myTable["headerRow"] = {
Serial = "Serial",
ServDays = "Service Days",
Connected = "Connected" }
myTable[1] = {
Serial = "B9FX",
ServDays = 7,
Connected = true }
myTable[2] = {}
myTable[2]["Serial"] = "2SHA"
myTable[2]["ServDays"] = 3
myTable[2]["Connected"] = true
for k, v in pairs(myTable) do
for k2, v2 in pairs(v) do
io.write(tostring(v2),",")
end
io.write("\n") --End the row
end
The result:
c:\lua>lua53 primer.lua
7,B9FX,true,
3,2SHA,true,
Service Days,Serial,Connected,

pairs uses the next function. Hence the order of traversal in a generic for loop using the pairs iterator is unspecified.
From the Lua reference manual:
https://www.lua.org/manual/5.3/manual.html#pdf-next
The order in which the indices are enumerated is not specified, even
for numeric indices. (To traverse a table in numerical order, use a
numerical for.)
The behavior of next is undefined if, during the traversal, you assign
any value to a non-existent field in the table. You may however modify
existing fields. In particular, you may clear existing fields.
If you do something like this:
myTable[2] = {}
myTable[2]["Serial"] = "2SHA"
myTable[2]["ServDays"] = 3
myTable[2]["Connected"] = true
Lua will not remember in which order you asigned values to table keys. It will only map keys to values.

Related

Errors when grouping by list in Power Query

I have a set of unique items (Index) to each of which are associated various elements of another set of items (in this case, dates).
In real life, if a date is associated with an index, an item associated with that index appeared in a file generated on that date. For combination of dates that actually occurs, I want to know which accounts were present.
let
Source = Table.FromRecords({
[Idx = 0, Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}],
[Idx = 1, Dates = {#date(2016,2,1), #date(2016,2,2), #date(2016,2,3)}],
[Idx = 2, Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}]},
type table [Idx = number, Dates = {date}]),
// Group by
Grouped = Table.Group(Source, {"Dates"}, {{"Idx", each List.Combine({[Idx]}), type {number}}}),
// Clicking on the item in the top left corner generates this code:
Navigation = Grouped{[Dates={...}]}[Dates],
// Which returns this error: "Expression.Error: Value was not specified"
// My own code to reference the same value returns {0,2} as expected.
CorrectValue = Grouped{0}[Idx],
// If I re-make the table as below the above error does not occur.
ReMakeTable = Table.FromColumns(Table.ToColumns(Grouped), Table.ColumnNames(Grouped))
in ReMakeTable
It seems that I can use the results of this in my later work even without the Re-make (I just can't preview cells correctly), but I'd like to know if what's going on that causes the error and the odd code at the Navigation step, and why it disappears after the ReMakeTable step.
This happens because when you double click an item, the auto-generated code uses value filter instead of row index that you are using to get the single row from the table. And since you have a list as a value, it should be used instead of {...}. Probably UI isn't capable to work with lists in such a situation, and it inserts {...}, and this is indeed an incorrect value.
Thus, this line of code should look like:
Navigate = Grouped{[Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}]}[Idx],
Then it will use value filter.
This is a bug in the UI. The index the UI calculates is incorrect: it should be 0 instead of [Dates={...}]. ... is a placeholder value, and it generates the "Value was not specified" exception if it is not replaced.

Order by random in RethinkDB

I want to order documents randomly in RethinkDB. The reason for this is that I return n groups of documents and each group must appear in order in the results (so all documents belonging to a group should be placed together); and I need to randomly pick a document, belonging to the first group in the results (you don't know which is the first group in the results - the first ones could be empty, so no documents are retrieved for them).
The solution I found to this is to randomly order each of the groups before concat-ing to the result, then always pick the first document from the results (as it will be random). But I'm having a hard time ordering these groups randomly. Would appreciate any hint or even a better solution if there is one.
If you want to order a selection of documents randomly you can just use .orderBy and return a random number using r.random.
r.db('test').table('table')
.orderBy(function (row) { return r.random(); })
If these document are in a group and you want to randomize them inside the group, you can just call orderBy after the group statement.
r.db('test').table('table')
.groupBy('property')
.orderBy(function (row) { return r.random(); })
If you want to randomize the order of the groups, you can just call orderBy after calling .ungroup
r.db('test').table('table')
.groupBy('property')
.ungroup()
.orderBy(function (row) { return r.random(); })
The accepted answer here should not be possible, as John mentioned the sorting function must be deterministic, which r.random() is not.
The r.sample() function could be used to return a random order of the elements:
If the sequence has less than the requested number of elements (i.e., calling sample(10) on a sequence with only five elements), sample will return the entire sequence in a random order.
So, count the number of elements you have, and set that number as the sample number, and you'll get a randomized response.
Example:
var res = r.db("population").table("europeans")
.filter(function(row) {
return row('age').gt(18)
});
var num = res.count();
res.sample(num)
I'm not getting this to work. I tried to sort an table randomly and I'm getting the following error:
e: Sorting by a non-deterministic function is not supported in:
r.db("db").table("table").orderBy(function(var_33) { return r.random(); })
Also I have read in the rethink documentation that this is not supported. This is from the rethinkdb orderBy documentation:
Sorting functions passed to orderBy must be deterministic. You cannot, for instance, order rows using the random command. Using a non-deterministic function with orderBy will raise a ReqlQueryLogicError.
Any suggestions on how to get this to work?
One simple solution would be to give each document a random number:
r.db('db').table('table')
.merge(doc => ({
random: r.random(1, 10)
})
.orderBy('random')

Pig - how to select only some values from the list (not just simple distinct)?

Let's say I have intput_file.txt (user_id, event_code, event_date):
1,a,1
1,b,2
2,a,3
2,b,4
2,b,5
2,b,6
2,c,7
2,b,8
as you can see, user_id = 2, has events like this: abbbcb
I'd like to have a result like this:
1,{(a,1),(b,2)}
2,{(a,2),(b,6),(c,7),(b,8)}
So when we have few events, with the same code, I'd like to take only the last one.
Can you please share any hints?
Regards
Pawel
The main thing you are describing is what GROUP BY does.
In this case:
B = GROUP A BY user_id;
Gets your records together by user_id. Your data will now look like this:
1,{(a,1),(b,2)}
2,{(a,2),(b,6),(c,7),(b,8)}
You say you only want the last one (I assume you mean the one with the greatest event_date). To do this, you can do a nested FOREACH with an ORDER BY to sort by date, and then take the first one with LIMIT. Note that this has arbitrary behavior when there are ties.
C = FOREACH B {
DA = ORDER A BY event_date DESC;
DB = LIMIT DA 1;
GENERATE FLATTEN(group), FLATTEN(DB.event_code), FLATTEN(DB.event_date);
}
Your data should now look like this:
1,b,2
2,b,8
Another option would be to use a UDF to write some custom behavior on the groups given by GROUP BY:
B = GROUP A BY user_id;
C = FOREACH B GENERATE YourUDFThatYouBuilt(group, A);
In that UDF you'd write whatever custom behavior you want (in this case return the tuple with the greatest date)
It seems like you could use the DistinctBy UDF from Apache DataFu to achieve this. This UDF, given a bag, returns the first instance found for a given field. In your case the field you care about is event_code. But we have to reverse the order, as you actually want the last instance.
One clarification though. Correct me if I'm wrong, but I think the intended output is:
1,{(a,1),(b,2)}
2,{(a,3),(b,6),(c,7),(b,8)}
That is, the (a,3) event occurs for member 2. The (a,2) event occurs for member 1.
Here's how you can do it:
-- pass in 1 because we want distinct by event code (position 1)
define DistinctBy datafu.pig.bags.DistinctBy('1');
FOREACH (GROUP A BY user_id) {
-- reverse so we can take the last event code occurrence
A_reversed = ORDER A BY event_date DESC;
-- use DistinctBy to get the first tuple having an occurrence of a field value
A_distinct_by_code = DistinctBy(A_reversed);
-- put back in order again
A_ordered = ORDER A_distinct_by_code BY event_date ASC;
GENERATE group as user_id, A_ordered.(event_code,event_date);
}

how to create set of values, after group function in Pig (Hadoop)

Lets say I have set of values in file.txt
a,b,c
a,b,d
k,l,m
k,l,n
k,l,o
And my code is:
file = LOAD 'file.txt' using PigStorage(',');
events = foreach file generate session_id, user_id, code, type;
gr = group events by (session_id, user_id);
and I have set of value:
((a,b),{(a,b,c),(a,b,d)})
((k,l),{(k,l,m),(k,l,n),(k,l,o)})
And I'd like to have:
(a,b,(c,d))
(k,l,(m,n,o))
Have you got any idea how to do it?
Regards
Pawel
Note: you are inconsistent in your question. You say session_id, user_id, code, type in the FOREACH line, but your have a PigStorage not providing values. Also, that FOREACH has 4 values, while your sample data only has 3. I'll assume that type doesn't exist in order to answer your question.
After your gr relation, you are left with the group by key (in this case (session_id, user_id)) in a automatically generated tuple called group.
So, first step: gr2 = FOREACH gr GENERATE FLATTEN(group);
This will give you the tuples (a,b) and (k,l). You need to use FLATTEN because group is a tuple and you are asking for session_id and user_id to be individual columns. FLATTEN does that for you.
Ok, so now modify the gr2 line to also use a projection to tease out the third value:
gr2 = FOREACH gr GENERATE FLATTEN(group), events.code;
events.code creates a bag out of all the code values. events is the name of the bag of grouped tuples (it's named after the original relation).
This should give you:
(a, b, {c, d})
(k, l, {m, n, o})
It's very important to note that the values in the list are in a bag not a tuple, like you asked for. Keeping it in a bag is the right idea because the bag is a variable list, while a tuple is not.
Additional advice: Understanding how GROUP BY outputs data is something I see a lot of people struggle with when first using Pig. If you think my answer doesn't make much sense, I'd recommend spending some time to really get to understand GROUP BY. Understanding versus thinking it is magic will pay off in the long run.

PL/SQL: UPDATE inside CURSOR, but some data is NULL

I'm still learning some of the PL/SQL differences, so this may be an easy question, but... here goes.
I have a cursor which grabs a bunch of records with multiple fields. I then run two separate SELECT statements in a LOOP from the cursor results to grab some distances and calculate those distances. These work perfectly.
When I go to update the table with the new values, my problem is that there are four pieces of specific criteria.
update work
set kilometers = calc_kilo,
kilo_test = test_kilo
where lc = rm.lc
AND ld = rm.ld
AND le = rm.le
AND lf = rm.lf
AND code = rm.code
AND lcode = rm.lcode
and user_id = username;
My problem is that this rarely updating because rm.lf and rm.le have NULL values in the database. How can I combat this, and create the correct update.
If I'm understanding you correctly, you want to match lf with rm.lf, including when they're both null? If that's what you want, then this will do it:
...
AND (lf = rm.lf
OR (lf IS NULL AND rm.lf IS NULL)
)
...
It's comparing the values of lf and rm.lf, which will return false if either is null, so the OR condition returns true if they're both null.
I have a cursor which grabs a bunch of records with multiple fields. I then run two separate SELECT statements in a LOOP from the cursor results to grab some distances and calculate those distances. These work perfectly.
When I go to update the table with the new values, my problem is that there are four pieces of specific criteria.
The first thing I'd look at is not using a cursor to read data, then make calculations, then perform updates. In 99% of cases it's faster and easier to just run updates that do all of this in a single step
update work
set kilometers = calc_kilo,
kilo_test = test_kilo
where lc = rm.lc
AND ld = rm.ld
AND NVL(le,'x') = NVL(rm.le,'x')
AND NVL(lf,'x') = NVL(rm.lf,'x')
AND code = rm.code
AND lcode = rm.lcode
and user_id = username;

Resources