How to use List.Generate as a loop - Power Query

This might be simple, but I would appreciate any help to push me in the right direction. I am trying to use List.Generate in Power Query to count the number of tix whose value is 1 to 5 lower than the current row's. It is a must that a loop such as List.Generate is used.
current tix-1
current tix-2
current tix-3
current tix-4
current tix-5
= Table.AddColumn(#"Added Custom1", "Count", each List.Count(Table.SelectRows(#"Added Custom1", (C) =>
(
[Tix]=C[Tix]-(List.Generate(()=>1,each _ 5, each _ - 1))
)
)[Column1]))
Here is the sample data. The idea is to be able to plug in the generated series of numbers as a loop. This is the simplest representation; for other formulas, I need the generated number as the x, e.g. (-1/2 x*x + 41/2 x).
+-------------+------------+
| TIX         | TIX count  |
+-------------+------------+
| 5,000,243   | 0          |
| 6,991,904   | 0          |
| 6,991,905   | 1          |
| 6,991,906   | 2          |
| 6,991,907   | 3          |
| 6,991,908   | 4          |
| 7,000,234   | 0          |
+-------------+------------+
Put simply, my target code should be something like this, which I believe could be simplified with List.Generate:
= Table.AddColumn(#"Added Custom1", "Count", each List.Count(Table.SelectRows(#"Added Custom1", (C) =>
(
[Tix]=C[Tix]-1
+[Tix]=C[Tix]-2
+[Tix]=C[Tix]-3
+[Tix]=C[Tix]-4
+[Tix]=C[Tix]-5 )
)
)[Column1]))
I tried another approach based on a similar post, "Power Query M loop table / lookup via a self-join".
This also returns an error. Please advise what I'm doing wrong.
= Table.AddColumn(
#"Renamed Columns",
"Count",
List.Sum(
List.Generate(
() => [Continue = 1],
each [Continue]<6,
each [Count =
List.Count(
Table.SelectRows(
#"Renamed Columns",
(x) => x[Tix]-[Continue]= [Tix]))[Column1]],
each [Count])))

Through research, I found the solution:
= Table.AddColumn(#"Added Custom2",
    "Count", (row) => List.Sum(
        List.Generate(
            // Loop state: the offset to subtract from the current row's Tix
            () => [Continue = 1],
            each [Continue] <= 10,
            each [Continue = [Continue] + 1],
            // Per offset, count the rows whose Tix equals the current Tix minus that offset
            each List.Count(Table.SelectRows(#"Added Custom2",
                (C) => C[Tix] = row[Tix] - [Continue])
                [Column1]))))
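
To check the expected counts outside Power Query, the same logic can be sketched in Python (not M). This is only an illustration: the function name `count_lower_neighbours` and the `max_offset` parameter are made up here, and the sample values are the ones from the table in the question, using offsets 1 to 5.

```python
from collections import Counter


def count_lower_neighbours(tix_values, max_offset=5):
    """For each value, count the rows whose value is 1..max_offset lower."""
    occurrences = Counter(tix_values)  # missing keys count as 0
    return [
        sum(occurrences[t - k] for k in range(1, max_offset + 1))
        for t in tix_values
    ]


# Sample TIX values taken from the question's table.
sample = [5_000_243, 6_991_904, 6_991_905, 6_991_906,
          6_991_907, 6_991_908, 7_000_234]
counts = count_lower_neighbours(sample)
print(counts)  # [0, 0, 1, 2, 3, 4, 0]
```

The inner `range(1, max_offset + 1)` plays the role of the series that List.Generate produces; for another formula, replace `t - k` with the expression that uses `k` as the x.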

Related

Reshape data in pig - change row values to column names

Is there a way to reshape the data in pig?
The data looks like this -
id | p1 | count
1 | "Accessory" | 3
1 | "clothing" | 2
2 | "Books" | 1
I want to reshape the data so that the output would look like this--
id | Accessory | clothing | Books
1 | 3 | 2 | 0
2 | 0 | 0 | 1
Can anyone please suggest some way around?
If it's a fixed set of product lines, the code below might help; otherwise you can write a custom UDF to achieve the same result.
Input : a.csv
1|Accessory|3
1|Clothing|2
2|Books|1
Pig snippet:
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long, product_name:chararray, rec_cnt:long);
req_stats = FOREACH (GROUP test BY product_id) {
    accessory = FILTER test BY product_name == 'Accessory';
    clothing = FILTER test BY product_name == 'Clothing';
    books = FILTER test BY product_name == 'Books';
    GENERATE group AS product_id,
        (IsEmpty(accessory) ? '0' : BagToString(accessory.rec_cnt)) AS a_cnt,
        (IsEmpty(clothing) ? '0' : BagToString(clothing.rec_cnt)) AS c_cnt,
        (IsEmpty(books) ? '0' : BagToString(books.rec_cnt)) AS b_cnt;
};
DUMP req_stats;
Output:
(1,3,2,0)
(2,0,0,1)
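
The same fixed-column pivot can be sketched in Python to make the reshaping step explicit. This is an illustration of the idea rather than a Pig replacement: the variable names are invented, and unlike `BagToString`, which concatenates values, this sketch adds counts together if a product appears twice for one id.

```python
from collections import defaultdict

# Rows as (product_id, product_name, rec_cnt), matching a.csv above.
rows = [(1, "Accessory", 3), (1, "Clothing", 2), (2, "Books", 1)]
columns = ["Accessory", "Clothing", "Books"]  # fixed, known product set

# Every id starts with a 0 count for each known product.
counts = defaultdict(lambda: dict.fromkeys(columns, 0))
for product_id, name, cnt in rows:
    counts[product_id][name] += cnt

# One output tuple per id, with the columns in a fixed order.
pivoted = [
    (pid, *(by_name[c] for c in columns))
    for pid, by_name in sorted(counts.items())
]
print(pivoted)  # [(1, 3, 2, 0), (2, 0, 0, 1)]
```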

Number of string value occurrences for distinct another column value

I have a model Counter which returns the following records:
name | flowers | counter
vino | rose    | 1
vino | lily    | 1
gaya | rose    | 1
rosi | lily    | 1
vino | lily    | 1
rosi | rose    | 1
rosi | rose    | 1
I want to display in the table like:
name | Rose | Lily |
---------------------
Vino | 1 | 2 |
---------------------
Gaya | 1 | 0 |
---------------------
Rosi | 2 | 1 |
I want to display the count of flowers for each distinct name. I have tried the following and am wondering how I can do it elegantly:
def counter_results
  @counter_results = {}
  Counter.each do |name|
    rose = Counter.where(flower: 'rose').count
    lily = Counter.where(flower: 'lily').count
    @counter_results['name'] = name
    @counter_results['rose_count'] = rose
    @counter_results['lily_count'] = lily
  end
  return @counter_results
end
but I don't get the hash values.
This will give you slightly different output, but I think it is probably closer to what you want than what you showed.
You can use the query:
Counter.group([:name, :flowers]).sum(:counter)
to get a result set that looks like:
{ ["vino", "rose"] => 1, ["vino", "lily"] => 2, ["gaya", "rose"] => 1, ... }
Combinations with no rows, such as gaya and lily, do not appear in the result, so treat a missing key as 0.
And you can do something like this to generate your hash:
def counter_results
  @counter_results = {}
  Counter.group([:name, :flowers]).sum(:counter).each do |k, v|
    @counter_results[k.join("_")] = v
  end
  @counter_results
end
The resulting hash would look like this:
{
  "vino_rose" => 1,
  "vino_lily" => 2,
  "gaya_rose" => 1,
  ...
}
Somebody else may have a better way to do it, but that should get you pretty close.
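
To get from grouped sums to the name-by-flower table the question asks for, the missing combinations have to be filled with 0. A rough Python sketch of that step, using the records from the question (plain tuples stand in for the Counter model, which is an assumption made only for illustration):

```python
from collections import defaultdict

# (name, flower, counter) records as listed in the question.
records = [
    ("vino", "rose", 1), ("vino", "lily", 1), ("gaya", "rose", 1),
    ("rosi", "lily", 1), ("vino", "lily", 1), ("rosi", "rose", 1),
    ("rosi", "rose", 1),
]

# Equivalent of grouping by (name, flower) and summing the counter.
totals = defaultdict(int)
for name, flower, counter in records:
    totals[(name, flower)] += counter

flowers = ["rose", "lily"]
names = list(dict.fromkeys(name for name, _, _ in records))  # first-seen order

# Missing (name, flower) pairs fall back to 0.
table = {name: [totals.get((name, f), 0) for f in flowers] for name in names}
print(table)  # {'vino': [1, 2], 'gaya': [1, 0], 'rosi': [2, 1]}
```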

Ruby CSV re-arranging Array

I'm not sure what the appropriate title for this question is, so if someone could help me with that too, it would be nice.
I have a CSV file that looks something like
ID | Num
a | 1
a | 2
a | 3
b | 4
b | 5
c | 6
c | 7
I need the result to be:
ID | Num
a | 1,2,3
b | 4,5
c | 6,7
Currently, my solution is:
ary = CSV.open('some_file')
final = Array.new
id = ary[1][0] # ary[0] is "id"
numJoin = ary[1][1]
(1..ary.length).each do |i|
  if id == ary[i+1][0]
    numJoin = numJoin + "," + ary[i+1][1]
  else
    final << [id,numJoin]
    id = ary[i+1][0]
    numJoin = ary[i+1][1]
  end
end
It works, but I would like to learn other ways to solve this, as I think there should be simpler ways to do it.
Thanks in advance.
You can use group_by, which groups elements by the return value of the block passed to it; in this case, that is the ID.
ary = ary.group_by { |v| v[0] }
P.S. That file ain't looking like a CSV.
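
Grouping alone returns the rows per ID; the values still have to be joined to get the requested output. A small Python sketch of both steps, assuming the rows have already been read and the header dropped (the variable names are illustrative):

```python
from collections import defaultdict

# (ID, Num) rows from the question, header row already removed.
rows = [("a", "1"), ("a", "2"), ("a", "3"),
        ("b", "4"), ("b", "5"), ("c", "6"), ("c", "7")]

# Step 1: group the Num values by ID.
grouped = defaultdict(list)
for row_id, num in rows:
    grouped[row_id].append(num)

# Step 2: join each group's values with commas.
final = [[row_id, ",".join(nums)] for row_id, nums in grouped.items()]
print(final)  # [['a', '1,2,3'], ['b', '4,5'], ['c', '6,7']]
```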

How to duplicate Sum(Sum(Fields!VarName.Value)) using Lookupsets and Custom Code in SSRS

I am fairly new to SSRS and am having a problem double-summing when using LookupSet results as output. I have the following table and query, which do work.
Query for Hours_DataSet
SELECT CallbackDate, SUM(TelemarketingHours) AS DailyHours,
    (SELECT SUM(TelemarketingHours) AS Expr1
     FROM CallbackTbl) AS HoursPTD
FROM CallbackTbl AS CallbackTbl_1
GROUP BY CallbackDate
Definition of Matrix
      | [CallbackDate]    | Weekly totals
________________________________________________________________
Hours | [Sum(DailyHours)] | Sum(Sum(DailyHours))
The output is this:
12/01/2014 | 12/02/2014 | 12/03/2014 | 12/04/2014 | 12/05/2014| Weekly totals|
28.75 | 42 | 42.25 | 40.25 | 37.50 | 190.75
In another table I need to calculate the appointments per hour and the total appointments per hour for the week. So I set the main dataset to be the number of appointments and use LookupSet and custom code to do the summing.
Everything works well for one level of sum. I need to recreate the 190.75 number and use it as the denominator in the calculation of the number of appointments per hour for the week.
Query for Positive_DataSet:
SELECT MainHistory_1.REALDATE, StatusTbl.Status, COUNT(MainHistory_1.DBRECID) AS Positives, StatusTbl.Code,
    (SELECT COUNT(DBRECID) AS Expr1
     FROM MainHistory
     WHERE (REALDATE > CONVERT(DATETIME, @StartDate, 102)) AND (REALDATE < CONVERT(DATETIME, @EndDate, 102))) AS TotalCalls
FROM MainHistory AS MainHistory_1 INNER JOIN
    StatusTbl ON MainHistory_1.STATUS = StatusTbl.Status
GROUP BY MainHistory_1.REALDATE, StatusTbl.Status, StatusTbl.Code
HAVING (StatusTbl.Code = 'P') AND (MainHistory_1.REALDATE > CONVERT(DATETIME, @StartDate, 102)) AND (MainHistory_1.REALDATE < CONVERT(DATETIME, @EndDate, 102))
My Matrix looks like this:
[REALDATE]| Weekly Totals
EXPR | EXPR
where the expressions are
FORMAT(Code.CalcPerHour(Lookupset(FORMAT(Fields!REALDATE.Value,"Long Date"),FORMAT(Fields!CallbackDate.Value,"Long Date"),Fields!DailyHours.Value,"Hours_DataSet"),SUM(Fields!Positives.Value)),"Fixed")
Sum(Sum(Fields!Positives.Value))/SUM(code.CalcPTD(Lookupset(FORMAT(Fields!REALDATE.Value,"Long Date"),FORMAT(Fields!CallbackDate.Value,"Long Date"),Fields!DailyHours.Value,"Hours_DataSet")))
My custom code is this:
PUBLIC SHARED FUNCTION CalcPerHour(Hours AS OBJECT, Totals AS OBJECT) AS DECIMAL
    DIM i AS INTEGER
    DIM PerHour AS DECIMAL
    ' Divide the total by each looked-up hours value, skipping zero hours
    FOR i = 0 TO UBOUND(Hours)
        IF CINT(Hours(i)) <> 0 THEN
            PerHour = PerHour + (CDEC(Totals) / CDEC(Hours(i)))
        END IF
    NEXT i
    RETURN PerHour
END FUNCTION
PUBLIC SHARED FUNCTION CalcPTD(LookupArray AS Array) AS DECIMAL
    DIM i AS INTEGER
    DIM Total AS DECIMAL
    Total = 0
    ' Add up every value returned by LookupSet
    FOR i = 0 TO UBOUND(LookupArray)
        Total = Total + CDEC(LookupArray(i))
    NEXT i
    RETURN Total
END FUNCTION
My Output is this:
12/01/2014 | 12/02/2014 | 12/03/2014 | 12/04/2014 | 12/05/2014 | Weekly totals|
1.63 | 1.79 | 1.75 | 1.59 | 1.41 | .87
The numbers corresponding to the days of the week are correct.
The number I should be getting for a total is
313/190.75 = 1.64
If I break it down and just look at the sum like this:
sum(Code.CalcPTD(Lookupset(FORMAT(Fields!REALDATE.Value,"Long Date"),FORMAT(Fields!CallbackDate.Value,"Long Date"),Fields!DailyHours.Value,"Hours_DataSet")))
I get the result of 352.50
If I count the number of items like this:
Count(Code.CalcPTD(Lookupset(FORMAT(Fields!REALDATE.Value,"Long Date"),FORMAT(Fields!CallbackDate.Value,"Long Date"),Fields!DailyHours.Value,"Hours_DataSet")))
I get the result of 9
If I count distinct the number of items like this:
CountDistinct(Code.CalcPTD(Lookupset(FORMAT(Fields!REALDATE.Value,"Long Date"),FORMAT(Fields!CallbackDate.Value,"Long Date"),Fields!DailyHours.Value,"Hours_DataSet")))
I get the expected 5
I tried to write code for a distinct sum, but instead of a single result it returned a series of 5 values corresponding to the days of the week, and I have to display the total in a single cell.
Any help would be appreciated. I know it's kind of complicated. If you have questions or need further clarification, please let me know.
So I figured out the answer on my own. To get a grand total of a field within a dataset, you can use SUM(Fields!VarName.Value, "DataSet"), which sums the field over the whole named dataset for you.
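
Why the per-row lookup overshoots can be sketched in Python. The daily hours and the 313 total are the figures quoted in the question; the list of detail rows is hypothetical, since the question only says there are 9 rows across 5 distinct dates, and it serves only to show a date being counted more than once.

```python
# Daily hours per date, as shown in the question's first output.
daily_hours = {
    "12/01/2014": 28.75, "12/02/2014": 42.0, "12/03/2014": 42.25,
    "12/04/2014": 40.25, "12/05/2014": 37.50,
}

# Hypothetical detail rows in the main dataset: 12/01/2014 appears twice.
detail_dates = ["12/01/2014", "12/01/2014", "12/02/2014",
                "12/03/2014", "12/04/2014", "12/05/2014"]

# Summing a lookup per detail row counts repeated dates repeatedly.
per_row_sum = sum(daily_hours[d] for d in detail_dates)

# Summing the hours dataset directly counts each date once,
# which is what SUM(Fields!DailyHours.Value, "Hours_DataSet") does.
dataset_sum = sum(daily_hours.values())

total_positives = 313  # weekly total quoted in the question
weekly_rate = round(total_positives / dataset_sum, 2)
print(per_row_sum, dataset_sum, weekly_rate)  # 219.5 190.75 1.64
```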

How does Sphinx calculate the weight?

Note:
This is a cross-post. It was first posted on the Sphinx forum; however, I got no answer, so I am posting it here.
First, take a look at an example.
The following is my table (used just for testing):
+----+--------------------------+----------------------+
| Id | title | body |
+----+--------------------------+----------------------+
| 1 | National first hospital | NASA |
| 2 | National second hospital | Space Administration |
| 3 | National govenment | Support the hospital |
+----+--------------------------+----------------------+
I want to search the title and body fields, so I configured sphinx.conf as follows:
--------The sphinx config file----------
source mysql
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =0000
sql_db = testfull
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT * FROM test
}
index mysql
{
source = mysql
path = var/data/mysql_old_test
docinfo = extern
mlock = 0
morphology = stem_en, stem_ru, soundex
min_stemming_len = 1
min_word_len = 1
charset_type = utf-8
html_strip = 0
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312
read_timeout = 5
max_children = 30
max_matches = 1000
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
pid_file = var/log/searchd_mysql.pid
log = var/log/searchd_mysql.log
query_log = var/log/query_mysql.log
}
------------------
Then I reindexed the database and started the searchd daemon.
On the client side I set the attributes as:
----------Client side config-------------------
sc = new SphinxClient();
///other thing
HashMap<String, Integer> weiMap=new HashMap<String, Integer>();
weiMap.put("title", 100);
weiMap.put("body", 0);
sc.SetFieldWeights(weiMap);
sc.SetMatchMode(SphinxClient.SPH_MATCH_ALL);
sc.SetSortMode(SphinxClient.SPH_SORT_EXTENDED,"@weight DESC");
When I try to search "National hospital", I got the following output:
Query 'National hospital' retrieved 3 of 3 matches in 0.0 sec.
Query stats:
'nation' found 3 times in 3 documents
'hospit' found 3 times in 3 documents
Matches:
1. id=3, weight=101
2. id=1, weight=100
3. id=2, weight=100
The number of matches (three) is right; however, the order of the results is not what I wanted.
Obviously the documents with id 1 and 2 should be the closest matches to the search string ("National hospital"), so in my opinion they should be given the largest weights, but they are ordered last.
I wonder if there is any way to meet my requirement?
PS:
1) Please do not suggest that I set the sort mode to:
sc.SetSortMode(SphinxClient.SPH_SORT_EXTENDED,"@weight ASC");
This may work for this example, but it will cause other potential problems.
2) Actually, the contents of my table are in Chinese; I just use "National Hosp..l" as an example.
1° You search for "National hospital", but Sphinx searches for "nation" and "hospit" because of
morphology = stem_en, stem_ru, soundex
2° You assign the weights
weiMap.put("title", 100);
weiMap.put("body", 0);
to nonexistent text fields:
sql_query = SELECT * FROM test
3° Finally, my simple answer to the main question:
You sort by weight, and the third row has more weight because there are no words between "nation" and "hospit".
