How to filter all columns by one condition - sorting

I want to filter all columns in my dataset.
Dataset has 750 columns 610.000 rows.
How to I filter all columns by one condition.
I filter like that but it not efficientdf5 = df4[(df4['KST PLEVNE CD'] >= 0) & (df4['KST PLEVNE CD'] <= 100)
Thank u for help
df5 = df4[(df4['KST PLEVNE CD'] >= 0) & (df4['KST PLEVNE CD'] <= 100)
I try one by one but I need all columns filtered version.

Related

Powerquery: passing column value to custom function

I'm struggling on passing the column value to a formula. I tried many different combinations but I only have it working when I hard code the column,
(tbl as table, col as list) =>
let
avg = List.Average(col),
sdev = List.StandardDeviation(col)
in
Table.AddColumn(tbl, "newcolname" , each ([column] - avg)/sdev)
I'd like to replace [column] by a variable. In fact, it's the column I use for the average and the standard deviation.
Please any help.
Thank you
This probably does what you want, called as x= fctn(Source,"ColumnA")
Does the calculations using and upon ColumnA from Source table
(tbl as table, col as text) =>
let
avg = List.Average(Table.Column(tbl,col)),
sdev = List.StandardDeviation(Table.Column(tbl,col))
in Table.AddColumn(tbl, "newcolname" , each (Record.Field(_, col) - avg)/sdev)
Potentially you want this. Does the average and std on the list provided (which can come from any table) and does the subsequent calculations on the named column in the table passed over
called as x = fctn(Source,"ColumnNameInSource",SomeSource[SomeColumn])
(tbl as table, cname as text, col as list) =>
let
avg = List.Average(col),
sdev = List.StandardDeviation(col)
in Table.AddColumn(tbl, "newcolname" , each (Record.Field(_, cname) - avg)/sdev)

Delete rows based on certain logic in power query

I need to delete rows based on the below logic:
Sum of column B for the same product, to compare with one of the values in column D for this product.
If the sum value < the value in column D, then delete the rows with extra ReceiptQty. In this case, for product AAA, receiptQty =12000, which is >10000, then delete the row 7.
Is there any way to achieve this in power query? Thanks~
This code should work:
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
group = Table.Group(Source, {"ProductID"}, {"temp", each _}),
list = Table.AddColumn(group, "list", each List.Skip(List.Accumulate([temp][ReceiptQty], {0}, (a, b) => a & {List.Last(a) + b}))),
table = Table.AddColumn(list, "table", each Table.FromColumns(Table.ToColumns([temp])&{[list]}, Table.ColumnNames(Source)&{"RunningQty"})),
final = Table.SelectRows(Table.Combine(table[table]), each [OnhandQty] >= [RunningQty])
in
final

Oracle Apex get Multiple results from checkboxes

DB = Oracle 11g
Apex = 4.2.6
In the form I have various items which all work great. However I now have a set of check boxes(:P14_DAYS) one for each day of the week.
What I need to do is get all records between :P14_START_DATE :P14_END_DATE, but only within the days select that's checked.
Below is also a sample of the DATE_SETS table
http://i.stack.imgur.com/YAckN.png
so for example
dates 01-AUG-14 - 5-AUG-14 But only require Sundays AND Mondays date would bring back 2 refs.
BEGIN
UPDATE MD_TS_DETAIL
SET job_for = :P14_JOBFORTEM,
job_type_id = :P14_JOB_TYPE_VALUE,
account_id = :P14_ACC_VAL,
qty = :P14_HRS,
rio = :P14_RIO,
post_code = :P14_POSTCODE
WHERE id IN (SELECT D.id
FROM MD_TS_MAST M
LEFT JOIN MD_TS_DETAIL D
ON M.mast_id = D.md_id
LEFT JOIN DATE_SETS
ON ms_date = dt
WHERE eng_id = :P14_ENG_VAL
AND ms_date BETWEEN :P14_START_DATE AND :P14_END_DATE
AND DATE_SETS.col_day = ANY instr(':'||:P14_DAYS||':',Return)
END;
Any help would be much appreciated .
I found this example: http://docs.oracle.com/cd/B31036_01/doc/appdev.22/b28839/check_box.htm#CHDBGDJH
As I can understand, when you choose some values in your checkbox list, item :P14_DAYS receives value, that contains return values of chosen elements of the LOV with delimiters. Then you need to replace this string in your query
AND DATE_SETS.col_day = ANY instr(':'||:P14_DAYS||':',Return)
with
AND instr(':'||:P14_DAYS||':', ':'||DATE_SETS.col_day||':') > 0
Here function instr searches substring DATE_SETS.col_day in string :P14_DAYS. It returns position of substring if substring was found or 0, if not found. Then you compare result of function with 0, and if result > 0, it means that DATE_SETS.col_day is among selected values.

PIG- Aggregations based on multiple columns

My Input data set has 3 columns and schema looks like below:
ActivityDate, EventId, EventDate
Now, using pig i need to derive multiple variables like below in one output file:
1) All Event Ids after ActivityDate >= EventDate -30 days
2) All Event Ids after ActivityDate >= EventDate -60 days
3) All Event Ids after ActivityDate >= EventDate -90 days
I have more than 30 variables like this. If it is one variable, we can use simple FILTER to filter the data.
I am thinking about any UDF implementation which takes bag as input and returns count of Event IDs based on above criteria for each parameter.
What is the best way to aggregate the data on multiple columns in pig ?
I would suggest creating another file with all of your thresholds and cross joining with the file.
so you would have a file containing:
30
60
90
etc
read it like this:
grouping = load 'grouping.txt' using PigStorage(',') as (groups:double);
Then do:
data_with_grouping = cross data, grouping;
Then have this binary condition:
data_with_binary_condition = foreach data_with_grouping generate ActivityDate, EventId, EventDate, groups, (ActivityDate >= EventDate - groups ? 1 : 0) as binary_condition;
Now you will have one column with the threshold and one column with a binary variable that tells you whether the ID follows the condition or not.
you can do a filter out all of the zeros from the binary_condition and then group on the groups column:
data_with_binary_condition_filtered = filter data_with_binary_condition by (binary_condition != 0);
grouped_by_threshold = group data_with_binary_condition_filtered by groups;
count_of_IDS = foreach grouped_by_threshold generate group, COUNT(data_with_binary_condition.EventId);
I hope this works. Obviously, I didn't debug it for you since I don't have your files.
This code will take a tad more time to run, but it will produce the output you need without a UDF.
If I understand your question correctly, you want to divide the difference between EventDate and ActivityDate in 30 days blocks (e.g. 1 to 30, 31 to 60, 61 to 90 and so on) and then count the frequency of each block.
In this case, I would just rearrange the above equation to create the variable 'range' as below:
// assuming input contains 3 columns ActivityDate, EventId, EventDate
// dividing the difference between ED and AD by 30 and casting it to int, so that 1 block is represented by 1 integer.
input1 = FOREACH input GENERATE (int)((EventDate - ActivityDate) / 30) as range;
output1 = GROUP input1 BY range;
output2 = FOREACH output1 GENERATE group AS range, COUNT(range) as count;
Hope this helps.

How to use LINQ To Entities for filtering when many methods are not supported?

I have a table in SQL database:
ID Data Value
1 1 0.1
1 2 0.4
2 10 0.3
2 11 0.2
3 10 0.5
3 11 0.6
For each unique value in Data, I want to filter out the row with the largest ID. For example: In the table above, I want to filter out the third and fourth row because the fifth and sixth rows have the same Data values but their IDs (3) are larger (2 in the third and fourth row).
I tried this in Linq to Entities:
IQueryable<DerivedRate> test = ObjectContext.DerivedRates.OrderBy(d => d.Data).ThenBy(d => d.ID).SkipWhile((d, index) => (index == size - 1) || (d.ID != ObjectContext.DerivedRates.ElementAt(index + 1).ID));
Basically, I am sorting the list and removing the duplicates by checking if the next element has an identical ID.
However, this doesn't work because SkipWhile(index) and ElementAt(index) aren't supported in Linq to Entities. I don't want to pull the entire gigantic table into an array before sorting it. Is there a way?
You can use the GroupBy and Max function for that.
IQueryable<DerivedRate> test = (from d in ObjectContext.DerivedRates
let grouped = ObjectContext.DerivedRates.GroupBy(dr => dr.Data).First()
where d.Data == grouped.Key && d.ID == grouped.Max(dg => dg.ID)
orderby d.Data
select d);
Femaref's solution is interesting, unfortunately, it doesn't work because an exception is thrown whenever "ObjectContext.DerivedRates.GroupBy(dr => dr.Data).First()" is executed.
His idea has inspired me for another solution, something like this:
var query = from d in ObjectContext.ProviderRates
where d.ValueDate == valueDate && d.RevisionID <= valueDateRevision.RevisionID
group d by d.RateDefID into g
select g.OrderByDescending(dd => dd.RevisionID).FirstOrDefault();
Now this works.

Resources