How to filter samples in a DGEList in edgeR - rna-seq

I am trying to filter samples in a DGEList object created in edgeR by an attribute I have called "architecture".
$samples looks like:
$samples
group lib.size norm.factors architecture
15-AM_p_ap 1 36252192 1 p
15-LM-11_p_mi 1 34394164 1 p
15-LM-14_p_mi 1 37147178 1 p
15-LM-19_p_up 1 39236017 1 p
15-LM-2_p_lo 1 36543297 1 p
68 more rows ...
I want to subset the list to exclude the samples with the architecture designation of "w". I have tried more things than I can remember, the latest being:
y.subset <- y[which(!y$samples$architecture == "w"),]
How can I accomplish this?
Thanks!

Move the comma so that the condition is applied to the columns of the DGEList rather than its rows:
y.subset <- y[, which(y$samples$architecture != "w")]
The goal is to filter the columns of y$counts, and each of those columns corresponds to one row of y$samples.
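The rows-versus-columns relationship can be sketched outside R. The following is only an illustration in plain Python, not edgeR code: `counts`, `samples`, and the sample names are hypothetical stand-ins for y$counts and y$samples.

```python
# Hypothetical stand-in for a DGEList: counts is genes x samples,
# and samples holds one record per sample (one per column of counts).
counts = [
    [10, 0, 5],  # gene 1
    [3, 7, 2],   # gene 2
]
samples = [
    {"name": "s1", "architecture": "p"},
    {"name": "s2", "architecture": "w"},
    {"name": "s3", "architecture": "p"},
]

# The condition is evaluated on the sample records...
keep = [i for i, s in enumerate(samples) if s["architecture"] != "w"]

# ...and the resulting indices select columns of counts and rows of samples.
counts_subset = [[row[i] for i in keep] for row in counts]
samples_subset = [samples[i] for i in keep]

print(counts_subset)
print([s["name"] for s in samples_subset])
```

Applying the same indices to the rows of `counts` would drop genes instead of samples, which is the effect of the misplaced comma in the question.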

Related

List the highest data row or a specific data row per category

I have a table in which I use version numbers in each row. Now I would like to have only those records that have the highest version number of a category. But there are also some records that have a negative number, -1 to be precise. These should be selected instead.
An example
Version number|Category|Name
1 |SCHU |Shoes
2 |SCHU |new shoes
1 |HAND |Gloves
2 |HAND |New gloves
-1 |HAND |New gloves V2
I'd like to have a list that prints the following.
2 |SCHU|new shoes (Secondary, selected because max version number)
-1|HAND|New gloves V2 (Primary, selected because special version)
You can use an expression to modify the sort value when it is -1 and then pick the highest value:
var ans = dt.AsEnumerable()
.GroupBy(r => r.Field<string>("Category"))
.Select(rg => rg.OrderByDescending(r => r.Field<int>("VersionNumber") == -1 ? Int32.MaxValue : r.Field<int>("VersionNumber")).First());
Note that the result is IEnumerable<DataRow> and not a DataTable. You can use the extension CopyToDataTable if you want to create a new DataTable containing the result rows.
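The same selection rule (treat -1 as the largest possible version, then take the maximum per category) can be sketched in Python. This is a minimal illustration, not a translation of the DataTable API; the `rows` list and the `sort_key` helper are assumptions made for the example.

```python
# Sample rows mirroring the table in the question.
rows = [
    {"VersionNumber": 1, "Category": "SCHU", "Name": "Shoes"},
    {"VersionNumber": 2, "Category": "SCHU", "Name": "new shoes"},
    {"VersionNumber": 1, "Category": "HAND", "Name": "Gloves"},
    {"VersionNumber": 2, "Category": "HAND", "Name": "New gloves"},
    {"VersionNumber": -1, "Category": "HAND", "Name": "New gloves V2"},
]


def sort_key(row):
    # Hypothetical helper: the special version -1 outranks every real version.
    version = row["VersionNumber"]
    return float("inf") if version == -1 else version


# Keep, per category, the row with the largest sort key.
best = {}
for row in rows:
    category = row["Category"]
    if category not in best or sort_key(row) > sort_key(best[category]):
        best[category] = row

for row in best.values():
    print(row["VersionNumber"], row["Category"], row["Name"])
```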

Loop within a loop, Creating a top X list

I have the function below to output the top 5 restaurant results based on rating. How do I add a for loop in this loop so that the output includes the position in the top 5?
def print_top_5(restaurant_list):
    sorted_list = sorted(restaurant_list, key = itemgetter("rating"),reverse = True)
    for restaurant in sorted_list[:5]:
        print restaurant["name"]
        print restaurant["rating"]
Thanks!
Try Python's built-in enumerate function:
from operator import itemgetter

def print_top_5(restaurant_list):
    # Sort by rating, highest first, then number the first five entries.
    sorted_list = sorted(restaurant_list, key=itemgetter("rating"), reverse=True)
    for idx, restaurant in enumerate(sorted_list[:5]):
        print(idx, restaurant["name"], restaurant["rating"])
Note that this counts up starting from 0, so your output will look something like:
0 Restaurant 1 10
1 Restaurant 2 9
2 Restaurant 3 8.5
If you'd like the count to start at 1, just add 1 to idx:
print(idx+1, restaurant["name"], restaurant["rating"])
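As an alternative to adding 1 by hand, enumerate accepts a start argument. The sketch below uses it; the `top_n` function and the sample `restaurants` list are made up for illustration.

```python
from operator import itemgetter


def top_n(restaurant_list, n=5):
    """Return (position, name, rating) tuples for the n best-rated entries."""
    ranked = sorted(restaurant_list, key=itemgetter("rating"), reverse=True)
    # start=1 makes the positions run 1, 2, 3, ... instead of 0, 1, 2, ...
    return [
        (pos, r["name"], r["rating"])
        for pos, r in enumerate(ranked[:n], start=1)
    ]


# Hypothetical sample data.
restaurants = [
    {"name": "A", "rating": 9},
    {"name": "B", "rating": 10},
    {"name": "C", "rating": 8.5},
]

for pos, name, rating in top_n(restaurants):
    print(pos, name, rating)
```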

Create columns based on count of each unique value of one column in Pig

I have a dataset such as:
UserID Item EventType
001 A Buy
001 B Sell
031 A Sell
008 C Buy
001 C Buy
001 A Buy
008 C Sell
How can I split the EventType column into a separate column for each event? That is, I want two new columns, EventType_Buy and EventType_Sell, containing the counts of the occurrences of those events for each UserID and Item pair.
So the output should be something like this:
UserID Item EventType_Buy EventType_Sell
001 A 2 0
001 B 0 1
001 C 1 0
008 C 1 1
031 A 0 1
I'm not so much interested in the sorting, but I plan to use this data in R later, so I would like some help trying to perform this split into column counts.
I've tried creating separate objects for each event type and grouping by UserID, and Item, and generating the counts and trying to join these objects, but I'm not having much success.
Ref : https://pig.apache.org/docs/r0.14.0/basic.html#foreach
Pig Script :
input_data = LOAD 'input.csv' USING PigStorage(',') AS (user_id:chararray,item:chararray,event_type:chararray);
req_stats = FOREACH(GROUP input_data BY (user_id,item)) {
    buy_bag = FILTER input_data BY event_type == 'Buy';
    sell_bag = FILTER input_data BY event_type == 'Sell';
    GENERATE FLATTEN(group) AS (user_id,item), COUNT(buy_bag) AS event_type_buy, COUNT(sell_bag) AS event_type_sell;
};
DUMP req_stats;
Input :
001,A,Buy
001,B,Sell
031,A,Sell
008,C,Buy
001,C,Buy
001,A,Buy
008,C,Sell
Output : DUMP req_stats
(001,A,2,0)
(001,B,0,1)
(001,C,1,0)
(008,C,1,1)
(031,A,0,1)
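For checking the Pig output on a small sample, the same per-pair counting can be sketched in Python. This is only a cross-check under the assumption that the data fits in memory; the `events` list simply repeats the input above.

```python
from collections import Counter

# The same seven input records as (user_id, item, event_type) tuples.
events = [
    ("001", "A", "Buy"),
    ("001", "B", "Sell"),
    ("031", "A", "Sell"),
    ("008", "C", "Buy"),
    ("001", "C", "Buy"),
    ("001", "A", "Buy"),
    ("008", "C", "Sell"),
]

# Count every (user_id, item, event_type) combination once.
counts = Counter(events)

# One output row per (user_id, item) pair; a missing combination counts as 0.
pairs = sorted({(user, item) for user, item, _ in events})
table = [
    (user, item, counts[(user, item, "Buy")], counts[(user, item, "Sell")])
    for user, item in pairs
]

for row in table:
    print(row)
```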

Adding Column for duplicates in PIG

I have some values like this,
tEn 1
teN 8
Ten 1
thrEE 2
tHRee 1
How do I add column 2 and generate this for all case-insensitive duplicates in column 1?
ten 10
three 3
I have tried using GROUP,
tmp = GROUP data BY (column1);
result = FOREACH tmp GENERATE
group,
SUM(data.column2) as count
But somehow it doesn't seem to give the right results. What do I do?
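String comparison is case sensitive, so 'tEn' and 'Ten' end up in different groups. Convert the values to lower case first so that they match, and give the result an alias so it can be grouped on:
lowerdata = FOREACH data GENERATE LOWER(column1) AS column1, column2;
Then group and sum as before, but on the new relation. Note that SUM must refer to lowerdata, the relation that was grouped, not to data:
tmp = GROUP lowerdata BY column1;
result = FOREACH tmp GENERATE
    group,
    SUM(lowerdata.column2) AS count;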
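The normalise-then-aggregate idea can be checked on the sample values with a short Python sketch. The `data` list below just repeats the values from the question; it is not part of the Pig script.

```python
from collections import defaultdict

# (column1, column2) pairs from the question.
data = [("tEn", 1), ("teN", 8), ("Ten", 1), ("thrEE", 2), ("tHRee", 1)]

# Lower-case the key before accumulating so that case variants share a total.
totals = defaultdict(int)
for word, value in data:
    totals[word.lower()] += value

for word, total in sorted(totals.items()):
    print(word, total)
```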

How to use LINQ To Entities for filtering when many methods are not supported?

I have a table in SQL database:
ID Data Value
1 1 0.1
1 2 0.4
2 10 0.3
2 11 0.2
3 10 0.5
3 11 0.6
For each unique value in Data, I want to keep only the row with the largest ID and filter out the rest. For example, in the table above I want to filter out the third and fourth rows, because the fifth and sixth rows have the same Data values but a larger ID (3, versus 2 in the third and fourth rows).
I tried this in Linq to Entities:
IQueryable<DerivedRate> test = ObjectContext.DerivedRates.OrderBy(d => d.Data).ThenBy(d => d.ID).SkipWhile((d, index) => (index == size - 1) || (d.ID != ObjectContext.DerivedRates.ElementAt(index + 1).ID));
Basically, I am sorting the list and removing the duplicates by checking if the next element has an identical ID.
However, this doesn't work because SkipWhile(index) and ElementAt(index) aren't supported in Linq to Entities. I don't want to pull the entire gigantic table into an array before sorting it. Is there a way?
You can use the GroupBy and Max functions for that.
IQueryable<DerivedRate> test = (from d in ObjectContext.DerivedRates
let grouped = ObjectContext.DerivedRates.GroupBy(dr => dr.Data).First()
where d.Data == grouped.Key && d.ID == grouped.Max(dg => dg.ID)
orderby d.Data
select d);
Femaref's solution is interesting, but unfortunately it doesn't work: an exception is thrown whenever "ObjectContext.DerivedRates.GroupBy(dr => dr.Data).First()" is executed.
His idea inspired another solution, something like this:
var query = from d in ObjectContext.ProviderRates
where d.ValueDate == valueDate && d.RevisionID <= valueDateRevision.RevisionID
group d by d.RateDefID into g
select g.OrderByDescending(dd => dd.RevisionID).FirstOrDefault();
Now this works.
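The logic of that query (group, then take the entry with the highest ID in each group) can be sketched in Python against the sample table from the question. This is an in-memory illustration only and says nothing about what LINQ to Entities translates to SQL; the `rows` list is an assumption for the example.

```python
# (ID, Data, Value) rows from the question's table.
rows = [
    (1, 1, 0.1),
    (1, 2, 0.4),
    (2, 10, 0.3),
    (2, 11, 0.2),
    (3, 10, 0.5),
    (3, 11, 0.6),
]

# For each Data value, remember the row with the largest ID seen so far.
latest = {}
for row in rows:
    row_id, data, _ = row
    if data not in latest or row_id > latest[data][0]:
        latest[data] = row

# Order the surviving rows by Data.
result = sorted(latest.values(), key=lambda r: r[1])

for row in result:
    print(row)
```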
