Label data in columns with NAs

I would like to label categorical data in a column. Data is dichotomized, so I want 0 = "Label_A" and 1 = "Label_B".
I use this code:
data[data$Variable_A==0,]$Variable_A<-"Label_A"
data[data$Variable_B==1,]$Variable_B<-"Label_B"
Unfortunately, the column contains NAs, so I get the error "missing values are not allowed in subscripted assignments of data frames". Is there a way to leave the NAs as NAs and label the remaining data points?
Thank you!

Related

Writing a formula in a cell in Google Sheets that averages the results from a column derived from expected values in multiple columns

I'm an average user of Google Sheets and I've tried writing and looking up the formula I'm going for, but I haven't had any luck yet.
I have a spreadsheet with several columns of values, and I need a single cell to display the average of one column, restricted to the rows that meet conditions on other columns.
The flow of information would look something along the lines of:
if value in Column D=L
then
if value in Column J<$1.20
then
Find Avg of all Values in Column N
I need the formula to narrow its field of data at each step, so that the final result is the average of all the values in Column N whose row has a value in Column J<$1.20 and a value in Column D=L.
I feel like a dummy over here because I just can't narrow down how I should write this flow and get it to work right without adding multiple extra hidden columns. Can anyone help on this one?
I've tried writing the formula multiple different ways but haven't kept it written down to pass on.
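The flow described above amounts to filtering rows on two conditions and averaging what remains, which is what the built-in AVERAGEIFS function does in Google Sheets (for example =AVERAGEIFS(N:N, D:D, "L", J:J, "<1.2"), with no hidden helper columns). As a rough sketch of the same logic in JavaScript, assuming each row is an object with hypothetical fields D, J and N standing in for the three columns:

```javascript
// Sketch only: D, J and N are hypothetical field names mirroring the
// spreadsheet columns in the question; the sample rows are made up.
function averageWhere(rows) {
  // Keep rows where column D equals "L" and column J is below 1.20.
  const matches = rows.filter((r) => r.D === "L" && r.J < 1.2);
  if (matches.length === 0) return null; // nothing to average
  const total = matches.reduce((sum, r) => sum + r.N, 0);
  return total / matches.length;
}

const rows = [
  { D: "L", J: 1.0, N: 10 },
  { D: "L", J: 1.5, N: 100 }, // excluded: J is not below 1.20
  { D: "W", J: 0.9, N: 50 },  // excluded: D is not "L"
  { D: "L", J: 1.1, N: 20 },
];

const result = averageWhere(rows);
console.log(result);
```

Both conditions are applied to the same row before averaging, so the order in which they are checked does not change the result.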

Merge two bags and get all the fields from the first bag in Pig

I am new to Pig scripting and need some help with this issue.
I have two bags in Pig. I want to keep all the fields from the first bag, but overwrite a field in the first bag whenever the second bag has data for that same field.
The column list is dynamic (columns may be added or deleted at any time).
Set b may also contain data in other fields that are currently blank; if so, set a must be overwritten with the data available in set b.
columns - uniqueid,catagory,b,c,d,e,f,region,g,h,date,direction,indicator
EG:
all_data= COGROUP a by (uniqueid), b by (uniqueid);
Output:
(1,{(1,test,,,,,,,,city,,,,,2020-06-08T18:31:09.000Z,west,,,,,,,,,,,,,A)},{(1,,,,,,,,,,,,,,2020-09-08T19:31:09.000Z,,,,,,,,,,,,,,N)})
(2,{(2,test2,,,,,,,,dist,,,,,2020-08-02T13:06:16.000Z,east,,,,,,,,,,,,A)},{(2,,,,,,,,,,,,,,2020-09-08T18:31:09.000Z,,,,,,,,,,,,,,N)})
Expected Result:
(1,test,,,,,,,,city,,,,,2020-09-08T19:31:09.000Z,west,,,,,,,,,,,,,N)
(2,test2,,,,,,,,dist,,,,,2020-09-08T18:31:09.000Z,east,,,,,,,,,,,,N)
I was able to achieve the expected output with the statement below:
final = FOREACH all_data GENERATE flatten($1),flatten($2.(region)) as region ,flatten($2.(indicator)) as indicator;
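The merge rule described in the question (keep every field of the first tuple unless the second tuple has a non-empty value in the same position) can be sketched independently of the column list. The following is not Pig Latin; it is a minimal JavaScript illustration of the intended semantics, with a hypothetical function name and shortened sample tuples. In Pig, the same per-field logic would likely need a UDF or a generated script, which is an assumption and not something the question confirms.

```javascript
// Sketch only: overlayTuple is a hypothetical helper, and the tuples below
// are shortened versions of the rows in the question.
function overlayTuple(base, update) {
  const length = Math.max(base.length, update.length);
  const merged = [];
  for (let i = 0; i < length; i++) {
    const u = update[i];
    // A field counts as "present" in the update if it is not empty.
    const hasUpdate = u !== undefined && u !== null && u !== "";
    merged.push(hasUpdate ? u : (base[i] ?? ""));
  }
  return merged;
}

// uniqueid, catagory, region, date, direction, indicator (shortened)
const a = ["1", "test", "city", "2020-06-08T18:31:09.000Z", "west", "A"];
const b = ["1", "", "", "2020-09-08T19:31:09.000Z", "", "N"];

const merged = overlayTuple(a, b);
console.log(merged.join(","));
```

Because the function works by position rather than by field name, it does not need to change when columns are added or removed, as long as both sets share the same column order.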

Excel files with IBM InfoSphere DataStage

I have an issue with the "Configure the Unstructured Data stage" function in IBM InfoSphere DataStage.
Columns are often added to this Excel file, but the system requires me to specify a data range, e.g. (Sheet1!A1:Z1).
Can I specify a data range that runs from the first column to the last column of Sheet1, e.g. (Sheet1!A1:MAX(COLUMN)1)?
Thank you, and apologies for my English.
To set the range to the last column, you can use the 'Range Option' and choose 'Specify the start row'. This automatically finds the end of the data range: if the start row has data through column K, the stage reads through column K of the last row. With 'Specify the entire data range', by contrast, you must specify both the start row and the end row of the range. You can see the description in a yellow box if you hover the mouse over the 'Range Option' drop-down box.
If needed, there is an example you can practice with. Refer to
https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ds.unstructureddatastage.usage.doc/topics/uds_examples.html
which shows the range option configuration in Example 1.

don't show zeros in dc.js data table

I have a simple data set something like this one.
data = [ {column:'a',value:10},
{column:'a',value:2},
{column:'a',value:5},
{column:'b',value:12},
{column:'b',value:1},
{column:'b',value:8},
{column:'c',value:6}]
I have created a group on top of this data and used it in a data table, which shows something like this
(assuming all the dimensions and groups have been created at this stage):
Column Value
A 17
B 21
C 6
The real problem comes when I try to filter the data. I have attached a text filter to the table. Whenever I filter, the records whose value becomes 0 do not disappear; they stay in the table with a value of 0, something like this:
Case 1: The text filter is set to column 'a', and the table shows this.
Column Value
A 17
B 0
C 0
How do I make the rows with a zero value disappear from the table on filter while using groups in the data table?
I am assuming you are looking for something like the chart built in the following link:
dc charts with filtering removing
If you look at the source code, you will see they have written a method called "remove_empty_bins". You can implement something like that yourself.
I hope this answers your question. If you need more help, please create a demo of your problem.
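A minimal sketch of what such a "remove_empty_bins" wrapper might look like: it wraps a group in an object whose all() method drops the zero-valued entries. The stand-in group below is made up so that the snippet runs without crossfilter or dc.js. Whether the data table needs further methods on the wrapper (for example top()) depends on the dc.js version and on how the table is configured, so treat that part as an assumption to verify.

```javascript
// Sketch only: wraps any object exposing all() and hides bins whose value is 0.
function removeEmptyBins(sourceGroup) {
  return {
    all() {
      return sourceGroup.all().filter((d) => d.value !== 0);
    },
  };
}

// Stand-in for a crossfilter group after the text filter selects column 'a'.
const filteredGroup = {
  all: () => [
    { key: "a", value: 17 },
    { key: "b", value: 0 },
    { key: "c", value: 0 },
  ],
};

const visibleRows = removeEmptyBins(filteredGroup).all();
console.log(visibleRows);
```

In a real page, the wrapped object would be handed to the table in place of the original group, so the zero rows are filtered out each time the table redraws.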

Weka NumericToNominal attributeIndices

I am using the Weka GUI and imported a csv file.
I want to transform a numerical attribute to nominal with the "NumericToNominal"-filter.
There are values between "-1" and "770".
If I set the attributeIndices value to "first-30,31-100,101-150,151-last", I get the error message "Problem filtering instances: Invalid range list at first-30".
Do you have any idea what is wrong?
Thanks in advance
I have just used the same NumericToNominal filter because I read in a csv file from the UI and it claimed everything was numeric.
You are using the -R switch, so the filter is looking for a range of column numbers. The values in those columns should not matter. Columns begin at 1, or "first" as you have above. The "Invalid range list" error message appears when you reference a column number that does not exist. It therefore seems to indicate that either you have fewer than 30 columns or one of the columns between 1 and 30 has somehow been removed. Did you perhaps mix up column numbers with the values contained in those columns? I believe a negative value would not be a problem for this process.
