I am new to Pig scripting and need some help with this issue.
I have two bags in Pig. I want to take all the fields from the first bag and overwrite a field's value wherever the second bag has data for that same field.
The column list is dynamic (columns may be added or deleted at any time).
Set b may also contain data in other fields that are currently blank; if so, we need to overwrite set a with the data available in set b.
columns - uniqueid,catagory,b,c,d,e,f,region,g,h,date,direction,indicator
EG:
all_data= COGROUP a by (uniqueid), b by (uniqueid);
Output:
(1,{(1,test,,,,,,,,city,,,,,2020-06-08T18:31:09.000Z,west,,,,,,,,,,,,,A)},{(1,,,,,,,,,,,,,,2020-09-08T19:31:09.000Z,,,,,,,,,,,,,,N)})
(2,{(2,test2,,,,,,,,dist,,,,,2020-08-02T13:06:16.000Z,east,,,,,,,,,,,,A)},{(2,,,,,,,,,,,,,,2020-09-08T18:31:09.000Z,,,,,,,,,,,,,,N)})
Expected Result:
(1,test,,,,,,,,city,,,,,2020-09-08T19:31:09.000Z,west,,,,,,,,,,,,,N)
(2,test2,,,,,,,,dist,,,,,2020-09-08T18:31:09.000Z,east,,,,,,,,,,,,N)
I was able to achieve the expected output with the statement below:
final = FOREACH all_data GENERATE flatten($1),flatten($2.(region)) as region ,flatten($2.(indicator)) as indicator;
I have a simple data set something like this one.
data = [ {column:'a',value:10},
{column:'a',value:2},
{column:'a',value:5},
{column:'b',value:12},
{column:'b',value:1},
{column:'b',value:8},
{column:'c',value:6}]
I created a group on top of this data and used it in a data table, which shows something like this:
(* assuming all the dimensions and groups have been created at this stage)
Column Value
A 17
B 21
C 6
The real problem comes when I try to filter the data. I have attached a text filter to this table. Whenever I filter, the records whose value becomes 0 do not disappear; they stay in the table showing a value of 0, something like this:
Case 1: When the text filter is set to column 'a', the table looks like this:
Column Value
A 17
B 0
C 0
How do I make the rows with a zero value disappear from the table on filter while using groups in the data table?
I am assuming you are looking for something like the chart built in the following link:
dc charts with filtering removing
If you look at the source code, you will see they have written a method called "remove_empty_bins". You can implement something like that as well.
I hope this answers your question. If you need more help, please create a demo of your problem.
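As a minimal sketch of that idea (not the linked page's exact source): the wrapper below hides bins whose value is 0 by filtering the result of `all()`. The name `removeEmptyBins`, the extra `top` method, and the stub `filteredGroup` object are my own assumptions for illustration; in a real page you would pass your crossfilter group instead of the stub.

```javascript
// Wrap a crossfilter-style group so that bins whose value is 0 are hidden.
function removeEmptyBins(sourceGroup) {
  // filter() returns a fresh array, so sorting it later does not touch the source.
  const nonEmpty = () => sourceGroup.all().filter(d => d.value !== 0);
  return {
    all: nonEmpty,
    // Assumption: some widgets (such as a data table) ask for top(n)
    // rather than all(), so offer the same filtered rows sorted by value.
    top: n => nonEmpty().sort((a, b) => b.value - a.value).slice(0, n),
  };
}

// Hypothetical stand-in for what group.all() returns once the text
// filter has selected column 'a' (this is a stub, not crossfilter).
const filteredGroup = {
  all: () => [
    { key: 'a', value: 17 },
    { key: 'b', value: 0 },
    { key: 'c', value: 0 },
  ],
};

const visibleRows = removeEmptyBins(filteredGroup).all();
console.log(visibleRows); // only the row for key 'a' is left
```

You would then hand the wrapped group to the table or chart in place of the original group, so the zero rows are dropped every time it redraws.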
I am using the Weka GUI and imported a csv file.
I want to transform a numerical attribute to nominal with the "NumericToNominal"-filter.
There are values between "-1" and "770".
If I set the attributeIndices value to "first-30,31-100,101-150,151-last", I get the error message: "Problem filtering instances: Invalid range list at first-30".
Do you have any idea what is wrong?
Thanks in advance
I have just used the same NumericToNominal filter because I read in a csv file from the UI and it claimed everything was numeric.
You are using the -R switch, so the filter is looking for a range of column numbers; the values in those columns do not matter. Columns begin at 1, or "first" as you have above. The "Invalid range list" error appears when you reference a column number that does not exist. It therefore seems that either you have fewer than 30 columns or one of the columns between 1 and 30 has somehow been removed. Did you perhaps mix up column numbers with the values contained in those columns? I believe a negative value would not be a problem for this process.
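To illustrate what I mean, here is a rough sketch of that kind of check. This is not Weka's code, and `firstInvalidRange` is a made-up helper; it only assumes that every index in the list must lie between 1 and the number of columns, which is consistent with the error message you got.

```javascript
// Hypothetical helper: returns the first range token that does not fit a
// dataset with numAttributes columns, or null if every token fits.
function firstInvalidRange(rangeList, numAttributes) {
  const isValidIndex = token => {
    const t = token.trim().toLowerCase();
    if (t === 'first' || t === 'last') return true; // keywords always resolve
    if (!/^\d+$/.test(t)) return false;             // not a plain positive number
    const index = Number(t);
    return index >= 1 && index <= numAttributes;    // column numbers are 1-based
  };
  for (const range of rangeList.split(',')) {
    const bounds = range.split('-');
    if (bounds.length > 2 || !bounds.every(isValidIndex)) return range.trim();
  }
  return null;
}

const spec = 'first-30,31-100,101-150,151-last';
console.log(firstInvalidRange(spec, 5));   // file with 5 columns: "first-30" is rejected
console.log(firstInvalidRange(spec, 200)); // file with 200 columns: null, every range fits
```

With only a handful of columns the very first token, first-30, already points past the last column, which matches the range named in your error.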