Pig: Invalid field Projection; Projected Field does not exist - hadoop

describe filter_records;
This gives me the following schema:
filter_records: {details1: (firstname: chararray,lastname: chararray,age: int,gender: chararray),details2: (firstname: chararray,lastname: chararray,age: int,gender: chararray)}
I want to display the firstname from both details1 and details2. I tried this:
display_records = FOREACH filter_records GENERATE display1.firstname;
But I am getting the error:
Invalid field projection. Projected field [display1] does not exist in schema: details1:tuple(firstname:chararray,lastname:chararray,age:int,gender:chararray),details2:tuple(firstname:chararray,lastname:chararray,age:int,gender:chararray).
Why does this error occur, and how can I resolve it?

There is no field named display1 in filter_records. It looks like you wrote display1.firstname where you meant details1.firstname. Can you change your script like this?
display_records = FOREACH filter_records GENERATE details1.firstname;
You also used the same field names (firstname, lastname, age, gender) in both details1 and details2, so projecting both at once gives a duplicate-name error:
display_records = FOREACH filter_records GENERATE details1.firstname,details2.firstname;
To solve this, give the fields in details1 and details2 unique names. For example, you could change your load schema like this (any unique names will do):
details1:tuple(firstname1:chararray,lastname1:chararray,age1:int,sex1:chararray),details2:tuple(firstname2:chararray,lastname2:chararray,age2:int,sex2:chararray)
With that schema, the following returns the firstname from both details1 and details2:
display_records = FOREACH filter_records GENERATE details1.firstname1,details2.firstname2;
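The two failures discussed above (a projected field that does not exist, and two projected fields ending up with the same name) can be mimicked with a small toy model. This is only a sketch in Python, not how Pig works internally; the `project` helper, the sample values, and the alias pairs are all hypothetical, and the aliases simply stand in for the unique names suggested above.

```python
# Toy model of nested-field projection; this is NOT Pig's implementation.
# The sample values below are made up for illustration.
record = {
    "details1": {"firstname": "Ann", "lastname": "Lee", "age": 31, "gender": "F"},
    "details2": {"firstname": "Bob", "lastname": "Ray", "age": 40, "gender": "M"},
}


def project(rec, paths):
    """Project 'outer.inner' paths; an item may also be a (path, alias) pair."""
    out = {}
    for item in paths:
        path, alias = item if isinstance(item, tuple) else (item, None)
        outer, inner = path.split(".")
        if outer not in rec:
            # Mirrors the "projected field does not exist" situation.
            raise KeyError(f"Projected field [{outer}] does not exist")
        name = alias or inner
        if name in out:
            # Mirrors the duplicate-name situation.
            raise ValueError(f"Duplicate output name: {name}")
        out[name] = rec[outer][inner]
    return out


# Unique output names avoid the clash.
first_names = project(
    record,
    [("details1.firstname", "firstname1"), ("details2.firstname", "firstname2")],
)
print(first_names)
```

Calling `project(record, ["display1.firstname"])` raises `KeyError`, and `project(record, ["details1.firstname", "details2.firstname"])` raises `ValueError`, which correspond to the two problems described in the answer.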


How to get table name for a simple Sequel Dataset object?

That is, given a dataset object ds = DB[:transactions].where{updated_at > 1.day.ago} (no joins or anything unusual going on), how can I fetch the table name (:transactions)?
If you want the first table in the dataset, you can use ds.first_source.
If you want it as a string you can do:
ds.first_source_table.to_s
If you want a symbol, just omit .to_s
Based on the example provided, I would do something like this.
ds.klass.name
That will return a string with the name of your table.

Having trouble reading a var using FOREACH in Pig Latin

I am having trouble with the following pig code.
The previous relation, which I need to read with FOREACH, has the following DESCRIBE output:
UnionD1D2_Distinct: {UnionD1D2_Foreach1::null::display_site: chararray,UnionD1D2_Foreach1::efectivos_click: long,UnionD1D2_Foreach2::null::display_site: chararray,UnionD1D2_Foreach2::total_click: long}
And here is some example data:
(linuxlife.example.com,113,linuxlife.example.com,5343)
(mobilesource.example.com,211,mobilesource.example.com,8120)
(siliconshore.example.com,170,siliconshore.example.com,7764)
(printoperator.example.com,62,printoperator.example.com,2724)
The FOREACH that reads the data is:
UnionD1D2_Calc = FOREACH UnionD1D2_Distinct
GENERATE
(UnionD1D2_Distinct.UnionD1D2_Foreach1::efectivos_click1/UnionD1D2_Distinct.UnionD1D2_Foreach2::total_click2)*100 AS ctr;
But I always get the following error:
ERROR 1066: Unable to open iterator for alias UnionD1D2_Calc. Backend error : Scalar has more than one row in the output. 1st : (filmport.example.com,121,filmport.example.com,5395), 2nd :(firesale.example.com,129,firesale.example.com,5452)
What am I doing wrong?
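When you use FOREACH on an alias, you don't need to repeat the alias name to refer to its fields. Prefixing a field with the relation name makes Pig treat the relation as a scalar, which is only valid when it has exactly one row, hence the error. For example, instead of UnionD1D2_Distinct.UnionD1D2_Foreach1::efectivos_click1 you can just use UnionD1D2_Foreach1::efectivos_click1.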
Please try:
UnionD1D2_Calc = FOREACH UnionD1D2_Distinct GENERATE
(UnionD1D2_Foreach1::efectivos_click1/UnionD1D2_Foreach2::total_click2)*100 AS ctr;
And let us know if you get the same error.
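To make the difference concrete, here is a rough Python sketch of the two behaviours: computing one value per row (what the FOREACH is meant to do) versus treating the whole relation as a single value (what triggers the "Scalar has more than one row" message). The rows are copied from the example data in the question; the function names are hypothetical, and the use of floating-point division is an assumption about the intended result.

```python
# Rows copied from the example data in the question:
# (display_site, efectivos_click, display_site, total_click)
rows = [
    ("linuxlife.example.com", 113, "linuxlife.example.com", 5343),
    ("mobilesource.example.com", 211, "mobilesource.example.com", 8120),
    ("siliconshore.example.com", 170, "siliconshore.example.com", 7764),
    ("printoperator.example.com", 62, "printoperator.example.com", 2724),
]


def ctr_per_row(relation):
    """Compute one CTR percentage per row, as a per-row FOREACH would."""
    # Floating-point division is assumed here so small ratios are not lost.
    return [(site, clicks / total * 100) for site, clicks, _, total in relation]


def as_scalar(relation):
    """Mimic using a whole relation as one value: only valid for one row."""
    if len(relation) != 1:
        raise ValueError("Scalar has more than one row in the output")
    return relation[0]


for site, ctr in ctr_per_row(rows):
    print(site, round(ctr, 2))
```

Calling `as_scalar(rows)` on the four example rows raises the error, which is the situation the original script ran into.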

Get data from another field in Odoo

I need field2 to take its value from the list of values in field1. Field1 is a many2many relation to a field in another model.
I tried to use a domain for this, but every time I got an error.
class filial_page_products(models.Model):
    gallery_rstamp_products_ids = fields.Many2many('product.template',
        'gallery_rstamp_products_rel',
        'gallery_rstamp_products_ids', 'filial_page_new_rstamp_products_ids',
        'Gallery products')
    default_gallery_product_id = fields.Many2one('product.template','Default maket', domain="[(default_gallery_product_id, 'in', 'filial_page_gallery_rstamp_products_ids')]")
class product(models.Model):
    _inherit = 'product.template'
    filial_page_gallery_rstamp_products_ids = fields.Many2many('product.template',
        'gallery_rstamp_products_rel',
        'filial_page_recovery_rstamp_products_ids', 'gallery_rstamp_products_ids',
        'Gallery list')
    filial_page_default_maket_product_ids = fields.One2many('pr_filials.filial_page_products',
        'default_gallery_product_id',
        'Linked page products')
How can I use domain to select only those values that are specified in the gallery_rstamp_products_ids field?
Of course, I could let default_gallery_product_id choose from all products, but I would rather not.
Your domain doesn't look quite right. The left operand should be quoted and the right side should not be quoted (unless it's actually supposed to be evaluated as a string).
domain="[('default_gallery_product_id', 'in', filial_page_gallery_rstamp_products_ids)]"
Note, there's a special format required for filtering against x2many fields (one2many or many2many). You may need to use this (below), however, there have been reports of issues using this in newer versions.
domain="[('default_gallery_product_id', 'in', filial_page_gallery_rstamp_products_ids[0][2])]"
Here's some documentation on domains.
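The quoting rule can be illustrated with a simplified model in plain Python. This is only a sketch and not Odoo's actual evaluator: it assumes a string domain is evaluated with the current record's field values available as names, and it assumes the x2many value arrives in the command form [(6, 0, [ids])], which is why indexing with [0][2] yields the list of ids. The field names `gallery_ids` and `gallery_commands`, the ids, and the `evaluate_domain` helper are hypothetical.

```python
# Simplified model: a string domain is evaluated with the current record's
# field values available as names. This is not Odoo's actual evaluator.
record_values = {
    "gallery_ids": [4, 7, 9],                 # hypothetical plain list of ids
    "gallery_commands": [(6, 0, [4, 7, 9])],  # hypothetical x2many command form
}


def evaluate_domain(domain_source, values):
    """Evaluate a domain string against record values (illustration only)."""
    return eval(domain_source, {"__builtins__": {}}, dict(values))


# Unquoted right operand: replaced by the field's current value.
unquoted = evaluate_domain("[('id', 'in', gallery_ids)]", record_values)

# Quoted right operand: stays a literal string, which is rarely what you want.
quoted = evaluate_domain("[('id', 'in', 'gallery_ids')]", record_values)

# Command form: [0][2] picks the id list out of the (6, 0, ids) triple.
from_commands = evaluate_domain(
    "[('id', 'in', gallery_commands[0][2])]", record_values
)

print(unquoted)
print(quoted)
print(from_commands)
```

An unquoted left operand would fail in this model for the same reason it fails in the question: the name is looked up as a value instead of being passed through as a field name.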

How to retrieve the field name of a ShapeFile feature field?

I am using gdal-ruby to parse ESRI ShapeFiles like in this demo. I want to iterate through all features in order to push the field values into a database. However, I cannot find out how to retrieve the name of each field, which I need in order to match the database column. So far I can only work with the index of the field, like this:
dataset = Gdal::Ogr.open(filename)
number_of_layers = dataset.get_layer_count
number_of_layers.times do |layer_index|
  layer = dataset.get_layer(layer_index)
  layer.get_feature_count.times do |feature_index|
    feature = layer.get_feature(feature_index)
    feature.get_field_count.times do |field_index|
      field_value = feature.get_field(field_index)
      # How can I find out the name of the field?
      puts "Value = #{field_value} for unknown field name"
    end
  end
end
I checked the available methods with irb and looked into the API documentation. It seems as if I am searching for the wrong terms.
Looking at the OGR API itself, I think you need to go via feature.GetDefnRef, to get the feature definition, then .GetFieldDefn for the relevant field, and finally .GetNameRef...?
...
feature.get_field_count.times do |field_index|
  defn_ref = feature.get_defn_ref
  field_defn = defn_ref.get_field_defn(field_index)
  field_name = field_defn.get_name
  field_value = feature.get_field(field_index)
  puts "Value = #{field_value} for field named #{field_name}"
end
...
In Python, the layer schema gives the field names directly:
from osgeo import ogr
ds = ogr.Open(filename, 1)
layer = ds.GetLayer()
for i in range(len(layer.schema)):
    print(layer.schema[i].name)

How to generate a custom schema from a relation in Pig?

I have a relation holding tf-idf values for words in various articles.
Its schema looks like:
tfidf_relation: {word: chararray,id: bytearray,tfidf: double}
Here is an example of such data:
(cat,article_one,0.13515503603605478)
(cat,article_two,0.4054651081081644)
(dog,article_one,0.3662040962227032)
(apple,article_three,0.3662040962227032)
(orange,article_three,0.3662040962227032)
(parrot,article_one,0.13515503603605478)
(parrot,article_three,0.13515503603605478)
I want to get output in the form:
cat article_one 0.13515503603605478, article_two 0.4054651081081644
and so on.
The question is, how do I make a relation from this which contains the word field and a tuple of id and tfidf fields?
Something like this:
X = FOREACH tfidf_relation GENERATE word, (id, tfidf);
doesn't work. What is the correct syntax for this?
Try this:
t = LOAD 'input/file' USING PigStorage(',') as (word: chararray,id: bytearray,tfidf: double);
u = group t by word;
dump u;
The output will be
(cat,{(cat,article_two,0.4054651081081644),(cat,article_one,0.13515503603605478)})
(dog,{(dog,article_one,0.3662040962227032)})
(apple,{(apple,article_three,0.3662040962227032)})
(orange,{(orange,article_three,0.3662040962227032)})
(parrot,{(parrot,article_three,0.13515503603605478),(parrot,article_one,0.13515503603605478)})
I hope this is what you are looking for.
X = FOREACH tfidf_relation GENERATE word, {(id, tfidf)};
This is probably what you need.
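For comparison, the target shape (one line per word, followed by its id and tfidf pairs) can be sketched in plain Python. This is not Pig and is only meant to show the grouping the answers above are aiming at; the tuples are copied from the example data, and the helper names `group_by_word` and `format_line` are hypothetical.

```python
from collections import defaultdict

# Tuples copied from the example data: (word, id, tfidf)
tfidf_relation = [
    ("cat", "article_one", 0.13515503603605478),
    ("cat", "article_two", 0.4054651081081644),
    ("dog", "article_one", 0.3662040962227032),
    ("apple", "article_three", 0.3662040962227032),
    ("orange", "article_three", 0.3662040962227032),
    ("parrot", "article_one", 0.13515503603605478),
    ("parrot", "article_three", 0.13515503603605478),
]


def group_by_word(rows):
    """Collect the (id, tfidf) pairs of each word, keeping input order."""
    grouped = defaultdict(list)
    for word, doc_id, tfidf in rows:
        grouped[word].append((doc_id, tfidf))
    return dict(grouped)


def format_line(word, pairs):
    """Render 'word id tfidf, id tfidf, ...' for one group."""
    return word + " " + ", ".join(f"{doc_id} {tfidf}" for doc_id, tfidf in pairs)


grouped = group_by_word(tfidf_relation)
for word, pairs in grouped.items():
    print(format_line(word, pairs))
```

Unlike the GROUP output shown above, each pair here drops the repeated word, which matches the layout asked for in the question.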
