I am using a DataTable and want to see, for each column, how many rows hold a real value, that is, a value that is neither null nor -1. Can anyone help me write a LINQ query that produces these counts?
Example: I have a table (City, Age, Gender) and I want to show the user which columns have blanks. In my current dataset, City and Age have 200 filled rows but Gender has only 170 because of punching (data-entry) errors, so I loop through the columns and want to show the count for each one.
Here is what I have tried, but it is not working. It is VB.NET code.
Dim dtRaw As New DataTable
daGlobal = New OleDb.OleDbDataAdapter(CMD, MyConnection)
daGlobal.Fill(dtRaw)
daGlobal.Fill(dtGlobal)
For Each column As DataColumn In dtRaw.Columns
    Dim xss As String = column.ColumnName
    Dim MyCounts = (From xss In dtRaw.AsEnumerable()
                    Where (Not String.IsNullOrEmpty(xss)).Count())
Next
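The counting logic itself can be sketched independently of the DataTable API. Below is a minimal Python sketch, not VB.NET: the rows are plain dictionaries, `count_filled` is a hypothetical helper name, and the sample rows are made up. It assumes that None, blank strings, and -1 (as an integer or as the string "-1") all count as missing.

```python
def count_filled(rows, columns):
    """Count, per column, the rows whose value is not missing.

    Assumption: None, blank strings, and -1 (integer or "-1") mean "missing".
    """
    counts = {}
    for col in columns:
        filled = 0
        for row in rows:
            value = row.get(col)
            if value is None:
                continue
            text = str(value).strip()
            if text == "" or text == "-1":
                continue
            filled += 1
        counts[col] = filled
    return counts


# Made-up sample rows, for illustration only
rows = [
    {"City": "Delhi", "Age": 30, "Gender": "M"},
    {"City": "Pune", "Age": 25, "Gender": None},
    {"City": "Agra", "Age": -1, "Gender": ""},
]
print(count_filled(rows, ["City", "Age", "Gender"]))
```

The point to carry over to the LINQ query is that the filter has to test each row's value in the current column, not the column name.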
I have 3 tables:
Table 1: country (countryid, countryname)
This table has a one-to-many join to table 2, state.
Table 2: state (stateid, statename, countryid)
Table 2 has a one-to-many join to the city table.
Table 3: city (cityid, cityname, stateid)
I tried to read a country using this query:
session.createQuery("from Country c where c.countryName=:countryname order by c.countryName");
It gives me the Country object, but its list of states is empty.
What am I doing wrong?
OK, I fixed it
by using cascade = CascadeType.ALL in the @OneToMany annotation
and by changing the HQL to a Criteria query:
Criteria criteria = session.createCriteria(Country.class, "country");
criteria.add(Restrictions.eq("country.countryName", countryname));
I am looking to achieve the below functionality in Pig. I have a set of sample records like this.
Note that the EffectiveDate column is sometimes blank and also different for the same CustomerID.
Now, as output, I want one record per CustomerID, the one whose EffectiveDate is the MAX. So, for the above example, I want the records highlighted as shown below.
This is how I am currently doing it in Pig:
customerdata = LOAD 'customerdata' AS (CustomerID:chararray, CustomerName:chararray, Age:int, Gender:chararray, EffectiveDate:chararray);
--Group customer data by CustomerID
customerdata_grpd = GROUP customerdata BY CustomerID;
--From the grouped data, generate one record per CustomerID that has the maximum EffectiveDate.
customerdata_maxdate = FOREACH customerdata_grpd GENERATE group as CustID, MAX(customerdata.EffectiveDate) as MaxDate;
--Join the above with the original data so that we get the other details like CustomerName, Age etc.
joinwithoriginal = JOIN customerdata by (CustomerID, EffectiveDate), customerdata_maxdate by (CustID, MaxDate);
finaloutput = FOREACH joinwithoriginal GENERATE customerdata::CustomerID as CustomerID, CustomerName as CustomerName, Age as Age, Gender as gender, EffectiveDate as EffectiveDate;
I am basically grouping the original data to find the record with the maximum EffectiveDate. Then I join these 'grouped' records with the Original dataset again to get that same record with Max Effective date, but this time I will also get additional data like CustomerName, Age and Gender. This dataset is huge, so this approach is taking a lot of time. Is there a better approach?
Input :
1,John,28,M,1-Jan-15
1,John,28,M,1-Feb-15
1,John,28,M,
1,John,28,M,1-Mar-14
2,Jane,25,F,5-Mar-14
2,Jane,25,F,5-Jun-15
2,Jane,25,F,3-Feb-14
Pig Script :
customer_data = LOAD 'customer_data.csv' USING PigStorage(',') AS (id:int,name:chararray,age:int,gender:chararray,effective_date:chararray);
customer_data_fmt = FOREACH customer_data GENERATE id..gender,ToDate(effective_date,'dd-MMM-yy') AS date, effective_date;
customer_data_grp_id = GROUP customer_data_fmt BY id;
req_data = FOREACH customer_data_grp_id {
    customer_data_ordered = ORDER customer_data_fmt BY date DESC;
    req_customer_data = LIMIT customer_data_ordered 1;
    GENERATE FLATTEN(req_customer_data.id) AS id,
             FLATTEN(req_customer_data.name) AS name,
             FLATTEN(req_customer_data.gender) AS gender,
             FLATTEN(req_customer_data.effective_date) AS effective_date;
};
Output :
(1,John,M,1-Feb-15)
(2,Jane,F,5-Jun-15)
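The idea behind the nested ORDER/LIMIT, keeping only the latest record per group without joining back to the original data, can also be illustrated outside Pig. The following is a minimal Python sketch under stated assumptions: `latest_per_customer` is a hypothetical helper name, records are tuples in the same column order as the input above, dates use English month abbreviations, and a record with a blank date is kept only if its id has no dated record at all. Unlike the Pig output, it returns the whole record, including age.

```python
from datetime import datetime


def latest_per_customer(records):
    """Keep, for each id, the record with the latest effective date."""
    best = {}  # id -> (parsed date or None, record)
    for rec in records:
        cust_id, raw_date = rec[0], rec[4]
        # Blank dates are treated as "no date" (assumption)
        parsed = datetime.strptime(raw_date, "%d-%b-%y") if raw_date else None
        current = best.get(cust_id)
        if current is None:
            best[cust_id] = (parsed, rec)
        elif parsed is not None and (current[0] is None or parsed > current[0]):
            best[cust_id] = (parsed, rec)
    return [rec for _, rec in best.values()]


# Same sample rows as the input above
records = [
    (1, "John", 28, "M", "1-Jan-15"),
    (1, "John", 28, "M", "1-Feb-15"),
    (1, "John", 28, "M", ""),
    (1, "John", 28, "M", "1-Mar-14"),
    (2, "Jane", 25, "F", "5-Mar-14"),
    (2, "Jane", 25, "F", "5-Jun-15"),
    (2, "Jane", 25, "F", "3-Feb-14"),
]
for rec in latest_per_customer(records):
    print(rec)
```

Each record is visited once and only one candidate per id is held in memory, which is the same reason the single GROUP with a nested block avoids the second pass that the JOIN needs.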
I have created a table in hive using query
CREATE TABLE u_data (
userid INT,
movieid INT,
rating INT,
unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
I then loaded some data into it. Now I want to retrieve the average rating of movies that have more than 30 ratings.
I tried creating a view using query:
create view ratingcount as select movieid, count(rating) as num_of_ratings from u_data group by movieid;
and then used join query:
Select movieid, avg(rating) from u_data join ratingcount on u_data.movieid = ratingcount .movieid where num_of_ratings >30;
This throws an exception. How can I retrieve the required data?
Try this:
Select movieid, avg(rating) from u_data group by movieid having count(rating) > 30;
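To make explicit what GROUP BY with HAVING computes here, this is a minimal Python sketch of the same logic, not Hive code. `average_ratings` is a hypothetical helper name, the rows are made-up tuples in the column order of u_data, and null ratings are skipped on the assumption that count(rating) and avg(rating) ignore them. The example call uses a threshold of 2 instead of 30 only to keep the sample small.

```python
from collections import defaultdict


def average_ratings(rows, min_count=30):
    """Average rating per movie, only for movies with more than min_count ratings."""
    totals = defaultdict(lambda: [0, 0])  # movieid -> [sum of ratings, number of ratings]
    for _userid, movieid, rating, _unixtime in rows:
        if rating is None:
            continue  # mirror SQL aggregates, which skip NULLs
        totals[movieid][0] += rating
        totals[movieid][1] += 1
    # The HAVING step: filter on the per-group count after aggregation
    return {m: s / c for m, (s, c) in totals.items() if c > min_count}


# Made-up rows: (userid, movieid, rating, unixtime)
rows = [
    (1, 10, 4, "881250949"),
    (2, 10, 2, "881250950"),
    (3, 10, 3, "881250951"),
    (1, 20, 5, "881250952"),
]
print(average_ratings(rows, min_count=2))
```

The filter runs on the aggregated groups rather than on individual rows, which is why it belongs in HAVING and not in WHERE.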
I am having trouble writing a LINQ query that joins three tables and then applies aggregate functions to the rows. Any help would be appreciated.
I have three tables
Table 1: Students (Id, Name)
Table 2: Subject (SubID, Title, Id)
Table 3: Grade (Id, SubID, marks)
I have to write a LINQ query that returns the following results:
Count of Students table rows
Count of Grade table rows
Sum of marks of all rows in Grade table
I have written the query below, but I do not think it is correct.
var _Count = from student in _context.Students
             join subject in _context.Subject on student.Id equals subject.Id
             join grade in _context.Grade on subject.SubID equals grade.SubID
             // How to group them?
             select new { //How to take and return the counts?};
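One observation that may help: each of the three requested figures is computed over a single table, so they may not need a join at all. Joining Students to Subject to Grade repeats a student row once per matching grade, which changes the student count. A minimal Python sketch of the three independent aggregates follows; it is not LINQ, the class and field names simply mirror the tables above, `summarize` is a hypothetical helper name, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass
class Student:
    id: int
    name: str


@dataclass
class Grade:
    id: int       # student id
    sub_id: int   # subject id
    marks: int


def summarize(students, grades):
    """Compute each aggregate over its own table, with no join involved."""
    return {
        "student_count": len(students),
        "grade_count": len(grades),
        "total_marks": sum(g.marks for g in grades),
    }


# Made-up sample data, for illustration only
students = [Student(1, "Asha"), Student(2, "Ravi")]
grades = [Grade(1, 100, 80), Grade(1, 101, 70), Grade(2, 100, 60)]
print(summarize(students, grades))
```

If the figures are needed per student or per subject instead, grouping after a join would be the place to start, but that is a different requirement from the three totals listed above.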