Spark : Java : Why is Dataset.show() not showing all fields when using a custom encoder? - java-8

While using map() on a Dataset, it is not returning all fields.
Code snippet:
RecordParser parser = new RecordParser();
Dataset<CensusData> censusData =
    records.map(parser, Encoders.bean(CensusData.class));
censusData.show(40);
The above code returns only 5 fields, whereas the class has 13 fields.
output:
+----------+---------+-------+------------+--------+-------------------+
|activityId|contentId|daypart|deviceTypeId|errorMsg| genreId|
+----------+---------+-------+------------+--------+-------------------+
| null| null| null| null| null| null|
| 4| 0002| 1| 1| null| DR1|
| 4| 0004| 1| 2| null|Children (0-12 yrs)|
| | 0018| 1| 3| null| Entertainment|
How can I solve this problem?
Edit:
Details:
Java version: 8
Spark version: 2.1

I found out the issue: the bean class must have getters and setters for every field you want returned.
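For reference, a minimal sketch of the bean shape Encoders.bean() expects; the field names here are taken from the output above and the rest of CensusData is omitted:
public class CensusData {
    private String activityId;
    private String contentId;
    // ... the remaining fields ...

    // Encoders.bean() only maps properties exposed through a public
    // getter/setter pair, so any field missing either accessor is
    // silently left out of the Dataset schema.
    public String getActivityId() { return activityId; }
    public void setActivityId(String activityId) { this.activityId = activityId; }

    public String getContentId() { return contentId; }
    public void setContentId(String contentId) { this.contentId = contentId; }
}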

Related

How to update multiple values in Oracle?

I have two tables:
table 1:
|Project type|Quarter1|Quarter2|Quarter3|Quarter4|
|------------|--------|--------|--------|--------|
|type1 |1 |3 |5 |7 |
|type2 |2 |4 |6 |8 |
table 2:
|Project|Value|Quarter|
|-------|-----|-------|
|type1 | |1 |
|type2 | |1 |
|type1 | |2 |
|type2 | |2 |
|type1 | |3 |
|type2 | |3 |
|type1 | |4 |
|type2 | |4 |
I want to update the Value column of table 2 with data from table 1. The expected outcome is:
|Project|Value|Quarter|
|-------|-----|-------|
|type1 |1 |1 |
|type2 |2 |1 |
|type1 |3 |2 |
|type2 |4 |2 |
|type1 |5 |3 |
|type2 |6 |3 |
|type1 |7 |4 |
|type2 |8 |4 |
I know updating a single value can be written as:
update table2 a
set a.value = (select Quarter1
               from table1
               where projecttype = 'type1')
where a.project = 'type1'
  and a.quarter = '1';
Please tell me how I can update all the values at once.
Thank you!
One way is with a merge statement:
merge into table_2 t
using table_1 s
on (t.project = s.project_type)
when matched then update
  set t.value = case t.quarter when 1 then s.quarter1
                               when 2 then s.quarter2
                               when 3 then s.quarter3
                               when 4 then s.quarter4
                end;
My first thought was to use a loop to repeat the update. The main body follows mathguy's answer (thanks again). It may overcomplicate the code in this scenario, but it would be useful when table1 has many columns, such as years instead of quarters.
declare
  code varchar2(2000);
begin
  for quart_num in 1..4
  loop
    -- build a dynamic merge that copies the matching quarter column
    code := 'merge into table2 a
             using table1 b
             on (a.project = b.projecttype)
             when matched then
             update set a.value = b.quarter' || quart_num ||
            ' where a.quarter = ' || quart_num;
    execute immediate code;
  end loop;
end;

Laravel relationship method to get some total from array

I have a question about my code. I have tables like these:
tb teacher
|id | name      | id_prodi|
|---|-----------|---------|
|1  | Teacher A | 1       |
|2  | Teacher B | 2       |
|3  | Teacher C | 1       |
tb prodi
|id | prodi   |
|---|---------|
|1  | Prodi 1 |
|2  | Prodi 2 |
|3  | Prodi 3 |
tb year
|id | year |
|---|------|
|1  | 2018 |
|2  | 2019 |
|3  | 2020 |
tb attendence
|id | teacher_id | year_id | date_in   |
|---|------------|---------|-----------|
|1  | 1          | 3       | 2020-03-20|
|2  | 1          | 3       | 2020-03-21|
|3  | 3          | 3       | 2020-03-21|
|4  | 1          | 3       | 2020-03-22|
Those are my tables. What I need to do is get a result like this in my datatable.
I have a condition with a combobox to select by year.
|# | name      | year | total_attend |
|--|-----------|------|--------------|
|1 | Teacher A | 2020 | 3            |
|2 | Teacher C | 2020 | 1            |
Something like that for the final result.
And this is my controller:
public function showData(Request $request)
{
    $year = $request->year_id;
    $prodi_id = $request->prodi_id;
    $datas = Teacher::whereHas('attendance', function($query) use ($year) {
            $query->where('year_id', $year);
        })
        ->where('prodi_id', $prodi_id)
        ->get();
    return response()->json([
        'success' => 'Success',
        'data' => $datas
    ]);
}
My problem is that I can't get the total column by counting the attendance rows for the selected year. Does anyone have a clue?
You can achieve this using selectRaw() with a count() on the teacher name:
DB::table('attendence')
    ->selectRaw('t.name as name, y.year as year, count(t.name) as total_attend')
    ->join('teacher as t', 't.id', '=', 'attendence.teacher_id')
    ->join('year as y', 'attendence.year_id', '=', 'y.id')
    ->where([['attendence.year_id', $year], ['t.prodi_id', $prodi_id]])
    ->groupBy('name', 'year')
    ->get();

D register is not updated; why is that?

To replicate the cpu.out file shown below (though without my comments),
use this cpu.hdl, which passes all the tests.
Now, my question is about clock cycles 3+, 4, and 4+. Notice that DRegiste (the D register) is not updated even though the instruction was "D=A-D". Why is that?
|time| inM | instruction |reset| outM |writeM |addre| pc |DRegiste|
a-instruc | store the number "12345"
|0+ | 0|0011000000111001| 0 | 0| 0 | 0| 0| 0 |
|1 | 0|0011000000111001| 0 | 0| 0 |12345| 1| 0 |
c-instru | comp: "A" | dest: "D" | jump: "no jump" | "D=A"
|1+ | 0|1110110000010000| 0 | 12345| 0 |12345| 1| 12345 |
|2 | 0|1110110000010000| 0 | 12345| 0 |12345| 2| |
a-instruc | "23456"
|2+ | 0|0101101110100000| 0 | -1| 0 |12345| 2| 12345 |
|3 | 0|0101101110100000| 0 | -1| 0 |23456| 3| 12345 |
c-instruc | comp: "A-D" | dest: "D" | jump: "no jump" | "D=A-D"
|3+ | 0|1110000111010000| 0 | 11111| 0 |23456| 3| 11111 |
|4 | 0|1110000111010000| 0 | 12345| 0 |23456| 4| 11111 |
a-instruc | "1000" WHY DREGISTE NOT CHANGE? v^v^
|4+ | 0|0000001111101000| 0 | -11111| 0 |23456| 4| 11111 |
|5 | 0|0000001111101000| 0 | -11111| 0 | 1000| 5| 11111 |
If your cpu.hdl is passing all the tests, it is probably operating correctly.
As far as I can tell (it's been several years since I built my CPU), the Dreg is being updated correctly; it gets updated in the + cycles. Note that in cycle 3, its value is 12345, and in 3+ (after the processing of the D=A-D) it is 11111 (which is 23456-12345, as you would expect).
My best guess is that the simulator doesn't update the CPU's output values in the + phases, but it does show the internal state. So you see the Dreg change in the + phases, but "addre" (which isn't an internal register; it's the external address lines) only changes in the non-+ phases.

Sorting a dataset on the basis of more than one column

I have a sample dataset as below.
+---------+--------+---------+---------+---------+
| Col1 | Col2 | NumCol1 | NumCol2 | NumCol3 |
+---------+--------+---------+---------+---------+
| Value 1 | Value2 | 6 | 2 | 9 |
| Value 3 | Value4 | 8 | 3 | 12 |
| Value 5 | Value6 | 1 | 11 | 8 |
| Value 7 | Value8 | 4 | 10 | 5 |
+---------+--------+---------+---------+---------+
I need to sort this dataset based on the values of the columns NumCol1, NumCol2, and NumCol3, i.e. if I sort this dataset in ascending order I need to get the result below.
+---------+--------+---------+---------+---------+
| Col1 | Col2 | NumCol1 | NumCol2 | NumCol3 |
+---------+--------+---------+---------+---------+
| Value 5 | Value6 | 1 | 11 | 8 |
| Value 1 | Value2 | 6 | 2 | 9 |
| Value 3 | Value4 | 8 | 3 | 12 |
| Value 7 | Value8 | 4 | 10 | 5 |
+---------+--------+---------+---------+---------+
The row with Value 5, Value6, 1, 11, 8 comes first as it has the lowest value (1), and similarly for the rest.
In descending order, the result would be:
+---------+--------+---------+---------+---------+
| Col1 | Col2 | NumCol1 | NumCol2 | NumCol3 |
+---------+--------+---------+---------+---------+
| Value 3 | Value4 | 8 | 3 | 12 |
| Value 5 | Value6 | 1 | 11 | 8 |
| Value 7 | Value8 | 4 | 10 | 5 |
| Value 1 | Value2 | 6 | 2 | 9 |
+---------+--------+---------+---------+---------+
Is it possible to do this in Spark? How will I be able to achieve this?
Use least and greatest to calculate the minimum and maximum among the three columns and then order by that value. In PySpark:
Ascending by the least value:
import pyspark.sql.functions as f
df.orderBy(f.least(f.col('NumCol1'), f.col('NumCol2'), f.col('NumCol3'))).show()
+-------+------+-------+-------+-------+
| Col1| Col2|NumCol1|NumCol2|NumCol3|
+-------+------+-------+-------+-------+
|Value 5|Value6| 1| 11| 8|
|Value 1|Value2| 6| 2| 9|
|Value 3|Value4| 8| 3| 12|
|Value 7|Value8| 4| 10| 5|
+-------+------+-------+-------+-------+
Descending by the greatest value:
df.orderBy(f.greatest(f.col('NumCol1'), f.col('NumCol2'), f.col('NumCol3')).desc()).show()
+-------+------+-------+-------+-------+
| Col1| Col2|NumCol1|NumCol2|NumCol3|
+-------+------+-------+-------+-------+
|Value 3|Value4| 8| 3| 12|
|Value 5|Value6| 1| 11| 8|
|Value 7|Value8| 4| 10| 5|
|Value 1|Value2| 6| 2| 9|
+-------+------+-------+-------+-------+
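If you need the same thing from the Java Dataset API (as in the first question above), a rough equivalent sketch, assuming a Dataset<Row> named df, would be:
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.greatest;
import static org.apache.spark.sql.functions.least;

// ascending by the smallest of the three numeric columns
df.orderBy(least(col("NumCol1"), col("NumCol2"), col("NumCol3"))).show();

// descending by the largest of the three numeric columns
df.orderBy(greatest(col("NumCol1"), col("NumCol2"), col("NumCol3")).desc()).show();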

How to get distinct rows in a dataframe using pyspark?

I understand this is just a very simple question and has most likely been answered somewhere, but as a beginner I still don't get it and am looking for your enlightenment, thank you in advance:
I have an interim dataframe:
+----------------------------+---+
|host |day|
+----------------------------+---+
|in24.inetnebr.com |1 |
|uplherc.upl.com |1 |
|uplherc.upl.com |1 |
|uplherc.upl.com |1 |
|uplherc.upl.com |1 |
|ix-esc-ca2-07.ix.netcom.com |1 |
|uplherc.upl.com |1 |
What I need is to remove all the redundant items in the host column; in other words, I need to get the final distinct result like:
+----------------------------+---+
|host |day|
+----------------------------+---+
|in24.inetnebr.com |1 |
|uplherc.upl.com |1 |
|ix-esc-ca2-07.ix.netcom.com |1 |
|uplherc.upl.com |1 |
If df is the name of your DataFrame, there are two ways to get unique rows:
df2 = df.distinct()
or
df2 = df.drop_duplicates()
The normal distinct() is not so user friendly, because you can't restrict it to specific columns.
In this case it is enough for you:
df = df.distinct()
but if you have other values in the day column, you won't get back the distinct elements of host:
+--------------------+---+
| host|day|
+--------------------+---+
| in24.inetnebr.com| 1|
| uplherc.upl.com| 1|
| uplherc.upl.com| 2|
| uplherc.upl.com| 1|
| uplherc.upl.com| 1|
|ix-esc-ca2-07.ix....| 1|
| uplherc.upl.com| 1|
+--------------------+---+
after distinct() you will get back the following:
df.distinct().show()
+--------------------+---+
| host|day|
+--------------------+---+
| in24.inetnebr.com| 1|
| uplherc.upl.com| 2|
| uplherc.upl.com| 1|
|ix-esc-ca2-07.ix....| 1|
+--------------------+---+
Thus you should use this:
df = df.dropDuplicates(['host'])
It will keep the first value of day.
If you are familiar with SQL, it will also work for you:
df.createOrReplaceTempView("temp_table")
new_df = spark.sql("select first(host), first(day) from temp_table GROUP BY host")
+--------------------+-----------------+
| first(host, false)|first(day, false)|
+--------------------+-----------------+
| in24.inetnebr.com| 1|
|ix-esc-ca2-07.ix....| 1|
| uplherc.upl.com| 1|
+--------------------+-----------------+
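If you are using the Java Dataset API rather than PySpark, the same column-scoped de-duplication can be sketched as follows (assuming a Dataset<Row> named df):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// keep one row per host, dropping the later duplicates
Dataset<Row> deduped = df.dropDuplicates(new String[] {"host"});
deduped.show(false);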
