Filtering a Flux by comparing each element to a single Mono - Spring

I am trying to use a Mono of a username to filter the elements of a Flux (the Flux emitting multiple courses). I am using Cassandra as the backend; here is the schema:
CREATE TABLE main.courses_by_user (
course_creator text PRIMARY KEY,
courseid timeuuid,
description text,
enrollmentkey text
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
course_creator | courseid | description | enrollmentkey
----------------+--------------------------------------+-------------+---------------
rascall | b7757e80-0c24-11ed-aec5-23fe9d87e512 | Cosmology | hubble
dibiasky | b7757e81-0c24-11ed-aec5-23fe9d87e512 | astronomy | thebigbang
michaeljburry | b7753060-0c24-11ed-aec5-23fe9d87e512 | Lol | enter
Noam Chomsky | 6c1a4800-09ac-11ed-ada9-83d934863d60 | Hi | Bye
I am using zipWith to pair the Flux of courses with the Mono of the user; here is the code:
public Flux<CourseByCreator> getMyCourses(Principal placeholder){
    Mono<User> principal = Mono.just(new User("dibiasky", "whatifitsnotequal", "Kate", "Dibiasky"));
    return allCreatedCourses = this.courseByCreatorRepository.findAll()
            .zipWith(principal)
            .flatMap(tuple -> {
                if(tuple.getT1().getCourseCreator().equals(tuple.getT2().getUsername())){
                    System.out.println(tuple.getT2().getUsername());
                    return Flux.just(tuple.getT1());
                } else {
                    return Flux.empty();
                }
            }).log();
}
For some reason I am getting an empty result, despite the user having a matching username and the courses containing one course created by that user.
What am I doing wrong here?

From the Flux.zipWith documentation:
Zip this Flux with another Publisher source, that is to say wait for both to emit one element and combine these elements once into a Tuple2. The operator will continue doing so until any of the sources completes.
The mono will emit one then complete.
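As a minimal sketch of that behaviour (plain Reactor with made-up values, not your domain classes): zipping a multi-element Flux with a single-element Mono produces exactly one pair, so every course after the first is dropped before your creator check ever runs.
Flux<String> creators = Flux.just("rascall", "dibiasky", "michaeljburry");
Mono<String> user = Mono.just("dibiasky");
creators.zipWith(user)                           // the Mono completes after one element,
        .map(t -> t.getT1() + " / " + t.getT2()) // so only one Tuple2 is ever emitted
        .subscribe(System.out::println);         // prints only "rascall / dibiasky"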
I'd rather first resolve the principal, then filter the courses:
return Mono
        .just(new User("dibiasky", "whatifitsnotequal", "Kate", "Dibiasky"))
        .flatMapMany(principal -> {
            return courseByCreatorRepository
                    .findAll()
                    .filter(course -> course.getCourseCreator().equals(principal.getUsername()));
        })
        .log();

Alternatively, keep the repository query first and use filterWhen:
public Flux<CourseByCreator> getMyCourses(Principal placeholder){
    Mono<User> principal = Mono.just(new User("dibiasky", "whatifitsnotequal", "Kate", "Dibiasky"));
    return this.courseByCreatorRepository.findAll()
            .filterWhen(course -> principal
                    .filter(user -> user.getUsername().equals(course.getCourseCreator()))
                    .hasElement()
            )
            .log();
}
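If the principal eventually comes from a real lookup rather than Mono.just (for example a reactive user repository; findByUsername below is a hypothetical method, not part of the question), it can be worth caching the Mono so filterWhen does not repeat the lookup for every course:
Mono<User> principal = userRepository.findByUsername("dibiasky").cache(); // resolved once, replayed for each course
return this.courseByCreatorRepository.findAll()
        .filterWhen(course -> principal
                .filter(user -> user.getUsername().equals(course.getCourseCreator()))
                .hasElement())
        .log();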

Related

Laravel: Merge two query builders

I have a table of courses which are either free to access or require an admin to approve access before users can see the course.
The course table looks like this:
| id | title | invite_only |
|----|----------------|-------------|
| 1 | free course | 0 |
| 2 | private course | 1 |
Separate from this I have a course_user table, where initially users request access, then admins can approve or deny access:
| id | user_id | course_id | approved | declined |
|----|---------|-----------|----------|----------|
| 1 | 3 | 2 | 1 | 0 |
| 2 | 4 | 1 | 0 | 1 |
| 3 | 4 | 2 | 0 | 0 |
I'd like to index all the courses a user has access to:
class User extends Model {
    public function myCourses(){
        $public = $this->publicCourses;
        $invited = $this->invitedCourses;
        return $public->merge($invited);
    }

    public function publicCourses(){
        return $this
            ->hasMany('App\Course')
            ->where('invite_only', false);
    }

    public function invitedCourses(){
        return $this
            ->belongsToMany('App\Course')
            ->using('App\CourseUser')
            ->wherePivot('approved', 1);
    }
}
How can I make the myCourses function return the results of both publicCourses and invitedCourses by doing only one database query? I'd like to merge the two query builder instances.
According to the docs, you can use union to merge query builders. But as far as I know, it does not work with relations, so you may have to do it from within the controller instead of the model. This is an example based on what I understand from your case:
$q1 = App\Course::join('course_user', 'course_user.course_id', 'courses.id')
->join('users', 'users.id', 'course_user.user_id')
->where('courses.invite_only', 0)
->select('courses.*');
$q2 = App\Course::join('course_user', 'course_user.course_id', 'courses.id')
->join('users', 'users.id', 'course_user.user_id')
->where('courses.invite_only', 1)
->where('course_user.approved', 1)
->select('courses.*');
$myCourses = $q1->unionAll($q2)->get();
You can also refactor the code further by creating a join scope in App\Course.
I was able to make a much simpler query, and use Laravel's orWherePivot to extract the correct courses:
public function enrolledCourses()
{
    return $this
        ->courses()
        ->where('invite_only', false)
        ->orWherePivot('approved', true);
}

ManyToMany relation - how to update an attribute in the pivot table

I am now learning to work with pivot tables: https://laravel.com/docs/4.2/eloquent#working-with-pivot-tables
I have a WeeklyRoutine model. Each routine has several Activities. The assigned activities are attached via an activity_routine pivot table.
Relation defined in the WeeklyRoutine model:
public function activities() {
    return $this->belongsToMany('App\Models\Activity', 'activity_routine', 'routine_id', 'activity_id')->withPivot('done_at')->withTimestamps();
}
it looks like this:
// activity_routine pivot table (relevant columns only)
| id | activity_id | routine_id | done_at |
| 34 | 1 | 4 | 2016-04-23 09:27:27 | // *1
| 35 | 2 | 4 | null | // *2
*1 this activity is marked as done with the code below
*2 this activity is not yet done
what I have:
I can update the done_at field in the pivot table, marking the activity as DONE for the given routine (routine_id = 4 in the table above):
public function make_an_activity_complete($routineid, $activityid) {
    $date = new \DateTime;
    $object = Routine::find($routineid)->activities()->updateExistingPivot($activityid, array('done_at' => $date));
    return 'done!';
}
what I need
I want to UN-DO an activity: when it is already done, that is when done_at is not null but contains a date, make it null.
In other words I need to do the below switch of value, but the proper way:
$pivot = DB::table('activity_routine')->where('routine_id', $routineid)->where('activity_id', $activityid)->first();
if ($pivot->done_at != null) {
    $new_val = new \DateTime;
} else {
    $new_val = null;
}
$object = Routine::find($routineid)->activities()->updateExistingPivot($activityid, array('done_at' => $new_val));
How to do it? I have no clue!
Thx.
Your approach seems fine to me. I would probably do it like this.
$routine = Routine::find($routineid);
$activity = $routine->activities()->find($activityid);
$done_at = is_null($activity->pivot->done_at) ? new \DateTime : null;
$routine->activities()->updateExistingPivot($activityid, compact('done_at'));

Calculating the sum of different columns in a TableView

I have a TableView<Transaction> which looks like the table below:
---------------------------------------------------------------------
| id | Transaction date | Name | type | Debit Amount | Credit Amount|
|---------------------------------------------------------------------|
| 1 | 21/02/2016 |Invoice|Credit | | 12000 |
|---------------------------------------------------------------------|
| 2 | 21/02/2016 |Payment|Debit | 20000 | |
|---------------------------------------------------------------------|
| Total Debit | Total Credit |
-----------------------------
The data in the Debit Amount and Credit Amount columns comes from a single property of the Transaction object. The code snippet showing how those columns are populated is below:
tcCreditAmmout.setCellValueFactory(cellData -> {
    Transaction transaction = cellData.getValue();
    BigDecimal value = null;
    if (transaction.getKindOfTransaction() == KindOfTransaction.CREDIT) {
        value = transaction.getAmountOfTransaction();
    }
    return new ReadOnlyObjectWrapper<BigDecimal>(value);
});
tcDebitAmmout.setCellValueFactory(cellData -> {
    Transaction transaction = cellData.getValue();
    BigDecimal value = null;
    if (transaction.getKindOfTransaction() == KindOfTransaction.DEBIT) {
        value = transaction.getAmountOfTransaction();
    }
    return new ReadOnlyObjectWrapper<BigDecimal>(value);
});
I need to calculate Total Debit and Total Credit (see the table above) every time the TableView's items change, using JavaFX bindings, but I have no idea how to achieve this.
Note: Total Debit and Total Credit are Labels.
Assuming you have
TableView<Transaction> table = ... ;
Label totalDebit = ... ;
Label totalCredit = ... ;
then you just need:
totalDebit.textProperty().bind(Bindings.createObjectBinding(() ->
table.getItems().stream()
.filter(transaction -> transaction.getKindOfTransaction() == KindOfTransaction.DEBIT)
.map(Transaction::getAmountOfTransaction)
.reduce(BigDecimal.ZERO, BigDecimal::add),
table.getItems())
.asString("%.3f"));
and of course
totalCredit.textProperty().bind(Bindings.createObjectBinding(() ->
table.getItems().stream()
.filter(transaction -> transaction.getKindOfTransaction() == KindOfTransaction.CREDIT)
.map(Transaction::getAmountOfTransaction)
.reduce(BigDecimal.ZERO, BigDecimal::add),
table.getItems())
.asString("%.3f"));
If getAmountOfTransaction might change while the transaction is part of the table, then your table's items list must be constructed with an extractor.
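As a minimal sketch of such a list (assuming Transaction exposes a JavaFX property accessor named amountOfTransactionProperty(); that name is an assumption, not taken from the question):
// the extractor re-fires list change events whenever the amount of any row changes,
// which makes the bindings above recompute the totals
ObservableList<Transaction> items = FXCollections.observableArrayList(
        transaction -> new Observable[] { transaction.amountOfTransactionProperty() });
table.setItems(items);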

How to write output of a map reduce job directly to distributed cache so that it is passed to another job

I am currently practising MapReduce (Hadoop 2.2) and need your help with one of the concepts.
I have a use case which I want to complete using two jobs. I want the output of job 1 to be written to the distributed cache and passed as input to the second job.
Basically I want to avoid the overhead of writing the output of the first job to a file.
Use case Input :
Songs file -
| Id | Song | Type |
|s1 | song1 | Classical |
|s2 | song2 | Jazz |
|s2 | song3 | Classical |
.
User rating file -
| User_Id | Song_Id | Rating |
| u1 | s1 | 7 |
| u2 | s2 | 5 |
| u3 | s2 | 9 |
| u4 | s1 | 7 |
| u5 | s5 | 5 |
| u6 | s1 | 9 |
Note: Both these files contain very large amounts of data.
Use case description:
Find the average rating of each song of type Classical.
The solution I have come up with is to use two chained jobs.
1. Job 1: gets all the ids of Classical songs and adds them to the distributed cache.
2. Job 2: the mapper filters the ratings of Classical songs based on the values in the cache; the reducer calculates the average rating of each song.
I searched the web to see whether the output of a job can be written directly to the distributed cache, but was unable to find useful information.
I found a similar question on Stack Overflow:
"How to directly send the output of a mapper-reducer to a another mapper-reducer without saving the output into the hdfs"
The solution there is to use SequenceFileOutputFormat.
However, in my case I want all the song ids to be available to each mapper in the second job, so I think that solution will not work for me.
The alternative approach I want to go with is to run the first job, which finds the ids of Classical songs and writes them to a file, then create a second job and add the song ids output file to the second job's cache. Please advise.
Your help is much appreciated.
You can write intermediate results to Memcached if each record is small (less than 1 MB).
Follow the second approach.
The first job will write its output to the file system.
The second job will pass the required file to all nodes by using the Job API instead of the DistributedCache API, which has been deprecated.
Have a look at the new Job API for methods like
addCacheFile(URI uri)
getCacheFiles()
etc.
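As a rough sketch of wiring the first job's output into the second job with that API (the output path and driver class name are placeholders, not taken from your setup):
Job job2 = Job.getInstance(conf, "average rating of classical songs");
job2.setJarByClass(AverageRatingDriver.class);                        // hypothetical driver class
job2.addCacheFile(new URI("/user/hadoop/job1-output/part-r-00000")); // song-id file written by job 1
// inside the second job's mapper, context.getCacheFiles() returns the URIs added above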
One approach is to load the output(s) of the first job into the distributed cache, then launch the second job.
//CONFIGURATION
Job job = Job.getInstance(getConf(), "Reading from distributed cache and etc.");
job.setJarByClass(this.getClass());
////////////
FileSystem fs = FileSystem.get(getConf());
/*
 * if you have, for example, a map only job,
 * that "something" could be "part-"
 */
FileStatus[] fileList = fs.listStatus(PATH OF FIRST JOB OUTPUT,
        new PathFilter(){
            @Override
            public boolean accept(Path path){
                return path.getName().contains("SOMETHING");
            }
        });
for(int i = 0; i < fileList.length; i++){
    DistributedCache.addCacheFile(fileList[i].getPath().toUri(), job.getConfiguration());
}
//other parameters
Mapper:
//in mapper
@Override
public void setup(Context context) throws IOException, InterruptedException {
    //SOME STRUCT TO STORE VALUES READ (ArrayList, HashMap... whatever)
    Object store = null;
    try{
        Path[] fileCached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if(fileCached != null && fileCached.length > 0) {
            for(Path file : fileCached) {
                readFile(file);
            }
        }
    } catch(IOException ex) {
        System.err.println("Exception in mapper setup: " + ex.getMessage());
    }
}
private void readFile(Path filePath) {
    try{
        BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath.toString()));
        String line = null;
        while((line = bufferedReader.readLine()) != null) {
            //reading that file line by line and updating our struct store
            //....
        } //end while (cycling over lines in file)
        bufferedReader.close();
    } catch(IOException ex) {
        System.err.println("Exception while reading file: " + ex.getMessage());
    }
} //end readFile method
Now in the map phase, you have the files passed as input to the job AND the values you needed stored in the struct store.
My answer comes from How to use a MapReduce output in Distributed Cache.

Grouping data in MapReduce

I have a CSV file which I have loaded into Hadoop. A data sample is below.
name | shop | balance
tom | shop a | -500
john | shop b | 200
jane | shop c | 5000
Results:
bad 1
normal 1
wealthy 1
I have to get the balance for each person and then put them into groups: bad (< 0), normal (1 to 500), good (> 500).
I'm not 100% sure how to put the groups into mapReduce. Do I put it in the reducer? or mapper?
Splitting the CSV file (mapper):
String[] tokens = value.toString().split(",");
String balance = tokens[2]; // third column holds the balance
Creating groups:
String[] category = new String[3];
category[0] = "Bad";
category[1] = "Normal";
category[2] = "Good";
I also have this if/else statement:
if (bal <= 500){
    //put into cat 0
} else if (bal >= 501 && bal <= 1500){
    // put into cat 1
} else {
    //put into cat 2
}
Thanks in advance.
A simple way to implement this would be:
Map:
map() {
    if (bal <= 0) { //or 500, or whatever
        emit (bad, 1);
    } else if (bal <= 500) { // or 1500, or whatever
        emit (normal, 1);
    } else {
        emit (good, 1);
    }
}
Reduce (and combiner, as well):
reduce(key, values) {
    int count = 0;
    while (values.hasNext()) {
        count += values.next();
    }
    emit (key, count);
}
It's exactly the same as the word count example, where, in your case, you have three words (categories): bad, normal, good.
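If it helps to see that as concrete Hadoop classes, here is a minimal sketch of the mapper (the class name and exact thresholds are assumptions based on the question, not part of the original answer); the reducer is the standard word-count summing reducer:
public class BalanceCategoryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text category = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // a real job would also need to skip the header row of the CSV
        String[] tokens = value.toString().split(",");
        int balance = Integer.parseInt(tokens[2].trim()); // third column holds the balance
        if (balance < 0) {
            category.set("bad");
        } else if (balance <= 500) {
            category.set("normal");
        } else {
            category.set("good");
        }
        context.write(category, ONE);
    }
}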
