Grouping data in mapReduce - hadoop

I have a CSV file which I have loaded into Hadoop. A data sample is below.
name | shop | balance
tom | shop a | -500
john | shop b | 200
jane | shop c | 5000
Results:
bad 1
normal 1
wealthy 1
I have to get the balance for each person and then put each person into a group: bad (< 0), normal (1 to 500), or good (> 500).
I'm not 100% sure how to put the groups into MapReduce. Do I put the grouping in the mapper or in the reducer?
Splitting the csv file(mapper):
String[] tokens = value.toString().split(",");
String balance = tokens[2]; // balance is the third column, so index 2
Creating groups:
String[] category = new String[3];
category[0] = "Bad";
category[1] = "Normal";
category[2] = "Good";
I also have this if/else statement:
if (bal <= 500){
//put into cat 0
} else if ( bal >= 501 && bal <=1500){
// put into cat 1
} else {
//put into cat 2
}
Thanks in advance.

A simple way to implement this would be:
Map:
map() {
if (bal <= 0) { //or 500, or whatever
emit (bad, 1);
} else if (bal <= 500) { // or 1500, or whatever
emit (normal, 1);
} else {
emit (good, 1);
}
}
Reduce (and combiner, as well):
reduce(key, values) {
int count = 0;
while (values.hasNext()) {
count += values.next();
}
emit (key, count);
}
It's exactly the same as the word count example, where, in your case, you have three words (categories): bad, normal, good.
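To make the data flow concrete, here is a minimal Python sketch of the same map and reduce steps, run in memory rather than on Hadoop. The function names (categorize, map_record, reduce_counts) are made up for illustration, the thresholds are the ones from the pseudocode above, and the records are assumed to use the "|" separator shown in the question's sample.

```python
from collections import defaultdict

def categorize(balance):
    """Map a balance to a category, using the thresholds from the pseudocode above."""
    if balance <= 0:
        return "bad"
    if balance <= 500:
        return "normal"
    return "good"

def map_record(line):
    """Mapper: emit (category, 1) for one input record."""
    name, shop, balance = [field.strip() for field in line.split("|")]
    yield categorize(int(balance)), 1

def reduce_counts(pairs):
    """Reducer (also usable as a combiner): sum the counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

records = ["tom | shop a | -500", "john | shop b | 200", "jane | shop c | 5000"]
pairs = [pair for line in records for pair in map_record(line)]
counts = reduce_counts(pairs)
print(counts)
```

The grouping decision lives entirely in the mapper; the reducer only adds up ones, which is why it can double as a combiner.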


Getting non overlapping between two dates with Carbon

UseCase: Admin assigns tasks to People. Before we assign them we can see their tasks in a gantt chart. According to the task assign date and deadline, conflict days (overlap days) are generated between tasks.
I wrote the function below to get the number of overlapping days between two date ranges, but now I need the number of non-overlapping days between them.
$tasks = Assign_review_tasks::where('assigned_to', $employee)
->where('is_active', \Constants::$REVIEW_ACTIVE)
->whereNotNull('permit_id')->get();
$obj['task'] = count($tasks);
// count($tasks));
if (count($tasks) > 0) {
if (count($tasks) > 1) {
$start_one = $tasks[count($tasks) - 1]->start_date;
$end_one = $tasks[count($tasks) - 1]->end_date;
$end_two = $tasks[count($tasks) - 2]->end_date;
$start_two = $tasks[count($tasks) - 2]->start_date;
if ($start_one <= $end_two && $end_one >= $start_two) { //If the dates overlap
$obj['day'] = Carbon::parse(min($end_one, $end_two))->diff(Carbon::parse(max($start_two, $start_one)))->days + 1; //return how many days overlap
} else {
$obj['day'] = 0;
}
// $arr[] = $obj;
} else {
$obj['day'] = 0;
}
} else {
$obj['day'] = 0;
}
$arr[] = $obj;
start_date and end_date are taken from the database.
I tried modifying it to:
(Carbon::parse((min($end_one, $end_two))->add(Carbon::parse(max($start_two, $start_one))))->days)->diff(Carbon::parse(min($end_one, $end_two))->diff(Carbon::parse(max($start_two, $start_one)))->days + 1);
But it didn't work. In simple terms, this is what I want:
Non-conflicting days = (end1 - start1 + end2 - start2) - current overlapping days
I'm having trouble translating this expression into code. Could you help me? Thanks in advance.
Before trying to reimplement complex stuff, I recommend you take a look at enhanced-period for Carbon:
composer require cmixin/enhanced-period
The CarbonPeriod::diff macro method is what I think you're looking for:
use Carbon\CarbonPeriod;
use Cmixin\EnhancedPeriod;
CarbonPeriod::mixin(EnhancedPeriod::class);
$a = CarbonPeriod::create('2018-01-01', '2018-01-31');
$b = CarbonPeriod::create('2018-02-10', '2018-02-20');
$c = CarbonPeriod::create('2018-02-11', '2018-03-31');
$current = CarbonPeriod::create('2018-01-20', '2018-03-15');
foreach ($current->diff($a, $b, $c) as $period) {
foreach ($period as $day) {
echo $day . "\n";
}
}
This will output all the days that are in $current but not in any of the other periods (i.e. the non-conflicting days).
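For the two-task case in the question, the arithmetic can also be sketched with plain date sets. This is a Python illustration, not Carbon code; the dates are made up, and "non-overlapping" is assumed here to mean days that belong to exactly one of the two tasks, which may differ from the formula in the question.

```python
from datetime import date, timedelta

def days_in(start, end):
    """Return the set of calendar days from start to end, both inclusive."""
    return {start + timedelta(days=i) for i in range((end - start).days + 1)}

# Hypothetical task ranges, chosen only for illustration.
task_one = days_in(date(2018, 1, 1), date(2018, 1, 10))
task_two = days_in(date(2018, 1, 8), date(2018, 1, 15))

overlap = task_one & task_two          # days claimed by both tasks
non_overlapping = task_one ^ task_two  # days claimed by exactly one task

print(len(overlap), len(non_overlapping))
```

With inclusive day counts, the size of non_overlapping equals len(task_one) + len(task_two) - 2 * len(overlap), because each shared day is counted once in each task.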

Is there a better way to improve my algorithm for finding drawdowns in the stock market?

I am trying to calculate the drawdowns of every stock.
The definition of a drawdown is:
A drawdown is a peak-to-trough decline during a specific period for an investment, trading account, or fund.
Put simply, a drawdown is how far a stock falls from a peak to a trough.
In addition, a drawdown is only recorded once the price has later recovered to the peak's price.
To calculate drawdowns, I break the problem into two parts:
find a peak (a price greater than the prices on both adjacent days)
and a trough (a price lower than the prices on both adjacent days).
When the peak's price has been recovered, that peak and trough become a drawdown.
Here is an example of stock quotation:
data class Quote(val price: Int, val date: String)
...
//example of quote
Quote(price = 1, date = "20080102"),
Quote(price = 2, date = "20080103"),
Quote(price = 3, date = "20080104"),
Quote(price = 1, date = "20080107"),
Quote(price = 2, date = "20080108"),
Quote(price = 3, date = "20080109"),
Quote(price = 2, date = "20080110"),
Quote(price = 4, date = "20080111"),
Quote(price = 5, date = "20080114"),
Quote(price = 6, date = "20080115"),
Quote(price = 7, date = "20080116"),
Quote(price = 8, date = "20080117"),
Quote(price = 9, date = "20080118"),
Quote(price = 7, date = "20080122"),
Quote(price = 6, date = "20080123"),
Quote(price = 8, date = "20080124"),
Quote(price = 11, date = "20080125"),
Expected list of drawdowns by date:
(peak: "20080104", trough:"20080107", daysTakenToRecover: 3),
(peak: "20080109", trough:"20080110", daysTakenToRecover: 2),
(peak: "20080118", trough:"20080123", daysTakenToRecover: 4),
Here is what I wrote for a test case:
class Drawdown {
var peak: Quote? = null
var trough: Quote? = null
var recovered: Quote? = null
var percentage: Double? = null
var daysToRecover: String? = null
}
data class Quote(
val price: Double,
val date: String
)
class Test {
private fun findDrawdowns(): List<Drawdown> {
val list = mutableListOf<Drawdown>()
var peak: Quote? = null
var trough: Quote? = null
var recovered: Quote? = null
for (quotation in quotations) {
val currentIdx = quotations.indexOf(quotation)
if (currentIdx in 1 until quotations.size - 1) {
val prevClosing = quotations[currentIdx - 1].price
val nextClosing = quotations[currentIdx + 1].price
val closing = quotation.price
recovered = when {
peak == null -> null
closing >= peak.price -> {
if (peak.date != quotation.date) {
//can possibly be new peak
Quote(closing, quotation.date)
} else null
}
else -> null
}
peak = if (closing > prevClosing && closing > nextClosing) {
if ((peak == null || peak.price < closing) && recovered == null) {
Quote(closing, quotation.date)
} else peak
} else peak
trough = if (closing < prevClosing && closing < nextClosing) {
if (trough == null || trough.price > closing) {
Quote(closing, quotation.date)
} else trough
} else trough
if (recovered != null) {
val drawdown = Drawdown()
val percentage = (peak!!.price - trough!!.price) / peak.price
drawdown.peak = peak
drawdown.trough = trough
drawdown.recovered = recovered
drawdown.percentage = percentage
drawdown.daysToRecover =
ChronoUnit.DAYS.between(
LocalDate.of(
peak.date.substring(0, 4).toInt(),
peak.date.substring(4, 6).toInt(),
peak.date.substring(6, 8).toInt()
),
LocalDate.of(
recovered.date.substring(0, 4).toInt(),
recovered.date.substring(4, 6).toInt(),
recovered.date.substring(6, 8).toInt()
).plusDays(1)
).toString()
list += drawdown
peak = if (closing > prevClosing && closing > nextClosing) {
Quote(recovered.price, recovered.date)
} else {
null
}
trough = null
recovered = null
}
}
}
val drawdown = Drawdown()
val percentage = (peak!!.price - trough!!.price) / peak.price
drawdown.peak = peak
drawdown.trough = trough
drawdown.recovered = recovered
drawdown.percentage = percentage
list += drawdown
return list
}
} // closes class Test
For those who want to read my code on GitHub, here is a gist:
Find Drawdown in Kotlin, Click Me!!!
I ran some test cases and they show no errors.
So far, I believe this runs in O(n) time, but I want to make it more efficient.
How can I improve it? Any comments and thoughts are welcome!
Thank you and happy early new year.
There are two points.
Unfortunately, the current complexity is O(N^2):
for (quotation in quotations) {
val currentIdx = quotations.indexOf(quotation)
....
You loop through all the quotations, and for each quotation you look up its index. Finding the index is O(N) (see the indexOf docs), so the total complexity is O(N^2).
But you can easily fix it to be O(N). Just replace the for loop + indexOf with forEachIndexed, for example:
quotations.forEachIndexed { index, quote ->
// TODO
}
I think it's not possible to make it faster than O(N), because you need to check each quotation.
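The same single-pass idea, sketched in Python with enumerate so the index arrives with each element instead of being searched for. The function name local_extrema is made up for illustration; it only finds the peak and trough positions using the "greater/lower than both neighbours" rule from the question, not the full drawdown bookkeeping. The prices are the ones from the question's example.

```python
def local_extrema(prices):
    """Return (peak_indices, trough_indices) in one O(N) pass."""
    peaks, troughs = [], []
    for i, price in enumerate(prices):
        # The first and last quotes have only one neighbour, so skip them.
        if i == 0 or i == len(prices) - 1:
            continue
        prev_price, next_price = prices[i - 1], prices[i + 1]
        if price > prev_price and price > next_price:
            peaks.append(i)
        elif price < prev_price and price < next_price:
            troughs.append(i)
    return peaks, troughs

prices = [1, 2, 3, 1, 2, 3, 2, 4, 5, 6, 7, 8, 9, 7, 6, 8, 11]
peaks, troughs = local_extrema(prices)
print(peaks, troughs)
```

Carrying the index along also avoids a subtle issue with index lookups: a search returns the first matching element, so two equal quotes would resolve to the same position.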

Sort table in Lua as multiple groups

I need to sort a list _rolls so that both the users' rolls and their ranks are taken into consideration.
_rolls = {
{Username="User1", Roll=50, RankPrio=1},
{Username="User2", Roll=2, RankPrio=3},
{Username="User4", Roll=10, RankPrio=2},
{Username="User5", Roll=9, RankPrio=2},
{Username="User3", Roll=32, RankPrio=2}
}
I want the list to be sorted like
_rolls = {
{Username="User2", Roll=2, RankPrio=3},
{Username="User3", Roll=32, RankPrio=2},
{Username="User4", Roll=10, RankPrio=2},
{Username="User5", Roll=9, RankPrio=2},
{Username="User1", Roll=50, RankPrio=1}
}
I know I can use this to sort by Roll, but I can't see a way to do both.
table.sort(_rolls, function(a,b) return a.Roll < b.Roll end)
You just need to write the comparison function so that it compares the Roll fields when the RankPrio fields compare equal:
_rolls = {
{Username="User1", Roll=50, RankPrio=1},
{Username="User2", Roll=2, RankPrio=3},
{Username="User4", Roll=10, RankPrio=2},
{Username="User5", Roll=9, RankPrio=2},
{Username="User3", Roll=32, RankPrio=2}
}
table.sort(_rolls,
function (a, b)
if a.RankPrio == b.RankPrio then
return b.Roll < a.Roll
else return b.RankPrio < a.RankPrio
end
end)
> table.inspect(_rolls)
1 =
RankPrio = 3
Username = User2
Roll = 2
2 =
RankPrio = 2
Username = User3
Roll = 32
3 =
RankPrio = 2
Username = User4
Roll = 10
4 =
RankPrio = 2
Username = User5
Roll = 9
5 =
RankPrio = 1
Username = User1
Roll = 50
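For comparison, the same "primary key, then tie-breaker" ordering can be sketched in Python with a tuple sort key. This is only an illustration of the idea, not Lua; negating both fields is an assumption that works because they are numeric and both are sorted in descending order.

```python
rolls = [
    {"Username": "User1", "Roll": 50, "RankPrio": 1},
    {"Username": "User2", "Roll": 2, "RankPrio": 3},
    {"Username": "User4", "Roll": 10, "RankPrio": 2},
    {"Username": "User5", "Roll": 9, "RankPrio": 2},
    {"Username": "User3", "Roll": 32, "RankPrio": 2},
]

# Tuples compare element by element, so Roll is only consulted
# when two entries have the same RankPrio.
ranked = sorted(rolls, key=lambda r: (-r["RankPrio"], -r["Roll"]))
order = [r["Username"] for r in ranked]
print(order)
```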

AS3: Sort Two Fields

PLEASE NEED HELP
This is what I'm doing:
var my_array_W:Array = new Array();
my_array_W.push({cor:Acorrect, tem:AnewTime, tab: "TB_A", nom:Aoseasnnombre});
my_array_W.push({cor:Bcorrect, tem:BnewTime, tab: "TB_B", nom:Boseasnnombre});
my_array_W.push({cor:Ccorrect, tem:CnewTime, tab: "TB_C", nom:Coseasnnombre});
my_array_W.push({cor:Dcorrect, tem:DnewTime, tab: "TB_D", nom:Doseasnnombre});
my_array_W.push({cor:Ecorrect, tem:EnewTime, tab: "TB_E", nom:Eoseasnnombre});
my_array_W.push({cor:Fcorrect, tem:FnewTime, tab: "TB_F", nom:Foseasnnombre});
This outputs:
[tab] | [cor] | [tem]
TB_A 3 8.6877651541
TB_B 4 12.9287651344
TB_C 1 6199.334999999999923
TB_D 4 33.6526718521
TB_E 4 31.90468496844
TB_F 1 6.334999999923
So then I sort:
my_array_W.sortOn("tem", Array.NUMERIC);
my_array_W.sortOn("cor", Array.NUMERIC | Array.DESCENDING);
And I am getting this T_T:
[tab] | [cor] | [tem]
TB_E 4 31.90468496844
TB_D 4 33.6526718521
TB_B 4 12.9287651344
TB_A 3 8.6877651541
TB_F 1 31.90468496844
TB_C 1 6199.334999999999923
I just want to sort a winner table by time (lowest first) and correct answers (highest first),
so the winner is the one who gets the most correct answers in the least time.
I tried really hard to get a sort like this:
[tab] | [cor] | [tem]
TB_B 4 12.9287651344
TB_E 4 31.90468496844
TB_D 4 33.6526718521
TB_A 3 8.6877651541
TB_F 1 6.334999999923
TB_C 1 6199.334999999999923
But I couldn't achieve it.
Your mistake is that you sort twice. The second sort does not refine the result of the first; it just sorts the whole Array anew. What you need is to use the Array.sort(...) method with a compareFunction argument:
my_array_W.sort(sortItems);
// Should return -1 if A < B, 0 if A == B, or 1 if A > B.
function sortItems(A:Object, B:Object):Number
{
// First, the main criteria.
if (A.cor > B.cor) return -1;
if (A.cor < B.cor) return 1;
// If A.cor == B.cor, then secondary criteria.
if (A.tem < B.tem) return -1;
if (A.tem > B.tem) return 1;
// Items seem to be equal.
return 0;
}
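The compare function above can be mirrored in Python to check it against the data from the question. This is a sketch for illustration only: compare_items is a made-up name, and functools.cmp_to_key is the standard adapter for using an old-style comparison function with sorted.

```python
from functools import cmp_to_key

def compare_items(a, b):
    """Negative if a ranks before b, positive if after, 0 if tied."""
    # Main criterion: more correct answers first.
    if a["cor"] != b["cor"]:
        return -1 if a["cor"] > b["cor"] else 1
    # Secondary criterion: less time first.
    if a["tem"] != b["tem"]:
        return -1 if a["tem"] < b["tem"] else 1
    return 0

rows = [
    {"tab": "TB_A", "cor": 3, "tem": 8.6877651541},
    {"tab": "TB_B", "cor": 4, "tem": 12.9287651344},
    {"tab": "TB_C", "cor": 1, "tem": 6199.334999999999923},
    {"tab": "TB_D", "cor": 4, "tem": 33.6526718521},
    {"tab": "TB_E", "cor": 4, "tem": 31.90468496844},
    {"tab": "TB_F", "cor": 1, "tem": 6.334999999923},
]

ranked = sorted(rows, key=cmp_to_key(compare_items))
order = [r["tab"] for r in ranked]
print(order)
```

The resulting order matches the table the question was hoping for: the three entries with four correct answers come first, ordered by time.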
@Organis was so close.
But I finally did the trick :D
with this line:
my_array_W.sortOn(['cor', 'tem'],[ Array.NUMERIC | Array.DESCENDING, Array.NUMERIC ]);
I get the result I was looking for
Thanks
In your case you have to write a custom sorter function. To do that, check my example.
Your first data:
var arr:Array = [];
arr.push({cor:4,tem:13});
arr.push({cor:3,tem:12});
arr.push({cor:2,tem:1});
arr.push({cor:3,tem:16});
arr.push({cor:1,tem:11});
The sorting function and the sort result for sample one, based on tem:
arr.sort(temSorter);
function temSorter(a:Object, b:Object):int
{
if(a.tem<b.tem)
return 1;// a goes after b
else if(a.tem>b.tem)
return -1;// a goes before b
return 0;// a and b are the same
}
The result is:
[
{
"cor": 3,
"tem": 16
},
{
"cor": 4,
"tem": 13
},
{
"cor": 3,
"tem": 12
},
{
"cor": 1,
"tem": 11
},
{
"cor": 2,
"tem": 1
}
]
Now a sample based on something close to what you need:
arr.sort(winnerSorter);
function userScoreCalculator(a:Object):Number
{
return a.cor/a.tem;
}
function winnerSorter(a:Object, b:Object):int
{
var aScore:Number = userScoreCalculator(a);
var bScore:Number = userScoreCalculator(b);
if(aScore<bScore)
return 1;
else if(aScore>bScore)
return -1;
return 0;
}
And the result is:
[
{
"cor": 2,
"tem": 1
},
{
"cor": 4,
"tem": 13
},
{
"cor": 3,
"tem": 12
},
{
"cor": 3,
"tem": 16
},
{
"cor": 1,
"tem": 11
}
]
That means the person with a score of 2 is the winner, because he made it in only 1 second. The other players are close together in the tem parameter, so the next winner is the person with the highest score. This comes from the userScoreCalculator() output: the higher the output of that function, the higher the rank.
Now take your time and change the userScoreCalculator() function to show the winner you want.
https://help.adobe.com/en_US/ActionScript/3.0_ProgrammingAS3/WS5b3ccc516d4fbf351e63e3d118a9b90204-7fa4.html

Perl to Ruby conversion (multidimensional arrays)

I'm just trying to get my head around the creation of a multidimensional array in a Perl script I'm currently converting to Ruby. I have zero experience in Perl; I opened my first Perl script this morning.
Here is the original loop:
my $tl = {};
for my $zoom ($zoommin..$zoommax) {
my $txmin = lon2tilex($lonmin, $zoom);
my $txmax = lon2tilex($lonmax, $zoom);
# Note that y=0 is near lat=+85.0511 and y=max is near
# lat=-85.0511, so lat2tiley is monotonically decreasing.
my $tymin = lat2tiley($latmax, $zoom);
my $tymax = lat2tiley($latmin, $zoom);
my $ntx = $txmax - $txmin + 1;
my $nty = $tymax - $tymin + 1;
printf "Schedule %d (%d x %d) tiles for zoom level %d for download ...\n",
$ntx*$nty, $ntx, $nty, $zoom
unless $opt{quiet};
$tl->{$zoom} = [];
for my $tx ($txmin..$txmax) {
for my $ty ($tymin..$tymax) {
push @{$tl->{$zoom}},
{ xyz => [ $tx, $ty, $zoom ] };
}
}
}
And what I have so far in Ruby:
tl = []
for zoom in zoommin..zoommax
txmin = cm.tiles.xtile(lonmin,zoom)
txmax = cm.tiles.xtile(lonmax,zoom)
tymin = cm.tiles.ytile(latmax,zoom)
tymax = cm.tiles.ytile(latmin,zoom)
ntx = txmax - txmin + 1
nty = tymax - tymin + 1
tl[zoom] = []
for tx in txmin..txmax
for ty in tymin..tymax
tl[zoom] << xyz = [tx,ty,zoom]
puts tl
end
end
end
The part I'm unsure of is nested at the core of the loops: push @{$tl->{$zoom}}, { xyz => [ $tx, $ty, $zoom ] };
I'm sure this will be very simple for a seasoned Perl programmer, thanks!
The Perl code is building up a complex data structure in $tl -- hash, array, hash, array:
$tl->{$zoom}[i]{xyz}[j] = $tx # j = 0
$tl->{$zoom}[i]{xyz}[j] = $ty # j = 1
$tl->{$zoom}[i]{xyz}[j] = $zoom # j = 2
So I think the key line in your Ruby code should be like this:
tl[zoom] << { 'xyz' => [tx,ty,zoom] }
Note also that the root item ($tl) refers to a hash in the Perl code, while your Ruby code initializes it to be an array. That difference might cause problems for you, depending on the values that $zoom takes.
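The shape of that structure (hash, array, hash, array) can be sketched in Python, where a dict plays the role of the Perl hash. This is an illustration only: build_tile_list and demo_bounds are made-up names, and the tile bounds are invented stand-ins for the real lon2tilex / lat2tiley results.

```python
def build_tile_list(zoom_levels, tile_bounds):
    """Build {zoom: [{"xyz": [tx, ty, zoom]}, ...]}, mirroring the Perl loops."""
    tl = {}  # a dict keyed by zoom, like the Perl hash reference $tl
    for zoom in zoom_levels:
        (txmin, txmax), (tymin, tymax) = tile_bounds(zoom)
        tl[zoom] = [
            {"xyz": [tx, ty, zoom]}
            for tx in range(txmin, txmax + 1)
            for ty in range(tymin, tymax + 1)
        ]
    return tl

def demo_bounds(zoom):
    """Hypothetical tile bounds, used only to make the sketch runnable."""
    return (0, 1), (0, zoom)

tl = build_tile_list(range(2, 4), demo_bounds)
print(tl[2][0], len(tl[2]), len(tl[3]))
```

Keying the outer container by zoom, rather than indexing into a list, means zoom levels that start above zero do not leave empty leading slots.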
