Is it possible to find the maximum value of 2 or more columns in a table? - jdbc

For example, I have the following table:
id math science english history
1 80 90 90 90
2 70 60 81 78
3 69 50 45 80
4 30 40 10 80
I only want to find the maximum values in the math and science columns.
Is that possible?

Simply use this (it returns the maximum of each column separately):
select max(science),max(math) from your_table
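To see what the query returns, here is a minimal sketch that runs it against the example table using Python's built-in sqlite3 module as a stand-in for the JDBC connection (the table name grades is hypothetical). It also covers the other reading of the question, a single maximum across both columns: SQLite's two-argument scalar MAX(a, b) plays the role that GREATEST plays in MySQL, PostgreSQL and Oracle.

```python
import sqlite3

# In-memory database standing in for the real one; "grades" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (id INTEGER, math INTEGER, science INTEGER, "
    "english INTEGER, history INTEGER)"
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?)",
    [(1, 80, 90, 90, 90), (2, 70, 60, 81, 78), (3, 69, 50, 45, 80), (4, 30, 40, 10, 80)],
)

# One aggregate MAX per column: the maximum of each column separately.
max_science, max_math = conn.execute(
    "SELECT MAX(science), MAX(math) FROM grades"
).fetchone()

# One maximum over both columns: the outer two-argument MAX is SQLite's scalar
# function (GREATEST elsewhere), applied to the two per-column aggregates.
(max_both,) = conn.execute(
    "SELECT MAX(MAX(math), MAX(science)) FROM grades"
).fetchone()

print(max_science, max_math, max_both)  # 90 80 90
conn.close()
```

Over JDBC the SQL strings are the same; only the driver code around them changes.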

Related

How to subset rows from one dataframe based on matching values from a second smaller data frame in R

I want to select a control group from one data frame by matching ages against a second data frame. For example, I have:
subject.df
id age
1 1 55
2 2 62
3 3 73
4 4 54
5 5 66
I'd like to subset control.df by matching each age in subject.df directly, on a 1:1 basis (one control per subject).
control.df
id age
6 6 66
7 7 71
8 8 80
9 9 51
10 10 55
11 11 56
12 12 77
13 13 62
14 14 64
15 15 73
16 16 67
17 17 54
18 18 75
19 19 77
20 20 78
21 21 53
22 22 64
23 23 83
24 24 61
25 25 77
I'm fairly new to R. In the past I've used MATLAB, where I would use a for loop to iterate over control.df, but I've been told that R doesn't always handle for loops well and that they can be computationally expensive.
Eventually I'll be doing this on a much larger data set, where the subject group is around 250 and the control group is more than 40K, so I know that 1:1 matching is possible.
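No answer is shown here, but the underlying idea (exact 1:1 matching on age, never reusing a control) can be sketched without a nested loop by indexing the controls by age once. This is a language-neutral illustration in Python, not R code; the helper name match_one_to_one is hypothetical, and it assumes each subject should get the first unused control with exactly the same age.

```python
from collections import defaultdict, deque

def match_one_to_one(subjects, controls):
    """Hypothetical helper: exact 1:1 age matching without replacement.

    subjects, controls: lists of (id, age) tuples.
    Returns {subject_id: control_id}; a subject with no unused
    same-age control is left out.
    """
    # Index the control ids by age in a single pass over the (large) control table.
    by_age = defaultdict(deque)
    for control_id, age in controls:
        by_age[age].append(control_id)

    matches = {}
    for subject_id, age in subjects:
        pool = by_age.get(age)
        if pool:  # at least one unused control of this age remains
            matches[subject_id] = pool.popleft()  # removed, so it cannot be reused
    return matches

subjects = [(1, 55), (2, 62), (3, 73), (4, 54), (5, 66)]
controls = [
    (6, 66), (7, 71), (8, 80), (9, 51), (10, 55), (11, 56), (12, 77),
    (13, 62), (14, 64), (15, 73), (16, 67), (17, 54), (18, 75), (19, 77),
    (20, 78), (21, 53), (22, 64), (23, 83), (24, 61), (25, 77),
]
print(match_one_to_one(subjects, controls))  # {1: 10, 2: 13, 3: 15, 4: 17, 5: 6}
```

Because each table is scanned once, the cost grows roughly linearly with the number of rows, which matters for 250 subjects against 40K controls.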

Query table and group results into ranges in Laravel

I have a database table that stores survey scores between 1 and 100. I'm trying to present a frequency distribution on the app's front end by grouping the scores into the following ranges:
Less than 20
20-30
30-40
40-50
50-60
60-70
70-80
80-90
90-100
So if the table held the scores 87, 92, 95 and 98, the user would see:
80 - 90 (1)
90 - 100 (3)
and so on. I think collections are the way to go, but I don't know where to start to get this sort of output, or whether it's even possible in Laravel.
Yes, it's possible. I believe this is the SQL query you need (assuming your table is named "scores" and the field is "score"):
select (case when score between 0 and 20 then 'Less than 20'
when score between 21 and 30 then 'Between 21 and 30'
when score between 31 and 40 then 'Between 31 and 40'
when score between 41 and 50 then 'Between 41 and 50'
when score between 51 and 60 then 'Between 51 and 60'
when score between 61 and 70 then 'Between 61 and 70'
when score between 71 and 80 then 'Between 71 and 80'
when score between 81 and 90 then 'Between 81 and 90'
when score between 91 and 100 then 'Between 91 and 100'
end) as score_range, count(*) as count
from scores
group by score_range
order by min(score);
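To check the query before wiring it into the app, here is a minimal sketch that runs it unchanged against an in-memory SQLite table holding the four example scores, using Python's sqlite3 module as a stand-in for the real database. SQLite, like MySQL, accepts the alias score_range in GROUP BY; not every database does.

```python
import sqlite3

# The query from the answer, copied as-is (only the trailing semicolon dropped).
FREQUENCY_SQL = """
select (case when score between 0 and 20 then 'Less than 20'
when score between 21 and 30 then 'Between 21 and 30'
when score between 31 and 40 then 'Between 31 and 40'
when score between 41 and 50 then 'Between 41 and 50'
when score between 51 and 60 then 'Between 51 and 60'
when score between 61 and 70 then 'Between 61 and 70'
when score between 71 and 80 then 'Between 71 and 80'
when score between 81 and 90 then 'Between 81 and 90'
when score between 91 and 100 then 'Between 91 and 100'
end) as score_range, count(*) as count
from scores
group by score_range
order by min(score)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(s,) for s in (87, 92, 95, 98)])

rows = conn.execute(FREQUENCY_SQL).fetchall()
print(rows)  # [('Between 81 and 90', 1), ('Between 91 and 100', 3)]
conn.close()
```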
So for Laravel it could work like this:
$frequency = DB::select("SELECT (CASE
WHEN score BETWEEN 0 AND 20 THEN 'Less than 20'
WHEN score BETWEEN 21 AND 30 THEN '20-30'
WHEN score BETWEEN 31 AND 40 THEN '30-40'
WHEN score BETWEEN 41 AND 50 THEN '40-50'
WHEN score BETWEEN 51 AND 60 THEN '50-60'
WHEN score BETWEEN 61 AND 70 THEN '60-70'
WHEN score BETWEEN 71 AND 80 THEN '70-80'
WHEN score BETWEEN 81 AND 90 THEN '80-90'
WHEN score BETWEEN 91 AND 100 THEN '90-100'
END) AS score_range, COUNT(*) as count
FROM scores
GROUP BY score_range
ORDER BY MIN(score);");
You can edit the range labels as you like.
In this query, "40-50" (for example) means that the score is between 41 and 50 inclusive; likewise, "Less than 20" actually covers 0 to 20. You can also replace "ORDER BY MIN(score)" with "ORDER BY count" if you prefer to sort by frequency.
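If you would rather do the grouping in application code, as the question suggests with collections, the same bucketing can be done in memory after fetching the raw scores. Here is a minimal sketch in Python, standing in for a Laravel collection pipeline (the helper name range_label is hypothetical). It follows the query's convention: "Less than 20" covers up to 20, and "40-50" covers 41 to 50.

```python
from collections import Counter

def range_label(score):
    """Hypothetical helper: map an integer score (0-100) to the query's labels."""
    if score <= 20:
        return "Less than 20"
    lower = (score - 1) // 10 * 10  # 21-30 -> 20, 31-40 -> 30, ..., 91-100 -> 90
    return f"{lower}-{lower + 10}"

scores = [87, 92, 95, 98]

# Counting over the sorted scores keeps the labels in range order,
# like ORDER BY MIN(score) in the SQL version.
frequency = Counter(range_label(s) for s in sorted(scores))

for label, count in frequency.items():
    print(f"{label} ({count})")
# 80-90 (1)
# 90-100 (3)
```

Grouping in the database is usually cheaper for large tables, since only the counts travel back to the app.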

Looking for a clever way to sort a set of data

I have a set of 80 students and I need to sort them into 20 groups of 4.
I have their previous exam scores from a prerequisite module, and I want each group's average score to be as close as possible to the overall average of those scores.
Sorry if that isn't particularly clear.
Here's a snapshot of the problem:
Student Score
AA 50
AB 45
AC 80
AD 70
AE 45
AF 55
AG 65
AH 90
So the average of the scores here is 62.5. How would I best go about sorting these eight students into two groups of four such that, for both groups, the average of their exam scores is as close as possible to 62.5?
My problem is exactly this but with 80 data points (20 groups) rather than 8 (2 groups).
The more I think about this problem the harder it seems.
Does anyone have any ideas?
Thanks
One Possible Solution:
I would try a greedy algorithm that starts by pairing each student with the other student who brings the pair's average closest to your target average. After this initial pairing, you can form groups of four by pairing up the pairs using the same approach.
After the first round, this approach compares the average of two pair averages with the target mean to build each group. That is valid here because every pair has the same number of students, so the average of the two pair averages equals the average of all four scores.
However, this will not necessarily give you the optimal solution; it is a heuristic. One noted example is when one low score must be offset by three high scores to reach the target mean: groupings like that will not be found by this technique. That said, if you know your scores have a roughly normal distribution centered on the target mean, I think this approach should give a decent approximation.
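Here is a minimal sketch of that greedy idea in Python. It assumes the scores are in a dict and the group size is a power of two, so pairing twice gives groups of 4. The function name greedy_pairs is hypothetical, and ties are broken by taking the first candidate.

```python
def greedy_pairs(units, target):
    """Pair each unit with the remaining unit whose combined mean is closest to target.

    units: list of (members_tuple, mean). All units must be the same size,
    so the mean of two units is simply the average of their means.
    """
    if len(units) % 2:
        raise ValueError("need an even number of units")
    remaining = list(units)
    paired = []
    while remaining:
        members, mean = remaining.pop(0)
        # Index of the partner that brings the combined mean closest to the target.
        best = min(
            range(len(remaining)),
            key=lambda j: abs((mean + remaining[j][1]) / 2 - target),
        )
        other_members, other_mean = remaining.pop(best)
        paired.append((members + other_members, (mean + other_mean) / 2))
    return paired

scores = {"AA": 50, "AB": 45, "AC": 80, "AD": 70, "AE": 45, "AF": 55, "AG": 65, "AH": 90}
target = sum(scores.values()) / len(scores)  # 62.5

students = [((name,), float(score)) for name, score in scores.items()]
pairs = greedy_pairs(students, target)  # groups of 2
groups = greedy_pairs(pairs, target)    # groups of 4
for members, mean in groups:
    print(members, mean)
# ('AA', 'AC', 'AF', 'AG') 62.5
# ('AB', 'AD', 'AE', 'AH') 62.5
```

On this small example the heuristic happens to hit the target exactly; with 80 students it gives no such guarantee.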
First, sort the group by score, so it becomes:
AH 90
AC 80
.....
AB 45
AE 45
Then start combining the first with the last:
(AE, AH, 67.5)
(AB, AC, 62.5)
(AD, AA, 60)
(AG, AF, 60)
And so on. For groups of four, you then combine them two by two: the first two with the last two.
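A minimal sketch of the sort-and-combine idea for groups of four, under one reading of "the first two with the last two": each group takes the next two highest and the next two lowest scores, which also appears to be how the groups in the results table further down were formed. The function name top_bottom_groups is hypothetical. On the eight-student example it gives averages of 65 and 60, which illustrates why this heuristic is not optimal.

```python
def top_bottom_groups(scores, size=4):
    """Sort by score; each group takes the next size//2 highest and next size//2 lowest."""
    if len(scores) % size:
        raise ValueError("number of students must be a multiple of the group size")
    # Highest first; ties keep their original order (sorting is stable).
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    half = size // 2
    groups = []
    while ranked:
        top, ranked = ranked[:half], ranked[half:]       # next highest scores
        bottom, ranked = ranked[-half:], ranked[:-half]  # next lowest scores
        members = top + bottom
        mean = sum(score for _, score in members) / len(members)
        groups.append(([name for name, _ in members], mean))
    return groups

scores = {"AA": 50, "AB": 45, "AC": 80, "AD": 70, "AE": 45, "AF": 55, "AG": 65, "AH": 90}
for names, mean in top_bottom_groups(scores):
    print(names, mean)
# ['AH', 'AC', 'AB', 'AE'] 65.0
# ['AD', 'AG', 'AF', 'AA'] 60.0
```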
Another way:
1. Find all the possible groups of 4 students.
2. Then, for every combination of groups, compute each group's absolute deviation from the average score and sum these deviations over the combination.
3. Choose the combination of groups with the lowest sum.
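The three steps above can be sketched in Python as an exhaustive search. This version adds branch-and-bound pruning, which is not part of the answer: a partial grouping is abandoned as soon as its cost reaches the best complete one found so far. The function name best_partition is hypothetical. The number of possible groupings grows combinatorially, so this is only practical for small inputs like the eight-student example, not for 80 students.

```python
from itertools import combinations

def best_partition(scores, size=4):
    """Search groupings into groups of `size`, minimising the summed absolute
    deviation of each group's mean from the overall mean (branch-and-bound)."""
    target = sum(scores.values()) / len(scores)
    best_cost, best_groups = float("inf"), None

    def search(remaining, groups, cost):
        nonlocal best_cost, best_groups
        if cost >= best_cost:  # cannot beat the best complete grouping
            return
        if not remaining:
            best_cost, best_groups = cost, list(groups)
            return
        # Always put the first remaining student in the next group,
        # so each grouping is generated only once.
        first, rest = remaining[0], remaining[1:]
        for others in combinations(rest, size - 1):
            group = (first, *others)
            mean = sum(scores[name] for name in group) / size
            left = [name for name in rest if name not in others]
            groups.append(group)
            search(left, groups, cost + abs(mean - target))
            groups.pop()

    search(sorted(scores), [], 0.0)
    return best_cost, best_groups

scores = {"AA": 50, "AB": 45, "AC": 80, "AD": 70, "AE": 45, "AF": 55, "AG": 65, "AH": 90}
cost, groups = best_partition(scores)
print(cost, groups)  # 0.0 [('AA', 'AB', 'AG', 'AH'), ('AC', 'AD', 'AE', 'AF')]
```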
Initially, I did think about the top-bottom match option.
However, as John has highlighted, the results certainly aren't optimal:
Scores Students Avg.
40 94 40 94 'AE' 'DA' 'AI' 'AR' 67
40 90 40 88 'AK' 'CI' 'AM' 'BP' 64.5
40 85 40 80 'AQ' 'AW' 'AT' 'BD' 61.25
40 79 40 77 'AU' 'BC' 'AV' 'AB' 59
40 76 40 75 'AX' 'CG' 'AZ' 'CQ' 57.75
40 75 40 75 'BF' 'CB' 'BN' 'BQ' 57.5
40 75 40 74 'BR' 'BI' 'CF' 'CZ' 57.25
40 74 40 74 'CK' 'CO' 'CP' 'AL' 57
40 72 41 71 'DB' 'CN' 'AG' 'BO' 56
41 71 42 70 'CD' 'BM' 'AH' 'BS' 56
42 70 42 69 'BG' 'BL' 'CU' 'CX' 55.75
43 68 44 67 'BK' 'CY' 'AD' 'CE' 55.5
44 64 44 64 'BJ' 'CR' 'BZ' 'BY' 54
45 64 45 63 'BW' 'BV' 'CS' 'BE' 54.25
45 62 47 60 'CV' 'CH' 'AC' 'CM' 53.5
47 59 47 58 'BT' 'AY' 'CL' 'AP' 52.75
47 57 48 57 'CT' 'BA' 'BX' 'AS' 52.25
48 56 49 56 'CA' 'AJ' 'AN' 'AA' 52.25
50 55 50 54 'BB' 'AF' 'CJ' 'AO' 52.25
51 52 51 52 'CC' 'BU' 'CW' 'BH' 51.5

Computing lag in Hive by a variable

My input table looks like:
guest_id days
101 79
101 70
101 68
101 61
102 101
102 90
102 55
103 99
103 90
Note that days are in descending order within each guest_id.
Desired output table:
guest_id days days_diff
101 79 0
101 70 9
101 68 2
101 61 7
102 101 0
102 90 11
102 55 35
103 99 0
103 90 9
days_diff is the first-order difference within each guest_id (not across the whole days column).
You also need a unique, sequential id column (otherwise Hive doesn't know the order of your rows).
Then you can self-join each row to the previous row (b.id = a.id - 1) to get your differences; a LEFT JOIN keeps the very first row, which has no previous row:
select a.guest_id,
a.days,
case when a.guest_id = b.guest_id then b.days - a.days else 0 end as days_diff
from
input a
left join input b on b.id = a.id - 1
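To check the self-join logic before running it on Hive, here is a minimal sketch using Python's sqlite3 module as a stand-in for Hive. It assumes a sequential id column numbered 1, 2, 3, ... in the row order shown in the question, and joins each row to the row whose id is one less (a LEFT JOIN, so the very first row still appears).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (id INTEGER, guest_id INTEGER, days INTEGER)")
data = [(101, 79), (101, 70), (101, 68), (101, 61),
        (102, 101), (102, 90), (102, 55), (103, 99), (103, 90)]
# Assumed sequential id giving the row order from the question.
conn.executemany(
    "INSERT INTO input VALUES (?, ?, ?)",
    [(i, guest, days) for i, (guest, days) in enumerate(data, start=1)],
)

result = conn.execute("""
    SELECT a.guest_id,
           a.days,
           CASE WHEN a.guest_id = b.guest_id THEN b.days - a.days ELSE 0 END AS days_diff
    FROM input a
    LEFT JOIN input b ON b.id = a.id - 1  -- b is the previous row
    ORDER BY a.id
""").fetchall()

for row in result:
    print(row)  # (101, 79, 0), (101, 70, 9), ... one tuple per line
conn.close()
```

When b belongs to a different guest, or is missing for the first row, the CASE falls through to 0.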
Edit: As Kunal pointed out in the comments, Hive does have a LAG window function, used with an OVER (PARTITION BY ... ORDER BY ...) clause. You still need something to order your rows by; for example, if you have a date column, you could use it like this (LAG returns the previous row's days, so subtract the current days and default to 0 for each guest's first row):
SELECT guest_id,
days,
COALESCE(LAG(days, 1) OVER (PARTITION BY guest_id ORDER BY date) - days, 0) AS days_diff
FROM input;
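The window-function version can be tried the same way. Note that LAG(days, ...) returns the previous row's days, not the difference, so the current value has to be subtracted and the missing first value replaced with 0. This minimal sketch uses Python's sqlite3 module (window functions need SQLite 3.25 or later) and, as an assumption in place of a date column, orders each partition by days DESC. That relies on days being strictly descending within each guest, as stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (guest_id INTEGER, days INTEGER)")
data = [(101, 79), (101, 70), (101, 68), (101, 61),
        (102, 101), (102, 90), (102, 55), (103, 99), (103, 90)]
conn.executemany("INSERT INTO input VALUES (?, ?)", data)

# ORDER BY days DESC stands in for a real ordering column (assumes no ties).
result = conn.execute("""
    SELECT guest_id,
           days,
           COALESCE(LAG(days) OVER (PARTITION BY guest_id ORDER BY days DESC) - days, 0)
               AS days_diff
    FROM input
    ORDER BY guest_id, days DESC
""").fetchall()

for row in result:
    print(row)
conn.close()
```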

vectorized indexing of matrices with other matrices (in octave)

Suppose we have a 2D (5x5) matrix:
test =
39 13 90 5 71
60 78 38 4 11
87 92 46 45 35
40 96 61 17 1
90 50 46 89 63
And a second 2D (5x2) matrix:
tidx =
1 3
2 4
2 3
2 4
4 5
And now we want to use tidx as an index into test, where each row of tidx lists column indices into the same row of test, so that we get the following output:
out =
39 90
78 4
92 46
96 17
89 63
One way to do this is with a for loop...
for i=1:size(test,1)
out(i,:) = test(i,tidx(i,:));
end
Question:
Is there a way to vectorize this so the same output is generated without a for loop?
Here is one way:
test(repmat([1:rows(test)]',1,columns(tidx)) + (tidx-1)*rows(test))
What you describe is an indexing problem. When you flatten a matrix into one dimension (Octave stores it column by column), you get:
test(:) =
39
60
87
40
90
13
78
92
96
50
90
38
46
61
46
5
4
45
17
89
71
11
35
1
63
Each element can then be addressed with a single linear index. Here is how to work out how to transform tidx into that form.
First, I used the listing above to find the linear indices we need, which are:
outinx =
1 11
7 17
8 13
9 19
20 25
Then I start trying to figure out the pattern. This calculation gives a clue:
(tidx-1)*rows(test) =
0 10
5 15
5 10
5 15
15 20
This will move the index count to the correct column of test. Now I just need the correct row.
outinx-(tidx-1)*rows(test) =
1 1
2 2
3 3
4 4
5 5
This is the row index that the for loop steps through. I created that matrix with:
[1:rows(test)]' * ones(1,columns(tidx))
EDIT: This does the same thing with a built-in function:
repmat([1:rows(test)]',1,columns(tidx))
I then add the two matrices together and use the result as the index into test.
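The same index arithmetic can be checked outside Octave. Here is a minimal sketch in plain Python lists (no NumPy) that flattens test column by column, as test(:) does, computes the 1-based linear index row + (col - 1) * rows(test) for every entry of tidx, and reads the values back. In Octave itself, the built-in sub2ind(size(test), rowidx, tidx) computes the same linear indices, where rowidx is the repmat matrix above.

```python
test = [
    [39, 13, 90, 5, 71],
    [60, 78, 38, 4, 11],
    [87, 92, 46, 45, 35],
    [40, 96, 61, 17, 1],
    [90, 50, 46, 89, 63],
]
tidx = [[1, 3], [2, 4], [2, 3], [2, 4], [4, 5]]

n_rows, n_cols = len(test), len(test[0])

# Column-major flattening, like Octave's test(:)
flat = [test[r][c] for c in range(n_cols) for r in range(n_rows)]

# 1-based linear index: row + (col - 1) * rows(test)
outinx = [[(r + 1) + (col - 1) * n_rows for col in cols] for r, cols in enumerate(tidx)]

# Subtract 1 to convert to Python's 0-based indexing before reading the values
out = [[flat[i - 1] for i in row] for row in outinx]

print(outinx)  # [[1, 11], [7, 17], [8, 13], [9, 19], [20, 25]]
print(out)     # [[39, 90], [78, 4], [92, 46], [96, 17], [89, 63]]
```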
