Querying a specific cell with known column and row - Ruby

I am working with Ruby (not Rails) and PostgreSQL and have been banging my head for hours trying to figure out how to assign the value of a field when you know the column and the row you are trying to cross-reference.
I have the following: a database containing cities and their linked distances, similar to:
Cities   city1   city2   city3   city4
city1      0      17      13       6
city2     17       0       7      15
city3      .       .       .
city4      .       .       .
and I have tried playing around with the following code:
array.each { |city|   # array contains the sorted cities
  from_city = city
  query.each { |row|
    # row is a hash containing each city as key and distance as value
    # first row: {"Cities"=>"city1", "city1"=>"0", "city2"=>"17", ...}
    # I have tried doing a row.each and grabbing the value of the
    # specified city key, but that doesn't work.
  }
}
Is there a better way to go about doing this? Basically, all I need to know is how to pull the distance value when I know which two cities I want to use, and assign it to the variable distance.

I'd change your database schema (SQL databases aren't really designed to work by column access, and adding a new column every time you add a city is really painful to do):
origin | destination | distance
city1  | city2       | 17
... more rows ...
That way looking up the distance between 2 cities is just:
conn.exec_params('SELECT distance FROM cities WHERE origin = $1 AND destination = $2', [city1, city2])
which returns a single row with a single column: the distance between the two cities.
Alternatively, if your data set is small and doesn't change much, there's nothing wrong with storing the data as a file and loading it into memory once at startup. It depends on your use case.
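For illustration, here is a minimal sketch of that in-memory idea (written in Python here; the same hash-keyed-by-city-pair shape carries over to Ruby). The file name distances.csv and its origin,destination,distance layout are assumptions:

import csv

def load_distances(path="distances.csv"):
    """Load an assumed 'origin,destination,distance' CSV into a lookup table."""
    distances = {}
    with open(path, newline="") as f:
        for origin, destination, distance in csv.reader(f):
            distances[(origin, destination)] = float(distance)
            distances[(destination, origin)] = float(distance)  # distances are symmetric
    return distances

# Load once at startup, then look distances up by city pair.
distances = load_distances()
distance = distances[("city1", "city2")]  # => 17.0 with the sample matrix above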
Also, if you're going to do lots of geometry computations, PostGIS might be what you want.

Related

How to restrict query result from multiple instances of overlapping date ranges in Django ORM

First off, I admit that I am not sure whether what I am trying to achieve is possible (or even logical). Still, I am putting forth this query (and if nothing else, I can at least be told that I need to redesign my table structure / business logic).
In a table (myValueTable) I have the following records:
Item | article | from_date  | to_date    | myStock
1    | Paper   | 01/04/2021 | 31/12/9999 | 100
2    | Tray    | 12/04/2021 | 31/12/9999 | 12
3    | Paper   | 28/04/2021 | 31/12/9999 | 150
4    | Paper   | 06/05/2021 | 31/12/9999 | 130
As part of the underlying process, I need to find out the value of the field myStock as of a particular date, say 30/04/2021 (assuming no inward / outward stock movement in the interim).
To that end, I have the following values:
varRefDate = 30/04/2021
varArticle = "Paper"
And my query goes something like this:
get_value = myValueTable.objects.filter(from_date__lte=varRefDate, to_date__gte=varRefDate).get(article=varArticle).myStock
which should translate to:
get_value = SELECT myStock FROM myValueTable WHERE varRefDate BETWEEN from_date AND to_date
But with this I am coming up with more than one result (actually THREE!).
How do I restrict the query result to get ONLY the 3rd instance, i.e. the one with the value "150" (for article = "Paper")?
NOTE: The upper limit of date range (to_date) is being kept constant at 31/12/9999.
Edit
Solved it, in a roundabout manner. Instead of .get, I resorted to generating a values_list with the fields from_date and myStock. Using the count of objects returned, I appended to a list the date difference between from_date and the reference date (30/04/2021) together with the value of the field myStock, then sorted the list in ascending order. The first tuple in the sorted list has the smallest date difference and the corresponding myStock value, and that is the value I am searching for. Tested and works.
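For reference, here is a minimal sketch of that approach (the import path myapp.models is a placeholder; the model and field names follow the question):

from datetime import date
from myapp.models import myValueTable  # placeholder import path

varRefDate = date(2021, 4, 30)
varArticle = "Paper"

rows = (myValueTable.objects
        .filter(article=varArticle,
                from_date__lte=varRefDate,
                to_date__gte=varRefDate)
        .values_list("from_date", "myStock"))

# Pair each row with its distance from the reference date; the smallest
# difference belongs to the most recent from_date on or before varRefDate.
candidates = sorted((varRefDate - from_date, stock) for from_date, stock in rows)
get_value = candidates[0][1] if candidates else None  # 150 for the sample data

Equivalently, ordering the same filter by from_date descending and taking the first row with .order_by('-from_date').first() should return the same record.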

How to calculate the average of each key's values from 1 million structs?

I have 1 million of the following struct:
type person struct {
    age int
    // ... some more attributes like name, surname, etc.
}
My goal at the end is to know, for each person's name, the average of their ages. A person may occur multiple times or just once. I read the structs one by one; they are given in random order and I can't sort them.
Example with only the name and age attributes, written as key-value data:
Josh: 34
Abigail: 6
Aaron: 43
Josh: 4
Frederich: 22
...
Aaron: 3
...
So when I read a record, e.g. Aaron, I don't know how many times he occurs in the given stock of data; maybe that's the only time I see him. At the end I need to know the average age of each person, not necessarily in any order.
My idea was the following:
I used a key-value structure like map[name] = (average, howMany). When I read a record, I calculated the new average from the new record's age and incremented howMany. Quite straightforward, I'd say.
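For illustration, here is that bookkeeping sketched in Python (the question's code is Go, where the same shape would be a map from name to a (sum, count) pair); the records iterable is a placeholder for however the data is streamed in:

def average_ages(records):
    totals = {}  # name -> (sum of ages, how many records seen)
    for name, age in records:
        s, n = totals.get(name, (0, 0))
        totals[name] = (s + age, n + 1)
    # One small entry per distinct name, no matter how many records were streamed.
    return {name: s / n for name, (s, n) in totals.items()}

print(average_ages([("Josh", 34), ("Abigail", 6), ("Aaron", 43),
                    ("Josh", 4), ("Aaron", 3)]))
# {'Josh': 19.0, 'Abigail': 6.0, 'Aaron': 23.0}

Keeping (sum, count) and dividing only at the end avoids the rounding drift of repeatedly updating a stored average.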
I can't keep 1 million structs like this in my RAM.
I'd appreciate any suggestion and any grammar correction.

Referencing from a table with mixed cells of different categories

I'm trying to program a Google Sheet for comparing and analyzing logistics costs.
I have the following:
A sheet with a database of numbers, organized like this:
A second sheet with a table in which, using the MIN function, I get the price of the cheapest provider for each model, depending on quantity and destination.
And last, in another sheet, I have what I call "The interface". Using an INDEX MATCH MATCH formula, I let the user choose destination and quantity for each one of the models available, and it returns the cheapest price. (I can't post more images, so basically it has this structure):
MODEL A
DESTINATION: DESTINATION 2
NUM. OBJ: 2
PRICE: 59
PROVIDER:
My problem is that I can't figure out how to make it return the name of the provider with the cheapest price, as I'm referencing from the second table, where the same row or column contains prices that belong to different providers.
Using MIN is undesirable in this context because it doesn't tell you where the minimal value was found, and you need that information.
Here is a formula that returns the minimal cost together with the provider. In my example, the data is in the range A1:E7, as below; destination is in G1 and model is in G2.
=iferror(array_constrain(sort({filter(A1:A7, B1:B7=G2), filter(filter(C1:E7, B1:B7=G2), C1:E1=G1)}, 2, True), 1, 2), "Not found")
The same with linebreaks for readability:
=iferror(
array_constrain(
sort(
{
filter(A1:A7, B1:B7 = G2),
filter(filter(C1:E7, B1:B7 = G2), C1:E1 = G1)
},
2, True),
1, 2),
"Not found")
Explanation:
filtering by B1:B7 = G2 means keeping only the rows with the desired model
filtering by C1:E1 = G1 means keeping only the column with desired destination
{ , } means putting the two filtered parts side by side: column A (the provider) and the column for the desired destination
sort by the 2nd column (price), in ascending order (True)
array_constrain keeps only the first row of this sort, that is, the one with the lowest price
iferror is there in case there is no such destination or model in the table; in that case the formula returns "Not found"
Example: with G1 = Destination 1 and G2 = A, the formula returns
Provider 2 2

RDLC - calculate sum of columns and display total in a single cell conditionally

Here is the scenario: I have a dataset with the fields Category, Country and NUM_SCHOOLS.
I created a column group to populate the country names as columns and a row group for the category rows. In my current report the column headers (Country) Country1, Country2, and so on are displayed, and the row headers (Category) A, B, C and D are displayed. The value cell is [Sum(Fields!NUM_SCHOOLS.Value)]. Everything is displayed correctly.
I used the pipe (|) symbol as the separator between cells since I am not allowed to post images; I tried my best to explain. Please let me know if you need any more information.
Current Report:
Country1 Country2
A 10 | 12
B 5 | 6
C 5 | 7
D 11 | 15
Required report:
Country1 Country2
A 10 | 12
B 5 | 6
C 5 | 7
D 26
Only for category D do I want to add the numbers and display them as a single value (11+15=26); for the other categories the values should stay in their separate country buckets.
Please help me out. Thanks in Advance!
To sum a Quantity column across more than one dataset in RDLC, you can add the dataset-scoped sums:
=Sum(Fields!QUANTITY.Value, "Tone")
 + Sum(Fields!QUANTITY.Value, "Buffalo")
 + Sum(Fields!QUANTITY.Value, "Cow")
Sorry to be the bearer of bad news, but I don't think that you can merge columns across column groups.
I think that the best option is to remove your column grouping and manually add in 7 columns for your receipt frequencies. You'd have to use a Sum with an Iif to get your values correctly, for instance in the far left column, something like:
=Sum(IIf(Fields!RECIEPT_FREQUENCY.Value="ANNUAL", Fields!val.Value, 0))
then you could add a merged cell underneath and add the following expression
=Sum(IIf(Fields!PART_COUNT.Value="D", Fields!val.Value, 0), "DataSetName")
Alternatively, you could leave it as it is and enter the following expression in a total row at the bottom of your matrix, but you would have to do something expression-based for the cell borders to give the illusion of it being merged.
=Sum(IIf(Fields!PART_COUNT.Value="D"
         And Fields!RECIEPT_FREQUENCY.Value="BI-WEEKLY",
         Fields!val.Value, 0), "DataSetName")

R - Sorting and Sub-setting Maximum Values within Columns

I am trying to iteratively sort data within columns to extract N maximum values.
My data is set up with the first and second columns containing occupation titles and codes, and all of the rest of the columns containing comparative values (in this case location quotients that had to be previously calculated for each city) for those occupations for various cities:
occ_code   city1   ...   city300
occ1          5    ...       7
occ2         20    ...      22
...
occ800       20    ...      25
For each city I want to sort by the values and select a subset of the maximum values, matched with their respective occupation titles and codes. I thought it would be relatively trivial but...
Edit for clarification: I want to end up with a sorted subset of the data for analysis, something like:
occ_code city1
occ200 10
occ90 8
occ20 2
occ95 1.5
At the same time I want to be able to repeat the sort column-wise (I've tried lots of order commands, calling columns directly, e.g. data[,2]), just so I can run the same analysis functions over the entire dataset.
I've been messing with plyr for the past 3 days and I feel like the setup of my dataset is just not conducive to how plyr was meant to be used.
I'm not exactly sure what your desired output is according to your example snippet. Here's how you could get a data frame like that for every city using plyr and reshape:
# using the same df from nico's answer
library(reshape)
library(plyr)

df.m <- melt(df, id = 1)                     # long format: Code, variable (city), value
a.cities <- cast(df.m, Code ~ . | variable)  # one cast data frame per city
# for each city, sort descending and keep the 4 largest values
a.cities.max <- aaply(a.cities, 1, function(x) arrange(x, desc(`(all)`))[1:4, ])
Now, a.cities.max is an array of data frames, with the 4 largest values for each city in each data frame. To get one of these data frames, you can index it with
a.cities.max$X13
I don't know exactly what you'll be doing with this data, but you might want it back in data frame format.
df.cities.max <- adply(a.cities.max, 1)
One way would be to use order with ddply from the package plyr
> library(plyr)
> d<-data.frame(occu=rep(letters[1:5],2),city=rep(c('A','B'),each=5),val=1:10)
> ddply(d,.(city),function(x) x[order(x$val,decreasing=TRUE)[1:3],])
order can sort on multiple columns if you want that.
This will output the max for each city. Similar results can be obtained using sort or order.
# Generate some fake data
codes <- paste("Code", 1:100, sep="")
values <- matrix(0, ncol=20, nrow=100)
for (i in 1:20)
  values[,i] <- sample(0:100, 100, replace=T)
df <- data.frame(codes, values)
names(df) <- c("Code", paste("City", 1:20, sep=""))
# Now for each city we get the maximum
maxval <- apply(df[2:21], 2, which.max)
# Output the max for each city
print(cbind(paste("City", 1:20), codes[maxval]))
