I would like to have ActiveRecord objects that span several days to run tests against.
For example, I want x number of Posts over the last x days so that I can calculate the average rating for posts created yesterday, and know the calculation didn't include posts from previous days.
Would I have to seed the database to do this or am I writing bad tests? :)
Take the following simplified example, where your posts table has a rating column and a created_at column.
You can 'seed'/set up your test data to mimic this, without needing to actually seed your development database.
let!(:post_one) { Post.create(rating: 5, created_at: Time.current) }
let!(:post_two) { Post.create(rating: 3, created_at: 1.day.ago) }

it "calculates the average correctly" do
  # Only post_two was created yesterday, so the average for yesterday should be 3.
  expect(Post.where(created_at: Date.yesterday.all_day).average(:rating)).to eq(3)
end
Ideally, make use of FactoryGirl rather than plain Post.create. You can then switch that out for FactoryGirl.create(:post) { etc... }
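A minimal sketch of what that could look like (the :post factory name and its default attributes are assumptions, not from the question):

# Hypothetical factory -- adjust the attributes to match your schema.
FactoryGirl.define do
  factory :post do
    rating { 5 }
    created_at { Time.current }
  end
end

# Usage: override attributes per example.
post_two = FactoryGirl.create(:post, rating: 3, created_at: 1.day.ago)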
I'm writing an app to run an office pool on NFL games, so I have a table where I store the game results; it basically has the date and time of the game, the visiting and home teams, and the scores. Now I want to write a model attribute to get a team's record (W-L-D), so first I have to get how many games the team has played. I'm trying this:
$games = Game::whereNotNull('homescore')
    ->whereNotNull('awayscore')
    ->where('home','PIT')
    ->orWhere('away','PIT')
    ->get();
So this should give me all the games played by PIT as away or home, and only the games where a score is present, right?
Well... it's not working. If I take away both ->whereNotNull() calls, I correctly get the 17 games PIT will play in a season. But if I use them, I get 10 games. Not at random: it always gets me the same 10 games, and PIT is involved in all of them.
This is one of the records I'm getting:
App\Models\Juego {#4788
id: 4010,
season: 2022,
week: 4,
valid: "2022-10-02 12:00:00",
away: "NYJ",
awayscore: null,
home: "PIT",
homescore: null,
dif: null,
created_at: "2023-02-05 22:24:09",
updated_at: "2023-02-06 13:18:26",
},
I even tried breaking up the query, doing something like:
$games = Game::where('home','PIT')->orWhere('away','PIT')
(This gives the 17 correct results.) Then, over that result, I apply $games->whereNotNull('homescore') (and the same for awayscore), and it still gives me the 10 results. (The query should be giving me 4 results, the games where there is a final score.)
It has become a mess because of the combination of whereNotNull, where, and orWhere.
The orWhere messes up your whole SQL query logic and does not work the way you expect.
Just put the where/orWhere in a closure and it should be fine:
Game::whereNotNull('homescore')
    ->whereNotNull('awayscore')
    ->where(function ($query) {
        $query->where('home', 'PIT')
              ->orWhere('away', 'PIT');
    })
    ->get();
Just a bit of explanation: this is the raw SQL of your query. Everything is fine until the OR is added, because AND binds more tightly than OR:
WHERE homescore IS NOT NULL
AND awayscore IS NOT NULL
AND home = 'PIT'
OR away = 'PIT'
With the closure it becomes like this (which is probably what you want):
WHERE homescore IS NOT NULL
AND awayscore IS NOT NULL
AND (home = 'PIT' OR away = 'PIT')
I have several records in my database; the table has a column named "weekday" where I store a weekday like "mon" or "fri". When a user searches from the frontend, the parameters posted to the server are startDay and endDay.
Now I would like to retrieve all records between startDay and endDay. We can assume startDay is "mon" and endDay is "sun". I do not currently know how to do this.
Create another table with the names of the days and their corresponding numbers. Then you'd just need to join your current table with the days table by name, and use the numbers in that table to do your queries.
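A rough sketch of that idea (the table and column names here are assumptions):

-- Hypothetical lookup table mapping day names to numbers.
CREATE TABLE days (name VARCHAR(3) PRIMARY KEY, num TINYINT NOT NULL);
INSERT INTO days VALUES ('mon',1),('tue',2),('wed',3),('thu',4),('fri',5),('sat',6),('sun',7);

-- Join and filter on the numeric range (here mon..sun, i.e. 1..7).
SELECT t.*
FROM my_table t
JOIN days d ON d.name = t.weekday
WHERE d.num BETWEEN 1 AND 7;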
Not exactly practical, but it is possible to convert sun, mon, tue to numbers using MySQL.
Set up a static year and week number like 201610 for the 10th week of this year, then use a combination of DATE_FORMAT with STR_TO_DATE:
DATE_FORMAT(STR_TO_DATE('201610 sun', '%X%V %a'), '%w')
DATE_FORMAT(STR_TO_DATE('201610 mon', '%X%V %a'), '%w')
DATE_FORMAT(STR_TO_DATE('201610 tue', '%X%V %a'), '%w')
These 3 statements will evaluate to 0,1,2 respectively.
The main thing this does is convert the %a format (Sun-Sat) to the %w format (0-6).
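For instance, a query along these lines could apply that conversion to the weekday column directly (my_table and the 1-to-5 range are assumptions for illustration):

-- Convert the stored day name to 0-6 on the fly and filter on it.
SELECT *
FROM my_table
WHERE DATE_FORMAT(STR_TO_DATE(CONCAT('201610 ', weekday), '%X%V %a'), '%w')
      BETWEEN 1 AND 5;  -- e.g. mon (1) through fri (5)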
Well, I don't know the architecture of your application, and I think storing and querying a weekday string is not ideal, but I can tell you a workaround.
Make a helper function which returns an array of the weekdays in the range, e.g.:
function getWeekDaysArray($startWeekDay, $endWeekDay) {
    $days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']; // week in order
    $start = array_search($startWeekDay, $days);
    $end = array_search($endWeekDay, $days); // assumes $startWeekDay comes first
    return array_slice($days, $start, $end - $start + 1);
}
$daysRangeArray = getWeekDaysArray('mon', 'wed');
Now, with this array, you can query the table:
DB::table('TableName')->whereIn('week_day', $daysRangeArray)->get();
Hope this helps.
I am not quite sure how to title this question, and I am having a hard time getting this to work, so here goes.
I have a hash of users, which can be anywhere from 2 to 40 entries. I also have a hash of tickets which I want to search through, finding any entries that do not contain the user_id of any user in my users hash. I am not quite sure how to accomplish this. My last attempt used this:
@not_found = []
users.each do |u|
  @not_found += @tickets.select { |t| t["user_id"] != u.user_id }
end
I know this is not the right result, as it compares against only one user_id at a time. What I need to do is run through all of the tickets and pull out any that contain a user_id that is not in the users hash.
I hope I am explaining this properly and appreciate any help!
Try this
known_user_ids = users.map(&:user_id)
tickets_with_unknown_users = @tickets.reject { |t| known_user_ids.include?(t["user_id"]) }
I am new to Hadoop and all its derivatives, and I am really getting intimidated by the abundance of information available.
But one thing I have realized is that to start implementing/using Hadoop or distributed code, one basically has to change the way one thinks about a problem.
I was wondering if someone can help me with the following.
So, basically (like anyone else) I have raw data. I want to parse it, extract some information, then run some algorithm and save the results.
Let's say I have a text file "foo.txt" where the data is like:
id,$value,garbage_field,time_string\n
1, 200, grrrr,2012:12:2:13:00:00
2, 12.22,jlfa,2012:12:4:15:00:00
1, 2, ajf, 2012:12:22:13:56:00
As you can see, the id can be repeated. Think of each row as how much money a customer (the id) has spent.
What I want to do is save the result in a file which contains how much money each customer has spent in the "morning", "afternoon", "evening", and "night".
(You can define your own time buckets for what counts as morning and so on.)
For example, the output here would probably be:
1, 0, 202, 0, 0
where 1 is the id, the first 0 means $0 spent in the morning, 202 in the afternoon, and 0 in the evening and at night.
Now I have Python code for it, but I have to implement this in Pig to get started.
If anyone can write it out or guide me through this, that's all I need to get started.
Thanks
I'd start like this:
foo = LOAD 'foo.txt' USING PigStorage(',') AS (
    CUSTOMER_ID:int,
    DOLLARS_SPENT:float,
    GARBAGE_FIELD,
    TIME_STRING:chararray
);
foo_with_timeslots = FOREACH foo {
    GENERATE
        CUSTOMER_ID,
        DOLLARS_SPENT,
        /* DO TIME SLOT CALCULATION HERE */ AS TIME_SLOT
    ;
}
I don't have much knowledge of date/time values in pig, so I'll leave how to do conversion from time string to timeslot, to you.
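For what it's worth, here is one possible version of that conversion. This is a sketch only: it assumes the time string is always yyyy:MM:dd:HH:mm:ss, that the ToDate/GetHour built-ins are available (Pig 0.11+), and the bucket boundaries are picked arbitrarily.

-- Hedged sketch: bucket the hour of day into four assumed time slots.
foo_with_timeslots = FOREACH foo GENERATE
    CUSTOMER_ID,
    DOLLARS_SPENT,
    (GetHour(ToDate(TIME_STRING, 'yyyy:MM:dd:HH:mm:ss')) <  6 ? 'night'
     : (GetHour(ToDate(TIME_STRING, 'yyyy:MM:dd:HH:mm:ss')) < 12 ? 'morning'
     : (GetHour(ToDate(TIME_STRING, 'yyyy:MM:dd:HH:mm:ss')) < 18 ? 'afternoon'
     : 'evening'))) AS TIME_SLOT;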
id_grouped_foo_with_timeslots = GROUP foo_with_timeslots BY (
    CUSTOMER_ID,
    TIME_SLOT
);
-- Calculate how much each customer spent at time slots
spent_per_customer_per_timeslot = FOREACH id_grouped_foo_with_timeslots {
    GENERATE
        group.CUSTOMER_ID AS CUSTOMER_ID,
        group.TIME_SLOT AS TIME_SLOT,
        SUM(foo_with_timeslots.DOLLARS_SPENT) AS TOTAL_SPENT
    ;
}
You'll have an output like below in spent_per_customer_per_timeslot
1,Morning,200
1,Evening,100
2,Afternoon,30
At this point it should be trivial to re-group the data and put it in the shape you want.
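In case it helps, one hedged sketch of that final re-grouping (the four bucket names are assumptions carried over from the sketch above; SUM over an empty bag yields null, so it is coerced to 0 here):

by_customer = GROUP spent_per_customer_per_timeslot BY CUSTOMER_ID;
pivoted = FOREACH by_customer {
    -- Split each customer's rows by slot, then sum each slot into its own column.
    m = FILTER spent_per_customer_per_timeslot BY TIME_SLOT == 'morning';
    a = FILTER spent_per_customer_per_timeslot BY TIME_SLOT == 'afternoon';
    e = FILTER spent_per_customer_per_timeslot BY TIME_SLOT == 'evening';
    n = FILTER spent_per_customer_per_timeslot BY TIME_SLOT == 'night';
    GENERATE group AS CUSTOMER_ID,
        (SUM(m.TOTAL_SPENT) is null ? 0.0 : SUM(m.TOTAL_SPENT)) AS MORNING_SPENT,
        (SUM(a.TOTAL_SPENT) is null ? 0.0 : SUM(a.TOTAL_SPENT)) AS AFTERNOON_SPENT,
        (SUM(e.TOTAL_SPENT) is null ? 0.0 : SUM(e.TOTAL_SPENT)) AS EVENING_SPENT,
        (SUM(n.TOTAL_SPENT) is null ? 0.0 : SUM(n.TOTAL_SPENT)) AS NIGHT_SPENT;
}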
It's said that using skip() for pagination in a MongoDB collection with many records is slow and not recommended.
Ranged pagination (based on an _id comparison with $gt) could be used instead:
db.items.find({_id: {$gt: ObjectId('4f4a3ba2751e88780b000000')}});
It's good for displaying prev. & next buttons - but it's not very easy to implement when you want to display actual page numbers 1 ... 5 6 7 ... 124 - you need to pre-calculate from which "_id" each page starts.
So I have two questions:
1) When should I start worrying about this? How many records are "too many", with a noticeable slowdown for skip()? 1,000? 1,000,000?
2) What is the best approach to show links with actual page numbers when using ranged pagination?
Good question!
"How many is too many?" - that, of course, depends on your data size and performance requirements. I, personally, feel uncomfortable when I skip more than 500-1000 records.
The actual answer depends on your requirements. Here's what modern sites do (or, at least, some of them).
First, the navbar looks like this:
1 2 3 ... 457
They get the final page number from the total record count and the page size. Let's jump to page 3. That will involve some skipping from the first record. When the results arrive, you know the id of the first record on page 3.
1 2 3 4 5 ... 457
Let's skip some more and go to page 5.
1 ... 3 4 5 6 7 ... 457
You get the idea. At each point you see first, last and current pages, and also two pages forward and backward from the current page.
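The final page number itself is a simple calculation; in the mongo shell it might look like this (page_size being an assumed constant):

var page_size = 20;  // assumed page size
var total_pages = Math.ceil(db.collection.count() / page_size);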
Queries
var current_id; // id of first record on current page.
// go to page current+N
db.collection.find({_id: {$gte: current_id}}).
skip(N * page_size).
limit(page_size).
sort({_id: 1});
// go to page current-N
// note that due to the nature of skipping back,
// this query will get you records in reverse order
// (last records on the page being first in the resultset)
// You should reverse them in the app.
db.collection.find({_id: {$lt: current_id}}).
skip((N-1)*page_size).
limit(page_size).
sort({_id: -1});
It's hard to give a general answer because it depends a lot on what query (or queries) you are using to construct the set of results that are being displayed. If the results can be found using only the index and are presented in index order then db.dataset.find().limit().skip() can perform well even with a large number of skips. This is likely the easiest approach to code up. But even in that case, if you can cache page numbers and tie them to index values you can make it faster for the second and third person that wants to view page 71, for example.
In a very dynamic dataset where documents will be added and removed while someone else is paging through data, such caching will become out-of-date quickly and the limit and skip method may be the only one reliable enough to give good results.
I recently encountered the same problem when trying to paginate a request while using a field that wasn't unique, for example "FirstName". The idea of this query is to implement pagination on a non-unique field without using skip().
The main problem here is querying on a field that is not unique ("FirstName"), because the following will happen:
$gt: {"FirstName": "Carlos"} -> this will skip all the records where first name is "Carlos"
$gte: {"FirstName": "Carlos"} -> will always return the same set of data
Therefore, the solution I came up with was to make the $match portion of the query unique by combining the targeted search field with a secondary field.
Ascending order:
db.customers.aggregate([
{$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$gt: 'Carlos'}}]}},
{$sort: {'FirstName': 1, '_id': 1}},
{$limit: 10}
])
Descending order:
db.customers.aggregate([
{$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$lt: 'Carlos'}}]}},
{$sort: {'FirstName': -1, '_id': 1}},
{$limit: 10}
])
The $match part of this query is basically behaving as an if statement:
if firstName is "Carlos" then it needs to also be greater than this id
if firstName is not equal to "Carlos" then it needs to be greater than "Carlos"
The only problem is that you cannot navigate to a specific page number (though it can probably be done with some code manipulation), but other than that it solved my problem with pagination on non-unique fields, without having to use skip, which eats a lot of memory and processing power as you get toward the end of whatever dataset you are querying.