Is it possible to create a unique ID for articles in Rails?
For example, first article will get ID - aa-001,
second - aa-002
...
article #999 - aa-999,
article #1000 - ab-001 and so on?
Thanks in advance for your help!
The following method gives the next id in the sequence, given the one before:
def next_id(id, limit = 3, separator = '-')
  # Numeric part is all nines: bump the letter prefix and restart at 001.
  if id[/[0-9]+\z/] == ?9 * limit
    "#{id[/\A[a-z]+/i].next}#{separator}#{?0 * (limit - 1)}1"
  else
    id.next
  end
end
> next_id("aa-009")
=> "aa-010"
> next_id("aa-999")
=> "ab-001"
The limit parameter specifies the number of digits. You can use as many prefix characters as you want.
Which means you could use it like this in your application:
> Post.last.special_id
=> "bc-999"
next_id(Post.last.special_id)
=> "bd-001"
However, I'm not sure I'd advise doing it this way. Databases have smart mechanisms to avoid race conditions when ids are created concurrently; Postgres, for example, gives up gapless ids in exchange for that safety.
This approach has no such mechanism, so two articles created at the same moment could end up with the same id. If that is extremely unlikely, for instance because you are the only one writing articles, you could do it anyway. I'm not exactly sure what you want to use this for, but you might want to look into to_param.
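One way to sidestep the race, offered as a minimal sketch rather than a tested Rails recipe: let the database assign its usual auto-increment integer id and derive the aa-001 style label from that number. The helper name special_id_for and the two-letter prefix cap are my assumptions, not something from the question.

```ruby
# Hypothetical helper: maps a 1-based integer id to a label such as "aa-001".
# Assumes a two-letter prefix and three digits, so it covers ids 1..675_324.
def special_id_for(n, digits: 3, separator: '-')
  per_prefix = 10**digits - 1 # 999 labels per prefix (001..999)
  block, offset = (n - 1).divmod(per_prefix)
  raise ArgumentError, 'id out of range' if n < 1 || block >= 26 * 26

  first, second = block.divmod(26)
  prefix = ('a'.ord + first).chr + ('a'.ord + second).chr
  format("%s%s%0#{digits}d", prefix, separator, offset + 1)
end

puts special_id_for(1)     # aa-001
puts special_id_for(999)   # aa-999
puts special_id_for(1000)  # ab-001
```

In a model this could presumably be exposed as a method that calls special_id_for(id), so nothing extra is stored and concurrent inserts cannot collide on the label.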
You may want to look into the FriendlyId gem. There’s also a Railscast on this topic which covers a manual approach as well as the usage of FriendlyId.
I have three tables: fellows, mappingtabl, events, with a many_to_many association in the Fellow model to access the events for a fellow.
As usual, one fellow can attend multiple events, and many fellows make up an event.
I find it really hard to filter events without coding it myself, or I am not able to understand the Sequel documentation:
I want a list of future events for a specific fellow...
@fellow=Fellow[PersID: @pers_id, JourneyID: @journey_id]
@events=@fellow.events_dataset.where(Ruby: 'calc')
@events=@events.where{StartDate > Time.now()}
The thing is, the output is: uninitialized constant App::StartDate
StartDate is the corresponding column in the DB, and I understand the error from Ruby's perspective: I never told the block what StartDate is, and because it is capitalized Ruby treats it as a constant.
How do I tell Sequel to compare the time stored in the field events.StartDate with Time.now?
The direct selection one line above worked perfectly using a hash key/value pair...
The documentation is not very helpful, as it simply does what I did:
items.where{credit > debit}.sql
# "SELECT * FROM items WHERE (credit > debit)"
Nobody told the block what credit or debit are...
Any ideas how to access the field StartDate in events for a where clause? I can iterate on my own, but that feels a bit strange... It seems to make a difference whether the dataset is directly linked to a table or is just the result of an association.
You can use one of the following:
@events=@events.where{StartDate() > Time.now()}
@events=@events.where{self.StartDate > Time.now()}
@events=@events.where{|o| o.StartDate > Time.now()}
@events=@events.where(Sequel[:StartDate] > Time.now)
@events=@events.where{events[:StartDate] > Time.now()}
The key was hidden in Sequel's virtual row documentation...
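To see why the bare StartDate fails while StartDate() and self.StartDate work, here is a toy sketch in plain Ruby. TinyRow is a made-up stand-in and not Sequel's actual virtual row class; it only mimics the idea of answering unknown method calls. A capitalized bare name is resolved as a constant, so it never reaches method_missing.

```ruby
# Toy stand-in for a virtual row (NOT Sequel's real class): every unknown
# method call is turned into a string naming the column.
class TinyRow
  def method_missing(name, *_args)
    "column(#{name})"
  end
end

row = TinyRow.new

with_parens = row.instance_eval { StartDate() }    # parsed as a method call
with_self   = row.instance_eval { self.StartDate } # parsed as a method call
bare = begin
  row.instance_eval { StartDate }                   # parsed as a constant lookup
rescue NameError => e
  e.class
end

puts with_parens
puts with_self
puts bare
```

The first two lines print the column string, while the bare form raises NameError, which matches the "uninitialized constant" message in the question.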
I'm implementing Model.find_each as mentioned here:
http://edgeguides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects-in-batches
but I'm getting this error message:
Ruby ActiveRecord: DEPRECATION WARNING: Relation#find_in_batches with finder options is deprecated. Please build a scope and then call find_in_batches on it instead.
for this code:
Person.find_each(start: start_index, limit: limit) do |person|
I'm pretty much following the code given in the documentation so I'm a little puzzled. Is this code correct and, if not, what's the fix?
Replace limit: limit with batch_size: limit
EDIT
Since you need a limit and your batches are based on id, I think you can do something like:
Person.where("id < ?", (start_index + limit)).find_each(start: start_index) do |person|
I don't fully understand the issue, but I think you can just write this:
Person.offset(start_index).limit(limit).find_each(batch_size: NUMBER) do |person|
find_each is just a replacement for each. It should be the last call on a query, and its only job is to control how many records are loaded into memory at a time (batch_size).
For offsets and limits you should use the usual query API.
Important: find_each has nothing to do with find. It replaces each and is meant for looping over records, not for finding or filtering them.
Also, the guide states this clearly:
The find_each method accepts most of the options allowed by the
regular find method, except for :order and :limit, which are reserved
for internal use by find_each.
So you can't use :limit. In any case I think there is something wrong in the guide: you should use find_each for looping, not for searching.
Notice also that :start is meant for resuming an interrupted job from a saved primary key, not for skipping a number of rows, as the docs state:
By default, records are fetched in ascending order of the primary key,
which must be an integer. The :start option allows you to configure
the first ID of the sequence whenever the lowest ID is not the one you
need. This would be useful, for example, if you wanted to resume an
interrupted batch process, provided you saved the last processed ID as
a checkpoint.
You should not use it to replace the offset and limit methods from ActiveRecord.
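To make the difference between a starting id and an offset concrete, here is a plain-Ruby imitation of batched iteration. It is only a sketch of the idea in the quoted docs, not ActiveRecord's implementation, and each_in_batches is a name I made up.

```ruby
# Plain-Ruby imitation of batched iteration (not ActiveRecord itself):
# rows are walked in ascending id order, batch_size at a time, and
# start is treated as the first id to process, not as a row offset.
def each_in_batches(rows, start: nil, batch_size: 2)
  sorted = rows.sort_by { |row| row[:id] }
  sorted = sorted.select { |row| row[:id] >= start } if start
  sorted.each_slice(batch_size) do |batch|
    batch.each { |row| yield row }
  end
end

people = [1, 2, 5, 8, 9].map { |id| { id: id } }
seen = []
each_in_batches(people, start: 5) { |person| seen << person[:id] }
p seen
```

With ids 1, 2, 5, 8, 9 and start: 5, the loop visits 5, 8 and 9. An offset of 5 would have skipped every row instead.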
User.find(:all, :order => "RANDOM()", :limit => 10) was the way I did it in Rails 3.
User.all(:order => "RANDOM()", :limit => 10) is how I thought Rails 4 would do it, but this is still giving me a Deprecation warning:
DEPRECATION WARNING: Relation#all is deprecated. If you want to eager-load a relation, you can call #load (e.g. `Post.where(published: true).load`). If you want to get an array of records from a relation, you can call #to_a (e.g. `Post.where(published: true).to_a`).
You'll want to use the order and limit methods instead. You can get rid of the all.
For PostgreSQL and SQLite:
User.order("RANDOM()").limit(10)
Or for MySQL:
User.order("RAND()").limit(10)
As the random function differs between databases, I would recommend using the following code:
User.offset(rand(User.count)).first
Of course, this is useful only if you're looking for only one record.
If you want to get more than one, you could do something like:
User.offset(rand(User.count) - 10).limit(10)
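The - 10 is there to make sure you still get 10 records when rand returns a number greater than count - 10. Note that the offset goes negative whenever rand returns less than 10, so it should be clamped at zero.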
Keep in mind you'll always get 10 consecutive records.
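A small sketch of computing that offset so the window always fits and never goes negative. The helper name random_window_offset is hypothetical, and it only does the arithmetic; it does not touch the database.

```ruby
# Hypothetical helper: picks an offset so that a window of `size` rows
# always fits inside a table of `count` rows and never goes negative.
def random_window_offset(count, size, rng: Random.new)
  max_offset = [count - size, 0].max
  rng.rand(max_offset + 1) # integer in 0..max_offset
end

offset = random_window_offset(100, 10)
puts offset
```

With ActiveRecord this would presumably be used as User.offset(random_window_offset(User.count, 10)).limit(10). The records are still consecutive, as noted above.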
I think the best solution is really ordering randomly in database.
But if you need to avoid a database-specific random function, you can use the pluck and shuffle approach.
For one record:
User.find(User.pluck(:id).shuffle.first)
For more than one record:
User.where(id: User.pluck(:id).sample(10))
I would suggest making this a scope as you can then chain it:
class User < ActiveRecord::Base
scope :random, -> { order(Arel::Nodes::NamedFunction.new('RANDOM', [])) }
end
User.random.limit(10)
User.active.random.limit(10)
While not the fastest solution, I like the brevity of:
User.ids.sample(10)
The .ids method yields an array of User IDs and .sample(10) picks 10 random values from this array.
I strongly recommend this gem for random records. It is specifically designed for tables with a lot of rows:
https://github.com/haopingfan/quick_random_records
All other answers perform badly with a large database, except this gem:
quick_random_records took only 4.6ms in total.
The accepted answer, User.order('RAND()').limit(10), took 733.0ms.
The offset approach took 245.4ms in total.
The User.all.sample(10) approach took 573.4ms.
Note: my table only has 120,000 users. The more records you have, the larger the performance difference will be.
UPDATE:
Performance on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) took 1384.0ms
The quick_random_records gem took only 6.4ms in total
For MySQL this worked for me:
User.order("RAND()").limit(10)
You could call .sample on the records, like: User.all.sample(10)
The answer from @maurimiranda, User.offset(rand(User.count)).first, is not good when we need 10 random records, because User.offset(rand(User.count) - 10).limit(10) returns a sequence of 10 records starting at a random position. They are not fully random, so we would need to call the single-record version 10 times to get 10 truly random records.
Besides that, offset is also inefficient when the random function returns a high value. If your query looks like offset: 10000 and limit: 20, it generates 10,020 rows and throws away the first 10,000 of them, which is very expensive. So calling offset.limit 10 times is not efficient.
So I think that if we just want one random user, User.offset(rand(User.count)).first may be better (and we can at least improve it by caching User.count).
But if we want 10 or more random users, User.order("RAND()").limit(10) should be better.
Here's a quick solution. I'm currently using it with over 1.5 million records and getting decent performance. The best solution would be to cache one or more random record sets and refresh them with a background worker at a desired interval.
I created a random_records_helper.rb file:
module RandomRecordsHelper
  # Returns n random ids in 1..User.count.
  # Assumes ids are contiguous; the result may contain duplicates.
  def random_user_ids(n)
    user_count = User.count
    Array.new(n) { rand(1..user_count) }
  end
end
In the controller:
@users = User.where(id: random_user_ids(10))
This is much quicker than the .order("RANDOM()").limit(10) method - I went from a 13 sec load time down to 500ms.
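One caveat with the helper above is that rand can return the same id twice, in which case where(id: ...) comes back with fewer than n rows. A possible variant that keeps drawing until it has n distinct ids is sketched below; the name distinct_random_ids is hypothetical, and it still assumes ids run from 1 to the row count without gaps.

```ruby
require 'set'

# Hypothetical variant: collects n distinct ids in 1..max_id.
# Still assumes ids are contiguous; deleted rows would leave gaps.
def distinct_random_ids(n, max_id, rng: Random.new)
  raise ArgumentError, 'n exceeds max_id' if n > max_id

  ids = Set.new
  ids << rng.rand(1..max_id) while ids.size < n
  ids.to_a
end

p distinct_random_ids(10, 1_500_000)
```

In the controller this would presumably be called as distinct_random_ids(10, User.count).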
We have a post analysis requirement: for a specific post, we need to return a list of the posts most closely related to it, ranked by the number of tags they have in common. For example:
postA = {"author":"abc",
"title":"blah blah",
"tags":["japan","japanese style","england"],
}
There may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
So basically, postB scores 2 and postC scores 1 when compared against the tags in postA. postD scores 0 and will not be included in the result.
My current plan is to use map/reduce to produce the result. I understand the basic usage of map/reduce, but I can't figure out a solution for this specific purpose.
Any help? Or is there a better way, like a custom sorting function, to work it out? I'm currently using pymongo, as I'm a Python developer.
You should create an index on tags:
db.posts.create_index([('tags', 1)])
and search for posts that share at least one tag with postA:
posts = list(db.posts.find({'_id': {'$ne': postA['_id']}, 'tags': {'$in': postA['tags']}}))
and finally, sort by intersection in Python:
key = lambda post: len([tag for tag in post['tags'] if tag in postA['tags']])
posts.sort(key=key, reverse=True)
Note that if postA shares at least one tag with a large number of other posts this won't perform well, because you'll send so much data from Mongo to your application; unfortunately there's no way to sort and limit by the size of the intersection using Mongo itself.
I need a 6 character alphanumeric ID for use in my rails app, which will be presented to users of the system and must be unique among all the object instances in my system. I don't expect more than a few thousand object instances, so 6 characters is far more than I really need.
At this point I'm using the UUIDTools gem in my Rails app to generate a uuid. Which of the UUIDTools generation methods should I use, and which end of the resulting uuid should I take the 6 characters from, to guarantee uniqueness?
For example, if I generate ef1cf087-95c9-4868-bd95-cea950a52b58, would I want to use ef1cf0 from the front of it, or a52b58 from the back end?
... As a side note / question: am I going about this wrong? Is there a better way?
No way. A UUID is considered unique because it is very long, which makes it practically impossible to generate the same UUID twice. If you trim it to 6 chars, you dramatically increase the possibility of duplicates. You have to use either an incrementing id or the full UUID.
Only deterministic generation (id(x + 1) = id(x) + 1) can guarantee uniqueness. A UUID doesn't guarantee it, and 6 chars guarantee it even less.
Another option is to create an ID generation service. It would have a single method, getNewId, and would keep enough state to provide unique ids. (Simplest case: a counter.)
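For the simplest case, the counter, a minimal sketch could look like the following. The class name IdService is made up, get_new_id mirrors the getNewId method mentioned above, and the zero-padded base-36 rendering is just one way to get 6 alphanumeric characters. In a real app the counter would need to live somewhere durable, such as a database sequence, rather than in process memory.

```ruby
# Hypothetical in-process id service: a counter guarded by a mutex,
# rendered as a zero-padded base-36 string of 6 characters.
class IdService
  def initialize(start = 0)
    @counter = start
    @lock = Mutex.new
  end

  def get_new_id
    n = @lock.synchronize { @counter += 1 }
    raise 'id space exhausted' if n >= 36**6

    n.to_s(36).rjust(6, '0')
  end
end

service = IdService.new
puts service.get_new_id
puts service.get_new_id
```

Six base-36 characters give 36**6 distinct values, far more than the few thousand objects mentioned in the question.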
When you say that incrementing the ID isn't an option, is that because you don't want users to see the scheme you're using, or because the generation must be stateless (i.e., you can't keep track of all IDs you've generated)?
If it's the former, then you can generate an ID, check to see if you've already used it, and if so, generate another new ID. (Seems pretty obvious so sorry if I'm on the wrong track.) You could do something like this:
while id = rand(2**256).to_s(36)[0..5]
  break unless Ids.exists?(id)
end
where Ids.exists?(id) is the does-this-already-exist method.
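The same retry idea can be sketched in a self-contained form. Here a Set stands in for the Ids.exists? lookup, and fresh_short_id is a hypothetical name; SecureRandom.alphanumeric is used so the candidate always has exactly 6 characters.

```ruby
require 'securerandom'
require 'set'

# Sketch: `taken` stands in for whatever store answers "is this id used?".
def fresh_short_id(taken, length: 6)
  loop do
    candidate = SecureRandom.alphanumeric(length)
    return candidate if taken.add?(candidate) # add? returns nil on duplicates
  end
end

taken = Set.new
id = fresh_short_id(taken)
puts id.length
```

Against a real database the check and the insert are two separate steps, so a unique index on the column is probably still worth having as a backstop.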