Sorting content of arrays in controller - sorting

I've got a 'status' and a 'type' column in a 'subjects' table. The status can contain the strings 'Open', 'In progress' and 'Closed'. I want to sort the output with 'In progress' first, then 'Open', then 'Closed'.
Within each status, I also want to sort by type, which can contain four different strings.
Is this possible (in a controller), and if so, how?

I've solved this by using enum.
I removed the string columns and used integers instead:
schema.rb
t.integer "status", default: 0
t.integer "casetype", default: 0
Then I added this to my Subject model
subject.rb
enum status: ['In progress', 'Open', 'Closed']
enum casetype: %w(Info NFI RFC RFA)
Then I ordered with this:
@subjects = Subject.all.order('status ASC, casetype')
More info about enum: http://edgeapi.rubyonrails.org/classes/ActiveRecord/Enum.html
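For comparison, the same two-level ordering can also be done in plain Ruby without changing the schema, by ranking each string by its position in an array. That is the same position-to-integer mapping that enum stores. A minimal sketch, assuming the records are already loaded; the Struct is a hypothetical stand-in for the Subject model, and STATUS_ORDER, CASETYPE_ORDER and sort_subjects are illustrative names, not part of Rails:

```ruby
# Hypothetical stand-in for the Subject model (an ActiveRecord class in Rails).
Subject = Struct.new(:status, :casetype)

# The position in each array defines the sort rank, like the integers enum stores.
STATUS_ORDER   = ['In progress', 'Open', 'Closed'].freeze
CASETYPE_ORDER = %w[Info NFI RFC RFA].freeze

def sort_subjects(subjects)
  subjects.sort_by do |s|
    [
      STATUS_ORDER.index(s.status) || STATUS_ORDER.size,      # unknown values sort last
      CASETYPE_ORDER.index(s.casetype) || CASETYPE_ORDER.size
    ]
  end
end

subjects = [
  Subject.new('Closed', 'Info'),
  Subject.new('Open', 'RFC'),
  Subject.new('In progress', 'RFA'),
  Subject.new('In progress', 'Info'),
  Subject.new('Open', 'NFI')
]

sorted = sort_subjects(subjects)
sorted.each { |s| puts "#{s.status} / #{s.casetype}" }
```

Sorting in the database, as in the answer, is likely preferable for large tables, since this version has to load every row first.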

Related

Which Postgresql index is most efficient for text column with queries based on similarity

I would like to create an index on text column for the following use case. We have a table of Segment with a column content of type text. We perform queries based on the similarity by using pg_trgm. This is used in a translation editor for finding similar strings.
Here are the table details:
CREATE TABLE public.segments
(
id integer NOT NULL DEFAULT nextval('segments_id_seq'::regclass),
language_id integer NOT NULL,
content text NOT NULL,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT segments_pkey PRIMARY KEY (id),
CONSTRAINT segments_language_id_fkey FOREIGN KEY (language_id)
REFERENCES public.languages (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT segments_content_language_id_key UNIQUE (content, language_id)
)
And here is the query (Ruby + Hanami):
def find_by_segment_match(source_text_for_lookup, source_lang, sim_score)
aggregate(:translation_records)
.where(language_id: source_lang)
.where { similarity(:content, source_text_for_lookup) > sim_score/100.00 }
.select_append { float::similarity(:content, source_text_for_lookup).as(:similarity) }
.order { similarity(:content, source_text_for_lookup).desc }
end
---EDIT---
This is the query:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity" FROM "segments" WHERE (("language_id" = 2) AND (similarity("content", 'This will not work.') > 0.45)) ORDER BY SIMILARITY("content", 'This will not work.') DESC
SELECT "translation_records"."id", "translation_records"."source_segment_id", "translation_records"."target_segment_id", "translation_records"."domain_id",
"translation_records"."style_id",
"translation_records"."created_by", "translation_records"."updated_by", "translation_records"."project_name", "translation_records"."created_at", "translation_records"."updated_at", "translation_records"."language_combination", "translation_records"."uid",
"translation_records"."import_comment" FROM "translation_records" INNER JOIN "segments" ON ("segments"."id" = "translation_records"."source_segment_id") WHERE ("translation_records"."source_segment_id" IN (27548)) ORDER BY "translation_records"."id"
---END EDIT---
---EDIT 1---
What about re-indexing? Initially we'll import about 2 million legacy records. When and how often, if at all, should we rebuild the index?
---END EDIT 1---
Would something like CREATE INDEX ON segment USING gist (content) be ok? I can't really find which of the available index types would be best suited to our use case.
Best, seba
The 2nd query you show seems to be unrelated to this question.
Your first query can't use a trigram index, as the query would have to be written in operator form, not function form, to do that.
In operator form, it would look like this:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity"
FROM segments
WHERE language_id = 2 AND content % 'This will not work.'
ORDER BY content <-> 'This will not work.';
In order for % to be equivalent to similarity("content", 'This will not work.') > 0.45, you would first need to run SET pg_trgm.similarity_threshold TO 0.45;.
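To get a feel for what a threshold such as 0.45 means, here is a rough Ruby sketch of the trigram similarity score. It assumes the behaviour described in the pg_trgm documentation (lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide shared trigrams by total distinct trigrams). The names trigrams and trigram_similarity are made up for illustration, and the result may differ from PostgreSQL on edge cases:

```ruby
require 'set'

# Approximation of pg_trgm's trigram extraction (an assumption, not the real implementation).
def trigrams(text)
  words = text.downcase.scan(/[[:alnum:]]+/)
  grams = words.flat_map do |word|
    padded = "  #{word} " # two spaces in front, one behind
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end
  grams.to_set
end

# Shared trigrams divided by all distinct trigrams of both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

score = trigram_similarity('word', 'two words')
puts score.round(4)
puts score > 0.45 ? 'would match at threshold 0.45' : 'would not match at threshold 0.45'
```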
Now how you get ruby/hanami to generate this form, I don't know.
The % operator can be supported by either a gin_trgm_ops index or a gist_trgm_ops index. The <-> operator can only be supported by gist_trgm_ops. But it is pretty hard to predict how efficient that support will be. If your "content" column is long or your text to compare is long, it is unlikely to be very efficient, especially in the case of gist.
Ideally you would partition your table by language_id. If not, then it might be helpful to build a multicolumn index having both columns.
CREATE INDEX segment_language_id_idx ON segments USING btree (language_id);
CREATE INDEX segment_content_gin ON segments USING gin (content gin_trgm_ops);

sphinx search with specific order

I have a list of companies (what Sphinx returns). I want to sort them so that companies with a store come first, without overriding the weight of the fields.
I tried:
:order => 'store DESC'
That works, but the weight ordering is lost.
If you want both the weight and the store to impact the ordering, you'll need to specify both - :order doesn't build upon existing values, it replaces them.
:order => 'weight() DESC, store DESC'

Which is the most used name?

I am working on a Ruby on Rails site and I want to check its database for the most frequent name among the registered users.
There is a column called "First Name" that I want to go through. I don't care about case sensitivity right now.
Is there a convenient way to find the most popular name, then the second most popular, the third most popular and so on?
What I thought of is to get all users in an array and then do @users.each do |user|, record the names in an array, and after that count the duplicates of each name that occurs more than once. I am not sure if that's the proper way though.
Here is how you can do it using ActiveRecord:
User.group(:first_name).order('popularity desc').pluck(:first_name, 'count(*) as popularity')
This code translates to the SQL:
SELECT "users"."first_name", count(*) as popularity FROM "users"
GROUP BY first_name
ORDER BY popularity desc
and you get something like:
[["John", 2345], ["James", 1986], ["Sam", 1835], ...]
If you want only the top ten names, you can limit the number of results simply by adding limit:
User.group(:first_name).order('popularity desc').limit(10).pluck(:first_name, 'count(*) as popularity')
Another option is to use the count API:
User.group(:first_name).count
=> {"Sam" => 1835, "Stefanos" => 2, ...}
# ordered
User.group(:first_name).order('count_all desc').count
=> {"John" => 2345, "James" => 1986, "Sam" => 1835, ...}
# top 3
User.group(:first_name).order('count_all desc').limit(3).count
=> {"John" => 2345, "James" => 1986, "Sam" => 1835 }
You could run the following SQL statement:
select users.first_name, count(*) as count from users group by users.first_name order by count desc
This returns the names with the most frequent ones first. As Boris said, using just SQL is the right way to go here.
Otherwise, if you want to load all the users, you could group and sort them in Ruby:
@users.group_by(&:first_name).sort_by { |_name, users| users.size }.reverse
This gives you an array of [name, users] pairs, sorted by descending number of users per name.
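Since the question mentions not caring about case, here is a minimal plain-Ruby sketch of the in-memory approach that folds case before counting. The sample array is hypothetical data standing in for the first names loaded from the users table, and name_popularity is an illustrative name:

```ruby
# Hypothetical sample data standing in for the first names loaded from the database.
first_names = %w[John james Sam JOHN James john Sam]

# Count names case-insensitively and rank them, most frequent first.
# Ties are broken alphabetically so the result is deterministic.
def name_popularity(names)
  counts = Hash.new(0)
  names.each { |name| counts[name.downcase] += 1 }
  counts.sort_by { |name, count| [-count, name] }
end

ranking = name_popularity(first_names)
ranking.each { |name, count| puts "#{name}: #{count}" }
```

For a large users table the SQL versions above are likely the better choice, because this one needs every name in memory.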
Another way using ActiveRecord:
User.group(:first_name).count
Generated SQL is:
SELECT COUNT(*) AS count_all, first_name AS first_name FROM `users` GROUP BY first_name
This will output a hash of { name => number_of_occurrences }, e.g.
{"John" => 29, "Peter" => 87, "Sarah" => 2}

symfony1 enums with column aggregation inheritance

I have a profile table that saves all profiles for all users.
I have different types of users and want each type of user to have different select options for a certain field.
So both user types can choose how long they want to register for, but they have different options: one can choose 2 years and the other can't.
The schema.yml looks something like this:
UserProfile:
  columns:
    username:
      type: string(255)
      notnull: true
      unique: false
WriterUserProfile:
  inheritance:
    type: column_aggregation
    extends: UserProfile
  columns:
    register_time:
      type: enum
      values:
        - 6 months
        - 1 year
        - 2 years
        - Other
      default: other
ReaderUserProfile:
  inheritance:
    type: column_aggregation
    extends: UserProfile
  columns:
    register_time:
      type: enum
      values:
        - 6 months
        - 1 year
        - Other
      default: other
For some reason I am unable to select the '2 years' option: the form gives an 'invalid' error.
Do '2 years' and 'Other' clash with each other because they are both the 3rd option?
Are there other fields which are not shared? This one field alone is not enough reason to use column aggregation. In any case, if the same field appears in multiple subclasses, then it should be moved up to the parent class, and field names should be unique among all related classes (UserProfile, WriterUserProfile, ReaderUserProfile in your case).
You can change the options of the choice field in a form:
$choices = array('0' => 'a', '1' => 'b');
$this->getWidget('register_time')->setOption('choices', $choices);
$this->getValidator('register_time')->setOption('choices', array_keys($choices));

ActiveRecord count of distinct days from created_at?

Is it possible to retrieve a count of distinct records based on a field value if the field needs to be interrogated (ideally, using ActiveRecord alone)?
For example, the following returns a count of unique records based on the 'created_at' field:
Record.count('created_at', :distinct => true)
However, is it possible to get a count of, say, unique days based on the 'created_at' field in a similar way?
A naive ActiveRecord example to explain my intent being:
Record.count('created_at'.day, :distinct => true)
(I know the string 'created_at' isn't a 'Time', but that's the sort of query I'd like to ask ActiveRecord.)
You need to group the records. For example
Record.group('DATE(created_at)').count('created_at')
tells you the number of rows created on each particular date, or
Record.group('DAYOFWEEK(created_at)').count('created_at')
would tell you the number of rows created on individual days of the week.
Beware that with the usual ActiveRecord setup this does the date calculations in UTC. If you want the calculations in a specific time zone, you'll have to add that conversion to the group statement.
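Because the grouped count returns a hash with one entry per date, calling .size on that hash gives the number of distinct days the question asks for. The time zone caveat can be illustrated in plain Ruby. This is a minimal sketch with made-up timestamps; distinct_days and its utc_offset parameter are hypothetical names, not ActiveRecord API:

```ruby
require 'date'

# Hypothetical created_at values, all stored in UTC.
timestamps = [
  Time.utc(2024, 1, 1, 9, 0),
  Time.utc(2024, 1, 1, 23, 30),
  Time.utc(2024, 1, 2, 0, 15),
  Time.utc(2024, 1, 3, 12, 0)
]

# Count distinct calendar days after shifting each time by utc_offset seconds.
def distinct_days(times, utc_offset = 0)
  times.map { |t| t.getlocal(utc_offset).to_date }.uniq.size
end

puts distinct_days(timestamps)        # days as seen in UTC
puts distinct_days(timestamps, -3600) # days as seen one hour west of UTC
```

The two calls give different counts for the same data, because the 00:15 UTC record falls on the previous day one hour further west.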
