Suitable data structure for a bookstore

How can I build a data structure that lets me query products by category, title, author, and so on?
ItemNo, Product No (unique), Title, Author, Category
1, 00000001, Programming Interview, Olivier, Books > Business, Finance & Law > Careers > Job Hunting
2, 000002, The Art of Captaincy, Robert James, Books > Biography > Sport > Cricket
What is the best way to do this? (This is not for homework or an assignment.)

If you need to query on that many different columns, a relational database seems like the right tool. You also get ACID for free.
Of course, you could also maintain one map per column you want to index, each leading from a column value to the matching products.
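
The "lot of maps" idea could look roughly like the Python sketch below: one dictionary per indexed column, each mapping a column value to the set of product numbers that carry it. The names `Catalog`, `add` and `find` are hypothetical, and a real catalog would also need update and removal handling.

```python
from collections import defaultdict


class Catalog:
    """In-memory product store with one index (map) per searchable column.

    Hypothetical illustration; the class and method names are not from the question.
    """

    INDEXED = ("title", "author", "category")

    def __init__(self):
        self.products = {}  # product_no -> product record (dict)
        # One map per indexed column: column value -> set of product numbers.
        self.indexes = {col: defaultdict(set) for col in self.INDEXED}

    def add(self, product_no, title, author, category):
        record = {"product_no": product_no, "title": title,
                  "author": author, "category": category}
        self.products[product_no] = record
        for col in self.INDEXED:
            self.indexes[col][record[col]].add(product_no)

    def find(self, **criteria):
        """Return the products matching every column=value pair given."""
        matches = None
        for col, value in criteria.items():
            hits = self.indexes[col].get(value, set())
            matches = hits if matches is None else matches & hits
        return [self.products[no] for no in sorted(matches or ())]


catalog = Catalog()
catalog.add("00000001", "Programming Interview", "Olivier",
            "Books > Business, Finance & Law > Careers > Job Hunting")
catalog.add("000002", "The Art of Captaincy", "Robert James",
            "Books > Biography > Sport > Cricket")

by_author = catalog.find(author="Robert James")
```

Each extra searchable column costs one more map that has to be kept in sync on every write, which is the bookkeeping a relational database does for you with indexes.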

Related

Lookupvalue for a Search String in Tabular Model

I'm seeking assistance with a Tabular Model DAX query.
I have a lookup/reference table that lists sport items and the sport each one relates to, like:
SPORT_ITEM        SPORT
_____________________________
BaseballBAT       Baseball
BASEBALL BAT      Baseball
Baseball Glove    Baseball
Helmet            Football
Shoulderpads      Football
Shoulder Pads     Football
Then I have a table with a descriptive column, like:
ITEM_DESCRIPTION
__________________
Baseballbat Needed
Baseball Bat required
Helmet wanted
ShoulderPads provided
Shoulder Pads needed
What I've been asked to do is:
Look up the value under ITEM_DESCRIPTION, find the SPORT_ITEM string it matches, and return the SPORT column.
So I should see
ITEM_DESCRIPTION         SPORT
__________________________________
Baseballbat Needed       Baseball
Baseball Bat required    Baseball
Helmet wanted            Football
ShoulderPads provided    Football
Shoulder Pads needed     Football
Note:
Unfortunately there is no relationship between the 2 tables. This
lookup is the only way to join.
I cannot do this join in the data source because the DBAs won't allow it. Long story (you didn't hear me say red tape).
Since there is no join, I cannot use Related, right?
And since this is Tabular Model, there is no CONTAINSVALUE.
Please let me know how this can be achieved.
Any help is much appreciated
For those who may stumble upon a similar issue:
After a lot of reading everywhere, I found the link below, and its comments were especially useful.
A combination of various solutions from the comments helped.
Info
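
The linked solution is not reproduced here, so the following only illustrates the matching logic in Python; it is not the DAX that was eventually used. It assumes a case-insensitive substring match is what is wanted, and the function name `lookup_sport` is hypothetical.

```python
# Lookup table from the question: SPORT_ITEM -> SPORT.
SPORT_ITEMS = {
    "BaseballBAT": "Baseball",
    "BASEBALL BAT": "Baseball",
    "Baseball Glove": "Baseball",
    "Helmet": "Football",
    "Shoulderpads": "Football",
    "Shoulder Pads": "Football",
}


def lookup_sport(description):
    """Return the SPORT of the first SPORT_ITEM contained in the description.

    Hypothetical helper; the comparison ignores case (an assumption).
    """
    text = description.lower()
    for item, sport in SPORT_ITEMS.items():
        if item.lower() in text:
            return sport
    return None  # no SPORT_ITEM occurs in the description


descriptions = [
    "Baseballbat Needed",
    "Baseball Bat required",
    "Helmet wanted",
    "ShoulderPads provided",
    "Shoulder Pads needed",
]
result = {d: lookup_sport(d) for d in descriptions}
```

This scans the lookup table once per description, which is the same row-by-row search a calculated column would have to perform when no relationship exists between the tables.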

Best practice for handling many-to-many relationships in Elasticsearch?

I'm pretty sure I know the answer to this question but am looking for confirmation from someone with more Elasticsearch experience than me.
Let's say I've got a database containing Authors and Books. An author can be associated with 0 or more books, and a book can be associated with 1 or more authors. We want users to be able to search on author name to find the author and all his/her books, and we also want them to be able to search on book title to get back its author(s). We know there will be plenty of multi-author books.
Because Elasticsearch only directly supports one level of parent-child relationships, and because children can only have one parent, it seems to me that we need to denormalize the data and use nested objects to establish this relationship. If we modify properties of an author who has published 23 books, we will need to reindex the author record and all 23 of his/her book records.
In my fantasy world, I'd love to have those 23 books each contain an array of author IDs so that I don't have to reindex books when I reindex authors. It seems like this would definitely be possible using Elasticsearch's parent-child support if a book could only have one author, but because of the many-to-many requirement, I have to use nested objects and reindex any related objects whenever anything changes.
Is this correct? It certainly seems like more work (and certainly more updates), but I want to do this the right way, not the "clever" way that introduces complexity and bugs and madness.
Any guidance would be appreciated.
From your question I can safely assume that ES will not be your primary data store. So the main question in deciding how to denormalise your many-to-many relationship is how, and for what, you will use ES; that is, which queries you expect to build.
Think in terms of "query command" design and denormalise accordingly. Here are a few pointers:
Denormalising author IDs into the book: would you expect a user to run a search such as "all books for userId=XYZ"? If not, you would rather need the name of the author as a multi-field in your Book document.
Duplicate, duplicate and duplicate. Figure out which data will be heavily updated (authors, as books generally do not gain authors after publication). Denormalise authors into books (names, most likely). Duplicate into another document type something like "author_books", which would be a child of authors and support fairly frequent updates (again, denormalise the title and other relevant fields so you can search from the author's perspective).
Hope this makes some sense ;)
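
A rough sketch of that denormalisation, using plain Python dictionaries in place of Elasticsearch documents. No Elasticsearch client is involved; the sample data, the field names and the helpers `build_book_document` and `books_to_reindex` are all hypothetical.

```python
# Source-of-truth records, as they might live in the primary data store.
# The names and titles are made-up sample data.
authors = {
    1: {"name": "A. Writer"},
    2: {"name": "B. Author"},
}
books = {
    10: {"title": "First Book", "author_ids": [1]},
    11: {"title": "Joint Book", "author_ids": [1, 2]},
}


def build_book_document(book_id):
    """Book document with author names copied in, so one document answers
    both title searches and author-name searches."""
    book = books[book_id]
    return {
        "id": book_id,
        "title": book["title"],
        "authors": [{"id": a, "name": authors[a]["name"]}
                    for a in book["author_ids"]],
    }


def books_to_reindex(author_id):
    """Ids of the book documents that go stale when this author changes."""
    return sorted(b for b, book in books.items()
                  if author_id in book["author_ids"])


# Renaming an author means rebuilding every book document that embeds the name.
authors[2]["name"] = "B. Author-Smith"
stale = books_to_reindex(2)
refreshed = [build_book_document(b) for b in stale]
```

The cost the question worries about shows up in `books_to_reindex`: the more books an author has, the more documents each author edit touches, which is why it matters that author data changes rarely compared with how often it is searched.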

How to save different surveys to database

I've been assigned to make a web-based survey application in ASP.NET MVC3.
I have three different surveys, and I need to find the best way to store survey answers in the database. I came up with only one solution: an answer table for each survey type.
Can you suggest better solutions?
You can have a single table for answers, with the survey type as a column.
Perhaps something like this?
table survey_answers
id (this is the pk) : qid (question id) : answer (varchar MAX) : pid (person id)
What you can do with this is make a questions table and a persons table. In the questions table, put the qid, the question, and the group of questions it belongs to (the survey id). The persons table is obvious: you put a pid (person id) and info about the person (such as name, age, gender, etc.).
The reason the table-per-survey approach (three answer tables for three surveys) is bad is that the number of surveys may grow. Let's say it becomes 100 different types of surveys or more, which is a definite possibility depending on where you work. Are you going to have 100 different tables? 1000? No way.
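
As a minimal sketch of that layout, assuming SQLite and made-up sample rows (the column `survey_id` and the table names `questions` and `persons` follow the description above but are otherwise assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons   (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE questions (qid INTEGER PRIMARY KEY, survey_id INTEGER, question TEXT);
    CREATE TABLE survey_answers (
        id     INTEGER PRIMARY KEY,
        qid    INTEGER REFERENCES questions(qid),
        pid    INTEGER REFERENCES persons(pid),
        answer TEXT
    );
""")
# Made-up sample data: one person answering one question in each of two surveys.
conn.execute("INSERT INTO persons VALUES (1, 'Alice')")
conn.executemany("INSERT INTO questions VALUES (?, ?, ?)", [
    (1, 1, "How satisfied are you?"),
    (2, 2, "Which product do you use?"),
])
conn.executemany(
    "INSERT INTO survey_answers (qid, pid, answer) VALUES (?, ?, ?)",
    [(1, 1, "Very"), (2, 1, "Product X")],
)

# All answers for survey 1; the query is the same however many surveys exist.
rows = conn.execute("""
    SELECT q.question, a.answer
    FROM survey_answers a JOIN questions q ON q.qid = a.qid
    WHERE q.survey_id = ?
""", (1,)).fetchall()
```

Adding a fourth or a hundredth survey only adds rows to `questions`; no new table is needed.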

Struggling with a data modeling problem

I am struggling with a data model (I use MySQL for the database). I am uneasy about what I have come up with. If someone could suggest a better approach, or point me to some reference matter I would appreciate it.
The data would have organizations of many types. I am trying to do a 3 level classification (Class, Category, Type). Say if I have 'Italian Restaurant', it will have the following classification
Food Services > Restaurants > Italian
However, an organization may belong to multiple groups. A restaurant may also serve Chinese and Italian. So it will fit into 2 classifications
Food Services > Restaurants > Italian
Food Services > Restaurants > Chinese
The classification reference tables would be like the following:
ORG_CLASS (RowId, ClassCode, ClassName)
1, FOOD, Food Services
ORG_CATEGORY(RowId, ClassCode, CategoryCode, CategoryName)
1, FOOD, REST, Restaurants
ORG_TYPE (RowId, ClassCode, CategoryCode, TypeCode, TypeName)
100, FOOD, REST, ITAL, Italian
101, FOOD, REST, CHIN, Chinese
102, FOOD, REST, SPAN, Spanish
103, FOOD, REST, MEXI, Mexican
104, FOOD, REST, FREN, French
105, FOOD, REST, MIDL, Middle Eastern
The actual data tables would be like the following:
I will allow an organization a max of 3 classifications. I will have 3 GroupIds each pointing to a row in ORG_TYPE. So I have my ORGANIZATION_TABLE
ORGANIZATION_TABLE (OrgGroupId1, OrgGroupId2, OrgGroupId3, OrgName, OrgAddres)
100,103,NULL,MyRestaurant1, MyAddr1
100,102,NULL,MyRestaurant2, MyAddr2
100,104,105, MyRestaurant3, MyAddr3
During data entry, a dialog could let the user choose the class, category, and type, and the corresponding GroupId could be populated with the RowId from the ORG_TYPE table.
During search, if all three classification levels are chosen, the query will be more specific. For example, if
Food Services > Restaurants > Italian is the criterion, the where clause would be 'where OrgGroupId1 = 100'
If only 2 levels are chosen
Food Services > Restaurants
I have to do 'where OrgGroupId1 in (100,101,102,103,104,105, .....)'. There could be a hundred in that list.
I will disallow class-level search. That is, I will force selection of a class and a category.
The Ids would be integers. I am trying to anticipate performance issues and other issues.
Overall, would this work? Or do I need to throw this out and start from scratch?
I don't like having three columns for the "up to three" classifications. In my opinion it would be better to have a cross-reference table that allows a many-to-many mapping between organisation and type, i.e. a table ORGANISATION_GROUPS with columns OrganisationId, OrgGroupId.
To solve the problem of querying at different levels of classification, you could set up this cross-reference table to hold the actual classifications, i.e. ORGANISATION_GROUPS instead has columns OrganisationId, ClassCode, CategoryCode, TypeCode.
This will make queries at different levels of classification very easy.
For referential integrity to work with this scheme, I'd then suggest not using surrogate integer keys for your ORG_* tables but instead making the real unique key the primary key, i.e. (ClassCode, CategoryCode, TypeCode) for ORG_TYPE.
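
A sketch of that cross-reference design, run against SQLite from Python for convenience (the question uses MySQL, where the SQL would be much the same). The organisation ids 1 and 2 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE ORG_TYPE (
        ClassCode TEXT, CategoryCode TEXT, TypeCode TEXT, TypeName TEXT,
        PRIMARY KEY (ClassCode, CategoryCode, TypeCode)
    );
    CREATE TABLE ORGANISATION_GROUPS (
        OrganisationId INTEGER, ClassCode TEXT, CategoryCode TEXT, TypeCode TEXT,
        PRIMARY KEY (OrganisationId, ClassCode, CategoryCode, TypeCode),
        FOREIGN KEY (ClassCode, CategoryCode, TypeCode)
            REFERENCES ORG_TYPE (ClassCode, CategoryCode, TypeCode)
    );
""")
conn.executemany("INSERT INTO ORG_TYPE VALUES ('FOOD', 'REST', ?, ?)",
                 [("ITAL", "Italian"), ("MEXI", "Mexican"), ("SPAN", "Spanish")])
# Hypothetical organisations 1 and 2, each in two classifications.
conn.executemany("INSERT INTO ORGANISATION_GROUPS VALUES (?, 'FOOD', 'REST', ?)",
                 [(1, "ITAL"), (1, "MEXI"), (2, "ITAL"), (2, "SPAN")])

# Three levels chosen: Food Services > Restaurants > Mexican.
mexican = [r[0] for r in conn.execute(
    "SELECT OrganisationId FROM ORGANISATION_GROUPS "
    "WHERE ClassCode = 'FOOD' AND CategoryCode = 'REST' AND TypeCode = 'MEXI'")]

# Two levels chosen: Food Services > Restaurants. No IN list is needed.
restaurants = [r[0] for r in conn.execute(
    "SELECT DISTINCT OrganisationId FROM ORGANISATION_GROUPS "
    "WHERE ClassCode = 'FOOD' AND CategoryCode = 'REST' "
    "ORDER BY OrganisationId")]
```

Because the codes themselves sit in the cross-reference table, a search at a coarser level just drops a condition from the WHERE clause.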
The problem I see in your design is that it is a bit rigid. A more flexible approach you might want to consider is the following:
First, you would have a single table for classes, categories, types, and any other classification level. This table would be self-referencing: every row would have a field referring to its immediate parent, as follows:
CLASSIFICATION (Id, Description, Parent_Id)
ITAL, Italian, REST
CHIN, Chinese, REST
MEXI, Mexican, REST
REST, Restaurant, FOOD
Next you would have, as @John Pickup suggested, an intermediate cross-reference table between your restaurant (or whatever you need) table and the classification table. It would contain only a composite primary key made up of the primary keys of both tables.
FOODSERVICE_CLASSIFICATION (Rest_Id, Class_Id)
100, ITAL
100, CHIN
101, MEXI
102, CHIN
It would be advisable to restrict it so that only leaf rows of the CLASSIFICATION table can be referenced in the cross-reference table.
Your example of looking for all restaurants would be as simple as finding all child categories of REST and searching for them in the cross-reference table. This can be written in a single select in Oracle (not sure about other RDBMS).
This way you can:
Have multiple categorizations for your restaurants without being limited to 3 categories.
Do quick searches using the cross-reference table.
Mind you, this schema works only if your categorization is a tree with a base category acting as the root. If instead you need a looser categorization, you would probably need a tags approach.
Btw, I also agree with @John Pickup that it is better to use real primary keys in this case.
HTH
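
On the "single select" point: outside Oracle, a recursive common table expression (`WITH RECURSIVE`) does the same job in SQLite, PostgreSQL and MySQL 8.0 or later. The sketch below runs it against SQLite from Python using the sample rows above; the FOOD root row is an assumption added so that REST has a parent to point to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLASSIFICATION (
        Id TEXT PRIMARY KEY, Description TEXT,
        Parent_Id TEXT REFERENCES CLASSIFICATION(Id)
    );
    CREATE TABLE FOODSERVICE_CLASSIFICATION (
        Rest_Id INTEGER, Class_Id TEXT REFERENCES CLASSIFICATION(Id),
        PRIMARY KEY (Rest_Id, Class_Id)
    );
    INSERT INTO CLASSIFICATION VALUES
        ('FOOD', 'Food Services', NULL),  -- root row, assumed for this sketch
        ('REST', 'Restaurant', 'FOOD'),
        ('ITAL', 'Italian', 'REST'),
        ('CHIN', 'Chinese', 'REST'),
        ('MEXI', 'Mexican', 'REST');
    INSERT INTO FOODSERVICE_CLASSIFICATION VALUES
        (100, 'ITAL'), (100, 'CHIN'), (101, 'MEXI'), (102, 'CHIN');
""")

# Walk down from REST to every descendant, then join to the cross-reference table.
restaurant_ids = [row[0] for row in conn.execute("""
    WITH RECURSIVE subtree(Id) AS (
        SELECT Id FROM CLASSIFICATION WHERE Id = ?
        UNION ALL
        SELECT c.Id FROM CLASSIFICATION c JOIN subtree s ON c.Parent_Id = s.Id
    )
    SELECT DISTINCT fc.Rest_Id
    FROM FOODSERVICE_CLASSIFICATION fc JOIN subtree s ON s.Id = fc.Class_Id
    ORDER BY fc.Rest_Id
""", ("REST",))]
```

The same query works at any depth of the tree, so adding a fourth classification level would not change it.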

I Don't Understand How to Express a Relationship Between Three Separate Tables

Given the following tables in ActiveRecord:
authors
sites
articles
I don't know how to express that an author is paid a different amount depending on the publication, but that authors working for the same publication have different rates:
John publishes an article in Foo for $300
John publishes an article in Bar for $350
John publishes an article in Baz for $400
Dick publishes an article in Foo for $250
Dick publishes an article in Bar for $400
etc.
What kind of relationship am I trying to describe?
At the moment I've got a "rates" table with author_id, site_id and amount columns. Given site.id and author.id, I derive the cost of the article with
cost = Rate.find(:first, :conditions => ["author_id = ? and site_id = ?", author.id, site.id]).amount
That works, but I'm not sure it's the best way, and I'm not sure how to make sure I don't end up with 'John' having two rates for 'Baz.'
I don't think I want code so much as I want someone to say "Oh, that's a ... relationship" so I can get a grip on what I'm Googling for.
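It's a has-and-belongs-to-many relationship with a rich join table.
class Author < ActiveRecord::Base
  has_many :rates # needed for the :through association below
  has_many :publications, :through => :rates
end
class Publication < ActiveRecord::Base
  has_many :rates
  has_many :authors, :through => :rates
end
class Rate < ActiveRecord::Base # rich join table
  belongs_to :author
  belongs_to :publication
end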
And you can then simplify your finding like this:
@author.rates.find_by_publication_id(123)
Plus you get direct access across the join table
@author.publications
@publication.authors
It's straightforward, but I don't know if there's a specific name for this relationship.
It looks like you need three tables:
Author (info about authors)
Site (info about sites)
Rate Author/Site (rate info only)
In the third table you'd have at least:
Author ID (FK to Author, part of the primary key)
Site ID (FK to Site, part of the primary key)
Rate
The rate table uses the two ID fields together as a composite primary key, which also acts as the unique constraint. Any join involving both the author and the site would be a 3-table join.
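
The composite key is also what answers the "John having two rates for Baz" worry. A minimal sketch, run against SQLite from Python, with the column names from the question and hypothetical ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rates (
        author_id INTEGER NOT NULL,
        site_id   INTEGER NOT NULL,
        amount    INTEGER NOT NULL,
        PRIMARY KEY (author_id, site_id)  -- at most one rate per author and site
    );
""")
JOHN, BAZ = 1, 3  # hypothetical ids
conn.execute("INSERT INTO rates VALUES (?, ?, ?)", (JOHN, BAZ, 400))

try:
    conn.execute("INSERT INTO rates VALUES (?, ?, ?)", (JOHN, BAZ, 450))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the second rate for John at Baz is refused

amount = conn.execute(
    "SELECT amount FROM rates WHERE author_id = ? AND site_id = ?", (JOHN, BAZ)
).fetchone()[0]
```

In a Rails schema that keeps a separate `id` column, a unique index over the two foreign key columns would play the same role as the composite primary key here.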
When three entities are related to each other, it's called a ternary relationship.
Most of the relationships we deal with are binary, relating two entities to each other, for example the "enrolled in" relationship between students and courses. Binary relationships are further categorized into many-to-many, many-to-one, and one-to-one. But you knew that.
Ternary relationships can be categorized as many-to-many-to-many, many-to-many-to-one,
many-to-one-to-one, and so on.
Binary and Ternary relationships can be further generalized to n-ary relationships.
Here's how I see the case you outlined: There are three entities: author, publication, and article. In addition to these three entities, there is a measure, namely rate. I could be wrong about this.
So I would see three entity tables:
Authors with PK AuthorID.
Publications with PK PublicationID.
Articles, with PK ArticleID.
Then there would be a relationship table with four columns:
AuthorID (FK),
PublicationID (FK),
ArticleID (FK),
Rate which is a currency amount.
The PK of this is (AuthorID, PublicationID, ArticleID)
Note that, in this design, there is no rate table. It's just a measure.
Note also that in this design, it's possible for several authors to collaborate on one article, and each be given a separate rate for his/her share of the article. That's not possible in some of the other proposed designs.
It's also possible for the same article to be sold to more than one publication. It might be desirable to impose constraints on the data, if the real world imposes the same constraints.
Anyway, if you want a search term to Google, the term is "ternary relationships".
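
A small sketch of that ternary relationship table, again in SQLite from Python. The table name `article_rates`, the ids and the whole-dollar amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_rates (  -- hypothetical name for the relationship table
        AuthorID      INTEGER NOT NULL,
        PublicationID INTEGER NOT NULL,
        ArticleID     INTEGER NOT NULL,
        Rate          INTEGER NOT NULL,  -- currency amount, whole dollars here
        PRIMARY KEY (AuthorID, PublicationID, ArticleID)
    );
""")
JOHN, DICK = 1, 2  # hypothetical author ids
FOO = 1            # hypothetical publication id
conn.executemany("INSERT INTO article_rates VALUES (?, ?, ?, ?)", [
    (JOHN, FOO, 10, 300),  # John and Dick co-write article 10 for Foo,
    (DICK, FOO, 10, 250),  # each at his own rate
    (JOHN, FOO, 11, 300),
])

# Total paid for article 10 in Foo, summed over its collaborating authors.
article_cost = conn.execute(
    "SELECT SUM(Rate) FROM article_rates "
    "WHERE ArticleID = ? AND PublicationID = ?", (10, FOO)
).fetchone()[0]
```

Note that this design records a rate per article, so the same author and publication pair can appear with different amounts unless a further constraint is added.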
I would use a third table:
author_site_mapper
------------------
id
author_id
site_id
rate
I've generally heard this referred to as a 'mapper' relationship. It signifies a many-to-many relationship between two tables.

Resources