GridFS - product images & thumbnails - what is the best DB structure?

I have an e-commerce website running on MongoDB + GridFS.
Each product may have up to 5 images.
Each image has 3 thumbnails of different sizes.
I need advice on the best DB structure for this.
Currently I'm thinking of storing the image IDs and also the thumbnail IDs (the IDs from GridFS) in each product:
{
  '_id': 1,
  'title': 'Some Product',
  'images': [
    {'id': '11', 'thumbs': {'small': '22', 'medium': '33'}},
    {'id': '44', 'thumbs': {'small': '55', 'medium': '66'}}
  ]
}
Or would it be better to store a path in GridFS?
{
  '_id': '111',
  'filename': '1.jpg',
  'path': 'product/988/image/111/'
},
{
  '_id': '222',
  'filename': '1.jpg',
  'path': 'product/988/image/111/thumbnail_small'
},
{
  '_id': '333',
  'filename': '1.jpg',
  'path': 'product/988/image/111/thumbnail_large'
}
UPDATE: "path" field in GridFS is a "fake" path, not a real one. Just a quick way to find all related files. It is cheaper to have 1 indexed field than several fields with compound indexes.

If you are going to store the images with GridFS within MongoDB, I would go with the first one.
The second schema doesn't seem right: GridFS is meant to store files, so given the ID of an image you don't need any path inside those documents. If you simply want to store the path of a file, embed it directly into your primary collection, so you avoid the overhead of a somewhat useless extra collection.
In general, see Storing Images in DB - Yea or Nay? on whether you should store your images in a DBMS at all.
Additionally, if you only save the path, you may need few to no changes should you later switch to a CDN to deliver your images to customers.
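For illustration, the recommended first schema in use might look like this (a mongo shell sketch; it assumes the string IDs from the example are stored as the _id values of the GridFS files):
db.products.insertOne({
  _id: 1,
  title: 'Some Product',
  images: [
    { id: '11', thumbs: { small: '22', medium: '33' } },
    { id: '44', thumbs: { small: '55', medium: '66' } }
  ]
})
// serve the small thumbnail of the first image straight from GridFS:
var p = db.products.findOne({ _id: 1 })
db.fs.files.findOne({ _id: p.images[0].thumbs.small })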

What is a good practice to map my Elasticsearch index?

I need to model different types of metrics and audit logs, where search results should include both of them.
Which of the three options below is preferable in terms of fast search performance and indexing efficiency?
Mapping different subfields for each object
Using a generic object that has predefined fields and a dynamic extra metadata field
Using a different index for each object
Examples for each option:
Option 1 - Mapping different subfields under the same index unified_index_00001
{
  'type': 'audit_log',
  'audit_log_object': {
    'name': 'object_name',
    'action': 'edit',
    'before': 'field_value',
    'after': 'field_value_update'
  }
},
{
  'type': 'network',
  'network': {
    'name': 'network_name',
    'event': 'access',
    'ip': 'xxx.xxx.xxx.xxx'
  }
},
{...} ...
Option 2 - Mapping a generic object under the same index unified_index_00001
{
  'type': 'audit_log',
  'name': 'object_name',
  'action': 'edit',
  'meta': {
    'before': 'field_value',
    'after': 'field_value_update'
  }
},
{
  'type': 'network',
  'name': 'network_name',
  'action': 'access',
  'meta': {
    'ip': 'xxx.xxx.xxx.xxx'
  }
},
Option 3 - Using a different index for each object
audit_log_index_00001
{
  'name': 'object_name',
  'action': 'edit',
  'before': 'field_value',
  'after': 'field_value_update'
},
{...}
...
metric_index_00001
{
  'name': 'network_name',
  'event': 'event_type',
  'ip': 'xxx.xxx.xxx.xxx'
},
{...} ...
Note: only term-level queries are needed (no full-text search).
In Elasticsearch, you usually want to start with the queries and then work backwards from there.
If you always query only events of one type, separate indices make sense. If there are mixed queries, you will be happier with a joint index, usually with extracted common fields (option 2); otherwise those mixed queries won't really work.
Also, take into account Elasticsearch limitations:
fields per index (1000 by default)
shards and indices per cluster (thousands but still)
etc.
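For illustration, a mixed query against the option 2 index could filter purely on the extracted common fields (a sketch; the index name comes from the question, the values are placeholders):
GET unified_index_00001/_search
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "type": ["audit_log", "network"] } },
        { "term": { "name": "object_name" } }
      ]
    }
  }
}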

Style layer depending on multiple feature properties

Is there a way to paint features depending on their properties? For example, a feature has integer properties "level_a" and "level_b", and it needs to be filled depending on which property is greater. There is no way to compare them directly, since the legacy filter syntax only supports [">", key, value], and the features are supposed to stay in the same layer. Thank you.
I need something like:
map.addLayer({
  'id': 'foo',
  'type': 'fill',
  'source': 'source',
  'filter': ['>', 'level_a', 'level_b'], // cannot reference another property in the value field
  'paint': {
    'fill-color': 'blue'
  }
});
Yes, expressions support this and much more: https://docs.mapbox.com/mapbox-gl-js/style-spec/expressions/
Using the newer syntax, this would work fine:
'filter': ['>', ['get', 'level_a'], ['get', 'level_b']]
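Putting it together, the layer from the question then becomes:
map.addLayer({
  'id': 'foo',
  'type': 'fill',
  'source': 'source',
  'filter': ['>', ['get', 'level_a'], ['get', 'level_b']],
  'paint': {
    'fill-color': 'blue'
  }
});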

Where do additional action parameters go in the Elasticsearch bulk API?

I'm building a backfill script to repair random data loss in Elasticsearch. I only want to add missing documents to an index from the backup; newer versions may already exist and I don't want to lose any updates.
The Elasticsearch index API lets me specify the opType so that existing records are not overwritten:
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-index
e.g. [opType=create]
and I'm trying to use the bulk api to make it more efficient: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/docs-bulk.html
What I can't figure out is where to put the opType in the bulk API.
Is it in the action metadata, so it looks like the following?
{ 'index': { '_index': indexName, '_type': 'data', '_id': <my Id>, 'opType': 'create' } }
{data for item}
or do I put it somewhere else?
As explained in the link you refer to, if you want the same semantics as opType: create, you need to use the create action instead of the index one, i.e. change the action line to:
{ 'create': { '_index': indexName, '_type': 'data', '_id': <my Id>} }
{ data for item }
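For illustration, with the 6.x JavaScript client the question links to, the backfill call could then look like this (a sketch; indexName, the ID and the document body are placeholders):
client.bulk({
  body: [
    // 'create' fails with a 409 for documents that already exist,
    // so existing (possibly newer) versions are left untouched
    { create: { _index: indexName, _type: 'data', _id: '42' } },
    { field: 'value' }
  ]
}, function (err, resp) {
  // items that already existed come back with status 409 in
  // resp.items and can simply be skipped during the backfill
});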

Can someone advise on an HBase schema for click stream data?

I would like to create a click stream application using HBase. In SQL this would be a pretty simple task, but in HBase I don't have the first clue. Can someone advise me on a schema design and the row keys to use in HBase?
I have provided a rough data model and several questions that I would like to interrogate the data with.
Questions I would like to ask when accessing the data:
What events led to a conversion?
What was the last page viewed / how many pages were viewed?
On what pages do customers drop off?
What products does a male customer between 20 and 30 like to buy?
Is a customer who has bought product x also likely to buy product y?
Conversion amount from the first page?
{
  PageViews: [
    {
      date: "19700101 00:00",
      domain: "http://foobar.com",
      path: "pageOne.html",
      timeOnPage: "10",
      pageViewNumber: 1,
      events: [
        { name: "slideClicked", value: 0, time: "00:00" },
        { name: "conversion", value: 100, time: "00:05" }
      ],
      pageData: {
        category: "home",
        pageTitle: "Home Page"
      }
    },
    {
      date: "19700101 00:01",
      domain: "http://foobar.com",
      path: "pageTwo.html",
      timeOnPage: "20",
      pageViewNumber: 2,
      events: [
        { name: "addToCart", value: 50.00, time: "00:02" }
      ],
      pageData: {
        category: "product",
        pageTitle: "Mans Shirt",
        itemValue: 50.00
      }
    },
    {
      date: "19700101 00:03",
      domain: "http://foobar.com",
      path: "pageThree.html",
      timeOnPage: "30",
      pageViewNumber: 3,
      events: [],
      pageData: {
        category: "basket",
        pageTitle: "Checkout"
      }
    }
  ],
  Customer: {
    IPAddress: "127.0.0.1",
    Browser: "Chrome",
    FirstName: "John",
    LastName: "Doe",
    Email: "john.doe@email.com",
    isMobile: 1,
    returning: 1,
    age: 25,
    sex: "Male"
  }
}
Well, your data is mainly in a one-to-many relationship: one customer and an array of page view entities. And since all your queries are customer centric, it makes sense to store each customer as a row in HBase and have the customer ID (maybe the email in your case) as part of the row key.
If you decide to store one row per customer, the details of each page view would be stored nested within that row. The video link below regarding HBase design will help you understand that. So for your example above, you get one row and three nested entities.
Another approach would be a denormalized form, which gives HBase good lookup performance. Here each row would be a page view, and the customer data gets repeated in every row. So for your example above, you end up with three rows, with the customer data duplicated. Again, the video gives info regarding that too (compression mitigates the duplication). A sketch of this variant follows below.
You have more nested levels inside each page view (events and pageData), so denormalization only gets worse there. As everything in HBase is a key-value pair, it is difficult to query and match these nested levels. Hope this helps you kick off.
Good video link here
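For illustration, the denormalized variant might use a row key of customer ID plus a reversed timestamp, so one customer's page views sit together in key order, newest first (an HBase shell sketch; the table name, column families and key layout are assumptions, not something prescribed by the answer above):
create 'clickstream', 'view', 'customer'
# row key = <customerId>|<Long.MAX_VALUE - epoch millis>, zero-padded
put 'clickstream', 'john.doe@email.com|9223370536854775807', 'view:path', 'pageOne.html'
put 'clickstream', 'john.doe@email.com|9223370536854775807', 'view:category', 'home'
put 'clickstream', 'john.doe@email.com|9223370536854775807', 'customer:sex', 'Male'
# all page views for one customer in a single ordered scan:
scan 'clickstream', { STARTROW => 'john.doe@email.com|', STOPROW => 'john.doe@email.com~' }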

Fetch larger images from Facebook with Koala using the get_connection method

The documentation of the Koala gem gives an example of how to fetch posts from Facebook, including a picture.
client = Koala::Facebook::API.new(oauth_token)
client.get_connection('someuser', 'posts',
  { limit: @options[:max_items],
    fields: ['message', 'id', 'from', 'type',
             'picture', 'link', 'created_time', 'updated_time'] })
Below this example the documentation makes a note:
You can pass a ‘type’ hash key with a value of ‘small’, ‘normal’, ‘large’, or ‘square’ to obtain different picture sizes, with the default being ‘square’. Also, you may need the user_photos permission.
Unfortunately, this doesn't seem to work:
client.get_connection("officialstackoverflow", "posts",
{limit: 5, type: "large", fields: [:picture, :message, :type]})
However, I get the same picture as when omitting the type param. How do I pass the 'type' hash key correctly?
