Data structure for a Website - data-structures

I'm very new to web development, and until now I have relied on MySQL to store any information that needed to be saved. I was wondering, though, how I would go about implementing my own data structure for a website. All of my experience with data structures comes from school or personal projects in which the data structure goes away once the program has ended.
I guess I'm wondering if my data structure has to run 24/7 on the server as a background process or if there is another way? Also, are there any quick pointers you can give me to point me in the right direction?
Sorry if this is a stupid/obvious question, I'm still in college and I haven't had the opportunity to take any sort of web development class yet.
Thanks in advance for the help!

I don't understand exactly what you mean by "data structure". Of course, if your website needs a database in the back end to provide its service, the database has to be active and reachable 24/7.
As for the structure, it depends on the project itself. Of course, use primary keys and foreign keys, and name the columns consistently and with common sense (don't call one "name" and another "nome", for example).
If you can be more specific, I'm glad to help you.
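The advice about keys and consistent column names can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the tables (users, posts) and their columns are made up for the example and are not taken from the question. The database engine, not your own long-running process, is what keeps the data between requests.

```python
import sqlite3

# Hypothetical schema for illustration only: table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),  -- same column name in both tables
    title   TEXT NOT NULL
);
""")
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO posts (user_id, title) VALUES (1, 'Hello')")
conn.commit()


def insert_orphan_post(connection):
    """Try to insert a post for a user that does not exist; return True if rejected."""
    try:
        connection.execute("INSERT INTO posts (user_id, title) VALUES (99, 'Orphan')")
    except sqlite3.IntegrityError:
        return True
    return False
```

With the foreign key in place, the database itself refuses a post that points at a missing user, so that check does not have to live in the website code.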

Related

How do CM and CD servers communicate in Sitecore?

I am new to Sitecore and just trying to understand its architecture/design. I am curious to know how the intranet and internet servers communicate and how data flows between these two layers in on-prem and AWS EC2 environments. I have searched the web extensively and couldn't find an appropriate explanation.
I would really appreciate it if anyone could help me understand.
When you publish from CM, it puts a record in the EventQueue table in the web database.
All CD servers poll the EventQueue table for updates and proceed from there.
By default, this check happens once every 2 seconds.
In short, they communicate via events in the database(s). Note: This is very simplified but seeing it this way helped me understand how the events work and troubleshoot issues.
For example, when publishing an item, the publisher (running on CM or on a dedicated role) reads its data from the master database and writes it to the web database. When done, it raises an event by writing a row in the EventQueue table in the web database. Each CD server picks up this event and clears its corresponding caches, causing that data to be reloaded from the web database.
All Sitecore databases have the EventQueue table, and events go to the table in different databases depending on the type of event. An event is basically just a class name and a set of serialized data. Events can be raised "locally" or "globally", indicating whether several instances should pick up the event. Think of a scenario where you have two CD servers sharing one web database: both CDs would have to pick up the event.
To keep track of which events have been processed, an "EQSTAMP" value is stored in the Properties table. It's named [database]_EQSTAMP_[InstanceName]. It's therefore essential that no two Sitecore instances share the same instance name. If not set, Sitecore will make an instance name by combining the hostname and IIS site name. The decimal value of this timestamp corresponds to the hexadecimal Stamp column in the EventQueue table.
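When comparing the two by hand, the decimal/hexadecimal relationship can be checked with a few lines of Python. This is just a conversion helper, not Sitecore code; the 16-digit padding and the "0x" prefix are assumptions about how your SQL tool displays the Stamp column, and the instance name "CD1" is made up.

```python
def stamp_to_hex(decimal_stamp: int, width: int = 16) -> str:
    """Format a decimal EQSTAMP value as zero-padded hexadecimal (width is an assumption)."""
    return "0x" + format(decimal_stamp, f"0{width}X")


def hex_to_stamp(hex_stamp: str) -> int:
    """Convert a hexadecimal Stamp value (with or without a 0x prefix) to decimal."""
    return int(hex_stamp, 16)


def property_name(database: str, instance_name: str) -> str:
    """Build the property key following the [database]_EQSTAMP_[InstanceName] pattern."""
    return f"{database}_EQSTAMP_{instance_name}"
```

For example, a decimal value of 123456 corresponds to 0x000000000001E240, and property_name("web", "CD1") gives "web_EQSTAMP_CD1".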
Normally, you should never have to touch these tables yourself, but I find it good to have some insight into how they work and to keep an eye on them. They can grow in size and cause some issues. The CleanupEventQueue scheduled task is responsible for removing old processed events from the EventQueue tables. You may want to adjust the scheduling of this agent if your EventQueue grows too large between cleanups.
Note: This is the most common way of communication between the servers. Later versions of Sitecore have other techniques as well, such as Rebus.
The article "Event Queues. Why? How? When?" explains it in detail, and it also describes the pitfalls of using this mechanism in real life.
Please also be aware that Sitecore.Link project is a good place to get more knowledge regarding Sitecore functionality.
It accumulates Sitecore knowledge all around the web.
Thanks.

How to handle request traffic of a background location update application

I am working on a family networking app for Android that enables family members to share their location and track the location of others simultaneously. You can assume that this app is similar to Life360 or Sygic Family Locator. At first, I decided to use an MBaaS, and I completed the coding using Parse. However, I realized that although a user reads and writes geolocation data only once per minute (and in some cases less frequently), the request traffic exceeds my projections. For this reason, I want to build a well-grounded system, but I have some doubts about whether Parse can still do its job if the number of users increases to 100-500k.
Considering all this, I am looking for an alternative method/service to set up such a system. I think using a backend service like Parse is a moderate solution but not the best one. What are the possible ways to achieve this, from bad to good? For example, one of my friends says that I can use Sinch, an instant messaging service priced by the number of active users, in the background between users. Nevertheless, that sounds weird to me; I have never seen an instant messaging service used the way he describes.
Your comments and suggestions will be highly appreciated. Thank you in advance.
Well, Sinch wouldn't handle location updates or the storing of location data; that would be Parse you are asking about.
And since you implied that the requests would be too much for your username, maybe I wrongly assumed price was the problem with Parse.
But to answer your question about sending location data: if I were you, I would probably throttle it to a mile or so. There is no need for family members to know each other's position down to the foot in real time. If there is a need for that, I would probably implement a request method instead and ask the user for their location when someone is interested.
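A rough sketch of the throttling idea, in Python for brevity (on Android you would do the same in Java or Kotlin): only send an update once the device has moved at least a threshold distance from the last position that was sent. The class name and the 1609 m (about one mile) default are my own choices for illustration, not part of Parse or any other SDK.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class LocationThrottle:
    """Decide whether a new fix is far enough from the last one sent to be worth a request."""

    def __init__(self, min_distance_m=1609.0):
        self.min_distance_m = min_distance_m
        self.last_sent = None  # (lat, lon) of the last update actually sent

    def should_send(self, lat, lon):
        if (self.last_sent is None
                or haversine_m(*self.last_sent, lat, lon) >= self.min_distance_m):
            self.last_sent = (lat, lon)
            return True
        return False
```

The first fix is always sent; after that, a move of roughly 100 m is dropped and a move of a couple of kilometres goes through, which should cut the request count considerably for users who are mostly stationary.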

How can I uniquely identify a computer

I would like to develop an application that can connect to a server and uniquely identify clients, then give them permission to run a specific query on the server's database.
How can I identify clients in a unique way? Is the MAC address reliable enough, or should I use something like the CPU ID or something else?
Clarification: I do not want to create a registration code for my app, as it's supposed to be a free application. I want to identify each client by an ID and decide which ones have permission to run a specific method on the server.
The usual approach is to give each client a login (name + password). That way, it's easy to replace clients when they need an upgrade or when they fail.
MAC address should be unique but there is no central registry which enforces this rule. There are also tools to change it, so it's only somewhat reliable.
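To illustrate how soft this identifier is, here is a small Python sketch that reads the MAC address through the standard library's uuid.getnode(). According to the Python documentation, when no hardware address can be found, getnode() returns a random 48-bit number with the multicast bit set, so the value may not be a real MAC at all. The helper name is made up for the example.

```python
import uuid


def mac_address_info():
    """Return (mac_string, looks_random) for the value reported by uuid.getnode()."""
    node = uuid.getnode()  # 48-bit integer
    first_octet = (node >> 40) & 0xFF
    # Multicast bit set means Python may have fallen back to a random number.
    looks_random = bool(first_octet & 0x01)
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    return mac, looks_random
```

Even when the flag is false, the address can still have been changed by the user, so treat it as a hint and not as proof of identity.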
CPU and HD IDs are harder to change, but people will come complaining when their hard disk dies or when they upgrade their system.
Many PCs have TPM modules which have their own IDs, but they can be disabled and the IDs can be wiped. Also, there are privacy issues (people don't like it when software automatically tracks them).
Another problem with an automated ID approach is how to identify clients on the server. When several clients connect for the first time in quick succession, you will have trouble telling them apart.
This question appears to have already been asked and answered in detail (although, you may not like the answers, since they appear to add up to: it's problematic.) I agree with Xefan's comment that more details would help define your question. Here's a link to earlier discussion on this:
What is a good unique PC identifier?

getting started with Single Sign On / Windows Authentication

First off, The Problem:
We have a Web App with a Flash front-end that talks to our ASP.NET web service via SOAP which then deals with all of our server side code (C#).
Right now, we implement a simple user sign on in our application, storing the info in our MSSQL DB.
A client has requested what I understand to be Windows authentication through our application using the currently logged in user.
So, I have been tasked with investigating this. Nobody, including myself, has any experience in this area.
I have been reading up on some basic Active Directory information, and some simple tutorials. I understand how to get access to the directory using ADSI through code. What I'm really interested in seeing is how the entire thing should be architected. I don't want to throw together a hacky solution.
Does anyone know of a good tutorial for this kind of thing or have any advice on getting started? More importantly, does this even sound viable?
I know I haven't given much information, but feel free to ask and I will provide answers.
Thanks.
Edit:
Will, to give you an idea of the scope of this, the network will include every computer in a large hospital. So yes, this is huge. Clearly I need to start small. I would like to come up with something that will work at my office first. Maybe ~10 Windows computers on a single domain. One Domain Controller.
I am also open to any good books on the subject.
If you are going to tie into Active Directory you will want to take a look at the System.DirectoryServices namespace. The implementations can vary wildly depending on your system architecture, but this should give you a good starting point.
Enjoy!

Core Data cloud sync - need help with logic

I'm in the middle of brainstorming a cloud sync solution for a Core Data app that I am currently developing. I'm planning to open source the code for this once it's done, for anyone to use with their Core Data apps, so input from the community on how this system should work is much appreciated :-) Here's what I'm thinking:
Server Side
Storage Provider
As with all cloud sync systems, storage is a major piece of the puzzle. There are many ways to handle this. I could set up my own server for storage, or use a service like Amazon S3, but because I'm starting out with $0 capital, at this moment, a paid storage solution isn't a viable option. After some thought, I decided to settle with Dropbox (an already well established cloud sync application and storage provider). The pros of using Dropbox are:
It's free (for a limited amount of space)
In addition to being a storage service, it also handles cloud sync
They recently released an Objective-C SDK which makes it much easier to interface with it in Mac and iPhone apps
In case I decide to switch to a different storage provider in the future, I intend to add "services" to this cloud sync framework, basically allowing anyone to create a service class to interface with their choice of storage provider, which can then simply be plugged into the framework.
Storage Structure
This is a really difficult part to figure out, so I need as much input as I can here. I've been thinking about a structure like this:
CloudSyncFramework
======> [app name]
==========> devices
=============> (device id)
================> deviceinfo
================> changeset
==========> entities
=============> (entity name)
================> (object id)
A quick explanation of this structure:
The master "CloudSyncFramework" (name undecided) folder will contain separate folders for each app that uses the framework
Each app folder contains a devices folder and an entities folder
The devices folder will contain a folder for each device that is registered with the account. The device folder will be named according to the device ID, obtained using something like [[UIDevice currentDevice] uniqueIdentifier] (on iOS) or a serial number (on Mac OS).
Each device folder contains two files: deviceinfo and changeset. deviceinfo contains information about the device (e.g. OS version, last sync date, model, etc.) and the changeset file contains information about objects that have changed since the device last synchronized. Both files will just be simple NSDictionaries archived into files using NSKeyedArchiver.
Each Core Data entity has a subfolder under the entities folder
Under each entity folder, every object that belongs to that entity will have a separate file. This file will contain a JSON dictionary with the key-value pairs.
Simultaneous Sync
This is one of the areas where I am almost completely clueless. How would I handle 2 devices connecting and syncing with the cloud at the same time? There seems to be a high risk of things getting out of sync here, or even data corruption.
Handling migrations
Once again, another clueless area here. How would I handle migrations of the Core Data managed object model? The easiest thing to do here seems to be just to wipe the cloud data store clean and upload a new copy of the data from a device which has undergone the migration process, but this seems somewhat risky, and there may be a better way.
Client Side
Converting NSManagedObjects into JSON
Converting attributes into JSON isn't a very hard task (there's lots of code for it floating around the web). Relationships are the key problem here. In this Stack Overflow post, Marcus Zarra posts code in which the relationship objects themselves are added to the JSON dictionary. However, he mentions that this can cause an infinite loop depending on the structure of the model, and I'm not sure if this would work with my method, because I store each object as an individual file.
I've been trying to find a way to get an ID as a string for an NSManagedObject. Then I could save relationships in JSON as an array of IDs. The closest thing I found was [[managedObject objectID] URIRepresentation], but this isn't really an ID for an object; it's more of a location for the object in the persistent store, and I don't know if it's concrete enough to use as a reference for an object.
I suppose I could generate a UUID string for each object and save it as an attribute, but I'm open for suggestions.
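To make the UUID idea concrete, here is a language-neutral sketch in Python (not Objective-C, and not tied to Core Data). Each record gets a UUID string, and relationships are stored as arrays of those IDs instead of nested objects, so circular relationships cannot cause an infinite loop. The record layout ("id", "entity", "attributes", "relationships") is an assumption made up for this example.

```python
import json
import uuid


def new_record(entity, attributes):
    """Create a record with a freshly generated UUID string as its identifier."""
    return {
        "id": str(uuid.uuid4()),
        "entity": entity,
        "attributes": dict(attributes),
        "relationships": {},  # relationship name -> list of target IDs
    }


def relate(source, name, target):
    """Record a relationship by storing only the target's ID, never the target itself."""
    source["relationships"].setdefault(name, []).append(target["id"])


def to_json(record):
    """Serialize one record; this would be the content of its per-object file."""
    return json.dumps(record, sort_keys=True)
```

An author and a book can point at each other ("books" on one side, "author" on the other) and both still serialize to small, flat files.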
Syncing changes to the cloud
The first (and still best) solution that popped into my head for this was to listen for the NSManagedObjectContextObjectsDidChangeNotification to get a list of changed objects, then update/delete/insert those objects in the cloud data store. After the changes have been saved, I would need to update the changeset file for every other registered device to reflect the newly changed objects.
One problem that comes up here is how I would handle a failed or interrupted sync. One idea I have is to first push changes to a temporary directory on the cloud and then, once that has been confirmed as successful, merge it with the master data on the cloud, so that an interruption in the middle of the sync won't corrupt data. I would also save records of the objects that still need to be updated in the cloud into a plist file or something similar, to be pushed the next time the app is connected to the internet.
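The "write to a temporary location, then swap it in" idea can be sketched like this in Python, using the local filesystem as a stand-in for the cloud store. Whether Dropbox (or any other provider) offers an equally atomic move is an assumption that would need checking; the function name and the one-JSON-file-per-object layout are made up for illustration.

```python
import json
import os
import tempfile


def stage_and_commit(store_dir, object_id, payload):
    """Write payload to a temp file in store_dir, then atomically move it into place."""
    os.makedirs(store_dir, exist_ok=True)
    final_path = os.path.join(store_dir, f"{object_id}.json")
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, final_path)  # atomic on the same filesystem
    except BaseException:
        # Any failure leaves the previous version of final_path untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

If the write fails halfway, readers still see the old, complete file; they never see a half-written one.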
Retrieving changed objects
This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly.
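As a rough sketch of what "acts accordingly" could look like, here it is in Python with plain dictionaries standing in for the local store. The changeset layout ("inserted", "updated", "deleted") is a hypothetical format invented for this example, not something the framework defines yet.

```python
def apply_changeset(local_store, changeset):
    """Apply inserts, then updates, then deletes from a changeset to a dict-based store."""
    for object_id, record in changeset.get("inserted", {}).items():
        local_store[object_id] = dict(record)
    for object_id, fields in changeset.get("updated", {}).items():
        # Updates for objects this device has never seen are skipped in this sketch.
        if object_id in local_store:
            local_store[object_id].update(fields)
    for object_id in changeset.get("deleted", []):
        local_store.pop(object_id, None)
    return local_store
```

Applying inserts before updates and deletes last is a design choice for this sketch; a real implementation would also have to decide what to do with updates to unknown objects.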
And that sums up my thoughts for the logic that this system will use :-) Any insight, suggestions, answers to problems, etc. is greatly appreciated.
UPDATE
After lots of thinking, and reading TechZen's suggestions, I have come up with some modifications to my concept.
The largest change I've thought up is to make each device have a separate data store in the cloud. Basically, every time the managed object context saves (thanks TechZen), it will upload the changes to that device's data store. After those changes are uploaded, it will create a "changeset" file with change details and save it into the changeset folders of the OTHER devices that are using the application. When the other devices connect to sync, they will go through the changeset folder and apply each changeset to the local data store, then update their respective data stores in the cloud as well.
Now, if a new device is registered with the account, it will find the newest copy of the data out of all the devices and download that for use as its local storage. This solves the problem of simultaneous sync and reduces the chances of data corruption because there is no "central" data store: each device touches only its own data and just uploads changes, rather than every device accessing and modifying the same data at the same time.
There are some obvious conflict situations to deal with, mainly in relation to deleting objects. If a downloaded changeset instructs the app to delete an object that is currently being edited, for example, there need to be ways to deal with this.
You want to look at this pessimistic take on cloud sync: Why Cloud Sync Will Never Work.
It covers a lot of the issues that you are wrestling with. Many of them are largely intractable.
It is very, very, very difficult to synchronize information, period. Adding in different devices, different operating systems, different data structures, etc. snowballs the complexity, often fatally. People have been working on variants of this problem since the 70s and things really haven't improved much.
The fundamental problem is that if you leave the system flexible and customizable, then the complexity of synchronizing all the variations explodes exponentially as a function of the number of customizations. If you make it rigid, you can sync but you are limited in what you can sync.
How would I handle 2 devices connecting and syncing with the cloud at the same time?
If you figure that out, you will be rich. It's a big issue for current cloud sync providers. The real problem here is that you're not "syncing", you're merging. Software sucks at merging because it's very hard to establish a predefined rule set to describe all the possible merges.
The simplest system is to establish either a canonical device or a device hierarchy such that the system always knows which input to choose. This however, destroys flexibility.
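A minimal sketch of the device-hierarchy rule, in Python for brevity: when several devices have changed the same object, the version from the highest-ranked device wins. The function name, the device names and the shape of a "version" are all invented for the example.

```python
def resolve_conflict(versions, device_priority):
    """Return the version from the highest-priority device (earliest in device_priority)."""
    rank = {device: index for index, device in enumerate(device_priority)}
    known = [version for version in versions if version["device"] in rank]
    if not known:
        raise ValueError("no version comes from a device in the hierarchy")
    return min(known, key=lambda version: rank[version["device"]])
```

With a priority list of ["mac", "ipad", "iphone"], an edit from the iPad always beats a conflicting edit from the iPhone, no matter which one happened later. That is exactly the loss of flexibility mentioned above.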
How would I handle migrations of the Core Data managed object model?
The migration of the Core Data model is largely irrelevant to the server. That's something that Core Data manages internally to itself. Model migration updates the model i.e. the entity graph, not the actual data.
Converting NSManagedObjects into JSON
Modeling relationships is hard, especially with tools that don't support it as easily as Core Data does. However, the URI of a permanent managed object ID is supposed to serve as a UUID that nails the object down to a specific location in a specific store on a specific device. It's not technically guaranteed to be universally unique, but it's close enough for all practical purposes.
Syncing changes to the cloud
I think you're confusing implementation details of Core Data with the cloud itself. If you use NSManagedObjectContextObjectsDidChangeNotification, you will trigger network traffic every time the observed context changes, regardless of whether those changes are persisted or not. Depending on the app, this could open connections thousands of times in a few minutes. Instead, you want to sync only when the context is saved, at the most.
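The difference can be sketched like this (Python, not Core Data code; the class and method names are invented): change notifications only update an in-memory buffer, and a single upload happens when the context is saved.

```python
class SaveTriggeredSync:
    """Buffer per-object changes and upload them as one batch on save."""

    def __init__(self, upload):
        self.upload = upload  # callable taking a dict of {object_id: values}
        self.pending = {}

    def object_changed(self, object_id, values):
        # Called on every change; later changes to the same object overwrite earlier ones.
        self.pending[object_id] = values

    def context_saved(self):
        # Called once per save; returns how many objects were uploaded.
        if not self.pending:
            return 0
        batch, self.pending = self.pending, {}
        self.upload(batch)
        return len(batch)
```

Three edits to the same object followed by one save result in a single upload carrying only the final values.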
One problem that comes up here is, how would I handle a failed or interrupted sync?
You don't commit changes until the sync completes. This is a big problem and leads to corrupt data. Again, you can have flexibility, complexity and fragility or inflexibility, simplicity and robustness.
Retrieving changed objects: This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly
It's only simple if you have an inflexible data structure. Describing changes to a flexible data structure is a nightmare.
Not sure if I have helped any. None of the problems have elegant solutions. Most designers end up with rigidity and/or slow, brute-force iterative merging.
Take a serious look at RestKit.
It is an open source project that aims to help with integrating iOS apps with cloud data, including but not limited to the scenario where there is a core-data model for that data on the client.
I have recently started to use it in one of my projects, and found it to be quite useful. In the core-data scenario, you implement declarative mappings between your data model and the content you GET from and POST to the server, and it takes care of things like injecting objects from the cloud into your client model, posting new objects to the server and incorporating server-generated objects IDs into your client-side model, doing all of this in a background thread and taking care of all the core-data context threading issues and so on.
RestKit is by no means a mature product, but it has a fairly good foundation and quite a few areas that could use help from other contributors. Especially if your goal is to create an open source solution, it would be great to contribute to and improve something like this rather than invent a new solution. Unless, of course, you see serious differences between what you have in mind and other existing solutions :-)
Since this post was current, there are several new options available. It is possible to develop a solution, and there are apps shipping with these solutions.
Here is a short list of the main Core Data sync options:
Apple's native Core Data/iCloud sync. (Had a rocky start. Seems better now.)
TICDS
Wasabi Sync, a paid service.
Simperium (Seems abandoned.)
ParcelKit with Dropbox Datastore API
Ensembles, the most recent. (Disclosure: I am the founder of the project)
It's like Apple answered my question for me with the announcement of the iCloud SDKs, which come complete with Core Data integration. Win!
