Bootstrapping complex data sets - ruby

The application I am working on has "Default lists" one of which is already created in the app currently. The list has events and events touch 2-3 other models. Which would make seeding, etc very time consuming due to the complexity of the lists and the associated models the list has data in
Due to the complexity of the lists I would prefer to build the lists though the UI and then extracting it for later use.
Is there any worthwhile way of extracting the aforementioned list object and for lack of a better term "bootstrap it"
Thanks for your help in advance.

I think what you are trying to get at is seed data. Take a look at this railscats on just that.

Solution: https://github.com/rhalff/seed_dump
I highly enjoy the comments on the github page
It mainly exists for people who are too lazy writing create statements
in db/seeds.rb themselves and need something (seed_dump) to dump data
from the table(s) into seeds.rb
My response to that is "work smart, not hard" no need for me to spend a day or 2 writing out long seeds instead of doing actual work.
Unless i'm hungover then i'll just pretend seed_dump is on the fritz ;)

Related

Making sure my Go page view counter isn't abused

I believe I have a found a very good and fast solution for efficiently counting page views:
Working example in go playground here: https://play.golang.org/p/q_mYEYLa1h
My idea is to push this to the database every X minutes, and after pushing a key then delete it from the page map.
My question now is, what would be the optimal way to ensure that this isn't abused? Ideally, I would only want to increase page count from the same person if there was a time interval of 2 hours since last visiting the page.
As far as I know, it would be ideal to store and compare both IP and user agent (I don't want to rely on cookie/localstorage), but I'm not quite sure how to efficiently store and compare this information.
I'd likely get both the IP (req.Header.Get("x-forwarded-for")) and UserAgent (req.UserAgent()) from http.Request.
I was thinking making a visitor struct similar to my page struct that would look like this:
type visitor struct {
mutex sync.Mutex
urlIPUAAndTime map[string]time
}
This way should make it possible to do something similar to before. However, imagine if the website had so many requests that there would be hundreds of millions of unique visitor maps being stored, and each of these could only be deleted after 2 (or more) hours. I therefore think this is not a good solution.
I guess it would be ideal/necessary to write to and read from some file, but not sure how this should be done efficiently. Help would be greatly appreciated
One of optimization ways is to add a Bloom filter before this map. Bloom filter is a probabilistic structure which can say one of these:
this user is definitely new
and this user possibly was here
This is a way to cut off computation on early stage. If many of your users are new then you save requests to database to check all of them.
What if structure says "user is possibly non-unique"? Then you go the database and check it.
Here's one more optimization: if you do not need very accurate information and can agree with mistake about several percent, you may use the sole bloom filter. I guess many large sites use this technique for estimation.

Survey Monkey results directly into DB

This may be a question for Survey Monkey, but I felt that someone here may have encountered something like this in past experiences. Is there a way to work with the API of Survey Monkey (SM), to add the information from the survey straight into a database of my own? I realize that I can generate the information into output files, but I was wondering if there was a way to directly access the information from the SM database. I feel like this might cause some privacy concerns for SM. Has anyone attempted this, or would the best option of mine be to create my own surveys without a third party website?
I had a similar issue and here's my solution.
I was doing health related surveys which contain HIPPA protected Personal Health Info. Zapier is NOT HIPAA safe, so the "zap the results over to Google Drive" solution didn't work.
So I wanted a quick n dirty way to grab SM survey data and begin to design a data structure to analyse and store this data. I figured that I would start with <1000 results, sort it out, then build out a bigger/fancier structure as needed.
I just downloaded CSV's of the SM individual responses, munged the downloaded CSV files to make a Python CSV reader happy, then wrote a Python 3.5 script to grab the survey data and spit it out into a couple of output CSV files designed for different analytic purposes.
It was really quick and easy to alter the Python script to deliver different subsets of data to different output files, and really quick and easy to see if these output (CSV or XLS) files really told me what I wanted to know.
This is a really quick and easy way to start analysing right away without spending too much time on procedural overhead. You can alter CSV (or XLS ) tables really quickly and easily, so you can mix and match data / derivative data as much as you want. A wise person once told me "don't think, do." So the more you analyse on small runs of data, the better your final Big Buildout In The Sky will look.
Yah, you can spend a lot of time writing and API and setting up a dbase, but if you are not completely happy with what you want out of the SM data, start small. Hope this helps.

Organizing memcache keys

Im trying to find a good way to handle memcache keys for storing, retrieving and updating data to/from the cache layer in a more civilized way.
Found this pattern, which looks great, but how do I turn it into a functional part of a PHP application?
The Identity Map pattern: http://martinfowler.com/eaaCatalog/identityMap.html
Thanks!
Update: I have been told about the modified memcache (memcache-tag) that apparently does do a lot of this, but I can't install linux software on my windows development box...
Well, memcache use IS an identity map pattern. You check your cache, then you hit your database (or whatever else you're using). You can go about finding information about the source by storing objects instead of just values, but you'll take a performance hit for that.
You effectively cannot ask the cache what it contains as a list. To mass invalidate, you'll have to keep a list of what you put in and iterate it, or you'll have to iterate every possible key that could fit the pattern of concern. The resource you point out, memcache-tag can simplify this, but it doesn't appear to be maintained inline with the memcache project.
So your options now are iterative deletes, or totally flushing everything that is cached. Thus, I propose a design consideration is the question that you should be asking. In order to get a useful answer for you, I query thus: why do you want to do this?

What is the name of this anti-pattern?

Surely some of you have dealt with this one. It tends to happen when programmers get a bit too taken by OO and forget about performance and having a database.
For an example, lets say we have an Email table and they need to be sent by this program. At start-up, it looks for anything that needs to be sent as follows:
Emails = find_every_damn_email_in_the_database();
FOR Email in Emails
IF !Email.IsSent() THEN Email.Send()
This is a good from a do-not-repeat-yourself perspective, but sometimes it's unavoidable and it should be:
Emails = find_unsent_emails();
FOR Email in Emails
Email.Send()
Is there a name of this one?
I'll have a go at it and coin the name "the lazy filter (anti) pattern".
I saw that once. That programmer wasn't around too long.
We called that the "firehose method".
To me it's Joel Spolsky's leaky abstraction.
It's not exactly an anti-pattern, but whoever wrote this code, didn't really understand where Active Record pattern abstraction leaks.
I call that "The Shotgun Approach".
I'm not sure this is necessarily database related, since you could have a complex and expensive procedure (e.g., more than a flag) for applying a filter for a group.
I don't think there's a name to it, since the first design is simply not good, and it violates the one-responsibility-only principle. If you search, filter, and print the filtered you are doing multiple things, so you need to refactor it into "searched filtered" and print.
The only thing different than a simple refactoring here is that it also affects performance, in the same way that inner loops can be designed in ways that harm performance.
Appear to have derived from the following anti-patterns:
Standing On The Shoulders Of Midgets
If It Is Working Dont Change
The original developer would have possibly not been allowed to write the find_unsent_emails() implementation, and would therefore have reused the midget function. And then, why change it after development and testing?
This is frequently due to it being a lot easier to use an existing query and then filtering in code than getting a new SQL query added. Maybe because the DBAs control all queries and getting a new query approved takes days, or maybe because the ORM tool you're using makes it very difficult to define your own custom queries.
If I were to name it I'd call it the "Easy Way Out" (anti)pattern. Whether it's an antipattern or not really depends on the individual situation. If it will always be a fairly small number of items you need to retrieve, doing the filtering in code really isn't a big problem. But if the number of items is large and has the potential to continually grow, then obviously the filtering should be done on the server.
I've seen similar issues elsewhere, where instead of a simple array of things to do, there was a "transaction cluster" based on a "list cluster" based on a "collection cluster" based on a "memory cluster". Needless to say, the simplest thing turned into a great big freakin' deal.
I called it galloping generality.
Stoopid Amateurs.
Seriously, I've only seen this one in people with Computer Science degrees and no professional experience at all. When I was teaching at Duke, my advisor and I ran a "Large Scale Programming" class where we made people look at exactly these sorts of errors.
The performance of the first one can actually be fine, depending on the type of Emails. If it's just an iterator (think of std::vector::begin() in C++) then it's fine and better than storing all unsent e-mails in some container first.
This antipattern has several possible names.
"Don't-know-SQL" antipattern
"Fascist-DBA" antipattern
"What-does-'latency'-mean?" antipattern
There is a nice example at The Daily WTF.
Inspired partly by 1800's "the lazy filter (anti) pattern", how about "dysfunctional programming" (ie the opposite of functional programming)?

Generating UI from DB - the good, the bad and the ugly?

I've read a statement somewhere that generating UI automatically from DB layout (or business objects, or whatever other business layer) is a bad idea. I can also imagine a few good challenges that one would have to face in order to make something like this.
However I have not seen (nor could find) any examples of people attempting it. Thus I'm wondering - is it really that bad? It's definately not easy, but can it be done with any measure success? What are the major obstacles? It would be great to see some examples of successes and failures.
To clarify - with "generating UI automatically" I mean that the all forms with all their controls are generated completely automatically (at runtime or compile time), based perhaps on some hints in metadata on how the data should be represented. This is in contrast to designing forms by hand (as most people do).
Added: Found this somewhat related question
Added 2: OK, it seems that one way this can get pretty fair results is if enough presentation-related metadata is available. For this approach, how much would be "enough", and would it be any less work than designing the form manually? Does it also provide greater flexibility for future changes?
We had a project which would generate the database tables/stored proc as well as the UI from business classes. It was done in .NET and we used a lot of Custom Attributes on the classes and properties to make it behave how we wanted it to. It worked great though and if you manage to follow your design you can create customizations of your software really easily. We also did have a way of putting in "custom" user controls for some very exceptional cases.
All in all it worked out well for us. Unfortunately it is a sold banking product and there is no available source.
it's ok for something tiny where all you need is a utilitarian method to get the data in.
for anything resembling a real application though, it's a terrible idea. what makes for a good UI is the humanisation factor, the bits you tweak to ensure that this machine reacts well to a person's touch.
you just can't get that when your interface is generated mechanically.... well maybe with something approaching AI. :)
edit - to clarify: UI generated from code/db is fine as a starting point, it's just a rubbish end point.
hey this is not difficult to achieve at all and its not a bad idea at all. it all depends on your project needs. a lot of software products (mind you not projects but products) depend upon this model - so they dont have to rewrite their code / ui logic for different client needs. clients can customize their ui the way they want to using a designer form in the admin system
i have used xml for preserving meta data for this sort of stuff. some of the attributes which i saved for every field were:
friendlyname (label caption)
haspredefinedvalues (yes for drop
down list / multi check box list)
multiselect (if yes then check box
list, if no then drop down list)
datatype
maxlength
required
minvalue
maxvalue
regularexpression
enabled (to show or not to show)
sortkey (order on the web form)
regarding positioning - i did not care much and simply generate table tr td tags 1 below the other - however if you want to implement this as well, you can have 1 more attribute called CssClass where you can define ui specific properties (look and feel, positioning, etc) here
UPDATE: also note a lot of ecommerce products follow this kind of dynamic ui when you want to enter product information - as their clients can be selling everything under the sun from furniture to sex toys ;-) so instead of rewriting their code for every different industry they simply let their clients enter meta data for product attributes via an admin form :-)
i would also recommend you to look at Entity-attribute-value model - it has its own pros and cons but i feel it can be used quite well with your requirements.
In my Opinion there some things you should think about:
Does the customer need a function to customize his UI?
Are there a lot of different attributes or elements?
Is the effort of creating such an "rendering engine" worth it?
Okay, i think that its pretty obvious why you should think about these. It really depends on your project if that kind of model makes sense...
If you want to create some a lot of forms that can be customized at runtime then this model could be pretty uselful. Also, if you need to do a lot of smaller tools and you use this as some kind of "engine" then this effort could be worth it because you can save a lot of time.
With that kind of "rendering engine" you could automatically add error reportings, check the values or add other things that are always build up with the same pattern. But if you have too many of this things, elements or attributes then the performance can go down rapidly.
Another things that becomes interesting in bigger projects is, that changes that have to occur in each form just have to be made in the engine, not in each form. This could save A LOT of time if there is a bug in the finished application.
In our company we use a similar model for an interface generator between cash-software (right now i cant remember the right word for it...) and our application, just that it doesnt create an UI, but an output file for one of the applications.
We use XML to define the structure and how the values need to be converted and so on..
I would say that in most cases the data is not suitable for UI generation. That's why you almost always put a a layer of logic in between to interpret the DB information to the user. Another thing is that when you generate the UI from DB you will end up displaying the inner workings of the system, something that you normally don't want to do.
But it depends on where the DB came from. If it was created to exactly reflect what the users goals of the system is. If the users mental model of what the application should help them with is stored in the DB. Then it might just work. But then you have to start at the users end. If not I suggest you don't go that way.
Can you look on your problem from application architecture perspective? I see you as another database terrorist – trying to solve all by writing stored procedures. Why having UI at all? Try do it in DB script. In effect of such approach – on what composite system you will end up? When system serves different businesses – try modularization, selectively discovered components, restrict sharing references. UI shall be replaceable, independent from business layer. When storing so much data in DB – there is hard dependency of UI – system becomes monolith. How you implement MVVM pattern in scenario when UI is generated? Designers like Blend are containing lots of features, which cannot be replaced by most futuristic UI generator – unless – your development platform is Notepad only.
There is a hybrid approach where forms and all are described in a database to ensure consistency server side, which is then compiled to ensure efficiency client side on deploy.
A real-life example is the enterprise software MS Dynamics AX.
It has a 'Data' database and a 'Model' database.
The 'Model' stores forms, classes, jobs and every artefact the application needs to run.
Deploying the new software structure used to be to dump the model database and initiate a CIL compile (CIL for common intermediate language, something used by Microsoft in .net)
This way is suitable for enterprise-wide software and can handle large customizations. But keep in mind that this approach sets a framework that should be well understood by whoever gonna maintain and customize the application later.
I did this (in PHP / MySQL) to automatically generate sections of a CMS that I was building for a client. It worked OK my main problem was that the code that generates the forms became very opaque and difficult to understand therefore difficult to reuse and modify so I did not reuse it.
Note that the tables followed strict conventions such as naming, etc. which made it possible for the UI to expect particular columns and infer information about the naming of the columns and tables. There is a need for meta information to help the UI display the data.
Generally it can work however the thing is if your UI just mirrors the database then maybe there is lots of room to improve. A good UI should do much more than mirror a database, it should be built around human interaction patterns and preferences, not around the database structure.
So basically if you want to be cheap and do a quick-and-dirty interface which mirrors your DB then go for it. The main challenge would be to find good quality code that can do this or write it yourself.
From my perspective, it was always a problem to change edit forms when a very simple change was needed in a table structure.
I always had the feeling we have to spend too much time on rewriting the CRUD forms instead of developing the useful stuff, like processing / reporting / analyzing data, giving alerts for decisions etc...
For this reason, I made long time ago a code generator. So, it become easier to re-generate the forms with a simple restriction: to keep the CSS classes names. Simply like this!
UI was always based on a very "standard" code, controlled by a custom CSS.
Whenever I needed to change database structure, so update an edit form, I had to re-generate the code and redeploy.
One disadvantage I noticed was about the changes (customizations, improvements etc.) done on the previous generated code, which are lost when you re-generate it.
But anyway, the advantage of having a lot of work done by the code-generator was great!
I initially did it for the 2000s Microsoft ASP (Active Server Pages) & Microsoft SQL Server... so, when that technology was replaced by .NET, my code-generator become obsoleted.
I made something similar for PHP but I never finished it...
Anyway, from small experiments I found that generating code ON THE FLY can be way more helpful (and this approach does not exclude the SAVED generated code): no worries about changing database etc.
So, the next step was to create something that I am very proud to show here, and I think it is one nice resolution for the issue raised in this thread.
I would start with applicable use cases: https://data-seed.tech/usecases.php.
I worked to add details on how to use, but if something is still missing please let me know here!
You can change database structure, and with no line of code you can start edit data, and more like this, you have available an API for CRUD operations.
I am still a fan of the "code-generator" approach, and I think it is just a flavor of using XML/XSLT that I used for DATA-SEED. I plan to add code-generator functionalities.

Resources