So let's say in a Service-Oriented Architecture, you have 3 layers:
Layer 3: The Web/External Layer - what the user sees
Layer 2: Application Logic - generates layer 3; handles users, sessions, forms, etc.
Layer 1: Internal API - your data, and how to access it
Now, layers 1 and 2 live on the same network, so latency is the least of our worries. Essentially, layer 2 consumes data from layer 1 using REST. I was thinking about alternatives for how that data could be consumed.
What are the PROS and CONS of making layers 1 and 2 communicate over WebSockets instead of REST?
Assume you have multiple servers and multiple layer 2 applications.
This question is purely out of curiosity.
There is an old discussion on RESTful HTTP vs. WebSockets. I like to think of them as being different. In general, WebSockets will give you finer control, and with that comes perhaps more efficiency: imagine, say, defining your own protocol. The downside is that you will have a less standard approach. REST is less flexible but more standard and more loosely coupled.
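To make that concrete, here is a minimal Python sketch of the two consumption styles; the endpoint names and the WebSocket message format are invented for illustration:

```python
import asyncio
import json

import requests          # REST: standard HTTP round trips
import websockets        # WebSockets: a persistent channel, your own protocol

def fetch_user_rest(user_id):
    # One request, one response; caching, retries, and tooling come for free.
    return requests.get(f"http://internal-api/users/{user_id}").json()

async def fetch_user_ws(user_id):
    # Same data over a long-lived socket; the message format is up to you.
    async with websockets.connect("ws://internal-api/feed") as ws:
        await ws.send(json.dumps({"op": "get_user", "id": user_id}))
        return json.loads(await ws.recv())

if __name__ == "__main__":
    print(fetch_user_rest(42))
    print(asyncio.run(fetch_user_ws(42)))
```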
Stefan Tilkov summarized it pretty well in his blog post. There is also a related discussion here.
Given the following processes:
manually transforming huge .csv files via rules (using MS Excel or Excel-like software) and sharing them via FTP
scripts (usually written in Perl or Python) that basically transform data, preparing it for other processes
APIs that batch-read from files or other origin sources and update their corresponding data models
Spring Boot deployments used (or abused) to, in part, regularly collect and aggregate data from files or other sources
And given these problems / areas for improvement:
Standardization: I'd like (as far as it makes sense) to propose a unified, powerful tool that natively handles these kinds of (fairly big) data transformation workflows.
Raising the abstraction level of the processes (related to the point above): many of the "tasks/jobs" mentioned above are seen by the teams using them in a very technical, low-level, task-like way. I believe that a higher-level view of these processes/flows, highlighting their business meaning, would help document them better, and would also help establish a ubiquitous language that different stakeholders can use to think and talk about them unambiguously.
IO bottlenecks and resource utilization (technical): some of these processes fail more often than is desirable (or take a very long time to finish) due to memory or network bottlenecks. Though hardware clearly has its limits, resource utilization doesn't seem to have been a priority in many of these data transformation scripts.
Do the Dataflow model, and specifically the Apache Beam implementation paired with either Flink or Google Cloud Dataflow as the backend runner, offer a proven solution to these "mundane" topics? The material on the internet mainly focuses on the unified streaming/batch model, and typically covers more advanced features like streaming, event windowing, watermarks, late events, etc., which do look very elegant and promising indeed, but I have some concerns about tool maturity and long-term community support.
It's hard to give a concrete answer to such a broad question, but I would say that, yes, Beam/Dataflow is a tool that can handle this kind of thing. Even though the documentation focuses on "advanced" features like windowing and streaming, lots of people use it for more "mundane" ETL. For questions about tool maturity and community, you could consult sources like Forrester reports, which often cover Dataflow.
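As a rough illustration (the file names and the per-row rule here are hypothetical), a "mundane" batch job in the Beam Python SDK can be as small as:

```python
import apache_beam as beam

def transform_row(line):
    # Stand-in for the rules previously applied by hand in Excel.
    fields = line.split(",")
    return ",".join(f.strip().upper() for f in fields)

# The runner (DirectRunner, Flink, Dataflow) is chosen via pipeline options,
# so the same code covers local runs and the managed services.
with beam.Pipeline() as p:
    (p
     | "Read" >> beam.io.ReadFromText("input.csv", skip_header_lines=1)
     | "Transform" >> beam.Map(transform_row)
     | "Write" >> beam.io.WriteToText("output", file_name_suffix=".csv"))
```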
You may also want to consider pairing it with other technologies like Airflow/Composer.
I've read a lot about using GraphQL as an API gateway for the front end, sitting in front of the microservices.
But I wonder whether all of GraphQL's advantages over REST aren't just as relevant to communication between the microservices themselves.
Any input, pros/cons, and successful usage examples would be appreciated.
Key notes to consider:
GraphQL isn't a magic bullet, nor is it "better" than REST. It is just different.
You can definitely use both at the same time, so it is not either/or.
Depending on the specific use, GraphQL (or REST) can fall anywhere on the scale from great to horrible.
GraphQL and REST aren't exact substitutes:
GraphQL is a query language, specification, and collection of tools, designed to operate over a single endpoint via HTTP, optimizing for performance and flexibility.
REST is an architectural style / approach for general communication, utilizing the uniform interface of the protocols it exists within.
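To make the contrast concrete, here is a rough sketch of the two consumption styles from a client's point of view (the service URL, endpoint paths, and field names are made up for illustration):

```python
import requests

# REST: the server defines the response shape per resource.
order = requests.get("http://orders-service/orders/42").json()

# GraphQL: one endpoint; the client declares exactly the fields it wants.
query = """
{
  order(id: 42) {
    id
    status
    customer { name }
  }
}
"""
result = requests.post("http://orders-service/graphql",
                       json={"query": query}).json()
```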
Some reasons to avoid making GraphQL the common way of communicating between microservices:
GraphQL is mainly useful when the client needs a flexible response that it can control without requiring changes to the server's code.
When you grant the client service control over the data that comes back, you risk exposing too much data, compromising the encapsulation of the serving service. This is a long-term risk to the system's maintainability and ability to change.
Between microservices, latency is far less of an issue than between client and server, and the same goes for GraphQL's aggregation capabilities.
A uniform interface is really useful when you have many services, but GraphQL may be counter-productive for that cause.
The flexible queries GraphQL allows can be more challenging to optimize for performance.
Updating a hierarchy of objects at once (GraphQL's natural structure) may add complexity around atomicity, idempotency, error reporting, etc.
To recap:
GraphQL can be really great for server-to-server communication, but most likely it will be a good fit in only a small percentage of use cases.
Do you have a use case for an API gateway between services? Maybe that is the question you should ask yourself. GraphQL is just a (popular) tool.
As always, it is best to match the tool to the problem.
I don't have experience with using GraphQL in a microservices environment, but I'm inclined to think it's not the greatest fit for microservices.
To add a little more color to @Lior Bar-On's answer: GraphQL is more of a query language and is more dynamic in nature. It is often used to aggregate data sets in response to a single request, which in a microservices environment will potentially require many requests to many services. At the same time, it adds the complexity of having to translate the gathering of information from the respective sources of that information (other microservices). Of course, how complex depends on how micro your services are and what queries you intend to support.
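As a rough sketch of that fan-out (the service URLs and field names are hypothetical), a single GraphQL query for an order and its customer could force the gateway to call two services and stitch the results together itself:

```python
import requests

def resolve_order_with_customer(order_id):
    # One incoming GraphQL query...
    order = requests.get(
        f"http://orders-service/orders/{order_id}").json()
    # ...triggers another hop to a second service for the nested field...
    customer = requests.get(
        f"http://customers-service/customers/{order['customer_id']}").json()
    # ...and the aggregation logic lives here, in the gateway.
    return {"id": order["id"], "status": order["status"],
            "customer": {"name": customer["name"]}}
```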
On the other hand, I think a monolith that uses an MVC architecture may actually have the upper hand here, because it owns the larger body of data that it queries.
I have a server/client application that uses the REQ/REP formal pattern, which I know is synchronous.
Can I completely replace zmq.REQ / zmq.REP with zmq.ROUTER and zmq.DEALER?
Or do those have to be used only as intermediate proxies?
ZeroMQ is a box of a few smart and powerful building blocks.
However, only the architect and the designer decide how well or how poorly these get harnessed in your distributed application's architecture.
So synchronicity or asynchronicity is not an inherent feature of a particular ZeroMQ Scalable Formal Communication Pattern's access node; it depends on the real deployment, within some larger context of use.
Yes, ROUTER can talk to DEALER, but ...
As one may read in detail in the ZeroMQ API specification tables, so-called compatible socket archetypes are listed for each named socket type. However, you can draw much stronger powers from ZeroMQ if you start using the ZeroMQ way of thinking, spending more time on the ZeroMQ concept and its set of Zero-maxims: Zero-copy, (almost) Zero-latency, Zero-warranty, (almost) Zero-scaling degradation, etc.
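To answer the literal question: yes, a DEALER can talk directly to a ROUTER with no intermediate proxy. A minimal pyzmq sketch (port and payloads invented for illustration):

```python
import zmq

ctx = zmq.Context()

# ROUTER sees every message prefixed with the sending peer's identity frame,
# so it can reply to any peer, in any order.
router = ctx.socket(zmq.ROUTER)
router.bind("tcp://*:5555")

dealer = ctx.socket(zmq.DEALER)
dealer.connect("tcp://localhost:5555")

# Unlike REQ, DEALER may pipeline several requests before reading any reply.
for i in range(3):
    dealer.send(b"request-%d" % i)

for _ in range(3):
    identity, payload = router.recv_multipart()
    router.send_multipart([identity, b"reply-to-" + payload])

for _ in range(3):
    print(dealer.recv())
```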
The best next step:
IMHO, if you are serious about professional messaging, get the great book and take from it both the knowledge of elementary setups and the somewhat more complex multi-socket messaging-layer designs with soft signaling, plus the further thoughts about the great powers of concurrent, heterogeneous, distributed processing, to advance your learning curve.
Pieter Hintjens' book "Code Connected, Volume 1" (available in PDF) is more than a recommended source for your issue.
There you will get the grounding for your further use of ZeroMQ.
ZeroMQ is a great tool, and not just for the messaging layer itself. It is worth the time and effort.
In my phone interview for a software architect position at a financial firm, I was asked to "design a cloud storage system like AWS S3".
Here is how I answered. Would you please help with your critiques and comments on my approach? I would like to improve based on your feedback.
First, I listed requirements:
- CRUD Microservices on objects
- Caching layer to improve performance
- Deployment on PaaS
- Resiliency with failover
- AAA support (authorization, auditing, accounting/billing)
- Administration microservices (user, project, lifecycle of object, SLA dashboard)
- Metrics collection (Ops, Dev)
- Security for service endpoints for admin UI
Second, I defined basic APIs:
https://api.service.com/services/get Arguments: object id, metadata. Returns: binary object
https://api.service.com/services/upload Arguments: object. Returns: object id
https://api.service.com/services/delete Arguments: object id. Returns: success/error
http://api.service.com/service/update-meta Arguments: object id, metadata. Returns: success/error
Third, I drew the architecture on the whiteboard, with some COTS components I could use. Below is the picture.
The interviewer did not ask me many questions, and hence I am a bit worried about whether I am on the right track with my process. Please provide your feedback.
Thanks in advance.
There are a couple of areas of feedback that might be helpful:
1. Comparison with S3's API
The S3 API is a RESTful API these days (it used to support SOAP) and it represents each 'file' (really a blob of data indexed by a key) as an HTTP resource, where the key is the path in the resource's URI. Your API is more RPC, in that each HTTP resource represents an operation to be carried out and the key to the blob is one of the parameters.
Whether this is a good or a bad thing depends on what you're trying to achieve and what architectural style you want to adopt (although I am a fan of REST, that doesn't mean you have to adopt it for all applications). However, since you were asked to design a system like S3, your answer would have benefited from a clear argument as to why you chose NOT to use REST as S3 does.
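For illustration only, a resource-oriented sketch in the S3 style, where the key is the path, might look like this (Flask, with an in-memory dict standing in for the blob store; all names are hypothetical):

```python
from flask import Flask, request

app = Flask(__name__)
store = {}  # in-memory stand-in for the real storage backend

# The blob is the resource: bucket and key form the URI path, and the
# HTTP verbs carry the operation (contrast with /services/upload).
@app.route("/<bucket>/<path:key>", methods=["PUT"])
def put_object(bucket, key):
    store[(bucket, key)] = request.get_data()
    return "", 200

@app.route("/<bucket>/<path:key>", methods=["GET"])
def get_object(bucket, key):
    if (bucket, key) not in store:
        return "", 404
    return store[(bucket, key)]

@app.route("/<bucket>/<path:key>", methods=["DELETE"])
def delete_object(bucket, key):
    store.pop((bucket, key), None)
    return "", 204
```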
2. Lines connecting things
Architecture diagrams tend to be very high level, which is appropriate, but there is sometimes a tendency to just draw lines between boxes without being clear about what those lines mean. Does a line mean there is a network connection between the infrastructure hosting those software components? Does it mean there is an information or data flow between those components?
When you draw a line, as in your diagram, that has multiple boxes all joining together on it, the implication is that there is some relationship between the boxes. When you add arrows, there is the further implication that the relationship follows the direction of the arrows. But there is no clarity about what that relationship is, or why the directionality is important.
One could infer from your diagram that the Memcache Cluster and the File Storage cluster are both sending data to the Metrics/SLA portal, but that they are not sending data to each other. Or that the ELB is not connected to the microservices. Clearly that is not the case.
3. Mixing Physical, Logical, Network & Software Architecture
General Types of Architecture
Logical Architecture - tends to be more focussed on information flows between areas of functional responsibility
Physical Architecture - tends to be more focussed on deployable components, such as servers, VMs, containers, but I also group installable software packages here, as a running executable process may host multiple elements from the logical architecture
Specific Types of Architecture
Network Architecture - focuses on network connectivity between machines and devices - may reference VLANs, IP ranges, switches, routers etc.
Software Architecture - focuses on the internal structures of a software program design - may talk about classes, modules, packages etc.
Your diagram includes a load balancer (more physical) and also a separate box per microservice (could be physical, logical, or software), where each microservice is responsible for a different type of operation. It is not clear whether each microservice has its own load balancer, or whether the load balancer is a layer 7 balancer that can map paths to different front ends.
4. Missing Context
While architectures often focus on the internal structure of a system, it is also important to consider the system's context - i.e. what are the important elements outside the system that the system needs to interact with? E.g. what are the expected clients and their methods of connectivity?
5. Actual Architectural Design
While the above feedback focused on how you communicated your design, this is more about the actual design itself.
COTS products - did you talk about alternatives and why you selected the ones you chose? Or are they just the only ones you know? Awareness of the options, and the ability to select the appropriate one for a given purpose, is valuable.
Caching - you have caching in front of the file storage, but nothing in front of the microservices (an edge cache, or a front-end reverse proxy). Assuming the microservices add some value to the process, caching their results might also be useful.
Redundancy and durability of data - while you talk about resiliency with failover, data redundancy and durability of the data storage are key requirements in something like this, and some explicit reference to how they would be achieved would be useful. Note this is slightly different from availability of services.
Performance - you talk about introducing a caching layer to improve performance, but don't qualify the actual performance requirements. Hundreds of objects stored or retrieved per second? Thousands? Millions? You need to know that to know what to build in.
Global access - S3 is a multi-region/multi-datacentre solution, but your architecture does not reference any aspect of multi-datacentre operation, such as replication of the stored objects and metadata.
Security - you reference requirements around AAA, but your proposed solution doesn't define which component is responsible for security, or at which layer, or at what point in the request path, a request is verified and accepted or rejected.
6. The Good
Lest this critique be thought too negative, it's worth saying that there is a lot to like in your approach: your assessment of the likely requirements is thorough, and it's great to see security, operational monitoring, and SLAs considered up front.
However, reviewing this, I'd wonder what kind of job it actually was. It looks more like an application for a cloud architect role than for a software architect role, for which I'd expect to see more discussion of packages, modules, assemblies, libraries, and software components.
All of the above notwithstanding, it's also worth considering - what is an interviewer looking for if they ask this in an interview? Nobody expects you to propose an architecture in 15 minutes that can do what has taken a team of Amazon engineers and architects many years to build and refine! They are looking for clarity of thought and expression, thoroughness of examination, logical conclusions from clearly stated assumptions, and knowledge and awareness of industry standards and practices.
Hope this is helpful, and best of luck on the job hunt!
I'm not entirely convinced of the benefits of a 3-tier architecture. Why, then, has LINQ emerged, which is a lighter-weight data access approach? Any input would be appreciated.
One of the main benefits of n-tier applications (there are of course many more than what I mention here) is the separation of concerns it brings. If you structure your application so that the responsibility for, say, data access is held in a data access layer (LINQ2SQL is a perfectly good example of one), validation and other business logic in one or more other layers, presentation in yet another one, and so on, you can change details in, or even replace, any one layer without having to rewrite the rest of your application.
If, on the other hand, you choose not to implement an n-tier approach, you'll quickly notice that, for example, changing the name of a single database table will require you to go through your entire application (every single line of code) in search of SQL statements that need to be updated. In an n-tier application (if you've done things right), you'll only have to change the table name in one place in your code.
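A minimal Python sketch of that idea (the table, class, and column names are invented for illustration): all SQL for the table lives in one class, so a rename touches one line.

```python
import sqlite3

class CustomerRepository:
    """Data access layer: the only code that knows the table's name."""

    TABLE = "customers"  # rename the table? change this one line

    def __init__(self, path="app.db"):
        self.conn = sqlite3.connect(path)

    def find_by_id(self, customer_id):
        cur = self.conn.execute(
            f"SELECT id, name FROM {self.TABLE} WHERE id = ?",
            (customer_id,))
        return cur.fetchone()

    def rename(self, customer_id, new_name):
        self.conn.execute(
            f"UPDATE {self.TABLE} SET name = ? WHERE id = ?",
            (new_name, customer_id))
        self.conn.commit()
```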
You need to do it the naive way and fail before you realize the problems those frameworks and patterns solve.
It happened to me with many things. SVN branches looked like a disorganized way of doing things, until one day I wished I had branched before my last 5 commits. C++ templates seemed useless and confusing, until I got enlightened; now I use them pretty often. And every single J2EE feature will look like useless bloat to anyone, until you actually build an app big enough to have the problems; then they may be exactly what you need. (Thus it's a flaw to "require" that you use them.)
As in most fields of engineering, there's never a perfect one-size-fits-all solution for development or architecture. So it is with n-tier architectures.
Quite a few applications, for example, run perfectly well as one-tier or two-tier architectures. Microsoft Word does quite well, thank you, as a single-tier system.
Most business applications have started using layers (as distinct from tiers: layers are virtual, tiers are physical), as it makes life much easier to have presentation logic in one place, business logic in another, and persistence logic somewhere else. Depending on the application, it can also make sense to have many more layers: I recently finished up a project with about sixteen layers between the UI client and the SQL database. (We had REST services, coordination layers, mixed databases, you name it. It made for quite a deployment challenge.)
The nice things about all these layers are:
testing becomes fairly easy, as each layer does one and only one thing
it's feasible to scale, especially if you design your layers to be stateless: you can then group them together and deploy them to separate boxes quite easily
it's feasible to have lots of developers working simultaneously, so long as you keep talkin' to each other
changes are (usually) limited to one layer in the code
LINQ, the Language Integrated Query, really helps too, as it can abstract away many of the harder parts of working with persistence layers. For instance:
the SQL-like syntax maps fairly directly to SQL database tables or views
working with more complex, non-relational data like XML files is made straightforward
Without LINQ, developing persistence layers was invariably repetitive, which is never a good thing.