Does it make sense to use GraphQL for microservices intercommunication?

I've read a lot about using GraphQL as an API gateway for the front-end, in front of the microservices.
But I wonder whether all of GraphQL's advantages over REST aren't relevant to communication between the microservices as well.
Any input, pros/cons, and successful usage examples would be appreciated.

Key notes to consider:
GraphQL isn't a magic bullet, nor is it "better" than REST. It is just different.
You can definitely use both at the same time, so it is not either/or.
Per specific use, GraphQL (or REST) can be anywhere on the scale of great to horrible.
GraphQL and REST aren't exact substitutes:
GraphQL is a query language, specification, and collection of tools, designed to operate over a single endpoint via HTTP, optimizing for performance and flexibility.
REST is an Architectural Style / approach for general communication utilizing the uniform interface of the protocols it exists in.
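To make the contrast concrete, here is a minimal sketch of the difference in consumption style (in Python; the endpoint URLs and field names are made up for illustration):

    import requests

    # REST: the server defines the response shape; getting a user plus
    # their orders may take several round trips to several resources.
    user = requests.get("https://api.example.com/users/42").json()
    orders = requests.get("https://api.example.com/users/42/orders").json()

    # GraphQL: one endpoint, and the CLIENT declares the shape it wants.
    query = """
    {
      user(id: 42) {
        name
        orders { id total }
      }
    }
    """
    resp = requests.post("https://api.example.com/graphql", json={"query": query})
    data = resp.json()["data"]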
Some reasons for avoiding a common use of GraphQL between microservices:
GraphQL is mainly useful when the client needs a flexible response it can control, without requiring changes to the server's code.
When you grant the client service control over the data that comes back, it is easy to expose too much, compromising the encapsulation of the serving service. This is a long-term risk to the system's maintainability and ability to change (see the sketch after this list).
Between microservices, latency is far less of an issue than between client and server, and the same goes for GraphQL's aggregation capabilities.
A uniform interface is really useful when you have many services - but GraphQL may be counter-productive for that cause.
The flexible queries GraphQL allows can be more challenging to optimize for performance.
Updating a hierarchy of objects at once (GraphQL's natural structure) may add complexity around atomicity, idempotency, error reporting, etc.
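Here is that sketch of the encapsulation risk, using Python's graphene library; the service, field names, and values are all hypothetical:

    import graphene

    class Order(graphene.ObjectType):
        id = graphene.ID()
        status = graphene.String()
        # Internal bookkeeping fields: once these are queryable, any
        # consuming service can couple to them, eroding encapsulation.
        warehouse_location = graphene.String()
        internal_cost = graphene.Float()

    class Query(graphene.ObjectType):
        order = graphene.Field(Order, id=graphene.ID(required=True))

        def resolve_order(root, info, id):
            # Stub resolver; a real service would load from its own store.
            return Order(id=id, status="SHIPPED",
                         warehouse_location="DC-7", internal_cost=12.5)

    schema = graphene.Schema(query=Query)

    # Nothing stops a consumer from reaching for the internal fields:
    result = schema.execute('{ order(id: "1") { status internalCost } }')
    print(result.data)  # {'order': {'status': 'SHIPPED', 'internalCost': 12.5}}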
To recap:
GraphQL can be really great for server-to-server communication, but it will most likely be a good fit in only a small percentage of use-cases.
Do you have a use-case for an API Gateway between services? Maybe this is the question you should ask yourself. GraphQL is just a (popular) tool.
Like always, it is best to match a problem to a tool.

I don't have experience using GraphQL in a microservices environment, but I'm inclined to think it's not the greatest fit for microservices.
To add a little more color to @Lior Bar-On's answer: GraphQL is more of a query language and is more dynamic in nature. It is often used to aggregate data sets in response to a single request, which in a microservices environment can mean many requests fanned out to many services. It also adds the complexity of translating the gathering of information from the respective sources of that information (the other microservices). How complex depends on how micro your services are and what queries you intend to support.
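A minimal sketch of that fan-out, with hypothetical service URLs and fields: a single aggregate query handled by a gateway resolver turns into several downstream HTTP calls.

    import requests

    def resolve_order_summary(order_id):
        # One incoming query to the gateway...
        order = requests.get(f"http://orders-svc/orders/{order_id}").json()
        # ...fans out into calls to other services to fill in the graph.
        customer = requests.get(
            f"http://customers-svc/customers/{order['customer_id']}").json()
        items = requests.get(
            f"http://inventory-svc/orders/{order_id}/items").json()
        return {
            "order": order,
            "customer": customer,  # each edge of the query adds a hop,
            "items": items,        # so latency and failure modes add up
        }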
On the other hand, I think a monolith that uses an MVC architecture may actually have the upper hand here, because it owns a larger body of data that it can query directly.

Apache Beam and ETL processes

Given the following processes:
manually transforming huge .csv files via rules (using MS Excel or Excel-like software) & sharing them via FTP
scripts (usually written in Perl or Python) which basically transform data, preparing it for other processes
APIs batch-reading from files or other origin sources & updating their corresponding data models
Spring Boot deployments used (or abused) to regularly collect & aggregate data from files or other sources
And given these problems / areas of improvement:
Standardization: I'd like (as far as it makes sense) to propose a unified, powerful tool that natively handles these types of (kind of big) data-transformation workflows.
Raising the abstraction level of the processes (related to the point above): many of the "tasks/jobs" I mentioned above are seen by the teams using them in a very technical, low-level, task-like way. I believe a higher-level view of these processes/flows, highlighting their business meaning, would help these processes self-document better, and would also help establish a ubiquitous language that different stakeholders can refer to and use to think unambiguously.
IO bottlenecks and resource utilization (technical): some of these processes fail more often than would be desirable (or take a very long time to finish) due to memory or network bottlenecks. Though hardware clearly has limits, resource utilization doesn't seem to have been a priority in many of these data-transformation scripts.
Do the Dataflow model, and specifically the Apache Beam implementation paired with either Flink or Google Cloud Dataflow as a backend runner, offer a proven solution to those "mundane" topics? The material on the internet mainly focuses on the unified streaming/batch model and typically covers more advanced features like streaming, event windowing, watermarks, late events, etc., which do look very elegant and promising indeed, but I have some concerns regarding tool maturity and long-term community support.
It's hard to give a concrete answer to such a broad question, but I would say that, yes, Beam/Dataflow is a tool that can handle this kind of thing. Even though the documentation focuses on "advanced" features like windowing and streaming, lots of people are using it for more "mundane" ETL. For questions about tool maturity and community, you could consult sources like Forrester reports, which often cover Dataflow.
You may also want to consider pairing it with other technologies like Airflow/Composer.
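As a minimal sketch of such a "mundane" batch ETL pipeline in the Beam Python SDK (file names and the transform rule are made up; the same pipeline runs locally on the DirectRunner, or on Flink/Dataflow by switching the runner option):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_csv_line(line):
        # Assumed input format: id,name,amount
        record_id, name, amount = line.split(",")
        return {"id": record_id, "name": name.strip().upper(),
                "amount": float(amount)}

    with beam.Pipeline(options=PipelineOptions()) as p:
        (
            p
            | "ReadInput" >> beam.io.ReadFromText("input.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse_csv_line)
            | "FilterInvalid" >> beam.Filter(lambda r: r["amount"] > 0)
            | "Format" >> beam.Map(lambda r: f'{r["id"]},{r["name"]},{r["amount"]:.2f}')
            | "WriteOutput" >> beam.io.WriteToText("output", file_name_suffix=".csv")
        )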

Does it make sense to use actor/agent oriented programming in Function as a Service environment?

I am wondering whether it is possible to apply an agent/actor library (Akka, Orbit, Quasar, JADE, Reactors.io) in a Function-as-a-Service environment (OpenWhisk, AWS Lambda)?
Does it make sense?
If yes, what is a minimal example that presents the added value (which is missing when we use only FaaS or only an actor/agent library)?
If no, can we construct a decision graph that helps us decide whether our problem calls for an actor/agent library, FaaS, or something else?
This is a rather opinion-based question, but I think that in their current shape there's no sense in putting actors into FaaS - the opposite actually works quite well: OpenWhisk is implemented on top of Akka.
There are several reasons:
FaaS in its current form is inherently stateless, which greatly simplifies things like request routing. Actors are stateful by nature.
From my experience, FaaS functions are usually disjointed - of course you need some external resources, but this is the mental model: generic resources and capabilities. In actor models we tend to think in terms of particular entities represented as actors, i.e. the user Max rather than a table of users. (I'm not covering the use of actors solely as a unit of concurrency here.)
FaaS applications have a very short lifespan - this is one of their foundation stones. Since creation, placement, and state recovery for more complex actors may take a while, and you usually need a lot of them to perform a single task, you may end up at a point where restoring the state of the system takes more time than actually performing the task that state is needed for.
That being said, it's possible that the two approaches will eventually converge in the future, but that needs to be accompanied by changes in both the mental and the infrastructural model (i.e. actors live in a runtime, which the FaaS platform must be aware of). IMO setting existing actor frameworks on top of existing FaaS providers is not feasible at this point.
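As a minimal sketch of the stateless-vs-stateful gap described above (not tied to any real provider SDK; the helper and field names are hypothetical):

    # Hypothetical stand-in for an external store the function must hit.
    def fetch_balance_from_db(user_id):
        return 0

    # FaaS style: stateless handler; any instance can serve any request,
    # but all state must be fetched from the outside on every invocation.
    def handle(event, context):
        user_id = event["user_id"]
        balance = fetch_balance_from_db(user_id)
        return {"user_id": user_id, "balance": balance}

    # Actor style: one long-lived object per entity ("the user Max"),
    # holding its own state between messages.
    class UserActor:
        def __init__(self, user_id):
            self.user_id = user_id
            self.balance = 0  # state lives inside the actor

        def receive(self, message):
            if message["type"] == "deposit":
                self.balance += message["amount"]
            elif message["type"] == "get_balance":
                return self.balance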

Interview assignment - design a system like S3

In my phone interview for a software architect position at a financial firm, I was asked to "design a cloud storage system like AWS S3".
Here is what I answered. Would you please help with your critiques and comments on my approach? I would like to improve based on your feedback.
First, I listed requirements:
- CRUD Microservices on objects
- Caching layer to improve performance
- Deployment on PaaS
- resiliency with failover
- AAA support (authorization, auditing, accounting/billing)
- Administration microservices (user, project, lifecycle of object, SLA dashboard)
- Metrics collection (Ops, Dev)
- Security for service endpoints for admin UI
Second, I defined basic APIs:
https://api.service.com/services/get - arguments: object id, metadata; returns: binary object
https://api.service.com/services/upload - arguments: object; returns: object id
https://api.service.com/services/delete - arguments: object id; returns: success/error
http://api.service.com/service/update-meta - arguments: object id, metadata; returns: success/error
Third, I drew the architecture on the board, with some COTS components I could use; below is the picture.
The interviewer did not ask me many questions, and hence I am a bit worried about whether I am on the right track with my process. Please provide your feedback.
Thanks in advance.
There are a couple of areas of feedback that might be helpful:
1. Comparison with S3's API
The S3 API is a RESTful API these days (it used to support SOAP) and it represents each 'file' (really a blob of data indexed by a key) as an HTTP resource, where the key is the path in the resource's URI. Your API is more RPC, in that each HTTP resource represents an operation to be carried out and the key to the blob is one of the parameters.
Whether this is a good or a bad thing depends on what you're trying to achieve and what architectural style you want to adopt (although I am a fan of REST, that doesn't mean you have to adopt it for all applications). However, since you were asked to design a system like S3, your answer would have benefited from a clear argument as to why you chose NOT to use REST as S3 does.
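For contrast, here is a minimal sketch (using Flask purely as an illustration; the routes and in-memory store are made up) of the resource-oriented alternative, where the object key is the URI path and the HTTP verbs carry the operation:

    from flask import Flask, request

    app = Flask(__name__)
    store = {}  # in-memory stand-in for the storage backend

    @app.route("/<bucket>/<path:key>", methods=["GET"])
    def get_object(bucket, key):
        data = store.get((bucket, key))
        return (data, 200) if data is not None else ("Not Found", 404)

    @app.route("/<bucket>/<path:key>", methods=["PUT"])
    def put_object(bucket, key):
        store[(bucket, key)] = request.get_data()  # create or replace
        return "", 200

    @app.route("/<bucket>/<path:key>", methods=["DELETE"])
    def delete_object(bucket, key):
        store.pop((bucket, key), None)  # delete is idempotent, as in S3
        return "", 204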
2. Lines connecting things
Architecture diagrams often tend to be very high level - which is appropriate - but there is sometimes a tendency to just draw lines between boxes without being clear about what those lines mean. Does a line mean there is a network connection between the infrastructure hosting those software components? Does it mean there is an information or data flow between those components?
When you draw a line, as in your diagram, with multiple boxes all joining onto it, the implication is that there is some relationship between the boxes. When you add arrows, there is the further implication that the relationship follows the direction of the arrows. But there is no clarity about what that relationship is, or why the directionality is important.
One could infer from your diagram that the Memcache Cluster and the File Storage cluster are both sending data to the Metrics/SLA portal, but that they are not sending data to each other. Or that the ELB is not connected to the microservices. Clearly that is not the case.
3. Mixing Physical, Logical, Network & Software Architecture
General Types of Architecture
Logical Architecture - tends to be more focussed on information flows between areas of functional responsibility
Physical Architecture - tends to be more focussed on deployable components, such as servers, VMs, containers, but I also group installable software packages here, as a running executable process may host multiple elements from the logical architecture
Specific Types of Architecture
Network Architecture - focuses on network connectivity between machines and devices - may reference VLANs, IP ranges, switches, routers etc.
Software Architecture - focuses on the internal structures of a software program design - may talk about classes, modules, packages etc.
Your diagram includes a load balancer (more physical) and also a separate box per microservice (could be physical, logical, or software), where each microservice is responsible for a different type of operation. It is not clear whether each microservice has its own load balancer, or whether the load balancer is a layer-7 balancer that can map paths to different front ends.
4. Missing Context
While architectures often focus on the internal structure of a system, it is also important to consider the system context - i.e. what are the important elements outside the system that the system needs to interact with? E.g. what are the expected clients and their methods of connectivity?
5. Actual Architectural Design
While the above feedback focussed on how you communicated your design, this is more about the actual design.
COTS products - did you talk about alternatives and why you selected the one you chose? Or is it just the only one you know? Awareness of the options, and the ability to select the appropriate option for a given purpose, is valuable.
Caching - you have caching in front of the file storage, but nothing in front of the microservices (edge cache, or front end reverse proxy) - assuming the microservices are adding some value to the process, caching their results might also be useful
Redundancy and durability of data - while you talk about resiliency to failover, data redundancy and durability of the data storage is a key requirement in something like this and some explicit reference to how that would be achieved would be useful. Note this is slightly different to availability of services.
Performance - you talk about introducing a caching layer to improve performance, but don't quantify the actual performance requirements - hundreds of objects stored or retrieved per second, thousands, or millions? You need to know that in order to know what to build.
Global Access - S3 is a multi-region/multi-datacentre solution - your architecture does not reference any aspect of multi-datacentre such as replication of the stored objects and metadata
Security - you reference requirements around AAA but your proposed solution doesn't define which component is responsible for security, and at which layer or at what point in the request path a request is verified and accepted or rejected
6. The Good
Lest this critique be thought too negative, it's worth saying that there is a lot to like in your approach - your assessment of the likely requirements is thorough, and it's great to see security, operational monitoring, and SLAs considered up front.
However, reviewing this, I'd wonder what kind of job it actually was - it looks more like an application for a cloud architect role than a software architect role, for which I'd expect to see more discussion of packages, modules, assemblies, libraries, and software components.
All of the above notwithstanding, it's also worth considering - what is an interviewer looking for if they ask this in an interview? Nobody expects you to propose an architecture in 15 minutes that can do what has taken a team of Amazon engineers and architects many years to build and refine! They are looking for clarity of thought and expression, thoroughness of examination, logical conclusions from clearly stated assumptions, and knowledge and awareness of industry standards and practices.
Hope this is helpful, and best of luck on the job hunt!

What is Unifying Logic within the Semantic Stack Model and who is supposed to take care of it?

Basically, after hours of researching I still don't get what the Unifying Logic layer within the Semantic Web Stack model is, and whose problem it is to take care of it.
I think this depends on what your conceptualisation of the semantic web is. Suppose the ultimate expression of the semantic web is to make heterogeneous information sources available via web-like publishing mechanisms to allow programs - agents - to consume them in order to satisfy some high-level user goal in an autonomous fashion. This is close to Berners-Lee et al's original conceptualisation of the purpose of the semantic web. In this case, the agents need to know that the information they get from RDF triple stores, SPARQL end-points, rule bases, etc, is reliable, accurate and trustworthy. The semantic web stack postulates that a necessary step to getting to that end-point is to have a logic, or collection of logics, that the agent can use when reasoning about the knowledge it has acquired. It's rather a strong AI view, or well towards that end of the spectrum.
However, there's an alternative conceptualisation (and, in fact, there are probably many) in which the top layers of the semantic web stack, including unifying logic, are not needed, because that's not what we're asking agents to do. In this view, the semantic web is a way of publishing disaggregated, meaningful information for consumption by programs but not autonomously. It's the developers and/or the users who choose, for example, what information to treat as trustworthy. This is the linked data perspective, and it follows that the current stack of standards and technologies is perfectly adequate for building useful applications. Indeed, some argue that even well-established standards like OWL are not necessary for building linked-data applications, though personally I find it essential.
As to whose responsibility it is, if you take the former view it's something the software agent community is already working on, and if you take the latter view it doesn't matter whether something ever gets standardised because we can proceed to build useful functionality without it.

The benefits of a 3-tier architecture?

I'm not entirely convinced of the benefits of a 3-tier architecture. Why, then, has LINQ emerged, which is a lighter-weight data-access approach? Any input would be appreciated.
One of the main benefits of n-tier applications (there are of course many more than what I mention here) is the separation of concerns it brings. If you structure your application so that the responsibility for, e.g., data access is held in a data access layer (LINQ2SQL is a perfectly good example of one), validation and other business logic in one or more other layers, presentation in yet another, etc., you can change details in, or even replace, any layer without having to rewrite the rest of your application.
If, on the other hand, you choose not to implement an n-tier approach, you'll quickly notice that, for example, changing the name of a single database table will require you to go through your entire application - every single line of code - in search of SQL statements that need to be updated. In an n-tier application (if you've done things right), you'll only have to change the table name in one place in your code.
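As a minimal sketch of that point (using Python and sqlite3 only to keep it self-contained; the table and helpers are made up), renaming the table touches exactly one constant in the data access layer:

    import sqlite3

    # --- Data access layer: the ONLY place that knows table/column names ---
    USER_TABLE = "users"  # rename the table? change this one line.

    def fetch_user(conn, user_id):
        row = conn.execute(
            f"SELECT id, name FROM {USER_TABLE} WHERE id = ?", (user_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

    # --- Business layer: works with dicts, knows nothing about SQL ---
    def greeting_for(conn, user_id):
        user = fetch_user(conn, user_id)
        return f"Hello, {user['name']}!" if user else "Hello, stranger!"

    # --- Presentation layer (here: just printing) ---
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {USER_TABLE} (id INTEGER, name TEXT)")
    conn.execute(f"INSERT INTO {USER_TABLE} VALUES (1, 'Ada')")
    print(greeting_for(conn, 1))  # Hello, Ada!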
You need to do it the naive way and fail before you realize the problems those frameworks and patterns solve.
Happened to me with many things. SVN branches looked like a disorganized way to do things, until one day I wished I had branched before my last 5 commits. C++ templates seemed useless and confusing, until I got enlightened; now I use them pretty often. And every single J2EE feature will look like useless bloat to anyone, until you actually build an app big enough and have problems; then they may be exactly what you need. (thus it's a flaw to "require" you use them)
Like in most fields of engineering there's never a perfect one-size-fits-all solution for development or architecture. So it is with n-tier architectures.
For example, quite a few applications run perfectly well as a one-tier or two-tier architecture. Microsoft Word, for example, does quite well, thank you, as a single-tier system.
Most business applications have started using layers (as distinct from tiers: layers are virtual, tiers are physical) as it makes life much easier to have presentation logic in one place, business logic in another, and persistence logic somewhere else. It can make sense too depending on the application to have lots more layers: I recently finished up a project with about sixteen layers between the UI client and the SQL database. (We had REST services, co-ordination layers, mixed databases, you name it. Made for quite a deployment challenge.)
The nice things about all these layers are:
testing becomes fairly easy, as each layer does one and only one thing
it's feasible to scale, especially if you design your layers to be stateless: then you can group them together and deploy to separate boxes quite easily
it's feasible to have lots of developers working simultaneously, so long as you keep talkin' to each other
changes are (usually) limited to one layer in the code
LINQ, the Language Integrated Query, really helps too, as it can abstract away many of the harder parts of working with persistence layers. For instance:
the SQL-like syntax maps fairly directly to SQL database tables or views
working with more complex non-relational data like XML files is made straightforward
Without LINQ, developing persistence layers was invariably repetitive, and that is never a good thing.
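LINQ itself is C#, but as a rough Python analogy of what language-integrated querying buys you, compare a hand-written loop with a declarative, query-like comprehension over the same made-up records:

    orders = [
        {"id": 1, "customer": "Ada", "total": 120.0},
        {"id": 2, "customer": "Bob", "total": 35.5},
        {"id": 3, "customer": "Ada", "total": 60.0},
    ]

    # Imperative: the 'how' obscures the 'what'.
    big_orders = []
    for o in orders:
        if o["total"] > 50:
            big_orders.append(o["customer"])

    # Declarative, query-like: closer in spirit to LINQ's where/select.
    big_orders = [o["customer"] for o in orders if o["total"] > 50]
    print(sorted(set(big_orders)))  # ['Ada']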
