Connecting to Cassandra on startup, and monitoring session health - phantom-dsl

Two related questions
1) Currently, the session to C* is established in a lazy fashion - that is, only the first time any table is accessed.
Instead, we would like to establish a session as soon as the application is started (in case there is a connectivity problem, etc.). What would be the best way to do that? Should I just get a session object in my startup code?
connector.provider.session
2) How would I then monitor the health of the connection? I could call
connector.provider.session.isClosed()
but I'm not sure it will do the job.

I wouldn't rely on that mechanism alone, per se, as you may want to get more metrics out of the cluster, for which purpose you have native JMX support; through the JMX protocol you can look at metrics in more detail.
Now obviously you have OpsCenter, which natively leverages this feature, but alternatively you can use a combination of a JMX listener with something like Grafana (just a thought) or whatever supports native compatibility.
In terms of low-level methods, yes, you are on the money:
connector.provider.session.isClosed()
But you also have heartbeats that you can log and inspect, and so on. There's more detail here.
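To make that concrete, here is a minimal sketch at the level of the DataStax Java driver 3.x, which phantom wraps (the contact point and keyspace are placeholders). In phantom itself, simply referencing connector.provider.session in your startup code, as suggested above, forces the same eager connect:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Metrics;
    import com.datastax.driver.core.Session;

    public class StartupConnect {
        public static void main(String[] args) {
            // Connect eagerly at startup so connectivity problems surface
            // immediately instead of on the first table access.
            // "127.0.0.1" and "my_keyspace" are placeholder values.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .build();
            Session session = cluster.connect("my_keyspace");

            // The low-level health check discussed above.
            boolean healthy = !session.isClosed();

            // Driver metrics (exposed over JMX by default in driver 3.x),
            // e.g. the number of currently open connections. The session is
            // then kept for the lifetime of the application.
            Metrics metrics = cluster.getMetrics();
            System.out.println("healthy=" + healthy
                    + ", openConnections=" + metrics.getOpenConnections().getValue());
        }
    }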

Related

How does JanusGraph.open() work and how to scale?

I am evaluating different graph databases, libraries, etc., and JanusGraph seems to provide most of what I need. I do have a couple of questions:
I would like to connect to it via Gremlin Server with the Cluster option, however I don't seem to see any Java examples of handling transaction rollbacks etc. at all.
And if I was to use the JanusGraphFactory.open("...") option, how exactly does this work? Would it mean the entire Graph is loaded into memory in the JVM?
If the entire graph is loaded into memory, how would one scale up, and how would different JVMs keep up to date with each other?
Thanks & regards
Tin
I would like to connect to it via Gremlin Server with the Cluster option, however I don't seem to see any Java examples of handling transaction rollbacks etc. at all.
Connecting to Gremlin Server involves sessionless communication, meaning each request equals one transaction. You can connect with a session but it is not typically encouraged for most use cases.
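To make the sessionless model concrete, here is a minimal sketch with the TinkerPop Java driver (the host, port and submitted traversal are placeholders); each submit() below is one transaction on the server:

    import org.apache.tinkerpop.gremlin.driver.Client;
    import org.apache.tinkerpop.gremlin.driver.Cluster;
    import org.apache.tinkerpop.gremlin.driver.Result;
    import org.apache.tinkerpop.gremlin.driver.ResultSet;

    public class GremlinClientExample {
        public static void main(String[] args) throws Exception {
            // Host and port are placeholders for your Gremlin Server endpoint.
            Cluster cluster = Cluster.build("gremlin-server-host").port(8182).create();
            Client client = cluster.connect();   // sessionless client

            // Each submitted script runs as its own transaction on the server:
            // committed on success, rolled back on failure.
            ResultSet results = client.submit("g.V().count()");
            Result count = results.one();
            System.out.println("vertex count: " + count.getLong());

            client.close();
            cluster.close();
        }
    }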
And if I was to use the JanusGraphFactory.open("...") option, how exactly does this work? Would it mean the entire Graph is loaded into memory in the JVM?
It just creates a reference to the data and provides a Graph instance from which you can create a GraphTraversalSource to interact with for spawning traversals. It doesn't load any of that data into memory just by virtue of calling it.
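A short sketch of that embedded usage, assuming a CQL-backed setup (the properties path is a placeholder); note that with this API you manage transactions yourself:

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;

    public class EmbeddedOpenExample {
        public static void main(String[] args) throws Exception {
            // The properties file (path is a placeholder) points at the storage
            // backend, e.g. Cassandra or HBase; the data itself stays there.
            JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql.properties");

            // Obtain a traversal source; traversals pull only the data they touch.
            GraphTraversalSource g = graph.traversal();
            long vertices = g.V().count().next();
            System.out.println("vertices: " + vertices);

            // With the embedded API you control the transaction boundaries.
            graph.tx().rollback();
            graph.close();
        }
    }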

Process Laravel/Redis job from multiple server

We are building a reporting app on Laravel that needs to fetch user data from a third-party server that allows 1 request per second.
We need to fetch 100K to 1000K rows per user, and we can fetch at most 250 rows per request.
So the restrictions are:
1. We can send 1 request per second
2. 250 rows per request
So it takes 400-4000 requests/jobs to fetch one user's data, and loading data for multiple users is very time-consuming and the server gets slow.
So now we are planning to load the data using multiple servers, say 4-10 servers to fetch user data, so we can send 10 requests per second from 10 servers.
How can we design the system and process jobs from multiple servers?
Is it possible to use a dedicated server for hosting Redis and connect to that Redis server from multiple servers and execute jobs? Can any conflict/race-condition happen?
Any hint or prior experience related to this would be really helpful.
The short answer is yes, this is absolutely possible and is something I've implemented in production apps many times before.
Redis is just like any other service and can run anywhere, with clients connecting to it from anywhere. It's all up to your configuration of the server to dictate how exactly that happens (adding passwords, configuring spiped, limiting access via the firewall, etc.). I'd recommend reading up on the documentation they have in the Administration section here: https://redis.io/documentation
Also, when you do make the move to a dedicated Redis host, with multiple clients accessing it, you'll likely want to look into having more than just one Redis server running for reliability, high availability, etc. Redis has efficient and easy replication available with a few simple configuration commands, which you can read more about here: https://redis.io/topics/replication
Last thing on Redis: if you do end up implementing a master-slave setup, you may want to look into high availability and auto-failover if your master instance were to go down. Redis has a really great utility built into the application that can monitor your master and slaves, detect when the master is down, and automatically reconfigure your servers to promote one of the slaves to the new master. The utility is called Redis Sentinel, and you can read about that here: https://redis.io/topics/sentinel
For your question about race conditions, it depends on how exactly you write your jobs that are pushed onto the queue. For your use case though, it doesn't sound like this would be too much of an issue, but it really depends on the constraints of the third-party system. Either way, if you are subject to a race condition, you can still implement a solution for it, but would likely need to use something like a Redis lock (https://redis.io/topics/distlock). Taylor recently added a new feature to the upcoming Laravel version 5.6 that I believe implements a version of the Redis lock in the scheduler (https://medium.com/@taylorotwell/laravel-5-6-preview-single-server-scheduling-54df8e0e139b). You can look into how that was implemented, and adapt it for your use case if you end up needing it.
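For illustration, here is a minimal single-instance sketch of that locking pattern using the Jedis Java client (the host, key and 30s expiry are placeholders; the Redlock article linked above covers the multi-instance variant):

    import java.util.Collections;
    import java.util.UUID;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.params.SetParams;

    public class RedisLockSketch {
        // Release only if we still own the lock; the compare-and-delete must
        // be atomic, hence the Lua script.
        private static final String RELEASE_SCRIPT =
                "if redis.call('get', KEYS[1]) == ARGV[1] then "
              + "  return redis.call('del', KEYS[1]) "
              + "else return 0 end";

        public static void main(String[] args) {
            // Host is a placeholder for your dedicated Redis server.
            try (Jedis jedis = new Jedis("redis-host", 6379)) {
                String lockKey = "lock:user:42";          // placeholder key
                String token = UUID.randomUUID().toString();

                // SET key token NX PX 30000: acquire only if not already held,
                // with a 30s expiry so a crashed worker can't hold it forever.
                String reply = jedis.set(lockKey, token,
                        SetParams.setParams().nx().px(30_000));

                if ("OK".equals(reply)) {
                    try {
                        // ... fetch this user's next batch of rows here ...
                    } finally {
                        jedis.eval(RELEASE_SCRIPT,
                                Collections.singletonList(lockKey),
                                Collections.singletonList(token));
                    }
                } // else: another server already holds this user's lock
            }
        }
    }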

How to enable monitoring in Oracle Service Bus 11g?

I have been looking to enable monitoring in OSB 11g, but I am not exactly sure how to achieve this.
Thanks
It depends on what you mean by "monitoring" as there are many different kinds and a lot depends on your functional requirements around monitoring too.
Monitoring can be:
* Proactive (when you actively look for patterns - preferably automatically, though manually is also possible - and detect issues before they occur, or get alerted to them immediately after they occur)
* Reactive (when you are trying to debug an issue after it has occurred)
Monitoring can also be:
* Technical - check for signs of timeouts, long-running invocations, etc. Technical monitoring can be at:
  * Application level (OSB-specific in your case)
  * Platform level (application server/JVM/operating system - after all, for OSB monitoring to work, you need to ensure/monitor that OSB itself is running!)
* Functional (often involves explicit logging from your code, but can be correlated to technical patterns - e.g. the number of invocations of a particular API/service might indicate the number of orders). Functional monitoring can also include SLA monitoring.
Finally, in the Oracle Service Bus:
* You can enable monitoring at the individual service level (via the Operations tab under each service, or via scripting in WLST)
* The monitoring above can be combined with rules to alert on specific scenarios (such as SLA breaches)
* You can use specific log entries within your pipelines and then monitor those at runtime
There is a lot more you can do to "monitor" services depending on what is relevant for your services. Although OSB monitoring can be performed via various consoles (/sbconsole or /em in 12c), a lot of good monitoring combines these features into well-designed alerts so that you are always on top of potential problems. You can reach this stage by constantly observing your system's behaviour and then improving/tweaking your monitoring solution(s).
This is a good document to read to start:
https://docs.oracle.com/cd/E29542_01/admin.1111/e15867/monitoring_ops.htm#OSBAG472
HTH.

CPU bound/stateful distributed system design

I'm working on a web application frontend to a legacy system which involves a lot of CPU-bound background processing. The application is also stateful on the server side, and the domain objects need to be held in memory across the entire session as the user operates on them via the web-based interface. Think of it as something like a web UI front end to Photoshop, where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each instance of the server can only support around 4-8 instances of each "workspace" at once, and I need to support a few hundred concurrent users. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
* A web application frontend to a legacy backend system
* Tasks performed are CPU-bound
* Stateful; most calls will be some sort of RPC, and the user will make multiple actions that interact with the stateful objects held in server-side memory
* Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
* Uses Amazon AWS auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser and then send the CPU-bound tasks from the web server to a bunch of dedicated servers that do the background processing. The question is how to best hook up the two tiers together for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem to be geared towards one-time tasks where any worker node can simply grab a job from a queue, execute it and forget the state. My needs are a little different since there could be multiple 'tasks' that need to be 'sticky'; for example, if step 1 is started on node 1, then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seem to be geared towards background tasks that can be processed anytime, rather than a system that has to provide user feedback like the one I'm dealing with.
My question is: is there an off-the-shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial. I haven't used this pattern in particular, but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work in monitoring you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too greatly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC so that after the first response you include the information required to get the second message to the correct destination.
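For reference, a condensed sketch of the calling side of that RPC pattern with the RabbitMQ Java client (the host, queue name and payload are placeholders); a worker would publish its result to the replyTo queue with the same correlationId:

    import java.nio.charset.StandardCharsets;
    import java.util.UUID;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class RpcCaller {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbitmq-host");           // placeholder host

            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {

                // Exclusive, auto-delete, server-named reply queue for this caller.
                String replyQueue = channel.queueDeclare().getQueue();
                String corrId = UUID.randomUUID().toString();

                AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                        .correlationId(corrId)
                        .replyTo(replyQueue)
                        .build();

                // "rpc_queue" is a placeholder for the workers' request queue.
                channel.basicPublish("", "rpc_queue", props,
                        "apply-filter:blur".getBytes(StandardCharsets.UTF_8));

                // Block until the reply with the matching correlation id arrives.
                BlockingQueue<String> response = new ArrayBlockingQueue<>(1);
                channel.basicConsume(replyQueue, true, (tag, delivery) -> {
                    if (corrId.equals(delivery.getProperties().getCorrelationId())) {
                        response.offer(new String(delivery.getBody(), StandardCharsets.UTF_8));
                    }
                }, tag -> { });

                System.out.println("result: " + response.take());
            }
        }
    }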
0MQ has this as a basic pattern which will fan out work as needed. I've only played with it, but it is simpler to code and possibly simpler to maintain (as it doesn't need a broker, though devices can provide one). This may not handle stickiness by default, but it should be possible to write your own routing layer to handle it, as sketched below.
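If you do write that routing layer, one common approach is a consistent-hash ring keyed by workspace id, so every step for the same workspace lands on the same worker process. A self-contained sketch (the worker queue names are placeholders):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Maps each workspace id to a stable worker queue, so every step for the
    // same workspace is routed to the same worker process.
    public class StickyRouter {
        private final TreeMap<Long, String> ring = new TreeMap<>();

        public StickyRouter(String... workerQueues) throws Exception {
            for (String queue : workerQueues) {
                // A few virtual nodes per worker smooth out the distribution.
                for (int i = 0; i < 16; i++) {
                    ring.put(hash(queue + "#" + i), queue);
                }
            }
        }

        public String routeFor(String workspaceId) throws Exception {
            // First ring entry at or after the key's hash, wrapping around.
            SortedMap<Long, String> tail = ring.tailMap(hash(workspaceId));
            return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
        }

        private static long hash(String key) throws Exception {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Fold the first 8 bytes of the digest into a long.
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        }

        public static void main(String[] args) throws Exception {
            StickyRouter router = new StickyRouter("worker-1", "worker-2", "worker-3");
            // The same workspace always routes to the same worker queue.
            System.out.println(router.routeFor("workspace-42"));
            System.out.println(router.routeFor("workspace-42"));
        }
    }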
Don't discount HTTP for this either. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can use their ELB easily in front of an auto-scaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ, but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.

mod_jk vs mod_cluster

Can someone please tell me the pros and cons of mod_jk vs mod_cluster?
We are looking to do very simple load balancing. We are going to be using sticky sessions and just need something to route new requests to a new server if one server goes down. I feel that mod_jk does this and does a good job, so why do I need mod_cluster?
If your JBoss version is 5.x or above, you should use mod_cluster; it will give you better performance and reliability than mod_jk. Here are some reasons:
* Better load balancing between app servers: the load-balancing logic is calculated based on information and metrics provided directly by the application servers (bear in mind they have first-hand information about their own load), in contrast with mod_jk, with which the logic is calculated by the proxy itself. For that, mod_cluster uses an extra connection between the servers and the proxy (apart from the data one), used to send this load information.
* Better integration with the lifecycle of the applications deployed on the servers: the servers keep the proxy informed about changes to the application on each respective node (for example, if you undeploy the application on one of the nodes, the node will inform the proxy (mod_cluster) immediately, avoiding the inconvenient 404 errors).
* It doesn't require AJP: you can also use it with HTTP or HTTPS.
* Better management of server lifecycle events: when a server shuts down or is restarted, it informs the proxy about its state, so that the proxy can reconfigure itself automatically.
You can use sticky sessions with mod_cluster as well, though of course, if one of the nodes fails, mod_cluster won't help to keep the user sessions (as would happen with other balancers too, unless you have the JBoss nodes in a cluster). But for the reasons given above (keeping track of the server lifecycle events, and better load balancing mainly), in case one of the servers goes down, mod_cluster will manage it better and more transparently to the user (the proxy will be informed immediately, and so it will never send requests to that node until it's informed that the node has been restarted).
Remember that you can use mod_cluster with JBoss AS/EAP 5.x or JBoss Web 2.1.1 or above (in the case of Tomcat I think it's version 6 or above).
To sum up, though your load-balancing use case is simple, mod_cluster offers better performance and scalability.
You can find more information on the JBoss site for mod_cluster, and on its documentation page.
