Single Node Module Mean Stack - mean-stack

Can I use only one node_modules folder in my project instead of having multiple node_modules folders:
one for MongoDB and Express, and one for Angular?
How do I link the node modules into one folder?

Related

Infinispan clustered REPL_ASYNC cache: command indefinitely bounced between two nodes

I'm running a Spring Boot application using Infinispan 10.1.8 in a two-node cluster. The two nodes communicate via JGroups TCP. I configured several REPL_ASYNC caches.
The problem:
One of these caches, at some point, causes the two nodes to exchange the same message over and over, leading to high CPU and memory usage. The only way to stop this is to shut down one of the two nodes.
More details: here is the configuration.
org.infinispan.configuration.cache.Configuration replAsyncNoExpirationConfiguration = new ConfigurationBuilder()
    .clustering()
        .cacheMode(CacheMode.REPL_ASYNC)
    .transaction()
        .lockingMode(LockingMode.OPTIMISTIC)
        .transactionMode(TransactionMode.NON_TRANSACTIONAL)
    .statistics().enabled(cacheInfo.isStatsEnabled())
    .locking()
        .concurrencyLevel(32)
        .lockAcquisitionTimeout(15, TimeUnit.SECONDS)
        .isolationLevel(IsolationLevel.READ_COMMITTED)
    .expiration()
        .lifespan(-1)       // entries do not expire
        .maxIdle(-1)        // even when they are idle for some time
        .wakeUpInterval(-1) // disable the periodic eviction process
    .build();
One of these caches (named formConfig) causes abnormal communication between the two nodes. This is what happens:
with JMeter I generate traffic load targeting only node 1
for some time node 2 receives cache entries from node 1 via SingleRpcCommand; no anomalies, and even the formConfig cache behaves properly
after some time a new cache entry is sent to the formConfig cache
At this point the same message seems to keep bouncing between the two nodes:
node 1 sends the entry: mn-node1.company.acme-develop sending command to all: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850]
node 2 receives the entry: mn-node2.company.acme-develop received command from mn-node1.company.acme-develop: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850]
node 2 sends the entry back to node 1: mn-node2.company.acme-develop sending command to all: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850]
node 1 receives the entry: mn-node1.company.acme-develop received command from mn-node2.company.acme-develop: SingleRpcCommand{cacheName='formConfig', command=PutKeyValueCommand{key=SimpleKey [form_config,MECHANICAL,DESIGN,et,7850],
node 1 sends the entry to node 2 and so on and on...
Some other things:
the system is not under load; JMeter is running only a few users in parallel
even after stopping JMeter, this loop doesn't stop
formConfig is the only cache that behaves this way; all the other REPL_ASYNC caches work properly. When I deactivate only the formConfig cache, the system works correctly
I cannot reproduce the problem with two nodes running on my machine
Here's a more complete log file including logs from both nodes.
Other info:
OpenJDK 11 (HotSpot)
Spring Boot 2.2.7
Infinispan Spring Boot starter 2.2.4
using JbossUserMarshaller
I'm suspecting:
something related to the transactional configuration,
or something related to serialization/deserialization of the cached object.
The only scenario where this can happen is when the SimpleKey has different hashCode().
Are there any exceptions in the log? Are you able to check if the hashCode() is the same after serialization & deserialization of the key?
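A quick way to run that check, sketched with plain Java serialization as a stand-in for whatever the configured JbossUserMarshaller actually does (the key class is assumed to be Serializable; this is not from the original answer):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class KeyRoundTripCheck {

    // Serializes and deserializes the key, then compares hashCode() and equals().
    static boolean survivesRoundTrip(Object key) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(key);
        }
        Object copy;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = in.readObject();
        }
        // If either comparison fails, the two nodes will not agree on the key,
        // which matches the suspected cause described above.
        return key.hashCode() == copy.hashCode() && key.equals(copy);
    }
}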

How to add jar dependency to dataproc cluster in GCP?

In particular, how do I add the spark-bigquery-connector so that I can query data from within dataproc's Jupyter web interface?
Key links:
- https://github.com/GoogleCloudPlatform/spark-bigquery-connector
Goal:
To be able to run something like:
s = spark.read.bigquery("transactions")
s = (s
     .where(f.col("quantity") >= 0)  # f is assumed to be pyspark.sql.functions
     .groupBy(f.col('date'))
     .agg({'sales_amt': 'sum'})
)
df = s.toPandas()
There are basically three ways to achieve what you want:
1. At cluster creation:
You will have to create an initialization script (parameter --initialization-actions) to install your dependencies.
https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions
2. At cluster creation:
You can specify a customized image to be used when creating your cluster.
https://cloud.google.com/dataproc/docs/guides/dataproc-images
3. At job runtime:
You can pass the additional jar files when you run the job using the --jars parameter:
https://cloud.google.com/sdk/gcloud/reference/beta/dataproc/jobs/submit/pyspark#--jars
I recommend (3) if you have a simple .jar dependency to run, like scoop.jar.
I recommend (1) if you have lots of packages to install before running your jobs. It gives you much more control.
Option (2) definitely gives you total control, but you will have to maintain the image yourself (apply patches, upgrades, etc.), so unless you really need it I don't recommend it.
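For illustration only (not part of the original answer): once the connector jar is available through any of the options above, the goal query can be written roughly like this. The sketch is in Java, with my_dataset.transactions as a placeholder table reference; the same format("bigquery") read is what the connector exposes to PySpark as well.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public class BigQueryReadExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("bq-read").getOrCreate();

        // Reads the BigQuery table through the spark-bigquery-connector data source.
        Dataset<Row> transactions = spark.read()
                .format("bigquery")
                .option("table", "my_dataset.transactions")  // placeholder dataset.table
                .load();

        // Same aggregation as the PySpark goal above.
        Dataset<Row> sales = transactions
                .where(col("quantity").geq(0))
                .groupBy(col("date"))
                .agg(sum(col("sales_amt")));

        sales.show();
    }
}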

Invoke a command on a node in Karaf Cellar based on NodeID

At the moment, I have a Cellar setup with just two nodes (meant for testing), as seen in the dump below:
| Id | Alias | Host Name | Port
--+-------------------+----------------+--------------+-----
x | 192.168.99.1:5702 | localhost:8182 | 192.168.99.1 | 5702
| 192.168.99.1:5701 | localhost:8181 | 192.168.99.1 | 5701
Edit 1 -- Additional Information about the setup (begin):
I have multiple Cellar nodes. I am trying to make one node a master, which is supposed to expose a management web panel via which I would like to fetch stats from all the other nodes. For this purpose, I have exposed my custom implementations of MBeans involving my business logic. I understand that these MBeans can be invoked using Jolokia, and I am already doing that. So all these different nodes will have Jolokia installed, while the master node will have Hawtio installed (such that I can connect to the slave nodes via the Jolokia API through the Hawtio panel).
Right now, I am manually assigning the alias for every node (which refers to the web endpoint that it exposes via pax.web configuration). This is just a workaround to simplify my testing procedures.
Desired Process:
I have access to the ClusterManager service via the service registry. Thus, I am able to invoke clusterManager.listNodes() and loop through the result in my MBean. While looping through this, all I get is the basic node info. But, if it is possible, I would like to parse the etc/org.ops4j.pax.web.cfg file of every node and get the port number (i.e. the value of the property org.osgi.service.http.port).
While retrieving the list of nodes, I would like to get a response as:
{
    "Node 1": {
        "hostname": "192.168.0.100",
        "port": 5701,
        "webPort": "8181",
        "alias": "Data-Node-A",
        "id": "192.168.0.100:5701"
    },
    "Node 2": {
        "hostname": "192.168.0.100",
        "port": 5702,
        "webPort": "8182",
        "alias": "Data-Node-B",
        "id": "192.168.0.100:5702"
    }
}
Edit 1 (end):
I am trying to find a way to execute specific commands on a particular node. For example, I want to execute a command on Node *:5702 from *:5701 such that *:5702 returns the properties and values of a local configuration file.
My current method is not optimal, as I am setting the alias (the web endpoint for Jolokia) of each node manually, and based on that I am retrieving my desired info via my custom MBean. I guess this is not the best practice.
So far, I have:
Set<Node> nodes = clusterManager.listNodes();
Thus, if I loop through this set of nodes, I would like to retrieve config settings from the local configuration file of every node, based on the node ID.
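For reference, here is a minimal sketch (mine, not part of the original post) of what that loop yields today, using only the basic Node accessors (id, host, port). The web port is not available from Cellar itself and would still have to be fetched from each node, e.g. via the Jolokia endpoint already in use:
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.karaf.cellar.core.ClusterManager;
import org.apache.karaf.cellar.core.Node;

public class NodeInfoCollector {

    private final ClusterManager clusterManager;

    public NodeInfoCollector(ClusterManager clusterManager) {
        this.clusterManager = clusterManager;
    }

    // Collects the basic info Cellar exposes for every node in the cluster.
    public Map<String, Map<String, Object>> listBasicNodeInfo() {
        Map<String, Map<String, Object>> result = new HashMap<>();
        Set<Node> nodes = clusterManager.listNodes();
        for (Node node : nodes) {
            Map<String, Object> info = new HashMap<>();
            info.put("id", node.getId());
            info.put("hostname", node.getHost());
            info.put("port", node.getPort());
            // "webPort" (org.osgi.service.http.port) is not exposed by Cellar;
            // it has to be read from the remote node, e.g. via Jolokia.
            result.put(node.getId(), info);
        }
        return result;
    }
}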
Do I need to implement something specific to dosgi here?
Or would it be something similar to the sample code of ping-pong (https://github.com/apache/karaf-cellar/tree/master/utils/src/main/java/org/apache/karaf/cellar/utils/ping) from apache-cellar project?
Any input on this would be very helpful.
P.S. I tried posting this on the Karaf mailing list, but my posts are getting bounced.
Regards,
Cooshal.

How does multiple shards per module support work in ODL Nitrogen

The module-shards.conf file mentions the following:
For Helium we support only one shard per module. Beyond Helium we will support more than 1
I have ODL Nitrogen and am trying to use module-shards.conf like the following:
module-shards = [
    {
        name = "default"
        shards = [
            {
                name = "default1"
                replicas = [
                    "member-1"
                ]
            },
            {
                name = "default2"
                replicas = [
                    "member-1"
                ]
            }
        ]
    }
]
That seems to be working, as Karaf shows the two shards successfully created for the same module "default". But how the data is distributed among the two shards is not clear. Can the application control/decide which data goes in which of the module shards?
I am not clear on how the application config tree data and the operational tree data for a given module can be stored under different shards for the same module.
If there are multiple shards for the same module, can the application decide/control which shard to use for storing a particular type of data? For example, can the application decide to store the config tree in the "default1" shard and the operational tree in the "default2" shard for the same module "default"?
Is it possible to disable the operational tree component for one of the shards of a given module, say disable the operational tree component for shard "default2" of module "default" with two shards, namely "default1" and "default2"?
While it says that for Helium only one shard per module is supported, that is still the case: support for more than one was never implemented, as there hasn't been a use case for it. The shards are created, but transactions are only routed to one of them (i.e. it picks the first one that is found).

How does the TaskTracker get the data for a map task from another node if the data is not local?

How does the TaskTracker get its data for a map task from another node in case the data is not local?
Does it talk directly to the DataNode of the machine containing the data, or does it talk to its own DataNode, which in turn talks to the other one?
Thanks,
Suresh.
The TaskTracker itself doesn't get the data; it launches (or reuses) a JVM to run a map task. The map task uses the DFS file system client to query the NameNode for the block locations of the file it is to process. The client then connects to the DataNode where one of the blocks is replicated to actually acquire the file contents (as a stream).
If you want to delve deeper, the source is an excellent place to get a good understanding: check out DFSClient and its inner class DFSInputStream (especially the bestNode method):
http://svn.apache.org/viewvc/hadoop/common/tags/release-0.20.2/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?view=markup
Class starts around line 1443
openInfo() method # line 1494
chooseDataNode() method # 1800
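A small sketch (not from the original answer) of the same NameNode lookup, done through the public FileSystem API rather than DFSClient directly; the input path is a placeholder:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; any file in HDFS works.
        FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));

        // Asks the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("block at offset " + block.getOffset()
                    + " is on " + String.join(", ", block.getHosts()));
        }
    }
}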
