How do you provision a new node on the cloud with Puppet 2? - amazon-ec2

Every node needs a valid certificate to connect to the Puppet master.
In the past, nodes were static: they were set up by an operator, and the certificate signing was done by hand.
Today, nodes are created dynamically through the APIs of cloud providers such as AWS, Azure, etc.
Provisioning a new node is a problem because we don't know the node's domain name or IP address before it is running and has an address attached.
I found some strategies on the Internet:
Pre-sign a certificate and ship it with the newly created node
Just enable the autosign feature anyway
Leverage the external node classifier
The pre-sign method works like this:
https://gist.github.com/zipkid/3496753
The work involved is:
Generate a certificate for your node
Put the certificate files on the new node
Perhaps you also want to change the hostname for the external node classifier.
This works well with a prebuilt image like an EC2 AMI:
we can bake the pre-signed certificate into the image.
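A rough sketch of that workflow, assuming a Puppet 3-style puppet cert subcommand on the CA master and the default ssldir layout (the node name is a placeholder; Puppet 2 ships the older puppetca --generate equivalent):
# On the puppet master (the CA): generate and sign a certificate for the future node
puppet cert generate node01.ec2.internal
# Ship these files with the node, or bake them into the image:
#   /var/lib/puppet/ssl/certs/node01.ec2.internal.pem
#   /var/lib/puppet/ssl/private_keys/node01.ec2.internal.pem
#   /var/lib/puppet/ssl/certs/ca.pem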
For the autosign method, just edit the autosign rules.
I think it is acceptable for a special case like an EC2 VPC, where none of the nodes can be reached from the Internet.
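A minimal sketch of that, assuming a VPC-internal naming scheme (the globs are placeholders):
# /etc/puppet/puppet.conf on the master
[master]
autosign = /etc/puppet/autosign.conf
# /etc/puppet/autosign.conf -- one certname glob per line
*.ec2.internal
*.compute.internal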
For the external node classifier method, there are many variations (a minimal sketch follows this list):
Validate the node; if it is valid, include the useful classes, otherwise a noop class
Validate the node; if it is valid, add a new entry to the autosign rules
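A minimal ENC sketch along those lines, wired up with node_terminus = exec and external_nodes = /usr/local/bin/puppet-enc in the master's puppet.conf (the validation helper and class names are made up):
#!/bin/sh
# Puppet passes the node's certname as the first argument and expects YAML on stdout.
if /usr/local/bin/is-valid-node "$1"; then
  printf 'classes:\n  - base\n  - app\n'
else
  printf 'classes:\n  - noop\n'
fi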
How do you provision a cloud node with Puppet's CA? Would you like to share your approach?
PS: Puppet 3 has a new feature, policy-based autosigning (a small sketch follows).
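As a sketch of how that feature works: point autosign in the master's puppet.conf at an executable instead of a file; Puppet runs it with the certname as the first argument and the PEM-encoded CSR on stdin, and signs the request when the script exits 0 (the path and the glob are placeholders):
# /etc/puppet/puppet.conf
[master]
autosign = /usr/local/bin/check-csr
# /usr/local/bin/check-csr
#!/bin/sh
# $1 is the certname; the CSR arrives on stdin and is ignored in this sketch.
case "$1" in
  *.ec2.internal) exit 0 ;;
  *)              exit 1 ;;
esac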


Dataflow PubSub to Elasticsearch Template proxy

We need to create a Dataflow job that ingests from PubSub to Elasticsearch, but the job cannot make outbound internet connections to reach Elastic Cloud.
Is there a way to pass proxy parameters to the Dataflow VM at creation time?
I found this article, but the proxy parameters are part of a Maven app and I'm not sure how to use them here.
https://leifengblog.net/blog/run-dataflow-jobs-in-a-shared-vpc-on-gcp/
Thanks
To reach an external endpoint you'll need to configure internet access and firewall settings. Depending on your use case, your VMs may also need access to other resources; you can check in this document which method you'll need to configure for Dataflow. Before selecting a method, please check the document on how to specify a network or a subnetwork.
In GCP you can enable Private Google Access on a subnetwork, and the VMs in that subnetwork will be able to reach the GCP endpoints (Dataflow, BigQuery, etc.) even if they only have private IPs. There is no need to set up a proxy. See this document.
For instance, for Java pipelines I normally use private IPs only for the Dataflow workers, and they are able to reach Pub/Sub, BigQuery, Bigtable, etc.
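As a hedged illustration, the relevant Dataflow pipeline options for that setup look roughly like this (project, region and subnetwork names are placeholders):
--runner=DataflowRunner \
--project=my-project \
--region=us-central1 \
--usePublicIps=false \
--subnetwork=regions/us-central1/subnetworks/my-private-subnet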
For Python pipelines, if you have external dependencies, the workers will need to reach PyPI, and for that you need Internet connectivity. If you want to use private IPs with Python pipelines, you can ship those external dependencies in a custom container so the workers don't need to download them.
You can use Maven right after you write your pipeline: you must create and stage your template (with mvn); you can follow this example.
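A hedged sketch of that create-and-stage step (the main class, bucket and template paths are placeholders; the networking flags shown earlier can be appended to exec.args as well):
mvn compile exec:java \
  -Dexec.mainClass=com.example.PubSubToElasticsearchPipeline \
  -Dexec.args="--runner=DataflowRunner \
               --project=my-project \
               --region=us-central1 \
               --stagingLocation=gs://my-bucket/staging \
               --templateLocation=gs://my-bucket/templates/pubsub-to-elasticsearch"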

Best practice to set up an OPNsense HA environment

If I would like to set up an OPNsense HA cluster of two nodes, what is the best practice for setting up such an environment?
My preferred approach would be:
set up the first node's IP
set up the physical interfaces
set up the link aggregation(s)
set up the VLANs
set up the needed services
Now it is unclear to me (and also from the documentation) whether I can set up CARP (HA) with the second node and whether all these settings will be synchronized to it automatically.
Or do I need to redo all the configuration on the second node and set up CARP afterwards?
If the latter is the case and I need to set up some things redundantly on the second node:
which things need to be done manually?
is there any way to manually export these settings from the first/master node and re-import them on the second node?
There is a sync button to force syncing everything that is selected in System : HA : Configuration, so it doesn't matter whether you set up services before or after activating HA.
Please note that HA (XMLRPC sync) and CARP are not the same. XMLRPC only syncs the configuration; CARP is only a protocol to switch IP addresses between nodes, but it uses the HA link to exchange states.
I myself also use just HA sync for a customer, to replicate the configuration to a passive standby node in a different DC.

How to configure chef to re-bootstrap on instance termination?

I can't seem to find any documentation anywhere on how this is possible.
If I knife bootstrap a new AWS instance, and then a few weeks later that machine goes offline, I would like my Chef server to detect this, and bootstrap a new machine.
I understand how to make chef create a new AWS instance, and bootstrap the instance.
I do not know how to make chef detect that a previously deployed box is no longer available.
I can use the chef API to search for existing nodes. But I do not know how to check that those nodes are still accessible over network, or how to run this check regularly.
Am I missing something simple? Most resources I have found on this issue seem to assume it doesn't need to be discussed, as if it were self-evident.
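For the detection half, one hedged building block: the Chef server already records each node's last check-in, and knife can surface it (the one-hour threshold is knife's default notion of "healthy"; any cron wrapper around this is your own):
# Nodes whose last chef-client run is older than about an hour
knife status --hide-healthy
# Or pull the raw last check-in timestamp (ohai_time, a Unix epoch) via search
knife search node "*:*" -a ohai_time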

How to create a Janusgraph instance within my program to access the custom graph

JanusGraph: I have created a custom graph using ConfiguredGraphFactory and I am able to access this graph using the Gremlin Console. How can I access this graph from my Scala code?
Currently I am running a remote Gremlin Server and connect to this remote server from my code to perform transactions on my custom graph. I am wondering whether there is any way to create a JanusGraph instance in my program and access the graph directly, rather than through the remote server.
JanusGraph version: 0.2.0
Currently, in TinkerPop compliant systems, you cannot instantiate a local graph reference that acts upon a remote graph reference. However, you can use withRemote to instantiate a local graph traversal object backed by a remote graph traversal reference:
gremlin> cluster = Cluster.open('conf/remote-objects.yaml')
==>localhost/127.0.0.1:8182
gremlin> graph = EmptyGraph.instance()
==>emptygraph[empty]
gremlin> g = graph.traversal().withRemote(DriverRemoteConnection.using(cluster, "g"))
==>graphtraversalsource[emptygraph[empty], standard]
gremlin> g.V().valueMap(true)
==>[name:[marko],id:1,label:person,age:[29]]
==>[name:[vadas],id:2,label:person,age:[27]]
==>[name:[lop],id:3,label:software,lang:[java]]
==>[name:[josh],id:4,label:person,age:[32]]
==>[name:[ripple],id:5,label:software,lang:[java]]
==>[name:[peter],id:6,label:person,age:[35]]
==>[name:[matthias],id:13,label:vertex]
gremlin> g.close()
gremlin> cluster.close()
However, since you are using the ConfiguredGraphFactory, I will assume your graphs are created dynamically. This means that your graph and traversal objects are not bound to any variable names on the remote servers, since those bindings are traditionally formed during the instantiation of the graphs defined in your gremlin-server.yaml graphs {} object. As you can see above, the only way to use withRemote is to supply the name of the variable bound to the graph traversal on the remote server. JanusGraph does not currently support dynamically updating the server's global bindings, but once it does, you will be able to use the withRemote method to get a local reference to a remote traversal object. If you need to work with local graph objects bound to remote graph references, you would need to work with the TinkerPop community to enable such functionality. For more information, please see this TinkerPop Jira.
Another solution is to remove the layer of your remote JanusGraph servers. If you are looking to run the graph processing locally, then perhaps you don't need remote JanusGraph servers at all. You could then instantiate graph references using the JanusGraphFactory and perform your queries on local graph references which talk directly to the backend datastores.
Finally, assuming you configured your remote JanusGraph servers and the backend data stores (i.e. you know how the remote JanusGraph servers are configured to talk to the backend datastores -- this would be the configurations saved on your ConfigurationManagementGraph), you could leave your remote JanusGraph servers alone, and instantiate local graph references by using the JanusGraphFactory to open a properties file with the same configuration as that for your graph as defined by the ConfigurationManagementGraph.
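A minimal Scala sketch of that last approach, assuming a properties file (my-graph.properties is a placeholder) that mirrors the backend configuration stored in your ConfigurationManagementGraph:
import org.janusgraph.core.JanusGraphFactory

object LocalGraphExample {
  def main(args: Array[String]): Unit = {
    // Open the graph directly against the backend datastore, bypassing Gremlin Server.
    val graph = JanusGraphFactory.open("conf/my-graph.properties")
    val g = graph.traversal()

    // Traversals now run in-process.
    val count = g.V().count().next()
    println(s"vertex count: $count")

    graph.close()
  }
}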

Consul server joining wrong cluster

I'm having some issues trying to create a cluster of consul servers when there is a previous cluster on the same network.
All these new servers use the same configuration except for the bind address, retry-join and datacenter settings. The certificates, certificate keys and encryption keys are all the same. I assumed this would not be an issue, since the hostnames are similar enough to use the same certificates.
Yet the new servers join the previous cluster instead of forming their own, which is not what is desired at all.
I'm starting to think that the certificates might have something to do with those servers joining the existing cluster instead of creating a new one, but I need confirmation. Help?
It turns out the solution was to stop and remove everything and redo everything from scratch...
The problem may be with the encryption keys. In your case you intend to have separate clusters on one network, but you use the same encryption key. Try using a unique encryption key for each cluster. Here is a similar discussion: link
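A hedged sketch of that fix (the key and addresses are placeholders): generate a distinct gossip key per cluster with consul keygen and set it via the encrypt option on the new cluster's servers:
# Run once per cluster and share the output only within that cluster
consul keygen
# Server config for the NEW cluster (HCL)
encrypt    = "pUqJrVyVRj5jsiYEkM/tFQ=="   # key unique to this cluster (placeholder)
retry_join = ["10.0.1.11", "10.0.1.12"]   # only this cluster's servers
datacenter = "dc2"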
