Where to find the $PEERID used in the Chainlink Offchain Reporting Jobs .toml file (P2P BootstrapPeers = [...])

Where can the $PEERID be found that is used in the Chainlink Offchain Reporting Jobs .toml file under P2P BootstrapPeers = [...]?
This is the link to Offchain Reporting Jobs in Chainlink:
https://docs.chain.link/docs/jobs/types/offchain-reporting/
Question concerning the OCR toml job specification:
The syntax of p2pBootstrapPeer is the following:
/dns4/$DNS/tcp/$PORT/p2p/$PEERID
Where in my specification can I find the $PEERID?
In the Chainlink node operator GUI? If yes, where exactly?
If not, where else?

There are a couple of ways to set this:
You can set it manually via an environment variable called P2P_PEER_ID. If you don't override it in the job spec, this value will be used by default.
As per the docs you linked, you can set the p2pPeerID value in your job spec, which will become the $PEERID, e.g.
p2pPeerID = "12D3KooWPjceQrSwdWXPyLLeABRXmuqt69Rg3sBYbU1Nft9HyQ6X"

Related

Form Recognizer Heavy Workload

My use case is the following:
Once a day I upload 1000 single-page PDFs to Azure Storage and process them with Form Recognizer via the latest Python azure-form-recognizer client.
So far I'm using the async version of the client and I run the 1000 coroutines concurrently.
import asyncio

tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
pending = set(tasks)
# Handle retries
while pending:
    # Back off in case of 429 (await so the event loop is not blocked)
    await asyncio.sleep(1)
    # Wait until all concurrent calls have completed
    finished, pending = await asyncio.wait(
        pending, return_when=asyncio.ALL_COMPLETED
    )
    # Check whether a task raised an exception and re-register its document for a new run
    for task in finished:
        doc = tasks[task]
        if task.exception():
            new_task = asyncio.create_task(analyze_async(doc))
            tasks[new_task] = doc
            pending.add(new_task)
Now I'm not really comfortable with this setup. The main reason is the unpredictable succession of service states within the same iteration: it can be up, then throw 429, then be up again, which is not deterministic enough for me. I was wondering if another approach is possible. Do you think I should instead increase the number of transactions progressively, starting with 15 (the default TPS), then 50, then 100, until the queue is empty? Or is there another option?
Thx
We need to enable CORS and adjust its settings so that the heavy workload can be accessed.
Follow this procedure to handle the heavy workload in Form Recognizer:
Create a storage account to upload the files.
Use page blobs for higher performance.
Redundancy is also required; use ZRS (zone-redundant storage) for a better implementation.
Go to CORS and add the required URL.
Set the allowed origins to https://formrecognizer.appliedai.azure.com
Go to the containers and upload the documents.
Use the container and blob information as the input for the recognizer. If you work from Form Recognizer Studio, the total size of the documents is taken into account and there is also a character limit, so it is suggested to use the Python client with the created container as the input folder.

Following the Chainlink fulfilling-requests doc doesn't work

We set up a Chainlink node on GCP following https://docs.chain.link/docs/fulfilling-requests/. The job is not fulfilled and the currentPrice property always returns 0.
Oracle: https://remix.ethereum.org/#url=https://docs.chain.link/samples/NodeOperators/Oracle.sol&optimize=false&runs=200&evmVersion=null&version=soljson-v0.6.6+commit.6c089d02.js
Consumer: https://remix.ethereum.org/#url=https://docs.chain.link/samples/APIRequests/ATestnetConsumer.sol&optimize=false&runs=200&evmVersion=null&version=soljson-v0.8.7+commit.e28d00a7.js
Thanks
Generally, when we create a new job on the node, it assigns an externalJobId to the job, e.g. '07269a61-586c-4831-bc26-7d2b581fbd1c'. But when you call the function in your consumer contract, you have to provide this externalJobId without the dashes, i.e. '07269a61586c4831bc267d2b581fbd1c'.
This is not mentioned in the Chainlink docs in a noticeable way:
https://docs.chain.link/docs/fulfilling-requests/

How to get Prometheus Node Exporter metrics in JSON format

I deployed a Prometheus Node Exporter pod on Kubernetes and it works fine.
But when I try to get system metrics by calling the Node Exporter metrics API from my custom Go application:
curl -X GET "http://[my Host]:9100/metrics"
the result format is like this:
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.7636e-05
go_gc_duration_seconds{quantile="0.25"} 2.466e-05
go_gc_duration_seconds{quantile="0.5"} 5.7992e-05
go_gc_duration_seconds{quantile="0.75"} 9.1109e-05
go_gc_duration_seconds{quantile="1"} 0.004852894
go_gc_duration_seconds_sum 1.291217651
go_gc_duration_seconds_count 11338
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.12.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.577128e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.0073577064e+10
... (and so on)
Those long text results are hard to parse, and I want to get them in JSON format so I can parse them easily.
https://github.com/prometheus/node_exporter/issues/1062
I checked Prometheus Node Exporter GitHub Issues and someone recommended prom2json.
But this is not what I'm looking for, because I would have to run an extra process to execute prom2json to get the results. I want to get Node Exporter's system metrics by simply making an HTTP request or using some kind of native Go package in my code.
How can I get those Node Exporter metrics in JSON format?
You already mentioned prom2json and you can pull the package into your Go file by importing github.com/prometheus/prom2json.
The sample executable in the repo has all the building blocks you need: first open the URL, then use the prom2json package to read the data and store the result.
However, you should also have a look at expfmt.TextParser as that is the native way to ingest Prometheus formatted metrics.

Hive execution hook

I need to hook a custom execution hook into Apache Hive. Please let me know if somebody knows how to do it.
The current environment I am using is given below:
Hadoop: Cloudera version 4.1.2
Operating system: CentOS
Thanks,
Arun
There are several types of hooks, depending on the stage at which you want to inject your custom code:
Driver run hooks (Pre/Post)
Semantic analyzer hooks (Pre/Post)
Execution hooks (Pre/Failure/Post)
Client statistics publisher
If you run a script, the processing flow looks as follows:
1. Driver.run() takes the command
2. HiveDriverRunHook.preDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
3. Driver.compile() starts processing the command: it creates the abstract syntax tree
4. AbstractSemanticAnalyzerHook.preAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
5. Semantic analysis
6. AbstractSemanticAnalyzerHook.postAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
7. Create and validate the query plan (physical plan)
8. Driver.execute(): ready to run the jobs
9. ExecuteWithHookContext.run() (HiveConf.ConfVars.PREEXECHOOKS)
10. ExecDriver.execute() runs all the jobs
11. For each job, at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval, ClientStatsPublisher.run() is called to publish statistics (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
12. If a task fails: ExecuteWithHookContext.run() (HiveConf.ConfVars.ONFAILUREHOOKS)
13. Finish all the tasks
14. ExecuteWithHookContext.run() (HiveConf.ConfVars.POSTEXECHOOKS)
15. Before returning the result: HiveDriverRunHook.postDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
16. Return the result.
For each of the hooks I indicated the interface you have to implement. In brackets is the corresponding configuration property key you have to set in order to register the class at the beginning of the script.
E.g. setting the pre-execution hook (9th stage of the workflow):
HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks:
set hive.exec.pre.hooks=com.example.MyPreHook;
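For illustration, a minimal sketch of what such a pre-execution hook class could look like. This assumes the Hive 0.11-era ExecuteWithHookContext interface; the class name com.example.MyPreHook simply matches the name used in the registration line above and is hypothetical:

package com.example;

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Hypothetical pre-execution hook: runs right before the jobs are submitted
// (the PREEXECHOOKS stage, step 9 above).
public class MyPreHook implements ExecuteWithHookContext {

    @Override
    public void run(HookContext hookContext) throws Exception {
        // The HookContext carries the query plan, configuration and inputs/outputs;
        // here we only log that the hook fired.
        System.out.println("MyPreHook invoked before job execution");
    }
}

The compiled class (packaged as a jar) has to be on Hive's classpath, e.g. via the auxiliary jars path, for the registration above to take effect.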
Unfortunately these features aren't really documented, but you can always look into the Driver class to see the evaluation order of the hooks.
Remark: I assumed Hive 0.11.0 here; I don't think that the Cloudera distribution differs (too much).
A good start: http://dharmeshkakadia.github.io/hive-hook/ (there are examples there).
Note: the Hive CLI shows the messages on the console; if you execute from Hue, add a logger and you can see the results in the HiveServer2 role log.

Run @Scheduled task only on one WebLogic cluster node?

We are running a Spring 3.0.x web application (.war) with a nightly @Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night and thus runs multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs inside a clustered environment by means of a database lock table, or I could even implement something like this myself. But since this seems to be a fairly common scenario, I wonder if Spring already comes with an option to easily circumvent this problem without having to add new libraries to my project or putting in manual workarounds.
We are not able to upgrade to Spring 3.1 with configuration profiles, as mentioned here
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task that sends a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node corresponds to a configured system property.
private boolean isTriggerNode() {
    String triggerHostname = System.getProperty("trigger.hostname");
    try {
        String hostName = InetAddress.getLocalHost().getHostName();
        return hostName.equals(triggerHostname);
    } catch (UnknownHostException e) {
        // Cannot resolve the local hostname; treat this node as non-trigger.
        return false;
    }
}

public void execute() {
    if (isTriggerNode()) {
        // send email
    }
}
We are implementing our own synchronization logic using a shared lock table inside the application database. This allows all cluster nodes to check whether a job is already running before actually starting it.
Be careful: with this kind of shared lock table you always have the concurrency issue of two cluster nodes reading/writing the table at the same time.
It is best to perform the following steps in one DB transaction (a sketch follows below):
- read the value in the shared lock table
- if no other node is holding the lock, take the lock
- update the table to indicate that you have taken the lock
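As an illustration only, here is a minimal JDBC sketch of doing those three steps in one transaction. The job_lock table, its columns, and the use of SELECT ... FOR UPDATE are assumptions made for the example, not the actual schema used above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JobLockDao {

    // Returns true if this node obtained the lock for the given job.
    public boolean acquireLock(Connection con, String jobName, String nodeName) throws SQLException {
        con.setAutoCommit(false);
        try {
            // 1. Read the lock row and lock it for the duration of this transaction.
            PreparedStatement select = con.prepareStatement(
                    "SELECT locked_by FROM job_lock WHERE job_name = ? FOR UPDATE");
            select.setString(1, jobName);
            ResultSet rs = select.executeQuery();
            String lockedBy = rs.next() ? rs.getString("locked_by") : null;
            rs.close();
            select.close();

            // 2. If another node is already holding the lock, give up.
            if (lockedBy != null) {
                con.rollback();
                return false;
            }

            // 3. Update the table to indicate that this node has taken the lock.
            PreparedStatement update = con.prepareStatement(
                    "UPDATE job_lock SET locked_by = ? WHERE job_name = ?");
            update.setString(1, nodeName);
            update.setString(2, jobName);
            update.executeUpdate();
            update.close();

            con.commit();
            return true;
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}

The node that gets true runs the job and clears locked_by afterwards in a second transaction; all other nodes skip the run.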
I solved this problem by making one of the boxes the master.
Basically, set an environment variable on one of the boxes, like master=true,
and read it in your Java code through System.getenv("master").
If it's present and true, then run your code.
Basic snippet:
@Scheduled  // configure cron/fixedDelay as needed
void process() {
    boolean master = Boolean.parseBoolean(System.getenv("master"));
    if (master) {
        // your logic
    }
}
You can try using the TimerManager (job scheduler in a clustered environment) from WebLogic as the TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
#Scheduled(cron = "59 59 8 * * *" /* Every day at 8:59:59am */)
#TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
List<Email> emails = emailDAO.getEmails();
emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't need to synchronize your job start using a DB.
On a WebLogic application you can get the instance name where the application is running:
String serverName = System.getProperty("weblogic.Name");
Simply put a condition to execute the job:
if (serverName.equals(".....")) {
    // execute my job
}
If you want to bounce your job from one machine to the other, you can get the current day of the year: if it is odd you execute the job on one machine, and if it is even you execute it on the other.
This way you load a different machine every day.
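Purely as an illustration of that approach, here is a small sketch; the server names "node1" and "node2" are placeholders for your actual WebLogic instance names:

import java.util.Calendar;

public class AlternatingTrigger {

    // Returns true when this node should run the job today,
    // alternating between two nodes by day-of-year parity.
    public boolean isMyTurn() {
        String serverName = System.getProperty("weblogic.Name");
        int dayOfYear = Calendar.getInstance().get(Calendar.DAY_OF_YEAR);
        boolean oddDay = dayOfYear % 2 == 1;
        // Odd days go to "node1", even days to "node2" (placeholder names).
        return (oddDay && "node1".equals(serverName))
                || (!oddDay && "node2".equals(serverName));
    }
}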
We can make the other machines in the cluster not run the batch job by using the following cron string. It will not run until 2099.
0 0 0 1 1 ? 2099

Resources