I'm struggling to work out what the output of beantools tail on a beanstalkd tube means exactly, specifically age, reserves and releases.
stat shows one job in this tube, but tail spits out thousands of these with the same job id:
id: 1, length: 184, priority: 1024, delay: 0, age: 45, ttr: 60
reserves: 101414, releases: 101413, buries: 0, kicks: 0, timeouts: 0
body:{snip}
age - how long ago the job was created, in seconds
reserves - the number of times this job has been reserved by a worker
releases - the number of times this job has been released back into the ready queue after being reserved
The huge number of reserves on the same job ID was caused by the worker process breaking on a timeout without the error being caught - the reservation's TTR expired, beanstalkd put the job back in the ready queue, and the worker reserved it again, in a loop.
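For reference, a minimal sketch of a worker loop that avoids this reserve/release spiral, assuming the greenstalk Python client and a hypothetical handle_job() function; the point is to catch the failure and bury (or delete) the job instead of letting the TTR expire over and over:

import greenstalk

def run_worker():
    client = greenstalk.Client(('127.0.0.1', 11300))
    while True:
        job = client.reserve()
        try:
            handle_job(job.body)   # your processing logic (hypothetical)
        except Exception:
            client.bury(job)       # park the job for inspection instead of retrying forever
        else:
            client.delete(job)     # done: remove the job so the reserves count stops climbing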
This is a code snippet that runs KMeans on the GPU using PyCave.
Documentation link: https://pycave.borchero.com/sites/generated/clustering/kmeans/pycave.clustering.KMeans.html
import pandas as pd
import torch
from pycave.clustering import KMeans

X = torch.cat([
    torch.randn(1000, 6) - 5,
    torch.randn(1000, 6),
    torch.randn(1000, 6) + 5,
])

estimator = KMeans(
    num_clusters=3,
    trainer_params=dict(gpus=1, enable_progress_bar=0, max_epochs=100),
)
labels = estimator.fit_predict(X).numpy()
pd.value_counts(labels)
The issue is how to disable the console output from the estimator.
Current Output:
Running initialization...
{'batch_size': 3000, 'collate_fn': <function collate_tensor at 0x000002BE21221700>}
Fitting K-Means...
{'batch_size': 3000, 'collate_fn': <function collate_tensor at 0x000002BE21221700>}
{'batch_size': 1, 'sampler': None, 'batch_sampler': <pytorch_lightning.overrides.distributed.IndexBatchSamplerWrapper object at 0x000002BE593A55B0>, 'collate_fn': <function collate_tensor at 0x000002BE21221700>, 'shuffle': False, 'drop_last': False}
0 1000
2 1000
1 1000
dtype: int64
Expected Output:
0 1000
2 1000
1 1000
dtype: int64
Info regarding trainer_params parameter
(Optional[Dict[str, Any]]) --
Initialization parameters to use when initializing a PyTorch Lightning trainer. By default, it disables various stdout logs unless PyCave is configured to do verbose logging. Checkpointing and logging are disabled regardless of the log level.
The dictionaries that are printed should never be there; that's a bug in a dependency and is resolved in the latest build.
As far as the PyCave logs are concerned (Running initialization... and Fitting K-Means...), you can turn them off easily by adding the following:
import logging
from pycave import set_logging_level
set_logging_level(logging.WARNING)
Note that set_logging_level(logging.WARNING) also turns off the progress bar and the model summary automatically so you don't have to set these flags explicitly.
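Putting it together, a minimal sketch of a quiet run (assuming the same setup as in the question, with gpus=1 as before):

import logging
import torch
from pycave import set_logging_level
from pycave.clustering import KMeans

set_logging_level(logging.WARNING)   # also hides the progress bar and model summary

X = torch.randn(3000, 6)
estimator = KMeans(num_clusters=3, trainer_params=dict(gpus=1, max_epochs=100))
labels = estimator.fit_predict(X).numpy()   # no initialization/fitting logs on the console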
I'm working on sending Kibana email alerts using ElastAlert. I've done all the setup and Postfix is working fine, but I'm getting no hits and no alerts. The following are my config.yaml and frequency.yaml:
frequency.yaml
# Rule name, must be unique
name: Test email alerts
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe time
type: frequency
# (Required)
# Index to search, wildcard supported
index: index-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 1
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  minutes: 1
# (Required)
# A list of Elasticsearch filters used for finding events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
- term:
    log: "Performed Task"
# (Required)
# The alert to use when a match is found
alert:
- "email"
# (required, email specific)
# a list of email addresses to send alerts to
email:
- "abc#gmail.com"
config.yaml
# Any .yaml file will be loaded as a rule
rules_folder: example_rules
# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  minutes: 1
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 15
# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: *host*
# The Elasticsearch port
es_port: 9200
es_username: username
es_password: password
Output of elastalert-test-rule rules_folder/frequency.yaml
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.4) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
Didn't get any results.
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
1 rules loaded
INFO:apscheduler.scheduler:Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO:elastalert:Queried rule Test email alerts from 2021-07-17 23:21 UTC to 2021-07-17 23:22 UTC: 0 / 0 hits
Would have written the following documents to writeback index (default is elastalert_status):
elastalert_status - {'rule_name': 'Test email alerts', 'endtime': datetime.datetime(2021, 7, 17, 23, 22, 7, 154742, tzinfo=tzutc()), 'starttime': datetime.datetime(2021, 7, 17, 23, 21, 6, 554742, tzinfo=tzutc()), 'matches': 0, 'hits': 0, '#timestamp': datetime.datetime(2021, 7, 17, 23, 22, 7, 183348, tzinfo=tzutc()), 'time_taken': 0.008371829986572266}
Can anyone please help me figure out why I'm getting no hits?
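One way to narrow this down is to run the rule's filter directly against Elasticsearch for the same time window and see whether any documents match at all. Below is a minimal diagnostic sketch using the requests library; the host, credentials, index pattern and field name are placeholders mirroring the config above. Note that a term filter is not analyzed, so against an analyzed text field a multi-word value like "Performed Task" will usually not match; if hits only appear with a match query or against a keyword sub-field such as log.keyword, that points at the filter.

import requests

query = {
    "size": 1,
    "query": {
        "bool": {
            "filter": [
                {"term": {"log": "Performed Task"}},
                {"range": {"@timestamp": {"gte": "now-15m"}}},
            ]
        }
    },
}
resp = requests.get(
    "http://<es_host>:9200/index-*/_search",   # placeholder host and index pattern
    json=query,
    auth=("username", "password"),
)
print(resp.json()["hits"]["total"])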
I have a spout which reads from a source with 40K qps.
I have two bolts. The first one reads from the spout and opens a database connection to build a cache that refreshes every hour. The database allows 2 open connections for a user, so the executor count I have for this bolt is 2.
The other bolt is assigned 200 executors and 200 tasks to process the requests.
I can't increase the number of database connections. I see that all the requests are going to a single worker; the other workers keep waiting and print "0 send message".
kafkaSpoutConfigList:
  - executorsCount: 30
    taskCount: 30
    spoutName: 'kafka_consumer_spout'
    topicName: 'request'
processingBoltConfigList:
  - executorsCount: 2
    taskCount: 2
    boltName: 'db_bolt'
    boltClassName: 'com.Bolt1Class'
    boltSourceList:
      - 'kafka_consumer_spout'
  - executorsCount: 200
    taskCount: 200
    boltName: 'bolt2'
    boltClassName: 'com.Bolt2Class'
    boltSourceList:
      - 'db_bolt::streamx'
kafkaBoltConfigList:
  - executorsCount: 15
    taskCount: 15
    boltName: 'kafka_producer_bolt'
    topicName: 'consumer_topic'
    boltSourceList:
      - 'bolt2::Stream1'
  - executorsCount: 15
    taskCount: 15
    boltName: 'kafka_producer_bolt'
    topicName: 'data_test'
    boltSourceList:
      - 'bolt2::Stream2'
I am using localOrShuffle grouping.
When you use LocalOrShuffleGrouping, the following happens:
If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping
So let's say your workers look like this:
worker1: {"bolt1 task 1", "bolt2 tasks 0-49"}
worker2: {"bolt1 task 2", "bolt2 tasks 50-99"}
worker3: {"bolt2 tasks 100-149"}
worker4: {"bolt2 tasks 150-199"}
In this case, because you're telling Storm to use a local grouping when sending from bolt1 to bolt2, all the tuples will go to workers 1 and 2. Workers 3 and 4 will be idle.
If you also want to send tuples to workers 3 and 4, you need to switch to shuffle grouping.
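To make the routing concrete, here is a small illustrative sketch (plain Python, not the Storm API) of how localOrShuffle picks a target task; the task-to-worker layout mirrors the example above:

import random

# bolt2 task -> worker assignment mirroring the example layout (illustrative only)
bolt2_task_worker = {}
bolt2_task_worker.update({t: "worker1" for t in range(0, 50)})
bolt2_task_worker.update({t: "worker2" for t in range(50, 100)})
bolt2_task_worker.update({t: "worker3" for t in range(100, 150)})
bolt2_task_worker.update({t: "worker4" for t in range(150, 200)})

def local_or_shuffle(sender_worker):
    # Prefer bolt2 tasks in the sender's own worker; shuffle across all tasks only if none exist there.
    local = [t for t, w in bolt2_task_worker.items() if w == sender_worker]
    return random.choice(local or list(bolt2_task_worker))

# bolt1 (db_bolt) tasks live only in worker1 and worker2, so every tuple stays there:
print({bolt2_task_worker[local_or_shuffle("worker1")] for _ in range(1000)})   # {'worker1'}
print({bolt2_task_worker[local_or_shuffle("worker2")] for _ in range(1000)})   # {'worker2'}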
[After a few answers and comments I asked a new question based on the knowledge gained here: Out of memory in Hive/tez with LATERAL VIEW json_tuple ]
One of my queries consistently fails with the error:
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1516602562532_3606_2_03, diagnostics=[Task failed, taskId=task_1516602562532_3606_2_03_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_e113_1516602562532_3606_01_000008 finished with diagnostics set to [Container failed, exitCode=255. Exception from container-launch.
Container id: container_e113_1516602562532_3606_01_000008
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
]], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
The keyword here seems to be java.lang.OutOfMemoryError: Java heap space.
I looked around but none of what I thought I understood from Tez helps me:
yarn-site/yarn.nodemanager.resource.memory-mb is maxed out => I use all the memory I can
yarn-site/yarn.scheduler.maximum-allocation-mb: same as yarn.nodemanager.resource.memory-mb
yarn-site/yarn.scheduler.minimum-allocation-mb = 1024
hive-site/hive.tez.container.size = 4096 (multiple of yarn.scheduler.minimum-allocation-mb)
My query has 4 mappers; 3 go very fast, the 4th dies every time. Here is the Tez graphical view of the query:
From this image:
table contact has 150M rows, 283GB of ORC compressed data (there is one large json field, LATERAL VIEW'ed)
table m has 1M rows, 20MB of ORC compressed data
table c has 2k rows, < 1MB ORC compressed
table e has 800k rows, 7GB of ORC compressed
e is LEFT JOIN'ed with all the other tables
e and contact are partitioned and only one partition is selected in the WHERE clause.
I thus tried to increase the number of maps:
tez.grouping.max-size: 650MB by default; even if I lower it to tez.grouping.min-size (16MB) it makes no difference
tez.grouping.split-count even increased to 1000 does not make a difference
tez.grouping.split-wave 1.7 by default, even increased to 5 makes no difference
If it's relevant, here are some other memory settings:
mapred-site/mapreduce.map.memory.mb = 1024 (Min container size)
mapred-site/mapreduce.reduce.memory.mb = 2048 (2 * min container size)
mapred-site/mapreduce.map.java.opts = 819 (0.8 * min container size)
mapred-site/mapreduce.reduce.java.opts = 1638 (0.8 * mapreduce.reduce.memory.mb)
mapred-site/yarn.app.mapreduce.am.resource.mb = 2048 (2 * min container size)
mapred-site/yarn.app.mapreduce.am.command-opts = 1638 (0.8 * yarn.app.mapreduce.am.resource.mb)
mapred-site/mapreduce.task.io.sort.mb = 409 (0.4 * min container size)
My understanding was that Tez can split the work into many smaller loads, taking longer but eventually completing. Am I wrong, or is there a way I have not found?
Context: HDP 2.6, 8 datanodes with 32GB RAM each, query using a chunky LATERAL VIEW based on JSON, run via beeline.
The issue is clearly due to skewed data. I would recommend that you add DISTRIBUTE BY COL to your SELECT query from the source so that the reducers receive evenly distributed data. In the example below, COL3 is a more evenly distributed column, like an ID column.
Example
ORIGINAL QUERY: INSERT OVERWRITE TABLE X SELECT COL1, COL2, COL3 FROM Y
NEW QUERY: INSERT OVERWRITE TABLE X SELECT COL1, COL2, COL3 FROM Y DISTRIBUTE BY COL3
I had the same issue and increasing all the memory parameters didn't help.
Then I switched to MR and got the below error.
Failed with exception Number of dynamic partitions created is 2795, which is more than 1000.
After setting a higher value for the dynamic partition limit (hive.exec.max.dynamic.partitions, whose default of 1000 is the limit in that error), I went back to Tez and the problem was solved.
The following line in /etc/bashrc_Apple_Terminal
shell_session_history_enable() {
    (umask 077; touch "$SHELL_SESSION_HISTFILE_NEW")    # <<< THIS LINE
    HISTFILE="$SHELL_SESSION_HISTFILE_NEW"
    SHELL_SESSION_HISTORY=1
}
is printing something like this on every new session.
/Users/me/.bash_sessions/717F6632-A946-44EE-8A27-2547EDDD09E9.historynew Stats {
dev: 16777220,
mode: 33152,
nlink: 1,
uid: 501,
gid: 20,
rdev: 0,
blksize: 4096,
ino: 1406878,
size: 0,
blocks: 0,
atimeMs: 1502801769000,
mtimeMs: 1502801769000,
ctimeMs: 1502801769000,
birthtimeMs: 1502801769000,
atime: 2017-08-15T12:56:09.000Z,
mtime: 2017-08-15T12:56:09.000Z,
ctime: 2017-08-15T12:56:09.000Z,
birthtime: 2017-08-15T12:56:09.000Z }
The closest I can narrow down when it started is the last macOS update.
What's an elegant way to solve this without modifying this system file, which I'd rather not change?
This post answers my question
How to deactivate bash_history stats print when opening a new terminal window on my mac?
I didn't entertain the possibility that there was an alias (or function) shadowing touch, but that was indeed the case; running type touch in the affected session reveals it.