Syslog File Size Limit - embedded-linux

I have a Yocto image using syslog and logrotate. My syslog is configured to write two log files: one for debug messages and one for everything else. The debug log grows to 200 kB and is then moved to debug.0, but archive files such as debug.1, debug.2 etc. are never created, so I am losing data. My logrotate rule for this file is set to rotate daily and at a size of 10 MB.
Where does syslog set the maximum size of a log file? In my build, syslog.conf is installed from my busybox .bbappend and contains:
# /etc/syslog.conf Configuration file for busybox's syslogd utility
*.debug /home/root/.evcc_logs/debug
*.info /var/log/info
*.info /dev/console
busybox.bbappend
FILESEXTRAPATHS_prepend := "${THISDIR}/${PN}:"
SRC_URI += "file://evcc_kernel_features.cfg file://syslog.conf"
do_deploy() {
install -d ${D}/etc/
install -m 0755 ${WORKDIR}/syslog.conf ${D}/etc/
}
addtask deploy after do_install
do_install_append () {
install -d ${D}/home/root/.evcc_logs/
install -d ${D}/etc/evcc/
install -d ${D}/tmpfs/
install -d ${D}/tmpfs/can
}
FILES_${PN} += "/home/root/.evcc_logs/ \
/etc/evcc/ \
/tmpfs/ \
/tmpfs/can \
"
logrotate.bbappend
FILESEXTRAPATHS_prepend := "${THISDIR}/files/:"
SRC_URI_append = " file://logrotate.conf file://services file://rotaterules"
inherit systemd
SYSTEMD_SERVICE_${PN} = " logrotate.service logrotate.timer"
do_install_append(){
install -d -m 0755 ${D}${sysconfdir}/logrotate.d/
install -m 0644 ${WORKDIR}/logrotate.conf ${D}/etc
install -m 0644 ${WORKDIR}/rotaterules/info ${D}/etc/logrotate.d/info
install -m 0644 ${WORKDIR}/rotaterules/debug ${D}/etc/logrotate.d/debug
install -m 0644 ${WORKDIR}/rotaterules/btmp ${D}/etc/logrotate.d/btmp
install -m 0644 ${WORKDIR}/rotaterules/wtmp ${D}/etc/logrotate.d/wtmp
install -d ${D}${systemd_system_unitdir}
install -m 0644 ${WORKDIR}/services/logrotate.service ${D}${systemd_system_unitdir}
install -m 0644 ${WORKDIR}/services/logrotate.timer ${D}${systemd_system_unitdir}
}
FILES_${PN} += " \
/etc/logrotate.conf \
${base_libdir}/systemd/system/logrotate.service \
${base_libdir}/systemd/system/logrotate.timer \
/lib/systemd/system \
"
logrotate.conf
# see "man logrotate" for details
# rotate log files weekly
weekly
# keep 4 weeks worth of backlogs
rotate 4
# create new (empty) log files after rotating old ones
create
# use date as a suffix of the rotated file
dateext
# compress log files
compress
# RPM packages drop log rotation information into this directory
include /etc/logrotate.d
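The per-file rules go under /etc/logrotate.d (installed by the logrotate bbappend above). The debug rule itself is not shown; based on the description ("daily and at a size of 10MB") it would look roughly like this sketch, where the rotate count and extra options are assumptions:
/home/root/.evcc_logs/debug {
    # rotate daily, or sooner once the file exceeds 10 MB
    daily
    maxsize 10M
    rotate 7
    missingok
    compress
}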
logrotate.timer (set to run every minute for debug)
[Unit]
Description=Timer to run log rotation every day
Requires=logrotate.service
[Timer]
Unit=logrotate.service
OnCalendar=*-*-* *:*:00
Persistent=true
[Install]
WantedBy=timers.target
logrotate.service
[Unit]
Description=Log rotation service
[Service]
Type=simple
ExecStart=/usr/sbin/logrotate -f /etc/logrotate.conf
Edit: I thought my system might have an inherent maximum log file size that I was not aware of. That does not appear to be the case.

It turns out this was a busybox misconfiguration. I had to go into busybox's logging settings and turn off its built-in logfile rotation (busybox syslogd's own rotation defaults to a 200 kB limit with a single rotated file, which matches the behaviour above). I also increased the syslogd read buffer size.
# CONFIG_FEATURE_ROTATE_LOGFILE is not set
CONFIG_FEATURE_SYSLOGD_READ_BUFFER_SIZE=512
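If your busybox recipe merges configuration fragments (poky's busybox recipe picks up file://*.cfg entries from SRC_URI; check your busybox.inc, otherwise edit the defconfig directly), the fix can be carried in the bbappend shown above. A minimal sketch, with the fragment file name being my own choice:
# syslog-no-rotate.cfg, placed next to syslog.conf in the bbappend's file directory
# CONFIG_FEATURE_ROTATE_LOGFILE is not set
CONFIG_FEATURE_SYSLOGD_READ_BUFFER_SIZE=512
and in busybox.bbappend:
SRC_URI += "file://syslog-no-rotate.cfg"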

Related

Add specific header to bitbake wget fetcher

I need to set a specific header to fetch an archive from a resource using the wget fetcher, analogous to:
wget --header "PRIVATE-ACCESS-TOKEN:blablablablabla" https://some-resource....
How can I set specific headers using that fetcher?
Thanks in advance!
You can do it in various ways; here are some:
Download the file manually and place it in the downloads folder (DL_DIR), as mentioned here.
Override the do_fetch task:
do_fetch() {
bbnote "Fetching some file ..."
wget ...
}
Note, however, that do_unpack uses SRC_URI, so you still need to set SRC_URI to the file URL for the unpack step. Here is an example I tested with the wget package itself:
LICENSE="CLOSED"
SRC_URI = "http://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.gz"
do_fetch(){
bbwarn "Fetching wget"
wget http://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.gz
}
After running do_fetch, the file ends up in the downloads directory and do_unpack then unpacks it under the recipe's WORKDIR.
Specify your own wget command line for the wget fetcher:
FETCHCMD_wget = "/usr/bin/env wget --header 'PRIVATE-ACCESS-TOKEN:blablablablabla'"
The default wget command is defined in poky/bitbake/lib/bb/fetch2/wget.py:
self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30 --passive-ftp --no-check-certificate"
For more information check: this link.
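To experiment without editing the recipe, the override can also go in conf/local.conf and be exercised with just the fetch task; a minimal sketch that keeps the default flags shown above and only appends the header (the recipe name is a placeholder):
# conf/local.conf
FETCHCMD_wget = "/usr/bin/env wget -t 2 -T 30 --passive-ftp --no-check-certificate --header 'PRIVATE-ACCESS-TOKEN:blablablablabla'"
Then:
bitbake -c cleanall my-recipe    # placeholder recipe name; drops any previously fetched copy
bitbake -c fetch my-recipe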

Slowdown when using Mutect2 container inside Nextflow

I'm trying to run MuTect2 on a sample; with plain java on my machine it takes about 27 minutes.
If I use virtually the same code, but inside Nextflow with the GATK3:3.6 Docker container to run MuTect2, it takes 7 minutes longer, for no apparent reason.
I'm running on Ubuntu 18.04, and the tumor and normal samples are from an Oncomine panel. The tumor BAM is 4.1G, the normal 1.1G. I thought the extra time might be spent copying data into the container, but 7-8 minutes seems far too long for that. Could it be from copying in the reference files too?
bai_ch is the channel that brings in the tumor and normal index files
process MuTect2 {
label 'mutect'
stageInMode 'copy'
publishDir './output', mode : 'copy', overwrite : true
input:
file tumor_bam_mu from tumor_mu
file normal_bam_mu from normal_mu
file "*" from bai_ch
file mutect2_ref
file ref_index from ref_fasta_i_m
file ref_dict from Channel.fromPath(params.ref_fast_dict)
file regions_file from Channel.fromPath(params.regions)
file cosmic_vcf from Channel.fromPath(params.cosmic_vcf)
file dbsnp_vcf from Channel.fromPath(params.dbsnp_vcf)
file normal_vcf from Channel.fromPath(params.normal_vcf)
output:
file '*' into mutect_ch
script:
"""
ls
echo MuTect2 task path: \$PWD
java -jar /usr/GenomeAnalysisTK.jar \
--analysis_type MuTect2 \
--reference_sequence hg19.fa \
-L designed.bed \
--normal_panel normal_panel.vcf \
--cosmic Cosmic.vcf \
--dbsnp dbsnp.vcf \
--input_file:tumor $tumor_bam_mu \
-o mutect2.somatic.unfiltered.vcf \
--input_file:normal $normal_bam_mu \
--max_alt_allele_in_normal_fraction 0.1 \
--minPruning 10 \
--kmerSize 60
"""
}
My only thought is to build my own Docker image with the reference files already inside, which would probably save the time spent copying them in. I'd expect the Nextflow + container version to run only slightly slower than the CLI version.
Check the Bash wrapper in the task work directory to assess the performance issue.
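For example (the work directory path is a placeholder, and the -f field list assumes a reasonably recent Nextflow):
nextflow log                                        # lists previous run names
nextflow log <run name> -f process,status,duration,workdir
cd /path/to/work/ab/123456...                       # placeholder for the MuTect2 task directory
less .command.run                                   # the Bash wrapper: input staging, the docker run, output unstaging
less .command.log                                   # the task's combined output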

knife vsphere requests root password - is unattended execution possible?

Is there any way to run knife vsphere unattended? I have a deploy shell script which I am using to help me:
cat deploy-production-20-vm.sh
#!/bin/bash
##############################################
# These are machine dependent variables (need to change)
##############################################
HOST_NAME=$1
IP_ADDRESS="$2/24"
CHEF_BOOTSTRAP_IP_ADDRESS="$2"
RUNLIST=\"$3\"
CHEF_HOST=$HOSTNAME.my.lan
##############################################
# These are pseudo-environment independent variables (could change)
##############################################
DATASTORE="dcesxds04"
##############################################
# These are environment dependent variables (should not change per env)
##############################################
TEMPLATE="\"CentOS\""
NETWORK="\"VM Network\""
CLUSTER="ProdCluster01" #knife-vsphere calls this a resource pool
GATEWAY="10.7.20.1"
DNS="\"10.7.20.11,10.8.20.11,10.6.20.11\""
##############################################
# the magic
##############################################
VM_CLONE_CMD="knife vsphere vm clone $HOST_NAME \
--template $TEMPLATE \
--cips $IP_ADDRESS \
--vsdc MarkleyDC \
--datastore $DATASTORE \
--cvlan $NETWORK \
--resource-pool $CLUSTER \
--cgw $GATEWAY \
--cdnsips $DNS \
--start true \
--bootstrap true \
--fqdn $CHEF_BOOTSTRAP_IP_ADDRESS \
--chost $HOST_NAME \
--cdomain my.lan \
--run-list=$RUNLIST"
echo $VM_CLONE_CMD
eval $VM_CLONE_CMD
Which echoes (as a single line):
knife vsphere vm clone dcbsmtest --template "CentOS" --cips 10.7.20.84/24
--vsdc MarkleyDC --datastore dcesxds04 --cvlan "VM Network"
--resource-pool ProdCluster01 --cgw 10.7.20.1
--cdnsips "10.7.20.11,10.8.20.11,10.6.20.11" --start true
--bootstrap true --fqdn 10.7.20.84 --chost dcbsmtest --cdomain my.lan
--run-list="role[my-env-prod-server]"
When it runs it outputs:
Cloning template CentOS Template to new VM dcbsmtest
Finished creating virtual machine dcbsmtest
Powered on virtual machine dcbsmtest
Waiting for sshd...done
Doing old-style registration with the validation key at /home/me/chef-repo/.chef/our-validator.pem...
Delete your validation key in order to use your user credentials instead
Connecting to 10.7.20.84
root@10.7.20.84's password:
If I step away from my desk and it prompts for the password, it sometimes times out, the connection is lost, and chef doesn't bootstrap. I would also like to automate all of this so it can scale elastically based on system needs, which won't work with attended execution.
The idea I am going to run with, unless a better solution is provided, is to have a default password in the template and pass it to knife on the command line, then have chef change the password once the build is complete, minimizing the exposure of a hard-coded password in the bash script controlling knife...
Update: I wanted to add that this is working like a charm. Ideally we would have changed the CentOS template we were deploying, but that wasn't possible here, so this is a fine alternative (as we change the root password after deploy anyhow).
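For reference, the change to the script above is small, assuming your knife-vsphere version exposes the standard bootstrap SSH options (the flag names below are an assumption, not taken from the post; confirm with knife vsphere vm clone --help):
# hypothetical: take the template's default root password as a 4th script argument
TEMPLATE_ROOT_PW="$4"
# append the bootstrap SSH credentials to the existing command string before the eval
VM_CLONE_CMD="$VM_CLONE_CMD --ssh-user root --ssh-password $TEMPLATE_ROOT_PW"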

Logrotate does not upload to S3

I've spent some hours trying to figure out why logrotate won't successfully upload my logs to S3, so I'm posting my setup here. Here's the thing: logrotate uploads the log file correctly to S3 when I force it like this:
sudo logrotate -f /etc/logrotate.d/haproxy
Starting S3 Log Upload...
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
/var/log/haproxy-2014-12-23-044414.gz -> s3://my-haproxy-access-logs/haproxy-2014-12-23-044414.gz [1 of 1]
315840 of 315840 100% in 0s 2.23 MB/s done
But it does not succeed as part of the normal logrotate process. The logs are still compressed by my postrotate script, so I know that it is being run. Here is my setup:
/etc/logrotate.d/haproxy =>
/var/log/haproxy.log {
size 1k
rotate 1
missingok
copytruncate
sharedscripts
su root root
create 777 syslog adm
postrotate
/usr/local/admintools/upload.sh 2>&1 /var/log/upload_errors
endscript
}
/usr/local/admintools/upload.sh =>
echo "Starting S3 Log Upload..."
BUCKET_NAME="my-haproxy-access-logs"
# Perform Rotated Log File Compression
filename=/var/log/haproxy-$(date +%F-%H%M%S).gz
tar -czPf "$filename" /var/log/haproxy.log.1
# Upload log file to Amazon S3 bucket
/usr/bin/s3cmd put "$filename" s3://"$BUCKET_NAME"
And here is the output of a dry run of logrotate:
sudo logrotate -fd /etc/logrotate.d/haproxy
reading config file /etc/logrotate.d/haproxy
Handling 1 logs
rotating pattern: /var/log/haproxy.log forced from command line (1 rotations)
empty log files are rotated, old logs are removed
considering log /var/log/haproxy.log
log needs rotating
rotating log /var/log/haproxy.log, log->rotateCount is 1
dateext suffix '-20141223'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
renaming /var/log/haproxy.log.1 to /var/log/haproxy.log.2 (rotatecount 1, logstart 1, i 1),
renaming /var/log/haproxy.log.0 to /var/log/haproxy.log.1 (rotatecount 1, logstart 1, i 0),
copying /var/log/haproxy.log to /var/log/haproxy.log.1
truncating /var/log/haproxy.log
running postrotate script
running script with arg /var/log/haproxy.log : "
/usr/local/admintools/upload.sh 2>&1 /var/log/upload_errors
"
removing old log /var/log/haproxy.log.2
Any insight appreciated.
It turned out that my s3cmd was configured for my user, not for root.
ERROR: /root/.s3cfg: No such file or directory
ERROR: Configuration file not available.
ERROR: Consider using --configure parameter to create one.
Solution was to copy my config file over. – worker1138
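An alternative to copying the file is to point s3cmd at an explicit configuration with its -c/--config option in upload.sh; a minimal sketch (the user path is an assumption):
/usr/bin/s3cmd -c /home/myuser/.s3cfg put "$filename" s3://"$BUCKET_NAME"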

how to move elasticsearch data from one server to another

How do I move Elasticsearch data from one server to another?
I have server A running Elasticsearch 1.1.1 on one local node with multiple indices.
I would like to copy that data to server B running Elasticsearch 1.3.4
Procedure so far
Shut down ES on both servers and
scp all the data to the correct data dir on the new server. (data seems to be located at /var/lib/elasticsearch/ on my debian boxes)
change permissions and ownership to elasticsearch:elasticsearch
start up the new ES server
When I look at the cluster with the ES head plugin, no indices appear.
It seems that the data is not loaded. Am I missing something?
The selected answer makes it sound slightly more complex than it is; the following is all you need (install npm on your system first).
npm install -g elasticdump
elasticdump --input=http://mysrc.com:9200/my_index --output=http://mydest.com:9200/my_index --type=mapping
elasticdump --input=http://mysrc.com:9200/my_index --output=http://mydest.com:9200/my_index --type=data
You can skip the first elasticdump command for subsequent copies if the mappings remain constant.
I have just done a migration from AWS to Qbox.io with the above without any problems.
More details over at:
https://www.npmjs.com/package/elasticdump
Help page (as of Feb 2016) included for completeness:
elasticdump: Import and export tools for elasticsearch
Usage: elasticdump --input SOURCE --output DESTINATION [OPTIONS]
--input
Source location (required)
--input-index
Source index and type
(default: all, example: index/type)
--output
Destination location (required)
--output-index
Destination index and type
(default: all, example: index/type)
--limit
How many objects to move in bulk per operation
limit is approximate for file streams
(default: 100)
--debug
Display the elasticsearch commands being used
(default: false)
--type
What are we exporting?
(default: data, options: [data, mapping])
--delete
Delete documents one-by-one from the input as they are
moved. Will not delete the source index
(default: false)
--searchBody
Preform a partial extract based on search results
(when ES is the input,
(default: '{"query": { "match_all": {} } }'))
--sourceOnly
Output only the json contained within the document _source
Normal: {"_index":"","_type":"","_id":"", "_source":{SOURCE}}
sourceOnly: {SOURCE}
(default: false)
--all
Load/store documents from ALL indexes
(default: false)
--bulk
Leverage elasticsearch Bulk API when writing documents
(default: false)
--ignore-errors
Will continue the read/write loop on write error
(default: false)
--scrollTime
Time the nodes will hold the requested search in order.
(default: 10m)
--maxSockets
How many simultaneous HTTP requests can we process make?
(default:
5 [node <= v0.10.x] /
Infinity [node >= v0.11.x] )
--bulk-mode
The mode can be index, delete or update.
'index': Add or replace documents on the destination index.
'delete': Delete documents on destination index.
'update': Use 'doc_as_upsert' option with bulk update API to do partial update.
(default: index)
--bulk-use-output-index-name
Force use of destination index name (the actual output URL)
as destination while bulk writing to ES. Allows
leveraging Bulk API copying data inside the same
elasticsearch instance.
(default: false)
--timeout
Integer containing the number of milliseconds to wait for
a request to respond before aborting the request. Passed
directly to the request library. If used in bulk writing,
it will result in the entire batch not being written.
Mostly used when you don't care too much if you lose some
data when importing but rather have speed.
--skip
Integer containing the number of rows you wish to skip
ahead from the input transport. When importing a large
index, things can go wrong, be it connectivity, crashes,
someone forgetting to `screen`, etc. This allows you
to start the dump again from the last known line written
(as logged by the `offset` in the output). Please be
advised that since no sorting is specified when the
dump is initially created, there's no real way to
guarantee that the skipped rows have already been
written/parsed. This is more of an option for when
you want to get most data as possible in the index
without concern for losing some rows in the process,
similar to the `timeout` option.
--inputTransport
Provide a custom js file to us as the input transport
--outputTransport
Provide a custom js file to us as the output transport
--toLog
When using a custom outputTransport, should log lines
be appended to the output stream?
(default: true, except for `$`)
--help
This page
Examples:
# Copy an index from production to staging with mappings:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data
# Backup index data to a file:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index.json \
--type=data
# Backup and index to a gzip using stdout:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=$ \
| gzip > /data/my_index.json.gz
# Backup ALL indices, then use Bulk API to populate another ES cluster:
elasticdump \
--all=true \
--input=http://production-a.es.com:9200/ \
--output=/data/production.json
elasticdump \
--bulk=true \
--input=/data/production.json \
--output=http://production-b.es.com:9200/
# Backup the results of a query to a file
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=query.json \
--searchBody '{"query":{"term":{"username": "admin"}}}'
------------------------------------------------------------------------------
Learn more @ https://github.com/taskrabbit/elasticsearch-dump
Use ElasticDump
1) yum install epel-release
2) yum install nodejs
3) yum install npm
4) npm install elasticdump
5) cd node_modules/elasticdump/bin
6)
./elasticdump \
--input=http://192.168.1.1:9200/original \
--output=http://192.168.1.2:9200/newCopy \
--type=data
You can use the snapshot/restore feature available in Elasticsearch for this. Once you have set up a filesystem-based snapshot repository, you can move it between clusters and restore it on a different cluster.
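A rough sketch of that flow over the REST API (repository name, snapshot name and path are placeholders; on newer Elasticsearch versions the location also has to be whitelisted via path.repo in elasticsearch.yml):
# on the source cluster: register a filesystem repository and take a snapshot
curl -XPUT "http://localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mnt/es_backups"}}'
curl -XPUT "http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"
# copy /mnt/es_backups to the new server, register the same repository there, then restore
curl -XPOST "http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore"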
There is also the _reindex option
From documentation:
Through the Elasticsearch reindex API, available in version 5.x and later, you can connect your new Elasticsearch Service deployment remotely to your old Elasticsearch cluster. This pulls the data from your old cluster and indexes it into your new one. Reindexing essentially rebuilds the index from scratch and it can be more resource intensive to run.
POST _reindex
{
"source": {
"remote": {
"host": "https://REMOTE_ELASTICSEARCH_ENDPOINT:PORT",
"username": "USER",
"password": "PASSWORD"
},
"index": "INDEX_NAME",
"query": {
"match_all": {}
}
},
"dest": {
"index": "INDEX_NAME"
}
}
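Note that for reindex-from-remote the old cluster has to be whitelisted on the new cluster in elasticsearch.yml, using the same endpoint placeholder as above:
reindex.remote.whitelist: "REMOTE_ELASTICSEARCH_ENDPOINT:PORT"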
I've always had success simply copying the index directory/folder over to the new server and restarting it. You'll find the index id by doing GET /_cat/indices and the folder matching this id is in data\nodes\0\indices (usually inside your elasticsearch folder unless you moved it).
I tried this on Ubuntu, moving data from ELK 2.4.3 to ELK 5.1.1.
These are the steps:
$ sudo apt-get update
$ sudo apt-get install -y python-software-properties python g++ make
$ sudo add-apt-repository ppa:chris-lea/node.js
$ sudo apt-get update
$ sudo apt-get install npm
$ sudo apt-get install nodejs
$ npm install colors
$ npm install nomnom
$ npm install elasticdump
Then go to the elasticdump directory:
$ cd node_modules/elasticdump/
and execute the command.
If you need basic http auth, you can use it like this:
--input=http://name:password@localhost:9200/my_index
Copy an index from production:
$ ./bin/elasticdump --input="http://Source:9200/Sourceindex" --output="http://username:password@Destination:9200/Destination_index" --type=data
If you can add the second server to the cluster, you can do this:
Add server B to the cluster with server A
Increase the number of replicas for the indices (see the sketch below)
ES will automatically copy the indices to server B
Shut down server A
Decrease the number of replicas for the indices
This will only work if the number of replicas equals the number of nodes.
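Raising the replica count is a single settings call; a minimal sketch with placeholder host and index names:
curl -XPUT "http://serverA:9200/my_index/_settings" -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 1}}'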
If anyone encounters the same issue: when trying to dump from elasticsearch <2.0 to >2.0 you need to do:
elasticdump --input=http://localhost:9200/$SRC_IND --output=http://$TARGET_IP:9200/$TGT_IND --type=analyzer
elasticdump --input=http://localhost:9200/$SRC_IND --output=http://$TARGET_IP:9200/$TGT_IND --type=mapping
elasticdump --input=http://localhost:9200/$SRC_IND --output=http://$TARGET_IP:9200/$TGT_IND --type=data --transform "delete doc.__source['_id']"
We can use elasticdump or multielasticdump to take a backup and restore it; this lets us move data from one server/cluster to another server/cluster.
Please find a detailed answer which I have provided here.
You can take a snapshot of the complete state of your cluster (including all data indices) and restore it (using the restore API) in the new cluster or server.
If you simply need to transfer data from one elasticsearch server to another, you could also use elasticsearch-document-transfer.
Steps:
Open a directory in your terminal and run
$ npm install elasticsearch-document-transfer.
Create a file config.js
Add the connection details of both elasticsearch servers in config.js
Set appropriate values in options.js
Run in the terminal
$ node index.js
I guess you can also just copy the data folder.
Another great new tool which uses the _bulk API to reindex data between servers is esm:
Download and Install
wget https://github.com/medcl/esm/releases/download/v0.6.1/migrator-linux-amd64
mv migrator-linux-amd64 esm
chmod +x esm
Migrate One Index
Migrate a single index between 2 servers using 40 workers:
./esm -s https://my.source.server.com:9200 \
-m elastic:*** \
-d http://my.destination.server.com:9200 \
-n elastic:*** \
-x myindex \
-w 40
It may be necessary to create your index (or index template) on the destination server first.
See docs for further examples of how to migrate all or multiple indices.
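Pre-creating the destination index from the example above is a single request; the settings here are placeholders:
curl -XPUT "http://my.destination.server.com:9200/myindex" -H 'Content-Type: application/json' -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'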
If you don't want to use elasticdump as a console tool, you can use the following Node.js script:
