I need to set a specific header to fetch an archive from a resource using the wget fetcher, analogous to:
wget --header "PRIVATE-ACCESS-TOKEN:blablablablabla" https://some-resource....
How can I set specific headers using that fetcher?
Thanks in advance!
You can do it in several ways. Here are some:
Download the file manually and place it in the downloads folder, as mentioned here
Override the do_fetch task:
do_fetch() {
    bbnote "Fetching some file ..."
    wget ...
}
Note that do_unpack uses SRC_URI, so you still need to set SRC_URI to the file URL for the unpack step. Here is an example that I tested with the wget package itself:
SRC_URI = "http://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.gz"
do_fetch() {
    bbwarn "Fetching wget"
    wget http://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.gz
}
After running do_fetch, the file is downloaded into downloads, and do_unpack then unpacks it under the WORKDIR of the recipe.
Specify your own wget command line for the wget fetcher:
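FETCHCMD_wget = "/usr/bin/env wget --header 'PRIVATE-ACCESS-TOKEN:blablablablabla'"
The default wget command is defined in poky/bitbake/lib/bb/fetch2/wget.py. Setting FETCHCMD_wget replaces it entirely, so append the default options to your own command if you want to keep them: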
self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30 --passive-ftp --no-check-certificate"
For more information check: this link.
I'm trying to run MuTect2 on a sample, which on my machine using java takes about 27 minutes to run.
If I run virtually the same command inside Nextflow, using the GATK3:3.6 Docker container to run MuTect2, it takes 7 minutes longer for no apparent reason.
Running on Ubuntu 18.04, the tumor and normal samples are from an Oncomine panel. Tumor is 4.1G, normal is 1.1G. I thought the time might be spent copying in data to the container, but 7-8 minutes seems far too long for that. Could it be from copying in reference files too?
bai_ch is the channel that brings in the tumor and normal index files
process MuTect2 {
    label 'mutect'
    stageInMode 'copy'
    publishDir './output', mode : 'copy', overwrite : true

    input:
    file tumor_bam_mu from tumor_mu
    file normal_bam_mu from normal_mu
    file "*" from bai_ch
    file mutect2_ref
    file ref_index from ref_fasta_i_m
    file ref_dict from Channel.fromPath(params.ref_fast_dict)
    file regions_file from Channel.fromPath(params.regions)
    file cosmic_vcf from Channel.fromPath(params.cosmic_vcf)
    file dbsnp_vcf from Channel.fromPath(params.dbsnp_vcf)
    file normal_vcf from Channel.fromPath(params.normal_vcf)

    output:
    file '*' into mutect_ch

    script:
    """
    echo MuTect2 task path: \$PWD
    java -jar /usr/GenomeAnalysisTK.jar \
        --analysis_type MuTect2 \
        --reference_sequence hg19.fa \
        -L designed.bed \
        --normal_panel normal_panel.vcf \
        --cosmic Cosmic.vcf \
        --dbsnp dbsnp.vcf \
        --input_file:tumor $tumor_bam_mu \
        -o mutect2.somatic.unfiltered.vcf \
        --input_file:normal $normal_bam_mu \
        --max_alt_allele_in_normal_fraction 0.1 \
        --minPruning 10 \
        --kmerSize 60
    """
}
My only thought is to build my own Docker image that already contains the reference files, which would probably save the time spent copying them in. I'd expect the Nextflow + container version to run only slightly slower than the CLI version.
Check the task's Bash wrapper (.command.run) in the task work directory to assess the performance issue.
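To make that check concrete, here is a minimal sketch of a helper that looks inside a task work directory. The function name inspect_task is made up for illustration, and the grep pattern rests on the assumption that staged inputs show up in the wrapper as cp or ln -s commands; adjust it to whatever your wrapper actually contains.

```shell
#!/bin/sh
# inspect_task is a hypothetical helper name, not part of Nextflow.
inspect_task() {
    # $1: task work directory, e.g. the path printed by the
    # "MuTect2 task path" echo in the process script
    workdir=$1
    # Files Nextflow writes next to each task
    for f in .command.sh .command.run .command.log; do
        if [ -f "$workdir/$f" ]; then
            echo "found $f"
        else
            echo "missing $f"
        fi
    done
    # Assumption: input staging appears in the wrapper as cp or ln -s lines
    if [ -f "$workdir/.command.run" ]; then
        grep -E 'cp |ln -s' "$workdir/.command.run" || true
    fi
}
```

If the wrapper shows the BAM and reference files being copied rather than linked, the stageInMode 'copy' directive is the likely source of the extra minutes.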
Is there any way to run knife vsphere for unattended execution? I have a deploy shell script that I use to help me:
cat deploy-production-20-vm.sh
# These are machine dependent variables (need to change)
# These are pseudo-environment independent variables (could change)
# These are environment dependent variables (should not change per env)
NETWORK="\"VM Network\""
CLUSTER="ProdCluster01" #knife-vsphere calls this a resource pool
# the magic
VM_CLONE_CMD="knife vsphere vm clone $HOST_NAME \
  --template $TEMPLATE \
  --cips $IP_ADDRESS \
  --vsdc MarkleyDC \
  --datastore $DATASTORE \
  --cvlan $NETWORK \
  --resource-pool $CLUSTER \
  --cgw $GATEWAY \
  --cdnsips $DNS \
  --start true \
  --bootstrap true \
  --chost $HOST_NAME \
  --cdomain my.lan \
Which echoes (as a single line):
knife vsphere vm clone dcbsmtest --template "CentOS" --cips
--vsdc MarkleyDC --datastore dcesxds04 --cvlan "VM Network"
--resource-pool ProdCluster01 --cgw
--cdnsips ",," --start true
--bootstrap true --fqdn --chost dcbsmtest --cdomain my.lan
When it runs it outputs:
Cloning template CentOS Template to new VM dcbsmtest
Finished creating virtual machine dcbsmtest
Powered on virtual machine dcbsmtest
Waiting for sshd...done
Doing old-style registration with the validation key at /home/me/chef-repo/.chef/our-validator.pem...
Delete your validation key in order to use your user credentials instead
Connecting to
root#'s password:
If I step away from my desk and it prompts for the password, it sometimes times out, the connection is lost, and Chef doesn't bootstrap. I would also like to automate all of this so it can scale elastically with system needs, which won't work with attended execution.
The idea I am going to run with, unless someone provides a better solution, is to set a default password in the template, pass it on the command line to knife, and have Chef change the password once the build is complete. That minimizes the exposure of a hard-coded password in the bash script controlling knife.
Update: I wanted to add that this is working like a charm. Ideally we would have changed the CentOS template we were deploying, but that wasn't possible here, so this is a fine alternative (we changed the root password after deploy anyhow).
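A rough sketch of what that could look like, assuming your knife-vsphere version accepts --ssh-user and --ssh-password on vm clone (check knife vsphere vm clone --help first). KNIFE and TEMPLATE_ROOT_PASSWORD are hypothetical variable names introduced here so the password stays out of the script itself.

```shell
#!/bin/sh
# KNIFE is a hypothetical override so the command can be dry-run.
KNIFE="${KNIFE:-knife}"

clone_vm() {
    # $1: VM name, $2: template name
    # TEMPLATE_ROOT_PASSWORD (hypothetical) holds the template's default
    # root password; fail early instead of letting knife prompt for it.
    if [ -z "${TEMPLATE_ROOT_PASSWORD:-}" ]; then
        echo "TEMPLATE_ROOT_PASSWORD is not set" >&2
        return 1
    fi
    # Assumption: --ssh-user / --ssh-password are supported by your plugin version
    "$KNIFE" vsphere vm clone "$1" \
        --template "$2" \
        --start true \
        --bootstrap true \
        --ssh-user root \
        --ssh-password "$TEMPLATE_ROOT_PASSWORD"
}
```

The remaining options from the deploy script (--cips, --vsdc, --datastore and so on) would be added to the same command; they are left out here to keep the sketch short.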
I've spent some hours trying to figure out why logrotate won't successfully upload my logs to S3, so I'm posting my setup here. Here's the thing--logrotate uploads the log file correctly to s3 when I force it like this:
sudo logrotate -f /etc/logrotate.d/haproxy
Starting S3 Log Upload...
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
/var/log/haproxy-2014-12-23-044414.gz -> s3://my-haproxy-access-logs/haproxy-2014-12-23-044414.gz [1 of 1]
315840 of 315840 100% in 0s 2.23 MB/s done
But it does not succeed as part of the normal logrotate process. The logs are still compressed by my postrotate script, so I know that it is being run. Here is my setup:
/etc/logrotate.d/haproxy =>
/var/log/haproxy.log {
    size 1k
    rotate 1
    su root root
    create 777 syslog adm
    postrotate
        /usr/local/admintools/upload.sh 2>&1 /var/log/upload_errors
    endscript
}
/usr/local/admintools/upload.sh =>
#!/bin/sh
echo "Starting S3 Log Upload..."
# Bucket that receives the rotated logs
BUCKET_NAME="my-haproxy-access-logs"
# Perform Rotated Log File Compression
filename=/var/log/haproxy-$(date +%F-%H%M%S).gz
tar -czPf "$filename" /var/log/haproxy.log.1
# Upload log file to Amazon S3 bucket
/usr/bin/s3cmd put "$filename" s3://"$BUCKET_NAME"
And here is the output of a dry run of logrotate:
sudo logrotate -fd /etc/logrotate.d/haproxy
reading config file /etc/logrotate.d/haproxy
Handling 1 logs
rotating pattern: /var/log/haproxy.log forced from command line (1 rotations)
empty log files are rotated, old logs are removed
considering log /var/log/haproxy.log
log needs rotating
rotating log /var/log/haproxy.log, log->rotateCount is 1
dateext suffix '-20141223'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
renaming /var/log/haproxy.log.1 to /var/log/haproxy.log.2 (rotatecount 1, logstart 1, i 1),
renaming /var/log/haproxy.log.0 to /var/log/haproxy.log.1 (rotatecount 1, logstart 1, i 0),
copying /var/log/haproxy.log to /var/log/haproxy.log.1
truncating /var/log/haproxy.log
running postrotate script
running script with arg /var/log/haproxy.log : "
/usr/local/admintools/upload.sh 2>&1 /var/log/upload_errors
removing old log /var/log/haproxy.log.2
Any insight appreciated.
It turned out that my s3cmd was configured for my user, not for root.
ERROR: /root/.s3cfg: No such file or directory
ERROR: Configuration file not available.
ERROR: Consider using --configure parameter to create one.
Solution was to copy my config file (~/.s3cfg) over to /root/.s3cfg. – worker1138
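An alternative to copying the file is to tell s3cmd which config to use, so the script behaves the same whether it runs as your user or as root. This is only a sketch: S3CFG and upload_log are names made up for illustration, and the default path is an assumption you would replace with the real location of your config.

```shell
#!/bin/sh
# S3CFG is a hypothetical variable; the default path below is an assumption.
S3CFG="${S3CFG:-/home/deploy/.s3cfg}"

upload_log() {
    # $1: local file to upload, $2: bucket name
    # Fail loudly if the config is missing instead of letting s3cmd
    # fall back to $HOME/.s3cfg, which is /root/.s3cfg under logrotate.
    if [ ! -r "$S3CFG" ]; then
        echo "s3cmd config not readable: $S3CFG" >&2
        return 1
    fi
    s3cmd --config="$S3CFG" put "$1" "s3://$2"
}
```

With the error going to stderr, a postrotate line that really appends both streams to a file (>> /var/log/upload_errors 2>&1) would have surfaced the missing config right away.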
How do I move Elasticsearch data from one server to another?
I have server A running Elasticsearch 1.1.1 on one local node with multiple indices.
I would like to copy that data to server B running Elasticsearch 1.3.4
Procedure so far
Shut down ES on both servers and
scp all the data to the correct data dir on the new server (the data seems to be located at /var/lib/elasticsearch/ on my Debian boxes)
change permissions and ownership to elasticsearch:elasticsearch
start up the new ES server
When I look at the cluster with the ES head plugin, no indices appear.
It seems that the data is not loaded. Am I missing something?
The selected answer makes it sound slightly more complex than it is. The following is all you need (install npm on your system first).
npm install -g elasticdump
elasticdump --input=http://mysrc.com:9200/my_index --output=http://mydest.com:9200/my_index --type=mapping
elasticdump --input=http://mysrc.com:9200/my_index --output=http://mydest.com:9200/my_index --type=data
You can skip the first elasticdump command for subsequent copies if the mappings remain constant.
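If you copy several indices, the two commands can be wrapped in a small function. This is a sketch only: copy_index and the ELASTICDUMP variable are names invented here, and the URLs in the usage line are the placeholders from above.

```shell
#!/bin/sh
# ELASTICDUMP is a hypothetical override, e.g. for a locally installed binary.
ELASTICDUMP="${ELASTICDUMP:-elasticdump}"

copy_index() {
    # $1: source index URL, $2: destination index URL
    # Mappings are copied first so the destination does not infer
    # field types from the incoming documents.
    for dump_type in mapping data; do
        "$ELASTICDUMP" --input="$1" --output="$2" --type="$dump_type" || return 1
    done
}
```

Usage would be along the lines of copy_index http://mysrc.com:9200/my_index http://mydest.com:9200/my_index.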
I have just done a migration from AWS to Qbox.io with the above without any problems.
More details over at:
Help page (as of Feb 2016) included for completeness:
elasticdump: Import and export tools for elasticsearch
Usage: elasticdump --input SOURCE --output DESTINATION [OPTIONS]
--input
    Source location (required)
--input-index
    Source index and type
    (default: all, example: index/type)
--output
    Destination location (required)
--output-index
    Destination index and type
    (default: all, example: index/type)
--limit
    How many objects to move in bulk per operation
    limit is approximate for file streams
    (default: 100)
--debug
    Display the elasticsearch commands being used
    (default: false)
--type
    What are we exporting?
    (default: data, options: [data, mapping])
--delete
    Delete documents one-by-one from the input as they are
    moved. Will not delete the source index
    (default: false)
--searchBody
    Preform a partial extract based on search results
    (when ES is the input,
    (default: '{"query": { "match_all": {} } }'))
--sourceOnly
    Output only the json contained within the document _source
    Normal: {"_index":"","_type":"","_id":"", "_source":{SOURCE}}
    sourceOnly: {SOURCE}
    (default: false)
--all
    Load/store documents from ALL indexes
    (default: false)
--bulk
    Leverage elasticsearch Bulk API when writing documents
    (default: false)
--ignore-errors
    Will continue the read/write loop on write error
    (default: false)
--scrollTime
    Time the nodes will hold the requested search in order.
    (default: 10m)
--maxSockets
    How many simultaneous HTTP requests can we process make?
    (default:
    5 [node <= v0.10.x] /
    Infinity [node >= v0.11.x] )
--bulk-mode
    The mode can be index, delete or update.
    'index': Add or replace documents on the destination index.
    'delete': Delete documents on destination index.
    'update': Use 'doc_as_upsert' option with bulk update API to do partial update.
    (default: index)
--bulk-use-output-index-name
    Force use of destination index name (the actual output URL)
    as destination while bulk writing to ES. Allows
    leveraging Bulk API copying data inside the same
    elasticsearch instance.
    (default: false)
--timeout
    Integer containing the number of milliseconds to wait for
    a request to respond before aborting the request. Passed
    directly to the request library. If used in bulk writing,
    it will result in the entire batch not being written.
    Mostly used when you don't care too much if you lose some
    data when importing but rather have speed.
--skip
    Integer containing the number of rows you wish to skip
    ahead from the input transport. When importing a large
    index, things can go wrong, be it connectivity, crashes,
    someone forgetting to `screen`, etc. This allows you
    to start the dump again from the last known line written
    (as logged by the `offset` in the output). Please be
    advised that since no sorting is specified when the
    dump is initially created, there's no real way to
    guarantee that the skipped rows have already been
    written/parsed. This is more of an option for when
    you want to get most data as possible in the index
    without concern for losing some rows in the process,
    similar to the `timeout` option.
--inputTransport
    Provide a custom js file to us as the input transport
--outputTransport
    Provide a custom js file to us as the output transport
--toLog
    When using a custom outputTransport, should log lines
    be appended to the output stream?
    (default: true, except for `$`)
--help
    This page
# Copy an index from production to staging with mappings:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

# Backup index data to a file:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_mapping.json \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --type=data

# Backup and index to a gzip using stdout:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz

# Backup ALL indices, then use Bulk API to populate another ES cluster:
elasticdump \
  --all=true \
  --input=http://production-a.es.com:9200/ \
  --output=/data/production.json
elasticdump \
  --bulk=true \
  --input=/data/production.json \
  --output=http://production-b.es.com:9200/

# Backup the results of a query to a file
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody '{"query":{"term":{"username": "admin"}}}'
Learn more: https://github.com/taskrabbit/elasticsearch-dump
Use ElasticDump
1) yum install epel-release
2) yum install nodejs
3) yum install npm
4) npm install elasticdump
5) cd node_modules/elasticdump/bin
./elasticdump \
--input= \
--output= \
You can use the snapshot/restore feature available in Elasticsearch for this. Once you have set up a filesystem-based snapshot repository, you can move it between clusters and restore it on a different cluster.
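As a rough sketch of that flow with curl: register a filesystem repository, take a snapshot on the old cluster, then register the same repository on the new cluster and restore. The function names and the CURL variable are made up for illustration, and the repository directory must be reachable from the nodes (newer Elasticsearch versions also require it to be listed under path.repo in elasticsearch.yml).

```shell
#!/bin/sh
# CURL is a hypothetical override so the requests can be dry-run.
CURL="${CURL:-curl}"

register_repo() {
    # $1: cluster URL, $2: repository name, $3: snapshot directory
    "$CURL" -s -X PUT "$1/_snapshot/$2" \
        -H 'Content-Type: application/json' \
        -d "{\"type\":\"fs\",\"settings\":{\"location\":\"$3\"}}"
}

take_snapshot() {
    # $1: cluster URL, $2: repository name, $3: snapshot name
    # wait_for_completion makes the call block until the snapshot is done
    "$CURL" -s -X PUT "$1/_snapshot/$2/$3?wait_for_completion=true"
}

restore_snapshot() {
    # $1: cluster URL, $2: repository name, $3: snapshot name
    "$CURL" -s -X POST "$1/_snapshot/$2/$3/_restore"
}
```

Between take_snapshot on the old server and restore_snapshot on the new one, copy the repository directory across (or share it), then call register_repo on the new cluster with the same name.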
There is also the _reindex option
From documentation:
Through the Elasticsearch reindex API, available in version 5.x and later, you can connect your new Elasticsearch Service deployment remotely to your old Elasticsearch cluster. This pulls the data from your old cluster and indexes it into your new one. Reindexing essentially rebuilds the index from scratch and it can be more resource intensive to run.
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://REMOTE_ELASTICSEARCH_ENDPOINT:PORT",
      "username": "USER",
      "password": "PASSWORD"
    },
    "index": "INDEX_NAME",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "INDEX_NAME"
  }
}
I've always had success simply copying the index directory/folder over to the new server and restarting it. You'll find the index id by doing GET /_cat/indices and the folder matching this id is in data\nodes\0\indices (usually inside your elasticsearch folder unless you moved it).
I tried this on Ubuntu to move data from ELK 2.4.3 to ELK 5.1.1.
The steps are as follows:
$ sudo apt-get update
$ sudo apt-get install -y python-software-properties python g++ make
$ sudo add-apt-repository ppa:chris-lea/node.js
$ sudo apt-get update
$ sudo apt-get install npm
$ sudo apt-get install nodejs
$ npm install colors
$ npm install nomnom
$ npm install elasticdump
In your home directory, go to:
$ cd node_modules/elasticdump/
Then execute the command.
If you need basic HTTP auth, you can use it like this:
Copy an index from production:
$ ./bin/elasticdump --input="http://Source:9200/Sourceindex" --output="http://username:password@Destination:9200/Destination_index" --type=data
If you can add the second server to the cluster, you can do this:
Add server B to the cluster with server A
Increase the number of replicas for the indices
ES will automatically copy the indices to server B
Shut down server A
Decrease the number of replicas for the indices
This only works if the replica count is high enough for server B to receive a copy of every shard (one replica is enough in a two-node cluster).
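The replica changes in the steps above can be made through the index settings API. A minimal sketch, with set_replicas, wait_green and the CURL variable being names made up for illustration:

```shell
#!/bin/sh
# CURL is a hypothetical override so the requests can be dry-run.
CURL="${CURL:-curl}"

set_replicas() {
    # $1: cluster URL, $2: index name, $3: replica count
    "$CURL" -s -X PUT "$1/$2/_settings" \
        -H 'Content-Type: application/json' \
        -d "{\"index\":{\"number_of_replicas\":$3}}"
}

wait_green() {
    # $1: cluster URL
    # Blocks until all replicas are allocated or the timeout expires
    "$CURL" -s "$1/_cluster/health?wait_for_status=green&timeout=60s"
}
```

Call wait_green after raising the replica count and before shutting down server A, so you know server B actually holds the copies.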
If anyone encounters the same issue: when dumping from Elasticsearch <2.0 to >2.0, you need to do:
elasticdump --input=http://localhost:9200/$SRC_IND --output=http://$TARGET_IP:9200/$TGT_IND --type=analyzer
elasticdump --input=http://localhost:9200/$SRC_IND --output=http://$TARGET_IP:9200/$TGT_IND --type=mapping
elasticdump --input=http://localhost:9200/$SRC_IND --output=http://$TARGET_IP:9200/$TGT_IND --type=data --transform "delete doc._source['_id']"
We can use elasticdump or multielasticdump to take a backup and restore it, which lets us move data from one server/cluster to another.
Please find a detailed answer which I have provided here.
You can take a snapshot of the complete status of your cluster (including all data indices) and restore them (using the restore API) in the new cluster or server.
If you simply need to transfer data from one elasticsearch server to another, you could also use elasticsearch-document-transfer.
Open a directory in your terminal and run
$ npm install elasticsearch-document-transfer
Create a file config.js
Add the connection details of both elasticsearch servers in config.js
Set appropriate values in options.js
Run in the terminal
$ node index.js
I guess that you can copy the data folder.
Another great new tool that uses the _bulk API to reindex data between servers is esm:
Download and Install
wget https://github.com/medcl/esm/releases/download/v0.6.1/migrator-linux-amd64
mv migrator-linux-amd64 esm
chmod +x esm
Migrate One Index
Migrate a single index between 2 servers using 40 workers:
./esm -s https://my.source.server.com:9200 \
-m elastic:*** \
-d http://my.destination.server.com:9200 \
-n elastic:*** \
-x myindex \
-w 40
It may be necessary to create your index (or index template) on the destination server first.
See docs for further examples of how to migrate all or multiple indices.
If you don't want to use elasticdump as a console tool, you can use the following Node.js script: