I am trying to import an index from a JSON file into an Elasticsearch server, but the import is failing.
Specifications:
elasticsearch : 4.10.3
elasticdump : 2.4.2
command I am using
elasticdump --input=/home/ubuntu/Files/stocks.json --output=http://localhost:9200/ --type=data
My stocks.json file looks like
{"_index":"stocks","_type":"stock","_id":"AVhKm5L8FPDye23IuJqe","_score":1,"_source":{"name":"Sun Pharmaceutical Industries Ltd.","industry":"PHARMA","isin":"INE044A01036","symbol":"SUNPHARMA","tweet":"sun pharma' OR 'SUNPHARMA'"}}
{"_index":"stocks","_type":"stock","_id":"AVhKm5L8FPDye23IuJqV","_score":1,"_source":{"name":"Tata Steel Ltd.","industry":"METALS","isin":"INE081A01012","symbol":"TATASTEEL","tweet":"tata steel' OR 'TATASTEEL'"}}
{"_index":"stocks","_type":"stock","_id":"AVhKm5L7FPDye23IuJp2","_score":1,"_source":{"name":"ICICI Bank Ltd.","industry":"FINANCIAL SERVICES","isin":"INE090A01021","symbol":"ICICIBANK","tweet":"icici bank' OR 'ICICIBANK'"}}
I am getting following message
Sat, 07 Oct 2017 05:46:52 GMT | starting dump
Sat, 07 Oct 2017 05:46:52 GMT | got 100 objects from source file (offset: 0)
Sat, 07 Oct 2017 05:46:52 GMT | sent 100 objects to destination elasticsearch, wrote 0
Sat, 07 Oct 2017 05:46:52 GMT | got 0 objects from source file (offset: 100)
Sat, 07 Oct 2017 05:46:52 GMT | Total Writes: 0
Sat, 07 Oct 2017 05:46:52 GMT | dump complete
I have used the same JSON file before, but it is not working on this new server, where I recently installed Elasticsearch and Node.
Thanks for help
J
I'm using elasticdump and got a weird error:
Mon, 14 Nov 2022 14:42:21 GMT | starting dump
Mon, 14 Nov 2022 14:42:22 GMT | got 10 objects from source elasticsearch (offset: 0)
Mon, 14 Nov 2022 14:42:22 GMT | sent 10 objects to destination file, wrote 10
Mon, 14 Nov 2022 14:42:22 GMT | Error Emitted => This and all future requests should be directed to the given URI.
Mon, 14 Nov 2022 14:42:22 GMT | Error Emitted => This and all future requests should be directed to the given URI.
Mon, 14 Nov 2022 14:42:22 GMT | Total Writes: 0
Mon, 14 Nov 2022 14:42:22 GMT | dump ended with error (get phase) => MOVED_PERMANENTLY: This and all future requests should be directed to the given URI.
It successfully moved 10 objects and then stopped.
--input-index is for a different use case.
Try with just --input, like this:
elasticdump --input=http://localhost/dev_index --output=test2.json
I have a text file from which I need to read lines based on the last matching condition, e.g. read all the lines up to the end of the file after the last occurrence of a specific word or string.
Sample file:
2016 Jun 01 13:48:46:590 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300006 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating
2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:48:46:590 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300006 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating
2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started
From the above file I want to read all the lines after the last occurrence of the string COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating
Expected Output:
2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started
Try this :-
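def lines = new File("file.txt").readLines()
// Index of the last line containing the marker (-1 if it never occurs)
def index = lines.findLastIndexOf { it.contains("COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating") }
// drop() is safe even when the marker is on the very last line
lines.drop(index + 1).each { println it }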
Output :-
2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started
Hope it will help you..:)
Regardless of language, there are two algorithms that can achieve this:
First:
initialise a temporary store (memory or temp file)
open input
while(read line) {
    if(line matches search pattern) {
        clear temp store
    } else {
        write line to temp store
    }
}
copy temp store to output
Second:
open input
while(read line) {
    if(line matches search pattern) {
        store line number in variable
    }
}
close input
open input again
read until stored line number
read / write until end
The first option has the advantage that it works with piped input, where you cannot reopen the input at the start. But it has the disadvantage that you have to store output lines somewhere temporary until you reach the final line of input.
The second option has the advantage that it only ever holds one line of input in memory at a time. It has the disadvantage that it can never work with a source of input it can't re-open from the beginning.
You should be able to implement either of these fairly easily, either in Groovy or shell.
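As an illustration only, here is a minimal Python sketch of both algorithms. The function names are made up for this example, matching is a plain substring test, and both versions leave the matching line itself out of the output, as in the expected output of the question.

```python
from typing import Iterable, Iterator, List


def lines_after_last_match_buffered(lines: Iterable[str], pattern: str) -> List[str]:
    """First algorithm: one pass, buffering the lines seen since the latest match."""
    buffer: List[str] = []
    for line in lines:
        if pattern in line:
            buffer.clear()  # a newer match: discard everything kept so far
        else:
            buffer.append(line)
    return buffer


def lines_after_last_match_two_pass(path: str, pattern: str) -> Iterator[str]:
    """Second algorithm: two passes over a re-openable file, one line in memory at a time."""
    last_match = 0  # 1-based number of the last matching line; 0 means "no match"
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern in line:
                last_match = number
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if number > last_match:
                yield line
```

The buffered version accepts any iterable of lines, so it also works with sys.stdin; the two-pass version needs a path it can open twice. If the pattern never occurs, both return every line.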
In shell you can cobble together a version of the second algorithm, if the input is a file:
tail --lines=+$(( $(grep -n pattern input.txt | tail -1 | cut -d: -f1) + 1 )) input.txt
Here we're using grep -n to find the matching lines (with line numbers), tail -1 to pick the last one, cut to extract the line number, and tail --lines=+n to write everything from line n onward to stdout. Adding 1 to the line number skips the matching line itself.
Since you posted this under the groovy tag, I expect you can use Groovy from the shell. I wrote the following script, which works. Although it iterates through the file twice, it works in a streaming fashion and won't blow up your memory:
f = new File("sample.txt")
def lastIndex
// First pass: remember the line number just after the last match (eachLine counts from 1)
f.eachLine { line, index ->
    if (line.contains("GenerateComplexitySheet terminating")) {
        lastIndex = index + 1
    }
}
new File("out.txt").with {
    write ""
    // Second pass: copy every line from lastIndex onward
    withWriter { writer ->
        f.eachLine { line, index ->
            if (index >= lastIndex) {
                writer.writeLine line
            }
        }
    }
assert text == '''2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started
'''
}
I run an Apache Storm topology on a cluster of 8+1 machines. The clocks on these machines are not synchronized, and they can differ by more than 5 minutes.
preprod-storm-nimbus-01:
Thu Feb 25 16:20:30 GMT 2016
preprod-storm-supervisor-01:
Thu Feb 25 16:20:32 GMT 2016
preprod-storm-supervisor-02:
Thu Feb 25 16:20:32 GMT 2016
preprod-storm-supervisor-03:
Thu Feb 25 16:14:54 UTC 2016 <<-- this machine is very late :(
preprod-storm-supervisor-04:
Thu Feb 25 16:20:31 GMT 2016
preprod-storm-supervisor-05:
Thu Feb 25 16:20:17 GMT 2016
preprod-storm-supervisor-06:
Thu Feb 25 16:20:00 GMT 2016
preprod-storm-supervisor-07:
Thu Feb 25 16:20:31 GMT 2016
preprod-storm-supervisor-08:
Thu Feb 25 16:19:55 GMT 2016
preprod-storm-supervisor-09:
Thu Feb 25 16:20:30 GMT 2016
Question:
Is the storm topology affected by this non-synchronization?
Note: I know that synchronizing is better, but the sysadmins won't do it unless I give them proof/reasons that they have to. Do they really have to do it, "for the topology's sake" :) ?
Thanks
It depends on the computation you are doing... It might have an effect on your result if you do time based window operations. Otherwise, it doesn't matter.
For Storm as an execution engine it has no effect at all.
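To make the time-based window case concrete, here is a small Python sketch (not Storm code). It assumes, purely for illustration, that each worker stamps events with its own local clock and that results are grouped into 5-minute tumbling windows; window_start is a hypothetical helper.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed window size, for illustration only
EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Return the start of the epoch-aligned tumbling window that contains ts."""
    return ts - ((ts - EPOCH) % size)


# The same real-world instant, as seen by two machines from the listing above.
on_nimbus = datetime(2016, 2, 25, 16, 20, 30)
on_supervisor_03 = datetime(2016, 2, 25, 16, 14, 54)

# With skewed clocks the "same" event is assigned to different windows.
print(window_start(on_nimbus))
print(window_start(on_supervisor_03))
```

Under these assumptions, an event handled on supervisor-03 is counted two windows earlier than the same event handled on the nimbus host, which is the kind of effect to show the sysadmins if the topology does windowed computations.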
I am using elasticdump to dump data from my local machine to the server, but my dumps always end with this error:
...
Tue, 20 Oct 2015 22:56:35 GMT | sent 100 objects to destination elasticsearch, wrote 100
Tue, 20 Oct 2015 22:56:35 GMT | got 100 objects from source elasticsearch (offset: 21200)
Tue, 20 Oct 2015 22:56:36 GMT | Error Emitted => read ECONNRESET
Tue, 20 Oct 2015 22:56:36 GMT | Total Writes: 21200
Tue, 20 Oct 2015 22:56:36 GMT | dump ended with error (set phase) => Error: read ECONNRESET
...
How should I solve this problem?
Is there a better way to dump data from local machine to the server? Thanks in advance!
It sounds like your issue is caused by elasticdump opening too many sockets to your Elasticsearch cluster. You can use the --maxSockets option to limit the number of sockets opened.
For example:
$ elasticdump --input http://192.168.2.222:9200/index1 --output http://192.168.2.222:9200/index2 --type=data --maxSockets=5
You can find a detailed explanation of the issue here:
https://github.com/taskrabbit/elasticsearch-dump/issues/98
There are 200 records, for example:
[
{time:"Thu Nov 07 2013 13:09:08",value:"10"},
{time:"Thu Nov 07 2013 11:09:08",value:"30"},
{time:"Thu Nov 07 2013 11:09:08",value:"25"},
....more
{time:"Thu Nov 06 2013 10:09:08",value:"65"},
{time:"Tue Aug 06 2013 16:54:31",value:"25"},
{time:"Tue Aug 06 2013 16:54:31",value:"45"},
]
One or two outlying records have a much earlier time than the rest.
When I draw a line using time as the xAxis, these outlying records ( {time:"Tue Aug 06 2013 16:54:31",value:"25"},{time:"Tue Aug 06 2013 16:54:31",value:"45"}) leave a blank stretch in the line between Aug and Nov.
How can I deal with these few outlying records?
Any help is appreciated.
Just add a unique id field to your data if you want to draw both, or remove one (and display an average value or whatever is suitable).
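Both options can be sketched in a few lines. The example below is in Python rather than the charting library's own language, and the helper names with_unique_ids and average_by_time are hypothetical; it only assumes that the records are dictionaries with "time" and "value" string fields, as in the question.

```python
from collections import defaultdict
from typing import Dict, List


def with_unique_ids(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Option 1: keep every record and tag it with a unique id (its position)."""
    return [{"id": i, **record} for i, record in enumerate(records)]


def average_by_time(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Option 2: collapse records sharing a timestamp into one averaged point."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        grouped[record["time"]].append(float(record["value"]))
    # dicts keep insertion order, so the first-seen order of timestamps is preserved
    return [{"time": t, "value": sum(v) / len(v)} for t, v in grouped.items()]


records = [
    {"time": "Thu Nov 07 2013 13:09:08", "value": "10"},
    {"time": "Tue Aug 06 2013 16:54:31", "value": "25"},
    {"time": "Tue Aug 06 2013 16:54:31", "value": "45"},
]
print(average_by_time(records))
```

Averaging is only one possible way to merge duplicates; taking the maximum or the latest value would follow the same grouping pattern.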