Elasticsearch: scroll within a specified time frame

I have some data in Elasticsearch, as shown in the image.
I used the example from the link below to do the scrolling:
https://gist.github.com/drorata/146ce50807d16fd4a6aa
page = es.search(
    index=INDEX_NAME,
    scroll='1m',
    size=1000,
    body={"query": {"match_all": {}}})
sid = page['_scroll_id']
scroll_size = page['hits']['total']
count = 0
# Start scrolling
print("Scrolling...")
while scroll_size > 0:
    count += 1
    print("Page: ", count)
    page = es.scroll(scroll_id=sid, scroll='10m')
    # Update the scroll ID and the number of hits returned in this page
    sid = page['_scroll_id']
    scroll_size = len(page['hits']['hits'])
    for hit in page['hits']['hits']:
        # some code processing here
        pass
My requirement now is to scroll as before, but restricted to a specified start timestamp and end timestamp.
How can I do this with the scroll API?

Simply replace
body={"query": {"match_all": {}}})
with
body={"query": {"range": {"timestamp": {"gte": "2018-08-05T05:30:00Z", "lte": "2018-08-06T05:30:00Z"}}}})

Example code below; the time range should go into the ES query itself. Note also that you should process the results of the first search call, not just the later scroll pages.
import json
from elasticsearch import Elasticsearch

scroll_time = '10m'       # how long each scroll context is kept alive
scroll_size = 1000        # hits per page
source_es_ip = 'localhost'  # adjust to your cluster

es_query_dict = {"query": {"range": {"timestamp": {
    "gte": "2018-08-01T00:00:00Z", "lte": "2018-08-17T00:00:00Z"}}}}

def get_es_logs():
    es_client = Elasticsearch([source_es_ip], port=9200, timeout=300)
    total_docs = 0
    page = es_client.search(scroll=scroll_time,
                            size=scroll_size,
                            body=json.dumps(es_query_dict))
    while True:
        sid = page['_scroll_id']
        details = page["hits"]["hits"]
        doc_count = len(details)
        if doc_count > 0:
            total_docs += doc_count
            print("scroll size: " + str(doc_count))
            print("start bulk index docs")
            # index_bulk(details)
            print("end success")
        else:
            break
        page = es_client.scroll(scroll_id=sid, scroll=scroll_time)
    print("total docs: " + str(total_docs))

Also have a look at elasticsearch.helpers.scan, where the loop logic is already implemented for you; just pass it query={"query": {"range": {"timestamp": {"gt": ..., "lt": ...}}}}.
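A minimal sketch using the scan helper, assuming the same timestamp field as above and a hypothetical index name:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch()
query = {"query": {"range": {"timestamp": {
    "gte": "2018-08-05T05:30:00Z", "lte": "2018-08-06T05:30:00Z"}}}}

# scan() handles the scroll bookkeeping and yields one hit at a time
for hit in scan(es, index="my-index", query=query, scroll="10m"):
    print(hit["_source"])  # replace with your own processing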

Related

for loop and same context (docxtpl)

I wrote code to generate a Word document report, but even with the for loop I end up with multiple documents whose only difference is the image: all the other data (title, volume, rate and price) are the same in every document.
I used docxtpl (DocxTemplate) for this.
I created the template Word document with an image and placeholder text.
Then, in the code, I tried to change the text context first and then swap the images.
for i in csv:  # csv file has multiple columns: title, volume, rate, price
    number = number + 1
    DEST_FILE = "dir/auto_" + str(number) + ".docx"  # to save individual doc file
    Title = products[0].product_title
    Volume = products[0].lastest_volume
    Rate = products[0].evaluate_rate
    Price = products[0].sale_price
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}
    print(context)
    for file in files:
        old_im = 'dir.media_to_paste.jpg'
        new_im = f"image/{file}"
        tpl.replace_media(old_im, new_im)
        tpl.render(context)
        tpl.save(DEST_FILE)
I tried changing the image first, but the results are the same.
The results look like this:
auto_1.docx: Image1 + Title 1, Volume 1, ...
auto_2.docx: Image2 + Title 1, Volume 1, ...
for i in op[1:]:  # op holds the csv rows; skip the header row
    number = number + 1
    Title = i.split(",")[0]
    Volume = i.split(",")[1]
    Rate = i.split(",")[2]
    Price = i.split(",")[3]
    Currency = i.split(",")[4]
    context = {"Title": Title, "Volume": Volume, "Rate": Rate, "Price": Price}
    old_im = 'dir.media_to_paste.jpg'
    new_im = f"image/{file}"
    tpl.replace_media(old_im, new_im)
    tpl.render(context)
    tpl.save("dir/auto_" + str(number) + ".docx")
This fixed it: the values now come from each CSV row rather than always from products[0], so the context changes on every iteration.
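For reference, a minimal self-contained sketch of the same pattern with docxtpl; the template path, image file names and CSV column names are assumptions, and the key point is rebuilding the context (and reloading the template) for every document:

import csv
from docxtpl import DocxTemplate

# Assumed CSV layout: title,volume,rate,price,image
with open("products.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for number, row in enumerate(rows, start=1):
    tpl = DocxTemplate("template.docx")          # reload the template for each document
    tpl.replace_media("media_to_paste.jpg",      # placeholder image embedded in the template
                      f"image/{row['image']}")
    context = {"Title": row["title"], "Volume": row["volume"],
               "Rate": row["rate"], "Price": row["price"]}
    tpl.render(context)
    tpl.save(f"dir/auto_{number}.docx")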

How do I get historical candlestick data or kline from Phemex Public API?

I need to be able to extract historical candlestick data (Open, High, Low, Close, and Volume) at different intervals (1m, 3m, 5m, 1H, etc.) for specified timestamps from Phemex.
Other exchanges, such as Binance or FTX, seem to provide REST/WebSocket APIs for this, yet I can't seem to find one for Phemex. Could you help me resolve this? Thank you.
Steps I have taken, yet found no resolution:
Went to https://phemex.com/user-guides/api-overview
Went to https://github.com/phemex/phemex-api-docs/blob/master/Public-Contract-API-en.md
None of the items listed in 'Market Data API List' seem to do the task
This code will get the candles (via the ccxt library) and save them to a CSV file. Hope this helps :)
import csv
import time

import ccxt
import numpy as np

exchange = ccxt.phemex({
    'options': {'defaultType': 'swap'},
    'enableRateLimit': True
})
# Load the markets
markets = exchange.load_markets()

curent_time = int(time.time() * 1000)  # current time in milliseconds
one_min = 60000                        # one minute in milliseconds

def get_all_candels(symbol, start_time, stop_time):
    counter = 0
    candel_counter = 0
    data_set = []
    t = 0
    while t < stop_time:
        if data_set == []:
            # first request: start from the requested start_time
            block = exchange.fetch_ohlcv(symbol, '1m', start_time)
            for candle in block:
                if candle == []:
                    break
                data_set.append(candle)
            last_time_in_block = block[-1][0]
            counter += 1
            candel_counter += len(block)
            print(f'{counter} - {block[0]} - {candel_counter} - {last_time_in_block}')
        if data_set != []:
            # subsequent requests: continue one minute after the last candle received
            t = last_time_in_block + one_min
            block = exchange.fetch_ohlcv(symbol, '1m', t)
            if block == []:
                break
            for candle in block:
                if candle == []:
                    break
                data_set.append(candle)
            last_time_in_block = block[-1][0]
            candel_counter += len(block)
            counter += 1
            print(f'{counter} - {block[0]} - {candel_counter} - {last_time_in_block}')
            time.sleep(1)
    return data_set

data_set = get_all_candels('BTCUSD', 1574726400000, curent_time)
print(np.shape(data_set))

with open('raw.csv', 'w', newline='') as csv_file:
    column_names = ['time', 'open', 'high', 'low', 'close', 'volume']
    csv_writer = csv.DictWriter(csv_file, fieldnames=column_names)
    csv_writer.writeheader()
    for candel in data_set:
        csv_writer.writerow({
            'time': candel[0],
            'open': candel[1],
            'high': candel[2],
            'low': candel[3],
            'close': candel[4],
            'volume': candel[5]
        })
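The hard-coded 1574726400000 passed to get_all_candels above is a millisecond timestamp (2019-11-26 00:00:00 UTC). A small helper sketch for producing such values, assuming you want UTC dates:

from datetime import datetime, timezone

def to_ms(dt):
    # millisecond timestamp as expected by ccxt's fetch_ohlcv 'since' argument
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

print(to_ms(datetime(2019, 11, 26)))  # 1574726400000, the start time used above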

Multisearch gives different total hits in different runs

I'm using Elasticsearch 6.8 and Python 3.
I'm running a single node on my laptop, and there are no threads/processes inserting/updating/deleting docs in the index while I'm running the multi search.
I'm running the following multi search command:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
search_arr = []
# search-1
search_arr.append({'index': 'test1', 'type': 'type1'})
search_arr.append({"query": {"term": {"confidence": "1"}}})
# search-2
search_arr.append({'index': 'test1', 'type': 'type1'})
search_arr.append({"query": {"match_all": {}}, 'from': 0, 'size': 2})

request = ''
for each in search_arr:
    request += '%s \n' % json.dumps(each)

res = es.msearch(body=request)
print("First Query, num of results = ", res['responses'][0]['hits']['total'])
print("Second Query, num of results = ", res['responses'][1]['hits']['total'])
Each time I run this code I get different results (and, as I wrote above, there are no processes that insert/delete/update documents).
Why do I get different results each time?
And what do I need to do to get consistent results?
I found the cause of the problem:
I needed to refresh the index (refresh=True) after adding the new data.
(I had bulk-indexed thousands of new documents and ran the multi search right afterwards.)
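A minimal sketch of the two usual ways to force that refresh with the Python client; the index name and example document are placeholders:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
actions = [{"_index": "test1", "_type": "type1", "_source": {"confidence": "1"}}]  # placeholder docs

# Option 1: ask the bulk helper to refresh the index before returning
helpers.bulk(es, actions, refresh=True)

# Option 2: trigger an explicit refresh after indexing, before searching
es.indices.refresh(index='test1')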

Google Analytics API ruby client - multiple metrics

I'm using the Google API Ruby client and I want to implement some of the more complex analytics queries suggested in this document:
https://developers.google.com/analytics/devguides/reporting/core/v3/common-queries
This document suggests that metrics can be supplied as a comma-delimited string of multiple metrics, but the API client only accepts an expression.
How can I query multiple metrics in a single request? The Ruby client appears to accept only an expression, which generally consists of a single metric such as sessions or pageviews, like this:
metric = Google::Apis::AnalyticsreportingV4::Metric.new(expression: 'ga:sessions')
If I remove "expression" and pass a list of metrics instead, I just get an error:
Invalid value 'ga:sessions;ga:pageviews' for metric parameter.
Here is my solution, together with a generic method for reporting Google Analytics data.
This answer should be read in conjunction with https://developers.google.com/drive/v3/web/quickstart/ruby
analytics = Google::Apis::AnalyticsreportingV4::AnalyticsReportingService.new
analytics.client_options.application_name = APPLICATION_NAME
analytics.authorization = authorize
def get_analytics_data( analytics,
                        view_id,
                        start_date: (Date.today + 1 - Date.today.wday) - 6,
                        end_date: (Date.today + 1 - Date.today.wday),
                        metrics: ['ga:sessions'],
                        dimensions: [],
                        order_bys: [],
                        segments: nil, # array of hashes
                        filter: nil,
                        page_size: nil )

  get_reports_request_object = Google::Apis::AnalyticsreportingV4::GetReportsRequest.new
  report_request_object = Google::Apis::AnalyticsreportingV4::ReportRequest.new
  report_request_object.view_id = view_id

  analytics_date_range_object = Google::Apis::AnalyticsreportingV4::DateRange.new
  analytics_date_range_object.start_date = start_date
  analytics_date_range_object.end_date = end_date
  report_request_object.date_ranges = [analytics_date_range_object]

  # report_request_metrics = []
  report_request_object.metrics = []
  metrics.each { |metric|
    analytics_metric_object = Google::Apis::AnalyticsreportingV4::Metric.new
    analytics_metric_object.expression = metric
    report_request_object.metrics.push(analytics_metric_object) }
  # report_request_object.metrics = report_request_metrics

  unless dimensions.empty?
    report_request_object.dimensions = []
    dimensions.each { |dimension|
      analytics_dimension_object = Google::Apis::AnalyticsreportingV4::Dimension.new
      analytics_dimension_object.name = dimension
      report_request_object.dimensions.push(analytics_dimension_object) }
  end

  unless segments.nil?
    report_request_object.segments = []
    analytics_segment_object = Google::Apis::AnalyticsreportingV4::Segment.new
    analytics_dynamic_segment_object = Google::Apis::AnalyticsreportingV4::DynamicSegment.new
    analytics_segment_definition_object = Google::Apis::AnalyticsreportingV4::SegmentDefinition.new
    analytics_segment_filter_object = Google::Apis::AnalyticsreportingV4::SegmentFilter.new
    analytics_simple_segment_object = Google::Apis::AnalyticsreportingV4::SimpleSegment.new
    analytics_or_filters_for_segment_object = Google::Apis::AnalyticsreportingV4::OrFiltersForSegment.new
    analytics_segment_filter_clause_object = Google::Apis::AnalyticsreportingV4::SegmentFilterClause.new
    analytics_segment_metric_filter_object = Google::Apis::AnalyticsreportingV4::SegmentMetricFilter.new
    analytics_dimension_object = Google::Apis::AnalyticsreportingV4::Dimension.new
    analytics_dimension_object.name = 'ga:segment'
    report_request_object.dimensions.push(analytics_dimension_object)
    analytics_or_filters_for_segment_object.segment_filter_clauses = []
    analytics_simple_segment_object.or_filters_for_segment = []
    analytics_segment_definition_object.segment_filters = []
    segments.each { |segment|
      analytics_segment_metric_filter_object.metric_name = segment[:metric_name]
      analytics_segment_metric_filter_object.comparison_value = segment[:comparison_value]
      analytics_segment_metric_filter_object.operator = segment[:operator]
      analytics_segment_filter_clause_object.metric_filter = analytics_segment_metric_filter_object
      analytics_or_filters_for_segment_object.segment_filter_clauses.push(analytics_segment_filter_clause_object)
      analytics_simple_segment_object.or_filters_for_segment.push(analytics_or_filters_for_segment_object)
      analytics_segment_filter_object.simple_segment = analytics_simple_segment_object
      analytics_segment_definition_object.segment_filters.push(analytics_segment_filter_object)
      analytics_dynamic_segment_object.name = segment[:name]
      analytics_dynamic_segment_object.session_segment = analytics_segment_definition_object
      analytics_segment_object.dynamic_segment = analytics_dynamic_segment_object
      report_request_object.segments.push(analytics_segment_object) }
  end

  unless order_bys.empty?
    report_request_object.order_bys = []
    order_bys.each { |orderby|
      analytics_orderby_object = Google::Apis::AnalyticsreportingV4::OrderBy.new
      analytics_orderby_object.field_name = orderby
      analytics_orderby_object.sort_order = 'DESCENDING'
      report_request_object.order_bys.push(analytics_orderby_object) }
  end

  unless filter.nil?
    report_request_object.filters_expression = filter
  end

  unless page_size.nil?
    report_request_object.page_size = page_size
  end

  get_reports_request_object.report_requests = [report_request_object]
  response = analytics.batch_get_reports(get_reports_request_object)
end
If using dimensions, you can report data like this:
response = get_analytics_data(analytics, VIEW_ID, metrics: ['ga:pageviews'], dimensions: ['ga:pagePath'], order_bys: ['ga:pageviews'], page_size: 25)
response.reports.first.data.rows.each do |row|
  puts row.dimensions
  puts row.metrics.first.values.first.to_i
  puts
end

Google Search Console API not returning enough data

https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query
Google Analytics tells me ~20k pages were visited from Google search, but the Google Search Console API returns just under 5k URLs. Changing startRow doesn't help.
What's really odd is I've connected Google Search Console to Google Analytics, and when viewing GSC data in GA (Acquisition -> Search Console -> Landing Pages) the GSC data there also gives me ~20k rows.
How do I get all ~20k rows out of the Google Search Console API?
date_str = '2017-12-20'
start_index = 0
row_limit = 5000
next_index = 5000
rows = []
while next_index == row_limit:
    req = webmasters_service.searchanalytics().query(
        siteUrl='https://tenor.com/',
        fields='responseAggregationType,rows',
        body={
            "startDate": date_str,
            "endDate": date_str,
            "searchType": search_type,
            "dimensions": [
                "page",
            ],
            "rowLimit": row_limit,
            "startRow": start_index,
        },
    )
    try:
        resp = req.execute()
        next_rows = resp['rows']
        rows += next_rows
        next_index = len(next_rows)
        start_index += next_index
    except Exception as e:
        print(e)
        break
return rows
For anyone else viewing this post:
When I view 'Search Results' under 'Performance' in the sidebar of the Google Search Console web page, the URL ends with a parameter resource_id=sc-domain%3Aexample.com. I recommend using this resource_id value (i.e. sc-domain:example.com) as your siteUrl.
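A minimal sketch of what that looks like in the query from the question; the domain is a placeholder, so substitute your own verified domain property:

req = webmasters_service.searchanalytics().query(
    siteUrl='sc-domain:example.com',  # domain property instead of the URL-prefix property
    body={
        "startDate": "2017-12-20",
        "endDate": "2017-12-20",
        "dimensions": ["page"],
        "rowLimit": 5000,
        "startRow": 0,
    },
)
resp = req.execute()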
