I am trying to get a collection of tasks that are in a specific team and completed between certain dates such as last month. I see in the API docs that there is a task.find_all but that seems to give everything after a certain completion date and then it even gives incomplete tasks.
From ruby-asana docs:
.find_all(client, assignee: nil, workspace: nil, completed_since: nil, modified_since: nil, per_page: 20, options: {}) ⇒ Object
Returns the compact task records for some filtered set of tasks.
From Asana API:
completed_since '2012-02-22T02:06:58.158Z'
Only return tasks that are either incomplete or that have been completed since this time.
It seems like I would need to get tasks completed after a certain date and then iterate and select those that have a completion date before my end date and are completed.
Is there a better way?
Apologies for the delayed answer, but yes, that is currently the way to do it.
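The filter-after-fetch step described above could look roughly like this in Python. It is only a sketch: it assumes the tasks were already fetched with completed_since set to the start of the window, and that each record carries the `completed` and `completed_at` fields (compact records may not include them unless you request them). The helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_asana_time(value):
    """Parse a timestamp such as '2012-02-22T02:06:58.158Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def completed_between(tasks, start, end):
    """Keep tasks that are completed with start <= completed_at < end."""
    selected = []
    for task in tasks:
        # completed_since also returns incomplete tasks, so drop those first.
        if not task.get("completed") or not task.get("completed_at"):
            continue
        if start <= parse_asana_time(task["completed_at"]) < end:
            selected.append(task)
    return selected
```

Pass timezone-aware datetimes for start and end, for example the first day of last month and the first day of this month in UTC.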
I'm using AWS Lambda to do a delete_by_query on an Elasticsearch index so I get rid of everything older than 7 days. That works, but I noticed that the count of the documents is the same before and after, so if I were to run a query in Elasticsearch I may not get correct results until the delete_by_query is completed.
I found this post (python 3.x - Right way to delete and then reindex ES documents - Stack Overflow) that states that it is "best to set wait_for_completion to False. In this case you'll get task details and will be able to track task progress." For one, I haven't found anything that states why this is the case, unless your delete takes 4 hours like that example.
I found code to determine if the delete_by_query is still running at this wonderful site here and tried:
es_client.tasks(detailed=True,actions="*/delete/byquery")
However, I'm getting the message that
'TasksClient' object is not callable.
I am not entirely sure whether that is true, or whether my syntax is incorrect and that is why it is not working. It doesn't make sense that I can't programmatically query Tasks with python if I can do it in the console and with curl.
If it is not good to set wait_for_completion to False, and I can't query this with Python, how am I to programmatically get any information about the task or an understanding as to whether I can go ahead with the analytical queries or whatever else I want to do that depends on this task being done?
Okay, I'm not entirely sure why you are getting that error, so I can't help with that in particular. But, I noticed that the python elasticsearch documentation on how to get the task id from the delete_by_query when wait_for_completion is set to false isn't very clear, so I'm going to provide this in case it helps.
from elasticsearch import Elasticsearch
es = Elasticsearch()
response = es.delete_by_query(index=someIndex, body=someQuery, wait_for_completion=False)
# get task id
print(response['task'])
Hope that helps!
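As for the 'TasksClient' object is not callable message: in the official Python client, tasks is a namespace object with methods such as list and get, so calling it directly would fail. A minimal sketch of listing running delete_by_query tasks follows; running_task_ids is a hypothetical helper name, and the client is passed in so that nothing about your connection setup is assumed.

```python
def running_task_ids(es_client, actions="*/delete/byquery"):
    """Return the ids of currently running tasks whose action matches `actions`."""
    # tasks is a namespace, so call a method on it rather than calling it directly.
    response = es_client.tasks.list(detailed=True, actions=actions)
    task_ids = []
    # By default the response groups tasks by node: nodes -> <node_id> -> tasks.
    for node in response["nodes"].values():
        task_ids.extend(node["tasks"].keys())
    return task_ids
```

An empty list would suggest that no matching task is running at the moment.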
Suppose I have the following Server data model:
Server
-> created_at Timestamp
-> last_ping Timestamp
A "stale" Server is defined as a Server whose last_ping occurred more than one hour ago (i.e., last_ping < Time.now - 1 hour). It should be destroyed if there exists another non-stale server that has come online (created_at) within one hour of the last_ping of the stale server.
How can I find all the Servers that should be destroyed? What would a query look like for this?
Something like…
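def clean_stale_servers
  # Do nothing unless at least one server has pinged within the last hour.
  return unless Server.exists?(last_ping: 1.hour.ago..)

  # Destroy every server whose last ping is more than one hour old.
  Server.where(last_ping: ...1.hour.ago)
        .destroy_all # .delete_all is faster, use that if possible
end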
Then you can call the clean_stale_servers method periodically, e.g. from a cron job.
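For comparison, here is the rule from the question spelled out as a plain Python sketch over in-memory objects rather than a database query. It makes two assumptions that are not in the original: "within one hour of the last_ping" is read as an absolute gap of at most one hour, and the Server dataclass and servers_to_destroy name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


@dataclass
class Server:
    name: str
    created_at: datetime
    last_ping: datetime


def servers_to_destroy(servers, now):
    """Return stale servers that have a non-stale replacement."""
    cutoff = now - ONE_HOUR
    stale = [s for s in servers if s.last_ping < cutoff]
    fresh = [s for s in servers if s.last_ping >= cutoff]
    doomed = []
    for old in stale:
        # Assumption: "within one hour" means an absolute gap of at most one hour.
        if any(abs(new.created_at - old.last_ping) <= ONE_HOUR for new in fresh):
            doomed.append(old)
    return doomed
```

Unlike the method above, this checks each stale server against the creation time of the non-stale ones, so a stale server with no nearby replacement is kept.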
I am using Elasticsearch version 5.6.10. I have a query that deletes records for a given agency, so they can later be updated by a nightly script.
The query is written with elasticsearch-dsl and looks like this:
from elasticsearch_dsl import Q


def remove_employees_from_search(jurisdiction_slug, year):
    # EmployeeDocument is the application's own elasticsearch-dsl document class.
    s = EmployeeDocument.search()
    s = s.filter('term', year=year)
    s = s.query('nested', path='jurisdiction', query=Q("term", **{'jurisdiction.slug': jurisdiction_slug}))
    response = s.delete()
    return response
The problem is I am getting a ConflictError exception when trying to delete the records via that function. I have read this occurs because the documents were different between the time the delete process started and executed. But I don't know how this can be, because nothing else is modifying the records during the delete process.
I am going to add s = s.params(conflicts='proceed') in order to silence the exception. But this is a band-aid as I do not understand why the delete is not processing as expected. Any ideas on how to troubleshoot this? A snapshot of the error is below:
ConflictError:TransportError(409,
u'{
"took":10,
"timed_out":false,
"total":55,
"deleted":0,
"batches":1,
"version_conflicts":55,
"noops":0,
"retries":{
"bulk":0,
"search":0
},
"throttled_millis":0,
"requests_per_second":-1.0,
"throttled_until_millis":0,
"failures":[
{
"index":"employees",
"type":"employee_document",
"id":"24681043",
"cause":{
"type":"version_conflict_engine_exception",
"reason":"[employee_document][24681043]: version conflict, current version [5] is different than the one provided [4]",
"index_uuid":"G1QPF-wcRUOCLhubdSpqYQ",
"shard":"0",
"index":"employees"
},
"status":409
},
{
"index":"employees",
"type":"employee_document",
"id":"24681063",
"cause":{
"type":"version_conflict_engine_exception",
"reason":"[employee_document][24681063]: version conflict, current version [5] is different than the one provided [4]",
"index_uuid":"G1QPF-wcRUOCLhubdSpqYQ",
"shard":"0",
"index":"employees"
},
"status":409
}
You could try refreshing the index first:
client.indices.refresh(index='your-index')
source https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_indices_refresh
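In Python the same idea might be sketched as follows. This is an assumption-laden example rather than a confirmed fix: refresh_then_delete is a hypothetical helper, the client is passed in, and the delete_by_query call uses the body= style shown earlier in this thread together with the conflicts='proceed' setting the question mentions.

```python
def refresh_then_delete(client, index, query):
    """Refresh `index`, then delete matching documents, skipping version conflicts."""
    # Make recent writes visible to search before delete_by_query snapshots the index.
    client.indices.refresh(index=index)
    # conflicts="proceed" counts version conflicts instead of aborting on them.
    return client.delete_by_query(
        index=index,
        body={"query": query},
        conflicts="proceed",
    )
```

If version_conflicts in the response stays above zero even after the refresh, something else is probably still writing to those documents.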
First, this is a question that was asked 2 years ago, so take my response with a grain of salt due to the time gap.
I am using the javascript API, but I would bet that the flags are similar. When you index or delete there is a refresh flag which allows you to force the index to have the result appear to search.
I am not an Elasticsearch guru, but the engine must perform some systematic maintenance on the indices and shards so that it moves the indices to a stable state. It's probably done over time, so you would not necessarily get an immediate state update. Furthermore, from personal experience, I have seen cases where a delete does not seem to remove the item from the index. It might mark the document as "deleted" and give it a new version number, but the document seems to "stick around" (probably until general maintenance sweeps run).
Here I am showing the js API for delete, but it is the same for index and some of the other calls.
client.delete({
  id: string,
  index: string,
  type: string,
  wait_for_active_shards: string,
  refresh: 'true' | 'false' | 'wait_for',
  routing: string,
  timeout: string,
  if_seq_no: number,
  if_primary_term: number,
  version: number,
  version_type: 'internal' | 'external' | 'external_gte' | 'force'
})
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_delete
refresh
'true' | 'false' | 'wait_for' - If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes.
For additional reference, here is the page on Elasticsearch refresh info and what might be a fairly relevant blurb for you.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
Use the refresh API to explicitly refresh one or more indices. If the request targets a data stream, it refreshes the stream’s backing indices. A refresh makes all operations performed on an index since the last refresh available for search.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. You can change this default interval using the index.refresh_interval setting.
Assume I have a long-running update query that updates roughly 200k to 500k documents, perhaps even more. Why I need to update so many documents is beyond the scope of the question.
Since the client times out (I use the official ES python client), I would like to have a way to check what the status of the bulk update request is, without having to use enormous timeout values.
For a short request, the response of the request can be used. Is there a way I can get the response of a long-running request as well, or can I give a request a name or id so that I can reference it later?
For a request that is still running, I can use the tasks API to get the information.
But for the other statuses (completed / failed), how do I get it?
If I try to access a task that has already completed, I get resource not found.
P.S. I am using update_by_query for the update
With the task id you can look up the task directly:
GET /_tasks/taskId:1
The advantage of this API is that it integrates with
wait_for_completion=false to transparently return the status of
completed tasks. If the task is completed and
wait_for_completion=false was set on it then it’ll come back with a
results or an error field. The cost of this feature is the document
that wait_for_completion=false creates at .tasks/task/${taskId}. It is
up to you to delete that document.
From here https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#docs-update-by-query-task-api
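With the Python client, polling that endpoint could be sketched like this. It assumes the update_by_query was started with wait_for_completion=False and that you kept the returned task id; wait_for_task, poll_seconds and max_polls are hypothetical names chosen for illustration.

```python
import time


def wait_for_task(es_client, task_id, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll GET /_tasks/<task_id> until the task reports completed."""
    for _ in range(max_polls):
        status = es_client.tasks.get(task_id=task_id)
        if status["completed"]:
            # A finished task carries either a "response" or an "error" field.
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

The returned dictionary can then be inspected for "error" before any dependent queries are run.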
My use case went like this, I needed to do an update_by_query and I used painless as the script language. At first I did a reindex (when testing). Then I tried using the update_by_query functionality (they resemble each other a lot). I did a request to the task api (the operation hasn't finished of course) and I saw the task being executed. When it finished I did a query and the data of the fields that I was manipulating had disappeared. The script worked since I used the same script for the reindex api and everything went as it should have. I didn't investigate further because of lack of time, but... yeah, test thoroughly...
I find GET /_tasks/taskId:1 confusing. It should be
GET http://localhost:9200/_tasks/taskId
A taskId looks something like this NCvmGYS-RsW2X8JxEYumgA:1204320.
Here is my trivial explanation related to this topic.
To check a task, you need to know its taskId.
A task id is a string that consists of node_id, a colon, and a task_sequence_number. An example is taskId = NCvmGYS-RsW2X8JxEYumgA:1204320 where node_id = NCvmGYS-RsW2X8JxEYumgA and task_sequence_number = 1204320. Some people, including myself, thought taskId = 1204320, but that's not how the Elasticsearch developers define it at this moment.
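To make the format concrete, here is a small Python sketch that splits such an id into its two parts and rejects a bare number. split_task_id is a hypothetical helper, not part of any client library.

```python
def split_task_id(task_id):
    """Split '<node_id>:<sequence_number>' into (node_id, sequence_number)."""
    node_id, sep, number = task_id.rpartition(":")
    if not sep or not node_id or not number.isdigit():
        raise ValueError(f"expected '<node_id>:<number>', got {task_id!r}")
    return node_id, int(number)
```

Calling it with only 1204320 raises ValueError, which mirrors the misunderstanding described above.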
A taskId can be found in two ways.
1. wait_for_completion = false. When you send a request to ES with this parameter, the response will be {"task" : "NCvmGYS-RsW2X8JxEYumgA:1204320"}. Then you can check the status of that task like this: GET http://localhost:9200/_tasks/NCvmGYS-RsW2X8JxEYumgA:1204320
2. GET http://localhost:9200/_tasks?detailed=false&actions=*/delete/byquery. This example returns the status of all running tasks whose action is delete_by_query. If you know there is only one such task running on ES, you can find your taskId in that response.
After you know the taskId, you can get the status of a task with this.
GET /_tasks/taskId
Notice that you can only check the status of a task while the task is running, or afterwards if the task was created with wait_for_completion = false.
A little more explanation: wait_for_completion is true by default. Based on my understanding, tasks with wait_for_completion = true are "in-memory" only. You can still check the status of such a task while it's running, but it's completely gone after it is completed or canceled, meaning that checking the status will return a 'resource_not_found_exception'. Tasks with wait_for_completion = false are stored in the ES system index .tasks. You can still check the status of such a task after it finishes. However, you might want to delete the task document from the .tasks index once you are done with it to save some space. The deletion request looks like this
DELETE http://localhost:9200/.tasks/task/NCvmGYS-RsW2X8JxEYumgA:1204320
You will receive a resource_not_found_exception if the taskId is not present (for example, if you delete the same task twice, or if you try to delete an in-memory task, i.e. one with wait_for_completion = true).
About this confusing taskId thing, I made a pull request https://github.com/elastic/elasticsearch/pull/31122 to help clarify the Elasticsearch documentation. Unfortunately, they rejected it. Ugh.
I was using the Hipchat API (v2) a bit today and ran into an odd issue where I was not able to really pull up all of the history for a room. It seemed as though when I queried a specific date, for example, it would only retrieve a fraction of the history for that date given. I had had plans to simply iterate across all of the dates for a Room to extract the history in a format that I could use, but ended up hitting this and am now unsure if it is really possible to pull out the history fully.
I realize that this is a bit clunky. It pulls the JSON as a string and then I have to turn it into a hash, so I know I'm not doing this as well as it could be done, but here is roughly what I quickly did just to test out the history method for the API:
require 'hipchat'
require 'json'

api_token = "MY_TOKEN"
client = HipChat::Client.new(api_token, :api_version => 'v2')
history = client['ROOM_NAME'].history
history = JSON.parse(history)

# The parsed response is a hash; the messages live in the value that is an array.
history.each do |_key, value|
  next unless value.is_a? Array

  value.each do |message|
    if message.is_a? Hash
      puts "#{message['from']['name']}: #{message['message']}"
    end
  end
end
Obviously the extension to that was then to just iterate through the dates in the desired range (using: client['ROOM_NAME'].history(:date => '2010-11-19', :timezone => 'PST')), but again, I was only getting a fraction of the history for the room. Are there some additional parameters that I'm missing for this to make it work as expected?
I got this working but it was a big pain.
Start by sending a query with the current time, in UTC, but without including the time zone, as the start date:
https://internal-hipchat-server/v2/room/2/history?reverse=false&date=2015-06-25T20:42:18.658439&max-results=1000&auth_token=XXX
This is very fiddly:
If you specify just the current date, without a timezone, as documented in the API, it is interpreted as midnight last night and you only get messages from yesterday or older.
If you try specifying tomorrow’s date instead, the response is 400 Bad Request: "This day has not yet come to pass."
If you specify the time as 2015-06-25T20:42:18.658439+00:00, which is the format that times come in HipChat API responses, HipChat’s parser seems to fail and interpret it as midnight last night.
When you get the response back, take the oldest items.date property, strip the timezone, and resubmit the above URL with an updated date parameter:
https://internal-hipchat-server/v2/room/2/history?reverse=false&date=2015-06-17T19:56:34.533182&max-results=1000&auth_token=XXX
Be sure to include the microseconds, in case a notification posted multiple messages to the same room in the same second.
This will get you the next page of messages. Keep doing this until you get fewer than max-results messages back.
There is a start-index parameter I tried passing before I got the above working, and it will give you a few pages of results, with responses lacking a links.next property, but it won’t give you the full history. On a chatroom with 9166 messages in the history according to statistics.messages_sent, it only returned 3217 messages. So don’t use it. You can use statistics.messages_sent as a sanity check for whether you get all messages.
Oh yeah, and the last_active property in the /v2/room call cannot be trusted because it doesn’t update when notification messages are posted to the room.
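The paging loop described above could be sketched in Python roughly as follows. It is a sketch under assumptions: fetch_page is a hypothetical callable that performs the HTTP request for a given date and max-results and returns the parsed items list, each item is assumed to have id and date fields, and de-duplicating by id is my own precaution in case the boundary message is returned twice.

```python
from datetime import datetime


def strip_timezone(stamp):
    """Turn '2015-06-17T19:56:34.533182+00:00' into '2015-06-17T19:56:34.533182'."""
    parsed = datetime.fromisoformat(stamp)
    # Keep the microseconds, as the answer above recommends.
    return parsed.replace(tzinfo=None).isoformat(timespec="microseconds")


def _when(message):
    return datetime.fromisoformat(message["date"])


def fetch_full_history(fetch_page, start_date, max_results=1000):
    """Page backwards through a room's history and return all unique messages."""
    seen = {}
    date = start_date
    while True:
        items = fetch_page(date, max_results)
        new_items = [m for m in items if m["id"] not in seen]
        for message in new_items:
            seen[message["id"]] = message
        # Stop on a short page, or when a page adds nothing new (avoids looping forever).
        if len(items) < max_results or not new_items:
            break
        # Resubmit with the oldest timestamp on this page, timezone removed.
        date = strip_timezone(min(items, key=_when)["date"])
    return sorted(seen.values(), key=_when)
```

Comparing the length of the result with statistics.messages_sent gives the sanity check mentioned above.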