API - GET_TASKS - custom filters - Mesos

Is there a way to filter the results of GET_TASKS in the Mesos API?
http://mesos.apache.org/documentation/latest/operator-http-api/#get_tasks
There is no info about filters, for example getting tasks that belong to a specific principal or have a given status. There is also no info about limit/offset, and getting the whole list of tasks is slow.
Is there some way to filter the results?

You are referring to the operator API, but there is also the general /tasks endpoint, which allows some basic filtering with query parameters:
framework_id=VALUE Only return tasks belonging to the framework with this ID.
limit=VALUE Maximum number of tasks returned (default is 100).
offset=VALUE Starts task list at offset.
order=(asc|desc) Ascending or descending sort order (default is descending).
task_id=VALUE Only return tasks with this ID (should be used together with parameter framework_id).
http://mesos.apache.org/documentation/latest/endpoints/master/tasks/
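For example (assuming the default master port 5050; substitute your own master host and framework ID), a query for the second page of 50 tasks in ascending order would look like:
http://<master-host>:5050/master/tasks?framework_id=<framework-id>&limit=50&offset=50&order=asc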

Datadog distinct-like custom metrics

Given the following scenario:
A lambda receives an event via SQS
The lambda receives a uuid pointing to an entity.
The lambda may fail with an error
SQS will retry that particular entity several times
The lambda will be called with different entities thousands of times
Right now we monitor a custom error-count metric like myService.errorType.
This gives us an exact count of how many times an error occurred, independent of any specific entity: if an entity can't be processed, say, 100 times, then the metric value will be 100.
What I'd like to have, though, is a distinct metric based on the UUID.
Example:
entity with id 123 fails 10 times
entity with id 456 succeeds
entity with id 789 fails 20 times
Then I'd like to have a metric with the value of 2, because processing failed for only two entities (and not 30, as it would be reported right now).
While searching for a solution I found the possibility of using tags. But as the docs point out, they are not meant for such a use case:
Tags shouldn’t originate from unbounded sources, such as epoch timestamps, user IDs, or request IDs. Doing so may infinitely increase the number of metrics for your organization and impact your billing.
So are there any other possibilities to achieve my goals?
I've solved it now by verifying the status via code and by adding tags to the metrics:
occurrence:first
occurrence:subsequent
This way I can filter in my dashboard for occurrence:first only.
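For reference, here is a minimal sketch of emitting such a counter with the java-dogstatsd-client (the agent address, the metric prefix, and the firstOccurrence flag are assumptions; the flag would come from the status check done in code):

import com.timgroup.statsd.NonBlockingStatsDClient;
import com.timgroup.statsd.StatsDClient;

public class ErrorMetrics {
    // Assumes a local DogStatsD agent listening on the default port 8125.
    private static final StatsDClient STATSD =
            new NonBlockingStatsDClient("myService", "localhost", 8125);

    static void reportError(String errorType, boolean firstOccurrence) {
        // firstOccurrence comes from your own status check, as described above.
        String occurrenceTag = firstOccurrence ? "occurrence:first" : "occurrence:subsequent";
        // Emits myService.<errorType> with the occurrence tag attached.
        STATSD.incrementCounter(errorType, occurrenceTag);
    }
}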
To make sure things are clear, you have a metric called myService.errorType with a tag entity. This metric is a counter that will increase every time an entity is in error. You will then use this metric query:
sum:myService.errorType{*} by {entity}
When you speak about UUIDs, it seems that the cardinality is small (here you show 3), which means that every hour you will have a small number of UUIDs. In that case, adding the UUID to the metric tags is not as critical as user IDs, timestamps, etc., which have a limitless number of values.
I would invite you to add this uuid tag, and check the cardinality in the metric summary page to ensure it works.
Then, to get the number of UUIDs affected by errors, you can use something like:
count_not_null(sum:myService.errorType{*} by {uuid})
Finally, as an alternative, if the cardinality of UUIDs can go through the roof, I would invite you to work with logs instead, or with Christopher's solution, which seems to limit the cardinality increase as well.

Apache Storm, co-partitioning of streams

I have the following situation: I need to join two streams, Bid(Seller, Item, Price) and Ask(Buyer, Item, Price), and emit a tuple (Seller, Buyer) when the buyer offers a higher price than requested by the seller.
I know that I can configure the bolt's grouping option with fields grouping. But if I configure each input separately, is there a guarantee that data with the same value will always go to the same bolt task?
Here is some pseudocode to help explain:
builder.setBolt("goodPrice", new GoodPriceBolt(), 5)
       .fieldsGrouping("Bid", new Fields("Item"))
       .fieldsGrouping("Ask", new Fields("Item"));
Now, as per the documentation (http://storm.apache.org/releases/current/Concepts.html), all Bid data points with the same item value are guaranteed to be delivered to the same task. But I am not sure whether the code above also guarantees that all Ask data points with the same item value as the Bid will be delivered to the same task.
In other words, I need to partition on Bid.Item = Ask.Item. Is that possible in Storm?
Yes, as far as I know. Joins are listed as a common pattern on Storm's page: http://storm.apache.org/releases/2.0.0-SNAPSHOT/Common-patterns.html.
Here's the implementation of fields grouping in Storm https://github.com/apache/storm/blob/09e01231cc427004bab475c9c70f21fa79cfedef/storm-client/src/jvm/org/apache/storm/daemon/GrouperFactory.java#L160. The values list contains the values of the fields you've specified in the field grouping (in your case "Item"). The id of the task to send the tuple to is based on https://github.com/apache/storm/blob/09e01231cc427004bab475c9c70f21fa79cfedef/storm-client/src/jvm/org/apache/storm/utils/TupleUtils.java#L44, which uses the hash code of the values. As long as whatever is in your "Item" field implements hashCode properly, you should be good.
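In simplified form, the task selection comes down to something like this (the real logic is in the TupleUtils link above):

// values = the grouping field values of the tuple, e.g. the "Item" value
int taskIndex = Math.floorMod(values.hashCode(), numTasks);

Since both inputs are consumed by the same bolt, numTasks is identical for both streams, so equal "Item" values from Bid and Ask resolve to the same task index.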
You might also be interested in http://storm.apache.org/releases/1.2.1/Joins.html, and maybe https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/SingleJoinExample.java. Keep in mind that when you join streams, you should try to take into account that the matching tuples might not show up in the joiner at the same time, which is why Storm provides a join bolt that lets you specify a window for how long you want to wait on one part of the match.
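For illustration, here is a rough sketch of what that windowed join could look like for the Bid/Ask case (the window length and component names are arbitrary, and the price comparison would still happen in a downstream bolt):

import java.util.concurrent.TimeUnit;
import org.apache.storm.bolt.JoinBolt;
import org.apache.storm.topology.base.BaseWindowedBolt.Duration;
import org.apache.storm.tuple.Fields;

// Join Bid and Ask on equal Item values within a 10-second tumbling window.
JoinBolt joiner = new JoinBolt("Bid", "Item")
        .join("Ask", "Item", "Bid")            // Ask.Item == Bid.Item
        .select("Seller,Buyer")                // project the fields to emit
        .withTumblingWindow(new Duration(10, TimeUnit.SECONDS));

builder.setBolt("joiner", joiner, 5)
       .fieldsGrouping("Bid", new Fields("Item"))
       .fieldsGrouping("Ask", new Fields("Item"));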

FileNet P8 ACCE Sweep job filter

I'm trying to set up a sweep job that moves documents from one class to a different class, but I only want to test right now -- not move ALL documents.
I was trying to add a filter to only pull over certain documents to test this before I pull the trigger, but it isn't working (ALL documents get listed in the results when I run this as a preview).
The current filter I have is:
[DocumentTitle] like '%Z*%'
Any ideas what I need to change in the filter so this runs only on the subset of documents I want?
Please clarify the points below so we can resolve your issue:
1) Is your sweep job based on the Java API / .NET API? or
2) Is it based on the FEM (Enterprise Manager) tool?
Regarding the filter: [DocumentTitle] LIKE '%Z%' will match all documents whose title contains 'Z'. Try filtering by Id first to fetch one record; once that is successful, test with multiple records.
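For example, a hypothetical single-document test (the GUID is a placeholder; substitute the Id of a real document from your object store):
[Id] = {3F2504E0-4F89-11D3-9A0C-0305E82C3301}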
Thanks,
Habi
Sweep jobs typically take a condition similar to the part after the WHERE clause in a search. The easiest way, therefore, is to go to the search view, create your search, switch to the SQL view tab, take whatever comes after the WHERE clause, and add it to your sweep search filter.
Here are examples of filter conditions:
VersionStatus = 4 //All superseded documents
DateCreated < NOW() - TimeSpan(365, 'Days') //All documents that were created at least a year ago
StorageArea = OBJECT('{5E2BE09A-F4B1-49E2-A229-77FE32E5FEF1}') //All content in a specific storage area
VersionStatus = 4 AND DateCreated < NOW() - TimeSpan(365, 'Days') AND ContentSize > (1024 * 1024 * 500) //Complex logical expression
A final point regarding your question about
I only want to test right now -- not move ALL documents.
Sweeps have a Sweep Mode setting that defines how the sweep will execute; in your case you need to set it to Preview.

How to properly read all changed entities from external API

I need to properly traverse over all items in some external API. All items have an "update_time" property, and I can query the items from the API in ascending or descending order of it. Which should I use to get all items without missing any of them?
Facts:
The external API is paginated (the limit and page parameters are fixed), so I cannot fetch all items in one query.
Querying of items takes some time.
Processing of received items takes some time.
While a page of items is being queried or processed, items in the external system can change. This updates their "update_time" property and influences the ordering (and thus the pagination), so the next page API call can have a "gap" in the list of received items.
I don't want to process all items every time, only those updated since the last traversal (this task is scheduled every hour, for example). I store the maximum "update_time" of all received items and skip processing of older items on the next traversal.
Thanks; this is really hard for me to reason about.
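To make the checkpoint idea above concrete, here is a rough sketch (the Api, Item, and checkpoint helpers are hypothetical placeholders; it assumes ascending order by update_time and does not by itself close the mid-traversal "gap" described in the facts):

import java.time.Instant;
import java.util.List;

public class Traversal {
    interface Api { List<Item> fetchPage(int page, String orderBy, String direction); }
    interface Item { Instant updateTime(); void process(); }

    void traverse(Api api, Instant lastSync) {
        // lastSync = max update_time seen on the previous scheduled run
        Instant maxSeen = lastSync;
        int page = 1;
        List<Item> items;
        do {
            // Fixed page size, ordered by update_time ascending (hypothetical call).
            items = api.fetchPage(page++, "update_time", "asc");
            for (Item item : items) {
                // Skip items already handled by an earlier traversal.
                if (item.updateTime().isAfter(lastSync)) {
                    item.process();
                    if (item.updateTime().isAfter(maxSeen)) {
                        maxSeen = item.updateTime();
                    }
                }
            }
        } while (!items.isEmpty());
        saveCheckpoint(maxSeen); // becomes lastSync for the next run
    }

    void saveCheckpoint(Instant t) { /* persist for the next run */ }
}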

Writing LDAP query filter

I am having trouble writing a filter for an LDAP query.
I have two object classes, Person and Service. The database consists of a number of Persons, each having zero or more Services as children. Each person has an identifier, the personNumber attribute. I want to select several persons and all their services, given the person numbers. Is it possible to do so in one query?
For example, if we have the following set of objects:
personNumber=1,ou=root,o=org
serviceNumber=1,personNumber=1,ou=root,o=org
serviceNumber=2,personNumber=1,ou=root,o=org
personNumber=2,ou=root,o=org
serviceNumber=3,personNumber=2,ou=root,o=org
personNumber=3,ou=root,o=org
serviceNumber=4,personNumber=3,ou=root,o=org
is it possible, given person numbers 1 and 2, to retrieve these objects:
personNumber=1,ou=root,o=org
serviceNumber=1,personNumber=1,ou=root,o=org
serviceNumber=2,personNumber=1,ou=root,o=org
personNumber=2,ou=root,o=org
serviceNumber=3,personNumber=2,ou=root,o=org
but not these:
personNumber=3,ou=root,o=org
serviceNumber=4,personNumber=3,ou=root,o=org
using one query only? This is just an example; there may be more than two identifiers to load, and they are not known a priori.
Also, is there a way to specify that an attribute value should be in some collection of values, like an IN (...) clause in SQL, other than generating a big (|(a=..)(a=..)(a=..)..) filter?
The answer is no; per RFC 2254 (https://www.rfc-editor.org/rfc/rfc2254), there is no such filter. If the IN list is very large and you have lots of people in LDAP, you need to write a simple paged query to get all results using an (objectClass=Person) filter, and then filter the results after retrieval. If your code is written in Java, you can check out the UnboundID LDAP SDK.
If Person doesn't have a multivalued attribute holding the services, there is no way this can be returned in one ldapsearch. You'll need at least a two-stage rocket: first select the persons, then for each person check its child nodes.
AFAIK there is no IN operator in LDAP filters. The RFC is clear about that. So you're stuck with your tedious (|(a=s1)(a=s2)(a=s3)...) construct.
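If you end up generating that construct in code, the UnboundID LDAP SDK mentioned above can build it for you; a minimal sketch (the identifier list is a runtime input):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import com.unboundid.ldap.sdk.Filter;

public class InFilterExample {
    public static void main(String[] args) {
        // The person numbers are not known a priori; assume they arrive at runtime.
        List<String> personNumbers = Arrays.asList("1", "2");

        // Build (|(personNumber=1)(personNumber=2)...) programmatically.
        List<Filter> components = new ArrayList<>();
        for (String pn : personNumbers) {
            components.add(Filter.createEqualityFilter("personNumber", pn));
        }
        Filter filter = Filter.createORFilter(components);

        System.out.println(filter); // (|(personNumber=1)(personNumber=2))
    }
}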
