Apache Storm Fields Grouping Calculation - apache-storm

I asked this question in the Storm user group and haven't gotten a response yet, so I decided to ask it here. I've found the code, and many references to how the the taskIndex is calculated but when I try using the following I don't get the same result as my Storm topology. I've also seen more than one posting where others report the same.
Here's the question:
Hello,
I’ve tried to use the information below to generate the hash, mod it, and in turn, calculate the correct consuming destination task index, but without success. I’ve scoured the Internet to find an example of a hand calculation of this nature and have turned up empty. I must be missing something in my hand calc, so I’m hoping someone on the list can help me out.
I have field grouped as follows:
.fieldsGrouping(EXAMPLE_BOLT, EXAMPLE_BOLT_STREAM, new Fields(TopologyConstants.EXAMPLE_FIELD_GROUPING_ID))
My EXAMPLE_BOLT emits as shown here:
collector.emit(TopologyConstants.EXAMPLE_BOLT_STREAM, new Values(EXAMPLE_FIELD_GROUPING_ID_VALUE, EXAMPLE_DATA_INSTANCE));
I perform the calculation as follows:
int numberOfConsumingTasks = x;
Integer EXAMPLE_FIELD_GROUPING_ID_VALUE = y;
ArrayList alist = new ArrayList();
alist.add(EXAMPLE_FIELD_GROUPING_ID_VALUE);
int hashCode = Arrays.deepHashCode(alist.toArray());
int targetTaskIndex = Math.abs(hashCode) % numberOfConsumingTasks;
The resulting targetTaskIndex value from this calculation does not match the value produced by Storm, when I use real values from my topology.
Can someone tell me what I’m doing wrong?
Thanks,
Aubrey

Related

Leveraging spring to reduce DB calls

I have a data piece that is:
foo{
string: one
string: two
list<string>: listOne
list<string>: listTwo
}
such that in the DB one is associated with multiple entries of listOne.
not much background, I'm at a loss as to where to even look for answers. I received feed back to try to eliminate a jdbctemplate.query during a code review with a "there may be a way to reduce this using #autowire".
no code to share, I just need a place to start looking for answers. I've been on the spring website and I don't see anything that looks like I can use it. and I didn't see any google results that resemble what I'm looking for.
I should probably preface this with the fact that I'm a new dev so even a simple answer is likely not something I've tried. so this came about because my query for listOne and listTwo are returning columns. so I first tried using a mapper with the jdbcTemplate.query() that returned a string. but jdbc didn't like that. so I ended up returning a list from the mapper. then jdbc turns those answers into a list>, I then afterwards loop through those list> to convert them to a list and store them in foo. in my mind an ideal solution allows me to combine the two queries and the mapper looks like (pseudo code):
public foo fooMapper implements<RowMapper>(){
foo.one = resultSet.get("thingOne")
foo.two = resultSet.get("thingTwo")
foo.listOne = resultSet.get("[a portion of the column]listThingOne")
foo.listTwo = resultSet.get("[a portion of the column]listThingTwo")
return foo;
}
it should be noted that the the result set is mono-directional, I found out when I tried using a string[] instead of a list.

Report Builder Expressions

Im new to Report Builder and having issues with some expressions that Im trying to implement in a report. I got the standard ones to work however as soon as I try any distinctions, I get error messages. Over the last couple weeks, Ive tried many combinations, read the expression help, google and looking at other questions at internet sites. To reduce my frustrations, I even would jump to other expressions and walk away hoping I would have different insight coming back.
Its probably something simple or something I dont know about writing expressions.
Im hoping that someone can help with these expressions; they are the versions I get the least errors with(usually just expression expected) and show what Im trying to accomplish.
=IIF((Fields!RECORDFLAG.Value)='D',COUNTDISTINCT(Fields!TICKETNUM.Value),0)
=IIF((Fields!TRANSTYPE.Value)='1' and (Fields!RECORDFLAG.VALUE)='A' or
'B',SUM(Fields!DOLLARS.Value),0)
=IIF((Fields!TRANSTYPE.Value)='1' and
(Fields!RECORDFLAG.VALUE)='P',SUM(Fields!DOLLARS.Value),0)
=Sum([DOLLARS] case when [RECORDFLAG]='P' then -1*[DOLLARS])
Thank You.
=IIF((Fields!RECORDFLAG.Value)=”D”,COUNTDISTINCT(Fields!TICK‌​ETNUM.Value))
The error message gives you the answer here - no false part of the iif() has been specified. Use =IIF((Fields!RECORDFLAG.Value)=”D”,COUNTDISTINCT(Fields!TICK‌​ETNUM.Value), 0)
=IIF((Fields!TRANSTYPE.Value)="1" and (Fields!RECORDFLAG.VALUE)="A" or "B",SUM(Fields!DOLLARS.Value),0)
This is not how an OR works in SSRS. Use:
=IIF((Fields!TRANSTYPE.Value)="1" and (Fields!RECORDFLAG.VALUE="A" or Fields!RECORDFLAG.Value = "B"),SUM(Fields!DOLLARS.Value),0)
The 0s are returned due to your report design. countdistinct() is an aggregate function - it's meant to be used on a set of data. However, your iif() is only testing on a per row basis - you're basically saying "if the current row is thing, count all the distinct values" which doesn't make sense. There are a couple of ways forward:
You can count the number of times a certain value occurs in a given condition using a sum(). This is not the same as the countdistinct(), but if you use =sum(iif(Fields!RECORDFLAG.Value = "D", 1, 0)) then you will get the number of times RECORDFLAG is D in that set. Note: this requires the data to be aggregated (so in SSRS, grouped in a tablix).
You can use custom code to count distinct values in a set. See https://itsalocke.com/aggregate-on-a-lookup-in-ssrs/. You can apply this even if you have only one dataset - just reference the same one twice.
You can change the way your report works. You can group on Fields!RECORDFLAG.Value and filter the group to where Fields!RECORDFLAG.Value = "D". Then in your textbox, use =countdistinct(Fields!TICKETNUM.Value) to get the distinct values for TICKETNUM when RECORDFLAG is D.

Query Pandas DataFrame distinctly with multiple conditions by unique row values

I have a DataFrame with event logs:
eventtime, eventname, user, execution_in_s, delta_event_time
The eventname e.g. can be "new_order", "login" or "update_order".
My problem is that I want to know if there is eventname == "error" in the periods between login and update_order by distinct user. A period for me has a start time and an end time.
That all sounded easy until I tried it this morning.
For the time frame of the 24h logs I might not have a pair, because the login might have happened yesterday. I am not sure how to deal with something like that.
delta_event_time is a computed column of the eventtime minus the executions_in_s. I am considering these the real time stamps. I computed them:
event_frame["delta_event_time"] = event_frame["eventtime"] - pandas.to_timedelta(event_frame["execution_in_s"], unit='s')
I tried something like this:
events_keys = numpy.array(["login", "new_order"])
users = numpy.unique(event_frame["user"])
for user in users:
event_name = event_frame[event_frame["eventname"].isin(events_keys) & event_frame["user" == user]]["event_name"]
But this not using the time periods.
I know that Pandas has between_time() but I don't know how to query a DataFrame with periods, by user.
Do I need to iterate over the DataFrame with .iterrows() to calculate the start and end time tupels? It takes a lot of time to do that, just for basic things in my tries. I somehow think that this would make Pandas useless for this task.
I tried event_frame.sort(["user", "eventname"]) which works nicely so that I can see the relevant lines already. I did not have any luck with .groupby("user"), because it mixed users although they are unique row values.
Maybe a better workflow solution is to dump the DataFrame into a MongoDB instead of pursuing a solution with Pandas to perform the analysis in this case. I am not sure, because I am new to the framework.
Here is pseudocode for what I think will solve your problem. I will update it if you share a sample of your data.
grouped = event_frame.groupby('user') # This should work.
# I cannot believe that it didn't work for you! I won't buy it till you show us proof!
for name, group in grouped:
group.set_index('eventtime') # This will make it easier to work with time series.
# I am changing index here because different users may have similar or
# overlapping times, and it is a pain in the neck to resolve indexing conflicts.
login_ind = group[group['eventname'] == 'login'].index
error_ind = group[group['eventname'] == 'error'].index
update_ind = group[group['eventname'] == 'update_order'].index
# Here you can compare the lists login_ind, error_ind and update_ind however you wish.
# Note that the list can even have a length of 0.
# User name is stored in the variable name. So you can get it from there.
Best way might be to create a function that does the comparing. Because then you can create a dict by declaring error_user = {}.
Then calling your function inside for name, group in grouped: like so: error_user[name] = function_which_checks_when_user_saw_error(login_ind, error_ind, update_ind).

SPMETAL / LINQ to SharePoint Decimal Types

I've hit a pretty major snag with the entities generated by spmetal / linq to sharepoint. I am hoping someone has dealt with this before.. or maybe I am missing something obvious.
Let's say we have a list with a number field. The field will be expected to hold reasonably precise values.. for example, 0.0000451. Once the value is in the list- SharePoint is fine with it. It displays in the list and display/edit views correctly.
Now if we generate entities based on this list with spmetal, we will get..
//...
private System.Nullable<double> _number;
//..
[Microsoft.SharePoint.Linq.ColumnAttribute(Name="Number", Storage="_number", Required=true, FieldType="Number")]
public System.Nullable<double> Number {
get {
return this._number;
}
set {
if ((value != this._number))
{
this.OnPropertyChanging("Number", this._number);
this._number= value;
this.OnPropertyChanged("Number");
}
}
}
//...
Since the type determined by spmetal is doublewe get notation when trying to retrieve it.. for example:
var number = (from x in myDc.MyList select x.Number).First();
number would actually result in a double of 4.51E-05, not 0.0000451.
I am assuming this can be fixed by using a decimal. If I change the types throughout the generated entities to System.Nullable<decimal> I get type conversion failures.
How should I fix this?
EDIT I think maybe it is better to ask "how should I deal with this"? for example, I can simply convert my double values to decimal later on down the line.. my linq query, for example. If I do that, the example case would return the expected result. That seems clunky, though, and I'd like to correct this at the source.
There are several cases like this where SPMetal will give you clunky code. You can, and sometimes have to, fix that. And I admit, it definitely feels better to do it at the source.
But there is a downside.
When your data model changes you will have to re-run SPMetal to incorporate your new entities. Any changes you made to the generated file will have to be carefully documented and re-done, or your code will be broken. Therefore, I would advise to leave the generated code alone if you can work with it.
If you can write a wrapper around the objects/methods it would of course be preferable to just converting the types at the end-point, but that's general good programming practice.
4.51E-05 actually equals 0.0000451 so there is nothing wrong with your code.
In other words 4.51E-05 means 4.51 times ten to the minus five power, or 0.0000451

SolrNet query assistance and some debugging options

How would I use SolrNet to execute a GREATER THAN/LESS THAN query?
Example:
My documents have a field called "minimumDays" and I only want to return docs where that field is LESS THAN OR EQUAL TO the number I pass into the query.
I currently have this, but am not sure it's correct.
int requestedDays = 3;
var minimumNightsQuery = new SolrQueryByRange<int>("minimumDays", 0, requestedDays, true);
Am I on the right track?
The second part here is if there is some way to better understand the query that is being passed into Solr from SolrNet? Debugging value or something where I can inspect the "q" variable for instance.
Thanks again for your help
You can use SolrQueryByRange for the first part of your question. Your code does look good. debugging your query and results might help. I have found that SolrNet does some odd things. - http://code.google.com/p/solrnet/wiki/Facets#Arbitrary_facet_queries
For the second part, You can intercept the ISolrConnection and put in your own in between. For a good start check this out: http://code.google.com/p/solrnet/source/browse/trunk/SampleSolrApp/LoggingConnection.cs?r=513
I have one that logs the query and the results, and if a config setting is on it appends the debug param and logs that result also. Its great info to have.... and one of the only ways to get it.

Resources