How to best get the data out of a lookup table - algorithm

I have a few product lines with products that have various features. I have a list of drawings made for each product, and below is a sample of how the product lines, products, and features are represented in the drawings.
I am trying to figure out, when given a particular product with all of its features (i.e. a single row from the table), what is the most efficient and most concise way to structure my code around the features in order to select the correct drawing.
I can always do something like
if ($product == "A"
&& $motor == true
&& $feet == 3
&& $outlet == true
&& $manual == false)
getDrawing("A_motor_3_outlet_no_manual.jpg");
and end up with as many flat if statements as there are rows in the table. But there must be a better way. I think it may have something to do with cardinality, or maybe with variability of the options in each of the properties of the product. The exact concept escapes me for some reason.
I have a feeling, for example, that it is better to check whether the product has a motor first, because once you know that, it eliminates about half of all the options and narrows things down faster, i.e. in the main outer if block do something like this:
if ($motor == true)
{
//7 drawings left to check
}
else
{
//6 drawings left to consider
}
as opposed to something like this:
if ($feet == 1)
{
//still 10 rows left to consider
}
else if ($feet == 2)
{
//just 2 options left to check
}
else if ($feet == 3)
{
//one option only left:
getDrawing("B_motor_3_no_outlet_manual.jpg");
}
But maybe I am just overthinking this and just need a lookup table somehow with all the properties, maybe like
getDrawing($lookup[$product][$motor][$feet][$outlet][$manual]);
Question still remains on what order of properties is best to use for the lookup table, and if it matters.
Question:
Is there an "optimal" if-then-else block nesting order to decide for product properties, to minimize the total number of decisions the code has to make overall, or do I need to abandon that train of thought and just use a lookup table? Why or why not?
EDIT: By the way .. this looks like a good candidate to use a database .. but I have only 48 rows total and can encode them directly into code. This will be read-only, not updated at all that frequently, so I am thinking of using a multidimensional array to encode this data.

This is what databases are made for. You will be able to use a simple SELECT statement to find exactly what you need. Even if right now it's only one table with very few rows, that's the way I'd go for.
If you really want to store everything in the script, you could go ahead and hash the properties of each product and use the hash value as the key of the lookup table. When given certain product properties, you can then just hash them and retrieve the respective drawing.
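As a rough illustration of the hashed-lookup idea, here is a minimal Python sketch (the question itself uses PHP, so treat this as a language-neutral outline). The dictionary name DRAWINGS, the function get_drawing_name and the "missing.jpg" fallback are assumptions made for the example; only the two file names come from the question. Because the whole property tuple is hashed at once, the order of the properties does not affect how many decisions the code makes.

```python
# Hypothetical lookup table: (product, motor, feet, outlet, manual) -> drawing file.
# Only two of the 48 rows are shown; both file names appear in the question.
DRAWINGS = {
    ("A", True, 3, True, False): "A_motor_3_outlet_no_manual.jpg",
    ("B", True, 3, False, True): "B_motor_3_no_outlet_manual.jpg",
}


def get_drawing_name(product, motor, feet, outlet, manual, default="missing.jpg"):
    """Return the drawing for one row of properties, or a fallback name."""
    key = (product, motor, feet, outlet, manual)
    # A dict hashes the whole tuple in one step, so there is no nesting order to tune.
    return DRAWINGS.get(key, default)


print(get_drawing_name("A", True, 3, True, False))
```

Adding a row is then a one-line change to the table rather than a new if block.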

You could just encode your data into drawing filenames. Exactly the way you did with the first if statement. For example:
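// Assumes drawings without a motor or outlet use "no_motor" / "no_outlet" in their names.
$filename = $product."_".($motor ? "motor" : "no_motor")."_".$feet."_".($outlet ? "outlet" : "no_outlet")."_".($manual ? "manual" : "no_manual").".jpg";
if (!file_exists($filename)) $filename = "missing.jpg";
This will keep all the troubles outside, and simplify the code.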


Spring Data JPA fetching list always returns at least a single result

I've noticed a slight problem with how my API is working where I'm using Spring Data JPA.
My query looks something along the lines of:
@Query("SELECT p.id AS id, COUNT(l) AS likes FROM Post p LEFT JOIN Like l ON l.post = p WHERE p.location.id = ?1")
My actual query is bigger, but this contains everything necessary to explain what the issue is. This query will return a list, but assume the location does not exist, it should return null or an empty list, correct? Oh, how wrong you are, my sweet summer child!
This query will instead always return a list of at least one element, regardless of whether or not there are any posts linked to said location.
[{"id": null, "likes": 0}]
That is what the result looks like when serialized to JSON. I am not quite sure what to do about this little predicament, as I obviously don't want to return a list with faulty data, but needing to use processing to filter out duds also seems dumb and unnecessary.
Is there any way to prevent this that I've yet to find? If it is of any relevance, I am using projections currently for my responses.
What I've tried so far:
Adding a not null condition for fields. Does not work, ignored by COUNT.
Adding @NotNull constraints to all fields. Does not work, will still become null.
For what it's worth, I've tried different kinds of joins, though anything but LEFT JOIN doesn't make much sense.
I haven't been able to find any other case which resembles this either, although it most likely exists, but is drowned out by everything else. I'm not quite sure what can be done in this regard, so I'm curious if it's just a quirk with the framework, or if there is an actual solution.
It might be possible to solve through native queries, but I would prefer not to use them.
I'm no SQL expert but I believe that a left join will give you this result if the ID does not exist.
Have you run the query in your DB? Doesn't it give you one row in your result set for IDs that do not exist?
I believe this is intended to say there is a 0 match.
You might want to validate your query before running it. Meaning checking that the location exists first.
As the issue is inherently due to a COUNT and CASE keyword in my real query, resulting in there always being at least one row, and I can't find any method of doing this automatically, the solution I've used is the following:
List<Item> items = repository.customQuery(id);
if (0 < items.size() && null == items.get(0).getId()) {
items.remove(0);
}
The first condition is arbitrary as I know there is always at least one entry, but is done just as a safety measure. A try-catch block would do the trick as well. In the case where you use a primitive int instead of Integer, you'd need to initialize the value in the constructor to something which would normally never be present in the database, such as -1.
If anyone knows of a better method, I'd love to know about it.
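The "always at least one row" behaviour comes from SQL itself: an aggregate query without GROUP BY yields exactly one row even when nothing matches. The sketch below reproduces this with Python's built-in sqlite3 module rather than JPA; the table and column names are made up for the demonstration. It also shows that grouping by the post id returns an empty list for a missing location. Whether adding GROUP BY p.id fits the real, larger query (with its CASE expression) is an assumption that would need checking.

```python
import sqlite3

# In-memory stand-in for the Post / Like tables (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, location_id INTEGER);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO post VALUES (1, 10);
    INSERT INTO likes VALUES (1, 1), (2, 1);
""")

BASE_QUERY = """
    SELECT p.id, COUNT(l.id)
    FROM post p LEFT JOIN likes l ON l.post_id = p.id
    WHERE p.location_id = ?
"""


def without_group_by(location_id):
    # Aggregate with no GROUP BY: always one row, even with zero matches.
    return conn.execute(BASE_QUERY, (location_id,)).fetchall()


def with_group_by(location_id):
    # One row per matching post, so no matching posts means no rows.
    return conn.execute(BASE_QUERY + " GROUP BY p.id", (location_id,)).fetchall()


print(without_group_by(99))  # location 99 does not exist
print(with_group_by(99))
```

With grouping in place, the post-processing step that removes the null entry would no longer be needed, assuming the rest of the query tolerates it.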

Query Pandas DataFrame distinctly with multiple conditions by unique row values

I have a DataFrame with event logs:
eventtime, eventname, user, execution_in_s, delta_event_time
The eventname e.g. can be "new_order", "login" or "update_order".
My problem is that I want to know if there is eventname == "error" in the periods between login and update_order by distinct user. A period for me has a start time and an end time.
That all sounded easy until I tried it this morning.
For the time frame of the 24h logs I might not have a pair, because the login might have happened yesterday. I am not sure how to deal with something like that.
delta_event_time is a computed column of the eventtime minus the executions_in_s. I am considering these the real time stamps. I computed them:
event_frame["delta_event_time"] = event_frame["eventtime"] - pandas.to_timedelta(event_frame["execution_in_s"], unit='s')
I tried something like this:
events_keys = numpy.array(["login", "new_order"])
users = numpy.unique(event_frame["user"])
for user in users:
    event_name = event_frame[event_frame["eventname"].isin(events_keys) & (event_frame["user"] == user)]["eventname"]
But this is not using the time periods.
I know that Pandas has between_time() but I don't know how to query a DataFrame with periods, by user.
Do I need to iterate over the DataFrame with .iterrows() to calculate the start and end time tuples? In my attempts it takes a lot of time to do that, even for basic things. I somehow think that this would make Pandas useless for this task.
I tried event_frame.sort(["user", "eventname"]) which works nicely so that I can see the relevant lines already. I did not have any luck with .groupby("user"), because it mixed users although they are unique row values.
Maybe a better workflow solution is to dump the DataFrame into a MongoDB instead of pursuing a solution with Pandas to perform the analysis in this case. I am not sure, because I am new to the framework.
Here is pseudocode for what I think will solve your problem. I will update it if you share a sample of your data.
grouped = event_frame.groupby('user') # This should work.
# I cannot believe that it didn't work for you! I won't buy it till you show us proof!
for name, group in grouped:
    # set_index returns a new frame, so the result has to be assigned back.
    group = group.set_index('eventtime') # This will make it easier to work with time series.
    # I am changing index here because different users may have similar or
    # overlapping times, and it is a pain in the neck to resolve indexing conflicts.
    login_ind = group[group['eventname'] == 'login'].index
    error_ind = group[group['eventname'] == 'error'].index
    update_ind = group[group['eventname'] == 'update_order'].index
    # Here you can compare the lists login_ind, error_ind and update_ind however you wish.
    # Note that the list can even have a length of 0.
    # User name is stored in the variable name. So you can get it from there.
Best way might be to create a function that does the comparing. Because then you can create a dict by declaring error_user = {}.
Then calling your function inside for name, group in grouped: like so: error_user[name] = function_which_checks_when_user_saw_error(login_ind, error_ind, update_ind).
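One possible shape for that comparison function is sketched below. It is only an illustration under stated assumptions: a "period" is taken to run from a login to the first update_order at or after it, and a login with no later update_order (for example, because the matching event falls outside the 24h log) is simply skipped. The function works on any iterables of comparable timestamps, so the index objects from the loop above should be usable as arguments, but the example uses plain datetime values to stay self-contained.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def function_which_checks_when_user_saw_error(login_ind, error_ind, update_ind):
    """Return the sorted error times that fall inside a login -> update_order period."""
    errors = sorted(error_ind)
    updates = sorted(update_ind)
    hits = set()
    for login in sorted(login_ind):
        # The first update_order at or after this login closes the period.
        pos = bisect_left(updates, login)
        if pos == len(updates):
            continue  # assumption: unpaired logins are ignored
        end = updates[pos]
        # Collect every error with login <= error <= end.
        first = bisect_left(errors, login)
        last = bisect_right(errors, end)
        hits.update(errors[first:last])
    return sorted(hits)


def at(hour, minute):
    # Helper for the example only: a time on an arbitrary fixed day.
    return datetime(2024, 1, 1, hour, minute)


logins = [at(9, 0), at(12, 0), at(20, 0)]
updates = [at(9, 30), at(12, 30)]
errors = [at(9, 10), at(10, 0), at(12, 15)]
print(function_which_checks_when_user_saw_error(logins, errors, updates))
```

An empty result means the user saw no error inside any complete period, which also covers the case where one of the index lists has length 0.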

pull out r squared from fit model to table in JSL JMP

I'm trying to figure out how to use JSL to write some of the analysis of variance values to a table in JMP. My idea is to write a script that runs different types of models with different parameters, logging R^2 and RMSE to a table (maybe there is a better way to do this; I'm on my second day of JMP). Going through the documentation, it seems that different analyses have different ways of doing this, and I can't find one for "fit model". I will also need to know how to do this for a neural network, for which I think I may have found the documentation.
If you're doing something like screening variables to determine an optimized model, you're in the right place with the fit model platform. However, running the fit model in a loop without human judgment in model selection as you've suggested isn't necessarily expedient.
So at the expense of trying to make JMP/JSL do something it's not really suited for, one way to achieve your generic goal of grabbing text from the fit model platform output is to send your platform to a "report" and then pull from that "report" the data you want, and then send it to a data table. From that data table, you can concatenate it with another data table and you would have your log. That's the idea, here's an example, for some dummy data "Ydata" and "Xdata":
thing = Fit Model(
Y( :Ydata ),
Effects( :Xdata ),
Personality( Standard Least Squares ),
Emphasis( Minimal Report ),
Run(
:Ydata << {Plot Actual by Predicted( 0 ),
Plot Residual by Predicted( 0 ), Plot Effect Leverage( 0 )}
)
);
thing_report = thing<<report;
thing_report_dt_ref = thing_report["Summary of Fit"][1] << make into data table;
//alternatively
//thing_report_dt_ref = thing_report[TableBox(1)] << make into data table;
thing_report_dt_ref << Set Name("Choose_a_name_for_your_new_data_table");
You'd have to handle the looping part, but if you can do it once, you can do it N times.
Because JMP/JSL is stupid, you can alternatively call the "Summary of Fit" directly if you know its name in the tree structure. In my case, its name was "TableBox(1)". Do:
thing << show tree structure
To see where your data lives in the platform display box.

SPMETAL / LINQ to SharePoint Decimal Types

I've hit a pretty major snag with the entities generated by spmetal / linq to sharepoint. I am hoping someone has dealt with this before.. or maybe I am missing something obvious.
Let's say we have a list with a number field. The field will be expected to hold reasonably precise values.. for example, 0.0000451. Once the value is in the list- SharePoint is fine with it. It displays in the list and display/edit views correctly.
Now if we generate entities based on this list with spmetal, we will get..
//...
private System.Nullable<double> _number;
//..
[Microsoft.SharePoint.Linq.ColumnAttribute(Name="Number", Storage="_number", Required=true, FieldType="Number")]
public System.Nullable<double> Number {
get {
return this._number;
}
set {
if ((value != this._number))
{
this.OnPropertyChanging("Number", this._number);
this._number= value;
this.OnPropertyChanged("Number");
}
}
}
//...
Since the type determined by spmetal is double, we get scientific notation when trying to retrieve it, for example:
var number = (from x in myDc.MyList select x.Number).First();
number would actually result in a double of 4.51E-05, not 0.0000451.
I am assuming this can be fixed by using a decimal. If I change the types throughout the generated entities to System.Nullable<decimal> I get type conversion failures.
How should I fix this?
EDIT I think maybe it is better to ask "how should I deal with this"? for example, I can simply convert my double values to decimal later on down the line.. my linq query, for example. If I do that, the example case would return the expected result. That seems clunky, though, and I'd like to correct this at the source.
There are several cases like this where SPMetal will give you clunky code. You can, and sometimes have to, fix that. And I admit, it definitely feels better to do it at the source.
But there is a downside.
When your data model changes you will have to re-run SPMetal to incorporate your new entities. Any changes you made to the generated file will have to be carefully documented and re-done, or your code will be broken. Therefore, I would advise to leave the generated code alone if you can work with it.
If you can write a wrapper around the objects/methods it would of course be preferable to just converting the types at the end-point, but that's general good programming practice.
4.51E-05 actually equals 0.0000451 so there is nothing wrong with your code.
In other words 4.51E-05 means 4.51 times ten to the minus five power, or 0.0000451
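To make the point concrete, here is a small Python analogy (not C#, and not SPMetal-specific): the two spellings parse to the same double, and only the text formatting differs. The variable names are invented for the illustration. In .NET, a fixed-point format specifier such as "F7" should play the same role as the format call shown here, and converting to a decimal type at the edge is an option if exact decimal arithmetic is needed later.

```python
from decimal import Decimal

# Both spellings denote the same double-precision value.
value = float("4.51E-05")
same_value = value == 0.0000451

# Only the display differs: request fixed-point digits when formatting.
fixed_text = format(value, ".7f")

# Converting at the boundary: go through the short text form so the decimal
# holds 0.0000451 rather than the binary approximation's full expansion.
as_decimal = Decimal(str(value))

print(same_value, fixed_text)
```

So the stored number is intact; the choice is between formatting it on output and converting it once where it leaves the data layer.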

Create Random Integer Based on Id in Ruby

I have a scenario where I need to generate 4 digit confirmation codes for individual orders. I don't want to just do random codes due to the off chance that two exact codes would be generated near the same time. Is there a way to use the id of each order and generate a 4 digit code from that? I know I am going to eventually have repetitive codes with this but it will be ok because they will not be generated around the same time.
Do you really need to base the code on the ID? Four digits only gives you ten thousand possible values so you could generate them all with a script and toss them in a database table. Then just pull a random one out of the database when you need it and put it back in when you're done with it.
Your code table would look like this:
code: The code
uuid: A UUID, a NULL value here indicates that this code is free.
Then, to grab a code, first generate a UUID, uuid, and do this:
update code_table
set uuid = ?
where code = (
select code
from code_table
where uuid is null
order by random()
limit 1
)
-- Depending on how your database handles transactions
-- you might want to add "and uuid is null" to the outer
-- WHERE clause and loop until it works
(where ? would be your uuid) to reserve the code in a safe manner and then this:
select code
from code_table
where uuid = ?
(where ? is again your uuid) to pull the code out of the database.
Later on, someone will use the code for something and then you just:
update code_table
set uuid = null
where code = ?
(where code is the code) to release the code back into the pool.
You only have ten thousand possible codes, that's pretty small for a database even if you are using order by random().
A nice advantage of this approach is that you can easily see how many codes are free; this lets you automatically check the code pool every day/week/month/... and complain if the number of free codes falls below, say, 20% of the entire code space.
You have to track the in-use codes anyway if you want to avoid duplicates so why not manage it all in one place?
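The reserve / fetch / release cycle above can be sketched end to end as follows. This is a Python illustration using the built-in sqlite3 module, not Ruby, and the function names (make_pool, reserve_code, release_code, free_count) are invented for the example. It folds the "and uuid is null" guard into the outer WHERE clause, as suggested in the SQL comment, and returns None when the pool is exhausted.

```python
import sqlite3
import uuid


def make_pool(conn):
    """Create code_table and fill it with every code from 0000 to 9999, all free."""
    conn.execute("CREATE TABLE code_table (code TEXT PRIMARY KEY, uuid TEXT)")
    conn.executemany(
        "INSERT INTO code_table (code, uuid) VALUES (?, NULL)",
        ((f"{n:04d}",) for n in range(10000)),
    )
    conn.commit()


def reserve_code(conn):
    """Mark one random free code with a fresh UUID and return it (None if none left)."""
    token = str(uuid.uuid4())
    cur = conn.execute(
        """
        UPDATE code_table SET uuid = ?
        WHERE uuid IS NULL AND code = (
            SELECT code FROM code_table
            WHERE uuid IS NULL ORDER BY random() LIMIT 1
        )
        """,
        (token,),
    )
    conn.commit()
    if cur.rowcount == 0:
        return None
    row = conn.execute(
        "SELECT code FROM code_table WHERE uuid = ?", (token,)
    ).fetchone()
    return row[0]


def release_code(conn, code):
    """Put a code back into the pool."""
    conn.execute("UPDATE code_table SET uuid = NULL WHERE code = ?", (code,))
    conn.commit()


def free_count(conn):
    """Number of codes currently available."""
    return conn.execute(
        "SELECT COUNT(*) FROM code_table WHERE uuid IS NULL"
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
make_pool(conn)
code = reserve_code(conn)
print(code, free_count(conn))
```

The free_count helper is what a periodic health check would call to warn when the pool runs low.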
If your order id has more than 4 digits, it is theoretically impossible to guarantee unique codes without checking the generated value against an array of already generated values. You can do something like this:
require 'thread' # Mutex is built in on current Ruby; this is only needed on old versions
$confirmation_code_mutex = Mutex.new
$confirmation_codes_in_use = []
def generate_confirmation_code
  $confirmation_code_mutex.synchronize do
    # rand(9000) + 1000 covers the full 1000..9999 range; retry until the code is unused.
    nil while $confirmation_codes_in_use.include?(code = rand(9000) + 1000)
    $confirmation_codes_in_use << code
    return code
  end
end
Remember to clean up $confirmation_codes_in_use after using the code.
