Numeric Filter and missing values (Weka) - filter

I'm using SMOTE to oversample my dataset (affected by class imbalance). Some of my attributes have integer values, others have only two decimals but SMOTE creates new instances with many decimals. So to solve this problems I thought to use NumericCleaner Filter and set the number of decimals I desire. This seems to work but I've got problems with missing values. Each missing values is replaced with a 0.0 value, I need to evaluate my model using missing values in dataset. So how can I use NumericCleaner (or other filters that permit to round values) and keep my missing values?

Very interesting question. Okay, here is the solution:
use SMOTE to oversample the minority group (this produces decimal points but the missing values remain missing values)
then select weka filter->unsupervised->attribute->NumericTransform
then click on this filter and set the attribute instances (where you are having decimal points features) and in the methodName instead of "abs", put "ceil".
I hope that solves the problem.

Related

Mapbox - setFilter expression for 'Contains' name within array of possible values

I'm trying filter my Mapbox layer (mapboxgl) to return just places that start with say 'North', 'South, 'East' or 'West'.
My places are in geojson data as a property.
I'm using map.setFilter with an expression to filter the geojson.
Example:
map.setFilter('markers', ["all",filter1,filter2]);
Where filter2 is a simple check on if the place is 'open' or 'closed' and filter1 is my main filter expression explained below.
The main issue is that the name of the place needs to START with one of these values.
So far I can do this IF the strings to check are all the same length by grabbing the length of the strings and then filtering by that substring of the name.
But this doesn't work (obviously) when I have words that are not the same length.
This is pretty simple in most other languages, but my brain cant figure out this in mapbox's expression filters for some reason!
Here's what I have so far (please excuse the poor code quality here, I'm not a professional coder!!):
//the array data to check for a match
filterArray = ['North','South','East','West'];
//checks first values length - not ideal!
var lengthToCheck = filterArray[0].length;
//apply it to the filter to only return ones starting with X
filter1 = ['in', ['slice',['to-string', ['get', 'name']],0,lengthToCheck], ['literal', filterArray]];
I have to slice the name of the place as it might be for example 'Northampton', which wont be found within the array value[0] being 'North'.
This is equally as bad code for the 'East' values as the name slice is going to be 5 characters, so it will be trying something like 'Eastb' (Eastbourne) within the array value 'East' which just wont work.
It would be good if I could some how flip this round so the filter can check the values in the array within the Name, but I can't figure this out!
Is there a way to do that or is there a magic expression feature I'm missing that would solve this problem?
I'm also going to have the same problem with a 'Contains' type check of an array of values too, but I'll cross that bridge once I figure this part out!
Any help appreciated (before I go putting in some convoluted workaround!).

Web2py Number Formatting for Thousands

I'm sort of new to Web2py. I have a system that's working just fine, but I want to make an improvement regarding visualization. There's a couple of fields that use numbers (defined as double in their respective define_table methods) to represent currency, but I want them to also show with a separator for thousands, such as 183,403,293.34. I checked some documentation, but I couldn't find a direct way to handle this form of customization, though I could be missing something.
Any suggestions regarding this? Cheers!
First, if representing currency, you should use the decimal field type rather than double (some calculations using double values may yield incorrect results due to the use of floating point representations internally). However, if using SQLite, there is no distinction between decimal and double, so in that case, you might want to multiply all values by 100 and instead store integers.
In any case, to display a given numeric value with thousands separators in Python, you can do:
'{:,}'.format(myvalue)
For more details, see https://stackoverflow.com/a/10742904/440323 and https://stackoverflow.com/a/21208495/440323.
If you are using the values via web2py functionality that makes use of the field's represent function (e.g., the grid or the .render() method), you can define a custom represent function, such as:
Field('amount', 'decimal(12, 2)',
represent=lambda v, r: '{:,}'.format(v) if v is not None else '')
You could use the Python function of the locale module:
{{= locale.format ('%. 2f', your_value, grouping = True)}}

Extracting data from text file in AMPL without adding indexes

I'm new to AMPL and I have data in a text file in matrix form from which I need to use certain values. However, I don't know how to use the matrices directly without having to manually add column and row indexes to them. Is there a way around this?
So the data I need to use looks something like this, with hundreds of rows and columns (and several more matrices like this), and I would like to use it as a parameter with index i for rows and j for columns.
t=1
0.0 40.95 40.36 38.14 44.87 29.7 26.85 28.61 29.73 39.15 41.49 32.37 33.13 59.63 38.72 42.34 40.59 33.77 44.69 38.14 33.45 47.27 38.93 56.43 44.74 35.38 58.27 31.57 55.76 35.83 51.01 59.29 39.11 30.91 58.24 52.83 42.65 32.25 41.13 41.88 46.94 30.72 46.69 55.5 45.15 42.28 47.86 54.6 42.25 48.57 32.83 37.52 58.18 46.27 43.98 33.43 39.41 34.0 57.23 32.98 33.4 47.8 40.36 53.84 51.66 47.76 30.95 50.34 ...
I'm not aware of an easy way to do this. The closest thing is probably the table format given in section 9.3 of the AMPL Book. This avoids needing to give indices for every term individually, but it still requires explicitly stating row and column indices.
AMPL doesn't seem to do a lot with position-based input formats, probably because it defaults to treating index sets as unordered so the concept of "first row" etc. isn't meaningful.
If you really wanted to do it within AMPL, you could probably put together a work-around along these lines:
declare a single-index param with length equal to the total size of your matrix (e.g. if your matrix is 10 x 100, this param has length 1000)
edit the beginning and end of your "matrix" data file to turn it into appropriate format for a single-index parameter indexed from 1 to n
then define your matrix something like this:
param m{i in 1..nrows,j in 1..ncols} := x[j+i*(ncols-1)];
(not tested, I won't promise that I have rows and columns the right way around there!)
But you're probably better off editing the input file into one of the standard AMPL matrix formats. AMPL isn't really designed for data wrangling - you can do it in a pinch but if you're doing this kind of thing repeatedly it may be less trouble to code it in a general-purpose language e.g. Python.

Google Sheets/Excel: Checking if a time falls within a range

I'm trying to find a way to see if I can find a way to determine if a time that I stipulate falls between two other times. For example:
Start End
11:33:48 11:53:48
12:20:22 12:38:21
12:39:27 13:00:09
14:16:23 14:20:49
14:20:54 14:22:56
Then, I want to check if a cell (here the value of 12:50 in cell E30) falls between ANY two values in a range in THE SAME ROW. For me, I can get the obvious way to check for this in one row, and this simple version totally works:
=If(AND(E30>A4,E30<B4), "TRUE", "FALSE")
However, I want to check if that number falls within ANY of the values within the ROWS above cells in a range, and I can't get that to work. For example, I tried this and it didn't work:
=If(AND(E30>A:A,E30<B:B), "TRUE", "FALSE")
I also tried a simple countif variation but that didn't do it either:
=COUNTIFS(A:A,">"&E30,B:B,"<"&E30)
Any advice on how to adjust one of these formulas to get it to work?
Try switching the angle brackets around:
=COUNTIFS(A:A,"<"&E30,B:B,">"&E30)
I think this should work for the above data set -
=IF((FILTER(A2:B6, D2>A2:A6,D2<B2:B6)),TRUE,FALSE)
This will give you if there is any match or not.
For the number of rows count that match -
=ROWS((FILTER(A2:B6, D2>A2:A6,D2<B2:B6)))
Alomsot =IF(Q2>R2,IF(AND($X$16>HOUR(Q2),$X$16<(HOUR(R2)+12)),1,0),IF(AND(HOUR($X$14)>=HOUR(Q2),HOUR($X$14)<=HOUR(R2)),1,0))

SSRS 2008: Using StDevP from multiple fields / Combining multiple fields in general

I'd like to calculate the standard deviation over two fields from the same dataset.
example:
MyFields1 = 10, 10
MyFields2 = 20
What I want now, is the standard deviation for (10,10,20), the expected result is 4.7
In SSRS I'd like to have something like this:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
Unfortunately this isn't possible, since (Fields!MyField1.Value + Fields!MyField2.Value) returns a single value and not a list of values. Is there no way to combine two fields from the same dataset into some kind of temporary dataset?
The only solutions I have are:
To create a new Dataset that contains all values from both fields. But this is very annoying because I need about twenty of those and I have six report parameters that need to filter every query. => It's probably getting very slow and annoying to maintain.
Write the formula by hand. But I don't really know how yet. StDevP is not that trivial to me. This is how I did it with Avg which is mathematically simpler:
=(SUM(Fields!MyField1.Value)+SUM(Fields!MyField2.Value))/2
found here: http://social.msdn.microsoft.com/Forums/is/sqlreportingservices/thread/7ff43716-2529-4240-a84d-42ada929020e
Btw. I know that it's odd to make such a calculation, but this is what my customer wants and I have to deliver somehow.
Thanks for any help.
CTDevP is standard deviation.
Such expression works fine for me
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value) but it's deviation from one value (Fields!MyField1.Value + Fields!MyField2.Value) which is always 0.
you can look here for formula:
standard deviation (wiki)
I believe that you need to calculate this for some group (or full dataset), to do this you need set in the CTDevP your scope:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value, "MyDataSet1")

Resources