Excel Formula for extracting specific rows - algorithm

I have a huge data set and I want to extract the rows which do not have certain keywords.
For example, let's say I have the following data set (two columns):
+--------------+------------------+
| Nylon | Nylon wire |
| Cable | 5mm metal cable |
| Epoxy | some comment |
| Polyester | some comment |
+--------------+------------------+
I want to find the rows that do not contain the keywords Nylon and Epoxy (or any other keywords, for that matter) and copy those rows to another place (e.g. another sheet).
Thanks in advance!

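Sub CopyRowsWithoutKeywords()
    ' Copy every row of sheet 1 that contains neither "Nylon" nor "Epoxy"
    ' to consecutive rows of sheet 2. Assumes the data starts in row 1.
    Dim i As Long, j As Long
    With Worksheets(1)
        j = 1
        For i = 1 To .UsedRange.Rows.Count
            ' Find returns Nothing when the keyword is absent from the row.
            ' LookAt:=xlPart matches substrings ("Nylon wire"); Find remembers
            ' these settings between calls, so set them explicitly.
            If .Rows(i).Find(What:="Nylon", LookAt:=xlPart, MatchCase:=False) Is Nothing And _
               .Rows(i).Find(What:="Epoxy", LookAt:=xlPart, MatchCase:=False) Is Nothing Then
                .Rows(i).Copy Destination:=Worksheets(2).Rows(j)
                j = j + 1
            End If
        Next i
    End With
End Sub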
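For readers outside Excel, the macro's logic can be sketched in Python as below. This is an illustration only, assuming the sheet has already been read into a list of row tuples; `rows_without_keywords` is a hypothetical helper, and matching is a case-insensitive substring test, roughly what `Find` does with `LookAt:=xlPart` and `MatchCase:=False`.

```python
def rows_without_keywords(rows, keywords):
    """Keep only the rows in which no cell contains any of the keywords.

    Matching is a case-insensitive substring test (an assumption mirroring
    Find with LookAt:=xlPart and MatchCase:=False).
    """
    wanted = [k.lower() for k in keywords]
    kept = []
    for row in rows:
        cells = [str(cell).lower() for cell in row]
        if not any(k in cell for cell in cells for k in wanted):
            kept.append(row)  # plays the role of copying to the next row of sheet 2
    return kept


data = [
    ("Nylon", "Nylon wire"),
    ("Cable", "5mm metal cable"),
    ("Epoxy", "some comment"),
    ("Polyester", "some comment"),
]
print(rows_without_keywords(data, ["Nylon", "Epoxy"]))
# [('Cable', '5mm metal cable'), ('Polyester', 'some comment')]
```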

|   | A              | B                | C     |
|---|----------------|------------------|-------|
| 1 | Search Term -> | nylon            |       |
| 2 | Name           | Description      | Found |
| 3 | Nylon          | Nylon Wire       | TRUE  |
| 4 | Cable          | 5 mm metal cable | FALSE |
| 5 | Epoxy          | some comment     | FALSE |
| 6 | Polyester      | some comment     | FALSE |
In the above example, I would create an AutoFilter on A2:C6 with the first row being my headers. In each cell in C3:C6 I would have a formula akin to (this is from C3):
=OR(NOT(ISERROR(SEARCH($B$1,A3))),NOT(ISERROR(SEARCH($B$1,B3))))
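Now you can use the AutoFilter on the Found column: filtering for FALSE shows the rows that do not contain the search term (the ones you want to extract), while TRUE shows the rows that do. The visible rows can then be copied to another sheet.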

I'll show how you can check whether one string appears within some other columns, returning a boolean. Then you'll need to decide how to handle the positive cases; I believe you'll use a VLOOKUP or something similar.
Please replace ; with , in the formulas below if you use English regional settings; I'm not using them at the moment.
You can combine the FIND and ISERROR functions to get your result. ISERROR returns a boolean, and you can combine as many column checks as you want.
Example:
Let's say you have the test keywords in cells C1 and D1, and the range you provided above starts at A2.
Now, in C2 we can check whether the string in C1 (Nylon) appears within A2: =ISERROR(FIND(C$1;$A2)). We also need to check whether it appears in B2, so we add a second condition: AND(ISERROR(FIND(C$1;$A2));ISERROR(FIND(C$1;$B2)))
Because we are testing whether FIND returned an error, this expression returns FALSE when the string has been found. To make it easier to read, wrap it in NOT, so that the formula returns TRUE when the string in C1 appears in A2 or B2:
=NOT(AND(ISERROR(FIND(C$1;$A2));ISERROR(FIND(C$1;$B2))))
Then copy this formula one cell to the right to test against the value in D1 (Epoxy), and fill both columns down to the last data row. The mixed references C$1 and $A2/$B2 keep the keyword row and the data columns fixed while the rest of each reference adjusts.
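The result looks like this:
|   | A         | B               | C     | D     |
|---|-----------|-----------------|-------|-------|
| 1 |           |                 | Nylon | Epoxy |
| 2 | Nylon     | Nylon wire      | TRUE  | FALSE |
| 3 | Cable     | 5mm metal cable | FALSE | FALSE |
| 4 | Epoxy     | some comment    | FALSE | TRUE  |
| 5 | Polyester | some comment    | FALSE | FALSE |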
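The keyword matrix above can be mimicked in Python to check the expected TRUE/FALSE pattern. This is a sketch assuming the data is a list of (Name, Description) tuples; `keyword_matrix` is a hypothetical name. Like Excel's FIND, the test is case-sensitive, and rows whose flags are all False are the ones the question wants to extract.

```python
def keyword_matrix(rows, keywords):
    """One boolean per keyword and row: True when the keyword appears in
    column A or column B. Case-sensitive, like Excel's FIND."""
    return [[any(k in cell for cell in row) for k in keywords] for row in rows]


data = [
    ("Nylon", "Nylon wire"),
    ("Cable", "5mm metal cable"),
    ("Epoxy", "some comment"),
    ("Polyester", "some comment"),
]
matrix = keyword_matrix(data, ["Nylon", "Epoxy"])
# Rows with no True flag contain none of the keywords.
wanted = [row for row, flags in zip(data, matrix) if not any(flags)]
print(matrix)  # [[True, False], [False, False], [False, True], [False, False]]
print(wanted)  # [('Cable', '5mm metal cable'), ('Polyester', 'some comment')]
```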

Related

PROC FREQ in SAS gives too wide columns

I did a simple proc freq in SAS:
PROC FREQ DATA=test;
TABLES a * b;
RUN;
This raised the error: insufficient page size to print frequency table
From the question "ERROR: Insufficient page size to print frequency table in SAS PROC FREQ" I learned that the error is fixed by enlarging the page size:
option pagesize=max;
But then my table still looked strange, with a very tall block of white space in the header of column b:
Frequency |
Percent |
Row Pct | value 1 | value 2 |
Col Pct | | |
| | |
...etc... ...etc...
| | |
----------+----------+----------+
a | 12 | 3 |
What solved my problem was adding a format to the proc freq that truncated variable b.
PROC FREQ DATA=test;
FORMAT B $7.;
TABLES a * b;
RUN;
Now my result looks like this and I'm happy enough:
Frequency |
Percent |
Row Pct |
Col Pct | value 1 | value 2 |
----------+----------+----------+
a | 12 | 3 |
I'm left a bit bewildered, because nowhere in the code did I apply a format to b before, just a LENGTH statement. Other variables whose lengths I set did not have this problem. I did switch the source from an Excel file to Oracle Exadata. Is it possible that Oracle pushes variable formats to SAS?
SAS has a nasty habit of attaching formats to character variables pulled from external databases, including PROC IMPORT from an Excel file. So if a character variable has a storage length of 200, SAS will also attach the $200. format to the variable.
When you combine two datasets that both contain the same variable, the length is set by the first version of the variable seen, but the attached format is set by the first non-empty format seen. So you could combine a dataset where A has length $10 and no format attached with another dataset where A has the format $200. attached, and the result will be a variable with an actual length of 10 but the $200. format attached.
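The attribute rule described above can be made concrete with a small toy model. This is not SAS: `VarAttrs`, `combined_attrs` and `printed` are hypothetical names, and the model only encodes the two rules stated in the answer (length from the first dataset, format from the first non-empty format) plus a rough view of how a `$w.` format fixes the printed width.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VarAttrs:
    """Attributes of one character variable in one input dataset (toy model)."""
    length: int
    fmt: Optional[str] = None  # e.g. "$200.", or None when no format is attached


def combined_attrs(*inputs: VarAttrs) -> VarAttrs:
    """Length comes from the first dataset listed; the format comes from the
    first dataset that has a non-empty format attached."""
    length = inputs[0].length
    fmt = next((v.fmt for v in inputs if v.fmt), None)
    return VarAttrs(length, fmt)


def printed(value: str, fmt: str) -> str:
    """Rough effect of a $w. format on output: truncate or pad to w characters."""
    width = int(fmt.strip("$."))
    return value[:width].ljust(width)


result = combined_attrs(VarAttrs(10), VarAttrs(200, "$200."))
print(result)                               # VarAttrs(length=10, fmt='$200.')
print(len(printed("value 1", result.fmt)))  # 200
```

The 200-character printed width is why the column headings in the listing grew so tall, and why a narrower format such as `$7.` fixed the output.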
You can use the format statement where you list variable names but no format specification to remove them. You could do it in the PROC step.
PROC FREQ DATA=test;
tables a * b;
format _character_ ;
RUN;
Or do it in a data step or use PROC DATASETS to modify the formats attached to the variable in an existing dataset.

How to insert into semi sorted List?

I have a large semi-sorted list of strings, sorted only by first character. Each string is accompanied by an ID. The first x entries start with the letter A, followed by entries starting with B, and so on. Not all letters are necessarily represented.
By semi-sorted I mean that there are exceptions (wrongly sorted entries). It is NOT possible to sort the entries correctly: existing entries have to keep their IDs.
I have crafted the following example, which includes only the starting letters A and B. The entries starting with C, Z and S have been entered in the wrong place.
Example:
| ID | NAME |
|------|------|
| 6000 | AXXX |
| 6001 | AXZS |
| 6003 | AAFD |
| 6004 | CSDF |
| 6005 | ZSSF |
| 6006 | ASDF |
| 6007 | BXAS |
| 6010 | BZDS |
| 6011 | SHZF |
| 6012 | BHZT |
I want to add entries to the list. An entry whose name starts with the letter A should be inserted grouped with the other entries starting with A if possible, or otherwise at the very end.
In the above example, an entry whose name starts with A should be inserted with ID 6002.
An entry whose name starts with B should be added with ID 6008.
I am not sure how to solve this. My first thoughts are to first iterate over the existing list starting with the lowest ID and to save information on the letter group.
Like:
Letter: A StartID: 6000 EndID: 6006 IsFull:False
Letter: B StartID: 6007 EndID: 6012 IsFull:False
And then when it comes to inserting using the above information for the determination of possible IDs of the new entry. After inserting a new entry this information would have to be updated.
However I am not sure on how to exactly achieve this. All I need is some pseudo code for a possible solution so I can write my own code.
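The per-letter summary sketched in the question (StartID, EndID, IsFull) could be built in one pass over the IDs, for example as below. This is a sketch assuming the list is held as a dict mapping ID to name; `letter_groups` is a hypothetical helper, and "full" is taken to mean that every ID between the group's first and last member is already used. Note that the misplaced entries (C, Z, S) form their own one-entry groups.

```python
def letter_groups(entries):
    """Summarise each first letter as start ID, end ID and whether every ID
    between them is taken. `entries` maps ID -> name."""
    groups = {}
    for id_ in sorted(entries):
        letter = entries[id_][0]
        group = groups.setdefault(letter, {"start": id_, "end": id_})
        group["end"] = id_  # IDs are visited in ascending order
    for group in groups.values():
        group["is_full"] = all(i in entries for i in range(group["start"], group["end"] + 1))
    return groups


entries = {6000: "AXXX", 6001: "AXZS", 6003: "AAFD", 6004: "CSDF", 6005: "ZSSF",
           6006: "ASDF", 6007: "BXAS", 6010: "BZDS", 6011: "SHZF", 6012: "BHZT"}
for letter, info in letter_groups(entries).items():
    print(letter, info)
```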
You probably want a few steps
find the position of the insertion group if it exists (what if the first few were B, Z, before A?)
find the last member of the group if it exists, otherwise the last member of the prior group (for example when inserting F)
determine whether there's room in the left (ID) column after the last member of the group and before the first member of the next one
if a position exists for insertion, insert the value, else
find the last position
append the value
Some considerations
you must keep track of and consider several positions, some structure will help you with this
if you have runs ordered A,B,Z,C in the left column, does the block with Z comprise a group? is the block C misplaced? otherwise it seems your values should grow wherever the first new member is
"next" needs to consider multiple characters (presumably ABAA comes after AABB)
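The steps above can be sketched in Python as follows, assuming the entries are held in a dict mapping ID to name; `choose_id` is a hypothetical helper. It looks for the lowest free ID between the group's first member and the next used ID after its last member, and otherwise appends after the highest ID. Unlike the "prior group" suggestion for missing letters, it follows the question's rule and sends a letter with no group to the very end.

```python
from bisect import bisect_right


def choose_id(entries, name):
    """Pick an ID for a new entry so it joins the group with the same first letter.

    Searches from the group's first member up to (not including) the next used
    ID after its last member; if no ID is free there, or no entry starts with
    that letter, the new entry goes after the highest existing ID.
    """
    if not entries:
        raise ValueError("need at least one existing entry to choose an ID")
    ids = sorted(entries)
    group = [i for i in ids if entries[i][0] == name[0]]
    if group:
        first, last = group[0], group[-1]
        pos = bisect_right(ids, last)
        # Upper bound: next used ID after the group, or just past the group at the end.
        limit = ids[pos] if pos < len(ids) else last + 1
        for candidate in range(first, limit):
            if candidate not in entries:
                return candidate
    return ids[-1] + 1


entries = {6000: "AXXX", 6001: "AXZS", 6003: "AAFD", 6004: "CSDF", 6005: "ZSSF",
           6006: "ASDF", 6007: "BXAS", 6010: "BZDS", 6011: "SHZF", 6012: "BHZT"}
print(choose_id(entries, "AQQQ"))  # 6002
print(choose_id(entries, "BQQQ"))  # 6008
print(choose_id(entries, "FQQQ"))  # 6013
```

After choosing an ID, store the new entry in the dict; because the group bounds are recomputed on every call, there is no summary structure to keep in sync, at the cost of an O(n) scan per insertion.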

How to grab all text inside of matching brackets with ruby and/or Regular Expressions

I am working on doing some code cleanup and need to make sure that my gsub! only runs on a small section of code. The portion of the code I need to examine starts with {{Infobox television (\{\{[Ii]nfobox\s[Tt]elevision to be technical) and ends with the matching double brackets "}}".
An example of the gsub! that will be run is text.gsub!(/\|(\s*)channel\s*=\s*(.*)\n/, "|\\1network = \\2\n")
...
{{Infobox television
| show_name = 60 Minutos
| image =
| director =
| developer =
| channel = [[NBC]]
| presenter = [[Raúl Matas]] (1977–86)<br />[[Raquel Argandoña]] (1979–81)
| language = [[Spanish language|Spanish]]
| first_aired = {{Date|7 April 1975}}
| website = {{url|https://foo.bar.com}}
}}
...
Note:
Using sub instead of gsub is not an option due to the fact that multiple instances of the parameter needed to be substituted may exist.
I cannot just look for the first set of }}, as there may be multiple sets, as shown in the example above.
You may use a regex with a bit of recursion:
/(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|\g<1>)*}})/
Or, if there are single { or } inside, you will need to also match those with (?<!{){(?!{)|(?<!})}(?!}):
/(?=\{\{[Ii]nfobox\s[Tt]elevision)(\{\{(?>[^{}]++|(?<!{){(?!{)|(?<!})}(?!})|\g<1>)*}})/
See the Rubular demo
Details:
(?=\{\{[Ii]nfobox\s[Tt]elevision) - a positive lookahead making sure the current location is followed by an {{Infobox television-like string (with either casing of the I and T)
(\{\{(?>[^{}]++|\g<1>)*}}) - Group 1 that matches the following:
\{\{ - a {{ substring
(?>[^{}]++|\g<1>)* - zero or more occurrences (inside an atomic group) of:
[^{}]++ - 1 or more chars other than { and } (possessive)
(?<!{){(?!{) - a { not preceded or followed by another { (second pattern only)
(?<!})}(?!}) - a } not preceded or followed by another } (second pattern only)
| - or
\g<1> - the whole Group 1 subpattern
}} - a }} substring
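Python's built-in `re` module has no equivalent of the recursive `\g<1>` call, so as an alternative technique (not the regex above) this sketch finds each `{{Infobox television` opener and scans forward counting `{{` and `}}` until the nesting depth returns to zero, then runs the question's channel substitution only inside that span. `infobox_spans` and `rename_channel` are hypothetical names; the scan assumes braces always come in doubled pairs.

```python
import re

START = re.compile(r"\{\{[Ii]nfobox\s[Tt]elevision")
CHANNEL = re.compile(r"\|(\s*)channel\s*=\s*(.*)\n")


def infobox_spans(text):
    """Yield (start, end) of each {{Infobox television ...}} template, where
    end is just past the }} that brings the {{ ... }} depth back to 0."""
    for m in START.finditer(text):
        depth, i = 0, m.start()
        while i < len(text) - 1:
            pair = text[i:i + 2]
            if pair == "{{":
                depth += 1
                i += 2
            elif pair == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    yield m.start(), i
                    break
            else:
                i += 1


def rename_channel(text):
    """Apply the channel -> network substitution only inside the infoboxes."""
    parts, last = [], 0
    for start, end in infobox_spans(text):
        if start < last:  # nested inside a template already handled
            continue
        parts.append(text[last:start])
        parts.append(CHANNEL.sub(r"|\1network = \2\n", text[start:end]))
        last = end
    parts.append(text[last:])
    return "".join(parts)


sample = ("| channel = outside\n"
          "{{Infobox television\n"
          "| channel = [[NBC]]\n"
          "| first_aired = {{Date|7 April 1975}}\n"
          "}}\n")
print(rename_channel(sample))
```

The `channel` line outside the template is left alone, while the one inside becomes `| network = [[NBC]]`; the nested `{{Date|...}}` only raises the depth temporarily, so it does not end the span early.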
Can't give you a direct answer without spending a lot of time on it.
But it is notable that the first bracket set is at the beginning of a line, as is the last one.
So you have
/^\{\{(.*)^\}\}$/m
The m flag is Ruby's multiline mode: it lets . match newlines (in Ruby, ^ and $ always match at line boundaries). That will match everything between the braces, and the () brackets let you pull out what was matched inside the braces, for example:
string = <<_EOT
{{Infobox television
| show_name = 60 Minutos
| image =
| director =
| developer =
| channel = [[NBC]]
| presenter = [[Raúl Matas]] (1977–86)<br />[[Raquel Argandoña]] (1979–81)
| language = [[Spanish language|Spanish]]
| first_aired = {{Date|7 April 1975}}
| website = {{url|https://foo.bar.com}}
}}
_EOT
matcher = string.match(/^\{\{(.*)^\}\}$/m)
matcher[0] will give you the whole expression
matcher[1] will give you what was matched inside the () brackets
The danger with this is that .* does "greedy" matching and matches the largest piece of text it can, so if the text contains more than one template it will run on to the last line consisting of }}. Use the non-greedy .*? to turn this off. Without more info on what you're trying to do I can't help any more.
NB - to match () brackets you have to escape them. See https://ruby-doc.org/core-2.1.1/Regexp.html for more info.
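The greedy-versus-lazy difference can be demonstrated in Python, where Ruby's `/m` behaviour (dot matches newline, with `^` and `$` as line anchors) corresponds to `re.DOTALL | re.MULTILINE`. The two-template input is a made-up example, not taken from the question.

```python
import re

text = ("{{Infobox television\n"
        "| channel = [[NBC]]\n"
        "}}\n"
        "Some article text\n"
        "{{Infobox person\n"
        "| name = X\n"
        "}}\n")

flags = re.DOTALL | re.MULTILINE  # Ruby's /m plus Ruby's line-based ^ and $
greedy = re.search(r"^\{\{(.*)^\}\}$", text, flags)
lazy = re.search(r"^\{\{(.*?)^\}\}$", text, flags)

print("Infobox person" in greedy.group(1))  # True: greedy ran on to the last }}
print(repr(lazy.group(1)))  # 'Infobox television\n| channel = [[NBC]]\n'
```

Note that the lazy version still relies on the closing `}}` sitting alone on its own line; it does not understand nesting the way the recursive pattern does.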

Adding the values in dax based on a string in another Column

I want to write a Query which would give the Sum of the value where the string contains 'SP11' without any break
For Example in the below table I want to add the value of the 3rd, 6th and 7th rows
String | Value
________________|_______
A/B/SP1/ADDS | 12
ss/B/SP2/A | 2
A/C/D/SP11/C | 66
Ass/C/ASD | 46
ACD/SP1/C/V/C | 45
F/D/SP11/C | 85
F/D/SP11/C/12/D | 21
Which would result in something like SP11 = 172 which was derived by adding up the values of
Value of 3rd row(A/C/D/SP11/C)+
Value of 6th row(F/D/SP11/C)+Value of 7th row(F/D/SP11/C/12/D)
= 66+85+21=172
This is the Query I tried to get the value required but this doesn't work
CALCULATE(Sum(Query1[Value]), FIND("*SP11*",Query1[Value])>0)
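The correct measure is below. Note that FIND does not support wildcards, and the attempt above searched Query1[Value] instead of the string column: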
Measure:=CALCULATE(sum([value]),filter(Table1,FIND("SP11",Table1[string],1,0)>0))
Try this:
CALCULATE(SUM(Table1[Value]), SEARCH("SP11", Table1[String], 1, 0) > 0)
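To sanity-check the expected total of 172 outside Power BI, the same filter-then-sum logic can be sketched in Python with the question's sample rows. The plain substring test corresponds to `FIND("SP11", ...) > 0` (case-sensitive, like FIND; SEARCH would ignore case). The segment-based variant is an extra assumption, not something either answer proposes: it only counts "SP11" as a whole `/`-separated segment, so a hypothetical "SP110" would not match.

```python
rows = [
    ("A/B/SP1/ADDS", 12),
    ("ss/B/SP2/A", 2),
    ("A/C/D/SP11/C", 66),
    ("Ass/C/ASD", 46),
    ("ACD/SP1/C/V/C", 45),
    ("F/D/SP11/C", 85),
    ("F/D/SP11/C/12/D", 21),
]

# Same idea as FIND("SP11", String) > 0: a plain, case-sensitive substring test.
substring_total = sum(value for string, value in rows if "SP11" in string)

# Stricter variant (an assumption): "SP11" must be a whole /-separated segment.
segment_total = sum(value for string, value in rows if "SP11" in string.split("/"))

print(substring_total, segment_total)  # 172 172
```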

How to add description to the "value" of the query attribute

Apiary shows me how to add a description to a parameter. However, what I need is descriptions for its values.
For example /users{?skills}. I have my own skill codes for this parameter
'1' means can speak English
'2' means can swim
'3' means can drive
Adding them after the parameter description is one way to do it, but what if I have tons of skill codes? The formatting of this approach is also ugly. How can I achieve it?
There is currently no way to achieve it by using standard instruments of API Blueprint.
Neither
+ Values
    + `A - means something`
    + `B`
    + `C`
nor
+ Values
    + `A` means something
    + `B`
    + `C`
works correctly. I filed a feature request in API Blueprint's repository. If you want to be part of the design process and help us find the best solution to your problem, you can track it and comment on it.
Appearance
I understand that rendering of this feature isn't very nice in standard docs.
However, in the "beta new docs" it looks much better. Try it out by turning on the following switch in the settings:
Then the rendering should look like this (two states):
Using tables
When in trouble with API Blueprint, you can always use plain old Markdown in an endpoint's description to supplement or replace what's missing. For example, you can freely use tables in addition to, or instead of, the Values section:
# My API
## Sample [/endpoint{?id}]
Description.

| Value        | Meaning          |
| ------------ |:----------------:|
| A            | Alaska           |
| B            | Bali             |
| C            | Czech Republic   |

+ Parameters
    + id (string)

        Description...

        | Value        | Meaning          |
        | ------------ |:----------------:|
        | A            | Alaska           |
        | B            | Bali             |
        | C            | Czech Republic   |

        Description...

        + Values
            + `A`
            + `B`
            + `C`
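Since the question worries about having "tons" of skill codes, one option is to generate the table and the matching Values list from a single mapping rather than maintaining them by hand. This is a hypothetical helper (`values_section` is not part of any API Blueprint tooling), and the 8-space default indent is an assumption about where a parameter's description sits under `+ Parameters`.

```python
def values_section(codes, indent=8):
    """Render a Value/Meaning Markdown table plus a matching `+ Values` list,
    indented to sit under a parameter such as `+ skills (string)`."""
    pad = " " * indent
    lines = [pad + "| Value | Meaning |", pad + "| ----- | ------- |"]
    lines += [f"{pad}| {code} | {meaning} |" for code, meaning in codes.items()]
    lines += ["", pad + "+ Values"]
    lines += [f"{pad}    + `{code}`" for code in codes]
    return "\n".join(lines)


skills = {"1": "can speak English", "2": "can swim", "3": "can drive"}
print(values_section(skills))
```

The output can be pasted into the blueprint (or spliced in by a build script), so the table and the Values list cannot drift apart.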
Rendering of tables in old and new docs:
