How to insert into semi sorted List? - algorithm

I have a large, semi-sorted list of strings, sorted only by first character. Each string is accompanied by an ID. The first x entries start with the letter A, followed by entries starting with the letter B, and so on. Not all letters are necessarily represented.
By semi-sorted I mean that there are exceptions (wrongly sorted entries). It is NOT possible to re-sort the entries correctly: existing entries have to keep their ID.
I have crafted the following example, which only includes the starting letters A and B. The entries starting with C, Z and S were entered in the wrong place.
Example:
| ID | NAME |
|------|------|
| 6000 | AXXX |
| 6001 | AXZS |
| 6003 | AAFD |
| 6004 | CSDF |
| 6005 | ZSSF |
| 6006 | ASDF |
| 6007 | BXAS |
| 6010 | BZDS |
| 6011 | SHZF |
| 6012 | BHZT |
I want to add entries to the list. An entry with a name starting with the letter A should be inserted together with the other entries starting with A if possible, otherwise at the very end.
In the above example, an entry with a name starting with the letter A should be inserted with ID 6002.
An entry with a name starting with the letter B should be added with ID 6008.
I am not sure how to solve this. My first thought is to iterate over the existing list, starting with the lowest ID, and save information about each letter group.
For example:
Letter: A StartID: 6000 EndID: 6006 IsFull:False
Letter: B StartID: 6007 EndID: 6012 IsFull:False
Then, when inserting, I would use this information to determine the possible IDs for the new entry. After each insert, this information would have to be updated.
However, I am not sure how exactly to achieve this. All I need is some pseudo code for a possible solution so I can write my own code.
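A minimal Python sketch of the summary proposed in the question (Letter, StartID, EndID, IsFull), built in one pass from the lowest ID upwards. It assumes the table is held in memory as a dict mapping ID to name, and that a letter's group spans from the first to the last ID whose name starts with that letter, so a misplaced entry such as CSDF simply forms a one-entry group of its own. `Group` and `summarize_groups` are hypothetical names, not from the original post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Group:
    letter: str
    start_id: int
    end_id: int
    is_full: bool  # True when every ID in [start_id, end_id] is already taken


def summarize_groups(entries: dict[int, str]) -> dict[str, Group]:
    """Assumption: a letter's group runs from its first to its last ID."""
    spans: dict[str, list[int]] = {}
    for entry_id in sorted(entries):  # walk from the lowest ID upwards
        letter = entries[entry_id][0]
        if letter in spans:
            spans[letter][1] = entry_id  # extend the group's end
        else:
            spans[letter] = [entry_id, entry_id]  # first member starts a group
    return {
        letter: Group(
            letter,
            start,
            end,
            all(i in entries for i in range(start, end + 1)),
        )
        for letter, (start, end) in spans.items()
    }


# The example table from the question.
example = {
    6000: "AXXX", 6001: "AXZS", 6003: "AAFD", 6004: "CSDF", 6005: "ZSSF",
    6006: "ASDF", 6007: "BXAS", 6010: "BZDS", 6011: "SHZF", 6012: "BHZT",
}

for group in summarize_groups(example).values():
    print(group)
```

For the example table this yields A: 6000 to 6006 and B: 6007 to 6012, both not full, matching the summary in the question. After an insert, the simplest (if not the fastest) way to keep it current is to add the new ID to the dict and rebuild the summary.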

You probably want a few steps:
find the position of the insertion group, if it exists (what if the first few entries were B and Z, before A?)
find the last member of the group if it exists, otherwise the last member of the prior group (for example when inserting F)
determine whether there is room in the ID column after the last member of the group and before the first member of the next one
if a position exists for insertion, insert the value, else
find the last position
append the value
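The steps above can be sketched in Python roughly as follows. This is a simplified take under stated assumptions: the table is a non-empty dict mapping ID to name, a letter's group spans its first to its last ID, and a letter with no group at all is appended at the end (the sketch skips the "last member of the prior group" case). It also looks for gaps inside the group, since the question expects 6002 for a new A entry. `choose_id` and the name "AQQQ" are hypothetical.

```python
from __future__ import annotations


def choose_id(entries: dict[int, str], name: str) -> int:
    """Pick an ID for `name` next to entries that share its first letter.

    Assumptions: `entries` maps ID -> name and is non-empty, a letter's
    group spans its first to its last ID, and a letter with no group is
    appended at the end.
    """
    group_ids = sorted(i for i, n in entries.items() if n[0] == name[0])
    if group_ids:
        first, last = group_ids[0], group_ids[-1]
        # Room inside the group, e.g. an unused ID between two members.
        for candidate in range(first + 1, last):
            if candidate not in entries:
                return candidate
        # Room right after the last member, before the next used ID.
        if last + 1 not in entries:
            return last + 1
    # No room, or no group for this letter: append after the highest ID.
    return max(entries) + 1


# The example table from the question.
example = {
    6000: "AXXX", 6001: "AXZS", 6003: "AAFD", 6004: "CSDF", 6005: "ZSSF",
    6006: "ASDF", 6007: "BXAS", 6010: "BZDS", 6011: "SHZF", 6012: "BHZT",
}

print(choose_id(example, "AQQQ"))  # 6002
print(choose_id(example, "BQQQ"))  # 6008
```

After choosing an ID, store the entry (`example[new_id] = name`) so the next call treats that ID as taken. Note that the sketch cannot tell a misplaced entry from a real group, which is exactly the A, B, Z, C question raised in the considerations below.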
Some considerations:
you must keep track of several positions; some structure will help you with this
if the runs of first letters are ordered A, B, Z, C, does the block with Z form a group? Is the C block misplaced? Otherwise it seems each group's new values should go wherever its first new member is placed
"next" needs to consider multiple characters (presumably ABAA comes after AABB)


UiPath how to get Type of generic Value

Background Story
I have an Excel table of values with the thousands separator . and the decimal separator ,.
If the number is lower than 1000, only the , appears. In UiPath, I'm using Read Range and storing the data in a data table. Somehow, UiPath replaces the , with a . because it interprets the value as a float. But this only happens to values lower than 1000. Larger numbers are interpreted as strings, and all the separators stay the same.
Example:
+───────────+───────────────+─────────+
| Input | UiPath Value | Type |
+───────────+───────────────+─────────+
| 4.381,14 | 4.381,14 | String |
| 5.677,50 | 5.677,50 | String |
| 605,27 | 605.27 | Double |
+───────────+───────────────+─────────+
Problem
I want to loop through the data table and apply some logic to each value. Because of the different data types, I assign each value to a generic value variable. It is a huge problem that the , is automatically replaced by a ., because in my context this is a completely different value. Therefore I need to check the data type, so I can replace the separator again.
Attempt
I'm trying to get the type by GetType().ToString(), but it only delivers me: UiPath.Core.GenericValue
I tried to replicate this, and I was able to convert a value to Double with the following steps:
' Swap the separators via a placeholder, e.g. "4.381,14" -> "4,381.14"
strValue = dt(0)(0).ToString.Replace(".","$")
strValue = strValue.Replace(",",".")
strValue = strValue.Replace("$",",")
' CDbl parses the result using the current culture's separators
dblValue = CDbl(strValue)
In UiPath, data read from Excel is treated as generic objects, so we explicitly convert each value to String first.
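The same separator fix can be sketched in Python as an analogue (this is not UiPath or VB.NET code). Python's `float()` accepts only `.` as the decimal point and no thousands separators, so the sketch removes the `.` first and then turns `,` into `.`. It also checks the type first, which is what the question asks for: a value that already arrived as a number (like 605.27) is returned unchanged, because running the string "605.27" through the swap would give 60527. `parse_german_number` is a hypothetical name.

```python
def parse_german_number(value: object) -> float:
    """Parse a cell that uses '.' for thousands and ',' for decimals.

    Assumption: values that already arrived as numbers were converted
    correctly, so they are returned as-is instead of being re-parsed.
    """
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    # Drop the thousands separators, then turn the decimal comma into a dot.
    return float(text.replace(".", "").replace(",", "."))


# The three sample values from the question's table.
for cell in ["4.381,14", "5.677,50", 605.27]:
    print(parse_german_number(cell))
```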

PROC FREQ in SAS gives too wide columns

I did a simple proc freq in SAS:
PROC FREQ DATA=test;
tables a * b;
RUN;
This raised the error: insufficient page size to print frequency table
From ERROR: Insufficient page size to print frequency table in SAS PROC FREQ I learned that the error is fixed by enlarging the page size:
option pagesize=max;
But then my table still looked strange, with a huge amount of vertical white space in the header of column b:
Frequency |
Percent |
Row Pct | value 1 | value 2 |
Col Pct | | |
| | |
...etc... ...etc...
| | |
----------+----------+----------+
a | 12 | 3 |
What solved my problem was adding a format to the proc freq that truncated variable b.
PROC FREQ DATA=test;
FORMAT B $7.;
tables a * b;
RUN;
Now my result looks like this, and I'm happy enough:
Frequency |
Percent |
Row Pct |
Col Pct | value 1 | value 2 |
----------+----------+----------+
a | 12 | 3 |
I'm left a bit bewildered, because nowhere in the code did I apply a format to b before, just a LENGTH statement. Other variables that had their lengths set did not have this problem. I did switch the source from an Excel file to Oracle Exadata. Is it possible that Oracle pushes variable formats to SAS?
SAS has a nasty habit of attaching formats to character variables pulled from external databases, including PROC IMPORT from an EXCEL file. So if a character variable has a storage length of 200 then SAS will also attach the $200. format to the variable.
When you combine two datasets that both contain the same variable, the length will be set by the first version of the variable seen, but the attached format will be set by the first non-empty format seen. So you could combine a dataset where A has length $10 and no format attached with another dataset where A has the format $200. attached, and the result will be a variable with an actual length of 10 but the $200. format attached.
You can use the format statement where you list variable names but no format specification to remove them. You could do it in the PROC step.
PROC FREQ DATA=test;
tables a * b;
format _character_ ;
RUN;
Or do it in a data step or use PROC DATASETS to modify the formats attached to the variable in an existing dataset.

Substituting string labels by integer IDs and back

My data files contain lines with the first entity being a string label followed by features. For example:
MEMO |f write down this note
CALL |f call jim's cell
The problem is that Vowpal Wabbit accepts only integer labels. How can I quickly change from string labels to unique integer IDs and back? That is, quickly modify the data file to:
1 |f write down this note
2 |f call jim's cell
... and back when needed.
For my sample dataset I did it manually for each class using sed, but this seriously breaks my workflow.
cat input.data | perl -nale '$i=$m{$F[0]}; $i or $i=$m{$F[0]}=++$n; $F[0]=$i; print "@F"; END{warn "$_ $m{$_}\n" for sort {$m{$a}<=>$m{$b}} keys %m}' > output.data 2> mapping.txt
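A Python sketch of the same round trip, including the way back. It assumes the label is the first space-separated token on each line (as in the examples) and numbers labels 1, 2, 3, ... in order of first appearance, like the Perl version; keep the returned mapping (for instance in a file) so you can decode later. `encode_labels` and `decode_labels` are hypothetical names.

```python
from __future__ import annotations


def encode_labels(lines: list[str]) -> tuple[list[str], dict[str, int]]:
    """Replace the first token of each line with an integer ID (1, 2, 3, ...)."""
    mapping: dict[str, int] = {}
    encoded = []
    for line in lines:
        label, sep, rest = line.partition(" ")
        if label not in mapping:
            mapping[label] = len(mapping) + 1  # IDs in order of first appearance
        encoded.append(f"{mapping[label]}{sep}{rest}")
    return encoded, mapping


def decode_labels(lines: list[str], mapping: dict[str, int]) -> list[str]:
    """Undo encode_labels using the saved label -> ID mapping."""
    reverse = {label_id: label for label, label_id in mapping.items()}
    decoded = []
    for line in lines:
        label_id, sep, rest = line.partition(" ")
        decoded.append(f"{reverse[int(label_id)]}{sep}{rest}")
    return decoded


data = ["MEMO |f write down this note", "CALL |f call jim's cell"]
encoded, mapping = encode_labels(data)
print("\n".join(encoded))
print("\n".join(decode_labels(encoded, mapping)))
```

With real files, read the input into a list of lines first (e.g. `open(path).read().splitlines()`) and write the encoded lines back out.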

How to add description to the "value" of the query attribute

Apiary shows me how to add descriptions to a parameter. However, what I need is descriptions on the values.
For example, /users{?skills}. I have my own skill codes for this parameter:
'1' means can speak English
'2' means can swim
'3' means can drive
Adding them after the parameter description is one way to do it, but what if I have tons of skill codes? Also, the formatting of this approach is ugly. How can I achieve it?
There is currently no way to achieve this with the standard features of API Blueprint.
Neither
+ Values
    + `A - means something`
    + `B`
    + `C`

or

+ Values
    + `A` means something
    + `B`
    + `C`
will work correctly. I filed a feature request under API Blueprint's repository. If you want to be part of the design process and help us to get the best solution to your problem, you can track it and comment under it.
Appearance
I understand that the rendering of this feature isn't very nice in the standard docs.
However, in the "beta new docs" it looks much better. Try it out: in settings, turn on the following switch:
Then the rendering should look like this (two states):
Using tables
When you run into the limits of API Blueprint, you can always use plain old Markdown in the endpoint's description to supplement or substitute what's missing. For example, you can freely use tables as an addition to, or a replacement for, the Values section:
# My API
## Sample [/endpoint{?id}]
Description.

| Value | Meaning |
| ------------ |:----------------:|
| A | Alaska |
| B | Bali |
| C | Czech Republic |

+ Parameters
    + id (string)

        Description...

        | Value | Meaning |
        | ------------ |:----------------:|
        | A | Alaska |
        | B | Bali |
        | C | Czech Republic |

        Description...

        + Values
            + `A`
            + `B`
            + `C`
Rendering of tables in old and new docs:

Excel Formula for extracting specific rows

I have a huge data set and I want to extract the rows which do not have certain keywords.
For example, let's say I have the following data set (two columns):
+--------------+------------------+
| Nylon | Nylon wire |
| Cable | 5mm metal cable |
| Epoxy | some comment |
| Polyester | some comment |
+--------------+------------------+
I want to find the rows which do not contain the keywords Nylon and Epoxy (and other keywords for that matter) and put those rows in another place (i.e. sheet).
Thanks in advance!
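Sub a()
    Dim i As Long, j As Long
    With Worksheets(1)
        j = 1 ' next free row on the target sheet
        For i = 1 To .UsedRange.Rows.Count
            ' Copy the row only if neither keyword appears in any of its cells
            ' (LookAt:=xlPart matches the keyword anywhere inside a cell)
            If .Rows(i).Find(What:="Nylon", LookAt:=xlPart) Is Nothing And _
               .Rows(i).Find(What:="Epoxy", LookAt:=xlPart) Is Nothing Then
                .Rows(i).Copy Destination:=Worksheets(2).Rows(j)
                j = j + 1
            End If
        Next i
    End With
End Sub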
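For comparison, the macro's filtering logic as a Python sketch: a row is kept only when none of its cells contains any keyword. It assumes rows are tuples of cell strings and, as an assumption, matches case-insensitively; extending the keyword list covers the "other keywords" case from the question. `rows_without_keywords` is a hypothetical name.

```python
from __future__ import annotations


def rows_without_keywords(
    rows: list[tuple[str, ...]], keywords: list[str]
) -> list[tuple[str, ...]]:
    """Keep rows where no cell contains any keyword (case-insensitive substring)."""
    lowered = [k.lower() for k in keywords]
    return [
        row
        for row in rows
        if not any(k in cell.lower() for cell in row for k in lowered)
    ]


# The example data set from the question.
data = [
    ("Nylon", "Nylon wire"),
    ("Cable", "5mm metal cable"),
    ("Epoxy", "some comment"),
    ("Polyester", "some comment"),
]
for row in rows_without_keywords(data, ["Nylon", "Epoxy"]):
    print(row)
```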
|   | A | B | C |
|---|---|---|---|
| 1 | Search Term -> | nylon | |
| 2 | Name | Description | Found |
| 3 | Nylon | Nylon Wire | TRUE |
| 4 | Cable | 5 mm metal cable | FALSE |
| 5 | Epoxy | some comment | FALSE |
| 6 | Polyester | some comment | FALSE |
In the above example, I would create an AutoFilter on A2:C6 with the first row being my headers. In each cell in C3:C6 I would have a formula akin to (this is from C3):
=OR(NOT(ISERROR(SEARCH($B$1,A3))),NOT(ISERROR(SEARCH($B$1,B3))))
Now you can use the AutoFilter tools to filter for rows where Found is TRUE (or FALSE, to get the rows that do not contain the search term).
I'll show how you can check whether one string is contained in some other columns, returning a boolean. Then you'll need to decide how to handle the positive cases, probably with a VLOOKUP or something similar.
Please replace ; with , if you use English regional settings; I'm not using them at the moment.
You can combine the FIND and ISERROR functions to get your result. ISERROR returns a boolean, so you can combine as many column checks as you want.
Example:
Let's say you have the test keywords in cells C1 and D1, and the range you provided above starts at A2.
Now, in C2 we can test whether the string Nylon exists within A2: =ISERROR(FIND(C$1;$A2)). We also need to check whether Nylon exists in B2, so we add a second condition: AND(ISERROR(FIND(C$1;$A2));ISERROR(FIND(C$1;$B2)))
Because we are testing whether FIND returned an error, this expression returns FALSE when the string has been found. To make it easier to understand, it's better to wrap it in NOT, so that the formula returns TRUE when the string in C1 appears in A2 or B2:
=NOT(AND(ISERROR(FIND(C$1;$A2));ISERROR(FIND(C$1;$B2))))
Then we copy this formula one cell to the right to test against the value in D1, Epoxy, and down to the remaining rows.
The resulting sheet looks like this:
Nylon Epoxy
Nylon | Nylon wire | TRUE | FALSE
Cable | 5mm metal cable | FALSE | FALSE
Epoxy | some comment | FALSE | TRUE
Polyester | some comment | FALSE | FALSE
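A quick way to sanity-check the formula logic: by De Morgan's law, NOT(AND(ISERROR(FIND(k, A)), ISERROR(FIND(k, B)))) is TRUE exactly when k occurs in either cell. The Python sketch below mirrors that per keyword; since Excel's FIND is case-sensitive (SEARCH is the case-insensitive variant), it uses a plain `in` test. `found_flags` is a hypothetical name, and rows with no TRUE flag are the ones the question wants to extract.

```python
from __future__ import annotations


def found_flags(rows: list[tuple[str, ...]], keywords: list[str]) -> list[list[bool]]:
    """One flag per keyword per row: True if the keyword occurs in any cell.

    Mirrors =NOT(AND(ISERROR(FIND(key;A));ISERROR(FIND(key;B)))), which by
    De Morgan's law equals OR(found in A; found in B). Case-sensitive, like FIND.
    """
    return [[any(key in cell for cell in row) for key in keywords] for row in rows]


data = [
    ("Nylon", "Nylon wire"),
    ("Cable", "5mm metal cable"),
    ("Epoxy", "some comment"),
    ("Polyester", "some comment"),
]
flags = found_flags(data, ["Nylon", "Epoxy"])
for row, row_flags in zip(data, flags):
    print(row, row_flags)

# Rows where no keyword was found are the ones to move to another sheet.
keep = [row for row, row_flags in zip(data, flags) if not any(row_flags)]
print(keep)
```

For the four example rows this reproduces the TRUE/FALSE grid shown above.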
