I want to write a query that returns the sum of Value for every row whose String contains the substring 'SP11'.
For example, in the table below I want to add up the values of the 3rd, 6th and 7th rows:
String | Value
________________|_______
A/B/SP1/ADDS | 12
ss/B/SP2/A | 2
A/C/D/SP11/C | 66
Ass/C/ASD | 46
ACD/SP1/C/V/C | 45
F/D/SP11/C | 85
F/D/SP11/C/12/D | 21
The result should be something like SP11 = 172, which is derived by adding up
Value of 3rd row (A/C/D/SP11/C) +
Value of 6th row (F/D/SP11/C) + Value of 7th row (F/D/SP11/C/12/D)
= 66 + 85 + 21 = 172
This is the query I tried, but it doesn't work:
CALCULATE(Sum(Query1[Value]), FIND("*SP11*",Query1[Value])>0)
The correct measure is this:
Measure:=CALCULATE(SUM(Table1[Value]),FILTER(Table1,FIND("SP11",Table1[String],1,0)>0))
Try this (SEARCH is the case-insensitive counterpart of FIND):
CALCULATE(SUM(Table1[Value]), SEARCH("SP11",Table1[String],1,0)>0)
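To cross-check the expected total outside the data model, here is a minimal Python sketch of the same filter-then-sum logic. The names rows and sum_where_contains are illustrative only and do not come from the question; the data is the sample table above.

```python
# Sample rows from the question: (String, Value)
rows = [
    ("A/B/SP1/ADDS", 12),
    ("ss/B/SP2/A", 2),
    ("A/C/D/SP11/C", 66),
    ("Ass/C/ASD", 46),
    ("ACD/SP1/C/V/C", 45),
    ("F/D/SP11/C", 85),
    ("F/D/SP11/C/12/D", 21),
]


def sum_where_contains(rows, needle):
    """Sum Value over the rows whose String contains needle as a substring."""
    return sum(value for text, value in rows if needle in text)


print(sum_where_contains(rows, "SP11"))  # 66 + 85 + 21 = 172
```

Note that a plain substring test for "SP1" would also match the "SP11" rows, which is why the search term has to be the full 'SP11'.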
I did a simple proc freq in SAS:
PROC FREQ DATA=test;
TABLES a * b;
RUN;
This raised the error: insufficient page size to print frequency table
From the question "ERROR: Insufficient page size to print frequency table in SAS PROC FREQ" I learned that the error is fixed by enlarging the page size:
option pagesize=max;
But then my table still looked strange, with a very tall block of white space in the header of column b:
Frequency |
Percent |
Row Pct | value 1 | value 2 |
Col Pct | | |
| | |
...etc... ...etc...
| | |
----------+----------+----------+
a | 12 | 3 |
What solved my problem was adding a format to the proc freq that truncated variable b.
PROC FREQ DATA=test;
FORMAT B $7.;
TABLES a * b;
RUN;
Now my result looks like this, and I'm happy enough:
Frequency |
Percent |
Row Pct |
Col Pct | value 1 | value 2 |
----------+----------+----------+
a | 12 | 3 |
I'm left a bit bewildered, because nowhere in the code did I apply a format to b before, just a LENGTH statement. Other variables that had their lengths fixed did not have this problem. I did switch from an Excel source file to Oracle Exadata as the source. Is it possible that Oracle pushes variable formats to SAS?
SAS has a nasty habit of attaching formats to character variables pulled from external databases, including PROC IMPORT from an EXCEL file. So if a character variable has a storage length of 200 then SAS will also attach the $200. format to the variable.
When you combine two datasets that both contain the same variable, the length is set by the first version of the variable seen, but the attached format is set by the first non-empty format seen. So you could combine a dataset where A has length $10 and no format attached with another dataset where A has the format $200. attached, and the result will be a variable with an actual length of 10 but with the $200. format attached.
You can use the format statement where you list variable names but no format specification to remove them. You could do it in the PROC step.
PROC FREQ DATA=test;
tables a * b;
format _character_ ;
RUN;
Or do it in a data step or use PROC DATASETS to modify the formats attached to the variable in an existing dataset.
I have a large semi-sorted list of Strings, sorted only by first character. Each String is accompanied by an ID. The first x entries start with the letter A, then follow the entries starting with the letter B, and so on. Not all letters are necessarily represented.
By semi-sorted I mean that there are exceptions (wrongly sorted entries). It is NOT possible to re-sort the entries correctly: existing entries have to keep their IDs.
I have crafted the following example, which only includes the starting letters A and B. The entries starting with C, Z and S have been entered in the wrong place.
Example:
| ID | NAME |
|------|------|
| 6000 | AXXX |
| 6001 | AXZS |
| 6003 | AAFD |
| 6004 | CSDF |
| 6005 | ZSSF |
| 6006 | ASDF |
| 6007 | BXAS |
| 6010 | BZDS |
| 6011 | SHZF |
| 6012 | BHZT |
I want to add entries to the list. An entry with a Name starting with the letter A should be inserted next to the other entries starting with A if possible, or otherwise at the very end.
In the above example an entry with a Name starting with the letter A should be inserted with ID 6002.
An entry with a Name starting with the letter B should be added with ID 6008.
I am not sure how to solve this. My first thoughts are to first iterate over the existing list starting with the lowest ID and to save information on the letter group.
Like:
Letter: A StartID: 6000 EndID: 6006 IsFull:False
Letter: B StartID: 6007 EndID: 6012 IsFull:False
And then when it comes to inserting using the above information for the determination of possible IDs of the new entry. After inserting a new entry this information would have to be updated.
However I am not sure on how to exactly achieve this. All I need is some pseudo code for a possible solution so I can write my own code.
You probably want a few steps:
find the position of the insertion group if it exists (what if the first few were B, Z, before A?)
find the last member of the group if it exists, otherwise the last member of the prior group (for example when inserting F)
determine whether there is a free ID after the last member of the group and before the first member of the next one
if a position exists for insertion, insert the value there; otherwise
find the last position
append the value
Some considerations:
you must keep track of and consider several positions; some structure will help you with this
if you have runs ordered A, B, Z, C in the ID column, does the block with Z form a group? Is the block C misplaced? Otherwise it seems your values should grow wherever the first new member is
"next" needs to consider multiple characters (presumably ABAA comes after AABB)
I have a data set as below:
2,5
159,5
2,100
2,858
3,100
3,114
3,171
3,858
5,100
858,2
2,2456
4500,2
2456,3
If I choose an element from column 1, such as 2, I need to get the corresponding elements from column 2.
I have used:
awk -F, '$1=="2" {print $2}' Sample.txt
This returns the corresponding column 2 elements of the element 2 which is as below:
5
100
858
2456
In the next iteration I would like to run the same check on 5 and return its column 2 elements. Here 5 returns 100, but 100 was already returned for 2, so I don't need it again. The same check should then run for 100 and so on up to 2456. For 2456 it should return 3, which is its column 2 element and has not been seen before. The iteration should then continue in the same way for 3, returning its unseen column 2 elements, until there are no new column 2 elements left to return.
The final output should look like:
5
100
858
2456
3
114
171
3 is obtained as a column 2 element of 2456, and 114 and 171 are obtained as column 2 elements of 3. Since 114 and 171 have no further unseen column 2 elements (refer to the sample data set above), the iteration stops. Can this be achieved recursively? So far I am only able to do it for the first chosen element.
The command you have can be changed to:
awk -F, '$1==2 {print $2}' Sample.txt >> tmp.txt
5
100
858
2456
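The repeated lookup described in the question amounts to a breadth-first walk over the column 1 to column 2 pairs, remembering which values have already been returned. Below is a minimal Python sketch of that idea, not a continuation of the awk command above; the names SAMPLE and reachable are illustrative, and the start value is treated as already seen so that it is not printed when a later row points back to it.

```python
from collections import deque

# The sample data set from the question as (column 1, column 2) pairs.
SAMPLE = [
    (2, 5), (159, 5), (2, 100), (2, 858), (3, 100), (3, 114), (3, 171),
    (3, 858), (5, 100), (858, 2), (2, 2456), (4500, 2), (2456, 3),
]


def reachable(pairs, start):
    """Return the column 2 values reachable from start, in discovery order."""
    neighbours = {}
    for first, second in pairs:
        neighbours.setdefault(first, []).append(second)

    seen = {start}          # values already returned (or the start itself)
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


for value in reachable(SAMPLE, 2):
    print(value)  # 5, 100, 858, 2456, 3, 114, 171 (one per line)
```

An explicit queue is used instead of recursion so that long chains cannot hit a recursion limit; the visiting order is the same level-by-level order the question describes.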
I am computing the difference between two columns in a file like this:
cat "trace-0-dir2.txt" | awk '{print expr $2-$1}' | sort
This gives me values like:
-1.28339e+09
-1.28339e+09
-1.28339e+09
-1.28339e+09
I want to avoid the rounding and get the exact value. How can this be achieved?
FYI, trace-0-dir2.txt contains:
1283453524.342134 65337.141749 10 2
1283453524.556784 65337.388047 11 2
1283453524.556794 65337.411165 12 2
1283453524.556806 65337.435947 13 2
1283453524.556811 65337.435989 14 2
1283453524.556816 65337.453931 15 2
1283453524.771522 65337.484866 16 2
The printf function can help you get the formatting you need. You don't need expr and you don't need cat: awk can do the calculation itself, and you can invoke awk directly on the file.
You can change the 20.20 to any width and precision, depending on the format you are looking for.
[jaypal:~/Temp] cat file0
1283453524.342134 65337.141749 10 2
1283453524.556784 65337.388047 11 2
1283453524.556794 65337.411165 12 2
1283453524.556806 65337.435947 13 2
1283453524.556811 65337.435989 14 2
1283453524.556816 65337.453931 15 2
1283453524.771522 65337.484866 16 2
[jaypal:~/Temp] awk '{ printf("%20.20f\n", $2-$1)}' file0
-1283388187.20038509368896484375
-1283388187.16873693466186523438
-1283388187.14562892913818359375
-1283388187.12085914611816406250
-1283388187.12082219123840332031
-1283388187.10288500785827636719
-1283388187.28665614128112792969
From the man page:
Field Width:
An optional digit string specifying a field width; if the output string has fewer characters than the field width it will be blank-padded on the left (or right, if the left-adjustment indicator has been given) to make up the field width (note that a leading zero is a flag, but an embedded zero is part of a field width);
Precision:
An optional period, `.', followed by an optional digit string giving a precision which specifies the number of digits to appear after the decimal point, for e and f formats, or the maximum number of characters to be printed from a string; if the digit string is missing, the precision is treated as zero;
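One caveat about the 20-digit output above: awk computes with double-precision floating point, which carries roughly 15 to 17 significant digits, so the digits after about the sixth decimal place are representation noise rather than extra accuracy. If the exact decimal difference matters, decimal arithmetic avoids this. The following is a minimal Python sketch, separate from the awk answer; the function name exact_diffs is illustrative, and it assumes whitespace-separated fields as in the sample file.

```python
from decimal import Decimal


def exact_diffs(lines):
    """Return column 2 minus column 1 for each line, using exact decimals."""
    diffs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        diffs.append(Decimal(fields[1]) - Decimal(fields[0]))
    return diffs


sample = [
    "1283453524.342134 65337.141749 10 2",
    "1283453524.556784 65337.388047 11 2",
]
for diff in exact_diffs(sample):
    print(diff)
# -1283388187.200385
# -1283388187.168737
```

The values are parsed from their text form straight into Decimal, so they never pass through a binary float.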
I have a huge data set and I want to extract the rows which do not have certain keywords.
For example, let's say I have the following data set (two columns):
+--------------+------------------+
| Nylon | Nylon wire |
| Cable | 5mm metal cable |
| Epoxy | some comment |
| Polyester | some comment |
+--------------+------------------+
I want to find the rows which do not contain the keywords Nylon and Epoxy (and other keywords for that matter) and put those rows in another place (i.e. sheet).
Thanks in advance!
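Sub a()
    Dim i As Long, j As Long
    With Worksheets(1)
        j = 1 ' next free row on the destination sheet
        For i = 1 To .UsedRange.Rows.Count
            ' copy the row only if neither keyword occurs anywhere in it
            If .Rows(i).Find(what:="Nylon") Is Nothing And .Rows(i).Find(what:="Epoxy") Is Nothing Then
                .Rows(i).Copy Destination:=Worksheets(2).Rows(j)
                j = j + 1
            End If
        Next i
    End With
End Sub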
A | B | C
-------------------- ------------------- --------
1 Search Term -> | nylon |
2 Name | Description | Found
3 Nylon | Nylon Wire | TRUE
4 Cable | 5 mm metal cable | FALSE
5 Epoxy | some comment | FALSE
6 Polyester | some comment | FALSE
In the above example, I would create an AutoFilter on A2:C6 with the first row being my headers. In each cell in C3:C6 I would have a formula akin to (this is from C3):
=OR(NOT(ISERROR(SEARCH($B$1,A3))),NOT(ISERROR(SEARCH($B$1,B3))))
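Now you can use the AutoFilter tools to filter on the Found column: TRUE shows the rows that contain the search term, and FALSE shows the rows that do not, which are the ones the question asks for.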
I'll show how you can check whether one string occurs within some other columns, returning a boolean. Then you'll need to decide how to handle the positive cases; I believe you'll use a VLOOKUP or something like it.
Please replace ; with , in the formulas below. I'm not using English regional settings at the moment.
You can combine the FIND and ISERROR functions to get your result. ISERROR returns a boolean, and you can combine as many column checks as you want.
Example:
Let's say you have the test keywords in cells C1 and D1, and the range you provided above starts at A2.
Now, in C2 we can add a test for whether the string Nylon exists within A2, that is =ISERROR(FIND(C$1;$A2)). We also need to check whether the string Nylon exists in B2, so we add the second condition: AND(ISERROR(FIND(C$1;$A2));ISERROR(FIND(C$1;$B2)))
Because we are testing whether FIND returned an error, the formula returns FALSE when the string has been found. To make it easier to understand, I believe it's better to wrap the formula in NOT, so that when the string in C1 appears in A2 or B2 the formula returns TRUE:
=NOT(AND(ISERROR(FIND(C$1;$A2));ISERROR(FIND(C$1;$B2))))
Then we copy this formula one cell to the right to test against the value in D1, Epoxy, and copy both cells down for the remaining rows. The mixed references (C$1, $A2, $B2) keep the keyword row and the data columns fixed while copying.
This is the resulting structure:
Nylon Epoxy
Nylon | Nylon wire | TRUE | FALSE
Cable | 5mm metal cable | FALSE | FALSE
Epoxy | some comment | FALSE | TRUE
Polyester | some comment | FALSE | FALSE
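For comparison, the same keep-only-rows-without-keywords logic can be sketched outside the spreadsheet. This is a minimal Python illustration, not part of any answer above; the names rows, keywords and rows_without_keywords are hypothetical, and matching is made case-insensitive here (like SEARCH, unlike FIND), which is an assumption about what is wanted.

```python
def rows_without_keywords(rows, keywords):
    """Return the rows in which no cell contains any of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        row
        for row in rows
        if not any(keyword in cell.lower() for cell in row for keyword in lowered)
    ]


rows = [
    ("Nylon", "Nylon wire"),
    ("Cable", "5mm metal cable"),
    ("Epoxy", "some comment"),
    ("Polyester", "some comment"),
]
keywords = ["Nylon", "Epoxy"]

for row in rows_without_keywords(rows, keywords):
    print(row)
# ('Cable', '5mm metal cable')
# ('Polyester', 'some comment')
```

These are the rows for which both keyword columns in the table above are FALSE.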