Finding close numbers in columns of data - sorting

I have 3 columns of data (hours of the day)
C1 C2 C3
01 05 00
05 09 06
11 11 10
16 17 14
20 22 18
I need to separate this into an n-by-3 matrix where the three numbers in each row are within +/-2 hours of each other (the range of each row must be <= 4).
Each value in each column can only be used once, so if more than one combination uses the same number, only one of those combinations is kept.
So the final result would be:
05 05 06 (Taken from the 2nd in C1, 1st in C2 and 2nd in C3)
11 09 10 (Taken from the 3rd in C1, 2nd in C2 and 3rd in C3)
16 17 18 (Taken from the 4th in C1, 4th in C2 and 5th in C3)
The data in each column must remain in that same column in the final matrix, for example the 16 found in C1 needs to be in the first column of the final matrix.
I'm really struggling to find a way to put this into code, can you help?

I managed to get this to almost work in MATLAB. I don't think it's particularly efficient or clever, but it does the job.
However, it relies quite heavily on the value in the first column, treating it as a midpoint.
This means that the allowed values in columns 2 and 3 must be within +/-2 of it.
The situation where the value in C1 is a lower limit (e.g. 30) and the other two values are up to 4 greater than the one found in C1 is classed as invalid, although it is still technically a solution.
x = [1,5,0; 5,9,6; 11,11,10; 16,17,14; 20,22,18];
d=size(x); %Get the size of x in the form [row,col]
rows=d(1); %Number of rows
cols=d(2); %Number of columns
clear d;
y = nan(rows,cols); %NaN matrix the same size as x, used for the output
a = ones(1,cols); %Keep track of the current index in question in each column
c = 0; %Number of "matches", i.e. rows that are valid in the output matrix
time = zeros(1,cols); %Keep track of the current values in each column
while(max(a)<rows+1) %Continue while every column index is still valid
    time(1)=x(a(1),1); %Get the value in column 1
    b = 2; %Column counter
    skip=0; %Set to 1 to leave the inner loop and return to the outer loop
    while(b<cols+1&&~skip&&max(a)<rows+1) %For columns 2->cols, while we don't need to skip and all the indexes are valid
        time(b)=x(a(b),b); %Get the value in column b at row a(b)
        delta = time(b)-time(1); %Difference from the selected first-column value
        if(delta>2)
            %Skip first column by 1
            a(1)=a(1)+1; %Increment the counter for column 1
            skip=1; %Return to the outer while loop
        elseif(delta<-2)
            %Skip b'th column by 1
            a(b)=a(b)+1; %Increment the counter for column b
        else
            %It's valid
            if(b==cols) %If at the last column and it's valid
                c=c+1; %Increment the match counter
                y(c,:)=time(1:cols); %Set the c'th row of the output to what we've found
                a=a+1; %Move on to the next number in every column
                skip=1; %Start all over
            else %Not at last column yet
                b=b+1;
            end
        end
    end
end
Final Result:
05 05 06
11 09 10
16 17 14
20 22 18
nan nan nan

It seems that, although you talk of having three columns of values, their row and column have no meaning. Effectively you have just a list of numbers: 01, 05, 00, 05, 09, 06, 11, 11, 10, 16, 17, 14, 20, 22, 18.
Then you order them: 00, 01, 05, 05, 06, 09, 10, 11, 11, 14, 16, 17, 18, 20, 22
Then you take three and look at their distances: 00, 01, 05 = bad, for 01 and 05 are too far apart.
Next number. 01, 05, 05? No. Next number. 05, 05, 06? Yes. Continue after them. 09, 10, 11? Yes. Continue after them. 11, 14, 16? No. Next number. 14, 16, 17? Yes. You found a solution:
05, 05, 06
09, 10, 11
14, 16, 17
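The walk-through above can be sketched in Python. This is only a sketch under assumptions: the name `group_sorted_triples` and the `max_range` parameter are illustrative, and, like the walk-through, it ignores which column each number came from. The default of 3 reproduces the walk-through, which rejects 01 and 05 as too far apart; pass 4 to use the limit stated in the question instead.

```python
def group_sorted_triples(values, max_range=3):
    """Greedily pick runs of three consecutive sorted values whose range is small enough.

    Assumption: column membership is ignored, as in the walk-through above.
    """
    ordered = sorted(values)
    triples = []
    i = 0
    while i + 2 < len(ordered):
        window = ordered[i:i + 3]
        if window[2] - window[0] <= max_range:
            triples.append(window)
            i += 3  # all three values are used up, continue after them
        else:
            i += 1  # drop the smallest value and try the next window
    return triples


hours = [1, 5, 0, 5, 9, 6, 11, 11, 10, 16, 17, 14, 20, 22, 18]
print(group_sorted_triples(hours))
# [[5, 5, 6], [9, 10, 11], [14, 16, 17]]
```

Because the values are pooled before sorting, a row produced this way may contain two numbers from the same column, so it does not enforce the requirement that each value stays in its original column.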

Related

How to verify ISBN and calculate checksum digit in COBOL?

My problem is that I get different output from what I am supposed to get
(the expected output is shown at the very bottom). Here is the output I get:
978-1734314502 (correct and valid)
978-1734314509 (incorrect, contains a non-digit)
978-1788399081 (correct and valid)
978-1788399083 (incorrect, contains a non-digit)
Here is what the question is asking me to do:
do a modern Cobol program to perform ISBN validation of a series of 10-digit ISBNs
stored in a user-inputted file.
Include three “subprograms” in the form of paragraphs:
2.1. readISBNnum - Prompts the user for the name of an ASCII file containing the list of ISBN
numbers. Reads the values of the ISBN numbers and processes them. If the file does not
exist, the program should produce an error message and re-prompt for the filename.
2.2. isValidate - Checks the validity of the ISBN, i.e. whether or not it contains
characters it shouldn’t. Responses should include an indication of whether a number
contains erroneous characters.
2.3. checkSUM - Extracts the individual digits, and calculates the checksum digit.
Produce an output for each ISBN number in the file, identifying whether it is valid or not
Here is what I have done so far:
IDENTIFICATION DIVISION.
PROGRAM-ID. testSubs.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
REPOSITORY.
FUNCTION ALL INTRINSIC
FUNCTION validISBN13.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
DATA DIVISION.
FILE SECTION.
WORKING-STORAGE SECTION.
01 IX PIC S9(4) COMP.
01 TEST-ISBNS.
02 FILLER PIC X(14) VALUE '978-1734314502'.
02 FILLER PIC X(14) VALUE '978-1734314509'.
02 FILLER PIC X(14) VALUE '978-1788399081'.
02 FILLER PIC X(14) VALUE '978-1788399083'.
01 TEST-ISBN REDEFINES TEST-ISBNS
OCCURS 4 TIMES
PIC X(14).
PROCEDURE DIVISION.
MAIN-PROCEDURE.
PERFORM
VARYING IX
FROM 1
BY 1
UNTIL IX > 4
DISPLAY TEST-ISBN (IX) ' ' WITH NO ADVANCING
END-DISPLAY
IF validISBN13(TEST-ISBN (IX)) = -1
DISPLAY '(incorrect, contains a non-digit)'
ELSE
DISPLAY '(correct and valid)'
END-IF
END-PERFORM.
GOBACK.
END PROGRAM testSubs.
IDENTIFICATION DIVISION.
FUNCTION-ID. validISBN13.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
REPOSITORY.
FUNCTION ALL INTRINSIC.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
DATA DIVISION.
FILE SECTION.
WORKING-STORAGE SECTION.
01 PASSED-SIZE PIC S9(6) COMP-5.
01 IX PIC S9(4) COMP.
01 WORK-FIELDS.
02 WF-DIGIT PIC X.
02 WF-COUNT PIC 9(2).
88 WEIGHT-1 VALUE 1, 3, 5, 7, 9, 11, 13.
88 WEIGHT-3 VALUE 2, 4, 6, 8, 10, 12.
02 WF-SUM PIC S9(8) COMP.
LINKAGE SECTION.
01 PASSED-ISBN PIC X ANY LENGTH.
01 RETURN-VALUE PIC S9.
PROCEDURE DIVISION USING PASSED-ISBN
RETURNING RETURN-VALUE.
CALL 'C$PARAMSIZE'
USING 1
GIVING PASSED-SIZE
END-CALL.
COMPUTE-CKDIGIT.
INITIALIZE WORK-FIELDS.
PERFORM
VARYING IX
FROM 1
BY 1
UNTIL IX GREATER THAN PASSED-SIZE
MOVE PASSED-ISBN (IX:1) TO WF-DIGIT
IF WF-DIGIT IS NUMERIC
ADD 1 TO WF-COUNT
IF WEIGHT-1
ADD NUMVAL(WF-DIGIT) TO WF-SUM
ELSE
COMPUTE WF-SUM = WF-SUM +
(NUMVAL(WF-DIGIT) * 3)
END-COMPUTE
END-IF
END-IF
END-PERFORM.
IF MOD(WF-SUM, 10) = 0
MOVE +0 TO RETURN-VALUE
ELSE
MOVE -1 TO RETURN-VALUE
END-IF.
GOBACK.
===================================================================================
IDENTIFICATION DIVISION.
PROGRAM-ID. sedol.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT sedol-file ASSIGN "sedol.txt"
ORGANIZATION LINE SEQUENTIAL
FILE STATUS sedol-file-status.
DATA DIVISION.
FILE SECTION.
FD sedol-file.
01 sedol PIC X(6).
WORKING-STORAGE SECTION.
01 sedol-file-status PIC XX.
88 sedol-file-ok VALUE "00".
01 digit-num PIC 9 COMP.
01 digit-weights-area VALUE "1317391".
03 digit-weights PIC 9 OCCURS 7 TIMES.
01 weighted-sum-parts-area.
03 weighted-sum-parts PIC 9(3) COMP OCCURS 6 TIMES.
01 weighted-sum PIC 9(3) COMP.
01 check-digit PIC 9.
PROCEDURE DIVISION.
OPEN INPUT sedol-file
PERFORM UNTIL NOT sedol-file-ok
READ sedol-file
AT END
EXIT PERFORM
END-READ
MOVE FUNCTION UPPER-CASE(sedol) TO sedol
PERFORM VARYING digit-num FROM 1 BY 1 UNTIL digit-num > 6
EVALUATE TRUE
WHEN sedol (digit-num:1) IS ALPHABETIC-UPPER
IF sedol (digit-num:1) = "A" OR "E" OR "I" OR "O" OR "U"
DISPLAY "Invalid SEDOL: " sedol
EXIT PERFORM CYCLE
END-IF
COMPUTE weighted-sum-parts (digit-num) =
(FUNCTION ORD(sedol (digit-num:1)) - FUNCTION ORD("A")
+ 10) * digit-weights (digit-num)
WHEN sedol (digit-num:1) IS NUMERIC
MULTIPLY FUNCTION NUMVAL(sedol (digit-num:1))
BY digit-weights (digit-num)
GIVING weighted-sum-parts (digit-num)
WHEN OTHER
DISPLAY "Invalid SEDOL: " sedol
EXIT PERFORM CYCLE
END-EVALUATE
END-PERFORM
INITIALIZE weighted-sum
PERFORM VARYING digit-num FROM 1 BY 1 UNTIL digit-num > 6
ADD weighted-sum-parts (digit-num) TO weighted-sum
END-PERFORM
COMPUTE check-digit =
FUNCTION MOD(10 - FUNCTION MOD(weighted-sum, 10), 10)
DISPLAY sedol check-digit
END-PERFORM
CLOSE sedol-file
.
END PROGRAM sedol.
However, I should get the output to look like this:
1856266532 correct and valid
0864500572 correct and valid with leading zero
0201314525 correct and valid with leading zero
159486781X correct and valid with trailing uppercase X
159486781x correct and valid with trailing lowercase X
0743287290 correct and valid with leading and trailing zero
081185213X correct and valid with leading zero, trailing X
1B56266532 incorrect, contains a non-digit
159486781Z incorrect, contains a non-digit/X in check digit
1856266537 correct, but not valid (invalid check digit)
Based on the assignment given in the post, the following code implements 2.2 isValidate and 2.3 checkSUM procedures for 10-digit ISBNs and 3. identifies each sample ISBN as valid or not.
Code:
data division.
working-storage section.
01 isbn-table.
03 isbn-test-numbers.
05 pic x(10) value "1856266532".
05 pic x(10) value "0864500572".
05 pic x(10) value "0201314525".
05 pic x(10) value "159486781X".
05 pic x(10) value "159486781x".
05 pic x(10) value "0743287290".
05 pic x(10) value "081185213X".
05 pic x(10) value "1B56266532".
05 pic x(10) value "159486781Z".
05 pic x(10) value "1856266537".
03 isbn-10-number redefines isbn-test-numbers
pic x(10) occurs 10 indexed isbn-idx.
01 isbn-work.
03 isbn-digit pic 9 occurs 9.
03 pic x.
01 check-digit.
03 check-digit-9 pic 9.
01 digit-position comp pic 9(4).
01 digit-weight comp pic 9(4).
01 weighted-sum comp pic 9(4).
01 validation-flags.
88 no-messages value all "0".
03 pic 9.
88 invalid-checksum value 1.
03 pic 9.
88 invalid-content value 1.
01 isbn-message pic x(50).
procedure division.
perform varying isbn-idx from 1 by 1
until isbn-idx > 10
move isbn-10-number (isbn-idx) to isbn-work
perform isValidate
display isbn-message
end-perform
stop run
.
isValidate.
set no-messages to true
if isbn-work (1:9) is numeric
perform checkSUM
if function upper-case (isbn-work (10:1))
not equal check-digit
set invalid-checksum to true
end-if
else
set invalid-content to true
end-if
move spaces to isbn-message
evaluate true
when invalid-checksum
string isbn-work " invalid checksum"
delimited size into isbn-message
when invalid-content
string isbn-work " invalid content"
delimited size into isbn-message
when other
string isbn-work " valid ISBN"
delimited size into isbn-message
end-evaluate
.
checkSUM.
move 0 to weighted-sum
perform varying digit-position from 1 by 1
until digit-position > 9
compute digit-weight = (11 - digit-position)
compute weighted-sum = weighted-sum
+ (isbn-digit (digit-position) * digit-weight)
end-perform
compute weighted-sum = 11 - function mod (weighted-sum 11)
compute weighted-sum = function mod (weighted-sum 11)
if weighted-sum = 10
move "X" to check-digit
else
move weighted-sum to check-digit-9
end-if
.
Output:
1856266532 valid ISBN
0864500572 valid ISBN
0201314525 valid ISBN
159486781X valid ISBN
159486781x valid ISBN
0743287290 valid ISBN
081185213X valid ISBN
1B56266532 invalid content
159486781Z invalid checksum
1856266537 invalid checksum
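For readers who want to cross-check the results outside COBOL, here is a minimal Python sketch of the same two steps (content check on the first nine characters, then the weighted mod-11 check digit). The function names `isbn10_check_digit` and `classify_isbn10` are made up for illustration, and the message strings simply mirror the output above.

```python
def isbn10_check_digit(first_nine):
    """Return the ISBN-10 check character for nine digits (weights 10 down to 2)."""
    total = sum(int(ch) * weight for ch, weight in zip(first_nine, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def classify_isbn10(isbn):
    """Classify a 10-character string using the same three messages as above."""
    body, last = isbn[:9], isbn[9:].upper()
    if len(isbn) != 10 or not (body.isascii() and body.isdigit()):
        return "invalid content"
    if last != isbn10_check_digit(body):
        return "invalid checksum"
    return "valid ISBN"


for sample in ["1856266532", "159486781x", "1B56266532", "159486781Z", "1856266537"]:
    print(sample, classify_isbn10(sample))
```

As in the COBOL version, a trailing character that is neither a digit nor X is reported as an invalid checksum rather than as invalid content.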

How to replace entries with smaller values while keeping order?

What is an efficient algorithm to replace the values in an image while
minimizing the largest value and maintaining order?
Background
I have an 8.5 GB image which is represented as rows and columns.
Suppose we have a smaller version (there are no duplicates in input):
4, 5, 9,
2, 3, 7,
8, 6, 1
I need to replace the entry at each pixel with the smallest possible positive value (greater than zero) in the entire matrix
while preserving the row-wise and column-wise ordering.
One possible output (duplicates allowed here) is the following, where the maximum value is 5 (I do not believe we can reduce it to 4):
2, 3, 4,
1, 2, 3,
5, 4, 1
The reason it works:
Input: first row: 4 < 5 < 9 and first column: 4 > 2 < 8
Output: first row: 2 < 3 < 4 and first column: 2 > 1 < 5
The orderings are being maintained. The same for the other rows and columns:
5 > 3 < 6 <=> 3 > 2 < 4
...
...
----------------------------------------- Attempt: My wrong algorithm -----------------------------------------
1. Each row and column will contain unique elements. So start with the first row and assign integers from the range {1, ..., total number of rows}:
1 2 3
x x x
x x x
The maximum in that row is currently at 3.
2. Go to the next row, which is 2,3,7, and again assign numbers in the range {1, ..., total number of rows}. When we assign 1, we check all the previous rows for conflicts. In this case 1 is already present in the previous row, and we need a number which is smaller than 1. So place a zero there (I will offset every entry by one later).
1 2 3
0 1 2
* * *
The maximum in that row is currently 2.
3. Go to the next row and again fill as above. But 1 already occurred before and we need a number larger than the first and second rows:
So, try 2. The next number needs to be larger than 2 and 1 (column) and smaller than 2 (row). That is a huge problem. I need to change too many cells each time.
For severe clarity, I'll add 10 to each of your values.
Input Ordering
14 15 19 - - -
12 13 17 - - -
18 16 11 - - -
Consider each of the values in order, smallest to largest. Each element receives an ordering value that is the smallest integer available at that location. "Available" means that the assigned number is larger than any already assigned in the same row or column.
11 and 12 aren't in the same row or column, so we can assign both of those immediately.
Input Ordering
14 15 19 - - -
12 13 17 1 - -
18 16 11 - - 1
When we consider 13, we see that it is in the same row with a 1, so it must have the next larger value:
Input Ordering
14 15 19 - - -
12 13 17 1 2 -
18 16 11 - - 1
14 has the same problem, being above a 1:
Input Ordering
14 15 19 2 - -
12 13 17 1 2 -
18 16 11 - - 1
Continue this process for each number. Take the maximum of the orderings in that number's row and column. Add 1 and assign that ordering.
Input Ordering
14 15 19 2 3 -
12 13 17 1 2 -
18 16 11 - 4 1
Input Ordering
14 15 19 2 3 4
12 13 17 1 2 3
18 16 11 5 4 1
There's a solution. The "dominance" path 18 > 16 > 15 > [14 or 13] > 12 demonstrates that 5 is the lowest max value.
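The procedure above (visit cells from smallest to largest, assign one more than the largest ordering already present in that cell's row or column) can be sketched in Python. The function name `compress_preserving_order` is illustrative, and the sketch assumes the whole matrix fits in memory and contains no duplicate values, as stated in the question.

```python
def compress_preserving_order(matrix):
    """Assign each cell 1 + max(ordering already used in its row or column)."""
    rows, cols = len(matrix), len(matrix[0])
    row_max = [0] * rows  # largest ordering assigned so far in each row
    col_max = [0] * cols  # largest ordering assigned so far in each column
    result = [[0] * cols for _ in range(rows)]
    # Visit cells from the smallest input value to the largest.
    cells = sorted((matrix[r][c], r, c) for r in range(rows) for c in range(cols))
    for _, r, c in cells:
        rank = max(row_max[r], col_max[c]) + 1
        result[r][c] = rank
        row_max[r] = col_max[c] = rank  # rank exceeds both previous maxima
    return result


example = [[14, 15, 19],
           [12, 13, 17],
           [18, 16, 11]]
print(compress_preserving_order(example))
# [[2, 3, 4], [1, 2, 3], [5, 4, 1]]
```

The sort dominates the cost, so this runs in O(N log N) time for N cells with O(rows + cols) extra state beyond the output.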
You can also solve this by converting the locations to a directed graph. Nodes in the same row or column have an edge connecting them; the edge is directed from the smaller to the larger. It will be sufficient to order the values and merely connect the adjacent values: given 14->15 and 15->19, we don't need 14->19 as well.
Add a node 0 with label 0 and an edge to each node that has no other input edges.
Now follow a typical labeling iteration: any node with all its inputs labeled receives a label that is one more than the largest of its inputs.
This is the same algorithm as the above, but the correctness and minimalism are much easier to see.
14 -> 15 -> 19
12 -> 13 -> 17
11 -> 16 -> 18
12 -> 14 -> 18
13 -> 15 -> 16
11 -> 17 -> 19
0 -> 11
0 -> 12
Now, if we shake out the topology of this, starting on the left, we get:
0   11  13  17
    12  14  15  16  18
                19
This makes the numbering obvious: each node is labeled with the length of its longest path from the start node.
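A minimal Python sketch of the graph formulation follows, assuming distinct values and an in-memory matrix. The name `label_by_longest_path` is illustrative. Instead of materialising node 0, every node starts with label 1, which is what an edge from a node labelled 0 would give it.

```python
from collections import defaultdict, deque


def label_by_longest_path(matrix):
    """Label each value with the length of its longest path from a virtual start node."""
    rows, cols = len(matrix), len(matrix[0])
    successors = defaultdict(list)
    indegree = {value: 0 for row in matrix for value in row}
    # Every row and every column contributes a chain of adjacent sorted values.
    lines = [list(row) for row in matrix]
    lines += [[matrix[r][c] for r in range(rows)] for c in range(cols)]
    for line in lines:
        ordered = sorted(line)
        for small, large in zip(ordered, ordered[1:]):
            successors[small].append(large)
            indegree[large] += 1
    label = {value: 1 for value in indegree}
    ready = deque(value for value, degree in indegree.items() if degree == 0)
    while ready:  # a node is processed only once all of its inputs are labelled
        node = ready.popleft()
        for nxt in successors[node]:
            label[nxt] = max(label[nxt], label[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return [[label[value] for value in row] for row in matrix]


grid = [[14, 15, 19],
        [12, 13, 17],
        [18, 16, 11]]
print(label_by_longest_path(grid))
# [[2, 3, 4], [1, 2, 3], [5, 4, 1]]
```

Keying nodes by value only works because the input has no duplicates; with repeated values the nodes would need to be keyed by position instead.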
Your memory problem should be edited into your question proposal, or given as a new question. You have non-trivial dependencies along rows and columns. If your data do not fit into memory, then you may want to make a disk-hosted data base to store your pre-processed data. For instance, you could store the graph as a list of edges keyed by dependencies:
11 none
12 none
13 12
14 12
15 13, 14
16 11, 15
17 11, 13
18 14, 16
19 15, 17
You haven't described the shape of your data. At the very worst, you should be able to build this graph data base with one pass to do the rows, and then one pass per column -- or multiple columns in each pass, depending on how many you can fit into memory at once.
Then you can apply the algorithm to the items in the data base. You can speed it up if you keep in memory not only all nodes with no dependencies, but also another list of nodes with few dependencies -- "few" being dependent on your memory availability.
For instance, make one pass over the data base to grab every cell with 0 or 1 dependencies. Put the independent nodes in your "active" list; as you process those, add nodes only from the "1-dependency" list as they're freed up. Once you've exhausted those sub-graphs, then make a large pass to (1) update the data base; (2) extract the next sets of nodes with 0 or 1 dependency.
Let's look at this with the example you gave. First, we make a couple of lists from the original graph:
0-dep 11, 12
1-dep 13 (on 12), 14 (on 12)
This pass is trivial: we assign 1 to cells 11 and 12; 2 to cells 13 and 14. Now update the graph:
node  dep     done (assigned values)
15    none    2, 2
16    15      1
17    none    1, 2
18    16      2
19    15, 17
Refresh the in-memory lists:
0-dep 15, 17
1-dep 16 (on 15), 18 (on 16)
On this pass, both 15 and 17 depend on a node with value 2, so they are both assigned 3. Resolving 15 frees node 16, which gets value 4. This, in turn, frees up node 18, which gets the value 5.
In one final pass, we now have node 19 with no outstanding dependencies. Its maximum upstream value is 3, so it gets the value 4.
In the worst case -- you can't even hold all independent nodes in memory at once -- you can still grab as many as you can fit, assign their values in an in-memory pass, and return to the disk for more to process.
Can you handle the data manipulations from here?

Finding cumulative sum of MAX values

I need to calculate the cumulative sum of Max value per period (or per category). See the embedded image.
So, first, I need to find max value for each category/month per year. Then I want to calculate the cumulative SUM of these max values. I tried it by setting up max measure (which works fine for the first step - finding max per category/month for a given year) but then I fail at finding a solution to finding cumulative SUM (finding the cumulative Max is easy, but it is not what I'm looking for).
Table1
Year Month MonthlyValue MaxPerYear
2016 Jan 10 15
2016 Feb 15 15
2016 Mar 12 15
2017 Jan 22 22
2017 Feb 19 22
2017 Mar 12 22
2018 Jan 5 17
2018 Feb 16 17
2018 Mar 17 17
Desired Output
Year CumSum
2016 15
2017 37
2018 54
This is a bit similar to this question and this question and this question as far as subtotaling, but also includes a cumulative component as well.
You can do this in two steps. First, calculate a table that gives the max for each year and then use a cumulative total pattern.
CumSum =
VAR Summary =
    SUMMARIZE(
        ALLSELECTED(Table1),
        Table1[Year],
        "Max",
        MAX(Table1[MonthlyValue])
    )
RETURN
    SUMX(
        FILTER(
            Summary,
            Table1[Year] <= MAX(Table1[Year])
        ),
        [Max]
    )
Here's the output:
If you expand to the month level, then it looks like this:
Note that if you only need the subtotal to work leaving each row as a max (15, 22, 17, 54) rather than as a cumulative sum of maxes (15, 37, 54, 54), then you can use a simpler approach:
MaxSum =
SUMX(
    VALUES( Table1[Year] ),
    CALCULATE( MAX( Table1[MonthlyValue] ) )
)
This calculates the max for each year separately and then adds them together.
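To make the arithmetic behind both measures concrete, here is a small Python sketch of the same two steps (max per year, then a running total) on the sample data from Table1. It is not DAX and does not model filter context; the function name `cumulative_yearly_max` is illustrative.

```python
from itertools import accumulate


def cumulative_yearly_max(rows):
    """Map each year to the running sum of per-year maxima up to that year."""
    yearly_max = {}
    for year, value in rows:
        yearly_max[year] = max(value, yearly_max.get(year, value))
    years = sorted(yearly_max)
    running = accumulate(yearly_max[year] for year in years)
    return dict(zip(years, running))


table1 = [
    (2016, 10), (2016, 15), (2016, 12),
    (2017, 22), (2017, 19), (2017, 12),
    (2018, 5), (2018, 16), (2018, 17),
]
print(cumulative_yearly_max(table1))
# {2016: 15, 2017: 37, 2018: 54}
```

The last running value, 54, is also what the simpler MaxSum approach gives as its grand total.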
External References:
Subtotals and Grand Totals That Add Up “Correctly”
Cumulative Total - DAX Patterns

Print the row and column wise sorted 2 D matrix in a sorted order

Given an n x n matrix, where every row and column is sorted in non-decreasing order. Print all elements of matrix in sorted order.
Example:
Input:
mat[][] = { {10, 20, 30, 40},
{15, 25, 35, 45},
{27, 29, 37, 48},
{32, 33, 39, 50},
};
Output:
(Elements of matrix in sorted order)
10 15 20 25 27 29 30 32 33 35 37 39 40 45 48 50
I am unable to figure out how to do this. One option is to copy the 2D matrix into a single one-dimensional array and apply a sort function, but I need space-optimized code.
Using a Heap would be a good idea here.
Please refer to the following for a very similar question:
http://www.geeksforgeeks.org/kth-smallest-element-in-a-row-wise-and-column-wise-sorted-2d-array-set-1/
Though the problem in the link above is different, the same approach can be used for the problem you specify. Instead of looping k times as the link explains, you need to visit all elements in the matrix, i.e. you should loop until the heap is empty.
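A minimal Python sketch of the heap idea, using the standard `heapq` module: seed the heap with the first element of every row, then repeatedly pop the smallest element and push its right-hand neighbour. The generator name `sorted_elements` is illustrative. Only the sortedness of the rows is used here; the heap never holds more than one entry per row, so the extra space is O(n).

```python
import heapq


def sorted_elements(matrix):
    """Yield all elements of a row-wise sorted matrix in non-decreasing order."""
    # Each heap entry is (value, row index, column index).
    heap = [(row[0], r, 0) for r, row in enumerate(matrix) if row]
    heapq.heapify(heap)
    while heap:  # loop until the heap is empty, i.e. every element was visited
        value, r, c = heapq.heappop(heap)
        yield value
        if c + 1 < len(matrix[r]):
            heapq.heappush(heap, (matrix[r][c + 1], r, c + 1))


mat = [[10, 20, 30, 40],
       [15, 25, 35, 45],
       [27, 29, 37, 48],
       [32, 33, 39, 50]]
print(*sorted_elements(mat))
# 10 15 20 25 27 29 30 32 33 35 37 39 40 45 48 50
```

Each of the n*n elements is pushed and popped once, so the running time is O(n^2 log n).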

Summing by Column

Suppose we have the following columns:
X Y Z
Category Date Amount
A January 10
A February 20
A March 30
B January 34
B February 45
B March 65
C January 87
C February 98
C March 100
D January 80
D February 90
I want to sum the Amount column by Category and Date. So for Category A, the sum of the amounts would be 10+20+30 = 60 for the dates between January and March. In Oracle BI, how would we do this? Note that some categories might have missing dates, so I want to sum the Amounts for only the available dates between January and March. Category D, for example, has March missing, so the total amount would be 80+90 = 170.
When I do the following, I just get the sum of all the amounts:
sum("Z"."Amount")
If the required result has to be achieved through OBIEE Answer, then it can be done in following way.
Create a table with columns - Category, Date, Amount.
Go to Results tab. Edit view of the table.
Click on Total By icon above Category column. Both After and Report-Based Total (when applicable) should be ticked.
The result will appear as shown.
Category Date Amount
A January 10
February 20
March 30
A Total 60
B January 34
February 45
March 65
B Total 144
C January 87
February 98
March 100
C Total 285
D January 80
February 90
D Total 170
You can do this quite simply by editing the column formula from within the Criteria. When you look at it to begin, your Amount column formula probably looks something like "Z"."Amount". You can edit this slightly to change the aggregation level:
sum("Z"."Amount" by "X"."Category")
That should give you something like:
Category Date Amount
A Jan 60
A Feb 60
A Mar 60
B Jan 144
B Feb 144
B Mar 144
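As a cross-check on the totals, here is a short Python sketch (outside OBIEE, purely for illustration) that computes the per-category sum and repeats it on every row, which is the shape the `by` formula produces. The function name `totals_by_category` is an assumption of this sketch, and the data is the sample from the question.

```python
def totals_by_category(rows):
    """Sum amounts per category over whatever dates are present."""
    totals = {}
    for category, _date, amount in rows:
        totals[category] = totals.get(category, 0) + amount
    return totals


data = [
    ("A", "January", 10), ("A", "February", 20), ("A", "March", 30),
    ("B", "January", 34), ("B", "February", 45), ("B", "March", 65),
    ("C", "January", 87), ("C", "February", 98), ("C", "March", 100),
    ("D", "January", 80), ("D", "February", 90),  # March is missing for D
]
totals = totals_by_category(data)
for category, date, _amount in data:
    # Every row of a category shows that category's total.
    print(category, date, totals[category])
```

Missing dates need no special handling: category D simply sums the two rows it has, giving 170.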

Resources