Exact orthogonalization of vectors in Wolfram - wolfram-mathematica

What I have is a matrix, and I need to orthogonalize its eigenvectors.
That is basically all I need, but in exact form.
Here is my Wolfram input:
(orthogonolize(eigenvectors({{146, 112, 78, 17, 122}, {112, 86, 60, 13, 94}, {78, 60, 42 , 9, 66}, {17, 13, 9, 2, 14}, {122, 94, 66, 14, 104}})))
That gives me floating-point numbers, while I need the exact forms.
Is there any way to fix this?

In Wolfram Mathematica (not WolframAlpha, which is a completely different product with different rules and different results), this input
FullSimplify[Orthogonalize[Eigenvectors[{
{146, 112, 78, 17, 122}, {112, 86, 60, 13, 94}, {78, 60, 42 , 9, 66},
{17, 13, 9, 2, 14}, {122, 94, 66, 14, 104}}]]]
returns this exact form
{{Sqrt[121/342 + 52/(9*Sqrt[35587])], Sqrt[5/38 + 18/Sqrt[35587]],
Sqrt[25/342 + 64/(9*Sqrt[35587])], Sqrt[7/38 - 26/Sqrt[35587]]/3,
2*Sqrt[2/19 - 7/Sqrt[35587]]},
{-1/3*Sqrt[121/38 - 52/Sqrt[35587]], -Sqrt[5/38 - 18/Sqrt[35587]],
Sqrt[25/38 - 64/Sqrt[35587]]/3, -1/3*Sqrt[7/38 + 26/Sqrt[35587]],
Sqrt[8/19 + 28/Sqrt[35587]]},
{3/Sqrt[35], -Sqrt[5/7], 0, 0, 1/Sqrt[35]},
{-11/Sqrt[5110], -Sqrt[5/1022], 0, Sqrt[70/73], 4*Sqrt[2/2555]},
{-17/(3*Sqrt[2774]), -7/Sqrt[2774], Sqrt[146/19]/3, Sqrt[2/1387]/3, -9*Sqrt[2/1387]}}
Think of at least two different ways you can check that for correctness before you depend on that.
The last three of those can be simplified somewhat
1/Sqrt[35]*{3,-5,0,0,1},
1/Sqrt[5110]*{-11,-5,0,70,8},
1/(3*Sqrt[2774])*{-17,-21,146,2,-54}
but I cannot yet see a way to simplify the first two to a third of their current size. Can anyone else see a way to do that? Please check these results very carefully.
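One possible way to check the three simplified vectors, sketched here in Python with plain integer arithmetic so that nothing is rounded. The helper names (`dot`, `mat_vec`) are illustrative only. The sketch checks three things: each integer vector is mapped to zero by the matrix (so it is an eigenvector for eigenvalue 0), the vectors are pairwise orthogonal, and each squared length matches the square of the scale factor in front of it. It does not check the first two vectors, which contain nested square roots.

```python
# Exact integer checks for the three simplified vectors.
M = [
    [146, 112, 78, 17, 122],
    [112, 86, 60, 13, 94],
    [78, 60, 42, 9, 66],
    [17, 13, 9, 2, 14],
    [122, 94, 66, 14, 104],
]

# Each entry: (integer vector, squared denominator of its scale factor).
candidates = [
    ([3, -5, 0, 0, 1], 35),                    # 1/Sqrt[35]
    ([-11, -5, 0, 70, 8], 5110),               # 1/Sqrt[5110]
    ([-17, -21, 146, 2, -54], 9 * 2774),       # 1/(3*Sqrt[2774])
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(matrix, v):
    return [dot(row, v) for row in matrix]

# Check 1: M.v is the zero vector, so v is an eigenvector for eigenvalue 0.
in_null_space = [mat_vec(M, v) == [0] * 5 for v, _ in candidates]

# Check 2: the vectors are pairwise orthogonal.
pairwise_dots = [
    dot(candidates[i][0], candidates[j][0])
    for i in range(3) for j in range(i + 1, 3)
]

# Check 3: squared length equals the squared scale factor, so the
# scaled vector has length exactly 1.
unit_length = [dot(v, v) == d for v, d in candidates]

print(in_null_space, pairwise_dots, unit_length)
```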

Related

Problem with Most Significant Digit (Radix Sort)

You see, I'm doing an analysis with a default array of both the Least Significant Digit (LSD) implementation of Radix Sort, and the Most Significant Digit (MSD) implementation.
This is the default array:
{98, 113, 32, 18, 5, 77, 248, 7}
But, here I implemented LSD:
{32, 113, 5, 77, 7, 98, 18, 248}
{5, 7, 113, 18, 32, 248, 77, 98}
{5, 7, 18, 32, 77, 98, 113, 248} Done! Right?
And here I implemented MSD:
{98, 32, 18, 5, 77, 7, 113, 248}
{5, 7, 18, 113, 32, 248, 77, 98}
{32, 113, 5, 7, 77, 18, 248, 98} Why?
Do you know why, in the case of MSD, I can't sort the array with queues, since in theory the implementation has already examined each digit and the result should be the same? I've read somewhere that it's because MSD requires recursion, but I'm not entirely sure and would love to hear other opinions on this.
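A minimal Python sketch of the difference, assuming non-negative integers of at most three decimal digits; the function names are illustrative. A stable pass over the whole array works for LSD because each later pass sorts on a more significant digit and stability preserves the earlier order among ties. Running the same whole-array passes from the most significant digit downwards lets the last (least significant) pass override everything before it, which reproduces the unsorted result above. The recursive MSD variant avoids this by sorting each bucket on the next digit separately.

```python
def digit(value, place):
    """Decimal digit of value at the given place (1, 10, 100, ...)."""
    return (value // place) % 10

def stable_pass(values, place):
    """One stable bucket (queue) pass over the whole list on one digit."""
    buckets = [[] for _ in range(10)]
    for v in values:
        buckets[digit(v, place)].append(v)
    return [v for bucket in buckets for v in bucket]

def lsd_radix_sort(values, places=(1, 10, 100)):
    """LSD: whole-array passes from least to most significant digit."""
    for place in places:
        values = stable_pass(values, place)
    return values

def naive_msd_passes(values, places=(100, 10, 1)):
    """Whole-array passes from most to least significant digit (does NOT sort)."""
    for place in places:
        values = stable_pass(values, place)
    return values

def msd_radix_sort(values, place=100):
    """MSD: bucket on the current digit, then recurse inside each bucket."""
    if len(values) <= 1 or place == 0:
        return list(values)
    buckets = [[] for _ in range(10)]
    for v in values:
        buckets[digit(v, place)].append(v)
    result = []
    for bucket in buckets:
        result.extend(msd_radix_sort(bucket, place // 10))
    return result

data = [98, 113, 32, 18, 5, 77, 248, 7]
print(lsd_radix_sort(data))
print(naive_msd_passes(data))
print(msd_radix_sort(data))
```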

Tackling the 'Small Data' Problem with Distributed Computing Cluster?

I'm learning about Hadoop + MapReduce and Big Data and from my understanding it seems that the Hadoop ecosystem was mainly designed to analyze large amounts of data that's distributed on many servers. My problem is a bit different.
I have a relatively small amount of data (a file consisting of 1-10 million lines of numbers) which needs to be analyzed in millions of different ways. For example, consider the following dataset:
[1, 6, 7, 8, 10, 17, 19, 23, 27, 28, 28, 29, 29, 29, 29, 30, 32, 35, 36, 38]
[1, 3, 3, 4, 4, 5, 5, 10, 11, 12, 14, 16, 17, 18, 18, 20, 27, 28, 39, 40]
[2, 3, 7, 8, 10, 10, 12, 13, 14, 15, 15, 16, 17, 19, 27, 30, 32, 33, 34, 40]
[1, 9, 11, 13, 14, 15, 17, 17, 18, 18, 18, 19, 19, 23, 25, 26, 27, 31, 37, 39]
[5, 8, 8, 10, 14, 16, 16, 17, 20, 21, 22, 22, 23, 28, 29, 30, 32, 32, 33, 38]
[1, 1, 3, 3, 13, 17, 21, 24, 24, 25, 26, 26, 30, 31, 32, 35, 38, 39, 39, 39]
[1, 2, 4, 4, 5, 5, 10, 13, 14, 14, 14, 14, 15, 17, 28, 29, 29, 35, 37, 40]
[1, 2, 6, 8, 12, 13, 14, 15, 15, 15, 16, 22, 23, 24, 26, 30, 31, 36, 36, 40]
[3, 6, 7, 8, 8, 10, 10, 12, 13, 17, 17, 20, 21, 22, 33, 35, 35, 36, 39, 40]
[1, 3, 8, 8, 11, 11, 13, 18, 19, 19, 19, 23, 24, 25, 27, 33, 35, 37, 38, 40]
I need to analyze how frequently the number in each column (Column N) repeats itself a certain number of rows later (L rows later). For example, if we were analyzing Column A with 1L (1-Row-Later), the result would be as follows:
Note: The position does not need to match - so number can appear anywhere in the next row
Column: A N-Later: 1 Result: YES, NO, NO, NO, NO, YES, YES, NO, YES -> 4/9.
We would repeat the above analysis for each column separately and for the maximum number of N-later offsets. In the above dataset, which only consists of 10 lines, that means a maximum of 9 N-later. But in a dataset of 1 million lines, the analysis (for each column) would be repeated 999,999 times.
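For reference, the analysis described above can be sketched in a few lines of Python (the question is about Java, so this is only an illustration, and `repeat_rate` is a made-up name). It counts, for one column and one offset, how often the value in that column appears anywhere in the row `lag` rows later.

```python
rows = [
    [1, 6, 7, 8, 10, 17, 19, 23, 27, 28, 28, 29, 29, 29, 29, 30, 32, 35, 36, 38],
    [1, 3, 3, 4, 4, 5, 5, 10, 11, 12, 14, 16, 17, 18, 18, 20, 27, 28, 39, 40],
    [2, 3, 7, 8, 10, 10, 12, 13, 14, 15, 15, 16, 17, 19, 27, 30, 32, 33, 34, 40],
    [1, 9, 11, 13, 14, 15, 17, 17, 18, 18, 18, 19, 19, 23, 25, 26, 27, 31, 37, 39],
    [5, 8, 8, 10, 14, 16, 16, 17, 20, 21, 22, 22, 23, 28, 29, 30, 32, 32, 33, 38],
    [1, 1, 3, 3, 13, 17, 21, 24, 24, 25, 26, 26, 30, 31, 32, 35, 38, 39, 39, 39],
    [1, 2, 4, 4, 5, 5, 10, 13, 14, 14, 14, 14, 15, 17, 28, 29, 29, 35, 37, 40],
    [1, 2, 6, 8, 12, 13, 14, 15, 15, 15, 16, 22, 23, 24, 26, 30, 31, 36, 36, 40],
    [3, 6, 7, 8, 8, 10, 10, 12, 13, 17, 17, 20, 21, 22, 33, 35, 35, 36, 39, 40],
    [1, 3, 8, 8, 11, 11, 13, 18, 19, 19, 19, 23, 24, 25, 27, 33, 35, 37, 38, 40],
]

def repeat_rate(rows, column, lag):
    """Return (hits, comparisons) for one column and one N-later offset.

    A hit means the value at rows[i][column] appears anywhere in rows[i + lag].
    """
    row_sets = [set(row) for row in rows]   # position does not need to match
    comparisons = len(rows) - lag
    hits = sum(
        1 for i in range(comparisons)
        if rows[i][column] in row_sets[i + lag]
    )
    return hits, comparisons

# Column A is index 0; 1L means lag = 1.
print(repeat_rate(rows, column=0, lag=1))
```

Each `(column, lag)` pair is an independent call, which is what makes the job easy to spread over workers once every worker holds a copy of `rows`.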
I looked into the MapReduce framework but it doesn't seem to cut it; it doesn't seem like an efficient solution for this problem and it requires a great deal of work to convert the core code into a MapReduce friendly structure.
As you can see in the above example, each analysis is independent of the others. For example, it is possible to analyze Column A separately from Column B. It is also possible to perform the 1L analysis separately from 2L and so on. However, unlike Hadoop, where the data lives on separate machines, in our scenario each server needs access to all of the data to perform its "share" of the analysis.
I looked into possible solutions for this problem and it seems there are very few options: Ray or building a custom application on top of YARN using Apache Twill. Apache Twill was moved to the Attic in 2020 which means that Ray is the only available option.
Is Ray the best way to tackle this problem or are there other, better options? Ideally, the solution should automatically handle fail over and distribute the processing load intelligently. For example, in the above example, if we wanted to distribute the load to 20 machines, one way of doing so would be to divide 999,999 N Later by 20 and let Machine A analyze 1L-49999L, Machine B from 50000L - 100000L and so on. However, when you think about it, the load isn't being distributed equally - as it takes much longer to analyze 1L vs. 500000L as the latter contains only about half the number of rows (for 500000L the first row we are analyzing is row 500001 so we are essentially omitting the first 500K rows from analysis).
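As a rough illustration of the load-balancing point, here is a Python sketch (not tied to Ray or any framework; `balanced_lag_ranges` is a hypothetical helper) that splits the N-later offsets into contiguous ranges of roughly equal work. It assumes the cost of analyzing offset L is proportional to the number of row pairs compared, that is `n_rows - L`, so the ranges for small offsets come out shorter than those for large offsets.

```python
def balanced_lag_ranges(n_rows, n_workers):
    """Split lags 1..n_rows-1 into contiguous (start, end) ranges.

    Assumption: analyzing lag L costs (n_rows - L) row comparisons.
    A range is closed as soon as the cumulative cost reaches the next
    worker's share of the total.
    """
    total = n_rows * (n_rows - 1) // 2      # sum of (n_rows - L) over all lags
    ranges = []
    start, done, worker = 1, 0, 1
    for lag in range(1, n_rows):
        done += n_rows - lag
        if done * n_workers >= total * worker:
            ranges.append((start, lag))
            start = lag + 1
            worker += 1
    return ranges

# Tiny example: 11 rows, 2 workers.
print(balanced_lag_ranges(11, 2))
```

With very few rows the split is coarse, but for a file with a million rows the per-range totals end up close to equal.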
It should also not require a great deal of modification to the core code (like MapReduce does).
I'm working with Java.
Thanks
Well, you are right: your scenario and your technology stack are not that well suited to each other. Which raises the question: why not add something more relevant to your current technology stack? For instance, Redis.
It seems that your common action is mainly counting values, and you want to avoid repeated calculation and make it more performant (e.g. by properly indexing your data). Given that this is one of the main features of Redis, it sounds logical to use it as a processor.
My suggestion:
Create a hashmap that uses the numeric value as key and its count as value. This way - you will be able to pull different calculations over those metrics and always iterate your data-set once. Afterwards - you just need to pull the data from Redis by different criteria (per calculation or metric).
From this point, it's easy to save your calculated data back to your database and make it ready for direct querying. The overall process may be similar to this:
Scan data from file
Properly index it to redis (using hashmap)
Make desired calculations (over the indexed count)
Save it in your DB (as a digested data-set)
Flush Redis DB
Query your DB (as much as you like)
Follow the docs for both populating and retrieving data

Looking for an algorithm for a perfect "snake" from the center of a field?

I'm looking for a piece of code:
Starting from the middle, it should move in a "circle" slowly out to the edges of a rectangle. When it reaches the boundary on one side, it should just skip those pixels.
I have already tried some crazy for-loop adventures, but that was too much code.
Does anyone have an idea for a simple/ingenious way?
It's like starting the game Snake from the center and continuing until the full field is used. I'll use this to scan a picture (from the middle, to find the first pixel next to the center in another color).
Maybe a picture could describe it better:
From this link (requires NumPy and Python, of course):
import numpy as np
a = np.arange(7*7).reshape(7,7)
def spiral_ccw(A):
    A = np.array(A)
    out = []
    while(A.size):
        out.append(A[0][::-1])    # first row reversed
        A = A[1:][::-1].T         # cut off first row and rotate clockwise
    return np.concatenate(out)
def base_spiral(nrow, ncol):
    return spiral_ccw(np.arange(nrow*ncol).reshape(nrow, ncol))[::-1]
def to_spiral(A):
    A = np.array(A)
    B = np.empty_like(A)
    B.flat[base_spiral(*A.shape)] = A.flat
    return B
to_spiral(a)
array([[42, 43, 44, 45, 46, 47, 48],
       [41, 20, 21, 22, 23, 24, 25],
       [40, 19,  6,  7,  8,  9, 26],
       [39, 18,  5,  0,  1, 10, 27],
       [38, 17,  4,  3,  2, 11, 28],
       [37, 16, 15, 14, 13, 12, 29],
       [36, 35, 34, 33, 32, 31, 30]])
What do you think about running from the edge to the center instead? It is really easy to code: just start from (0;0), and whenever you hit an edge or a pixel already visited, turn right by 90°.
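The edge-to-center idea can be sketched in plain Python as follows. `inward_spiral` is an illustrative name, and coordinates are taken as (row, column) with rows growing downwards, which is an assumption about the picture layout.

```python
def inward_spiral(n_rows, n_cols):
    """Visit every cell once, starting at (0, 0) and spiralling inwards.

    The walker keeps going straight and turns right by 90 degrees whenever
    the next cell would be outside the field or has already been visited.
    """
    visited = [[False] * n_cols for _ in range(n_rows)]
    d_row, d_col = 0, 1          # start by moving right along the top edge
    row = col = 0
    path = []
    for _ in range(n_rows * n_cols):
        path.append((row, col))
        visited[row][col] = True
        next_row, next_col = row + d_row, col + d_col
        blocked = (
            not (0 <= next_row < n_rows and 0 <= next_col < n_cols)
            or visited[next_row][next_col]
        )
        if blocked:
            # right -> down -> left -> up -> right
            d_row, d_col = d_col, -d_row
            next_row, next_col = row + d_row, col + d_col
        row, col = next_row, next_col
    return path

print(inward_spiral(3, 3))
```

Reversing the returned list gives an order that ends at (0, 0). For a square field with odd side length that reversed order starts exactly at the center; for other shapes it starts near the middle rather than at a single center pixel.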

Search algorithm with best Time Complexity [duplicate]

This question already has answers here:
How do I search for a number in a 2d array sorted left to right and top to bottom?
(21 answers)
Closed 4 years ago.
Given the following data:
[4]
[5, 8]
[9, 12, 20]
[10, 15, 23, 28]
[14, 19, 31, 36, 48]
[15, 22, 34, 41, 53, 60]
[19, 26, 42, 49, 65, 72, 88]
[20, 29, 45, 54, 70, 79, 95, 104]
[24, 33, 53, 62, 82, 91, 111, 120, 140]
[25, 36, 56, 67, 87, 98, 118, 129, 149, 160]
[29, 40, 64, 75, 99, 110, 134, 145, 169, 180, 204]
[30, 43, 67, 80, 104, 117, 141, 154, 178, 191, 215, 228]
[34, 47, 75, 88, 116, 129, 157, 170, 198, 211, 239, 252, 280]
[35, 50, 78, 93, 121, 136, 164, 179, 207, 222, 250, 265, 293, 308]
[Etc.]
What could be the best searching algorithm with the most optimal Time Complexity for finding a given number?
The rows are sorted
The columns are sorted
A number may occur more than once
Extra info:
Suppose we are looking for the number 26:
Due to order, this means we can eliminate the first 3 rows and the remaining columns to the right.
Due to order, this also means we can ignore every row after row=11.
Which results in this:
[10, 15, 23]
[14, 19, 31]
[15, 22, 34]
[19, 26, 42]
[20, 29, 45]
[24, 33, 53]
[25, 36, 56]
[29, 40, 64]
My current algorithm has a time complexity of O(x log(y)), where x is the number of columns and y is the size of each column searched with binary search.
I'm looking for something faster because I'm dealing with a huge amount of data.
Currently I'm using binary search on every column, but could I use binary search on the rows as well, maybe achieving O(log(x) log(y))?
It can be done in O(x)
Let's call the element we are trying to find n
Start with the bottom left element.
For each element we search through (let's call it e):
if e == n: we found it
if e < n: move to the right
Justification:
Every element in e's column at or above e is at most e, and therefore less than n. That column, like everything to its left, cannot contain n and can be eliminated.
if e > n: move up
Justification:
All elements below e, and all elements to the right of e in its row, are at least e and therefore greater than n, so they can be eliminated. What about the values less than e to the left of e? Can't those be == n? No. For the search to have moved right and have values to its left, those values must already have been eliminated in step 2.
Repeat until n found or index out of bounds in which case such an element does not exist.
Time complexity:
The worst case scenario is if the element isn't in the array and we have an index out of bounds. This occurs at the main diagonal and the total distance to the right and total distance up to any element on the long diagonal always sums to x.
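A short Python sketch of the steps above, assuming a rectangular grid (such as the trimmed sub-array from the question) whose rows and columns are both sorted in ascending order. The name `staircase_search` is illustrative.

```python
def staircase_search(grid, target):
    """Search a row- and column-sorted rectangular grid.

    Starts at the bottom-left element and moves right when the current
    value is too small, up when it is too large. Returns (row, col) of
    one occurrence, or None if the target is absent.
    """
    if not grid or not grid[0]:
        return None
    row, col = len(grid) - 1, 0
    while row >= 0 and col < len(grid[0]):
        value = grid[row][col]
        if value == target:
            return (row, col)
        if value < target:
            col += 1      # this column (at or above the current row) is too small
        else:
            row -= 1      # the rest of this row is too large
    return None

# The trimmed array from the question.
trimmed = [
    [10, 15, 23],
    [14, 19, 31],
    [15, 22, 34],
    [19, 26, 42],
    [20, 29, 45],
    [24, 33, 53],
    [25, 36, 56],
    [29, 40, 64],
]
print(staircase_search(trimmed, 26))
print(staircase_search(trimmed, 27))
```

Every iteration moves either one column right or one row up, so the loop runs at most (rows + columns) times.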
You can find the bottom left of your trimmed array with a binary search of the first column, and the top right with a binary search of the last column of each row.
From there, the problem degenerates to How do I search for a number in a 2d array sorted left to right and top to bottom? which is well-studied in the linked question. The best algorithm is dependent on the shape of the result.

What are some algorithms for finding a closed form function given an integer sequence?

I'm looking for a programmatic way to take an integer sequence and spit out a closed form function. Something like:
Given: 1,3,6,10,15
Return: n(n+1)/2
Samples could be useful; the language is unimportant.
This touches an extremely deep, sophisticated and active area of mathematics. The solution is damn near trivial in some cases (linear recurrences) and damn near impossible in others (think 2, 3, 5, 7, 11, 13, ....) You could start by looking at generating functions for example and looking at Herb Wilf's incredible book (cf. page 1 (2e)) on the subject but that will only get you so far.
But I think your best bet is to give up, query Sloane's comprehensive Encyclopedia of Integer Sequences when you need to know the answer, and instead spend your time reading the opinions of one of the most eccentric personalities in this deep subject.
Anyone who tells you this problem is solvable is selling you snake oil (cf. page 118 of the Wilf book (2e).)
There is no one function in general.
For the sequence you specified, The On-Line Encyclopedia of Integer Sequences finds 133 matches in its database of interesting integer sequences. I've copied the first 5 here.
A000217 Triangular numbers: a(n) = C(n+1,2) = n(n+1)/2 = 0+1+2+...+n.
0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435, 465, 496, 528, 561, 595, 630, 666, 703, 741, 780, 820, 861, 903, 946, 990, 1035, 1081, 1128, 1176, 1225, 1275, 1326, 1378, 1431
A130484 Sum {0<=k<=n, k mod 6} (Partial sums of A010875).
0, 1, 3, 6, 10, 15, 15, 16, 18, 21, 25, 30, 30, 31, 33, 36, 40, 45, 45, 46, 48, 51, 55, 60, 60, 61, 63, 66, 70, 75, 75, 76, 78, 81, 85, 90, 90, 91, 93, 96, 100, 105, 105, 106, 108, 111, 115, 120, 120, 121, 123, 126, 130, 135, 135, 136, 138, 141, 145, 150, 150, 151, 153
A130485 Sum {0<=k<=n, k mod 7} (Partial sums of A010876).
0, 1, 3, 6, 10, 15, 21, 21, 22, 24, 27, 31, 36, 42, 42, 43, 45, 48, 52, 57, 63, 63, 64, 66, 69, 73, 78, 84, 84, 85, 87, 90, 94, 99, 105, 105, 106, 108, 111, 115, 120, 126, 126, 127, 129, 132, 136, 141, 147, 147, 148, 150, 153, 157, 162, 168, 168, 169, 171, 174, 178, 183
A104619 Write the natural numbers in base 16 in a triangle with k digits in the k-th row, as shown below. Sequence gives the leading diagonal.
1, 3, 6, 10, 15, 2, 1, 1, 14, 3, 2, 2, 5, 12, 4, 4, 4, 13, 6, 7, 11, 6, 9, 9, 10, 7, 12, 13, 1, 0, 1, 10, 5, 1, 12, 8, 1, 1, 14, 1, 9, 7, 1, 4, 3, 1, 2, 2, 1, 3, 4, 2, 7, 9, 2, 14, 1, 2, 8, 12, 2, 5, 10, 3, 5, 11, 3, 8, 15, 3, 14, 6, 3, 7, 0, 4, 3, 13, 4, 2, 13, 4, 4, 0, 5, 9, 6, 5, 1, 15, 5, 12, 11, 6
A037123 a(n) = a(n-1) + Sum of digits of n.
0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 46, 48, 51, 55, 60, 66, 73, 81, 90, 100, 102, 105, 109, 114, 120, 127, 135, 144, 154, 165, 168, 172, 177, 183, 190, 198, 207, 217, 228, 240, 244, 249, 255, 262, 270, 279, 289, 300, 312, 325, 330, 336, 343, 351, 360, 370, 381
If you restrict yourself to polynomial functions, this is easy to code up, and only mildly tedious to solve by hand.
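Let $f(x) = c_0 + c_1 x + \dots + c_{n-1} x^{n-1}$, for some unknown $c_0, \dots, c_{n-1}$.
Now solve the equations
$f(1) = a_1, \; f(2) = a_2, \; \dots, \; f(n) = a_n$,
where $a_1, \dots, a_n$ are the given terms, which is simply a system of linear equations.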
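A minimal sketch of the polynomial approach in Python, using exact fractions so the coefficients are not rounded. It assumes the n given terms are the values of a polynomial of degree below n at x = 1, 2, ..., n and solves the resulting linear system by Gauss-Jordan elimination; `fit_polynomial` is an illustrative name.

```python
from fractions import Fraction

def fit_polynomial(terms):
    """Return [c_0, c_1, ..., c_{n-1}] with sum(c_k * x**k) == terms[x - 1]
    for x = 1..n, computed exactly."""
    n = len(terms)
    # Augmented matrix: one row [1, x, x^2, ..., x^(n-1) | term] per x.
    rows = [
        [Fraction(x) ** k for k in range(n)] + [Fraction(t)]
        for x, t in enumerate(terms, start=1)
    ]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

coefficients = fit_polynomial([1, 3, 6, 10, 15])
print(coefficients)   # c_0 .. c_4 for 1, 3, 6, 10, 15
```

For the example sequence the non-zero coefficients are c_1 = c_2 = 1/2, that is n/2 + n^2/2 = n(n+1)/2. A polynomial of degree below n always exists for n points, so a fit on its own proves nothing about the sequence; a low degree relative to the number of terms is the useful signal.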
If your data is guaranteed to be expressible as a polynomial, I think you would be able to use R (or any suite that offers regression fitting of data). If your correlation is exactly 1, then the line is a perfect fit to describe the series.
There's a lot of statistics that goes into regression analysis, and I am not familiar enough with even the basics of calculation to give you much detail.
But, this link to regression analysis in R might be of assistance
The Axiom computer algebra system includes a package for this purpose. You can read its documentation here.
Here's the output for your example sequence in FriCAS (a fork of Axiom):
(3) -> guess([1, 3, 6, 10, 15])
                     2
                    n  + 3n + 2
   (3)  [[function= -----------,order= 0]]
                         2
        Type: List(Record(function: Expression(Integer),order: NonNegativeInteger))
I think your problem is ill-posed. Given any finite number of integers in a sequence with no generating function, the next element can be anything.
You need to assume something about the sequence. Is it geometric? Arithmetic?
If your sequence comes from a polynomial then divided differences will find that polynomial expressed in terms of the Newton basis or binomial basis. See this.
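For equally spaced integer samples, the divided-difference idea reduces to repeated forward differences. The Python sketch below assumes the terms are indexed from n = 0 and expresses the sequence in the binomial basis, a(n) = sum of d_k * C(n, k); the function names are illustrative.

```python
from math import comb

def binomial_basis_coefficients(terms):
    """Leading entries d_0, d_1, ... of the forward-difference table.

    Trailing zeros are dropped, so the length minus one is the degree of
    the interpolating polynomial.
    """
    coefficients = []
    row = list(terms)
    while row:
        coefficients.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]   # next difference row
    while len(coefficients) > 1 and coefficients[-1] == 0:
        coefficients.pop()
    return coefficients

def evaluate(coefficients, n):
    """Evaluate a(n) = sum(d_k * C(n, k)) for a non-negative integer n."""
    return sum(d * comb(n, k) for k, d in enumerate(coefficients))

terms = [1, 3, 6, 10, 15]
d = binomial_basis_coefficients(terms)
print(d)                 # difference-table coefficients
print(evaluate(d, 5))    # extrapolated next term
```

For the example this gives a(n) = 1 + 2*C(n,1) + C(n,2) = (n+1)(n+2)/2, the same triangular numbers with the index shifted by one.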
There is no general answer; a simple method can be implemented by using Padé approximants. In two words: assume your sequence is a sequence of coefficients of the Taylor expansion of an unknown function, then apply an algorithm (similar to the continued-fraction algorithm) in order to "simplify" this Taylor expansion (more precisely: find a rational function very close to the initial, truncated function). The Maxima program can do it: look at "pade" on the page: http://maxima.sourceforge.net/docs/manual/maxima_28.html
Another answer mentions the "guess" package in the FriCAS fork of Axiom (see the previous answer by jmbr). If I am not wrong, this package is itself inspired by the Rate program by Christian Krattenthaler; you can find it here: http://www.mat.univie.ac.at/~kratt/rate/rate.html Maybe looking at its source could tell you about other methods.

Resources