I am trying to convert a matrix to the type that can be received by gensim's AuthorTopicModel, which means I should convert the matrix to a sparse vector. I have already tried several functions in gensim, like gensim.matutils.full2sparse and gensim.matutils.any2sparse, but something goes wrong:
My code:
import numpy
from gensim.matutils import any2sparse
matrix = numpy.array([[1, 0, 1], [0, 1, 1]])
mycorpus = any2sparse(matrix)
print(matrix)
print(mycorpus)
The output:
[[1 0 1]
[0 1 1]]
[(0, 1.0), (0, 1.0), (1, 0.0), (1, 0.0)] #mycorpus
According to the tutorial, mycorpus should look like:
[[(0, 1), (2, 1)],
 [(1, 1), (2, 1)]]
I have no idea what's wrong. I would really appreciate any advice.
The Gensim AuthorTopicModel docs describe its desired corpus-format as iterable of list of (int, float).
Those int values would be word-ids, ideally accompanied by the id2word dict, which identifies which int means which word.
What's the source of your matrix, & do you know if it's the rows or the columns that represent words, and have a mapping of indexes to words? That will drive the conversion.
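The output in the question appears to come from passing a 2-D array to a helper that expects a single 1-D vector. full2sparse finds its ids with NumPy's nonzero and keeps only the first array of indices. On a 2-D array, that first array holds row numbers (0, 0, 1, 1), and the values are then taken from the flattened array at those positions (1, 1, 0, 0). Assuming rows are documents and columns are word ids, the conversion has to happen one row at a time.

Below is a minimal plain-Python sketch of that per-row step; `dense_rows_to_bow` is a hypothetical helper, not part of gensim. Within gensim itself, two options should do the same job:
- applying full2sparse to each row;
- calling gensim.matutils.Dense2Corpus(matrix, documents_columns=False).

```python
def dense_rows_to_bow(matrix, eps=1e-9):
    """Convert each row of a dense matrix (one row per document, one column
    per word id) into a list of (word_id, weight) pairs, skipping ~zero entries.
    Hypothetical helper for illustration; assumes rows are documents."""
    corpus = []
    for row in matrix:
        doc = [(word_id, float(value))
               for word_id, value in enumerate(row)
               if abs(value) > eps]
        corpus.append(doc)
    return corpus


matrix = [[1, 0, 1], [0, 1, 1]]
print(dense_rows_to_bow(matrix))
# [[(0, 1.0), (2, 1.0)], [(1, 1.0), (2, 1.0)]]
```

If it is the columns that represent documents instead, transpose the matrix first.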
Also, as the docs mention: "The model is closely related to LdaModel. The AuthorTopicModel class inherits LdaModel, and its usage is thus similar."
Have you reviewed guides to Gensim LDA usage to see how they prepare their corpus, such as the multiple Usage Examples, to see if that helps suggest steps & necessary formats?
Or, is your corpus still available as texts, so you can directly use the examples there as a model to turn the text into the BoW format (rather than your already-processed matrix)?
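When the raw texts are available, the text-to-BoW step amounts to two things: assign each distinct token an integer id, then count ids per document. In gensim this is normally done with corpora.Dictionary and its doc2bow method. The sketch below only imitates the idea with the standard library, so the id2word mapping it builds is visible. `build_bow_corpus` is a hypothetical name, and lower-casing plus whitespace tokenization are simplifying assumptions.

```python
from collections import Counter


def build_bow_corpus(texts):
    """Assign an integer id to each distinct token (in order of first
    appearance), then turn each text into sorted (word_id, count) pairs.
    Whitespace tokenization is an assumption made for this sketch."""
    token2id = {}
    corpus = []
    for text in texts:
        tokens = text.lower().split()
        for tok in tokens:
            token2id.setdefault(tok, len(token2id))
        counts = Counter(token2id[tok] for tok in tokens)
        corpus.append(sorted((wid, float(c)) for wid, c in counts.items()))
    id2word = {wid: tok for tok, wid in token2id.items()}
    return corpus, id2word


corpus, id2word = build_bow_corpus(["human computer interface",
                                    "computer system computer"])
print(corpus)   # [[(0, 1.0), (1, 1.0), (2, 1.0)], [(1, 2.0), (3, 1.0)]]
print(id2word)  # {0: 'human', 1: 'computer', 2: 'interface', 3: 'system'}
```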
If you're still having problems, you should expand your question text with more details, especially how the true corpus matrix that you have was created, and which errors you've encountered (& how you triggered them) that convince you things aren't working.
I am kind of new to Mathematica, and there are lots of AppendTo calls in my code, which I think take up a lot of time. I know there are other ways to optimize, but I don't know exactly how to apply them. I think getBucketShocks can be improved a lot? Anyone?
getBucketShocks[BucketPivots_, BucketShock_, parallelOffset_: 0] :=
  Module[{shocks, pivotsNb},
    shocks = {};
    pivotsNb = Length[BucketPivots];
    If[pivotsNb > 1,
      (* first bucket: flat at BucketShock up to pivot 1, then down to 0 at pivot 2 *)
      AppendTo[shocks, LinearFunction[{0, BucketShock}, {BucketPivots[[1]], BucketShock}, {BucketPivots[[2]], 0}, {BucketPivots[[2]], 0}, parallelOffset]];
      (* interior buckets: a tent peaking at pivot i *)
      Do[AppendTo[shocks, LinearFunction[{BucketPivots[[i - 1]], 0}, {BucketPivots[[i]], BucketShock}, {BucketPivots[[i + 1]], 0}, {BucketPivots[[i + 1]], 0}, parallelOffset]], {i, 2, pivotsNb - 1}];
      (* last bucket: rises to BucketShock at the final pivot and stays there *)
      AppendTo[shocks, LinearFunction[{BucketPivots[[pivotsNb - 1]], 0}, {BucketPivots[[pivotsNb]], BucketShock}, {BucketPivots[[pivotsNb]], BucketShock}, {BucketPivots[[pivotsNb]], BucketShock}, parallelOffset]],
      (* a single pivot: one constant shock *)
      If[pivotsNb == 1, AppendTo[shocks, BucketShock + parallelOffset &]]
    ];
    shocks];
LinearInterpolation[x_,{x1_,y1_},{x2_,y2_},parallelOffset_:0]:=parallelOffset+y1+(y2-y1)/(x2-x1)*(x-x1);
LinearFunction[p1_,p2_,p3_,p4_,parallelOffset_:0]:=Which[
#<=p1[[1]],parallelOffset+p1[[2]],
#<=p2[[1]],LinearInterpolation[#,p1,p2,parallelOffset],
#<=p3[[1]],LinearInterpolation[#,p2,p3,parallelOffset],
#<=p4[[1]],LinearInterpolation[#,p3,p4,parallelOffset],
#>p4[[1]],parallelOffset+p4[[2]]]&;
I think you can optimize the middle Do loop a lot by using some form of Map. At every iteration, you're accessing 3 adjacent elements of BucketPivots. This seems easiest to do with MovingMap, but you need to jump through a few hoops to get the arguments in the right place. This is probably the easiest solution:
shocks = MovingMap[
LinearFunction[
{#[[1]], 0},
{#[[2]], BucketShock},
{#[[3]], 0},
{#[[3]], 0},
parallelOffset
]&,
BucketPivots,
2
]
As a general principle: if you want to do a Do or For loop in Mathematica that runs over the Length of another list, try to find a way you can do it with a function from the Map family (Map, MapIndexed, MapAt, MapThread, etc.) and get familiar with those. They are great substitutions for iterations!
After this, you can add the first and last elements of shocks with PrependTo and AppendTo, respectively.
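For comparison, here is the same restructuring sketched in Python. It assumes the semantics of LinearFunction above: constant before the first point, linear between consecutive points, constant after the last. It also assumes the pivots are positive and increasing. `linear_function` and `bucket_shocks` are hypothetical translations, not anyone's actual code.
- The interior buckets come from sliding over consecutive triples of pivots, which is the role MovingMap plays in the answer.
- The whole list is assembled in one expression instead of repeated appends.

```python
def linear_function(points, offset=0.0):
    """Piecewise-linear function through `points` (sorted by x): constant
    before the first point and after the last, linear in between.
    Mirrors the Mathematica LinearFunction above."""
    def f(x):
        if x <= points[0][0]:
            return offset + points[0][1]
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            if x <= x2:
                # Reached only when x > x1, so x2 > x1 and the slope is defined.
                return offset + y1 + (y2 - y1) / (x2 - x1) * (x - x1)
        return offset + points[-1][1]
    return f


def bucket_shocks(pivots, shock, offset=0.0):
    """One shock function per pivot, built without repeated appends."""
    n = len(pivots)
    if n == 0:
        return []
    if n == 1:
        return [lambda x: shock + offset]
    first = linear_function(
        [(0, shock), (pivots[0], shock), (pivots[1], 0), (pivots[1], 0)], offset)
    # Interior buckets: slide over consecutive triples (a, b, c) of pivots.
    middle = [
        linear_function([(a, 0), (b, shock), (c, 0), (c, 0)], offset)
        for a, b, c in zip(pivots, pivots[1:], pivots[2:])
    ]
    last = linear_function(
        [(pivots[-2], 0), (pivots[-1], shock),
         (pivots[-1], shock), (pivots[-1], shock)], offset)
    return [first] + middle + [last]


shocks = bucket_shocks([1, 2, 5], 10)
print([round(f(3.5), 6) for f in shocks])  # [0.0, 5.0, 5.0]
```

With a zero offset, the bucket functions add up to the full shock at any x, which is a quick sanity check on the construction.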
BTW: here's a free tip. I recommend that in Mathematica you avoid giving variables and functions names that start with a capital letter (like you did with BucketPivots). All of Mathematica's own symbols start with capitals, so if you avoid starting with them yourself, you'll never clash with a built-in function.
Given some list
numbers = {2,3,5,7,11,13};
How do I translate this to
translatedNumbers = {{1,2},{2,3},{3,5},{4,7},{5,11},{6,13}}
concisely?
I am aware of how to do this using the procedural style of programming as follows:
Module[{lst = {}, numbers = {2, 3, 5, 7, 11, 13}},
Do[AppendTo[lst, {i, numbers[[i]]}], {i, 1, Length@numbers}]; lst]
But this is fairly verbose for what seems to me to be a simple operation. For example, the Haskell equivalent is:
numbers = zip [1..] [2,3,5,7,11,13]
I can't help but think that there is a more concise way of "indexing" a list of numbers in Mathematica.
Potential Answer
Apparently I'm not allowed to answer my own question after having had a lightbulb go off unless I have 100 "rep", so I'll just put my answer here. Let me know if I should do anything differently than I have done.
Well, I'm feeling a little silly now after having asked this. If I treat Mathematica lists as a matrix, I'm able to transpose them. Thus an answer (perhaps not the best) to my question is:
Transpose[{Range@6, {2, 3, 5, 7, 11, 13}}]
Edited to work for arbitrary input lists, I think something like:
With[{lst = {2, 3, 5, 7, 11, 13}}, Transpose[{Range@Length@lst, lst}]]
will work. Could I do any better?
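The trick works because {Range[n], lst} is a 2 × n matrix, and its transpose is the list of {index, value} pairs. Below is a rough Python analogue for comparison with the Haskell zip [1..] version. zip(*rows) plays the role of Transpose, and `index_list` is a hypothetical helper name.

```python
def index_list(lst):
    """Transpose the 2 x n 'matrix' [[1..n], lst] into n pairs [i, lst[i-1]]."""
    rows = [list(range(1, len(lst) + 1)), list(lst)]
    return [list(pair) for pair in zip(*rows)]


print(index_list([2, 3, 5, 7, 11, 13]))
# [[1, 2], [2, 3], [3, 5], [4, 7], [5, 11], [6, 13]]
```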
One thing to consider is whether the transformation unpacks the data. This is important for large data sets.
On["Packing"]
numbers = Developer`ToPackedArray@{2, 3, 5, 7, 11, 13};
This will unpack:
MapIndexed[{First[#2], #1} &, numbers]
This will not:
Transpose[{Range[Length[#]], #}] &[numbers]
Off["Packing"]
I would use MapIndexed instead
MapIndexed[{First[#2], #1} &, numbers]
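MapIndexed calls the function with the element and its position. The position is given as a list (here {i}), which is why the answer uses First[#2]. The hypothetical Python helper below mimics that calling convention with 1-based positions, purely to make the argument order visible.

```python
def map_indexed(f, lst):
    """Call f(element, [i]) with 1-based positions, mimicking how
    MapIndexed passes the index wrapped in a list. Hypothetical helper."""
    return [f(x, [i]) for i, x in enumerate(lst, start=1)]


numbers = [2, 3, 5, 7, 11, 13]
print(map_indexed(lambda x, idx: [idx[0], x], numbers))
# [[1, 2], [2, 3], [3, 5], [4, 7], [5, 11], [6, 13]]
```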
Well, my "solution" is perhaps not as smart as the solution from cobbal, but when I test it with long arrays, it is faster (by a factor of 5!).
I am simply using:
newList = Transpose[{Range[Length[numbers]], numbers}]
AHH! ruebenko posted a similar answer while I was writing my post. Sorry for this almost superfluous post. Well, perhaps it is not so superfluous: I have tested my solution with and without packing, and it is fastest without packing.
I have an issue that I don't know how to solve cleanly. For example, I want to use BaseForm[1/3, 3]. However, this does not do what I intended unless I input BaseForm[1/3., 3]. Given the data in Rational form, how do I turn it into a Real? I tried Apply, but it does not work. (Strange, isn't it? I thought Apply could always be used to change the head.)
For this specific problem, I could have done something like BaseForm[1/3*1., 3], but it really isn't very nice.
Thanks for your help.
BaseForm takes a rational in base 10 to a rational in whatever base you want, so it does what you would expect:
In[1]:= BaseForm[1/3,3]
Out[1]//BaseForm= Subscript[1, 3]/Subscript[10, 3]
And as you pointed out, giving it a Real number can be done like:
In[2]:= BaseForm[1/3.,3]
Out[2]//BaseForm= Subscript[0.1, 3]
The safest way to change things would be to define your own baseForm which is the same as BaseForm except for when it's given rational numbers:
baseForm[r_Rational,b_]:=BaseForm[N[r],b]
Then
In[3]:= baseForm[1/3,3]
Out[3]//BaseForm= Subscript[0.1, 3]
The less safe way (because you don't know what else it might break) is to redefine BaseForm:
Unprotect[BaseForm];
BaseForm[r_Rational, b_] := BaseForm[N[r], b]
Protect[BaseForm];
and then use as normal.
I may be missing the subtlety of your request, but if you always want a real-number output, why not merely use N?
BaseForm[N[1/3], 3]
(* Out= Subscript[0.1, 3], i.e. 0.1 in base 3 *)
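The answers above distinguish two displays. A Rational is shown as numerator/denominator, each written in the target base, so 1/3 becomes 1/10 in base 3. A real number is shown as a positional expansion, so 0.333... becomes 0.1 in base 3. The small Python sketch below reproduces both, using fractions.Fraction.

The helper names are hypothetical, and the sketch makes two simplifications compared with BaseForm's handling of machine reals:
- it handles non-negative values only;
- it truncates non-terminating expansions after `max_digits` digits.

```python
from fractions import Fraction

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def int_to_base(n, base):
    """Render a non-negative integer in `base` (2..36)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def rational_to_base(q, base):
    """Numerator/denominator, each written in `base` (like BaseForm on a Rational)."""
    return f"{int_to_base(q.numerator, base)}/{int_to_base(q.denominator, base)}"


def fraction_expansion(q, base, max_digits=20):
    """Positional expansion of a non-negative Fraction in `base`,
    truncated after `max_digits` fractional digits."""
    whole = q.numerator // q.denominator
    frac = q - whole
    digits = []
    while frac and len(digits) < max_digits:
        frac *= base
        d = frac.numerator // frac.denominator  # next digit
        digits.append(DIGITS[d])
        frac -= d
    s = int_to_base(whole, base)
    return s + "." + "".join(digits) if digits else s


print(rational_to_base(Fraction(1, 3), 3))    # 1/10
print(fraction_expansion(Fraction(1, 3), 3))  # 0.1
```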
OK, imagine I have this matrix: {{1,2},{2,3}}, and I'd rather have {{4,1,2},{5,2,3}}. That is, I want to prepend a column to the matrix. Is there an easy way to do it?
My best proposal is this:
PrependColumn[vector_List, matrix_List] :=
Outer[Prepend[#1, #2] &, matrix, vector, 1]
But it obfuscates the code and constantly requires loading more and more code. Isn't this built in somehow?
Since ArrayFlatten was introduced in Mathematica 6, the least obfuscated solution must be:
matrix = {{1, 2}, {2, 3}}
vector = {{4}, {5}}
ArrayFlatten@{{vector, matrix}}
A nice trick is that replacing any matrix block with 0 gives you a zero block of the right size.
I believe the most common way is to transpose, prepend, and transpose again:
PrependColumn[vector_List, matrix_List] :=
Transpose[Prepend[Transpose[matrix], vector]]
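The transpose, prepend, transpose recipe works because prepending a row to the transposed matrix is the same as prepending a column to the original. A Python sketch of the same three steps follows, with zip(*m) standing in for Transpose. The function names are hypothetical. Note that zip silently truncates ragged input, so rectangular input is assumed.

```python
def transpose(m):
    """Rows become columns; assumes a rectangular list of lists."""
    return [list(col) for col in zip(*m)]


def prepend_column_via_transpose(vector, matrix):
    # Transpose[Prepend[Transpose[matrix], vector]]:
    # columns become rows, prepend the new row, then flip back.
    return transpose([list(vector)] + transpose(matrix))


print(prepend_column_via_transpose([4, 5], [[1, 2], [2, 3]]))
# [[4, 1, 2], [5, 2, 3]]
```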
I think the least obscure way of doing this is:
PrependColumn[vector_List, matrix_List] := MapThread[Prepend, {matrix, vector}];
In general, MapThread is the function you'll use most often for tasks like this one (I use it all the time when adding labels to arrays before formatting them nicely with Grid), and passing Prepend directly instead of the equivalent Prepend[#1, #2]& makes things a lot clearer and more concise.
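MapThread[Prepend, {matrix, vector}] walks the rows and the vector entries in lockstep and calls Prepend[row, v] on each pair. A Python sketch of the same lockstep walk follows; `prepend_column_threaded` is a hypothetical name. MapThread reports an error for lists of unequal length, whereas a plain zip would silently truncate, so the sketch checks the lengths explicitly.

```python
def prepend_column_threaded(vector, matrix):
    """Prepend vector[i] to row i, walking both in lockstep
    (like MapThread[Prepend, {matrix, vector}])."""
    if len(vector) != len(matrix):
        # Plain zip would silently drop the extra rows or entries.
        raise ValueError("vector and matrix must have the same number of rows")
    return [[v] + list(row) for row, v in zip(matrix, vector)]


print(prepend_column_threaded([4, 5], [[1, 2], [2, 3]]))
# [[4, 1, 2], [5, 2, 3]]
```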
THE... ABSOLUTELY... BY FAR... FASTEST method to append or prepend a column, from my tests of various methods on the array RandomReal[100, {10^8, 5}] (kids, don't try this at home... if your machine isn't built for speed and memory, operations on an array this size are guaranteed to hang your computer)
...is this: Append[tmp\[Transpose], Range@Length@tmp]\[Transpose], where tmp is the array.
Replace Append with Prepend at will.
The next fastest thing is this: Table[tmp[[n]]~Join~{n}, {n, Length@tmp}], which is almost twice as slow.