Is there a way to remove an element from a list in Mathematica?

Is there a function in Wolfram Mathematica for removing an element from the original list, in place?
For example:
a = {1, 2, 3};
DeleteFrom[a, 1];  (* hypothetical *)
a
{2, 3}
If it is absent, can anyone give an example of an efficient variant of such a function?
(I know there is the function Delete[], but it creates a new list. This is not good if the list is big.)

If you want to drop the first element from a list a, the statement
Drop[a,1]
returns a list the same as a but without its first element. Note that this does not update a. To do that you could assign the result to a, e.g.
a = Drop[a,1]
Note that this is probably exactly what a = Delete[a, 1] does behind the scenes: first a copy of a without its first element is made, then the name a is assigned to that new list, and finally the memory used by the old list is freed.
Comparing destructive updates and non-destructive updates in Mathematica is quite complicated and can take one deep into the system's internals. You'll find a lot about the subject on the Stack Exchange Mathematica site.

Every time you change the length of a list in Mathematica you cause a reallocation of the list, which takes O(n) rather than O(1) time. Though no "DeleteFrom" function exists, if it did it would be no faster than a = Delete[a, x].
If you can create in advance a list of all the elements you wish to delete and delete them all at once, you will get much better performance. If you cannot, you will have to find another way to formulate your problem. I suggest you join us on the Mathematica Stack Exchange site if you have additional questions.
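As a minimal sketch of the batch approach (the list and positions here are invented for illustration): Delete accepts a list of positions, so the list is rebuilt only once rather than once per deletion.
a = Range[10];
positionsToDrop = {{1}, {4}, {7}}; (* collected in advance *)
a = Delete[a, positionsToDrop]
(* {2, 3, 5, 6, 8, 9, 10} *)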

Assign the element to an empty sequence and it will be removed from the list. This works for any element.
In[1] := a = {1,2,3}
Out[1]= {1,2,3}
In[2] := a[[1]] = Sequence[]
Out[2] = Sequence[]
In[3] := a
Out[3] = {2,3}
Yes, Mathematica tends toward non-destructive programming, but the programmers at Wolfram are pretty clever folks and the code seems to run pretty fast. It is hard to believe they would always copy a whole list to change one element, i.e. never make any optimizations.

Improving on user3446498's answer, you can do the following:
In[1] := a = {1,2,3};
In[2] := a[[1]] = Nothing;
In[3] := a
Out[3] = {2,3}
In[4] := a == {2,3}
Out[4] = True
The Nothing symbol was introduced in version 10.2 (2015); see its documentation page.

Neither the solution from @user3446498 nor the one from @pmsoltani actually deletes the element. Test:
a = {1, 2, 3};
a[[2]] = Sequence[]; (* or Nothing *)
--a[[1]];
++a[[2]];
a
They both output {0, 4, 3}, while {0, 4} is expected.
Replacing the 2nd line with a = Delete[a, 2]; would work.
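For completeness, a quick check of the corrected version (the same test as above, with Delete):
a = {1, 2, 3};
a = Delete[a, 2];
--a[[1]];
++a[[2]];
a
(* {0, 4} *)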

Related

Preallocate or change size of vector

I have a situation where I have a process which needs to "burn in". This means that I:
Start with p values, p relatively small
For n>p, generate the nth value using the most recently generated p values (e.g. value p+1 is generated from values 1 to p, value p+2 from values 2 to p+1, etc.)
Repeat until n=N, where N is large
Now, only the most recently generated p values will be useful to me, so there are two ways for me to implement this. I can either
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element and appending the most recently generated value at the end, or
Preallocate a large array of length N, whose first p elements are the initial values. At iteration n, write the most recently generated value into the nth slot
There are pros and cons to both approaches.
Pros of the first are that we only store the most relevant values. Cons of the first are that we change the length of the vector at each iteration.
Pros of the second are that we preallocate all the memory we need. Cons of the second are that we store much more than we need.
What is the best way to proceed? Does it depend on what aspect of performance I most need to care about? What will be the quickest?
Cheers in advance.
edit: approximately, p is usually on the order of low tens, and N can be several thousand
The first solution has another huge con: removing the first item of an array takes O(n) time, since the remaining elements must be moved in memory. This causes the algorithm to run in quadratic time, which is not reasonable. Shifting the items as proposed by @ForceBru has the same quadratic running time (many items are moved just to add one value each time).
The second solution should be pretty fast compared to the first, but it can use a lot of memory, so it may still be sub-optimal (it takes time to write values to RAM).
A faster solution is to use a data structure called a deque. Such a data structure lets you remove the first item in constant time and also append a new value at the end in constant time. That being said, it introduces some overhead to be able to do that. Julia provides such a data structure (see Deque in the DataStructures.jl package).
Since the number of in-flight items appears to be bounded in your algorithm, you can use a rolling buffer instead. Fortunately, this also exists: see CircularBuffer in DataStructures.jl. This solution should be quite simple and fast (the operations you need are O(1) on it).
It is probably simplest to use CircularArrays.jl for your use case:
julia> using CircularArrays
julia> c = CircularArray([1,2,3,4])
4-element CircularVector(::Vector{Int64}):
1
2
3
4
julia> for i in 5:10
           c[i] = i
           @show c
       end
c = [5, 2, 3, 4]
c = [5, 6, 3, 4]
c = [5, 6, 7, 4]
c = [5, 6, 7, 8]
c = [9, 6, 7, 8]
c = [9, 10, 7, 8]
In this way, as you can see, you can continue using an increasing index and the array will wrap around internally as needed (discarding old values that are no longer needed).
This way you always store the last p values in the array without having to copy anything or re-allocate memory at each step.
...only the most recently generated p values will be useful to me...
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element and appending the most recently generated value at the end.
Cons of the first are that we change the length of the vector at each iteration.
There's no need to change the length of the vector. Simply shift its elements to the left (overwriting the first element) and write the new data to the_vector[end]:
the_vector = [1, 2, 3, 4, 5, 6]

function shift_and_add!(vec::AbstractVector, value)
    vec[1:end-1] .= @view vec[2:end]  # shift left by one
    vec[end] = value                  # replace the last value
    vec
end

@assert shift_and_add!(the_vector, 80) == [2, 3, 4, 5, 6, 80]
# `the_vector` is mutated in place
@assert the_vector == [2, 3, 4, 5, 6, 80]

Efficiently searching a doubly-linked list for a value with a pointer constraint?

I have been given this problem to solve:
Suppose you're given a doubly-linked list and a pointer p to some element of that list. Let's suppose that you want to search the list for some value v, which you know exists somewhere. Using no pointers except p, design a Θ(m)-time algorithm for finding v, where m is the distance between p and the node containing v. Do not modify the input list.
Does anyone have any ideas how to solve this? The only way I can think of doing this destructively modifies the list.
As a hint: What happens if you take one step forward, two steps back, four steps forward, eight steps back, etc.? How many total steps will it take to find the element?
When doing the analysis, you might find it useful to sum up a geometric series. If it helps, 1 + r + r^2 + r^3 + ... + r^(n-1) = (r^n - 1) / (r - 1).
Hope this helps!
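One way to see the Θ(m) bound, as a sketch filling in the hint: the pass lengths are 1, 2, 4, ..., 2^k, and the pass of length 2^k ends at distance roughly two-thirds of 2^k from the start, on alternating sides, so every node within a distance proportional to 2^k has been visited on both sides. The search therefore succeeds once 2^k grows proportional to m, and the total number of steps, 1 + 2 + ... + 2^k = 2^(k+1) - 1, is a geometric series dominated by its last term, i.e. Θ(m) overall.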
I think templatetypedef has the correct answer, but the below solution came to mind, based on the original phrasing of the question, which states that the list can't be "destroyed or shuffled". That might imply that modifying it is okay if you can change it back again; even if this is not true, I still think this is an interesting approach.
It can easily be solved with 2 pointers, right? One to go forward and one to go back, moving both at the same time until you find the value you're looking for.
Now this is all well and good, except that we're only allowed to have 1 pointer.
If you think about a doubly-linked list, we know that the previous node's next pointer is just a pointer to the current element, so, if we were to lose it, we could set it up again.
So we could use p.prev.next as free storage (and p.prev.prev.next, given a similar argument).
So, we could use only one pointer going forward, and store the one going back in p.prev.next, moving it appropriately as we go along.
Some Java-esque pseudo-code, ignoring end-of-list checks and the not-found case for simplicity (I think it's already complicated enough):
if p.data == v
    return v
p = p.next
p.prev.next = p.prev.prev
while not found
    if p.data == v || p.prev.next.data == v
        p.prev.next = p  // restore p.prev.next
        return v
    p = p.next
    p.prev.next = p.prev.prev.next.prev  // set up p.prev.next from p.prev.prev.next
    p.prev.prev.next = p.prev            // restore p.prev.prev.next

How to select sublists faster in Mathematica?

My question sounds more general, but I have a specific example. I have a list of data in the form:
plotDataAll={{DateList1, integerValue1}, {DateList2, integerValue2}...}
The dates are sorted chronologically, that is, plotDataAll[[2,1]] is a more recent time than plotDataAll[[1,1]].
I want to create plots of specific periods, 24h ago, 1 week ago, etc. For that I need just a portion of data. Here's how I got what I wanted:
mostRecentDate=Max[Map[AbsoluteTime, plotDataAll[[All,1]]]];
plotDataLast24h=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-86400.)&];
plotDataLastWeek=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-604800.)&];
plotDataLastMonth=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-2.592*^6)&];
plotDataLast6M=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-1.5552*^7)&];
Then I used DateListPlot to plot the data. This becomes slow if you need to do this for many sets of data.
What comes to mind: if I could find the index of the first element that satisfies the date condition then, because the list is chronologically sorted, all the elements after it would satisfy the condition as well. So I would have:
plotDataLast24h=plotDataAll[[beginningIndexThatSatisfiesLast24h;;Length[plotDataAll]]]
But how do I get the index of the first element that satisfies the condition?
If you have a faster way to do this, please share your answer. Also, if you have a simple, faster, but sub-optimal solution, that's fine too.
EDIT:
Time data is not in regular intervals.
If your data is at regular intervals you should be able to know how many elements constitute a day, week, etc. and use Part.
plotDataAll2[[knownIndex;;-1]]
or more specifically if the data was hourly:
plotDataAll2[[-25;;-1]]
would give you the last 24 hours. If the spacing is irregular then use Select or Pick. Date and time functions in Mma are horrendously slow, unfortunately. If you are going to do a lot of date and time calculations, it is better to convert to AbsoluteTime just once and then work with that. You will also notice that your DateListPlots render much faster if you use AbsoluteTime.
plotDataAll2=plotDataAll;
plotDataAll2[[All,1]]=AbsoluteTime/@plotDataAll2[[All,1]];
mostRecentDate=plotDataAll2[[-1,1]]
On my computer Pick is about 3 times faster but there may be other improvements you can make to the code below:
selectInterval[data_, interval_] := (tmp = data[[-1, 1]] - interval;
  Select[data, #[[1]] > tmp &])
pickInterval[data_, interval_] := (tmp = data[[-1, 1]] - interval;
  Pick[data, Sign[data[[All, 1]] - tmp], 1])
So to find data within the last week:
Timing[selectInterval[plotDataAll2, 604800]]
Timing[pickInterval[plotDataAll2, 604800]]
The thing that you want to avoid is checking all the values in the data table. Since the data is sequential you can just start checking from the back and stop when you have found the correct index.
Schematically:
tab = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
i = j = Length@tab;
While[tab[[i]] > 5, --i];
tab[[i ;; j]]
-> {5, 6, 7, 8, 9}
Substitute > 5 with whatever you want to check for. I didn't have time to test this right now, but in your case it would be something like:
maxDate=AbsoluteTime@plotDataAll[[-1,1]]; (* no need to find Max if data is sequential *)
i24h = iWeek = iMonth = i6Month = iMax = Length@plotDataAll;
While[AbsoluteTime@plotDataAll[[i24h,1]] > maxDate-86400.,--i24h];
While[AbsoluteTime@plotDataAll[[iWeek,1]] > maxDate-604800.,--iWeek];
While[AbsoluteTime@plotDataAll[[iMonth,1]] > maxDate-2.592*^6,--iMonth];
While[AbsoluteTime@plotDataAll[[i6Month,1]] > maxDate-1.5552*^7,--i6Month];
Then, e.g.,
DateListPlot@plotDataAll[[i24h;;iMax]]
If you want to start somewhere in the middle of plotDataAll, just use a While to first find the starting point and set iMax and maxDate appropriately.
For large data sets this may be one of the few instances where a loop construct is better than MMA's built-in functions. That, however, may be my own ignorance; if anyone here knows of a built-in function that does this sort of "stop when match found" comparison better than While, please let me know.
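A hedged note on that last point: LengthWhile does stop at the first element failing its test. Applied to the reversed list it counts how many trailing elements fall inside the window, e.g., reusing plotDataAll2 and maxDate from above:
n = LengthWhile[Reverse[plotDataAll2], #[[1]] > maxDate - 604800. &];
plotDataLastWeek = plotDataAll2[[-n ;;]] (* assumes n >= 1 *)
Reverse copies the list, though, so this still touches every element once; the While loop above avoids even that.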
EDIT: Timing comparisons
I played around a bit with Mike's and my solutions and compared them to the OP's method. Here is the toy code I used for each solution:
tab = Range@1000000;
(* My solution *)
i = j = tab[[-1]];
While[tab[[i]] > j - 24, --i];
tab[[i ;; j]]
(* Mike's solution *)
tmp = tab[[-1]] - 24;
Pick[tab, Sign[tab[[All]] - tmp], 1]
(* Enedene's solution *)
j = tab[[-1]];
Select[tab, # > (j - 24) &]
Here are the results (OS X, MMA 8.0.4, Core2Duo 2.0 GHz; timing plot not reproduced here):
As you can see, Mike's solution has a definite advantage over enedene's solution but, as I surmised originally, the downside of using built-in functions like Pick is that they still perform a comparison on every element of the list, which is highly superfluous in this instance. My solution takes constant time, since no unnecessary checks are made.

Mathematica appending a Matrix

I have to solve a system of non-linear equations over a wide range of parameter space. I'm using FindRoot, which is sensitive to the initial start point, so I have to work by hand, by trial and error and plotting, rather than putting the equations in a loop or in a table.
So what I want to do is create a database or a matrix with a fixed number of columns but a variable number of rows, so I can keep appending new results as and when I solve for them.
Right now I've used something like:
{{xx, yy} = {x, y} /. FindRoot[{f1[x, y] == 0, f2[x, y] == 0}, {x, a}, {y, b}],
g[xx, yy]} >>> "Attempt1.txt"
where I am solving for two variables and then storing the variables and also a function g[xx, yy] of the variables.
This seems to work for me, but the result is no longer a matrix; the data ends up stored in a text file rather than in a structure I can keep working with.
Is there any way I can get this to stay a matrix or a database to which I keep adding rows each time I solve with FindRoot by hand? Again, I need to run FindRoot by hand because it is sensitive to the start points, and I don't know good start points without first plotting.
Thanks a lot
Unless I'm not understanding what you're trying to do, this should work:
results = {};
results = Append[results, Flatten[{{xx, yy} = {x, y} /. FindRoot[{f1[x, y] == 0, f2[x, y] == 0}, {x, a}, {y, b}], g[xx, yy]}]];
Then every time you're trying to add a line to the matrix results by hand, you would just re-evaluate
results = Append[results, Flatten[{{xx, yy} = {x, y} /. FindRoot[{f1[x, y] == 0, f2[x, y] == 0}, {x, a}, {y, b}], g[xx, yy]}]];
By the way, to get around the problem of sensitivity to the initial a and b values, you could explore the parameter space in a loop, varying the parameters slowly and using the solution for x and y from the previous loop iteration as your new a and b values each time, as sketched below.
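A minimal sketch of that continuation idea; the equations f1, f2, the parameter k, and all numbers here are invented purely for illustration:
f1[x_, y_, k_] := x^2 + y^2 - k; (* hypothetical *)
f2[x_, y_, k_] := x - y;         (* hypothetical *)
start = {1., 1.};
results = {};
Do[
 sol = {x, y} /. FindRoot[{f1[x, y, k] == 0, f2[x, y, k] == 0}, {x, start[[1]]}, {y, start[[2]]}];
 start = sol; (* reuse this root as the next starting point *)
 AppendTo[results, Join[{k}, sol]],
 {k, 1., 2., 0.1}];
results
Each row of results is {k, x, y}, so the matrix grows by one row per parameter value.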
What you want to do can be achieved by using Read instead of Get. While Get reads the complete file in one run, Read can be adjusted to extract a single Expression, Byte, Number, and many more. So what you should do is open your file and read expression after expression, packing them into a list.
PutAppend[{{1, 2}, {3, 4}}, "tmp.mx"]
PutAppend[{{5, 6}, {7, 8}}, "tmp.mx"]
PutAppend[{{9, 23}, {11, 12}}, "tmp.mx"]
PutAppend[{{13, 14}, {15, 16}}, "tmp.mx"]
mat = ArrayPad[
   NestWhileList[Read[stream, Expression] &,
    stream = OpenRead["tmp.mx"], # =!= EndOfFile &], -1];
Close[stream];
And now you have in mat a list containing all the stored expressions. The ArrayPad, which cuts off one element at each end, is necessary because the first element contains the output of OpenRead and the last element contains EndOfFile. If you are not familiar with functional constructs like NestWhileList, you can put it in a loop as you like, since it is really just iterated calls to Read:
stream = OpenRead["tmp.mx"];
mat = {};
AppendTo[mat, Read[stream, Expression]];
AppendTo[mat, Read[stream, Expression]];
AppendTo[mat, Read[stream, Expression]];
AppendTo[mat, Read[stream, Expression]];
Close[stream];
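If the whole file fits in memory anyway, ReadList is a simpler variant of the same idea; it performs the iterated Read calls for you and returns the list directly:
mat = ReadList["tmp.mx", Expression]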

On PackedArray, looking for advice for using them

I have not used PackedArray before, but I just started looking into using it after reading some discussion here today.
What I have is lots of large 1D and 2D matrices, all reals with nothing symbolic (it is a finite difference PDE solver), so I thought I should take advantage of PackedArray.
I have an initialization function where I allocate all the data/grids needed, so I went and used ToPackedArray on them. It seems a bit faster, but I need to do more performance testing to better compare speed before and after, and also to compare RAM usage.
But while I was looking at this, I noticed that some operations in M automatically return lists as PackedArray already, and some do not.
For example, this does not return a packed array:
a = Table[RandomReal[], {5}, {5}];
Developer`PackedArrayQ[a]
But this does
a = RandomReal[1, {5, 5}];
Developer`PackedArrayQ[a]
and this does
a = Table[0, {5}, {5}];
b = ListConvolve[ {{0, 1, 0}, {1, 4, 1}, {0, 1, 1}}, a, 1];
Developer`PackedArrayQ[b]
and matrix multiplication also returns its result as a packed array:
a = Table[0, {5}, {5}];
b = a.a;
Developer`PackedArrayQ[b]
But element-wise multiplication does not:
b = a*a;
Developer`PackedArrayQ[b]
My question: Is there a list somewhere which documents which M commands return a PackedArray vs. not (assuming the data meets the requirements, such as Real, not mixed, no symbolics, etc.)?
Also, a minor question: do you think it is better to check first whether a list/matrix is already packed before calling ToPackedArray on it? I would think calling ToPackedArray on an already packed list costs nothing, as the call should return right away.
thanks,
update (1)
Just wanted to mention that I found that PackedArray symbols are not allowed in a demo CDF; I got an error uploading one that used them. So I had to take all my packing code out. Since I mainly write demos, this topic is now of only academic interest to me, but I wanted to thank everyone for their time and good answers.
There isn't a comprehensive list. To point out a few things:
Basic operations with packed arrays will tend to remain packed:
In[66]:= a = RandomReal[1, {5, 5}];
In[67]:= Developer`PackedArrayQ /@ {a, a.a, a*a}
Out[67]= {True, True, True}
Note above that my version (8.0.4) doesn't unpack for element-wise multiplication.
Whether a Table will result in a packed array depends on the number of elements:
In[71]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {10}]]
Out[71]= False
In[72]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {11}]]
Out[72]= True
In[73]:= Developer`PackedArrayQ[Table[RandomReal[], {25}, {10}]]
Out[73]= True
On["Packing"] will turn on messages to let you know when things unpack:
In[77]:= On["Packing"]
In[78]:= a = RandomReal[1, 10];
In[79]:= Developer`PackedArrayQ[a]
Out[79]= True
In[80]:= a[[1]] = 0 (* force unpacking due to type mismatch *)
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {10}. >>
Out[80]= 0
Operations that do per-element inspection will usually unpack the array:
In[81]:= a = RandomReal[1, 10];
In[82]:= Position[a, Max[a]]
Developer`FromPackedArray::unpack: Unpacking array in call to Position. >>
Out[82]= {{4}}
The penalty for calling ToPackedArray on an already packed list is small enough that I wouldn't worry about it too much:
In[90]:= a = RandomReal[1, 10^7];
In[91]:= Timing[Do[Identity[a], {10^5}];]
Out[91]= {0.028089, Null}
In[92]:= Timing[Do[Developer`ToPackedArray[a], {10^5}];]
Out[92]= {0.043788, Null}
The frontend prefers packed to unpacked arrays, which can show up when dealing with Dynamic and Manipulate:
In[97]:= Developer`PackedArrayQ[{1}]
Out[97]= False
In[98]:= Dynamic[Developer`PackedArrayQ[{1}]]
Out[98]= True
When looking into performance, focus on cases where large lists are getting unpacked rather than small ones, unless the small ones are in big loops.
This is just an addendum to Brett's answer:
SystemOptions["CompileOptions"]
will give you the array lengths at which each relevant function starts returning a packed array. So if you did need to pack a small list, as an alternative to using Developer`ToPackedArray, you could temporarily set a smaller value for one of the compile options, e.g.
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}]
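Since SetSystemOptions changes global state, a cautious pattern (a sketch, reading the current value first and restoring it afterwards) is:
opts = "CompileOptions" /. SystemOptions["CompileOptions"];
old = "TableCompileLength" /. opts;
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}];
(* ... create the small packed arrays here ... *)
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> old}];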
Note also some differences between functions which, to me at least, don't seem intuitive, so I generally have to test these kinds of things whenever I use them rather than instinctively knowing what will work best:
f = # + 1 &;
g[x_] := x + 1;
data = RandomReal[1, 10^6];
On["Packing"]
Timing[Developer`PackedArrayQ[f /@ data]]
{0.131565, True}
Timing[Developer`PackedArrayQ[g /@ data]]
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {1000000}.
{1.95083, False}
Another addition to Brett's answer: if a list is already a packed array, then ToPackedArray is very fast, since this is checked quite early. You might also find this valuable:
http://library.wolfram.com/infocenter/Articles/3141/
In general, for numerics, look for talks by Rob Knapp and/or Mark Sofroniou.
When I develop numerics codes, I write the function and then use On["Packing"] to make sure that everything is packed that needs to be packed.
Concerning Mike's answer: the threshold was introduced because for small arrays there is overhead. Where the threshold lies is hardware dependent. It might be an idea to write a function that sets these thresholds based on measurements made on the computer at hand.
