Why list ++ requires to scan all elements of list on its left? - performance

The Haskell tutorial says, be cautious that when we use "Hello"++" World", the new list construction has to visit all single elements(here, every character of "Hello"), so if the list on the left of "++" is long, then using "++" will bring down performance.
I think I was not understanding correctly, does Haskell's developers never tune the performance of list operations? Why this operation remains slow, to have some kind of syntax consistencies in any lambda function or currying?
Any hints? Thanks.

In some languages, a "list" is a general-purpose sequence type intended to offer good performance for concatenation, splitting, etc. In Haskell, and most traditional functional languages, a list is a very specific data structure, namely a singly-linked list. If you want a general-purpose sequence type, you should use Data.Sequence from the containers package (which is already installed on your system and offers very good big-O asymptotics for a wide variety of operations), or perhaps some other one more heavily optimized for common usage patterns.

If you have immutable list which has a head and a reference to the tail, you cannot change its tail. If you want to add something to the 'end' of the list, you have to reach the end and then put all items one by one to the head of your right list. It is the fundamential property of immutable lists: concatenation is expensive.

Haskell lists are like singly-linked lists: they are either empty or they consist of a head and a (possibly empty) tail. Hence, when appending something to a list, you'll first have to walk the entire list to get to the end. So you end up traversing the entire list (the list to which you append, that is), which needs O(n) runtime.

Related

Prolog: Looping through elements of list A and comparing to members of list B

I'm trying to write Prolog logic for the first time, but I'm having trouble. I am to write logic that takes two lists and checks for like elements between the two. For example, consider the predicate similarity/2 :
?- similarity([2,4,5,6,8], [1,3,5,6,9]).
true.
?- similarity([1,2,3], [5,6,8]).
false.
The first query will return true as those two lists have 5 and 6 in common. The second returns false as there are no common elements between the two lists in that query.
I CANNOT use built in logic, such as member, disjoint, intersection, etc. I am thinking of iterating through the first list provided, and checking to see if it matches each element in the second list. Is this an efficient approach to this problem? I will appreciate any advice and help. Thank you so much.
Writing Prolog for the first time can be really daunting, since it is unlike many traditional programming languages that you will most likely encounter; however it is a very rewarding experience once you've got a grasp on this new style of programming! Since you mention that you are writing Prolog for the first time I'll give some general tips and tricks about writing Prolog, and then move onto some hints to your problem, and then provide what I believe to be a solution.
Think Recursively
You can think of every Prolog program that you write to be intrinsically recursive in nature. i.e. you can provide it with a series of "base-cases" which take the following form:
human(John). or wildling(Ygritte) In my opinion, these rules should always be the first ones that you write. Try to break down the problem into its simplest case and then work from there.
On the other hand, you can also provide it with more complex rules which will look something like this: contains(X, [H|T]):- contains(X, T) The key bit is that writing a rule like this is very much equivalent to writing a recursive function in say, Python. This rule does a lot of the heavy lifting in looking to see whether a value is contained in a list, but it isn't complete without a "base-case". A complete contains rule would actually be two rules put together: contains(X, [X|_]).
contains(X, [H|T]):-contains(X, T).
The big takeaway from this is to try and identify the simple cases of your problem, which can act like base cases in a recursive function, and then try to identify how you want to "recurse" and actually do work on the problem at hand.
Pattern Matching
Part of the great thing about Prolog is the pattern matching system that it has in place. You should 100% use this to your advantage whenever you can -- it is especially helpful when trying to do anything with lists. For example:
head(X, [X|T]).
Will evaluate to true when called thusly: head(1, [1, 2, 3]) because intrinsic in the rule is the matching of X. This sort of pattern matching on the first element of a list is incredibly important and really the key way that you will do any work on lists in Prolog. In my experience, pattern matching on the head of a list will often be one of the "base-cases" that I mentioned beforehand.
Understand The Flow of the Program
Another key component of how Prolog works is that it takes a "top-down" approach to reading code. What I mean by that is that every time a rule is called (except for definitions of the form king(James).), Prolog starts at line 1 and continues until it reaches a rule that is true or the end of the file. Therefore, the ordering of your rules is incredibly important. I'm assuming that you know that you can combine rules together via a comma to indicate logical AND, but what is maybe more subtle is that if you order one rule above another, it can act as a logical OR, simply because it will be evaluated before another rule, and can potentially cause the program to recurse.
Specific Example
Now that I've gotten all of my general advice out of the way, I'll actually reference the given problem. First, I'd write my "base-case". What would happen if you are given two lists whose first elements are the same? If the first element in each list is not the same, then they have to be different. So, you have to look through the second list to see if this element is contained anywhere in the rest of the list. What kind of rule would this produce? OR it could be the case that the first element of the first list is not contained within the second at all, in which case you have to advance once in the first list, and start again with the second list. What kind of rule would this produce?
In the end, I would say that your approach is the correct one to take, and I have provided my own solution below:
similarity([H|_], [H|_]).
similarity(H1|T1], [_|T2]):- similarity([H1|T1], T2).
similarity([_|T1], [H2|T2]):- similarity(T1, [H2|T2]).
Hope all of this helps in some way!

Constant-time list concatenation in OCaml

Is it possible to implement constant-time list concatenation in OCaml?
I imagine an approach where we deal directly with memory and concatenate lists by pointing the end of the first list to the beginning of the second list. Essentially, we're creating some type of linked-list like object.
With the normal list type, no, you can't. The algorithm you gave is exactly the one implemented ... but you still have to actually find the end of the first list...
There are various methods to implement constant time concatenation (see Okazaki for fancy details). I will just give you names of ocaml libraries that implement it: BatSeq, BatLazyList (both in batteries), sequence, gen, Core.Sequence.
Pretty sure there is a diff-list implementation somewhere too.
Lists are already (singly) linked lists. But list nodes are immutable. So you change any node's pointer to point to anything different. In order to concatenate two lists you must therefore copy all the nodes in the first list.

Difference between "open-ended lists" and "difference lists"

What is the difference between "open-ended lists" and "difference lists"?
As explained on http://homepages.inf.ed.ac.uk/pbrna/prologbook/node180.html, open list is a tool used to implement a difference list.
Open list is any list where you have a unassigned variable at some point in the list, e.g.: [a,b,c|X]. You can use open list to implement a data structure called difference list, which formally specifies a structure of two terms pointing to first element and to the open end, traditionally defined as: [a,b,c|X]-X, to make operating on such lists easier.
For example, if all you have is an open list, adding element to the end is possible, but you need to iterate over all items. In a difference list you can just use the end-of-list variable (called a Hole on the page above) to skip iteration and perform the operation in constant time.
Both notions seem to be lists, but in fact they are not. One is a concrete term, the other rather a convention.
Open-ended lists, partial lists
Open-ended lists are terms that are not lists but can be instantiated such that they become lists. In standard lingo, they are called partial lists. Here are partial lists: X, [a|X], [X|X] are all partial lists.
The notion open-ended lists suggests a certain usage of such lists to simulate some open-ended state. Think of a dictionary that might be represented by an open-ended list. Every time you add a new item, the variable "at the end of the partial list" is instantiated to a new element. While this programming technique is quite possible in Prolog, it has one big downside: The programs will heavily depend on a procedural interpretation. And in many situations there is no way to have a declarative interpretation at all.
Difference lists
Difference lists are effectively not lists as such but a certain way how lists are used such that the intended list is represented by two variables: one for the start and one for the end of the list. For this reason it would help a lot to rather talk of list differences instead of difference lists.
Consider:
el(E, [E|L],L).
Here, the last two arguments can be seen as forming a difference: a list that contains the single element [E]. You can now construct more complex lists out of simpler ones, provided you respect certain conventions which are essentially that the second argument is only passed further on. The differences as such are never compared to each other!
el2(E, F, L0,L) :-
el(E, L0,L1),
el(F, L1,L).
Note that this is merely a convention. The lists are not enforced. Think of:
?- el2(E, F, L, nonlist).
L = [E,F|nonlist].
This technique is also used to encode dcgs.
For example
Open-ended : [a,b,c | _]
Difference-list : [a,b,c|U]-U.

Why does F# prefer lists over arrays?

I'm trying to learn F# and was watching a video when something odd (at least, to me) came up. The video in question is here and the relevant part starts at 2:30 for those interested. But basically, the guy says that F# makes it awkward to work with arrays and that the designers did so on purpose because lists are easier to "prepend and append".
The question that immediately sprang to mind: isn't easy prepending and appending something that should be frowned upon in an immutable language? Specifically, I'm thinking of C#'s Lists where you can do something like List.Add(obj); and mutate the list. With an array you'd have to create an entirely new array, but that's also what would need to happen in an immutable language.
So why do the designers of F# prefer lists? What is the fundamental difference in an immutable environment between a list and an array? What am I missing? Are lists in F# really linked lists?
I would disagree that "F# makes it awkward to work with arrays". In fact, F# makes working with arrays quite nice compared to most languages.
For example, F# has literal array construction: let arr = [|1;2;3;4;|]. And perhaps even cooler, pattern matching on arrays:
match arr with
| [|1;2;_;_|] -> printfn "Starts with 1;2"
| [|_;_;3;4|] -> printfn "Ends with 3;4"
| _ -> printfn "Array not recognized"
As to why immutable singly linked lists are preferred in functional programming like F#, there's a lot to say, but the short answer is that it allows an O(1) prepend efficiency, and allows the implementation to share nodes, so it is easy on memory. For example,
let x = [2;3]
let y = 1::x
Here y is created by prepending 1 to x, but x is neither modified nor copied, so making y was very cheap. We can see a bit how this is possible, since x points to the head, 2, of the initially constructed list and can only move forward, and since the elements of the list it points to can't be mutated, it doesn't matter that y shares nodes with it.
In functional languages lists are usually single-linked lists. I.e. it is not necessary to copy the complete list. Instead prepending (often called cons) is a O(1) operation and you can still use the old list, because lists are immutable.
First of all, arrays are pretty low-level data structure and they are really only useful if you know the lenght of the array when creating it. That's not often the case and that's a reason why C# programmers use System.Collections.Generic.List<T> and F# programmers use F# list<T>.
The reason why F# prefers its own functional list rather than using .NET List<T> is that functional languages prefer immutable types. Instead of modifying the object by calling list.Add(x), you can create new list with items added to the front by writing let newList = x::list.
I also agree with Stephen that using arrays in F# is not awkward at all. If you know the number of elements you're working with or you're transforming some existing data source, then working with arrays is quite easy:
/ You can create arrays using `init`
let a = Array.init 10 (fun i -> (* calculate i-th element here *) )
// You can transform arrays using `map` and `filter`
a |> Array.map (fun n -> n + 10)
|> Array.filter (fun n -> n > 12)
// You can use array comprehensions:
let a2 = [| for n in a do
if n > 12 then yield n + 10 |]
This is essentially the same as processing lists - there you would use list comprehensions [ ... ] and list processing functions such as List.map etc. The difference really appears just when initializing the list/array.
F# makes it awkward to work with arrays
F# provides many features that make it easier to work with arrays than in other languages, including array literals, array patterns and higher-order functions.
The question that immediately sprang to mind: isn't easy prepending and appending something that should be frowned upon in an immutable language?
I believe you have misunderstood what that statement means. When people talk about prepending and appending in the context of purely functional data structures they are referring to the creation of a new collection that is derived (and shared most of its internals) with an existing collection.
So why do the designers of F# prefer lists?
F# inherited some list-related capabilities from OCaml which inherited them from Standard ML and ML because singly-linked immutable lists are very useful in the context of their application domain (metaprogramming) but I would not say that the designers of F# prefer lists.
What is the fundamental difference in an immutable environment between a list and an array?
In F#, lists provide O(1) prepend and append and O(n) random access whereas arrays provide O(n) prepend and append and O(1) random access. Arrays can be mutated but lists cannot.
What am I missing?
Basic knowledge of purely functional data structures. Read Okasaki.
Are lists in F# really linked lists?
Yes. Specifically, singly-linked immutable lists. In fact, in some MLs the list type can be defined as:
type 'a list =
| ([])
| (::) of 'a * 'a list
This is why the :: operator is a constructor and not a function so you cannot write (::) as you can with, for example, (+).
An F# list is more like the following datastructure - a single linked list:
public class List<T> {
public List(T item, List<T> prev) { /*...*/ }
public T Item { get; }
public List<T> Prev { get; }
}
So when a new list is created, it is actually creating a single node with a reference to the first element of the previous list, rather than copying the entire array.

insert element in a list and return the same list updated

Hi i'm trying to insert an element in a list but it is very important from my program that the result is stored in the original list and not in a new one.
Any code that i have written or found on the internet only succeeds if you create a new list in which the end result is kept.
So my question is can anyone tell me how to define a function: insert(X,L) where X is an element and L is a list?
No, Prolog just doesn't work that way. There is no such thing as "modifying" a value. A variable can be unified with a specific value, but if it was already [1,3], it won't ever be [1,2,3] later.
As aschepler says, you cannot add or make any change to a proper list, i.e. a list in which every element is already bound. The only "modifying" we can do is unifying one expression with another.
However there is a concept of a partial list to which additional elements can be "added" at the end. This is typically known as a difference list, although that nomenclature may not be immediately understandable.
Suppose we start, not with an empty list, but with a free variable X. One might however think of subtracting X from X and getting "nothing". That is, an empty difference list is represented by X - X. The minus "-" here is a purely formal operator; no evaluation of the difference is intended. It's just a convenient syntax as you see from how difference lists can be used to accomplish what you (probably) want to do.
We can add an element to a difference list as follows:
insertDL(M,X-Y,X-Z) :- Y = [M|Z].
Here M is the new element we want to add, X-Y is the "old" difference list, and X-Z is the "new" difference (to which M has been added, by unifying the previously free variable Y with the partial list [M|Z], so that Z becomes the "open" tail of partial list X).
When we are finally done inserting things into our difference list, we can turn X into a proper list by setting the "free tail" at that point to the empty list [ ]. In this sense X is the "same" variable as when we first began, just unified by incremental steps from free variable to proper list.
This is a very powerful technique in Prolog programming, and it takes some practice to feel comfortable using it. Some links to further discussion on the Web:
[From Prolog lists to difference lists]
http://www.irisa.fr/prive/ridoux/ICLP91/node8.html
[Implementing difference lists in Prolog]
http://www.cl.cam.ac.uk/~jpw48/difflists.pdf
[Lecture Notes: Difference Lists]
http://www.cs.cmu.edu/~fp/courses/lp/lectures/11-diff.pdf
Some prologs provide the setarg/3 predicate in order to modify terms in place.
In order to use it over lists, you only need to consider that they are just a nice representation of chains of compound terms with functor '.'/2
In any case, when you need to use setarg/3 in Prolog, it probably means you are doing something wrong.

Resources