How to debug type-level programs

I'm trying to do some hoopy type-level programming, and it just doesn't work. I'm tearing my hair out trying to figure out why the heck GHC utterly fails to infer the type signatures I want.
Is there some way to make GHC tell me what it's doing?
I tried -ddump-tc, which just prints out the final type signatures. (Yes, they're wrong. Thanks, I already knew that.)
I also tried -ddump-tc-trace, which dumps out ~70KB of unintelligible gibberish. (In particular, I can't see any user-written identifiers mentioned anywhere.)
My code is so close to working, but somehow an extra type variable keeps appearing. For some reason, GHC can't see that this variable should be completely determined. Indeed, if I manually write the five-mile type signature, GHC happily accepts it. So I'm clearly just missing a constraint somewhere... but where?!? >_<

As has been mentioned in the comments, poking around with :kind and :kind! in GHCi is usually how I go about it. Surprisingly, though, it also matters where you place the functions, and what looks like it should be the same isn't always.
For instance, I was trying to make a dependently typed functor equivalent, for a personal project, which looked like
class IFunctor f where
  ifmap :: (a -> b) -> f n a -> f n b
and I was writing the instance for
data IEither a n b where
  ILeft :: a -> IEither a Z b
  IRight :: b -> IEither a (S n) b
It should be fairly simple, I thought, just ignore f for the left case, apply it in the right.
I tried
instance IFunctor (IEither a) where
  ifmap _ l@(ILeft _) = l
  ifmap f (IRight r) = IRight $ f r
but since the specialized version of ifmap in this case is ifmap :: (b -> c) -> IEither a Z b -> IEither a Z c, Haskell inferred the type of l on the LHS to be IEither a Z b, which makes sense, but then refused to conclude b ~ c.
So, I had to unwrap l, get the value of type a, then rewrap it to get the IEither a Z c.
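In code, the rewrapped version looks roughly like this (a sketch of the fix just described, not the exact original source):
instance IFunctor (IEither a) where
  ifmap _ (ILeft x)  = ILeft x        -- unwrap and rewrap so the result is typed IEither a Z c
  ifmap f (IRight r) = IRight $ f r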
This isn't just the case with dependent types, but also with rank-n types.
For instance, I was trying to convert isomorphisms of a proper form into natural transformations, which should be fairly easy, I thought.
Apparently, I had to put the deconstructors in a where clause of the function, because otherwise type inference didn't work properly.


Why does refactoring data to newtype speed up my haskell program?

I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.
I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.
I also have two data representations:
type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show
When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.
When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.
Why?
How can I find out why?
My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.
I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?
I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?
Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).
Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)
Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.
import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)
main = do
  args <- getArgs
  let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
  let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
  print $ f dieCount
fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()
walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)
data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show
class Monad m => MonadP m where uniform :: m Integer
instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform
instance Functor P where fmap f (P pxs) = P $ fmap f pxs
instance Applicative P where
  pure x = P [x]
  (P pfs) <*> (P pxs) = P $ pfs <*> pxs
instance Monad P where
  (P pxs) >>= f = P $ do
    x <- pxs
    case f x of P fxs -> fxs
How can I find out why?
This is, in general, hard.
The extreme way to do it is to look at the core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the core dump.
The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.
Why?
In short, I believe it's due to list fusion.
NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.
First off, we can check whether the slowdown you're seeing is the result of something truly fundamental or of a GHC optimization triggering (or not) by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that it's not the data representation that is inherently the problem, but rather that there's an optimization that's triggered with the newtype version that isn't with the data version.
When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
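As a very rough sketch of the shape such rules might take (hypothetical names, modelled on GHC's foldr/build pair; I have not checked that this actually fires for your program):
{-# LANGUAGE RankNTypes #-}

-- A fold/build pair for the data-wrapped Distribution, so that a buildD
-- immediately consumed by a foldD never allocates the intermediate list.
foldD :: ((a, Rational) -> b -> b) -> b -> Distribution a -> b
foldD k z (Distribution xs) = foldr k z xs

buildD :: (forall b. ((a, Rational) -> b -> b) -> b -> b) -> Distribution a
buildD g = Distribution (g (:) [])

{-# RULES
"foldD/buildD" forall k z (g :: forall b. ((a, Rational) -> b -> b) -> b -> b).
  foldD k z (buildD g) = g k z
  #-}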

Substitution system

I am currently developing an advanced text editor with some features.
One of the features is a substitution system. It will help the user replace strings quickly.
It has the following format:
((a x) | x) (d e)
We can divide the string into two parts: left ((a x) | x) and right (d e). If there is a letter (x, in the current case) after a |, then we can perform a substitute action: replace all x in the left side with the string on the right side.
So after the action we will receive (a (d e)).
Of course, these parenthesized expressions might be nested: ((a | a) (d x | d)) (e f | f) -> ((e f | f) x) -> (e x).
Unfortunately, the reductions may continue infinitely: (a a | a) (a a | a).
I need to show a warning if the user writes a string that can never be reduced to a form with no further reductions.
Any suggestions on how to do it?
Congratulations, you have just invented the λ-calculus (pronounced "Lambda-calculus")
Why this problem is hard
Let's use the original notation for this, since it was already invented by Church in the 1930s:
((λx.f) u) is the rewriting of f where all x's have been replaced by u's.
Note that this is equivalent to your notation (f | x) u, where f is a string that can contain x's. It's a mathematical tool introduced to understand which functions are "computable", i.e. can be given to a computer with an adequate program, so that for every input, the computer will run its program and output the correct answer. Spoiler: the computable functions are exactly the functions that can be written as λ-terms (i.e. rewritings in your setting).
Sometimes λ-terms can be simplified, as you have noted yourself. For instance
(λx.yx)((λu.uu)(λz.z)i) -> (λx.yx)((λz.z)(λz.z)i) -> (λx.yx)((λz.z)i) -> (λx.yx)i -> yi
Sometimes the sequence of simplifications (also called "evaluation", or "beta-reduction") is infinite. One very famous example is the Omega operator (λx.xx)(λx.xx) (rings a bell?).
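To make "simplification" concrete, here is a minimal Haskell sketch of such terms and of a single reduction step (my own illustration; the substitution is naive and ignores variable capture, so it is only safe when bound names are distinct):
data Term = Var String | Lam String Term | App Term Term
  deriving Show

-- Naive substitution of s for the free occurrences of x.
subst :: String -> Term -> Term -> Term
subst x s (Var y)   | x == y    = s
                    | otherwise = Var y
subst x s (Lam y b) | x == y    = Lam y b
                    | otherwise = Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)

-- One leftmost-outermost reduction step, if any redex exists.
step :: Term -> Maybe Term
step (App (Lam x b) a) = Just (subst x a b)
step (App f a)         = case step f of
                           Just f' -> Just (App f' a)
                           Nothing -> App f <$> step a
step (Lam x b)         = Lam x <$> step b
step (Var _)           = Nothing
Iterating step on the Omega term above never reaches Nothing, which is exactly the non-termination problem.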
Ideally, we would like to restrict the λ-calculus so that all terms are "simplifiable" to a final usable form in a finite number of steps. This property is called "normalization". What we actually want, however, is one step further: we want all sequences of simplifications to reach the final form in a finite number of steps, so that when faced with multiple choices, you can pick either one and not get stuck in an infinite loop because of a bad choice. This is called "strong normalization".
Here's the issue: the λ-calculus is not strongly normalizing. This property is simply not true. There are terms that do not end up in a final - "normal" - form.
You have found one yourself.
How this problem has been solved theoretically
The key to gaining the strong normalization property was to rule out λ-terms which did not satisfy this property (print a warning and spit an error in your case) so that we only consider strongly normalizing λ-terms. This "ruling out" was put in place via a typing system: λ-terms that have a valid type are strongly normalizing, and λ-terms that are not strongly normalizing cannot have a valid type. Awesome, now we know which ones should give errors! Let's check that with simple cases.
From the way you are able to phrase your problem in a very clear way, I'm assuming you already have experience with programming and static type systems, but I can modify this answer if you'd rather have a full explanation. I'll be using Caml-like notation for types so s is a string, s -> s is a function that to a string associates a string, and s -> (s -> s) is a function that to a string associates a function from string to string, etc. I denote x : t when variable x has type t.
λx.ax : s -> s provided a : s -> s
(λx.yx)((λu.uu)(λz.z)i) : s provided y : s -> s, i : s as we have seen by the reductions above
λx.x : s -> s. But watch out, λx.x : (s -> s) -> (s -> s) is also true. Deciding the type is hard
How you may solve this problem programmatically
Your problem is slightly easier, because you are only dealing with string replacements, so you know that the base type of everything is a string, and you can try to "type your way up" until you are either able to type the whole rewriting (i.e. no errors), or able to prove that the rewriting is not typable (see Omega) and spit an error. Watch out though: just because you are not able to type a rewriting does not mean that it cannot be typed!
In your example (λx.ax)(de), you know the actual values for a, d and e, so you may have for instance a : s -> s, d : s -> s, e : s, hence de : s and λx.ax : s -> s, so the whole thing has type s and you're good to go. You can probably write a compiler that will try to type the rewriting and figure out whether it's typable or not, based on a set of cleverly-crafted decision rules for the specific use that you want. You can even decide that if the compiler fails to type a rewriting, then it is rejected (even when it's valid), because cases so intricate that they fail despite being correct should never happen in a reasonable editor text-substitution scenario.
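As a minimal sketch of that "type your way up" idea (my own illustration, assuming explicitly annotated binders rather than full type inference; all names here are made up):
-- The only base type is S (a string); t1 :-> t2 is a function type.
data Ty = S | Ty :-> Ty deriving (Eq, Show)
infixr 5 :->

data Tm = V String | L String Ty Tm | A Tm Tm

type Ctx = [(String, Ty)]

-- Just the type if the term checks, Nothing if it should be rejected.
typeOf :: Ctx -> Tm -> Maybe Ty
typeOf ctx (V x)     = lookup x ctx
typeOf ctx (L x t b) = (t :->) <$> typeOf ((x, t) : ctx) b
typeOf ctx (A f a)   = do
  tf <- typeOf ctx f
  ta <- typeOf ctx a
  case tf of
    t1 :-> t2 | t1 == ta -> Just t2
    _                    -> Nothing
On the example above, with a context giving a and d the type S :-> S and e the type S, the (λx.ax)(de) term checks with type S, matching the reasoning above, while Omega-style self-applications fail for every possible annotation.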
Do you want to solve this programmatically?
No. I mean it. You really don't want to.
Remember, λ-terms describe all computable functions.
If you were to really implement a fully correct warning generator as you seem to intend to, this means that one could encode any program as a string substitution in your editor, so your editor would essentially be a programming language of its own, which is typically not what you want your editor to be.
I mean, writing a full program that queries the webcam, analyzes the image, guesses who is typing, and loads an ad based on Google's suggestions when opening a file, all as a cleverly written text-substitution command, really? Is that what you're trying to achieve?
PS: If this is indeed what you're trying to achieve, have a look at Lisp and Haskell, their syntax should look somewhat... familiar to what you've seen here.
And good luck with your editor !

What kind of lens combinator is this?

I’m making a lens combinator in Ruby and I can’t figure out what the generally accepted name of it is. The unknown function composes two lenses that have the same source type and their target type (using Benjamin C. Pierce’s terminology) is a hash map. The unknown function takes in those two lenses and returns a new lens which has the same source type and target type as the original lenses.
It looks like this (Ruby syntax):
lens_a.get(source)
> {:title => "some title"}
lens_b.get(source)
> {:description => "some description"}
new_lens = unknown_function(lens_a, lens_b)
new_lens.get(source)
> {:title => "some title", :description => "some description"}
A diagram of the combinator I’m trying to build can be seen on slide 18 of this presentation (the slide’s title is “Merge?”).
I’ve looked at Haskell's lens docs (small parts of which I can understand), but I can’t figure out which combinator this is.
What is the standard name for the unknown_function above? If this lens doesn’t have a standard name, are there a few standard functions that can be composed to make it? If not, I’ll probably just call it merge.
I believe that roughly the right idea for your combinator, in Haskell, is
merge :: Lens' s t -> Lens' s t -> Lens' s (t, t)
possibly generalized to
merge :: Lens' s t -> Lens' s t' -> Lens' s (t, t')
so that your two targets may differ in type. We could "implement" this as follows, but doing so will reveal a problem
merge l1 l2 = lens pull push where
  pull s = (view l1 s, view l2 s)
  push s (t1, t2) = set l2 t2 (set l1 t1 s)
In particular, on the Setter side of this equation we explicitly order the way we put back our values. This means that we can use merge to easily build lenses which violate the lens laws. In particular, they violate the PUT-GET law. Here's a counterexample
newtype Only a = Only a
-- A lens focused on the only slot in Only
only :: Lens' (Only a) a
only inj (Only a) = Only <$> inj a
-- We merge this lens with *itself*
testLens :: Lens' (Only a) (a, a)
testLens = merge only only
-- and now it violates the lens laws
> view testLens (Only 1 & testLens .~ (1,2))
(2,2)
Or, in plain English, if we merge a lens with itself then the Setter side turns into two sequential sets on the same location. This means that if we try to set it with a pair of values which are different, only the second set will persist, and thus we violate the PUT-GET law.
The lens library tends to eschew invalid lenses so this combinator isn't available. The closest we have is alongside which has the following (restricted) type
alongside :: Lens' s a
-> Lens' s' a'
-> Lens' (s,s') (a,a')
but, as you can see, this ensures that our source is also a product type so that the Setters apply uniquely to each side of the source. If we were to try to write a dup Lens which we could compose with alongside to build merge, we would run into the same problem as before
dup :: Lens' a (a, a) -- there are two possible implementations, neither are lenses
dup1 inj a = fst <$> inj a
dup2 inj a = snd <$> inj a
Now, all of this might be fairly technical and meaningless from a Ruby perspective. In Ruby you're not going to have compiler enforced type safety and so you'll likely blend a lot of the rules together and achieve less strictness. In such a case, it might be meaningful to implement merge with its sequential writes semantics.
In fact, based on the examples you've given, it looks like the counterexample I pointed out cannot even happen in Ruby since your target type is a Hash and ensures unique keys already.
But it's important to note that merge can be used to build improper lenses if you look at it hard enough. Fundamentally what that means is that in more complex scenarios, using merge can easily lead to weird bugs caused by it violating our intuitive sense for what a lens is. This may not be a huge problem for Ruby as complex abstraction is frowned upon in the Ruby community anyway, but that's why we have the laws.
Seems like you want to combine two lenses into a traversal. I don't see that combinator anywhere, but I'm new to lens so I'm not sure what it would be called anyway.
Something like:
compoundEye :: Traversable tl => tl (Lens s t a b) -> Traversal s t a b
But the Hoogle results are not promising.
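For what it's worth, one way to write such a combination with the lens library is the following sketch (the name mergeT is made up here, and the result is only a lawful Traversal when the two lenses focus on disjoint parts of the source):
import Control.Lens

-- Visit the focus of both lenses; the writes happen in order, left then right.
mergeT :: Lens' s a -> Lens' s a -> Traversal' s a
mergeT l1 l2 f s =
  (\a1 a2 -> set l2 a2 (set l1 a1 s)) <$> f (view l1 s) <*> f (view l2 s)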

Haskell: Caches, memoization, and referential transparency [duplicate]

I can't figure out why m1 is apparently memoized while m2 is not in the following:
m1 = ((filter odd [1..]) !!)
m2 n = ((filter odd [1..]) !! n)
m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.
GHC does not memoize functions.
It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:
m1' = (!!) (filter odd [1..]) -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n
(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)
Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
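One way to see this for yourself (a small experiment, not part of the original code) is to attach Debug.Trace to the shared expression and watch how often the message fires when the module is loaded without optimizations:
import Debug.Trace (trace)

m1 :: Int -> Integer
m1 = (trace "building the list" (filter odd [1..]) !!)

m2 :: Int -> Integer
m2 n = trace "building the list" (filter odd [1..]) !! n
Calling m1 twice in GHCi prints the message once; calling m2 twice prints it on both calls.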
Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function
f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])
GHC might notice that y does not depend on x and rewrite the function to
f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])
In this case, the new version is much less efficient because it will have to read about 1 GB from memory where y is stored, while the original version would run in constant space and fit in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations than it is compiled with -O2.
m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.
See the GHC wiki on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form
There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not m2, because m2 has explicitly given arguments. So m2's type is general but m1's is specific. The types they are assigned are:
m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a
Most Haskell compilers and interpreters (all of them that I know of, actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, whereas m1's is not.
I'm not sure, because I'm quite new to Haskell myself, but it appears that it's because the second function is parametrized and the first one is not. The nature of a function is that its result depends on the input value, and in the functional paradigm especially it depends ONLY on the input. The obvious implication is that a function with no parameters always returns the same value, over and over, no matter what.
Apparently there's an optimizing mechanism in the GHC compiler that exploits this fact to compute the value of such a function only once for the whole program runtime. It does it lazily, to be sure, but does it nonetheless. I noticed it myself when I wrote the following function:
primes = filter isPrime [2..]
  where isPrime n = null [factor | factor <- [2..n-1], factor `divides` n]
          where f `divides` n = (n `mod` f) == 0
Then to test it, I entered GHCi and wrote: primes !! 1000. It took a few seconds, but finally I got the answer: 7927. Then I called primes !! 1001 and got the answer instantly. Similarly, in an instant I got the result for take 1000 primes, because Haskell had already computed the whole thousand-element list in order to return the 1001st element earlier.
Thus if you can write your function such that it takes no parameters, you probably want it. ;)

lenses, fclabels, data-accessor - which library for structure access and mutation is better

There are at least three popular libraries for accessing and manipulating fields of records. The ones I know of are: data-accessor, fclabels and lenses.
Personally I started with data-accessor and I'm using it now. However, recently on haskell-cafe there was an opinion that fclabels is superior.
Therefore I'm interested in comparison of those three (and maybe more) libraries.
There are at least 4 libraries that I am aware of providing lenses.
The notion of a lens is that it provides something isomorphic to
data Lens a b = Lens (a -> b) (b -> a -> a)
providing two functions: a getter, and a setter
get (Lens g _) = g
put (Lens _ s) = s
subject to three laws:
First, that if you put something, you can get it back out
get l (put l b a) = b
Second that getting and then setting doesn't change the answer
put l (get l a) a = a
And third, putting twice is the same as putting once, or rather, that the second put wins.
put l b1 (put l b2 a) = put l b1 a
Note, that the type system isn't sufficient to check these laws for you, so you need to ensure them yourself no matter what lens implementation you use.
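For instance, with the simple representation above, the laws can be written down as ordinary Bool-valued properties (a sketch; given Eq instances, and Arbitrary instances if you want to hand them to QuickCheck):
prop_GetPut :: Eq b => Lens a b -> a -> b -> Bool
prop_GetPut l a b = get l (put l b a) == b

prop_PutGet :: Eq a => Lens a b -> a -> Bool
prop_PutGet l a = put l (get l a) a == a

prop_PutPut :: Eq a => Lens a b -> a -> b -> b -> Bool
prop_PutPut l a b1 b2 = put l b1 (put l b2 a) == put l b1 a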
Many of these libraries also provide a bunch of extra combinators on top, and usually some form of template haskell machinery to automatically generate lenses for the fields of simple record types.
With that in mind, we can turn to the different implementations:
Implementations
fclabels
fclabels is perhaps the most easily reasoned about of the lens libraries, because its a :-> b can be directly translated to the above type. It provides a Category instance for (:->) which is useful as it allows you to compose lenses. It also provides a lawless Point type which generalizes the notion of a lens used here, and some plumbing for dealing with isomorphisms.
One hindrance to the adoption of fclabels is that the main package includes the template-haskell plumbing, so the package is not Haskell 98, and it also requires the (fairly non-controversial) TypeOperators extension.
data-accessor
[Edit: data-accessor is no longer using this representation, but has moved to a form similar to that of data-lens. I'm keeping this commentary, though.]
data-accessor is somewhat more popular than fclabels, in part because it is Haskell 98. However, its choice of internal representation makes me throw up in my mouth a little bit.
The type T it uses to represent a lens is internally defined as
newtype T r a = Cons { decons :: a -> r -> (a, r) }
Consequently, in order to get the value of a lens, you must submit an undefined value for the 'a' argument! This strikes me as an incredibly ugly and ad hoc implementation.
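Spelled out, my reading of that representation is roughly the following (the helper names here are made up, not data-accessor's actual API):
-- The single function plays both roles: pass a value you never look at to
-- read, pass a real value to write.
getT :: T r a -> r -> a
getT l r = fst (decons l undefined r)

setT :: T r a -> a -> r -> r
setT l a r = snd (decons l a r)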
That said, Henning has included the template-haskell plumbing to automatically generate the accessors for you in a separate 'data-accessor-template' package.
It has the benefit of a decently large set of packages that already employ it, being Haskell 98, and providing the all-important Category instance, so if you don't pay attention to how the sausage is made, this package is actually a pretty reasonable choice.
lenses
Next, there is the lenses package, which observes that a lens can provide a state monad homomorphism between two state monads, by defining lenses directly as such monad homomorphisms.
If it actually bothered to provide a type for its lenses, they would have a rank-2 type like:
newtype Lens s t = Lens (forall a. State t a -> State s a)
As a result, I rather don't like this approach, as it needlessly yanks you out of Haskell 98 (if you want a type to provide to your lenses in the abstract) and deprives you of the Category instance for lenses, which would let you compose them with (.). The implementation also requires multi-parameter type classes.
Note, all of the other lens libraries mentioned here provide some combinator or can be used to provide this same state focalization effect, so nothing is gained by encoding your lens directly in this fashion.
Furthermore, the side-conditions stated at the start don't really have a nice expression in this form. As with 'fclabels', this does provide a template-haskell method for automatically generating lenses for a record type directly in the main package.
Because of the lack of Category instance, the baroque encoding, and the requirement of template-haskell in the main package, this is my least favorite implementation.
data-lens
[Edit: As of 1.8.0, these have moved from the comonad-transformers package to data-lens]
My data-lens package provides lenses in terms of the Store comonad.
newtype Lens a b = Lens (a -> Store b a)
where
data Store b a = Store (b -> a) b
Expanded, this is equivalent to
newtype Lens a b = Lens (a -> (b, b -> a))
You can view this as factoring out the common argument from the getter and the setter to return a pair consisting of the result of retrieving the element, and a setter to put a new value back in. This offers the computational benefit that the 'setter' here can recycle some of the work used to get the value out, making for a more efficient 'modify' operation than in the fclabels definition, especially when accessors are chained.
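To illustrate, a modify written against the expanded form only takes the structure apart once (a rough sketch, not data-lens's actual code):
modifyL :: Lens a b -> (b -> b) -> a -> a
modifyL (Lens f) g a = case f a of
  Store setb b -> setb (g b)   -- one pass yields both the old value and its setter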
There is also a nice theoretical justification for this representation, because the subset of 'Lens' values that satisfy the 3 laws stated in the beginning of this response are precisely those lenses for which the wrapped function is a 'comonad coalgebra' for the store comonad. This transforms 3 hairy laws for a lens l down to 2 nicely pointfree equivalents:
extract . l = id
duplicate . l = fmap l . l
This approach was first noted and described in Russell O'Connor's Functor is to Lens as Applicative is to Biplate: Introducing Multiplate and was blogged about based on a preprint by Jeremy Gibbons.
It also includes a number of combinators for working with lenses strictly and some stock lenses for containers, such as Data.Map.
So the lenses in data-lens form a Category (unlike the lenses package), are Haskell 98 (unlike fclabels/lenses), are sane (unlike the back end of data-accessor), and provide a slightly more efficient implementation. data-lens-fd provides the functionality for working with MonadState for those willing to step outside of Haskell 98, and the template-haskell machinery is now available via data-lens-template.
Update 6/28/2012: Other Lens Implementation Strategies
Isomorphism Lenses
There are two other lens encodings worth considering. The first gives a nice theoretical way to view a lens as a way to break a structure into the value of the field, and 'everything else'.
Given a type for isomorphisms
data Iso a b = Iso { hither :: a -> b, yon :: b -> a }
such that valid members satisfy hither . yon = id, and yon . hither = id
We can represent a lens with:
data Lens a b = forall c. Lens (Iso a (b,c))
These are primarily useful as a way to think about the meaning of lenses, and we can use them as a reasoning tool to explain other lenses.
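For example, get and put fall straight out of this encoding (a quick sketch; the existential Lens above needs ExistentialQuantification):
getI :: Lens a b -> a -> b
getI (Lens (Iso h _)) = fst . h

putI :: Lens a b -> b -> a -> a
putI (Lens (Iso h y)) b a = y (b, snd (h a))   -- keep the 'everything else', replace the field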
van Laarhoven Lenses
We can model lenses such that they can be composed with (.) and id, even without a Category instance by using
type Lens a b = forall f. Functor f => (b -> f b) -> a -> f a
as the type for our lenses.
Then defining a lens is as easy as:
_2 f (a,b) = (,) a <$> f b
and you can validate for yourself that function composition is lens composition.
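To see why no library support is needed, here is a sketch of recovering the getter and the modifier from such a lens using only Functor instances from base (viewL and overL are made-up names):
import Data.Functor.Const (Const(..))
import Data.Functor.Identity (Identity(..))

viewL :: ((b -> Const b b) -> a -> Const b a) -> a -> b
viewL l = getConst . l Const

overL :: ((b -> Identity b) -> a -> Identity a) -> (b -> b) -> a -> a
overL l f = runIdentity . l (Identity . f)

-- e.g. viewL _2 (1, "hi") == "hi" and overL _2 reverse (1, "hi") == (1, "ih")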
I've recently written on how you can further generalize van Laarhoven lenses to get lens families that can change the types of fields, just by generalizing this signature to
type LensFamily a b c d = forall f. Functor f => (c -> f d) -> a -> f b
This does have the unfortunate consequence that the best way to talk about lenses is to use rank 2 polymorphism, but you don't need to use that signature directly when defining lenses.
The Lens I defined above for _2 is actually a LensFamily.
_2 :: Functor f => (a -> f b) -> (c,a) -> f (c, b)
I've written a library that includes lenses, lens families, and other generalizations including getters, setters, folds and traversals. It is available on hackage as the lens package.
Again, a big advantage of this approach is that library maintainers can actually create lenses in this style in your libraries without incurring any lens library dependency whatsoever, by just supplying functions with type Functor f => (b -> f b) -> a -> f a, for their particular types 'a' and 'b'. This greatly lowers the cost of adoption.
Since you don't need to actually use the package to define new lenses, it takes a lot of pressure off my earlier concerns about keeping the library Haskell 98.
