Notes: How to take a derivative of a generalized continued fraction
Entire books have been written about generalized continued fractions1⁻2, and there is a great review article on numerical evaluation3. Of course the article on Wikipedia is also good. But I didn’t find an explanation of how to compute a derivative of a generalized continued fraction in any of these, which is why I wrote up these notes. Anyway, let’s start at the beginning.
Simple continued fractions
Before we get into generalized continued fractions, we should start with their predecessors, just plain continued fractions. Traditionally, a continued fraction was a way to represent any real number – for example, – by the closest rational approximations. A simple algorithm lets you generate an infinite sequence of integers by subtracting off one rational approximation and finding the integer whose reciprocal is closest to the remainder. The sequence is our way of denoting
Nesting all those fractions can make things look messy, so people usually resort to a kind of hacky notation,
If this sequence terminates, then it’s a rational number. For a number like , there is an obvious pattern,
In fact, a “quadratic irrational” will have a continued fraction with a periodic sequence of integers! Other numbers have patterns without repeating, while some just don’t have any pattern you can spot…
Obviously we’re not going to be taking derivatives of constants, so let’s move on to generalized continued fractions.
Generalized continued fractions
It’s not hard to imagine how to generalize the classical continued fractions so that they depend on some parameter . This kind of thing actually comes up quite naturally in the theory of three term recurrence relations, which is used for solving certain differential equations – and lots of other places in math. The continued fraction representation sometimes converges much more quickly than other ways of computing a function. A generalized continued fraction looks like
So, now we have two sequences, and , and they both depend on some parameter (or multiple parameters!). One example is
that is, it’s given by the two sequences
If we throw away all the terms with for some , then we get a sequence of “approximants” or “convergents”. For example let’s say we use this above continued fraction to try to evaluate , by evaluating at and multiplying the final result by 4. Then we would get the sequence
The last of these is good to about 0.004% (note that this is not as good as the best continued fraction for with the same number of terms, but that is a different question).
How to take a derivative of a generalized continued fraction
Suppose we’re given a function that we only know in terms of its continued fraction representation, and we want to compute its derivative . The first thing you might try (well, that I tried) is to apply the quotient rule and chain rule on the expression in Eq. \eqref{eq:general}. This leads to an explosion of algebra but not an answer.
Instead of working with the notation of an infinite nested fraction, we will instead think about the value of a continued fraction in terms of its convergents. That is, when the CF converges, its value is the limit of the convergents,
Here you truncate the terms with , then do the algebra to clear out denominators and make an ordinary fraction , with no nested fractions. This sequence of ratios starts off
All the way back in 1655/6 in his text Arithmetica Infinitorum, John Wallis showed that the and satisfy a recurrence (which you can prove by induction),
which starts off with a fake “-1” term and the zeroth term,
As an aside, the ’s and ’s may grow exponentially4 and lead to a loss of precision on the computer. To avoid this, there are various improvements to Wallis’s original algorithm, one of which we will discuss below.
So what does this have to do with evaluating the derivative? Well, starting from the limit definition of the CF, and assuming the CF converges absolutely in a neighborhood so we can bring the derivative inside the limit, we will find the derivative from
So, if we know how to compute the derivatives and , we’ll be in business. All we have to do is differentiate Eqs. \eqref{eq:A_recurrence} and \eqref{eq:B_recurrence} to get the recurrence relations,
Here think of the and as derivatives that you calculate by hand, since you know the original functions; but the terms and as values in a recurrence that we compute from the bottom up. Of course we need initial values, which we get by differentiating Eq. \eqref{eq:AB_init},
Hopefully it is clear how to generalize this to continued fractions that depend on k variables: each of the derivatives above is just replaced by a k-dimensional gradient (co)vector, and the result of all the recurrences is the (k-dimensional) gradient df.
Similarly, if you want the Hessian matrix, or any higher derivative tensor, just apply more partial derivatives, and keep track of more auxiliary variables.
Automatic differentiation point of view
As a note here, I should thank Rob Corless who helped me ensure I understood how to do the above, and emphasized another point of view. This point of view is as follows: we should not distinguish between (a) some abstract mathematical function and (b) a computer algorithm that can be used to produce arbitrarily precise numerical values from that function. Just like power series or integrals or continued fractions or other representations, a computer algorithm is a representation of a function. Now for different representations of , we may find various representations of – maybe a series or integral or the algorithm up above.
How, in general, do you find the derivative of a numerical algorithm? Your first temptation might be to use finite difference. But we can do much better. In fact, every algorithm (for a differentiable function) contains basically all the information on how to compute its derivative (or gradient, with more arguments). Usually this is expressed in terms of “automatic differentiation” and/or “dual numbers”. Replace every number x with a pair . Now define algebraic operations on these dual numbers, for example,
and so on. Here we see that second argument is just expressing linearity of the derivative, the product rule, and the quotient rule. Now as your algorithm is doing some calculation, it is also keeping track of the derivative – as long as it knows a few basic rules like these.
If you are working with a language that allows polymorphism or generics, then you can promote any numerical algorithm to one that can automatically compute its own derivative (in “forward mode” auto-diff). Just build an algebraic type for dual numbers and overload all of its arithmetic operations. You can also make specializations for special functions when you can compute the derivative by hand, for example
So, if you have an implementation of an algorithm for computing a continued fraction, then you can automatically get an algorithm for computing the derivative of a continued fraction. Or, you can implement the derivative of Wallis’s algorithm above, or for the modified Lentz method below.
Modified Lentz method
To avoid the possibly exponential growth of the and coefficients, the modified Lentz method4 instead constructs a recurrence for their successive ratios,
(but we never actually need the ’s or ’s). From the recurrence relations for and , we get the new recurrence relations
And finally to compute the CF, we multiply these successive ratios together,
As before we need to start the recurrence with initial conditions,
But, there is a danger in the Lentz method, because of the division steps involved in Eq. \eqref{eq:CD_recur}. To avoid this potential pitfall, is set to a tiny but non-zero value if it ever exactly cancels (this includes in the initial condition \eqref{eq:lentz_IC}). Similarly, if the denominator of ever exactly cancels, then is replaced with the reciprocal of that tiny number.
Finally, this algorithm needs a stopping condition. This is usually determined by testing if the absolute change is smaller than your desired tolerance.
Derivatives in modified Lentz method
To compute the derivative of a CF using the modified Lentz method, we again assume we’re handed methods to compute and (or k-dimensional gradients). Then we differentiate all the recurrence relations above, to find a recurrence for the derivatives,
Here the only dangerous division is by , which is replaced with a tiny number if it exactly vanishes.
Code example
Pseudocode for the modified Lentz method is listed in5 or in the
freely-available article4. I implemented this in python in my
package qnm
, since computing
the quasinormal mode frequencies of black holes requires finding roots
of continued fraction equations. Here let me list a version that
would also compute a derivative of the continued fraction at the same
time. The user should specify functions a, b, da, db
that will
return the values of . I have also modified
the stopping condition so that it can be made to perform a minimum or
maximum number of iterations (steps of the recursion).
import numpy as np
def lentz_with_grad(a, b, da, db,
args=(),
tol=1.e-10,
N_min=0, N_max=np.Inf,
tiny=1.e-30):
"""Compute a continued fraction (and its derivative) via modified
Lentz's method.
This implementation is by the book [1]_. The value to compute is:
b_0 + a_1/( b_1 + a_2/( b_2 + a_3/( b_3 + ...)))
where a_n = a(n, *args) and b_n = b(n, *args).
Parameters
----------
a: callable returning numeric.
b: callable returning numeric.
da: callable returning array-like.
db: callable returning array-like.
args: tuple [default: ()]
Additional arguments to pass to the user-defined functions a, b,
da, and db. If given, the additional arguments are passed to
all user-defined functions, e.g. `a(n, *args)`. So if, for
example, `a` has the signature `a(n, x, y)`, then `b` must have
the same signature, and `args` must be a tuple of length 2,
`args=(x,y)`.
tol: float [default: 1.e-10]
Tolerance for termination of evaluation.
N_min: int [default: 0]
Minimum number of iterations to evaluate.
N_max: int or comparable [default: np.Inf]
Maximum number of iterations to evaluate.
tiny: float [default: 1.e-30]
Very small number to control convergence of Lentz's method when
there is cancellation in a denominator.
Returns
-------
(float, array-like, float, int)
The first element of the tuple is the value of the continued
fraction.
The second element is the gradient.
The third element is the estimated error.
The fourth element is the number of iterations.
References
----------
.. [1] WH Press, SA Teukolsky, WT Vetterling, BP Flannery,
"Numerical Recipes," 3rd Ed., Cambridge University Press 2007,
ISBN 0521880688, 9780521880688 .
"""
if not isinstance(args, tuple):
args = (args,)
f_old = b(0, *args)
if (f_old == 0):
f_old = tiny
C_old = f_old
D_old = 0.
# f_0 = b_0, so df_0 = db_0
df_old = db(0, *args)
dC_old = df_old
dD_old = 0.
conv = False
j = 1
while ((not conv) and (j < N_max)):
aj, bj = a(j, *args), b(j, *args)
daj, dbj = da(j, *args), db(j, *args)
# First: modified Lentz
D_new = bj + aj * D_old
if (D_new == 0):
D_new = tiny
D_new = 1./D_new
C_new = bj + aj / C_old
if (C_new == 0):
C_new = tiny
Delta = C_new * D_new
f_new = f_old * Delta
# Second: the derivative calculations
# The only possibly dangerous denominator is C_old,
# but it can't be 0 (at worst it's "tiny")
dC_new = dbj + (daj*C_old - aj*dC_old)/(C_old*C_old)
dD_new = -D_new*D_new*(dbj + daj*D_old + aj*dD_old)
df_new = df_old*Delta + f_old*dC_new*D_new + f_old*C_new*dD_new
# Did we converge?
if ((j > N_min) and (np.abs(Delta - 1.) < tol)):
conv = True
# Set up for next iter
j = j + 1
C_old = C_new
D_old = D_new
f_old = f_new
dC_old = dC_new
dD_old = dD_new
df_old = df_new
# Success or failure can be assessed by the user
return f_new, df_new, np.abs(Delta - 1.), j-1
Now let’s demonstrate this with a calculation of the continued fraction for . Of course the tangent function is already included in almost any numerical library… and from first year calculus, we know that its derivative is , which we can also compute from a numerical library. But sometimes we don’t have these luxuries! Anyway, the continued fraction is
which is given by the two sequences
Taking derivatives we immediately get
We can code these up in a few lines:
def tanx_a(n, x):
return x if n==1 else -x*x
def tanx_b(n, x):
return 0. if n==0 else 2*n-1
def tanx_da(n, x):
return 1. if n==1 else -2*x
def tanx_db(n, x):
return 0.
And finally, let’s call the routine to evaluate the continued fraction
, which will also compute the derivative,
. We specify with the parameter args=1.
(which
should be a tuple for functions that take more than one parameter).
Let’s ask for 15 digits of accuracy:
>>> lentz_with_grad(tanx_a, tanx_b,
... tanx_da, tanx_db,
... args=1., tol=1.e-15)
(1.5574077246549018, 3.4255188208147596, 2.220446049250313e-16, 10)
The return value tells us that ,
, the estimated error on the
function value is , and it only took 10
iterations to compute these two numbers! Just for peace of mind,
let’s check these values with the library routines in numpy
(which
are really from libm
)
>>> (np.tan(1.), 1/np.cos(1.)**2)
(1.557407724654902, 3.425518820814759)
They agree! And indeed, the difference between the CF approach and the result from the standard library is about , in agreement with the estimated error.
References
-
Hall, Analytic theory of continued fractions, (1948), Chelsea publishing company. ↩
-
Cuyt et al., Handbook for continued fractions for special functions, (2008), Springer. ↩
-
Blanch, Numerical Evaluation of Continued Fractions, SIAM Review, 6(4), 383-421 (1964). ↩
-
Press and Teukolsky, Evaluating continued fractions and computing exponential integrals, Computers in Physics, 2(5), 88-89 (1988). ↩ ↩2 ↩3
-
Press et al., Numerical Recipes, 3rd ed. (2007), Cambridge University Press. ↩