The Art of Doing Science and Engineering

The Art of Doing Science and Engineering features some mathematics that I found a little difficult to follow. These are some notes I've made to help fill in the gaps.

Chapter 1

Hamming makes two claims:

  • Scientific knowledge doubles every 17 years
  • 90% of scientists who have ever lived are now alive He makes these claims in order to give an example of back-of-the-envelope calculations that you can do as part of sense checking. In this case, we want to check if these claims are compatible with each other.

He begins by assuming that the number of scientists at any time tt is:

y(t)=aebty(t) = ae^{bt}

He also assumes that the amount of knowledge produced annually has a constant kk proportionality to the number of scientists alive. So we can write our equation for the amount of scientific knowledge produced in year tt as

p(t)=kaebtp(t) = kae^{bt}

We can then get the total amount of scientific knowledge by calculating the integral of this equation between the limits -\infty and the current time TT.

  • The integral of ebte^{bt} is 1bet\frac{1}{b}e^t so when we multiply by the constants kaka, the indefinite integral of kaebtkae^{bt} is kaebtdt=kabebt+C\int_{} kae^{bt} dt = \frac{ka}{b}e^{bt} + C
  • At the lower limit, we can see from our plot that ebte^{bt} approaches 0 as tt gets smaller and as the whole term is multiplied by ebte^{bt}, assuming that b>0b > 0. This means that limxkabebt=0\lim_{x\to-\infty}\frac{ka}{b}e^{bt} = 0
  • At the upper limit t=Tt = T, so kabebT\frac{ka}{b}e^{bT}
  • The definite integral is the difference between the antiderivative evaluated at these points. In this case, it's a very straightforward sum kabebT0=kabebT\frac{ka}{b}e^{bT} - 0 = \frac{ka}{b}e^{bT} The process is exactly the same when finding the sum of knowledge up to 17 years ago, except our upper limit will be kabeb(T17)\frac{ka}{b}e^{b(T - 17)}.

So, to summarise:

Tkaebt dt=kabebTT17kaebt dt=kabeb(T17)\begin{aligned} \int_{-\infty}^{T} kae^{bt} \ dt &= \frac{ka}{b}e^{bT}\\ \int_{-\infty}^{T-17} kae^{bt} \ dt &= \frac{ka}{b}e^{b(T - 17)} \end{aligned}

Hamming's claim is that knowledge has been doubling every 17 years so we can say:

12=T17kaebtdtTkaebtdt=kabeb(T17)kabebT=eb(T17)ebT=eb(T17)bT=e17b\begin{aligned} \frac{1}{2} &= \frac{\int_{-\infty}^{T-17} kae^{bt} \, dt}{\int_{-\infty}^{T} kae^{bt} \, dt}\\ \\ &= \frac{\frac{ka}{b}e^{b(T-17)}}{\frac{ka}{b}e^{bT}}\\ \\ &= \frac{e^{b(T-17)}}{e^{bT}}\\ \\ &= e^{b(T-17) - bT} \\ \\ &= e^{-17b} \\ \end{aligned}

Above we are using the law of exponents that dividing exponential expressions with the same base is equivalent to subtracting their exponents.

Hamming estimates the length of a scientific career to be 55 years. Using his original equation for the number of scientists in a given year y(t)=aebty(t) = ae^{bt} we can divide the integral for the equation up to the current year by the integral of the equation up to 55 years ago.

T55TaebtdtTaebtdt=kabebTkabeb(T55)kabebT0=ebTeb(T55)ebT=1eb(T55)bT=1e55b\begin{aligned} \frac{\int_{T - 55}^{T}ae^{bt}dt}{\int_{-\infty}^{T}ae^{bt}dt} &= \frac{\frac{ka}{b}e^{bT} - \frac{ka}{b}e^{b(T - 55)}}{\frac{ka}{b}e^{bT} - 0}\\ \\ &= \frac{e^{bT} - e^{b(T - 55)}}{e^{bT}}\\ \\ &= 1 - e^{b(T - 55) - bT}\\ \\ &= 1 - e^{-55b} \end{aligned}

We can use our expression for the doubling of scientific knowledge e17b=12e^{-17b} = \frac{1}{2} to get the proportion of scientists alive today. We can use the law that ekx=(ex)ke^{kx} = (e^x)^k.

e17b=12(e17b)5517=(12)5517e17b5517=(12)5517e55b=(12)5517\begin{aligned} e^{-17b} &= \frac{1}{2}\\ \\ \left(e^{-17b}\right)^\frac{55}{17} &= \left(\frac{1}{2}\right)^\frac{55}{17}\\ \\ e^{-17b \cdot \frac{55}{17}} &= \left(\frac{1}{2}\right)^\frac{55}{17}\\ \\ e^{-55b} &= \left(\frac{1}{2}\right)^\frac{55}{17}\\ \end{aligned}

Substituting this back into the equation:

T55TaebtdtTaebtdt=1(12)5517=0.894...\begin{aligned} \frac{\int_{T - 55}^{T}ae^{bt}dt}{\int_{-\infty}^{T}ae^{bt}dt} &= 1 - \left(\frac{1}{2}\right)^\frac{55}{17}\\ \\ &= 0.894... \end{aligned}

This number is indeed very close to 90%.

Let's now come at the question from a different angle. Let's let DD be the doubling period and LL be the length of a scientific career.

The first equation becomes

ebD=12e^{-bD} = \frac{1}{2}

and the second becomes

1ebL=9101(12)LD=910(12)LD=110ln((12)LD)=ln(110)LDln(12)=ln(110)LD=ln(110)ln(12)LD=ln(1)ln(10)ln(1)ln(2)LD=ln(10)ln(2)LD=ln(10)ln(2)LD=log2(10)=3.3219...\begin{aligned} 1 - e^{-bL} &= \frac{9}{10}\\ \\ 1 - \left(\frac{1}{2}\right)^\frac{L}{D} &= \frac{9}{10}\\ \\ \left(\frac{1}{2}\right)^\frac{L}{D} &= \frac{1}{10}\\ \\ \ln\left(\left(\frac{1}{2}\right)^\frac{L}{D}\right) &= \ln\left(\frac{1}{10}\right)\\ \\ \frac{L}{D} \cdot \ln\left(\frac{1}{2}\right) &= \ln\left(\frac{1}{10}\right)\\ \\ \frac{L}{D} &= \frac{\ln\left(\frac{1}{10}\right)}{\ln\left(\frac{1}{2}\right)}\\ \\ \frac{L}{D} &= \frac{\ln(1) - \ln(10)}{\ln(1) - \ln(2)}\\ \\ \frac{L}{D} &= \frac{-\ln(10)}{-\ln(2)}\\ \\ \frac{L}{D} &= \frac{\ln(10)}{\ln(2)}\\ \\ \frac{L}{D} &= \log_{2}(10)\\ \\ &= 3.3219... \end{aligned}

Multiplying 3.3219 by our supposed doubling period of 17 years gives us 56.47 years for the average length of a scientific career, very close to Hamming's estimate of 55 years.

Chapter 2

In this chapter, Hamming discusses growth models. The simplest growth model assumes the growth rate is proportional to the current size. For instance, in the case of compound interest. We can describe this model with a differential equation.

dydt=ky\frac{dy}{dt} = ky

A differential equation is an equation that relates a function to its derivatives. In the context of growth models, differential equations are used to describe how something changes over time. The equation above is a first-order differential equation where dtdy\frac{dt}{dy}​ represents the rate of change of yy with respect to time tt, and kyky suggests that this rate of change is proportional to the current value of yy. Here, kk is a constant of proportionality.

The solution to a differential equation is a function (or a set of functions) that satisfies the equation. This means that if you take the solution and its derivatives and plug them back into the original differential equation, the equation will hold. In other words, the left-hand side of the equation will equal the right-hand side for all points in the domain of the solution.

Hamming tells us the solution to the equation but skips over how to derive it. We do it like so:

  1. We start by rearranging the equation so that each variable is on a different side of the equation. 1ydy=kdt\frac{1}{y} \cdot dy = k \cdot dt
  2. Next we integrate both sides of the equation ln(y)=kt+C\ln(y) = kt + C
  3. We can then exponentiate both sides of the equation to get rid of the natural logarithm. eln(y)=ekt+Ce^{\ln(y)} = e^{kt + C}
  4. This simplifies to y=ekteCy = e^{kt} \cdot e^{C}
  5. Since eCe^C is just a constant we can represent it as just AA. Thus we get the solution given by Hamming: y(t)=Aekty(t) = Ae^{kt} We can think of AA as representing the initial condition of the system. The equation y(t)=Aekty(t) = Ae^{kt} then tells us how the quantity yy changes over time. If k>0k > 0 we have growth. If k<0k < 0 we have decay.

We can verify that this is a solution by:

  1. Differentiating the solution such that it is equal to the left-hand side of the differential equation (NB the derivate of ekte^{kt} is kektke^{kt}) dydt=kAekt\frac{dy}{dt} = kAe^{kt}
  2. Substituting the right-hand side of the solution into the right-hand side of the differential equation ky=k(Aekt)ky = k(Ae^{kt}) We see that both sides of the equation match.

Hamming now updates this model of growth to include a limiting factor LL.

dydt=ky(Ly)\frac{dy}{dt} = ky(L-y)

Hamming "reduces" the equation to a standard form, meaning we don't have to write the constants. He says let y=Lzy = Lz and x=t(kL)x = t(kL). Note that there is a typo in the book where they've written tkL2\frac{t}{kL^2}So substituting those in we get:

dydt=kLz(LLz)=kL2z(1z)\begin{aligned} \frac{dy}{dt} &= kLz(L - Lz) \\ &= kL^{2}z(1 - z) \end{aligned}

We can get dydt\frac{dy}{dt} in terms of dzdt\frac{dz}{dt} by differentiating y=Lzy = Lz with respect to tt: dydt=Ldzdt\frac{dy}{dt} = L\frac{dz}{dt} Given x=t(kL)x = t(kL) we can find dxdt\frac{dx}{dt}:

x=t(kL)dxdt=kL\begin{aligned} x &= t(kL) \\ \frac{dx}{dt} &= kL \end{aligned}

We can use this to express dzdt\frac{dz}{dt} in terms of dzdx\frac{dz}{dx}. The chain rule states that if we have a function z(x(t))z(x(t)), the derivative of zz with respect to tt will be dzdt=dzdxdxdt\frac{dz}{dt} = \frac{dz}{dx} \cdot \frac{dx}{dt}. So we can say that:

dzdt=dzdxdxdt=dzdxkL\begin{aligned} \frac{dz}{dt} &= \frac{dz}{dx} \cdot \frac{dx}{dt} \\ &= \frac{dz}{dx} \cdot kL \end{aligned}

Going back to the earlier equation dydt=Ldzdt\frac{dy}{dt} = L\frac{dz}{dt}, we can now say:

dydt=LdzdtkL2z(1z)=L(dzdxkL)=dzdxkL2dzdx=z(1z)\begin{aligned} \frac{dy}{dt} &= L\frac{dz}{dt} \\ kL^{2}z(1 - z) &= L (\frac{dz}{dx} \cdot kL) \\ &= \frac{dz}{dx} \cdot kL^2 \\ \frac{dz}{dx} &= z(1 - z) \\ \end{aligned}

This is the standard form derived by Hamming.

Suppose we wanted to integrate this equation to find zz in terms of xx. We might first rearrange it to separate the variables so it looks like this: 1z(1z)dz=dx\frac{1}{z(1 - z)}dz = dxThe left-hand side is a complex fraction. To make it easier to integrate, Hamming uses partial fractions. This means:

  1. Expressing 1z(1z)\frac{1}{z(1-z)} as a sum of simpler fractions Az\frac{A}{z} and B1z\frac{B}{1-z}
  2. Multiplying through the common denominator z(1z)z(1-z)
  3. Setting zz to various values to solve for AA and BB. In this case, we will use 0 and 1
1z(1z)=Az+B1z1=A(1z)+Bz1=A(10)+B0A=1   when z=01=A(11)+B1B=1   when z=11z(1z)=1z+11z\begin{aligned} \frac{1}{z(1-z)} &= \frac{A}{z} + \frac{B}{1-z} \\ 1 &= A(1 - z) + Bz \\ \\ 1 &= A(1 - 0) + B \cdot 0 \\ A &= 1\ \ \ \text when\ z = 0 \\ \\ 1 &= A(1 - 1) + B \cdot 1 \\ B &= 1\ \ \ \text when\ z = 1 \\ \\ \frac{1}{z(1-z)} &= \frac{1}{z} + \frac{1}{1-z} \\ \end{aligned}

This is much simpler to integrate. We know that the derivative of ln(x)\ln(x) is 1x\frac{1}{x} so 1z=ln(z)\int{\frac{1}{z}} = \ln(z). We have to remember to apply the chain rule when integrating 11z\frac{1}{1 - z} though. We can do this with substitution.

u=1zdu=dzdz=du11z=1udu=1udu=ln(u)+C=ln(1z)+C\begin{aligned} u &= 1 - z \\ du &= -dz \\ dz &= -du \\ \int{\frac{1}{1 - z}} &= \int{\frac{1}{u} \cdot -du} \\ &= -\int{\frac{1}{u}du} \\ &= -\ln(u) + C \\ &= -ln(1 - z) + C \end{aligned}

Therefore in summary:

1z+11zdz=dxln(z)ln(1z)=x+C\begin{aligned} \int{\frac{1}{z} + \frac{1}{1 - z} dz} &= \int{dx} \\ \\ \ln{(z)} - \ln{(1 - z)} &= x + C \end{aligned}

We can use the law of logarithms that ln(x)ln(y)=ln(xy)\ln(x) - \ln(y) = \ln(\frac{x}{y}). So:

ln(z1z)=x+Cz1z=ex+C\begin{aligned} \ln(\frac{z}{1 - z}) &= x + C \\ \frac{z}{1 - z} &= e^{x + C} \end{aligned}

Since ex+C=exeCe^{x+C} = e^x \cdot e^C and eCe^C is just a constant, we can replace it with a new constant which we'll call AA. So we now have:

z1z=Aex\begin{aligned} \frac{z}{1 - z} &= Ae^x \\ \end{aligned}

We can solve for zz like so:

z1z=Aexz=Aex(1z)z=AexAexzz+Aexz=Aexz(1+Aex)=Aexz=Aex1+Aex\begin{aligned} \frac{z}{1 - z} &= Ae^x \\ z &= Ae^x(1 - z) \\ z &= Ae^x - Ae^x \cdot z \\ z + Ae^x \cdot z &= Ae^x \\ z(1 + Ae^x) &= Ae^x \\ z &= \frac{Ae^x}{1 + Ae^x} \end{aligned}

To get the answer in the form given by Hamming we can divide the numerator and denominator by AexAe^x

z=Aex1+Aex=AexAex1Aex+AexAex=11+1Aex=11+(1A)ex\begin{aligned} z &= \frac{Ae^x}{1 + Ae^x} \\ \\ &= \frac{\frac{Ae^x}{Ae^x}}{\frac{1}{Ae^x} + \frac{Ae^x}{Ae^x}} \\ \\ &= \frac{1}{1 + \frac{1}{Ae^x}} \\ \\ &= \frac{1}{1 + (\frac{1}{A})e^{-x}} \end{aligned}

As Hamming points out, AA is determined by the initial conditions. By this, he means where you set tt or xx equal to 0. As xx approaches -\infty, the denominator will get larger and zz will approach 0. As it gets bigger, (1A)ex(\frac{1}{A})e^{-x} will approach 0 and zz will approach 1.

Hamming shows us a more flexible model for growth dzdx=za(1z)b,  (a,b>0)\frac{dz}{dx} = z^a(1 - z)^b,\ \ (a, b > 0) We can plot growth curves for different values of aa and bb to see how they change how the model behaves: We can find zz by separation of variables and integration. We can also find the steepest slope by differentiating the right-hand side and setting it equal to 0. To differentiate we use the product rule: d(uv)dz=udvdz+vdudz\frac{d(uv)}{dz} = u \cdot \frac{dv}{dz} + v \cdot \frac{du}{dz}Hence:

ddz[za(1z)b]=aza1(1z)b+zab(1z)b1(1)=0=aza1(1z)bzab(1z)b1=0=za1(1z)b1(a(1z)bz)=0=a(1z)bz=0\begin{aligned} \frac{d}{dz}[z^a(1 - z)^b] &= a \cdot z^{a-1} \cdot (1 -z)^b + z^a \cdot b \cdot (1 - z)^{b - 1} \cdot (-1) = 0 \\ &= a \cdot z^{a-1} \cdot (1 -z)^b - z^a \cdot b \cdot (1 - z)^{b - 1} = 0\\ &= z^{a - 1} \cdot (1 - z)^{b - 1} \cdot (a(1 - z) - bz) = 0 \\ &= a(1 - z) - bz = 0 \end{aligned}

Notice how we can factor out the terms za1z^{a-1} and (1z)b1(1 - z)^{b-1} in line 3 above. It took me a while to see this but it's obvious when you think about it that e.g. xy1x=xyx^{y - 1} * x = x^y.

From here it's easy to solve for zz:

a(1z)bz=0aazbz=0az(a+b)=0a=z(a+b)z=aa+b\begin{aligned} a(1 - z) - bz &= 0 \\ a - az - bz &= 0 \\ a - z(a + b) &= 0 \\ a &= z(a + b) \\ z &= \frac{a}{a + b} \end{aligned}

Substituting this value of zz back into the original differential equation dzdx=za(1z)b\frac{dz}{dx} = z^a(1 - z)^b gives us

dzdx=(aa+b)a(1aa+b)b=(aa+b)a(a+ba+baa+b)b=(aa+b)a(ba+b)b=aabb(a+b)a+b\begin{aligned} \frac{dz}{dx} &= \left(\frac{a}{a + b}\right)^a\left(1 - \frac{a}{a + b}\right)^b \\ &= \left(\frac{a}{a + b}\right)^a\left(\frac{a + b}{a + b} - \frac{a}{a + b}\right)^b \\ &= \left(\frac{a}{a + b}\right)^a\left(\frac{b}{a + b}\right)^b \\ &= \frac{a^ab^b}{(a+b)^{a+b}} \\ \end{aligned}

This expression represents the slope of the curve at the point z=aa+bz = \frac{a}{a + b} which is the steepest point of the curve. In the context of a growth model, this is the maximum growth rate.

Hamming draws our attention to two special cases. When a=ba = b, the maximum slope will be:

aabb(a+b)a+b=aaaa(a+a)a+a=a2a(2a)2a=a2a22aa2a=122a=22a\begin{aligned} \frac{a^ab^b}{(a+b)^{a+b}} &= \frac{a^aa^a}{(a+a)^{a+a}} \\ &= \frac{a^{2a}}{(2a)^{2a}} \\ &= \frac{a^{2a}}{2^{2a}a^{2a}} \\ &= \frac{1}{2^{2a}} \\ &= 2^{-2a} \\ \end{aligned}

When a=b=12a = b = \frac{1}{2} our differential equation will be

dzdx=za(1z)b=z12(1z)12=z(1z)dzz(1z)=dx\begin{aligned} \frac{dz}{dx} &= z^a(1 - z)^b \\ &= z^\frac{1}{2}(1 - z)^\frac{1}{2} \\ &= \sqrt{z(1 - z)} \\ \frac{dz}{\sqrt{z(1 - z)}} &= dx \\ \end{aligned}

We can integrate this by making a trigonometric substitution z=sin2(θ)z = \sin^2(\theta). Differentiating this with the chain rule tells us that dzdθ=2sin(θ)cos(θ)\frac{dz}{d\theta} = 2\sin(\theta)\cos(\theta) and dz=2sin(θ)cos(θ)dθdz = 2\sin(\theta)\cos(\theta)d\theta. Substituting this back into the differential equation gives:

2sin(θ)cos(θ)dθsin2(θ)(1sin2(θ))=dx2sin(θ)cos(θ)dθsin2(θ)cos2(θ)=dx2sin(θ)cos(θ)dθsin(θ)cos(θ)=dx2dθ=dx2dθ=dx2θ=x+Cθ=x+C2\begin{aligned} \frac{2\sin(\theta)\cos(\theta)d\theta}{\sqrt{\sin^2(\theta)(1 - \sin^2(\theta))}} &= dx \\ \frac{2\sin(\theta)\cos(\theta)d\theta}{\sqrt{\sin^2(\theta)\cos^2(\theta)}} &= dx \\ \frac{2\sin(\theta)\cos(\theta)d\theta}{\sin(\theta)\cos(\theta)} &= dx \\ 2d\theta &= dx \\ \int{2d\theta} &= \int{dx} \\ 2\theta &= x + C \\ \theta &= \frac{x + C}{2} \\ \end{aligned}

Having found θ\theta, we can substitute it back into z=sin2(θ)z = \sin^2(\theta) to get: z=sin2(x2+C),(Cx2π2C)z = \sin^2(\frac{x}{2} + C), (-C \le \frac{x}{2} \le \frac{\pi}{2} - C) Hamming tells us that the solution curve has a finite range. We can see that because sin2(θ)\sin^2(\theta) is always within 0 and 1. The solution is valid for θ\theta such that sin(θ)\sin(\theta) is real and non-negative. Hence the bounds Cx2π2C-C \le \frac{x}{2} \le \frac{\pi}{2} - C.

Chapter 9

Hamming begins chapter 9 by reminding us that we can extend the Pythagorean theorem into higher dimensions because the square of the diagonal is the sum of the squares of the individual mutually perpendicular sides D2=i=1nxi2D^2 = \sum^n_{i=1}{x^2_i} where xix_i are the lengths of the sides of the rectangular block in nn dimensions.

The Stirling Approximation

Next, he derives the Stirling approximation for n!n!. This is especially useful for large factorials. It also becomes increasingly accurate as nn increases. He starts by taking the natural log of n!n! to get ln(n!)=k=1nln(k)\ln(n!) = \sum^n_{k=1}{\ln(k)} He then finds the integral 1nln(x)dx\int^n_1\ln(x)dx by using integration by parts. This is a technique that comes from the product rule of differentiation. It is given by the formula udv=uvvdu\int{udv} = uv - \int{vdu} Hamming sets u=lnxu = \ln{x} and dv=dxdv = dx. It therefore follows that du=1xdxdu = \frac{1}{x}dx and v=dv=dx=xv = \int{dv} = \int{dx} = x. Substituting this into the integration by parts formula we get:

1nln(x)dx=ln(x)xx1xdx=ln(x)x1dx=ln(x)xx=(nln(n)n)(1ln(1)1)=nln(n)n+1\begin{aligned} \int^n_1{\ln(x)dx} &= \ln(x)x - \int{x}\frac{1}{x}dx \\ &= \ln(x)x - \int{1dx} \\ &= \ln(x)x - x \\ &= (n\ln(n) - n) - (1\ln(1) - 1) \\ &= n\ln(n) - n + 1 \end{aligned}

Hamming also shows us how we could use the trapezium rule to approximate the integral. (See my trapezium rule notes for how we get this formula).

1nlnxdx12ln1+ln2+ln3+ ... +12lnn\int^n_1{\ln{x}dx} \approx \frac{1}{2}\ln{1} + \ln{2} + \ln{3} +\ ...\ + \frac{1}{2}\ln{n}

Note that he appears to be assuming that we have divided the curve into n1n - 1 segments which would result in our term for the width of the trapeziums Δx\Delta x being equal to 1. That would explain why we don't see it in the equation above

Since ln1=0\ln{1} = 0, we can simplify this to 1nlnxdxln2+ln3+ ... +12lnn\int^n_1{\ln{x}dx} \approx \ln{2} + \ln{3} +\ ...\ + \frac{1}{2}\ln{n} It is a property of logarithms that the sum of logarithms is approximately equal to the logarithm of the product of terms. Thus the sum of ln2+ln3+ ... +12lnn\ln{2} + \ln{3} +\ ...\ + \frac{1}{2}\ln{n} can be approximated as ln(n!)\ln(n!).

The Stirling approximation of n!n! is nnen2πnn^n e^{-n} \sqrt{2\pi n}. Taking the log of this gets us: ln(n!)nln(n)n+ln(2πn)ln(n!) \approx n\ln(n) - n + \ln(\sqrt{2 \pi n}) The term ln2πn\ln{\sqrt{2\pi n}} is often neglected in rough approximations, leading to: ln(n!)nln(n)nln(n!) \approx n\ln(n) - n Hamming adds a term 12lnn\frac{1}{2}\ln{n} to account for the half contribution of the endpoint nn. He there is also a term +1+1 in his final result. ChatGPT suggests that it may be an adjustment or correction factor to improve the accuracy of the approximation for specific ranges of nn. In some numerical approximations, especially when dealing with sums and series, such correction factors are introduced to fine-tune the approximation, particularly for smaller values of nn where Stirling's approximation (which is more accurate for large nn) may not be as precise. Either way

k=1nlnknlnnn+1+12lnn\sum^n_{k=1}\ln{k} \approx n\ln{n} - n + 1 + \frac{1}{2}\ln{n}

Undoing the logs by taking the exponential of each side gives:

lnn\ln{n} to both terms, we get, finally:

k=1nlnknlnnn+1+12lnn\sum^n_{k=1}\ln{k} \approx n\ln{n} - n + 1 + \frac{1}{2}\ln{n}

Undoing the logs by taking the exponential of each side gives: n!Cnnennn! \approx Cn^ne^{-n}\sqrt{n} CC is a constant (not far from ee) independent of nn since we are approximating an integral by the trapezium rule and the error in the trapezium approximation increases more slowly as nn grows larger, and as CC is the limiting value.

This is the first form of Stirling's formula. Hamming skips deriving the limiting, at infinity, value of CC which is 2π=2.5066...\sqrt{2\pi} = 2.5066... (e=2.71828...e = 2.71828...). However, doing so would show us how we get the usual Stirling's formula for the factorial n!nnen2πnn! \approx n^ne^{-n}\sqrt{2 \pi n} Hamming provides the following table to give us a sense of the quality of the Stirling approximation.

nStirlingTrueStirling/True10.9221410.9221421.9190020.9595035.8362160.97270423.50618240.979425118.019171200.983496710.078187200.9862274980.395850400.98817839902.3955403200.989649359536.873628800.99079103598695.636288000.99170\begin{array}{l l l l} \hline n & \text{Stirling} & \text{True} & \text{Stirling/True} \\ \hline 1 & 0.92214 & 1 & 0.92214 \\ 2 & 1.91900 & 2 & 0.95950 \\ 3 & 5.83621 & 6 & 0.97270 \\ 4 & 23.50618 & 24 & 0.97942 \\ 5 & 118.01917 & 120 & 0.98349 \\ 6 & 710.07818 & 720 & 0.98622 \\ 7 & 4980.3958 & 5040 & 0.98817 \\ 8 & 39902.3955 & 40320 & 0.98964 \\ 9 & 359536.87 & 362880 & 0.99079 \\ 10 & 3598695.6 & 3628800 & 0.99170 \\ \hline \end{array}

He notes that as the numbers get larger, the ratio approaches 1 but the absolute differences get greater. Consider the two functions:

f(n)=n+ng(n)=n\begin{aligned} f(n) &= n + \sqrt{n} \\ g(n) &= n \end{aligned}

The limit of the ratio f(n)g(n)\frac{f(n)}{g(n)}, as nn approaches infinity, is 1. But the difference f(n)g(n)=nf(n) - g(n) = \sqrt{n} grows larger as nn increases.

Extending the factorial function to all positive real numbers

Hamming introduces the gamma function in the form of the integral

Γ(n)=0xn1exdx\Gamma(n) = \int^\infty_0{x^{n-1}e^{-x}dx}

which he tells us converges for all n>0n > 0.

For n>1n > 1 we integrate by parts again. We use

dv=exu=xn1\begin{aligned} dv &= e^{-x} \\ u &= x^{n-1} \end{aligned}

Hamming tells us that at the two limits, the integrated part is 0. This is because as xx \to \infty exe^{-x} tends towards 0 while as x0x \to 0 xn1x^{n-1} will tend towards 0 (remember this is for n>1n > 1).

The integration by parts formula is:

udv=uvvdu\int{udv} = uv - \int{vdu}

We can also quickly work out that

du=(n1)xn2dxv=ex\begin{aligned} du &= (n - 1)x^{n - 2}dx \\ v &= -e^{-x} \end{aligned}

Hamming tells us that at the limits where the integrated part is 0 we have the reduction formula:

Γ(n)=(n1)Γ(n1)\Gamma(n) = (n - 1)\Gamma(n - 1)

with Γ(1)=1\Gamma(1) = 1

Tags: Mathematics