The Art of Doing Science and Engineering features some mathematics that I found a little difficult to follow. These are some notes I've made to help fill in the gaps.
Chapter 1
Hamming makes two claims:
Scientific knowledge doubles every 17 years
90% of scientists who have ever lived are now alive
He makes these claims in order to give an example of back-of-the-envelope calculations that you can do as part of sense checking. In this case, we want to check if these claims are compatible with each other.
He begins by assuming that the number of scientists at any time t is:
$$y(t) = ae^{bt}$$
He also assumes that the amount of knowledge produced annually is proportional to the number of scientists alive, with constant of proportionality $k$. So we can write the amount of scientific knowledge produced in year $t$ as
$$p(t) = kae^{bt}$$
We can then get the total amount of scientific knowledge by calculating the integral of this equation between the limits $-\infty$ and the current time $T$.
The integral of $e^{bt}$ is $\frac{1}{b}e^{bt}$, so when we multiply by the constants $ka$, the indefinite integral of $kae^{bt}$ is
$$\int kae^{bt}\,dt = \frac{ka}{b}e^{bt} + C$$
At the lower limit, we can see from our plot that $e^{bt}$ approaches 0 as $t$ gets smaller (assuming that $b > 0$), and the whole term is multiplied by $e^{bt}$. This means that
$$\lim_{t \to -\infty} \frac{ka}{b}e^{bt} = 0$$
At the upper limit $t = T$, so the antiderivative is $\frac{ka}{b}e^{bT}$.
The definite integral is the difference between the antiderivative evaluated at these points. In this case, it's a very straightforward subtraction:
$$\frac{ka}{b}e^{bT} - 0 = \frac{ka}{b}e^{bT}$$
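The process is exactly the same when finding the sum of knowledge up to 17 years ago, except that our upper limit will be $T - 17$, which gives $\frac{ka}{b}e^{b(T-17)}$. If knowledge doubles every 17 years, the total 17 years ago must be half of today's total:
$$\frac{\frac{ka}{b}e^{b(T-17)}}{\frac{ka}{b}e^{bT}} = e^{b(T-17) - bT} = e^{-17b} = \frac{1}{2}$$
Above we are using the law of exponents that dividing exponential expressions with the same base is equivalent to subtracting their exponents.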
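Hamming estimates the length of a scientific career to be 55 years. Using his original equation for the number of scientists in a given year, $y(t) = ae^{bt}$, we can divide the integral of the equation up to 55 years ago by the integral up to the current year. This gives the fraction of all scientists who are no longer alive, and the constants cancel just as before:
$$\frac{\frac{a}{b}e^{b(T-55)}}{\frac{a}{b}e^{bT}} = e^{-55b}$$
The fraction of scientists alive today is therefore $1 - e^{-55b}$.
We can use our expression for the doubling of scientific knowledge, $e^{-17b} = \frac{1}{2}$, to get the proportion of scientists alive today. We can use the law that $e^{kx} = (e^x)^k$:
$$e^{-55b} = \left(e^{-17b}\right)^{55/17} = \left(\frac{1}{2}\right)^{55/17} \approx 0.106$$
So roughly 89% of scientists are alive today, close to the claimed 90%. Turning the question around, the fraction alive is exactly 90% when $\left(\frac{1}{2}\right)^{m} = 0.1$, where $m$ is the career length measured in doubling periods. This gives $m = \log_2 10 \approx 3.3219$.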
Multiplying 3.3219 by our supposed doubling period of 17 years gives us 56.47 years for the average length of a scientific career, very close to Hamming's estimate of 55 years.
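The arithmetic in this chapter is easy to check numerically. The sketch below is my own, not Hamming's: it assumes the fraction of scientists alive is $1 - e^{-55b}$ (the integral over the last 55 years divided by the integral over all time), and the variable names are made up for illustration.

```python
import math

DOUBLING_PERIOD = 17.0  # years, from the first claim
CAREER_LENGTH = 55.0    # years, Hamming's estimate

# e^(-17b) = 1/2  =>  b = ln(2) / 17
b = math.log(2) / DOUBLING_PERIOD

# Assumed model: fraction of all scientists who fall within the last 55 years
fraction_alive = 1 - math.exp(-b * CAREER_LENGTH)

# Career length that makes the fraction exactly 90%:
# (1/2)^(L/17) = 0.1  =>  L = 17 * log2(10)
doublings_needed = math.log2(10)
career_for_90_percent = DOUBLING_PERIOD * doublings_needed

print(f"b = {b:.5f} per year")
print(f"fraction alive with a 55-year career = {fraction_alive:.4f}")
print(f"log2(10) = {doublings_needed:.4f}")
print(f"career length for exactly 90% = {career_for_90_percent:.2f} years")
```

Both directions of the check agree: a 55-year career gives a little under 90%, and exactly 90% needs a career of a little over 56 years.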
Chapter 2
In this chapter, Hamming discusses growth models. The simplest growth model assumes the growth rate is proportional to the current size, as in the case of compound interest. We can describe this model with a differential equation.
$$\frac{dy}{dt} = ky$$
A differential equation is an equation that relates a function to its derivatives. In the context of growth models, differential equations are used to describe how something changes over time. The equation above is a first-order differential equation where $\frac{dy}{dt}$ represents the rate of change of $y$ with respect to time $t$, and $ky$ means that this rate of change is proportional to the current value of $y$. Here, $k$ is a constant of proportionality.
The solution to a differential equation is a function (or a set of functions) that satisfies the equation. This means that if you take the solution and its derivatives and plug them back into the original differential equation, the equation will hold. In other words, the left-hand side of the equation will equal the right-hand side for all points in the domain of the solution.
Hamming tells us the solution to the equation but skips over how to derive it. We do it like so:
We start by rearranging the equation so that each variable is on a different side of the equation. $\frac{1}{y}\,dy = k\,dt$
Next we integrate both sides of the equation. $\ln(y) = kt + C$
We can then exponentiate both sides of the equation to get rid of the natural logarithm. $e^{\ln(y)} = e^{kt + C}$
This simplifies to $y = e^{kt} \cdot e^{C}$
Since $e^{C}$ is just a constant we can represent it as just $A$. Thus we get the solution given by Hamming: $y(t) = Ae^{kt}$
We can think of $A$ as representing the initial condition of the system, since $y(0) = A$. The equation $y(t) = Ae^{kt}$ then tells us how the quantity $y$ changes over time. If $k > 0$ we have growth. If $k < 0$ we have decay.
We can verify that this is a solution by:
Differentiating the solution to get the left-hand side of the differential equation (NB the derivative of $e^{kt}$ is $ke^{kt}$): $\frac{dy}{dt} = kAe^{kt}$
Substituting the solution into the right-hand side of the differential equation: $ky = k(Ae^{kt})$
We see that both sides of the equation match.
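The same verification can be done numerically. This is a minimal sketch of my own: the values of `A` and `k` are arbitrary test values, and the derivative is estimated with a central difference rather than computed symbolically.

```python
import math

def y(t, A=2.0, k=0.5):
    """Proposed solution y(t) = A * e^(k t); A and k are arbitrary test values."""
    return A * math.exp(k * t)

def numerical_derivative(f, t, h=1e-6):
    """Central-difference estimate of df/dt at t."""
    return (f(t + h) - f(t - h)) / (2 * h)

k = 0.5  # must match the default used in y
for t in [0.0, 1.0, 2.5]:
    lhs = numerical_derivative(y, t)  # dy/dt
    rhs = k * y(t)                    # k * y
    print(f"t = {t}: dy/dt = {lhs:.6f}, ky = {rhs:.6f}")
```

The two columns agree to within the accuracy of the finite-difference estimate, and `y(0)` returns `A`, which matches the reading of $A$ as the initial condition.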
Hamming now updates this model of growth to include a limiting factor L.
$$\frac{dy}{dt} = ky(L - y)$$
Hamming "reduces" the equation to a standard form, meaning we don't have to write the constants. He says let $y = Lz$ and $x = t(kL)$. Note that there is a typo in the book where they've written $\frac{t}{kL^2}$. So substituting $y = Lz$ into the right-hand side we get:
$$\frac{dy}{dt} = kLz(L - Lz) = kL^2 z(1 - z)$$
We can get $\frac{dy}{dt}$ in terms of $\frac{dz}{dt}$ by differentiating $y = Lz$ with respect to $t$:
$$\frac{dy}{dt} = L\frac{dz}{dt}$$
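Given $x = t(kL)$ we can find $\frac{dx}{dt}$:
$$\begin{aligned}
x &= t(kL) \\
\frac{dx}{dt} &= kL
\end{aligned}$$
We can use this to express $\frac{dz}{dt}$ in terms of $\frac{dz}{dx}$. The chain rule states that if we have a function $z(x(t))$, the derivative of $z$ with respect to $t$ will be $\frac{dz}{dt} = \frac{dz}{dx} \cdot \frac{dx}{dt}$. So we can say that:
$$\frac{dz}{dt} = \frac{dz}{dx} \cdot \frac{dx}{dt} = \frac{dz}{dx} \cdot kL$$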
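Going back to the earlier equation $\frac{dy}{dt} = L\frac{dz}{dt}$, we can now say:
$$\frac{dy}{dt} = L \cdot \frac{dz}{dx} \cdot kL = kL^2\frac{dz}{dx}$$
Setting this equal to our earlier expression $kL^2 z(1 - z)$ and cancelling $kL^2$ leaves the standard form:
$$\frac{dz}{dx} = z(1 - z)$$
Suppose we wanted to integrate this equation to find $z$ in terms of $x$. We might first rearrange it to separate the variables so it looks like this:
$$\frac{1}{z(1-z)}\,dz = dx$$
The left-hand side is a complicated fraction. To make it easier to integrate, Hamming uses partial fractions. This means: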
Expressing $\frac{1}{z(1-z)}$ as a sum of simpler fractions $\frac{A}{z}$ and $\frac{B}{1-z}$
Multiplying through by the common denominator $z(1-z)$
Setting $z$ to various values to solve for $A$ and $B$. In this case, we will use 0 and 1
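Both constants come out as 1 (setting $z = 0$ gives $A = 1$, and setting $z = 1$ gives $B = 1$), so $\frac{1}{z(1-z)} = \frac{1}{z} + \frac{1}{1-z}$. This is much simpler to integrate. We know that the derivative of $\ln(x)$ is $\frac{1}{x}$, so $\int \frac{1}{z}\,dz = \ln(z)$. We have to remember to apply the chain rule when integrating $\frac{1}{1-z}$ though. We can do this with the substitution $u = 1 - z$, $du = -dz$, which gives $\int \frac{1}{1-z}\,dz = -\ln(1-z)$. Integrating both sides and solving for $z$ then gives:
$$\ln(z) - \ln(1-z) = x + C \quad\Rightarrow\quad \frac{z}{1-z} = Ae^{x} \quad\Rightarrow\quad z = \frac{1}{1 + \frac{1}{A}e^{-x}}$$
where $A = e^{C}$ is a new constant, unrelated to the partial-fraction coefficient above.
As Hamming points out, $A$ is determined by the initial conditions. By this, he means the value of $z$ where you set $t$ or $x$ equal to 0. As $x$ approaches $-\infty$, the denominator will get larger and $z$ will approach 0. As $x$ gets bigger, $\frac{1}{A}e^{-x}$ will approach 0 and $z$ will approach 1.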
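To convince myself that the limiting behaviour is right, here is a small numerical sketch. It assumes the solution has the form $z = \frac{1}{1 + \frac{1}{A}e^{-x}}$ (which is what separating variables in $\frac{dz}{dx} = z(1-z)$ gives), and the choice `A = 1.0` is arbitrary.

```python
import math

def z(x, A=1.0):
    """Assumed logistic solution z = 1 / (1 + (1/A) e^(-x)); A is set by the initial condition."""
    return 1.0 / (1.0 + math.exp(-x) / A)

def slope(x, A=1.0, h=1e-6):
    """Central-difference estimate of dz/dx."""
    return (z(x + h, A) - z(x - h, A)) / (2 * h)

for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    zx = z(x)
    print(f"x = {x:6.1f}  z = {zx:.6f}  dz/dx = {slope(x):.6f}  z(1-z) = {zx * (1 - zx):.6f}")
```

The last two columns match, so the function satisfies the differential equation, and $z$ runs from near 0 at the left to near 1 at the right. With `A = 1.0` the value at $x = 0$ is $\frac{A}{1 + A} = 0.5$.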
Hamming shows us a more flexible model for growth:
$$\frac{dz}{dx} = z^a(1 - z)^b, \quad (a, b > 0)$$
We can plot growth curves for different values of a and b to see how they change how the model behaves:
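If you want to generate the curves yourself, one option is to integrate the equation numerically. The sketch below is an assumption-laden illustration rather than anything from the book: it uses a classical fourth-order Runge-Kutta step, a starting value `z0 = 0.01` that I picked arbitrarily, and a handful of arbitrary $(a, b)$ pairs. It prints sample values instead of plotting so that it needs nothing beyond the standard library.

```python
def growth_rate(z, a, b):
    """Right-hand side z^a (1 - z)^b, with z clamped to [0, 1] to avoid negative bases."""
    z = min(max(z, 0.0), 1.0)
    return z**a * (1.0 - z)**b

def integrate(a, b, z0=0.01, x_end=20.0, steps=2000):
    """Integrate dz/dx = z^a (1 - z)^b from x = 0 with RK4; returns the list of z values."""
    h = x_end / steps
    z = z0
    zs = [z]
    for _ in range(steps):
        k1 = growth_rate(z, a, b)
        k2 = growth_rate(z + 0.5 * h * k1, a, b)
        k3 = growth_rate(z + 0.5 * h * k2, a, b)
        k4 = growth_rate(z + h * k3, a, b)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        z = min(z, 1.0)  # the model saturates at z = 1
        zs.append(z)
    return zs

# Arbitrary example parameters; index i corresponds to x = i * 0.01
for a, b in [(1.0, 1.0), (0.5, 0.5), (2.0, 1.0), (1.0, 2.0)]:
    zs = integrate(a, b)
    samples = ", ".join(f"{zs[i]:.3f}" for i in (0, 200, 500, 1000, 2000))
    print(f"a = {a}, b = {b}: z at x = 0, 2, 5, 10, 20 -> {samples}")
```

Feeding the returned lists to a plotting library gives the S-shaped curves. Larger $a$ delays the take-off, and larger $b$ slows the approach to the limit.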
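We can find $z$ by separation of variables and integration. We can also find the steepest slope by differentiating the right-hand side with respect to $z$ and setting it equal to 0. To differentiate we use the product rule:
$$\frac{d}{dz}(uv) = u \cdot \frac{dv}{dz} + v \cdot \frac{du}{dz}$$
Hence, with $u = z^a$ and $v = (1-z)^b$:
$$\begin{aligned}
\frac{d}{dz}\left[z^a(1-z)^b\right] &= z^a \cdot \left(-b(1-z)^{b-1}\right) + (1-z)^b \cdot az^{a-1} \\
&= az^{a-1}(1-z)^b - bz^a(1-z)^{b-1} \\
&= z^{a-1}(1-z)^{b-1}\left[a(1-z) - bz\right] = 0
\end{aligned}$$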
Notice how we can factor out the terms $z^{a-1}$ and $(1-z)^{b-1}$ in line 3 above. It took me a while to see this but it's obvious when you think about it that e.g. $x^{y-1} \cdot x = x^{y}$.
From here it's easy to solve for z:
$$\begin{aligned}
a(1 - z) - bz &= 0 \\
a - az - bz &= 0 \\
a - z(a + b) &= 0 \\
a &= z(a + b) \\
z &= \frac{a}{a + b}
\end{aligned}$$
Substituting this value of $z$ back into the original differential equation $\frac{dz}{dx} = z^a(1-z)^b$ gives us
$$\left(\frac{a}{a+b}\right)^a\left(\frac{b}{a+b}\right)^b = \frac{a^a b^b}{(a+b)^{a+b}}$$
This expression represents the slope of the curve at the point $z = \frac{a}{a+b}$, which is the steepest point of the curve. In the context of a growth model, this is the maximum growth rate.
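A brute-force search is a quick way to check both the location and the size of the maximum slope. This sketch is mine: it assumes the closed form $\frac{a^a b^b}{(a+b)^{a+b}}$ that you get by substituting $z = \frac{a}{a+b}$ into $z^a(1-z)^b$, and the $(a, b)$ pairs are arbitrary.

```python
def slope(z, a, b):
    """Growth rate z^a (1 - z)^b for 0 < z < 1."""
    return z**a * (1 - z)**b

def steepest_point(a, b):
    """Location of the maximum slope from the derivation: z = a / (a + b)."""
    return a / (a + b)

def max_slope(a, b):
    """Assumed closed form for the slope at z = a / (a + b)."""
    return a**a * b**b / (a + b)**(a + b)

grid = [i / 10000 for i in range(1, 10000)]  # fine grid on (0, 1)
for a, b in [(1, 1), (0.5, 0.5), (2, 1), (1, 3)]:
    z_grid = max(grid, key=lambda z: slope(z, a, b))  # independent grid search
    print(f"a = {a}, b = {b}: formula z = {steepest_point(a, b):.4f}, "
          f"grid z = {z_grid:.4f}, max slope = {max_slope(a, b):.6f}")
```

The grid search lands on the same $z$ as the formula to within the grid spacing.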
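Hamming draws our attention to two special cases. When $a = b$, the maximum slope will be:
$$\frac{a^a a^a}{(2a)^{2a}} = \frac{1}{2^{2a}}$$
For $a = b = 1$ this is $\frac{1}{4}$, and for $a = b = \frac{1}{2}$ it is $\frac{1}{2}$. In the second case the differential equation is $\frac{dz}{dx} = \sqrt{z(1-z)}$.
We can integrate this by making the trigonometric substitution $z = \sin^2(\theta)$. Differentiating this with the chain rule tells us that $\frac{dz}{d\theta} = 2\sin(\theta)\cos(\theta)$ and $dz = 2\sin(\theta)\cos(\theta)\,d\theta$. Substituting this back into the differential equation gives:
$$\frac{2\sin(\theta)\cos(\theta)\,d\theta}{\sqrt{\sin^2(\theta)\left(1 - \sin^2(\theta)\right)}} = dx \quad\Rightarrow\quad 2\,d\theta = dx \quad\Rightarrow\quad \theta = \frac{x}{2} + C$$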
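Having found $\theta$, we can substitute it back into $z = \sin^2(\theta)$ to get:
$$z = \sin^2\left(\frac{x}{2} + C\right), \quad \left(-C \le \frac{x}{2} \le \frac{\pi}{2} - C\right)$$
Hamming tells us that the solution curve has a finite range. We can see that because $\sin^2(\theta)$ is always between 0 and 1. The solution is valid for $\theta$ between 0 and $\frac{\pi}{2}$, where $\sin(\theta)$ and $\cos(\theta)$ are both non-negative, so $z$ climbs from 0 to 1 and the slope never goes negative. Hence the bounds $-C \le \frac{x}{2} \le \frac{\pi}{2} - C$.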
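As a sanity check on the trigonometric solution, the sketch below compares a finite-difference slope of $z = \sin^2\left(\frac{x}{2} + C\right)$ with $\sqrt{z(1-z)}$. It is my own check, with `C = 0` chosen arbitrarily, so the valid range is $0 \le x \le \pi$.

```python
import math

def z(x, C=0.0):
    """Candidate solution z = sin^2(x/2 + C); C = 0 is an arbitrary choice."""
    return math.sin(x / 2 + C) ** 2

def slope(x, C=0.0, h=1e-6):
    """Central-difference estimate of dz/dx."""
    return (z(x + h, C) - z(x - h, C)) / (2 * h)

# Sample points inside the valid range 0 <= x <= pi
for x in [0.5, 1.0, math.pi / 2, 2.5]:
    zx = z(x)
    print(f"x = {x:.4f}  dz/dx = {slope(x):.6f}  sqrt(z(1-z)) = {math.sqrt(zx * (1 - zx)):.6f}")

print(f"z at the ends of the range: {z(0.0):.6f}, {z(math.pi):.6f}")
```

The curve starts at exactly 0 and reaches exactly 1 at a finite $x$, unlike the logistic curve, which only approaches its limits asymptotically.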
Chapter 9
Hamming begins chapter 9 by reminding us that we can extend the Pythagorean theorem into higher dimensions because the square of the diagonal is the sum of the squares of the individual mutually perpendicular sides, $D^2 = \sum_{i=1}^{n} x_i^2$, where $x_i$ are the lengths of the sides of the rectangular block in $n$ dimensions.
The Stirling Approximation
Next, he derives the Stirling approximation for $n!$. This is especially useful for large factorials. It also becomes increasingly accurate as $n$ increases. He starts by taking the natural log of $n!$ to get $\ln(n!) = \sum_{k=1}^{n} \ln(k)$
He then finds the integral $\int_1^n \ln(x)\,dx$ by using integration by parts. This is a technique that comes from the product rule of differentiation. It is given by the formula $\int u\,dv = uv - \int v\,du$
Hamming sets $u = \ln x$ and $dv = dx$. It therefore follows that $du = \frac{1}{x}\,dx$ and $v = \int dv = \int dx = x$. Substituting this into the integration by parts formula we get:
$$\int_1^n \ln x\,dx = \Big[x\ln x\Big]_1^n - \int_1^n x \cdot \frac{1}{x}\,dx = n\ln n - (n - 1) = n\ln n - n + 1$$
Hamming also shows us how we could use the trapezium rule to approximate the integral. (See my trapezium rule notes for how we get this formula).
$$\int_1^n \ln x\,dx \approx \frac{1}{2}\ln 1 + \ln 2 + \ln 3 + \dots + \frac{1}{2}\ln n$$
Note that he appears to be assuming that we have divided the curve into $n - 1$ segments, which would result in our term for the width of the trapeziums, $\Delta x$, being equal to 1. That would explain why we don't see it in the equation above.
Since $\ln 1 = 0$, we can simplify this to $\int_1^n \ln x\,dx \approx \ln 2 + \ln 3 + \dots + \frac{1}{2}\ln n$
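It is a property of logarithms that the sum of logarithms is equal to the logarithm of the product of the terms, so $\ln 1 + \ln 2 + \dots + \ln n = \ln(n!)$. Thus the sum $\ln 2 + \ln 3 + \dots + \frac{1}{2}\ln n$ is $\ln(n!)$ with half of the final term missing, that is, $\ln(n!) - \frac{1}{2}\ln n$.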
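It helps to see how close the trapezium sum actually is to the integral. The sketch below is my own check: it assumes the integration by parts result $\int_1^n \ln x\,dx = n\ln n - n + 1$, and the function names are made up for illustration.

```python
import math

def exact_integral(n):
    """Integral of ln(x) from 1 to n via integration by parts: n ln n - n + 1."""
    return n * math.log(n) - n + 1

def trapezium_sum(n):
    """Trapezium rule with unit-width strips: 1/2 ln 1 + ln 2 + ... + ln(n-1) + 1/2 ln n."""
    interior = sum(math.log(k) for k in range(2, n))
    return 0.5 * math.log(1) + interior + 0.5 * math.log(n)

def log_factorial(n):
    """ln(n!) computed as a sum of logs."""
    return sum(math.log(k) for k in range(1, n + 1))

for n in [5, 10, 100]:
    print(f"n = {n:3d}  integral = {exact_integral(n):.5f}  "
          f"trapezium = {trapezium_sum(n):.5f}  "
          f"ln(n!) - 0.5 ln n = {log_factorial(n) - 0.5 * math.log(n):.5f}")
```

The last two columns are identical, and the trapezium sum sits slightly below the integral because $\ln x$ is concave. The gap between them grows very slowly with $n$, which is why it can be absorbed into a constant.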
The Stirling approximation of $n!$ is $n^n e^{-n}\sqrt{2\pi n}$. Taking the log of this gets us: $\ln(n!) \approx n\ln(n) - n + \frac{1}{2}\ln(2\pi n)$
The term $\frac{1}{2}\ln(2\pi n)$ is often neglected in rough approximations, leading to: $\ln(n!) \approx n\ln(n) - n$
Hamming adds a term $\frac{1}{2}\ln n$ to both sides to account for the half contribution of the endpoint $n$. There is also a term $+1$ in his final result. This comes from evaluating the integral at its lower limit, since $\int_1^n \ln x\,dx = n\ln n - n + 1$. Putting these together:
$$\sum_{k=1}^{n} \ln k \approx n\ln n - n + 1 + \frac{1}{2}\ln n$$
Undoing the logs by taking the exponential of each side gives:
$$n! \approx C n^n e^{-n}\sqrt{n}$$
where $C$ is a constant (not far from $e$) independent of $n$. We are approximating an integral by the trapezium rule, the error in the trapezium approximation increases more and more slowly as $n$ grows larger, and $C$ is the limiting value.
This is the first form of Stirling's formula. Hamming skips deriving the limiting value of $C$ at infinity, which is $\sqrt{2\pi} = 2.5066\ldots$ (compare $e = 2.71828\ldots$). However, doing so would show us how we get the usual Stirling's formula for the factorial: $n! \approx n^n e^{-n}\sqrt{2\pi n}$
Hamming provides the following table to give us a sense of the quality of the Stirling approximation.
He notes that as the numbers get larger, the ratio approaches 1 but the absolute differences get greater. Consider the two functions:
$$\begin{aligned}
f(n) &= n + \sqrt{n} \\
g(n) &= n
\end{aligned}$$
The limit of the ratio $\frac{f(n)}{g(n)}$, as $n$ approaches infinity, is 1. But the difference $f(n) - g(n) = \sqrt{n}$ grows larger as $n$ increases.
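You can reproduce this behaviour for Stirling's formula with a few lines of code. The values below are computed by the sketch itself, not copied from Hamming's table, and I have chosen to define the ratio as the approximation divided by the true value.

```python
import math

def stirling(n):
    """Stirling's approximation n^n e^(-n) sqrt(2 pi n)."""
    return float(n) ** n * math.exp(-n) * math.sqrt(2 * math.pi * n)

print(f"{'n':>3} {'n!':>10} {'Stirling':>14} {'ratio':>8} {'difference':>12}")
for n in range(1, 11):
    true_value = math.factorial(n)
    approx = stirling(n)
    print(f"{n:>3} {true_value:>10} {approx:>14.3f} "
          f"{approx / true_value:>8.5f} {true_value - approx:>12.3f}")
```

The ratio column creeps up towards 1 while the difference column keeps growing, just as with $f(n) = n + \sqrt{n}$ and $g(n) = n$.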
Extending the factorial function to all positive real numbers
Hamming introduces the gamma function in the form of the integral
$$\Gamma(n) = \int_0^\infty x^{n-1}e^{-x}\,dx$$
which he tells us converges for all $n > 0$.
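To get a feel for the integral, we can approximate it numerically and compare against Python's built-in `math.gamma`. This is a rough sketch under my own assumptions: it uses a simple midpoint rule, truncates the upper limit at 60 (where $e^{-x}$ is negligible for the small $n$ used here), and only tries values with $n \ge 1$, since the integrand is singular at 0 for $n < 1$ and a plain midpoint rule converges slowly there.

```python
import math

def gamma_integral(n, upper=60.0, steps=100_000):
    """Midpoint-rule estimate of the integral of x^(n-1) e^(-x) from 0 to `upper`."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th strip
        total += x ** (n - 1) * math.exp(-x)
    return total * h

for n in [1, 2, 2.5, 3, 5]:
    print(f"n = {n}: integral = {gamma_integral(n):.6f}, math.gamma = {math.gamma(n):.6f}")
```

The two columns agree closely, and the non-integer case $n = 2.5$ shows the integral filling in values between the integers.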
For $n > 1$ we integrate by parts again. We use
$$u = x^{n-1}, \qquad dv = e^{-x}\,dx$$
Hamming tells us that at the two limits, the integrated part is 0. This is because as $x \to \infty$, $e^{-x}$ tends towards 0, while as $x \to 0$, $x^{n-1}$ will tend towards 0 (remember this is for $n > 1$).
The integration by parts formula is:
$$\int u\,dv = uv - \int v\,du$$
We can also quickly work out that
$$du = (n-1)x^{n-2}\,dx, \qquad v = -e^{-x}$$
Hamming tells us that at the limits where the integrated part is 0 we have the reduction formula: