coollog.me

Taxes now or taxes later

2019-07-29T14:10:11-04:00

Disclaimer: I am not a financial professional and this information is only general in nature. Individuals have different financial circumstances and should consult a professional.

There are many different retirement plans to choose from, and it’s sometimes hard to determine which one best fits your retirement goal. I won’t go into the details and rules of all the potential plans in this post. Instead, I’ll consider a specific question: should you pay taxes now or later? Specifically, we’ll compare the future value of contributing the same amount to a traditional 401(k) vs a Roth 401(k).

A traditional 401(k) contribution is tax-deferred. This means that if you put in $100 now, you don’t have to pay taxes on $100 of your income this year. However, when you retire and withdraw that money (take a distribution), you will have to pay income tax in the year of withdrawal. This means that for a $100 contribution, it 1) starts in full, 2) grows tax-free for however many years it sits in your retirement account, and then 3) has your future tax rate reduce the future amount.

401(k) future value model

Let’s put the traditional 401(k) growth model in equation form so that we can analyze it.

Let $x$ be the initial contribution amount. The Contribution portion is just $x$ .
Let $r$ be the expected annual growth rate and $y$ be the number of years to grow. Assuming compounding annual growth by $r$ for $y$ years, the Growth portion is $x(r^y-1)$ .
Let $t_1$ be the future tax rate. The tax rate is applied on the total of the Contribution and Growth. The total Future taxes paid is thus $t_1(x+x(r^y-1))$ .

Adding these terms together gets us the Future value for the 401(k) model:

$x+x(r^y-1)-t_1(x+x(r^y-1))$ $=x+xr^y-x-t_1x-t_1xr^y+t_1x$ $=xr^y-t_1xr^y$ $=xr^y(1-t_1)$

Just for the sake of illustration, let’s plug in some numbers. If we contribute $100 initially, leave it to grow with 5% annual returns (a 1.05 annual growth rate) for 30 years, and have a future tax rate of 25%, the Future value is:

$\$100 \times 1.05^{30} \times (1-0.25) \approx \$432 \times 0.75 = \$324$
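As a quick check of the arithmetic, here is a minimal Python sketch of the traditional 401(k) model. The function name `traditional_future_value` and its parameter names are just labels chosen here for the symbols $x$, $r$, $y$, and $t_1$.

```python
def traditional_future_value(x, r, y, t1):
    """Future value x * r**y * (1 - t1) of a traditional 401(k) contribution.

    x: contribution, r: annual growth factor (1.05 means 5% per year),
    y: years of growth, t1: tax rate at withdrawal.
    """
    return x * r**y * (1 - t1)


value = traditional_future_value(x=100, r=1.05, y=30, t1=0.25)
print(round(value))  # 324
```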

From this equation, we can see that, ideally, we would want our future tax rate to be as low as possible to maximize the actual amount we withdraw. This is why most retirees do not withdraw the entirety of their traditional 401(k) in a single year, as much of that withdrawal would fall under high tax brackets. This is also why there are required minimum distributions, so that retirees don’t just let the money sit untaxed forever.

Let’s compare this to the same amount, $x$ , except grown outside of a traditional 401(k) (or any other tax-advantaged account). This means that for a $100 initial contribution, the current tax would have been taken away, and then the amount would have grown over the years. However, this growth would be lower than $r^y$ since gains outside a tax-advantaged account can be taxed as long-term capital gains or as ordinary income.

No tax advantage model

Let’s put this in equation form too.

We’ll define $t_0$ to be the current tax rate and $r_t$ to be the growth rate with taxes.
The Contribution is still $x$ .
The Taxes now is just $t_0x$ .
The Growth (taxed) is $x(r_t^y-1)$ . Note that we are applying the growth to the full initial contribution. The alternative would be to apply the growth to the contribution amount after Taxes now are applied. However, this would make it an unfair comparison since a lower amount is allowed to grow. The solution is to bring the taxes out into its own term like we have here and then include another cost, which is the lost potential growth on the amount that went to taxes. This lost growth is $t_0x(r_t^y-1)$ .

Adding these terms together gets us the Future value for the non-tax-advantaged model:

$x-t_0x-t_0x(r_t^y-1)+x(r_t^y-1)$ $=x-t_0x-t_0xr_t^y+t_0x+xr_t^y-x$ $=xr_t^y-t_0xr_t^y$ $=xr_t^y(1-t_0)$

Note, however, that if we had just taken the tax out and let the after-tax portion grow, the resulting Future value would actually still be the same here.
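To see that the two ways of accounting agree, here is a small Python sketch. Both function names are made up for this illustration, and the sample numbers are arbitrary.

```python
import math


def taxable_value_by_terms(x, r_t, y, t0):
    """Sum the four terms: Contribution - Taxes now - lost growth + Growth (taxed)."""
    contribution = x
    taxes_now = t0 * x
    lost_growth = t0 * x * (r_t**y - 1)
    growth = x * (r_t**y - 1)
    return contribution - taxes_now - lost_growth + growth


def taxable_value_direct(x, r_t, y, t0):
    """Take the tax out first, then let the after-tax amount grow."""
    return x * (1 - t0) * r_t**y


by_terms = taxable_value_by_terms(x=100, r_t=1.04, y=30, t0=0.25)
direct = taxable_value_direct(x=100, r_t=1.04, y=30, t0=0.25)
print(math.isclose(by_terms, direct))  # True
```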

Here, the tax is first taken out from the initial amount at the current tax rate. Compared to the model for the tax-deferred (traditional) 401(k) contribution, the tax-deferred 401(k) contribution is worth it if:

Your future tax rate $t_1$ is lower than your current tax rate, or
Your future tax rate $t_1$ is higher than your current tax rate, but only up to however much is lost from having a lower growth rate ( $r^y/r_t^y>(1-t_0)/(1-t_1)$ )
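Rearranging the inequality gives the highest future tax rate at which the traditional 401(k) still wins: $t_1 < 1-(1-t_0)(r_t/r)^y$ . A minimal Python sketch of that break-even rate follows. The function name is hypothetical, and the 1.05 and 1.04 growth rates are only sample inputs.

```python
def break_even_future_tax_rate(r, r_t, y, t0):
    """Largest t1 for which x*r**y*(1-t1) >= x*r_t**y*(1-t0)."""
    return 1 - (1 - t0) * (r_t / r) ** y


t1_limit = break_even_future_tax_rate(r=1.05, r_t=1.04, y=30, t0=0.25)
print(round(t1_limit, 3))  # about 0.437
```

With these sample inputs, a 25% current rate could rise to roughly 44% in the future before the tax-deferred account stops being worth it.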

In the event that your future tax rate is the same as your current tax rate, the traditional 401(k) contribution becomes more worth it as time goes by (since the difference between $r^y$ and $r_t^y$ increases exponentially as $y$ increases) - so contribute early. Also, the tax that you save at present by contributing to your traditional 401(k) is at your marginal tax rate, and the tax you pay at withdrawal time would likely be at your effective tax rate (that is, if your traditional 401(k) distributions are your main source of retirement income). This means that, most likely, your current tax rate would be higher than your future tax rate for the traditional 401(k) contribution. Unless your future tax rate is significantly higher than your current tax rate or the taxes on growth are just too minuscule, it always seems better to put the money in the traditional 401(k) than to leave it without tax advantages.

Now, let’s take a look at the Roth 401(k). Roth 401(k)s have tax-free growth, which means that once you put money into a Roth 401(k), that money can grow tax-free, and you’ll never have to pay tax on that money again. When you withdraw that money in retirement, you’ll also pay no taxes.

Roth 401(k) model

Individuals contribute to a Roth 401(k) with after-tax money. For these contributions, the future value of the Roth 401(k) is the same as the non-tax-advantaged case except that it can enjoy the higher non-taxed growth rate, making it strictly better than leaving money outside. (As a side note, Roth IRAs act the same as Roth 401(k)s, except that normally, only individuals under a certain income threshold can directly contribute to them. This is why direct Roth IRA contributions have a very low limit and individuals with incomes over a certain threshold are not allowed to contribute directly. However, there are a few ways for individuals above the income threshold to contribute a significant amount, including backdoor and mega-backdoor contributions. These methods essentially take after-tax money from different sources and place them into a Roth IRA, and are equivalent to doing a direct contribution. Alright, back to Roth 401(k)s.)

Let’s put the Roth 401(k) model in equation form.

The Contribution is $x$ .
The Taxes now is $t_0x$ . The Growth (tax-free) is $x(r^y-1)$ . Here, we have to subtract the lost growth on the portion paid as taxes as well, which is $t_0x(r_t^y-1)$ . Note that we use the growth with taxes here since the portion paid to taxes would have grown without tax advantages.

The Future value for the Roth 401(k) model is thus:

$x-t_0x-t_0x(r_t^y-1)+x(r^y-1)$ $=xr^y-t_0xr_t^y$

This is quite similar to the non-tax-advantaged case except that we have the positive term as $xr^y$ instead of $xr_t^y$ . Even though this seems like a small difference, the difference can be huge as the number of years $y$ gets large. For instance, a $100 contribution with 1.05 tax-free growth rate would grow to $704 in 40 years, but only $480 with a 1.04 taxed growth rate. In other words, the 20% annual tax on long-term gains turns into a 32% overall tax after 40 years. (As a side note, if you’re following modern portfolio theory, don’t rebalance too often even with long-term gains. Having a 1.04 growth rate compared to a 1.05 growth rate every year is a lot less than just paying 20% tax one time.)
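The numbers in this paragraph can be reproduced with a short Python sketch. The function `roth_future_value` is just a direct transcription of the formula above, and its name is made up here.

```python
def roth_future_value(x, r, r_t, y, t0):
    """Roth 401(k) model from above: x*r**y - t0*x*r_t**y."""
    return x * r**y - t0 * x * r_t**y


tax_free = 100 * 1.05**40  # growth with no tax drag
taxed = 100 * 1.04**40     # growth with the 20% annual tax on gains
print(round(tax_free), round(taxed))   # 704 480
print(round(1 - taxed / tax_free, 2))  # 0.32
```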

Now, let’s compare the Roth 401(k) to a traditional 401(k). Let’s take a look at the traditional 401(k) model (with the tax cost split out into its own term) alongside the Roth 401(k) model:

$xr^y-t_1xr^y$ traditional 401(k) vs. $xr^y-t_0xr_t^y$ Roth 401(k)

Now we can compare the traditional 401(k) to Roth 401(k) easily. As you can see, both share a common term $xr^y$ , the initial amount with growth. The only difference is in the tax cost term. For Roth 401(k), the tax cost is $t_0xr_t^y$ , whereas for traditional 401(k), the tax cost is $t_1xr^y$ . These have a common term $x$ that can also be disregarded. We have simplified the comparison between traditional 401(k) and Roth 401(k) to just:

Traditional 401(k) vs Roth 401(k)
$t_1r^y$ vs $t_0r_t^y$

This looks familiar, doesn’t it? This is just like comparing the traditional 401(k) model to the no tax advantage model, except here the Roth 401(k) is like the traditional 401(k) and the traditional 401(k) is like the no tax advantage. Comparing the models, the Roth 401(k) contribution is worth it if:

Your future tax rate $t_1$ is higher than your current tax rate, or
Your future tax rate $t_1$ is lower than your current tax rate, but only up to however much is lost from having a lower growth rate ( $r^y/r_t^y>t_0/t_1$ )

Therefore, for young people with many years to compound growth, the Roth 401(k) can be a better option than the traditional 401(k). Just using our $r=1.05$ and $r_t=1.04$ from above over a 40-year period, the ratio becomes about $r^y/r_t^y=1.47$ . It would generally not be the case that $t_0/t_1>1.47$ . Using some more aggressive growth rates like $r=1.1$ and $r_t=1.08$ , we get an even larger ratio of about 2.08. It would be quite unlikely to have the current marginal tax rate be more than double the future effective tax rate. However, as one gets older, there will be less time for a contribution to grow, and one would favor the traditional 401(k) more and more over the Roth 401(k). The recommendation is to favor contributions to the Roth 401(k) more when you are young (like before 30) and then exponentially decay that contribution to favor more towards traditional 401(k) as you get older (like after 30).
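Here is a small Python sketch of that comparison. The helper names `growth_ratio` and `roth_is_better` are hypothetical, and the 25% and 20% tax rates in the usage lines are made-up sample inputs.

```python
def growth_ratio(r, r_t, y):
    """The ratio r**y / r_t**y from the Roth condition."""
    return (r / r_t) ** y


def roth_is_better(r, r_t, y, t0, t1):
    """Roth wins under this model when r**y / r_t**y > t0 / t1."""
    return growth_ratio(r, r_t, y) > t0 / t1


print(round(growth_ratio(1.05, 1.04, 40), 2))  # 1.47
print(round(growth_ratio(1.10, 1.08, 40), 2))  # 2.08
print(roth_is_better(1.05, 1.04, 40, t0=0.25, t1=0.20))  # True
print(roth_is_better(1.05, 1.04, 1, t0=0.25, t1=0.20))   # False
```

The last two lines show the age effect: with the same tax rates, 40 years of growth favors the Roth 401(k), while a single year does not.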

My favorite protocol that makes the Internet useful

2018-06-18T20:12:11-04:00

I remember when I was first dabbling in making online multiplayer video games. I ran into many issues of people “hacking” my game and stealing players’ passwords. Of course, this was a major issue, but I at the time had no clue how these “hackers” could get ahold of the passwords. It was later that I realized that all data sent over the Internet is public. Anyone could see the data that my game’s server sent to the players and vice versa. Whenever a player logged in, their game would send their password to my server for verification. This data was sent over the Internet, and anyone watching would be able to see the password.

So, I had a question.

If anything sent on the Internet could be seen by anyone, how are we able to do anything secure?

How are we able to log in to websites without others seeing our secret information? How are we able to read our emails without others being able to read them as well? How are we able to process financial transactions over the Internet?

At the time I had no clue. I vaguely remember reading some articles about SSL and something about public and private keys, but I didn’t really understand how it all worked. It was not until I took a cryptography course in college that I discovered the protocol behind it all - the Diffie-Hellman key exchange.

To explain this protocol, I’ll present it as a puzzle.

Let’s pretend there are 3 people - Alice, Bob, and Eve - sitting at a table. Alice and Bob want to agree on something without Eve knowing. Since they are all at the same table, anything Alice communicates to Bob and vice versa, Eve gets to know as well. Each person can also turn their back to do things in secret, which no one else can see. How can Alice and Bob achieve this?

Of course, neither Alice nor Bob has any prior agreement that Eve does not know about. This means that this is a symmetric system, where all participants are equal in terms of information. However, the goal is to have a protocol that can achieve informational asymmetry.

At first, this might seem impossible. How can Alice and Bob agree on something only they know when Eve can listen in on their whole conversation? Wouldn’t that mean that Eve would always have the same information as Alice and Bob? And thus there’s no way of breaking the information symmetry?

But there is a way. The key is that Alice and Bob can do things in private. The private actions break the overall information symmetry of the system.

For example, let’s say Alice, Bob, and Eve all start with no information. Then, Alice says “TRIANGLE!” Now, Alice, Bob, and Eve all know triangle. If Bob says “SQUARE!”, then Alice, Bob, and Eve now all know square as well.

This is still information symmetric. But, if Alice secretly thinks to herself “circle…”, then this breaks the information symmetry of the system. Alice now knows something that neither Bob nor Eve knows.

Okay, great. Now we’ve broken the information symmetry. But this is not enough for what we need. What we need is for both Alice and Bob to share some piece of information that Eve doesn’t have. Therefore, Alice and Bob need to somehow leverage the secret information that they each have to create some shared secret that Eve does not have.

The way we do this is Alice and Bob both choose their own secret number. Alice chooses 3 and Bob chooses 4. Alice then announces some number, say 5, to the table, meaning that everyone knows this number. Then, Bob multiplies his secret number 4 with the 5 that Alice announced to get 20. He then announces 20 back to the table. Alice multiplies her 3 with the public 5 to announce 15. Alice then multiplies her secret 3 with the 20 Bob announced, getting 60. Bob’s secret number was 4, so he multiplies that with the 15 Alice announced and gets 60. And there we have it, both Alice and Bob have the number 60, and Eve does not.

Eve does not know 60 because the numbers that were announced were 5, 15, and 20. No matter how you multiply these numbers, you can’t get 60.

Oh wait.

Eve can actually guess 60 very easily. All she needs to do is divide 15 by 5 to get 3 and 20 by 5 to get 4, and now she can multiply 5 times 3 times 4 to get 60. We’ll need to do something further. Hmm.. how do we make it harder for Eve to guess… How about every time we multiply two numbers, we only keep the remainder after dividing by 9.

This time, Alice still announces 5 to start, but instead of sending 20, Bob sends the remainder of 20 divided by 9, which is 2. Alice sends the remainder of 15 divided by 9, which is 6. Bob then multiplies his secret 4 with 6 and divides by 9 to get a remainder of 6. Alice then multiplies her secret 3 with the 2 from Bob to get 6, which, when divided by 9, gets a remainder of 6 as well. Now, Eve has a much harder time finding this common secret (I explain more in Remarks below). Of course, this gets even harder when Alice and Bob use much larger numbers. Try finding what two numbers produce 893498529201 as a remainder when multiplied and divided by 2182841020912383.
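The exchange with remainders can be traced in a few lines of Python. This is only a sketch of the toy example with the same small numbers; the function name `combine` is made up here.

```python
MODULUS = 9


def combine(secret, public_value, modulus=MODULUS):
    """Multiply, then keep only the remainder after dividing by the modulus."""
    return (secret * public_value) % modulus


alice_secret, bob_secret, announced = 3, 4, 5

alice_sends = combine(alice_secret, announced)   # 15 % 9
bob_sends = combine(bob_secret, announced)       # 20 % 9
alice_shared = combine(alice_secret, bob_sends)  # 3 * 2 = 6, and 6 % 9
bob_shared = combine(bob_secret, alice_sends)    # 4 * 6 = 24, and 24 % 9

print(alice_sends, bob_sends, alice_shared, bob_shared)  # 6 2 6 6
```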

Now that I’ve thoroughly confused you with all these numbers, let’s take a step back and see what really happened here. What really happened is that Alice and Bob combined their asymmetries with each other. This process of combining asymmetries produced a symmetry between just Alice and Bob and maintained asymmetry with Eve.

Diffie-Hellman is essentially the same protocol, except that it uses exponentials rather than multiplication and large primes with primitive roots rather than just some small integers. In my example, I used a multiply-divide-get-remainder function, but this protocol works with any function that has these properties:

The function takes two inputs and produces an output
The function is commutative and associative, meaning that applying it many times in any order produces the same output
The inputs are hard to find if given an output

We can see that these properties help us because 1) and 2) help us combine the asymmetries of Alice and Bob, and 3) prevents Eve from doing the same.

To illustrate this protocol in a generic form with an example, let’s say the function is $f$ . Alice chooses $x$ as her secret and Bob chooses $y$ as his secret. The publicly announced value is $z$ . Alice announces $f(x, z)$ and Bob announces $f(y, z)$ . Then, Alice gets the secret from $f(x, f(y, z))$ and Bob gets the secret from $f(y, f(x, z))$ . These are the same value S (because $f$ is commutative and associative), which Eve is not able to get. All Eve has is $z$ , $f(x, z)$ , and $f(y, z)$ .
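As a sketch of the generic form with an exponential function, here is a toy Python version where $f$ raises the public value to the secret power and keeps the remainder modulo a prime. The prime 23, the public value 5, and the secrets 6 and 15 are tiny illustrative numbers chosen for this example and are nowhere near secure sizes.

```python
def f(secret, value, prime):
    """Toy Diffie-Hellman step: value ** secret mod prime."""
    return pow(value, secret, prime)


prime = 23     # public modulus (illustrative only)
z = 5          # publicly announced value
x, y = 6, 15   # Alice's and Bob's secrets

alice_announces = f(x, z, prime)
bob_announces = f(y, z, prime)

alice_shared = f(x, bob_announces, prime)  # f(x, f(y, z))
bob_shared = f(y, alice_announces, prime)  # f(y, f(x, z))

print(alice_announces, bob_announces, alice_shared, bob_shared)  # 8 19 2 2
```

Eve sees 5, 8, and 19, but never 6, 15, or the shared 2. With numbers this small she could simply try every secret, which is why real deployments use very large primes.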

So, now that we have such a protocol, how does this enable secure communications over the Internet? Well, the shared information that Alice and Bob create can be a key for encryption. After the protocol, Alice can use the key to encrypt messages she sends to Bob, only send the encrypted message over the Internet, and when Bob receives the encrypted messages, he can use the key to decrypt them and read the original messages. Anyone without this key would not be able to see the contents of the messages. When you are browsing your favorite website, you are essentially going through a similar protocol to generate a key that only you and the website’s servers share. All the communications between you and the website are encrypted with this key. Just make sure the website uses HTTPS and has a valid TLS certificate, but that’s probably for another post.

And there you have it - the protocol that lets you do the useful things on the Internet, like posting to your Facebook wall or ordering that late night delivery from Grubhub.

Remarks

In the example with Alice, Bob, and Eve, why did taking the remainder make it harder for Eve to know what Alice and Bob’s secret numbers were?

Well, first, it’s trivial for Eve to find the secret numbers when the protocol involves just multiplication. Let’s say $z$ is the publicly announced number, $x$ is Alice’s secret, and $y$ is Bob’s secret. Alice sends $x\times z$ and Bob sends $y\times z$ . Eve can calculate $x$ by just doing $\frac{x\cdot z}{z}$ , and likewise for $y$ . This is because there can only be one $x$ that could give $x\times z$ and only one $y$ that could give $y\times z$ .
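Eve’s attack on the multiplication-only version is a couple of divisions. A short Python sketch with the numbers from the table example (the variable names are made up here):

```python
announced = 5                # z, known to everyone
alice_sends = 3 * announced  # x * z = 15, heard by Eve
bob_sends = 4 * announced    # y * z = 20, heard by Eve

# Eve only uses the three public numbers 5, 15, and 20.
eve_guess_alice = alice_sends // announced
eve_guess_bob = bob_sends // announced
eve_shared = announced * eve_guess_alice * eve_guess_bob

print(eve_guess_alice, eve_guess_bob, eve_shared)  # 3 4 60
```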

Now, taking a remainder changes this. Let’s represent taking the remainder after dividing by a number with the symbol $\%$ . If you were to take any integer $x \% 3$ , for example, the result can only be one of 3 values - 0, 1, or 2. This means that many $x$ ’s could result in a remainder of 0, and likewise for 1 and 2. This is essentially what in cryptography you’d call a hash function, where the function has a large domain but a small range. Of course, just taking the remainder is not a cryptographically secure hash function, but it increases the difficulty by which Eve can find the inputs - the secrets of Alice and Bob. This is because, given a divisor $d$ , Alice would send $x\times z \% d$ and Bob would send $y\times z \% d$ . With just $x\times z \% d$ and $y\times z \% d$ , Eve would have a harder time finding which $x\times z$ and $y\times z$ produced those remainders.
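To see the many-to-one effect concretely, this Python sketch lists every secret below a limit that would produce the remainder Alice sent. The function name `candidate_secrets` and the limit of 30 are arbitrary choices for the illustration.

```python
def candidate_secrets(observed, announced, modulus, limit):
    """All secrets s in [1, limit) with (s * announced) % modulus == observed."""
    return [s for s in range(1, limit) if (s * announced) % modulus == observed]


# With divisor 9, Alice's announced 6 is consistent with several secrets.
print(candidate_secrets(observed=6, announced=5, modulus=9, limit=30))  # [3, 12, 21]

# With a divisor too large to matter, only the true secret fits.
print(candidate_secrets(observed=15, announced=5, modulus=10**9, limit=30))  # [3]
```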

I made a mistake in a concurrent program

2018-05-25T10:42:30-04:00

I was working on the concurrency part of my main project at work today. The program executed a series of steps in order to accomplish its task. Many of these steps could run in parallel, but some steps could only run after certain other steps.

I organized these steps into a dependency graph. Each step defined previous steps it needed to wait on before running. Steps with no dependencies could run immediately. With this structure, many steps could run at the same time, but no step would run before they were supposed to.

Example of an execution dependency graph with steps A…F

Each step would submit itself to an execution manager to either run immediately (no dependencies) or run after some other steps (after its dependencies). Upon submission, this step would receive a future. Other steps would then use this future to retrieve the result of that step after it finishes. I decided to be a bit fancy and lazily initialize these futures, where the step would only submit itself to the execution manager if another step required it to run.

All seemed well. I added a check to make sure that a future is always finished before another step retrieves its result. This makes sure that I defined the dependencies for the steps correctly. This was important because if I did not define the dependencies correctly, steps might unintentionally block threads from executing useful work. Unintentional blocking would decrease the efficiency of the parallel execution.

After running the tests a few times, the check failed. I must have set up the graph incorrectly somewhere. I must have forgotten to define some dependency… So, I went down a rabbit hole for a few hours, auditing each step to make sure I had all the necessary dependencies set. But, every so often, the check still failed.

I thought for a long time. And then it suddenly hit me. I figured it out. This whole time I had set up the dependency graph correctly. The problem lay not in my dependency graph, but rather in my lazy initialization.

The lazy initialization worked like this. Each step has a “future” variable and a “get future” method. The “future” variable is initially undefined. When another step calls the “get future” method the first time, the method submits the step to the execution manager and sets the “future” variable. Further calls to “get future” return that submitted future.

“get future” method
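A minimal Python sketch of such a lazily initialized “get future” method is below. It uses `concurrent.futures` as a stand-in for the execution manager, and the class name `LazyStep` and its fields are hypothetical, not the code from the actual project.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyStep:
    """Hypothetical step that submits itself only when first asked for its future."""

    def __init__(self, executor, work):
        self._executor = executor
        self._work = work
        self._future = None  # the "future" variable starts out undefined

    def get_future(self):
        # Unsynchronized check-then-set: two threads can both see None here,
        # both submit the work, and the later assignment overwrites the earlier one.
        if self._future is None:
            self._future = self._executor.submit(self._work)
        return self._future


with ThreadPoolExecutor(max_workers=2) as executor:
    step_a = LazyStep(executor, lambda: 42)
    first = step_a.get_future()
    second = step_a.get_future()
    same_future = first is second
    answer = first.result()

print(same_future, answer)  # True 42
```

Called from a single thread like this, it behaves as intended. The trouble only shows up when several threads call `get_future` on the same step at once.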

Each step calls its dependencies’ “get future” method to get the futures to wait upon. Therefore, this initialization design builds the dependency graph from the last step backwards.

Illustration of how the dependency graph is built backwards

From this, I noticed the flaw in this initialization method - it had a glaring race condition. For example, let’s take a look at a scenario where two steps depend on a single step:

Let’s say B and C happened to call A’s “get future” method at the same time. B’s call has A submit future 1 to the execution manager. C’s call has A submit future 2 to the execution manager. This causes A to set its “future” variable to future 1 and then change it to future 2. Later, B calls A’s “get future” method again. B receives future 2, and not future 1 as it had expected.

So, how do we solve this problem? By not doing this backwards initialization. Each step should just initialize itself. Each step would submit itself to the execution manager upon creation. This means that each step would only ever create one future. Each step’s “get future” method would always return the same future.
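A sketch of the eager version in Python, again with `concurrent.futures` standing in for the execution manager. The class name `EagerStep`, its `dependencies` parameter, and the little A, B, C, D graph are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class EagerStep:
    """Hypothetical step that submits itself on creation, so its future never changes."""

    def __init__(self, executor, work, dependencies=()):
        self._dependencies = tuple(dependencies)
        # The future is assigned exactly once, before any other step can see this object.
        self._future = executor.submit(self._run, work)

    def _run(self, work):
        # Wait on the dependencies' futures, then run this step with their results.
        inputs = [dep.get_future().result() for dep in self._dependencies]
        return work(*inputs)

    def get_future(self):
        return self._future


with ThreadPoolExecutor(max_workers=4) as executor:
    # B and C both depend on A; D depends on B and C.
    a = EagerStep(executor, lambda: 1)
    b = EagerStep(executor, lambda x: x + 1, [a])
    c = EagerStep(executor, lambda x: x * 10, [a])
    d = EagerStep(executor, lambda x, y: x + y, [b, c])
    result = d.get_future().result()

print(result)  # 12
```

Because every step is created after its dependencies, the dependencies are always submitted first, and B and C both receive the one and only future that A ever created.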

I made this fix, and the problem disappeared.

In essence, the real issue is that I had mutable state. All concurrent objects should be immutable. I ran into this problem here because I had concurrent objects (the steps) be able to mutate state (set a “future” variable). By setting the “future” variable upon construction of each step, I made each step immutable and therefore immune to any parallel execution problems.

So, yeah, make sure all your concurrent objects are immutable. And don’t try to do fancy stuff before thinking it through completely.

Remarks

This problem wouldn’t exist with the original initialization method had the dependency graph been a tree. This is because in a tree, each node has only one parent, and therefore each “get future” method could only be called at most one time. But, our dependency graph is a directed acyclic graph and, in practice, many nodes have several parents.

The 25 Horses Problem

2018-05-20T00:42:30-04:00

I recently came across a video titled HARD Google Interview Question - The 25 Horses Puzzle. I don’t think Google asks any brain teaser problems so I decided to check it out. Although the problem was formulated like a brain teaser, I found it to actually be a very well-designed technical problem. I could actually apply various technical problem-solving techniques to solve this problem - although it involved no coding. The problem describes a scenario with a “real-world” setting, but it could be modeled nicely with graph theory and topological ordering. The video presented a nice intuitive explanation of the solution, but I want to show how certain techniques can be used to approach this problem step-by-step - techniques that could be applied to problems of all sorts.

So, the problem goes as such:

You want to find the fastest 3 horses in a group of 25 horses. You can only race 5 horses at a time. You don’t have a stopwatch, so you can only know the ranking of each horse within each race. How many races do you need?

I enjoyed this problem since it wasn’t one of those brain teasers where there’s some trick to solving it that wasn’t presented in the original problem. The solution didn’t involve adding elements of manipulation to the scenario (like you give steroids to horses to make them faster). There wasn’t some wisecrack answer like “just one race if we are lucky!”. The solution formulates itself completely within the scope of the problem. In fact, the problem could be reduced into a raw representation such that it could be modeled in a mathematical format.

To re-formulate the problem in a technical sense:

There are 25 elements that have some ordering from fastest to slowest among them (a strict ordering).
You can perform some computation called race that can give the relative ordering of any 5 of those elements. How many times do you need to run race in order to find the first 3 elements in the ordering?

The intent of the problem isn’t for us to give a number of races needed, but rather to provide a proof as to the number of races. This proof is like one of those classic two-sided proofs where the goal is to prove an equality. In these proofs, to prove that the solution is exactly some value, you prove that the solution is at least that value and also at most that value - and therefore the solution must be exactly that value. However, in this case, the two proofs are 1) that a solution exists for some value and 2) that the solution must be at least that value - therefore, that value is the minimum possible value.

So, for this problem, I needed to prove that:

I can find a small number of races that works, and that
The minimum number of races needed is that number

I started with a naive answer to prove the first lemma (there exists some number of races that works). Since there are 25 horses, if I race every pair of horses, I can rank each of the horses in their exact order. Therefore, if I race the 25 horses 25 times each, I have the exact ordering - this is 25² races. Then, I realized that there are a lot of duplicate races. Eliminating these duplicate races, I find the actual number of pairings:

1. The first horse has 24 other horses to pair.
2. The second horse has 23 other horses to pair. The pairing with the first horse is already covered.
3. The third horse has 22 other horses to pair…

…

24. The 24th horse has 1 other horse to pair.

25. The 25th horse has no other horses to pair.

This linear summation can be easily expressed as $\frac{25(24+0)}{2}$ , or 300. That’s quite a few races still.
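The count is easy to double-check in Python by listing the pairs directly:

```python
from itertools import combinations

horses = range(25)
pairings = list(combinations(horses, 2))  # every unordered pair of horses

print(len(pairings))   # 300
print(sum(range(25)))  # 300, which is 24 + 23 + ... + 1 + 0
```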

So, then I tried to look for what other races I could eliminate (don’t take this out of context) - which ones did not need to happen - what inefficiencies my algorithm had. The two main inefficiencies were that:

I was not utilizing the full potential of the race function. I am discarding useful results by just looking at the ordering for the first two of each race result, and that
I am finding the full ordering of all 25 horses, whereas I only need the ordering for the first 3.

One technique I like to use is to represent the problem in a model I have worked with extensively before - one that I am familiar with and have solved other problems in. Doing so helps me apply techniques I’ve learned from working with that model. In this problem, we are trying to find the (ordering) relationship between pairs of horses. What better model to represent this than a graph?

For those unfamiliar with graphs in discrete math, a graph is just a bunch of nodes and edges that connect them. In other words, it is the colloquial equivalent of a network (like a social network, or a network of highways).

An example of a graph

In our case, the nodes are the horses, and each edge represents a pair of horses. We can have our graph be a “knowledge” graph. We start with 25 nodes with no edges. Then, every edge we add between two horses represents knowledge of the ordering between the two horses - knowledge of which horse is faster. That means that each race could add new edges to our graph.

A 25-node graph with no edges

With this model, the original approach of running 300 races is equivalent to adding an edge between every pair of nodes. In fact, there are exactly 300 edges if we were to connect all the nodes with edges (see complete graph). Many of these edges are not necessary. In fact, we only need two edges to know the fastest 3 horses - the edge between the fastest and second fastest, and the edge between the second fastest and third fastest.

We win if we find these edges

However, this does not mean that we only need one race. We could get lucky and have our top 3 horses be in the one race we hold, but the problem is asking for a solution that could work in all cases.

Now, it made sense to start building the solution from the bottom up and tackle the second lemma (the number of races needed is at least some number). The initial graph is 25 nodes with no edges. So, the first thing I realize is that no matter what, when we connect the nodes up no node can be left out (the graph needs to be connected). This is because any horse that is not connected to the others means that we don’t have any information about its ordering. Therefore, first, we must run at least 5 races to get some edge for each of the 25 horses. However, this results in 5 disjoint graphs.

We still need to connect these 5 disjoint graphs by picking some horse in each graph for another race (the 6th race). One way is to pick the fastest from each group. In fact, this will give us one key piece of information that we need - the fastest horse overall.

However, we still need to know the second and third fastest. If we look at the current knowledge graph, the fastest horse is connected to two horses. Either of these horses could be the second fastest. Therefore, we need an edge between these two horses in order to know who is faster. We need a 7th race.

What if we didn’t pick the fastest horse to race from each group? For instance, let’s assume we raced the fastest horses from only 4 groups. In that case, we would not know if the horse we left out is the fastest overall unless we had some edge between that horse and the fastest of the other 4 we raced. To have that information, we need at least another race.

In both cases, we proved that there needs to be some number of races 7 or higher (lemma 2). So then, I tried to find a way to solve this with just 7 races.

After we raced the horses 6 times, we got this knowledge graph:

We can see that the only horses that could be the second fastest are the ones that are directly connected to the fastest horse. Likewise, we know that the only horses that could be the third fastest are the ones that are directly connected to the contenders for second place.

We are left with 6 horses that are still in the running for first, second, and third. Since we already know which horse is fastest, we just need to race the 5 other horses to find the second and third fastest.

Another race gives us the four edges we need

And there we have it, we found a way to solve the problem with 7 races and proved lemma 1 as well.

Therefore, the minimum number of races needed is 7.
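The 7-race strategy can be sketched as a small Python simulation. The horses are numbered 0 to 24, and the hidden `speed` list exists only so the simulated race can rank its entrants; the strategy itself never reads it. All names here are made up for the illustration.

```python
import random


def top_three(speed):
    """Find the 3 fastest of 25 horses using only 5-horse races.

    Returns (top three horses fastest first, number of races run).
    """
    races_run = 0

    def race(entrants):
        # A race reveals only the ranking of at most 5 horses, fastest first.
        nonlocal races_run
        assert len(entrants) <= 5
        races_run += 1
        return sorted(entrants, key=lambda h: speed[h], reverse=True)

    horses = list(range(25))
    # Races 1-5: five groups of five.
    groups = [race(horses[i:i + 5]) for i in range(0, 25, 5)]
    # Race 6: the winner of each group.
    winners = race([group[0] for group in groups])
    group_of = {group[0]: group for group in groups}
    g1, g2, g3 = (group_of[w] for w in winners[:3])
    # Race 7: the five horses still in the running for 2nd and 3rd place.
    contenders = [g1[1], g1[2], g2[0], g2[1], g3[0]]
    final = race(contenders)
    return [g1[0], final[0], final[1]], races_run


# Horse i has speed i, so the fastest three are 24, 23, and 22.
print(top_three(list(range(25))))  # ([24, 23, 22], 7)

# Spot-check against random distinct speeds.
random.seed(0)
for _ in range(1000):
    speed = random.sample(range(1000), 25)
    expected = sorted(range(25), key=lambda h: speed[h], reverse=True)[:3]
    assert top_three(speed) == (expected, 7)
```

The random spot-check only supports lemma 1 (7 races are enough). It says nothing about lemma 2, which still needs the argument above.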

The framework I used in solving this problem can be used to solve many other types of problems:

Reformulate the problem to remove the cruft
Define what we need to prove
Model the problem to break it down and see what we actually need to solve
Start easy and gradually engineer the solution by:
1. Removing the unnecessary parts, and
2. Building from the ground up

Remarks

This problem is also recognizable as a topological ordering problem, but I decided not to present it in that way since I wanted to explain it in a more intuitive “knowledge graph” manner.

So, I’ll present the topological ordering model here. Let’s take the original “knowledge graph” and make it into a directed graph. For those who are unfamiliar, this just means that each edge points from one node to the other. In this case, we can point from the faster horse to the slower horse for each edge. Going back to the diagrams we had, after the 6th race, the edges would look like:

Here, we can see that a path from horse A to horse B means that we know horse A is faster than horse B. A path is just a way to get from one node to another by only following edges in the direction they point.
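As a sketch of this idea, the directed knowledge graph can be stored as a dictionary from each horse to the horses it is known to beat, and “do we know A is faster than B” becomes a path search. The horse labels and edges below are made up for the example.

```python
def has_path(edges, start, goal):
    """Return True if goal can be reached from start by following directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


# Edges point from the faster horse to the slower horse.
edges = {"A": ["B"], "B": ["C"], "D": ["C"]}

print(has_path(edges, "A", "C"))  # True: A beat B and B beat C
print(has_path(edges, "A", "D"))  # False: A and D were never compared
```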

The final 6 contending horses would look like:

After the 6th race, we have a path from the fastest horse to any of the contenders for 2nd and 3rd place. The 7th race gives us the paths among these contenders, so that the top 3 horses lie on a single directed path from the fastest to the third fastest.