Some problems can be solved elegantly with the application of recursion. We're going to look at one shortly, the Merged String Puzzle.
Whenever a problem can be solved with recursion, engineers smile inside. Recursive solutions are often simple and elegant. The core of their algorithms are typically short in length, and the underlying processes for how they function are frequently easy to understand.
Unfortunately, a recursive solution is often an inefficient mechanism for solving a problem. Typically there are more proficient ways to crack to a recursive problem (using dynamic programming, or some other method to avoid repetition), but when absolute performance is not an issue (often the case thanks to Moore’s Law providing us with faster and faster grinding wheels), recursion is the way to go.
Recursive solutions are simple to code. Because of their elegance, logic errors that might otherwise occur are bypassed when a recursive solution is employed. Similarly, other coding errors are often minimized because of the compactness of the code.
If code has to be modified in the future, for instance to add additional functionality, often a recursive solution can be adapted with minimum coding effort and good confidence in testing of the modification.
When writing a recursive function you often think extra hard about possible states; this is because of the need to code your base case. Like a set of Russian stacking Matryoshka dolls, recursive functions call themselves with ever decreasingly complex parameters until eventually some simple base case is arrived at for which an answer can be returned directly. Then, back up the stack you run.
“To understand recursion you must first understand recursion”
As long as you have a base case and good conditions telling you when, and when not, to continue the recursion, then, providing you don’t run out of stack space (by trying to go too deep down the rabbit hole), you’ll eventually get to your solution.
(The curves in the pictures above are Hilbert Curves. They were also generated by recursion. You can read about them here.)
I heard about this problem during a recent conversation about coding interview questions asked by technology companies. I wrote about another interview questions recently on my blog, the two egg problem.
In this puzzle, you are given two input streams and one output stream. The output stream has supposedly been generated by random asymmetrical merging of the two input streams. What does this mean? It means that the two input streams are mixed together, but ordering of each input string is preserved in the output string. A couple of random examples are shown below.
Your job is to write a function into which you pass the three strings and returns a simple Boolean value indicating if the output string is a valid merge of the two input strings.
If the two input streams each contained their own set of distinct characters (such all the characters in the first string being all numbers, and those in the second string being all letters), then it would be a fairly simple programming problem.
Similarly, if we knew that each character in the both input strings only occurred once it would be a pretty trivial problem. However, we have to plan for the worst and assume that either string can contain any number of characters and multiple times.
It’s an interesting problem as the various combinations of characters can be pulled from either string in any quantity up to the length of each input string as we merge them.
We can solve this problem by breaking it down into simpler versions of itself; testing shortened versions of the strings that, themselves, test shortened versions of the strings until all characters are accounted for and we can prove, or not, if the output is valid.
Let's imagine we have a function IsMerge that returns a boolean, and takes as input the three strings: s1, s2 and s3 (Where s1 and s2 are the two inputs, and s3 is the merged output.)
Function IsMerge(s1,s2,s3) as boolean
Before we start with the recursion, however, we can apply a very simple optimization. If the length of the output string is not equal to the sum of the lengths of the two input strings there is no way that it could be a valid solution and we can exit the function without performing any more tests. (Pseudo code below)
If Len(s1) + Len(s2) <> Len(s3) Then IsMerge = False Exit Function End If
(Another possible optimization we could employ is to count all the characters and their frequencies in the input strings, then start to do the same for the output string. The combined input strings' distribution of characters should be the same as that of the output. If, as we are counting the characters in the output string, we come across more of one particular character in the output counts than the inputs we can immediately exit at that point with a fail. Just because the character counts match does not mean the output solution is valid, it just means it might be. Based on your knowledge of the input streams and their character distribution, as an engineer, you could determine if this optimization is worth it – with all algorithms and optimizations, knowing the common expected input patterns and common failure modes to look for will greatly help you in determining the optimal solution.)
The next test to check is to see if the length of either of the two input strings is zero length (it is empty). If this is the case, then the output string must be identical to the other input string for the solution to be valid. Another similar test is if the output string length is empty, then for the solution to be valid, both the input strings need to be empty at this time too. All these tests can encapsulated into one logical test:
If Len(s1) = 0 Or Len(s2) = 0 Or Len(s3) = 0 Then If s1 + s2 = s3 Then IsMerge = True Else IsMerge = False Exit Function End If
If any of the strings is empty, then the output string needs to be equal to the sum of the two input strings (think about it for a second to make sure you understand).
The next obvious test is to check to see if either of the leading characters of the two input strings is the leading character of the output string, if not there is no way the solution can be valid.
If Left(s1, 1) <> Left(s3, 1) And Left(s2, 1) <> Left(s3, 1) Then IsMerge = False Exit Function End If
Finally, if this test passes, we can move onto the recursion magic. If the leading character from the first input string matches the leading character of the output string, we remove both of these characters and recursively call the function again using these two short strings and the other input string.
Similarly, if the leading character of the second input string matches the leading character of the output we do the same exercise
If Left(s1, 1) = Left(s3, 1) And IsMerge(Right(s1, Len(s1) - 1), s2, Right(s3, Len(s3) - 1)) Then IsMerge = True If Left(s2, 1) = Left(s3, 1) And IsMerge(s1, Right(s2, Len(s2) - 1), Right(s3, Len(s3) - 1)) Then IsMerge = True
The function recursively calls itself until neither of the leading characters match (definite fail), or one of the strings becomes empty and then we test what is left (passing or failing depending on what is left).
Simple and elegant.