Scaling React Server-Side Rendering
I had the opportunity to work on scaling a React rendering service, adapting a fixed hardware provision to deal with increasing load. Over the course of many months, incremental improvements were made to the system to enable it to cope with demand. I thought it might be useful to share the more interesting insights that I gained during this process.
Some of the insights here are React-specific, but many are simply generic scalability challenges, or simple mistakes that were made. React server-side performance optimization has been covered elsewhere, so I’m not going to provide an overview of React performance, generally. I’m going to focus on the “big wins” that we enjoyed, along with the subtle, fascinating footguns. My hope is that I can give you something interesting to think about, beyond the standard advice of setting NODE_ENV=production. Something based on the real, honest-to-goodness challenges we had to overcome.
What I found so interesting about this project was where the investigative trail led. I assumed that improving React server-side performance would boil down to correctly implementing a number of React-specific best practices. Only later did I realize that I was looking for performance in the wrong places. With any luck, these stories will enable you to diagnose or avoid your own performance pitfalls!
Things We Will Talk About
- Introduction
- The Situation
- Load Balancing
- Client-Side Rendering Fallback
- Load Shedding
- Component Caching
- Dependencies
- Isomorphic Rendering
- The Aggregation Of Marginal Gains
- All Your Servers Are Belong To Redux
The Situation
Our team was looking to revitalize the front-end architecture for our product. As tends to be the case with a many years-old monolith, the technical debt had piled up, and front-end modifications were becoming difficult. Increasingly, we were telling product managers that their requested changes were infeasible. It was time to get serious about sustainability.
Within the front-end team, a consensus was quickly reached that a component-oriented architecture built on React and Redux was the best bet for a sustainable future. Our collective experience and intuition favored separating concerns at the component level, extracting reusable components wherever possible, and embracing functional programming.
We were beginning with the fairly modest, spaghetti front-end that most monolithic applications seem to evolve into. Browser requests would hit a load balancer, which would forward requests to one of several instances of a Java/Spring monolith. JSP-generated HTML templates were returned, styled with CSS (LESS), and dynamic client functionality was bolted on with a gratuitous amount of jQuery.
The question was how to integrate our desire for a React front-end with a Java monolith. SEO was a very important consideration — we had full-time SEO consultants on staff — and we wanted to provide the best possible page load speed, so server-side rendering quickly became a requirement. We knew that React was capable of isomorphic (client- and server-side) rendering. The back-end team was already on their journey towards breaking up the monolith into a microservice architecture. It therefore seemed only natural to extract our React server-side rendering into its own Node.js service.
The idea was that the monolith would continue to render JSP templates, but would delegate some parts of the page to the React service. The monolith would send rendering requests to the React service, including the names of components to render, and any data that the component would require. The React service would render the requested components, returning embeddable HTML, React mounting instructions, and the serialized Redux store to the monolith. Finally, the monolith would insert these assets into the final, rendered template. In the browser, React would handle any dynamic re-rendering. The result was a single codebase which renders on both the client and server — a huge improvement upon the status quo.
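To make the contract concrete, the exchange looked roughly like the sketch below; the field names are hypothetical, not our actual wire format.

```js
// Hypothetical render request sent from the monolith to the React service.
const renderRequest = {
  component: 'ProductCarousel',           // which component to render
  props: { productIds: [101, 102, 103] }  // the data that component requires
};

// Hypothetical response returned by the React service.
const renderResponse = {
  html: '<div data-reactroot="">…</div>',    // embeddable HTML for the JSP template
  mount: { containerId: 'product-carousel', // instructions for mounting React
           component: 'ProductCarousel' },  // in the browser
  state: '{"products":{…}}'                 // serialized Redux store for hydration
};
```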
As we gained confidence with this new approach, we would build more and more of our features using React, eventually culminating with the entire page render being delegated to the React service. This approach allowed us to migrate safely and incrementally, avoiding a big-bang rewrite.
Our service would be deployed as a Docker container within a Mesos/Marathon infrastructure. Due to extremely complex and boring internal dynamics, we did not have much horizontal scaling capacity. We weren’t in a position to be able to provision additional machines for the cluster. We were limited to approximately 100 instances of our React service. It wouldn’t always be this way, but during the period of transition to isomorphic rendering, we would have to find a way to work within these constraints.
Load Balancing
I Got 99 Percentiles
The initial stages of this transition weren’t without their hiccups, but our React service rendering performance was reasonable.
As we ported more and more portions of the site to React, we noticed that our render times were increasing — which was expected — but our 99th percentile was particularly egregious.
To make matters worse, when our traffic peaked in the evening, we would see large spikes in 99th percentile response time.
We knew from our benchmarks that it simply does not take 400ms to render even a fairly complex page in React. We profiled and made lots of improvements to the service’s rendering efficiency, including streaming responses, refactoring React component elements to DOM node elements, various Webpack shenanigans, and introducing cached renders for some components. These measures mitigated the problem, and for a while we were hovering right on the edge of acceptable performance.
Seasonality
One day I was looking at our response latency graph, and I noticed that the problem had returned. Unusually high traffic during the previous evening had pushed our 99th percentile response times past the acceptable threshold. I shrugged it off as an outlier — we were incredibly busy, and I didn’t have time to investigate.
This trend continued for a few days. Every evening when traffic peaked, we would set a new record. Zooming out to show the last few days, there was a clear trend of increasing response time.
There was a clear correlation in the graphs between traffic volume and response time. We could attempt to duct tape the problem, but if traffic were to increase, we would be in bad shape. We needed to scale horizontally, but we couldn’t. So how close were we to a calamity? I pulled up an annual traffic graph, and promptly spit out my tea.
Without a doubt our response times would dramatically increase with traffic. It was currently spring — roughly the annual midpoint for traffic — and by summer we would be drowning in requests. This was Very Bad.
But how could we have missed this? We thought we had solved this problem already. What gives?
I’m pretty sure we were caught off guard due to the seasonality of our traffic. Starting the previous summer — when traffic was at its peak — we began moving more and more functionality to React. If traffic had remained constant, the increased component rendering load would have caused our response times to increase. Instead, as the year progressed, traffic was decreasing. Requests were going down, but the per-request workload was going up! The result was a roughly flat response time during the fall and winter seasons. As traffic picked up again in the spring, our response times rapidly increased, and this time the effect was magnified by the increased per-request workload.
Randomness
Out of ideas for squeezing easy performance wins out of the system, I started asking some of my colleagues for suggestions. During one of these conversations, somebody mentioned the fact that our service discovery mechanism, Consul, returns three random service instances for every service discovery request.
I remembered reading a fantastic Genius article several years ago, which told the story of the performance regressions that they experienced when Heroku silently switched to a randomized load balancing strategy, causing a 50x decrease in scaling efficiency. If we were using a similar load balancing strategy, then we were likely to be suffering the same fate. I did a bit of spelunking and confirmed that this was indeed the case.
Basically, when the monolith needs to make a request to the React service, it needs to know the IP address and port where it can locate an instance of that service. To get this information, a DNS request is sent to Consul, which keeps track of every active service instance. In our configuration, for each service discovery request, Consul returns three random instances from the pool. This was the only load balancing mechanism within the system. Yikes!
Before I continue, I should explain why random load balancing is inefficient.
Let’s say you have a load balancer and three service instances. If the load balancer routes requests randomly to those instances, the distribution of requests will almost always be uneven.
I have explained this problem to many people, and it confuses a huge number of them. It reminds me of the Monty Hall problem — even though it’s true, people find it hard to believe.
But yes, it’s true: random load balancing does not balance load at all! This can be easier to understand if you flip a coin, counting the number of heads and tails. The balance is almost always uneven.
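If you’re skeptical, a few lines of coin flipping make the point. This little sketch has nothing to do with our service; it just counts heads and tails over a thousand flips.

```js
// Flip a fair coin 1,000 times and compare the two piles.
let heads = 0;
for (let i = 0; i < 1000; i++) {
  if (Math.random() < 0.5) heads++;
}
const tails = 1000 - heads;
console.log({ heads, tails, imbalance: Math.abs(heads - tails) });
// The two counts are almost never equal, and the gap is routinely
// a couple dozen "requests" at any given moment.
```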
A common response is that the load may not be balanced at the beginning, but over time the load will “average out” so that each instance will handle the same number of requests. This is correct, but unfortunately it misses the point: at almost every moment, the load will be unevenly distributed across instances. Virtually all of the time, some servers will be concurrently handling more requests than the others. The problem arises when a server decides what to do with those extra requests.
When a server is under too much load, it has a couple of options. One option is to drop the excess requests, such that some clients will not receive a response, a strategy known as load shedding. Another option is to queue the requests, such that every client will receive a response, but that response might take a long time, since it must wait its turn in the queue. To be honest, both options are unacceptable.
Our Node servers were queueing excess requests. If we have at least one service instance per concurrent request, the queue length for each instance will always be zero, and response times will be normal, provided that we are balancing the load evenly. But when we are using a random load balancing strategy, some instances will always receive an unfair share of requests, forcing them to queue the excess ones. The requests at the back of a queue must wait for the entire queue to be processed, dramatically increasing their response time.
To make matters worse, it doesn’t matter how many service instances we have. The random allocation of requests guarantees that some instances will always be sitting idle, while other instances are being crushed by too much traffic. Adding more instances will reduce the probability that multiple requests will be routed to the same instance, but it doesn’t eliminate it. To really fix this problem, you need a load balancing strategy that actually balances the load.
I installed metrics to graph request queue length per service instance, and it was clear that some services were queueing more requests than others. The distribution would change over time, as the random load balancing just happened to select different instances.
Load Balancing Strategies
So we need to ensure that the load is evenly distributed across instances. Not wishing to repeat past mistakes, I began researching load balancing strategies. This is a really fascinating topic, and if you’re interested in learning more, I highly recommend Tyler McMullen’s presentation, Load Balancing is Impossible.
Unfortunately, there are so many permutations of load balancing strategies that it would be impossible to test them all in a production environment. The iteration cost for each strategy would be too great. So I followed Genius’ lead and wrote a simple in-memory load balancing simulator which enabled me to experiment with dozens of strategies over the course of a few hours. This gave me much greater confidence in the shortlist of solutions that would be tested in production.
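My simulator modelled more variables than this, but the core of the idea fits in a few lines. The sketch below is a toy version with invented arrival and service rates: on every tick a fixed number of requests arrive, each instance completes at most one, and the only difference between runs is the routing strategy.

```js
// Toy load balancing simulator: compares routing strategies by the worst
// queue length they produce. All rates here are invented for illustration.
function simulate(pickInstance, { instances = 10, arrivalsPerTick = 9, ticks = 10000 } = {}) {
  const queues = new Array(instances).fill(0);
  let worstQueue = 0;
  let requestCount = 0;
  for (let t = 0; t < ticks; t++) {
    for (let r = 0; r < arrivalsPerTick; r++) {
      queues[pickInstance(queues, requestCount++)]++; // route a new request
    }
    for (let i = 0; i < queues.length; i++) {
      if (queues[i] > 0) queues[i]--;                 // each instance completes one request
      worstQueue = Math.max(worstQueue, queues[i]);
    }
  }
  return worstQueue; // a rough proxy for 99th percentile latency
}

const random = (queues) => Math.floor(Math.random() * queues.length);
const roundRobin = (queues, requestCount) => requestCount % queues.length;

console.log('random:     ', simulate(random));     // long queues form purely by chance
console.log('round-robin:', simulate(roundRobin)); // queues stay short
```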
Load Shedding With Random Retries
One clever solution involves configuring our React service to shed load, returning a 503 Service Unavailable instead of queueing excess requests. The monolith would receive the 503 more or less immediately, and would then retry its request on a different, randomly selected node. Each retry has an exponentially decreasing probability of reaching another overloaded instance.
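Sketched from the caller’s side, the scheme looks something like this. Our real caller was the Java monolith; this Node-flavoured version uses the standard fetch API and an invented /render endpoint purely for illustration.

```js
// Shed-and-retry from the caller's perspective (illustrative endpoint and names).
async function renderWithRetries(instances, payload, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Pick a random instance, much as Consul's random service discovery did for us.
    const instance = instances[Math.floor(Math.random() * instances.length)];
    const res = await fetch(`http://${instance}/render`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(payload),
    });
    if (res.status !== 503) return res; // rendered, or failed for some other reason
    // A 503 means the instance shed our request; retry on another random instance.
  }
  throw new Error('every attempt was shed');
}
```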
Unfortunately, when I simulated this approach I discovered that it was not the most efficient. It was certainly better than a single, random attempt, but it does not perform as well as a round-robin algorithm, for example.
There are a few reasons for this. First, each retry adds additional network latency to the ultimate response time. All other things being equal, an algorithm which does not issue redundant requests will not suffer this overhead.
Second, as the cluster of service instances becomes saturated with traffic, the probability that a retry will reach a healthy instance decreases! Think about a 5-instance cluster, with 4 instances at capacity, unable to handle additional requests — the odds that any given retry will reach the one available instance are only 20%! This means that some requests will suffer many retries in order to receive a response.
This problem is less pronounced when you can scale horizontally, but hopefully the inefficiency of this solution is clear. I wanted to do better, if possible.
Round-Robin
A much better approach is to route each request, in turn, to the next instance in the cluster, commonly known as a round-robin algorithm.
Round-robin guarantees that each service instance will receive exactly its fair share of requests. This is the simplest load balancing algorithm that we can honestly say is balancing load in a meaningful way. Accordingly, it vastly outperforms both purely random selection and load shedding with random retries.
Despite appearances, round-robin is not the most efficient approach, because requests can vary in the amount of work that they require the server to perform. One request might require 5ms to render a single React component, while another may require 50ms to render a page filled with hundreds of components. This natural variance in per-request workload means that round-robin can send requests to instances which are still processing a previous request, while other instances remain idle. This is because round-robin does not take an instance’s workload into account. It strictly allocates requests as a blackjack dealer would deal cards: everybody gets the same number of cards, but some cards are better than others!
Join-Shortest-Queue
Obviously we can’t speak of the “best” load balancing algorithm, because the “best” choice depends on your particular circumstances. But I would be remiss not to describe what is probably the most widely useful approach, which is a join-shortest-queue strategy.
I’m going to lump a few variations of this strategy together. Sometimes we might use a least-connections, or a join-idle-queue approach, but the unifying principle is the same: try to send requests to the instance which is least overloaded. We can use different heuristics to approximate “load”, including the number of requests in the instance’s queue, or the number of outstanding connections, or having each instance self-report when they are ready to handle another request.
The join-shortest-queue approach outperforms round-robin because it attempts to take the per-request workload into account. It does this by keeping track of the number of responses it is waiting for from each instance. If one instance is struggling to process a gigantic request, its queue length will be 1. Meanwhile, another instance might complete all of its requests, reducing its queue length to 0, at which point the load balancer will prefer to send requests to it.
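A minimal in-process version of the idea just counts the responses we are still waiting on per instance and routes each new request to the least-busy one. The instance addresses and /render endpoint below are invented, and the sketch assumes the standard fetch API.

```js
// Join-shortest-queue, approximated by counting in-flight requests per instance.
const inFlight = new Map([
  ['react-1:8080', 0],
  ['react-2:8080', 0],
  ['react-3:8080', 0],
]);

function leastLoadedInstance() {
  let best = null;
  for (const [instance, count] of inFlight) {
    if (best === null || count < inFlight.get(best)) best = instance;
  }
  return best;
}

async function render(payload) {
  const instance = leastLoadedInstance();
  inFlight.set(instance, inFlight.get(instance) + 1); // join its queue
  try {
    return await fetch(`http://${instance}/render`, {
      method: 'POST',
      body: JSON.stringify(payload),
    });
  } finally {
    inFlight.set(instance, inFlight.get(instance) - 1); // leave its queue
  }
}
```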
Fabio
So how did we resolve our load balancing woes? We ended up implementing a round-robin load balancer, Fabio, as a compromise solution, trading off performance for convenience.
While Fabio does not support a join-shortest-queue load balancing strategy, it integrates seamlessly with Consul, giving us server-side service discovery. This means that our monolith can simply send requests to Fabio, and Fabio figures out both how to get them to the React service, and also how to balance the load in a reasonable way.
Of course, in this configuration our load balancer becomes a single point of failure — if it dies, we can’t render any web pages!
To provide an availability strategy, we implemented our Fabio load balancer as just another containerized service — load balancing as a service. The monolith would use Consul to discover a random Fabio instance, and send requests to that instance. If a Fabio instance dies, Consul would automatically detect this and stop offering that instance as one of the random options. We tested failover in production by sending a small amount of traffic through Fabio, and then manually killing a Fabio instance. Consul would reliably recover from this failure within a couple of seconds. Not bad!
We might be tempted to assume that randomly selecting a load balancer would reintroduce the very performance problem we are trying to solve, but in practice this is not a problem. Each instance of Fabio can easily accommodate all of the traffic destined for our React service cluster. If our load balancers are sufficiently fast, it doesn’t matter if the load is evenly balanced across the load balancers themselves. We have multiple load balancers purely to provide failover capability.
Great Success
When the new round-robin load balancing strategy was productionized and ramped up to 100% of traffic, our React service instance queue lengths were a sight to behold. All of the queues converged around the same length. The system works!
Even better, our original problem was solved: peak traffic response latency spikes smoothed out, and our 99th percentile latency dropped. Everything “just worked”, as we had originally hoped.
Client-Side Rendering Fallback
Elastic Inelasticity
The addition of load balancing to our system effectively solved our high latency issues, and the efficiency gains provided a modest amount of additional capacity. But we were still concerned about extraordinary scenarios. Bots would scrape our website, triggering a huge surge in requests. Seasonality, including holidays, could also trigger unexpected increases in traffic. We had enough server capacity to keep up with normal traffic growth, but we could only sleep easily with the knowledge that our system would be resilient under significantly higher load.
Ideally we would build an auto-scaling system which could detect surges in traffic, and scale horizontally to accommodate them. Of course, this was not an option for us. We also couldn’t simply provision 10x more capacity than required. Was there any way we could add some kind of margin of safety? As it turns out, there was.
We couldn’t shed load by dropping requests, but I started thinking about load shedding more generally, and I began to wonder if some kind of load throttling would be possible. Late one evening, a solution popped into my head. We were using Redux, and one of the nice things about Redux is that it makes serialization of state very easy, enabling isomorphic rendering. We were rendering requests on the server, and then handling re-renders on the client, yet isomorphic rendering allows us to render on either the server or client. We don’t always have to do both.
So the way to throttle load was profound in its simplicity: when the server is under high load, skip the server-side render, and force the browser to perform the initial render. In times of great need, our rendering capacity would automatically expand to include every single user’s computer. We would trade a bit of page load speed for the ability to elastically scale on a fixed amount of hardware. Redux is the gift that just keeps on giving!
How It Works
Building a client-side rendering fallback system is remarkably straightforward.
The Node server simply maintains a request queue length counter. For every request received, increment the counter, and for every error or response sent, decrement the counter. When the queue length is less than or equal to n, perform regular data fetching, Redux store hydration, and a server-side React render. When the queue length is greater than n, skip the server-side React rendering part — the browser will handle that, using the data from the Redux store.
The exact value of n will need to be tuned to match the characteristics of your application. Generally speaking, n should be slightly larger than the typical queue length during peak expected load.
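Here is a minimal sketch of that counter, assuming an Express-style server; the App component, fetchDataFor(), and renderPage() helpers are invented stand-ins for your application’s own pieces, and the plain state object stands in for the serialized Redux store.

```js
const express = require('express');
const React = require('react');
const ReactDOMServer = require('react-dom/server');

// Stand-ins for application-specific pieces (purely illustrative).
const App = (props) => React.createElement('div', null, props.greeting);
const fetchDataFor = async (req) => ({ greeting: 'Hello' }); // would hydrate the Redux store
const renderPage = ({ html, state }) =>
  `<!doctype html><div id="root">${html}</div>` +
  `<script>window.__PRELOADED_STATE__=${JSON.stringify(state)}</script>`;

const MAX_QUEUE_LENGTH = 30; // "n" in the text: tune to your peak-load queue length
let queueLength = 0;

const app = express();

app.use((req, res, next) => {
  queueLength++;
  res.once('finish', () => queueLength--); // decrement when the response goes out
  next();
});

app.get('*', async (req, res) => {
  const state = await fetchDataFor(req);
  const html =
    queueLength <= MAX_QUEUE_LENGTH
      ? ReactDOMServer.renderToString(React.createElement(App, state)) // normal server render
      : ''; // over the threshold: empty mount point, the browser renders instead
  res.send(renderPage({ html, state }));
});

app.listen(3000);
```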
Of course, if SEO is a requirement, this approach contains a slight problem: if a search engine crawls the site during a traffic surge, it may not receive a server-side rendered response, and therefore it may not index your pages! Fortunately this is an easy problem to solve: provide an exception for known search engine user agent strings.
There is a possibility that the search engine will punish our rankings for treating it differently than other clients. However, it is important to remember that the client-side rendering fallback exists to prevent us from dropping requests during traffic surges, or server failures. It is a safety net for rare, exceptional circumstances. The alternative is to risk sending nothing to the crawler, which could also result in punishment. In addition, we aren’t serving different content to the search engine, we are merely providing it with priority rendering. Plenty of users will receive server-side rendered responses, but search engines will always receive one. And of course, it is easy to remove this priority if it is considered counter-productive.
The Results
The day after we deployed client-side rendering fallback to production, a traffic spike occurred and the results were outstanding. The system performed exactly as we had hoped. Our React service instances automatically began delegating rendering to the browser. Client-side renders increased, while server-side request latency held roughly constant.
We benchmarked the efficiency gained through this approach, and found that it provides a roughly 8x increase in capacity. This system went on to save us multiple times over the next several months, including during a deployment error which significantly reduced the number of React service instances. I’m extremely pleased with the results, and I do recommend that you experiment with this approach in your own isomorphic rendering setup.
Load Shedding
Why You Need Load Shedding
Previously I mentioned that load shedding could be used in conjunction with random retries to provide an improvement over purely random load balancing. But even if a different load balancing strategy is used, it is still important to ensure that the React service can shed load by dropping excess requests.
We discovered this the hard way during a freak operations accident. A Puppet misconfiguration accidentally restarted Docker on every machine in the cluster, simultaneously. When Marathon attempted to restart the React service instances, the first ones to register with Consul would have 100% of the normal request load routed to them. A single instance could be swamped with 100x its normal request load. This is very bad, because the instance may then exceed the Docker container’s memory limit, triggering the container’s death. With one less active instance, the other instances are now forced to shoulder the additional load. If we aren’t lucky, a cascade failure can occur, and the entire cluster can fail to start!
Checking our graphs during this incident, I saw request queue lengths spike into the thousands for some service instances. We were lucky the service recovered, and we immediately installed a load shedding mechanism to cap the request queue length at a reasonable number.
Not So Fast
Unfortunately the Node event loop makes load shedding tricky. When we shed a request, we want to return a 503 Service Unavailable response so that the client can implement its fallback plan. But we can’t return a response until all earlier requests in the queue have been processed. This means that the 503 response will not be sent immediately, and could be waiting a long time in the queue. This in turn will keep the client waiting for a response, which could ruin its fallback plan, especially if that plan was to retry the request on a different instance.
If we want load shedding to be useful, we need to send the 503 response almost immediately after the doomed request is received.
Interleaved Shedding
After a bit of brainstorming, I realized that we could provide fast shedding by interleaving request rendering and shedding.
I built a proof of concept by pushing all requests to be rendered into a rendering queue, implemented with a simple array. When a new request arrived, if the queue was smaller than m — where m is the maximum number of concurrent requests to accept — I would push the request object into the array. If the queue had grown too large, a 503 response was immediately sent.
When the server starts, I call a function which pulls a single request from the head of the rendering queue, and renders it. When the request has finished rendering, the response is sent, and the function is recursively called with setImmediate(). This schedules the next single request render after the Node event loop processes accumulated I/O events, giving us a chance to shed the excess requests.
The effect is that a single request is rendered, then all excess requests are shed, then another single request is rendered, and so on. This approach limits the shed response latency to approximately the length of the request that was rendered before it.
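A stripped-down version of that proof of concept might look like the following, where renderAndRespond() is a stand-in for the real data fetching, React render, and response writing.

```js
const http = require('http');

const MAX_QUEUE_LENGTH = 30; // "m" in the text
const renderQueue = [];

// Stand-in for the real data fetching + React render + response writing.
const renderAndRespond = (req, res) => res.end('<div>rendered</div>');

http.createServer((req, res) => {
  if (renderQueue.length >= MAX_QUEUE_LENGTH) {
    res.statusCode = 503; // shed: written during the I/O phase between renders
    return res.end();
  }
  renderQueue.push({ req, res });
}).listen(3000);

function renderLoop() {
  const next = renderQueue.shift();
  if (next) {
    renderAndRespond(next.req, next.res); // render exactly one queued request
  }
  // Yield to the event loop so accumulated I/O (including the 503s for shed
  // requests) is handled before the next render is started.
  setImmediate(renderLoop);
}

renderLoop();
```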
Of course, it is possible to provide even faster shedding.
I/O And Worker Processes
To achieve almost instantaneous load shedding, we refactored our application to spawn a cluster of Node processes.
The idea was simple: dedicate one process exclusively to load shedding. When the service starts, the cluster master process forks a number of worker processes. The master process handles I/O, receiving incoming requests and immediately returning a 503 if the worker processes are too busy. If a worker is idle, the master process sends requests to it. The worker performs all of the heavy lifting, including React component rendering, and returns a response to the master. The master process finally sends the HTTP response to the client.
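A bare-bones sketch of this split, using Node’s built-in cluster module; the bookkeeping is simplified, and renderComponentsFor() is a stand-in for the real React rendering.

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  const idleWorkers = [];
  const pending = new Map(); // request id -> pending HTTP response
  let nextId = 0;

  const workerCount = Math.max(1, os.cpus().length - 1);
  for (let i = 0; i < workerCount; i++) {
    const worker = cluster.fork();
    worker.on('message', ({ id, html }) => {
      idleWorkers.push(worker);  // the worker is free again
      pending.get(id).end(html); // relay the rendered HTML to the client
      pending.delete(id);
    });
    idleWorkers.push(worker);
  }

  http.createServer((req, res) => {
    const worker = idleWorkers.pop();
    if (!worker) {
      res.statusCode = 503; // every worker is busy: shed immediately
      return res.end();
    }
    const id = nextId++;
    pending.set(id, res);
    worker.send({ id, url: req.url }); // hand the heavy lifting to a worker
  }).listen(3000);
} else {
  // Worker: does the expensive rendering. renderComponentsFor() is a stand-in.
  const renderComponentsFor = (url) => `<div>rendered ${url}</div>`;
  process.on('message', ({ id, url }) => {
    process.send({ id, html: renderComponentsFor(url) });
  });
}
```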
This is the approach we shipped to production. Although it is a bit more complicated, it gives us the flexibility to experiment with various numbers of worker processes. It is also important, when evolving towards a microservice architecture, to take the easy latency wins where we can have them.
Component Caching
The Idea Of Caching
Whenever we’re attempting to improve performance, the topic of caching is going to come up. Out of the box, React server-side rendering performance is not nearly as fast as, say, a JSP template, and so there has been considerable interest in implementing caching strategies for React.
Walmart Labs has produced a very fancy caching library, electrode-react-ssr-caching, which provides caching of HTML output on a per-component basis. For dynamic rendering, prop values can either be cached or interpolated. It’s a very impressive system.
And whoa, it’s fast! Liberal use of caching can reduce render times to sub-millisecond levels. This is clearly the approach which offers the greatest performance gains.
Two Hard Things In Computer Science
Unfortunately, this approach is not without its cost. To implement caching, electrode-react-ssr-caching relies on React private APIs, and mutates some of them. This effectively ties the library to React 15, since a complete rewrite of React’s core algorithm shipped with React 16.
Even more pernicious, there is that old saw looming in the background:
There are only two hard things in Computer Science: cache invalidation and naming things. — Phil Karlton
As it turns out, implementing caching on a per-component basis produces a lot of subtle problems.
Caching And Interpolation
In order to cache a rendered React component, electrode-react-ssr-caching needs to know what to do with the component’s props. Two strategies are available, “simple” and “template”, but I will use the more descriptive terms, “memoization” and “interpolation”.
Imagine a <Greeting> component, which renders a greeting for the user. To keep things simple, let’s assume we only support English and French greetings. The component accepts a language prop, which could be either en or fr. Eventually, two versions of the component would be cached in memory.
When using the memoization strategy, the component is rendered normally, and one or more of its props are used to generate a cache key. Every time a relevant prop value changes, a different, rendered copy of the component is stored in the cache.
By contrast, the interpolation strategy treats the component as a template generation function. It renders the component once, stores the output in cache, and for subsequent renders it merges the props into the cached output.
It is important to note that we can’t simply pass a language code to the <Greeting> component when we are using interpolation. The exact prop values are merged into the cached component template. In order to render English and French messages, we have to pass those exact messages into the component as props — conditional logic is not usable inside interpolated component render() methods.
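To make the difference concrete, here is a hand-rolled sketch of the two strategies. It is not the library’s actual API, just an illustration of memoization versus interpolation, and of why conditional logic breaks under interpolation.

```js
// Hand-rolled illustration of the two caching strategies (not the library's API).
const React = require('react');
const ReactDOMServer = require('react-dom/server');

const cache = new Map();

// Memoization: render normally; keep one cached copy per distinct cache key.
function renderMemoized(Component, props, keyProps) {
  const key = Component.name + ':' + keyProps.map((k) => props[k]).join(':');
  if (!cache.has(key)) {
    cache.set(key, ReactDOMServer.renderToStaticMarkup(React.createElement(Component, props)));
  }
  return cache.get(key); // e.g. one entry for "en", one for "fr"
}

// Interpolation: render once with placeholder tokens, then substitute the real
// prop values into the cached template on every render.
function renderInterpolated(Component, props) {
  if (!cache.has(Component.name)) {
    const placeholders = {};
    for (const k of Object.keys(props)) placeholders[k] = `@@${k}@@`;
    cache.set(
      Component.name,
      ReactDOMServer.renderToStaticMarkup(React.createElement(Component, placeholders))
    );
  }
  let html = cache.get(Component.name);
  for (const k of Object.keys(props)) html = html.split(`@@${k}@@`).join(String(props[k]));
  // Note: the component only ever saw "@@language@@", never "en" or "fr", so any
  // conditional logic on the language prop was decided once, at template time.
  return html;
}
```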
Murphy’s Law
How do we choose between prop memoization and interpolation strategies for our cached components? A global configuration object stores the choice of strategy for each component. Developers must manually register components and their strategies with the caching config. This means that if, as a component evolves, its prop strategy needs to change, the developer must remember to update the strategy in the caching config. Murphy’s Law tells us that sometimes we will forget to do so. The consequences of this dependence on human infallibility can be startling.
Let’s say our <Greeting> component is using a memoization strategy for its props, and the language prop value is still being used to generate the cache key. We decide that we would like to display a more personalized greeting, so we add a second prop to the component, name.
In order to accomplish this, we must update the component’s entry in the caching config so that it uses the interpolation strategy instead.
But if we forget to update the strategy, both prop values will be memoized. The first two user names to be rendered within the <Greeting> component will be cached, one per language, and will accidentally appear for all users!
Oh FOUC!
It gets worse. Since component caching is only used for server-side renders, and since all of our state is stored in Redux, when React mounts in the browser its virtual DOM will not match the server-side rendered DOM! React will correct the situation by reconciling in favor of the virtual DOM. The user will experience something like a flash of unstyled content (FOUC). The wrong name will appear for a split-second, and then the correct one will suddenly render!
Now imagine that this content is being served to a search engine crawler. When a human looks at the page, they are unlikely to notice the error, because the client-side re-render fixes the issue in the blink of an eye. But search engines will index the incorrect content. We are in danger of shipping serious SEO defects, potentially for long periods of time, with no obvious symptoms.
Exploding Cache
It gets even worse. Let’s assume our application has one million users, and that we generate cache keys for the <Greeting> component using both language and name prop values. Accidentally forgetting to switch from memoization to interpolation means that the new name prop, which will be rendered with one million unique values, will generate one million cache entries. The cache has exploded in size!
If this accident exhausts available memory, the service will terminate. This failure will probably sneak up on us, as cache misses don’t all occur at once.
Even if we set a maximum cache size and employ a cache replacement policy — such as least recently used (LRU) — the cache explosion runs a serious risk of exhausting cache storage. Components that would have been cached are now competing for cache space with all of the other debris. Cache misses will increase, and rendering performance could severely degrade.
Making The Opposite Mistake
Now let’s imagine that we do remember to update the caching config, changing the prop strategy from memoization to interpolation for our <Greeting> component. If we do this, but forget to update the component’s prop usage, we will ship a broken component to production.
Recall that interpolated prop values are merged as-is into the rendered component template. Conditional logic inside a component’s render() method — such as the selection of a greeting based on the value of the language prop — will only ever execute once. If the first render happens to produce an English greeting, the template will be cached with the English greeting baked-in. For all subsequent renders, the user’s name will be successfully interpolated, but the rest of the greeting will only ever render in English.
Cache Rules Everything Around Me
No matter which way we look at it, modifying the props of a cached component becomes fraught with danger. The developer must take special care to ensure that caching is correctly implemented for each component. React components experience a lot of churn as new features are added, so there are constant opportunities to make an innocuous change which destroys SEO performance, or destroys rendering performance, or renders incorrect data, or renders private user data for every user, or brings the UI down entirely.
Due to these problems, I’m not comfortable recommending per-component caching as a primary scaling strategy. The speed gains are incredible, and you should consider implementing this style of caching when you have run out of other options. But in my view, the biggest advantage of isomorphic rendering is that it unifies your codebase. Developers no longer need to cope with both client- and server-side logic, and the duplication that arrangement entails. Per-component caching reintroduces the need to think very carefully about both client- and server-side rendering, which is precisely the wasteful paradigm we were trying to get away from.
Dependencies
Don’t Get Hacked
I would be remiss not to mention the disgustingly cheap performance wins we were able to achieve simply by keeping dependencies such as Node.js and React up to date.
It is important to keep your dependencies up to date so that you don’t get hacked. If you’re on the fence about this, just ask Equifax how well that worked out for them.
Do You Like Free Things?
But that’s not all! If you act now, your dependency upgrades will come with a free performance boost!
Because we were seeking to improve performance, we became interested in benchmarking upgrades to major dependencies. While your mileage may vary, upgrading from Node 4 to Node 6 decreased our response times by about 20%. Upgrading from Node 6 to Node 8 brought a 30% improvement. Finally, upgrading from React 15 to 16 yielded a 25% improvement. The cumulative effect of these upgrades is to more than double our performance, and therefore our service capacity.
Profiling your code can be important, as well. But the open source community is a vast ocean of talent. Very smart people are working incredibly hard, often for free, to speed up your application for you. They’re standing on the corner of a busy intersection, handing out free performance chocolate bars. Take one, and thank them!
Isomorphic Rendering
The Browser As Your Server
Isomorphic rendering is a huge simplicity booster for developers, who for too long have been forced to maintain split templates and logic for both client- and server-side rendering contexts. It also enables a dramatic reduction in server resource consumption, by offloading re-renders onto the web browser. The first page of a user’s browsing session can be rendered server-side, providing a first-render performance boost along with basic SEO. All subsequent page views may then fetch their data from JSON endpoints, rendering exclusively within the browser, and managing browser history via the history API.
If a typical user session consists of 5 page views, rendering only the first page server-side will reduce your server resource consumption by 80%. Another way to think of this is that it would achieve a 5x increase in server-side rendering capacity. This is a huge win!
Pairs Of Pages
Evolving toward this capability in a legacy application requires patience. A big-bang rewrite of the front-end, in addition to being incredibly risky, is usually off the table because it is a very expensive prospect. A long-term, incremental strategy is therefore required.
I think it makes sense to conceive of this problem in terms of pairs of pages. Imagine a simple, e-commerce website, with home, search results, and individual product pages.
If you upgrade both the home and search results pages to take advantage of isomorphic rendering, most users will hit the homepage first and can therefore render the search results page entirely within the browser. The same is true for the search results and product page combination.
But it’s easy to miss out on these strategic pairings. Let’s say your search results page is where all of the money is made, and so the product team is hesitant to modify it. If we invest our time into improving the home and product pages, making them isomorphic in the process, we won’t see much uptake in client-side rendering. This is because in order to get from the homepage to a product page, most users will navigate through a search results page. Because the search results page is not isomorphic, a server-side render will be required. If we’re not careful, it’s easy to perform a kind of inverse Pareto optimization, investing 80% of the resources to achieve only 20% of the gains.
The Aggregation Of Marginal Gains
It is astonishing how a large number of small improvements, when compounded, can add up to produce one enormous performance boost. I recently learned that the term aggregation of marginal gains describes this phenomenon. It is famously associated with Dave Brailsford, head of British Cycling, who used this philosophy to turn the British Cycling team into a dominant force.
It is important to emphasize the compounding effect of these gains. If we implement two improvements which, in isolation, double performance, combining them will quadruple performance. Various fixed costs and overhead will affect the final result, but in general this principle applies.
Human psychology seems at odds with this approach. We tend to prefer quick wins, and short-term improvements. We tend not to consider a long-term roadmap of improvements in aggregate, and certainly not their compounding effects. These tendencies discourage us from exploring viable strategies. Comparing React server-side rendering to traditional server-rendered templating, React at first seems like it “doesn’t scale”. But as we layer performance improvement techniques, we can see that we have enormous performance headroom.
How much performance can we gain? And in which order should we pursue these techniques? Ultimately, the exact techniques and their order of implementation will depend on your specific situation. Your mileage may vary. But as a generic starting point from which to plan your journey, I recommend the following approach.
- First, upgrade your Node and React dependencies. This is likely the easiest performance win you will achieve. In my experience, upgrading from Node 4 and React 15, to Node 8 and React 16, increased performance by approximately 2.3x.
- Double-check your load balancing strategy, and fix it if necessary. This is probably the next-easiest win. While it doesn’t improve average render times, we must always provision for the worst-case scenario, and so reducing 99th percentile response latency counts as a capacity increase in my book. I would conservatively estimate that switching from random to round-robin load balancing bought us a 1.4x improvement in headroom.
- Implement a client-side rendering fallback strategy. This is fairly easy if you are already server-side rendering a serialized Redux store. In my experience, this provides a roughly 8x improvement in emergency, elastic capacity. This capability can give you a lot of flexibility to defer other performance upgrades. And even if your performance is fine, it’s always nice to have a safety net.
- Implement isomorphic rendering for entire pages, in conjunction with client-side routing. The goal here is to server-side render only the first page in a user’s browsing session. Upgrading a legacy application to use this approach will probably take a while, but it can be done incrementally, and it can be Pareto-optimized by upgrading strategic pairs of pages. All applications are different, but if we assume an average of 5 pages visited per user session, we can increase capacity by 5x with this strategy.
- Install per-component caching in low-risk areas. I have already outlined the pitfalls of this caching strategy, but certain rarely modified components, such as the page header, navigation, and footer, provide a better risk-to-reward ratio. I saw a roughly 1.4x increase in capacity when a handful of rarely modified components were cached.
- Finally, for situations requiring both maximum risk and maximum reward, cache as many components as possible. A 10x or greater improvement in capacity is easily achievable with this approach. It does, however, require very careful attention to detail.
Given reasonable estimates, when we compound these improvements, we can achieve an astounding 1288x improvement in total capacity! Your mileage will of course vary, but a three orders of magnitude improvement can easily change your technology strategy.
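For the curious, that figure is simply the product of the estimates above, with the aggressive caching option standing in for the low-risk one:

```js
// Compounding the estimates listed above (aggressive caching replaces the
// low-risk 1.4x caching estimate).
const gains = [2.3, 1.4, 8, 5, 10]; // upgrades, load balancing, CSR fallback, isomorphic, caching
const total = gains.reduce((a, b) => a * b, 1);
console.log(Math.round(total)); // 1288
```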
All Your Servers Are Belong To Redux
I feel a lot better about the viability of React server-side rendering, now that I have waded through the fires and come out with only minor burns. As with virtually everything in the world of technology, exploring an approach for the first time carries the bulk of the cost. But even if you leave it to somebody else to blaze the trails, there will still be a first time for you. You can’t escape that. Waiting for other people to perfect the backstroke is a very slow way to learn how to swim.
I know so much more about this topic than I did when I first started. This isn’t to say that my next attempt will be devoid of problems, but knowing exactly where many trap doors and power-ups lie could easily make the next project an order of magnitude cheaper. I’m looking forward to a world where, rather than something to aspire towards, component-oriented, isomorphic architecture is the standard approach. We’re getting there!