JDK8 + Facebook React: Rendering single page apps on the server (August Lilleaas' blog)

I've taken a short break from my OS project in the light of the recent release of JDK8. The addition of Nashorn, a brand spanking new JavaScript engine, makes executing JS from a JVM app really easy and convenient. Much more so than the old Rhino.

The goal is to make a single page web app where all the URLs will also be renderable by the server, without any extra work on your part, both in terms of deployment and coding.

Serving HTML from the server is great for search engines. If all your content is generated by a single page web app that downloads data and executes JS, your site won't get crawled at all - no popular search engine executes JS. Typically you'd have to write light versions/duplicates for server-side rendering of your pages, or do something crazy like loading your pages into phantomjs and extracting the generated HTML from there.

It's also great for mobile. Since the initial response already contains all of the HTML, you don't have to wait for JS to be downloaded and then executed. The HTML is always what gets downloaded first, so if you load the HTML, and you lose your 3G connection before the JS can load, you still get to see the content.

Heck, you can even browse with JS disabled (<noscript>), and still be able to use the single page web app.

Full code: github.com/augustl/react-nashorn-example

Read on for some juicy details.

Executing JavaScript on a JVM

The API for invoking Nashorn is very similar to the API for invoking the old Rhino engine. Ask the ScriptEngineManager for a ScriptEngine by name. I use Clojure, and I think all programmers should be able to read and skim Clojure code, so I'll just jump right in.

(import '[javax.script ScriptEngineManager])

(def nashorn (.getEngineByName (ScriptEngineManager.) "nashorn"))

(.eval nashorn "5 + 5") ;; 10 (not 10.0 like Rhino did)
(= true (.eval nashorn "' \\r\\n' == 0")) ;; true

The bridge is very smooth. When we return JS bools, we get proper JVM bools back. The same goes for numbers, strings and other types. So you don't have to manually serialize values. Nashorn is pretty fast, rivaling the Sunspider engine.

Here's how you pass Java values into the JS env.

(.eval nashorn "5 + x", (doto (.createBindings nashorn) (.put "x" (* 2 Math/PI))))
;; 11.283185307179586

See nashorn-utils in the example app for more details, such as how to create a binding that has existing global vars available but still lets you pass custom one-off values.

Rendering React JS components as HTML on the server

Facebook React is engineered with server-side HTML generation in mind. React UIs are built up as a tree of React components. A React component is a mix of DOM elements and other components. But the DOM elements aren't actual browser based DOM elements, they're a "virtual" DOM. This DOM is just a bunch of plain JS objects that represent something that looks very much like a DOM - tags with tag names, attributes, and children. But since they're not actual DOM elements, no real browser is required to render React components.

(.eval nashorn "var global = this") ;; React expects 'window' or 'global' to be set
(.eval nashorn (clojure.java.io/reader (java.io.File. "path/to/react.js")))
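(.eval nashorn (clojure.java.io/reader (java.io.File. "path/to/react-dom.js")))
;; In React's standalone builds of this era, ReactDOMServer ships in its own file
(.eval nashorn (clojure.java.io/reader (java.io.File. "path/to/react-dom-server.js")))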
(.eval nashorn "var MyComponent = React.createFactory(React.createClass({
  render: function () {
    return React.DOM.h1(null, 'Hi, ' + this.props.msg)
  }
}))")
(.eval nashorn "ReactDOMServer.renderToString(MyComponent({msg: 'World!'}))")
; <h1 data-reactroot="" data-reactid="1" data-react-checksum="-359329492">Hi, World!</h1>

And just like that, we executed code for a single page web app in our backend, without any hassle at all.
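To make the "plain JS objects" point concrete, here is a minimal sketch of a virtual DOM being turned into an HTML string with no browser involved. It is not how React is implemented; `h`, `escapeHtml` and `renderToHtml` are names made up for this illustration. Note how function-valued attributes such as click handlers simply have no HTML representation and are dropped.

```javascript
// Hypothetical helpers for illustration; none of these are React APIs.

// Build a virtual node: a plain object with a tag name, attributes and children.
function h(tag, attrs) {
  var children = Array.prototype.slice.call(arguments, 2);
  return { tag: tag, attrs: attrs || {}, children: children };
}

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Walk the tree of plain objects and build an HTML string.
function renderToHtml(node) {
  // Text nodes are plain strings or numbers.
  if (typeof node === "string" || typeof node === "number") {
    return escapeHtml(node);
  }
  var attrs = Object.keys(node.attrs)
    // Event handlers are functions and cannot be serialized to HTML.
    .filter(function (name) { return typeof node.attrs[name] !== "function"; })
    .map(function (name) {
      return " " + name + '="' + escapeHtml(node.attrs[name]) + '"';
    })
    .join("");
  var children = node.children.map(renderToHtml).join("");
  return "<" + node.tag + attrs + ">" + children + "</" + node.tag + ">";
}

var link = h("a", { href: "/people/1", onClick: function () {} }, "Go to person 1");
var html = renderToHtml(h("p", null, link));
console.log(html); // <p><a href="/people/1">Go to person 1</a></p>
```

Nothing here touches `window` or `document`, which is why the same kind of code can run in Nashorn.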

What happens in the browser?

We'll render components twice. First the server uses the URL to figure out which component to load. The browser does the same. Won't this double rendering hurt performance?

Let's forget server-side rendering for a little while.

React is all about diffing. When state changes, a brand new virtual DOM will be generated for your components. React will then diff the new virtual DOM with the old one, using clever specialized diffing algorithms. This diff will then be applied to the DOM.

This allows React to do highly performant DOM updates. It batches as much as it can, so you get as few DOM operations as possible, meaning as few repaints as possible. It will also automatically use document fragments and what not. No extra work is required on your part to achieve this. Your components are just dumb, and only have code to render themselves completely. No explicit state management, no explicit partial updates.
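The diffing idea can be sketched in a few lines. This is a deliberately naive illustration, not React's algorithm: the `diff` function and the patch format (`op`, `path`, and so on) are invented for this example, and virtual nodes are assumed to be either strings or `{tag, attrs, children}` objects.

```javascript
// Naive tree diff: compares two virtual trees and collects a list of patches.
function diff(oldNode, newNode, path, patches) {
  path = path || "root";
  patches = patches || [];

  // Text nodes: replace if anything differs.
  if (typeof oldNode === "string" || typeof newNode === "string") {
    if (oldNode !== newNode) {
      patches.push({ op: "replace", path: path, node: newNode });
    }
    return patches;
  }

  // Different tag: give up on this subtree and replace it wholesale.
  if (oldNode.tag !== newNode.tag) {
    patches.push({ op: "replace", path: path, node: newNode });
    return patches;
  }

  // Same tag: compare attributes one by one.
  var names = Object.keys(oldNode.attrs).concat(Object.keys(newNode.attrs));
  names.forEach(function (name, i) {
    if (names.indexOf(name) !== i) { return; } // already handled this name
    if (oldNode.attrs[name] !== newNode.attrs[name]) {
      patches.push({ op: "setAttr", path: path, name: name, value: newNode.attrs[name] });
    }
  });

  // Then compare children position by position.
  var max = Math.max(oldNode.children.length, newNode.children.length);
  for (var i = 0; i < max; i++) {
    var childPath = path + "/" + i;
    if (i >= oldNode.children.length) {
      patches.push({ op: "append", path: path, node: newNode.children[i] });
    } else if (i >= newNode.children.length) {
      patches.push({ op: "remove", path: childPath });
    } else {
      diff(oldNode.children[i], newNode.children[i], childPath, patches);
    }
  }
  return patches;
}

var before = { tag: "h1", attrs: {}, children: ["Hi, World!"] };
var after = { tag: "h1", attrs: {}, children: ["Hi, Nashorn!"] };

console.log(diff(before, after).length);  // 1 (only the text node changed)
console.log(diff(before, before).length); // 0 (identical trees, nothing to do)
```

Only the patch list ever needs to touch the real DOM, which is what keeps the number of DOM operations small.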

So how does this benefit us when rendering on both server and browser?

When the page is loaded, we already have the HTML and DOM for the React component - we rendered it on the server. Then, the browser invokes the very same JS that ran on the server, and renders the React component again. At this point, all we have is the virtual DOM. We invoke some browser-only code that attaches the component to the DOM. There isn't a previous virtual DOM to diff against, so React will probe the actual DOM. It will figure out that the DOM is actually the same as our virtual DOM, and leave the DOM completely untouched!

In other words, exactly zero DOM operations are needed to get our browser side JS in sync with what we rendered on the server. Awesome!
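One way such a "is the DOM already what I would have rendered?" check can work is with a checksum over the markup; the data-react-checksum attribute in the server output shown earlier suggests React does something along these lines. The sketch below is a simplified illustration under that assumption, not React's code: `adler32` is a textbook Adler-32 over character codes, and `mountPlan` is a made-up helper.

```javascript
// Textbook Adler-32, computed over character codes (fine for ASCII markup).
function adler32(text) {
  var MOD = 65521;
  var a = 1;
  var b = 0;
  for (var i = 0; i < text.length; i++) {
    a = (a + text.charCodeAt(i)) % MOD;
    b = (b + a) % MOD;
  }
  // Combine the two 16-bit halves into one unsigned 32-bit number.
  return ((b << 16) | a) >>> 0;
}

// Hypothetical decision made when the browser-side code attaches to the page:
// compare the checksum the server sent with the checksum of the markup the
// browser-side virtual DOM would produce.
function mountPlan(serverChecksum, clientMarkup) {
  return adler32(clientMarkup) === serverChecksum
    ? "reuse existing DOM"
    : "replace DOM with client markup";
}

var serverMarkup = "<h1>Hi, World!</h1>";
var serverChecksum = adler32(serverMarkup);

console.log(mountPlan(serverChecksum, "<h1>Hi, World!</h1>")); // reuse existing DOM
console.log(mountPlan(serverChecksum, "<h1>Hi, Mars!</h1>"));  // replace DOM with client markup
```

When the server and the browser run the same components with the same data, the first case is the normal one.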

The actual app

Our app consists of a bunch of React components that are linked up to a URL router. This code is completely browser agnostic, and it has to be, since it should run just fine in our browser and in Nashorn.

I made a very small regex based routing module. The app itself basically looks like this:

var app = {};

var HomePageComponent = React.createFactory(React.createClass({...}));
var PersonShowComponent = React.createFactory(React.createClass({...}));

app.router = sillyRouter.create([
  {path: /^\/$/,
   get: function (props) { return HomePageComponent(props) },
   urls: function (match) { return {people: "/api/people"}; }},
  {path: /^\/people\/([^\/]+)$/,
   get: function (props) { return PersonShowComponent(props); },
   urls: function (match) { return {person: "/api/people/" + match[1]}; }}
])
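For readers who don't want to dig through the repository, here is a rough sketch of what a router with this shape could look like. It is a hypothetical stand-in written for this post, not the actual sillyRouter source; the only things taken from the code above are the route format (`path`, `get`, `urls`) and the idea that a match exposes `urls` and `get`.

```javascript
// Hypothetical stand-in for the example app's router.
var sketchRouter = {
  create: function (routes) {
    return {
      // Returns {urls, get} for the first route whose regex matches, else null.
      match: function (url) {
        for (var i = 0; i < routes.length; i++) {
          var m = routes[i].path.exec(url);
          if (m) {
            return { urls: routes[i].urls(m), get: routes[i].get };
          }
        }
        return null;
      }
    };
  }
};

// Components are replaced by plain strings so the sketch runs without React.
var router = sketchRouter.create([
  {path: /^\/$/,
   get: function (props) { return "home page with " + props.people.length + " people"; },
   urls: function (match) { return {people: "/api/people"}; }},
  {path: /^\/people\/([^\/]+)$/,
   get: function (props) { return "person page for " + props.person.name; },
   urls: function (match) { return {person: "/api/people/" + match[1]}; }}
]);

console.log(router.match("/people/1").urls.person); // /api/people/1
console.log(router.match("/no/such/page"));         // null
```

The router itself never fetches anything. It only says which API URLs a page needs, which is what lets the browser and the server plug in their own way of fetching.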

Gluing it all together

We still need some glue to make this actually work. The idea is that every part of the single page web app that has a URL should also be renderable from the server. That means we need a routing module that can run in both the server and browser environments. This routing system needs to have a pluggable way of fetching data. When we visit /people/1, both the server and the browser need the data for the person with the ID 1.

The example app uses a home brewed, very small and silly router that allows this. For each URL, the router specifies a list of URLs to fetch. It's then up to Nashorn or browser specific code to actually fetch that data. Then, the router is invoked again with the fetched data to get the React component, all wired up and ready for rendering.

In the browser, we basically do this:

// Browser side

function renderUrl(url) {
    var match = app.router.match(url);
    if (match) {
        fetchDataFromUrls(match.urls, function (props) {
            ReactDOM.render(match.get(props),
                            document.getElementById("app"));
        });
    } else {
        renderNotFound();
    }
}

// Invoked by React components that want to go to a different page.
app.onLocationChangeRequested = function (url) {
    renderUrl(url); // See implementation above
    history.pushState(null, null, url);
};

// Listening to browser back/forward.
window.addEventListener("popstate", function () { renderUrl(location.pathname) });

// On page load, render the current path.
renderUrl(location.pathname);

The implementation of fetchDataFromUrls() can be found here, along with the rest of the browser glue. It uses promises, which are awesome and composable, and something every JS developer should be familiar with.
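As a rough idea of how promises help here, the sketch below turns a map of named URLs into a map of named results. It is not the code from the repository: `fetchPropsForUrls` is a made-up name, and `fetchJson` is an assumed, injected function that takes a URL and returns a promise of parsed data (in a real browser it would wrap an XHR).

```javascript
// Hypothetical helper: {name: url} in, promise of {name: data} out.
function fetchPropsForUrls(urls, fetchJson) {
  var names = Object.keys(urls);
  // Start all requests at once and wait for every one of them.
  return Promise.all(names.map(function (name) {
    return fetchJson(urls[name]);
  })).then(function (results) {
    var props = {};
    names.forEach(function (name, i) {
      props[name] = results[i];
    });
    return props;
  });
}

// Fake fetcher used only for this demo; it resolves with canned data.
function fakeFetchJson(url) {
  return Promise.resolve({ fetchedFrom: url });
}

var pending = fetchPropsForUrls({ person: "/api/people/1" }, fakeFetchJson);
pending.then(function (props) {
  console.log(props.person.fetchedFrom); // /api/people/1
});
```

Because the result is itself a promise, the caller can chain the rendering step onto it, and a failure in any single request surfaces in one place.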

So as you can see, we only need to do the fetching and browser history part. Everything else is generic and reusable. Let's look at the nashorn glue next.

// Nashorn side

// This is how you access Java from Nashorn
var apiFetcher = Java.type('react_nashorn_example.js_api_fetcher')

function renderUrl(url) {
    var match = app.router.match(url);
    if (match) {
        var props = apiFetcher.resolveUrls(match.urls);
        return ReactDOMServer.renderToString(match.get(props));
    }
}

We invoke it in the back-end like so (here's the impl. of bindings-append):

(.eval nashorn
       "renderUrl(url)"
       (nashorn-utils/bindings-append nashorn {"url" "/people/1"}))

We invoke the very same router. To fetch data, we just invoke our API directly without even going through HTTP, since the JS runs in-process with the backend. This is really easy with Clojure, since HTTP is implemented using functions that take HTTP requests as a map, and return the response as another map. You can read the full implementation of react_nashorn_example.js_api_fetcher here.
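The "request map in, response map out" idea is easy to mimic in plain JS. The sketch below is only an analogy for what the Clojure side does: `handleRequest` is a made-up handler with made-up data, the `method`/`uri`/`status`/`body` keys are simplified assumptions, and `resolveUrls` here is a hypothetical JS counterpart of the Java-side function, not its real implementation.

```javascript
// Stand-in for the backend's handler: takes a request map, returns a response map.
function handleRequest(request) {
  if (request.method === "GET" && request.uri === "/api/people/1") {
    return { status: 200, body: JSON.stringify({ id: 1, name: "Example Person" }) };
  }
  return { status: 404, body: "null" };
}

// Hypothetical resolver: calls the handler directly for every named URL.
// No sockets, no HTTP parsing, and everything is synchronous.
function resolveUrls(urls, handler) {
  var props = {};
  Object.keys(urls).forEach(function (name) {
    var response = handler({ method: "GET", uri: urls[name] });
    props[name] = JSON.parse(response.body);
  });
  return props;
}

var props = resolveUrls({ person: "/api/people/1" }, handleRequest);
console.log(props.person.name); // Example Person
```

Since the call is synchronous, the server-side renderUrl can simply return the HTML string instead of dealing with callbacks.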

Clicking links

What the heck is app.onLocationChangeRequested? It's what our React components invoke when they want to change pages. We also make sure that our React components have normal anchor tags with normal hrefs on them, so they work fine without JS.

var PersonLinkComponent = React.createFactory(React.createClass({
    gotoPerson: function (e) {
        e.preventDefault();
        app.onLocationChangeRequested(this.getPath());
    },
    getPath: function () {
        return "/people/" + this.props.person.id
    },
    render: function () {
        return React.DOM.a(
            {onClick: this.gotoPerson, href: this.getPath()},
            "Go to person " + this.props.person.name);
    }
}));

var HomePageComponent = React.createFactory(React.createClass({
    render: function () {
        return React.DOM.div(
            null,
            React.DOM.h1(null, "The home page!"),
            this.props.people.map(function (p) {
                return React.DOM.p({key: p.id}, PersonLinkComponent({person: p}))
            }));
    }
}));

So in the browser, HTML5 pushState changes the URL and causes the JS routing to be invoked, as we saw earlier. But on the server, we don't generate a DOM, we just get HTML. So none of the onClick stuff is passed on. We just get a plain anchor tag with a href. And what's interesting here is that if we disable JS altogether, our app will still work! We share the routing between the server and browser, so when no JS intercepts the click, the URL loads good ol' browser style, and our backend is invoked and loads the React component for that URL. Splendid!

The big tradeoff: Using JS that requires a browser

There's a glaring problem here. If you create React components that invoke 3rd party code, such as jQuery plugins, Bootstrap's JS components, and so on, you're in for trouble. You'll get exceptions when attempting to render them on the server, since these 3rd party JS components expect a full browser environment, something Nashorn is not.

I'm not an experienced React developer, so I don't know how this is typically solved. I would imagine it's possible to create "progressive enhancement" style React components, so that you'll yield virtual DOM for a plain input box, and somehow enhance it with an auto-complete JS. It really doesn't make any sense to render interactive components on the server anyway.

Perhaps React has something built-in that makes this easy, so that when you execute on nashorn, the DOM requiring code doesn't execute. If your components and app are progressively enhanced, you should be all good.
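One simple way to keep DOM-requiring code from running on the server is an explicit environment check. The sketch below is an assumption about how that could look, not something taken from the example app: `hasDom` and `enhanceIfPossible` are made-up names, and `attachAutoComplete` stands for whatever browser-only enhancer you would pass in. React's componentDidMount hook is also not invoked by renderToString, so it is a natural place to call something like this.

```javascript
// True only when a browser-like environment is available; false in Nashorn or Node.
function hasDom() {
  return typeof window !== "undefined" && typeof document !== "undefined";
}

// Hypothetical enhancer wrapper: runs the browser-only code when it can,
// and otherwise leaves the plain, server-rendered element alone.
function enhanceIfPossible(element, attachAutoComplete) {
  if (!hasDom()) {
    return false; // server render: nothing to enhance
  }
  attachAutoComplete(element);
  return true;
}

var enhancerCalls = 0;
var enhanced = enhanceIfPossible({ tag: "input" }, function () { enhancerCalls++; });
console.log(enhanced); // false outside a browser, true inside one
```

The plain input box is what ends up in the server's HTML either way; the browser just adds behaviour on top.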

Wrapping up

What have we seen and learned?

We can easily render our single page app as HTML on the server if we use React
Nashorn makes it easy to invoke JS from a JVM
React handles "enhancing" HTML generated on the server very well
Our JS single page web app will work even without JS enabled in the browser!
All URLs will be indexable by a search engine, since our backend serves the HTML automatically in all cases

Not bad!

Again, the full POC can be found at github.com/augustl/react-nashorn-example. I haven't created any sort of reusable library for this yet, since my experience with this sort of architecture is limited to the POC.