Recently, we created an application that allowed users to log in and have their list of Facebook friends scanned against a set of criteria that produced a threat assessment for each friend. It was a powerful tool for parents to keep track of whom their kids were associating with.
How it’s Built
The app involved a ton of asynchronous processing. It had a Ruby on Rails front end and two Node.js servers: one downloading profile information and the other acting as the rules engine. We used RabbitMQ for message queuing and Postgres as the database server. The workflow looked like this:
- A user would log in to the application with their Facebook credentials
- We would request permissions to view their friends’ profile information
- Once we had permission, we would pass the auth token and the user’s Facebook ID to RabbitMQ
- The message would be picked up by the first Node server, which would download the friends’ profiles and dump them on another queue
- The second Node server, the rules engine, would pick the profiles up off the queue and run them through a set of criteria to give each friend a threat assessment
- The results would then be saved to Postgres
There were also a couple of cron jobs that would run various clean-up and other miscellaneous tasks.
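The rules-engine step can be sketched in plain Ruby. The field names, weights, and thresholds below are purely illustrative, not our production rules; each criterion tests one profile field and contributes a weight to an overall threat score:

```ruby
# Hypothetical sketch of the rules-engine step: score a friend's
# profile against a set of weighted criteria. The fields, weights,
# and cutoffs are made up for illustration.
CRITERIA = [
  { field: :age,            weight: 3, test: ->(v) { v.to_i >= 30 } },
  { field: :interests,      weight: 2, test: ->(v) { Array(v).include?("weapons") } },
  { field: :mutual_friends, weight: 1, test: ->(v) { v.to_i < 2 } }
].freeze

# Sum the weights of every criterion the profile matches.
def threat_score(profile)
  CRITERIA.sum { |c| c[:test].call(profile[c[:field]]) ? c[:weight] : 0 }
end

# Bucket the raw score into a coarse assessment.
def threat_level(profile)
  case threat_score(profile)
  when 0..1 then :low
  when 2..3 then :medium
  else :high
  end
end
```

In the real app, each worker pulled a profile off the queue, ran it through a set of checks like these, and wrote the resulting assessment to Postgres.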
What Went Wrong?
Overall, the application worked very well, but some difficulties arose during development. One thing we ran into was odd socket hangup errors that would cause our Node servers to hang, so we had to restart them periodically to get them processing messages again.

We also duplicated a lot of effort maintaining our models between Rails and Node. At one point, we settled on creating an internal REST API, keeping all of our model validations in Rails and creating an API connector for Node to use. This initially cut down on the amount of code copied between Rails and Node. However, as the application grew, we had to keep growing the API connector, and we had to start making API calls in more places than we had originally planned. It became cumbersome.

Another issue was deployment. Instead of deploying one neat package, we had to deploy three distinct parts. Because we split the API connector out into its own module, we also had to ensure it was installed through NPM on each deployment. Node also lacked a dependency management tool like Bundler for Rails at the time, so making sure the whole development team was using the same module versions as production tripped us up a few times.
I Want a Do-Over!
The main reason we used the Node servers was to download the Facebook profiles in the background and then process them through the rules engine. In retrospect, perhaps Resque would have been better suited for this task. If there is one thing I miss about working in a product-based environment, it’s being able to go back and try a better way. As a service-based company, we do not always get the opportunity to explore better options for a particular project, due to budget and time constraints, but it is important that we always reflect on what we might have done differently. Hopefully, with the insight gleaned from that reflection, we can create an even better product the next time around. Here are a couple of things I have explored.
Replacing the REST API with Yet Another Queue
Rails was already communicating with the first Node process through RabbitMQ, and the first Node process passed data along to the rules engine through another queue, so it would have made sense for the rules engine to hand its results back to Rails through yet another queue. This would have kept the Node processes ignorant of the database schema, giving us the flexibility to make schema changes without much ceremony.
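As a sketch of what that might have looked like (the queue, payload fields, and model names here are hypothetical), the rules engine would publish a schema-agnostic result message, and a small consumer on the Rails side would translate it into model attributes:

```ruby
require "json"
require "time"

# Hypothetical result message published by the rules engine. The Node
# process only knows this payload format, never our Postgres schema.
def build_result_message(facebook_id, score, level)
  JSON.generate(
    "facebook_id" => facebook_id,
    "score"       => score,
    "level"       => level.to_s,
    "scanned_at"  => Time.now.utc.iso8601
  )
end

# Rails-side consumer: translates the payload into model attributes.
# With a real broker this would run inside a queue subscriber and
# finish with something like Assessment.create!(attrs); here it just
# returns the attribute hash so the mapping is visible.
def result_to_attributes(body)
  msg = JSON.parse(body)
  {
    facebook_id:  msg.fetch("facebook_id"),
    threat_score: msg.fetch("score"),
    threat_level: msg.fetch("level")
  }
end
```

Schema changes then only touch the Rails-side mapping; the Node processes keep publishing the same payload.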
Nothing but Node
We still would have had to worry about dependency management, but at the rate that the Node community is maturing, that probably will not be an issue for too much longer. There is now a project called Hem, which takes care of the dependency management problem. That community is moving fast!
Keeping it Real on Rails
There are a number of asynchronous job processing tools for Rails. I have already mentioned one, Resque. It is a great tool for managing backgrounded, asynchronous tasks, and it would have allowed us to use Redis for storing our queues instead of RabbitMQ. It also comes with a web-based front end for viewing and managing workers and queues.

One of the problems with Resque, however, is that it can be very memory intensive, depending on the number of workers running. This is because Resque loads an entire environment for each worker, which adds up very quickly. There is a project called Sidekiq that aims to match the API of Resque, allowing it to be a drop-in replacement. It uses Celluloid under the covers to provide multi-threaded, concurrent operation, eliminating the need for several worker processes. This can dramatically reduce the amount of memory the background tasks require.

A quick note about Celluloid: it provides an actor API similar to what one would see in Erlang and Scala. Actors simplify multi-threaded programming by operating concurrently without sharing state, which lets a programmer avoid things like mutexes and locking.

Either of these solutions would have given us access to the full Rails environment, eliminating the need to maintain a separate API or duplicate our schema and validations across the various parts. It would also have made deployment much easier: one application to deploy instead of three.
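A Resque version of our profile-scanning step might have looked something like this. The class, queue, and method names are hypothetical, and the Facebook call is stubbed out; in practice, a running Redis and `Resque.enqueue` would drive it:

```ruby
# Hypothetical Resque worker for the profile-scanning step. Resque
# picks jobs off the :profile_scans queue and calls self.perform in
# a forked worker process; the class itself is plain Ruby.
class ScanFriendJob
  @queue = :profile_scans

  # In the real app this would fetch the profile from Facebook with
  # the auth token and persist an assessment; here it returns a
  # placeholder result so the shape of the job is visible.
  def self.perform(facebook_id, auth_token)
    profile = fetch_profile(facebook_id, auth_token)
    { facebook_id: facebook_id, name: profile[:name], level: :low }
  end

  # Stand-in for the Facebook Graph API call.
  def self.fetch_profile(facebook_id, _auth_token)
    { id: facebook_id, name: "Friend #{facebook_id}" }
  end
end

# Enqueued from Rails (with Redis running):
#   Resque.enqueue(ScanFriendJob, friend.facebook_id, token)
```

Because Sidekiq mirrors this worker shape, a class like this is also roughly what the drop-in Sidekiq replacement would look like.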
It Goes to Eleven
There is a project from the folks over at Red Hat called TorqueBox. It aims to be a full-stack enterprise solution with built-in messaging, services, and scheduled tasks. It sits on top of JBoss AS 7 but provides Ruby APIs for many of the advanced features underneath, like JMS, authentication, distributed transactions, and more. Backgrounding tasks is as simple as adding one line of code to your Rails models, and creating services to work on those background tasks is a breeze. Another great feature is the ability to set up scheduled tasks that can be checked into version control and do not require messing around with cron jobs on the production server. We ended up using the whenever gem to accomplish the same thing. All in all, the features offered by TorqueBox may have been overkill for this application, but I am really impressed by the project.
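For comparison, the whenever gem keeps the schedule in version control too: it turns a Ruby DSL in config/schedule.rb into crontab entries. A sketch of a clean-up schedule (the task and method names here are hypothetical) looks like:

```ruby
# config/schedule.rb -- whenever converts this DSL into crontab
# entries via `whenever --update-crontab`. Task names are made up.
every 1.day, at: "4:30 am" do
  rake "scans:purge_stale"   # drop assessments older than our retention window
end

every :hour do
  runner "AuthToken.expire_old!"  # run a method inside the Rails environment
end
```

The difference with TorqueBox is that its scheduled tasks run inside the application server itself, so nothing on the production box needs a crontab at all.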
We’re Doing it Live
The decisions we made were not necessarily wrong; hindsight is always 20/20. Incorporating new technology into a project brings unique challenges, and pairing new technology with tight time and budget constraints creates its own set of difficulties. I am glad we took the steps that we did because, in the end, we gained invaluable insight into what works and what we can do better the next go-around, and the application still rocked. Given the opportunity to work on a similar project in the future, I will probably take one of the approaches offered here, and at the end of that project, I will have a whole new set of would-have-dones. Learning is a lifelong adventure. If you’re interested in seeing code examples of how to implement some of these solutions, let me know, and I will do a follow-up in another blog post.