#ruby Array of Hashes Quiz

Found this interesting Ruby quiz from AlphaSights. Given an array of hashes, collapse it into an array of hashes containing one entry per day. And you can only reference the :time key and not the rest.

log = [
  {time: 201201, x: 2},
  {time: 201201, y: 7},
  {time: 201201, z: 2},
  {time: 201202, a: 3},
  {time: 201202, b: 4},
  {time: 201202, c: 0}
]

# result should be
[
  {time: 201201, x: 2, y: 7, z: 2},
  {time: 201202, a: 3, b: 4, c: 0}
]

The first thing that came to mind was to use Enumerable#group_by.

grouped = log.group_by { |i| i[:time] }
collapsed = grouped.collect do |t, a|
  no_time_h = a.inject({}) do |others, h|
    others.merge h.reject { |k, v| k.to_sym == :time }
  end

  {time: t}.merge(no_time_h)
end

puts collapsed.inspect

However, after reading this a couple of times, I still find the solution hard to follow. For starters, group_by returns a hash whose values are arrays of hashes, which brings me back to the original problem even though the entries are already grouped by time. That, I feel, made the rest of the code more complicated.

# result of group_by
{201201=>[{:time=>201201, :x=>2}, {:time=>201201, :y=>7}, {:time=>201201, :z=>2}], 201202=>[{:time=>201202, :a=>3}, {:time=>201202, :b=>4}, {:time=>201202, :c=>0}]}

For my second version, I simply loop through the array and compose a hash using :time as the key. Afterwards, I use the key-value pairs to compose the resulting array. The code may be longer but it is more readable. Remember: Correct, Beautiful, Fast (in That Order).

hash_by_time = {}
log.each do |h|
  time = h[:time]
  others = h.reject { |k, v| k.to_sym == :time }

  if hash_by_time[time]
    hash_by_time[time].merge! others
  else
    hash_by_time[time] = others
  end
end

collapsed = hash_by_time.collect do |k, v|
  {time: k}.merge(v)
end

To the Crazy Ones

Though I’ve seen this video a gazillion times, I still find it fresh and inspiring.


Rebuild Rails Part 1

I have no plans to build another Rails clone. Let’s leave that work to other smarter people with more time. But wouldn’t it be fun if we could learn how Rails works under the hood and find out what makes it “magical”? In this post, I will only cover what happens from the moment you type a URL until you get an HTML page. We’ll simplify further by not using any database access. If you would like to go deeper and wider, there is a book devoted entirely to it that I highly recommend.

Let’s call our application CrazyApp and let’s build our first web app.

# config.ru
app = proc do |env|
  [200, {'Content-Type' => 'text/html'}, ["hello from crazy app"] ]
end

run app

$ rackup config.ru -p 3000
Thin web server (v1.6.2 codename Doc Brown)
Maximum connections set to 1024
Listening on 0.0.0.0:3000, CTRL+C to stop
- - [16/Nov/2014 20:42:02] "GET / HTTP/1.1" 200 - 0.0008

It’s all about the Rack

Rack is a gem that sits between your framework (e.g. Rails) and Ruby-based application servers like Thin, Puma, Unicorn, and WEBrick. When you type a URL, it goes through several layers of software until it hits our application, which in this case just returns a “hello from crazy app”. Rack simplifies the interface to web servers so that we only have to worry about a few things to handle an HTTP request.

  • HTTP status, e.g. 200
  • Response headers. There are a lot of things you can set here, but for now let’s stick to content-type.
  • Actual content. In our case, an HTML page.

Let’s look at a boilerplate config.ru that you get from Rails.

# This file is used by Rack-based servers to start the application.

require ::File.expand_path('../config/environment',  __FILE__)
run Rails.application

One step in Rails' boot process is to define Rails.application as class MyApp::Application < Rails::Application. Both Rails::Application and proc provide a call method, which is why both versions of config.ru work.
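To see this for yourself, here is a minimal sketch (MiniApp is a made-up name): any object that responds to call, takes the env hash, and returns the status/headers/body triple can be handed to run in a config.ru.

```ruby
# Any object responding to `call(env)` is a valid Rack application.
class MiniApp
  def self.call(env)
    [200, {'Content-Type' => 'text/plain'}, ["mini"]]
  end
end

# In a config.ru you could then write: run MiniApp
status, headers, body = MiniApp.call({})
```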

Now, let’s move our initial config.ru code to a different class that we can later extract into a gem for our framework that we shall call Tracks. From here on, we shall follow Rails conventions and build our gem from there.

# config.ru
require './config/application'
run CrazyApp::Application

# config/application.rb
require './lib/tracks'

module CrazyApp
  class Application < Tracks::Application
  end
end

# lib/tracks.rb
module Tracks
  class Application
    def self.call(env)
      [200, {'Content-Type' => 'text/html'}, ["hello from tracks"] ]
    end
  end
end
Exit from your rackup process and re-run it because we are not supporting auto-reloading. This time you will see a message from our Tracks - the super awesome Rails-like framework.

Render a default page

We now introduce a very simple root controller and use it to render a default index page. We also modify our route handling by inspecting the env hash that Rack passes to our framework. The env hash packs a lot of information about a request; for our routing, we are interested in PATH_INFO, which is the URL path after the domain, minus the query parameters.

# lib/tracks/controller.rb
module Tracks
  class Controller
    def self.render_default_root
      filename = File.join('public', 'index.html')
      File.read filename
    end
  end
end

# lib/tracks.rb
require File.expand_path('../tracks/controller', __FILE__)

module Tracks
  class Application
    def self.call(env)
      path_info = env['PATH_INFO']
      if path_info == '/'
        text = Tracks::Controller.render_default_root
      else
        text = "hello from tracks #{path_info}"
      end

      [200, {'Content-Type' => 'text/html'}, [text] ]
    end
  end
end

That’s it for now. Next time, we will create our own controllers, actions, and dynamic pages using ERB.

#ruby Counting Vowels

Just saw a simple exercise in my Facebook feed and I thought I’d give it a shot. The problem is simple:

Write a function that returns the number of vowels in the string.

Here’s my ruby solution:

  require 'minitest/autorun'

  def vowel_count(s)
    vowels = %w[a e i o u]
    s.to_s.scan(/\w/).select { |i| vowels.include?(i.downcase) }.count
  end

  describe "#vowel_count" do
    it "should count upcase lowercase" do
      test = "I wanted to be an astronaut"
      vowel_count(test).must_equal 10
    end

    it "should be zero for empty string" do
      vowel_count("").must_equal 0
    end

    it "should be zero for nil" do
      vowel_count(nil).must_equal 0
    end
  end

Sounds simple, right? But there are subtle things you should watch out for.

  • Upper and lower cases may seem trivial, but programmers are often bitten by these when comparing strings.
  • An initial solution would be to access each character via [index] and increment a counter for vowels. Here is where familiarity with your language’s libraries becomes useful. While I didn’t get the right method initially, I knew Ruby’s String library offers a way to extract regex matches. From then on, it’s just a matter of using Enumerable#select, which is a common Ruby idiom for filtering elements.
  • Having tests even for simple code is a good discipline to have. My initial test only covered the functional requirement. When I added the case of nil, it quickly showed the flaw in my code, which brings me to my next point.
  • Produce sensible results as much as possible. While you can argue the requirement states a string and not a nil, it is a good habit to defend your code in case the caller passes an invalid value. Hence, I converted the parameter to a string to ensure the rest of the code works with a string object and gives a sensible result even if the passed parameter is not a string.
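As an aside, the same scan-then-filter idea can be collapsed into a single regex scan with a case-insensitive character class. This is a sketch of an alternative, not the solution discussed above (vowel_count_alt is a made-up name):

```ruby
# Scan directly for vowels, case-insensitively, instead of
# scanning word characters and filtering afterwards.
def vowel_count_alt(s)
  s.to_s.scan(/[aeiou]/i).size
end

vowel_count_alt("I wanted to be an astronaut") # => 10
vowel_count_alt(nil)                           # => 0
```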

Minimalist testing

If you have been working with Rails for a while, you have probably been pampered by its seamless integration with testing frameworks, and you’ll be forgiven for thinking this support is only available within Rails.

Ruby comes with minitest/autorun, which supports a minimalist testing framework. Just require it in your code and you are good to go with rspec-style testing right off the bat.

$ ruby vowelcount.rb
Run options: --seed 47907

# Running:


Finished in 0.001155s, 2597.4026 runs/s, 2597.4026 assertions/s.

3 runs, 3 assertions, 0 failures, 0 errors, 0 skips

#hacking Ansible

Before you continue, let me congratulate myself. It’s been years since I’ve written a post. Hooray! Also, thank you to Rad for guiding me on my Ansible adventure. The Ansible code I reference here is available on GitHub.

There are tons of posts about Ansible already so this is more about gotchas I learned along the way.


Just like good coding, you should isolate things that will change. Store variables in vars/defaults.yml for things like the initial database password, deployment directory, Ruby version, and whatnot. I started out using flat snake_case names, but I later learned you can have a hierarchy for your variables as well, for example:

app:
  name: google.app

Inside your tasks, you can reference it with {{ app.name }}. Of course, nothing prevents you from simply naming your variable app_name and using it as {{ app_name }}.

Hosts and groups

You can use Ansible to work on one or more servers at a time. These servers are specified in the hosts file (see hosts in the example). You can also group servers using [groupname]. In the example below, you can have Ansible target qa only (using the --limit option) and it will update the two servers you listed under the [qa] group.

localhost ansible_python_interpreter=/usr/local/bin/python

[qa]
# hostnames below are placeholders
qa1.example.com
qa2.example.com

Use roles

Roles allow you to separate common tasks into something like modules, giving you flexibility. Some use roles to group common tasks like ‘db’, ‘web’, etc. For starters like me, and if you are playing with combining different software, roles can define specific software. I have a role named ‘mysql’, a role ‘nginx-puma’, and a role ‘nginx-passenger’. Over time, you may split the roles into functional distinctions like web, db, etc.

Creating an EC2 instance

In my example, just update the variables below (included in vars/defaults.yml) to suit your requirements.

ec2_keypair: yourapp-www
ec2_instance_type: t1.micro
ec2_security_group: web-security-group
ec2_region: us-west-1
ec2_zone: us-west-1c
ec2_image: ami-f1fdfeb4 # Ubuntu Server 14.04 LTS (PV) 64-bit
ec2_instance_count: 1

You need your AWS_ACCESS_KEY, AWS_SECRET_KEY, and PEM file to create an instance. Creating the AWS access keys requires access to the IAM (Identity and Access Management).

When you’re ready, run the command:

AWS_ACCESS_KEY="access" AWS_SECRET_KEY="secret" ansible-playbook -i hosts ec2.yml

I have set ec2.yml to run against ‘local’ group (specified in the ’hosts’ file) because you don’t have a host yet. The Ansible EC2 module is smart enough to know that you are still creating an instance at this point.

After running the playbook, notice that it creates a new entry in the [launched] group in your hosts file. The new entry points to the EC2 instance you just created. At this point, you now have a server to run the rest of the playbooks.

Creating the deploy user

When the instance is created, it comes with a default user. In my project where I use Ubuntu, it creates an ‘ubuntu’ user. In other cases, it might only create a ‘root’ user. Regardless, your next step is to create a ‘deploy’ user because it is not a good idea to keep using root. You can change the name of the deploy user (see vars/defaults.yml), but I prefer ‘deploy’ because it is also the default user in Capistrano, and I’m a Rails guy.

ansible-playbook -i hosts create-user.yml --user root --limit launched --private-key ~/.ssh/yourapp.pem

This playbook does several things:

  • Creates the ‘deploy’ user.
  • Adds your ssh key to /home/deploy/.ssh/authorized_keys
  • Gives deploy sudo rights.
  • Disables ssh and password access for root.

Note that I specified the initial user ‘root’. Depending on the instance you just created, it might be ‘ubuntu’ or some other initial user.

Creating an instance in other providers

In Digital Ocean (and other similar providers), you can create an instance using their admin interface. The initial user is ‘root’ and you can specify a password and/or public key. If you forgot the public key, you must add it before you continue.

scp ~/.ssh/id_rsa.pub root@yourapp.com:~/uploaded_key.pub
ssh root@staging.app.com
mkdir -m og-rwx .ssh
cat ~/uploaded_key.pub >> ~/.ssh/authorized_keys


Now that you have your deploy user, just run the command below, grab a coffee, and come back after 30 minutes. That’s how awesome Ansible is.

ansible-playbook -i hosts bootstrap.yml --limit launched

You may need to tweak the roles or update a specific server, for example, when you’re testing an upgrade on a staging server. In that case, make sure you specify the correct host group in the --limit parameter.

A few more things I learned with this exercise:

  • I can set the file mode with mkdir.
  • Use bash --login to load the login profile. When I ssh using ‘deploy@domain‘, my Ruby gets loaded correctly. However, when using Ansible (which uses ssh), my Ruby is not loaded and thus you will get errors like ‘bundle not found’.
  • I’m torn between using system ruby and chruby. On one hand, I feel like there is no need for chruby because this is a server and no need to switch rubies from time-to-time. On the other hand, why not try it and see how it works.

#ruby Method_missing Gotchas

Forgetting ‘super’ with ‘method_missing’

‘method_missing’ is one of the powerful features of Ruby that make frameworks like Rails seem magical. When you call a method on an object (or “send a message to the object”), the object executes the first method it finds. If the object can’t find the method, it complains. This is pretty much what every modern programming language does. Except in Ruby, you can guard against a non-existent method call by defining the method ‘method_missing’ in your object. If you are using Rails, this technique enables dynamic record finders like User.find_by_first_name.
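To get a feel for how such dynamic finders could work, here is a hypothetical, simplified sketch. This is not Rails' actual implementation; User and RECORDS are made up for illustration:

```ruby
class User
  # A stand-in for database rows.
  RECORDS = [{ first_name: "Ada" }, { first_name: "Grace" }]

  # Intercept calls like find_by_first_name("Ada") and turn the
  # method name into a lookup key; delegate everything else to super.
  def self.method_missing(name, *args)
    if name.to_s =~ /^find_by_(\w+)$/
      RECORDS.find { |r| r[$1.to_sym] == args.first }
    else
      super
    end
  end
end

User.find_by_first_name("Ada") # => { first_name: "Ada" }
```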

require "rspec"

class RadioActive
  def to_format(format)

  def method_missing(name, *args)
    if name.to_s =~ /^to_(\w+)$/

describe RadioActive do
  it "should respond to to_format" do
    format = stub
    subject.to_format(format).should == format

  it "should respond to to_other_format" do
    subject.to_other_format.should == "other_format"

  it "should raise a method missing" do
    expect do
    end.to raise_error

However, improper use of ‘method_missing’ can introduce bugs in your code that are hard to track down. To illustrate, our example code above intercepts methods whose names follow the ‘to_name’ format. It works fine, as our tests tell us, except when we try to call an undefined method that does not follow the “to_name” format. The default behavior for an undefined method is for the object to raise a NoMethodError exception.

$ rspec method_missing_gotcha-01.rb


  1) RadioActive should raise a method missing
     Failure/Error: expect do
       expected Exception but nothing was raised
     # ./method_missing_gotcha-01.rb:30:in `block (2 levels) in <top (required)>'

Finished in 0.00448 seconds
3 examples, 1 failure

Failed examples:

rspec ./method_missing_gotcha-01.rb:29 # RadioActive should raise a method missing

You can easily catch this bug if you have a test. It would be a different story if you just use your class straight away.

irb(main):001:0> require './method_missing_gotcha-01.rb'
=> true
irb(main):002:0> r = RadioActive.new
=> #<RadioActive:0x007fd232a4d8a8>
irb(main):003:0> r.to_format('json')
=> "json"
irb(main):004:0> r.to_json
=> "json"
irb(main):005:0> r.undefined
=> nil

The undefined method just returns nil instead of raising an exception. When we defined our method_missing, we accidentally removed the default behavior. Oops!

Fortunately, the fix is easy. There is no need to raise the ‘NoMethodError’ in your code. Instead, simply call ‘super’ if you are not handling the method. Whether you have your own class or inheriting from another, do not forget to call ‘super’ with your ‘method_missing’. And that would make our tests happy :)

--- 1/method_missing_gotcha-01.rb
+++ 2/method_missing_gotcha-02.rb
@@ -9,6 +9,8 @@ class RadioActive
   def method_missing(name, *args)
     if name.to_s =~ /^to_(\w+)$/
       $1
+    else
+      super
     end
   end

$ rspec method_missing_gotcha-02.rb

Finished in 0.00414 seconds
3 examples, 0 failures

Calling ‘super’ is not just for ‘method_missing’. You also need to do the same for other hook methods like ‘const_missing’, ‘append_features’, or ‘method_added’.

Forgetting respond_to?

When we modified ‘method_missing’, we essentially introduced ghost methods. They exist, but you cannot see them. You can call them spirit methods if that suits your beliefs. In our example, we were able to use a method named ‘to_json’, but if we look at the list of methods defined for RadioActive, we will not see ‘to_json’.

irb(main):002:0> RadioActive.instance_methods(false)
=> [:to_format, :method_missing]
irb(main):003:0> r = RadioActive.new
=> #<RadioActive:0x007f88b2a151c0>
irb(main):004:0> r.respond_to?(:to_format)
=> true
irb(main):005:0> r.respond_to?(:to_json)
=> false

Before we introduce a fix, let us first write a test that shows this bug. It’s TDD time baby!

@@ -32,4 +34,8 @@ describe RadioActive do
     end.to raise_error
   end
+
+  it "should respond_to? to_other format" do
+    subject.respond_to?(:to_other_format).should == true
+  end
 end



  1) RadioActive should respond_to? to_other format
     Failure/Error: subject.respond_to?(:to_other_format).should == true
       expected: true
            got: false (using ==)
     # ./method_missing_gotcha-02.rb:38:in `block (2 levels) in <top (required)>'

Finished in 0.00444 seconds
4 examples, 1 failure

Failed examples:

rspec ./method_missing_gotcha-02.rb:37 # RadioActive should respond_to? to_other format

The fix: every time you modify ‘method_missing’, you also need to update ‘respond_to?’. And don’t forget to include ‘super’.

+  def respond_to?(name)
+    !!(name.to_s =~ /^to_/ || super)
+  end

And with that, we are all green.


Finished in 0.00443 seconds
4 examples, 0 failures

Mining Twitter Data With Ruby - Visualizing User Mentions

In my previous post on mining Twitter data with Ruby, we laid our foundation for collecting and analyzing Twitter updates. We stored these updates in MongoDB and used map-reduce to implement a simple counting of tweets. In this post, we’ll show relationships between users based on mentions inside the tweets. Fortunately for us, there is no need to parse each tweet just to get a list of users mentioned in it, because Twitter provides the “entities.user_mentions” field that contains what we need. After we collect the “who mentions whom” data, we construct a directed graph to represent these relationships and convert it to an image so we can actually see it.

First, we start with the aggregation of mentions per user. We will use the same code base as last time, so if this is your first time, I recommend reading my previous related post, or you can follow the changes on GitHub. Note to self: convert this to an actual gem in the next post.

# user_mention.rb
module UserMention

  def mentions_by_user
    map_command = %q{
      function() {
        var mentions = this.entities.user_mentions,
            users = [];
        if (mentions.length > 0) {
          for(i in mentions) {
            users.push(mentions[i].id_str);
          }

          emit(this.user.id_str, { mentions: users });
        }
      }
    }

    reduce_command = %q{
      function(key, values) {
        var users = [];

        for(i in values) {
          users = users.concat(values[i].mentions);
        }

        return { mentions: users };
      }
    }

    options = {:out => {:inline => 1}, :raw => true, :limit => 50 }
    statuses.map_reduce(map_command, reduce_command, options)
  end
end


We again use map-reduce in MongoDB to implement our aggregation. Of course, this sort of thing can be done in Ruby directly, but it is way more efficient to do it in MongoDB, especially if you have a big collection to process. Note that we limit the number of documents to process because we don’t want our graph to look unrecognizable when we display it.
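For comparison, here is a client-side sketch of the same aggregation in plain Ruby (the documents are made up and much simpler than real tweets); this is workable for small collections but does not scale like MongoDB's map-reduce:

```ruby
# Each document carries a user and the users they mentioned.
docs = [
  { "user_id" => "1", "mentions" => ["2", "3"] },
  { "user_id" => "1", "mentions" => ["3"] },
  { "user_id" => "2", "mentions" => ["1"] }
]

# Group mentions by user, concatenating across documents,
# mirroring what the map and reduce functions do server-side.
by_user = docs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |doc, acc|
  acc[doc["user_id"]].concat(doc["mentions"])
end

by_user # => {"1"=>["2", "3", "3"], "2"=>["1"]}
```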

Now that we have our aggregation working, we construct a directed graph of user mentions using the rgl library.

require "bundler"
require File.expand_path("../tweetminer", __FILE__)
settings = YAML.load_file File.expand_path("../mongo.yml", __FILE__)
miner = TweetMiner.new(settings)

require "rgl/adjacency"
require "rgl/dot"

graph = RGL::DirectedAdjacencyGraph.new
miner.mentions_by_user.fetch("results").each do |user|
  user.fetch("value").fetch("mentions").each do |mention|
    graph.add_edge(user.fetch("_id"), mention)

# creates graph.dot, graph.png

Once you have the user-mention relationships in a graph, you can do interesting things like finding who is connected to whom and the degrees of separation. But for now, we are just interested in showing who mentioned whom. Our sample program saves the graph to the file graph.dot (using the DOT language) and a PNG output. The default PNG output is not laid out nicely, though, so we will use the “neato” program to convert graph.dot into a nice-looking PNG file.

$ neato -Tpng graph.dot -o mentions.png

When you view “mentions.png”, you should see something similar to the one below. The labels are user IDs and the arrows point to the mentioned users.

It would be cool to modify our program to use the users' avatars and also make it interactive. Or, use Twitter’s streaming API and create an auto-update graph. I haven’t done any research yet but I’m sure there is some Javascript library out there that can help us display graph relationships.
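The degrees-of-separation idea mentioned earlier can be sketched with a plain breadth-first search over an adjacency hash; the user IDs below are made up, and this works on a simple { user => [mentioned users] } structure rather than the rgl graph:

```ruby
# Breadth-first search: returns the number of hops from `from` to `to`,
# or nil if there is no path of mentions connecting them.
def degrees_of_separation(graph, from, to)
  queue = [[from, 0]]
  seen  = { from => true }
  until queue.empty?
    node, depth = queue.shift
    return depth if node == to
    (graph[node] || []).each do |neighbor|
      next if seen[neighbor]
      seen[neighbor] = true
      queue << [neighbor, depth + 1]
    end
  end
  nil
end

mentions = { "100" => ["200"], "200" => ["300", "400"] }
degrees_of_separation(mentions, "100", "300") # => 2
degrees_of_separation(mentions, "300", "100") # => nil (mentions are directed)
```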

Mining Twitter Data With Ruby, MongoDB and Map-Reduce

When is the best time to tweet? If you care about reaching a lot of users, the best time probably is when your followers are also tweeting. In this exercise, we will try to figure out the day and time users are most active. Since there is no way for us to do this for all users in the twitterverse, we will only use the users we follow as our sample.

What do we need

  • mongodb
  • tweetstream gem
  • awesome_print gem for awesome printing of Ruby objects
  • oauth credentials

Visit http://dev.twitter.com to get your OAuth credentials. You just need to log in, create an app, and the OAuth credentials you need will be there. Copy the OAuth settings to the twitter.yml file because that is where our sample code will look.

Collect status updates

We use the Tweetstream gem to access the Twitter Streaming APIs which allows our program to receive updates as they occur without the need to regularly poll Twitter.

# Collects user tweets and saves them to a mongodb
require 'bundler'
require File.dirname(__FILE__) + '/tweetminer'

# We use the TweetStream gem to access Twitter's Streaming API
# https://github.com/intridea/tweetstream

TweetStream.configure do |config|
  settings = YAML.load_file File.dirname(__FILE__) + '/twitter.yml'

  config.consumer_key       = settings['consumer_key']
  config.consumer_secret    = settings['consumer_secret']
  config.oauth_token        = settings['oauth_token']
  config.oauth_token_secret = settings['oauth_token_secret']
end

settings = YAML.load_file File.dirname(__FILE__) + '/mongo.yml'
miner = TweetMiner.new(settings)

stream = TweetStream::Client.new

stream.on_error do |msg|
  puts msg
end

stream.on_timeline_status do |status|
  miner.insert_status status
  print '.'
end

# Do not forget this to trigger the collection of tweets
stream.userstream

The code above handles the collection of status updates. The actual saving to mongodb is handled by the TweetMiner module.

# tweetminer.rb

require 'mongo'

class TweetMiner
  attr_writer :db_connector
  attr_reader :options

  def initialize(options)
    @options = options
  end

  def db
    @db ||= connect_to_db
  end

  def insert_status(status)
    statuses.insert status
  end

  def statuses
    @statuses ||= db['statuses']
  end

  private

  def connect_to_db
    db_connector.call(options['host'], options['port']).db(options['database'])
  end

  def db_connector
    @db_connector ||= Mongo::Connection.public_method :new
  end
end


We will be modifying our code along the way, and if you want to follow each step, you can view this commit on GitHub.

Depending on how active the people you follow are, it may take a while before you get a good sample of tweets. Actually, it would be interesting if you could run the collection for several days.

Assuming we have several days' worth of data, let us proceed with the “data mining” part. Data mining would not be fun without a mention of map-reduce, a strategy for processing large data sets popularized by Google. The key innovation with map-reduce is its ability to take a query over a data set, divide it, and run it in parallel over many nodes. “Counting”, for example, is a task that fits nicely into the map-reduce framework. Imagine you and your friends are counting the number of people in a football stadium. First, you divide yourselves into two groups: group A counts the people in the lower deck while group B does the upper deck. Group A in turn divides the task into north, south, and end zones. When group A is done counting, they tally all their results. After group B is done, they combine their results with group A's, and the total gives us the number of people in the stadium. Dividing your friends is the “map” part while tallying the results is the “reduce” part.
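The stadium analogy maps directly onto Ruby's own map and reduce; a tiny sketch with made-up numbers:

```ruby
# Each inner array is one group's section counts (lower deck, upper deck, ...).
sections = [[12, 30], [25, 8], [40]]

partials = sections.map(&:sum)    # "map": every group counts its own sections
total    = partials.reduce(0, :+) # "reduce": combine the partial tallies

total # => 115
```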

Updates per user

First, let us do a simple task: count the number of updates per user. We introduce a new module ‘StatusCounter’ which we include in our TweetMiner class. We also add a new program to execute the map-reduce task.

# counter.rb

require 'bundler'
require File.dirname(__FILE__) + '/tweetminer'
settings = YAML.load_file File.dirname(__FILE__) + '/mongo.yml'

miner = TweetMiner.new(settings)

results = miner.status_count_by_user
ap results

Map-reduce commands in MongoDB are written in JavaScript. When writing the JavaScript, just be conscious of string interpolation, because Ruby sees the command as a bunch of characters and nothing else. In the example below, we use a heredoc, which interprets backslashes. In our later examples, we switch to single quotes when we use regular expressions within our JavaScript.
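To see why the quoting style matters, a quick sketch: double-quoted strings (and plain heredocs) drop the backslash of an unrecognized escape like \d, which silently breaks an embedded regular expression, while single quotes keep it.

```ruby
double = "\d{2,2}"   # double quotes drop the backslash: "d{2,2}"
single = '\d{2,2}'   # single quotes keep it, so the regex survives

double.length # => 6
single.length # => 7
```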

module StatusCounter
  class UserCounter
    def map_command
      <<-COMMAND
        function() {
          emit(this.user.id_str, 1);
        }
      COMMAND
    end

    def reduce_command
      <<-COMMAND
        function(key, values) {
          var count = 0;
          for(i in values) {
            count += values[i];
          }

          return count;
        }
      COMMAND
    end
  end

  def status_count_by_user
    counter = UserCounter.new
    statuses.map_reduce(counter.map_command, counter.reduce_command, default_mr_options)
  end

  def default_mr_options
    {:out => {:inline => 1}, :raw => true }
  end
end

Follow this commit to view the changes from our previous examples.

When you run ‘ruby counter.rb’, you should see a screenshot similar to the one below:

Tweets per Hour

Now, let’s do something a little harder than the previous example. This time, we want to know how many tweets are posted per hour. Every tweet has a created_at field of type String, so we use a regular expression to extract the hour component.

created_at:  'Tue Sep 04 22:04:40 +0000 2012'
regex:  (\d{2,2}):\d{2,2}:\d{2,2}
match: 22
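You can verify the regex in Ruby before embedding it in the JavaScript; for instance:

```ruby
created_at = 'Tue Sep 04 22:04:40 +0000 2012'

# String#[] with a regex and capture-group index returns the captured hour.
hour = created_at[/(\d{2,2}):\d{2,2}:\d{2,2}/, 1]
hour # => "22"
```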

The only significant change is the addition of a new map command. Note the reduce command did not change from the previous example. See the commit.

class HourOfDayCounter
  def map_command
    'function() {
      var re = /(\d{2,2}):\d{2,2}:\d{2,2}/;
      var hour = re.exec(this.created_at)[1];

      emit(hour, 1);
    }'
  end

  def reduce_command
    'function(key, values) {
      var count = 0;

      for(i in values) {
        count += values[i];
      }

      return count;
    }'
  end
end

def status_count_by_hday
  counter = HourOfDayCounter.new
  statuses.map_reduce(counter.map_command, counter.reduce_command, default_mr_options)
end

Now run ‘ruby counter.rb’ in the console with the new method and the result should be something like the one below.

Filtering records

Our examples so far include every status since the beginning of time, which is pretty much useless. What we want is to apply the counting tasks to statuses posted within the past 7 days, for example. MongoDB allows you to pass a query to your map-reduce so you can filter the data the map-reduce is applied to. One problem though: the created_at field is a string. To get around this, we introduce a new field created_at_dt of type Date. You could hook it up in the insert_status method, but since we already have our data, we instead run a query (using the MongoDB console) to update our existing records. Please note the collection we are using is statuses and the new field is created_at_dt.

var cursor = db.statuses.find({ created_at_dt: { $exists: false } });
while (cursor.hasNext()) {
  var doc = cursor.next();
  db.statuses.update({ _id : doc._id }, { $set : { created_at_dt : new Date(doc.created_at) } });
}

Now that we have a Date field, let’s modify our method to accept a days_ago parameter and pass a query to our map-reduce.

def status_count_by_hday(days_ago = 7)
  date     = Date.today - days_ago
  days_ago = Time.utc(date.year, date.month, date.day)
  query    = { 'created_at_dt' => { '$gte' => days_ago } }

  options = default_mr_options.merge(:query => query)

  counter = HourOfDayCounter.new
  statuses.map_reduce(counter.map_command, counter.reduce_command, options)
end

Since we’re now getting the hang of it, why don’t we add another layer of complexity? This time, let us count by day of the week and include a breakdown per hour. Luckily for us, the day of the week is also included in the created_at field, and it is just a matter of extracting it. Of course, if Twitter decides to change the format, this will break. Let’s visit rubular.com and try our regular expression.
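Besides rubular.com, you can also check the combined weekday-and-hour regex in Ruby directly:

```ruby
# Group 1 captures the three-letter weekday, group 2 the hour.
re = /(^\w{3,3}).+(\d{2,2}):\d{2,2}:\d{2,2}/
matches = re.match('Tue Sep 04 22:04:40 +0000 2012')

matches[1] # => "Tue"
matches[2] # => "22"
```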

Now that we have our regex working, let’s include this in our new map command.

def map_command
  'function() {
    var re = /(^\w{3,3}).+(\d{2,2}):\d{2,2}:\d{2,2}/;
    var matches = re.exec(this.created_at);

    var wday = matches[1],
        hday = matches[2];

    emit(wday, { count: 1, hdayBreakdown: [{ hday: hday, count: 1 }] });
  }'
end

Note the difference in the emit function from our previous examples. Before, we only emitted a single numeric value, which is why our reduce command was a simple array loop. This time, our reduce command requires more work.

def reduce_command
  'function(key, values) {
     var total = 0,
         hdays = {},
         hday, count, hdayBreakdown;

     for(i in values) {
       total += values[i].count;

       hdayBreakdown = values[i].hdayBreakdown;

       for(j in hdayBreakdown) {
         hday  = hdayBreakdown[j].hday;
         count = hdayBreakdown[j].count;

         if( hdays[hday] == undefined ) {
           hdays[hday] = count;
         } else {
           hdays[hday] += count;
         }
       }
     }

     hdayBreakdown = [];
     for(k in hdays) {
       hdayBreakdown.push({ hday: k, count: hdays[k] });
     }

     return { count: total, hdayBreakdown: hdayBreakdown };
   }'
end

In our previous examples, the values parameter was a simple array of numeric values. Now, it becomes an array of objects. On top of that, one of the properties (i.e. hdayBreakdown) is also an array. If everything works according to plan, you should see something like the image below when you run counter.rb.

Did you have fun? I hope so :)

Adding Keyboard Shortcuts in Web Pages

Adding keyboard shortcuts to interact with your web pages seems like a useless feature when the rest of the world is using a mouse. But for a programmer who wants everything to be a few keystrokes away, keyboard shortcuts are very handy.

In this tutorial, we will add simple scrolling shortcuts to our webpage. This is just to illustrate what is possible, so please do not copy-and-paste this into your production code.

What do we need?

Actually, the only critical pieces we need are jQuery and knowledge of JavaScript. However, since I am more of a Ruby guy, we will use Sinatra to build the page and CoffeeScript to write the JavaScript.

Build the pages

The screenshot below (left side) shows what our directory structure looks like. It is pretty much a standard Sinatra structure.

Our HTML page displays 10 entries where each is grouped under a “div” element with an “.entry” class and an ID. We also add in some styling in our page to distinguish each entry.

  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <link rel="stylesheet" href="css/style.css"/>
  <script type="text/javascript" charset="utf-8" src="http://code.jquery.com/jquery-1.7.1.min.js"></script>
  <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
  <% 1.upto(10) do |i| %>
    <div id="<%= "entry_#{i}" %>" class="entry">
      <%= "Title #{i}" %>
      <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
    </div>
  <% end %>

If everything is set up correctly, you should be able to run the app and see 10 entries.

$ ruby app.rb
[2012-08-30 13:48:44] INFO WEBrick 1.3.1
[2012-08-30 13:48:44] INFO ruby 1.9.2 (2012-04-20) [x86_64-darwin12.1.0]
== Sinatra/1.3.3 has taken the stage on 4567 for development with backup from WEBrick
[2012-08-30 13:48:44] INFO WEBrick::HTTPServer#start: pid=12415 port=4567

Now for the juicy part. When the user presses ‘j’, we will scroll to the next entry, while ‘k’ scrolls to the previous one. If you are a Vim user, you know why.

current_entry = -1

$(document).keydown (e) ->
  switch e.keyCode
    when 74 then scroll_to_next()     # j
    when 75 then scroll_to_previous() # k

scroll_to_next = ->
  if current_entry < $(".entry").length - 1
    current_entry += 1
    scroll_to_entry(current_entry)

scroll_to_previous = ->
  if current_entry > 0
    current_entry -= 1
    scroll_to_entry(current_entry)

scroll_to_entry = (entry) ->
  # Get the element we need to scroll to
  id = $(".entry")[entry].id
  $("html, body").animate { scrollTop: $("##{id}").offset().top }, "slow"

That’s it! As I’ve mentioned before, this is not production ready. For example, the shortcuts should not interfere with other actions on your page, such as when the user is typing in an input field. The code also assumes the first entry is the one currently visible.

This post is based on the book Web Development Recipes. If you are looking for a quick reference on how to improve your project, I suggest reading the book.

How to Create a Wrapper Gem for Service APIs - Part 1

APIs are getting more and more popular as apps and services move to the cloud. Whenever you need to integrate a popular web service API into your Ruby app, 99.99% of the time a gem already exists and is ready for use. That is a testament to how active Ruby developers are in supporting the community.

Even if integrating with the popular APIs is not on your radar, you may still need to create an API wrapper for internal use. For example, if your Rails application has grown tremendously (congratulations!), you may eventually need to adopt a services architecture to support upcoming features and keep things manageable.

I created the gem for the Open Amplify API as part of my exploration into data mining. When I first created it, my primary goal was simply to wrap the API. Though I didn’t write spaghetti code, it wasn’t a good example of structured code either. Two years later (yep, that’s how long I let the code rot), I decided to rewrite the gem and adopt the architecture of the Twitter gem. It was a good exercise: not only did I update the gem for the newest API version, I also learned a great deal about how to write a gem.

Setup the project

We will create a wrapper for the fictitious Awesome API and thus call our gem ‘awesome’. To get things started, let’s use bundler to set up our initial code.

$> bundle gem awesome
create awesome/Gemfile
create awesome/Rakefile
create awesome/LICENSE
create awesome/README.md
create awesome/.gitignore
create awesome/awesome.gemspec
create awesome/lib/awesome.rb
create awesome/lib/awesome/version.rb
Initializating git repo in /Users/greg/dev/code/awesome

This is the standard directory structure and naming convention of Ruby gems. The files Gemfile, Rakefile, and .gitignore are not necessary but they would be very useful while developing your gem.

Gem dependencies

All gem dependencies should go into awesome.gemspec, not the Gemfile. Inside the Gemfile, the single line ‘gemspec’ takes care of pulling in the gems you need locally.

$> more Gemfile
source 'https://rubygems.org'

# Specify your gem's dependencies in awesome.gemspec
gemspec
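For example, if our gem needed an HTTP client, the dependency would be declared inside awesome.gemspec like this. The faraday and minitest choices here are purely illustrative, not requirements of the Awesome gem:

```ruby
# awesome.gemspec -- dependencies are declared here, not in the Gemfile
Gem::Specification.new do |gem|
  # ... name, version, and the other settings shown later ...
  gem.add_dependency 'faraday'              # runtime dependency (illustrative)
  gem.add_development_dependency 'minitest' # development-only dependency (illustrative)
end
```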


You specify the version of your gem inside lib/awesome/version.rb

$> more lib/awesome/version.rb
module Awesome
  VERSION = "0.0.1"
end

You may be wondering how this is used by the gem. Take a peek at awesome.gemspec and you’ll see that Awesome::VERSION is used by the .gemspec file.

$> more awesome.gemspec
require File.expand_path('../lib/awesome/version', __FILE__)

Gem::Specification.new do |gem|
  gem.authors     = ["Greg Moreno"]
  gem.email       = ["greg.moreno@gmail.com"]
  gem.description = %q{TODO: Write a gem description}
  gem.summary     = %q{TODO: Write a gem summary}
  gem.homepage    = ""

  gem.files         = `git ls-files`.split($\)
  gem.executables   = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
  gem.test_files    = gem.files.grep(%r{^(test|spec|features)/})
  gem.name          = "awesome"
  gem.require_paths = ["lib"]
  gem.version       = Awesome::VERSION
end

Additional modules and classes

You can write all your code in the file lib/awesome.rb and it would still work. However, in the spirit of making code maintainable, it is highly recommended that you put your classes and modules under the directory lib/awesome, just like we did with lib/awesome/version.rb.

Testing the gem

We will use minitest but you can always use any test framework you prefer. For our test setup, we need to do the following:

  • Set up our test directory manually since bundler didn’t do this for us.
  • Create a rake task to run our tests.
  • Specify gem dependencies in our common test helper file.

Here are the steps and code:

$> mkdir test
$> touch test/helper.rb
$> mkdir test/awesome
$> touch test/awesome/awesome_test.rb

# Rakefile
require 'bundler/gem_tasks'

require 'rake/testtask'
Rake::TestTask.new do |test|
  test.libs << 'lib' << 'test'
  test.ruby_opts << "-rubygems"
  test.pattern = 'test/**/*_test.rb'
  test.verbose = true
end

# test/helper.rb
require 'awesome'
require 'minitest/spec'
require 'minitest/autorun'

Now that we have our testing in place, let’s write a simple test and see if everything works.

# test/awesome/awesome_test.rb
require 'helper'

describe Awesome do
  it 'should have a version' do
    Awesome::VERSION.wont_be_nil
  end
end
# Then, let's run the test
$> rake test
(in /Users/greg/dev/code/awesome)
/Users/greg/.rbenv/versions/1.9.2-p290/bin/ruby -I"lib:lib:test" -rubygems "/Users/greg/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rake/rake_test_loader.rb" "test/awesome/awesome_test.rb"
Loaded suite /Users/greg/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rake/rake_test_loader
Finished in 0.000695 seconds.

1 tests, 1 assertions, 0 failures, 0 errors, 0 skips

Test run options: --seed 55984

Perfect! Now, let’s start working on the juicy parts of our gem.

Gem configuration

Every web service API requires some per-user configuration, and your gem should be able to support that. For example, in the Twitter gem, some methods require authentication, and you set up the default configuration with this:

Twitter.configure do |config|
  config.consumer_key = YOUR_CONSUMER_KEY
  config.consumer_secret = YOUR_CONSUMER_SECRET
  config.oauth_token = YOUR_OAUTH_TOKEN
  config.oauth_token_secret = YOUR_OAUTH_TOKEN_SECRET
end

Every API has different options, but if you are wrapping a web service, the options often fall into two categories: connection options and functional options. For example, connection-related options include the endpoint, user agent, and authentication keys, while functional options include the request format (e.g. json), the number of pages to return, and other parameters required by specific API functions. In some APIs, the API key is passed as a parameter to GET calls, so while it may be connection-related, it is better to group it with the parameter options so you can easily encode all parameters in a single call.

Our Awesome API is simple and will not deal with OAuth like the Twitter gem does. For the configuration, we should be able to do this:

Awesome.api_key = 'YOUR_API_KEY'
Awesome.format = :json
# Other options are: user_agent, method
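As a sketch of that grouping idea, the two categories could be partitioned like this. The key lists and the helper method below are illustrative only, not part of the gem:

```ruby
# Illustrative sketch: partition a flat options hash into
# connection-related and parameter-related groups.
CONNECTION_KEYS = [:endpoint, :user_agent, :method]
PARAMETER_KEYS  = [:api_key, :format]

def partition_options(options)
  connection = options.select { |k, _| CONNECTION_KEYS.include?(k) }
  parameters = options.select { |k, _| PARAMETER_KEYS.include?(k) }
  [connection, parameters]
end
```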

Now, let’s write some tests. Of course, these should fail at first :)

# test/awesome/configuration_test.rb
require 'helper'

describe 'configuration' do
  describe '.api_key' do
    it 'should return default key' do
      Awesome.api_key.must_equal Awesome::Configuration::DEFAULT_API_KEY
    end
  end

  describe '.format' do
    it 'should return default format' do
      Awesome.format.must_equal Awesome::Configuration::DEFAULT_FORMAT
    end
  end

  describe '.user_agent' do
    it 'should return default user agent' do
      Awesome.user_agent.must_equal Awesome::Configuration::DEFAULT_USER_AGENT
    end
  end

  describe '.method' do
    it 'should return default http method' do
      Awesome.method.must_equal Awesome::Configuration::DEFAULT_METHOD
    end
  end
end

As I mentioned before, the best way to write your gem (or any program, for that matter) is to clearly separate the functionalities into modules and classes. In our case, we will put all configuration defaults inside a module (i.e. lib/awesome/configuration.rb). We also want to provide class methods for the module Awesome, which we can easily do using Ruby’s ‘extend’.

# lib/awesome/configuration.rb

module Awesome
  module Configuration
    VALID_CONNECTION_KEYS = [:endpoint, :user_agent, :method].freeze
    VALID_OPTIONS_KEYS    = [:api_key, :format].freeze
    VALID_CONFIG_KEYS     = VALID_CONNECTION_KEYS + VALID_OPTIONS_KEYS

    DEFAULT_ENDPOINT   = 'http://awesome.dev/api'
    DEFAULT_METHOD     = :get
    DEFAULT_USER_AGENT = "Awesome API Ruby Gem #{Awesome::VERSION}".freeze

    DEFAULT_API_KEY = nil
    DEFAULT_FORMAT  = :json

    # Build accessor methods for every config option so we can do this, for example:
    # Awesome.format = :xml
    attr_accessor *VALID_CONFIG_KEYS

    # Make sure we have the default values set when we get 'extended'
    def self.extended(base)
      base.reset
    end

    def reset
      self.endpoint   = DEFAULT_ENDPOINT
      self.method     = DEFAULT_METHOD
      self.user_agent = DEFAULT_USER_AGENT

      self.api_key = DEFAULT_API_KEY
      self.format  = DEFAULT_FORMAT
    end

  end # Configuration
end

# lib/awesome.rb
require 'awesome/version'
require 'awesome/configuration'

module Awesome
  extend Configuration
end

$> rake test
(in /Users/greg/dev/code/awesome)
/Users/greg/.rbenv/versions/1.9.2-p290/bin/ruby -I"lib:lib:test" -rubygems "/Users/greg/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rake/rake_test_loader.rb" "test/awesome/awesome_test.rb" "test/awesome/configuration_test.rb"
Loaded suite /Users/greg/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rake/rake_test_loader
Finished in 0.001600 seconds.

5 tests, 5 assertions, 0 failures, 0 errors, 0 skips
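The ‘extend’ trick that powers this deserves a closer look. Here is a minimal, self-contained illustration (the module names are made up for the example): the module’s instance methods become class-level methods of whatever extends it, and the ‘extended’ hook gives us a place to set the defaults.

```ruby
# Minimal illustration of Ruby's 'extend' (names are made up).
module Defaults
  # Hook fired when another module/class extends this one
  def self.extended(base)
    base.reset
  end

  def reset
    @format = :json
  end

  def format
    @format
  end
end

module Client
  extend Defaults   # triggers Defaults.extended(Client), which calls Client.reset
end
```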

Our gem will not be awesome if we don’t support a ‘configure’ block like what the Twitter gem does. We want to setup the configuration like this:

Awesome.configure do |config|
  config.api_key = 'YOUR_API_KEY'
  config.method = :post
  config.format = :json
end

Fortunately, it’s an easy fix. We just need to add a ‘configure’ method to the Configuration module. We also update our tests to make sure this new method works.

# lib/awesome/configuration.rb
def configure
  yield self
end

# test/awesome/configuration_test.rb
after do
  Awesome.reset
end

describe '.configure' do
  Awesome::Configuration::VALID_CONFIG_KEYS.each do |key|
    it "should set the #{key}" do
      Awesome.configure do |config|
        config.send("#{key}=", key)
        Awesome.send(key).must_equal key
      end
    end
  end
end
Before we move on, let’s take a second look at our configuration tests. We have tests for checking default values and for setting up new ones. What if we added a new configuration key to our gem? The ‘configure’ tests would handle the new key, but we would still have to add another test for checking its default value. And we don’t want to write another test by hand, right? More importantly, we don’t want our tests to yield false positives: if we fail to add the ‘default value’ check, our tests will still pass even though we forgot to set a default value.

Let us remove all our default value tests and replace them with code that relies on VALID_CONFIG_KEYS instead.

# test/awesome/configuration_test.rb
Awesome::Configuration::VALID_CONFIG_KEYS.each do |key|
  describe ".#{key}" do
    it 'should return the default value' do
      Awesome.send(key).must_equal Awesome::Configuration.const_get("DEFAULT_#{key.upcase}")
    end
  end
end

$> rake test
(in /Users/greg/dev/code/awesome)
/Users/greg/.rbenv/versions/1.9.2-p290/bin/ruby -I"lib:lib:test" -rubygems "/Users/greg/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rake/rake_test_loader.rb" "test/awesome/awesome_test.rb" "test/awesome/configuration_test.rb"
Loaded suite /Users/greg/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rake/rake_test_loader
Finished in 0.002935 seconds.

11 tests, 11 assertions, 0 failures, 0 errors, 0 skips

Test run options: --seed 21540
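The const_get lookup doing the heavy lifting in these tests can be seen in isolation. This is a toy example, not the gem’s code:

```ruby
# Toy example of const_get: look up a constant from a string built
# at runtime, the same trick the DEFAULT_* tests rely on.
module Config
  DEFAULT_API_KEY = 'KEY'
  DEFAULT_FORMAT  = :json
end

key   = :format
value = Config.const_get("DEFAULT_#{key.upcase}")  # => :json
```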

Configuring clients

Our end goal is to wrap the API calls so they fit nicely into our application, and the common approach is to put them behind a ‘Client’ class. Depending on the size of the API you want to support, the Client class may delegate the method calls to other classes and modules, but from the point of view of your program, the action happens inside the Client class. There are two ways to configure the Client class:

  • It inherits the configuration values defined in the Awesome module;
  • It overrides the configuration values per client

    # Use the values defined in the Awesome module
    client = Awesome::Client.new
    client.make_me_awesome('gregmoreno')

    # Override the configuration values per client
    client_xml  = Awesome::Client.new :format => :xml
    client_json = Awesome::Client.new :format => :json

We are not going to show our tests here, but if you are interested, you can view the test code in the github repository. Instead, here is the code that handles the two scenarios for client configuration.

# lib/awesome/client.rb

module Awesome
  class Client

    # Define the same set of accessors as the Awesome module
    attr_accessor *Configuration::VALID_CONFIG_KEYS

    def initialize(options={})
      # Merge the config values from the module and those passed
      # to the client.
      merged_options = Awesome.options.merge(options)

      # Copy the merged values to this client and ignore those
      # not part of our configuration
      Configuration::VALID_CONFIG_KEYS.each do |key|
        send("#{key}=", merged_options[key])
      end
    end

  end # Client
end
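Note the direction of the merge in initialize: per-client options win over the module defaults for duplicate keys. In miniature (sample values are made up):

```ruby
# Hash#merge favors the argument's values for duplicate keys, so
# client options override module defaults.
defaults  = { :format => :json, :method => :get, :api_key => 'MODULE_KEY' }
overrides = { :format => :xml }

merged = defaults.merge(overrides)
# merged => {:format=>:xml, :method=>:get, :api_key=>"MODULE_KEY"}
```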

We also need to update our Awesome module. First, we need to require the new file awesome/client.rb so it will be loaded when we require the gem. Second, we need to implement a method that returns all the configuration values inside the Awesome module. Since this is still about configuration, our new method should go inside the Configuration module.

# lib/awesome/configuration.rb
def options
  Hash[ *VALID_CONFIG_KEYS.map { |key| [key, send(key)] }.flatten ]
end
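The Hash[ ... ] construction is dense, so here it is unpacked step by step. The stand-in values (key.to_s) replace the send(key) calls purely for illustration:

```ruby
# Unpacking Hash[ *keys.map { ... }.flatten ] one step at a time
keys = [:api_key, :format]

pairs = keys.map { |key| [key, key.to_s] }  # [[:api_key, "api_key"], [:format, "format"]]
flat  = pairs.flatten                       # [:api_key, "api_key", :format, "format"]
built = Hash[*flat]                         # {:api_key=>"api_key", :format=>"format"}
```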

We’re finally done with the configuration part of our gem. I know it’s a lot of work for a simple task but we managed to put a good structure in our code. Plus, we learned how to make our tests less brittle, and use Ruby’s awesome power to make our code better. In our next installment, we’ll discuss requests and error handling.