Posts Tagged ‘algorithms’

Word Cheat, a simple anagram engine

Those of you who follow this blog since its inception will be aware of my obsession with word games, and specifically algorithms that solve word games.

Yesterday I was playing Word Challenge on Facebook, a simple game where you have to find all the words that can be composed with six random letters. After a couple of trials I decided to try and solve it with a ruby program, but since I didn’t want to use my previous code I decided to try a new approach to solve this problem.

First of all I took an extensive list of Italian words and filtered it to suit my needs, leaving only the words ranging from three to six letters.

Once I had my wordlist I had to build a data structure that could be used for fast anagrams retrieval, thus the WordHash class was born.

class WordHash
  attr_reader :words
 
  def initialize(wordlist)
    @words = {}
    wordlist.each do |word|
      if @words.has_key?(w = word.strip.signature)
        @words[w] << word.strip
      else
        @words[w] = [word.strip]
      end
    end
  end
 
  def get_anagrams(word)
    result = []
    wordlist = @words.each do |k,w|
      result << w if word.contains?(k)
    end
    result
  end
end

This class makes use of two helper methods I added to String, that I think should be really part of the Ruby standard library:

module StringExtensions
  def signature
    self.split('').sort.join
  end
 
  def contains?(search)
   !Regexp.new(search.signature.split("").join(".*")).match(self.signature).nil?
  end
end
 
String.class_eval do
  include StringExtensions
end
The way it all works is quite simple, I build a Hash whose keys are the different signatures of words in the wordlist and whose values are arrays made up by the words with the same signature, so fetching all the anagrams for a word means just accessing word_hash[word.signature].

The get_anagrams method is used to also get anagrams with a length less than that of the original word.

The whole project, complete with tests and a helper script is available on GitHub, feel free to contribute.

Also, big thanks to Stefano Cobianchi, who contributed with the contains? method, that I really couldn’t code :)