Handling APIs with Ruby XML Parsing
Have you ever needed to write a Ruby wrapper for an XML-based API?
If you have a burning desire to seamlessly exchange data over the web or you just want to use the latest Interweb 2.0 service – you’re most likely contemplating writing your API client in Ruby… yes that’s why you’re here isn’t it?
Why Use Ruby to Parse XML?
It’s a fact – Ruby kicks ass at parsing XML. You can find tons of examples of XML API clients written in Ruby.
ActiveResource parses XML and handles RESTful HTTP
The new ActiveResource gem that ships with Rails 2 makes pretty light work of handling XML APIs via REST. Unfortunately not everyone is rushing to support REST just yet. If you support a big Rails 1.2 app you can’t just run out and add ActiveResource to your project (which is my case). This post is not about REST or ActiveResource so if you’re looking for that, click the link you just skipped over.
Enough with the useless Ruby XML facts…
Show me how to write a Ruby XML API wrapper
I’ve been working with an API recently for a third party registration system on a project we’re rolling out soon. The third party provides their API using domain scoped query URLs and HTTP GET params and returns XML documents.
Huh?
http://example.com/aaflinn/lookup?username=billlumberg&password=swingline
XML Messages
When you get a matching user/pass combination you get something like this.
<auth>
<user>
<username><![CDATA[billlumberg]]></username>
<fullname><![CDATA[Bill Lumberg]]></fullname>
<zipcode><![CDATA[92131]]></zipcode>
<email><![CDATA[bill.lumberg@initech.com]]></email>
</user>
</auth>
When you put the wrong password you get an error like so.
<auth>
<error><![CDATA[Invalid username/password combination]]></error>
</auth>
If you don’t enter a username at all you’ll get an error thusly.
<auth>
<error><![CDATA[No username given.]]></error>
</auth>
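Here’s a rough sketch of how those three responses could be told apart with REXML, the same parser the wrapper below uses. The helper name parse_auth and the hash it returns are made up for illustration; they are not part of the provider’s API or of the real wrapper.

```ruby
require 'rexml/document'

# Hypothetical helper: returns a hash of the user's fields, or raises with
# the message found in the <error> node.
def parse_auth(xml)
  doc = REXML::Document.new(xml)
  user = doc.elements['//user']
  if user
    fields = {}
    # Each child of <user> becomes a key; CDATA content comes back as plain text
    user.elements.each { |node| fields[node.name.to_sym] = node.text }
    fields
  else
    error = doc.elements['//error']
    raise ArgumentError, (error ? error.text : 'Unknown error')
  end
end

ok  = '<auth><user><username><![CDATA[billlumberg]]></username><zipcode><![CDATA[92131]]></zipcode></user></auth>'
bad = '<auth><error><![CDATA[No username given.]]></error></auth>'

p parse_auth(ok)
begin
  parse_auth(bad)
rescue ArgumentError => e
  puts e.message
end
```

The point is only that a missing user node is the signal for failure, and the error node carries the message worth passing along.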
API Wrapper Concepts
Here are some concepts that I felt were important when I started writing my wrapper.
- Simple – it should be easy to code (I hate writing stupid code)
- DRY – if it’s worth writing the wrapper make sure it’s reusable
- Self Documenting – it should be as self documenting as possible (rdoc)
- “Exceptional Code” – it should raise errors on exceptions and handle errors from the libraries it makes use of
- Don’t Spam – Don’t abuse APIs (grrr!)
Requirements
The wrapper should use a GET query string to perform a query to the service provider passing an md5 hashed password and username combination. If the user exists parse the XML result document and return an instantiated user object based on the XML. If no user exists raise some type of rescuable error (RecordNotFound).
Additionally the wrapper should be able to create a new user. This particular service provider uses an HTTP GET query string but in some cases you might find a plain old POST like you’d see in a form or you’ll need to build XML and pass that back to the service provider (which I am not doing here). The wrapper should be able to interpret error messages and determine the status of our create request in a graceful way.
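Before the full wrapper, a small sketch of just the lookup URL. It assumes the host and path layout from the example URL earlier in this post and the Base64-encoded MD5 digest the wrapper uses; the helper names hashed_password and lookup_uri are hypothetical, and URI.encode_www_form is swapped in for the CGI::escape loop the wrapper itself uses.

```ruby
require 'base64'
require 'digest/md5'
require 'uri'

# Hypothetical helper: Base64 of the raw MD5 digest, as in the wrapper below
def hashed_password(password)
  Base64.encode64(Digest::MD5.digest(password)).strip
end

# Hypothetical helper: builds the lookup URL for an account (assumed layout)
def lookup_uri(account, username, password)
  params = { :username => username, :password => hashed_password(password) }
  # encode_www_form URL-encodes keys and values and joins them with & and =
  query = URI.encode_www_form(params)
  URI::HTTP.build(:host => 'example.com', :path => "/#{account}/lookup", :query => query)
end

puts lookup_uri('aaflinn', 'billlumberg', 'swingline')
```

Encoding matters here because the Base64 output can contain +, / and =, all of which mean something else inside a query string.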
Ruby API Wrapper by Example
The names have been changed to protect the innocent. I’ve edited the wrapper a bit to reduce some complexity and renamed it to hide the actual API provider. Read on in the comments of the code. I’ve done everything on that bullet list, and the comments set up just about every line of code written.
#
# Usage
#
# === setup ===
# require 'ApiExample'
# ApiExample::account = 'test'
# ApiExample::logger = Logger.new('example.log')
#
# ===Find A User===
# user = ApiExample::User.find('bill.lumberg', 'swingline')
#
# ===Create A User===
# user = ApiExample::User.create(:username => 'bill.lumberg', :password => 'swingline', :email => 'bill.lumberg@initech.com')
#
require 'base64'
require 'digest/md5'
require 'net/http'
require 'rexml/document'
require 'cgi'
require 'logger'
module ApiExample
# Example Database Host
HOST = 'www.example.com'
# These params are required to register a new user
REGISTRATION_PARAMS = [ :username, :password, :email ]
# ApiExample account (brand account not user account)
@@account = nil
@@success_url = nil
@@fail_url = nil
@@logger = Logger.new(STDERR)
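# mattr_accessor (and blank? further down) come from ActiveSupport, which a Rails app already loads
mattr_accessor :account, :logger, :success_url, :fail_url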
# Exception handling
class ApiExampleError < StandardError; end
class UnexpectedError < ApiExampleError; end
class RegistrationError < ApiExampleError; end
class RecordNotFound < ApiExampleError; end
# defined on the module itself so it can be called as ApiExample::md5password
def self.md5password(password)
Base64.encode64(Digest::MD5.digest(password)).strip
end
# Using Struct here allows us to make our object act similar to an
# ActiveRecord object. In this example it’s not so obvious how much of a pain
# in the ass this saves, but the real class has about 20 or so attributes.
# Using Struct means I don’t have to write attribute readers and writers.
# Attribution – I got this idea from Ben Vinegar’s
# freshbooks gem (http://rubyforge.org/projects/freshbooks/)
User = Struct.new(:username, :email, :password, :fullname, :zipcode)
# Extend the Struct attributes by adding the class and instance methods we want
class User
attr_accessor :attributes, :errors, :new_record
# class method for creating a new user
# Usage:
# user = ApiExample::User.create(:username => 'bill.lumberg', :password => 'swingline', :email => 'bill.lumberg@initech.com')
#
def self.create(attributes = {})
object = new(attributes)
object.create
object
end
# class method for finding an existing user
# Usage:
# ApiExample::User.find('bill.lumberg', 'swingline') # the password is sent md5 hashed rather than in clear text
#
def self.find(username, password)
query_params = { :username => username, :password => ApiExample::md5password(password) }
query = query_params.collect{ |k, v| [k, v].map{ |kv| CGI::escape(kv.to_s) }.join('=') }.join('&')
# @@account doesn’t need to be encoded because we are setting internally, the other stuff is vulnerable
uri = URI::HTTP.build(:host => ApiExample::HOST, :path => "/#{ApiExample::account}/lookup", :query => query)
# I’ve got a bunch of these loggers throughout which are used to debug, remove as you see fit.
ApiExample::logger.debug("URI: #{uri}")
# Make the actual HTTP Request
result = Net::HTTP.get_response(uri)
# Parse the XML Result of the HTTP Request
response = REXML::Document.new(result.body)
ApiExample::logger.debug("RESPONSE: #{response}")
# Check to see if there is a user node, if there’s not raise an error similar to ActiveRecord
unless response.elements['//user'].nil?
attributes = {}
# Members is the Struct method for the attributes we setup…they correspond to what
# we expect the XML to return, so lets only handle what we know
members.each do |field_name|
node = response.elements['//user'].elements[field_name.to_s]
next if node.nil?
attributes[field_name.to_sym] = node.text # you can do casting here if you need, I did
end
object = allocate
object.attributes = attributes
object
else
# Raise an error, setting the message to what the API sets as the error in XML
# if there is no ‘error’ node in the XML set an ‘unknown error’ message
raise RecordNotFound, (response.elements['//error'].text rescue "Couldn’t find ApiExample::User with Username: #{username} because of an unknown error!")
end
end
# instance method to setup new object ala ActiveRecord
# Usage:
# user = ApiExample::User.new(:username => 'bill.lumberg')
#
def initialize(attributes = {})
@new_record = true
self.attributes = attributes
end
# instance method ala ActiveRecord
#
def errors
@errors = [] if @errors.blank?
@errors
end
# instance method to get an attributes hash ala ActiveRecord
#
def attributes
butes = {}
members.each{ |member| butes[member.to_sym] = self.send(member) } # you can cast here, I did
butes
end
# instance method to set attributes via a hash ala ActiveRecord
#
def attributes=(new_attributes = {})
members.each do |member|
if new_attributes.has_key?(member.to_sym)
# #{member}= because you might possibly write an attribute writer to handle the input to =
self.send("#{member}=", new_attributes[member.to_sym]) # you can cast here, I did
ApiExample::logger.debug("#{member}: #{self.send(member).inspect}") # for snooping
end
end
end
# instance method to create the new instantiated object ala ActiveRecord
# note the difference in create and self.create are the same as in ActiveRecord
# Usage:
# user = ApiExample::User.new(:username => 'bill.lumberg')
# user.password = 'swingline'
# user.email = 'bill.lumberg@initech.com'
# user.create
#
def create
# we pass our success and fail URLs even though we aren’t using them for their intended purpose
query_params = { :success => ApiExample::success_url, :fail => ApiExample::fail_url }.merge(attributes)
ApiExample::logger.debug(query_params.inspect) # for snooping
# check to make sure all required params are included
if ApiExample::REGISTRATION_PARAMS.all?{ |param| query_params.include?(param) }
# URL Encode everything
query = query_params.collect{ |k, v| [k, v].map{ |kv| CGI::escape(kv.to_s) }.join('=') unless v.blank? }.compact.join('&')
# @@account doesn’t need to be encoded because we are setting internally, the other stuff is vulnerable
uri = URI::HTTP.build(:host => ApiExample::HOST, :path => "/#{ApiExample::account}/register", :query => query)
ApiExample::logger.debug("URI: #{uri}") # for snooping
# Make the actual HTTP Request
response = Net::HTTP.get_response(uri)
ApiExample::logger.debug("RESPONSE: #{response}") # for snooping
# Check for failure and errors
if response['location'].include?(ApiExample::fail_url)
# Raise an error for each message
CGI::parse(URI::parse(response['location']).query)['error'].each do |error_message|
raise RegistrationError, error_message
end
elsif response['location'].include?(ApiExample::success_url)
# Change the status to reflect the fact that we’ve saved the object…this could get much more
# in-depth in terms of ensuring our data was correctly saved… you wouldn’t expect to do
# that kind of thing in an RDBMS so why here?
self.new_record = false
else
# it wasn’t the fail url, it wasn’t the success url so what the hell was it?
raise UnexpectedError, "Unknown response URL!"
end
else
# For each missing required param add an error message
ApiExample::REGISTRATION_PARAMS.each do |missing_param|
self.errors << "#{missing_param} can’t be blank." unless query_params.include?(missing_param)
end
end
end
end
end
PHP and ActiveRecord (continued)
Today I saw a big traffic increase from my PHP and ActiveRecord post. It looks like PHPDeveloper posted a link to the article and response, so I’ve written a response to the Arnold’s wor(l)ds response post. Arnold’s post is insightful, references runkit and has a good implementation of a Sortable tasklist example in PHP.
In Ruby everything is an object. A Ruby object that is.
When I was working on Biscuit I longed for a working implementation of runkit. I experimented with it and it did offer exactly what I wanted but it lacked widespread deployment because of stability concerns. After two years I don’t think much progress has been made (not to take anything away from the author). I think runkit is a hack, a dangerous one too. Here is another example of why Ruby is better.
In PHP if you want the features runkit offers (add super globals, modify class methods at runtime, etc.) you have to hack the engine at the internal C/C++ level, meaning you have to completely control the development, testing and production environments, track source changes, hope nothing breaks… You might as well be forking at that point.
If you want to hack something in the Ruby kernel methods, you just monkey patch – equally dangerous on kernel methods BUT limited in scope…in Ruby you’re not destroying Ruby with monkey patches. You might kill your own application but other applications that don’t share the same instance are not affected.
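For anyone who hasn’t seen a monkey patch, here’s a minimal sketch. The method name shout is made up; the only real claim is that Ruby lets you reopen an existing class, even a core one, and that the change lives in the running Ruby process rather than in the interpreter itself.

```ruby
# Reopen the core String class and add a (hypothetical) method to it
class String
  def shout
    upcase + '!'
  end
end

puts 'tps reports'.shout
```

Any other Ruby process on the same box never sees shout, which is the scope limit I mean above.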
Lineage
PHP and Ruby have a lot in common, they both derive a certain expressiveness found in Perl, and they are scripting languages with low overhead and a quick learning curve. They’re also different in a number of ways, and I think that can be chalked up to lineage. One could argue that PHP’s design is derived from C++ and Perl and maybe even Java lately.
One would also be able to argue that Ruby derives its design from Smalltalk/Strongtalk and Perl. So the reason a lot of PHP users probably like Ruby is its lightweight scriptiness and Perlish feel (I know I’ll take crap from Perl, PHP and Ruby developers for that one). But because of the lineage we see straight line inheritance and interfaces in PHP and in Ruby we see mix-ins. Back to Arnold’s article.
How to not write a shitload of code using Ruby Mix-Ins
Arnold next addresses the issue with mix-ins showing his implementation of a sortable class in PHP. His implementation is good but it’s a lot of work compared to just expressing acts_as_list in Ruby. Most people might see this as splitting hairs but it’s not for anyone actually wondering. In Ruby you need exactly 1 (user) class to handle this sort of functionality – the difference being that the Class is the list in Ruby.
In the PHP example you need a TaskList and a Task and since Sortable is only an interface you’ve actually got to write the methods to implement them in your own (user) class. Again, you’re saying “big woop”...well the woop is the fact that you can’t do mix-in style inheritance like this because your class needs to inherit the methods to make them work anything like acts_as_list, ie. by not writing a shitload of code for every sortable class.
The second example demos a sortable runkit implementation. This is really similar to saying PHP6 will have the keyword static to solve the issue with static property inheritance.
The problem with static inheritance in PHP
I’ve seen people describe static property inheritance as a problem, and I’ve probably been guilty of it myself. Well it’s not really a problem. Static properties shouldn’t inherit. The PHP developers have said time and time again they are not changing this, it works how it is supposed to work.
As far as runkit goes, it seems pretty crazy to have an engine level hack + runtime interference with a class to overcome this issue.
Over and over again when I see this coming up as an “issue with PHP” I remember grappling with it with Biscuit. The theme that I keep seeing when people try to describe this problem has to do with identifying the calling class from an inherited static method. In other words, people are usually trying to do this:
Class Base {
static public function name() {
return __CLASS__;
}
}
Class My Extends Base { }
echo My::name(); #=> Base but everyone wants My
They’re usually trying to implement some form of ActiveRecord, a factory method or something similar. From what I’ve seen on how PHP6 will be re-using the static keyword and implementing namespaces, this problem won’t be fixed like I had hoped.
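For contrast, a sketch of what the same idea looks like in Ruby. The class names mirror the PHP snippet and class_label is a made-up method name. Inside a class method, self is whichever class received the call, so the subclass gets its own name back.

```ruby
class Base
  # self is the class the method was called on, not the class that defined it
  def self.class_label
    name
  end
end

class My < Base; end

puts Base.class_label
puts My.class_label
```

That one difference is most of what makes a static finder like Person.find workable in Ruby.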
What PHP6 Actually Needs
There’s an article entitled What PHP6 Actually Needs and you could almost retitle the article Things Ruby has and PHP doesn’t. So if so many good programmers are asking for these features why aren’t they getting them from Zend and PHP6?
A question that Ruby enthusiasts might ask is why are so many PHP developers asking for these features and still not using Ruby? I think for Ruby it has a lot to do with deployment issues – FastCGI sucks, lighttpd you have to jigger with, and Mongrel and just about every other deployment solution is akin to setting up an application server. For the smaller apps that a lot of PHP developers work on, having to go through this crap just for a small program is prohibitive.
So I guess this continuation of the original article seems like an attack on PHP. It’s not…it is a continuation about why ActiveRecord works really well in Ruby and not so good in PHP. But there is hope. After looking at the Row Data Gateway pattern I’m thinking a really good implementation can be done in PHP that would give similar flexible muscles for PHP database driven development.
PHP and ActiveRecord
PHP on Rails
Updated: 2007-08-10 : Check out my response to Arnold Daniels’ article.
I’m starting a new job soon and I’ll be working primarily with PHP. Since I’ve been a rubyist for the last 2 years I’m looking at PHP from a Rails development perspective. Before working exclusively with Ruby I hung on to PHP (because of the project I was working on) by porting Rails bits to PHP. I eventually gave up on porting Rails to PHP after my project’s funding was cut.
What went wrong
So since I’m moving back to a PHP environment my mind is again on Rails bits in PHP. When I first ported ActiveRecord to PHP from Ruby I wasn’t nearly as familiar with Ruby as I am now. I’m looking at PHP again and I understand why the ActiveRecord pattern doesn’t work in PHP nearly as well as it does in Ruby. ActiveRecord in PHP probably works about as well as it would in Java.
Why ActiveRecord Doesn’t work in PHP
(or why everything is better in Ruby)
In Ruby everything is an object. Everything.
In Ruby the class is an object and you can tell the class to do things using class methods and class variables with a separate scope than an instance. You simply don’t have this type of flexibility in PHP. Take the following for example…
class Person < ActiveRecord::Base
acts_as_list
has_many :hobbies
end
What exactly is happening here? The two lines within the class are actually calling class methods. The class methods operate on the class which in turn determines how the instance is created.
Here we are telling the person class to act as a list. That will include some instance methods in the Person class using Ruby’s mix-in feature which is far different from simple inheritance.
We’re also telling the class that a person instance has many hobby instances associated with it. Rails performs some magic here and again adds in a number of instance methods that enable this to happen.
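To show the mechanics without dragging in Rails, here’s a stripped-down sketch of a class macro that mixes in instance methods. Everything in it (the ActsAsList module, the first? method, the position attribute) is a hypothetical stand-in and not how the real acts_as_list is written.

```ruby
# Hypothetical stand-in for a Rails-style class macro
module ActsAsList
  # Instance methods that end up in the class that calls the macro
  module InstanceMethods
    def first?
      position == 1
    end
  end

  # The macro itself: runs against the class object and mixes the methods in
  def acts_as_list
    include InstanceMethods
  end
end

class Person
  extend ActsAsList   # makes acts_as_list available as a class method
  acts_as_list        # called while the class is being defined

  attr_accessor :position
end

person = Person.new
person.position = 1
puts person.first?
```

Person never inherits from anything list-like here; the behaviour arrives through the mix-in, which is the part single inheritance plus interfaces can’t give you.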
In PHP 4 & 5 this is impossible but we may get something new in PHP6 with the namespace/module feature ideas that have been floating around for the last two years. And why not PHP/Zend? The Zend Framework is clearly an attempt to jump on the MVC boat that Ruby on Rails made (more) popular…so let’s just go all the way and copy a feature that will actually enable PHP to evolve. Sorry…side tracked.
If you try to implement the above in PHP you get the following:
Class Person Extends ActiveRecord_Base {
public function __construct() {
$this->acts_as_list();
$this->has_many('hobbies');
}
}
You might be saying to yourself “big woop, what’s the difference.” The woop is the fact that the class isn’t able to access acts_as_list or has_many unless you define them some place in the inheritance chain. If that’s the case you’ve got to add a lot of conditional garbage to your instances in order to come up with functionality close to this.
A better example becomes really clear when you toss in plugins.
class Hobby < ActiveRecord::Base
limit_by_scope :person
end
Person.current = Person.find_by_name('Flinn')
Hobby.count #=> 0
Hobby.limit #=> 3
Hobby.available #=> 3
%w[kayaking hiking frobnicating].each do |hobby|
Hobby.find_or_create_by_name(hobby)
end
Hobby.count #=> 3
Hobby.limit #=> 3
Hobby.available #=> 0
In the above code we’re setting a class attribute (current) to set the scope for all other classes and objects. Then we’re asking the Hobby class to tell us how many hobbies the current person has, how many it can have and how many we are free to add.
Where do limit and available come from?
They come from the class method limit_by_scope added by the plugin of the same name. The method adds a few class methods and class attributes to enable this functionality. It IS IMPOSSIBLE to do this in PHP.
Reflection, Reflection, Reflection.
In PHP you have no ability to reflect on a class the way you would in Ruby. This becomes very clear for anyone who has ever tried to implement something like this in PHP: Person::find(1)
For the record the above IS IMPOSSIBLE to do the right way™ but let’s explore it for fun.
Class Base {
static function me() {
return get_class();
}
}
Class Foo Extends Base { }
echo Foo::me(); # returns "Base"
What does that mean?
It means you can never ever ever reflect on the data using static methods.
Ok, so what.
Why not just instantiate the object like so?
Class Base {
function me() {
return get_class($this);
}
}
Class Foo Extends Base { }
$foo = new Foo();
echo $foo->me(); # returns Foo
Well that works in this tiny little demo – but for ActiveRecord it sucks. I can hear you now, “but it works for Cake”. Ok so it does – but it’s wrong. It’s wrong because you are relying on the instantiated object to find a like object. It just doesn’t make any sense. The finder object has all the bits to make it a proper record but it’s also got all the bits to make it a finder object. So let’s say for example:
$person = new Person;
$people = $person->find_all();
$person->name = "Flinn";
The purpose of the person class and object is not so clear. Is it a record or knower of the record?
Last but not least, according to Martin Fowler’s definition of ActiveRecord in Patterns of Enterprise Application Architecture, finder code should be static, i.e. Person::find(1).
So in PHP you CAN do ActiveRecord if you manually define finders but that’s not nearly as elegant as in Ruby and it certainly requires a lot more arm twisting programming.
Symbols & Flexible Arguments
This one is simple, in Ruby you want to specify a symbol :person rather than ‘person’ because it’s fast and painless.
ActiveRecord makes some pretty heavy usage of dynamic argument passing using hashes, eg.
flinn = Person.create(:name => 'Flinn', :nerdy => true)
gina = Person.create(:name => 'Gina', :hotty => true)
nerds = Person.find(:all, :conditions => { :nerdy => true })
This isn’t exactly part of the ActiveRecord pattern, just a nicety that Rails adds. This is a Ruby hack but it’s an awesome one.
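The trick itself is tiny: a trailing hash argument with a default. Here’s a toy sketch of it. The find_people method and the PEOPLE array are made up for illustration and have nothing to do with how ActiveRecord actually implements find.

```ruby
# Toy data standing in for a table
PEOPLE = [
  { :name => 'Flinn', :nerdy => true },
  { :name => 'Gina',  :nerdy => false }
]

# Hypothetical finder: a symbol for the scope, then an optional hash of options
def find_people(scope, options = {})
  conditions = options[:conditions] || {}
  matches = PEOPLE.select do |person|
    conditions.all? { |key, value| person[key] == value }
  end
  scope == :first ? matches.first : matches
end

p find_people(:all, :conditions => { :nerdy => true })
p find_people(:first)
```

Because the braces around the last hash are optional at the call site, the options read like named arguments.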
So what’s the solution?
I don’t know!
The Row Data Gateway seems to be doable in PHP. The RDG pattern would seem to enable us to add a static finder object to the class that would enable us to handle the meta class programming issues a little easier using a singleton factory implementation.
Here’s a crude example:
Class RDGMeta {
static public $instances = array();
public $sender;
//an example of the kind of meta info we'll store in the class
public $table_name;
public $attributes;
public function __construct($sender) {
$this->sender = $sender;
}
public function find() {
...
Class RDG {
static public $instances = array();
static public function meta($sender) {
if (!array_key_exists($sender, self::$instances)) {
self::$instances[$sender] = new RDGMeta($sender);
}
return self::$instances[$sender];
}
public function save() {
...
Class Person extends RDG {
static public function meta() {
parent::meta(get_class())->has_many('hobbies');
return parent::meta(get_class());
}
}
Class Hobby extends RDG {
static public function meta() { return parent::meta(get_class()); }
}
$people = Person::meta()->find_all();
$flinn = Person::meta()->find_by_name('flinn');
$flinn->update_attributes(array('is_sexy' => true));
$nerds = Person::meta()->update(array('is_sexy' => false), array('is_nerdy' => true)); # that would set is_sexy to false for every person who is a nerd.
You can see here the clear distinction between the meta object and the record object. Fowler explains the similarities between the ActiveRecord and Row Data Gateway patterns by saying there really isn’t a huge difference just that if you have domain logic in the class, use ActiveRecord.
My interpretation is that in this context the finder object acts like a Table Data Gateway in many ways giving us access to finder methods and table manipulation methods but returning row data gateway objects.
But that doesn’t relieve us of the lack of mix-ins which I think is essential for a proper plugin architecture.
More to come at a later date on the whole subject. It’s more complicated than I want to go into right now.
