kirkbrown.com

My Web Blarg

Howdy

without comments

I’m in the process of making this less like a silly self-marketing site. That’s soooo 2003. As soon as I have the time, I’ll be posting some things that you will probably find to be of little to no interest!

Written by kirk

May 4th, 2009 at 12:48 pm

Posted in Uncategorized

Beginnings of a MySpace Music Scraper

without comments

A while back I had occasion to work with some folks that had a handful of PHP scripts for scraping some basic information off of MySpace Music. Scraping data off other sites is a bit of a grey area, I suppose, but in this case it was being used to create links to MySpace Band players, or pull copies of a band's promo photograph. Hardly a controversial usage, yet MySpace for whatever reason didn't have a useful API.

These were simple little one-off scripts meant to gather data on bands. I was never really a fan of them, but always too busy to replace them with anything pretty.

Then Hpricot came into my life and I finally had a reason to do it. Hpricot is a sweet HTML parser that uses a syntax similar to jQuery - really handy for treating a page like a resource. Here is a simple class I created to parse and pull some specific data from a MySpace band page:

require 'rubygems'
require 'hpricot'
require 'open-uri'

class MySpace
  attr_accessor :friend_id, :hmodel

  def initialize(friend_id)
    self.friend_id = friend_id
    self.load
  end

  def load
    @hmodel = Hpricot(open("http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=#{@friend_id}"))
  end

  def flashvars
    for flashvars in (@hmodel/"//param[@name = 'flashvars']")
      if flashvars.attributes["value"] && flashvars.attributes["value"].match("^uid")
        return flashvars.attributes["value"]
      end
    end
    return nil
  end

  def plid
    if flashvars = self.flashvars
      matches = flashvars.match('plid=(\d+)&')
      if matches && matches[1]
        return matches[1]
      end
    end
    return nil
  end

  def artid
    if flashvars = self.flashvars
      matches = flashvars.match('artid=(\d+)&')
      if matches && matches[1]
        return matches[1]
      end
    end
    return nil
  end

  def image
    if image_link = @hmodel.search("a#ctl00_cpMain_ctl01_UserBasicInformation1_hlDefaultImage")
      if image_field = image_link.at("img")
        if image = image_field.attributes["src"]
          return image unless (image == "http://x.myspacecdn.com/images/no_pic.gif")
        end
      end
    end
    return nil
  end

end

The two main things I was trying to get at were the plid and the artid which you can use to create a nice MySpace Player popup. I added another function to pull the image, but that’s as far as I’ve taken it. At some point, it would be simple to add hometown, genre, band name and probably more.
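To make the extraction step easier to see in isolation, here is a minimal sketch that pulls both IDs out of a flashvars string without touching the network. The sample string and the `flashvar_id` helper are made up for illustration; a real band page may format its flashvars differently.

```ruby
# Hypothetical flashvars value; the real string on a band page may differ.
SAMPLE_FLASHVARS = "uid=12345&plid=678&artid=9012&skin=default"

# Pull a numeric field out of a flashvars-style query string.
# Returns the digits as a String, or nil when the key is missing.
# (flashvar_id is an illustrative helper, not part of the class above.)
def flashvar_id(flashvars, key)
  return nil unless flashvars
  match = flashvars.match(/(?:\A|&)#{Regexp.escape(key)}=(\d+)(?:&|\z)/)
  match && match[1]
end

puts flashvar_id(SAMPLE_FLASHVARS, "plid")
puts flashvar_id(SAMPLE_FLASHVARS, "artid")
```

Unlike the `plid` and `artid` methods in the class, this version also matches a key that happens to be the last one in the string, since it doesn't insist on a trailing `&`.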

Usage looks something like this (from a Rails app, in the Band model):

def update_myspace_data
  # Nothing to scrape without a MySpace ID.
  return unless self.myspace_id

  # Pass the scraper the band's MySpace ID:
  scrape = MySpace.new(self.myspace_id)

  # Only overwrite stored values when the scrape actually found something.
  plid  = scrape.plid
  artid = scrape.artid
  self.myspace_plid  = plid if plid
  self.myspace_artid = artid if artid
end

This makes it a lot easier to update a band with a quick scrape of their page, and if (or probably when) MySpace changes their HTML, it should be fairly simple to locate and modify the corresponding functions in this class instead of hunting down really long regular expressions.

Written by kirk

May 4th, 2009 at 4:26 pm

Posted in Ruby


Simple PHP Template System

without comments

I found an article from a few years back written by a guy named Brian Lozier that changed my mind on using a templating system in PHP. I’ve gone through quite a few template systems and had settled on Smarty. In fact, I’d been using Smarty for all my projects – it’s a great program.

Then I read this article on Template Engines and realized that Brian was absolutely right. I could understand (sort of) why you might need a complex meta-language like Smarty if you’re trying to literally block your template designers from using PHP. Even then, is it really worth the extra layer? When you stop and think about it on a practical level, it’s adding an extra step that could easily be removed. Cut out the middle man.

I was so enamoured of this idea that I took the core of his template class and began converting my base code from Smarty to this new ‘Simple PHP Template System’. This is basically the same code Brian was using, with a few tweaks and modifications for my own twisted purposes. I like the caching system, although I cut it out of this first version just to make things as simple as possible.

Here’s how to get it working. First grab the template class and save it in a directory as lib.templater.php:

<?php
/**
 * Simple PHP Template System
 * This was built on and inspired by an article from Brian Lozier:
 * http://www.massassi.com/php/articles/template_engines/
 *
 * @author Kirk Brown <kirk@kirkbrown.com>
 */

class templater {

  /**
   *  Template directory
   *  @var tplDir
   */
  var $tplDir;

  /**
   *  Template variables
   *  @var vars
   */
  var $vars;  

  /**
   * Constructor function for the templater
   *
   * @param string $tplDir template directory
   */
  function __construct($tplDir='') {
    $this->tplDir = $tplDir;
  }

  /**
   * Sets a variable for later template parsing
   *
   * @param string $name name of the variable
   * @param string $value value to replace it with
   */
  function set($name, $value='') {
    if (is_array($name)) {
      foreach ($name as $key => $value) {
        $this->vars[$key] = $value;
      }
    } else {
      $this->vars[$name] = $value;
    }
  }

  /**
   * Parses a php file with all current $vars
   *
   * @param string $file file to parse
   */
  function parse($file) {
    if ($this->vars) {
      extract($this->vars);
    }
    ob_start();
    if (is_file($this->tplDir . $file)) {
      include($this->tplDir . $file);
    } else {
      // Fall back to a global template path; PATH_TPLS must be defined elsewhere
      include(PATH_TPLS . $file);
    }
    $parsed = ob_get_contents();
    ob_end_clean();
    return $parsed;
  }
}
?>

Next pick up the sample template, be amazed how simple it is, and save it in the same directory as sampleTemplate.php:

<h1><?= $page ?></h1>
<table class="dTbl">
  <tr>
  <?php foreach ($headers as $header) { ?>
    <th><?= $header ?></th>
  <?php } ?>
  </tr>
  <?php foreach ($rows as $row) { ?>
  <tr>
    <?php foreach ($row as $cell) { ?>
    <td><?= $cell ?></td>
    <?php } ?>
  </tr>
  <?php } ?>
</table>

Finally, get the page that glues it all together so you can see it in action and save that in the same directory as viewTemplate.php:

<?php
include('lib.templater.php');
$tpl = new templater();
$tpl->set('page', 'This is my Page');
$tpl->set('headers', array('One', 'Two', 'Three', 'Four'));
$tpl->set('rows', array(
  array('abc', 'def', 'ghi', 'jkl'),
  array('mno', 'pqr', 'stu', 'vwx')
  ));
$html = $tpl->parse('sampleTemplate.php');
print $html;
?>

That’s it. If they’re all in a directory together, you should be able to see it in action by running viewTemplate.php. Let me know what you think; I’d love to improve on this without ever making it too bloated.

If you don’t want to cut and paste, you can download the files in a zip.

Written by kirk

August 12th, 2006 at 8:58 pm

Posted in PHP
