kirkbrown.com

My Web Blarg

Beginnings of a MySpace Music Scraper

A while back I had occasion to work with some folks who had a handful of PHP scripts for scraping some basic information off of MySpace Music. Scraping data off other sites is a bit of a grey area, I suppose, but in this case it was being used to create links to MySpace band players, or pull copies of a band’s promo photograph. Hardly a controversial usage, yet MySpace for whatever reason didn’t have a useful API.

These were simple little one-off scripts meant to gather data on bands. I was never really a fan of them, but I was always too busy to replace them with anything pretty.

Then hpricot came into my life and I finally had a reason to do it. Hpricot is a sweet HTML parser that uses a syntax similar to jQuery – really handy for treating a page like a resource. Here is a simple class I created to parse and pull some specific data from a MySpace band page:

require 'rubygems'
require 'hpricot'
require 'open-uri'

class MySpace
  attr_accessor :friend_id, :hmodel

  def initialize(friend_id)
    self.friend_id = friend_id
    self.load
  end

  def load
    @hmodel = Hpricot(open("http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=#{@friend_id}"))
  end

  def flashvars
    for flashvars in (@hmodel/"//param[@name = 'flashvars']")
      if flashvars.attributes["value"] && flashvars.attributes["value"].match("^uid")
        return flashvars.attributes["value"]
      end
    end
    return nil
  end

  def plid
    if flashvars = self.flashvars
      matches = flashvars.match('plid=(\d+)&')
      if matches && matches[1]
        return matches[1]
      end
    end
    return nil
  end

  def artid
    if flashvars = self.flashvars
      matches = flashvars.match('artid=(\d+)&')
      if matches && matches[1]
        return matches[1]
      end
    end
    return nil
  end

  def image
    if image_link = @hmodel.search("a#ctl00_cpMain_ctl01_UserBasicInformation1_hlDefaultImage")
      if image_field = image_link.at("img")
        if image = image_field.attributes["src"]
          return image unless (image == "http://x.myspacecdn.com/images/no_pic.gif")
        end
      end
    end
    return nil
  end

end

The two main things I was trying to get at were the plid and the artid which you can use to create a nice MySpace Player popup. I added another function to pull the image, but that’s as far as I’ve taken it. At some point, it would be simple to add hometown, genre, band name and probably more.
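Since plid and artid are pulled out of the flashvars string in exactly the same way, the two methods could probably share one helper. Here is a minimal sketch of that idea. The `flashvar` helper name and the sample string are made up for illustration and are not taken from a real MySpace page:

```ruby
# Pull a single numeric value out of a flashvars-style string.
# Hypothetical helper: the name and the sample data below are assumptions.
def flashvar(flashvars, name)
  return nil unless flashvars
  # Match "name=123" at the start of the string or right after an "&".
  match = flashvars.match(/(?:\A|&)#{Regexp.escape(name)}=(\d+)/)
  match && match[1]
end

sample = "uid=12345&plid=678&artid=91011&mode=popup" # made-up example string

puts flashvar(sample, "plid")            # prints 678
puts flashvar(sample, "artid")           # prints 91011
puts flashvar(sample, "missing").inspect # prints nil
```

Unlike the methods in the class, this version does not insist on a trailing & after the number, so it would still match if the value happened to come last in the string.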

Usage looks something like this (from a Rails app, in the Band model):

def update_myspace_data
  # pass the scraper the band's myspace ID:
  scrape = MySpace.new(self.myspace_id) if self.myspace_id
  if scrape
    # read from the scraper instance created above
    self.myspace_plid  = scrape.plid if scrape.plid
    self.myspace_artid = scrape.artid if scrape.artid
  end
end

This makes it a lot easier to update a band with a quick scrape of their page, and if (or probably when) MySpace changes their HTML, it should be fairly simple to locate and modify the corresponding functions in this class instead of hunting down really long regular expressions.
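Because the scraper returns nil for anything it can’t find, a markup change should leave the stored values alone rather than wipe them. The rough sketch below shows that behaviour without Rails or the network. `FakeScrape`, the plain `Band` class and passing the scraper in as an argument are all assumptions made for the example, not the app’s actual code:

```ruby
# Stand-in for the scraper so the example runs without touching the network.
# FakeScrape and its values are hypothetical.
FakeScrape = Struct.new(:plid, :artid)

# Plain Ruby stand-in for the Rails model (an assumption for this sketch).
class Band
  attr_accessor :myspace_id, :myspace_plid, :myspace_artid

  # Copy over only the values the scraper actually found.
  def update_myspace_data(scrape)
    return unless scrape
    self.myspace_plid  = scrape.plid  if scrape.plid
    self.myspace_artid = scrape.artid if scrape.artid
  end
end

band = Band.new
band.myspace_plid = "111"

# Simulate a scrape where plid could no longer be found.
band.update_myspace_data(FakeScrape.new(nil, "222"))

puts band.myspace_plid  # prints 111, the old value is kept
puts band.myspace_artid # prints 222
```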

Written by kirk

May 4th, 2009 at 4:26 pm

Posted in Ruby

Simple PHP Template System

I found an article from a few years back written by a guy named Brian Lozier that changed my mind on using a templating system in PHP. I’ve gone through quite a few template systems and had settled on Smarty. In fact, I’d been using Smarty for all my projects – it’s a great program.

Then I read this article on Template Engines and realized that Brian was absolutely right. I could understand (sort of) why you might need a complex meta-language like Smarty if you’re trying to literally block your template designers from using PHP. Even then, is it really worth the extra layer? When you stop and think about it on a practical level, it’s adding an extra step that could easily be removed. Cut out the middle man.

I was so enamoured of this idea that I took the core of his template class and began converting my base code from Smarty to this new ‘Simple PHP Template System’. This is basically the same code Brian was using, with a few tweaks and modifications for my own twisted purposes. I like the caching system, although I cut it out of this first version just to make things as simple as possible.

Here’s how to get it working. First grab the template class and save it in a directory as lib.templater.php:

<?php
/**
 * Simple PHP Template System
 * This was built on and inspired by an article from Brian Lozier:
 * http://www.massassi.com/php/articles/template_engines/
 *
 * @author Kirk Brown <kirk@kirkbrown.com>
 */

class templater {

  /**
   *  Template directory
   *  @var tplDir
   */
  var $tplDir;

  /**
   *  Template variables
   *  @var vars
   */
  var $vars;  

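  /**
   * Constructor function for the templater
   *
   * @param string $tplDir template directory
   */
  function __construct($tplDir='') {
    // __construct() is used because PHP 8 no longer treats a method
    // named after the class as the constructor
    $this->tplDir = $tplDir;
  }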

  /**
   * Sets a variable for later template parsing
   *
   * @param string $name name of the variable
   * @param string $value value to replace it with
   */
  function set($name, $value='') {
    if (is_array($name)) {
      foreach ($name as $key => $value) {
        $this->vars[$key] = $value;
      }
    } else {
      $this->vars[$name] = $value;
    }
  }

  /**
   * Parses a php file with all current $vars
   *
   * @param string $file file to parse
   */
  function parse($file) {
    if ($this->vars) {
      extract($this->vars);
    }
    ob_start();
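    if (is_file($this->tplDir . $file)) {
      // Use the template from the configured directory when it exists
      include($this->tplDir . $file);
    } else {
      // Otherwise fall back to the PATH_TPLS constant, which is not
      // defined in this class and must be set by the calling code
      include(PATH_TPLS . $file);
    }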
    $parsed = ob_get_contents();
    ob_end_clean();
    return $parsed;
  }
}
?>

Next pick up the sample template, be amazed how simple it is, and save it in the same directory as sampleTemplate.php:

<h1><?= $page ?></h1>
<table class="dTbl">
  <tr>
  <?php foreach ($headers as $header) { ?>
    <th><?= $header ?></th>
  <?php } ?>
  </tr>
  <?php foreach ($rows as $row) { ?>
  <tr>
    <?php foreach ($row as $cell) { ?>
    <td><?= $cell ?></td>
    <?php } ?>
  </tr>
  <?php } ?>
</table>

Finally, get the page that glues it all together so you can see it in action and save that in the same directory as viewTemplate.php:

<?php
include('lib.templater.php');
$tpl = new templater();
$tpl->set('page', 'This is my Page');
$tpl->set('headers', array('One', 'Two', 'Three', 'Four'));
$tpl->set('rows', array(
  array('abc', 'def', 'ghi', 'jkl'),
  array('mno', 'pqr', 'stu', 'vwx')
  ));
$html = $tpl->parse('sampleTemplate.php');
print $html;
?>

That’s it. If they’re all in a directory together, you should be able to see it in action by running viewTemplate.php. Let me know what you think. I’d love to improve on this without ever making it too bloated.

If you don’t want to cut and paste, you can download the files in a zip.

Written by kirk

August 12th, 2006 at 8:58 pm

Posted in PHP

Random Image Rotation

Here’s a quick routine I used a while back to pull a set of images from a directory and show a randomly chosen one on each page load. I thought it was an interesting way to have a changing front page marquee without the annoyance of Flash or JavaScript special effects:

<?php

$imageDirectory = '/path/to/images/';
$imageURL = '/url/to/images/';
$randomImage = $imageURL . randomImage($imageDirectory);

function randomImage($dir) {
  $files = array();
  if (is_dir($dir)) {
    if ($dh = opendir($dir)) {
      while (false !== ($file = readdir($dh))) {
        if ($file != "." && $file != "..") {
          $files[] = $file;
        }
      }
      closedir($dh);
    }
  }
  $totalImages = count($files);
  $maxImages = $totalImages - 1;
  $randomImage = rand(0, $maxImages);
  return $files[$randomImage];
}
?>

<img src="<?= $randomImage ?>" />

I start by creating a directory for storing the images. Since we’re going to be reading the directory contents, there’s no need to worry about any particular naming order or convention other than the ones required by your operating system. If you want to add more images to the rotation later, just place them in this base directory and they will be included immediately. Eliminating hard-coded arrays or lists of images makes it much easier to control this script in the future.

The first thing I do in the script is set up the server path and the URL to my image directory. The path is the physical location on the disk, while the URL is the address you’d type into your browser to view them. These will vary depending on your particular server setup.

$imageDirectory = '/path/to/images/';
$imageURL = '/url/to/images/';

Next I call the random image function with the server path, and store the URL for a random image in the $randomImage variable.

$randomImage = $imageURL . randomImage($imageDirectory);

You can see at the very bottom of the program that I use this variable to output the URL in the src attribute of an image tag. Inside the function, I create an empty array for the results, test to see if the $dir variable is an actual directory, and open it if it is.

$files = array();
if (is_dir($dir)) {
  if ($dh = opendir($dir))

$dh is short for ‘directory handle’, which is the pointer or ‘handle’ to the actual disk resource. The next step is to start up a while loop to read in the contents of the directory.

while (false !== ($file = readdir($dh)))

This will kill the loop when there are no more files to be read and the readdir function gives a false return. There are shorter ways to write this particular statement, but the PHP site recommends this specific usage as the correct one. In this case, the loop will not exit unless the return value of the readdir function is identical to false (meaning equal to and of the same type). A looser test would also stop early on a file named 0, because PHP treats the string "0" as false. For more information, take a look at the not identical operator on the PHP web site.

Next comes one last bit of housecleaning as I strike out any entries that are equal to ’.’ or ’..’. If it’s passed all these tests, it must be one of our images, so I push the filename into the $files array.

if ($file != "." && $file != "..") {
  $files[] = $file;

Finally, there’s a tiny bit of basic math to choose the random image. $totalImages gets the total number of images (or files) in the specified directory by using the count() function. Since the array starts at 0, rather than 1, I need to set an upper limit that is one less than the actual total in the $maxImages variable.

We let the rand() function give us a random number between 0 and $maxImages, and return the random array element that corresponds.

$totalImages = count($files);
$maxImages = $totalImages - 1;
$randomImage = rand(0, $maxImages);
return $files[$randomImage];

And that’s that.

Written by kirk

June 14th, 2006 at 9:22 pm

Posted in PHP
