Howdy
I’m in the process of making this less like a silly self-marketing site. That’s soooo 2003. As soon as I have the time, I’ll be posting some things that you will probably find to be of little to no interest!
Beginnings of a MySpace Music Scraper
A while back I had occasion to work with some folks who had a handful of PHP scripts for scraping some basic information off of MySpace Music. Scraping data off other sites is a bit of a grey area, I suppose, but in this case it was being used to create links to MySpace Band players, or pull copies of a band’s promo photograph. Hardly a controversial usage, yet MySpace for whatever reason didn’t have a useful API.
These were simple little one-off scripts meant to gather data on bands. I was never really a fan of them, but always too busy to replace them with anything pretty.
Then hpricot came into my life and I finally had a reason to do it. Hpricot is a sweet HTML parser that uses a syntax similar to jQuery - really handy for treating a page like a resource. Here is a simple class I created to parse and pull some specific data from a MySpace band page:
require 'rubygems'
require 'hpricot'
require 'open-uri'

class MySpace
  attr_accessor :friend_id, :hmodel

  def initialize(friend_id)
    self.friend_id = friend_id
    self.load
  end

  # Fetch the profile page and parse it into an Hpricot document.
  def load
    @hmodel = Hpricot(open("http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=#{@friend_id}"))
  end

  # Value of the first flashvars param that starts with "uid", or nil.
  def flashvars
    for param in (@hmodel/"//param[@name = 'flashvars']")
      if param.attributes["value"] && param.attributes["value"].match("^uid")
        return param.attributes["value"]
      end
    end
    return nil
  end

  # Player ID pulled out of the flashvars string, or nil.
  def plid
    if flashvars = self.flashvars
      matches = flashvars.match('plid=(\d+)&')
      if matches && matches[1]
        return matches[1]
      end
    end
    return nil
  end

  # Artist ID pulled out of the flashvars string, or nil.
  def artid
    if flashvars = self.flashvars
      matches = flashvars.match('artid=(\d+)&')
      if matches && matches[1]
        return matches[1]
      end
    end
    return nil
  end

  # URL of the profile image, or nil if it is missing or the default placeholder.
  def image
    if image_link = @hmodel.search("a#ctl00_cpMain_ctl01_UserBasicInformation1_hlDefaultImage")
      if image_field = image_link.at("img")
        if image = image_field.attributes["src"]
          return image unless (image == "http://x.myspacecdn.com/images/no_pic.gif")
        end
      end
    end
    return nil
  end
end
The two main things I was trying to get at were the plid and the artid, which you can use to create a nice MySpace Player popup. I added another method to pull the image, but that’s as far as I’ve taken it. At some point, it would be simple to add hometown, genre, band name and probably more.
Usage looks something like this (from a Rails app, in the Band model):
def update_myspace_data
  # pass the scraper the band's myspace ID:
  scrape = MySpace.new(self.myspace_id) if self.myspace_id
  if scrape
    self.myspace_plid = scrape.plid if scrape.plid
    self.myspace_artid = scrape.artid if scrape.artid
  end
end
This makes it a lot easier to update a band with a quick scrape of their page, and if (or probably when) MySpace changes their HTML, it should be fairly simple to locate and modify the corresponding methods in this class instead of hunting down really long regular expressions.
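The plid and artid methods differ only in the key they look for, so one possible tidy-up is to parse the flashvars string once and look values up by name. This is only a rough sketch, not part of the class above: parse_flashvars is a hypothetical helper, the sample string is made up, and it assumes the flashvars value is a plain key=value&key=value string.

```ruby
require 'uri'

# Hypothetical helper (not part of the MySpace class above): turn a
# flashvars string into a hash so values can be looked up by name.
# Assumes the value looks like "uid=...&plid=...&artid=...".
def parse_flashvars(flashvars)
  return {} if flashvars.nil? || flashvars.empty?

  # URI.decode_www_form returns [["key", "value"], ...]
  URI.decode_www_form(flashvars).to_h
end

# Made-up sample value, shaped like the strings the scraper matches on.
sample = "uid=12345&plid=67890&artid=24680"
vars = parse_flashvars(sample)

puts vars["plid"]   # prints 67890
puts vars["artid"]  # prints 24680
```

Unlike the regular expressions in the class, this doesn't care whether a value is followed by another & or sits at the very end of the string.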
Simple PHP Template System
I found an article from a few years back written by a guy named Brian Lozier that changed my mind on using a templating system in PHP. I’ve gone through quite a few template systems and had settled on Smarty. In fact, I’d been using Smarty for all my projects – it’s a great program.
Then I read this article on Template Engines and realized that Brian was absolutely right. I could understand (sort of) why you might need a complex meta-language like Smarty if you’re trying to literally block your template designers from using PHP. Even then, is it really worth the extra layer? When you stop and think about it on a practical level, it’s adding an extra step that could easily be removed. Cut out the middle man.
I was so enamoured of this idea that I took the core of his template class and began converting my base code from Smarty to this new ‘Simple PHP Template System’. This is basically the same code Brian was using, with a few tweaks and modifications for my own twisted purposes. I like the caching system, although I cut it out of this first version just to make things as incredibly simple as possible.
Here’s how to get it working. First grab the template class and save it in a directory as lib.templater.php:
<?php
/**
 * Simple PHP Template System
 * This was built on and inspired by an article from Brian Lozier:
 * http://www.massassi.com/php/articles/template_engines/
 *
 * @author Kirk Brown <kirk@kirkbrown.com>
 */
class templater {
    /**
     * Template directory
     * @var string
     */
    var $tplDir;
    /**
     * Template variables
     * @var array
     */
    var $vars = array();
    /**
     * Constructor function for the templater.
     * Named __construct because PHP 4-style constructors (a method named
     * after the class) are no longer treated as constructors in PHP 8.
     *
     * @param string $tplDir template directory
     */
    function __construct($tplDir='') {
        $this->tplDir = $tplDir;
    }
    /**
     * Sets a variable for later template parsing
     *
     * @param string|array $name name of the variable, or an array of name => value pairs
     * @param mixed $value value to replace it with
     */
    function set($name, $value='') {
        if (is_array($name)) {
            foreach ($name as $key => $value) {
                $this->vars[$key] = $value;
            }
        } else {
            $this->vars[$name] = $value;
        }
    }
    /**
     * Parses a php file with all current $vars
     *
     * @param string $file file to parse
     * @return string the rendered output
     */
    function parse($file) {
        if ($this->vars) {
            extract($this->vars);
        }
        ob_start();
        if (is_file($this->tplDir . $file)) {
            include($this->tplDir . $file);
        } else {
            // Fallback: PATH_TPLS is a constant that must be defined elsewhere.
            include(PATH_TPLS . $file);
        }
        $parsed = ob_get_contents();
        ob_end_clean();
        return $parsed;
    }
}
?>
Next pick up the sample template, be amazed how simple it is, and save it in the same directory as sampleTemplate.php:
<h1><?= $page ?></h1>
<table class="dTbl">
  <tr>
    <?php foreach ($headers as $header) { ?>
    <th><?= $header ?></th>
    <?php } ?>
  </tr>
  <?php foreach ($rows as $row) { ?>
  <tr>
    <?php foreach ($row as $cell) { ?>
    <td><?= $cell ?></td>
    <?php } ?>
  </tr>
  <?php } ?>
</table>
Finally, get the page that glues it all together so you can see it in action and save that in the same directory as viewTemplate.php:
<?php
include('lib.templater.php');
$tpl = new templater();
$tpl->set('page', 'This is my Page');
$tpl->set('headers', array('One', 'Two', 'Three', 'Four'));
$tpl->set('rows', array(
    array('abc', 'def', 'ghi', 'jkl'),
    array('mno', 'pqr', 'stu', 'vwx')
));
$html = $tpl->parse('sampleTemplate.php');
print $html;
?>
That’s it. If they’re all in a directory together, you should be able to see it in action by running viewTemplate.php. Let me know what you think. I’d love to improve on this without ever making it too bloated.
If you don’t want to cut and paste, you can download the files in a zip.