Archive for July 2009

Programming with XML in Firefox Add-ons

The eHow Earnings Tracker stores its data in XML format, so one of the things that I needed to learn how to do was program with XML in Firefox Add-ons.  Mozilla has a whole section of code snippets dedicated to XML but I will walk through exactly what I did in the development of the tracker and point out some gotchas that I figured out along the way.

The first step in this process was to figure out how to create an XML DOM tree in memory.  This turned out to be pretty straightforward to do thanks to Mozilla’s How to create a DOM tree documentation.  I wound up with the following code to create my basic XML DOM hierarchy:

var createEarningsDOM = function()
{
    m_trackerDOM = document.implementation.createDocument(null, null, null);

    var trackerNode = m_trackerDOM.createElement(m_xmlNodeTracker);
    trackerNode.setAttribute(m_xmlAttrVersion, m_xmlVersion);

    m_articleListElem = m_trackerDOM.createElement(m_xmlNodeArticleList);
    trackerNode.appendChild(m_articleListElem);

    m_statsElem = m_trackerDOM.createElement(m_xmlNodeStats);
    trackerNode.appendChild(m_statsElem);

    m_trackerDOM.appendChild(trackerNode);
};

You should notice a couple of things about the source code above.  First, I create variables to hold my node and attribute names.  This ensures consistency when I refer to particular nodes and attributes throughout my code.  It also means that if I decide to change the string for a node or attribute name, I only need to do it in one spot.  The second thing is that I create variables to hold references to DOM nodes so that I can easily manipulate them later in the code.
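For readers following along, the name constants and node references mentioned above might be declared something like this. The actual strings used by the tracker are not shown in this post, so every value below is a made-up placeholder.

```javascript
// Hypothetical values: the real strings used by the tracker are not shown in this post.
var m_xmlNodeTracker = "tracker";          // root element name (assumed)
var m_xmlNodeArticleList = "articleList";  // element that holds the articles (assumed)
var m_xmlNodeStats = "stats";              // element that holds the statistics (assumed)
var m_xmlAttrVersion = "version";          // attribute name on the root element (assumed)
var m_xmlVersion = "1.0";                  // attribute value (assumed)

// References to DOM nodes, filled in later by createEarningsDOM()
var m_trackerDOM = null;
var m_articleListElem = null;
var m_statsElem = null;
```

Because each name lives in one variable, renaming a node or attribute is a one-line change.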

Now that I was able to create an XML DOM tree in memory, it was time to figure out how to write it out to a file.  Mozilla has some decent documentation on parsing and serializing XML.  I combined the information there with what I learned about file I/O and wrote the following code to serialize my XML DOM to a file:

var s = new XMLSerializer();
XML.prettyIndent = 4;
var xmlString = XML(s.serializeToString(m_trackerDOM)).toXMLString();

if(!FileIO.write(m_earningsFile, xmlString))
{
    throw Error("Failed to save file " + m_earningsFile.path);
}

The “pretty” serialization for XML strings is really cool because it formats the XML so that it’s very readable.  It makes life a lot easier when you need to look at the contents of the XML file.

Now that I was able to write out XML to a file, I needed a way to read it back in again later.  I used the DOMParser.  The one thing to watch out for with the DOMParser is that when the parsing fails, it doesn’t throw an exception.  Instead, it returns an XML document that contains the parsing error.

This is the code I came up with for loading the XML DOM from a file:

var fileContents = FileIO.read(currentFile);

if(fileContents)
{
    logMessage("\tParsing file");
    var parser = new DOMParser();
    m_trackerDOM = parser.parseFromString(fileContents, "text/xml");

    logMessage("\tLooking for " + m_xmlNodeArticleList);
    m_articleListElem = m_trackerDOM.getElementsByTagName(m_xmlNodeArticleList)[0];

    if(!m_articleListElem)
    {
        // XML parsing failed, grab error
        var s = new XMLSerializer();
        XML.prettyIndent = 4;
        logMessage(XML(s.serializeToString(m_trackerDOM)).toXMLString());                    
        throw Error("Failed to parse XML data file");
    }
}

In this piece of code, I read in the file contents and parse it, attempting to find the DOM element that I’m looking for.  If I don’t find the element then something has gone wrong – either the parsing failed, or the XML is not what I’m expecting it to be.  In either case, I log the results of the parseFromString() function to my log file so that I know what went wrong.
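A parse failure can also be checked for directly. The sketch below is not from the tracker itself; it assumes Firefox's behavior of returning a document whose root element is named parsererror in a Mozilla-specific namespace. The helper name getParseError is made up for illustration.

```javascript
// Namespace Firefox uses for the error document returned by DOMParser
var PARSER_ERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml";

// Hypothetical helper: returns the error text if doc is a parser error
// document, otherwise null.  doc is expected to be the result of
// DOMParser.parseFromString().
var getParseError = function(doc)
{
    var root = doc ? doc.documentElement : null;

    if(root && root.nodeName == "parsererror" && root.namespaceURI == PARSER_ERROR_NS)
    {
        return root.textContent;
    }

    return null;
};
```

Calling getParseError(m_trackerDOM) right after parseFromString() would separate "the parse failed" from "the XML parsed but the expected element is missing", which the code above treats as the same error.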

How to Host WPF Content in MFC Applications

This is something that I figured out a while back but wanted to write about it here since I spent a few hours piecing together the information.

There is an MSDN Walkthrough that gets you most of the way there, but there are a couple of key pieces that I found elsewhere. For example, the walkthrough tells you to place the line [System::STAThreadAttribute] before the _tWinMain() definition but if you’re implementing a standard MFC application then you don’t have _tWinMain() in your source code.

Step 1: Configure the MFC application to compile with CLR support

The best way to achieve interoperability between native C++ and managed .NET code is to compile the application as managed C++ rather than native C++. This is done by going to the Configuration Properties of the project. Under General there is an option “Common Language Runtime support”. Set this to “Common Language Runtime Support /clr”.

Step 2: Add the WPF assemblies to the project

Right-click on the project in the Solution Explorer and choose “References”. Click “Add New Reference”. Under the .NET tab, add WindowsBase, PresentationCore, PresentationFramework, and System. Make sure you Rebuild All after adding any references in order for them to get picked up.

Step 3: Set STAThreadAttribute on the MFC application

WPF requires that STAThreadAttribute be set on the main UI thread. Set this by going to Configuration Properties of the project. Under Linker->Advanced there is an option called “CLR Thread Attribute”. Set this to “STA threading attribute”.

Step 4: Create an instance of HwndSource to wrap the WPF component

System::Windows::Interop::HwndSource is a .NET class that handles the interaction between MFC and .NET components. Create one using the following syntax:

System::Windows::Interop::HwndSourceParameters^ sourceParams = gcnew System::Windows::Interop::HwndSourceParameters("MyWindowName");
sourceParams->PositionX = x;
sourceParams->PositionY = y;
sourceParams->ParentWindow = System::IntPtr(hWndParent);
sourceParams->WindowStyle = WS_VISIBLE | WS_CHILD;

System::Windows::Interop::HwndSource^ source = gcnew System::Windows::Interop::HwndSource(*sourceParams);
source->SizeToContent = System::Windows::SizeToContent::WidthAndHeight;

Add an HWND member variable to the dialog class and then assign it like this: m_hWnd = (HWND) source->Handle.ToPointer();  The source object and the associated WPF content will remain in existence until you call ::DestroyWindow(m_hWnd).

Step 5: Add the WPF control to the HwndSource wrapper


System::Windows::Controls::WebBrowser^ browser = gcnew System::Windows::Controls::WebBrowser();

browser->Height = height;
browser->Width = width;
source->RootVisual = browser;

Step 6: Keep a reference to the WPF object

Since the browser variable will go out of scope after we exit the function doing the creation, we need to somehow hold a reference to it. Managed objects cannot be members of unmanaged objects but you can use a wrapper template called gcroot to get the job done.

Add a member variable to the dialog class:


#include <vcclr.h>
gcroot<System::Windows::Controls::WebBrowser^> m_webBrowser;

Then add the following line to the code in Step 5:

m_webBrowser = browser;

Now you can access properties and methods on the WPF component through m_webBrowser.

File I/O with Firefox Add-ons

One of the major pieces of the eHow Earnings Tracker involves writing data to an XML file and reading it back later.  Figuring out how to do file I/O in a Firefox add-on was not straightforward due to Mozilla’s wide array of Chrome APIs and spotty documentation.

Unfortunately, file I/O is not one of Mozilla’s new FUEL APIs.  FUEL is a Javascript Library available to Firefox add-ons that is far easier to use than the older XPCOM API.  It was introduced in Firefox 3.0 and Mozilla has been slowly adding functionality to it.

Starting with Mozilla’s file I/O code snippets was quite confusing: the documentation meanders all over the place showing bits and pieces of code without explaining anything in much depth.  On top of that, they suggest that you use the io.js wrappers at the beginning of the page but then none of the examples shown use them.

Basically what it comes down to is that file I/O is done by using the XPCOM interfaces nsIFile and nsILocalFile.  The io.js wrappers are utility functions used to encapsulate the tedious syntax needed to use them.  I don’t fully understand the purpose of everything in io.js but I will show you what I did figure out and ultimately use in the tracker implementation.

The first thing that I needed to do was get a hold of the current Firefox profile directory since that’s where I wanted to store my data.  I chose the current profile directory because I wanted to be able to support multiple Firefox profiles using the tracker without overwriting each other’s data.

I wound up with the following function to do it:

var getProfileDir = function()
{
    var dir = DirIO.get('ProfD');
    if(!dir || !dir.exists())
    {
        throw Error("Failed to open profile directory");
    }

    return dir;
};

What this code does is create an nsIFile object that represents the profile directory.  ‘ProfD’ is a special string that refers to the profile directory.  You can see a list of the supported strings in the file I/O code snippets.  If you want to open a specific directory path, use DirIO.open() instead of get().

Once you have an nsIFile object, you can do a bunch of things with it.  The following code snippet will try to open a file in the profile directory and create it if it doesn’t already exist:

var openEarningsBackupFile = function(username)
{
    var currentFile = getProfileDir();
    currentFile.append(username + ".xml");

    if(!currentFile.exists())
    {
        if(!FileIO.create(currentFile))
        {
            throw Error("Failed to create earnings backup file");
        }
    }
    return currentFile;
};

The append function of the nsIFile object lets you navigate a file hierarchy one level at a time.  You could go down multiple levels like so:

currentFile.append("path1");
currentFile.append("path2");
currentFile.append("file.xml");

This would navigate to [currentFile.path]\path1\path2\file.xml.  The create() function will create any path segments that don’t already exist.

Keep in mind that the append function modifies the calling object so if you want to open up two different files in a particular directory then the easiest way to do it is to create an nsIFile object that points to the directory and then clone it:

var file1 = getProfileDir();
var file2 = file1.clone();
file1.append("file1.xml");
file2.append("file2.xml");

Once you have an nsIFile object pointed to an existing file you can read from and write to it.  The following copy function shows how to do reads and writes with nsIFile objects using the io.js wrappers:

// srcFile and destFile should be nsIFile objects
var copyFile = function(srcFile, destFile)
{
    var srcText = FileIO.read(srcFile);

    if(!srcText)
    {
        throw Error("Failed to read " + srcFile.path);
    }

    if(!FileIO.write(destFile, srcText))
    {
        throw Error("Failed to copy " + srcFile.path + " to " + destFile.path);
    }
};

Another not-so-obvious thing to figure out was how to get the path where the add-on is installed.  After quite a bit of searching, I came up with this bit of code:

var getExtensionDir = function()
{
    var dir = Components.classes["@mozilla.org/extensions/manager;1"].getService(Components.interfaces.nsIExtensionManager).getInstallLocation(m_extensionId).getItemLocation(m_extensionId);

    if(!dir)
    {
        throw Error("Failed to get extension installation directory");
    }

    return dir;
};

File I/O in Firefox add-ons is pretty easy to do once you see how all of the pieces fit together.


				

eHow Earnings Tracker Version 0.4.0 Available

Yet another version of the eHow Earnings Tracker has been uploaded to the Firefox Addons site. Head over to the downloads page to get it.

Changes since 0.3.0:

*==> Version 0.4.0 (July 13, 2009)

* Added “Totals” and “Change from previous day” rows for earnings and views

*==> Version 0.3.1 (July 13, 2009)

* Fixed problem with URL parsing when users had dashes(-) in their name.

eHow Earnings Tracker Version 0.3.0 Available

Another version of the eHow Earnings Tracker has been uploaded to the Firefox Addons site. Head over to the downloads page to get it.

Changes since Version 0.2.0:

*==> Version 0.3.0 (July 13, 2009)

* Added “Copy Earnings Data to Desktop” menu option
* Added Views to HTML report

*==> Version 0.2.3 (July 12, 2009)

* Additional logging
* Fixed issue with invalid XML characters getting stored as part of titles
* Added “Delete Earnings Data” menu option

*==> Version 0.2.2 (July 12, 2009)

* Fixed bug related to addition/removal of articles from the article library

*==> Version 0.2.1 (July 12, 2009)

* Added more error logging
* A backup earnings.xml file is now created on a successful update

eHow Earnings Tracker Version 0.2.0 Available

I just uploaded the latest version of the eHow Earnings Tracker to the Firefox Addons site. Head over to the downloads page to get it.

Changes since 0.1.1:

*==> Version 0.2.0 (July 12, 2009)

* eHow earnings report now opens in its own tab
* Added styling to $0.00 value fields so that they are italics with gray font
* Added eHow graphics & color scheme
* Added “View Last Update” which views the last HTML stats page rather than regenerating it from eHow article earnings pages
* Added error logging and menu item to copy log file to desktop

*==> Version 0.1.2 (July 9, 2009)

* Added CSS styling to HTML report

Once development work settles down I’ll write some more posts about the tracker and how to write Firefox Addons.

eHow Earnings Tracker : Initial Release (Version 0.1.1)

So, it’s done – the initial release of the eHow Earnings Tracker is available from the Mozilla Addons site.  Head on over to my downloads page for more details on obtaining the addon.  The tracker is currently classified as “experimental” which is just Mozilla’s way of saying that they haven’t reviewed it and placed their stamp of approval on it.  Apparently there are over 400 addons in the review queue so that won’t be happening any time soon :) .

I must say that overall, it was a very smooth process to get my addon hosted by Mozilla.  I just had to create an account, fill out a few fields, and upload the addon.  The hardest part was deciding which software license I wanted to release my code  under.  The legalese was overwhelming and I just picked the simplified BSD license.  I have no idea if it was the right choice or not.

The tracker is currently pretty barebones, but it is functional from the limited testing I did with it.  A few fellow eHow writers (Carol McKenzie, Dianne Cass, and knyc) were kind enough to volunteer their services in testing it out for a few days before I start broadcasting its availability to the eHow community.

What it Does

As I originally mentioned, eHow gives writers very limited earnings data.  Currently there is no way for writers to figure out which articles get income on a day to day basis without doing a whole lot of manual work.  The tracker collects data each time that it is run and keeps a history so that writers can see earnings increase over time on a per article basis.  Since earnings updates only occur at most once per day, the tracker keeps data on a per day basis.  This means that if the tracker is run multiple times during a single day, it only keeps the last set of data for that day.
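The "last run of the day wins" rule can be sketched by keying each snapshot on its calendar date. This is only an illustration: the tracker stores its history in XML, while the sketch uses a plain object, and the names toDayKey and recordSnapshot are made up.

```javascript
// Builds a key such as "2009-07-13" from a Date, using local time
var toDayKey = function(date)
{
    var pad = function(n) { return (n < 10 ? "0" : "") + n; };
    return date.getFullYear() + "-" + pad(date.getMonth() + 1) + "-" + pad(date.getDate());
};

// history maps day keys to the earnings snapshot recorded on that day.
// Recording a second snapshot on the same day overwrites the earlier one.
var recordSnapshot = function(history, date, earningsByArticle)
{
    history[toDayKey(date)] = earningsByArticle;
    return history;
};
```

Running an update in the morning and again in the evening leaves a single entry for that day, holding the evening's numbers.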

Using the tracker is very easy.  Upon installation, it adds submenus to both the main Tools menu and the context menu.  The user simply needs to navigate to one of their article earnings pages and select the menu option “Update Earnings”.  The tracker will then read all of the earnings pages and generate an HTML report.  The HTML report is currently very barebones and ugly but the plan is to fix that up.

[Image: eHow Earnings Tracker context menu]

How it Works

The tracker works by first examining the URL of the web page that’s currently in the browser.  eHow has very specific URL formats for the earnings pages so the tracker can figure out whether or not it can be run.

Once it is run, the tracker examines the HTML of the page and extracts data based on HTML elements and CSS class names.  It can also figure out how to navigate to the other earnings pages by using the “Next” anchor link at the bottom of the page and gathers information off each page as it loads.

[Image: HTML structure of the eHow article earnings pages]

The tracker stores its data in an XML file in the Firefox profiles directory.  If you’re curious, you can track it down by looking in the same directory as the URL that appears in the HTML report after the tracker is run.  The file is called eHowEarningsTracker_17of26_earnings_[ehow user name].xml.

Caveats

The tracker requires Firefox 3.5 because I needed to use some functionality that only became available in 3.5.

The tracker can only collect data when it is explicitly run.  Since it can only read what’s in the rendered HTML pages, it can’t know anything that the user doesn’t see.  This means that the tracker will never have data for days that it wasn’t run.

This tool is completely tied to eHow’s HTML code.  If eHow does anything to change the HTML layout of their earnings pages then the tracker will not be able to successfully read any more data until I update it to handle the new HTML layout.  However, all of the past collected data would still be there.

I have no idea how this thing is going to handle large amounts of data.  Some people have hundreds of articles and tracking earnings data over weeks and months is going to get big.

Upcoming Changes

There are really only two things I need to work on at this point.  The first one is to do some serious work on the HTML report.  It needs some nice styling and I really want to add some functionality to it (such as being able to sort by column).

The other thing that I want to do is to provide a way to export/backup the data in CSV or XML format.

eHow Earnings Tracker : A New Direction

One of the major challenges with doing software development is that you need to be constantly evaluating the direction that you’re heading in with both the code and the end product.  You can’t just decide to do something and then put your head down and do it.  The landscape changes as you develop a product – you learn better ways to do things, roadblocks come up, and your vision of the end product (and code) becomes more refined and  clear.  If you don’t pay attention to these things as you go along then inevitably you’ll wind up miles away from where you should be in the end.

On the other hand, you can’t be changing direction with the wind either.  This is a trap that many programmers fall into.  If you don’t have a solid starting vision for the product or you constantly change the requirements for the product or how it is implemented then you’ll wind up with a big mess or, even worse, the product will never get finished.

The balancing act of changing things that need to be changed and not changing things too often is one of the things that makes software development as much an art or craft as an engineering discipline or science.  Complicating things further is the fact that what’s best to do for the product often conflicts with what’s easiest to do in the code.

I was working on the eHow Earnings Tracker last night when I had one of those “aha!” moments where I knew that I needed to change the direction I was heading in.

My original vision for the tracker was that it would automatically detect eHow earnings pages as the user browsed the web.  When an earnings page was detected, the tracker would read the current page of earnings, save them to a file, and compare them to the previous update.  The user would be able to see which articles had increases in earnings from update to update and they would have a history of earnings that could be exported.  This  seemed like it would be fairly straightforward to implement and would be a low friction solution for the user.

As I worked on the code, a number of implementation difficulties arose and it became apparent that there would be some usability issues with the end product.

Implementation difficulties (code):

  1. Firefox can have multiple tabs open at once which means that, in theory, multiple earnings pages could be loading at the same time.  Writing code to handle this sort of scenario is very difficult in a Javascript based environment.
  2. Due to the fact that the code is only parsing one page of articles at a time and the article order is not guaranteed, the code can’t just blindly read & write the entire contents of the current earnings file stored on disk.  Each time an earnings page is viewed, the file needs to be read in and then its data needs to be compared to the current page of earnings to see whether article entries need to be added or modified.
  3. The code would be executing every time an earnings page was viewed, even if it didn’t have to.

Usability issues:

  1. The user would have to manually click through each page of articles (which can get tedious since articles are only displayed 10 at a time).
  2. Since the code was only seeing one page of articles at a time, it would have no way to inform the users of any articles that were added or removed since the last update.

My “aha!” moment was in realizing that the better way to do this would be to give the user an “Update Earnings” button that could be activated whenever they were on their earnings page.  That button will trigger the tool to automatically navigate through all of the users’ earnings pages collecting information.  At the end of the process, an HTML page can be displayed showing a summary of all the earnings data.  This fixes all of the implementation and usability issues in one shot.  It’s a rare case where I can simplify the code and improve the product at the same time.
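Once a single update sees the complete article list, reporting added and removed articles becomes a simple comparison of two lists. Here is a minimal sketch, assuming articles are identified by their titles; the function name diffArticles is hypothetical and this is not the tracker's actual code.

```javascript
// Compares the article titles from the previous update with the titles
// from the current update and reports what changed.
var diffArticles = function(previousTitles, currentTitles)
{
    var added = currentTitles.filter(function(title)
    {
        return previousTitles.indexOf(title) == -1;
    });

    var removed = previousTitles.filter(function(title)
    {
        return currentTitles.indexOf(title) == -1;
    });

    return { added: added, removed: removed };
};
```

This kind of comparison is not possible when the code only ever sees one page of ten articles, because an article missing from the current page may simply be on another page.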

How to Track eHow


Since I now have a Firefox plugin that will properly update the browser’s status bar with the URL of the web page being viewed, it’s time to write code that will specifically detect an eHow earnings page.  For those of you following along at home, take the code from the previous post and add the following function just after the updateStatusBar function :

var update = function()
{
    var host = Application.activeWindow.activeTab.uri.host;
    var path = Application.activeWindow.activeTab.uri.path;

    m_eHowMsg = "";

    if(host.search(/^www\.ehow\.com$/i) == 0)
    {
        // eHow earnings URLs are in the form "http://www.ehow.com/members/user-p#-articles.html" or "http://www.ehow.com/members/user-articles.html"
        var userAndPageNumPattern = new RegExp("/members/(.+)-(.+)-articles\\.html");
        var userAndPageNumResult = userAndPageNumPattern.exec(path);
        var userPattern = new RegExp("/members/(.+)-articles\\.html");
        var userResult = userPattern.exec(path);

        // exec() returns null when there is no match; otherwise index 0 is the
        // whole match and index 1 is the captured user name
        if(userAndPageNumResult && (userAndPageNumResult.length > 1))
        {
            m_eHowMsg = userAndPageNumResult[1];
        }
        else if(userResult && (userResult.length > 1))
        {
            m_eHowMsg = userResult[1];
        }
    }
    updateStatusBar(m_eHowMsg);
};

This function looks for URLs that match eHow earnings pages and then updates the status bar with the eHow user name that it finds.  To hook this function up, just replace the calls to updateStatusBar in onTabSelect() and onTabLoad() with calls to update().
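The URL matching can also be pulled out into a function that takes only the path, which makes it easy to try against sample URLs outside the browser. This sketch assumes the page segment is always "p" followed by digits, as the URL comment in the code suggests; the function name and the sample user names are made up.

```javascript
// Returns the eHow user name for an article earnings path, or "" if the
// path does not look like an earnings page.
// Assumption: the page segment is always "p" followed by digits ("-p2-").
var getUserFromEarningsPath = function(path)
{
    var withPage = /^\/members\/(.+)-p\d+-articles\.html/i.exec(path);
    if(withPage)
    {
        return withPage[1];
    }

    var withoutPage = /^\/members\/(.+)-articles\.html/i.exec(path);
    return withoutPage ? withoutPage[1] : "";
};
```

Requiring the "p#" form for the page segment is one way to keep a user name that contains dashes, such as "john-doe", from being split in two by the first pattern.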

Now when we navigate to a URL that contains eHow article earnings data we see the username in the status bar (the status bar will be blank for other URLs):

[Image: eHow username in the status bar]

A Wrench in the Works – Multiple Tabs in Firefox

The code from my previous post worked great with a single tab open in Firefox.  However, using multiple tabs exposed a problem – all tabs in a single Firefox window share the same status bar.  Another issue is that the event “DOMContentLoaded” gets fired any time a page finishes loading, regardless of what tab you’re on.  This means that the information in the status bar may or may not actually match the tab that you’re on.

I also found some better techniques for placing my code in a namespace/module that are described here and here.  It’s also worth noting that Mozilla’s Developer Center has documentation for their Javascript APIs and objects.  Unfortunately, the documentation usually isn’t very good and I wind up spending an awful lot of time looking for simple pieces of information.

After a bit of work, I came up with the code below to do what I wanted.  This code only works for Firefox 3.5 because that’s when event data was added for the TabOpen, TabClose, and TabSelect events.  In the code below, event.data is a BrowserTab object.

// Create a global variable for the addon
var g17of26 = {};

g17of26.eHowEarningsTracker = function ()
{
    // Private functions
    var updateStatusBar = function(msg)
    {
        var statusBar = window.document.getElementById("ehow_earnings_tracker_statusbar");

        if(statusBar)
        {
            statusBar.setAttribute("label", msg);
        }
    }

    return {  // This brace must stay on the same line as the return - it's a Javascript quirk

        // This function is called when the browser is first launched
        onBrowserLoad : function()
        {
            Application.activeWindow.events.addListener("TabOpen", g17of26.eHowEarningsTracker.onTabOpen);    
            Application.activeWindow.events.addListener("TabClose", g17of26.eHowEarningsTracker.onTabClose);
            Application.activeWindow.events.addListener("TabSelect", g17of26.eHowEarningsTracker.onTabSelect);

            Application.activeWindow.activeTab.events.addListener("load", g17of26.eHowEarningsTracker.onTabLoad);
        },

        // This function is called when the browser quits
        onBrowserUnload : function()
        {
            window.removeEventListener("load", g17of26.eHowEarningsTracker.onBrowserLoad, false);
            window.removeEventListener("unload", g17of26.eHowEarningsTracker.onBrowserUnload, false);

            Application.activeWindow.events.removeListener("TabOpen", g17of26.eHowEarningsTracker.onTabOpen);
            Application.activeWindow.events.removeListener("TabClose", g17of26.eHowEarningsTracker.onTabClose);
            Application.activeWindow.events.removeListener("TabSelect", g17of26.eHowEarningsTracker.onTabSelect);
        },

        // This function is called when a tab is opened
        onTabOpen : function(event)
        {
            event.data.events.addListener("load", g17of26.eHowEarningsTracker.onTabLoad);
        },

        // This function is called when a tab is closed
        onTabClose : function(event)
        {
            event.data.events.removeListener("load", g17of26.eHowEarningsTracker.onTabLoad);
        },

        // This function is called when a tab is selected
        onTabSelect : function(event)
        {
            updateStatusBar(Application.activeWindow.activeTab.uri.spec);
        },

        // This function is called when a page finishes loading in a tab
        onTabLoad : function(event)
        {
            if(event.data.uri.spec == Application.activeWindow.activeTab.uri.spec)
            {
                updateStatusBar(event.data.uri.spec);
            }
        }
    };
}(); // the parens here cause the anonymous function to execute and return

// Hook up events
window.addEventListener("load", g17of26.eHowEarningsTracker.onBrowserLoad, false);
window.addEventListener("unload", g17of26.eHowEarningsTracker.onBrowserUnload, false);
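The comment in the listing about keeping the brace on the same line as the return refers to Javascript's automatic semicolon insertion. The small demonstration below is independent of the add-on; the names brokenModule and workingModule are made up.

```javascript
// The brace is on the line after "return", so a semicolon is inserted
// right after "return" and the function returns undefined.  The braces
// that follow are parsed as an unreachable block, not an object literal.
var brokenModule = function()
{
    return
    {
        name: "broken"
    };
}();

// The brace is on the same line as "return", so the object literal is
// returned as intended.
var workingModule = function()
{
    return {
        name: "working"
    };
}();
```

Here brokenModule ends up undefined while workingModule.name is "working". With more than one property in the object, the broken form would not even parse.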