Allan on Technology: November 2007

Friday, November 23, 2007

XSLT in EPiServer CMS 5

Personally, I'm not a big fan of either XSLT or XML. In fact, my feelings about XML are summed up in this quote I heard recently: "XML is like children. They start out cute and small, then they grow..." (I don't remember who said it - if it was YOU, mail me and tell me to credit you for those words of wisdom). In my opinion, XSLTs are mainly good for the job security of XSLT developers - they are about as friendly to read as regular expressions, and terrible to maintain. Nevertheless, a lot of people like them for the way they help separate design from data, and I've already been asked "How can I work with XSLT in EPiServer?" many times. So I thought I'd better do something about it, and one dark and cold evening I made a web control that will hopefully satisfy all the XSLT magicians out there!

The control creates an XML document representing the current page, with this structure:
    <page>
        <properties>
            <property name="PageLink" type="PageReference" isdynamic="False" isnull="False">3</property>
            ...
        </properties>
        <children>
            <page>
                ...
            </page>
        </children>
    </page>
and then transform that XML using the XSLT you provide.
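
To get a feel for the "build XML, then transform it" step outside EPiServer, here is a minimal sketch using the standard System.Xml.Xsl.XslCompiledTransform class. It is not the control's actual implementation; the class name, the sample document and the sample stylesheet are made up for illustration and only mimic the structure shown above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class PageXsltSketch
{
    // Hypothetical sample document in the shape described above; the values are invented.
    public const string SamplePageXml =
        "<page><properties>" +
        "<property name=\"PageName\" type=\"String\" isdynamic=\"False\" isnull=\"False\">Start</property>" +
        "<property name=\"PageLink\" type=\"PageReference\" isdynamic=\"False\" isnull=\"False\">3</property>" +
        "</properties><children /></page>";

    // Hypothetical stylesheet: renders the PageName property as a heading.
    public const string SampleXslt =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">" +
        "<xsl:output method=\"html\"/>" +
        "<xsl:template match=\"page\">" +
        "<h1><xsl:value-of select=\"properties/property[@name='PageName']\"/></h1>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Applies the stylesheet to the page XML and returns the rendered markup.
    public static string Transform(string pageXml, string xslt)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(stylesheet);
        }

        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(pageXml)))
        {
            // Writing to a TextWriter lets the stylesheet's xsl:output settings apply.
            transform.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```

Calling PageXsltSketch.Transform(PageXsltSketch.SamplePageXml, PageXsltSketch.SampleXslt) should give markup containing a heading with the page name.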

In order to use the control, you'll need to place the DLL in the "bin" folder and register it on the page where you want to use it. Then you can put it on the page like this:

    <research:XSLT runat="server" id="xslt2" MaxChildDepth="1" IncludeDynamic="true">
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns="http://www.w3.org/1999/xhtml">
            <xsl:output method="html"/>
            <xsl:template match="page">
                <h1><xsl:value-of select="properties/property[@name='PageName']"/></h1>
                <ul>
                    <xsl:for-each select="properties/property">
                        <li><xsl:value-of select="@name"/>: <xsl:value-of select="."/></li>
                    </xsl:for-each>
                    <li> Children: <xsl:apply-templates select="children/page"/> </li>
                </ul>
                <br />
            </xsl:template>
        </xsl:stylesheet>
    </research:XSLT>

In the above case the XSLT is specified within the control's tag, but you can also reference an external XSLT file by setting the property "TransformationFile" to the URL of the XSLT file. Or - if you're feeling mean - you can bind the XSLT contents to a LongString property in EPiServer and let the editors figure out the extended stylesheets - that'll freak them out for sure :-)

By default the control applies the XSLT to the current page's data - if you want to base it on another page, you can set that up with the well-known "PageLink" and "PageLinkProperty" properties. The "IncludeDynamic" property specifies whether to include dynamic properties, and the "AutoHtmlDecode" property specifies whether HTML tags within properties should be rendered as HTML tags or as text on the page.

Find the control on labs.episerver.com

P.S. If you insist on playing around with XSLT and use this control, I wrote an XPathChecker some time ago that might come in handy.

Friday, November 16, 2007

Parallel Computing Made Easy

At the developer conference Øredev earlier this week, I had a chance to see a session about Microsoft's upcoming ParallelFX - the .NET toolkit for utilizing multiprocessor machines - and it really looks promising!

The basic idea is that it should be a lot easier to utilize multiple processors from code than it is today. Sure, we have System.Threading with its ThreadPool.QueueUserWorkItem, Threads and some async execution - but they are difficult to use in the most efficient manner. It's not easy to hand a series of parallel tasks to threads so that they are distributed among the processors in the most sensible way!

This is where ParallelFX comes in. Part of it is a set of simple classes - both for data distribution (where you split up the data to be processed so it's processed in parallel) and, to some degree, task distribution (where you split up your tasks to run in parallel wherever it makes sense). The methods that make it all possible are, for example, Parallel.For(...), Parallel.ForEach(...) and Parallel.Do(...), which all take delegates - or lambda expressions (essentially the same). With these methods it'll be a lot easier to take typical constructs like For loops and speed them up a lot!
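
As a rough illustration of the task-distribution side, here is a small sketch. Note the assumption: it is written against the API as it later shipped in .NET 4 (namespace System.Threading.Tasks), where Parallel.Do was renamed Parallel.Invoke; the preview shown at the session may look different. The class and method names are made up.

```csharp
using System.Threading.Tasks;

public static class ParallelTasksSketch
{
    // Runs two unrelated pieces of work, potentially at the same time.
    // Each delegate writes to its own array slot, so they share no mutable state.
    public static int[] RunIndependentTasks()
    {
        int[] results = new int[2];
        Parallel.Invoke(
            () => { results[0] = SumTo(1000); },
            () => { results[1] = SumTo(2000); });
        // Parallel.Invoke returns only after all delegates have completed.
        return results;
    }

    // Stand-in for "real" work: sums the integers 1..n.
    private static int SumTo(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }
}
```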

I can imagine a lot of For loops out there that perform non-interrelated operations (i.e. the iterations don't depend on each other's execution) and that will benefit from a small ParallelFX performance refactoring.
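
Such a refactoring could look roughly like the sketch below. It assumes the Parallel.For signature as it eventually shipped in .NET 4 (System.Threading.Tasks), which may differ from the preview; the class and method names are invented for the example. The loop body only touches its own index, which is what makes it safe to run iterations in parallel.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelLoopSketch
{
    // The classic sequential loop.
    public static double[] SquareRootsSequential(int count)
    {
        double[] results = new double[count];
        for (int i = 0; i < count; i++)
        {
            results[i] = Math.Sqrt(i);
        }
        return results;
    }

    // The same loop with the body handed to Parallel.For as a lambda.
    // Iterations may run in any order and on several threads.
    public static double[] SquareRootsParallel(int count)
    {
        double[] results = new double[count];
        Parallel.For(0, count, i =>
        {
            results[i] = Math.Sqrt(i);
        });
        return results;
    }
}
```

Whether this actually speeds things up depends on how much work each iteration does; for very cheap loop bodies the scheduling overhead can outweigh the gain.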

Thursday, November 15, 2007

My new favourite Global Search Engine

Whenever people hear about "search engines" they immediately think Google. A few of the early adopters might even remember Altavista or Lycos - but they still prefer Google. Sure, they might have seen a challenger or two in their time (Live Search being one), but usually they tried it a couple of times and then went back to their safe haven - Google.

I use Google too for my global searching needs, and I love it! Or well... it's at least the best I know - or rather, it was the best I had known. Actually, when I think about it, it's not all that good. Sure, the index is big, and the ranking often makes sense - and it's certainly a nice feature that I can type 2+2 in the search field and get a result back... But to be honest, once you get your results in Google, you are pretty much on your own, and need to start paging to find what you are looking for among the 13498919384981339849184391 results it found for you (in 0.0000001 seconds, I'm sure). And even if the result you're looking for is right in front of you on the first page, you might not even know that it's the right one, because all you've got to identify it with is a title and a couple of words showing the immediate context of your search query.

I always excused Google to myself, thinking: well, being a global search engine, I suppose that's how they've got to do it - since they can't rely on good, structured data on the web, they have to go by the lowest common denominator. A good local site search might do better - but only because it can, to some degree, require the data it indexes to be more structured.

But recently I saw a demo that made it obvious it doesn't have to be that way. The challenger with a brand new approach to global web search is Exalead - a French search engine company. It was at a panel debate about Enterprise Search at CMForum2007 that Exalead really stood out - both as an enterprise search and as a global search.
Their trick: They automatically retrieve structured data from an unstructured mess.

By having this structured information they are able to help you narrow your search results down to the ones that are relevant to you. And their search UI is filled with usability-enhancing features - take for instance the thumbnails. Showing thumbnailed search results isn't a new idea (in fact I made a demo of it 4 years ago - and many others were doing it before me) - but it's extremely helpful on a global search engine: you instantly get an idea of what kind of page you'll end up at if you click the link.
On top of that they implemented a simple automated clustering (looking a lot like Suffix Tree Clustering to me) which, although it could be better (look at Vivisimo, guys), is still pretty helpful in suggesting additional search terms for narrowing the results even further.

My only complaint is that the Exalead index doesn't seem to be as complete or up-to-date as Google's - I suppose they still have some investments in giant data centers ahead of them to reach that point - but it's already pretty useful!

On a side note: Exalead also makes a very nice little desktop search that can be downloaded for free from their site - check it out!

Friday, November 9, 2007

A Simple Page Import Web Service

In EPiServer CMS 5 there are a couple of very useful Web Services that give you pretty thorough access to do just about anything you please - at least with regard to adding / searching / modifying pages.

However the Web Services can be pretty complex and at times a green newbie like myself needs a quick & dirty way of importing files from another system into EPiServer without doing too much thinking. In fact, I found myself in exactly that situation recently, when I wanted to import a huge amount of test data I had scavenged on the net (19040 pages to be exact). So I ended up writing a new little web service that takes the following parameters:

ParentID - The ID of the page that should be the parent of the page you're adding
PageType - The name of the page type to create
PageXML - A string of XML defining the page. The tag names should match the page properties you want to set, and they should all be wrapped in a <page> </page> tag. So something like "<page><pagename>My Page</pagename><bodyField>This is the body</bodyField></page>"
Publish - A boolean parameter that specifies if the page should be published instantly.

The Web method then returns the page-id of the page created, thereby enabling you to build your own page-hierarchy.
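
On the calling side, the PageXML string has to be put together somehow. Here is a minimal sketch of one way to do it with XmlDocument, so that characters like & and < in the values get escaped properly. The PageXmlBuilder class is purely hypothetical and not part of the service.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class PageXmlBuilder
{
    // Wraps each name/value pair in an element named after the page property,
    // inside a single <page> root element, and returns the XML as a string.
    public static string Build(IEnumerable<KeyValuePair<string, string>> properties)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement page = doc.CreateElement("page");
        doc.AppendChild(page);

        foreach (KeyValuePair<string, string> property in properties)
        {
            XmlElement element = doc.CreateElement(property.Key);
            // Setting InnerText stores plain text; it is escaped when serialized.
            element.InnerText = property.Value;
            page.AppendChild(element);
        }
        return doc.OuterXml;
    }
}
```

The resulting string can then be passed as the PageXML parameter, and the returned page-id used as ParentID for the next level of pages.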

To make it, I simply created a new standard Web Service in Visual Studio, added the following using directives:

using EPiServer;
using EPiServer.DataAccess;
using EPiServer.Core;
using EPiServer.DataAbstraction;
using System.Xml;
using EPiServer.Security;

and added this method to the Web Service class:

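[WebMethod]
public int ImportPage(int ParentID, string PageType, string PageXML, bool Publish)
{
    XmlDocument xd = new XmlDocument();
    xd.LoadXml(PageXML);

    // Create a new, unsaved page of the given type below the parent.
    // AccessLevel.NoAccess avoids the access check for the calling user.
    PageData pd = DataFactory.Instance.GetDefaultPageData(new PageReference(ParentID), PageType, AccessLevel.NoAccess);

    // Fill in properties: each first-level element sets the property of the same name
    foreach (XmlNode xn in xd.DocumentElement.ChildNodes)
    {
        XmlElement element = xn as XmlElement;
        if (element == null)
        {
            continue;
        }
        try
        {
            pd[element.Name] = element.InnerText;
        }
        catch
        {
            // No error handling: elements that can't be assigned are silently skipped
        }
    }

    // Save or publish the page, again without an access check
    PageReference pr = DataFactory.Instance.Save(pd, Publish ? SaveAction.Publish : SaveAction.Save, AccessLevel.NoAccess);
    return pr.ID;
}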

So, basically it creates a page with default data, iterates through the first-level nodes and tries to set the page property matching each node's name. This is a really simple example without any proper error handling, and it only supports string properties. The only thing worth noting is how I pass AccessLevel.NoAccess to both the Save method and the GetDefaultPageData method, to avoid uncomfortable access checks (since the web service typically runs as an anonymous user). However, always make sure to put the service behind access control (which can be set up in web.config).
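
For the access-control part, a web.config fragment along these lines is one option, using the standard ASP.NET location and authorization elements. The file name ImportService.asmx and the role name WebAdmins are assumptions for the example; use whatever path and roles apply on your site.

```xml
<configuration>
  <!-- Hypothetical path: replace with the actual location of your .asmx file -->
  <location path="ImportService.asmx">
    <system.web>
      <authorization>
        <!-- Assumed role name; rules are evaluated top-down -->
        <allow roles="WebAdmins" />
        <deny users="*" />
      </authorization>
    </system.web>
  </location>
</configuration>
```

With a rule like this, the caller has to authenticate before it can reach the service, even though the method itself skips the page-level access checks.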

And yes, the import of my 19k pages went surprisingly well - and pretty quick too!