Allan on Technology: 2007

Friday, November 23, 2007

XSLT in EPiServer CMS 5

Personally I'm not a big fan of either XSLT or XML. In fact, my feelings about XML are summed up by this quote I heard recently: "XML is like children. They start out cute and small, then they grow..." (I don't remember who said it - if it was YOU, mail me and tell me to credit you for those words of wisdom). In my opinion XSLTs are mainly good for job security for XSLT developers - they are about as friendly to read as regular expressions, and terrible to maintain. Nevertheless, a lot of people like them because they help separate design from data, and I've already been asked "how can I work with XSLT in EPiServer?" many times. So I thought I'd better do something about it, and one dark and cold evening I made a web control that hopefully will satisfy all the XSLT magicians out there!

The control will create an XML document representing the current page, with this structure:
    <page>
        <properties>
            <property name="PageLink" type="PageReference" isdynamic="False" isnull="False">3</property>
            ...
        </properties>
        <children>
            <page>
                ...
            </page>
        </children>
    </page>
and then transform that XML using the XSLT you provide.
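The control's source isn't reproduced here, but the transformation step itself can be sketched with .NET's XslCompiledTransform. This is a minimal sketch under assumptions, not the control's actual code: XsltSketch and Transform are hypothetical names, and the input is assumed to already be the <page> document described above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an XSLT (given as a string) to the page XML.
public static class XsltSketch
{
    public static string Transform(string pageXml, string xslt)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(xslt)))
        {
            // Compiles the stylesheet once; the compiled transform can be reused
            transform.Load(xslReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(pageXml)))
        using (StringWriter output = new StringWriter())
        {
            // Writing to a TextWriter honours the stylesheet's <xsl:output> settings
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

In a real control you would probably cache the compiled transform rather than reload the stylesheet on every request.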

In order to use the control, you'll need to place the dll in the "bin" folder, and register it on the page you wish to use it on. Then you can put it on the page like this:

    <research:XSLT runat="server" id="xslt2" MaxChildDepth="1" IncludeDynamic="true">
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns="http://www.w3.org/1999/xhtml">
            <xsl:output method="html"/>
            <xsl:template match="page">
                <h1><xsl:value-of select="properties/property[@name='PageName']"/></h1>
                <ul>
                    <xsl:for-each select="properties/property">
                        <li><xsl:value-of select="@name"/>: <xsl:value-of select="."/></li>
                    </xsl:for-each>
                    <li>
                        Children: <xsl:apply-templates select="children/page"/>
                    </li>
                </ul>
                <br />
            </xsl:template>
        </xsl:stylesheet>
    </research:XSLT>

In the above case the XSLT is specified within the control's tag, but you can also reference an external XSLT file by setting the property "TransformationFile" to the URL of the XSLT file. Or - if you're feeling mean - you can bind the XSLT contents to a LongString property in EPiServer and let the editors figure out the extensible stylesheets - that'll freak them out for sure :-)

By default the control applies the XSLT to the current page's data. If you want to base it on another page, you can set that up with the well-known "PageLink" and "PageLinkProperty" properties. The "IncludeDynamic" property specifies whether to include dynamic properties, and the "AutoHtmlDecode" property specifies whether HTML tags within properties should be rendered as HTML or shown as text on the page.

Find the control on labs.episerver.com

P.S. If you insist on playing around with XSLT using this control, I wrote an XPathChecker some time ago that might come in handy.

Friday, November 16, 2007

Parallel Computing Made Easy

At the developer conference Øredev earlier this week I had a chance to see a session about Microsoft's upcoming ParallelFX - the .NET toolkit for utilizing multiprocessor machines - and it really looks promising!

The basic idea is that it should be a lot easier to utilize multiple processors from code than it is today. Sure, we have System.Threading with its QueueUserWorkItem, Threads and some asynchronous execution - but they are difficult to use efficiently. It's not easy to hand a series of parallel tasks to threads so that they end up distributed among the processors in the most sensible way!

This is where ParallelFX comes in. Part of it is a set of simple classes, both for Data Distribution (where the data you need processed is split up so it's processed in parallel) and, to some degree, Task Distribution (where you split up your tasks to run in parallel wherever it makes sense). The methods that make it all possible are, for example, Parallel.For(...), Parallel.ForEach(...) and Parallel.Do(...), which all take delegates - or lambda expressions (essentially the same). With these methods it'll be a lot easier to take typical tasks like For loops and speed them up a lot!

I can imagine a lot of For loops out there that perform independent operations (i.e. iterations that don't depend on each other's execution) that will benefit from a small ParallelFX performance refactoring.
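As a rough illustration of both styles, here is a small sketch using the API as it later shipped in .NET Framework 4's System.Threading.Tasks namespace. The CTP shown at Øredev may differ in details; in the shipped version Parallel.Invoke plays the role of Parallel.Do. ParallelSketch, SquareAll and SumHalves are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Data distribution: every iteration writes only its own slot in 'result',
    // so the iterations are independent and safe to run in parallel.
    public static double[] SquareAll(double[] input)
    {
        double[] result = new double[input.Length];
        Parallel.For(0, input.Length, i =>
        {
            result[i] = input[i] * input[i];
        });
        return result;
    }

    // Task distribution: two independent pieces of work run side by side;
    // Parallel.Invoke returns when both actions have completed.
    public static long[] SumHalves(int[] data)
    {
        long left = 0, right = 0;
        int mid = data.Length / 2;
        Parallel.Invoke(
            () => { for (int i = 0; i < mid; i++) left += data[i]; },
            () => { for (int i = mid; i < data.Length; i++) right += data[i]; });
        return new long[] { left, right };
    }
}
```

The key design point is that neither loop body touches state another iteration or action writes to; a loop that accumulates into one shared variable would need locking or per-thread partial results instead.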

Thursday, November 15, 2007

My new favourite Global Search Engine

Whenever people hear about "search engines" they immediately think Google. A few of the early bloomers might even remember AltaVista or Lycos - but they still prefer Google. Sure, they might have seen a challenger or two in their time (Live Search being one), but usually they've tried it a couple of times and then gone back to their safe haven - Google.

I use Google too for my global searching needs, and I love it! Or well... it's at least the best I know - or rather, it was the best I had known. Actually, when I think about it, it's not all that good. Sure, the index is big, the ranking often makes sense, and it's certainly a nice feature that I can type 2+2 in the search field and get a result back... But to be honest, once you get your results in Google, you are pretty much on your own and need to start paging to find what you are looking for among the 13498919384981339849184391 results it found for you (in 0.0000001 seconds, I'm sure). And even if the result you're looking for is right in front of you on the first page, you might not know that it's the right one, because all you've got to identify it by is a title and a couple of words showing the immediate context of your search query.

I always excused Google to myself, thinking: well, being a global search engine, I suppose that's how they have to do it. Since they can't rely on good, structured data on the web, they'll have to go by the lowest common denominator. A good local site search might do better, but only because it can, to some degree, require the data it indexes to be more structured.

But recently I saw a demo that made it obvious it doesn't have to be that way. The challenger with a brand new approach to global web search is Exalead - a French search engine company. It was at a panel debate about Enterprise Search at CMForum2007, where Exalead really stood out, both as an enterprise search and as a global search.
Their trick: They automatically retrieve structured data from an unstructured mess.

By having this structured information they are able to help you narrow your search results to the ones that are relevant to you. And their search UI is filled with usability-enhancing features - take for instance the thumbnails. Showing thumbnails of search results isn't a new idea (in fact I made a demo of it 4 years ago, and many others were doing it before me), but it's extremely helpful on a global search engine: instantly you get an idea of what kind of page you'll end up at if you click the link.
On top of that they've implemented a simple automated clustering (it looks a lot like Suffix Tree Clustering to me) that, although it could be better (look at Vivisimo, guys), is still pretty helpful in suggesting additional search terms for narrowing the results even further.

My only complaint is that the Exalead index doesn't seem to be as complete or up-to-date as Google's - I suppose they still have some investments in giant data centers ahead of them to reach that point - but it's already pretty useful!

On a side note: Exalead also makes a very nice little desktop search that can be downloaded for free from their site - check it out!

Friday, November 9, 2007

A Simple Page Import Web Service

In EPiServer CMS 5 there are a couple of very useful Web Services that give you pretty thorough access to do just about anything you please - at least with regard to adding / searching / modifying pages.

However, the Web Services can be pretty complex, and at times a green newbie like myself needs a quick & dirty way of importing files from another system into EPiServer without doing too much thinking. In fact, I found myself in exactly that situation recently, when I wanted to import a huge amount of test data I had scavenged on the net (19040 pages, to be exact). So I ended up writing a new little web service that takes the following parameters:

ParentID - The ID of the page that should be the parent of the page you're adding
PageType - The name of the page type to create
PageXML - A string of XML defining the page. The tag names should match the page properties you want to set, and they should all be wrapped in a <page> </page> tag. So something like "<page><pagename>My Page</pagename><bodyField>This is the body</bodyField></page>"
Publish - A boolean parameter that specifies if the page should be published instantly.

The Web method then returns the page-id of the page created, thereby enabling you to build your own page-hierarchy.
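A client has to produce the PageXML string, and building it with LINQ to XML (.NET 3.5) rather than string concatenation makes sure characters like & and < in property values get escaped. This is a hypothetical client-side helper (PageXmlBuilder is not part of EPiServer), and the actual web service call, through whatever proxy you generate, is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical client-side helper for building the PageXML parameter.
public static class PageXmlBuilder
{
    // Produces "<page><Name>value</Name>...</page>". XElement escapes &, < and >
    // in the values; the property names must be valid XML element names.
    public static string Build(IDictionary<string, string> properties)
    {
        XElement page = new XElement("page");
        foreach (KeyValuePair<string, string> p in properties)
        {
            page.Add(new XElement(p.Key, p.Value));
        }
        // DisableFormatting keeps the result on one line without indentation
        return page.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, a single PageName of "Fish & Chips" becomes <page><PageName>Fish &amp; Chips</PageName></page>, which the service then loads with XmlDocument.LoadXml.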

To make it, I simply created a new standard Web Service in Visual Studio, added the following using directives:

using EPiServer;
using EPiServer.DataAccess;
using EPiServer.Core;
using EPiServer.DataAbstraction;
using System.Xml;
using EPiServer.Security;

and added this method to the Web Service class:

[WebMethod]
public int ImportPage(int ParentID, string PageType, string PageXML, bool Publish)
{
    XmlDocument xd = new XmlDocument();
    xd.LoadXml(PageXML);
    //Create a new page with default values (NoAccess skips the access checks, see below)
    PageData pd = DataFactory.Instance.GetDefaultPageData(new PageReference(ParentID), PageType, AccessLevel.NoAccess);
    //Fill in properties from the first-level elements inside <page>
    foreach (XmlNode xn in xd.DocumentElement.ChildNodes)
    {
        XmlElement xe = xn as XmlElement;
        if (xe != null)
        {
            try
            {
                pd[xe.Name] = xe.InnerText;
            }
            catch
            {
                //Quick & dirty: silently skip elements that can't be assigned to a page property
            }
        }
    }

    PageReference pr = DataFactory.Instance.Save(pd, Publish ? SaveAction.Publish : SaveAction.Save, AccessLevel.NoAccess);
    return pr.ID;
}

So basically it creates a page with default data, iterates through the first-level nodes and checks if there is a matching page property. This is a really simple example without any proper error handling, and it only supports string properties. The only thing worth noting is how I use AccessLevel.NoAccess in the Save and GetDefaultPageData methods to avoid uncomfortable access checks (since the web service typically runs as an anonymous user). However, always make sure to put the service behind access control (which can be set up in web.config).

And yes, the import of my 19k pages went surprisingly well - and pretty quick too!

Monday, October 22, 2007

How to extend the general Page functionality in EPiServer CMS 5

Still being a newbie at EPiServer, I learn a lot of new things every day - both about the product and the company.

I've decided to share some of my discoveries here - perhaps they help other EPiServer newbies out there :-)

Today I came across quite a useful, but not-so-well-documented feature in EPiServer CMS 5: The ability to extend the general functionality of your Pages. So, if for instance you want to add a piece of code that should be executed whenever a page is shown, and which can affect that page, but don't feel like inheriting the TemplatePage type and letting all your pages inherit from your custom type, there is actually quite a neat way of doing it.

The trick is the PagePlugIn attribute. You can create a class, put the PagePlugIn attribute (from the EPiServer.PlugIn namespace) on it, add a static method with the signature void Initialize(int), and from that method set up an event handler for the EPiServer.PageBase.PageSetup event.

Here's an example of how you can use this functionality to add an extra menu item to the context menu:


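   using System;
   using EPiServer.PlugIn;

   //The class is marked as a page plug-in; its static Initialize(int) hooks up the events
   [PagePlugIn]
   public class MyPagePlugin
   {

       public static void Initialize(int bitflags)
       {
           //Get notified whenever an EPiServer page (PageBase) is being set up
           EPiServer.PageBase.PageSetup += new EPiServer.PageSetupEventHandler(PageBase_PageSetup);
       }

       static void PageBase_PageSetup(EPiServer.PageBase sender, EPiServer.PageSetupEventArgs e)
       {
           //Include the script file that defines MyScript() on the page
           sender.ClientScript.RegisterClientScriptInclude("OnScript", "MyScript.js");
           sender.PreRender += new EventHandler(sender_PreRender);
       }

       static void sender_PreRender(object sender, EventArgs e)
       {
           //Add a "My Script" item to the right-click menu, requiring Edit access
           (sender as EPiServer.PageBase).ContextMenu.Menu.Add("MyItem", EPiServer.Security.AccessLevel.Edit, new EPiServer.RightClickMenuItem("My Script", "MyScript()", "MyScriptSubMenu"));
       }

   }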

Monday, October 8, 2007

Off to Stockholm

After a terrific first week at EPiServer Denmark, I'm off for a 3-day introduction tour to the Swedish HQ.
In the first week I accomplished most of what I set out to do: I got my laptop up and running with a decent dev environment, almost learned my way around Vista (still don't know if I like it) and got started on a couple of really interesting projects.

I've also started an EPiServer-based blog on the Research site, r.ep.se - but don't worry, I plan to cross-post a lot, so you shouldn't be missing out on anything here.
So far EPiServer seems to be both a great product and a really nice company - I certainly can't complain about the way I've been received so far.

So now all that's left to do is to look forward to some of the famous Swedish cuisine over the next 3 days...

Monday, October 1, 2007

First Day on the new Job

Today was my first day at the new job at EPiServer, and I must say that the future is looking bright!
A lot of cool new hardware had arrived for me (in fact, I'm posting this from my new Lenovo T61p laptop) and - yes, I've taken the step to Vista (finally).
So, naturally I spent a lot of the day unpacking hardware, installing and learning my way around Vista (only made it crash twice).
I also had a chance to discover what some of my first tasks will be, once I have my dev environment set up and have memorized the entire EPiServer API by heart. It looks like I'll get to play around a little with building a wiki for EPiServer (there seems to be a demand), and I also get to finish a little project I started some days ago: a Mail2EPiPages job. I'll try to post a little more - but don't be surprised if it's all EPiServer specific. Although I'll try to cross-post, a lot of my posting will probably be on a blog on the EPiServer Research site.

I owe a big thanks for the great welcome I've received so far at EPiServer - and EPiServer Denmark (THANKS Michael!). I really have a good feeling about my future in this company...

Monday, September 24, 2007

EPiServer Code: Send a warning email when a page is about to expire

As you might imagine, I find it difficult to keep my hands off the brand new v.5, so naturally I've been searching out excuses to try coding some small samples against the API.
Here's a feature that I've already heard requested by several intranet customers: an automatic email that informs the owner of a page that it's about to expire.
"What a great idea," I thought the first time I heard it. One of the major problems with most intranets is outdated information, so I can easily imagine companies having a policy that all pages on their intranet should have an expiration date - and I can just as easily imagine the need for owners to change that expiration date if the information is still relevant, hence the need for a warning email.
First off I could see several approaches to making this in EPiServer:

Hook into the right DataFactory event from the global.asax and send a mail whenever a page is expiring
Use the v5 support for Windows Workflow Foundation and make a workflow that performs a SendEmail activity when a page is moved to the archive
Set up a scheduled task to check for pages about to expire

Although I found option 2 very appealing due to the use of workflows, I decided to implement option 3, since it was the only approach that would send out a warning email before the page actually expired (and any damage was done).

I made a new C# Code library project, added references to the relevant EPiServer dlls (and log4net to enable logging) and wrote this code:


using System;
using System.Collections.Generic;
using System.Text;
using EPiServer.PlugIn;
using EPiServer;
using EPiServer.Core;
using EPiServer.Filters;
using EPiServer.Security;
using EPiServer.Personalization;
using EPiServer.Configuration;
using System.Net.Mail;

namespace Allan.EPiModules
{
  [ScheduledPlugIn(DisplayName = "Page Expiry Warning")]
  public class ExpiryWarningJob
  {
      private static log4net.ILog _log;

      static ExpiryWarningJob()
      {
          _log = log4net.LogManager.GetLogger(typeof(ExpiryWarningJob));
      }

      public static string Execute(){
          int num = 0;
          //Find pages that will expire in a day
          PropertyCriteria criteria = new PropertyCriteria();
          criteria.Name = "PageStopPublish";
          criteria.Value = DateTime.Now.AddDays(1).ToString();
          criteria.Type = PropertyDataType.Date;
          criteria.Required = true;
          criteria.Condition = CompareCondition.LessThan;
          //...but hasn't already expired
          PropertyCriteria criteria2 = new PropertyCriteria();
          criteria2.Name = "PageStopPublish";
          criteria2.Value = DateTime.Now.ToString();
          criteria2.Type = PropertyDataType.Date;
          criteria2.Required = true;
          criteria2.Condition = CompareCondition.GreaterThan;
          PropertyCriteriaCollection criterias = new PropertyCriteriaCollection();
          criterias.Add(criteria);
          criterias.Add(criteria2);
          foreach (PageData data in DataFactory.Instance.FindPagesWithCriteria(
              PageReference.RootPage, criterias,
              null, LanguageSelector.MasterLanguage(), AccessLevel.NoAccess)
              )
          {
              SendMail(data);
              num++;

          }
          return string.Format("{0} expiry emails sent", num.ToString());

      }


      private static void SendMail(PageData p)
      {
          //Identify user profile. 
          //Consider using the ChangedBy instead of CreatedBy.
          EPiServerProfile esp = EPiServerProfile.Get(p.CreatedBy);
          if (!string.IsNullOrEmpty(esp.Email))
          {
              try
              {
                  //Build a new mail message
                  MailMessage message = new MailMessage("expire@" + Settings.Instance.SiteUrl.Host, esp.Email);
                  message.Subject = "Page \"" + p.PageName + "\" is about to expire";
                  message.Headers.Add("X-Mailer", "EPiServer CMS");
                  message.Headers.Add("Content-Base", Settings.Instance.SiteUrl.GetLeftPart(UriPartial.Authority));
                  message.Body = "The page <A href=\""
                      + Settings.Instance.SiteUrl.GetLeftPart(UriPartial.Authority)
                      + p.StaticLinkURL + "\">"
                      + p.PageName
                      + "</A> will expire on "
                      + p.StopPublish.ToShortDateString();
                  message.IsBodyHtml = true;
                  message.BodyEncoding = Encoding.UTF8;
                  SmtpClient smtp = new SmtpClient();
                  //Make sure the web.config sets up the SMTP Client.
                  smtp.Send(message);
                  _log.Info("Expiry warning sent to: " + esp.Email);
              }
              catch (Exception e)
              {
                  _log.Error("Failed to send expiry warning", e);
              }
          }
          else
          {
              _log.Warn("Unable to send expiry warning to " + esp.DisplayName + " - no known email address");
          }
      }
  }
}

The DLL should be placed in the EPiServer site's "bin" folder, and then it will automatically be loaded. The ScheduledPlugIn attribute will make it appear as a scheduled task in Admin mode. There you should probably set it to run once a day - perhaps in the early hours of the morning.
You also need to make sure web.config is set up to use your SMTP server.

The code is a pretty simple sample that will find the pages that are about to expire (the following day) and then send a mail to the creator of each page that it's about to expire.

The sample was made in less than a day and of course it still could use a lot of work to be really nice. Ideas for improvements:

Group expiration mails so that each user won't be bombarded with several mails every day
Consider how long in advance people should be warned that their pages are about to expire - is 1 day enough?
Consider whether it's really the creator who should get the mail - perhaps the last person to have updated the page would be the right one?
Integrate the functionality with tasks, so that instead of an email, a task to check the page is created.
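The first improvement idea, grouping mails per owner, could look roughly like this. It's a sketch under assumptions: ExpiringPage and ExpiryDigest are hypothetical names, ExpiringPage stands in for the values the job would read from each PageData and profile (page name, StopPublish, owner email), and sending is left to the existing SmtpClient code, one message per dictionary entry.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical holder for the data the job reads per expiring page.
public class ExpiringPage
{
    public string OwnerEmail;
    public string PageName;
    public DateTime StopPublish;
}

public static class ExpiryDigest
{
    // Returns one digest body per owner email, listing that owner's pages
    // ordered by expiry date. Pages without an owner email are skipped.
    public static Dictionary<string, string> BuildDigests(IEnumerable<ExpiringPage> pages)
    {
        return pages
            .Where(p => !string.IsNullOrEmpty(p.OwnerEmail))
            .GroupBy(p => p.OwnerEmail, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => string.Join("\n", g
                    .OrderBy(p => p.StopPublish)
                    .Select(p => p.PageName + " expires on "
                        + p.StopPublish.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))),
                StringComparer.OrdinalIgnoreCase);
    }
}
```

Grouping case-insensitively avoids sending two digests to the same person just because the address was typed differently in two places.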

Enjoy!

New Job!

Just a quick personal update: I've decided to join the great team at EPiServer, starting October 1st as a Technical Architect. I'll still be based in Copenhagen, Denmark - but my primary work area will be within development (which for the most part is in Stockholm), so I guess I'll once again get to try working virtually with people all over the world.
I'm very much looking forward to joining the EPiServer family, and I hope I'll be able to contribute to making EPiServer CMS an even better product (although it's already quite good).

...And just to answer an obvious question: Yes, this means that there might be more CMS and EPiServer related posts on my blog in the future - but I'll still try to find time between changing my son's diapers and developing CMS features to post some more unrelated crazy code :-)

Thursday, September 13, 2007

An oldie but a goldie

Just came across this old story that pretty much explains why so many software companies have trouble surviving the move from pioneer to "real software company".
It's quite a laugh.

Thursday, September 6, 2007

Google Earth Rules!

A couple of days ago I read that Google Earth had introduced a new awesome feature and I hurried to download the latest version.
No, I'm not talking about the wannabe astronomical function where you can browse the sky as well as the earth - that's cute, but nowhere near as cool as their new "secret" Flight Simulator.

What a great idea to build a flight sim into Google Earth! It's awesome to cruise around at 10.000 ft in a make-believe airplane, seeing the ground as it really looks.
All we need now to make it even more realistic is some clouds, radio chatter, more 3D buildings and a bunch of hot stewardesses!

"Ladies & Gentlemen, Captain Allan has turned on the Fasten Seat Belt sign, so please buckle up and turn off any electronic devices"

Friday, August 31, 2007

PRSS = Post Release Stress Syndrome

It's been a while since I last blogged, and a lot has happened in the meantime.
Mostly I've been busy with the post-release of Maximilian 1.0 (= my son). It's amazing how tricky it is to code and blog, with such a cute kid to distract you and a lot of diapers to change...

Now, however, we're slowly finding a routine in our daily lives, and I'll try to start blogging again - but bear with me if I don't post daily like I used to.

I have a ton of unfinished coding and blogging projects in the pipeline that I'll try to finish off, while at the same time I'll try to keep posting my comments when I see cool new developments in my spheres... Like today, when you, Mats, pointed out to me that EPiServer 5 RC4 has just been released. I already have a download in progress and I can't wait to see what has changed since RC2.

Wednesday, July 25, 2007

My own MiniMax implementation

As you might have noticed, these last couple of days have been kind of slow tech-blog wise.
This is of course due to the birth of our wonderful boy, Maximilian, who was born Monday morning.
I expect it'll be a couple of weeks before I again have time to write tech-blog entries :-)

Check my personal blog for personal updates.

Tuesday, July 17, 2007

NXT: My first 'bot

As I mentioned earlier I recently got a cool Lego MindStorms NXT to play with.
Now, with the help of Jesper, we managed to build the basic humanoid robot - and it's awesome!

I've coded it to do a few things:

Whenever the light is on, Mr. Bot will say "Goodmorning" and start walking.
When the light is turned off he'll say "Goodnight" and stop walking
When he's 12 cm from an obstacle he'll say "Please" (...move) and stop.
When he hears a noise (like somebody clapping their hands) he'll look around to see what's going on
If his hand button is pushed he'll greet you with the words "Have a nice day".

Of course I also made a simple C# Win Forms application that can remote control him.
My initial idea was to build in some simple learning mechanism (perhaps reinforcement learning), but I haven't gotten around to that yet. So far I'm just blinded by all the possibilities :-)

As handsome as Mr. Bot is now, you'd better take a good look, because now he's being taken apart. Jesper suggested a really funny use for the NXT - something every household needs: so now I'm going to build a machine that can automatically color-sort M&M's - and perhaps send out maydays if Olga comes too close to the candy!

WCF: Sharing Types between Server and Client

A discussion I've run into time and time again over the last few months while working with WCF is whether to use the generated proxy classes client-side or to do something else (like inheriting the proxy classes, creating your own proxies, or somehow trying to make the proxy classes identical to the source classes).
I guess the discussions arise because people aren't sure whether they should consider WCF like Web Services, which have a loose coupling (generated client proxies), or like Remoting, which often has a tighter coupling between the interacting participants (shared DLL).
Until now I've mainly been a fan of the loose coupling because of three things:

It's the easiest and fastest: just click the right buttons in VS and you're set! (Yes, I can be quite lazy at times, but remember that lazy developers are often the best.)
There's no dependency on a specific DLL version between the client and the server. If the server gets an update that breaks the service contract, the client just has to regenerate its proxy and you're set.
I haven't seen a clean and pretty alternative solution before. Mostly it's been messy.

However, today I came across this excellent CodeProject posting that gives a fine example of how to share a type between a WCF service and a WCF consumer. It turns out that it's built into SvcUtil (the client proxy generator) as a command-line switch. Now doesn't that make me feel stupid :-)
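To make the shared-type idea concrete: the contract type lives in a class library that both service and client reference, and SvcUtil is pointed at that assembly with its /reference (/r) switch so it reuses the type instead of generating a copy. The sketch below is hypothetical (Customer and ContractRoundTrip are made-up names); it uses DataContractSerializer directly to mimic what WCF does on the wire, and shows that a shared type keeps its methods after deserialization, which a generated proxy class would not have.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical contract type in an assembly referenced by both service and client.
[DataContract]
public class Customer
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }

    // Behaviour is not part of the wire format; a client only has it
    // when it uses this shared type rather than a generated copy.
    public string DisplayName() { return "#" + Id + " " + Name; }
}

public static class ContractRoundTrip
{
    // Serializes and deserializes the object, roughly like a WCF call would.
    public static Customer RoundTrip(Customer c)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Customer));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, c);
            stream.Position = 0;
            return (Customer)serializer.ReadObject(stream);
        }
    }
}
```

The trade-off is exactly the one listed above: the client now depends on a specific version of the shared DLL.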

Friday, July 13, 2007

Automatic Language Detection

A classical task when dealing with textual information is to automatically identify which language a text is written in (no, geeks - it's not a question of VB or C# - I mean human languages!).
Here's my attempt at a very simple, yet useful approach: character-bigram statistics.
I've basically compiled extensive statistics on the frequency of all bigrams in several languages, and using those it's now possible to determine which language a given text resembles the most.
Try out my Language Detector here!

The text corpus I used was another classic: the proceedings of the European Parliament over several years (can be found here).

My first step was to construct a class to contain bigram statistics for some text (LangStat).
In the class I also included code to determine the euclidean distance between two sets of bigram statistics (useful when trying to determine which language a text is most similar to). I implemented it as an operator overload for "-", so you can always determine the distance between two bigram-statistics by simply subtracting them from each other.


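       //Calculates the euclidean distance between two LangStat's.
       //Bigrams that occur in only one of them count as frequency 0 in the other.
       public static double operator -(LangStat a,LangStat c)
       {
           //Operator overload
           double tot = 0;
           foreach (Bigram b in a.Bigrams.Keys)
           {
               double me = (double)a.Bigrams[b] / a.Count;
               double them = c.Bigrams.ContainsKey(b) ? (double)c.Bigrams[b] / c.Count : 0;
               tot += (me - them) * (me - them);
           }
           foreach (Bigram b in c.Bigrams.Keys)
           {
               //Bigrams that only exist in c
               if (!a.Bigrams.ContainsKey(b))
               {
                   double them = (double)c.Bigrams[b] / c.Count;
                   tot += them * them;
               }
           }
           return Math.Sqrt(tot);
       }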

Then I built a console trainer application that loads the corpus text files for a given language, cleans up any unwanted tags in them and then adds the text to a bigram statistic.

When it's done, it uses System.CodeDom to generate source code for a class that inherits from LangStat but is specific to the current language. That way I have my languages precompiled and ready to be compared to custom textual content.
This might not be the most efficient approach, but it sure was fun to play around with CodeDom (an interesting namespace that I get to use far too seldom).


       static void Main(string[] args)
       {
           string lang = "sv";
           string langname = "Swedish";
           string[] files = Directory.GetFiles((...language folder...));
          
           //Build language statistics from file-corpus
           LangStat l=new LangStat();
           foreach(string f in files){
               Console.WriteLine("Examining file: "+f);
               string s=File.ReadAllText(f);
               //File loaded - replace all tags with spaces
               s=Regex.Replace(s,"<[^>]*>"," ",RegexOptions.Multiline);
               l.AddText(s);
           }


           //Generate Code: a class named after the language, inheriting LangStat
           System.CodeDom.CodeNamespace ns =
               new System.CodeDom.CodeNamespace("Allan.Language.Detection");
           CodeTypeDeclaration tp = new CodeTypeDeclaration(langname);
           tp.BaseTypes.Add(typeof(LangStat));
           tp.IsClass = true;
           ns.Types.Add(tp);
           CodeConstructor cc = new CodeConstructor();
           cc.Attributes = MemberAttributes.Public;
           tp.Members.Add(cc);
           cc.BaseConstructorArgs.Add(
               new CodePrimitiveExpression(l.Bigrams.Count));
           foreach (Bigram b in l.Bigrams.Keys)
           {
               //Emits: this._bigrams.Add(new Bigram(<A>, <B>), <count>);
               //CodePrimitiveExpression escapes characters like ' and \ correctly
               cc.Statements.Add(
                   new CodeMethodInvokeExpression(
                   new CodeFieldReferenceExpression(new CodeThisReferenceExpression(), "_bigrams"),
                   "Add",
                   new CodeObjectCreateExpression(typeof(Bigram),
                       new CodePrimitiveExpression(b.A),
                       new CodePrimitiveExpression(b.B)),
                   new CodePrimitiveExpression(l.Bigrams[b])));
           }
           cc.Statements.Add(
               new CodeAssignStatement(
               new CodeVariableReferenceExpression("_count"),
               new CodePrimitiveExpression(l.Count)));
           //Write the generated class to <langname>.cs
           using (StreamWriter sw = File.CreateText(langname + ".cs"))
           {
               new CSharpCodeProvider().GenerateCodeFromNamespace(ns, sw,
                   new System.CodeDom.Compiler.CodeGeneratorOptions());
           }

       }

Finally, I just had to build a simple Windows testing app that compares the text you write with each of the languages. Download the solution here.
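The comparison the test app performs boils down to computing bigram statistics for the typed text and picking the language at the smallest distance. The following is a simplified, self-contained sketch, not the original LangStat/Bigram code: BigramStats and LanguageDetector are hypothetical names, bigrams are stored as two-character strings, and the distance is taken over the union of both bigram sets, so a bigram missing on one side counts as frequency 0.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for LangStat: character-bigram counts plus a distance.
public class BigramStats
{
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();
    private int _total;

    public void AddText(string text)
    {
        string s = text.ToLowerInvariant();
        for (int i = 0; i + 1 < s.Length; i++)
        {
            string bigram = s.Substring(i, 2);
            int n;
            _counts.TryGetValue(bigram, out n);
            _counts[bigram] = n + 1;
            _total++;
        }
    }

    // Relative frequency of a bigram (0 if it never occurred)
    private double Frequency(string bigram)
    {
        int n;
        return _total > 0 && _counts.TryGetValue(bigram, out n) ? (double)n / _total : 0.0;
    }

    // Euclidean distance over the union of both bigram sets
    public double DistanceTo(BigramStats other)
    {
        HashSet<string> keys = new HashSet<string>(_counts.Keys);
        keys.UnionWith(other._counts.Keys);
        double sum = 0;
        foreach (string k in keys)
        {
            double d = Frequency(k) - other.Frequency(k);
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }
}

public static class LanguageDetector
{
    // Returns the name of the trained language closest to the text.
    public static string Detect(string text, IDictionary<string, BigramStats> languages)
    {
        BigramStats sample = new BigramStats();
        sample.AddText(text);
        string best = null;
        double bestDistance = double.MaxValue;
        foreach (KeyValuePair<string, BigramStats> lang in languages)
        {
            double d = sample.DistanceTo(lang.Value);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = lang.Key;
            }
        }
        return best;
    }
}
```

With real corpora you would train one BigramStats per language (e.g. from the Europarl files) and call Detect with the user's text.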

Wednesday, July 11, 2007

Majestic

A major problem for most global search engines is the simple fact that the net grows so rapidly that no matter how many server farms they build, pages are being created or updated faster than the search engines can detect and index them.
I recently came across Majestic, which has a really interesting approach to this problem: distributed crawlers. They've made a simple crawler client that distributes the crawling among all the volunteers who provide spare bandwidth and computer time to this noble task, in much the same way as some people donate time and bandwidth to the SETI@home project or my personal favourite, the search for the next Mersenne prime.
The idea of distributing the crawling seems really useful. Now, if only they had done something novel on the search end instead of just copying Google, I would have been thrilled. But I like the idea anyway. Check it out at http://www.majestic12.co.uk

Oh yeah, while you're there, check out the C# source for their HTML Parser. It's awesome. Fast and furious!

Tuesday, July 10, 2007

Code Challenge: Michael the Math Maniac

Time for another summer code-challenge. Hopefully this one is a bit easier than the last one :-)

Mr. Michael was a lucky man, because today, 20070710 (ISO standard), was his birthday!
But Michael wasn't your average lucky birthday boy. He was a Math Maniac. And on this special day, he was wondering: how many of the numbers between 0 and 1.000.000.000 contain the digits "20070710" (in that order) somewhere within the number?

Design a method int CountNumbers(int min, int max, int SequenceToFind); that returns the count of numbers between min and max which contain the SequenceToFind.
1st prize goes to first valid entry, 2nd prize to best performing entry.
The prizes are still "honour & mocking rights".
May the best developer win.
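Not an official answer, just one way to attack it: a minimal sketch, assuming "contains" means the decimal digits of SequenceToFind appear as a contiguous substring. It counts with a digit DP over a KMP matching automaton instead of testing all billion numbers. The class name `MathManiac` and the helper methods are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Sketch: hypothetical class; digit DP + KMP automaton, not a reference solution.
public static class MathManiac
{
    // Counts integers x with min <= x <= max whose decimal form contains sequenceToFind.
    public static int CountNumbers(int min, int max, int sequenceToFind)
    {
        if (min < 0 || max < min || sequenceToFind < 0)
            throw new ArgumentException("Expected 0 <= min <= max and a non-negative sequence.");
        string pattern = sequenceToFind.ToString();
        long below = min > 0 ? CountUpTo(min - 1, pattern) : 0;
        return (int)(CountUpTo(max, pattern) - below);
    }

    // next[s, d] = length of the pattern prefix matched after reading digit d in state s.
    private static int[,] BuildAutomaton(string p)
    {
        int m = p.Length;
        int[] fail = new int[m];
        for (int i = 1, k = 0; i < m; i++)
        {
            while (k > 0 && p[i] != p[k]) k = fail[k - 1];
            if (p[i] == p[k]) k++;
            fail[i] = k;
        }
        var next = new int[m, 10];
        for (int s = 0; s < m; s++)
            for (int d = 0; d < 10; d++)
            {
                char c = (char)('0' + d);
                int k = s;
                while (k > 0 && p[k] != c) k = fail[k - 1];
                next[s, d] = p[k] == c ? k + 1 : 0;
            }
        return next;
    }

    // Digit DP: counts x in [0, n] containing the pattern.
    private static long CountUpTo(int n, string p)
    {
        string digits = n.ToString();
        int m = p.Length;
        int[,] next = BuildAutomaton(p);
        var memo = new Dictionary<(int, int, bool, bool), long>();
        return Go(0, 0, true, false);

        // state == m means the pattern was already found; tight = still equal to n's prefix;
        // started = we are past the leading zeros.
        long Go(int pos, int state, bool tight, bool started)
        {
            if (pos == digits.Length)
                return started ? (state == m ? 1 : 0) : (p == "0" ? 1 : 0); // !started means x == 0
            var key = (pos, state, tight, started);
            if (memo.TryGetValue(key, out long cached)) return cached;
            int limit = tight ? digits[pos] - '0' : 9;
            long total = 0;
            for (int d = 0; d <= limit; d++)
            {
                bool nowStarted = started || d != 0;
                int ns = state == m ? m : (nowStarted ? next[state, d] : 0);
                total += Go(pos + 1, ns, tight && d == limit, nowStarted);
            }
            memo[key] = total;
            return total;
        }
    }
}
```

For Michael's question, CountNumbers(0, 1000000000, 20070710) should give 20: 20070710 itself, the ten numbers 200707100 to 200707109, and the nine numbers 120070710 to 920070710.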

Friday, July 6, 2007

Code Challenge Results: No luck for the Hash-Party

So far there haven't been a lot of entries to the latest Code Challenge, so I suppose I might have overestimated the abilities of you, my honorable readers.
In fact, the only entry I received was from Peter Thygesen and he admits to actually just having adopted an algorithm by Paul Hsieh.

However, just for fun, I compared it to the built-in string hashing algorithm (.GetHashCode()).
The comparison I did was fairly simple: I took 1.000.000 fairly random unique strings (well, actually Guids as strings) and timed how long each algorithm took in total to hash them all. I also checked how many duplicate hash codes each algorithm produced.
It turns out they were pretty much equal.
The built-in algorithm had 114 duplicate hash codes and took 15275 ms, while Mr. Thygesen's entry had 115 duplicates and took 15318 ms.

Thanks for playing, Peter - but I think we have to declare this a no-win :-)
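The comparison described above can be sketched roughly like this. It is a minimal harness under my own assumptions (only the hashing loop is timed, and a "duplicate" is any input whose hash code was already produced by an earlier input); the names `HashBenchmark`, `Run` and `MakeGuidStrings` are hypothetical, not the code used for the numbers above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Sketch: hypothetical benchmark harness, not the original test code.
public static class HashBenchmark
{
    // Hashes every input (timed), then counts inputs whose hash code
    // was already produced by an earlier input.
    public static (int Duplicates, long ElapsedMs) Run(Func<string, int> hash, IList<string> inputs)
    {
        var hashes = new int[inputs.Count];
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < inputs.Count; i++) hashes[i] = hash(inputs[i]);
        sw.Stop();

        var seen = new HashSet<int>();
        int duplicates = 0;
        foreach (int h in hashes)
            if (!seen.Add(h)) duplicates++;   // assumes the inputs themselves are unique
        return (duplicates, sw.ElapsedMilliseconds);
    }

    // Unique test strings, as in the post: Guids formatted as strings.
    public static List<string> MakeGuidStrings(int count)
    {
        var list = new List<string>(count);
        for (int i = 0; i < count; i++) list.Add(Guid.NewGuid().ToString());
        return list;
    }
}
```

Usage would be something like `HashBenchmark.Run(s => s.GetHashCode(), HashBenchmark.MakeGuidStrings(1000000))`. On newer .NET (Core) runtimes string.GetHashCode() is randomized per process, so the duplicate count will vary from run to run.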

EPiServer 5 CMS - First impressions

A couple of weeks ago I wanted to check out how the new EPiServer 5 looked, so I downloaded a free trial version of the RC2.

It comes in two flavors. There's the traditional installer that installs the Manager, which allows you to set up new EPiServer websites with a default look & feel, but on top of that there's also a new Visual Studio integration available that I instantly knew I just had to try out.

The install itself was very (!) easy, and without any problems or hiccups I had a lot of new features in my Visual Studio.
For instance I now had the possibility of creating a new EPiServer Project which I instantly did.
This template created a blank EPiServer website, database, etc. for me, ready to use.
It's clear that with this new release the clever guys at EPiServer have been focusing a lot on improving the quality of life for all the developers out there who use it as an everyday tool to make websites.
At the same time, EPiServer is now even more tightly coupled with the newest Microsoft technologies, basing the CMS on standard ASP.NET 2.0 features like master pages and ASP.NET user/role configuration. They've also done a tremendous job of integrating Workflow Foundation into the core functionality, and to this date it seems like one of the best uses of WWF I've seen so far.

Seen from a developer perspective, the new SDK makes me think of EPiServer as a huge toolbox that gives me a lot of tools to efficiently create cool websites and web functionality in a standard ASP.NET way, while taking care of a lot of the tedious details. But from an editor/administrator perspective you still get the well-known, intuitive web-based interface for administering and editing the website. Cool.

The editor and administrator interface hasn't changed all that much since the last version, and the entry point is still the "famous" right-click menu for logged-in editors. It seems to me like the editor interface hasn't gotten much work except for a paint job and perhaps some improved versioning/comparison features (however, I could be mistaken, having never been a real-life editor :-) ). That's okay, though. Rome wasn't built in a day, and I certainly prefer the improved SDK and architectural changes.
Yes, I am the kind of guy who cares more about what's under the hood of my car than the color, shape and sexiness of its exterior. However, it still wouldn't hurt to give a bit of attention to improving the (already good) usability for editors and administrators in a future version. Perhaps AJAX is a good approach here.

While I'm at it, here are a few more things for my wishlist for future versions: WCF support for easier data/functionality access, and a couple of nice, fully featured demo sites/templates for the SDK. It would be nice to have a couple of ready-to-go samples as VS templates.

All-in-all I'm very impressed with the RC2 version of EPiServer 5 and I can't wait to play around with it some more. Don't be surprised if a couple of modules start appearing on this blog for free download in the near future. EPiServer continues to be a powerful workhorse in the CMS world, not as flashy and shiny as some competitors but intuitive, strong and flexible.

Zattoo is awesome!

Yesterday I came across Zattoo, which is a really cool P2P live-TV service. It's similar in concept to Joost, but with Zattoo it's not on demand. Instead you get high-quality streaming of live channels, and quite a lot of them already! It's easy to get started and it works surprisingly well.
I'll definitely remember that I have it installed next time my wife wants to watch "America's Next Top Model" when I wanna watch the news!
Speaking of news and online TV, I've already become a regular viewer of DR Update (sorry, Danish-only news). Good quality, and nice to see news videos produced specifically for the web. Way to go, DR!

Monday, July 2, 2007

Code Challenge: Fun with Hash

No, this is not what you expected, crackhead. This post doesn't include getting high on anything stronger than your coding skills. It's time for another code challenge!

The challenge
Sometimes it can be very handy to make a small fingerprint of a piece of textual information so you can easily compare it to other pieces of text and check if they are identical without doing a full textual comparison.
The friendly folks at Microsoft have even been kind enough to include a "GetHashCode()" method in the .NET framework, but in this challenge I kindly ask you to ignore that.
The challenge is to code your own method that returns an integer hash code for any string, so that two identical strings will have the same hash code and the probability of two different strings sharing the same fingerprint is as small as possible.
So, write a method with the signature: static int MakeHash(string s); in C# and post it as a comment here.
Any posts that use GetHashCode(), MD5 or any other built-in hashing mechanism are disqualified, along with posts that are almost identical to prior entries.
Post before Friday and I'll compare the submissions on two different parameters:

Performance
Duplicate Hashcodes for non-identical strings

Good luck, Gentlemen!
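Not an entry, just a reference point for what such a method can look like: a minimal sketch of the well-known FNV-1a hash (a published algorithm swapped in for illustration, not something from the challenge), fed with both bytes of each UTF-16 character. The class name `FnvHash` is made up for the example.

```csharp
using System;

// Sketch: FNV-1a (32-bit), a published hash used for illustration;
// the class name FnvHash is hypothetical.
public static class FnvHash
{
    private const uint OffsetBasis = 2166136261;
    private const uint Prime = 16777619;

    public static int MakeHash(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        unchecked
        {
            uint hash = OffsetBasis;
            foreach (char c in s)
            {
                // FNV-1a step (xor, then multiply) for the low and then the high byte of the char
                hash ^= (byte)c;
                hash *= Prime;
                hash ^= (byte)(c >> 8);
                hash *= Prime;
            }
            return (int)hash;
        }
    }
}
```

FNV-1a is short and fast and spreads ordinary strings reasonably well, but it is not designed to resist deliberately crafted collisions.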

Sunday, July 1, 2007

Happy birthday to me!

Today is my birthday. I love birthdays... lots of cake and many presents. The last couple of years (okay... ever since I lost my childhood innocence) the presents have gotten more and more "boring" (= practical and nice but not really play-toys).
Being the eternal kid that I am, I was naturally extremely pleased this year when my wonderful wife (!!!) gave me Mindstorms NXT.
I can't wait to start playing around with it and code C# applications for it.
A quick googling showed that this could be a good place to start!

Thursday, June 28, 2007

Code Challenge: "John the courier"

Yet another brilliant idea popped into my mind today: why not celebrate the rainy summer with a nice indoor competition, a Code Challenge!

Through the next couple of weeks I intend to publish a couple of challenges like the one below.

Think fast, solve the problem and post it as a comment!

The various challenges will have different winning criteria. These could be: "First valid solution posted", "Valid solution with fewest code lines", "Funniest approach", "Best performance", etc.

The Winner will win ... well... the honour along with mocking rights over all other coders in the world (at least those who read this blog and who didn't win).

We'll start off with an easy one...

Challenge #1 "John the courier"

In John's little world there are n cities, named by numbers starting at 0. In every city there is a parcel that's supposed to go to another city in John's world.

John lives in city no. 0 and starts by picking up the package there. Then, whenever he delivers a package in a city, he picks up that city's package and takes it to where it should go.

The distance between the cities is oddly enough the same as the difference in their names (e.g. the distance between city 10 and city 7 is 10-7=3).

John starts off his day by receiving a list of where the parcels in each city should be delivered. Now, John wonders: how far will I have to travel before I get back to my home (city 0)?

Suggest a method that takes an int-array where the city is the index and the value is the destination of the package in the city (like this: int CalculateDistanceToHome(int[] CityPackages);) that returns the distance John must travel before he gets home. You can assume that the packages are distributed in such a way that John will always eventually get home.
First valid solution posted is the winner. Bonus points for recursive solutions.
Let the games begin!
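For reference, a minimal sketch of one possible answer, assuming the array is valid (every value is a city index) and, as promised, that John always gets home. The iterative version adds a guard in case that promise is broken; the recursive one (for the bonus points) relies on it. The class name `Courier` is hypothetical.

```csharp
using System;

// Sketch: hypothetical solution to "John the courier".
public static class Courier
{
    // Iterative version: follow the parcels from city 0 until John is home again.
    public static int CalculateDistanceToHome(int[] cityPackages)
    {
        int distance = 0, city = 0, legs = 0;
        do
        {
            int destination = cityPackages[city];
            distance += Math.Abs(destination - city); // distance = difference of the city numbers
            city = destination;
            if (++legs > cityPackages.Length)         // a trip home can't take more legs than there are cities
                throw new InvalidOperationException("John never gets home.");
        } while (city != 0);
        return distance;
    }

    // Recursive version: one leg plus the rest of the trip.
    public static int CalculateDistanceToHomeRecursive(int[] cityPackages, int city = 0)
    {
        int destination = cityPackages[city];
        int leg = Math.Abs(destination - city);
        return destination == 0 ? leg : leg + CalculateDistanceToHomeRecursive(cityPackages, destination);
    }
}
```

For example, with the parcels { 2, 0, 3, 1 } John goes 0 → 2 → 3 → 1 → 0, which is 2 + 1 + 2 + 1 = 6.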

Thursday, June 21, 2007

Making cross-thread calls / events

Many developers' first introduction to multithreaded programming is the classic challenge of using one or more "background-worker" threads to do some work that's expected to take longer than the average user wants to wait for his Windows application to become responsive again.
Making a worker-method and starting up a thread to run it - or asking the ThreadPool to assign the method to a thread from the pool (QueueUserWorkItem) is quite simple - but soon after the coding starts to get interesting (and fun!).

Now you have to worry about using Mutex, Monitor, etc. to ensure that there are no sharing violations on the resources that the threads share. This in itself is a worthy topic for several books and many blog posts (a lot better than my humble abilities allow me to write).
In a simple scenario as described above you might be able to avoid many of these problems if you contain all the necessary data within each worker thread, but judging by the number of times I've been asked about this, you will still encounter yet another issue: cross-thread communication.
Imagine that you've started up your worker-thread and it works happily, enjoying as many cpu-cycles as your operating system allows it, while still letting the main application thread provide a responsive UI. Then you'll at some point start to wonder "Okay....so now my fingamaboob is doing some work...thats nice...I wonder how far it's gotten".
- "No problem", I hear you say. "I'll just have my worker thread output its status to the window running in the main thread."
This approach will often lead to one of two scenarios:

InvalidOperationException, Cross-thread operation not valid

Some weird construction with a shared status variable, that the UI is polling every X seconds

The correct solution to this problem is to use a proper cross-thread call. For instance, you can use Invoke (or BeginInvoke if you're the asynchronous type). All Windows Forms controls have an Invoke method that you can call with a delegate and a set of parameters. That way you are instructing the thread that "owns" the control to call the delegate with the specified parameters.

As you can see below it can be done quite elegantly using anonymous methods and a custom delegate.


    using System;
    using System.Threading;
    using System.Windows.Forms;

    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private delegate void ReportStatusHandler(string status);

        private void DoBoringWork(object param)
        {
            for (int i = 0; i < 10000; i++)
            {
                //Simulate boring work
                Thread.Sleep(10);
                if (0 == (i % 100))
                {
                    //Output status for every 100

                    //WRONG (touches listBox1 from the worker thread):
                    //listBox1.Items.Add(i.ToString()+" items processed");

                    //Right: ask the thread that owns listBox1 to run the delegate
                    ReportStatusHandler rsh = new ReportStatusHandler(
                        delegate(string s) {
                            listBox1.Items.Add(s);
                        });
                    listBox1.Invoke(rsh,
                        i.ToString() + " items processed");
                }
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            //Put work in Queue to be done by ThreadPool
            ThreadPool.QueueUserWorkItem(
                new WaitCallback(DoBoringWork), null);
        }
    }

EPiGirl leads the way

I just came across Rachel Goldthorpe's newly started blog on EPiServer development.
She definitely seems to be off to a good start with her post on getting started with EPiServer 5 CMS. It looks very helpful, and I'll definitely keep it in mind when I get the chance to try out the new RC.
Welcome to the blogging sphere EPiGirl and keep up the good work!

Wednesday, June 20, 2007

Warning: TortoiseSVN Rename

Now, here's a little lesson I just learned the hard (!) way... When you use the "Rename" functionality on a folder in TortoiseSVN, it doesn't do what you might expect, that is, rename the folder. No, in fact it deletes the folder and creates a new one with the contents that are committed to the Subversion repository. And it doesn't just do a "move-to-recycle-bin" kind of deletion; no, it makes an "oh-no-daamn-sh*t-aaaargh" deletion of the entire folder along with any uncommitted files you might have in there (yes, even those added to Subversion that haven't been committed yet).
But on the bright side...I suppose this is the kind of mistake you only make once.

Now I'll go back to recreating my lost files from scratch :-(

Friday, June 15, 2007

Generic Type Conversion in C#

I have now several times found myself in the peculiar situation of having two classes that mostly have the same properties and fields, but don't implement the same interface or inherit from the same ancestor, which makes it rather tedious to convert between them.
The problem is typically seen when dynamically loading 3rd party libraries and trying to get them to interact with other 3rd party libraries, or when doing some advanced kinds of communication (like WCF) with complex types.
True, in some cases the problems can be avoided entirely by reconsidering your data, assembly and communication structure, but there are still those hopeless cases where you find yourself writing yet another "create an object of this type based on that type" piece of code. Not a difficult task, but boring.
So, as always, I've tried to come up with a stupid solution to a stupid problem: a Generic Type Converter. It uses generics to "convert" type A to type B. By "converting" I mean that it simply matches up the public properties and fields and copies the ones that match and that it's allowed to.
Sure, it might be slow (and some might even find it ugly), but for tasks where development time is more critical than a few milliseconds of execution time (like for POCs) it might be a nice thing to have in your toolbox.

Keep in mind that this is just a draft version 0.0.0.01.

P.S. Sorry about the formatting, but I'm having a small war with my blogspot and some CSS :-)

using System;
using System.Collections.Generic;
using System.Reflection;

public static class GenericTypeConverter
{
    public static DestType ConvertType<SrcType, DestType>(SrcType Source) where DestType : class, new()
    {
        return ConvertType<SrcType, DestType>(Source, null, null);
    }

    public static DestType ConvertType<SrcType, DestType>(SrcType Source, Dictionary<string, string> SrcDestMapping) where DestType : class, new()
    {
        return ConvertType<SrcType, DestType>(Source, SrcDestMapping, null);
    }

    private static string GetMappedName(Dictionary<string, string> Map, string OrigName)
    {
        if ((Map != null) && (Map.ContainsKey(OrigName))) return Map[OrigName];
        return OrigName;
    }

    /// <summary>
    /// Uses reflection to convert an object to a destination type, i.e. transfers all the properties and fields they have in common
    /// </summary>
    /// <typeparam name="SrcType">Source Type</typeparam>
    /// <typeparam name="DestType">Destination Type</typeparam>
    /// <param name="Source">Object to convert</param>
    /// <param name="SrcDestMap">Mapping between source and destination property names. Null if no mapping exist.</param>
    /// <param name="Dest">Destination object or null if it should be created</param>
    /// <returns>An object where as many properties and fields as possible have been transferred from Source.</returns>
    private static DestType ConvertType<SrcType, DestType>(SrcType Source, Dictionary<string, string> SrcDestMap, DestType Dest) where DestType : class, new()
    {
        if (Source == null) throw new ArgumentNullException("Source");

        //Create object if it doesn't exist.
        DestType dstVar = Dest ?? new DestType();

        //Loop through Source's public properties
        Type srcTp = typeof(SrcType);
        PropertyInfo[] props = srcTp.GetProperties(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static);
        foreach (PropertyInfo p in props)
        {
            //Skip write-only properties and indexers (they need index arguments)
            if (!p.CanRead || p.GetIndexParameters().Length > 0) continue;

            //Check if destination type has a settable property of the same type
            PropertyInfo pDest = typeof(DestType).GetProperty(GetMappedName(SrcDestMap, p.Name), p.PropertyType);
            if ((pDest != null) && pDest.CanWrite && (pDest.GetIndexParameters().Length == 0))
                pDest.SetValue(dstVar, p.GetValue(Source, null), null);
        }

        //Loop through Source's public fields
        FieldInfo[] mems = srcTp.GetFields();
        foreach (FieldInfo fi in mems)
        {
            FieldInfo mDest = typeof(DestType).GetField(GetMappedName(SrcDestMap, fi.Name));
            //Only copy to writable fields (not const or readonly) of the same type
            if ((mDest != null) && (fi.FieldType == mDest.FieldType) && !mDest.IsLiteral && !mDest.IsInitOnly)
            {
                mDest.SetValue(dstVar, fi.GetValue(Source));
            }
        }

        return dstVar;
    }
}

Blog War: EPiServer vs Sitecore vs Reddot

Some months ago I came across Blogpulse.com (by Nielsen Buzzmetrics), which is a cool service that monitors how popular certain topics are in the blogosphere. Just for fun I ran this comparison between 3 of the major European CMS vendors: Sitecore, EPiServer and Reddot.
It would seem that all of the vendors have huge variations in their blog-popularity.

I also couldn't resist trying out a battle of the giants (Google vs. Microsoft vs. Apple) on BlogPulse, and this is what I came up with. It would seem that Google is in the lead blog-wise, but both Microsoft and Apple are gaining (naturally you have to take into consideration that all blogs about the fruit that goes by the same name as a major software/hardware company are also part of the graph).

Wednesday, June 13, 2007

Awesome Public/Free SVN Repository

Several months ago my friend and colleague Jesper (aka Graffen) got the idea to make a free public SVN hosting service.
I know, I know, several of those exist already, but this one is awesome because it's so simple and easy to use, and there are no rules about having to do open source, etc. bothering you.
You just go to the website, create users and repositories and you are ready to go.
I've already started using it on a lot of my pet projects. It's so much nicer to have even your crazy-idea personal projects under source control. And Subversion (SVN) has been one of my favorite source control systems for a long time.
All you need to get going with SVN and this awesome service is an SVN client and access.

Great job, Jesper!

Spring cleaning

Although spring has passed and summer is here, I thought it might be a good idea to begin doing some spring cleaning in my draft posts.
Typically I get a gazillion ideas for posts I want to make and I instantly begin writing them, but pretty quickly I get another new idea that I'd rather post, resulting in a lot of half-finished draft-posts.
Over the next couple of days I'm going to do some cleanup, probably posting some drafts and deleting others, so please be patient if you think it's old news I'm posting.

Surfing Linkedin

I just had a brief surf through my LinkedIn contacts and saw that my good friend Mats Hellström from EPiServer joined the blogging sphere way back in March.
Welcome, Mats - I hope to see more postings soon!

Check out Mats' blog here.

Friday, June 8, 2007

ShotCode - Let's reinvent barcodes!

About a year and a half ago I was amazed by some of the new Java applications for camera phones I saw at the NeXT conference. Among them was a reinvention of barcodes: the idea was that if a special 2D barcode was put on a poster, everybody with a camera phone could scan it and get some information contained in the barcode, typically an ID number that an online service could translate into a URL or something else meaningful.
Of course I hurried home, downloaded the software and tried, but it almost never worked as it should so I gave up.

Today I found another similar service that actually seems to work quite well. And on top of storing your URLs in a smart-looking circular barcode, they even offer to convert the webpage to a format more suitable for the phone that requests it (admittedly, that specific part of their service doesn't seem to work too well, but anyway).

I saw the service in use at getjar.com (my favourite source of Java apps for my Nokia 6380), and there it worked quite impressively... Just start the ShotCode application on the phone, take a picture of the barcode on the screen and begin downloading the Java application. Then I checked out Shotcode.com, and after a quick registration I found it quite easy to create my own shotcodes... Now I suppose I'll just sit back and wait until this becomes an industry standard so everybody has it on their phones!

Sunday, June 3, 2007

The Meaning of Life

If I were a religious man I might believe that there's one truth in this world, known only to God (and perhaps Knuth also figured it out, but I digress).
If there really was such a one single truth, it should be possible (although perhaps not economical) to let a computer figure out what it is.
Someone once said that if you fill a room with monkeys and typewriters and give it enough time (and bananas!), eventually they will have written the collected works of Shakespeare. I suppose this is the idea behind any brute-force algorithm (although some might argue that given enough time the monkeys will have evolved into Shakespeare, but again I digress).
But it must be possible to speed up such a process... What if I custom-built the typewriters so they could only write real words that Shakespeare used? And what if I had the Almighty inspire the monkeys as to which words should come in sequence? Perhaps I could shorten the production time of my Shakespeare, or of the meaning of life...
So, eager to learn the true meaning of life (okay, who said "42" ?? shut up!) I devised the following plan:

Gather a collection of quotes from some of the most clever men I can think of (e.g. Einstein, Descartes, Voltaire, Vonnegut & Twain).
Calculate a Markov Chain over the bigrams in their sentences
Fill in Divine interaction (in the shape of a random generator)
Start to generate the meaning of life as random quotes
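The plan above can be sketched as a word-level Markov chain: for every word in the quote collection, remember which words followed it, then walk the chain with a random generator standing in for the divine inspiration. This is a minimal sketch assuming quotes are split on spaces only; `MarkovQuoteGenerator` and its methods are hypothetical names, not the code behind the generator on mizar.dk.

```csharp
using System;
using System.Collections.Generic;

// Sketch: hypothetical word-bigram Markov chain, not the original generator.
public class MarkovQuoteGenerator
{
    private const string Start = "<s>", End = "</s>"; // sentence boundary markers
    private readonly Dictionary<string, List<string>> _next = new Dictionary<string, List<string>>();
    private readonly Random _random;                   // the "divine interaction"

    public MarkovQuoteGenerator(Random random) { _random = random; }

    // Records every bigram (word, following word) of the quote.
    public void AddQuote(string quote)
    {
        string previous = Start;
        foreach (string word in quote.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            AddTransition(previous, word);
            previous = word;
        }
        AddTransition(previous, End);
    }

    private void AddTransition(string from, string to)
    {
        if (!_next.TryGetValue(from, out List<string> followers))
        {
            followers = new List<string>();
            _next[from] = followers;
        }
        followers.Add(to); // duplicates kept, so frequent bigrams are picked more often
    }

    // Walks the chain from the start marker until an end marker (or maxWords).
    public string Generate(int maxWords = 30)
    {
        var words = new List<string>();
        string current = Start;
        while (words.Count < maxWords && _next.TryGetValue(current, out List<string> candidates))
        {
            string next = candidates[_random.Next(candidates.Count)];
            if (next == End) break;
            words.Add(next);
            current = next;
        }
        return string.Join(" ", words);
    }
}
```

With a single quote every word has exactly one successor, so the chain just reproduces it; the interesting (and hopefully meaningful) mixes appear once several quotes share words.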

Now, the only problem I face is to determine when my generator actually comes across the meaning of life.... Good proposals can be posted here!

My "Meaning of Life" Random Quote Generator.

If you want to have a "quote of the day" on your website, feel free to use this javascript:

<SCRIPT TYPE="text/javascript"
SRC="http://www.mizar.dk/QuoteGenerator/Javascript.aspx">
</SCRIPT>

Tuesday, May 29, 2007

Is Google high on LSI?

Ah, the headline caught your attention :-) Well, don't worry. LSI is not a new fancy designer-drug and although the G company has a history of flying high, I doubt they are on anything stronger than Coke Zero.
But yesterday I just came across this excellent post by fellow blogspot blogger, John Colascione.
In the post he gives some interesting examples of how Google has implemented LSI (Latent Semantic Indexing). Back in 2005 I had the great pleasure of working with Moses Martiny and Kenneth Vester at Mondosoft while they were writing their Master's thesis on one of my favourite topics of all time, document clustering. I remember learning about LSI through them; it's quite an interesting approach to automatic keyword extraction.
With this technique you can get some amazing results, with keywords extracted from documents that don't even contain the actual words (although perhaps they should have!).
If I recall correctly, the basic approach is something like building a matrix of documents and words over the entire document collection, and then using an algorithm like SVD to determine the most distinctive words for each document, even without the document containing the words. Funny stuff!
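The matrix-plus-SVD idea can be illustrated with a deliberately tiny sketch. Instead of a full SVD it only computes the top singular triple by power iteration and rebuilds the rank-1 approximation σ·u·vᵀ (a simplification: real LSI keeps k > 1 dimensions, usually computed with a proper numerical library). Even so, a term that is missing from a document can get a non-zero weight, which is the effect described above. The class name `RankOneLsi` is hypothetical.

```csharp
using System;

// Sketch: rank-1 truncated SVD via power iteration (hypothetical helper, not full LSI).
public static class RankOneLsi
{
    // a[term, doc] holds term counts; returns sigma * u * v^T.
    public static double[,] RankOneApproximation(double[,] a, int iterations = 100)
    {
        int terms = a.GetLength(0), docs = a.GetLength(1);
        double[] u = new double[terms], v = new double[docs];
        for (int j = 0; j < docs; j++) v[j] = 1.0;
        double sigma = 0;
        for (int it = 0; it < iterations; it++)
        {
            // u = A v, normalised
            for (int i = 0; i < terms; i++)
            {
                double s = 0;
                for (int j = 0; j < docs; j++) s += a[i, j] * v[j];
                u[i] = s;
            }
            Normalize(u);
            // v = A^T u; its length converges to the largest singular value
            for (int j = 0; j < docs; j++)
            {
                double s = 0;
                for (int i = 0; i < terms; i++) s += a[i, j] * u[i];
                v[j] = s;
            }
            sigma = Normalize(v);
        }
        var result = new double[terms, docs];
        for (int i = 0; i < terms; i++)
            for (int j = 0; j < docs; j++)
                result[i, j] = sigma * u[i] * v[j];
        return result;
    }

    // Scales x to unit length and returns its original length.
    private static double Normalize(double[] x)
    {
        double norm = 0;
        foreach (double d in x) norm += d * d;
        norm = Math.Sqrt(norm);
        if (norm > 0) for (int k = 0; k < x.Length; k++) x[k] /= norm;
        return norm;
    }
}
```

For example, with the terms "car", "automobile" and "engine" as rows and three documents as columns, where document 0 contains only "car" and "engine", the approximation still gives "automobile" a positive weight in document 0 because the other documents tie it to the same vocabulary.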

Naturally I couldn't read John's post without trying Google's solution myself, and although not every term has good LSI matches, there were some interesting ones. For instance, it would seem that the term "~rap" is connected to both "Eminem" and "Lyrics" as well as "Rdf Api for Php" (the last was obviously the most interesting hit, in my humble opinion).

Anyway, it's cool that Google is playing around with this technology - just as all the other search giants (and challengers). Now, if only it was incorporated in the search in a better way than the tilde ("~") query line operator.

WCF: Duplex is awesome!

One of my first assignments in the new job has been to look into a couple of the fun new features in .NET 3.0, like WCF and WWF.

So far, I've found Communication Foundation really interesting, albeit a bit annoying to work with.
Its main strength is its flexibility. Where you would usually have to decide between building a web service, your own custom-coded TCP server or using remoting, you can now just build a standard service and then specify in the configuration file which protocol it should use (like HTTP, TCP, named pipes, MSMQ, etc.), or at least that's the theory. The downside is of course that since there's a lot of stuff that's configurable, you really need to understand all the configuration concepts (like Bindings, MetadataExchange, Endpoints, Security, Contracts, etc.) properly and configure it well in order to use it.
Another potential problem could be the performance of this communication since all the communication is handled using SOAP (which means that there's a lot of XML serialization and deserialization going on).

It would also seem that a couple of the problems known from web services have been addressed. For instance, you no longer need to host your web services in IIS; they can open an HTTP port for you and act as their own server. Error handling also seems to have been improved, and there's now some cross-service exception handling (although not perfect: an ApplicationException thrown on the server will appear as a "FaultException" on the client, but perhaps I'm missing something here).

The most awesome feature I've stumbled across so far in WCF is the possibility of making duplex services, i.e. services that are able to initiate communication with the client.
Sure, you could make each client a service as well as a client yourself and then let them exchange connection information, but now this functionality is built into the communications framework.
Naturally this requires some coding/configuration inconveniences, but once they are done it's easy to implement a state-of-the-art Observer pattern across various machines.
Set up a service, allow multiple clients to call the service to register themselves as subscribers to various events, and then let the service notify them when the events occur.
Jeff Barnes has put a great article on Codeproject with an example of this.
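The subscribe/notify shape described above can be sketched in plain C#, without any WCF types. This is an in-process stand-in under my own assumptions: the interface and class names are hypothetical. In a real WCF duplex service the callback interface is attached to the service contract (its `CallbackContract`), and the service obtains the client's callback proxy through `OperationContext.Current.GetCallbackChannel<T>()` instead of receiving an object reference.

```csharp
using System.Collections.Generic;

// Sketch: hypothetical in-process stand-in for a duplex "subscription" service.
// The callback contract: what the service may call on a subscriber.
public interface IEventCallback
{
    void OnEvent(string eventName, string payload);
}

public class SubscriptionService
{
    private readonly Dictionary<string, List<IEventCallback>> _subscribers =
        new Dictionary<string, List<IEventCallback>>();

    public void Subscribe(string eventName, IEventCallback callback)
    {
        if (!_subscribers.TryGetValue(eventName, out List<IEventCallback> list))
        {
            list = new List<IEventCallback>();
            _subscribers[eventName] = list;
        }
        if (!list.Contains(callback)) list.Add(callback);
    }

    public void Unsubscribe(string eventName, IEventCallback callback)
    {
        if (_subscribers.TryGetValue(eventName, out List<IEventCallback> list)) list.Remove(callback);
    }

    // Notifies every current subscriber of the event; returns how many were notified.
    public int Publish(string eventName, string payload)
    {
        if (!_subscribers.TryGetValue(eventName, out List<IEventCallback> list)) return 0;
        IEventCallback[] snapshot = list.ToArray(); // subscribers may unsubscribe while being notified
        foreach (IEventCallback callback in snapshot) callback.OnEvent(eventName, payload);
        return snapshot.Length;
    }
}

// A trivial subscriber that just records what it receives.
public class RecordingSubscriber : IEventCallback
{
    public List<string> Received { get; } = new List<string>();
    public void OnEvent(string eventName, string payload) => Received.Add(eventName + ": " + payload);
}
```

Across machines, the real work lies in everything this sketch leaves out: dead or slow clients, threading, and the binding configuration.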

Sunday, May 20, 2007

New Job!

After about a month filled with interesting job interviews and exciting offers, I finally decided where I'm going to work!
As of tomorrow I'm working as a Software Architect at Infopaq International.

I find Infopaq to be a really interesting company, mainly because of their ideas and infrastructure. They monitor a lot (!) of media, write news summaries and distribute the relevant news to their customers, along with doing some really fancy media analysis. In other words, they seem to have tons of data and information to play around with, something that has always fascinated me. I'm very much looking forward to taking the step from Information Retrieval to working for a genuine Information Provider.

On top of that they seem like a fun company with a lot of clever guys and some great ideas and visions. I can't wait to get started and learn all about this new company in detail.

Wednesday, May 16, 2007

Playing around with Embedded Objects in IE

Yesterday I felt like playing around a bit with embedded objects in IE - you know, showing Windows Forms in a browser. An old .NET trick that I've used a couple of times.

For some reason it never quite became the market standard it was supposed to (probably because it's ugly, inefficient and very browser-specific), but it would have been nice to have a good alternative to Java applets!

I made a quick adaptation of the SudokuSolver from my previous post to see how it would work as an embedded object. This is what I did:

Made a Windows Forms Library project
Made a Windows Forms User Control
Moved the UI from the Sudoku application to the new User Control (as well as the Code Behind)
Compiled and put on a web-server
Made an HTML page that includes it as an embedded object, and put the page on the same web server (this is important)

This is the HTML I used, notice how you provide it with the URL to the Windows Forms DLL, and then the full path (including namespace) to the control to display:


    <object id="SudokuControl" height="240" width="206"
        classid="http://www.thraen.dk/Download/SudokuWinLib.dll#SudokuWinLib.Sudoku">
    </object>

If you are viewing this blog in IE, and you have .NET 2.0 installed, and your security settings are just right, there is a chance that you might actually see the Sudoku Solver here:

Solving Sudokus as CSP with Forward Checking

When you think about it, a Sudoku is a perfect example of a Constraint Satisfaction Problem, and so it made good sense that the final programming assignment in the IAIP course I'm taking was to solve a Sudoku as a CSP, using forward checking.

The basic idea is to define the problem as a set of variables (in this case 9*9=81 variables), where each has a finite domain of potential values (by default values 1-9 in this Sudoku example) and then there's a number of constraints connecting the variables. For the Sudoku these constraints ensure that the numbers 1-9 are distinct in every horizontal and vertical line, as well as in the same 3x3 square.

Once the problem is defined as a CSP, there are several approaches to solving it. The assignment this time was to implement forward checking. The basic idea with forward checking is to do a backtracking depth-first search in the solution space, but every time a variable is assigned, the domains of all the affected variables (due to constraints) are pruned. If a domain becomes totally empty, we backtrack. It all sounds terribly complicated, but it's really a straightforward approach: assign the variables one at a time, and whenever something is assigned, revise the possible values for the rest of the affected variables.
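The forward checking idea above can be sketched compactly for Sudoku. This is not the port of the course code; it is an independent, minimal sketch under my own assumptions: domains are stored as bitmasks, assigning a value removes it from the domains of the cell's 20 peers, and the search backtracks as soon as any domain becomes empty. It also picks the empty cell with the fewest remaining values (the MRV heuristic), which the post doesn't mention. The class name `SudokuForwardChecking` is hypothetical.

```csharp
using System.Collections.Generic;

// Sketch: hypothetical forward-checking Sudoku solver, not the course codebase.
public static class SudokuForwardChecking
{
    // Peers[c] = the 20 cells sharing a row, column or 3x3 box with cell c (the constraints).
    private static readonly int[][] Peers = BuildPeers();

    private static int[][] BuildPeers()
    {
        var peers = new int[81][];
        for (int c = 0; c < 81; c++)
        {
            int r = c / 9, col = c % 9, br = r / 3 * 3, bc = col / 3 * 3;
            var set = new HashSet<int>();
            for (int k = 0; k < 9; k++)
            {
                set.Add(r * 9 + k);                     // same row
                set.Add(k * 9 + col);                   // same column
                set.Add((br + k / 3) * 9 + bc + k % 3); // same box
            }
            set.Remove(c);
            peers[c] = new List<int>(set).ToArray();
        }
        return peers;
    }

    // grid: 81 digits in row-major order, 0 = empty. Fills grid and returns true if solvable.
    public static bool Solve(int[] grid)
    {
        var domains = new int[81];
        for (int c = 0; c < 81; c++) domains[c] = 0x3FE;  // bits 1..9 = values 1..9
        for (int c = 0; c < 81; c++)
            if (grid[c] != 0 && !Assign(domains, c, grid[c])) return false;
        return Search(domains, grid);
    }

    // Assigns v to cell c and removes v from every peer's domain (the forward check).
    private static bool Assign(int[] domains, int c, int v)
    {
        int bit = 1 << v;
        if ((domains[c] & bit) == 0) return false;
        domains[c] = bit;
        foreach (int p in Peers[c])
        {
            domains[p] &= ~bit;
            if (domains[p] == 0) return false;         // empty domain: dead end
        }
        return true;
    }

    private static bool Search(int[] domains, int[] grid)
    {
        int best = -1, bestCount = 10;                 // MRV: empty cell with fewest values
        for (int c = 0; c < 81; c++)
        {
            if (grid[c] != 0) continue;
            int count = 0;
            for (int v = 1; v <= 9; v++) if ((domains[c] & (1 << v)) != 0) count++;
            if (count < bestCount) { best = c; bestCount = count; }
        }
        if (best == -1) return true;                   // every cell has a value
        for (int v = 1; v <= 9; v++)
        {
            if ((domains[best] & (1 << v)) == 0) continue;
            var copy = (int[])domains.Clone();         // undoing = simply dropping the copy
            if (Assign(copy, best, v))
            {
                grid[best] = v;
                if (Search(copy, grid)) return true;
            }
        }
        grid[best] = 0;                                // backtrack
        return false;
    }
}
```

Copying the 81 domains per level keeps backtracking trivial; a larger CSP would more likely record and undo the individual prunings.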

As usual we were supplied with a nice codebase to extend (in fact, all we had to do was more or less implement the Forward Checking part), but also as usual the entire codebase was in Java. This time it made the most sense to simply port it to C#, and it didn't take much more than an hour to port the main solver class and even refactor it a little.

The report can be seen here.
The project files can be found here.

Saturday, May 12, 2007

Solving NQueens using Reduced Ordered Binary Decision Diagrams

Here's one of the things I've been too busy to post:
In my AI course we've completed our last 2 assignments. The one I'll post here is the N-Queens problem solver, which helps solve the n-queens problem using BDDs.
BDDs are a really clever technique that can help solve some satisfiability problems quite fast.

The N-Queens problem is a classic problem, described in lots of places. It derives from the 8-queens problem, that is, how to fit 8 queens onto an 8x8 chessboard without any of them threatening the others (according to the rules of chess).
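To pin down the problem itself (this is not the BDD approach), here is a plain backtracking counter in Python. It places one queen per row and rejects any square that shares a column or diagonal with an earlier queen. The function name is hypothetical; for n = 8 it finds the well-known 92 solutions.

```python
def count_n_queens(n):
    """Count ways to place n non-attacking queens on an n x n board.
    Plain backtracking, one queen per row; not the BDD approach."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1  # every row has a queen: one complete solution
        total = 0
        for col in range(n):
            # Squares on the same diagonal share row - col or row + col.
            if col in cols or row - col in diag1 or row + col in diag2:
                continue
            total += place(row + 1, cols | {col},
                           diag1 | {row - col}, diag2 | {row + col})
        return total
    return place(0, frozenset(), frozenset(), frozenset())

print(count_n_queens(8))  # 92
```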
It can be solved in many ways, but BDDs are quite an efficient way of doing it.
The basic idea is to build a binary decision diagram where every node corresponds to a variable in the problem, construct the diagram according to the rules (constraints) that apply to the specific problem, and along the way reduce the diagram according to a couple of rules (removing nodes whose two children are identical and avoiding duplicate node/child-node patterns).
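For the curious, here is a compact ROBDD sketch in Python, written from scratch for illustration. It is not the BDD library from the assignment, and the class and method names, as well as the row-by-row variable order, are assumptions (our report describes how we actually chose to assign the nodes). `mk` enforces the two reduction rules (a node whose two children are identical is skipped, and identical nodes are shared through a unique table), `apply` combines two diagrams, and the N-Queens constraints are "at least one queen in each row" plus "no two attacking squares both hold a queen". Counting the satisfying assignments of the final diagram gives the number of solutions.

```python
# Sketch of a reduced ordered BDD (ROBDD), written from scratch for illustration.
# Node 0 is False, node 1 is True; variables are tested in index order 0..num_vars-1.

OPS = {"and": lambda a, b: a & b, "or": lambda a, b: a | b}

class BDD:
    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Each node is (var, low, high); terminals sort after every real variable.
        self.nodes = [(num_vars, None, None), (num_vars, None, None)]
        self.unique = {}  # (var, low, high) -> node id, shares identical nodes
        self.cache = {}   # (op, u, v) -> result of apply

    def mk(self, var, low, high):
        if low == high:  # reduction rule 1: the test on var is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # reduction rule 2: reuse identical nodes
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def var(self, i):   # the function "x_i"
        return self.mk(i, 0, 1)

    def nvar(self, i):  # the function "not x_i"
        return self.mk(i, 1, 0)

    def apply(self, op, u, v):
        """Combine two BDDs with a binary boolean operator."""
        key = (op, u, v)
        if key in self.cache:
            return self.cache[key]
        if u <= 1 and v <= 1:
            result = OPS[op](u, v)
        else:
            vu, lu, hu = self.nodes[u]
            vv, lv, hv = self.nodes[v]
            top = min(vu, vv)  # split on the earliest variable in the order
            u0, u1 = (lu, hu) if vu == top else (u, u)
            v0, v1 = (lv, hv) if vv == top else (v, v)
            result = self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))
        self.cache[key] = result
        return result

    def sat_count(self, u):
        """Number of assignments to all num_vars variables that make u true."""
        memo = {}
        def count(w):  # assignments to variables var(w)..num_vars-1
            if w <= 1:
                return w
            if w not in memo:
                var, low, high = self.nodes[w]
                memo[w] = (count(low) * 2 ** (self.nodes[low][0] - var - 1)
                           + count(high) * 2 ** (self.nodes[high][0] - var - 1))
            return memo[w]
        return count(u) * 2 ** self.nodes[u][0]

def n_queens_bdd(n):
    """Build the N-Queens constraint as a BDD; variable r*n+c means 'queen on (r, c)'."""
    bdd = BDD(n * n)
    f = 1
    for r in range(n):  # at least one queen in each row
        row = 0
        for c in range(n):
            row = bdd.apply("or", row, bdd.var(r * n + c))
        f = bdd.apply("and", f, row)
    for a in range(n * n):  # no two attacking squares both hold a queen
        for b in range(a + 1, n * n):
            (ra, ca), (rb, cb) = divmod(a, n), divmod(b, n)
            if ra == rb or ca == cb or abs(ra - rb) == abs(ca - cb):
                f = bdd.apply("and", f, bdd.apply("or", bdd.nvar(a), bdd.nvar(b)))
    return bdd, f

bdd, f = n_queens_bdd(4)
print(bdd.sat_count(f))  # 2
```

Because the same-row pairs are included in the "no two attacking" constraints, "at least one per row" becomes "exactly one per row", so each satisfying assignment is exactly one board solution.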
How we chose to assign the nodes can be found in our report (as always written with Thomas Gravgaard and Peter Thygesen). Also, a final web version of our solution can be tried out here.

The joy of being unemployed!

Ahh... posting again! I know, it's been a while, but now I hope to be back. I've been through my first month of unemployed life, and I must admit that it has been nothing like I had imagined. It's been crazy! No time for sitting in front of the local supermarket drinking six-packs of beer all day (as I had been told unemployed people are supposed to do), no time for playing around with technologies and ideas (as I had hoped) and no time for blogging either... Well, perhaps it's just a bad excuse, but I've been really, really busy.
During the first couple of weeks I was contacted by 40+ companies that wanted to discuss job opportunities! Early on I was really eager and polite, noted everybody down and did my best to keep track of them, but I must admit that after a couple of weeks I started to sort them roughly at the initial contact. For instance, I turned down all consultancy companies, because I don't feel like body-shopping. At least not now.
But even with thorough sorting, it amounted to around 15 initial interviews. This part is really terrific and quite an ego boost (currently my bloated ego has reached a size where it can hardly fit into the living room), and I can definitely recommend it. Each interview took around 1.5 hours, most of it spent talking about one of my favourite subjects: me. As you can imagine, it took quite a lot of time to go through all these talks and to follow up on them afterwards. Meanwhile I kept track of the advantages and disadvantages of each job, and after a couple of weeks of initial interviews I had a list of 10 companies offering me 15 different jobs, ranging from software architect, product manager and senior developer to researcher.
Then it started getting tricky!
Because all the jobs on my shortlist sounded really good, and I could easily see myself enjoying each and every one of them. But only one can be picked. So I decided to measure the companies on more criteria and narrowed the list down to 5 companies that I decided to have a second talk with.
Now I'm down to considering 3 companies, and of those I must admit I do have a favourite - but I'm not telling which - at least not until the contract is signed and everybody's happy!

Thursday, April 26, 2007

Awesome IKVM; Interact with Java from .NET

Yet again I've run into Java in my studies at ITU. The current assignment Peter, Thomas and I are working on is to implement an assistant that helps a user solve the classic N-Queens problem using Reduced Ordered Binary Decision Diagrams (ROBDDs or simply BDDs).
There are many(!) ways to solve that problem, but using BDDs does seem like a very intriguing approach. The only problem: it requires a BDD engine. We could of course write our own (and actually I'm currently working on that), but the assignment actually came with a fully functioning BDD library ready for us to use. Only, it was in Java... (NOTE: I don't have any problem with Java and I'm not religious in any way, but usually .NET is my weapon of choice).
"No problem, we'll just use J# to handle it like last time" was the initial reaction.
But, alas, the library was an already compiled JAR, no source included. Naturally we could get all the source from SourceForge and port it to J#, but the time seemed right to try a new, clever approach!
Luckily Peter found the right solution: Enter IKVM! IKVM is a great set of tools for interoperating between Java and .NET, and it works like a charm.
There are two main tools. One is a command-line program that lets you run compiled Java files on .NET instead of the Java virtual machine. The other, which proved really useful to us, lets you take a JAR and transform it into a .NET DLL.
All I had to do was call it from the command line with the name of the JAR file and the name of the .NET output file, and in no time I had a working .NET DLL that I could reference directly in my .NET projects.
For the referencing programs to work, though, it's important to have two of the IKVM DLLs (namely "IKVM.GNU.ClassPath.dll" and "IKVM.Runtime.dll") in the "bin" folder or in the GAC.
Great work, IKVM guys. Keep doing your magic!

And the programming assignment? Well, here is how far we've gotten so far. Keep in mind that it's a work in progress. After the hand-in deadline I'll make a new post about how we did it.
I'm also considering trying out some of the other known approaches to the same problem and comparing them. Drop a comment if you'd be interested in knowing what works best :-)

Monday, April 16, 2007

DPLL in C# - Satisfying problems in CNF

Time for another AI post! These last couple of weeks I've been working with two fellow C# gurus, Peter Thygesen and Thomas Gravgaard, on an assignment in our AI class: implementing a couple of specific parts of the DPLL algorithm, such as the methods for choosing split symbols, finding unit clauses and identifying pure symbols.
"What's DPLL good for?" I hear you cry... Well, it's simple really - or actually it isn't all that simple, but I'll try to explain it anyway. Suppose you have a boolean statement in CNF (conjunctive normal form) and you want to test whether it's satisfiable, that is, whether some assignment of values exists that makes it true. Then you can run the DPLL algorithm to find out. DPLL basically searches the solution space, but during the search it uses unit clauses and pure symbols to prune the search space.
In other words (and hopefully more understandable words): if you have a problem that you can formulate as a boolean problem (A and B or C implies D), then you can change that formulation into conjunctive normal form ((A or B or C) and (A or D or E) ... ), and once that's done you can determine whether it's actually possible to assign values to the variables that will make it true.
The algorithm basically works by picking a symbol (= variable), assigning it true or false, and for each option recursively calling itself until all variables are assigned. To minimize the search space it uses a couple of simple rules to shortcut the search, such as finding clauses that contain only one unassigned symbol (unit clauses) and symbols that appear with only one sign across all remaining clauses (pure symbols). The order in which it assigns the variables is also very important.
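Below is a minimal recursive DPLL sketch in Python, in the spirit of the textbook version rather than our J# code. Clauses are lists of nonzero integers (DIMACS-style: 3 means symbol 3 is true, -3 means it is false), which is an assumption for illustration, and the split heuristic (pick the most frequent remaining symbol) is just one simple choice among many. It shows the three parts mentioned above: unit clauses, pure symbols and choosing a split symbol.

```python
from collections import Counter

def dpll(clauses, model=None):
    """Return a satisfying (possibly partial) assignment {symbol: bool}, or None.
    Each clause is a list of nonzero ints: k means symbol k, -k means 'not k'."""
    model = {} if model is None else model
    # Simplify the clauses under the current partial model.
    remaining = []
    for clause in clauses:
        if any(model.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # clause already true
        open_lits = [lit for lit in clause if abs(lit) not in model]
        if not open_lits:
            return None  # clause false under this model: prune this branch
        remaining.append(open_lits)
    if not remaining:
        return model  # every clause is true
    # Unit clause: a single unassigned literal forces that symbol's value.
    for clause in remaining:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(remaining, {**model, abs(lit): lit > 0})
    # Pure symbol: appears with only one sign, so make that literal true.
    literals = {lit for clause in remaining for lit in clause}
    for lit in literals:
        if -lit not in literals:
            return dpll(remaining, {**model, abs(lit): lit > 0})
    # Split: try both values for the most frequent remaining symbol.
    counts = Counter(abs(lit) for clause in remaining for lit in clause)
    symbol = counts.most_common(1)[0][0]
    for value in (True, False):
        result = dpll(remaining, {**model, symbol: value})
        if result is not None:
            return result
    return None

# (A or B) and (not A or B) and (not B or C), with A=1, B=2, C=3
print(dpll([[1, 2], [-1, 2], [-2, 3]]))
```

Symbols that end up unconstrained are simply left out of the returned model, since any value for them keeps the formula true.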
Another challenge in this assignment was that the code provided as a basis for our work and for testing was all in Java (a typical university assignment). We're all C# people, but too lazy to rewrite everything in C#, so luckily we got the Java code working in J# and were able to base our code on it anyway (and I wouldn't be surprised if our execution performance is higher than if we had used Java).

Read our project here.