IIS

How to stop IIS7 from handling 404 errors so you can handle them in ASP.NET

IIS7 has lots of places you could look to make this change. You might start by checking whether it’s an advanced option on your application pool. No. So then you look at the web site itself and find the option .NET Error Pages.  That has to be it, surely!  So you try every option there: Mode=On, Mode=Off, Mode=Remote Only.  Nothing works, so you consult the help for those items, only to learn that “Mode” is to “Select a mode for the error pages: On, Off, or Remote Only.”  You can see now why help writers at Microsoft are so well paid: who would have guessed that Mode = Remote Only sets the Mode to Remote Only!

Now you are really frustrated, but luckily you have landed on my blog post, where you learn that the true path to 404 happiness is a simple change to your web.config:

<system.webServer>
  <httpErrors errorMode="Detailed" />
</system.webServer>

A simple web crawler in C# using HtmlAgilityPack

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.Net;

namespace LinkChecker.WebSpider
{
    /// <summary>
    /// A result encapsulating the Url and the HtmlDocument
    /// </summary>
    public abstract class WebPage
    {
        public Uri Url { get; set; }

        /// <summary>
        /// Get every WebPage.Internal on a web site (or part of a web site) visiting all internal links just once
        /// plus every external page (or other Url) linked to the web site as a WebPage.External
        /// </summary>
        /// <remarks>
        /// Use .OfType&lt;WebPage.Internal&gt;() to get just the internal ones if that's what you want
        /// </remarks>
        public static IEnumerable<WebPage> GetAllPagesUnder(Uri urlRoot)
        {
            var queue = new Queue<Uri>();
            var allSiteUrls = new HashSet<Uri>();

            queue.Enqueue(urlRoot);
            allSiteUrls.Add(urlRoot);

            while (queue.Count > 0)
            {
                Uri url = queue.Dequeue();

                HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
                oReq.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";

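                // GetResponse throws a WebException for 404s, timeouts and DNS failures,
                // so catch it here and report the link as an Error instead of aborting the crawl
                HttpWebResponse resp = null;
                WebException requestError = null;
                try
                {
                    resp = (HttpWebResponse)oReq.GetResponse();
                }
                catch (WebException ex)
                {
                    requestError = ex;
                }

                if (requestError != null)
                {
                    // yield is not allowed inside a catch block, so the failure is reported here
                    var failed = requestError.Response as HttpWebResponse;
                    yield return new WebPage.Error() { Url = url, Exception = requestError, HttpResult = failed != null ? (int)failed.StatusCode : 0 };
                    continue;
                }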

                WebPage result;

                if (resp.ContentType.StartsWith("text/html", StringComparison.InvariantCultureIgnoreCase))
                {
                    HtmlDocument doc = new HtmlDocument();
                    try
                    {
                        var resultStream = resp.GetResponseStream();
                        doc.Load(resultStream); // The HtmlAgilityPack
                        result = new Internal() { Url = url, HtmlDocument = doc };
                    }
                    catch (System.Net.WebException ex)
                    {
                        result = new WebPage.Error() { Url = url, Exception = ex };
                    }
                    catch (Exception ex)
                    {
                        ex.Data.Add("Url", url);    // Annotate the exception with the Url
                        throw;
                    }

                    // Hand off the result: an Internal page on success, an Error if loading failed
                    yield return result;
                    if (result is WebPage.Error) continue;  // nothing to follow on a failed page

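                    // And now queue up all the links on this page
                    // (SelectNodes returns null, not an empty collection, when nothing matches)
                    HtmlNodeCollection links = doc.DocumentNode.SelectNodes(@"//a[@href]");
                    if (links == null) continue;
                    foreach (HtmlNode link in links)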
                    {
                        HtmlAttribute att = link.Attributes["href"];
                        if (att == null) continue;
                        string href = att.Value;
                        if (href.StartsWith("javascript", StringComparison.InvariantCultureIgnoreCase)) continue;      // ignore javascript on buttons using a tags

                        Uri urlNext;
                        if (!Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out urlNext)) continue;   // skip malformed hrefs

                        // Make it absolute if it's relative
                        // (relative links resolve against the page they appear on, not the site root)
                        if (!urlNext.IsAbsoluteUri)
                        {
                            urlNext = new Uri(url, urlNext);
                        }

                        if (!allSiteUrls.Contains(urlNext))
                        {
                            allSiteUrls.Add(urlNext);               // keep track of every page we've handed off

                            if (urlRoot.IsBaseOf(urlNext))
                            {
                                queue.Enqueue(urlNext);
                            }
                            else
                            {
                                yield return new WebPage.External() { Url = urlNext };
                            }
                        }
                    }
                }
            }
        }

        ///// <summary>
        ///// In the future might provide all the images too??
        ///// </summary>
        //public class Image : WebPage
        //{
        //}

        /// <summary>
        /// Error loading page
        /// </summary>
        public class Error : WebPage
        {
            public int HttpResult { get; set; }
            public Exception Exception { get; set; }
        }

        /// <summary>
        /// External page - not followed
        /// </summary>
        /// <remarks>
        /// No body - go load it yourself
        /// </remarks>
        public class External : WebPage
        {
        }

        /// <summary>
        /// Internal page
        /// </summary>
        public class Internal : WebPage
        {
            /// <summary>
            /// For internal pages we load the document for you
            /// </summary>
            public virtual HtmlDocument HtmlDocument { get; internal set; }
        }
    }
}
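
The internal/external decision in the crawler comes down to two Uri operations: resolving a relative href against the page it was found on, and asking whether the root IsBaseOf the result. Here is a minimal sketch of just that step, separate from the crawler. The LinkResolver class and its method names are hypothetical names for illustration, not part of the code above, and example.com is a placeholder.

```csharp
using System;

public static class LinkResolver
{
    // Resolve an href found on `page` to an absolute Uri.
    // Returns null when the href cannot be parsed at all.
    public static Uri Resolve(Uri page, string href)
    {
        Uri link;
        if (!Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out link))
        {
            return null;
        }

        // Relative links are resolved against the current page, not the site root
        return link.IsAbsoluteUri ? link : new Uri(page, link);
    }

    // A link counts as internal when it lives under the crawl root
    public static bool IsInternal(Uri root, Uri link)
    {
        return root.IsBaseOf(link);
    }
}
```

For example, with a root of http://example.com/docs/ and a page at http://example.com/docs/guide/intro.html, the href "../faq.html" resolves to http://example.com/docs/faq.html and is internal, while http://other.example.org/ is external.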

Shaving seconds off page load times

IIS6 comes with some antiquated settings for Gzip compression.  Even after enabling it in the IIS management console, nothing happens until you edit the metabase file to list the file extensions you actually want compressed.
With the correct settings, the main page of my site now loads in 5.3s instead of 6.0s, and subsequent loads are down to about 1 second with just three requests to the server.
Browser caching and Gzip compression are now improving our customer experience significantly.
Here’s what my metabase settings look like after removing ‘asp’ and adding ‘aspx’, ‘css’ and ‘js’.
HcFileExtensions="htm
html"
HcScriptFileExtensions="aspx
js
css"

Don’t forget to change both sections in the metabase, one for deflate and one for gzip.
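
One way to check that the metabase change took effect is to request a page while advertising gzip support and look at the Content-Encoding header that comes back. The following is a minimal sketch of that check, not something taken from the IIS tooling; the CompressionCheck class and its method names are hypothetical, and you would pass in a URL from your own site.

```csharp
using System;
using System.Net;

public static class CompressionCheck
{
    // True when a Content-Encoding header value names gzip or deflate
    public static bool IsCompressed(string contentEncoding)
    {
        if (string.IsNullOrEmpty(contentEncoding)) return false;
        return contentEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0
            || contentEncoding.IndexOf("deflate", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Request the URL with Accept-Encoding set and return the Content-Encoding
    // the server answered with (an empty string when the response is uncompressed)
    public static string GetContentEncoding(Uri url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.ContentEncoding;
        }
    }
}
```

Calling CompressionCheck.IsCompressed(CompressionCheck.GetContentEncoding(url)) for an .aspx page and again for a .js file should tell you whether both the script and static extension lists are being honoured.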