The Blog of Ian Mercer.

Web site crawler and link checker (free)

In a previous post I provided a utility called LinkChecker, a web site crawler and link checker. The idea behind LinkChecker is that you can include it in your continuous integration scripts and check your web site either regularly or after every deployment. Unlike a simple ping check, this one will fail if you've broken any links within your site or have SEO issues. It will also break just once for every site change and then be fixed the next time you run it. This means that in a continuous integration system like TeamCity you can get an email or other alert each time your site (or perhaps your competitor's site) changes.
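To illustrate the "break once per change" idea, here is a minimal Python sketch of how a build step could behave that way. It is not how LinkChecker itself is implemented (that isn't described here); the function name, the state file and the use of a SHA-256 hash are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def changed_since_last_run(content, state_file):
    """Return True exactly once after the content changes.

    Hypothetical helper: it stores a hash of the last content seen in
    state_file, so the run after a change sees no difference and passes.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    path = Path(state_file)
    # No state file yet means this is the first run, so there is no baseline.
    previous = path.read_text().strip() if path.exists() else None
    # Always record the current hash so the next run compares against it.
    path.write_text(digest)
    return previous is not None and previous != digest
```

A build script could exit with a non-zero code when this returns True, which would fail the build once and let the next run succeed.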

As promised in that post, a new version is now available. There are many improvements under the covers, but one obvious new feature is the ability to dump all the text content of a site into a text file. Simply append -dump filename.txt to the command line and you'll get a complete text dump of any site. The dump includes page titles and all visible text on the page (it automatically excludes embedded script and CSS). It also excludes any element with an ID or CLASS that includes one of the words "footer", "header", "sidebar" or "feedback", so you don't get lots of duplicate header and footer information in the dump. I plan to make this more extensible in future to allow other words to be added to the ignore list.
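The exclusion rule is easier to see in code. The following is a rough Python sketch of the same idea using only the standard library; it is not LinkCheck's actual code (LinkCheck is a .NET program), and the class name, the function name and the case-insensitive substring match are assumptions.

```python
from html.parser import HTMLParser

# Words from the post; a case-insensitive substring match is assumed.
IGNORED_WORDS = ("footer", "header", "sidebar", "feedback")
# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextDumper(HTMLParser):
    """Collects visible text, skipping script, style and ignored elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # greater than 0 while inside an excluded element

    def _is_excluded(self, tag, attrs):
        if tag in ("script", "style"):
            return True
        for name, value in attrs:
            if name in ("id", "class") and value:
                if any(word in value.lower() for word in IGNORED_WORDS):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside an excluded element, every nested tag deepens the skip.
        if self.skip_depth or self._is_excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def dump_text(html):
    """Return the visible text of one HTML page, one text run per line."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

This sketch assumes reasonably well-formed markup: an unclosed tag inside an excluded element would throw off the depth count, which a real crawler would need to handle.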

One technique you can use with this new 'dump' option is to dump a copy of your site after each deployment and then check it into source control. Now if there's ever any need to go back and see when a particular word or paragraph was changed on your site, you have a complete record. You could, for example, use this to maintain a text copy of your WordPress blog, or perhaps to keep an eye on someone else's blog or Facebook page to see when they added or removed a particular story.
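As a sketch of that workflow, the Python function below runs the dump and commits the result to git only when the text has changed. The executable name LinkCheck.exe, passing the site URL as the first argument, and the choice of git are assumptions; the only part taken from this post is appending -dump filename.txt to the command line.

```python
import subprocess


def archive_site_dump(site_url, dump_file, repo_dir,
                      exe="LinkCheck.exe", run=subprocess.run):
    """Dump the site text into repo_dir and commit it if it changed.

    Returns True when a new commit was made. The command line used for
    LinkCheck is an assumed form, not a documented one.
    """
    # The exit code is deliberately not checked: the tool may signal broken
    # links with a non-zero code while still writing the dump file.
    run([exe, site_url, "-dump", dump_file], cwd=repo_dir)
    run(["git", "add", dump_file], cwd=repo_dir, check=True)
    # "git diff --cached --quiet" exits with 1 when there are staged changes.
    staged = run(["git", "diff", "--cached", "--quiet", "--", dump_file],
                 cwd=repo_dir)
    if staged.returncode == 0:
        return False  # nothing changed since the last dump
    run(["git", "commit", "-m", f"Site text dump for {site_url}"],
        cwd=repo_dir, check=True)
    return True
```

Called after each deployment, this would build up a history in which git log -p on the dump file shows when any word or paragraph changed.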

Download the new version here: LinkCheck (requires Windows XP or later with .NET 4 installed; unzip and run)

Please consult the original article for more information.

LinkCheck is free, makes no callbacks and doesn't use any personal data. Use it at your own risk. If you like it, please link to this blog from your own blog or post a link to Twitter. Thanks!

Related Stories

Why smarthomes are hard

Why automated learning is hard for a smart home. The perils of over-fitting, under-fitting and how the general unpredictable nature of life makes it hard to build a system that learns your behavior.

ATAN curve for probabilities

In a home automation system we often want to convert a measurement into a probability. The ATAN curve is one of my favorite curves for this as it's easy to map everything onto a 0.0-1.0 range.

Home Construction Advice

Several years ago we did a major remodel. I did all of the finish electrical myself and supervised all of the rough-in electrical. I also put in all of the electrical system and water in our barn. I have opinions ...

T-Mobile home internet

I'm testing a T-Mobile Home Internet device as a backup to XFinity and a way to offload half our monthly traffic to avoid the XFinity 1.2TB cap.

Home Automation Systems as a Graph

Using nodes and links to represent a home and all the devices in it

Showing home status with just a single RGB LED

Multicolored LEDs can convey a lot of information in a small space

A wireless sensor network using Moteino boards

The diminutive Arduino boards include a powerful transmitter/receiver

JSON Patch - a C# implementation

A Quantified House - My Talk to the Seattle Quantified Self Meetup

My talk to the Seattle Quantified Self meetup

Integrating an Android phone into my home automation system

Some new features for my home automation using an Android phone

Before there was the web there was BeebTel

Just thought I should mention that I built a web-like system before the web existed

My first programme [sic]

At the risk of looking seriously old, here's something found on a paper tape

The Internet of Dogs

Connecting our dog into the home automation

Closing down seokeywordsearch.com

The specified password is not correct

Smart home energy savings - update for 2010

SEO Myths

Why don't you trust your build system?

ASP.NET MVC SEO - Solution Part 1

Elliott 803 - An Early Computer

Building sitemap.xml for SEO ASP.NET MVC

Seo for beginners

SEO Keyword Tool in Action

Looking forward to the new year and our new datacenter

Historical note about moving my servers into a datacenter

Measuring website browser performance

Found this great resource on website performance

Second Drobo Update

At this point things were looking up for my Drobo

It's all about disk speed

Why disk speed is the most critical aspect for most modern PCs and servers

Comcast woes and a new monitoring utility

Monitoring a cable modem using its HTML management interface

Core duo desktop machine runs cool

New Home Automation Server
