Serializable base class added to XSDToClasses project

At the prompting (or complaint) of a reader, I’ve shared a base class that I use with my code generator.

The generated code is “pure” in that it doesn’t contain any methods that don’t have to do with the data — it doesn’t implement saving and loading methods. I could create a codemodifier for this, but it could just cause trouble. However, this leaves users writing the saving and loading code over and over, and we can’t have that.

An XSD can express inheritance. Generating save and load methods for every class would cause lots of warnings and duplicated code. So I’ve created a generic base class that contains a bunch of utility methods including the save and load methods. If you want to have these in a class, just inherit the partial class from this base class and you’re ready to go.

There is an example project that shows you how to use it. There’s also a nifty serializable color class. It’s not going to work in every circumstance (for example if  your class is already inheriting), but it should reduce the amount of code you need to write.

You can find the download at google code  here.

Comparing performance of GDAL and Proj.NET

There are two major free/open source projection libraries available for use under .NET: GDAL and Proj.NET.

GDAL has a long history, and is written in C++, so it’s compiled to machine code. The C# bindings are generated by a tool. Calling C++ from C# uses a process called “pinvoke” (Platform Invoke), and a certain amount of overhead is involved (between 10 and 30 instructions per call).

Proj.NET is written entirely in managed code, so there is almost no overhead per call. It’s a much younger project, but has an active development community.

We are interested in comparing the performance of these two projection libraries since my application needs to do spatial transforms, so I did the following experiment.

Preliminaries

  • I wrote a wrapper library in C# with appropriate interfaces that could call both Proj.NET, GDAL, and some of my own hand-rolled projection classes
  • I downloaded free topographic data for Canada from GeoGratis and GeoBase.
    • The data consists of 20 meter contours, roads, lakes, rivers, cities and other basic geographic features for all of Canada.
    • Each file has a variable number of points, lines and polygons of various sizes
  • The data is in the Geographic projection, or EPSG:4326. For display I projected it into EPSG:3857
  • I wrote a C# program to load and convert a subset of 166 files containing 6.8 million items and 211 million points
    • each file contained between a few hundred to 4.2 million items
      • mean number of points: 1.2 million per file
      • mean number of items: 41,000 per file
    • we did the timing on a per-file basis
      • time-per-item derived as an average on a per-file basis
    • we started the timer just before projection started, and stopped after it ended
      • file load times were not included
    • This gives us a time/item value for each file
      • mean time/file was 6.7 seconds
  • We use the System.Diagnostics.Stopwatch class to do the timing.
    • the resolution of the timer is in the ticks or milliseconds range,
    • projections took from 1000 to 6000 milliseconds

The data represents a real-world problem, that of projecting a file in one SRS (spatial reference system) to another. The number items in each file, the type of item, their spatial variability, and the number of points per item all are from a real-world map.

For the GDAL library, we loaded all the points for a single item into an array, and marshalled for pinvoke. When executing a platform invoke, it’s obviously better to have fewer calls with more marshalled data, but I exerted no special effort to optimize this. We called the projection function on a per-item basis, so if an item has one or 1 million points, it would be a single function call. The same is true of the Proj.NET library.

Results

This graph isn’t too surprising. GDAL has overall better performance, as a numerically intensive, but machine-coded library should. At first glance  it may seem that the difference in performance is greater with a larger number of points, but the above is an absolute difference.

This chart compares the ratio of the difference in time between Proj.NET and GDAL. When expressed this way you can see that GDAL’s performance is variable, but always better (positive). In fact, the mean performance increase is 20.9%

I suspect that the variability in performance for smaller number of points is attributable to the number of points per item. As I mentioned in the intro, the points are batched for projection per item. When charted there appears to be more variability for items with from 15 to 25 points. Above this number, the % difference in speed between GDAL and Proj.NET appear so be fairly stable. I am not too interested in the cause of the variability since it is all positive (faster) with some projections happening almost twice as fast as with Proj.NET

Conclusion

The performance of GDAL and Proj.NET was tested against a real world data set. We projected 166 files with 6 million items and a total of 211 million points. We compared the time to project points on a per-item basis. The comparison shows that GDAL is consistently about 20% faster than Proj.NET, with some projections being much faster for items that have between 15 to 25 points per item.

More info:

The entire data set and all charts are available here.

Installing an MVC Application in a subdirectory of WordPress

For several reasons I have WordPress installed in the root directory of my hosted web space. I’ve been experimenting with the new ASP.NET MVC framework, Entity Framework 4, SQL Server Compact Edition 4 and other new technologies, and I wanted to do a test deployment. I was quite surprised when, after deploying to a subdirectory of my web site I could not bring up the page; WordPress was attempting to handle the request even though the path was clearly to the subdirectory.

The culprit: WordPress’s  pretty permalinks feature. I’ve ranted in the past about how there is a fetish for URLs, and how so much effort, for better or for worse, is given toward making them more human readable. In the case of a WordPress install on an IIS7 Windows server, the permalink feature is achieved using the Url Rewrite Module, version 2.

The module is configured using the web.config file, and the configuration block for the Url Rewrite engine looks like this:

<system.webServer>
  <rewrite>
    <rules>
      <rule name="wordpress" patternSyntax="Wildcard">
        <match url="*"/>
        <conditions>
          <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true"/>
          <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true"/>
        </conditions>
        <action type="Rewrite" url="index.php"/>
      </rule>
    </rules>
  </rewrite>
</system.webServer>

The rewrite rule tells IIS7 to intercept all requests of a certain form and route them to index.php. This is not what I want, I would like requests that ask for a directory that actually exists to go to that directory. In fact, I don’t know how to do this, nor do I have the time to learn the complex syntax of the Url Rewrite engine. However, using the examples given in several places on the web I figured out how to tell the Rewrite engine to direct certain requests to the proper directories.

The solution: add the following rule before the WordPress rule (shown above)

<system.webServer>
  <rewrite>
    <rules>
      <rule name="new_directory" stopProcessing="true">
        <match url="new_directory" />
      </rule>
      <rule name="wordpress" patternSyntax="Wildcard">
        <match url="*"/>
        <conditions>
          <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true"/>
          <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true"/>
        </conditions>
        <action type="Rewrite" url="index.php"/>
      </rule>
    </rules>
  </rewrite>
</system.webServer>

This rule tells the rewrite engine to take requests for “new_directory” and ignore them completely – it stops processing the rules once this rule is matched, and allows the request to be handled by the code in the subdirectory. There were a few other tweaks that I had to do that were specific to my web host; it you’re interested, comment below and I’ll provide the details.

XsdToClasses Open Sourced

The XsdToClasses tools, which I’ve been using for most of my software development for many years, has been published as open source at Google Code. The license is the GPL V3; note that since the XsdToClasses tool is a code generator, the tool itself is GPL, but the code it generates is most definitely not; the output from the program can be under any license you like.

I use this tool so much, and it has saved me many hours of development time over many years. May it serve you well also.

QuadTree

Introduction

QuadTree is a spatial partitioning strategy used to make queries on relationships between 2D spatial data such as coordinates in a Geographic Information System (GIS), or the location of objects in a video game. For instance, you may need to know all the objects within a region on a map, test whether objects are visible by a camera, or optimize a collision detection algorithm.

The QuadTree is so named because it recursively partitions regions into four parts, with leaf nodes containing references to the spatial objects. Querying the QuadTree is a function of traversing the tree nodes that intersect the query area.

The OctTree is the analogous structure used for 3 dimensional problems.

For a masterful collection of demos and variations on the QuadTree and other spatial indexing methods see Frantisek Brabec and Hanan Samet’s site, or use the references at the end of this article.

Background

There are many spatial partitioning methods, each with the goal of providing an efficient way of determining the position of an item in a spatial domain. For example, a database query can be considered as a graphical problem. Consider a query on a database containing date of birth and income: a query against all people between 35 and 50 years of age and incomes between 30,000 and 60,000 per year . When plotted on a 2-dimensional chart, the points representing age and income are spatially distributed on the surface of the chart. In the same way, a map of all the restaurants in the city of Vancouver are spatially distributed. In the case of explicitly spatial data, the X and Y coordinates are Latitude and Longitude, but in all other ways the queries on the 2 dimensional data are same: they are 2 dimensional spatial queries.

Several spatial indexing methods are more efficient in time and space, and are easily generalizable to higher dimensions. However, the QuadTree is specialized to the 2D domain, and it is easy to implement.

Algorithm

The general strategy of the QuadTree is to build a tree structure that partitions a region recursively into four parts, or Quads. Each Quad can further partition itself as necessary. A pre-requisite is that you must know the bounds of the area to be encompassed; the basic algorithm does not lend itself to the addition or removal of areas under consideration without rebuilding the index.

Insertion

When an item is inserted into the tree, it is inserted into a Quad that encompasses the item’s position (or spatial index). Each Quad has a maximum capacity. When that capacity is exceeded, the Quad splits into four sub-quads that become child nodes of the parent Quad, and the items are redistributed into the new leaves of the QuadTree. Some variations set the maximum capacity to one, and subdivide until each leaf contains at most a single item (Adaptive QuadTree).

Querying

To query a QuadTree for items that are inside a particular rectangle, the tree is traversed. Each Quad is tested for intersection with the query area. In the diagrams below, the quads are blue and the query area is yellow.

Quads that do not intersect are not traversed, allowing large regions of the spatial index to be rejected rapidly.
Quads that are wholly contained by the query area have their sub-trees added to the result set without further spatial tests: this allows large regions to be covered without further expensive operations.
Quads that intersect are traversed, with each sub-quad tested for intersection recursively.

When a Quad is found with no sub-Quads, its contents are individually tested for intersection with the query rectangle

Other Operations

Other operations on the QuadTree could include

  1. Deletion: an object is removed from the QuadTree, empty quads are removed
  2. Merge: two QuadTrees are merged, indexes are rebuilt
  3. Nearest Neighbour: common to more advanced spatial indexes, a Query could ask for the nearest neighbours to a given object. A simple implementation would be to take the object’s bounding rectangle and inflate it by an amount based on the the neighbour proximity. Objects in the result set would be sorted by increasing distance.

These operations are not demonstrated in this code.

Variation

This implementation of the QuadTree has the following variations:

The QuadTree has been changed to index items with rectangular bounds rather than points. This allows it to be used with lines, and polygons.

  • On insertion, new quads are created until there are no Quads able contain an item’s rectangle. IE: the item is inserted into the smallest quad that will contain it.
  • There is no maximum number of items in a Quad, there is a minimum Quad size (necessary to avoid massive tree growth if an item happens to have a very small area).
  • Because the Quad an item is stored in is related to the size of the item, both leaf nodes and parent nodes store items.
  • The QuadTree’s performance will be severely impacted if there are many large items.
  • The Quadtree’s performance will be best when the size of most items are close to the minimum quad size.

After writing this code I find that this particular variation bears a striking resemblance to the “MX-CIF QuadTree“.

Note: there are other operations on QuadTrees such as deleting a node, or find the nearest neighbour. These are not supported in this implementation

The following two diagrams show the spatial relationship of the QuadTree with the tree structure. The coloured regions represent objects in the spatial domain. Those that are entirely within a quad are shown in the tree structure in their smallest enclosing quad. You can see that the green shape, since it intersects two of the highest level Quads and does not fit into either is placed in the root quad. The red and purple shapes are placed in child nodes at level one since they are the largest enclosing Quads. The blue shape is at level three along with the orange shape. The Yellow shape is at level four. This tree is adaptive in that it does not create quads until insertion is requested.

figure 1: spatial objects partitioned using quads
figure 2: tree structure partitioning from figure 1

Using the Code

The QuadTree class is a generic class. The generic parameter has a restriction that it must inherit from the IHasRect interface which defines a property Rectangle. Creating a QuadTree requires an area, the demo application uses the main form’s ClientRectangle.

QuadTree<Item> m_quadTree = new QuadTree<Item>(this.ClientRectangle);

Inserting items into the QuadTree is done on a left mouse click, Querying items in a QuadTree is done with a right mouse drag

private void MainForm_MouseUp(object sender, MouseEventArgs e)
{
    if (m_dragging && e.Button== MouseButtons.Right)
    {
        m_selectedItems = m_quadTree.Query(m_selectionRect);
        m_dragging = false;
    }
    else
    {
        Random rand = new Random(DateTime.Now.Millisecond);
        m_quadTree.Add(new Item(e.Location, rand.Next(25) + 4));
    }
    Invalidate();
}

Run the demo application, and left click anywhere in the client rectangle: an object is inserted at the click point with a random size. Right-click and drag: a selection rectangle is created. Release the mouse button: the QuadTree is queried with the selection rectangle. The QuadTree renderer draws the QuadTree nodes and the objects in the QuadTree in random colours. It also draws the selection region and highlights the selected nodes.

Performance:

There are two components of QuadTree performance: insertion and query. Insertion can be very expensive because it involves several intersection tests per item to be inserted. The number of tests depends on the size of the region (the root of the QuadTree) and on the minimum Quad size configured. These two numbers have to be tuned per application. Loading many items into the QuadTree (bulk load, or indexing) tends to be very CPU intensive. This overhead may not be acceptable; consider storing the QuadTree structure on disk (not covered in this article).

The QuadTree is designed to be faster at querying the spatial domain than iteration, but the performance of the index depends on the distribution of objects in the domain. If items are clustered together, the tree tends to have many items in one branch which defeats the strategy of being able to cull large regions, and reduce the number of intersection tests. The worst case performance happens when all objects are in one small cluster that is the same size as the smallest Quad; in this case the performance of the QuadTree will be slightly worse than just iterating through all objects.

If items are uniformly distributed across the spatial domain, performance is approximately O(n*log n)

Points of Interest

  • Generic implementation; allows you to used it with any class that implements IHasRect interface.
  • Colour used to draw the node is stored in a Hash Table; allows the colour of the Quad on screen to be constant over the life of the QuadTree
  • In the QuadTreeRenderer class, note the anonymous delegate used to draw the QuadTreeNodes; allows the QuadTree to be tested, and visualized, without adding specific code to the class to do so.

History

  • Initial version with regions and simple Insert and Query operations, demo application

References

  • H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, Reading, MA, 1990. ISBN 0-201-50255-0
  • H. Samet, Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS, Addison-Wesley, Reading, MA, 1990. ISBN 0-201-50300-0.
  • Mark de Berg, Marc van Kreveld, Mark Overmars, Otfried Schwarzkopf, Computational Geometry: Algorithms and Applications, 2nd Edition, Springer-Verlag 2000 ISBN: 3-540-65620-0
Follow

Get every new post delivered to your Inbox.