Loading census data into SQL Server’s Geography Spatial Type
should have been very straightforward. Simply
download the shapefiles from the census ftp site in batch mode, unzip, then
import to a SDE feature class with the geometry storage type set as “GEOGRAPHY”.
Unfortunately, I was not that lucky. During the import process with ArcCatalog,
the 8+ million blocks failed after loading around 2.5 million records with a variation
of the following error:
Msg 6522, Level 16,
State 1, Line 1
A .NET Framework
error occurred during execution of user-defined routine or aggregate
"geography":
System.ArgumentException:
24200: The specified input does not represent a valid geography instance.
System.ArgumentException:
at Microsoft.SqlServer.Types.SqlGeography.ConstructGeographyFromUserInput(GeoData
g, Int32 srid)
at
Microsoft.SqlServer.Types.SqlGeography.GeographyFromText(OpenGisType type,
SqlChars taggedText, Int32 srid)
.
Translation: There was some feature in the data that had a
valid shape according to esri, but not Microsoft.
There are some differences between how esri and Microsoft
want the polygon shapes to be drawn, specifically with the order in which the
vertices are drawn (clockwise vs counterclockwise) for the exterior (outer
boundary) ring(s) and interior (donut hole) ring(s). The first attempt was to create a button
in ArcMap which read the geometry from the SDEBINARY geodatabase, created the Well
Known Text and then geography for each feature (accounting for the correct draw
orientation), and then pushed the results into a SQL Server table using ADO.net.
The geometry creation code looks something like this:
Dim geo As New SqlGeometry()
geo =
Microsoft.SqlServer.Types.SqlGeometry.STGeomFromText(New
SqlChars(New SqlString(myshapecalc)),
4326).MakeValid
mynewrow("Shape") = geo.MakeValid
I was fairly happy
with my first results, until I remembered I wasn’t dealing with multipart
geometry, empty geometry, multiple interior rings, and other geo quirks. After some quick modifications, the code was
done and the results were populated, however I still hadnt solved the problem
of getting my data into GEOGRAPHY. When
I modified the code to pump GEOGRAPHY not GEOMETRY, I had the same
failure. Did a great job of solving the
wrong problem – at least now I have a esri to SQL Geometry table loader.
I was definitely
surprised to realize that you can in fact push unprojected data into the
geometry data type, using the SRID from a geographic coordinate system. I am not really sure why this is not an
acceptable approach, other than geography is typically for Lat / Long
coordinates, and geometry is for feet / meters units. There are definitely stronger limitations on
what can be pushed into each – geography being much more stringent. One we realized this, I pushed all the
polygons (WGS 84 projection) into a geometry dataset using ArcCatalog, then
tried to convert the data to geography from there.
One recommendation we
found was to use the geometry STUnion and MakeValid functions to try to fix the
shape’s validity from a geography perspective, which can be seen here:
geography::STGeomFromText(shape.MakeValid().STUnion(shape.STStartPoint()).STAsText(),4326)
This is taking the source
polygon, running the MakeValid() operation, then unioning the polygon with it’s
first vertex, and finally converting the result to geography. Doing so re-orders the vertices so that they
are drawn in the correct order.
Using this query, I
was able to start pumping Geography records, unfortunately we still found that
the conversion failed.
At this point, I
needed to isolate the record or records that were causing the process to fail,
so I created this query:
declare @objectid int
Declare rowcursor CURSOR FAST_FORWARD FOR
select objectid
from dbo.CENSUS_2010_TABBLOCK_GEOMETRY_IMPORT2
order by objectid
OPEN rowcursor
FETCH NEXT FROM rowcursor
INTO @objectid
WHILE @@FETCH_STATUS = 0
BEGIN
declare @sqlinsert
varchar(max)
set @sqlinsert
= '
insert into
dbo.CENSUS_2010_TABBLOCK_GEOGRAPHY
select objectid,
blkidfp00,
geography::STGeomFromText(shape.MakeValid().STUnion(shape.STStartPoint()).STAsText(),4326)
from
dbo.CENSUS_2010_TABBLOCK_GEOMETRY_IMPORT2
where objectid = ' + convert(varchar(255),@objectid)
begin try
exec(@sqlinsert)
end try
begin catch
insert into
geography_failures
select @objectid
end catch
FETCH NEXT FROM rowcursor
INTO
@objectid
END
CLOSE rowcursor
DEALLOCATE rowcursor
Here, I am taking all
input records, iteration thru each, attempting to push the input row into a
geography table, and where that doesn’t work, dumping the result into an error
table. This was very slow to run (1 day
21 hours), but it ended up showing me the feature that was causing the problem:
POLYGON ((-73.539727170912329
41.134362625800748, -73.534610028565623 41.142645381690613, -73.538288255665464
41.136691869856634, -73.539808109894977 41.1357116088447, -73.540023947182021
41.134416585122494, -73.53999696752112 41.134416585122494, -73.539727170912329
41.134362625800748))
On a map, a pretty standard polygon with a pretty
ugly and unneeded set of vertices shooting off in one direction.
Unfortunately, there is not really a clean way
of handling this shape, as far as I can determine, other than to manipulate the
shape coordinates, and manually load this single feature into the table.
In the end, we have learned there are some
issues with the census 2010 geometry, including gaps, overlaps, and funky
topological discrepancies. We also
learned that it can be a major hassle with significant roadblocks to try to
ditch esri formats when performing spatial operations inside the SQL database. Finally, we re-learned during QA that esri
handles cluster tolerance and spatial resolution, whereas SQL does not. In this instance, esri was giving bad results
along some polygon borders, whereas SQL was giving correct polygon association
every time.
But, this effort is definitely worth it. With proper spatial indexing and other logic,
I was able to assign census blocks to 128 million input records in about 6
hours, which is way faster than any other esri options. And we have some very handy reusable code for
the future.