Well Known Vs. Rational Vector Delivery - Part 2

August 24, 2007 by Bill Thorp

Today I coded up a little toy application.  It limits the number of vertices displayed according to zoom level.  Very simple.  It’s the pre-game to tiled vector datasets.

[Screenshots: andysnap_006.png, andysnap_005.png, andysnap_004.png, andysnap_003.png, andysnap_001.png, andysnap_002.png]

Winners:  NetTopologySuite and SharpMap.  They let me write this in under 80 lines of code.

Losers: Me!  I skipped out on rock-climbing, a movie, and beer to write this.

Well Known Vs. Rational Vector Delivery

July 17, 2007 by Bill Thorp

I am in love with MapShaper. Go play with it right now. The “Simplification Level” slider on the bottom lets us get a great feel for what fast vector simplification algorithms can do. Watching Brazil drop from 33,000 vertices to a few hundred in n*log(n) time is simply sexy.

For those of us concerned with thin-client vector rendering speeds, it’s a short mental jump to realize the utility of simplified geometries. Simply put, non-optimized geometries make for slow or dead thin clients. Traditional web mapping has addressed this by not addressing it: rasterizing, which simplifies vectors to the ultimate level of screen pixels. However, with rasterization we lose the ability to dynamically interact with vectors on the client, often a choice we’d prefer not to make.

Given a bird’s eye zoom-level, it’s perfectly rational to cull vertices when rendering vectors. However, rendering performance is only one part of the equation. MapShaper pushes ~10 bytes per vertex. With a 500×500 pixel map, 5 vector layers, and 2% on-screen data-density per layer, I can imagine a reasonably performant ~250kB vector map. The problem, then, is those 1MB+ vector datasets, where MapShaper must first push the entire dataset to the client before simplification. Practical use of bandwidth is an issue here.
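The ~250kB figure can be reproduced with a little arithmetic. This is a back-of-envelope sketch only: it assumes "2% data-density" means roughly one vertex for every 50 on-screen pixels in each layer, which is my reading rather than something the numbers spell out.

```python
# Back-of-envelope payload estimate for a client-side vector map.
# Assumption: "2% data-density" = vertices covering 2% of the pixels, per layer.
MAP_WIDTH_PX = 500
MAP_HEIGHT_PX = 500
LAYERS = 5
DENSITY_PERCENT = 2      # assumed meaning, see above
BYTES_PER_VERTEX = 10    # rough MapShaper figure quoted in the post

pixels = MAP_WIDTH_PX * MAP_HEIGHT_PX                   # 250,000 pixels
vertices_per_layer = pixels * DENSITY_PERCENT // 100    # 5,000 vertices
total_vertices = vertices_per_layer * LAYERS            # 25,000 vertices
payload_bytes = total_vertices * BYTES_PER_VERTEX       # 250,000 bytes

print(f"{total_vertices} vertices, about {payload_bytes // 1000} kB")
```

Under that reading, a simplified map stays around a quarter of a megabyte, while an unsimplified 1MB+ dataset costs at least four times as much before the client can do anything with it.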

One casual solution, then, is to have a server-side process simplify the geometry. Yes! Go ahead and load that 1MB+ dataset into your webserver’s RAM and crunch the hell out of it! Enh… no. Follow the pipeline up, and you’ll find your WKB/WKT database sitting there, trying to look innocent. This is a data storage question, and raster tile-caches can offer some guidance.

This is also a data storage question with well defined solutions. Vector quad-trees make huge sense, especially if coordinated with the levels-of-detail in an accompanying raster tile-cache. An end-to-end quadtree approach is intriguing not only for display, but because it intrinsically optimizes various spatial operations. The downside to quad-trees, however, is a lack of robust existing solutions, and my lack of formal education on the subject. Those who can, do; those who can’t, blog (troll?) about it.

Lacking the formal knowledge, I have devised a primitive test for my database theories. Instead of using a quad-tree approach, I’ll assign scale-levels via Douglas-Peucker line simplification. The following brain-dead data structure will be used to house a polygon dataset: { feature_id, point_order, x, y, scale_lvl }. Using plain SQL, I should be able to query into my dataset à la “SELECT … WHERE x between(…) and y between(…) and scale_lvl … ORDER BY point_order”. I’ll plan on using MonoGIS and NetTopologySuite to read in a shapefile and perform the line simplification. We can expect this match-off to last at least a couple of rounds.
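As a rough illustration of the idea, here is a minimal Python sketch rather than the planned MonoGIS/NetTopologySuite code. The table name `vertices`, the tolerance values, the sample ring, and every function name are made up for the example, and the simplifier measures distance to the segment rather than to the infinite line, a common variant. Each vertex gets the coarsest level at which Douglas-Peucker still keeps it, so a plain `scale_lvl <= ?` filter returns a simplified ring.

```python
import math
import sqlite3

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (handles a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker_keep(points, tolerance):
    """Return the indices Douglas-Peucker keeps at this tolerance."""
    keep = {0, len(points) - 1}
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        worst_dist, worst_idx = 0.0, None
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst_dist, worst_idx = d, i
        if worst_idx is not None and worst_dist > tolerance:
            keep.add(worst_idx)
            stack.append((first, worst_idx))
            stack.append((worst_idx, last))
    return keep

def assign_scale_levels(points, tolerances):
    """tolerances run coarsest to finest; a vertex gets the first level that keeps it.
    Vertices kept at no level get len(tolerances), i.e. full detail only."""
    levels = [len(tolerances)] * len(points)
    for level in reversed(range(len(tolerances))):
        for i in douglas_peucker_keep(points, tolerances[level]):
            levels[i] = level
    return levels

def load(conn, feature_id, points, levels):
    """Store one ring in the { feature_id, point_order, x, y, scale_lvl } layout."""
    conn.execute("CREATE TABLE IF NOT EXISTS vertices "
                 "(feature_id INTEGER, point_order INTEGER, x REAL, y REAL, scale_lvl INTEGER)")
    conn.executemany("INSERT INTO vertices VALUES (?, ?, ?, ?, ?)",
                     [(feature_id, i, x, y, lvl)
                      for i, ((x, y), lvl) in enumerate(zip(points, levels))])

def query(conn, xmin, ymin, xmax, ymax, max_level):
    """Vertices inside the box that are visible at max_level, in drawing order."""
    rows = conn.execute(
        "SELECT x, y FROM vertices "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND scale_lvl <= ? "
        "ORDER BY feature_id, point_order",
        (xmin, xmax, ymin, ymax, max_level))
    return rows.fetchall()

# A slightly wobbly square, closed ring (first point repeated at the end).
ring = [(0, 0), (5, 0.2), (10, 0), (10.3, 5), (10, 10), (5, 9.95), (0, 10), (0, 0)]
levels = assign_scale_levels(ring, tolerances=[1.0, 0.1])
conn = sqlite3.connect(":memory:")
load(conn, 1, ring, levels)
coarse = query(conn, -1, -1, 11, 11, max_level=0)   # just the four corners, closed
finer = query(conn, -1, -1, 11, 11, max_level=1)    # adds the two larger wobbles
```

One caveat with this layout: the bounding-box filter works per vertex, so an edge that crosses the viewport loses its off-screen endpoint. A real implementation would need to pad the box or filter per feature.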

Winners: MapShaper, if only for its good looks.

Losers: Your everyday spatial database. To my knowledge, they’re just not built to auto-magically return simplified geometries.

Python vs. IronPython : Round 2

June 19, 2007 by Bill Thorp

Yesterday I let IronPython battle it out with TileCache, which was written for CPython and CGI.  IronPython was stumbling — not seeming like a drop-in replacement for CPython yet — but I didn’t quite call the fight.  I wanted to port TileCache’s CGI library calls to ASP.NET calls and see what I came up with.

After trivially porting TileCache’s CGI code, I discovered a new issue.  IronPython lacks support for “os.access”.  TileCache requires “os.access”.  I’m not sure where this method normally lives.  It’s not defined in the “os” module of the Standard Library, but it works from CPython.  AFAIK, it’s magic.
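For what it’s worth, the “magic” has a likely explanation: in CPython, “os.access” is implemented in a compiled, platform-specific module (“posix” or “nt”) and merely re-exported by “os”, which is exactly the kind of module IronPython cannot import. The sketch below, written for a current CPython rather than the 2007 versions discussed here, shows how to check that, plus a hypothetical `can_read` fallback that approximates a readability check when “os.access” is missing. The helper is my own illustration, not part of TileCache or FePy.

```python
import os
import tempfile

# On CPython this prints 'posix' or 'nt': the function lives in a compiled module.
print(os.access.__module__)

def can_read(path):
    """Hypothetical stand-in for os.access(path, os.R_OK).

    Falls back to simply trying to open the file when os.access is absent.
    The fallback is only an approximation (it reports False for directories).
    """
    if hasattr(os, "access"):
        return os.access(path, os.R_OK)
    try:
        with open(path, "rb"):
            return True
    except (IOError, OSError):
        return False

# Small demonstration with a temporary file and a path that does not exist.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    existing_path = handle.name
missing_path = existing_path + ".does-not-exist"
readable = can_read(existing_path)
unreadable = can_read(missing_path)
os.remove(existing_path)
```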

With a bit more research, I stumbled upon FePy.  “The FePy project aims to provide enhancements and add-ons for IronPython, an implementation of Python programming language.”  They’ve already addressed my “hashlib” problem, providing a custom .NET module for it and for “socket.”  They’ve also identified and patched 3 issues in IronPython, and 6 issues in the Python Standard Library when used with IronPython.

FePy’s patches should fix my issue.  They’ve added “os.access” directly into the IronPython DLL.  However, FePy lists their patched IronPython DLL version as 1.1, while ASP.NET Futures seems to use a Version 2.0.  The thought “with a little more effort, I can make this work” was beginning to feel familiar.  I’ll wait till FePy moves to 2.0.

Winners:  the FePy developers, notably Seo Sanghyeon, for relentlessly discovering and fixing IronPython bugs before MS can

Losers:  IronPython, for stumbling twice on a mere 44k of code (+400k of relevant, standard libraries)

Python vs. IronPython : TileCache

June 18, 2007 by Bill Thorp

“There is a maker’s serial number 9906947-XB71. Interesting. Not fish. Snake scale.”  Blade Runner has shown how to tell a real snake from a replicant snake:  (Sys.Version).  Beyond the superficial, what happens when we take the animoid IronPython and attempt to replace CPython in a real-world battlefield?  I chose MetaCarta’s TileCache.

For those who don’t know, IronPython is MS’s implementation of Python on top of .NET.  It’s pre-release software in the context of running it from ASP.NET, and benchmarking it is therefore against the pre-release license.  I won’t be doing that; nor do I need to.  The purpose is to see if IronPython is a drop-in replacement for CPython.  To be fair, I know next to nothing about Python in either implementation.

It took me less than 30 minutes to set up TileCache using CPython and CGI on MS’s IIS.  The directions weren’t 100%, but anybody with a clue can fill in the blanks.  Hint:  IIS has no default CGI handler. 

IronPython right now is much trickier.  I put about two hours into it, and didn’t get TileCache running in that time.  I did finally get everything to compile, but it took some work.  The basic problem is that IronPython won’t import compiled C/C++ modules.  The IronPython developers are aware of this, and will supposedly replace these compiled modules with .NET equivalents as time permits. 

For now, IronPython doesn’t ship with standard library equivalents, and must reference CPython’s modules, many of which in turn reference compiled C/C++ modules.  TileCache references script modules that reference the “hashlib” compiled module, even though I’m pretty sure that TileCache doesn’t use these. 

After a long, uneducated process to track down where “hashlib” was being imported, it was easy to comment out.  After this, TileCache would run, but it didn’t do anything.  I was just getting back zero-byte responses, because TileCache’s CGI-specific code needs porting to the ASP.NET environment.  To a .NET developer, the nice part is that Visual Studio’s debugging tools made it simple to track down this problem.
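A more educated way to find out who imports a given module is to wrap the interpreter’s import hook and record the caller. The sketch below targets a current CPython 3 (the 2007-era equivalent would patch `__builtin__` instead), and `trace_imports_of` is a name invented for this example. It only catches imports that actually execute while the hook is installed, so modules already loaded will not re-run their import statements.

```python
import builtins

def trace_imports_of(target, found):
    """Record the __name__ of every module that imports `target`.

    Appends importer names to `found` and returns a function that
    puts the original import hook back.
    """
    real_import = builtins.__import__

    def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
        if name == target or name.startswith(target + "."):
            found.append((globals or {}).get("__name__", "<unknown>"))
        return real_import(name, globals, locals, fromlist, level)

    builtins.__import__ = tracing_import

    def restore():
        builtins.__import__ = real_import

    return restore

found = []
restore = trace_imports_of("hashlib", found)
try:
    import hashlib  # stands in for importing the application's entry module
finally:
    restore()

print(found)  # names of the modules whose import statements asked for hashlib
```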

Winners:  CPython, because it already works just fine

Losers:  Impatient developers?  People who waste time evaluating beta products?

MapWrecker 2.0: Change vs. Progress

June 7, 2007 by Bill Thorp

In the spirit of new releases, I hereby rechristen this site MapWrecker 2.0

There will even be a new theme:  battles!  Every title will have a “vs” in it.  Every post will be a fight-tastic feature comparison of GIS thingamabobs.  MapWrecker 2.0 will offer streamlined workflows (I’ll declare winners and losers at the end) and increased cohesion for greater niche market capitalization (I don’t know what this means).

Winner: Change! 

Losers: Progress?

GeoCommons: James vs. Steve

June 6, 2007 by Bill Thorp

Steve’s Little World and Spatially Adjusted are having a tiff over the usefulness of the GeoCommons services.

Need I remind you that “destruction” is one of the sub-topics of this blog?  Like a moth to a flame!!  I must weigh-in!  I’ve recently decided I’m old-school GIS, not one of those NeoGeography leotard-wearing types.  But tasteless sarcasm takes no prisoners — let the quips begin!

 

First of all, let me remind everyone of Executive Order 12906.  It has this little line “each agency shall document all new geospatial data it collects or produces, either directly or indirectly, using the standard under development by the FGDC.”  I’m a true American; I love my country, and hate my government.  If they say use FGDC metadata, I say metadata is for pansies.

Bush needs to make that vital decision to privatize the FGDC, and throw the no-bid-contract at Google or FortisOne (who made GeoCommons).  We need more entrepreneurs like FortisOne “applying its deep expertise in geographic analysis and visualization to the needs of clients such as the Department of Homeland Security.”  After all, Homeland Security isn’t a bunch of sissies… Name, Tags, and Description (GeoCommons’s fields) are metadata enough for those tough guys.

And besides, doing things right and following the rules is more than dull, it’s hard!  ESRI can barely do it, and we all love ESRI more than God.  Hell, OGC wrote the interoperability standard for cataloging and searching metadata (and by implication, data).  [There ain't no money in standards.]  Yeah, and OGC wrote it.  That means it’s a painfully long, stupid document, not worth reading.  So not only is the metadata schema completely ignored, but the cataloging protocol is too!

These specs are ignored because nobody really cares about details like spatial projections when we all know that Google only lets us use one.  And Google is clearly right, because they’re really popular and have a ton of money.  They don’t play by the rules, they are the rules.  I love them.  They’re getting the job done.

And OGC?  Can’t we just privatize them too?

JSON vs. Flash for Distributed Mapping

June 5, 2007 by Bill Thorp

A few years ago I wrote a “Flash connector” for the ArcIMS HTML Viewer.  This “connector” took advantage of Flash’s security sandbox model, which allows cross-domain data connections under special, protected circumstances.  The point was to allow ArcIMS to get around JavaScript’s “same-domain policy” limitation — to allow distributed web-map calls without a proxy. 

Really the point was to put an ArcIMS HTML viewer on my HTML-only web-resume host, but I digress.

 In the browser, ArcIMS’s JavaScript and Flash communicated via Flash’s “External Interface” methods.  ArcIMS could post ArcXML to and from other domains.  Flash’s security model requires a “crossdomain.xml” be published on the remote-domain.  Jan Bliki’s excellent Flash + ArcIMS work at the EEA offered both the ArcIMS backend and “crossdomain.xml” to test all of this against.

The whole point, again, was to get around JavaScript’s “same-domain policy” limitation.  Fast forward to 2007 where JSON is commonplace.  One accepted JSON paradigm is On-Demand Javascript, a special case where JSON-encoded data may be accessed across domain boundaries.  Essentially, an HTML <script> tag may be dynamically generated in the DOM, thus loading the JSON data.  There is some debate whether this is a feature, or a bug.
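To make the mechanics concrete, here is a minimal sketch of the server half of that trick, in Python. The page on domain A injects something like <script src="http://other.example/data?callback=handleTiles">, and the server on domain B answers with JavaScript that calls the named function with the JSON as its argument. The function name `jsonp_body`, the `callback` parameter, and the URL are all hypothetical choices for this example.

```python
import json
import re

# Only allow plain JavaScript identifiers as callback names, so the
# response cannot be turned into arbitrary script by the caller.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def jsonp_body(callback, payload):
    """Wrap JSON-serializable data in a call to the requested callback."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("callback must be a plain JavaScript identifier")
    return "%s(%s);" % (callback, json.dumps(payload))

# The browser executes this text as script, which hands the data to the page.
body = jsonp_body("handleTiles", {"layer": "roads"})
print(body)
```

Because the browser is simply fetching a script, the request is always an HTTP GET, which is the limitation that matters for anything wanting to send ArcXML.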

For a moment, I considered rewriting my “Flash connector” as a “JSON connector”.  However, the lack of domain restrictions on DOM-based JSON requests does not apply to XMLHttpRequest JSON requests.  This means that cross-domain JSON data may only be loaded via an HTTP “GET”.  Hence, JavaScript still cannot “POST” data to remote domains.

Overall?  If you’re not interested in proxying, 2-3 year old Flash technology is still better than JavaScript.  Why?  The security model.  Microsoft has been hinting at giving Silverlight a permissive cross-domain policy.  I suggest they do so.

Let me know in the comments if there is any interest in me posting a “Flash-connector” walk-through.

Best GIS Infrastructure?

June 1, 2007 by Bill Thorp

If you could take any existing GIS products, and not have to worry about personnel / training / cost / platform considerations, what would be your ideal GIS infrastructure?

I was thinking I’d like Oracle Spatial + ArcInfo + GeoNetwork + MapServer + MetaCarta’s TileCache + OpenLayers.

It’s funny: I rarely/never use these components.  Perhaps I chose them because I don’t know their faults.

The Web Map Unit of Work

May 30, 2007 by Bill Thorp

Last week I got a little depressed. I realized my client/stakeholders:

A) Primarily cared about a single dataset

B) Wanted a web page with a map on it

This is easy; it’s mundane. It’s Google Maps’s wet dream: 90% of the users have been brainwashed into thinking the earth is 2, maybe 3 datasets in Plate Carrée. It’s a J O B. Whatever features I add to it, putting a single use, single projection, single datatype online still just makes me bored.

GIS has computational complexities that make it interesting! Take those away… use a single projection, use a single map service, assume no repeat workflows… what do you have? A web map? A web map and some query logic? Perhaps the unit of work for a dull job?

I got into GIS to get away from the monotony of database/forms work. All that work is the same: UI -> validation -> database. Now I’m having visions of a GIS world that has remarkable similarities. Tile Cache + UI -> Map. The reduction in computational complexity necessary to achieve Google Maps style performance has resulted in a similar reduction in complexity of user expectations. Simple and fast is winning the hearts and minds of our clients.

And it just might make our jobs suck.

R(D)COM Server NSFW

May 15, 2007 by Bill Thorp

Having the R stats package available to your web/GIS applications via DCOM sounds pretty sweet.  It does statistics, graphs, and has quite a few GIS related add-ons.

A co-worker was working with a solution which calls R by invoking RTerm.  I stumbled across R(D)COM and thought a DCOM implementation might provide a more reusable solution.  I tried to get him to evaluate/benchmark R(D)COM, but eventually took up the torch myself.

Unfortunately, it seems to be crap. 

It takes 1 full second to get an R instance.  The least-chatty cross-process IO is barely faster than disk IO.  Cross-process IO hits a scaling wall well before the OS runs out of RAM.  Sending large datasets via the DCOM proxy eats more CPU time than the actual processing within R.  The crowning fault is that R(D)COM fails to destroy one of its processes about 5% of the time.  Props to Greg Hill for having way better intuition than me.