GeoSpatial Software Development, Web Mapping, Geographic
Database Design, Spatial Data Compilation, Processing, &
Advanced Analysis - Specializing in Open Source Technology

Diverse GIS Market Space

+ Practically hourly arrival of new solutions

+ Exponential growth of internet-based services over traditional desktop applications

+ GPS-enabled mobile devices

+ Data In The Cloud

+ Convenient Javascript and other web APIs for mapping, geocoding, and navigation from major vendors (Google Maps, Bing, Esri, Here, Mapquest, TomTom)

+ Geospatial RDBMS

+ The advent of integrated open source GIS products under the umbrella of Open Source Geospatial Foundation (OSGeo)

+ Geospatial standards from Open Geospatial Consortium (OGC)

+ Vast repositories of public geodata

+ Crowd Sourcing

+ Hybrid open-source / proprietary solutions

+ Python, PHP, Perl, Ruby-on-Rails, C/C++, C#,...

Kartanica & Software Integration

Kartanica GeoSystems puts this diversity of geospatial products, formats, services, and capabilities to work for you, rather than having it become a bewildering array of obstacles to overcome. Among the many advantages of working within the geospatial open source ecosystem is the premium placed on the principle of interoperability. Reliance on open file and data formats. Adherence to OGC web service standards. Use of common software libraries. Kartanica exploits already existing technology as a first resort. Custom coding only when necessary.

Kartanica & Software Development

But there are more than two decades of GIS design and development behind the Kartanica logo. This includes innovative (and patented) work in the mapping of outside networked infrastructure for telecommunications and utilities. Understanding the raw geometric processing that is a critical component of all GIS - before it became safely embedded in today's plug-and-play architectures.

-place excessive detail here-

The Kartanica Blog .... Volume : dxdydz .... Issues : 0..n-1


Simplified Water Polygons

(See the (incomplete) article below about Line Simplification.) Trying to reduce the workload for the mapping engine at outer zooms. Main lesson below: "Don't show details that occur inside of one pixel." Extend that to water polygons ( in the Census TIGER dataset, these are 'areawater' ) by not only simplifying the polygons according to the no-details-inside-pixel guideline, but also, don't even attempt to draw anything that is actually smaller than one pixel. In fact, if creating alternate geometry columns at different levels of detail according to an intended map scale factor, don't even create the simplified alternate in the first place where it will never be shown at the scale associated to that column.
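A minimal sketch of the smaller-than-one-pixel test, assuming coordinates in meters and a display of about 100 pixels per inch. The function names and the bounding-box shortcut are illustrative assumptions, not anything from Mapserver or PostGIS.

```python
METERS_PER_INCH = 0.0254

def pixel_ground_size_m(scale_denom, dpi=100.0):
    """Ground distance in meters covered by one pixel at the given map scale."""
    return scale_denom * METERS_PER_INCH / dpi

def worth_storing(bbox, scale_denom, dpi=100.0):
    """True if the bounding box (minx, miny, maxx, maxy), in meters, spans
    at least one pixel in width or height at this scale."""
    minx, miny, maxx, maxy = bbox
    pixel = pixel_ground_size_m(scale_denom, dpi)
    return (maxx - minx) >= pixel or (maxy - miny) >= pixel

# A 100 m x 100 m pond is smaller than one pixel at 1:1,000,000, so no
# simplified alternate geometry would be created for that scale's column.
small_pond = (0.0, 0.0, 100.0, 100.0)
keep = worth_storing(small_pond, 1_000_000)
```

The same pixel size could double as the tolerance when simplifying the polygons that do survive the test.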

Nice trick in Mapserver that isn't well documented: at the LAYER level, add a SCALETOKEN..END block. Give the token a NAME, by convention something like "%scale%", then assign a table of VALUES..END where each row is a SCALEDENOM (the first must be "0", and they are ordered in increasingly large numbers, i.e. smaller scale) followed by some string that can be substituted into the layer's DATA statement. So, where the CONNECTIONTYPE is "postgis", the string substituted for "%scale%" might be something like "25" or "250" and so on, and the DATA statement contains "....WHERE ..... AND size > %scale% AND..."

The use of SCALETOKEN can condense the mapfile down from having to have created multiple LAYERs, each with its own DATA statement plus largely the same CLASSes in each layer. It also gives you the opportunity to ensure that as few rows as possible are returned from the query. This is a big performance win over simply gathering a large number of shapeobj then letting the CLASSes differentiate them into what is displayed or not.
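To make the substitution concrete, here is a small Python emulation of the lookup as I understand it: the row with the largest SCALEDENOM not exceeding the current scale denominator supplies the string. This is not Mapserver code; the table values, the token name, and the DATA template are hypothetical.

```python
import bisect

# Hypothetical table mirroring a SCALETOKEN VALUES block:
# (minimum scale denominator, replacement string), ascending, starting at 0.
SCALE_VALUES = [(0, "0"), (100_000, "25"), (1_000_000, "250")]

def token_value(scale_denom, values=SCALE_VALUES):
    """Return the string of the last row whose denominator is <= scale_denom."""
    denoms = [d for d, _ in values]
    idx = max(bisect.bisect_right(denoms, scale_denom) - 1, 0)
    return values[idx][1]

def expand_data(template, scale_denom, token="%scale%"):
    """Substitute the token in a DATA-style template for the current scale."""
    return template.replace(token, token_value(scale_denom))

# Hypothetical DATA template for a postgis layer.
template = "geom FROM areawater WHERE size > %scale%"
expanded = expand_data(template, 500_000)
```

One layer with one template then stands in for several near-identical layers, and the WHERE clause trims the rows before they ever reach the CLASSes.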


Intuitive Approach to Line Simplification

I'm trying to come up with a common-sense notion of how to know that a digital representation of linear features - roads mostly - contains "Too Much Information": that the drawing engine is spending too much time drawing fine details that no one will really ever see in the displayed map.

One area of thought: before starting to consider the details of how to call ST_Simplify() in PostGIS, whether it can be called on the fly or not (probably not), focus on: "At what scale factor is the engine calculating details that are effectively inside of 1 pixel?"

Suppose your display resolution is about 100 pixels per inch. You have an image showing at 1:5,000,000, which is 1" = ~80 miles. Now let's make our math simpler by zooming out a bit to make 1" = 100 miles, so each pixel is about a mile wide. First, it occurs to me that if a road will appear to be 1 inch long -- so 100 pixels long -- we're probably wasting time if there are more than 100 points in the line.
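The arithmetic above, as a rough Python sketch. The "one vertex per pixel" ceiling is the rule of thumb from this post, not a PostGIS or Mapserver setting, and the function names are made up for illustration.

```python
INCHES_PER_MILE = 63360.0

def miles_per_pixel(scale_denom, dpi=100.0):
    """Ground miles covered by one pixel at a 1:scale_denom display."""
    return scale_denom / INCHES_PER_MILE / dpi

def max_useful_points(line_length_miles, scale_denom, dpi=100.0):
    """Rough ceiling on vertices worth keeping: about one per pixel of
    drawn length, and never fewer than the two endpoints."""
    pixels = line_length_miles / miles_per_pixel(scale_denom, dpi)
    return max(2, int(round(pixels)))

# 1 inch = 100 miles is a scale of 1:6,336,000; at 100 pixels per inch
# each pixel is one mile wide, so a 100-mile road merits about 100 points.
points = max_useful_points(100.0, 6_336_000)
```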


Shared Polygon Boundaries

In cases where I'm displaying polygons from a particular feature type, say, U S States, U S Counties, etc., if I'm drawing a border around the polygons -- and especially if there is some kind of 'dash-dot-dot' line style:

It would be nice to not draw the same boundary section more than once where it is shared between two features ('common border'), not only to save processing time, but also to make sure the line patterns do not draw out of phase with respect to two features sharing the same border.

The out-of-phase problem might be solved internally by the mapping service, or within the data model. Or both? For the mapping service to enforce phasing, it might simply (a) always draw in clockwise or counter-clockwise direction, (b) always restart the line pattern at 'nodes' rather than continuing them across node boundaries (but that might have a disruptive appearance). For the data model, make sure the shared portions of polygon boundaries align on node boundaries.

But can you eliminate the pattern discontinuities with a data model change? Suppose the model extends to 'shared boundary segments are stored once' -- if they are stored once they are more easily drawn once. Dissolve node boundaries within each shared segment.

To most completely dispose of discontinuities (maybe never totally?), rather than restart the line pattern at each surviving node boundary, we might need to record a line pattern state at each node, then continue that into adjoining segments? Might be a lot of trouble to go to for small payoff? Maybe just tolerate the line pattern discontinuities -- the simpler the line pattern, the less disruptive effects at node boundaries.

By the way, some interesting labeling opportunities become available when modeling shared boundaries as a draw-once feature?

Labeling a Lake

Further on 2014/08/20 reference to labeling a lake polygon to match the curvature of the underlying stream... Mapserver and others already fit labels to curved linear features. If the Lake polygon records a curved line for the underlying stream channel, then that line can be labeled with the lake's name.

Problem: you might want to label the widest part of the lake rather than simply labeling the middle of the curved stream channel.

Problem: the lake may inundate enough surrounding land as to erase all or most hints of the underlying stream channel. So the label attached to the channel will look strange.

Problem: might not always be easy to distinguish the 'primary' stream channel from among the multiple branches.

We really want to capture the effective curve of the lake itself and label that, at a location where the lake is widest (or prefer label location at widest point while willing to migrate up and down the larger curve as needed to avoid collision with other labels). Needed: some calculation based on the shape of the lake polygon, considering the size and spacing of the characters in the label. Reminds me of a raster resampling exercise, where the pixel size matches the label's character size?


Label Angle with Polygons

I created a simple function in PostGIS a few years ago to capture the 'predominant' angle of a polygon for labeling purposes. Most handy with labeling parcels, which are often rectangular but not aligned on 0 or 90 degrees. My angle calculation was simply the angle of the longest line in the polygon. Where parcels are involved, it would be nice to bisect the angles of the two longest lines in the polygon, assuming the two longest are opposing each other. Or if there is no particular longest segment, then label angle might default to zero degrees.
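The original was a PostGIS function; what follows is only a Python sketch of the same longest-segment idea, assuming the polygon's outer ring arrives as a list of (x, y) tuples with the first point repeated at the end. The angle is folded into 0..180 degrees because a label reads the same in either direction along the edge.

```python
import math

def predominant_angle(ring):
    """Angle in degrees, folded into [0, 180), of the longest edge of a
    ring given as [(x, y), ...]. Returns 0.0 for a degenerate ring."""
    best_len = 0.0
    best_angle = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length > best_len:
            best_len = length
            best_angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    return best_angle

# A long, thin parcel tilted 45 degrees from the x axis.
parcel = [(0, 0), (3, 3), (2, 4), (-1, 1), (0, 0)]
angle = predominant_angle(parcel)
```

When several edges tie for longest, this version keeps the first one it meets; the bisecting refinement described above is not attempted here.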

This angle calculation is handy in two ways: by aligning the label with the polygon, the label is more likely to fit within the polygon. And the label angle relationship reinforces the association to the polygon.

There are other polygon label schemes that would be helpful: align a name of a lake with its polygon, curving if necessary, and staying over water. Similar for irregular political boundaries, such as Florida: put 'Florida' down the angle of the peninsula, instead of at zero degrees over the centroid, which would drown most of the label in the eastern Gulf of Mexico.

Map Labels 'Seeking a Vacuum'?

Problem: in a map image containing labels for more-or-less point features, such as cities, two large cities sit close to each other and are surrounded by smaller cities or rural areas. In a labeling scheme such as Mapserver's, the two large cities carry identical labeling priority, but one of them will place first by the luck of the draw over its centroid. When the second one attempts to place over its centroid, the first label is already in the way, so the second one does not label.

In a manual labeling exercise, the cartographer is likely to detect the impending collision and push the two labels away from each other while retaining the visual tie between the label and its city (symbol or polygon). A couple of strategies come to mind for this: form an axis on which the two large cities rest, then push each city's label farther toward the ends of the axis. Another consideration is what other labels are expected in the area. Even if the smaller cities surrounding the two large ones may be required to yield their label position to their larger neighbor, it's nice to be able to label the smaller entities also. So the larger cities' labels seek 'open ground' where fewer competing labels are likely to appear (while still anchoring to their respective symbol or polygon).
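The axis idea can be sketched in a few lines of Python. This covers only the "push apart along the shared axis" step, with a hypothetical offset in map units; it does nothing about seeking open ground among the other labels.

```python
import math

def push_apart(city_a, city_b, offset):
    """Return label anchor points for two cities, each moved `offset`
    units away from the other along the line joining them."""
    ax, ay = city_a
    bx, by = city_b
    dx, dy = bx - ax, by - ay
    dist = math.hypot(dx, dy)
    if dist == 0:
        # Coincident points define no axis; leave the anchors where they are.
        return city_a, city_b
    ux, uy = dx / dist, dy / dist
    label_a = (ax - ux * offset, ay - uy * offset)
    label_b = (bx + ux * offset, by + uy * offset)
    return label_a, label_b

# Two cities 10 units apart on an east-west axis, labels pushed 5 units outward.
label_a, label_b = push_apart((0.0, 0.0), (10.0, 0.0), 5.0)
```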

There's the usual difficulty in trying to automate this decision process.

Mapserver takes a stab at some degree of conflict resolution with 'LABEL.POSITION auto' which, depending on the geometry type of the feature being labeled, tries a fixed selection of alternate placements for a feature when a previously placed label is in the way of the preferred position.

A key to implementing the more flexible strategy is to have the labels hold on claiming a permanent position until several best-fit solutions have been attempted.


More on Address-less or Unnamed TIGER Roads

I've been using the U S Census TIGER dataset as a foundation for geocoding for a few years. As a solitary source, TIGER is a mediocre resource in terms of quality and completeness. Among its major weaknesses:
+ the use of theoretical address ranges in place of true address ranges.
+ deliberate obfuscation of the high and low address numbers in those ranges to abide by federal privacy statutes.
+ The Census Bureau is not interested in delivering mail; address info collection was solely to assist in Census data collection and tabulation. As a result, many zip codes are missing, or, from lack of continual coordination with USPS, the TIGER zip code information becomes outdated.
+ The road geometry is sometimes not very reflective of reality.
+ And, the primary topic of this edition: many roads completely lack address range information, or roads are not even given names (and these also lack addressing).

It's a little hard to characterize the amplitude of these deficiencies because of TIGER's sloppy road classification. The scheme itself ( MTFCC, plus whatever you can conclude from road type prefix and suffix ) is relatively strong: MTFCC values S1100 for primary highway, S1200 for secondary highway, S1400 as the grab-bag of residential and rural, etc. S1400 begins as too broad a category, where minor residential roads are grouped with major thoroughfares and county highways and forest service roads and so forth -- but much of this can be resolved by examination of those type prefix and suffix ('Rd' 'Cir' 'Ave' 'FM' 'CR' etc.).

But then, if you highlight the roads with no names and/or addressing, and only do so for S1400, you'll see large numbers of roads that appear to be driveways, alleys, service roads, and so forth. These are given their own MTFCC values in the scheme, but often are dumped into S1400.

Why is this a problem? The exercise is "I want to see roads that probably have residential or other mailing addresses on them but are entered into TIGER without the addressing, or may even have been entered without names." We don't expect driveways and alleys to have addressing, but if they are classified in S1400, then it is difficult to filter them out from the highlights we are making.

One might have some luck filtering (especially the rural) unnamed driveways out of S1400 as:
+ single segment
+ no longer than some threshold, say, 300 m
+ attached on FROM side to a significant road, such as a State Hwy, County Rd, etc.
+ attached to nothing on the TO end.

Then, unnamed S1400 that don't meet the above criteria are candidates for concern, namely, they may contain mailing addresses that TIGER neglected to collect.
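The four criteria above could be sketched as a filter like the following. The Road record, its field names, and the list of 'significant' name prefixes are all assumptions for illustration; in practice these values would have to be derived from the TIGER edges and their node connectivity.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed prefixes marking a 'significant' road; adjust to the data at hand.
SIGNIFICANT_PREFIXES = ("State Hwy", "US Hwy", "Co Rd")

@dataclass
class Road:
    name: Optional[str]             # None or empty for unnamed roads
    segment_count: int              # number of edge records in the road
    length_m: float                 # total length in meters
    from_neighbor_names: List[str]  # names of roads touching the FROM end
    to_neighbor_count: int          # number of roads touching the TO end

def looks_like_driveway(road, max_length_m=300.0):
    """Apply the four criteria: unnamed single segment, short, attached to
    a significant road on the FROM side, attached to nothing on the TO end."""
    if road.name:
        return False
    if road.segment_count != 1 or road.length_m > max_length_m:
        return False
    if road.to_neighbor_count != 0:
        return False
    return any(n.startswith(SIGNIFICANT_PREFIXES)
               for n in road.from_neighbor_names if n)

candidate = Road(None, 1, 120.0, ["Co Rd 33"], 0)
is_driveway = looks_like_driveway(candidate)
```

Unnamed S1400 roads for which this returns False are the ones left over as candidates for concern.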

Is it feasible to fill in missing information, such as names and addresses? For road segments with names but no addresses, if their FROM and TO ends are connected to same-named road segments with full addressing, then we might assume any gap between TO address on one end and FROM address on other end to be filled in with the address range between those values. An unnamed segment might take on the name attached to FROM and TO segments of a single name.


Don't forget GeoNames

All the talk yesterday about SimpleGeo; I've also got the US data from GeoNames. They have world-wide point datasets for free, plus premium services. The GeoNames US data are derived out of the Federal GNIS dataset. Weighted more toward the sorts of things you might see referenced on USGS topo maps - jurisdictional things like states, counties, cities, and lots of land and water features. And schools and shopping centers and churches and...

Pretty easy to convert the tab-delimited downloaded file to PostGIS COPY. Bi-level classification: Feature Class, Feature Code. About 2 million US records, of which about half are Feature Class 'S' for buildings and other human-made structures.

No addresses or zip codes, but each rec marked with state postal abbreviation and county FIPS code.


SimpleGeo US Data

A couple notes on SimpleGeo, a points-of-interest dataset. It was actively maintained until about two years ago, but is a free, no-strings-attached set of more than 12 million points in the US, extracts as a GeoJSON, with a tri-level classification scheme ( feature type, category, subcategory ) and records may have one or more unstructured tags. Weighted toward commercial, but has public places also.

The records contain street address, city, state postal abbreviation, and zip code (mostly 5-digit, a tiny number containing the +4 extension). The presence of the addressing, including zip, makes for some interesting comparisons to other data sets, especially in the area of geocoding:

- there are about 21,000 post offices in SimpleGeo. This comes in handy as a possible augmentation of geocoding from the U S Census TIGER data, as TIGER tends to lack any mention of zip codes that are not found on physical delivery addresses. If any of these fully addressed SimpleGeo post offices are addressed at zip codes not found in TIGER, then geocoding to the actual post office becomes feasible. The US Postal Service Address Information System ('AIS') is constantly updated and contains EVERY zip code, but it does not independently geocode its address range records.

- The remaining millions of US SimpleGeo records are also addressed with zip codes, and many times they are government or other large complexes that have their own zip codes which may not be found in TIGER.

As time passes, some of SimpleGeo will become increasingly out of date, in a few different ways. First, zip codes are fairly volatile, in that USPS alters the delivery data for thousands of addresses each month. This might be such things as the carrier route or the plus-4, and SimpleGeo is not very concerned with either. But sometimes an address will migrate from one 5-digit zip to another. In fact, whole zip codes will be created and destroyed by USPS. So zip codes stored on SimpleGeo records may become stale for that reason.

While many of the SimpleGeo records, especially the public-sector ones, will remain stable, many of the commercial records will become outdated from ordinary moves to new locations, closed businesses, purchased businesses, new development and so forth.


Nice Discovery Over at GDAL/OGR

Reading up on Esri FileGeoDatabase format: downloaded 'Stratmap' Transportation from TNRIS (Texas Natural Resources Information System) as fgdb. Current release OGR optionally builds a Read-Write fgdb driver with the inclusion of a lib from Esri. Turns out that v 1.11 OGR has a Read-Only "OpenFileGDB" driver, requiring no external library.

So if all you want to do is download .fgdb and immediately convert to PostGIS (well, that's what I did), then the READ-ONLY is fine.

Future Topic (one of these days...): Topological Faces & Variations on Generalizations...


Stay Off This Point

In Mapserver Road Labeling, I'd like to keep the 'shield' style labels (Interstates, US Hwys, et al.) away from the points where differently named roads join each other (fancy way to say 'intersections').

How to find these points: Assuming that road features are multi-point (in PostGIS they may be MULTILINESTRING, so have to account for that too), and that a road (say, 'W Elm Ave') is often partitioned into multiple features for data bookkeeping purposes but the separate features will appear as one on a map (Census TIGER 'edges' being primary example). Paraphrase the query as "get the first point of any road feature where another road feature whose name is different uses that point." Might want to make 'whose name is different' a little 'fuzzy' to eliminate cases such as 'W Elm Ave' connects to 'E Elm Ave'. In cases of Census TIGER 'edges', can use columns tnidf, tnidt ('From Node' and 'To Node') for 'using same point' but it's not uncommon in TIGER for two tnidf or two tnidt to connect, although you would hope that edge A's To Node is always edge B's From Node.
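Here is one way the paraphrased query might look outside the database, as a Python sketch over (fullname, tnidf, tnidt) tuples. The 'fuzzy' name comparison is my own simplification: it only strips a leading direction, and it skips unnamed edges entirely. Checking both node columns for every edge sidesteps the From/To ordering problem.

```python
from collections import defaultdict

DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def base_name(fullname):
    """Drop a leading direction so 'W Elm Ave' and 'E Elm Ave' compare equal."""
    parts = fullname.split()
    if len(parts) > 1 and parts[0].upper() in DIRECTIONS:
        parts = parts[1:]
    return " ".join(parts).lower()

def intersection_nodes(edges):
    """edges: iterable of (fullname, tnidf, tnidt) tuples.
    Returns the node ids touched by two or more differently named roads."""
    names_at_node = defaultdict(set)
    for fullname, tnidf, tnidt in edges:
        if not fullname:
            continue  # unnamed edges are ignored in this sketch
        for node in (tnidf, tnidt):
            names_at_node[node].add(base_name(fullname))
    return {node for node, names in names_at_node.items() if len(names) > 1}

edges = [("W Elm Ave", 1, 2), ("E Elm Ave", 2, 3), ("Oak St", 3, 4)]
nodes = intersection_nodes(edges)
```

Node 2 is not reported because both edges there reduce to the same base name; node 3 is, because Elm meets Oak.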

How to make Labels steer clear of those intersection points: Create a single-character label there, PRIORITY 10, set a BUFFER of some number of pixels around the point to protect from other labels. Set LAYER OPACITY to zero (CHECK that Mapserver does not ignore such labels, might have to set OPACITY 1 in such a case).

Will the other labels you are trying to push away from these points be able to find locations elsewhere? Use POSITION AUTO, but concern would be that the placement attempts are limited, especially for LAYER TYPE 'line', which is what all the roads that are being chased away from intersections are.

Performance will drag from making too much use of this.

Interesting: I find that the TIGER 'roads' have erased MOST of the artificial partitioning from the 'edges' on which they are based. But they have retained partitions at county boundaries, and they do NOT (unfortunately) partition at intersections as I have described them above.

I'm refining my ideal rule to: "Keep road shield-based labels away from any place where any such road connects to at least one other such road of a different name." In other words, if two sections of US Hwy 50 connect (over a county boundary or whatever), feel free to place the '50' shield there anyway. If US Hwy 50 connects to a minor road (which is not shield-labeled), feel free to place the '50' shield there (although probably better to avoid that if not too expensive to protect it). Where US Hwy 50 connects to State Hwy 17, don't label either one at that point.



I stated in an earlier post that the Census TIGER roads, prisecroads, priroads don't have easy access to their multiple names, if any, and to any of their names' type and direction fields. In fact, these tables have column linearid which links directly to the featnames table, where all this info is kept.

You may remember I liked the fact that these *roads tables dissolve the artificial partitioning of a road, as found in the edges table, due to jurisdiction boundaries and address ranging. But I was disappointed that the structure of featnames was not easily available for simplifying the classification of the road for map style and scale range visibility.

Kind of related topic

This access to featnames structure and name aliasing: on the one hand, I may not always want to be stuck labeling some roads with their numeric names (examples 'Co Rd 33', 'US Hwy 183') when there is also a more familiar non-numeric ('James Rd', 'Research Blvd'). On the other hand, the mtfcc='S1400' class of roads: I've been complaining they represent too broad a range of road types for smaller-scale mapping. So having direct access from table roads and its single, unstructured column fullname to featnames and its structured, MULTIPLE names for the same feature gives us the opportunity to display the familiar name while taking the numeric alias name as a clue the road deserves emphasis in linestyle and so forth.

Census actually has done some classifying in the *roads tables with a single-character column rttyp, 'U' for US, 'S' for State, 'C' for county, etc. Especially 'U' and 'S' and possibly 'C': if these classes are found among 'S1400' in dense urban areas, they might usually warrant bold styling. The county roads are not as clear-cut for this, as many minor roads carry county ownership. The rttyp value 'O' requires examination of pretyp or suftyp from featnames for clues as to major or minor status. The 'M' rttyp require some of the length-based and other techniques discussed previously to determine major or minor status. (When rttyp is null, there is no name. I'm willing to assign them blanket status as minor.)


Another Road Classification Clue?

Building on a previous post on how one might determine the major urban thoroughfares from among the weakly classified Census TIGER edge records, mtfcc='S1400', in order to give emphasis to the major roads ( heavier, darker lines; priority in labeling ).

S1400 is a grab-bag of numerous road types (as expressed through type prefix or suffix). We can use the length of the road (after dissolving any artificial partitioning over address range and change of jurisdiction), and this works well in some cities. But it seems in older cities, especially those built on fairly flat ground where the water features are minor and widely spaced, too many 'ordinary' roads (i.e., residential) are about as long as the big commercial roads. Then too many roads get bold styling and the 'false positives' bury the roads we really want standing out.

In some cases, the type prefix and suffix for 'S1400' are the same ones that dominate the Primary/Secondary mtfcc values 'S1100' and 'S1200': 'State Hwy', 'US Hwy', plus an assortment of other 'Hwy' or 'Rte' types that would tend to earn such roads more attention (I've also noticed that simply having any of the common type prefixes is a likely indicator that an 'S1400' is more significant than surrounding streets).

Meanwhile, the most common type suffixes are 'Rd', 'St', 'Ave', 'Dr', 'Ln', 'Ct', 'Blvd', 'Way', 'Cir', 'Pl', 'Trl', 'Pkwy'. I would expect that most 'Ct', 'Cir', 'Pl' are minor, while 'Pkwy' may belong to more significant roads. But what to make of 'Rd', 'St', 'Ave', 'Dr', 'Ln' and 'Blvd'?

Looking at some map images of troublesome cities of the U S Northern Plains, where the every-lengthy-street-deserves-bold-styling strategy fails ("lengthy" being greater than 3km +/-), it is not always obvious how to refine the major roads scheme. Some possibilities, if not too messy to calculate, give emphasis to:
+ the 'longEST' roads in the current view,
+ 'curvy' roads,
+ divided roads,
+ roads that cross: expressways, waterways, jurisdictional lines, ...
+ roads at which 'many' other roads dead-end or have T-intersections,
+ roads that border on landmark polygons, or appear to be the anchor for point landmarks.

Perhaps a scoring mechanism, with weights given to each of these criteria? Especially for 'fuzzy' qualities ('most', 'fewest', 'more', 'less', 'many', 'sufficient number of',...)
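A sketch of what such a scoring mechanism might look like, with each criterion expressed as a 'fuzzy' degree between 0 and 1. The criterion names and weights here are invented for illustration; choosing real weights, and calculating the inputs, is the hard part.

```python
# Hypothetical weights, one per criterion in the list above.
WEIGHTS = {
    "longest_in_view": 3.0,
    "curvy": 1.0,
    "divided": 2.0,
    "crosses_barrier": 2.0,       # expressways, waterways, jurisdictional lines
    "many_t_intersections": 2.5,
    "borders_landmark": 1.5,
}

def thoroughfare_score(criteria, weights=WEIGHTS):
    """criteria maps a criterion name to a degree in [0, 1]; missing names
    count as 0. Returns a weighted average in [0, 1]."""
    total = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        degree = min(1.0, max(0.0, criteria.get(name, 0.0)))
        score += weight * degree
    return score / total

# A divided road that many side streets dead-end into.
score = thoroughfare_score({"divided": 1.0, "many_t_intersections": 0.8})
```

Roads scoring above some cutoff would get the bold styling; the cutoff itself could vary per city, which might help with the Northern Plains cases.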

(...or go find a dataset where someone has already awarded 'major thoroughfare' status to some roads that are lost in the TIGER 'S1400' soup...)


Cooperative vs. Competitive Labeling

In the never-ending quest to make automated mapping mimic the decisions in manual cartography...

The labeling in Mapserver ( from the Label Cache / in the absence of "FORCE true" ) seems to be largely a competitive model: LAYER A allowed to label its features before LAYER B gets its turn...PRIORITY 10 labels before PRIORITY 9 labels...sort your DATA most to least important and MAXFEATURES limits how many features to process from the query results...POSITION AUTO: well, I'm given several tries at picking a label point, but I try them all, and there's someone already there. Too bad... In other words, the first chance you get to label yourself, grab a location and keep it.

Picture the labeling of urban streets, scales at or below 1:40000 +/- about where it becomes feasible to label individual streets with text you can actually read. In the competitive scheme summarized above, it works pretty well to one way or the other give priority to the 'important' streets (longer ones, and/or those identified as major thoroughfares), then if there's any space left, go ahead and annotate the residential streets.

But imagine a scenario where the longer street says, "I have more opportunities to label myself, so I'll let these shorter connecting streets label themselves first." Or any road might choose a label location that is farthest away from intersections with others. For example, a dead-end street might label toward the unconnected end (watch out for MINFEATURESIZE AUTO?)

If such a cooperation scheme becomes too iterative, then performance becomes an issue. The 'away from' spatial analysis can become costly. Perhaps there are rating values, such as "how much space around me," that can be pre-calculated and stored per-feature, then integrated into a cooperative placement scheme to speed things up.

This mechanism need not be limited to features of the same type (such as only roads).


Much Overlap between SXSW Interactive Conference and the Open Source GeoSpatial Universe?

Being more or less local to Austin, each year with the South By Southwest pandemonium coming on, I debate whether to shell out the small fortune to attend the Interactive Conference. I haven't yet taken the plunge, but was able to visit the Trade Show exhibits for the last afternoon using the Guest Pass (ATTN: SX Powers: please make Trade Show more available at low- or no-cost in the future!), the assumption being that the exhibitors are a rough approximation of the conference subject matter.

More later...

Mapserver/OpenLayers Performance Question

Something I haven't done time trials on before: Say I have a dozen or so of the familiar basemap features in PostGIS (roads, hydro, cities, counties, landmarks, etc.) and I define a Mapserver WMS LAYER for each.

Scenario 1: I define a 1:1 LAYER association between Mapserver and OpenLayers, and use the OL LayerSwitcher to turn them on and off. They're all turned on and map image refreshes.

Scenario 2: I define just one OL base layer. I build my own layer activation with an array of check boxes. I check all of them and refresh the map image.

I believe in Scenario 1, OL makes 12 separate WMS GetMap requests to Mapserver. Each invocation of Mapserver has only 1/12 of the workload, but incurs the start up and database connect plus the process switching with each. In Scenario 2, checking the activation boxes on and off causes the param['LAYERS'] = 'roads,hydro,...' to change before the single WMS GetMap for the base layer is made. So one invocation of Mapserver with, when all layers are turned on, the full workload.
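To see the difference in request counts, here is a Python sketch that builds the WMS 1.1.1 GetMap URLs each scenario would send. The server URL, layer names, and extent are placeholders, and the parameters are just the standard GetMap set, not a capture of what OpenLayers actually emits.

```python
from urllib.parse import urlencode

def getmap_url(base_url, layers, bbox, width, height, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL requesting the given layers in one image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    # Allow for a base URL that already carries a query string (e.g. map=...).
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)

base = "http://example.com/cgi-bin/mapserv"   # placeholder server
layers = ["roads", "hydro", "cities"]
bbox = (-100.0, 30.0, -99.0, 31.0)

# Scenario 1: one request per layer. Scenario 2: one request for all layers.
scenario_1 = [getmap_url(base, [name], bbox, 512, 512) for name in layers]
scenario_2 = [getmap_url(base, layers, bbox, 512, 512)]
```

Timing the two lists against the same mapfile would be one way to run the time trial.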

Is there much performance gain with Scenario 2?


Modeling Rules for Hwy Labels

So, here's what I want: I'm drawing maps where cities and towns and the highways between them are shown (at the moment I have a Mapserver scene of West Central TX, scale 1:652,000, it is more or less the situation I'm thinking about). Mostly the roads shown are US and State Highways, plus a section of Interstate 20. I'll be using 'shields' containing the highway number, and generally want them spaced between towns, and not placed on the towns or on road intersections.

As I mentioned in the previous post, the Census TIGER data set partitions a road into many different segments. The edges table breaks something like 'I- 20' into probably thousands of separate records (recording distinct address ranges, and breaking at jurisdictional and Census-generated boundaries). The priroads version of 'I- 20' will have far fewer separate records.

Meanwhile, Mapserver, if instructed to draw a layer from the edges records, will create a separate shapeobj from each record, and will wish to label each and every one of them unless constrained by MAXDISTANCE (pixel spacing between identical labels), or when incurring collision with other labels. When abiding by MAXDISTANCE, Mapserver may place a label at the permitted interval in such a way that it is an awkward placement. Maybe too close to a town, or right on top of a road intersection.

If the highway records are modeled (for example, in PostGIS) as starting and stopping at "cities and towns" then placing the shields between the towns will be much more likely. (I could go on at length about how "starting and stopping at towns" could actually become fairly complex.)

You might not want to robotically label each highway between each and every town. For one thing, you may be displaying at a smaller scale where many towns are not even shown. Or there is the case where US Hwy 84 enters a town (that is shown at current scale) from the west, continues on to the east. You could label "84" on each side of town, but if no other U S Hwy passes through that town (or any road shown in the same linestyle), then it should be clear to the viewer that the highway on each side is the same one.

If US Hwys 84 and 48 intersect each other at that town, might the angles of the roads make their identities clear? How would you quantify that rule? "Highway 84 enters and leaves the town at 'essentially' the same angle" (How does one quantify "essentially"?)
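One possible way to quantify "essentially": compare the direction of travel into the town with the direction out of it, and accept a difference within some tolerance. The 20-degree default below is a guess, not a tested cartographic threshold, and the three points stand in for whatever vertices represent the approach, the town, and the departure.

```python
import math

def bearing(p_from, p_to):
    """Direction of travel in degrees, [0, 360), counterclockwise from +x."""
    dx = p_to[0] - p_from[0]
    dy = p_to[1] - p_from[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

def continues_straight(entry_pt, town_pt, exit_pt, tolerance_deg=20.0):
    """True if the road leaves the town heading within tolerance_deg of
    the direction in which it arrived."""
    incoming = bearing(entry_pt, town_pt)
    outgoing = bearing(town_pt, exit_pt)
    diff = abs(incoming - outgoing) % 360.0
    diff = min(diff, 360.0 - diff)   # handle wrap-around at 0/360
    return diff <= tolerance_deg

# Hwy 84 arrives from the west and leaves slightly north of east.
same_road_obvious = continues_straight((-10.0, 0.0), (0.0, 0.0), (10.0, 1.0))
```

If two highways both pass this test through the same town and cross at a wide angle, a single shield per highway might be enough.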

What if, in between towns, two highways shown in the same style cross each other? From the angles involved, it might be necessary to label both highways on both sides of the intersection.

It might be advantageous to calculate some of this in advance and store in the database. Perhaps a point feature bearing the name and style-generating properties of the road, plus the criteria over which this will be displayed: "Label U S Hwy 84 at this location at scale range 500,000 - 2,000,000." Or is it more complicated than that? "Yes, that scale range, but these additional conditions...." This could become very unwieldy, requiring careful coordination between the layer behaviors as expressed in the mapfile and the rule representations as modeled in the point feature rules.


TIGER Roads Model Often at Odds with Map Styling

The TIGER dataset from the U S Census Bureau is a widely used and freely available source for USA basemaps and as a foundation for displays of demographic data and as a resource for geocoding U S addresses. It was used as a bulk data load for U S streets in OpenStreetMap (and the two static images to the lower left on this page were drawn from TIGER - with a little help from PostGIS, Mapserver, OpenLayers..)

TIGER's accuracy and completeness are historically spotty, especially in rural areas (this has apparently been improving with each release). Its primary purpose - that of aiding Census collection and tabulation - is often at cross-purposes with the goal of speedy and high-quality mapping.

This occurs around the roads data: of the data objects called edges, those where column mtfcc is of the pattern 'S1nnn' are the roads, but they are partitioned at jurisdictional boundaries and for storing address ranges (this is great for geocoding).

One instance of this would be Interstate Highway 10: it is composed of 16,447 individual records in edges. Census has processed the road edges into additional sets that are more friendly to mapping ( priroads, prisecroads, roads ), by coalescing the many edges that make one 'road' into fewer, longer records. Number of records 'I- 10' in roads: 186.

Unfortunately, these merged records lose their direct connection between one road and often more than one name for that same road. This is the relationship between TIGER edges and featnames, joined over the unique tlid for each edge. Featnames also breaks the full street name into fields: the base name plus prefix/suffix values for type, direction, and 'qualifier' ('Old' 'Alt' 'Bus' et al.)

These prefixes and suffixes also carry numeric values, one value assigned to each (example: 'Ave' = '125'). These are very handy in classifying roads for map styles, as priroads, et al., only store the fullname, not the individual fields. And the classification scheme there is not as refined.

There are interesting alternate groupings of road edges that might facilitate quality mapping. For example, within a particular jurisdiction, 'E Elm St' and 'W Elm St' might best be seen as one road for mapping (suppress the directional prefix in labeling).

TIGER road classifications do not automatically identify the 'major thoroughfares' in urban areas, where the mtfcc value is 'S1400'. The type suffix isn't a strong clue, except perhaps for the negative ('Cove' 'Cir' 'Ct' are most likely not thoroughfares). The effective length of a road is often the best indicator, although an imperfect one: in some jurisdictions, many purely residential streets run parallel to and are every bit as long as the wide commercial road nearby. We'd like to give graphical significance to the commercial strip on the map ( alternate, bolder colors, larger labels, labels shown at larger scales, etc.) Yet, if we blindly award emphasis to all S1400 of a certain minimum length, then we find we are highlighting many residential streets.

Other road data sets specifically store classification values intended to identify the urban thoroughfares.