GeoData Models for Conservation & Ecology

- What Are Data Models: ESRI is undertaking a new initiative
to provide its user base with essential data models that follow
a framework that is easily implemented by end users. These data
models will be developed for a number of industries, disciplines,
and categories of geographic phenomena, providing a template composed
of standard data classifications and feature data models.
- ESRI GeoData Models: The geodatabase object-modeling capabilities
included in ArcInfo 8 provide an opportunity to develop and share
standard data models and templates in a variety of fields. These
standard models will promote the sharing and exchange of data and
designs for all software users, including ArcView.
- Modeling Our World (ESRI Press) is the comprehensive guide
and reference to GIS data modeling in general, and to the geodatabase
model in particular.
- ArcGIS Water Facilities Model: The first data model from the
ArcGIS Data Model initiative is called ArcFM Water, focused on the
water/wastewater industry.
- ArcGIS Biodiversity Model: A data model for the
conservation of biodiversity under the standards and practices initially
developed by The Nature Conservancy is under development in cooperation
with the
Association for Biodiversity Information (ABI), a global network
of Natural Heritage Programs and Databases.
- ArcGIS Forestry Data Model: The ESRI
Forestry Spatial Interest Group (FSIG), headed by Potlatch Corporation,
has published its first white paper on data modelling standards.
- ArcGIS Hydrology Model: This effort is being led by Dr. David
Maidment of the University of Texas at Austin. The group's first
publication is Hydrologic and Hydraulic Modeling Support with
Geographic Information Systems (ESRI Press).
Data Models in Conservation and GIS:
The ESRI Conservation Program is beginning work on a general conservation
data model which will describe very basic objects and relationships
for Conservation GIS in the widest sense, including: Taxonomy, Vegetation,
Physiognomy, Plot data, Habitat Relations, Energy cycles, Matter
Cycles, Dispersal and Migration dynamics, Evolutionary Dynamics,
Autecology, Synecology, Biogeography, Source/Authority tracking
and bibliography, Conventions, Laws and Mandates, Organizational
memberships and demographics, GIS data libraries.
Partners and contributors from a wide variety of fields are being
solicited for input and suggestions, and the model/CD is set for
publication in 2001.
A skeleton overview of one of the antecedents to this type of wide-ranging
conservation model can be seen below, the integrated data modelling
tutorial developed by Charles Convis for the Conservation Data Manager
Project in 1995. Note in particular the example schema diagram at
the end, which contains many of the objects and relationships which
will appear in the new Conservation GeoData Model.
Conservation Data Manager Design Concept Tutorial
(Charles Convis, ESRI Conservation Program, 1995)
WITHOUT DESIGN
- As databases grow they become less useful
- Relationships between data are undefined
- Classification systems may not match
- Databases difficult to add on to
- Application software difficult to modify
- User requirements don't guide the process
- Data products very likely will not be used
- Software and applications will likely not be used
- Minimal and decreasing benefit to science
The requirements of conservation and science dictate
that as new knowledge is gained it should impact upon and enrich
many other areas of study. New data about plant ecology should enhance
our understanding of its evolution, taxonomy, and threats. In the
world of databases, however, this promise has not been fulfilled.
In general, scientific databases have suffered from too much variation
in standards and structure to be readily combined with other data.
The typical scientific database is used within a very narrow realm
pertaining to its specific content, or not at all. Attempts to
combine that data with data from other disciplines, or to fit it
into an integrated data collection or management scheme, often fail.
The size, number and variety of such databases continue to grow,
meaning that even in the midst of increasing amounts of data, less
and less of it is actually useful for conservation.
The variety of custom software tools for scientific data is also
increasing. Generally these tools are welded to a specific data
structure and designed as black boxes, with little opportunity for
local modification. As a result, they suffer from the same weaknesses
as databases in terms of only being useful within a very narrow
discipline by a very narrow user audience.
Design is a structured process of planning for both databases and
software which corrects most of these problems:
WITH DESIGN
- Databases part of a whole concept, as they grow they enrich
that concept
- Relationships between data are well-defined
- Classification systems relate in a defined and reproducible
manner
- Databases straightforward to add on to
- Application software modification defined and possible even
at novice levels
- User requirements known and guide the process
- Data products very likely will be used
- Software and applications will likely be used
- Maximum and increasing benefit to science
The primary method used to ensure the success of a new GIS effort
is design. The design effort is a structured series of activities
carried out before and during system development, whose primary
goal is to help define the purpose of the new GIS effort, and how
it is likely to fit in with the existing flows of information and
communications. It results in many useful products: a flow chart
graphically showing how the movement of information and tasks through
the existing and proposed system can meet stated goals, and an implementation
plan presenting the specific steps needed to build the new system
in an efficient and cost-effective manner.
In the design process, classification differences are explicitly
addressed and managed. Standards in classification, scale and basemap
layers are chosen where appropriate so that all data components
have a common reference point allowing them to be linked together.
Existing databases and data sources are inventoried along with user
needs so that a clear picture of current information status can
be used as a starting point. Data tables are analyzed for shared
or common attributes, and normalized so that data sharing is possible
at all levels and future modifications can be done at minimal cost.
Software design is also normalized so that the different pieces
of any application can be developed and interchanged independently.
Data products and applications arising from a well-designed effort
can therefore fit readily into the existing user environments, so
that as new data is gathered and distributed it can provide immediate
utility and increasing enhancement to user science.
IF DESIGN IS SO GREAT WHY DOESN'T EVERYONE DO IT?
- Relational Designs may be complex and challenging.
- They take a lot more time up front when people are least likely
to want to spend it.
- Cost savings and increased utility in the future are difficult
to sell today.
- Simple database and application tools suffice for undesigned
and stand-alone flat files.
- Powerful database and application tools are needed to support
all the requirements of a designed system.
- Without the tools to build databases and applications on those
designs, the effort is futile anyway.
The tools and methods of project design have been known since
they were first applied to aircraft manufacture during World War
II. They have been found to be especially powerful in software and
database design and are widely used in big-budget projects where
they help save millions in development and maintenance costs. The
reason they are not more widely used is mainly cost, both in dollars
and time. Most smaller projects are anxious to see software and
data products as soon as possible, and have little time or patience
for a lengthy initial design effort. Most smaller projects therefore
produce software and databases which have limited to zero life outside
the scope or timeline of the project itself. Most small projects
have little awareness or concern for the "cost of ownership"
of a digital effort, as opposed to the "cost of acquisition".
The rule of thumb is that you will spend 100 to 1000 times the cost
of acquisition of a software/data product in terms of ongoing support,
modification and training just to keep it running over a few years.
Design efforts prior to acquisition will cut this figure by orders
of magnitude.
Another common problem is that database management tools on the
PC have been poor, limiting the kinds of applications that could
be developed. Many of them were based on flat-file managers and
therefore had very limited capability to handle the more complex
relationships typical of an integrated database design.
HOW DO WE GET THERE?
- Traditional GIS design & development process:
Statement of goals
Data Inventory
User Needs Analysis
Conceptual Database design: define, expand, consolidate, review
Physical Database design
Conceptual Application design
Physical Application design
Training Design
Prototype cycle: research, standards, prototype
Implementation of Software, Database, Training
Data Automation
Distribution/publishing
Statement of goals
Defining the purpose of any new information management system must
begin with a careful examination of the purpose and parts of the
current system, whether manual or automated, looking at what works,
what doesn't, and what must change. The first part of this questionnaire
looks at your current program independent of any future GIS capacity
but using the same analytical approach that you will later use to
design your GIS program. A GIS is said to consist of 5 basic elements:
People, Data, Procedures to work on the data, Hardware, and Software.
People often consider only the last two, so for this exercise we
will set those two aside. The first three elements can be examined in
more detail by breaking them down into the traditional who, what,
when, where and why of investigation:
Why are you here: program goals
Who do you serve: your audience
What do you provide: your products
Where does your data come from: data sources
How do you do it: your tasks
Who helps you: your support
What constrains you: your limitations
We will look at each of these in detail to describe your current
system of information management and communication, then look at
them again to define what you expect from a new GIS program.
Conceptual Database design: define, expand,
consolidate, review
The conceptual design itself will serve as a major guiding document
for all database development and software application. It is a living
document which will be reviewed continually as new data sources
arise, as staff skills expand, and as the GIS evolves. This is a
5-step process:
Define: Define features and entities at a conceptual level
Expand: List attributes and characteristics of each entity
Consolidate: Look for commonalities in the attributes or
classifications of the entities
Implement: Lay out the conceptual tables for each entity:
DATA DICTIONARY
Review: As GIS data and software development proceed
DEFINE
This is the step where we define the features and entities we will
be working with at a conceptual level. One way this can be done
is by conducting user interviews and listing for each user the main
types of features or entities that need to be handled. Another way
is to review standard reports, texts or maps looking for chapter
& section headings and legends. Features usually refers to things
with a spatial component, whereas entities can refer to anything.
Example: Define General Categories of things we would like to keep
track of:
Ecological data
- Park Species Lists, Flora, Fauna
- Existing Classification Systems
- Field Plots
- Data Sources (maps, imagery, books)
- Authors/Experts
Management data
- Parks and offices
- Contact persons
- Logistical concerns
- Progress of the mapping effort
- Existing database themes
- Quad sheets
- Photo missions
EXPAND
This is where we list the attributes and characteristics of each
feature & entity defined above. For each attribute, we also
want to ask if there is only one for each entity or if an entity
can have many. We also want to indicate which attributes are absolutely
required in order for the entity to exist, and we generally want
to indicate at least one attribute for this role. If there aren't
any completely unique attributes (like with transactions) we'll
just make up an internal sequential number.
Example: List the attributes of each entity/feature
Data Source:
- title, only one REQUIRED
- date, only one main date
- authors, can be many, some co-authors
- editor, usually only one
- storage location
Contact Person:
- name: REQUIRED
- address & contact information
- list of parks associated with
- list of skills
- list of publications
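The attribute lists above can be written down in a small structure that records, for each attribute, whether it is required and whether an entity can have many of them. The following is a minimal Python sketch and not part of the original tutorial: the names `Attribute`, `Entity` and `surrogate_ids` are hypothetical, the `Transaction` entity and its attributes are invented for illustration, and treating "required" as the identifying role is a simplification.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Attribute:
    name: str
    required: bool = False  # must be present for the entity to exist
    many: bool = False      # an entity can have several of these


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)

    def required_attributes(self):
        """Names of the attributes marked as required."""
        return [a.name for a in self.attributes if a.required]

    def needs_surrogate_key(self):
        """True when no attribute is required, so an internal
        sequential number has to be made up instead."""
        return not self.required_attributes()


def surrogate_ids(start=1):
    """Internal sequential numbers for entities with no unique attribute."""
    return count(start)


data_source = Entity("Data Source", [
    Attribute("title", required=True),
    Attribute("date"),
    Attribute("authors", many=True),
    Attribute("editor"),
    Attribute("storage location"),
])

contact_person = Entity("Contact Person", [
    Attribute("name", required=True),
    Attribute("address & contact information"),
    Attribute("parks", many=True),
    Attribute("skills", many=True),
    Attribute("publications", many=True),
])

# Hypothetical entity with no required attribute (the "transactions" case).
transaction = Entity("Transaction", [Attribute("date"), Attribute("amount")])
```

Attributes flagged `many=True` are the ones most likely to turn into separate entities or relationships in the next step.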
CONSOLIDATE
This is where we go through all of the lists and definitions and
look for commonalities. Where two entities list a similar attribute
we ask what the differences are, how important they are, and what
the implications would be of making them the same. Where an entity
lists an attribute that appears to be another entity we ask what
the differences are and if the attribute can actually be combined
with that entity. What we hope to do is winnow out a final condensed
list of entities and features and list for each only those attributes
common to every place they appear, then list separately the additional
attributes needed to define the characteristics of each relationship
but which are not attributes of either side by itself.
Example
Data Source:
- title, only one REQUIRED
- date, only one main date
- authors, can be many, some co-authors = PERSON
- editor, usually only one = PERSON
- storage location
PERSON (old Contact Person)
- name: REQUIRED
- address & contact information
- list of parks associated with = PARK (Relationship attribute
= role or title at that park)
- list of skills
- list of publications = DATA SOURCE (Relationship attribute
= authorship role in that source)
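The consolidation shown above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tutorial's own tooling: the function names `shared_attributes` and `consolidate` are hypothetical, attribute names are shortened, and the `references` mapping stands in for the reviewer's decisions about which attributes are really other entities.

```python
from collections import defaultdict

# Attribute lists from the EXPAND step, keyed by entity name.
entities = {
    "Data Source": ["title", "date", "authors", "editor", "storage location"],
    "Contact Person": ["name", "address", "parks", "skills", "publications"],
}

# Reviewer decisions: attributes that are really references to another
# entity, as in the example above.
references = {
    ("Data Source", "authors"): "Person",
    ("Data Source", "editor"): "Person",
    ("Contact Person", "parks"): "Park",
    ("Contact Person", "publications"): "Data Source",
}


def shared_attributes(entities):
    """Attribute names that appear in more than one entity."""
    seen = defaultdict(list)
    for entity, attrs in entities.items():
        for attr in attrs:
            seen[attr].append(entity)
    return {attr: owners for attr, owners in seen.items() if len(owners) > 1}


def consolidate(entities, references):
    """Split each entity into core attributes and relationships."""
    core, relationships = {}, []
    for entity, attrs in entities.items():
        core[entity] = [a for a in attrs if (entity, a) not in references]
        for attr in attrs:
            target = references.get((entity, attr))
            if target:
                relationships.append((entity, attr, target))
    return core, relationships


core, relationships = consolidate(entities, references)
```

After this step `core` holds only the attributes that belong to each entity by itself, and each tuple in `relationships` is a candidate for a relationship with its own attributes (role, title, and so on).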
Data Dictionary: Sample Page
Entity: RED FLAG AREAS: Red-flag areas of special concern in or
around a park which will be considered in mapping the park and its
environs
File: Redarea.dbf indexes: rnum, rname
Item        Form  Definition
+-----------+-----+---------------------------------------
rareanum    n4    red flag area database numeric ID, primary key
rname       c30   name of the site or area,
                  e.g. "Johnson farm", "Muir Wilderness"
parkcode    c4    Park ID code where area occurs, foreign key,
                  e.g. "YOSE"
rtype       c10   type of red-flag area,
                  e.g. Inholding, Disturbance (fire, avalanche),
                  Scientific study area, Access-limited area,
                  Vegetation Management area, Area of Interest, etc.
rlocation   c30   location description and directions,
                  e.g. "Nelson Valley, Nelson Quad, 15m past route23
                  then 4 mi north on trail"
quadnum     n4    topo quad number
rnotes      m     additional notes on how to handle region,
                  e.g. "Vehicle access prohibited", "Foot access
                  allowed only with permission of landowner contact"
contactnum  n4    contact ID code for info on the area, foreign key
These are individual areas of special concern within or near a
park, including the entire area of interest for mapping in the park
(this should correspond to the actual area mapped in each park (kvg####.pat)
but may not due to cost or logistics). These are any areas which
require special logistical considerations for vegetation study and
mapping, for ecological, political or other reasons. Each record
refers to one physical area within a park with a constant type of
condition. This may include several contiguous or closely-located
polygons as long as they can be treated as a unit for the purposes
of permitting, access, study or mapping.
Attributes include those which describe the type of management concern
and provide some background notes on how to handle each area.
RELATIONSHIPS
Parks
This is a 1-many relationship to the parks master file (see under
parks)
Red-flag areas coverage:
This is a 1-1 relationship to the GIS coverage (red####.pat) for
each park recording the actual polygons delineating each special
area.
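A data dictionary page like the one above can be turned into a table definition almost mechanically. The sketch below is an assumption-laden illustration, not part of the original tutorial: it assumes the form codes follow dBASE conventions (n = numeric, c = character with a width, m = memo), uses Python's standard `sqlite3` module in place of the .dbf file, and the helper names `sql_type` and `create_table_sql` are hypothetical.

```python
import re
import sqlite3

# (item, form, role) rows transcribed from the sample page above.
FIELDS = [
    ("rareanum", "n4", "primary key"),
    ("rname", "c30", ""),
    ("parkcode", "c4", "foreign key"),
    ("rtype", "c10", ""),
    ("rlocation", "c30", ""),
    ("quadnum", "n4", ""),
    ("rnotes", "m", ""),
    ("contactnum", "n4", "foreign key"),
]


def sql_type(form):
    """Map a form code such as 'n4', 'c30' or 'm' to an SQL type.

    Assumes dBASE-style codes: n = numeric, c = character, m = memo.
    """
    match = re.fullmatch(r"([ncm])(\d*)", form)
    if match is None:
        raise ValueError(f"unrecognized form code: {form!r}")
    kind, width = match.groups()
    if kind == "n":
        return "INTEGER"
    if kind == "c" and width:
        return f"VARCHAR({width})"
    return "TEXT"


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from data dictionary rows."""
    columns = []
    for item, form, role in fields:
        column = f"{item} {sql_type(form)}"
        if role == "primary key":
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {table} ({', '.join(columns)})"


ddl = create_table_sql("redarea", FIELDS)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(
    "INSERT INTO redarea (rareanum, rname, parkcode, rtype) VALUES (?, ?, ?, ?)",
    (1, "Johnson farm", "YOSE", "Inholding"),
)
```

The foreign keys (`parkcode`, `contactnum`) are left as plain columns here because the parks and contacts tables are not defined on this page.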
IMPLEMENT
This is where we go through the final list of entities and attributes
and lay out conceptual tables for each entity, listing the core
attributes that will be kept for that entity and indicating which
ones are required and which ones serve to distinguish one record
from the next. Since we are relying upon the relational database
model, we can also create a table for every relationship defined,
which serves to link two entities together. It must therefore contain
the primary key attribute from each entity, plus any of the attributes
defined as describing that relationship. An advanced topic worth
noting at this point is the issue of scale. The entities defined
have an implicit scale, both in spatial and thematic terms, over
which they are valid. It is worth being aware of this and being
aware when you are defining relationships which are more or less
within the same scale (such as quads and rivers in space or river
processes and sediment load in theme) versus those which are across
scales (such as quads and global climate in space or sediment load
and continental drift in theme).
Example: Define feature/entity tables and relationship tables
Data Source Table
- title, only one REQUIRED
- date, only one main date
- storage location
PERSON (old Contact Person)
- name: REQUIRED
- address & contact information
- list of skills
PERSON-DATA RELATIONSHIP
- person name
- data source title
- authorship role in that source i.e. author, co-author, editor...
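The three tables above can be expressed as a small relational schema. The following is a minimal sketch using Python's standard `sqlite3` module, not the tutorial's own implementation: the table and column names are adapted from the example, storing skills as a single text column is a simplification, and the sample rows ("Park Flora Survey", "A. Author", "E. Editor") are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript("""
CREATE TABLE data_source (
    title            TEXT PRIMARY KEY,   -- REQUIRED, identifies the record
    date             TEXT,
    storage_location TEXT
);
CREATE TABLE person (
    name    TEXT PRIMARY KEY,            -- REQUIRED, identifies the record
    address TEXT,
    skills  TEXT
);
-- Relationship table: the primary key from each side plus the
-- attribute that describes the relationship itself.
CREATE TABLE person_data (
    person_name       TEXT NOT NULL REFERENCES person(name),
    data_source_title TEXT NOT NULL REFERENCES data_source(title),
    role              TEXT NOT NULL,     -- author, co-author, editor...
    PRIMARY KEY (person_name, data_source_title, role)
);
""")

# Placeholder rows for illustration only.
conn.execute("INSERT INTO data_source VALUES (?, ?, ?)",
             ("Park Flora Survey", "1994", "main office shelf"))
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [("A. Author",), ("E. Editor",)])
conn.executemany("INSERT INTO person_data VALUES (?, ?, ?)", [
    ("A. Author", "Park Flora Survey", "author"),
    ("E. Editor", "Park Flora Survey", "editor"),
])

# Who worked on a given data source, and in what role?
rows = conn.execute("""
    SELECT p.name, r.role
    FROM person_data AS r
    JOIN person AS p ON p.name = r.person_name
    WHERE r.data_source_title = ?
    ORDER BY p.name
""", ("Park Flora Survey",)).fetchall()
```

Because authorship lives in the relationship table, a person can hold several roles across many sources without either entity table changing shape.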
EXAMPLE SCHEMA
[Figure: example schema diagram]