|
Investigating etiology through space/time epidemiology |
|
Multiple Sclerosis Geographics |


|
Questions or Comments? |
|
Your comments would be greatly appreciated. This site is designed to provide information for patients and researchers alike. Please send suggestions to:
mmb.research@gmail.com Phone: 555-555-5555 |
|
What is spatial statistics? |
|
The ESRI Guide to GIS Analysis lists four specific questions that spatial statistics can answer:
1. How are the features distributed?
2. What is the pattern created by the features?
3. Where are the clusters?
4. What are the relationships between sets of features or values?
Spatial Statistical Techniques

Calculating the Center of a Distribution

There are three types of centers of a distribution: the mean center, the median center, and the central feature. (Note: These techniques are best suited to point data. Geographic studies involving contiguous features, such as counties, generally are not appropriate for these calculations.)

Mean Center
The mean center is the point whose coordinates are the mean x-coordinate and mean y-coordinate of all the features in the study area.

Median Center
The median center is the point whose total distance from all the features is the lowest possible sum. There is no single equation for deriving the median center. ESRI software starts with the mean center and then, through trial and error, finds the median center.

Central Feature
The central feature is the feature whose total distance from all other features in the study area is the lowest possible sum. Whereas the mean center and median center are points that need not coincide with any feature, the central feature is always one of the geographic features.
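The three centers described above can be sketched in a few lines of Python. This is a minimal illustration using NumPy, not ESRI's implementation: the function names and sample coordinates are hypothetical, and the median center is approximated with Weiszfeld's iteration, a standard scheme chosen here as a stand-in for the software's search because it also starts from the mean center and refines it step by step.

```python
import numpy as np

def mean_center(points):
    """Mean x-coordinate and mean y-coordinate of all features."""
    return np.asarray(points, dtype=float).mean(axis=0)

def central_feature(points):
    """Index of the feature with the smallest total distance to all others."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    return int(dists.sum(axis=1).argmin())

def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the point with the smallest total distance to all features.

    Assumption: Weiszfeld's iteration, started at the mean center.
    """
    pts = np.asarray(points, dtype=float)
    estimate = pts.mean(axis=0)
    for _ in range(max_iter):
        d = np.linalg.norm(pts - estimate, axis=1)
        d = np.maximum(d, 1e-12)  # guard against landing exactly on a feature
        weights = 1.0 / d
        new_estimate = (pts * weights[:, None]).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

# Hypothetical point data: four nearby features and one distant outlier.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10)]
print("mean center:    ", mean_center(points))
print("median center:  ", median_center(points))
print("central feature:", points[central_feature(points)])
```

In this sample the outlier pulls the mean center well away from the four nearby features, while the median center and the central feature stay close to them, which is the usual reason for reporting more than one center.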
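Measuring the Compactness of a Distribution

Standard Distance
The standard distance measures how widely the features are spread around the mean center. It is calculated from the distances between each feature and the mean center, in much the same way that a standard deviation is calculated from deviations about a mean.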
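In standard notation, the standard distance is usually written as shown below. The symbols are assumptions introduced for this sketch: n is the number of features, (x_i, y_i) are the coordinates of feature i, and (x̄, ȳ) is the mean center.

```latex
SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} + \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n}}
```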
The standard distance is closely related to the standard deviations of the x- and y-coordinates. The calculation produces a single value for the distribution, and the plot of the standard distance is a circle of radius SD centered on the mean center. Parenthetically, the plot of the standard distance sometimes appears to be an ellipse or oblong shape; this is usually due to the map projection.

Standard Deviational Ellipse
The standard deviational ellipse is derived from the standard distance calculation. Its x-axis has a length of twice the standard deviation of the x values, extending one standard deviation in both directions along the x-axis from the mean center. The y-axis is calculated similarly.
One of the most valuable features of the standard deviational ellipse is its ability to measure the orientation of a distribution. Because the standard distance is plotted as a circle, it provides no such directional information. The orientation of a distribution is useful in many applications, such as predicting the direction in which a disease is spreading. By looking at the orientation of a disease distribution using a standard deviational ellipse, epidemiologists can try to predict which areas should prepare for a rise in incidence of that disease. The orientation of a standard deviational ellipse is the rotation from geographic north that minimizes the sum of the squares of the deviations of the features from the axis.
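The standard distance and the orientation of a standard deviational ellipse can be sketched as follows. This is an illustration under stated assumptions rather than the procedure any particular GIS uses: the function names are hypothetical, and the orientation is taken from the principal axis of the coordinate covariance matrix, which is the axis that minimizes the sum of squared perpendicular deviations of the features. Some software additionally rescales the axis lengths, which this sketch does not do.

```python
import numpy as np

def standard_distance(points):
    """Root-mean-square distance of the features from the mean center."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    return float(np.sqrt(((pts - center) ** 2).sum(axis=1).mean()))

def deviational_ellipse(points):
    """Return (center, semi_major, semi_minor, rotation in degrees).

    Rotation is measured clockwise from north (the positive y-axis)
    and folded into the range [0, 180).
    """
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    # Population covariance of the x- and y-coordinates, shape (2, 2).
    cov = np.cov((pts - center).T, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = eigvecs[:, 1]                   # direction of greatest spread
    rotation = np.degrees(np.arctan2(major[0], major[1])) % 180.0
    semi_major = float(np.sqrt(eigvals[1]))
    semi_minor = float(np.sqrt(max(eigvals[0], 0.0)))
    return center, semi_major, semi_minor, float(rotation)

# Hypothetical case locations trending from southwest to northeast.
cases = [(0, 0), (1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
center, a, b, rotation = deviational_ellipse(cases)
print("standard distance:", standard_distance(cases))
print("semi-axes:", a, b, "rotation from north (degrees):", rotation)
```

A strongly elongated ellipse (semi-major axis much longer than semi-minor) indicates a clear directional trend, whereas nearly equal axes mean the rotation angle carries little information.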
Determining Existence of Clusters in a Distribution

Ripley’s K-function
The K-function measures the extent of clustering or dispersion within a set of features. The software creates a set of concentric circles around each point feature. The GIS calculates the distance between the target point and every neighboring point, then determines from this information the number of points lying within a set distance (or radius) of the target point.
The value of the K-statistic at a given distance d is calculated as follows:
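One common form of the estimator, without edge correction, is sketched here together with its L(d) transformation. The subscripts and the exact normalization are assumptions: d_ij denotes the distance between features i and j, and some implementations divide by n(n - 1) rather than n².

```latex
K(d) = \frac{A}{n^{2}} \sum_{i=1}^{n} \sum_{j \neq i} I\left(d_{ij} \le d\right),
\qquad
L(d) = \sqrt{\frac{K(d)}{\pi}}
```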
where A is the area of the study region, n is the number of features, and I is the weight. I is equal to 1 if the neighboring point lies within the specified distance of the target point and is equal to 0 if the neighboring point lies outside that radius. The extent of clustering or dispersion can be assessed by plotting a transformation of the K-statistic, L(d), against distance d (some statisticians label the x-axis “r”). The line y = x represents a random distribution of points: in a random distribution, the expected value of L(d) at any distance is the distance d itself. If the plot of the observed L(d) values lies above the y = x line, clustering is occurring at that distance; if it lies below the line, dispersion is occurring.
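The counting procedure and the comparison against the y = x line can be sketched in Python. This is a simplified illustration: the function name `ripley_l` and the sample points are hypothetical, the normalization by n² is an assumption (implementations differ), and no edge correction is applied, so values near the boundary of the study area will be biased downward.

```python
import numpy as np

def ripley_l(points, area, distances):
    """L(d) for each distance d, from a K-statistic without edge correction."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise distances, then drop the i == j entries on the diagonal.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    pair_d = dists[~np.eye(n, dtype=bool)]
    # Indicator weight: 1 if the neighbor lies within d, otherwise 0.
    k = np.array([area / n**2 * np.count_nonzero(pair_d <= d)
                  for d in distances])
    return np.sqrt(k / np.pi)

# Hypothetical data in a unit-square study area (area = 1).
clustered = [(0.10, 0.10), (0.11, 0.10), (0.10, 0.11), (0.11, 0.11)]
grid = [(0.125 + 0.25 * i, 0.125 + 0.25 * j)
        for i in range(4) for j in range(4)]

distances = [0.05, 0.1, 0.2]
for name, pts in (("clustered", clustered), ("grid", grid)):
    for d, l_value in zip(distances, ripley_l(pts, 1.0, distances)):
        label = "above" if l_value > d else "not above"
        print(f"{name}: d={d} L(d)={l_value:.3f} ({label} the y = x line)")
```

In practice the observed curve is usually compared with a confidence envelope built from simulated random patterns rather than with the y = x line alone, because small samples fluctuate around that line even when no clustering is present.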
Measuring Clusters of a Distribution

Moran’s I
The Moran’s I statistic measures whether features with similar values cluster together in a distribution. The GIS calculates the difference between the value of the target feature and the mean of all the features, as well as the difference between each neighbor and the mean. Finally, the GIS compares these differences for each pair of neighboring features in order to identify clustering of like values. Moran’s I is calculated as follows:
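In its usual textbook form, the statistic is written as below. The notation is an assumption made for this sketch: w_ij is the weight between features i and j, and x̄ is the mean of all feature values.

```latex
I = \frac{n}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}
    \cdot
    \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \, (x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^{2}}
```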
where n is the number of features, w is the weight, x_i is the value of the target feature, and x_j is the value of the neighbor. A value of I > 0 indicates clustering of similar values, a value of I near 0 indicates a random pattern, and a value of I < 0 indicates dispersion.

General G-statistic
General G is another method of identifying clusters, and it provides a key piece of information that Moran’s I does not: whether the clusters are of high values or low values. These clusters are called hot spots and cold spots, respectively. The G-statistic is calculated as follows:
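The usual form of the statistic for a distance d is sketched here. As an assumption of this notation, w_ij(d) is the weight between features i and j at that distance, and pairs with i = j are excluded from both sums.

```latex
G(d) = \frac{\sum_{i=1}^{n} \sum_{j \neq i} w_{ij}(d) \, x_i x_j}
            {\sum_{i=1}^{n} \sum_{j \neq i} x_i x_j}
```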
where w is the weight, x_i is the value of the target feature, and x_j is the value of the neighbor. |
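Both statistics can be sketched with a small weights matrix. This is a minimal illustration, not a GIS implementation: the function names are hypothetical, and the example assumes binary contiguity weights (1 for adjacent features, 0 otherwise) for four features arranged in a row.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for feature values and an (n, n) weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()  # deviation of each feature from the mean
    return float(len(x) / w.sum() * (w * np.outer(z, z)).sum() / (z ** 2).sum())

def general_g(values, weights):
    """General G: weighted cross-products over all cross-products, i != j."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    cross = np.outer(x, x)
    off_diag = ~np.eye(len(x), dtype=bool)  # exclude pairs with i == j
    return float((w * cross)[off_diag].sum() / cross[off_diag].sum())

# Hypothetical weights: each feature neighbors only the adjacent ones.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

clustered = [1, 1, 5, 5]    # like values sit next to each other
alternating = [1, 5, 1, 5]  # like values are kept apart

print("Moran's I, clustered:  ", morans_i(clustered, w))
print("Moran's I, alternating:", morans_i(alternating, w))
print("General G, clustered:  ", general_g(clustered, w))
```

The clustered arrangement gives a positive Moran's I and the alternating arrangement a negative one. General G is not read against zero: it is compared with its expected value under no spatial clustering, which is the sum of the weights divided by n(n - 1), here 6/12 = 0.5. A larger observed G suggests that high values cluster, a smaller one that low values cluster, and a significance test is still needed before drawing conclusions.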
|
A 3-D map of California’s Highway 175 © www.pashnit.com |
Spatial statistics is the branch of statistics which analyzes spatial data and looks for spatial relationships. |


|
Calculation of the Moran’s I statistic. © http://www.soi.city.ac.uk
|
|
Example of mapping and visualization software produced by ESRI. © www.esri.com |

|
Contiguous vs. point features. On the left, a map from the Florida Literacy and Reading Excellence Center displaying contiguous features. On the right, a map from the U.S. Department of Transportation displaying point data in Massachusetts. |
|
Overlapping standard deviational ellipses of two distributions. The white dot and red line (added to the original image) represent the mean center and axis, respectively. The length of the axis is equal to two standard deviations. © Sherman et al. in International Journal of Health Geographics |


