EXERCISE 11 – In Too Deep: Spatial Statistics
Environmental Resources 372:362
Intermediate Environmental Geomatics
I. Geostatistical Analyst
We've played around a bit with interpolations before using Spatial Analyst. There's another extension, Geostatistcal Analyst, that opens up vistas in the world of interpolation that you might not have imagined existed. If you've taken Ed Green's class, you might know enough to really get into trouble with the interpolations options available. Today we're just going to do a casual exploration of them, so that you can see how different they can be. Don't worry if you don't understand all of the options available, neither do we (and neither do the majority of even advanced ArcGIS users).
Activate the Geostatistical Analyst extension and its toolbar. The toolbar doesn't seem to have much to offer, even when you drop down the menu, but that's misleading. Under Explore Data are a number of very interesting interactive data exploration tools that can tell you quite a bit about your data if you are statistically savvy. The Geospatial Wizard allows us access to a slew of interpolation methods, each of which has many different options.
Load climate and climun from the \intgeo\ClassWork\spatial directory. Under Geospatial Analyst | Explore Data, choose Histogram. This option provides us with a histogram of our data (be sure you're looking at one of the temp or precip fields), along with summary statistics that provide information about the distribution of our data. This information is very important for choosing appropriate statistical methods or pre-analysis data transformations. The other exploratory tools provide information about spatial autocorrelation, among other things.
Now open the Geostatistical Wizard. The methods you see are different interpolation methods. With climate as your input data and TMAX as your attribute, choose Inverse Distance Weighting. After clicking next, you'll see the IDW interactive interface. The map window shows what points will be used for the interpolation for any given location on the map. They are color coded by the amount of influence they have. The Sector type options allows you to have multiple sectors (each with the minimum number of neighbors you specify). Note that you can preview both the points used and the interpolated surface. Play with the options to see how they impact both.
Once you're done playing around, click Next. The next screen shows the error associated with your interpolation method and parameters, calculated by comparing the value of the interpolated surface (the predicted value) to the actual value (the measured value) at each point. Click Finish, then OK. By default the results are added to your display. Note that the output is rectangular—Geostatistical Analyst only allows you to set a rectangular extent, so we’ll have to deal with the block rather than a nice NJ shape. The other option is to remove the areas outside the NJ outline via masking, but we won’t go into this in lab today. If you’d like to give it a try, see the directions (Appendix 1: Masking Rasters from Geostatistical Analyst) at the end of the lab exercise.
Note also that the output is not a "real" data layer, it's Geostatistical Analyst Output Layer. In order to save it as raster, you'll need to go to Properties | Data | Export to Raster just like you did for the temporary rasters you created with Spatial Analyst a few weeks ago. You will need to do this if you want to further manipulate the output, in the Raster Calculator, for instance.
II. Spatial Statistics: Measuring Pattern
Now we’re going to switch gears and explore some of the spatial statistics that ArcGIS has to offer. Again, we’re just skimming the surface of what’s available. You won’t be an expert by the end of the exercise, but hopefully you will have an idea of some of the possibilities.
You’ve probably seen the map created by John Snow in 1854 that shows cholera cases in London following an epidemic. After creating the map, Snow drew some conclusions about the pattern of deaths in the city. Today, we’ll be using his data to try out some spatial statistics. (How do you think we got the hand drawn map into GIS format???)
Open ArcMap and add deaths and buildings to your map from the SnowCholera geodatabase located at …\intgeo\ClassWork\spatial_stats. The point file represents the locations of cholera deaths mapped by Snow and the polygon file is the digitized city blocks shown on his map. When you open the point file’s attribute table, you’ll notice a field called Num_Cases, which is the number of deaths that occurred at each point.
Open ArcToolbox. Today we’ll be using Spatial Statistics Tools. We want to start out by analyzing patterns to determine if the deaths are spatially autocorrelated. If we determine that there is autocorrelation, then we’ll move on to more graphical methods by mapping clusters.
Starting out with a statistical measure may sound intimidating, but many of the statistics are very easy to interpret (the devil is in determining whether or not they’re appropriate to your data in the first place…fortunately, you don’t have to worry much about statistical assumptions today, but keep in mind that there’s more to this than meets the eye!). The statistic we’ll be using today is Moran’s I, which tests whether the data are clustered, dispersed, or random. The null hypothesis is that the features are randomly distributed and not spatially autocorrelated. Note that Moran’s I is one of several statistics for testing spatial autocorrelation.
Under Analyzing Patterns in the Spatial Statistics toolbox, open Spatial Autocorrelation (Moran’s I). We want to analyze Num_Cases in deaths. Check the box to display outputs graphically. Choose inverse distance and Euclidean distance as the conceptualization of spatial relationships and distance method. Click OK to run the tool.
You should get a pop-up window showing the Moran’s I index and Z score. Basically, when the Z score indicates statistical significance (i.e., greater than ±1.96 standard deviations with a 95% confidence interval), a positive Moran’s I index value means the data are clustered and a negative index value indicates dispersion.
You should have an index value of 0.03 and a Z Score of 2.5 standard deviations. Even without the fancy graphic that interprets the numbers for you, you should realize that this means the data are positively autocorrelated (i.e., clustered).
Hint: If you got lost at the mention of indices and Z scores, search for “What is a Z score? What is a p-value?” in ArcGIS help for a very straightforward description of how statistical tests work.
B. Hotspot Analysis
Now that we know for sure that the data are clustered, let’s look for ‘hot’ and ‘cool’ spots in the data. Hotspot analysis compares each feature to its neighboring features. Hot spots occur when a feature with a high value is surrounded by other features with high values, and vice versa for cold spots. In other words, we want to look for clusters of features with high values and clusters of features with low values. This can be a very useful way to visually assess patterns in the data.
One way to access the hotspot analysis tool is under Mapping Clusters. However, there’s another option that creates a nice layer file of the output so you don’t have to fiddle with the symbology, which can have an enormous effect on whether there appear to be hotspots or not. Both tools give the exact same results. So we’ll use Hot Spot Analysis with Rendering under Spatial Statistics Tools | Rendering to make sure our symbology is correct.
We want to look at Num_Cases in deaths again. Save the file to your folder. Notice that you’re saving two files here, a layer file to store the rendering/symbology information (.lyr) and the output feature class (.shp) to contain the actual results. Click OK and add the results to your map when the tool is finished.
Both the layer file and feature class file should show up in your map. At this point, they show the same thing so turn off the feature class file. What do you notice about the pattern?
The Z score should look familiar. Rather than calculating a Z score for the entire dataset, hotspot analysis calculated is for each individual point by comparing it to its neighbors. The larger the Z score, the more intense the spatial clustering. High positive values (>1.96) indicate hot spots, while low negative values (<-1.96) indicate cold spots.
C. Centrographic Statistics
The hotspot analysis may look pretty conclusive, but let’s look at a couple more measures of geographic distribution. In addition to looking at clustering and spatial autocorrelation, we can also calculate measures of central tendency. Today, we’ll look at the mean center and the spatial standard deviation of the data to further explore the spatial patterns.
Open the mean center tool in Measuring Geographic Distributions in Toolbox. This tool will find the geographic center of the deaths. Notice that we can weight the features so the mean center is more a measure of concentration than a measure of purely geographic distribution. In this case, we want to use the number of deaths at each point (Num_Cases) as the weight. Save the output to your folder and click OK.
Check out the results on your map—looks like the mean center occurs right in the middle of a major hotspot. What the mean center doesn’t tell us is whether the data is concentrated or dispersed or whether it has a directional trend. For this, we need to look at the standard deviational ellipse, which
Open the Directional Distribution (Standard Deviational Ellipse) tool. Use deaths as the input file and save the results to your folder. We want our ellipse to show one standard deviation (the default). Lastly, set Num_Cases as the weight field as you did for calculating mean center. Click OK and add the results to your map.
Now we can see that the deaths are distributed in a slight east-west direction. Change the symbology to have a clear fill to get a better idea of how the ellipse matches the points.
D. Pattern vs. Process
All of these measures have shown a concentration of deaths in the center of the city—but why? One of the caveats of spatial statistics is that you may not be able to deduce the cause of the patterns you observe without more information. We can easily learn a lot about the pattern, but we may not be able to understand the process that created the pattern.
In this case, you’ll want to add pumps to your map (find it in …\intgeo\ClassWork\spatial_stats\SnowCholera.gdb). According to your analysis, do you agree with John Snow that the Broad Street pump is responsible for most of the deaths?
Assignment:
2. Spatial statistics aren’t just for point data. Check out the 2000 census tract data for New York and Bronx Counties (\\ad-rsc\data\teach\intgeo\ClassWork\spatial_stats\NYC_Educ_tr). Use the Excel spreadsheet, CI Education.xls, to decipher the fields. (Note: The spreadsheet must be opened from Windows Explorer or Excel, not ArcCatalog.)
Run hotspot analysis on total population. Present your results on a map, explaining on the map why the pattern may exist based on your knowledge of New York City’s neighborhoods and geography.
(Keep in mind that we could also run analysis on other variables, such as education level, household income, or race to look for patterns. Measures such as mean center and standard deviational ellipse are especially useful for comparing the spatial distribution of two variables. We’re not asking you to do this since you should be working on your projects, but we could.)
Assignment due on Monday, April 27 at the beginning of lab.
Appendix 1: Masking Rasters from Geostatistical Analyst (optional)
This completely optional section describes how to mask an interpolated raster created in Geostatistical Analyst. Note that these general steps can be used to mask any raster, provided you have the appropriate masking file.
First, make sure you have climmun.shp added to ArcMap and a temporary interpolated raster created from climate.shp using Geostatistical Analyst. We’re going to start by extending the extent of the raster to climmun. Go to the Properties of your temporary interpolated raster. Under the Extent tab, use the dropdown menu to select the rectangular extent of climmun as the extent. Click Ok.
Next, export your temporary interpolated raster to your personal folder (right-click, Data | Export to Raster). Set the cell size to 1550. Add the new raster to your map.
Now we need a mask to remove the unwanted areas and keep the areas inside the state. For this example, we’ll use the raster, njmask, which is located in \\ad-rsc\data\teach\intgeo\ClassWork\spatial_stats. Check it out--you’ll notice that areas in NJ have values of 1’s and areas outside NJ have values of 0’s. Hmm, how might we use this to remove unwanted areas in the interpolated raster?
The raster calculator! We can simply multiply the interpolated raster times the mask. Make sure you have the Spatial Analyst toolbar activated and visible in ArcMap. Set the analysis extent to njmask (Spatial Analyst | Options | Extent). Now open Raster Calculator in Spatial Analyst and multiply your interpolated raster by njmask. You’ll get a temporary output raster called Calculation. If you’re happy with the result, export Calculation to make it permanent (Data | Export Data) and save it in your folder.