Geological Image Analysis Software - GIAS v1.0
Nearest Neighbor Panel
The Nearest Neighbor (NN) panel consists of four graphical areas (the
distance histogram, the R and c statistical graphs, and the skew vs.
kurtosis plot) and a text summary of the statistical properties of five
different fitting models. It is always active, whatever the input type
(an image file, an NN image or an NN Centroid List). Note that
previously processed data is stored in memory until it is overwritten.
This application is primarily written to test the object distance
distribution for spatial randomness or other types of organisation, and
so to examine whether underlying processes may be influencing the
placement of the objects.
Objects may exhibit a spatial organization along a continuum that
includes three classes:
- Regular spacing, which in the extreme case manifests as an equal
distance between all objects;
- Random distributions, where the location of each object is
independent of all other objects; and
- Aggregated distributions, which in an end-member scenario result in
objects being tightly clustered within a single group.
By examining the NN distances, such models can be tested. NN distances
are determined by calculating the Euclidean distance between each
interior object and all other objects within the dataset, including the
vertices of the surrounding convex hull. The results for each interior
point are sorted in ascending order to identify the minimum (NN)
distance, and these minima are then used to calculate the actual mean
NN distance.
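The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the GIAS implementation: the function names are hypothetical, points are assumed to be (x, y) tuples, and objects lying on a hull edge without being a hull vertex are assumed to count as interior.

```python
import math


def convex_hull(points):
    """Return the convex hull vertices (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def interior_nn_distances(points):
    """NN distance of every interior point (hull vertices are only targets)."""
    hull = set(convex_hull(points))
    nn = []
    for i, p in enumerate(points):
        if p in hull:
            continue  # hull vertices are not measured from, only measured to
        dists = sorted(
            math.hypot(p[0] - q[0], p[1] - q[1])
            for j, q in enumerate(points)
            if j != i
        )
        nn.append(dists[0])  # smallest distance = nearest neighbor
    return nn


points = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 2), (2, 2)]
nn = interior_nn_distances(points)
mean_nn = sum(nn) / len(nn)
```

Here the four corner points form the hull, so only (1, 2) and (2, 2) contribute NN distances.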
All graphs are for visualisation purposes only. You cannot export any
of the figures directly (other than screenshot capture). The data to
reproduce the graphs can be saved to a spreadsheet-readable text file.
See Input Panel Help.

NN Distance Histogram
The NN Histogram outputs the binned frequency of object distances. By
default the data are placed in 10 equally spaced bins. All updates take
place immediately.
However, depending on the distribution of the object areas, the user
can change the bin size by selecting Custom and entering the required
bin size (the bins are correctly scaled). The starting and ending bins
on the X axis can be changed by selecting Custom and typing in the
green boxes marked 'Min' and 'Max'. If you do not do this correctly,
the Custom box will flash the word 'Value >' to remind you to enter the
required values.
The Y axis can be displayed with either Linear or Logarithmic spacing.
When the Log button is clicked, the bars of the histogram become single
points. The maximum frequency displayed on the Y axis can also be
clipped.
To reset the values to Default, click on the Bin Size: Default button.
In the Y Axis (green) text box, type 'Max' to reset the frequency axis.
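The binning described in this section can be approximated outside GIAS with a short sketch like the one below. The function and its parameters (n_bins, x_min, x_max) are hypothetical stand-ins for the Default/Custom bin settings and the 'Min'/'Max' boxes. It is also an assumption that a custom setting is expressed as a number of bins rather than a bin width, and that values outside the chosen range are dropped.

```python
def nn_histogram(distances, n_bins=10, x_min=None, x_max=None):
    """Count distances in n_bins equally spaced bins between x_min and x_max."""
    lo = min(distances) if x_min is None else x_min
    hi = max(distances) if x_max is None else x_max
    if hi <= lo:
        raise ValueError("x_max must be greater than x_min")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for d in distances:
        if d < lo or d > hi:
            continue  # outside the requested X range
        # The right-most edge belongs to the last bin.
        idx = min(int((d - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


distances = [1.0, 1.5, 2.0, 2.5, 3.0]
edges_default, counts_default = nn_histogram(distances)
edges_custom, counts_custom = nn_histogram(distances, n_bins=4)
```

The counts can then be plotted on a linear or logarithmic Y axis with any plotting tool.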
R and c parameters
Using sample-size dependent values of R, c, and their respective
thresholds of significance, we may define the following scenarios:
- If the values of R and c fall within ± 2 σ of their expected values,
then one can confidently assume that the model correctly describes the
input distribution.
- If the value of c is outside the range of ± 2 σ, then the model can
be confidently rejected, regardless of the value of R.
- If c is outside ± 2 σ and R is greater than the +2 σ threshold, then
the input distribution exhibits statistically significant repelling
relative to the model. Conversely, if c is outside the ± 2 σ range and
R is less than the -2 σ threshold, then the input distribution is
clustered relative to the model.
- Lastly, if R is outside ± 2 σ and c is within the ± 2 σ range, then
the test results are ambiguous, which generally indicates that there is
insufficient data to perform the analysis and/or that the standard
error (σ_e) of the input distribution is large.
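The scenarios above amount to a small decision rule, sketched below. This help page does not define R and c, so the first function assumes the classical Clark and Evans (1954) forms for the Poisson case: R is the ratio of actual to expected mean NN distance, and c is their difference divided by the standard error σ_e. The sample-size dependent expected values and thresholds that GIAS computes are not reproduced here. They are passed in as plain arguments, and all names are hypothetical.

```python
import math


def poisson_r_and_c(nn_distances, area):
    """Assumed Clark-Evans style R and c for the idealised Poisson model."""
    n = len(nn_distances)
    density = n / area
    r_actual = sum(nn_distances) / n
    r_expected = 1.0 / (2.0 * math.sqrt(density))
    sigma_e = 0.26136 / math.sqrt(n * density)
    return r_actual / r_expected, (r_actual - r_expected) / sigma_e


def classify_fit(R, c, R_expected, R_sigma, c_expected, c_sigma, k=2.0):
    """Apply the +/- k sigma scenarios listed above (k = 2 in the text)."""
    r_dev = (R - R_expected) / R_sigma
    c_dev = (c - c_expected) / c_sigma
    r_inside = abs(r_dev) <= k
    c_inside = abs(c_dev) <= k
    if r_inside and c_inside:
        return "consistent"
    if not c_inside:
        if r_dev > k:
            return "rejected (repelled)"
        if r_dev < -k:
            return "rejected (clustered)"
        return "rejected"
    # R outside, c inside: too little data or a large standard error.
    return "ambiguous"


R, c = poisson_r_and_c([1.0, 1.0, 1.0, 1.0], area=16.0)
verdict = classify_fit(R, c, R_expected=1.0, R_sigma=0.05,
                       c_expected=0.0, c_sigma=1.0)
```

The expected values and sigmas used in the call are illustrative numbers only, not GIAS output.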
Model Selection
One of the novel contributions of our application is the introduction
of an automated calculation of the sample-size dependent biases in NN
statistics. Baloga et al. (2007) proposed how finite sampling would
affect several NN models; however, they only implemented a solution for
the Poisson NN case. In our application, we have automated the Poisson
NN procedure, extended it to include the three other models of Baloga
et al. (2007), and derived the k = 2 case for the scavenged NN
distribution (Appendix 1). Automation of the NN method within GIAS
enables users to rapidly determine if their data fulfills the
size-dependent criteria of statistical significance for the following
five spatial distribution models: (1) Poisson; (2) Normalized Poisson;
(3) Scavenged, k = 1; (4) Scavenged, k = 2; and (5) Logistic (note that
the Logistic distribution has been removed in Version 1.1).
See the manuscript for summaries and discussion of the properties of
these models.
Skew versus Kurtosis Plot
In addition to sample-size dependent R and c statistics, we expand upon
the methods of Baloga et al. (2007) by simulating the sample-size
dependent range of expected skewness and kurtosis values for each NN
hypothesis. Calculated confidence intervals (at 90%, 95% and 99%) are
plotted within GIAS to illustrate the range of acceptable skewness
versus kurtosis for each model. Plots of skewness versus kurtosis are
also useful for discriminating between populations with otherwise
similar R and c values, such as rootless cones and pingos
(Bruno et al., 2006).
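As a rough illustration of the idea, the sketch below simulates a sample-size dependent range of skewness and kurtosis for the Poisson hypothesis only. It rests on several assumptions not taken from GIAS: moment-based skewness and non-excess kurtosis; NN distances drawn independently from the ideal Poisson NN law F(r) = 1 - exp(-πρr²), ignoring edge effects; and separate percentile intervals for each statistic rather than a joint region. All names are hypothetical.

```python
import math
import random


def skew_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def poisson_envelope(n, density=1.0, trials=300, level=0.95, seed=0):
    """Percentile ranges of skewness and kurtosis for samples of size n."""
    rng = random.Random(seed)
    skews, kurts = [], []
    for _ in range(trials):
        # Inverse-CDF sampling of the ideal Poisson NN distance law.
        sample = [
            math.sqrt(-math.log(1.0 - rng.random()) / (math.pi * density))
            for _ in range(n)
        ]
        s, k = skew_kurtosis(sample)
        skews.append(s)
        kurts.append(k)
    skews.sort()
    kurts.sort()
    lo = int((1.0 - level) / 2.0 * (trials - 1))
    hi = int((1.0 + level) / 2.0 * (trials - 1))
    return (skews[lo], skews[hi]), (kurts[lo], kurts[hi])


skew_range, kurt_range = poisson_envelope(n=200)
```

A measured (skewness, kurtosis) pair falling outside both ranges would argue against the Poisson hypothesis at the chosen level. Smaller values of n give wider ranges, which is the sample-size dependence discussed above.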
Nearest Neighbor Results
The nearest neighbor distances and other statistical properties of each
model (e.g. minimum, maximum, mean, R and c fit, and expected mean of
the data) can be selected by clicking the appropriate button. All the
data is output to the CSV file.
References:
Baloga SM, Glaze LS, Bruno BC, 2007, Nearest-neighbor analysis of small
features on Mars: Applications to tumuli and rootless cones. Journal of
Geophysical Research 112, E03002.