Dataset: 11.1K articles from the COVID-19 Open Research Dataset (PMC Open Access subset)
All articles are made available under a Creative Commons or similar license. Specific licensing information for individual articles can be found in the PMC source and CORD-19 metadata
.
More datasets: Wikipedia | CORD-19

Logo Beuth University of Applied Sciences Berlin

Made by DATEXIS (Data Science and Text-based Information Systems) at Beuth University of Applied Sciences Berlin

Deep Learning Technology: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers and Alexander Löser. Learning Contextualized Document Representations for Healthcare Answer Retrieval. The Web Conference 2020 (WWW'20)

Funded by The Federal Ministry for Economic Affairs and Energy; Grant: 01MD19013D, Smart-MD Project, Digital Technologies

Imprint / Contact

Highlight for Query ‹Bovine coronavirus infection symptoms

Comparison of distance measures in spatial analytical modeling for health service planning

Background

Health service research is concerned with the investigation of how social, financial, organizational, technological, and behavioral factors affect access to health care, the quality and cost of health care, and ultimately health and well-being. Distance plays a vital role in studies assessing spatial disease patterns as well as access to hospital services. In a highly complex health care environment, even micro-geographic differences in the availability of tertiary services can affect access to care. The study of distances from patient homes to the nearest hospital is an example where distance is often studied as a crude but objective indicator of geographic accessibility to hospital services. In such situations, the measurement of actual travel distance (or travel time) on a road network is clearly the most appropriate method. Health service research, however, encompasses a much broader investigation area, where spatial analytical models are employed to assist in the provision of effective accessibility to health care services. Distance is often used indirectly in these types of analysis as one of the parameters defining the model's thrust and its results.

In a rapidly changing physical and social environment, transportation means and travel modes change quickly as do epidemic transmission modes, overturning traditional ways of conceptualizing and measuring distance. A commonly-used distance metric is the Euclidean distance, a straight line distance measurement between two points, 'as the crow flies'. This method is simple and intuitive, but very few are the applications where it can yield accurate distance estimates. An alternative, well-known distance metric is the Manhattan, or taxi-cab distance: as its name suggests, it is most appropriate for grid-like road networks, typical of many North American cities, characterized by a rectangular city block pattern. The Manhattan metric measures distance between points along a rectangular path with right angle turns. Most commonly, travel along road networks involves a mixture of Euclidean, Manhattan, and curvilinear trajectories. There is no firm consensus on methods for selecting a distance metric, nor is there much published information on the extent to which Euclidean, Manhattan, and road distances relate to one another in applied distance analysis.

Travel along a complex, or mixed network can be usefully modeled by a class of distance metrics, known as Minkowski distance, which is a general distance metric, of which the Euclidean and Manhattan metrics are special cases. This array of metrics provides flexibility and generality, in that, within a single class of metrics, a range of parameters can be selected; therefore, a single yet flexible method for measuring distance can be defined for the optimal estimation of distance on a variety of empirical road networks. One further important aspect is the fundamental role of time in accessing health care services: if distance is a crude estimate of accessibility, travel time is a more relevant estimate. Travel time computation is no longer a prohibitively time consuming and computationally intensive task, thanks to powerful GIS software, hardware, and rich road network datasets. However, actual travel time on a road network is highly variable due to local (spatial and temporal) conditions which are hardly predictable and controllable. Because of this characteristic, travel time computations lack general validity, requiring adjustments to account for specific temporal conditions, e.g., weekend vs. weekdays, rush vs. non-rush hours, season, and weather, as well as local spatial conditions, e.g., local traffic congestion, lane closures, or proximity to amenities or popular destinations. All these reasons hamper the implementation of travel time computations in spatial analytical models, since even local analytical models require the definition of a single rule for distance measurement. A crude solution to this problem is the use of average travel time in spatial models; a more realistic solution can be obtained through the use of Minkowski distance: an optimal value of Minkowski distance can be selected to model travel time on a complex road network.

Spatial data tend to exhibit characteristics that negatively impact the statistical properties of quantitative models, decreasing their reliability: spatial analytical models are designed to mitigate these negative effects. The most crucial properties of spatial data are spatial dependence (near things tend to be more similar than distant things) and non-stationarity (inconstant variability of phenomena across space). Two broad categories of spatial analytical models include spatially autoregressive (SAR) methods, which deal with spatial dependence, and geographically weighted (GWR) methods, which deal with spatial non-stationarity.

In empirical situations, spatial dependencies and non-stationarities take up specific forms, which are a function of many factors, including the nature of the phenomena under investigation and the representation of space underpinning the model. For this reason, a simplistic application of spatial analysis, one that does not carefully model the salient aspects of phenomena, often fails to fulfill the model's primary objective, which is to enhance the model reliability. The transition from a simplistic to a customized implementation of spatial analysis requires the calibration of each parameter defining the analysis: one of the most crucial parameters, affecting the analytical results, is the distance measurement method.

Cardiac catheterization is a procedure that is performed to determine presence or absence of coronary artery blockages. The procedure involves the percutaneous insertion of a catheter into the arterial system, after which it is guided into the aorta where the coronary arteries are positioned. Contrast dye is then injected into the coronary arteries so that blockages can be located and identified. In some instances, cardiac catheterization can lead to immediate use of percutaneous coronary intervention with balloon angioplasty and the insertion of coronary stents that open up partially or completely blocked arteries to restore blood flow. In some instances, this procedure is performed in stable patients where distance and travel times are a minor concern. In other instances, however, the procedure is done urgently, and for such situations, consideration of distances and travel times become a central consideration in the planning of health services.

In the context of an applied study of distance between patient residence and a tertiary cardiac catheterization facility in a large city, this paper analyzes the effectiveness of a selection of distance metrics in providing a useful model of travel distance and travel time along an urban road network. The comparison of different metrics leads to the identification of a metric that is conceptually sound and computationally effective. The metric thus identified is experimentally used in a spatial autoregressive model analyzing the spatial distribution of cardiac catheterization cases in the city.

Study Area

The study area encompasses the City of Calgary, one of the largest Canadian cities, with approximately 1 million residents, distributed over a large geographic area (roughly 750 Km2), characterized by diversity of population, housing type, residential density, and accessibility to heath services.

Cardiac catheterization is an invasive procedure for patients experiencing cardiovascular symptoms and defines coronary anatomy, left ventricular and valvular function; it provides important prognostic information for individuals affected by cardiovascular conditions. During the study period the procedure was only performed at the Foothills Medical Centre, located in the northwest of the city.

Data Sources

Three types of data are used in this study: cardiac catheterization patient database, postal code locations, and the Calgary road network. Cardiac catheterization patient data were obtained from the Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease (APPROACH), an ongoing data collection initiative, begun in 1995, producing information on all patients undergoing catheterization in Alberta. The data are released at the postal code spatial aggregation level. Data were extracted for Calgary residents only and catheterizations performed over the year 2002, resulting in a total of 2, 445 catheterization cases, distributed over 2, 138 postal codes.

A postal code conversion file (PCCF) was obtained from Statistics Canada. Only postal codes that have at least one catheterization case are retained for the analysis. It shall be observed that postal code locations refer to the primary residence of catheterization patients, not to the place where symptoms were felt or where emergency care was first administered.

The Calgary road network data were obtained from the University of Calgary data holdings, based on street information collected and compiled in 2005 by DMTI Spatial. This road network was used to calculate shortest road distances from patient residence location to hospital for cardiac catheterization services.

Distance Metric Calculation

Straight line (Euclidean) distance and Manhattan distance are often used in health service research. Each of these distance metrics may appropriately estimate distance in some parts of a study area, but their application at the city level tends to yield large errors in areas that depart from the dominant pattern, and may lead to highly inaccurate distance estimations. One of the reasons for using Euclidean and/or Manhattan distance is the relative ease of their implementation; in contrast, it is more problematic to design algorithms implementing actual road network distance in spatial analytical models.

In order to reduce the error associated with the Euclidean and Manhattan metrics while maintaining the computational simplicity of a single, intuitive mathematical formula, the general Minkowski metric is examined, to devise a single method that best approximates the average pattern of an empirical road network. Optimizing values of the Minkowski formula are calculated for road distance as well as travel time; the results are compared with more traditional distance measures in the context of assessing geographic accessibility to cardiac facilities. The Minkowski distance has the potential to provide a more accurate estimate of road network distance and travel time than the Euclidean and Manhattan metrics.

A set of 2, 138 distances between each patient's postal code of residence and the Foothills hospital are calculated according to each of the distance measurement methods considered. The geographic locations of each postal code from the PCCF and the hospital are recorded in latitude and longitude; therefore, in order to implement distance computations, the road network is projected using an equidistant projection system, which is chosen in order to preserve distance and produce consistent distance measurements. Latitude and longitude coordinates are then converted into Eastings and Northings, i.e., x and y values, expressed in kilometers. Alternative methods could have been used, for example the great circle distance formula, which, however, provides rougher estimations. The ArcGIS 9.3 Geometry calculator was used to calculate the x and y coordinates based on the projected dataset and the resulting x and y values were used in the distance formulas defined below.

Euclidean, Manhattan, and Minkowski distance can be calculated by the formula:(1)

where, for this application:

d is the distance between a patient's residence and the hospital;

xi, yi are the geographic coordinates of the centroid of each postal code of residence;

xj, yj are the geographic coordinates of the Foothills hospital.

The generic p parameter in Equation 1 can be replaced by the value 2 to yield the well known Euclidean distance; the value 1 would yield the Manhattan distance, and all the intermediate values in the in the [1

Road Network Distance and Travel Time Calculation

The computation of road network distances (shortest distance between two locations along a road network) is implemented directly in GIS software using a shortest path algorithm. It is recognized a priori that this method depicts the actual travel trajectory and is likely to produce the most accurate distance estimates for patient travel routes to a hospital. Likewise, travel time calculated over the road network using an appropriate algorithm is likely to yield the most accurate travel time estimates. The basic need for road distance calculation is a road network with information on all the segments constituting the network, as well as on all existing constraints such as prohibited left or right turns, one way streets, etc.

Travel time calculations, likewise, are implemented in GIS software. Additional information required for travel time calculation include estimated speed and length of road segments, along with an algorithm capable of taking all these factors into consideration. The Network Analyst extension within ArcGIS enables the modeling of spatial networks and provides the tools for road distance calculations and travel time calculations from multiple postal code locations to a hospital. Figure 1 illustrates road network distance, along with the distance metrics considered in this study.

Computing Minkowski Coefficients

A simple procedure can be implemented to select, within the [1 ≤ p ≤ 2] interval, the value of the parameter p in the distance formula that best approximates distance along a given road network. In light of the applied focus of this analysis, an empirical solution to this problem is sought for the set of 2, 138 patient-to-hospital distances. As a first step for determining the best Minkowski distance, Equation 1 is transformed into Equation 2:(2)

Two new quantities are defined as X = (xi - xj) and Y = (y i- yj), and replaced in Equation 2, which is also further modified by means of a logarithmic transformation. Equation 2 is solved for p values in the [1 ≤ p ≤ 2] interval, sampled at regular intervals. The set of distances thus calculated are considered approximations of the road distance. A simple regression model is defined, where the dependent variable is the road distance, and the independent variable is, in turn, the distance obtained by each p value. Goodness-of-fit, residuals, and other regression diagnostics are then compared over the entire interval. This simple method helps assess and rank the various coefficients, identifying the one that produces the highest R2 value, which is considered the best Minkowski coefficient.

The procedure is then modified to determine the optimal Minkowski coefficient for travel time: conceptually, this experiment is less straightforward, because travel time is a measure of time, whereas Minkowski remains a measurement of distance in space. Once travel time is computed, in order to make the transition between time and space, in terms that are valid both conceptually and computationally, the concept of speed, a simple ratio between space and time, is introduced. Average speed over the city is calculated, yielding the following values: the average distance traveled in one hour is 58.71 km; conversely, the average time required to travel 1 kilometer is 1.02 minutes. For the sake of simplicity, in order to make the argument more intuitive, these values were rounded to an average speed of 60 km/h or 1 minute to travel 1 kilometer. It shall be observed that these values are obtained under optimal conditions, i.e., without considering delays due to rush hour, traffic congestion, traffic lights, stop signs, road closures, weather, etc.

Travel time calculations are then used as the dependent variable, while the independent variables remain unchanged. To this end, actual travel times are converted to distances, using the average travel time, and replaced in the procedure for the calculation of the optimal Minkowski p value to approximate travel time. Once the two optimal Minkowski coefficients are identified, standard descriptive statistics are used to analyze and compare the four distance metrics and the two empirical distance measurements.

Differences between distance measurements and distance metrics

Table 2 presents a selection of summary statistics of the differences between each empirical measurement and the distance metrics considered. These are only global results, overshadowing the performance of each distance model throughout the city. These results suggest that, globally, the differences produced by each model are modest.

For road distance, the smallest differences are achieved by the Manhattan and Minkowski (p = 1.31) metrics, with Manhattan producing the overall best result. The same metrics produce the best results for travel time, with Minkowski (p = 1.31) producing the lowest standard deviation. Median and mean differences are lower than 2 kilometers and just over 2 minutes, respectively. However, standard deviations tend to be quite high, relative to the mean. Figure 3 and Figure 4 illustrate graphically these differences, providing greater spatial detail. Differences between road distance and each distance metric are presented in Figure 3, and those between travel time and each metric in Figure 4. The figures display spatially the magnitude of the error associated with each distance metric in each part of the city.

With respect to road distance (Figure 3), Euclidean distance produces only negative errors, but in some peripheral areas of the city these errors are large in absolute value. Manhattan distance produces mostly positive errors, and overall it produces better results, as the areas characterized by the highest absolute errors are reduced to a triangle west of the hospital and the far southeast corner of the city. The Minkowski metric with p = 1.54 improves over the Manhattan metric results, by resolving the area of large residuals in the southeast corner; conversely, the area of high residuals west of the hospital is moderately larger. The best results are produced by the Minkowski metric with p = 1.31, as for the entire eastern part of the city errors are contained in the interval 0-2.5 km, and greater errors remain only in some peripheral northwest areas.

The area west of the hospital consistently emerges as an outlier, despite its close distance to the hospital. Careful observation of the topography of the area reveals that the hospital is located on the east side of a hill, with a river running to the west of the hill. Therefore, no immediate access is possible from the west side of the river to the hospital, and patients are left with no option but to travel a considerable distance either northbound or southbound to access the nearest bridge. The presence of the hill further amplifies this distortion.

With respect to travel time (Figure 4) the improvement obtained by the two Minkowski metrics is more evident throughout the city. Areas of large residuals remain in the far southeast and southwest corners, and, in general each metric performs worse in the west than in the east part of the city. The latter observation is counterintuitive, since the hospital is located in the northwest; however, that part of the city, closer to the foothills, is characterized by a more complex topography. Moreover, in a city like Calgary, urban design and age of communities play an important role, as, over the decades, rectangular patterns have alternated with such patterns as crescent and cul-de-sac, and other typologies of urban connectivity. Likewise, it shall be observed that the most peripheral areas are very recent developments, and likely the planned road connections had not been completed during the study period.

Spatial analytical modeling application

Within the scope of a larger project, the association between cardiovascular disease and socio-economic variables was recently analyzed. Specifically, the relationship between cardiac catheterization and socio-economic variables was analyzed by means of a multivariate spatially autoregressive model. While distance does not explicitly enter in these models, it is one of the key parameters defining the spatial weight matrix (19), which represents the neighborhood definition, hence the model's ability to cope with spatial dependencies, and ultimately the reliability of the model estimates. Figure 5 provides an example of how different distance metrics affect the neighborhood configuration defined by a spatial weight matrix. Locations in the figure represent census tract centroids, as these relatively larger spatial units are used in these regression models. The lines connecting these locations indicate whether or not 2 close locations are considered neighbors and included in the estimation of spatial dependence. Other parameters contribute to the neighborhood definition: generally the most influential parameter is the number of nearest neighbors, complemented by a distance decay function and a weight. Figure 5a shows the connectivity defined by the Euclidean metric, while Figure 5b corresponds to the Minkowski metric optimizing travel time (p = 1.31). A close comparison of the two plots (aided by the superimposed circles) reveals how the modification of the distance metric substantially alters the neighborhood configuration.

Each neighborhood configuration, such as the ones presented in Figure 5, forms the basis for the definition of a spatial weight matrix, which is one of the crucial elements that define a spatial regression model, differentiating it from a standard (non-spatial) model. Through the spatial weight matrix, the neighborhood configuration ultimately affects the variance of the model estimates, hence their reliability. As an example, Table 3 shows the variation in the parameter estimates and the regression diagnostics determined by the two alternative neighborhood configurations depicted in Figure 5. The regression model analyzes the socio-economic variables significantly associated with cardiac catheterization: the main predictors are family status, income, and educational attainments; the spatial distribution of these variables is used to identify areas of social and economic concern. This model thus provides an effective analytical tool to support policy decisions, providing guidance for the initiation of targeted, localized preventative health measures.

A succinct summary of the parameters associated with each independent variable is presented in Table 3, along with a small selection of regression diagnostics. While the coefficients (beta) linking each independent to the dependent variable remain substantially unchanged, the Minkowski distance leads to increased values of their associated t test: the t values increase thanks to a reduction of the variance associated with the estimates. All the independent variables benefit, in varying degrees, from the modified distance model. Likewise, the regression diagnostics indicate that the distance model does not appreciably affect the overall goodness of fit (represented by the pseudo-R2), but it does noticeably impact the autoregressive coefficient and, more importantly, the spatial dependence in the regression residuals.

These results confirm the importance of an accurate distance model to enhance the reliability of spatial analysis for health service research.

Discussion

This study compares four different distance metrics, i.e., Euclidean, Manhattan, and Minkowski distance (the latter for two different coefficients), and contrasts them with road distance and travel time, respectively, in the context of applied health services research. The Euclidean metric is the most common and intuitive measure of distance; Manhattan is another common distance measure, but very rarely does either of them provide a close approximation of an empirical road network, such as an urban network. Road distance, conversely, provides an accurate measurement, but it is prone to local features, and does not provide a single model of travel throughout the urban network. Travel time is, arguably, the most relevant estimate of distance, but its calculation introduces further specificities, as temporal anomalies are added to the local features, further reducing its generality. For these reasons, Minkowski distance is a promising solution: it provides a general model of travel throughout an empirical network; it possesses a large range of parameters, which enhance its flexibility; it can be easily calculated; and it can provide a less crude approximation of travel along a road network.

Euclidean distance is widely used in distance analyses in the literature but it tends to underestimate road distance and travel time. Manhattan distance, on the contrary, tends to overestimate road distance and travel time. The use of either of these two metrics in any spatial analysis may result in inaccurate results. Ideally, travel time provides the most accurate estimate of a patient's travel from their place of residence to a given health facility. While accessibility studies may profitably employ this method, its use in spatial analytical models is inhibited by the particularity of its calculation, which tends to be heavily affected by local features, both in space and in time.

A simple, empirical optimization procedure led to the identification of the coefficients, in the Minkowski formula, that best approximate road distance and travel time, respectively. Summary statistics and cartographic representations consistently indicate the value p = 1.31 as the best coefficient, i.e., the one that leads to the most accurate approximation of both road distance and travel time. The model of distance based on this coefficient was experimentally introduced in spatial analytical routines, to define neighborhood connectivity and determine the spatial weight matrix for multivariate spatial regression analysis.

The enhanced reliability of the spatial analytical model based on the optimal distance model far outweighs the cost of the computational procedure that leads to the coefficient selection. The advantage of the procedure discussed in this paper can be best appreciated by considering that empirical measurements of distance and travel time cannot be implemented in spatial analytical modeling, exactly because of their empirical nature and the great impact they receive from local features and conditions. For this reason, the optimized Minkowski coefficient represents a valuable compromise between an approach that is often simplistic (i.e., Euclidean and Manhattan metrics) and the ideal, but impractical sophistication of empirical measurements (i.e., road distance and travel time, respectively).

This study has some limitations. Most importantly, it is limited to one class of distance metrics, i.e., Minkowski, whereas other metrics could be considered: Mahalanobis distance is just one example. There are locational inaccuracies in the patient data as well as in the road network; additional errors are likely to derive from the algorithm used for the distance calculations. Travel time calculations are based on assumptions, referred to as optimal conditions that tend to represent an ideal, but unlikely situation. The hypothesis of optimal conditions should be lifted, and more realistic conditions should be entered in the model, e.g., rush hour, or severe winter weather, and should be considered not just individually, but also jointly. An array of optimal Minkowski coefficients should consequently be calculated for the varying conditions, leading to the final identification of a stochastic optimum. The entire procedure is also based on the strong assumption of a single transportation mode: it can be argued that optimal conditions approximate ambulance travel, but the catheterization registry used in this analysis was not limited to patients who were transported directly from their residence to the tertiary catheterization facility. The APPROACH registry also includes patients initially admitted to hospitals or emergency facilities without catheterization facilities who were subsequently transferred to a tertiary center for this procedure under less urgent circumstances. Accuracy in travel route selection and in travel time estimates are most relevant to patients with more emergent cardiac conditions requiring rapid transportation from their residence to the catheterization facility. This limitation can be addressed by a twofold model, optimizing for ambulance and private vehicle travel. Still, road network travel is a reasonable assumption for urban environments, but is unlikely to be the sole mode of transportation to emergency care from rural and remote locations.

The proposed approach was tested, as an example, on a spatial autoregressive model; however, virtually all spatial analytical techniques involve some distance measurement. The impact of alternative distance measurements was examined in this implementation through an analysis of the spatial weight matrix, but distance is likely to impact other analytical techniques in further, different ways.

Given its advantages and limitations, Minkowski distance appears to be most usefully implemented in spatial analytical modeling; however, other useful applications can be envisaged, particularly in geographic areas characterized by paucity or unreliability of spatial data, or by high dynamism. Examples include urban or regional road networks of countries with poor spatial digital records, or characterized by high population mobility or varying transportations routes, for example due to seasonal variations. In all such cases, a model of travel in the area, obtained by the method discussed in this paper, can provide rough, but reasonable distance estimates, potentially useful for facility planning, or as initial input for more sophisticated analyses.

Conclusion

The proposed method provides a single model of travel on an urban road network, via the identification of an optimal coefficient within a class of distance metrics, known as Minkowski metrics. The coefficient can be optimized to approximate different distances, e.g., road distance or travel time, under varying conditions. The resulting distance model can be usefully input in spatial analytical models, providing a method for the estimation of less simplistic spatial analytical models, by means of a more accurate representation of distance. Such models yield more reliable estimates, hence more effective tools for health service planning and management.

Competing interests

The authors have no conflict of interest to declare and each has met the criteria for authorship (including access to the data and role in writing the manuscript).

Authors' contributions

All authors read and approved the final manuscript. RS compiled the data, performed the distance analyses and calculations, and drafted manuscript. SB conceived the conceptual method to optimize Minkowski metric; performed the spatial analytical implementation; and thoroughly revised the manuscript providing extensive guidance. MLK provided clinical context and comments on the manuscript. WAG participated in study design, contributed in writing manuscript and provided support and guidance throughout the study.

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1472-6963/9/200/prepub