Guest Post: Identifying Supermarkets in OpenStreetMap and Overture Maps with Machine Learning

This guest post is written by Grant Gerrald, a 2025 summer intern at Henry Spatial Analysis, describing his research on identifying supermarkets across the United States from open spatial data sources. Grant, thanks for your great work this summer!


Why Supermarkets Matter

In this project, I set out to test whether machine learning models can reliably identify full-service supermarkets from open spatial databases.

Supermarkets function as destinations where households can complete nearly all of their weekly shopping in a single trip. This “all-in-one” quality makes them more than just retailers—they are essential infrastructure for daily living. The ability to cover every grocery need in one stop saves time, reduces travel costs, and provides predictability for families.

Because of this, the presence or absence of a supermarket can shape a neighborhood’s livability. Families deciding where to move often consider proximity to a supermarket as heavily as schools, parks, or public transit.

Supermarkets are a critical amenity in many neighborhoods. Photo credit: Cheung Yin on Unsplash

Open Data Sources

OpenStreetMap (OSM) and Overture Maps are two open data sources for points of interest (POIs), including full-service supermarkets as well as other types of businesses. I considered potential supermarkets from both datasets using the following criteria:

  • OSM: places containing the shop = “supermarket” tag

  • Overture Maps: places tagged as “supermarket” or “grocery_store” in their primary or alternate categories
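The candidate-selection rules above can be sketched in a few lines. This is a minimal illustration, assuming a simplified record layout (dicts with `source`, `tags`, and `categories` keys) rather than the actual OSM or Overture schemas:

```python
def is_candidate(record):
    """Return True if a POI record matches the candidate criteria above."""
    if record["source"] == "osm":
        # OSM: keep places tagged shop=supermarket
        return record.get("tags", {}).get("shop") == "supermarket"
    if record["source"] == "overture":
        # Overture: keep places whose primary or alternate categories
        # include "supermarket" or "grocery_store"
        cats = set(record.get("categories", []))
        return bool(cats & {"supermarket", "grocery_store"})
    return False
```

In practice the Overture check would distinguish primary from alternate categories; they are pooled here only to keep the sketch short.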

Both OSM and Overture Maps contain descriptive metadata contributed by users or aggregated from multiple sources. These metadata—shop type, opening hours, brand, and more—were not designed specifically to distinguish supermarkets from other types of grocery stores, but they capture details that can serve as useful signals for a predictive model.

Hypothesis: Machine learning (ML) models trained on metadata from open data sources can help us identify records that are likely supermarkets or non-supermarkets.

Defining a Supermarket

I applied three criteria to differentiate full-service supermarkets from smaller or specialized stores:

  1. Fresh produce: At least two aisles (e.g., one refrigerated and one room-temperature).

  2. Home goods: At least one aisle of paper goods, toiletries, or cleaning supplies.

  3. All-in-one shopping destination: People should be able to complete most of their weekly shopping here.

These criteria rule out greengrocers and specialty shops but allow for some edge cases, such as small-town general stores.

Refrigerated produce section at a supermarket

Fresh produce is a necessary indicator for a full-service supermarket. Photo credit: nrd on Unsplash

Creating the Training Dataset

To train the ML models, I needed ground-truth examples of both supermarkets and non-supermarkets. I used two complementary approaches:

  1. Manual tagging: I manually reviewed POIs in two states, Arkansas and Rhode Island, recording whether each location met the definition of a full-service supermarket. This produced carefully vetted examples across urban and rural contexts.

  2. Previously vetted dataset: I incorporated an existing dataset of tagged supermarkets and non-supermarkets from counties containing the 50 largest U.S. cities. This broadened coverage and ensured large metropolitan contexts were represented.

Together, these datasets formed the foundation for model training and evaluation.

Exploratory Analysis

Before training models, I conducted a univariate analysis of available tags in OSM and Overture Maps. The approach was simple: measure how often a tag appeared in confirmed supermarkets versus non-supermarkets, then compute a strength score (% of supermarkets with the tag minus % of non-supermarkets with the tag).

  • Positive strength: Tag is more common among supermarkets → helps identify true positives.

  • Negative strength: Tag is more common among non-supermarkets → helps filter out false positives.
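The strength score above can be computed directly from labeled examples. This is a minimal sketch; the input format (pairs of a tag set and a supermarket label) is an assumption for illustration, not the project's actual data structure:

```python
def tag_strength(records, tag):
    """Percent of supermarkets carrying the tag minus percent of
    non-supermarkets carrying it.

    records: iterable of (tag_set, is_supermarket) pairs.
    """
    pos = [tags for tags, label in records if label]
    neg = [tags for tags, label in records if not label]
    pct_pos = 100.0 * sum(tag in t for t in pos) / len(pos)
    pct_neg = 100.0 * sum(tag in t for t in neg) / len(neg)
    return pct_pos - pct_neg
```

A strongly positive result flags a tag useful for confirming supermarkets; a strongly negative one flags a tag useful for screening them out.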

Table 1 shows tags with a strength rating beyond ±20 percentage points, highlighting the strongest signals:

| Dataset  | Tag                                    | % in Supermarkets | % in Non-Supermarkets | Strength Rating |
|----------|----------------------------------------|-------------------|-----------------------|-----------------|
| OSM      | brand:wikidata                         | 90.9%             | 25.4%                 | +65.5           |
| OSM      | brand                                  | 91.1%             | 26.3%                 | +64.8           |
| OSM      | opening_hours                          | 66.0%             | 29.7%                 | +36.3           |
| OSM      | website                                | 62.7%             | 28.7%                 | +34.0           |
| OSM      | addr:postcode                          | 81.4%             | 49.3%                 | +32.1           |
| OSM      | phone                                  | 59.7%             | 26.0%                 | +31.8           |
| OSM      | addr:street                            | 84.3%             | 55.6%                 | +28.7           |
| OSM      | addr:housenumber                       | 81.0%             | 54.9%                 | +26.1           |
| Overture | Alternate Category (AC) = supermarket  | 46.4%             | 4.0%                  | +42.3           |
| Overture | Primary Category (PC) = grocery_store  | 47.6%             | 6.0%                  | +41.6           |
| Overture | PC = supermarket                       | 42.5%             | 3.7%                  | +38.9           |
| Overture | PC = discount_store                    | 0.8%              | 49.9%                 | -49.1           |
| Overture | AC = grocery_store                     | 47.2%             | 88.7%                 | -41.5           |
| Overture | AC = retail                            | 5.6%              | 34.1%                 | -28.5           |

Machine Learning Models

I tested three machine learning models for classifying supermarkets:

  1. GLMNET: A generalized linear model with regularization. Simple and interpretable.

  2. Gradient Boosting Machines (GBM): Sequential shallow decision trees, effective with messy, non-linear data.

  3. Random Forests (RF): Combines many decision trees. Reliable baseline, balances accuracy and robustness.

I trained each model on 70% of the POI dataset, then evaluated its performance on the held-out 30%.
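The 70/30 setup can be sketched with scikit-learn. Note the assumptions here: the original analysis likely used the R packages glmnet, gbm, and randomForest, so these Python models are stand-in analogues (plain L2 logistic regression in place of glmnet's elastic net), and the features and labels below are random placeholders rather than real POI metadata:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((400, 5))                   # placeholder tag-derived features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # placeholder supermarket labels

# 70% of POIs for training, 30% held out for evaluation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "glmnet-like": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(random_state=0),
    "rf": RandomForestClassifier(random_state=0),
}
probs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # predicted probability that each held-out POI is a supermarket
    probs[name] = model.predict_proba(X_te)[:, 1]
```

Each model returns a probability per held-out POI, which the next section turns into class predictions via a cutoff.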

Evaluation criteria

Each of the three classification models returned estimated probabilities that a candidate point of interest was a supermarket meeting our definition; we can then apply a probability cutoff to separate predicted positives from negatives. I used the following criteria to determine the cutoffs, reflecting the real-world use cases for these models:

  • Precision: The share of model-identified supermarkets that are actually supermarkets. When a model predicts a POI is a full-service supermarket, it should almost always be correct. We would ideally like a predictive model with precision >95%.

  • Negative Predictive Value: The share of model-identified non-supermarkets that are actually non-supermarkets. When a model predicts a POI is not a full-service supermarket, it flags that location for manual review. Negatives can be checked manually, but to preserve limited reviewer time, we want to aim for an NPV of at least 10%.

Given our application, we selected relatively high probability cutoffs that maximized precision while keeping negative predictive value above 10% (or until a precision of >99% was reached). We show two other key metrics in all of the results tables:

  • Recall: The share of actual supermarkets that the model successfully identifies.

  • Accuracy: The overall proportion of correct predictions (both positives and negatives).
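The four metrics above can be computed at any probability cutoff from the counts of true/false positives and negatives. A minimal sketch, assuming probabilities and 1/0 labels as plain lists:

```python
def metrics_at_cutoff(probs, labels, cutoff):
    """Precision, NPV, recall, and accuracy at a given probability cutoff."""
    tp = sum(p >= cutoff and y == 1 for p, y in zip(probs, labels))
    fp = sum(p >= cutoff and y == 0 for p, y in zip(probs, labels))
    tn = sum(p < cutoff and y == 0 for p, y in zip(probs, labels))
    fn = sum(p < cutoff and y == 1 for p, y in zip(probs, labels))
    return {
        # share of predicted supermarkets that truly are supermarkets
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        # share of predicted non-supermarkets that truly are not
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        # share of true supermarkets the model finds
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        # overall share of correct predictions
        "accuracy": (tp + tn) / len(labels),
    }
```

Scanning this function over a grid of cutoffs is one way to pick the threshold that maximizes precision while keeping NPV above the floor described above.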

I also created Receiver Operating Characteristics (ROC) curves for each model. ROC curves show how well a model distinguishes between true positives and false positives across different probability cutoffs, offering a complementary tool for comparing model performance.

Results

Overture Maps Models

Table 2 shows the selected probability cutoffs and corresponding key metrics for the Overture Maps data: 

| Model  | Threshold | Precision | NPV | Recall | Accuracy |
|--------|-----------|-----------|-----|--------|----------|
| GLMNet | 0.919     | 99%       | 75% | 78%    | 86%      |
| GBM    | 0.905     | 99%       | 79% | 82%    | 88%      |
| RF     | 0.90      | 95%       | 91% | 94%    | 93%      |

All three models achieved exceptionally high precision (>95%), ensuring that nearly all predicted supermarkets were true positives. Both glmnet and gbm could reach precision above 99%, but at those cutoffs recall plummeted from over 70% to below 20%; to keep the approach scalable to a large dataset, I slightly lowered the classification cutoffs to recover recall.

  • The random forest model demonstrated the most balanced performance overall, achieving the highest NPV (91%) and recall (94%), thereby reducing manual verification workload while preserving predictive reliability.

  • The gradient boosting machine model also performed strongly, maintaining precision above 98% with a modest tradeoff in recall (82%).

  • The glmnet model achieved comparable precision but a notably lower recall (78%), reflecting its conservative classification behavior.

Figure 1: ROC curves for Overture Maps models, showing selected probability cutoffs.

OpenStreetMap Models

Table 3 below shows the selected probability cutoffs and corresponding key metrics for the OpenStreetMap data: 

| Model  | Threshold | Precision | NPV | Recall | Accuracy |
|--------|-----------|-----------|-----|--------|----------|
| GLMNet | 0.9       | 99%       | 18% | 92%    | 92%      |
| GBM    | 0.9       | 99%       | 19% | 93%    | 92%      |
| RF     | 0.99      | 98%       | 50% | 99%    | 98%      |

  • All three OSM models achieved high precision (>97%), successfully eliminating almost all false positives.

  • However, these higher thresholds reduced the negative predictive value substantially, particularly for glmnet and gbm (NPV ≈ 18-19%). 

  • The random forest model, while slightly lower in precision than the other two models (97.7%), achieved the highest NPV (50%) and recall (>99%), suggesting superior generalization and lower false-negative rates.

Overall, OSM models prioritized precision over negative predictive value, effectively minimizing false positives but still requiring manual verification for most model-identified negatives.

Figure 2: ROC curves for OSM models, showing selected probability cutoffs.

Conclusions

This case study showed that ML models can reliably identify full-service supermarkets from open spatial datasets.

By training on hand-coded examples of true supermarkets and non-supermarkets, we developed models that achieved precision levels over 95%, meaning that almost every predicted supermarket is a true positive.

Beyond these preliminary results, the broader value of this project lies in how it can transform the way we maintain and improve open spatial data. With over 10 million points of interest in the United States captured by open datasets, manual vetting of each location is not realistic. The models developed here can now flag likely false positives and false negatives, directing human reviewers to the cases that matter most. This approach substantially cuts down the amount of manual review required while improving the overall quality of the resulting map.

In production, this workflow could operate as follows:

  1. Download the latest version of the open POI dataset.

  2. Train a suite of machine learning models using previously vetted POI records to flag data points that do not meet the supermarket criteria.

  3. Review the flagged POIs manually to confirm or exclude them.

  4. Iterate by incorporating all user-vetted POIs back into the training set for the next model update, ensuring continuous improvement of the classifier.
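The review cycle above can be sketched as a small loop. Everything here is a toy stand-in: the model is replaced by a plain scoring function and "manual review" by a reviewer callback, purely to illustrate how flagged POIs feed back into the training set:

```python
def review_cycle(score, pois, manual_review, cutoff=0.9):
    """One human-in-the-loop pass: flag POIs scoring below the cutoff,
    apply reviewer decisions, and return the updated training set."""
    # Step 2: flag data points the model doubts are supermarkets
    flagged = [p for p in pois if score(p) < cutoff]
    # Step 3: a reviewer confirms or excludes each flagged POI
    confirmed = {p: manual_review(p) for p in flagged}
    # Step 4: vetted labels (unflagged POIs keep their positive label)
    training = [(p, confirmed.get(p, True)) for p in pois]
    return flagged, training
```

Each pass through this loop grows the vetted training set, which is what lets the next model update improve on the last.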

This iterative human-in-the-loop system provides a scalable and sustainable path for improving open geographic data. Each training cycle strengthens the model’s understanding of supermarket characteristics, gradually reducing the need for manual intervention.

In practical terms, this means we can maintain a far more accurate national supermarket map—critical for research on urban livability where proximity to daily essentials is a key community indicator.


About the author: Grant Gerrald is a Junior studying Management Information Systems at Santa Clara University. From Seattle, he is passionate about using data and machine learning to enable studies in public health, urban livability, and exercise science. You can find him on LinkedIn.
