Guest Post: Identifying Supermarkets in OpenStreetMap and Overture Maps with Machine Learning

This guest post is written by Grant Gerrald, a 2025 summer intern at Henry Spatial Analysis, describing his research on identifying supermarkets across the United States from open spatial data sources. Grant, thanks for your great work this summer!


Why Supermarkets Matter

In this project, I set out to test whether machine learning models can reliably identify full-service supermarkets from open spatial databases.

Supermarkets function as destinations where households can complete nearly all of their weekly shopping in a single trip. This “all-in-one” quality makes them more than just retailers—they are essential infrastructure for daily living. The ability to cover every grocery need in one stop saves time, reduces travel costs, and provides predictability for families.

Because of this, the presence or absence of a supermarket can shape a neighborhood’s livability. Families deciding where to move often consider proximity to a supermarket as heavily as schools, parks, or public transit.

Supermarkets are a critical amenity in many neighborhoods. Photo credit: Cheung Yin on Unsplash

Open Data Sources

OpenStreetMap (OSM) and Overture Maps are two open data sources for points of interest (POIs), including full-service supermarkets as well as other types of businesses. I considered potential supermarkets from both datasets using the following criteria:

  • OSM: places containing the shop = “supermarket” tag

  • Overture Maps: places tagged as “supermarket” or “grocery_store” in their primary or alternate categories
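The candidate-selection rules above can be sketched in a few lines. This is a minimal illustration, assuming a simplified record layout (dicts with `source`, `tags`, and `categories` keys) rather than the actual OSM or Overture schemas:

```python
def is_candidate(record):
    """Return True if a POI record matches the candidate criteria above."""
    if record["source"] == "osm":
        # OSM: keep places tagged shop=supermarket
        return record.get("tags", {}).get("shop") == "supermarket"
    if record["source"] == "overture":
        # Overture: keep places whose primary or alternate categories
        # include "supermarket" or "grocery_store"
        cats = set(record.get("categories", []))
        return bool(cats & {"supermarket", "grocery_store"})
    return False
```

In practice the Overture check would distinguish primary from alternate categories; they are pooled here only to keep the sketch short.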

Both OSM and Overture Maps contain descriptive metadata contributed by users or aggregated from multiple sources. These metadata—shop type, opening hours, brand, and more—were not designed specifically to distinguish supermarkets from other types of grocery stores, but they capture details that can serve as useful signals for a predictive model.

Hypothesis: Machine learning (ML) models trained on metadata from open data sources can help us identify records that are likely supermarkets or non-supermarkets.

Defining a Supermarket

I applied three criteria to differentiate full-service supermarkets from smaller or specialized stores:

  1. Fresh produce: At least two aisles (e.g., one refrigerated and one room-temperature).

  2. Home goods: At least one aisle of paper goods, toiletries, or cleaning supplies.

  3. All-in-one shopping destination: People should be able to complete most of their weekly shopping here.

These criteria rule out greengrocers and specialty shops but allow for some edge cases, such as small-town general stores.

Refrigerated produce section at a supermarket

Fresh produce is a necessary indicator for a full-service supermarket. Photo credit: nrd on Unsplash

Creating the Training Dataset

To train the ML models, I needed ground-truth examples of both supermarkets and non-supermarkets. I used two complementary approaches:

  1. Manual tagging: I manually reviewed POIs in two states, Arkansas and Rhode Island, recording whether each location met the definition of a full-service supermarket. This produced carefully vetted examples across urban and rural contexts.

  2. Previously vetted dataset: I incorporated an existing dataset of tagged supermarkets and non-supermarkets from counties containing the 50 largest U.S. cities. This broadened coverage and ensured large metropolitan contexts were represented.

Together, these datasets formed the foundation for model training and evaluation.

Exploratory Analysis

Before training models, I conducted a univariate analysis of available tags in OSM and Overture Maps. The approach was simple: measure how often a tag appeared in confirmed supermarkets versus non-supermarkets, then compute a strength score (% of supermarkets with the tag minus % of non-supermarkets with the tag).

  • Positive strength: Tag is more common among supermarkets → helps identify true positives.

  • Negative strength: Tag is more common among non-supermarkets → helps filter out false positives.
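The strength score above can be computed directly from labeled examples. This is a minimal sketch; the input format (pairs of a tag set and a supermarket label) is an assumption for illustration, not the project's actual data structure:

```python
def tag_strength(records, tag):
    """Percent of supermarkets carrying the tag minus percent of
    non-supermarkets carrying it.

    records: iterable of (tag_set, is_supermarket) pairs.
    """
    pos = [tags for tags, label in records if label]
    neg = [tags for tags, label in records if not label]
    pct_pos = 100.0 * sum(tag in t for t in pos) / len(pos)
    pct_neg = 100.0 * sum(tag in t for t in neg) / len(neg)
    return pct_pos - pct_neg
```

A strongly positive result flags a tag useful for confirming supermarkets; a strongly negative one flags a tag useful for screening them out.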

Table 1 shows tags with a strength rating beyond ±20 percentage points, highlighting the strongest signals:

| Dataset  | Tag                                    | % in Supermarkets | % in Non-Supermarkets | Strength Rating |
|----------|----------------------------------------|-------------------|-----------------------|-----------------|
| OSM      | brand:wikidata                         | 90.9%             | 25.4%                 | +65.5           |
| OSM      | brand                                  | 91.1%             | 26.3%                 | +64.8           |
| OSM      | opening_hours                          | 66.0%             | 29.7%                 | +36.3           |
| OSM      | website                                | 62.7%             | 28.7%                 | +34.0           |
| OSM      | addr:postcode                          | 81.4%             | 49.3%                 | +32.1           |
| OSM      | phone                                  | 59.7%             | 26.0%                 | +31.8           |
| OSM      | addr:street                            | 84.3%             | 55.6%                 | +28.7           |
| OSM      | addr:housenumber                       | 81.0%             | 54.9%                 | +26.1           |
| Overture | Alternate Category (AC) = supermarket  | 46.4%             | 4.0%                  | +42.3           |
| Overture | Primary Category (PC) = grocery_store  | 47.6%             | 6.0%                  | +41.6           |
| Overture | PC = supermarket                       | 42.5%             | 3.7%                  | +38.9           |
| Overture | PC = discount_store                    | 0.8%              | 49.9%                 | -49.1           |
| Overture | AC = grocery_store                     | 47.2%             | 88.7%                 | -41.5           |
| Overture | AC = retail                            | 5.6%              | 34.1%                 | -28.5           |

Machine Learning Models

I tested three machine learning models for classifying supermarkets:

  1. GLMNET: A generalized linear model with regularization. Simple and interpretable.

  2. Gradient Boosting Machines (GBM): Sequential shallow decision trees, effective with messy, non-linear data.

  3. Random Forests (RF): Combines many decision trees. Reliable baseline, balances accuracy and robustness.

I trained each model on 70% of the POI dataset, then evaluated its performance on the held-out 30%.
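The 70/30 setup can be sketched with scikit-learn. Note the assumptions here: the original analysis likely used the R packages glmnet, gbm, and randomForest, so these Python models are stand-in analogues (plain L2 logistic regression in place of glmnet's elastic net), and the features and labels below are random placeholders rather than real POI metadata:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((400, 5))                   # placeholder tag-derived features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # placeholder supermarket labels

# 70% of POIs for training, 30% held out for evaluation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "glmnet-like": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(random_state=0),
    "rf": RandomForestClassifier(random_state=0),
}
probs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # predicted probability that each held-out POI is a supermarket
    probs[name] = model.predict_proba(X_te)[:, 1]
```

Each model returns a probability per held-out POI, which the next section turns into class predictions via a cutoff.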

Evaluation criteria

Each of the three classification models returned estimated probabilities that a candidate point of interest was a supermarket meeting our definition; we can then apply a probability cutoff to separate predicted positives from negatives. I used the following criteria to determine the cutoffs, reflecting the real-world use cases for these models:

  • Precision: The share of model-identified supermarkets that are actually supermarkets. When a model predicts a POI is a full-service supermarket, it should almost always be correct. We would ideally like a predictive model with precision >95%.

  • Negative Predictive Value: The share of model-identified non-supermarkets that are actually non-supermarkets. When a model predicts a POI is not a full-service supermarket, it flags that location for manual review. Negatives can be checked manually, but to preserve limited reviewer time, we want to aim for an NPV of at least 10%.

Given our application, we selected relatively high probability cutoffs that maximized precision while keeping negative predictive value above 10% (or until a precision of >99% was reached). We show two other key metrics in all of the results tables:

  • Recall: The share of actual supermarkets that the model successfully identifies.

  • Accuracy: The overall proportion of correct predictions (both positives and negatives).
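The four metrics above can be computed at any probability cutoff from the counts of true/false positives and negatives. A minimal sketch, assuming probabilities and 1/0 labels as plain lists:

```python
def metrics_at_cutoff(probs, labels, cutoff):
    """Precision, NPV, recall, and accuracy at a given probability cutoff."""
    tp = sum(p >= cutoff and y == 1 for p, y in zip(probs, labels))
    fp = sum(p >= cutoff and y == 0 for p, y in zip(probs, labels))
    tn = sum(p < cutoff and y == 0 for p, y in zip(probs, labels))
    fn = sum(p < cutoff and y == 1 for p, y in zip(probs, labels))
    return {
        # share of predicted supermarkets that truly are supermarkets
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        # share of predicted non-supermarkets that truly are not
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        # share of true supermarkets the model finds
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        # overall share of correct predictions
        "accuracy": (tp + tn) / len(labels),
    }
```

Scanning this function over a grid of cutoffs is one way to pick the threshold that maximizes precision while keeping NPV above the floor described above.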

I also created Receiver Operating Characteristics (ROC) curves for each model. ROC curves show how well a model distinguishes between true positives and false positives across different probability cutoffs, offering a complementary tool for comparing model performance.

Results

Overture Maps Models

Table 2 shows the selected probability cutoffs and corresponding key metrics for the Overture Maps data: 

| Model  | Threshold | Precision | NPV | Recall | Accuracy |
|--------|-----------|-----------|-----|--------|----------|
| GLMNet | 0.919     | 99%       | 75% | 78%    | 86%      |
| GBM    | 0.905     | 99%       | 79% | 82%    | 88%      |
| RF     | 0.90      | 95%       | 91% | 94%    | 93%      |

All three models achieved exceptionally high precision (>95%), ensuring that nearly all predicted supermarkets were true positives. Both glmnet and gbm could reach precision above 99%, but at those cutoffs recall plummeted from over 70% to below 20%; to keep the approach scalable to a large dataset, I slightly lowered the classification cutoffs to recover recall.

  • The random forest model demonstrated the most balanced performance overall, achieving the highest NPV (91%) and recall (94%), thereby reducing manual verification workload while preserving predictive reliability.

  • The gradient boosting machine model also performed strongly, maintaining precision above 98% with a modest tradeoff in recall (82%).

  • The glmnet model achieved comparable precision but a notably lower recall (78%), reflecting its conservative classification behavior.

Figure 1: ROC curves for Overture Maps models, showing selected probability cutoffs.

OpenStreetMap Models

Table 3 below shows the selected probability cutoffs and corresponding key metrics for the OpenStreetMap data: 

| Model  | Threshold | Precision | NPV | Recall | Accuracy |
|--------|-----------|-----------|-----|--------|----------|
| GLMNet | 0.9       | 99%       | 18% | 92%    | 92%      |
| GBM    | 0.9       | 99%       | 19% | 93%    | 92%      |
| RF     | 0.99      | 98%       | 50% | 99%    | 98%      |

  • All three OSM models achieved high precision (>97%), successfully eliminating almost all false positives.

  • However, these higher thresholds reduced the negative predictive value substantially, particularly for glmnet and gbm (NPV ≈ 18-19%). 

  • The random forest model, while slightly lower in precision than the other two models (97.7%), achieved the highest NPV (50%) and recall (>99%), suggesting superior generalization and lower false-negative rates.

Overall, OSM models prioritized precision over negative predictive value, effectively minimizing false positives but still requiring manual verification for most model-identified negatives.

Figure 2: ROC curves for OSM models, showing selected probability cutoffs.

Conclusions

This case study showed that ML models can reliably identify full-service supermarkets from open spatial datasets.

By training on hand-coded examples of true supermarkets and non-supermarkets, we developed models that achieved precision levels over 95%, meaning that almost every predicted supermarket is a true positive.

Beyond these preliminary results, the broader value of this project lies in how it can transform the way we maintain and improve open spatial data. With over 10 million points of interest in the United States captured by open datasets, manual vetting of each location is not realistic. The models developed here can now flag likely false positives and false negatives, directing human reviewers to the cases that matter most. This approach substantially cuts down the amount of manual review required while improving the overall quality of the resulting map.

In production, this workflow could operate as follows:

  1. Download the latest version of the open POI dataset.

  2. Train a suite of machine learning models using previously vetted POI records to flag data points that do not meet the supermarket criteria.

  3. Review the flagged POIs manually to confirm or exclude them.

  4. Iterate by incorporating all user-vetted POIs back into the training set for the next model update, ensuring continuous improvement of the classifier.
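The review cycle above can be sketched as a small loop. Everything here is a toy stand-in: the model is replaced by a plain scoring function and "manual review" by a reviewer callback, purely to illustrate how flagged POIs feed back into the training set:

```python
def review_cycle(score, pois, manual_review, cutoff=0.9):
    """One human-in-the-loop pass: flag POIs scoring below the cutoff,
    apply reviewer decisions, and return the updated training set."""
    # Step 2: flag data points the model doubts are supermarkets
    flagged = [p for p in pois if score(p) < cutoff]
    # Step 3: a reviewer confirms or excludes each flagged POI
    confirmed = {p: manual_review(p) for p in flagged}
    # Step 4: vetted labels (unflagged POIs keep their positive label)
    training = [(p, confirmed.get(p, True)) for p in pois]
    return flagged, training
```

Each pass through this loop grows the vetted training set, which is what lets the next model update improve on the last.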

This iterative human-in-the-loop system provides a scalable and sustainable path for improving open geographic data. Each training cycle strengthens the model’s understanding of supermarket characteristics, gradually reducing the need for manual intervention.

In practical terms, this means we can maintain a far more accurate national supermarket map—critical for research on urban livability where proximity to daily essentials is a key community indicator.


About the author: Grant Gerrald is a Junior studying Management Information Systems at Santa Clara University. From Seattle, he is passionate about using data and machine learning to enable studies in public health, urban livability, and exercise science. You can find him on LinkedIn.
