AcreValue’s Valuation Model – Part 2
In Part 1, we explored questions from farmers and landowners who use our values to compare their farms to others. Now we’ll address common questions from land professionals (e.g., brokers, appraisers) on the AcreValue model’s inner workings.
We’ll cover the following:
- What kind of model do you use?
- The land market differs by region – how do you handle that?
- How do you develop your model and test its accuracy?
AcreValue’s Automated Valuation Model
AcreValue’s automated valuation model (AVM) estimates a price per acre using mathematical modeling combined with publicly available data from more than a dozen sources. These data include variables like soil types, soil productivity, location, field shape, and dozens of other factors. The model uses machine learning. Simply put, it applies a mathematical formula to represent the relationship between the attributes of a parcel and its price, and the formula is adjusted to fit real-world land sales as accurately as possible. In other words, the model “learns” from sales data. The model is developed and tested by a team of data scientists, engineers, and real estate consultants. Humans, however, are not particularly good at assimilating very large amounts of data, or at making predictions for millions of agricultural parcels in a reasonable amount of time. That is where machine learning can be so powerful.
People often refer to machine learning models as “black boxes”: the inputs and outputs can be observed, but the inner workings are too complex to understand, so no one can explain why the model makes a particular prediction. That’s not the case with AcreValue’s model. AcreValue’s AVM is kept as simple and interpretable as possible, so one can understand how and why it predicts the way it does.
AcreValue’s data science team builds valuation models regionally, using publicly available land sales data, because different regions may have different factors that influence land values. For example, the land market in California’s Central Valley is completely different from the market in Illinois or Iowa. To fit a model, the AcreValue team begins by querying our database for all of the recent sales that took place in the region. Next, those sales are filtered to keep only market-value, arm’s-length transactions of agricultural land and to remove everything else (especially residential sales). AcreValue determines whether a sale is agricultural by using the USDA National Agricultural Statistics Service (NASS) Cropland Data Layer, which also powers the land cover history shown on AcreValue reports. The result is a large collection of agricultural, arm’s-length sales in the region of interest.
If any of the sale information looks odd or incorrect, it is reviewed and verified prior to use in the model. The collection of sales is then split into a training set and test set, which is a standard technique in machine learning.
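The filter-then-split steps described above can be sketched in a few lines of Python. This is only an illustration, not AcreValue’s actual pipeline: the record fields (`arms_length`, `land_use`, `price_per_acre`), the sample values, the 25% test fraction, and both helper functions are assumptions made for the example.

```python
import random

# Hypothetical sale records; field names and values are invented for illustration.
sales = [
    {"id": 1, "price_per_acre": 9800.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 2, "price_per_acre": 10200.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 3, "price_per_acre": 85000.0, "arms_length": True, "land_use": "residential"},
    {"id": 4, "price_per_acre": 8700.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 5, "price_per_acre": 11100.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 6, "price_per_acre": 9400.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 7, "price_per_acre": 3000.0, "arms_length": False, "land_use": "agricultural"},
    {"id": 8, "price_per_acre": 10500.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 9, "price_per_acre": 9100.0, "arms_length": True, "land_use": "agricultural"},
    {"id": 10, "price_per_acre": 9900.0, "arms_length": True, "land_use": "agricultural"},
]


def filter_sales(records):
    """Keep only arm's-length sales of agricultural land."""
    return [
        r for r in records
        if r["arms_length"] and r["land_use"] == "agricultural"
    ]


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


clean_sales = filter_sales(sales)
train_set, test_set = split_train_test(clean_sales)
print(len(clean_sales), len(train_set), len(test_set))
```

The key property is that no sale appears in both sets, so the test set stays unseen until the model is evaluated.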
How The Model Is Trained
The first step is to tell the model what it is trying to predict: this is called the “target variable.” In AcreValue’s AVM, it’s the sale price of land in dollars per acre. The next step is to tell the model what “features” or “predictors” it is allowed to use when predicting the sale price. How does AcreValue decide what variables might be predictive of land values? This is the most difficult part of creating an AVM for agricultural land, and it requires a large amount of domain expertise, as well as access to informative datasets. AcreValue does not share the exact details of how the predictors are defined, but they include variables such as soil productivity, location, land cover, and field shape. The model learns by trying to use these variables to best predict the sale prices it observes on the training set sales, and it has to decide how much weight it should put on each predictor. For example, the model might learn that soil productivity has a strong, positive effect on price — the exact relationship is given by a mathematical formula that is learned from the data. (For more information on the various measures of soil productivity in the US, please read our recent blog.)
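To make the idea of “learning a formula” concrete, here is a minimal sketch that fits a straight line relating a single predictor to price using ordinary least squares. AcreValue does not disclose its predictors or model form, so everything here is an assumption: the one-variable linear model, the `soil_index` values, the prices, and the `fit_line` helper are purely illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented training data: a soil productivity index and sale price ($/acre).
soil_index = [50, 60, 70, 80, 90]
price_per_acre = [5000, 6100, 6900, 8100, 8900]

slope, intercept = fit_line(soil_index, price_per_acre)
print(f"price ≈ {slope:.1f} * soil_index + {intercept:.1f}")
```

In this toy data the fitted slope is positive, which is the kind of “strong, positive effect” described above: each additional point of soil productivity is associated with a higher predicted price per acre. A real model would weigh many predictors at once.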
How The Model Is Tested
After the model has been fit to the training set, the next step is to “test” the AVM using the sales that were not used in the training step. The reason for separating the test sales from the training sales is to make sure that the model is capable of accurately predicting prices on land that it has never seen. If AcreValue created a model that perfectly memorized the prices of the training sales, but didn’t do well when predicting on parcels it hadn’t seen before, it wouldn’t be very helpful for its users! Due to the nature of the land market, an agricultural AVM will be less reliable than a residential one — unlike the residential real estate market, there is far less turnover with farmland, and modelers (just like appraisers) are challenged by a paucity of sales in many neighborhoods.
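The testing step can be sketched the same way: apply the fitted formula to held-out sales and summarize how far the predictions land from the actual prices. The article does not say which accuracy metric AcreValue uses, so the median absolute percentage error below is an assumed choice, and the fitted coefficients, the `predict` helper, and the test sales are all hypothetical.

```python
from statistics import median

# Hypothetical coefficients from a previously fitted one-variable model.
SLOPE = 98.0
INTERCEPT = 140.0


def predict(soil_index):
    """Predicted price per acre from the (assumed) fitted line."""
    return SLOPE * soil_index + INTERCEPT


def median_abs_pct_error(actual, predicted):
    """Median of |predicted - actual| / actual across all sales."""
    return median(abs(p - a) / a for a, p in zip(actual, predicted))


# Invented held-out sales that were NOT used to fit the model.
test_soil_index = [60, 75, 85]
test_price_per_acre = [6000.0, 7500.0, 8200.0]

test_predictions = [predict(x) for x in test_soil_index]
test_error = median_abs_pct_error(test_price_per_acre, test_predictions)
print(f"median absolute percentage error on test sales: {test_error:.2%}")
```

If the same metric were much lower on the training sales than on the test sales, that gap would suggest the model had memorized its training data rather than learned a relationship that generalizes.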
We hope that this article helps to demystify both our model and our process. If you missed Part 1, which addressed common questions from farmers and landowners about our value estimates, please check it out. And for additional perspective, be sure to visit our FAQs.
Adrian leads the AcreValue data science team at Granular. Previously, he held roles at Google, The Climate Corporation, and Goldman Sachs. Adrian holds a BA and MS from Stanford University, as well as a PhD from the Toulouse School of Economics.
LeeAnn directs industry marketing for AcreValue. She has held various roles with agricultural asset management, sales, and valuation companies and has written extensively on farm real estate and valuation. LeeAnn holds a BSc in Animal Science from the University of Guelph, and a PhD in Agricultural Economics from the University of Illinois.