Telling the Future
By Mark Cooke – Audit Report Issue 54.

Cheap computational power, access to massive amounts of behavioral data, and a flourishing ecosystem of data science tools have created an environment where most types of "predictive" analysis have seen a massive boost in performance. Consider the online shopping cart, for example: out of millions of online shoppers, an algorithm can effectively put a single desired item in front of an individual buyer.

What all of these algorithms have in common, and where they depart from traditional descriptive statistics, is the ability to distill massive amounts of data down to a single predicted event. This is no longer an average of averages; it is a pinpoint prediction for one specific entity: telling the future.

Where will this bear fruit in our industry? Lots of places. It already has, in fact, but the concepts and methods are becoming much more widely known and available. Open data and open source data science technologies have proliferated so much in the last five years that they are now common across industries, from large entities to small businesses.

In government, the concept of open data will continue to drive this innovation. As we learn to share data across platforms, we start to unlock some of that potential. When sharing expands across departments, or even across jurisdictions, things really start to get powerful. Think of the influence of property data on police data, or vice versa. We might even look to utility data and its influence on predicting investment or school performance. The applications are endless, to use a pat phrase, as long as we keep breaking down the historical and traditional barriers between data repositories.
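Breaking down those barriers, in practice, often starts with joining datasets on a shared key. The sketch below merges hypothetical property and utility records on a parcel ID; the field names and values are invented for illustration only.

```python
# Sketch: joining two open datasets on a shared parcel ID.
# All field names and records here are hypothetical.

property_records = {
    "P-1001": {"sq_ft": 1850, "year_built": 1962},
    "P-1002": {"sq_ft": 2400, "year_built": 1998},
}

utility_records = {
    "P-1001": {"avg_kwh_month": 920},
    "P-1002": {"avg_kwh_month": 1310},
}

def join_on_parcel(props, utils):
    """Merge two repositories keyed by parcel ID into one record per parcel."""
    joined = {}
    for parcel_id, attrs in props.items():
        merged = dict(attrs)
        # Utility attributes enrich the property record when the IDs line up.
        merged.update(utils.get(parcel_id, {}))
        joined[parcel_id] = merged
    return joined

combined = join_on_parcel(property_records, utility_records)
```

Once the repositories share a key like this, each department's data becomes one more column in everyone else's analysis.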

Nor is this left solely to individual innovators. We have all seen the creative leaders who have insight into these trends, or can suss out "what's possible" in the midst of everyday work. When it comes to data science, however, we will soon need teams of innovators, stakeholders, data engineers, programmers, GIS specialists, domain experts... the list goes on. The point I am trying to make is that this is a team sport now.

If we want a more concrete example of how all this comes together, consider how it might impact property valuations and, interestingly, the way we describe property sales. Historically, this has been the realm of descriptive statistics and mass valuation. We didn't predict sales; we reacted to them. The industry modeled those sales after the fact, based on house qualities, neighborhood or land maps, and various global characteristics.

But what if we could mathematically review every property individually, without the need for regression, looking specifically at that property's own characteristics? All of them. We can account for a property's square footage, acreage, age, siding, height, road frontage, geographic location, and heating source, just to name a few. How do all of these affect the property's value, and, more interestingly, will that property sell in the next twelve months?
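Before any model can use those characteristics, each property has to become a numeric feature vector. A minimal sketch, with hypothetical attribute names and categories: numeric fields pass through directly, while categorical fields like siding and heating source are one-hot encoded.

```python
# Sketch: turning mixed property characteristics into a numeric vector.
# Attribute names and category lists are hypothetical, for illustration only.

SIDING_TYPES = ["vinyl", "brick", "wood"]
HEAT_TYPES = ["gas", "electric", "oil"]

def to_feature_vector(prop):
    """Numeric fields pass through; categorical fields are one-hot encoded."""
    vec = [prop["sq_ft"], prop["acreage"], prop["age"], prop["road_frontage_ft"]]
    vec += [1.0 if prop["siding"] == s else 0.0 for s in SIDING_TYPES]
    vec += [1.0 if prop["heat"] == h else 0.0 for h in HEAT_TYPES]
    return vec

example = {
    "sq_ft": 1850, "acreage": 0.4, "age": 61, "road_frontage_ft": 85,
    "siding": "vinyl", "heat": "gas",
}
vector = to_feature_vector(example)
```

Every additional characteristic we can describe becomes one more dimension the model can learn from.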

As it turns out, we can do this. Handily. From past sales we can build a model that doesn't just describe the environment, but tells the future, and does so reasonably accurately most of the time. I haven't been able to experiment to any great extent, but what we have looked at so far can tell us whether a specific property will sell in the next twelve months. That model can then be extended to suggest what the sale price will be. Amazing.
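The article doesn't name the modeling technique, so here is one simple way such a classifier could work: k-nearest-neighbors over past sales, voting on whether properties that look like this one sold within twelve months. The features (square footage, age) and all the numbers are invented toy data, not results from any real model.

```python
import math

# Sketch: a k-nearest-neighbors classifier over past sales.
# Each record is ((square footage, age), sold_within_12_months).
# All numbers are invented toy data for illustration.

past_sales = [
    ((1400, 40), 1), ((1500, 35), 1), ((1600, 30), 1),
    ((3200, 5), 0),  ((3000, 8), 0),  ((2800, 10), 0),
]

def predict_will_sell(features, training=past_sales, k=3):
    """Majority vote among the k nearest past sales (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], features))[:k]
    votes = sum(label for _, label in nearest)
    return votes * 2 > k  # True when a majority of neighbors sold

prediction = predict_will_sell((1550, 33))
```

Replacing the 0/1 labels with sale prices and averaging the neighbors instead of voting turns the same idea into the price-suggestion model mentioned above.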

But there is a piece that is even more amazing: we can add more data, non-property data, and get even better results. What kind of data? So far we have worked with geospatial data, essentially proximity to special features such as parks, lakes, or grocery stores. It turns out even this simple addition gives a substantive boost to the model's performance.
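Those proximity features reduce to "distance to the nearest site of each kind," appended to the property's feature vector. A minimal sketch with invented (x, y) coordinates on a flat plane; real work would use projected coordinates or a proper geodesic distance.

```python
import math

# Sketch: "distance to nearest amenity" features for one property.
# Coordinates are invented (x, y) pairs on a flat plane, not real locations.

amenities = {
    "park":    [(2.0, 3.0), (8.0, 1.0)],
    "lake":    [(5.0, 9.0)],
    "grocery": [(1.0, 1.0), (6.0, 4.0)],
}

def proximity_features(location, amenity_map=amenities):
    """Return the distance from `location` to the nearest site of each kind."""
    return {
        kind: min(math.dist(location, site) for site in sites)
        for kind, sites in amenity_map.items()
    }

features = proximity_features((2.0, 1.0))
```

Each new amenity layer is just one more numeric column, which is why these additions slot so easily into an existing model.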

So what if we had even more behavioral data? Data about demographics in the marketplace? Data about social trends, or political boundaries, or who knows what? My gut tells me the model would perform better still. The better we can describe the entities, by adding more data qualities, the better the predictive models get. Will it impact the way we do business? I suppose so, since we aren't the only ones who will have access to this technology.

We can do all of this using the tools and techniques of data science, an open data mentality, and the open source tools available on the market. What it will take is a shift in perspective, and then a slow, gradual cultural shift in the way we do business. But that will come, too. Change is inevitable, and welcome when it is this exciting and profound.