Splines Model for Prediction of House Prices

David Boniface, UCL

Aim
To create a web-based facility where customers enter the address of a house and obtain a graph showing the trend in its price since it was last sold, extrapolated to the current date.

UK Land Registry data on house sale prices were available monthly from 2000. Properties were categorised as new-build or not, and as Detached, Semi-detached, Terraced or Flat. Only the model for detached houses was implemented.

[Figure: Frequency distribution of national house prices, 2000-2010. Axes: price (£000); frequency.]

[Figure: National mean house price in quarterly intervals, detached houses 2000-2010. Axes: quarter; mean price (£000).]

The initial plan was to model the prices of houses in the vicinity of the target house in real time and hence estimate its current price. The table below shows the sale prices of the 18 houses nearest to a target house that was last sold in August 2006 for £485k.

18 nearest houses to target house (sold for 485,000 on 18/08/2006, TN16 1RP)

Price (£000)  Date        Distance (miles)  Postcode
500           15/08/2006  0.95              TN16 1SD
630           29/05/2007  0.95              TN16 1SD
350           28/02/2005  0.95              TN16 1SD
385           01/05/2003  0.95              TN16 1SD
365           23/10/2003  0.95              TN16 1SD
202           28/05/2004  0.99              TN16 1RE
465           03/06/2004  0.99              TN16 1RE
350           07/07/2004  1.04              TN16 1TS
330           30/03/2007  1.04              TN16 1TS
415           09/03/2004  1.09              TN16 1PZ
307           30/09/2003  1.09              TN16 1PZ
400           30/06/2006  1.09              TN16 1TU
247.5         04/07/2007  1.22              TN16 1TF
412           17/11/2006  1.26              TN16 1RG
295           10/11/2003  1.26              TN16 1RG
455           31/08/2005  1.44              TN16 1AJ
1020          25/04/2003  1.51              TN16 1SE
430           29/10/2003  1.56              TN16 1SA

Linear regression was used to predict the price at the current date, with date and distance from the target house as the predictors.
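The neighbourhood regression described above can be sketched as follows. The slides do not give the model code, so this is a minimal illustration in Python, not the author's implementation. It assumes ordinary least squares on the raw price in £000, with two predictors: sale date (converted to years since 2000) and distance in miles. The model is fitted to the 18 sales in the table, and the prediction is made at distance 0, meaning the target house itself. The helper names (`years_since_2000`, `design_row`, `solve`, `fit_ols`, `predict`) and the prediction date are hypothetical.

```python
from datetime import date, datetime

# The 18 neighbouring sales from the table: (price in £000, sale date, miles).
SALES = [
    (500, "15/08/2006", 0.95), (630, "29/05/2007", 0.95), (350, "28/02/2005", 0.95),
    (385, "01/05/2003", 0.95), (365, "23/10/2003", 0.95), (202, "28/05/2004", 0.99),
    (465, "03/06/2004", 0.99), (350, "07/07/2004", 1.04), (330, "30/03/2007", 1.04),
    (415, "09/03/2004", 1.09), (307, "30/09/2003", 1.09), (400, "30/06/2006", 1.09),
    (247.5, "04/07/2007", 1.22), (412, "17/11/2006", 1.26), (295, "10/11/2003", 1.26),
    (455, "31/08/2005", 1.44), (1020, "25/04/2003", 1.51), (430, "29/10/2003", 1.56),
]

ORIGIN = date(2000, 1, 1)


def years_since_2000(d: date) -> float:
    """Express a date as approximate years since 01 Jan 2000."""
    return (d - ORIGIN).days / 365.25


def design_row(sale_date: date, miles: float) -> list:
    """One row of the design matrix: intercept, years since 2000, distance."""
    return [1.0, years_since_2000(sale_date), miles]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows):
    """Least-squares fit of price = b0 + b1*years + b2*miles via the normal equations."""
    X = [design_row(datetime.strptime(d, "%d/%m/%Y").date(), miles) for _, d, miles in rows]
    y = [float(price) for price, _, _ in rows]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(xtx, xty)


def predict(coef, on: date, miles: float = 0.0) -> float:
    """Predicted price (£000) on a date at a given distance (0 = the target house)."""
    return sum(c * x for c, x in zip(coef, design_row(on, miles)))


coef = fit_ols(SALES)
print("b0, b1, b2 =", [round(c, 1) for c in coef])
# Hypothetical "current date" chosen only for illustration.
print(f"predicted price at target, 01/01/2008: {predict(coef, date(2008, 1, 1)):.0f}k")
```

Predicting at distance 0 extrapolates below the 0.95 to 1.56 mile range of the neighbours. A single sale (1020k) can also pull the fit strongly. Both are plausible contributors to the variability reported for this approach.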

Predictions compared with known recent sale prices
Problems:
1. Obtaining 50 houses sold in the relevant time period could require including houses a great distance away.
2. Predictions were out by as much as £100k.
3. There was too much variability.

Great Price Crash of 2008-2009
The great price crash began in autumn 2008. This ruled out linear models, and a new strategy was required: model the national price trend and apply it to the last known sale price of the target house. The Stata ado uvrs (with user-specified knots) was used to model the national price curve, and the parameter estimates were saved. Later, to respond in real time to a query about a particular house, splinegen was used to generate the spline curve of mean prices for the required time span up to the current date. This curve was then applied to the target house.

1. Use of coded date
Dates from the Land Registry, held in Excel, are serial day numbers counted from 01 Jan 1900, which is day 1. In Stata, a %td date value is the number of days since 01Jan1960. Hence the current date is converted from Stata to Excel format with the following syntax:

    // Excel serial number for today = Stata %td value + 21916
    replace date = date(c(current_date), "DMY") + 60*365 + 16

c(current_date) is a creturn value that returns the current date as a string. The offset 60*365 + 16 = 21916 is the Excel serial number of 01Jan1960. The extra 16 days cover three things: the 14 leap days between 1900 and 1960, the non-existent 29 Feb 1900 that Excel counts, and Excel numbering 01 Jan 1900 as day 1.

2. Choice of user knots for splines (days since 1900)

    uvrs regress priceln date, knots(37000 38000 39000 39600 40000) noorthog

[Figure: Price trends. national_price and your_house plotted against date (days since 1900, 37000 to 41000), with price on the vertical axis (100,000 to 300,000). Legend entries knot_37000, knot_38000, knot_39000, knot_39600 and knot_40000 mark the knots.]

3. Saving and retrieving the knots

    uvrs regress priceln date, knots(37000 38000 39000 39600 40000)
    // write a do-file that will recreate the knot globals later
    file open myfile using makeglobals.do, write replace
    file write myfile "global knots `e(knots)'" _n
    file write myfile "global bknots `e(bknots)'" _n
    file close myfile

This creates a do-file for later use. The do-file contains commands that create global macros holding the knot values. The following syntax recreates the globals with the required values and regenerates the spline variables:

    do makeglobals
    splinegen date $knots, bknots($bknots)

i.e.

    splinegen date 37000 38000 39000 39600 40000, bknots(36529 40200)
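The date coding and knot values above can be checked with a short sketch. It is written in Python purely for illustration, and its helper names are hypothetical. It assumes the Land Registry dates use Excel's default 1900 date system. It mirrors the Stata offset of 60*365 + 16 and converts the uvrs/splinegen knots (Excel serial numbers) into calendar dates.

```python
from datetime import date, timedelta

# Excel's default (1900) date system numbers 01 Jan 1900 as day 1 and also
# counts a non-existent 29 Feb 1900, so for serials of 61 (01 Mar 1900) and
# above, counting days from 30 Dec 1899 reproduces Excel's numbering.
EXCEL_EPOCH = date(1899, 12, 30)

# Stata %td values count days from 01Jan1960 (day 0).
STATA_EPOCH = date(1960, 1, 1)

# The offset in the replace command: 60*365 + 16 = 21916,
# which is the Excel serial number of 01Jan1960.
STATA_TO_EXCEL_OFFSET = 60 * 365 + 16


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial day number (>= 61) to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def stata_td_to_excel_serial(td: int) -> int:
    """Convert a Stata %td day count to an Excel serial day number."""
    return td + STATA_TO_EXCEL_OFFSET


# Python analogue of: date(c(current_date), "DMY") + 60*365 + 16
today_excel = stata_td_to_excel_serial((date.today() - STATA_EPOCH).days)

# User knots and boundary knots from the uvrs/splinegen commands.
KNOTS = [37000, 38000, 39000, 39600, 40000]
BKNOTS = [36529, 40200]

print("today (Excel serial):", today_excel)
for k in KNOTS + BKNOTS:
    print(k, excel_serial_to_date(k).isoformat())
```

Run as written, the script maps the knots to 19 Apr 2001, 14 Jan 2004, 10 Oct 2006, 01 Jun 2008 and 06 Jul 2009. The boundary knots fall on 04 Jan 2000 and 22 Jan 2010. The knot at 39600 therefore sits a few months before the autumn 2008 crash.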

4. Saving and retrieving the parameter estimates

    estimates save "uvrs3"

This creates a binary file (uvrs3.ster) for later use, containing the coefficients and other estimation results. The following syntax retrieves them and predicts the national curve:

    splinegen date $knots, bknots($bknots)   // rebuild the spline variables for the required dates
    estimates use "uvrs3"
    predict yhatln                           // fitted national curve on the log scale

5. Use of log scale to deal with skewed price distribution

    gen lndelta = 150000                     // offset added before taking logs
    gen priceln = ln((price + lndelta)/100)

The inverse transform is applied before plotting:

    gen national_price = 100*exp(yhatln) - lndelta

This has the effect of scaling up the price rises of more expensive houses, similar to applying a percentage increase.

6. Estimation of prediction intervals
95% confidence intervals were based on the estimated standard errors from the model. These were large, typically £60,000.

7. The 2008/2009 slump in house prices
This caused considerable difficulties for the project because the picture was continually changing. The modelling struggled to keep up with the evolving situation, and the project was abandoned.

Limitations
1. Beyond the range of the data only a linear spline is used, which may not be ideal for prediction.
2. We had insufficient information to account for the price of a house, hence too much unexplained variability.
3. The trial-and-error process for selecting knots is not appropriate; an automatic process is required.

Acknowledgements
Dan Winchester of Labworks, who funded the work
Patrick Royston, MRC Clinical Trials Unit, London, who provided modified versions of uvrs and splinegen
Kristin MacDonald of StataCorp, who helped with globals
