R earth Package Update (January 2021)

This is an update to the previous post on Stephen Milborrow’s R-language “earth” package, which was updated in September 2020 along with several associated packages. A few things have changed, and it is now fairly easy to get plots for all of the basis functions.

Use the data from the previous post, which you can download from GitHub (it hasn’t changed):

https://github.com/wcraytor/MLS_DATA

Read the previous post for more information on the data set. Install and start R (do not use RStudio). Make sure the following packages are installed:

          • Formula
          • plotmo
          • TeachingDemos
          • gam
          • mgcv
          • mda
          • MASS
          • earth
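
One way to check the list above before loading anything is sketched below. It only reports which of the listed packages are not yet installed. The variable names `required` and `missing_pkgs` are made up for this illustration, and the install line is left commented out so nothing is downloaded unless you choose to.

```r
# Packages named in the list above
required <- c("Formula", "plotmo", "TeachingDemos", "gam",
              "mgcv", "mda", "MASS", "earth")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package is not installed
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

print(missing_pkgs)  # character(0) means everything is already installed

# Uncomment to install whatever is missing:
# install.packages(missing_pkgs)
```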

Then, assuming you have downloaded the data to the folder “c:\data\”, execute:

      library(earth)
      library(plotmo)
      # Use forward slashes in the path: R treats "\d" and "\M" as invalid escape sequences
      MyData <- read.csv("c:/data/MyData.csv", header=TRUE)
      MyData$Filteredaddress <- NULL  # Important!! This removes the address from the input. Spell it exactly the same, with the same case
      x <- data.frame(MyData[, 1:(ncol(MyData)-1)])  # predictors: every column except the last
      y <- MyData[, ncol(MyData)]                    # response: the last column
      b <- earth(x, y, nprune=25)  # keep at most 25 terms (basis functions) after pruning
      summary(b, digits=2, style="pmax")
      plotmo(b)  # this creates the plot

You should get:

y =  # or the Sale Price
      6.1e+05                                       # $610,000 base value
  +     234 * pmax(0,       1887 -    SaleAge)
  -     455 * pmax(0,    SaleAge -       1887)
  +     591 * pmax(0,    SaleAge -       2164)
  -     435 * pmax(0,    SaleAge -       4498)
  +     239 * pmax(0,    SaleAge -       5439)
  +   49318 * pmax(0,     AreaID -        652)
  +   14475 * pmax(0,        654 -     AreaID)
  -   66058 * pmax(0,     AreaID -        654)
  -     120 * pmax(0,       1450 - LivingSqFt)  # or -$120/sf from base for GLA under 1,450 sf
  +     148 * pmax(0, LivingSqFt -       1450)  # or +$148/sf to base for GLA over 1,450 sf
  -     6.9 * pmax(0,      15041 -    LotSize)  # or -$6.90/sf from base for lot size under 15,041 sf
  +     6.2 * pmax(0,    LotSize -      15041)  # or +$6.20/sf to base for lot size over 15,041 sf
  -   22086 * pmax(0,          2 -     Garage)  # or -$22,086/car from base for under a 2-car garage
  +   85767 * pmax(0,     Garage -          2)  # or +$85,767/car to base for over a 2-car garage

Selected 15 of 16 terms, and 5 of 9 predictors (nprune=25)
Termination condition: Reached nk 21
Importance: SaleAge, LivingSqFt, LotSize, AreaID, Garage, Age-unused, …
Number of terms at each degree of interaction: 1 14 (additive model)

GCV 6.1e+09    RSS 9.4e+12    GRSq 0.82    RSq 0.83
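
To read the pmax terms, it can help to evaluate a couple of them by hand. The sketch below copies the rounded LivingSqFt and Garage coefficients from the summary above into two helper functions. The function names are hypothetical, and because the printed coefficients are rounded to two digits for some terms, treat the results as approximate.

```r
# Adjustment to the base value from living area, relative to the 1,450 sf knot
living_sqft_effect <- function(sqft) {
  -120 * pmax(0, 1450 - sqft) + 148 * pmax(0, sqft - 1450)
}

# Adjustment to the base value from garage size, relative to the 2-car knot
garage_effect <- function(cars) {
  -22086 * pmax(0, 2 - cars) + 85767 * pmax(0, cars - 2)
}

living_sqft_effect(c(1200, 1450, 2000))  # -30000, 0, 81400
garage_effect(c(1, 2, 3))                # -22086, 0, 85767
```

Only one hinge of each pair is non-zero for a given value, so a 1,200 sf home is adjusted by 250 sf times -$120, while a 2,000 sf home is adjusted by 550 sf times $148.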

Note 1:   The large value contribution of $85,767/car for 3+ car garages probably arises because garage count is collinear with quality of construction: 3+ car garages are associated with higher-quality homes.  So you should try to keep 2-car and 3-car garages separate in the comps, or figure out a way to deal with condition adjustments.   This may or may not be a problem, depending on the residuals and their relation to actual quality (something you have to inspect visually and decide on).

Note 2:  Variables like AreaID should probably be treated as categorical variables, and this can be done with earth.
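
A minimal sketch of what treating AreaID as categorical could look like, assuming AreaID is read from the CSV as a number. The small `toy` data frame is invented for illustration only. As I understand the earth documentation, factor predictors are expanded into indicator columns before the model is built; base R's model.matrix() is used here only to show what such an expansion looks like, not as earth's own code.

```r
# Hypothetical stand-in for a few rows of the real data
toy <- data.frame(AreaID     = c(652, 654, 654, 660),
                  LivingSqFt = c(1200, 1500, 1800, 2100))

# Convert the numeric area code to a factor so it is treated as categorical
toy$AreaID <- factor(toy$AreaID)

# One indicator column per level (the first level is the baseline)
indicators <- model.matrix(~ AreaID, data = toy)
print(colnames(indicators))

# On the real data, the equivalent step would go before x is built:
# MyData$AreaID <- as.factor(MyData$AreaID)
```

With a factor, each area gets its own level shift instead of hinge functions on the numeric area code, which has no meaningful ordering.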

[Figure: EarthPlot-1]