This is an update to the previous post on Stephen Milborrow’s R-language “earth” package, which was updated in September 2020 along with several associated packages. A few things have changed, and it is now fairly easy to get plots for all the basis functions.
Use the data from the previous post, which you can download from GitHub (it hasn’t changed):
https://github.com/wcraytor/MLS_DATA
Read the previous post for more information on the data set. Install and start R (do not use RStudio). Make sure the following packages are installed:
- Formula
- plotmo
- TeachingDemos
- gam
- mgcv
- mda
- MASS
- earth
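A minimal sketch for checking which of the packages listed above are still missing and installing only those. The variable names `needed` and `not_installed` are illustrative, and the install step assumes a CRAN mirror is reachable from your machine:

```r
# Packages named in the list above
needed <- c("Formula", "plotmo", "TeachingDemos", "gam",
            "mgcv", "mda", "MASS", "earth")

# Names from that list that are not found in the local R library
not_installed <- setdiff(needed, rownames(installed.packages()))

# Install only what is missing; runs only in an interactive R session
if (interactive() && length(not_installed) > 0) {
  install.packages(not_installed)
}
```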
Then assuming you have downloaded the data to your folder “c:\data\”, execute:
- library(earth)
- library(plotmo)
- MyData = read.csv("c:/data/MyData.csv", header=TRUE) # use forward slashes; single backslashes are escape characters in R strings
- MyData$Filteredaddress <- NULL # Important! This removes the address column from the input. Spell the name exactly the same, with the same case
- x=data.frame(MyData[,1:(ncol(MyData)-1)])
- y=MyData[,ncol(MyData)]
- b=earth(x,y,nprune=25) # max 25 basis functions
- summary(b, digits=2, style="pmax")
- plotmo(b) # this creates the plot
You should get:
y = # or the Sale Price
6.1e+05 # $610,000 base value
+ 234 * pmax(0, 1887 - SaleAge)
- 455 * pmax(0, SaleAge - 1887)
+ 591 * pmax(0, SaleAge - 2164)
- 435 * pmax(0, SaleAge - 4498)
+ 239 * pmax(0, SaleAge - 5439)
+ 49318 * pmax(0, AreaID - 652)
+ 14475 * pmax(0, 654 - AreaID)
- 66058 * pmax(0, AreaID - 654)
- 120 * pmax(0, 1450 - LivingSqFt) # or -$120/sf from base for GLA under 1,450 sf
+ 148 * pmax(0, LivingSqFt - 1450) # or $148/sf added to base for GLA over 1,450 sf
- 6.9 * pmax(0, 15041 - LotSize) # or -$6.90/sf from base for lot size under 15,041 sf
+ 6.2 * pmax(0, LotSize - 15041) # or $6.20/sf added to base for lot size over 15,041 sf
- 22086 * pmax(0, 2 - Garage) # -$22,086/car from base for under a 2-car garage
+ 85767 * pmax(0, Garage - 2) # $85,767/car added to base for over a 2-car garage
Selected 15 of 16 terms, and 5 of 9 predictors (nprune=25)
Termination condition: Reached nk 21
Importance: SaleAge, LivingSqFt, LotSize, AreaID, Garage, Age-unused, …
Number of terms at each degree of interaction: 1 14 (additive model)
GCV 6.1e+09 RSS 9.4e+12 GRSq 0.82 RSq 0.83
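To see how the hinge terms in this output combine into a price, here is a small sketch that retypes the printed equation as an R function. The function name `predict_price` and the example property are hypothetical, and because the coefficients are the rounded values printed by `summary`, the results are approximate rather than the model's exact predictions (use `predict(b, newdata)` for those):

```r
# Printed model rewritten as a function; coefficients are the rounded
# values from the summary output, so results are approximate.
predict_price <- function(SaleAge, AreaID, LivingSqFt, LotSize, Garage) {
  6.1e+05 +
    234   * pmax(0, 1887 - SaleAge) -
    455   * pmax(0, SaleAge - 1887) +
    591   * pmax(0, SaleAge - 2164) -
    435   * pmax(0, SaleAge - 4498) +
    239   * pmax(0, SaleAge - 5439) +
    49318 * pmax(0, AreaID - 652) +
    14475 * pmax(0, 654 - AreaID) -
    66058 * pmax(0, AreaID - 654) -
    120   * pmax(0, 1450 - LivingSqFt) +
    148   * pmax(0, LivingSqFt - 1450) -
    6.9   * pmax(0, 15041 - LotSize) +
    6.2   * pmax(0, LotSize - 15041) -
    22086 * pmax(0, 2 - Garage) +
    85767 * pmax(0, Garage - 2)
}

# Hypothetical property sitting exactly on the size, lot, and garage knots
base_price <- predict_price(SaleAge = 1887, AreaID = 652,
                            LivingSqFt = 1450, LotSize = 15041, Garage = 2)

# Same property with 100 more square feet of living area
bigger <- predict_price(SaleAge = 1887, AreaID = 652,
                        LivingSqFt = 1550, LotSize = 15041, Garage = 2)

print(bigger - base_price)  # 100 sf * $148/sf = 14800
```

Each `pmax(0, ...)` term is zero on one side of its knot, so only the terms whose knot has been crossed contribute to the total.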
Note 2: Variables like AreaID should probably be treated as categorical variables, and this can be done with earth.
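A rough sketch of what the categorical treatment could look like. It uses a small synthetic data frame (`toy`, with made-up prices) rather than the MLS data, and it relies on earth expanding factor predictors into indicator columns. The fit is skipped if earth is not installed:

```r
# Synthetic stand-in data; the values are invented for illustration only
set.seed(1)
toy <- data.frame(
  AreaID     = sample(c(652, 653, 654), 200, replace = TRUE),
  LivingSqFt = round(runif(200, 800, 3000))
)
toy$SalePrice <- 300000 + 150 * toy$LivingSqFt +
  ifelse(toy$AreaID == 654, 50000, 0) + rnorm(200, sd = 10000)

# Convert the numeric area code to a factor so it is treated as categorical
toy$AreaID <- factor(toy$AreaID)

if (requireNamespace("earth", quietly = TRUE)) {
  fit <- earth::earth(SalePrice ~ AreaID + LivingSqFt, data = toy)
  print(summary(fit, digits = 2, style = "pmax"))
}
```

With the real data, the equivalent step would presumably be `x$AreaID <- factor(x$AreaID)` before calling `earth(x, y, nprune=25)`.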