Method to improve R squared

Date: Aug 15, 2017

When I was building a linear model between biomass and lidar metrics, the R squared I got ranged from 0.48 to 0.70.

At first, I thought the low R squared was caused by errors in how I calculated the lidar metrics. To check this, I recalculated the metrics with both ArcMap and LAStools. The results were not exactly the same but were very similar, so I concluded that the lidar metrics are correct. The maximum and mean tree heights also take values that make sense in reality, which supports the same conclusion.
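The cross-check between the two tools can be made a little more concrete with a small script. The sketch below is only an illustration: the plot values are made up, and the helper names (pearson_r, max_relative_difference) are my own, not part of ArcMap or LAStools. It measures how closely one metric computed by two tools agrees across plots.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def max_relative_difference(a, b):
    """Largest per-plot difference, relative to the larger of the two values."""
    return max(abs(x - y) / max(abs(x), abs(y)) for x, y in zip(a, b))


# Hypothetical mean canopy heights (m) for the same five plots,
# as computed by two different tools. These numbers are invented.
mean_height_tool_a = [10.2, 14.8, 18.1, 21.5, 25.3]
mean_height_tool_b = [10.0, 15.1, 18.0, 21.9, 25.0]

agreement_r = pearson_r(mean_height_tool_a, mean_height_tool_b)
worst_case = max_relative_difference(mean_height_tool_a, mean_height_tool_b)

print(f"correlation between tools: {agreement_r:.4f}")
print(f"largest relative difference: {worst_case:.2%}")
```

A correlation close to 1 together with a small worst-case difference would support treating the two sets of metrics as equivalent. What counts as "small" is a judgment call that depends on the metric.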

Working under the premise that the lidar metrics are correct, I broke the low R squared problem down into four aspects: 1) the choice of predictor variables; 2) the quality of the data; 3) the R squared that can reasonably be expected; 4) model construction.

From reading other people's papers, I learned that metrics based only on height information can be enough. Nelson obtained an R squared as high as 0.71 using only mean height, so my lidar metrics should be sufficient for this analysis. Other studies have used this Heiberg data, including a paper on biomass estimation by Prof. Im that compares the biomass estimation ability of several machine learning algorithms at both the plot and single-tree level. It does not use linear regression, but its results suggest that the quality of this data is not very promising. Among the papers that estimate biomass from lidar, most report an R squared of around 0.70, with at least 0.55 and rarely more than 0.80. Model construction is therefore where I need to put in more effort.

Lu et al. (2012) describe a method that applies a log transformation to both the dependent and predictor variables. I tried it on the model between biomass and maximum height, and the R squared improved from 0.4619 to 0.72.
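To show what the log transformation does, here is a minimal sketch in plain Python. It is not the code I used, and the data are synthetic: biomass is generated from an exact power law of maximum height, with a coefficient and exponent I picked only for illustration. The helpers linear_fit and r_squared are hypothetical names for ordinary least squares with one predictor and the usual 1 - SS_res / SS_tot definition.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Synthetic plots (assumption): biomass = 0.05 * height^2.2, no noise.
max_height = [8.0, 12.0, 15.0, 20.0, 25.0, 30.0]
biomass = [0.05 * h ** 2.2 for h in max_height]

# Model 1: biomass against maximum height on the original scale.
slope, intercept = linear_fit(max_height, biomass)
r2_linear = r_squared(biomass, [intercept + slope * h for h in max_height])

# Model 2: log(biomass) against log(maximum height).
log_height = [math.log(h) for h in max_height]
log_biomass = [math.log(b) for b in biomass]
log_slope, log_intercept = linear_fit(log_height, log_biomass)
r2_loglog = r_squared(
    log_biomass, [log_intercept + log_slope * lh for lh in log_height]
)

print(f"R squared, original scale: {r2_linear:.4f}")
print(f"R squared, log-log scale:  {r2_loglog:.4f}")
print(f"estimated exponent:        {log_slope:.3f}")
```

When the underlying relationship is close to a power law, the log-log model is linear and the straight line fits better. Note that the second R squared is measured in log units, so it is not directly comparable with the first. For a like-for-like comparison, the predictions would have to be back-transformed to the original biomass scale first.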

Lu, D. et al. Aboveground Forest Biomass Estimation with Landsat and LiDAR Data and Uncertainty Analysis of the Estimates. Int. J. For. Res. 2012, 1–16 (2012).