Data Pre-processing Techniques with Machine Learning to Improve Accuracy of Chlorophyll Estimation in Oil Palm Leaves

Authors: Muhdan Syarovy, Iput Pradiko, Rana Farrasati, Winarna, S Rasyid, C Mardiana, Rizki D. P. Pane, Nuzul H. Darlan, Sumaryanto, Suroso Rahutomo, Fahmi Hidayat, Eka Listia

Original publication: IOP Conference Series: Earth and Environmental Science, 2024 · DOI: 10.1088/1755-1315/1308/1/012054

Imagine having a small tool called a chlorophyll meter that can measure the chlorophyll content of a leaf simply by clipping it to the leaf’s surface. A device like the SPAD-502 has long been a staple for researchers and plantation practitioners who need to estimate a plant’s nutritional status quickly and non-destructively. However, did you know that the raw data from this device is not immediately usable? This is where data pre-processing becomes crucial, especially when we want the machine learning model built on that data to be accurate and reliable in the field.

Research conducted by a team from the Indonesian Oil Palm Research Institute (IOPRI) in collaboration with Hasanuddin University specifically investigated how different pre-processing techniques can affect the accuracy of a model in estimating the chlorophyll content of oil palm leaves from portable chlorophyll meter data. Data was collected from three oil palm plantations in Sumatra and Kalimantan, covering various age groups of plants and land conditions. Five pre-processing methods were tested: Savitzky-Golay filter (SG), standard normal variate (SNV), multiplicative scatter correction (MSC), first derivative (FD), and second derivative (SD).
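To make three of these techniques more concrete, here is a minimal sketch of SNV, MSC, and a first derivative, chained as MSC followed by FD. It is not the study's code. It assumes each reading is a row of values across several wavelengths or channels, it uses a plain finite difference (`np.gradient`) rather than a Savitzky-Golay smoothed derivative, and the function names and the synthetic `X_demo` data are hypothetical.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each row (one reading) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative scatter correction against a reference row (default: the mean row)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        # Fit row ≈ intercept + slope * ref, then undo the offset and the gain.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected


def first_derivative(X):
    """First derivative along the channel axis using finite differences."""
    return np.gradient(np.asarray(X, dtype=float), axis=1)


def msc_fd(X):
    """MSC followed by the first derivative (the 'MSC-FD' ordering assumed here)."""
    return first_derivative(msc(X))


# Synthetic demo data (assumption): one underlying curve, distorted per reading
# by a random gain and offset, which is the kind of effect MSC is meant to remove.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, np.pi, 20)) + 1.0
gains = rng.uniform(0.8, 1.2, size=(5, 1))
offsets = rng.uniform(-0.1, 0.1, size=(5, 1))
X_demo = gains * base + offsets

X_snv = snv(X_demo)
X_msc = msc(X_demo)
X_msc_fd = msc_fd(X_demo)
```

In this synthetic case every reading is an exact gain-and-offset copy of the same curve, so MSC collapses all rows onto the mean reading. Real measurements would only be pulled closer together. A second derivative could be sketched by applying `first_derivative` twice.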

The results were quite interesting. The MSC-FD combination gave the best performance with an R² value of 0.834 and an RMSE of 1.396, meaning the model was able to explain more than 83 percent of the variation in leaf chlorophyll content from just the light reflection measured by the chlorophyll meter. In comparison, when the raw data was used directly without pre-processing, the R² value only reached 0.782. This increase in accuracy may seem small on paper, but in practice the difference could mean more targeted fertilization decisions, savings on fertilizer costs, and ultimately better plant productivity.
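For readers unfamiliar with the two metrics, the sketch below shows how R² and RMSE are commonly computed from measured and predicted values. It assumes the usual coefficient-of-determination definition of R², which the article does not spell out, and the numbers in `y_true` and `y_pred` are made up for illustration and are not from the study.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the measured values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical measured vs. predicted chlorophyll values (not data from the study).
y_true = [40, 45, 50, 55, 60]
y_pred = [41, 44, 51, 54, 61]

r2_demo = r_squared(y_true, y_pred)
rmse_demo = rmse(y_true, y_pred)
```

Under this definition, 1 minus R² is the share of variation the model leaves unexplained, while RMSE expresses the typical prediction error in the units of the chlorophyll measurement.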

What does this mean for oil palm farmers? With the right pre-processing technique, a relatively inexpensive and easy-to-use chlorophyll measuring device can become a reliable decision support tool for managing plant nutrition with precision. There’s no need to send leaf samples to a laboratory and wait for days to determine the plant’s nutritional status. Simply clip, read, and let the machine learning model work behind the scenes, and fertilization decisions can be made in a matter of minutes.

There is still room for improvement. This study used data from three locations with a sample size of 108 leaves, and the best model still left about 17 percent of the variation unexplained (1 - 0.834 = 0.166). Adding environmental variables such as temperature, humidity, and light intensity at the time of measurement may further increase accuracy. However, as a first step, this study paves the way for the development of a practical, fast, and affordable oil palm (Elaeis guineensis) nutrient monitoring system based on machine learning.


References:
Syarovy, M., et al. (2024). Pre-processing techniques using a machine learning approach to improve model accuracy in estimating oil palm leaf chlorophyll from portable chlorophyll meter measurement. IOP Conference Series: Earth and Environmental Science, 1308(1), 012054. DOI: 10.1088/1755-1315/1308/1/012054