Optimizing Controlled Environmental Agriculture for Strawberry Cultivation Using RL-Informer Model

Lu, Yuze; Gong, Mali; Li, Jing; Ma, Jianshe

doi:10.3390/agronomy13082057

¹ Key Laboratory Photonic Control Technology, Ministry of Education, Tsinghua University, Beijing 100083, China

² International Joint Research Center for Smart Agriculture and Water Security of Yunnan Province, Yunnan Agricultural University, Kunming 650201, China

³ Division of Advanced Manufacturing, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China

* Authors to whom correspondence should be addressed.

Agronomy 2023, 13(8), 2057; https://0-doi-org.brum.beds.ac.uk/10.3390/agronomy13082057

Submission received: 5 July 2023 / Revised: 27 July 2023 / Accepted: 31 July 2023 / Published: 3 August 2023

(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Controlled Environmental Agriculture (CEA) has gained considerable attention in recent years, mainly because it can overcome extreme weather problems and ensure food safety. CEA allows the growth state of a crop to be monitored throughout the crop period; however, optimization of the growing environment is still limited by shortcomings of existing algorithms. In this article, we present a method for optimizing the growing environment that combines the reinforcement learning algorithm Q-learning with the time-series prediction model Informer. To the best of our knowledge, this approach is demonstrated here for the first time. Using Informer, we predicted the growth of strawberries from current environmental variables and plant status. The prediction results served as the reward for Q-learning, guiding it to make optimal modifications to the environment in real time so that the optimal cultivation environment was maintained continuously. Two groups of validation experiments were conducted for different cultivation objectives: “obtaining the most stolons” and “obtaining the highest fruit count”. Compared to the empirically planted groups, the experimental groups using the RL-Informer model achieved yield increases of 17.81% and 20.78%, respectively. These experiments highlight the strong performance of the proposed RL-Informer model in real-time prediction and modification of environmental variables.

Keywords: CEA; environment optimization; strawberry cultivation; multi-source data; RL-Informer

1. Introduction

Controlled Environmental Agriculture (CEA) is currently one of the most effective means of resolving the problems of agricultural land constraints, extreme climatic conditions and out-of-season cultivation [1], and it also offers improved control of the growing environment [2] and better resource usage [3]. As the world population grows rapidly and the food problem becomes increasingly serious, CEA is supported by a growing number of countries [4,5]. CEA enables vertical planting and soilless cultivation, allows specific nutrients to be blended manually, requires less planting space and less water, and effectively avoids pests and diseases [6].

With the rapid development of Internet of Things (IoT) technology [7], changes in CEA environmental variables have become easier to monitor. Through wireless or wired methods, information can be collected by various sensors in a timely manner and transferred to the main control computer. These sensors have fast communication speed, low power consumption, wide distribution and low cost. Data of multiple types from heterogeneous sources, such as temperature, humidity, light quality, $\mathrm{CO}_{2}$ and nutrient solution electrical conductivity (EC value), can be collected and processed [8,9,10]. This has also led to the popularity of computer science in agriculture, gradually replacing traditional manual empirical farming methods. Building on the IoT and computer technology mentioned above, researchers have tried to use the rich monitoring data currently available to predict future crop growth. For example, ref. [11] predicted apple fruit size using environmental indicators based on temperature and photosynthetically active radiation (PAR) levels. Ref. [12] used regression models based on the natural logarithm and a modified Gompertz function to predict apple fruit size and yield at an early stage, based on differences in a biological characteristic of the apple tree itself, the fruiting period. For other types of plants and monitoring indicators, ref. [13] made predictions for commercial stone fruit cultivars based on pollen tube growth and pollen germination percentage; ref. [14] extensively collected weather data with micro-sensors and measured spatial and temporal variations in temperature, relative humidity (RH), soil moisture and PAR irradiance to make predictions about strawberry growth; ref. [15] also mentioned the use of IoT technology to detect $\mathrm{CO}_{2}$ and nutrient solution EC to help farmers predict yields. The wide variety of monitoring indicators and prediction algorithms shows the positive attitude of researchers towards crop prediction. Traditional algorithms have been widely used, including correlation coefficients [16,17], regression methods [18] and principal component analysis (PCA) [19]. However, correlation coefficient methods often fall short in accurately capturing nonlinear and nonmonotonic relationships among multiple variables. Regression methods struggle to handle multicollinearity between several variables, potentially resulting in overfitting, especially with high-dimensional data. Additionally, PCA methods may fail to deliver reliable results when sample quality is poor or when significant noise exists in the data.

Additionally, the complexity of the algorithm determines the potential for overfitting or underfitting in prediction results, and the adjustability of these algorithms may be limited. Moreover, these algorithms frequently require manual selection and adjustment of model parameters, relying on the expertise and experience of domain experts. Integrating and processing heterogeneous data can pose challenges for traditional algorithms, leading to issues such as data consistency, pre-processing, and feature selection. Furthermore, traditional algorithms primarily rely on offline modeling and predictions, lacking the capability to dynamically adapt and provide real-time feedback in dynamic environments. More importantly, traditional algorithms lack “memory” and encounter difficulties in considering timing issues in plant growth.

Therefore, deep learning prediction methods have developed rapidly in recent years. Ref. [20] used the YOLOv3 network to detect the number of growing mushrooms based on machine vision technology in order to predict the growth rate and final yield of mushrooms in advance. Ref. [21] predicted the yield of lettuce crops grown in three different hydroponic systems with reference data for leaf number, water consumption, dry weight, stem length and stem thickness. Moreover, ref. [22] discussed the feasibility of machine learning methods to predict nutrient uptake in hydroponic systems for plant growth. In soil-based cultivation, soil physicochemical data and plant age were used in [23] to predict the RGB mean value in strawberry leaves and, based on machine learning, to provide further feedback on the health of strawberry plants. For multi-source heterogeneous data, ref. [24] successfully predicted rice yield two months before maturity using Random Forest (RF) and Long Short-Term Memory Network (LSTM) models in combination with contiguous solar-induced chlorophyll fluorescence (SIF) and enhanced vegetation index (EVI).

Nevertheless, deep-learning-based algorithms face obvious challenges as well. According to [25], nearly 80% of the existing network models for deep learning prediction tasks in a CEA environment were CNN-based, while RNN-based networks and their derivatives accounted for only about 10%. It must be acknowledged that RNN and time-series models continued to perform well in crop growth prediction tasks, and [26,27,28,29] achieved excellent results using environmental time-series variables. Additionally, CNN-based networks face limitations in processing time-series data directly due to their inability to model the temporal dimension. This constraint may hinder the performance of CNN models when dealing with medium- and long-term time-series prediction problems [30]. It is also worth noting that the image data used in CNN models represent environmental variables that have already influenced and manifested in the plant. In addition, RNN-based studies focus solely on predicting results from current variables and do not offer feedback for modifying these variables.

Based on the aforementioned research status and challenges, this article presents a new network model, RL-Informer, that leverages multiple heterogeneous sources of information to make predictions with timely feedback capability. RL-Informer predicts the future growth of a specific plant (in this study, strawberries) by considering the current environmental variables and the present characteristics of the plant. Moreover, it offers prompt feedback on modifications to environmental variables to cater to diverse cultivation objectives. The proposed RL-Informer is mainly based on the Informer network [31], which is itself an improved network based on Transformer [32]. Transformer is known for its ability to capture long-term temporal dependencies in input data, efficiently train large models, reduce noise in small data, and handle high-dimensional data, and it has been widely used in plant prediction research in recent years [33]. Moreover, its self-attention mechanism is more robust than RNN-based models in processing time-series data [34]. Compared with the Transformer, Informer has a self-attention distillation mechanism that can significantly reduce redundant parameters in the model and the computational cost. In addition, a parallel generative decoder outputs all prediction results for a long time-series in one forward calculation, which greatly improves the inference speed of long-series prediction. Reinforcement learning [35] is then incorporated into the model to deliver timely input for modifying the environmental variables, and the agent learns via “trial and error” so that the predicted cumulative reward increases over time. First, the reinforcement learning algorithm makes various modification assumptions about the current environment, and Informer makes different predictions about the future growth state of plants based on the assumed environmental data. The reinforcement learning algorithm then uses the predicted growth state as the “Reward” to deduce which assumed environment leads to the best future growth state. Consequently, that assumed environmental condition becomes the new environment for the future period. RL-Informer thereby completes the full prediction and feedback process.

2. Materials and Methods

2.1. Experimental Subject and Parameter

A strawberry (Fragaria × ananassa Duch. “Zhang Ji”) plant sample was chosen as the experimental subject in this study. The reasons are as follows: (1) different varieties of strawberry have increasingly become the focus of planting in the CEA environment [36] as the quality of people’s lives improves; (2) strawberry has multiple growth stages [37], and the environmental variables required in different periods vary, so feedback from the network model is necessary; (3) strawberry is grown for a variety of purposes, such as cultivating stolons for reproduction [38] and harvesting high-quality fruit, so the environmental variables required for each planting purpose differ as well.

2.1.1. Acquired Experimental Data

In this section, the data involved in the training were divided into low-frequency and high-frequency groups; or, according to the nature of the data, into environmental and plant variables. It is important to note that the plant parameters were the dependent variables of the environmental variables, but they were also the independent variables of the plant variables at the next observation moment. The data in the study were all time-series data. The environmental variables collected in the experiment are shown in Figure 1.

High frequency data:
- Environmental Temperature: Ref. [39] reported that the best day/night temperature was 25/12 °C for leaf and stolon, 18/12 °C for root and fruit, and 25/12 °C for the total plant, and [40] mentioned that strawberry samples remained vegetative above 24 °C. Therefore, the temperature range in this experiment was set at 30/12 °C, which was the normal temperature range for strawberry growth. Note that LED lighting was used in CEA: during the “daytime” period when the light source was on, the environmental temperature rose due to LED heat dissipation [41], causing temperature fluctuations around the plants, so the temperature could only be kept within an acceptable range. Environmental temperature data were collected in the following form:

  $T_{t} = (T_{1}, T_{2}, T_{3}, \dots, T_{n}), \quad n \in \{1, 2, 3, \dots, N\}$

  (1)

  where $T_{n}$ is the temperature at the n-th collection time, with a 10-min interval between collections. The temperature was monitored by a temperature detection system (Hualixin, TK6071iQ, Shenzhen, China) with Bluetooth and an infrared remote control module, which could remotely control various brands of air conditioners separately. To keep the temperature within the optimal range, a central air conditioner (Midea KFR-72T2W/B3DN1-XG(1), Guangzhou, China) and multiple wall-mounted air conditioners (Midea KFR-72LW/N8MZB1, Guangzhou, China) were used to control the temperature of each group separately, thus maintaining the proper temperature difference between day and night.
- Lighting Condition: Lighting is one of the most important conditions for CEA-cultivated crops. In the experiments, it was expressed as Photosynthetic Photon Flux Density (PPFD), and the optimal light intensity and period varied depending on the strawberry variety, planting purpose and growth stage. According to [42], maximum light intensities of 200, 250, 300 and 350 $\mu\mathrm{mol} / (\mathrm{m}^{2} \cdot \mathrm{s})$ were selected. Although [43,44] confirmed that strawberry samples grew actively with increasing light intensity up to 450 $\mu\mathrm{mol} / (\mathrm{m}^{2} \cdot \mathrm{s})$, the heat dissipation from an LED operating at high power would have a large impact on environmental temperature. Changes in light were recorded with a spectrometer (StellarNet, BLK-CXR-SR, Tampa, FL, USA). The relative intensities of the light components were calculated by Equation (2) and are shown in Figure 2.

  $\mathrm{rel}(L) = \frac{L_{n}}{\max (L_{n})}$

  (2)

  where $\mathrm{rel}(L)$ is the relative intensity of each band of the spectrum and $L_{n}$ is the real intensity of each band of the spectrum, obtained by the spectrometer.
- Nutrient Solution EC Value: According to [45], Yamazaki nutrient solution was used in the hydroponic process, but the concentration of the nutrient solution provided was not the same. Electrical conductivity meters (Shangtai, EC-4110-I, Dongguan, China) were used to measure the EC value of the nutrient solution each time it was replaced; the solution was measured and replenished every 3 to 5 days so that the EC fluctuated within a proper range. We avoided strictly controlling the nutrient EC value, as a larger range of variation and more complex trends enriched our dataset and better represented reality. This approach helped to prevent model overfitting and enhanced model robustness.
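
The normalization in Equation (2) can be sketched as follows. This is only an illustration: the function name `relative_intensity` and the example readings are hypothetical and are not measured data from the experiment.

```python
def relative_intensity(band_intensities):
    """Scale each spectral band by the strongest band, as in Equation (2)."""
    peak = max(band_intensities)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [value / peak for value in band_intensities]


# Hypothetical spectrometer readings in arbitrary units (not measured data).
example_spectrum = [120.0, 480.0, 960.0, 240.0]
relative = relative_intensity(example_spectrum)
```

After normalization, the strongest band has a relative intensity of 1 and all other bands fall between 0 and 1, which is the form plotted in Figure 2.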

Figure 2. Relative intensity of illumination light components.

Low frequency data:

Plant Height, Leaf Size and Count of Leaf: Plant height is not only a good measure of strawberry growth status but also easy to measure. A vernier caliper was used to measure the distance from just above the root to the highest petiole as the plant height. In addition, the size and count of the leaves determine the photosynthetic rate of the strawberry plant, which directly determines new bud differentiation and fruit yield. However, strawberry leaves are ternately compound [46] and irregularly shaped, so the total length and width of each compound leaf were used as the leaf size data. These data were collected every 3 to 5 days; the data definitions are shown in Figure 3.

Strawberry samples from different groups and the growing conditions are shown in Figure 4. Figure 4A shows randomly selected strawberry samples from different groups taken on the 7th day after transplanting. Figure 4B shows how the strawberries were grown with the LED off and on, respectively.

2.1.2. Prediction Targets

Given that the selected variety for this study was a day-neutral strawberry cultivar, it possesses the ability to flower and fruit continuously under favorable environmental conditions. With respect to the planting objectives, two specific prediction targets were established: the cumulative count of stolons and the cumulative count of fruits. To ensure uninterrupted growth of both stolons and fruits, the removal of stolons occurred upon the development of three compound leaves or when the fruits reached maturity. Due to their gradual progression, data collection intervals of 3 to 5 days were employed.

According to the environmental variables, the grouped experiments in this study are presented in Table 1. The combination of different experimental conditions in groups is not strictly aligned, which could increase the diversity of the data. At the initiation of the experiment, seedlings derived from stolons of the parent plants, which had been individually cultivated for a period of 3 months and had at least developed 3 compound leaves, were carefully chosen. Each group of samples was 30 plants.

2.2. Informer Enhanced with Reinforcement Learning

2.2.1. Time-Series Prediction Informer Network

The Informer was applied to predict future growth from the current environmental variables in this task. Depending on the planting purpose, the prediction targets were set as the cumulative count of stolons and the cumulative count of fruits. Its predicted results were then used as rewards in the subsequent reinforcement learning. Building on Transformer, Informer introduces several important changes that improve prediction capability and save computing costs; the three most relevant to this task are described below.

In the classical Transformer network, the introduction of the multi-head attention mechanism leads to the presence of numerous redundant parameters during the network operation. This implied that most attention heads had a relatively weak impact, while only a few point pairs significantly influenced the primary attention mechanism. To quantify the sparsity of a query, the Kullback–Leibler divergence [47] was employed to calculate the relative entropy between the attention probability distribution of a query and a uniformly distributed probability distribution. Subsequently, the insignificantly varying queries, referred to as “Lazy” Queries, were eliminated, resulting in the ProbSparse attention mechanism as follows:

$A (Q, K, V) = \mathrm{softmax} \left( \frac{\bar{Q} K^{\top}}{\sqrt{d}} \right) V$

(3)

where $\bar{Q}$ is a sparse matrix of the same size as $Q$ that contains only the top-u Queries under the sparsity measurement $M (q, K)$. Through the above steps, the “Lazy” Queries are eliminated and the “Active” ones are retained; thus, the space complexity is reduced from $O (L^{2})$ to $O (L \ln L)$.
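
To make the query selection concrete, the following is a minimal single-head sketch of ProbSparse-style attention. It rests on several assumptions that are not stated in this article: it uses the max-minus-mean approximation of the sparsity measurement described in the Informer paper rather than the full KL-based form, it skips key sampling, masking and multiple heads, and the function name and the `factor` constant are hypothetical.

```python
import numpy as np


def probsparse_attention(Q, K, V, factor=5):
    """Simplified single-head ProbSparse attention (no key sampling, no mask)."""
    L_Q, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)  # shape (L_Q, L_K)

    # Sparsity measurement (max-minus-mean form): large values mark "Active" queries.
    sparsity = scores.max(axis=1) - scores.mean(axis=1)

    # Keep u = factor * ceil(ln L_Q) queries, capped at L_Q (assumed rule).
    u = min(L_Q, max(1, int(factor * np.ceil(np.log(L_Q)))))
    top = np.argsort(sparsity)[-u:]

    # "Lazy" queries fall back to the mean of V.
    out = np.tile(V.mean(axis=0), (L_Q, 1))

    # "Active" queries get ordinary softmax attention over all keys.
    active = scores[top]
    weights = np.exp(active - active.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out[top] = weights @ V
    return out, top
```

Only the selected rows require a softmax over the keys, which is where the saving relative to full attention comes from in this simplified view.
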
To address the challenge of predicting long sequences, a generative style inference method is adopted in conjunction with a standard decoder structure. This approach aims to mitigate the decrease in prediction speed associated with long-term predictions. The decoder is provided with the following vectors:

$X_{de}^{t} = \mathrm{Concat} (X_{token}^{t}, X_{0}^{t}) \in \mathbb{R}^{(L_{token} + L_{y}) \times d_{model}}$

(4)

where $X_{token}^{t} \in \mathbb{R}^{L_{token} \times d_{model}}$ is the known guiding sequence and $X_{0}^{t} \in \mathbb{R}^{L_{y} \times d_{model}}$ is the sequence to be predicted. This decoder style can output all $X_{0}^{t}$ at once based on the known guiding sequence. This improvement allowed for more consistent prediction results and allowed for more intuitive access to the changes in strawberry plants over time in the future.
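
The concatenation in Equation (4) can be illustrated with a short sketch. It operates on raw feature rows rather than on $d_{model}$-dimensional embeddings, and the function name `build_decoder_input` and its arguments are hypothetical.

```python
import numpy as np


def build_decoder_input(history, token_len, pred_len):
    """Concatenate the last token_len known steps with a zero placeholder block."""
    x_token = history[-token_len:]                    # known guiding sequence
    x_zero = np.zeros((pred_len, history.shape[1]))   # slots to be predicted
    return np.concatenate([x_token, x_zero], axis=0)
```

The decoder then fills all placeholder rows in a single forward pass instead of generating them one step at a time.
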
In the classical Transformer model, the Positional Embedding mechanism was introduced to capture the temporal order of the data. However, when dealing with the prediction of strawberry growth, where the biological characteristics of strawberries varied under diurnal conditions, the Global Time Stamp mechanism in the Informer model was better suited for handling such time-dependent temporal series. The Global Time Stamp mechanism involved attaching a timestamp to each input data point during training. This mechanism allowed the model to incorporate comprehensive time information into the sequence data, thereby enhancing its modeling capacity and improving prediction accuracy for time-series tasks. By utilizing the Global Time Stamp mechanism, the model could better understand and capture the time-dependent patterns and dynamics of strawberry growth, leading to improved performance in the prediction task. The Global Time Stamp mechanism is illustrated in Figure 5.
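
As a rough illustration of attaching time information to each 10-min sample, the sketch below builds one feature tuple per sample. The choice of hour and minute features, their scaling to [-0.5, 0.5], the integer stage codes and the function name are all assumptions made for this example; the article only states that a timestamp was attached to each input data point and that growth periods were labeled manually.

```python
from datetime import datetime, timedelta

# Hypothetical integer codes for the manually labeled growth periods.
STAGES = {"growth": 0, "flowering": 1, "fruiting": 2}


def time_stamp_features(start, n_steps, stage, minutes_per_step=10):
    """Return one (hour, minute, stage) feature tuple per sample."""
    features = []
    for i in range(n_steps):
        t = start + timedelta(minutes=minutes_per_step * i)
        hour = t.hour / 23 - 0.5       # scaled to [-0.5, 0.5]
        minute = t.minute / 59 - 0.5   # scaled to [-0.5, 0.5]
        features.append((hour, minute, STAGES[stage]))
    return features
```

Features of this kind let a model distinguish “daytime” and “night” samples that would otherwise look identical in position alone.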

2.2.2. Q-Learning Feedback Models

The purpose of reinforcement learning in this task is to provide feedback on the current environment variables modifications based on the results of Informer predictions. Q-learning is one of the most commonly utilized model-free algorithms in reinforcement learning, which utilizes a Markovian decision process to make a certain behavior in each state so that the agent ends up with the maximum reward [48,49]. The actions (modification of environment variables in next steps) were made based on each current state (current environment variables) so that the intelligence agent (strawberry plant) could move towards a greater reward (most count of stolons or fruits) in the next state (new environment variables), and the execution was repeated until the final maximum reward was obtained. The rewards here were the result predicted by Informer. The future state update process of Q-learning is shown in Equation (5):

$Q^{new} (s_{t}, a_{t}) \leftarrow (1 - \alpha) Q (s_{t}, a_{t}) + \alpha \left[ R (s_{t}, a_{t}) + \gamma \max_{a \in A} Q (s_{t + 1}, a) \right]$

(5)

where $s_{t}$ is the current state and $a_{t}$ is the current action of the Q-learning; $Q (s_{t}, a_{t})$ is the Q-value of $s_{t}$ and $a_{t}$; $Q^{new} (s_{t}, a_{t})$ is the updated Q-value of $s_{t}$ and $a_{t}$; and $(1 - \alpha) Q (s_{t}, a_{t})$ is the current Q-value weighted by the learning rate $\alpha$. Moreover, $R (s_{t}, a_{t})$ is the reward if action $a_{t}$ is taken in state $s_{t}$; $\gamma$ is the discount factor; and $\max_{a \in A} Q (s_{t + 1}, a)$ is the estimate of the optimal future value. The initial Q-values of the proposed technique are zero. Algorithm 1 presents the update of the environment variables for Q-learning in each state of strawberry:

Algorithm 1 Q-learning agent training process.

1: Initialization: $Q = 0$, $\epsilon = 1$;
2: Select $a_{t}$ based on Equation (5) and $a_{t} = \arg\max_{a} Q (s_{t}, a)$;
3: Observe $s_{t + 1}$ and $R$ based on $a_{t}$;
4: Calculate $Q^{new} (s_{t}, a_{t})$ based on Equation (5);
5: $s_{t} \to s_{t + 1}$;
6: $\epsilon \leftarrow \epsilon - \frac{1}{n}$;
7: Repeat: Steps (2)–(5);
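
The steps of Algorithm 1 can be sketched as a small tabular Q-learning loop. This is a toy under stated assumptions, not the authors' implementation: the environment `toy_step`, its five temperature levels and its reward are invented stand-ins for the Informer prediction, $n$ in Step 6 is assumed to be the number of episodes, and the decay of $\epsilon$ is applied once per episode.

```python
import random


def q_learning(n_states, n_actions, step, episodes=200, horizon=20,
               alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]  # Step 1: Q = 0
    epsilon = 1.0                                     # Step 1: epsilon = 1
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Step 2: epsilon-greedy selection of a_t.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            # Step 3: observe s_{t+1} and R.
            s_next, r = step(s, a)
            # Step 4: update rule of Equation (5).
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_next]))
            # Step 5: move to the next state.
            s = s_next
        # Step 6: decay epsilon (assumed: n = number of episodes).
        epsilon = max(0.0, epsilon - 1.0 / episodes)
    return Q


def toy_step(state, action):
    """Invented environment: 5 levels, actions 0/1/2 = lower/keep/raise."""
    next_state = min(4, max(0, state + action - 1))
    reward = -abs(next_state - 3)  # invented reward, best at level 3
    return next_state, reward


Q_table = q_learning(n_states=5, n_actions=3, step=toy_step)
```

In RL-Informer, the call to `step` would be replaced by the Informer forecast for the assumed environment, as described in the text.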

At each time step, the agent selected the action to be performed based on the current Q function and the $\epsilon$-greedy policy. The $\epsilon$-greedy policy randomly selected an action with a probability of $\epsilon$ and chose the action with the highest current Q value with a probability of $1 - \epsilon$, striking a balance between exploration and exploitation. In Q-learning, states, actions and rewards should be clearly defined; the definitions used here are given in Table 2. An immediate reward was required at each state update of Q-learning. According to the planting purpose, the change in the total count of stolons (fruits) over the predicted time period in each group of strawberry plants was used as the reward in this experiment. The immediate reward was defined as Equation (6):

$R (s_{t}, a_{t}) = \begin{cases} N_{stolon, t + 1} - N_{stolon, t} \\ N_{fruit, t + 1} - N_{fruit, t} \end{cases}$

(6)

where $N_{t}$ is the count of stolons (fruits) at state $s_{t}$ and $N_{t + 1}$ is the count of stolons (fruits) after $a_{t}$, that is, at the state $s_{t + 1}$ predicted by Informer.
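
Equation (6) amounts to a difference of cumulative counts for the chosen objective. A minimal sketch, with a hypothetical function name and invented example counts:

```python
def immediate_reward(counts, objective):
    """Equation (6): predicted next count minus current count for one objective."""
    if objective not in ("stolon", "fruit"):
        raise ValueError("objective must be 'stolon' or 'fruit'")
    current, predicted_next = counts[objective]
    return predicted_next - current


# Invented (current, Informer-predicted next) cumulative counts.
example_counts = {"stolon": (118, 131), "fruit": (240, 264)}
```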

2.2.3. Data Loading and Training of RL-Informer

The collected data of environment variables were used to train the Informer model, which provided rewards for Q-learning. In the experiment, data were recorded for 200 days. Wilted leaves, stolons and damaged fruits were removed promptly to keep the strawberry plants growing healthily. To make all the data the same length, the low-frequency data were filled backward to the high-frequency length, giving a data length of $200 \times 144 = 28{,}800$. The training and test sets were divided in a 7:3 ratio. The key hyper-parameters of the training process are listed in Table 3. The training, prediction and feedback process of the network is shown in Figure 6, where Q-learning provided feedback on the optimal actions based on the current environment (state). The model was run on a personal computer with Microsoft Windows 10, an Intel i7-10750H CPU (Rio Rancho, NM, USA) and an NVIDIA GeForce RTX 2070 (Zhengzhou, China).
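
The length alignment and the 7:3 division can be sketched as follows. Two points are assumptions of this example rather than statements from the article: “filled backward” is read as giving every 10-min slot the value of the next available low-frequency observation, and the split is taken chronologically. The function names are hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60 // 10        # 144 samples per day at a 10-min interval
TOTAL_SAMPLES = 200 * SAMPLES_PER_DAY  # 28,800 samples over 200 days


def backfill_to_grid(observations, grid_length):
    """Fill each grid slot with the next observation at or after that slot."""
    filled = [None] * grid_length
    next_value = None
    for i in range(grid_length - 1, -1, -1):
        if i in observations:
            next_value = observations[i]
        filled[i] = next_value
    return filled


def chronological_split(series, train_parts=7, total_parts=10):
    """Split a series into leading train and trailing test segments."""
    cut = len(series) * train_parts // total_parts
    return series[:cut], series[cut:]
```

Slots after the last observation stay `None` under this reading, so in practice the final measurement day would bound the usable series.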

The prediction process of the model was similar to the training and testing process. Out-of-sample data were fed into the RL-Informer model, and these existing data contained all historical environmental variables and plant parameters up to the present time. Q-learning first made assumptions about future environmental data based on the existing data, i.e., assumed future actions. Then, based on the future environmental data, the Informer part predicted the corresponding future plant parameters and number of stolons (fruits), with the largest number as the maximum reward. Each action and its reward corresponded to each other, allowing Q-learning to explore the maximum reward value and return the action for each step. The returned action was the feedback for modifying the current environment variables, and RL-Informer realized the real-time feedback function of the environment.
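
A single decision step of this process can be sketched as enumerating assumed actions and keeping the one with the largest predicted gain. The sketch is a one-step greedy simplification, not the full Q-learning procedure. The step sizes follow the minimum variation scales reported in Section 3.2.1; everything else is hypothetical, including the variable names, the rule that each variable moves by at most one step, and `toy_predict_gain`, which stands in for the Informer forecast with invented target values.

```python
from itertools import product

# Minimum variation per variable: 1 °C, 10 µmol/(m²·s), 50 µS/cm.
STEP = {"temperature_c": 1, "ppfd": 10, "ec_us_cm": 50}


def candidate_actions():
    """All combinations of lowering, keeping or raising each variable by one step."""
    names = list(STEP)
    return [{name: d * STEP[name] for name, d in zip(names, deltas)}
            for deltas in product((-1, 0, 1), repeat=len(names))]


def best_action(state, predict_gain):
    """Return the assumed action with the largest predicted gain."""
    return max(candidate_actions(), key=lambda action: predict_gain(state, action))


def toy_predict_gain(state, action):
    """Placeholder for the Informer forecast; the target values are invented."""
    target = {"temperature_c": 22, "ppfd": 300, "ec_us_cm": 1000}
    return -sum(abs(state[k] + action[k] - target[k]) / STEP[k] for k in STEP)


current_state = {"temperature_c": 24, "ppfd": 290, "ec_us_cm": 1000}
chosen = best_action(current_state, toy_predict_gain)
```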

The strawberry cultivation process in this study was divided into two parts: the data collection section and the validation section. In Section 1, various data collection of strawberry plants and the environment were conducted through group experiments; in Section 2, the results of strawberry samples grown in general and those grown with the guidance of the RL-Informer model were compared and analyzed. The experimental validation is stated in detail in Section 3.

3. Results and Discussion

3.1. Evaluation of Strawberry Growth Prediction

In this research task, the Informer part was trained firstly to ensure that the Informer network part could make accurate predictions of the predicted targets using the current environmental variables and crop biological parameters. The parameters involved in the training were: environment temperature, intensity of illumination light, EC value of nutrient solution; and biological parameters: plant height, leaf size and count of leaves. Among them, the biological parameters were influenced by the environmental variables and, likewise, they directly determined the growth trend of the plants in the short term future. Considering that strawberry samples required different optimal conditions in different growth periods [50], the time-series data were manually labeled with growth, flowering and fruiting periods to match the Global Time Stamp mechanism. The predicted targets were the count of stolons and the count of fruits. The network was trained with 200 days of 10 min data, and the training loss and test loss during the training process are illustrated in Figure 7.

The hyperparameters for training the model were defined according to Table 3. It should be noted that in the Informer part, the ratio of the guiding sequence to the predicted sequence was not strictly required. A larger ratio of the former to the latter tended to yield more accurate prediction results, but it could also lead to issues such as high prediction computational pressure, low prediction efficiency, and overfitting. After careful consideration and numerous attempts, we set the guiding sequence to 360 (2.5 days) and the predicted sequence to 72 (0.5 days), which resulted in the best training results.
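
The window lengths above can be checked and turned into training pairs with a short sketch. The stride of 72 steps and the function name are assumptions of this example; the article does not state how consecutive windows overlap.

```python
MINUTES_PER_STEP = 10
GUIDE_DAYS = 360 * MINUTES_PER_STEP / (60 * 24)  # guiding sequence length in days
PRED_DAYS = 72 * MINUTES_PER_STEP / (60 * 24)    # predicted sequence length in days


def sliding_windows(series, guide_len=360, pred_len=72, stride=72):
    """Return (guiding, target) pairs cut from one time series."""
    windows = []
    last_start = len(series) - guide_len - pred_len
    for start in range(0, last_start + 1, stride):
        guide = series[start:start + guide_len]
        target = series[start + guide_len:start + guide_len + pred_len]
        windows.append((guide, target))
    return windows
```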

After 40 epochs of training, the train loss and test loss both stabilized, and there were no drastic changes observed, which could be regarded as the convergence of the results and the end of training.

To demonstrate the advantage of Informer in this task, the classical LSTM and Transformer networks were used as baseline models for comparison. The prediction performance of the three network models for the mean total count of stolons and count of fruits within multiple groups of strawberry plants is shown in Figure 8, where the ground truth is the true value of the data we collected.

Two evaluation metrics, $R^{2}$ (a measure of the correlation between ground truth and predicted values) and $SEP$ (percent standard error of the prediction), were used to measure the performance of the three prediction models. The evaluation metrics are defined as

$R^{2} = 1 - \frac{SSE}{SSTO} = 1 - \frac{\sum_{i = 1}^{n} (X_{truth, i} - X_{pred, i})^{2}}{\sum_{i = 1}^{n} (X_{truth, i} - \bar{X}_{truth})^{2}}$

(7)

$SEP (\%) = \frac{100}{\bar{X}_{truth}} \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (X_{truth, i} - X_{pred, i})^{2}}$

(8)

where $X_{truth, i}$ is the ground truth collected from strawberry samples, $X_{pred, i}$ is the predicted result from each model, $\bar{X}_{truth}$ is the mean value of the ground truth and $n$ is the number of samples. The evaluation results are listed in Table 4.
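
Equations (7) and (8) translate directly into code. The sketch below uses plain Python lists; the function names are hypothetical.

```python
import math


def r_squared(truth, pred):
    """Equation (7): 1 - SSE / SSTO."""
    mean_truth = sum(truth) / len(truth)
    sse = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ssto = sum((t - mean_truth) ** 2 for t in truth)
    return 1 - sse / ssto


def sep_percent(truth, pred):
    """Equation (8): root-mean-square error as a percentage of the mean truth."""
    mean_truth = sum(truth) / len(truth)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))
    return 100 / mean_truth * rmse
```

A perfect prediction gives $R^{2} = 1$ and $SEP = 0\%$; larger errors lower $R^{2}$ and raise $SEP$.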

In the comparative experiments of the three prediction models, Informer showed the best overall evaluation, followed by Transformer and LSTM. The simple network structure of LSTM led to low prediction accuracy; Transformer had a more complicated network structure than LSTM, but it required more training data. Moreover, both baseline networks were clearly inferior to Informer in long-term prediction.

3.2. Experiment of RL-Informer Feedback

3.2.1. Feedback on Cultivation Environment

The prediction part of RL-Informer provided rewards for the subsequent reinforcement learning. Because the strawberry samples selected for the experiments were of a day-neutral variety, they could grow, flower and fruit continuously in a suitable environment, which meant that the immediate reward after each action was readily available through the prediction network.

For a more detailed demonstration of the Q-learning workflow, its environmental feedback (Actions) and the corresponding short-term predicted results (Rewards) are shown in Figure 9.

It should be noted that in the experiments, the minimum variation scale was 1 °C for temperature, 10 $\mu mol/(m^{2} \cdot s)$ for light intensity, and 50 $\mu S/cm$ for the EC value. After Q-learning proposed each action, Informer predicted how much the count of stolons/fruits would increase over the next several days if the action were performed, and this predicted increase was taken as the reward. Figure 9 shows that in the task of obtaining the maximum count of fruits, Q-learning proposed five actions. The action with the largest predicted fruit increase was used as the final execution strategy: in Figure 9A, action 2 was selected as the final strategy on day 88, with a reward of a predicted increase of 24 by day 94; in Figure 9B, following action 2, action 2-3 was selected as the final strategy on day 90, with a reward of a predicted increase of 64. The actions in these two policies are detailed in Table 5.
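The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: `Action`, `select_action` and `toy_predictor` are hypothetical names, the starting state is invented, and `toy_predictor` is a stand-in scoring rule rather than the trained Informer network. The five candidate actions are the first five listed in Table 5, expressed as multiples of the minimum step sizes.

```python
from dataclasses import dataclass

# Minimum step sizes stated in the text.
TEMP_STEP_C = 1   # degrees Celsius
LIGHT_STEP = 10   # umol/(m^2 s)
EC_STEP = 50      # uS/cm


@dataclass(frozen=True)
class Action:
    name: str
    d_temp: int   # change in multiples of TEMP_STEP_C
    d_light: int  # change in multiples of LIGHT_STEP
    d_ec: int     # change in multiples of EC_STEP


def select_action(state, actions, predict_increase):
    """Return the action with the largest predicted count increase, and that reward."""
    rewards = {a: predict_increase(state, a) for a in actions}
    best = max(rewards, key=rewards.get)
    return best, rewards[best]


def toy_predictor(state, action):
    """Hypothetical stand-in for the Informer forecast; NOT the trained model."""
    temp = state["temp"] + action.d_temp * TEMP_STEP_C
    light = state["light"] + action.d_light * LIGHT_STEP
    # Invented scoring rule that favours cooler, brighter conditions.
    return light // 10 - 2 * abs(temp - 18)


# Candidate actions taken from Table 5 (action_1 to action_5).
candidates = [
    Action("action_1", -1, 1, 1),
    Action("action_2", -2, 2, 1),
    Action("action_3", 3, 5, 1),
    Action("action_4", 3, 5, 0),
    Action("action_5", 5, -5, 0),
]
state = {"temp": 21, "light": 270, "ec": 1000}  # hypothetical current environment
best, reward = select_action(state, candidates, toy_predictor)
print(best.name, reward)
```

In the real system the predictor would be the Informer forecast of stolon or fruit count, so the chosen action depends entirely on that model and not on the toy rule used here.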

In the experiments, we took the predicted results at a randomly chosen horizon of 5 to 7 days as rewards, and updated the policy once per such interval. The environmental temperature was controlled separately using an air conditioner, the light intensity was controlled in groups via a Bluetooth module, and the EC value of the nutrient solution was adjusted by adding water to dilute it or by replenishing the nutrient solution.

According to the planting feedback provided by RL-Informer, the environmental variables were adjusted in a timely manner to achieve the cultivation objectives.
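For readers less familiar with Q-learning, the sketch below shows one step of the standard tabular update of Watkins and Dayan [48], driven by a predicted reward. It is an illustration only: the learning rate, discount factor, state and action labels, and the uniform sampling of the 5 to 7 day horizon are assumptions, since the text does not specify them here.

```python
import random
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(q, state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]


def sample_horizon(rng):
    """Pick a prediction horizon in days from 5 to 7 inclusive (uniform sampling is assumed)."""
    return rng.randint(5, 7)


q_table = defaultdict(float)                 # unseen (state, action) pairs start at 0
actions = ["cooler", "warmer", "no_change"]  # hypothetical action labels
rng = random.Random(0)
horizon = sample_horizon(rng)

# Hypothetical reward: predicted fruit-count increase over `horizon` days.
first = q_update(q_table, "fruiting", "cooler", 24.0, "fruiting", actions)
second = q_update(q_table, "fruiting", "cooler", 24.0, "fruiting", actions)
print(horizon, first, second)
```

Repeating the update with the same reward moves the stored value gradually towards the long-run return, which is why the policy is revised only once per prediction interval.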

3.2.2. RL-Informer Yield Enhancement Impact Experiment

To verify the cultivation feedback performance of the proposed model, four groups of comparative experiments were set up: two empirically planted groups (one for obtaining stolons and one for obtaining fruits), and two groups planted under the guidance of the RL-Informer model (again, one for stolons and one for fruits). In each group, 50 strawberry seedlings with at least three compound leaves were selected after being grown independently for 3 months, at which stage each plant was able to grow stolons and to flower and fruit continuously; the culture time was again 200 days. The environmental settings of the empirically planted groups followed [51,52,53]. RL-Informer provided timely feedback on environmental variables based on the current environmental variables, biological parameters, and growth period. Because the full feedback record is too long to present, Figure 10 summarizes the statistics of the environmental parameters for the comparative experiments over the different growth periods. The resulting biological parameters of the strawberry plants are shown in Figure 11.

In the experiment designed to obtain the maximum count of fruits, the median temperature suggested by RL-Informer was 18.7 °C, lower than that of the empirically planted group (21.6 °C), and the suggested temperatures were lower overall. Refs. [54,55] also reported that a relatively lower temperature promotes strawberry fruiting. The maximum light intensity of 335 $\mu mol/(m^{2} \cdot s)$ and the upper quartile of 314 $\mu mol/(m^{2} \cdot s)$ were also significantly higher than those of the empirically planted group, but the median of 272 $\mu mol/(m^{2} \cdot s)$ was close to that of the empirically planted group; this was caused by the lower light intensity suggested by RL-Informer during the dormant state [56,57] of the strawberry after each flowering and fruiting period. During the dormant period, strawberry plants mainly engage in nutrient accumulation and leaf growth, and an appropriate reduction of light intensity during this period can maintain the vegetative propagation of the strawberry while saving electricity. For the EC value of the nutrient solution, RL-Informer gave a significantly higher result (1067 $\mu S/cm$) than the empirically planted group (924 $\mu S/cm$) for the whole period. In contrast, in the experiment designed to obtain the maximum count of stolons, RL-Informer recommended higher environmental temperatures (maximum 29.5 °C, median 22.4 °C) and lower EC values (maximum 945 $\mu S/cm$, median 816 $\mu S/cm$) for the nutrient solution, while there were still no significant modifications to light intensity. Ref. [58] found that a lower concentration of nutrient solution was more beneficial to the occurrence of daughter plants of strawberry, which is consistent with the output of the proposed model.

Compared to the control groups, the two experiments guided by the RL-Informer model increased the total count of leaves by 14.6% and 22.02%, the average maximum leaf size by 29.78% and 11.10%, and the average plant height by 4.68% and 5.17%. This additional vegetative growth could provide more nutrient accumulation for stolon growth and fruiting. The cumulative counts of stolons and fruits in the four experimental groups are shown in Figure 12.

In Figure 12A, the RL-Informer-guided group produced stolons during more than 80% of the experimental period, and it maintained a steady increase in stolon count near the end of the experiment, when the control group had entered an inactive dormant period; the final stolon yield was 17.81% higher than that of the control group. In Figure 12B, the experimental group outperformed the control group throughout and obtained a 20.78% increase in the total fruit count.

4. Conclusions

We developed an RL-Informer model, based on Q-learning and the time-series prediction network Informer, for short-term growth prediction of strawberries and for feedback on how to optimize the current environment.

The method used the Informer network to predict the counts of strawberry stolons and fruits and used the predictions as rewards for the Q-learning policies, thus helping reinforcement learning to find the optimal actions in different environments. The RL-Informer model relied on environmental temperature, light intensity, nutrient solution EC value and plant body parameters; the Informer part achieved the best prediction performance (compared with LSTM and Transformer), exploiting the strength of Informer with multiple prediction variables. The prediction result of each action was used as the reward of Q-learning, and RL-Informer selected the action with the best reward in the current state for execution, in order to obtain the best planting environment for optimal harvesting. In the validation experiment, real-time environmental monitoring and feedback on strawberry cultivation using RL-Informer achieved a 17.81% increase in stolon yield and a 20.78% increase in fruit number.

When using the RL-Informer model, growers can use multiple sensors to gather information from various sources and specify the plant traits to be predicted. The model predicts future changes in plant traits based on current parameters and provides feedback on the optimal environmental parameters to meet the grower’s requirements. The grower can then promptly adjust the environment based on this feedback. In addition, the RL-Informer model is not limited to strawberry cultivation. With the development of CEA, data covering the full plant growth period are becoming more accessible, and the proposed model can be used for better crop yield prediction and environmental feedback to achieve maximum harvests.

Author Contributions

Y.L. was primarily responsible for conceiving the method and writing the source code and the paper. M.G. designed the experiments and revised the paper. J.L. and J.M. provided the experimental equipment. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rouphael, Y.; Kyriacou, M.C.; Petropoulos, S.A.; De Pascale, S.; Colla, G. Improving vegetable quality in controlled environments. Sci. Hortic. 2018, 234, 275–289.
2. Nichols, M. Plant factories-the ultimate in controlled environment agriculture. In ICESC2015: Hydroponics and Aquaponics at the Gold Coast 1176; International Society for Horticultural Science: Leuven, Belgium, 2015; pp. 17–22.
3. Engler, N.; Krarti, M. Review of energy efficiency in controlled environment agriculture. Renew. Sustain. Energy Rev. 2021, 141, 110786.
4. Benke, K.; Tomkins, B. Future food-production systems: Vertical farming and controlled-environment agriculture. Sustain. Sci. Pract. Policy 2017, 13, 13–26.
5. Ragaveena, S.; Shirly Edward, A.; Surendran, U. Smart controlled environment agriculture methods: A holistic review. Rev. Environ. Sci. Bio/Technol. 2021, 20, 887–913.
6. Li, C.; Adhikari, R.; Yao, Y.; Miller, A.G.; Kalbaugh, K.; Li, D.; Nemali, K. Measuring plant growth characteristics using smartphone based image analysis technique in controlled environment agriculture. Comput. Electron. Agric. 2020, 168, 105123.
7. Farooq, M.S.; Riaz, S.; Abid, A.; Umer, T.; Zikria, Y.B. Role of IoT technology in agriculture: A systematic literature review. Electronics 2020, 9, 319.
8. Dan, L.; Xin, C.; Chongwei, H.; Liangliang, J. Intelligent agriculture greenhouse environment monitoring system based on IOT technology. In Proceedings of the 2015 International Conference on Intelligent Transportation, Big Data and Smart City, Halong Bay, Vietnam, 19–20 December 2015; pp. 487–490.
9. Pallavi, S.; Mallapur, J.D.; Bendigeri, K.Y. Remote sensing and controlling of greenhouse agriculture parameters based on IoT. In Proceedings of the 2017 International Conference on Big Data, IoT and Data Science (BID), Pune, India, 20–22 December 2017; pp. 44–48.
10. Malika, N.Z.; Ramli, R.; Alkawaz, M.H.; Johar, M.G.M.; Hajamydeen, A.I. IoT based Poultry Farm Temperature and Humidity Monitoring Systems: A Case Study. In Proceedings of the 2021 IEEE 9th Conference on Systems, Process and Control (ICSPC 2021), Malacca, Malaysia, 10–11 December 2021; pp. 64–69.
11. Stanley, C.; Stokes, J.; Tustin, D. Early prediction of apple fruit size using environmental indicators. In Proceedings of the VII International Symposium on Orchard and Plantation Systems, Nelson, New Zealand, 30 January–5 February 2000; Volume 557, pp. 441–446.
12. Stajnko, D.; Rozman, Č.; Pavlovič, M.; Beber, M.; Zadravec, P. Modeling of ‘Gala’ apple fruits diameter for improving the accuracy of early yield prediction. Sci. Hortic. 2013, 160, 306–312.
13. Güçlü, S.F.; Öncü, Z.; Koyuncu, F. Pollen performance modelling with an artificial neural network on commercial stone fruit cultivars. Hortic. Environ. Biotechnol. 2020, 61, 61–67.
14. Lee, M.A.; Monteiro, A.; Barclay, A.; Marcar, J.; Miteva-Neagu, M.; Parker, J. A framework for predicting soft-fruit yields and phenology using embedded, networked microsensors, coupled weather models and machine-learning techniques. Comput. Electron. Agric. 2020, 168, 105103.
15. Muangprathub, J.; Boonnam, N.; Kajornkasirat, S.; Lekbangpong, N.; Wanichsombat, A.; Nillaor, P. IoT and agriculture data analysis for smart farm. Comput. Electron. Agric. 2019, 156, 467–474.
16. de Oliveira, G.A.; Bureau, S.; Renard, C.M.G.C.; Pereira-Netto, A.B.; de Castilhos, F. Comparison of NIRS approach for prediction of internal quality traits in three fruit species. Food Chem. 2014, 143, 223–230.
17. Zude, M. Comparison of indices and multivariate models to non-destructively predict the fruit chlorophyll by means of visible spectrometry in apple fruit. Anal. Chim. Acta 2003, 481, 119–126.
18. Ktenioudaki, A.; O’Donnell, C.P.; Emond, J.P.; do Nascimento Nunes, M.C. Blueberry supply chain: Critical steps impacting fruit quality and application of a boosted regression tree model to predict weight loss. Postharvest Biol. Technol. 2021, 179, 111590.
19. Chia, K.S.; Rahim, H.A.; Rahim, R.A. A comparison of Principal Component Regression and Artificial Neural Network in fruits quality prediction. In Proceedings of the 2011 IEEE 7th International Colloquium on Signal Processing and its Applications, Penang, Malaysia, 4–6 March 2011; pp. 261–265.
20. Lu, C.P.; Liaw, J.J.; Wu, T.C.; Hung, T.F. Development of a mushroom growth measurement system applying deep learning for image recognition. Agronomy 2019, 9, 32.
21. Mokhtar, A.; El-Ssawy, W.; He, H.; Al-Anasari, N.; Sammen, S.S.; Gyasi-Agyei, Y.; Abuarab, M. Using machine learning models to predict hydroponically grown lettuce yield. Front. Plant Sci. 2022, 13, 197.
22. Tambakhe, M.D.; Gulhane, V. Prediction of Plant Growth Through Nutrient Uptake in the Hydroponics System Using Machine Learning Approach. In Proceedings of the International Conference on Communication and Computational Technologies: ICCCT 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 453–463.
23. Madhavi, B.G.K.; Basak, J.K.; Paudel, B.; Kim, N.E.; Choi, G.M.; Kim, H.T. Prediction of strawberry leaf color using RGB mean values based on soil physicochemical parameters using machine learning models. Agronomy 2022, 12, 981.
24. Cao, J.; Zhang, Z.; Tao, F.; Zhang, L.; Luo, Y.; Zhang, J.; Han, J.; Xie, J. Integrating multi-source data for rice yield prediction across China using machine learning and deep learning approaches. Agric. For. Meteorol. 2021, 297, 108275.
25. Ojo, M.O.; Zahid, A. Deep Learning in Controlled Environment Agriculture: A Review of Recent Advancements, Challenges and Prospects. Sensors 2022, 22, 7965.
26. Alhnaity, B.; Pearson, S.; Leontidis, G.; Kollias, S. Using deep learning to predict plant growth and yield in greenhouse environments. In Proceedings of the International Symposium on Advanced Technologies and Management for Innovative Greenhouses: GreenSys2019, Angers, France, 16 June 2019; Volume 1296, pp. 425–432.
27. Alhnaity, B.; Kollias, S.; Leontidis, G.; Jiang, S.; Schamp, B.; Pearson, S. An autoencoder wavelet based deep neural network with attention mechanism for multi-step prediction of plant growth. Inf. Sci. 2021, 560, 35–50.
28. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-level soybean yield prediction using deep CNN-LSTM model. Sensors 2019, 19, 4363.
29. Gong, L.; Yu, M.; Jiang, S.; Cutsuridis, V.; Pearson, S. Deep learning based prediction on greenhouse crop yield combined TCN and RNN. Sensors 2021, 21, 4537.
30. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
31. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 22 February–1 March 2021; Volume 35, pp. 11106–11115.
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
33. Joshi, A.; Pradhan, B.; Gite, S.; Chakraborty, S. Remote-Sensing Data and Deep-Learning Techniques in Crop Mapping and Yield Prediction: A Systematic Review. Remote Sens. 2023, 15, 2014.
34. Rußwurm, M.; Körner, M. Self-attention for raw optical satellite time-series classification. ISPRS J. Photogramm. Remote Sens. 2020, 169, 421–435.
35. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
36. Samtani, J.B.; Rom, C.R.; Friedrich, H.; Fennimore, S.A.; Finn, C.E.; Petran, A.; Wallace, R.W.; Pritts, M.P.; Fernandez, G.; Chase, C.A.; et al. The status and future of the strawberry industry in the United States. HortTechnology 2019, 29, 11–24.
37. Hajiboland, R.; Moradtalab, N.; Eshaghi, Z.; Feizy, J. Effect of silicon supplementation on growth and metabolism of strawberry plants at three developmental stages. N. Z. J. Crop Hortic. Sci. 2018, 46, 144–161.
38. Savini, G.; Giorgi, V.; Scarano, E.; Neri, D. Strawberry plant relationship through the stolon. Physiol. Plant. 2008, 134, 421–429.
39. Wang, S.Y.; Camp, M.J. Temperatures after bloom affect plant growth and fruit quality of strawberry. Sci. Hortic. 2000, 85, 183–199.
40. Heide, O.M. Photoperiod and temperature interactions in growth and flowering of strawberry. Physiol. Plant. 1977, 40, 21–26.
41. Fang, H.; Li, K.; Wu, G.; Cheng, R.; Zhang, Y.; Yang, Q. A CFD analysis on improving lettuce canopy airflow distribution in a plant factory considering the crop resistance and LEDs heat dissipation. Biosyst. Eng. 2020, 200, 1–12.
42. Zheng, J.; He, D.; Ji, F. Effects of light intensity and photoperiod on runner plant propagation of hydroponic strawberry transplants under LED lighting. Int. J. Agric. Biol. Eng. 2019, 12, 26–31.
43. Zheng, J.; Ji, F.; He, D.; Niu, G. Effect of light intensity on rooting and growth of hydroponic strawberry runner plants in a LED plant factory. Agronomy 2019, 9, 875.
44. Park, Y.; Sethi, R.; Temnyk, S. Growth, Flowering, and Fruit Production of Strawberry ‘Albion’ in Response to Photoperiod and Photosynthetic Photon Flux Density of Sole-Source Lighting. Plants 2023, 12, 731.
45. Jun, H.; Byun, M.; Liu, S.; Jang, M. Effect of nutrient solution strength on pH of drainage solution and root activity of strawberry ‘Sulhyang’ in hydroponics. Korean J. Hortic. Sci. Technol. 2011, 29, 23–28.
46. Jo, J.S.; Sim, H.S.; Jung, S.B.; Moon, Y.H.; Jo, W.J.; Woo, U.J.; Kim, S.K. Estimation and validation of the leaf areas of five June-bearing strawberry (Fragaria × ananassa) cultivars using non-destructive methods. J. Bio-Environ. Control 2022, 31, 98–103.
47. Hershey, J.R.; Olsen, P.A. Approximating the Kullback Leibler divergence between Gaussian mixture models. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, Honolulu, HI, USA, 16–20 April 2007; Volume 4, p. IV-317.
48. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
49. Kosana, V.; Teeparthi, K.; Madasthu, S.; Kumar, S. A novel reinforced online model selection using Q-learning technique for wind speed prediction. Sustain. Energy Technol. Assess. 2022, 49, 101780.
50. Yoshida, H.; Mizuta, D.; Fukuda, N.; Hikosaka, S.; Goto, E. Effects of varying light quality from single-peak blue and red light-emitting diodes during nursery period on flowering, photosynthesis, growth, and fruit yield of everbearing strawberry. Plant Biotechnol. 2016, 33, 267–276.
51. Nadalini, S.; Zucchi, P.; Andreotti, C. Effects of blue and red LED lights on soilless cultivated strawberry growth performances and fruit quality. Eur. J. Hortic. Sci. 2017, 82, 12–20.
52. Stuemky, A.; Uchanski, M.E. Supplemental light-emitting diode effects on the growth, fruit quality, and yield of two greenhouse-grown strawberry (Fragaria × ananassa) cultivars. HortScience 2020, 55, 23–29.
53. Tang, Y.; Ma, X.; Li, M.; Wang, Y. The effect of temperature and light on strawberry production in a solar greenhouse. Sol. Energy 2020, 195, 318–328.
54. Koskela, E.A.; Hytönen, T. Control of flowering in strawberries. In The Genomes of Rosaceous Berries and Their Wild Relatives; Springer: Berlin/Heidelberg, Germany, 2018; pp. 35–48.
55. Hytönen, T.; Kurokura, T. Control of flowering and runnering in strawberry. Hortic. J. 2020, 89, 96–107.
56. Grez, J.; Contreras, E.; Sánchez, S.; Alcalde, J.; Gambardella, M. Floral induction and dormancy behaviour in ‘Chilean white strawberry’ (Fragaria chiloensis (L.) Mill. subsp. chiloensis f. chiloensis). Sci. Hortic. 2020, 274, 109648.
57. Sønsteby, A.; Heide, O.M. Dynamics of dormancy regulation in ‘Sonata’ strawberry and its relation to flowering and runnering. CABI Agric. Biosci. 2021, 2, 1–12.
58. Kim, H.M.; Kim, H.M.; Jeong, H.W.; Lee, H.R.; Jeong, B.R.; Kang, N.J.; Hwang, S.J. Growth of mother plants and occurrence of daughter plants of ‘Maehyang’ strawberry as affected by different EC levels of nutrient solution during nursery period. J. Bio-Environ. Control 2018, 27, 185–190.

Figure 1. Changes in the collected environmental variable data. (A) Environmental temperature and air-conditioner-controlled temperature; (B) LED light intensity; (C) EC value of nutrient solution.

Figure 3. Collection of strawberry plant height, leaf size and leaf count information.

Figure 4. (A,B) Strawberry growing environment and condition of strawberry plant growth.

Figure 5. Global Time Stamp mechanism in Informer.

Figure 6. Training, prediction and feedback process of RL-Informer model.

Figure 7. Training loss and testing loss of the Informer network part.

Figure 8. Prediction results of LSTM, Transformer and Informer. (A) Prediction results of the three prediction models for stolons; (B) prediction results of the three prediction models for fruits.

Figure 9. Different actions and corresponding rewards. (A) Corresponding predicted rewards for the first strategy implementation; (B) corresponding predicted rewards for the second strategy implementation.

Figure 10. Details of environmental variables in comparative experiments. (A) Temperature box plots; (B) light intensity box plots; (C) nutrient solution EC value box plots.

Figure 11. Changes in biological parameters of strawberries in four groups of experiments. (A) Changes of the cumulative count of leaves in each group; (B) average size of the largest leaf for each group on the 200th day; (C) changes of average plant height in each group.

Figure 12. Strawberry harvests of the empirically planted and RL-Informer-guided groups. (A) Cumulative count of stolons; (B) cumulative count of fruits.

Table 1. The groups of different experimental environments.

Group	T (°C)	L ($\mu mol/(m^{2} \cdot s)$)	EC ($\mu S/cm$)
Group 1	12–20	350	800
Group 2	15–25	250	1000
Group 3	18–30	300	1000
Group 4	12–20	250	1200
Group 5	15–25	200	800
Group 6	18–30	250	1200
Group 7	12–20	300	1000
Group 8	18–30	200	800
Group 9	15–25	300	1200
Group 10	15–25	350	1000

Note: T is temperature; L is maximum light intensity; EC is maximum EC value.

Table 2. The definition of Q-learning parameters in strawberry growth.

Definition	Parameters
States	Current strawberry growth period;
	Current plant height, leaf size and leaf count
Actions	Change of environmental temperature;
	Change of lighting condition;
	Change of nutrient solution EC value
Rewards	Changes in the count of strawberry stolons during the prediction time;
	Changes in the count of strawberry fruits during the prediction time

Table 3. Key hyper-parameters in proposed model.

Parameter	Value	Parameter	Value
Batch size	64	Dropout	0.05
Epochs	40	Loss function	MSE
Learning rate	$4 \times 10^{-5}$	Activation	GELU

Table 4. Evaluation results of different prediction models.

Model	LSTM		Transformer		Informer
Target	Stolon	Fruit	Stolon	Fruit	Stolon	Fruit
$R^{2}$	0.5804	0.118	0.6052	0.461	0.936	0.835
$SEP (\%)$	41.941	44.101	40.684	34.501	16.354	19.077

Table 5. Detailed actions of the policies in Figure 9.

Actions	Detailed Executed Action	Actions	Detailed Executed Action
${action}_{1}$	temperature down by 1 °C;	${action}_{2-1}$	temperature up by 1 °C;
	light intensity up by 10 $\mu mol/(m^{2} \cdot s)$;		light intensity up by 10 $\mu mol/(m^{2} \cdot s)$;
	EC value up by 50 $\mu S/cm$		EC value down by 50 $\mu S/cm$
${action}_{2}$	temperature down by 2 °C;	${action}_{2-2}$	temperature up by 1 °C;
	light intensity up by 20 $\mu mol/(m^{2} \cdot s)$;		light intensity up by 20 $\mu mol/(m^{2} \cdot s)$;
	EC value up by 50 $\mu S/cm$		EC value down by 100 $\mu S/cm$
${action}_{3}$	temperature up by 3 °C;	${action}_{2-3}$	temperature down by 1 °C;
	light intensity up by 50 $\mu mol/(m^{2} \cdot s)$;		light intensity up by 10 $\mu mol/(m^{2} \cdot s)$;
	EC value up by 50 $\mu S/cm$		EC value unchanged
${action}_{4}$	temperature up by 3 °C;	${action}_{2-4}$	temperature down by 1 °C;
	light intensity up by 50 $\mu mol/(m^{2} \cdot s)$;		light intensity up by 10 $\mu mol/(m^{2} \cdot s)$;
	EC value unchanged		EC value up by 50 $\mu S/cm$
${action}_{5}$	temperature up by 5 °C;	${action}_{2-5}$	temperature down by 4 °C;
	light intensity down by 50 $\mu mol/(m^{2} \cdot s)$;		light intensity up by 20 $\mu mol/(m^{2} \cdot s)$;
	EC value unchanged		EC value down by 50 $\mu S/cm$
${action}_{n}$	⋯	${action}_{n-n}$	⋯

Note: The finally executed actions, ${action}_{2}$ and ${action}_{2-3}$, are shown in bold in the original table.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
