Article
Peer-Review Record

Deep Learning-Based Wave Overtopping Prediction

by Alberto Alvarellos 1,*, Andrés Figuero 2, Santiago Rodríguez-Yáñez 1, José Sande 2, Enrique Peña 2, Paulo Rosa-Santos 3 and Juan Rabuñal 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 24 February 2024 / Revised: 12 March 2024 / Accepted: 14 March 2024 / Published: 20 March 2024
(This article belongs to the Special Issue Artificial Intelligence in Civil and Environmental Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear authors,

I think the work you present is well written and described. There are few innovative aspects, but some points should be described in greater depth to make the description more robust:

· In the introduction, it is worth mentioning that in recent years the number of catastrophic events has been rising. I think this should be pointed out as a problem that supports the need for your solution for monitoring purposes. It would also be valuable to add some words about this problem for other fields, such as structural health monitoring of bridges. You can refer to the following paper in Future Internet that discusses the problem:

https://0-www-mdpi-com.brum.beds.ac.uk/2214294

Bono, F.M.; Radicioni, L.; Cinquemani, S.; Benedetti, L.; Cazzulani, G.; Somaschini, C.; Belloli, M. A Deep Learning Approach to Detect Failures in Bridges Based on the Coherence of Signals. Future Internet 2023, 15, 119. https://0-doi-org.brum.beds.ac.uk/10.3390/fi15040119

· You are dealing with an imbalanced dataset. Did you try any of the following techniques for dealing with this problem: SMOTE or ADASYN? Could these techniques improve your results?

· Did you try other model architectures? Instead of using DL, you could use shallow models for classification. It may be worth including a comparison with other models in terms of accuracy.

· In my opinion, some parts of the paper are quite verbose. Consider summarizing them to keep the attention of future readers and avoid excessive length.

 

Author Response

We thank the reviewer for the review. The suggestions help improve our paper.
We quote each suggestion and respond to it.

In the introduction, it is worth mentioning that in recent years the number of catastrophic events has been rising. I think this should be pointed out as a problem that supports the need for your solution for monitoring purposes. It would also be valuable to add some words about this problem for other fields, such as structural health monitoring of bridges. You can refer to the following paper in Future Internet that discusses the problem:

The initial version of the paper had a brief text regarding climate change and its influence on wave overtopping events at the end of the introduction. We expanded that text, explained why the number and intensity of overtopping events could increase due to climate change, and added a reference to the "Special Report on the Ocean and Cryosphere in a Changing Climate" of the Intergovernmental Panel on Climate Change (IPCC), relating it to the topic of wave overtopping. We only included examples and applications of techniques to measure overtopping events; it would be difficult to justify including an example of detecting failures in bridges.

You are dealing with an imbalanced dataset. Did you try any of the following techniques for dealing with this problem: SMOTE or ADASYN? Could these techniques improve your results?

We considered using SMOTE initially and did some testing with SMOTE and random forest, but the results were not very good, so we decided to use an appropriate metric before relying on synthetic data generation.
While SMOTE and ADASYN are effective in creating a more balanced dataset and potentially improving the model's performance on minority class predictions, they have drawbacks. The synthetic sample generation can introduce noise (especially if the synthetic samples do not represent the true underlying distribution of the minority class well). This can lead to models that perform well on the oversampled training data but poorly on real-world, unseen data, and a real-world application was our goal. Furthermore, these methods increase the complexity of the model training process and the risk of overfitting to the synthetic samples, and the training process was already complex in this work.
In contrast, using appropriate metrics insensitive to class imbalance can provide a more accurate assessment of a model's performance. Metrics such as the F1 score, Precision-Recall AUC, or Balanced Accuracy, among others, offer a more nuanced understanding of how well the model performs across both minority and majority classes without the need to artificially balance the dataset. They help to highlight the model's ability to identify the minority class correctly (precision) and its coverage of the actual minority class instances (recall).
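For illustration only (this is not code from the paper), a minimal sketch of how such imbalance-insensitive metrics can be computed with scikit-learn; the label and probability arrays below are toy placeholders:

```python
# Illustrative only: imbalance-insensitive metrics with scikit-learn.
# y_true stands in for observed overtopping labels, y_prob for the model's
# predicted probability of the positive (overtopping) class.
import numpy as np
from sklearn.metrics import (auc, balanced_accuracy_score, f1_score,
                             precision_recall_curve)

y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0, 0, 1])                 # toy labels
y_prob = np.array([0.1, 0.2, 0.05, 0.3, 0.8, 0.15, 0.6, 0.2, 0.4, 0.9])

precision, recall, _ = precision_recall_curve(y_true, y_prob)
pr_auc = auc(recall, precision)                  # area under the PR curve
y_pred = (y_prob >= 0.5).astype(int)             # one possible decision threshold

print(f"PR AUC:            {pr_auc:.3f}")
print(f"F1 score:          {f1_score(y_true, y_pred):.3f}")
print(f"Balanced accuracy: {balanced_accuracy_score(y_true, y_pred):.3f}")
```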
While techniques like SMOTE and ADASYN can be valuable tools for handling imbalanced datasets, relying solely on them without considering the appropriateness of the evaluation metrics may lead to misleading conclusions about a model's effectiveness. Given that this work intended to compare the three datasets and create a model that works generally well (for any decision threshold), we used the PR AUC metric, which reflects performance across both the minority and majority classes without the need to balance the dataset artificially, and we decided not to include SMOTE/ADASYN to keep the process simpler: had we used artificial data generation, it would have been more challenging to determine whether a model's poor performance was due to its bias or to the noise introduced by the generated data.
Now that we have identified the dataset that provides the best results and have an idea of the model architecture that performs best, we plan to combine these oversampling techniques with robust evaluation metrics to develop a more accurate and reliable model in future work. While this could potentially provide better results, we think that gathering more data is always going to be a better approach.
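Should we explore this in future work, a minimal sketch (not the pipeline used in this paper) of how SMOTE could be combined with a PR-AUC-based cross-validation, assuming the scikit-learn and imbalanced-learn packages; the synthetic data and the random-forest classifier are placeholders:

```python
# Illustrative sketch: SMOTE applied only inside the training folds of a
# cross-validation, scored with average precision (a PR-AUC summary).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder for the forecast features and overtopping labels (95/5 imbalance).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),               # oversamples training folds only
    ("clf", RandomForestClassifier(random_state=0)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, scoring="average_precision", cv=cv)
print(f"Mean PR AUC (average precision) across folds: {scores.mean():.3f}")
```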

Did you try other model architectures? Instead of using DL, you could use shallow models for classification. It may be worth including a comparison with other models in terms of accuracy.

Yes, we tested shallow models, as explained on page 10. We tested neural networks with 1, 2, 3, and 4 layers, and in each case, we tested 8, 16, 32, 64, 128, and 256 neurons per layer. Furthermore, we created even simpler models using Dropout regularization (we tested no dropout and dropout ratios of 0.0625, 0.125, 0.25, 0.375, and 0.5). 
Thus, the simplest model we tested in this work was a one-layer neural network with four neurons in that layer (when a 0.5 dropout ratio is used).
The simplest models were tested alongside the more complex networks in the initial iteration of the iterative approach explained in the work. Their results were worse than those of the more complex models (high bias), so they were discarded in the following iterations.
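For reference, a minimal Keras sketch of the kind of layer/neuron/dropout grid described above; the feature count, activation functions, and compilation settings are assumptions for illustration, not necessarily the exact configuration used in the paper:

```python
# Illustrative sketch of the architecture grid: 1-4 dense layers, 8-256 neurons
# per layer, and dropout ratios from 0 (none) to 0.5 after each layer.
import itertools
import tensorflow as tf

def build_model(n_layers, n_neurons, dropout, n_features):
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
    for _ in range(n_layers):
        model.add(tf.keras.layers.Dense(n_neurons, activation="relu"))
        if dropout > 0:
            model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))  # overtopping probability
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(curve="PR", name="pr_auc")])
    return model

layer_counts = [1, 2, 3, 4]
neuron_counts = [8, 16, 32, 64, 128, 256]
dropout_ratios = [0.0, 0.0625, 0.125, 0.25, 0.375, 0.5]

for n_layers, n_neurons, dropout in itertools.product(layer_counts,
                                                      neuron_counts,
                                                      dropout_ratios):
    model = build_model(n_layers, n_neurons, dropout, n_features=10)
    # model.fit(...) with 10-fold cross-validation on each dataset would go here.
```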
Before the work explained in this paper, we did some testing using decision trees and random forests, but they did not provide better results than the neural networks. Given that the literature suggests that gathering more and better data can provide better results than precise model selection [https://0-doi-org.brum.beds.ac.uk/10.1109/MIS.2009.36, for instance], and that the primary goal of this work was choosing the best dataset (predicted vs. real data) and creating models that provide good overall results when the decision threshold of their output is changed, we focused our efforts on gathering more data and creating such models. Including more model types would multiply the time it takes to create the models (3 datasets × the number of models × 10-fold CV), which is currently several months, and an improvement is not guaranteed.

In my opinion, some parts of the paper are quite verbose. Consider summarizing them to keep the attention of future readers and avoid excessive length.

We changed/added text after this and the other reviewers' comments. It would be helpful to know which sections the reviewer is referring to (in the revised version of the paper) in order to improve/shorten them.

Reviewer 2 Report

Comments and Suggestions for Authors

In this manuscript, the authors present a method for predicting wave overtopping events in ports based on deep learning. To this end, real data from a port in Spain are gathered, pre-processed and merged with sea state and weather forecasts; this dataset is then used to predict the “anomalous” class of wave overtopping, obtaining a good accuracy both in the training and testing sets. The paper is in general interesting, well-written, very informative and comprehensive. Some minor comments:

Please mention the time window of the data that you use for the prediction. Is it only the parameter values of the previous time point (t-1) that you use to predict the wave overtopping anomaly at time (t) or do you select several time points (t-1, t-2, etc.)?

Please mention the inference latency of the models – how much time does it take to infer the model and make the prediction?

Apart from the “time-window” parameter, there is also the future time instance that you aim to predict the anomaly, i.e., the model aims to predict the probability of wave overtopping at time t+1 (or t+2, etc.). Usually there is a trade-off between accuracy and proactiveness. It would be interesting to mention if it is possible to train models for general t+m prediction and/or consider it as your future work. Relevant references from diverse domains are:

 

1. "A supervised deep learning framework for proactive anomaly detection in cloud workloads." 2017 14th IEEE India Council International Conference (INDICON). IEEE, 2017.

2. "ReRe: A lightweight real-time ready-to-go anomaly detection approach for time series." 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2020.

3. "Intelligent Mission Critical Services over Beyond 5G Networks: Control Loop and Proactive Overload Detection." 2023 International Conference on Smart Applications, Communications and Networking (SmartNets). IEEE, 2023.

Author Response

We thank the reviewer for the review. The suggestions help improve our paper.
We quote each suggestion and respond to it.

Please mention the time window of the data that you use for the prediction. Is it only the parameter values of the previous time point (t-1) that you use to predict the wave overtopping anomaly at time (t) or do you select several time points (t-1, t-2, etc.)?

Thanks for this suggestion; it made us realise that this needed to be stated clearly in the paper. We added text on page 7 to clarify it. In summary, we use the 72-hour hourly weather and sea state forecast (provided as a single value for each of several variables, for each hour of the next 72 hours) as input to our models. The model takes the values for a single hour (no time window) and outputs the overtopping probability for that hour. Given that we have a 72-hour forecast, we can predict overtopping events for the next 72 hours.

We added the following text to the paper:

"In production, the models will use the weather conditions and sea state forecasts provided by the Portus platform of the Spanish ports system [30] as inputs and output the probability of an overtopping event for a given input.

The Portus system forecasts the weather and sea state for 72 hours. Several variables characterise the sea state, and the sea state forecast is provided as a single hourly data point for that set of variables; that is, Portus provides a single hourly value for the variables that describe the sea state for each of the following 72 hours.

The models created in this work use a single time value of the meteorological conditions and the sea state (the single value for a specific hour, not a time window) as inputs and indicate, as output, whether or not there will be an overtopping event with these conditions. Thus, given that we have a 72-hour weather state and sea conditions forecast, we can predict overtopping events for each of the next 72 hours."
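For illustration, a minimal sketch (not the production code) of how a trained model could turn such a 72-hour Portus forecast into hourly overtopping probabilities; the file names and column layout are hypothetical:

```python
# Illustrative sketch: one prediction per forecast hour, no time window.
import pandas as pd
import tensorflow as tf

model = tf.keras.models.load_model("overtopping_model.keras")    # hypothetical path

# One row per forecast hour, one (numeric) column per weather / sea-state variable.
forecast = pd.read_csv("portus_72h_forecast.csv")                # hypothetical file
probabilities = model.predict(forecast.to_numpy(dtype="float32"), verbose=0).ravel()

hourly = pd.DataFrame({"forecast_hour": range(1, len(probabilities) + 1),
                       "overtopping_probability": probabilities})
print(hourly.head())
```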

Please mention the inference latency of the models – how much time does it take to infer the model and make the prediction?

We did not mention it in the paper because the model latency is negligible. A given model output predicts the overtopping probability for a whole hour, so we only need 72 predictions once or twice a day (the weather/sea forecasts are updated twice a day). The model's predictions can be stored, so they are calculated once the weather/sea forecasts are available and then can be retrieved multiple times instantly.
In summary, processing time is not a problem in our models, but it could be a concern generally, so thanks for the question. We calculated the time it takes to make those 72 predictions by averaging over 10,000 "runs": on average, it takes the model 0.03 s to make 72 predictions on a regular laptop (12th Gen Intel® Core™ i7-1260P).
We included a paragraph explaining why latency could be a concern and the latency results in the "results" section of the paper.
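For completeness, a sketch of the kind of timing measurement described above; the model path and feature count are hypothetical, and this is not the exact benchmarking script behind the figure reported:

```python
# Illustrative sketch: average the time needed to produce the 72 hourly
# predictions over many repetitions.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("overtopping_model.keras")   # hypothetical path
N_RUNS, N_HOURS, N_FEATURES = 10_000, 72, 10                    # feature count illustrative
batch = np.random.rand(N_HOURS, N_FEATURES).astype("float32")

start = time.perf_counter()
for _ in range(N_RUNS):
    _ = model(batch, training=False)         # direct call; avoids predict() overhead
elapsed = time.perf_counter() - start
print(f"Average time for 72 hourly predictions: {elapsed / N_RUNS:.4f} s")
```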

Apart from the “time-window” parameter, there is also the future time instance that you aim to predict the anomaly, i.e., the model aims to predict the probability of wave overtopping at time t+1 (or t+2, etc.). Usually there is a trade-off between accuracy and proactiveness. It would be interesting to mention if it is possible to train models for general t+m prediction and/or consider it as your future work.

As mentioned in the first answer, given that we have a 72-hour forecast, we can predict overtopping events for the next 72 hours.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper is clear and well articulated, so it is quite interesting to read.

Because overtopping might be unknown to the journal's readership, the in-depth presentation of the overtopping subject is welcome.

The paper subject is relevant to the field, and it has a practical use, considering that correct prediction of overtopping has economic, financial, and even safety implications.

Three quarters of the cited references are quite recent and there are no self-citations.

The manuscript is scientifically sound, and the deep learning analysis is well conducted.

The conclusions are in line with the findings presented in the paper.

Some final remarks:

The last event data were collected in 2020. Assuming that the paper will be published in 2024, there should be an explanation for the four-year gap.

Page 2 – Please invert images “a” and “b” in Figure 1 with respect to their logical order.

Page 7 – Equation (1) – Please correct the position of the equation number.

Author Response

We thank the reviewer for the review. The suggestions help improve our paper.
We quote each suggestion and respond to it.

The last event data were collected in 2020. Assuming that the paper will be published in 2024, there should be an explanation for the four-year gap.

We rewrote the "discussion and conclusions" section to explain this more clearly and explain possible solutions to each problem.

Page 2 – Please invert images “a” and “b” in Figure 1 with respect to their logical order.

Corrected.

Page 7 – Equation (1) – Please correct the position of the equation number.

Corrected.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I think you have addressed my comments with good explanations. The paper is ready for publication.

 
