Peer-Review Record

Managing and Optimizing Big Data Workloads for On-Demand User Centric Reports

Big Data Cogn. Comput. 2023, 7(2), 78; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc7020078

by Alexandra Băicoianu and Ion Valentin Scheianu

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Mazhar Javed Awan

Submission received: 6 March 2023 / Revised: 5 April 2023 / Accepted: 11 April 2023 / Published: 18 April 2023

(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

Round 1

Reviewer 1 Report

In general, this paper is well written. However, I think some aspects can be improved.

The authors state that they propose a solution to deal with the challenges, but the contributions of the proposed solution are vague. In other words, the authors' solution is mixed with some existing approaches.

I suggest that the authors clearly state their contributions in this paper and distinguish them from the existing solutions.

Author Response

Dear reviewer,

Thank you very much for the review.

As a result of your comments, please find in the attachment the .pdf file of the whole updated article.

We have added clear statements about what we are proposing (the async MapReduce flow) and also detailed the challenges and the solutions suggested for optimizing the workflow. In addition, a comparison with other existing solutions was added: see lines 54-59, 161-173, and 196-202, plus (some comparison) lines 413-437 and 498-512.

The main corrections/supplementary information are highlighted.

Best regards,

Dr. Alexandra Baicoianu

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper is an empirical paper discussing some issues and challenges associated with big data and proposes a solution for addressing them.

As the proposed solution is just composed of different big data storage methods, the contribution of the paper is not sufficient for this journal.

The authors could propose more elaborate solutions with rigorous experiments and draw some general conclusions about the research problems.

Author Response

Dear reviewer,

Thank you very much for the review.

As a result of the whole review process, we return to you with a more complete version of the paper, in which we have rigorously detailed the experiments and the proposed solution.

Please find in the attachment the .pdf file where the main corrections are highlighted.

Thank you once again.

Best regards,

Dr. Alexandra Baicoianu

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper discusses the challenges and opportunities associated with big data and proposes approaches and technologies for handling and analyzing such data. The research highlights the potential of big data to drive innovation and support decision-making in various domains while emphasizing the importance of addressing the challenges and concerns raised by this rapidly evolving field. Ultimately, the paper provides insights into the value of big data and the need for effective strategies for managing and analyzing it.

The paper provides a comprehensive discussion of methods and technologies that could support the context of interest, and it is well-structured. While the contribution is solid, there are some aspects that may undermine its quality, and the authors could consider focusing on these aspects to improve the paper's presentation and structure:

1. The 3V’s model is considered as outdated in the current time; instead, the academic and industrial worlds are both focusing on the 5V’s model. The authors could improve the discussion by also taking into account the latter.
2. The background section could use a table showing an overview of the contexts therein discussed and their significance and challenges.
3. There are several techniques that could be applied in the context considered by the authors. For instance, the approach in [doi.org/10.1142/S0219622020500182] could be used to filter and structure data in the initial system design, while the one in [doi.org/10.1016/j.knosys.2019.06.028] could be useful in the context of data partition. The authors could consider including these references.
4. The authors claim that the selected technologies, e.g., Druid and Map Reduce, are optimal solutions for generating reports. However, in my opinion, these solutions belong to the “analysis” layer of classical big data workloads rather than to the “presentation” one. I suggest the authors to discuss this in a more clear way.
5. The paper gives the idea to concentrate on the management and optimization of big data workload for the production of reports. However, the content seems to not discuss the production of reports enough. I suggest the authors to strengthen this aspect or to modify the title at least.
6. The structure of the paper is missing and it should be provided.
7. Finally, there are several typos and repetition of thoughts. The authors should carefully read the manuscript and fix them.

Author Response

Dear reviewer,

Thank you very much for the review.

As a result of your comments, please find in the attachment the .pdf file of the whole updated article.

Major changes and additions to the revised manuscript:

1: “The 3V’s model is considered as outdated in the current time; instead, the academic and industrial worlds are both focusing on the 5V’s model. The authors could improve the discussion by also taking into account the latter.”

1: Response - Concerning the 3Vs model, we thank the reviewer for pointing out this issue. We kept the 3Vs but added details about the current 5Vs standard, and we also discussed the 10Vs mentioned in one of the suggested reports. See lines 90-99;

2: “The background section could use a table showing an overview of the contexts therein discussed and their significance and challenges.”

2: Response - The background section was updated and we have added a table showing an overview of the contexts therein discussed in the section together with their significance and challenges. See lines 121-125;

3: “There are several techniques that could be applied in the context considered by the authors. For instance, the approach in [doi.org/10.1142/S0219622020500182] could be used to filter and structure data in the initial system design, while the one in [doi.org/10.1016/j.knosys.2019.06.028] could be useful in the context of data partition. The authors could consider including these references.”

3: Response - We followed both recommendations: we cited both articles in the references and, additionally, discussed the differences and similarities between them. See lines 498-512 and 413-437;

4: “The authors claim that the selected technologies, e.g., Druid and Map Reduce, are optimal solutions for generating reports. However, in my opinion, these solutions belong to the “analysis” layer of classical big data workloads rather than to the “presentation” one. I suggest the authors to discuss this in a more clear way.”

4: Response - Concerning the selected technologies, the Introduction now details the types of reports we are referring to (static vs. dynamic reports). These report types are well known and are based on big data, and multiple companies leverage Druid for the presentation layer. We do not claim that Druid and MapReduce are optimal solutions for the presentation layer, and we agree that they are not, but there are clear use cases in which they are the only options (we added this to the Introduction section). See lines 174-195, 203-228 and 233-242;

5: “The paper gives the idea to concentrate on the management and optimization of big data workload for the production of reports. However, the content seems to not discuss the production of reports enough. I suggest the authors to strengthen this aspect or to modify the title at least.”

5: Response - Thank you for the positive comment. We agree that the study does not concentrate on report “production” but on the management and optimization of big data workloads. As suggested by the reviewer, we changed the title.

6: “The structure of the paper is missing and it should be provided.”

6: Response - We added the structure of the paper. See lines 60-67;

7: “Finally, there are several typos and repetition of thoughts. The authors should carefully read the manuscript and fix them.”

7: Response - Thank you for pointing out this issue. We fixed the typos and the repetition of thoughts.

The main corrections/supplementary information are highlighted.

Best regards,

Dr. Alexandra Baicoianu

Author Response File: Author Response.pdf

Reviewer 4 Report

The literature review titled “Managing and Optimizing Big Data Workloads for Producing On-demand User Centric Reports” presents good findings for the field of big data; however, there are some points that should be considered, as follows:

1. Can you provide more information on the specific domains where big data is being used to drive innovation and support decision-making?

2. The abstract could benefit from a clearer and more concise summary of the proposed solution for addressing the challenges associated with big data.

3. What criteria were used to select the approaches and technologies discussed in the research?

4. Are there any limitations or potential drawbacks to the proposed solution for managing and optimizing big data workloads?

5. It would be helpful to provide more detailed information on the methodology used to conduct the research, including data sources, analytical techniques, and any potential biases.

6. Can you provide more examples or case studies of how big data has been successfully used in different industries or applications?

7. How does this research compare to other studies or approaches in the field of big data management and optimization?

8. The definition of big data and the framework should be discussed, and more papers should be referenced. The following papers will help you explain the characteristics of big data, which are currently missing:

https://0-doi-org.brum.beds.ac.uk/10.1177/2158244022109644

https://0-doi-org.brum.beds.ac.uk/10.3390/electronics10243125

https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph181910147

9. What are some practical implications of the research findings for businesses or organizations that are dealing with large amounts of data?

10. How might the proposed solution be applied in real-world settings, and what are some potential challenges or considerations that would need to be addressed?

11. Are there any potential ethical or legal implications associated with the use of big data, and if so, how are these being addressed in the proposed solution?

12. In discussion section:

13. Can you provide more specific examples of the challenges and problems that need to be addressed in the big data industry in the discussion section?

14. Can you explain in more detail the specific benefits and limitations of the solution proposed in the paper for handling on-demand user centric reports?

15. Have the authors discussed any potential alternative solutions to their proposed solution, such as using Hive instead of Spark? If so, can you explain the pros and cons of these alternative solutions in more detail?

16. Can you provide some suggestions for future research in the area of big data based on the findings of this paper?

17. Remove the words “we” and “will” from the manuscript.

Author Response

We appreciate the reviewer for taking the time to carefully review the manuscript and give detailed and constructive comments, which helped to improve this research paper. Below is our point-by-point response to each comment:

1: “Can you provide more information on the specific domains where big data is being used to drive innovation and support decision-making?”

1: Response - Thank you for the kind comment. We were suggesting that the solution can help users in their decision-making process, but to be clear about the decisions we are referring to, we considered removing the phrase “decision-making” from the abstract. See lines 1-10;

2: “The abstract could benefit from a clearer and more concise summary of the proposed solution for addressing the challenges associated with big data.”

2: Response - We have revised the abstract to include the major results of this study. See lines 1-10;

3: “What criteria were used to select the approaches and technologies discussed in the research?”

3: Response - The criteria for choosing them were fully detailed in the initial paper; we added more details in the Introduction section and also compared Spark with Hive in the Conclusions section. See lines 337-351 and 640-650;

4: “Are there any limitations or potential drawbacks to the proposed solution for managing and optimizing big data workloads?”

4: Response - We discuss them in the Conclusions section, and we also considered adding a new research question about this observation. See lines 627-640 and 302-309;

5: “It would be helpful to provide more detailed information on the methodology used to conduct the research, including data sources, analytical techniques, and any potential biases.”

5: Response - We highlighted the sections with detailed information about the above observation. See lines 337-351 and 520-550, and the Materials and Methods section starting at line 310;

6: “Can you provide more examples or case studies of how big data has been successfully used in different industries or applications?”

6: Response - We mention them at the end of the Background section. See lines 161-168;

7: “How does this research compare to other studies or approaches in the field of big data management and optimization?”

7: Response - We addressed this by citing articles from the same area of interest and adding the differences and similarities between them. See lines 498-512 and 413-437;

8: “The definition of big data and the framework should be discussed, and more papers should be referenced. The following papers will help you explain the characteristics of big data, which are currently missing.”

8: Response - Done. We added a new reference, cited the suggested papers in multiple places, and discussed the technologies used in more detail in the Prerequisites section. See lines 322-323 and 337-351;

9: “What are some practical implications of the research findings for businesses or organizations that are dealing with large amounts of data?”

9: Response - We discuss them in the Introduction section. See also lines 174-187;

10: “How might the proposed solution be applied in real-world settings, and what are some potential challenges or considerations that would need to be addressed?”

10: Response - We consider the solution already suitable for a real-world scenario. See lines 163-168 and the entire Quantitative Results section, starting at line 519;

11: “Are there any potential ethical or legal implications associated with the use of big data, and if so, how are these being addressed in the proposed solution?”

11: Response - We strongly believe that the study does not involve any sensitive data, but we considered adding a section that addresses this. See lines 431-437;

12/13: “In discussion section: Can you provide more specific examples of the challenges and problems that need to be addressed in the big data industry in the discussion section?”

12/13: Response - Done. We added more information about this. See lines 121-125;

14: “Can you explain in more detail the specific benefits and limitations of the solution proposed in the paper for handling on-demand user centric reports?”

14: Response - We discuss them in the Conclusions section, and we also considered adding a new research question about this observation. See lines 627-640 and 302-309;

15: “Have the authors discussed any potential alternative solutions to their proposed solution, such as using Hive instead of Spark? If so, can you explain the pros and cons of these alternative solutions in more detail?”

15: Response - See the Conclusions section; we also added more details about the alternatives in the Prerequisites section. The criteria for choosing the technologies were fully detailed in the initial paper; we added more details in the Introduction section and compared Spark with Hive in the Conclusions section. See lines 337-351 and 640-650;

16: “Can you provide some suggestions for future research in the area of big data based on the findings of this paper?”

16: Response - Done. We added more future directions at the very end of the paper. See lines 655-660;

17: “Remove the words “we” and “will” from the manuscript.”

17: Response - Thank you for pointing out this issue. We fixed these occurrences.

The main corrections/supplementary information are highlighted.

Best regards,

Dr. Alexandra Baicoianu

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have already proposed elaborate solutions with rigorous experiments and come up with general conclusions to the research problems.

Reviewer 3 Report

The authors successfully addressed my concerns. From my point of view, the paper has now achieved a reasonable quality and can be considered for publication.

Reviewer 4 Report

The authors have addressed all comments.
