Article
Peer-Review Record

Real-Time Application for Generating Multiple Experiences from 360° Panoramic Video by Tracking Arbitrary Objects and Viewer’s Orientations

by Syed Hammad Hussain Shah, Kyungjin Han and Jong Weon Lee *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 24 February 2020 / Revised: 13 March 2020 / Accepted: 18 March 2020 / Published: 26 March 2020
(This article belongs to the Special Issue Applications of Virtual, Augmented, and Mixed Reality)

Round 1

Reviewer 1 Report

Indeed, this is a very interesting paper. Here are a few recommendations to improve the paper.

Abstract: 1. Add a sentence about the method(s) you have applied in this study.

2. You also need to add a statement about the results of your paper. As you have conducted different evaluations, you should point out the findings in the abstract.

Related Works

You have mentioned that the CPU processing power required is less than that of the others. I recommend adding a comparison table to demonstrate the efficiency of your solution.

Readers would rather read a short description of the cited publications. A laundry list of publications (e.g., line 124) is unlikely to convince readers of your claim.

Remove the extra “do” in line 129.

I recommend adding a Research Questions and Research Method section just before Section 3.

Line 171: I assume the HMD was connected to a PC and the data were then saved every 100 ms. Did you save the frames in the cloud?

Increase the font size of Figure 3.

 

I also recommend adding a Discussion section to compare your own system evaluations and findings with the existing research mentioned in the related work section.

 

Author Response

Response to Reviewer 1 Comments

Point 1. Add a sentence about the method(s) you have applied in this study.

Response 1. We have added the following sentence (Lines 36-37):

“In this study, a technical evaluation of the system, along with a detailed user study, was performed to assess the system’s application.”

 

Point 2. You also need to add a statement about the results of your paper. As you have conducted different evaluations, you should point out the findings in the abstract.

Response 2. We have added the following sentences (Lines 38-40):

“Findings from the system’s evaluation showed that a single 360° multimedia content can generate multiple experiences that can be transferred among users. Moreover, sharing the 360° experiences enabled viewers to watch multiple interesting contents with less effort.”

 

Point 3. You have mentioned that the CPU processing power required is less than that of the others. I recommend adding a comparison table to demonstrate the efficiency of your solution.

Response 3. The comparison table is present in the experimental results section. Hence, we added the following sentence (Lines 131-133):

“In contrast, our system uses a very small amount of processing power and runs in real-time on a CPU, as shown in Figure 11 in the section on experimental results, and so is capable of running on devices with hardware constraints.”

 

Point 4. Readers would rather read a short description of the cited publications. A laundry list of publications (e.g., line 124) is unlikely to convince readers of your claim.

Response 4. The purpose of the mentioned list of publications is to show that visual object tracking has been a very active research topic recently. So, we have added the following sentence (Lines 137-139):

“Some other human detection and tracking methods also exist in the literature [12,13,14], which shows that object tracking is a hot research topic these days.”

Point 5. Remove the extra “do” in line 129.

Response 5. We have removed the extra “do” in line 129.

 

Point 6. I recommend adding a Research Questions and Research Method section just before Section 3.

Response 6. We have added a Research Questions and Research Methods section, which appears as Section 3 in the revised version of the manuscript (Lines 175-193).

 

Point 7. Line 171: I assume the HMD was connected to a PC and the data were then saved every 100 ms. Did you save the frames in the cloud?

Response 7. We used a wireless HMD, the Samsung Gear VR. The whole frames were not saved as the experience; rather, the viewer’s orientation in VR was saved for the 360° frames every 100 ms.
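The orientation-sampling scheme described in this response can be sketched as follows. The function name, data layout, and constants are illustrative assumptions, not taken from the authors' implementation; `get_orientation` stands in for whatever call the HMD SDK provides.

```python
import time

def record_orientation(get_orientation, duration_s=2.0, interval_s=0.1):
    """Sample the viewer's (yaw, pitch) every `interval_s` seconds,
    storing (timestamp_ms, yaw, pitch) tuples as the recorded 'experience'.
    A 100 ms interval corresponds to interval_s=0.1, as in the paper."""
    samples = []
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        yaw, pitch = get_orientation()        # hypothetical HMD query
        t_ms = int((time.monotonic() - start) * 1000)
        samples.append((t_ms, yaw, pitch))
        time.sleep(interval_s)
    return samples
```

Storing only orientation samples rather than rendered frames keeps the recorded experience small, which is consistent with the authors' choice not to save whole frames.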

 

Point 8. Increase the font size of Figure 3.

Response 8. We have increased the font size in Figure 3.

 

Point 9. I also recommend adding a Discussion section to compare your own system evaluations and findings with the existing research mentioned in the related work section.

Response 9. We have added a ‘Discussion’ section in the revised version of the manuscript. It discusses the overall findings and provides answers to the research questions mentioned in Section 3 (Lines 499-517).

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors present a real-time application that generates multiple experiences based on a 360-degree video. The idea is very interesting and has applicability in entertainment and other domains. The system was evaluated with users, receiving positive feedback. Also, the performance of the object tracking module was demonstrated through various metrics.

There are some general issues regarding this paper, from my point of view:

  1. The scientific contributions proposed by this paper are not that significant. The steps proposed for the object tracker (tracking validation and object re-identification) are simple extensions to a method proposed by someone else (CSRT). Also, the two focus assistance techniques (decreasing the opacity of the visual indicator and changing the playback framerate) represent rudimentary approaches.
  2. The paper is very difficult to follow, because of the incorrect English. There are numerous cases when the authors do not use definite/indefinite articles for the nouns (some examples: “it is required to track that object in rest of the frames of panoramic video.” instead of “it is required to track that object in THE rest of the frames of THE panoramic video.”; “Most noticeable problem is movement of object out of one side of horizontal margin” instead of “THE most noticeable problem is THE movement of THE/AN object out of one side of horizontal margin….”; “User could change the selection” instead of “THE user could change the selection”). There are also a lot of other English grammar mistakes. In my opinion, this paper should be rewritten with the help of a native English speaker or a professional proof editor.

There are also some particular issues:

  1. When first introduced, the Discriminative Correlation Filter Tracker with Channel and Spatial Reliability is abbreviated as DCF-CSR, while in some parts of the paper the CSRT abbreviation is used. Why are there two abbreviations for the same method?
  2. The flow chart from figure 3 is not clear. The tracking validation module and the object re-identification step should be highlighted on the flow chart. Instead, they are missing completely (I had to search in the text and find out that the tracking validation consists of feature extraction and feature matching, while the object re-identification consists of two other steps (extract area based on previous location and extract moving objects)). Why are there horizontal orange lines in Figure 3, before and after the steps belonging to the object re-identification module?
  3. What is the relation between the visual indicator and the angular distance, denoted in Equation 2? The relation used by the authors is the “infinity” sign. If the authors wanted to illustrate some other relation (I believe it would be “inversely proportional”), they should have used a different symbol.
  4. What does the second question from the Efficiency section in the survey (“Knowing that your experience is being recorded while watching 360° video doesn’t restrict the enjoyment”) have to do with efficiency?
  5. When comparing their own tracking method with TLD, MTLD and Polar Model for Fast Object Tracking, do the authors use the same datasets and the same running platform? If not, then the comparison is erroneous.

Author Response

Response to Reviewer 2 Comments

Point 1. The scientific contributions proposed by this paper are not that significant. The steps proposed for the object tracker (tracking validation and object re-identification) are simple extensions to a method proposed by someone else (CSRT). Also, the two focus assistance techniques (decreasing the opacity of the visual indicator and changing the playback framerate) represent rudimentary approaches.

Response 1. One of the main focuses of the proposed system was to author experiences of 360° videos based on the tracking of an interesting object in the current scene. For this purpose, it was required to track an arbitrary object in real-time. In order to achieve real-time performance for arbitrary object tracking, simple object trackers are known as the best solution in such scenarios. However, the properties of 360° panoramic videos are different from those of normal field-of-view videos. So, we designed the pipeline by introducing additional modules to the existing object tracker for our real-time authoring application, which had not been done before in any object tracking system for 360° videos. To the best of our knowledge, our proposed pipeline is the first one that achieves real-time performance for arbitrary object tracking in 360° videos.
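As a rough illustration of the pipeline described in this response, the per-frame logic might look like the following sketch. It assumes a base tracker with OpenCV-style `update`/`init` methods (e.g., a CSRT/DCF-CSR tracker); `validate` and `reidentify` stand in for the tracking validation and object re-identification modules, and all names here are hypothetical, not the authors' code.

```python
def track_frame(tracker, frame, validate, reidentify):
    """One step of a base tracker extended with validation and,
    on failure, re-identification of the lost target."""
    ok, box = tracker.update(frame)      # base tracker proposes a box
    if ok and validate(frame, box):      # feature extraction + matching check
        return box
    box = reidentify(frame)              # search candidate regions for the target
    if box is not None:
        tracker.init(frame, box)         # re-initialize the base tracker
    return box
```

The design point is that the cheap base tracker runs every frame, while the heavier validation and re-identification steps only correct it when its output is rejected, which is what keeps the pipeline real-time.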

A large angular distance between the viewer’s current viewing direction and the intended target in virtual reality makes it difficult for the viewer to find and reach the intended target in time. Reaching the intended target late results in the loss of important visual information from a 360° content. Hence, the adjustment of the playback rate and the opacity of the visual indicator was introduced to help viewers reach the target in time. It was seen during the user survey that these two focus assistance techniques minimized the loss of information and let viewers reach the intended target in time.
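A minimal sketch of such focus assistance might look like the following, assuming a directly proportional mapping from angular distance to indicator opacity and a playback rate that drops as the distance grows. The function name and constants are illustrative assumptions, not values from the manuscript.

```python
def focus_assistance(angular_distance_deg, max_angle_deg=180.0,
                     min_rate=0.25, max_opacity=1.0):
    """Map the angular distance to the target onto (opacity, playback_rate):
    a far target gives an opaque indicator and slow playback, so the viewer
    has time to turn toward it; near the target the indicator fades and
    playback returns to normal speed."""
    d = min(max(angular_distance_deg, 0.0), max_angle_deg) / max_angle_deg
    opacity = max_opacity * d                      # proportional to distance
    playback_rate = 1.0 - (1.0 - min_rate) * d     # slows as distance grows
    return opacity, playback_rate
```

For example, under these assumed constants, a target directly in view (0°) yields a fully transparent indicator and normal playback, while a target behind the viewer (180°) yields a fully opaque indicator and quarter-speed playback.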

Point 2. The paper is very difficult to follow, because of the incorrect English. There are numerous cases when the authors do not use definite/indefinite articles for the nouns (some examples: “it is required to track that object in rest of the frames of panoramic video.” instead of “it is required to track that object in THE rest of the frames of THE panoramic video.”; “Most noticeable problem is movement of object out of one side of horizontal margin” instead of “THE most noticeable problem is THE movement of THE/AN object out of one side of horizontal margin….”; “User could change the selection” instead of “THE user could change the selection”). There are also a lot of other English grammar mistakes. In my opinion, this paper should be rewritten with the help of a native English speaker or a professional proof editor.

Response 2. The issues with the English have been fixed through rewriting with the help of a professional proof editor.

Point 3. When first introduced, Discriminative Correlation Filter Tracker with Channel and Spatial Reliability is abbreviated with DCF-CSR, while in some parts of the paper the CSRT abbreviation is used. Why are there 2 abbreviation for the same method?

Response 3. The issue with the abbreviation has been fixed throughout the manuscript by consistently using the single abbreviation “DCF-CSR tracker”.

Point 4. The flow chart from figure 3 is not clear. The tracking validation module and the object re-identification step should be highlighted on the flow chart. Instead, they are missing completely (I had to search in the text and find out that the tracking validation consists of feature extraction and feature matching, while the object re-identification consists of two other steps (extract area based on previous location and extract moving objects)). Why are there horizontal orange lines in Figure 3, before and after the steps belonging to the object re-identification module?

Response 4. We redrew Figure 3 and removed all ambiguities. We increased the font size to make it clearer. Moreover, we created overall blocks for the tracking validation and the object re-identification to make the flow understandable. The horizontal lines were removed, as they were confusing to readers.

Point 5. What is the relation between the visual indicator and the angular distance, denoted in Equation 2? The relation used by the authors is the “infinity” sign. If the authors wanted to illustrate some other relation (I believe it would be “inversely proportional”), they should have used a different symbol.

Response 5. The relation mentioned in Equation 2 is directly proportional. A different symbol has been used for the relation in the revised manuscript (Line 378).

Point 6. What does the second question from the Efficiency section in the survey (“Knowing that your experience is being recorded while watching 360° video doesn’t restrict the enjoyment”) have to do with efficiency?

Response 6. Knowing that other viewers are going to watch their experience could make a viewer more careful in making head transitions while watching the 360° video in virtual reality. Hence, this was considered an important factor that could impact the viewer’s enjoyment, which is why it was included in the questionnaire.

 

Point 7. When comparing their own tracking method with TLD, MTLD and Polar Model for Fast Object Tracking, do the authors use the same datasets and the same running platform? If not, then the comparison is erroneous.

Response 7. We used the same running platform to test the system. The dynamics of the 360° video dataset used in this study were the same as those of the datasets used and described in the above-mentioned methods. These mainly included the category, resolution, number of frames, number of times an object moved out of the frame, and occlusion time of the objects for all the videos in the dataset.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have tackled all the issues highlighted by the reviewer.

In the reviewer's opinion, the paper can be published after a final reading.
