= Comparison between existing proposals - Working Draft =

By [[X3D and Augmented Reality|Augmented Reality Working Group]], Web3D Consortium

March 21, 2012


== 1. Introduction ==

This document compares the existing proposals for extending X3D to support augmented and mixed reality visualization. Three main proposals are compared in terms of requirements – two from the Korean Chapter (KC1 and KC2) and one from Instant Reality (IR). Proposals KC1 and KC2 were put forward by Korea Chapter members Gun Lee and Gerard J. Kim, respectively, while proposal IR is from InstantReality, developed by Fraunhofer IGD. A summary of each proposal can be found at [[X3D and Augmented Reality#Existing Proposals]]. The third proposal from the Korea Chapter, by Woontack Woo, is not covered in this document since it is not directly related to extending the X3D specification.

The criteria used for comparing the proposals are based on the requirements described at [[X3D AR Requirements and Use cases]]. In the rest of this document, each section compares the proposals with respect to one requirement, summarizing how each proposal addresses it in the subsections and closing with a discussion. After iterating through all of the requirements, we conclude with a summary and overall discussion.

== 2. Using Live Video stream as a texture ==

=== 2.1 Proposal KC1 ===

This proposal proposes a new sensor node, CameraSensor (previously named LiveCamera), for retrieving live video data from a camera device and then routing the video stream to a PixelTexture node. The X3D browser is in charge of handling the camera device and mapping the video data to the CameraSensor node inside the X3D scene. The video stream itself is provided as the value (SFImage) field of the node, which is updated every frame by the browser implementation according to the camera data.

<pre>
CameraSensor:X3DDirectSensorNode {
   SFImage    [out] value
   SFBool     [out] on        FALSE
   SFMatrix4f [out] projmat   "1 0 0 0 …"
   SFBool     [out] tracking  FALSE
   SFVec3f    [out] position
   SFRotation [out] orientation
}
</pre>
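
For illustration, a minimal sketch of how this routing could be written is shown below (the DEF names are ours and not part of the proposal; PixelTexture's standard SFImage 'image' field receives the frames):

<pre>
<!-- illustrative sketch only: DEF names are not part of the proposal -->
<CameraSensor DEF='camSensor'/>

<Shape>
  <Appearance>
    <PixelTexture DEF='videoTex'/>
  </Appearance>
  <Box size='4 3 0.1'/>
</Shape>

<!-- push each new camera frame into the texture -->
<ROUTE fromNode='camSensor' fromField='value' toNode='videoTex' toField='image'/>
</pre>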
  
 
While this is straightforward, routing SFImage values might lead to performance and implementation problems. As an alternative, the same proposal also suggests extending the behavior of the existing MovieTexture node to support live video streams within the node. The proposed behavior is for the X3D browser to let the user select a file or a camera device for the MovieTexture node in the scene if the url field of the node is empty (or filled with special token values, such as ‘USER_CUSTOMIZED’).

<pre>
<Appearance>
      <MovieTexture loop='true' url=''/>
</Appearance>
</pre>

While this approach avoids performance problems by not exposing SFImage fields that are updated in real time, it lacks support for using the live video stream for other purposes, such as the background. This is to be solved partially by adding a new MovieBackground node, which behaves similarly to MovieTexture but uses the user-selected movie file or live video stream from a camera to fill the background of the 3D scene.
=== 2.2 Proposal KC2 ===

This proposal uses a similar approach to Proposal KC1 in that it explicitly defines a sensor node that represents a camera device. The video stream on the image field of the LiveCamera node is then routed to a texture node.

<pre>
LiveCamera {
   SFString   [in, out] source      "default"
   SFImage    [out]     image
   SFMatrix4f [out]     projmat     "1 0 0 …"
   SFBool     [out]     on          FALSE
   SFBool     [out]     tracking    FALSE
   SFVec3f    [out]     position
   SFRotation [out]     orientation
}
</pre>
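
A corresponding hypothetical usage sketch (DEF names are illustrative, not part of the proposal):

<pre>
<!-- illustrative sketch only -->
<LiveCamera DEF='liveCam' source='default'/>

<Shape>
  <Appearance>
    <PixelTexture DEF='videoTex'/>
  </Appearance>
  <Box/>
</Shape>

<ROUTE fromNode='liveCam' fromField='image' toNode='videoTex' toField='image'/>
</pre>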
  
=== 2.3 Proposal IR ===

This proposal proposes a general-purpose IOSensor node, which allows external devices (e.g., joysticks and cameras) to be accessed from inside the X3D scene. Note: for technical reasons, the input and output fields of a device can only be determined at runtime. Hence, most in-/out-slots are dynamically generated based on the device one wants to access.

<pre>
<IOSensor type='' name='' description='' enabled='TRUE' />
</pre>

The camera sensor (including marker/poster/... tracking) is loaded through an instance of IOSensor, by defining the type of the sensor and its fields as specified in the configFile (*.pm). Here is an example:

<pre>
<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_OneMarker.pm'>
    <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
</IOSensor>
</pre>
Using the camera image as a texture is then nothing more than routing the VideoSourceImage field of the IOSensor node to a PixelTexture node, which can also be part of a Background or Foreground appearance.

=== 2.4 Discussion ===

Proposals KC1 and KC2 propose a new node specific to a camera, while proposal IR proposes a more generic type of node that can be applied to a variety of sensors. The trade-off between simplicity and flexibility/extensibility needs further discussion.

 
== 3. Using Live Video stream as a background ==

=== 3.1 Proposal KC1 ===
 
The proposal proposes a MovieBackground node, extended from the Background node with a ‘liveSource’ field that is assigned a CameraSensor node (as described in 2.1) from which the Background node receives the live video stream data. Once the ‘liveSource’ field is assigned a valid CameraSensor node, the background image is updated according to the live video stream from that CameraSensor node. For other uses, it also has a url field to which a general movie clip source can be assigned and used as a background.

<pre>
MovieBackground:X3DBackgroundNode {
     ... // same to the original Background node
     SFString [in] url
     SFNode   [in] liveSource
}
</pre>

Similar to the case in 2.1, the proposal also suggests a different approach where the MovieBackground node doesn’t explicitly need a CameraSensor node, but instead lets the browser ask the user to choose the movie source (including a camera device) when the url field is left empty (or filled with special token values, such as ‘USER_CUSTOMIZED’).
 
=== 3.2 Proposal KC2 ===

This proposal proposes to extend the TextureBackground node to support a live video background. The video stream image is routed from the LiveCamera node to the frontTexture field. However, since the TextureBackground node acts as an environment map, there is a problem with the orientation of the TextureBackground: it is world-registered and does not follow the viewpoint. To solve this problem, this proposal adds a Boolean field called ARmode. When the ARmode flag is true, the orientation of the TextureBackground is fixed to the viewpoint, so that the front-side texture remains as the background.

<pre>
TextureBackground : X3DBackgroundNode
{
   SFBool     [in]     set_bind
   SFBool     [in]     ARmode
   MFFloat    [in,out] groundAngle   []    [0,π/2]
   MFColor    [in,out] groundColor   []    [0,1]
   SFNode     [in,out] backTexture   NULL  [X3DTextureNode]
   SFNode     [in,out] bottomTexture NULL  [X3DTextureNode]
   SFNode     [in,out] frontTexture  NULL  [X3DTextureNode]
   SFNode     [in,out] leftTexture   NULL  [X3DTextureNode]
   SFNode     [in,out] metadata      NULL  [X3DMetadataObject]
   SFNode     [in,out] rightTexture  NULL  [X3DTextureNode]
   SFNode     [in,out] topTexture    NULL  [X3DTextureNode]
   MFFloat    [in,out] skyAngle      []    [0,π]
   MFColor    [in,out] skyColor      0 0 0 [0,1]
   SFFloat    [in,out] transparency  0     [0,1]
   SFTime     [out]    bindTime
   SFBool     [out]    isBound
}
</pre>
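
A hypothetical sketch of this approach is shown below; the containerField value and the DEF names are assumptions about the XML encoding, not part of the proposal:

<pre>
<!-- illustrative sketch: containerField/DEF names are assumptions -->
<LiveCamera DEF='liveCam'/>

<TextureBackground ARmode='true'>
  <PixelTexture DEF='bgTex' containerField='frontTexture'/>
</TextureBackground>

<!-- feed the live video into the front side of the background -->
<ROUTE fromNode='liveCam' fromField='image' toNode='bgTex' toField='image'/>
</pre>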
  
=== 3.3 Proposal IR ===

This proposal deals with this problem similarly to the case of using the camera image as a texture. It proposes a PolygonBackground node, which represents a background that renders a single polygon using the specified appearance. It allows defining an aspect ratio of the background image that is independent of the actual window size. Different modes are available for fitting the image to the window (vertical or horizontal).

<pre>
<PolygonBackground positions='0 0, 1 0, 1 1, 0 1' texCoords='0 0 0, 1 0 0, 1 1 0, 0 1 0' normalizedX='TRUE' normalizedY='TRUE' fixedImageSize='0,0' zoomFactor='1.0' tile='TRUE' doCleanup='TRUE' mode='VERTICAL' clearStencilBitplanes='-1' description='' />
</pre>

Using the proposed PolygonBackground node, the image from the camera is simply routed to the texture used by the PolygonBackground node: the image assigned to the image outslot of the IOSensor is routed to the texture in the appearance of the PolygonBackground node.

<pre>
<PolygonBackground fixedImageSize='640,480' mode='VERTICAL'>
    <Appearance>
        <PixelTexture DEF='tex' />
        <TextureTransform scale='1 -1'/>
    </Appearance>
</PolygonBackground>

<ROUTE fromNode='VisionLib' fromField='VideoSourceImage' toNode='tex' toField='image'/>
</pre>
 
To make the background polygon fill the viewport, the PolygonBackground's fixedImageSize field is used to describe the aspect ratio of the image, and the mode field is set to "VERTICAL" or "HORIZONTAL", which describes how the polygon fits the viewport.

Alternatively, for simpler cases, the ImageBackground node, which has texCoords and texture fields, can be used instead.


=== 3.4 Discussion ===

Proposal KC1 proposes a dedicated node for movie backgrounds. Proposal KC2 proposes to extend the TextureBackground node and repurpose it as a fixed textured background. Proposal IR takes a more general approach and proposes a multi-purpose PolygonBackground node, which can contain any type of appearance, including shaders. While the latter gives more flexibility, it requires more details to be elaborated compared to the former, simpler options. Again, the trade-off between simplicity and flexibility/extensibility needs further discussion.


== 4. Supporting color keying in texture ==

=== 4.1 Proposal KC1 ===

This proposal adds a ‘keyColor’ field to the MovieTexture node, indicating the color that is to be rendered as transparent, in order to provide a chroma key effect on the movie texture. The browser is in charge of rendering the matching parts of the MovieTexture as transparent, and browsers that do not support this feature can simply fall back to rendering the MovieTexture in the normal way (i.e., showing the texture as is).
 
<pre>
MovieTexture:X3DTexture2DNode {
     ... // same to the MovieTexture node described in 2.1
     SFColor    [in] keyColor
}
</pre>
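
A brief example of how the proposed field might be used (the movie URL is a placeholder):

<pre>
<Appearance>
  <!-- pixels matching the key color (pure green here) are rendered transparent; 'overlay.mpg' is a placeholder -->
  <MovieTexture loop='true' url='"overlay.mpg"' keyColor='0 1 0'/>
</Appearance>
</pre>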
  
=== 4.2 Proposal KC2 ===

This proposal does not include this feature.

=== 4.3 Proposal IR ===

This proposal doesn't include a direct solution to this case, since it is not straightforwardly related to AR applications. Closely related functions in this proposal are the ColorMaskMode, BlendMode, StencilMode and DepthMode nodes, used as children of the Appearance node.

The ColorMaskMode masks a specific color channel, which results in color changes across the whole image: rather than making pixels of the key color appear transparent, the ColorMaskMode changes the color of every pixel. The ColorMaskMode, together with the Appearance node's sortKey field (the default sortKey is 0; an appearance with a smaller sortKey is rendered earlier and one with a greater sortKey is rendered later), can also be used to create invisible ghosting objects.

<pre>
<ColorMaskMode maskR='TRUE' maskG='TRUE' maskB='TRUE' maskA='TRUE' />
</pre>
  
 
The BlendMode gives general control over the alpha blending function. However, there is no blending function that compares the source image against a given key color, which would be necessary to obtain a proper result for color keying.

<pre>
<BlendMode srcFactor='src_alpha' destFactor='one_minus_src_alpha' color='1 1 1' colorTransparency='0'
 alphaFunc='none' alphaFuncValue='0' equation='none' />
</pre>
  
To achieve chroma keying for an arbitrary color, one can, for example, use a user-defined shader that discards all fragments whose color is equal to the given key color.
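
A rough sketch of such a shader-based approach, assuming GLSL support via the X3D programmable shaders component (the uniform names, tolerance and inline shader source are illustrative; some browsers may expect the shader source via the url field instead):

<pre>
<Appearance>
  <PixelTexture DEF='videoTex'/>
  <ComposedShader language='GLSL'>
    <!-- illustrative only: field names and key color are assumptions -->
    <field accessType='inputOutput' name='keyColor' type='SFColor' value='0 1 0'/>
    <field accessType='inputOutput' name='tex' type='SFInt32' value='0'/>
    <ShaderPart type='FRAGMENT'>
      <![CDATA[
      uniform vec3 keyColor;
      uniform sampler2D tex;
      void main() {
        vec4 c = texture2D(tex, gl_TexCoord[0].st);
        // drop fragments close to the key color
        if (distance(c.rgb, keyColor) < 0.1)
          discard;
        gl_FragColor = c;
      }
      ]]>
    </ShaderPart>
  </ComposedShader>
</Appearance>
</pre>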
  
 
=== 4.4 Discussion ===

Proposal KC1 suggests a simple way to provide a specific color keying function for textures, while proposal IR suggests more generic functions that can achieve the required behavior. Although the corresponding nodes in proposal IR lack certain features needed to fulfill color keying out of the box, this can be achieved via shaders. Again, the trade-off between simplicity and flexibility/extensibility needs further discussion.


== 5. Retrieving tracking information ==

=== 5.1 Proposal KC1 ===
 
This proposal suggests using the same CameraSensor node that is used for retrieving the live video stream to retrieve tracking information as well. As described in 2.1, the proposed CameraSensor node includes ‘position’ and ‘orientation’ fields that represent the tracked camera motion.

<pre>
CameraSensor:X3DDirectSensorNode {
   SFImage    [out] value
   SFBool     [out] on        FALSE
   SFMatrix4f [out] projmat   "1 0 0 0 …"
   SFBool     [out] tracking  FALSE
   SFVec3f    [out] position
   SFRotation [out] orientation
}
</pre>

The method has its limitations in that it does not support tracking information for general objects other than the camera sensor.

=== 5.2 Proposal KC2 ===

This proposal introduces a new node named "ImagePatch", which provides tracking information for a visual marker. In contrast to Proposal KC1, this is a node separate from the node that represents the camera sensor, which allows multiple visual markers to be used for tracking.

<pre>
ImagePatch : X3DARNode
{
   MFString   [in, out] filename
   SFVec3f    [in, out] position
   SFRotation [in, out] orientation
}
</pre>

This proposal also proposes nodes for retrieving tracking information from sensors other than vision-based tracking. For instance, the GPSLocation node provides tracking information from a GPS sensor.

<pre>
GPSLocation : X3DSensorNode
{
   SFBool   [in, out] status
   MFString [in, out] values
}
</pre>

=== 5.3 Proposal IR ===

For retrieving tracking information, this proposal uses the same IOSensor node as used for retrieving the camera image. In this example, the TrackedObject1Camera_ModelView field of the IOSensor node represents the transformation matrix of the tracked position/orientation of the tracked object (visual marker). However, these are all dynamic fields and depend on the configuration defined in the pm file.

<pre>
<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_OneMarker.pm'>
    <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
</IOSensor>
</pre>

The node can support multiple tracked objects by changing the configFile (the TutorialMarkerTracking_OneMarker.pm file in the sample code) and defining additional ModelView, Projection, etc. fields for the tracked objects and/or the camera pose.
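
For instance, a second marker might be exposed by declaring another dynamic field alongside the first; both the configFile name and the field names below are hypothetical and depend entirely on the tracker configuration:

<pre>
<!-- hypothetical: file and field names depend on the .pm configuration -->
<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_TwoMarkers.pm'>
    <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
    <field accessType='outputOnly' name='TrackedObject2Camera_ModelView' type='SFMatrix4f'/>
</IOSensor>
</pre>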
  
 
=== 5.4 Discussion ===

While both proposals KC1 and IR retrieve tracking information from a node that represents the camera sensor, proposal KC1 provides the tracking information of the camera, while proposal IR deals with the tracking information of the tracked object. This makes proposal IR more extensible in terms of supporting multiple tracked objects. However, the method of defining tracked objects and markers through a proprietary configuration file needs to be revised for standardization. On the other hand, proposal KC2 proposes a dedicated tracking node, separate from the camera sensor node. As a result, multiple tracked objects are easily supported by creating multiple instances of this tracking node. Proposal KC2 also includes a GPS tracking node besides computer-vision-based tracking. GPS-based tracking should be investigated and compared with another proposal by Myeongwon Lee, which was originally discussed in the X3D Earth working group [http://www.web3d.org/membership/login/memberwiki/index.php/X3D_v3.3_Specification_Changes#GpsSensor_node].


== 6. Using tracking information to change 3D scene ==

=== 6.1 Proposal KC1 ===
 
This proposal proposes, in general, to use routing to link tracking information from the CameraSensor node to a Viewpoint node’s position and orientation. This could also be extended by the MatrixViewpoint node (described in 8.1), which could have a field identifying the corresponding CameraSensor node, achieving the same result without explicitly routing the corresponding fields.
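
A minimal sketch of this routing (DEF names are illustrative, not part of the proposal):

<pre>
<!-- illustrative sketch only -->
<CameraSensor DEF='camSensor'/>
<Viewpoint DEF='arViewpoint'/>

<ROUTE fromNode='camSensor' fromField='position' toNode='arViewpoint' toField='position'/>
<ROUTE fromNode='camSensor' fromField='orientation' toNode='arViewpoint' toField='orientation'/>
</pre>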
  
=== 6.2 Proposal KC2 ===

This proposal also uses routing to apply tracking information to the 3D scene, routing the tracking results (position and orientation) to Transform nodes.
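
A hypothetical sketch of this routing (DEF names and the marker filename are illustrative):

<pre>
<!-- illustrative sketch only -->
<ImagePatch DEF='marker1' filename='"marker01.png"'/>

<Transform DEF='markerContent'>
  <Shape>
    <Appearance><Material diffuseColor='1 0.5 0'/></Appearance>
    <Box/>
  </Shape>
</Transform>

<ROUTE fromNode='marker1' fromField='position' toNode='markerContent' toField='translation'/>
<ROUTE fromNode='marker1' fromField='orientation' toNode='markerContent' toField='rotation'/>
</pre>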
 
Besides this basic method of using raw tracking information, this proposal also proposes higher-level event nodes, such as VisibilitySensor and RangeSensor. The VisibilitySensor node triggers events when the tracked visual marker is detected or lost, while the RangeSensor node triggers events when a tracked object comes within a certain range.

<pre>
VisibilitySensor : X3DEnvironmentalSensorNode
{
   SFBool [in, out] enabled
   SFTime [out]     enterTime
   SFTime [out]     exitTime
   SFBool [out]     isActive
}

RangeSensor : X3DEnvironmentalSensorNode
{
   SFBool   [in, out] enabled
   SFTime   [out]     enterTime
   SFTime   [out]     exitTime
   SFBool   [out]     isActive
   SFInt32  [in, out] sequence
   SFString [in, out] lBound
   SFString [in, out] uBound
}
</pre>

=== 6.3 Proposal IR ===

This proposal proposes to use routing to link tracking information from the IOSensor node to a Transform node of a corresponding virtual object or to the viewpoint. Example:

<pre>
<MatrixTransform DEF='TransformRelativeToCam'>
    <Shape>
        <Appearance>
            <Material diffuseColor='1 0.5 0' />
        </Appearance>
        <Teapot size='5 5 5' />
    </Shape>
</MatrixTransform>

<ROUTE fromNode='VisionLib' fromField='Camera_ModelView' toNode='TransformRelativeToCam' toField='set_matrix'/>
</pre>
  
For routing a transform matrix to a transform node, this proposal also proposes a MatrixTransform node that takes a transform matrix directly, rather than using position and orientation fields. The render field allows controlling visibility.

<pre>
MatrixTransform : X3DGroupingNode {
 ...
 SFBool     [in,out] render TRUE
 SFMatrix4f [in,out] matrix identity
}
</pre>
  
It is of course also possible to route the tracked camera pose (also in orientation/position notation) to the bound Viewpoint node.

There are different field-of-view modes: vertical, horizontal, and smaller. The field of view and principal point delivered by the IOSensor can be routed to the viewpoint; see the example below.

<pre>
<Viewpoint principalPoint='0 0' fieldOfView='0.785398' fovMode='SMALLER' aspect='1.0' retainUserOffsets='FALSE'
zFar='-1' jump='TRUE' zNear='-1' description='' position='0 0 10' orientation='0 0 1 0' centerOfRotation='0 0 0' />
</pre>

<pre>
<Viewpoint DEF='vp' position='0 0 0' fovMode='VERTICAL' />

<ROUTE fromNode='VisionLib' fromField='Camera_PrincipalPoint' toNode='vp' toField='principalPoint'/>
<ROUTE fromNode='VisionLib' fromField='Camera_FOV_vertical' toNode='vp' toField='fieldOfView'/>
<ROUTE fromNode='VisionLib' fromField='Camera_CAM_aspect' toNode='vp' toField='aspect'/>
</pre>

=== 6.4 Discussion ===

While all of the proposals rely on routing to apply tracking results to the 3D scene, as discussed in 5.4, proposal KC1 focuses on updating the Viewpoint node, while proposals KC2 and IR allow updating both the camera and a virtual object (or scene). Proposal IR also proposes a new type of transformation node for dealing with transformation matrices, while proposal KC1 sticks to traditional position and orientation vectors.

In addition, proposal KC2 proposes higher-level event generation nodes that trigger tracking-based events such as proximity and visibility.
 
== 7. Retrieving camera calibration (internal parameters) information ==

=== 7.1 Proposal KC1 ===

This proposal suggests using the same CameraSensor node that is used for retrieving the live video stream to retrieve camera calibration information. As described in 2.1, the proposed CameraSensor node includes a ‘projmat’ field that represents the calibration information of the CameraSensor.

<pre>
CameraSensor:X3DDirectSensorNode {
   SFImage    [out] value
   SFBool     [out] on        FALSE
   SFMatrix4f [out] projmat   "1 0 0 0 …"
   SFBool     [out] tracking  FALSE
   SFVec3f    [out] position
   SFRotation [out] orientation
}
</pre>

=== 7.2 Proposal KC2 ===

This proposal takes a similar approach to proposal KC1, providing a field representing the camera calibration information in the node for the live video camera.

=== 7.3 Proposal C ===
+
=== 7.3 Proposal IR ===
This proposal suggests using the same IOSensor node, used for retrieving images from camera sensor. Four fields (TrackedObject1Camera_PrincipalPoint, TrackedObject1Camera_FOV_horizontal, TrackedObject1Camera_FOV_vertical, TrackedObject1Camera_CAM_aspect) in this node provides the calibration information.  
+
This proposal suggests using the same IOSensor node, used for retrieving images from camera sensor. Several fields (in this example they are called e.g. TrackedObject1Camera_PrincipalPoint, TrackedObject1Camera_FOV_horizontal, TrackedObject1Camera_FOV_vertical, TrackedObject1Camera_CAM_aspect) in this node provide the calibration information.  
  
 
<pre>
 
<pre>
Line 261: Line 352:
  
 
=== 7.4 Discussion ===
 
=== 7.4 Discussion ===

All three proposals suggest reusing the node that is used for accessing the camera sensor, with a dedicated field of that node providing the camera calibration information. While proposals KC1 and KC2 suggest using a projection matrix as the calibration information, proposal IR suggests using a set of parameters. The latter approach could be safer in terms of encapsulating the projection matrix of a viewpoint, which can be implementation dependent based on which graphics API is used.


== 8. Using calibration information to set properties of (virtual) camera ==

=== 8.1 Proposal KC1 ===

This proposal suggests a MatrixViewpoint node, a child of the scene that represents a virtual viewpoint calibrated according to the corresponding physical live video camera (on the user's computer). The 'projmat' field represents the internal parameters (or projection matrix) of the MatrixViewpoint. The ‘position’ and ‘orientation’ fields represent the three-dimensional position and orientation of the viewpoint within the virtual space. The ‘cameraSensor’ field references a CameraSensor node from which the viewpoint parameters (including the projmat, position and orientation fields) of the MatrixViewpoint are updated. Once the ‘cameraSensor’ field is assigned a valid CameraSensor node, the viewpoint parameters are updated according to the values from that CameraSensor node. Otherwise, each parameter of the MatrixViewpoint node can be routed from a corresponding source of calibrated values.
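
The exact signature of the MatrixViewpoint node is not reproduced in this document; a hypothetical sketch based on the fields described above could look like this:

<pre>
// hypothetical sketch; the actual field signatures are defined by the proposal
MatrixViewpoint : X3DViewpointNode {
     SFMatrix4f [in, out] projmat
     SFVec3f    [in, out] position
     SFRotation [in, out] orientation
     SFNode     [in, out] cameraSensor
}
</pre>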
  
=== 8.2 Proposal KC2 ===

This proposal suggests a similar approach to proposal KC1, using a viewpoint node that accepts camera calibration information in matrix form.


=== 8.3 Proposal IR ===
+
 
<pre>
 
<pre>
 
Viewpoint : X3DViewpointNode {
 
Viewpoint : X3DViewpointNode {
Line 288: Line 378:
 
   SFVec2f  [in,out] principalPoint 0 0
 
   SFVec2f  [in,out] principalPoint 0 0
 
   SFFloat  [in,out] aspect        1.0
 
   SFFloat  [in,out] aspect        1.0
 +
  SFFloat  [in,out] zNear          -1
 +
  SFFloat  [in,out] zFar          -1
 
}
 
}
 
</pre>
 
</pre>
Line 306: Line 398:
 
</pre>
 
</pre>
  
 
=== 8.4 Discussion ===

All of the proposals introduce a new type of Viewpoint node to support the camera calibration information described in section 7. While they use a different type and number of fields for representing the camera calibration information, they all use the same routing method to apply these values to a Viewpoint node. As discussed in 7.4, assigning a projection matrix directly to a viewpoint may result in defects, such as incorrect projections or incorrect near/far clipping planes.


== 9. Specifying nodes as physical object representatives ==

=== 9.1 Proposal KC1 ===
 
This proposal suggests a GhostGroup node for indicating that its child nodes are representatives of physical objects, in order to visualize occlusion correctly. The proposed node is extended from the Group node so that the geometries of its child nodes are rendered as ghost objects: the browser should render the child nodes only into the depth buffer and not into the color buffer. As a result, the portion of the live video image corresponding to the ghost object is visualized with correct depth values, forming correct occlusion with other virtual objects.

<pre>
GhostGroup: X3DGroupingNode{
     ... // same to the original Group node
}
</pre>
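
A brief hypothetical usage sketch, where the box geometry stands in for a physical object (e.g. a table surface):

<pre>
<!-- illustrative sketch: the box acts as an invisible stand-in for a real table top -->
<GhostGroup>
  <Shape>
    <Box size='1.2 0.05 0.8'/>
  </Shape>
</GhostGroup>
</pre>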
  
=== 9.2 Proposal KC2 ===

This proposal does not include this feature.

=== 9.3 Proposal IR ===

This proposal proposes using a ColorMaskMode node to render the geometry only into the depth buffer and not into the color buffer. In addition, a new "sortKey" field is proposed for the Appearance node to make sure that the ghost objects are rendered before other geometries.

<pre>
<Shape>
  <Appearance sortKey='-1'>
    <ColorMaskMode maskR='false' maskG='false' maskB='false' maskA='false'/>
  </Appearance>
  ...
</Shape>
</pre>

=== 9.4 Discussion ===

While proposal KC1 suggests a high-level, simple-to-use approach for the specific case of AR/MR depth occlusion visualization, proposal IR suggests general-purpose, detailed control of the rendering process. Proposal KC1 deals directly with depth buffer values, providing a general-case solution to the depth occlusion problem. In comparison, proposal IR uses a color masking technique to mimic the depth occlusion effect, which could have limitations and produce incorrect results in dynamic scenes.

== 10. Conclusion ==

Table 1 summarizes the differences between the proposals, showing which modifications are proposed in each proposal (columns) for each functional requirement (rows).

{| border='1'
|+ Table 1. Comparison of X3D AR proposals ('''Bold''': newly proposed nodes, ''Italic'': modification to standard nodes)
!  !! width="27%"|Proposal KC1 !! width="27%"|Proposal KC2 !! width="27%"|Proposal IR
|-
| Using Live Video stream as a texture
| ''MovieTexture'' node (or optionally with routing from '''CameraSensor''' node)
| '''LiveCamera''' node, routing to a PixelTexture node
| '''IOSensor''' node, routing to a PixelTexture node
|-
| Using Live Video stream as a background
| '''MovieBackground''' node (or optionally with routing from '''CameraSensor''' node)
| '''LiveCamera''' node + ''TextureBackground'' node
| '''IOSensor''' node + '''PolygonBackground''' node (or optionally '''ImageBackground''' node)
|-
| Supporting color keying in texture
| ''MovieTexture'' node
| N/A
| N/A (use general shader support)
|-
| Retrieving tracking information
| '''CameraSensor''' node
| '''ImagePatch''' and '''GPSLocation''' nodes
| '''IOSensor''' node
|-
| Using tracking information to change 3D scene
| routing tracking data from '''CameraSensor''' node
| routing tracking data from '''ImagePatch''' and '''GPSLocation''' nodes + events generated by '''VisibilitySensor''' and '''RangeSensor''' nodes
| routing tracking data from '''IOSensor''' node
|-
| Retrieving camera calibration (internal parameters) information
| '''CameraSensor''' node
| '''LiveCamera''' node
| '''IOSensor''' node
|-
| Using calibration information to set properties of (virtual) camera
| '''MatrixViewpoint''' node
| ''Viewpoint'' node
| '''Viewfrustum''' and ''Viewpoint'' nodes (alternatively '''MatrixTransform''' node)
|-
| Specifying nodes as physical object representatives
| '''GhostGroup''' node
| N/A
| '''ColorMaskMode''' and ''Appearance'' nodes (together with sortKey field)
|}

While all of the proposals suggest and cover a similar set of functionalities required for supporting AR and MR visualization in X3D, proposals KC1 and KC2 take the path of relatively high-level control, providing simpler syntax that can be applied to specific AR and MR cases. In contrast, proposal IR introduces more general-purpose nodes and suggests combining them to implement the required AR functions, treating AR and MR visualization as a special use case of the proposed extensions. Considering the differences between the proposals, the trade-off between simplicity and flexibility/extensibility needs further discussion as the AR WG proceeds to develop specifications for AR visualization components.

From the content authors' point of view, providing higher-level, abstracted control gives simpler and easier-to-use syntax. However, detailed control might be missing, which could be necessary for applications other than common AR/MR visualization.

From the browser implementors' point of view, encapsulating the functions into higher-level components gives more room to choose their own way of implementing the given function. However, if more detailed control is required and added later for other applications, this could affect how the existing higher-level components are implemented and may require changes at the implementation level. Testing each function would also be more complicated if low-level details are accessible to scene authors, since there are more cases to test in order to make sure that each combination of low-level components works together in the general case.

Providing both options could be an alternative, giving content authors multiple choices. However, this would put more burden on browser implementors, and the specification development would take more effort, especially considering that AR and tracking methods are still a moving target that is far from standardization.
Latest revision as of 15:58, 17 August 2012

By Augmented Reality Working Group, Web3D Consortium

March 21, 2012

1. Introduction

This document compares the existing proposals for extending X3D to support augmented and mixed reality visualization. Three main proposals are compared in terms of requirements – two from Korean Chapter (KC1 and KC2) and one from Instant Reality (IR). Proposal KC1 and KC2 are proposed by Korea Chapter members, Gun Lee and Gerard J. Kim, respectively, while proposal IR is from InstantReality developed by Fraunhofer IGD. The summary of each proposals can be found at X3D and Augmented Reality#Existing Proposals. The third proposal from Korea Chapter by Woontack Woo is not covered in this document since the proposal is not directly related to extending the X3D specification. The criteria used for comparing each proposal is based on the requirements described at X3D AR Requirements and Use cases. In the rest of this document, each section compares the proposals in the aspect of one requirement, summarizing how each proposal deals with the requirement in the subsections, and concluding with discussion. After iterating through all of the requirements, we conclude with summary and discussion.

2. Using Live Video stream as a texture

2.1 Proposal KC1

This proposal proposed a new sensor node, CameraSensor (previously named LiveCamera node), for retrieving live video data from a camera device, and then routing the video stream to a PixelTexture node. The X3D browser is in charge of implementing and handling devices and mapping the video data to the CameraSensor node inside the X3D scene. The video stream itself is provided as a value (SFImage) field of the node which is updated every frame by the browser implementation according to the camera data.

CameraSensor:X3DDirectSensorNode {
   
   SFImage 	[out]		value
   
   SFBool   	[out]         	on       	FALSE
   
   SFMatrix4f	[out]		projmat   "1 0 0 0 … “
   
   SFBool	[out]		tracking	FALSE
   
   SFVec3f	[out]		position
   
   SFRotation 	[out]		orientation 

}

While this straight forward, routing SFImage values might lead to performance and implementation problem. As an alternative, the same proposal also proposed to extend the behavior of the existing MovieTexture node to support live video stream within the node. The proposed behavior X3D browser is to allow users to select a file or a camera device for the MovieTexture node in the scene, if the url field of the node is empty (or filled with special token values, such as ‘USER_CUSTOMIZED’).


<Appearance>
      <MovieTexture loop='true' url=''/> 
</Appearance>

While this approach avoids performance problems by not exposing SFImage fields updated in real-time, it lacks of supports for using live video stream data for other purposes, such as background. This is to be solved partially by adding a new node MovieBackground, which behaves similarly to the MovieTexture but uses the user selected movie file or live video stream from a camera for filling the background of the 3D scene.

2.2 Proposal KC2

This proposal uses similar approach to Proposal KC1 in terms of explicitly defining a sensor node that represents a camera device. The video stream on the image field of the LiveCamera node is then routed to a texture node.

LiveCamera {
	SFString		[in, out]		source		“default”
	SFImage	[out]		image
	SFMatrix4f	[out]		projmat		“1 0 0 …”
	SFBool		[out]		on		FALSE
	SFBool		[out]		tracking		FALSE
	SFVec3f		[out]		position
	SFRotation	[out]		orientation
}

2.3 Proposal IR

This proposal proposes a general purpose IOSensor node, which allows to access external devices (e.g., joysticks and cameras) inside the X3D scene. Note: Due to technical reasons the input and output fields of a device can only be determined at runtime. Hence, most in-/out-slots are dynamically generated based on the device one wants to access.

<IOSensor type='' name='' description='' enabled='TRUE' />

The camera sensor (including marker/poster/... tracking) is loaded through an instance of IOSensor, by defining the type of the sensor and it's fields as specified in the configFile (*.pm). Here is an example:

<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_OneMarker.pm'>
    <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
</IOSensor>

Using the camera image for texture is nothing more than routing the VideoSourceImage field of the IOSensor node to a PixelTexture node, which can also be part of a Background or Foreground appearance.

2.4 Discussion

Proposal KC1 and KC2 proposes a new node, specific for a camera, while proposal IR proposes a more generic type of node to be applied for variety of sensors. The trade off between simplicity and flexibility/extensibility needs further discussion.

3. Using Live Video stream as a background

3.1 Proposal KC1

The proposal proposed a MovieBackground node, extended from Background node to support ‘liveSource’ field which is assigned with a CameraSensor node (as described in 2.1) from which the Background node receives the live video stream data. Once the ‘liveSource’ field is assigned with a validate CameraSensor node, the background image is updated according to the live video stream from the CameraSensor node, assigned. For other purpose of use, it could also have a url field on which general source of movie clip could be assigned an used as a background.

MovieBackground:X3DBackgroundNode {
     ... // same to the original Background node
     SFString    [in] url
     SFNode 	[in] liveSource
}

Similar to the case in 2.1, the proposal also suggests a different approach where the MovieBackground node doesn’t explicitly need a CameraSensor node, but to let the browser to ask the user to choose the movie source (including camera device) when the url field is left empty (or filled with special token values, such as ‘USER_CUSTOMIZED’).

3.2 Proposal KC2

This proposal proposes to extend the TextureBackground node to support live video background. The video stream image is routed from the LiveCamera node to the frontTexture field. However, since TextureBackground node acts as an environment map there is a problem with the orientation of the TextureBackground, which is world registered and is not fixed to the viewpoint movement. To solve this problem, this proposal proposes to add a Boolean field called the ARmode. When the ARmode flag is true, the orientation of TextureBackground is fixed to the viewpoint, resulting the front side texture remains as a background.

TextureBackground : X3DBackgroundNode 
{
	SFBool 	        [in] 	        set_bind 
	SFBool 	        [in] 	        ARmode
	MFFloat 	[in,out] 	groundAngle 	[] 	[0,π/2] 
	MFColor 	[in,out] 	groundColor 	[] 	[0,1] 
	SFNode	        [in,out] 	backTexture 	NULL 	[X3DTextureNode] 
	SFNode	        [in,out] 	bottomTexture 	NULL 	[X3DTextureNode] 
	SFNode	        [in,out] 	frontTexture 	NULL 	[X3DTextureNode] 
	SFNode	        [in,out] 	leftTexture 	NULL 	[X3DTextureNode] 
	SFNode	        [in,out] 	metadata 	NULL 	[X3DMetadataObject] 
	SFNode	        [in,out] 	rightTexture 	NULL 	[X3DTextureNode] 
	SFNode	        [in,out] 	topTexture 	NULL 	[X3DTextureNode] 
	MFFloat 	[in,out] 	skyAngle 	[] 	[0,π] 
	MFColor 	[in,out] 	skyColor 	0 0 0 	[0,1] 
	SFFloat	        [in,out] 	transparency 	0 	[0,1] 
	SFTime	        [out] 	        bindTime 
	SFBool 	        [out] 	        isBound 
} 

3.3 Proposal IR

This proposal deals with the problem similar to the case for using the camera image for texture. It proposes a PolygonBackground node, which represents a background that renders a single polygon using the specified appearance. It allows for defining an aspect ratio of the background image that is independent of the actual window size. Different modes are possible to fit the image in the window (vertical or horizontal).

<PolygonBackground positions='0 0, 1 0, 1 1, 0 1' texCoords='0 0 0, 1 0 0, 1 1 0, 0 1 0' normalizedX='TRUE' normalizedY='TRUE' fixedImageSize='0,0' zoomFactor='1.0' tile='TRUE' doCleanup='TRUE' mode='VERTICAL' clearStencilBitplanes='-1' description='' />

Using the proposed PolygonBackground node, the image from the camera is simply routed to the texture used for the PolygonBackground node. The image assigned to the image outslot of the IOSensor is routed to the texture in the appearance of the PolygonBackground node.


<PolygonBackground fixedImageSize='640,480' mode='VERTICAL'>
    <Appearance>
        <PixelTexture DEF='tex' />
        <TextureTransform scale='1 -1'/>
    </Appearance>
</PolygonBackground>

<ROUTE fromNode='VisionLib' fromField='VideoSourceImage' toNode='tex' toField='image'/>

To make the polygon for the background fill the viewport, the PolygonBackground's field fixedImageSize is used for describing the aspect ratio of the image, and the mode field is set to "VERTICAL" or "HORIZONTAL" which describes the way the polygon fits the viewport.

Alternatively, for more simple cases, the ImageBackground node, which has texCoords and texture fields, can be used instead.

3.4 Discussion

Proposal KC1 proposes a dedicated node for movie backgrounds. Proposal KC2 proposes to extend TextureBackground node and repurpose it for fixed textured background. Proposal IR takes more general approach, and proposes a multi-purpose PolygonBackground node, which can contain any type of appearance including shaders. While the latter gives more flexibility, it requires details to be elaborated, compared to the former which is more simple. Again, the trade off between simplicity and flexibility/extensibility needs further discussion.

4. Supporting color keying in texture

4.1 Proposal KC1

This proposal proposed to add a ‘keyColor’ field to the MovieTexture node, which indicates the color expected to be rendered as transparent, in order to provide chroma key effect on the movie texture. The browser will be in charge of rendering the parts of the MovieTexture with as transparent, and those browser that does not support this feature could simply fall back with rendering the MovieTexture in a normal way (i.e. showing the texture as is).

MovieTexture:X3DTexture2DNode {
     ... // same to the MovieTexture node described in 2.1
SFColor    [in] keyColor
}

4.2 Proposal KC2

This proposal does not include this feature.

4.3 Proposal IR

This proposal doesn't include a direct solution to this case, since it is not straightforward related to AR applications. Closely related functions in this proposal would be the ColorMaskMode node, the BlendMode, StencilMode and DepthMode as a child of the Appearance node.

The ColorMaskMode masks a specific color channel, and this results in color changes in the global image. Rather than resulting pixels in key color to appear transparent, the ColorMaskMode makes color changes in every pixel. The ColorMaskMode together with the Appearance node's sortKey field (default sortKey is 0, a sortKey smaller than that is rendered first, and greater than another one is rendered last) can also be used to create invisible ghosting objects.

<ColorMaskMode maskR='TRUE' maskG='TRUE' maskB='TRUE' maskA='TRUE' />

The BlendMode gives general control over alpha blending function. However, there is no such function that compares the source images with a given key color, which is necessary to have proper result for color keying.

<BlendMode srcFactor='src_alpha' destFactor='one_minus_src_alpha' color='1 1 1' colorTransparency='0' 
 alphaFunc='none' alphaFuncValue='0' equation='none' />

To achieve chroma keying for an arbitrary color, you can e.g. use a user defined shader that discards all fragments whose color is equal to the given one.

4.4 Discussion

Proposal KC1 suggests simpler way to provide a specific color keying function for textures, while proposal IR suggests a more generic functions that can achieve required function. Although, the corresponding nodes in proposal IR misses certain features to fulfill color keying out-of-the-box, this can be achieved via shaders. Again, the trade off between simplicity and flexibility/extensibility needs further discussion.

5. Retrieving tracking information

5.1 Proposal KC1

This proposal suggests using the same CameraSensor node, used for retrieving live video stream, for retrieving tracking information. As described in 2.1, the proposed CameraSensor node includes ‘position’ and ‘orientation’ fields that represent the tracking information of the camera motion.

CameraSensor:X3DDirectSensorNode {
   
   SFImage 	[out]		value
   
   SFBool   	[out]         	on       	FALSE
   
   SFMatrix4f	[out]		projmat   "1 0 0 0 … “
   
   SFBool	[out]		tracking	FALSE
   
   SFVec3f	[out]		position
   
   SFRotation 	[out]		orientation 

}

The method has its limitations with not supporting tracking information of general objects other than the camera sensor.

5.2 Proposal KC2

This proposal proposes a new node named "ImagePatch" which provides tracking information of a visual marker. In comparison with Proposal KC1, this is a separate node from a node that represents camera sensor. This allows using multiple visual markers for tracking.

ImagePatch : X3DARNode
{
	MFString	[in, out]		filename
	SFVec3f	        [in, out]		position
	SFRotation	[in, out]		orientation
}

This proposal also proposes nodes for retrieving tracking information from sensors, other than vision based tracking. For instance, GPSLocation node provides tracking information from a GPS sensor.

GPSLocation : X3DSensorNode
{
	SFBool		[in, out]		status
	MFString	[in, out]		values
}

5.3 Proposal IR

For retrieving tracking information, this proposal uses the same IOSensor node as used for retrieving camera image. In this example, the TrackedObject1Camera_ModelView field of the IOSensor node represents the transformation matrix of the tracked position/orientation of the tracked object (visual marker). However, these are all dynamic fields and depend on the configuration as defined in the pm file.

<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_OneMarker.pm'>
    <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
</IOSensor>

The node could support multiple tracking objects by changing the configFile (TutorialMarkerTracking_OneMarker.pm file in the sample code), and defining additional Modelview, Projection, etc. fields for tracked objects and/or the camera pose.

5.4 Discussion

While both proposal KC1 and proposal IR retrieve tracking information from a node that represents a camera sensor, proposal KC1 exposes the tracking information of the camera itself, whereas proposal IR delivers the tracking information of tracked objects. This makes proposal IR more extensible in terms of supporting multiple tracked objects. However, the way tracked objects and markers are defined through a proprietary configuration file would need to be revised for standardization. Proposal KC2, on the other hand, defines a dedicated tracking node that is separate from the camera sensor node; as a result, multiple tracked objects are easily supported by creating multiple instances of this node. Proposal KC2 also includes a GPS tracking node in addition to computer-vision-based tracking. GPS-based tracking should be investigated and compared with another proposal by Myeongwon Lee, which was originally discussed in the X3D Earth working group [1].

6. Using tracking information to change 3D scene

6.1 Proposal KC1

This proposal generally uses routing to link the tracking information from the CameraSensor node to the position and orientation fields of a Viewpoint node. This can also be complemented by the MatrixViewpoint node (described in 8.1), which has a field identifying the corresponding CameraSensor node and achieves the same result without explicitly routing the individual fields.
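
For example (DEF names are hypothetical):

<CameraSensor DEF='camSensor'/>
<Viewpoint DEF='arView'/>

<ROUTE fromNode='camSensor' fromField='position' toNode='arView' toField='position'/>
<ROUTE fromNode='camSensor' fromField='orientation' toNode='arView' toField='orientation'/>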

6.2 Proposal KC2

This proposal also uses routing to bring tracking information into the 3D scene, routing the tracking results (position and orientation) to Transform nodes.
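
For example (DEF names and the marker image file are hypothetical):

<ImagePatch DEF='marker1' filename='"marker1.png"'/>

<Transform DEF='marker1Object'>
    <Shape> ... </Shape>
</Transform>

<ROUTE fromNode='marker1' fromField='position' toNode='marker1Object' toField='translation'/>
<ROUTE fromNode='marker1' fromField='orientation' toNode='marker1Object' toField='rotation'/>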

Besides this basic method of using raw tracking information, the proposal defines higher-level event nodes such as VisibilitySensor and RangeSensor. The VisibilitySensor node triggers events when the tracked visual marker is detected or lost, while the RangeSensor node triggers events when a tracked object comes within a certain range; an example follows the node signatures below.

VisibilitySensor : X3DEnvironmentalSensorNode
{
   SFBool      [in, out]    enabled
   SFTime      [out]        enterTime
   SFTime      [out]        exitTime
   SFBool      [out]        isActive
}

RangeSensor : X3DEnvironmentalSensorNode
{
   SFBool      [in, out]    enabled
   SFTime      [out]        enterTime
   SFTime      [out]        exitTime
   SFBool      [out]        isActive
   SFInt32     [in, out]    sequence
   SFString    [in, out]    lBound
   SFString    [in, out]    uBound
}
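
For example (DEF names are hypothetical; how a sensor instance is associated with a particular marker is not specified by the signatures above), the detection of a marker could start an animation by routing the enterTime event to a TimeSensor:

<VisibilitySensor DEF='markerVisible' enabled='true'/>
<TimeSensor DEF='clock' cycleInterval='2'/>

<ROUTE fromNode='markerVisible' fromField='enterTime' toNode='clock' toField='startTime'/>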


6.3 Proposal IR

This proposal uses routing to link tracking information from the IOSensor node to the Transform node of a corresponding virtual object or to the viewpoint. Example:

<MatrixTransform DEF='TransformRelativeToCam'> 
    <Shape> 
        <Appearance> 
            <Material diffuseColor='1 0.5 0' /> 
        </Appearance> 
        <Teapot size='5 5 5' /> 
    </Shape> 
</MatrixTransform> 

<ROUTE fromNode='VisionLib' fromField='Camera_ModelView' toNode='TransformRelativeToCam' toField='set_matrix'/> 

For routing a transformation matrix directly to a transform node, this proposal also defines a MatrixTransform node that takes a transformation matrix instead of separate position and orientation fields. Its render field controls the visibility of the subtree.

MatrixTransform : X3DGroupingNode {
 ...
 SFBool     [in,out] render TRUE
 SFMatrix4f [in,out] matrix identity
}

It is of course also possible to route the tracked camera pose (in position/orientation notation) to the bound Viewpoint node. There are different field-of-view modes: vertical, horizontal, and smaller. The field of view and principal point delivered by the IOSensor can be routed to the viewpoint, as in the example below.

<Viewpoint principalPoint='0 0' fieldOfView='0.785398' fovMode='SMALLER' aspect='1.0' retainUserOffsets='FALSE' 
 zFar='-1' jump='TRUE' zNear='-1' description='' position='0 0 10' orientation='0 0 1 0' centerOfRotation='0 0 0' />
<Viewpoint DEF='vp' position='0 0 0' fovMode='VERTICAL' /> 

<ROUTE fromNode='VisionLib' fromField='Camera_PrincipalPoint' toNode='vp' toField='principalPoint'/> 
<ROUTE fromNode='VisionLib' fromField='Camera_FOV_vertical' toNode='vp' toField='fieldOfView'/> 
<ROUTE fromNode='VisionLib' fromField='Camera_CAM_aspect' toNode='vp' toField='aspect'/> 

6.4 Discussion

While all of the proposals rely on routing to apply tracking results to the 3D scene, as discussed in 5.4, proposal KC1 focuses on updating the Viewpoint node, whereas proposals KC2 and IR allow updating both the camera and virtual objects (or the scene). Proposal IR additionally defines a new type of transformation node that accepts transformation matrices, while proposal KC1 sticks to traditional position and orientation vectors. In addition, proposal KC2 defines higher-level event nodes that trigger tracking-based events such as proximity and visibility.

7. Retrieving camera calibration (internal parameters) information

7.1 Proposal KC1

This proposal suggests using the same CameraSensor node that is used for retrieving the live video stream to also retrieve camera calibration information. As described in 2.1, the proposed CameraSensor node includes a 'projmat' field that represents the calibration information of the camera.

CameraSensor : X3DDirectSensorNode {
   SFImage     [out]    value
   SFBool      [out]    on           FALSE
   SFMatrix4f  [out]    projmat      "1 0 0 0 …"
   SFBool      [out]    tracking     FALSE
   SFVec3f     [out]    position
   SFRotation  [out]    orientation
}

7.2 Proposal KC2

This proposal takes a similar approach to proposal KC1, providing a field that represents camera calibration information in the node for the live video camera.

7.3 Proposal IR

This proposal suggests using the same IOSensor node that is used for retrieving images from the camera sensor. Several fields of this node (in this example called TrackedObject1Camera_PrincipalPoint, TrackedObject1Camera_FOV_horizontal, TrackedObject1Camera_FOV_vertical, and TrackedObject1Camera_CAM_aspect) provide the calibration information.

<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_OneMarker.pm'>
    <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
    <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
</IOSensor>

7.4 Discussion

All three proposals suggest reusing the node that provides access to the camera sensor and exposing camera calibration information through dedicated fields of that node. While proposals KC1 and KC2 deliver the calibration information as a projection matrix, proposal IR delivers it as a set of parameters. The latter approach could be safer in that it does not expose the viewpoint's projection matrix directly, whose exact form may depend on the graphics API used by the implementation.

8. Using calibration information to set properties of (virtual) camera

8.1 Proposal KC1

This proposal defines a MatrixViewpoint node, a node placed in the scene graph that represents a virtual viewpoint calibrated to match the corresponding physical live video camera (on the user's computer). The 'projmat' field represents the internal parameters (projection matrix) of the MatrixViewpoint, while the 'position' and 'orientation' fields represent the three-dimensional position and orientation of the viewpoint in virtual space. The 'cameraSensor' field references a CameraSensor node from which the viewpoint parameters (projmat, position, and orientation) are updated. Once the 'cameraSensor' field is assigned a valid CameraSensor node, the viewpoint parameters are updated according to the values of that node. Alternatively, each parameter of the MatrixViewpoint node can be routed from a corresponding source of calibrated values.

MatrixViewpoint : X3DViewpointNode {
   SFMatrix4f  [in,out]  projmat
   SFVec3f     [in,out]  position
   SFRotation  [in,out]  orientation
   SFNode      [in,out]  cameraSensor
}
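
For illustration only (not part of the proposal text), the link could be expressed in the XML encoding either through the SFNode field or through explicit routing; the DEF names and the containerField usage are assumptions:

<CameraSensor DEF='camSensor'/>

<MatrixViewpoint DEF='arViewpoint'>
    <CameraSensor USE='camSensor' containerField='cameraSensor'/>
</MatrixViewpoint>

<!-- alternative: route each calibrated value explicitly -->
<ROUTE fromNode='camSensor' fromField='projmat' toNode='arViewpoint' toField='projmat'/>
<ROUTE fromNode='camSensor' fromField='position' toNode='arViewpoint' toField='position'/>
<ROUTE fromNode='camSensor' fromField='orientation' toNode='arViewpoint' toField='orientation'/>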

8.2 Proposal KC2

This proposal suggests a similar approach to proposal KC1, using a viewpoint node that accepts camera calibration information in matrix form.

8.3 Proposal IR

Viewpoint : X3DViewpointNode {
  ...
  SFString [in,out] fovMode        VERTICAL
  SFVec2f  [in,out] principalPoint 0 0
  SFFloat  [in,out] aspect         1.0
  SFFloat  [in,out] zNear          -1
  SFFloat  [in,out] zFar           -1
}

The new fields provide a more general camera model than the standard Viewpoint. The "principalPoint" field defines the relative position of the principal point. If the principal point is not equal to zero, the viewing frustum parameters (left, right, top, bottom) are simply shifted in the camera's image plane: a value of x = 2 means the left border coincides with the default right border, and a value of x = -2 means the right border coincides with the default left border. If the principal point is not equal to zero, the "fieldOfView" value no longer equals the real field of view of the camera; otherwise it complies with the default settings.

To extend this idea, the "fovMode" field defines whether the field of view is measured vertically, horizontally, or in the smaller direction, which is important for correctly parameterizing the aforementioned cinematographic camera. The "aspect" field defines the aspect ratio for the viewing angle given by the "fieldOfView" value. This setting is independent of the current aspect ratio of the window, but reflects the aspect ratio of the actual capturing device. This extension allows cameras with non-square pixels to be modelled, i.e. it defines the width/height ratio of a pixel.

In addition to the Viewpoint extension, the proposal includes a new camera node named Viewfrustum. This node has two input/output fields, "modelview" and "projection", of type SFMatrix4f. With the Viewfrustum node, a camera position and projection can be defined using a standard projection/modelview matrix pair.

Viewfrustum : X3DViewpointNode {
  ...
  SFMatrix4f [in,out] modelview  (identity)
  SFMatrix4f [in,out] projection (identity)
}
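
For instance (the DEF name 'vf' is hypothetical), the tracked model-view matrix from the IOSensor example in section 5.3 could be routed to a Viewfrustum; a projection matrix exposed by the tracking configuration could be routed to the 'projection' field in the same way:

<Viewfrustum DEF='vf'/>

<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_ModelView' toNode='vf' toField='modelview'/>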

8.4 Discussion

All of the proposals introduce new or extended Viewpoint nodes to support the camera calibration information described in section 7. While they use different types and numbers of fields to represent the calibration information, they all rely on the same routing mechanism to apply these values to the viewpoint. As discussed in 7.4, assigning a projection matrix directly to a viewpoint may cause defects such as incorrect projections or wrong near/far clipping planes.

9. Specifying nodes as physical object representatives

9.1 Proposal KC1

This proposal defines a GhostGroup node that marks its child nodes as representatives of physical objects in order to visualize correct occlusion. The node is derived from the Group node, and the geometries of its children are rendered as ghost objects: the browser renders them only into the depth buffer and not into the color buffer. As a result, the portion of the live video image corresponding to a ghost object is displayed with correct depth values, producing correct occlusion with other virtual objects.

GhostGroup : X3DGroupingNode {
   ...  // same as the original Group node
}
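
For example, a hypothetical proxy box standing in for a real table top could be declared as follows; the box is rendered only into the depth buffer, so virtual objects placed behind or below the real table are occluded correctly:

<GhostGroup>
    <Transform translation='0 0.7 0'>
        <Shape>
            <Box size='1.2 0.05 0.8'/>
        </Shape>
    </Transform>
</GhostGroup>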

9.2 Proposal KC2

This proposal does not include this feature.

9.3 Proposal IR

This proposal uses a ColorMaskMode node to render the geometry only into the depth buffer and not into the color buffer. In addition, a new "sortKey" field is proposed for the Appearance node to ensure that ghost objects are rendered before other geometry.

<Shape>
   <Appearance sortKey='-1'>
     <ColorMaskMode maskR='false' maskG='false' maskB='false' maskA='false'/>
   </Appearance>
   ...
</Shape>

9.4 Discussion

While proposal KC1 offers a high-level, simple-to-use approach targeted at the specific case of depth occlusion in AR/MR visualization, proposal IR offers general-purpose, detailed control of the rendering process. Proposal KC1 deals with depth buffer values directly, providing a general solution to the depth occlusion problem. In comparison, proposal IR uses a color masking technique to mimic the depth occlusion effect, which may produce incorrect results in dynamic scenes.

10. Conclusion

Table 1 summarizes the differences between the proposals, showing which modifications each proposal (columns) makes for each functional requirement (rows).

Table 1. Comparison of X3D AR proposals (Bold: newly proposed nodes, Italic: modification to standard nodes)

Requirement | Proposal KC1 | Proposal KC2 | Proposal IR
Using Live Video stream as a texture | MovieTexture node (or optionally with routing from CameraSensor node) | LiveCamera node, routing to a PixelTexture node | IOSensor node, routing to a PixelTexture node
Using Live Video stream as a background | MovieBackground node (or optionally with routing from CameraSensor node) | LiveCamera node + TextureBackground node | IOSensor node + PolygonBackground node (or optionally ImageBackground node)
Supporting color keying in texture | MovieTexture node | N/A | N/A (use general shader support)
Retrieving tracking information | CameraSensor node | ImagePatch and GPSSensor node | IOSensor node
Using tracking information to change 3D scene | routing tracking data from CameraSensor node | routing tracking data from ImagePatch and GPSSensor nodes + events generated by VisibilitySensor and RangeSensor nodes | routing tracking data from IOSensor node
Retrieving camera calibration (internal parameters) information | CameraSensor node | LiveCamera node | IOSensor node
Using calibration information to set properties of (virtual) camera | MatrixViewpoint node | Viewpoint node | Viewfrustum and Viewpoint nodes (alternatively MatrixTransform node)
Specifying nodes as physical object representatives | GhostGroup node | N/A | ColorMaskMode and Appearance nodes (together with sortKey field)

While all of the proposals cover a similar set of functionalities required to support AR and MR visualization in X3D, proposals KC1 and KC2 take the path of relatively high-level control, providing simpler syntax targeted at specific AR and MR use cases. In contrast, proposal IR introduces more general-purpose nodes and suggests combining them to implement the required AR functions, treating AR and MR visualization as one particular use case of the proposed extensions. Given these differences, the trade-off between simplicity and flexibility/extensibility needs further discussion as the AR WG proceeds to develop specifications for AR visualization components.

From the content authors' point of view, higher-level, abstracted control gives simpler and easier-to-use syntax. However, it may lack the detailed control needed for applications beyond common AR/MR visualization.

From the browser implementers' point of view, encapsulating functionality in higher-level components leaves more room to choose their own way of implementing a given function. However, if more detailed control is required and added later for other applications, this could affect how the existing higher-level components are implemented and may force changes at the implementation level. Conversely, testing becomes more complicated if low-level details are exposed to scene authors, since there are many more cases to test to ensure that every combination of low-level components works together in the general case.

Providing both options could be an alternative, giving content authors multiple choices. However, this would place a heavier burden on browser implementers, and developing the specification would take more effort, especially considering that AR and tracking methods are still a moving target, far from standardization.