October 5, 2009

Augmented Reality Scale Update (Reality Recognition)

by Thomas K. Carpenter in augmented reality

About a month ago, I proposed an augmented reality scale to help us define different applications of the technology.  While the post received good feedback, I felt that it could use more refinement. 

This update defines the scaling between 1 and 10 for the Reality Recognition axis.  Previously, the definitions behind the numbers on the axis were too fuzzy to be intuitive or usable, so this should make the scale easier for others to apply.  I’ll give more definition to the other axis, Perceived Reality, in a later post.


[Figure: RIM Scale (Blank)]

First, a refresher on how the scale works.  The RIM scale is composed of two axes: Perceived Reality (PR) and Reality Recognition (RR).  I chose two axes because AR exists through the mixing of reality and the virtual.

The Perceived Reality axis shows how indistinguishable the graphics are from reality (on a scale of one to ten).  The Reality Recognition axis shows how completely computers understand the world (on a scale of one to ten).  A total score for an application of the augmented reality concept can be given by using the two numbers as a vector (#,#).
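As a rough illustration, the two-number score can be held as a simple pair with a range check on each axis.  This is only a sketch: the names RimScore and make_rim_score are made up for the example, the (PR, RR) ordering is an assumption (the post only says the two numbers form a vector), and the values 3 and 2 are placeholders rather than a rating of any real application.

```python
from typing import NamedTuple


class RimScore(NamedTuple):
    """A RIM score as a pair of axis values, each from 1 to 10.

    The (PR, RR) ordering is an assumption made for this sketch.
    """
    perceived_reality: int
    reality_recognition: int


def make_rim_score(perceived_reality: int, reality_recognition: int) -> RimScore:
    """Build a RimScore, rejecting values outside the 1-10 range of each axis."""
    for name, value in (("PR", perceived_reality), ("RR", reality_recognition)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return RimScore(perceived_reality, reality_recognition)


# Placeholder values, not a rating of a real application.
score = make_rim_score(perceived_reality=3, reality_recognition=2)
print(score)
```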

Previously, I gave only a few points on each axis to give a general idea of how to score a particular application of the technology.  Now I would like to refine that.  On the Reality Recognition axis, we’re trying to define how computers see the world.  I broke the axis down into the three types of objects the computer will need to understand:

1) Non-moving structural objects – buildings, trees, landmarks, roads, lightposts, signs, etc.  Anything that can be constructed and doesn’t move regularly.  These will be best defined by GPS systems since they do not move.  They will form the outline of the augmented world. 

2) Moving objects and people – Cars, cans, books, posters, desks, people, dogs, etc.  All the pieces in between that move.  They will be best defined by local object/person recognition.

3) Information – the invisible data that exists without an object.  Temperatures, pollution levels, CO2 levels, ocean currents, etc.  This is information that can be collected by a sensor (Pachube or other data collector), but not information that defines an object (information about an object will already be tied to it).

Each category (non-moving, moving and information) has three degrees of implementation.  Simply put, each one can be low (1), medium (2) or high (3) implementation.

Non-moving Structural Objects

A low implementation value for a structural object like a building would mean that only the general GPS location was defined.  One could find the building and attach wiki or other information within an AR network, but could not know its shape and size.  A medium implementation value would give size and shape in a general manner.  A high implementation would show the building down to the finer details and might require local object recognition to define it.

All current Nearest-X type AR apps are at a low implementation for structural places like a Starbucks or a subway station.  As projects like Microsoft’s Photosynth or Google Earth define the size and shape of the world, this category will move to medium implementation.

Moving Objects and People

For moving objects, we’re still within primitive low implementations because object/person recognition works only in narrowly defined applications.  Sein’s SREngine, Sony’s Vision Library and Zugara’s motion capture game Cannonballz all use elements of local recognition.  A medium implementation could, within a category, pick a match out of a large data set.

For example, at medium, computers could recognize any face on the planet (given a data set to compare to).  But at medium levels, we’ll have to be concerned with personal freedoms because computers can then be used to track people much the same way usage habits are currently tracked on the Internet. 


Information

The last category, Information, is the least advanced of the three.  Mostly this is because the hardware side is both limited and expensive.  By comparison, GPS data is supplied by the government and cell phones give us a handy tool to utilize it for structural places.  Object recognition requires a camera, but since cell phones have them and are ubiquitous, this data is also accessible.

Information like pollution data or temperature sets requires a sensor to collect it.  While the world is filled with billions of sensors, most are not connected to the Internet and therefore are not usable within an AR system.  These sensors are also often proprietary. 

Companies like Pachube aim to change this by supplying sensors that can post data to the Internet, giving the general public information that can be put to good use (if you’re not familiar with why this is important, read: Pachube, Patching the Planet: Interview with Usman Haque).  However, either sensors must be put into place or environments must be tracked (Extended Environments Markup Language (EEML)).

The placement of sensors or accessing of environments creates a barrier to widespread implementation.  Creative workarounds can be used to collect information without dedicated sensors.  Crowdsourcing can fill in the gaps: though the information might be messy, the wisdom of the crowd can transform it into a usable source.
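To make the crowdsourcing point a little more concrete, here is one simple way messy reports could be boiled down to a usable value.  It is only a sketch: taking the median is a technique chosen for this example, not something the post prescribes, and the temperature readings are invented.

```python
from statistics import median

# Hypothetical crowd-reported temperatures (degrees Celsius) for one street
# corner, including two obviously bad entries (35.0 and -4.0).
reports = [21.5, 22.0, 21.8, 35.0, 22.1, 21.9, -4.0]

# The median takes the middle value of the sorted reports, so a few wild
# entries at either end do not drag the result around the way a mean would.
consensus = median(reports)
print(consensus)  # prints 21.9
```

With enough honest reports, the bad ones end up at the edges of the sorted list and stop mattering, which is roughly the "wisdom of the crowd" effect described above.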


The breakdown of Reality Recognition into three parts with three levels gives it an easy scoring method.  Each section is worth up to three points:

Non-moving structural objects – High (3), Medium (2), Low (1) or None (0). 

Moving objects or people – High (3), Medium (2), Low (1) or None (0).

Information – High (3), Medium (2), Low (1) or None (0). 

Since I’ve scaled it from one to ten, all these scores are added to a base score of one, so the total runs from 1 (nothing recognized) to 10 (1 + 3 + 3 + 3).

The recently released Cyclopedia would score a two on the scale since it doesn’t understand moving or informational data and only scores as a low on the structural side.  The AR Playbox concept video would score a three on the Reality Recognition scale because it would have to know general structural objects within the game area and also have to keep track of a moving person.
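The scoring method can be written out as a few lines of code, which makes it easy to check the two examples.  This is a minimal sketch: the names Level, BASE_SCORE and reality_recognition are invented for the illustration, and reading AR Playbox as low structural plus low moving is my interpretation of the score of three given above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Degree of implementation for one category, expressed as points."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


BASE_SCORE = 1  # the scale starts at one, so every application gets this


def reality_recognition(structural: Level, moving: Level, information: Level) -> int:
    """Add the three category levels to the base score, giving 1 to 10."""
    return BASE_SCORE + structural + moving + information


# Cyclopedia: low structural, nothing for moving objects or information.
cyclopedia = reality_recognition(Level.LOW, Level.NONE, Level.NONE)

# AR Playbox (assumed reading): low structural and low moving.
playbox = reality_recognition(Level.LOW, Level.LOW, Level.NONE)

print(cyclopedia, playbox)  # prints "2 3"
```

An application that scored high in all three categories would reach the top of the axis: reality_recognition(Level.HIGH, Level.HIGH, Level.HIGH) gives 10.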


Hopefully this more nuanced scale can help visualize the level of augmented reality and suggest areas that need refinement as we move towards an integrated Web 3.0 world.  If you have agreements, disagreements or suggestions for the scale, please leave a comment.


