Interactive Digital Video, Visualization, and Virtualization

Interactivity, convergence, mobility, content, and high-speed access are coming together to create a new networked world.

As computer and networking capabilities continue their pace of improvement, electronic information resources can become vastly more user-friendly and effective. Video offers one of the most compelling ways to present information. It is said that a picture is worth a thousand words.

The U.S. has a fully developed infrastructure for analog video. This infrastructure, however, cannot be extended in a cost-effective manner to serve the many new capabilities that are possible with digital video. For example, dynamic interactivity with analog video is extremely difficult, whereas digital video is ideal for random and dynamic assembly of video content. Therefore, most significant advances in video technology will be digital.

Computers are the current devices of choice for interactive access to information. Through Compact Discs (CDs), the World Wide Web, and Digital Video Discs (DVDs), interactive content (with rudimentary video) is available to computer users. But computers are ultimately limited because they are unlikely to ever achieve the widespread market penetration of telephones or televisions. The telephone has long been recognized as a vital "universal access" instrument for all Americans. Yet the market penetration of the telephone (93.9%) lags that of the television (98%). Therefore, the digital television is an optimal device for empowering the public with access to our information-rich future.

Fully realizing the possibilities of information distribution through our emerging digital television infrastructure is a major challenge and opportunity. Even though television is generally thought of as a passive receiving device, the new digital televisions will be capable of very low-cost interactivity through the development of push technologies. These interactive services will include information access, home shopping services, and public services (such as access to government information).

Digital video on the corporate user's desktop is today a novelty. While it is commonplace for professionals to develop their own presentations with graphics, color, and other effects to improve impact, video is seldom considered. If digital video were easy enough to produce, much professional communication would be done with video. With extended capabilities, desktop digital video can become a mainstream application of substantial value in day-to-day business.

As the U.S. makes the transition to digital television, the functionality of televisions and personal computers is likely to converge.

Today when we think of digital video, we contemplate video programming delivered to the home, videoconferencing, and other applications that have parallels in the analog world. Digitization of moving pictures will open up a new realm of possibilities for active participation with interactive video. Video hyperlinks will probably be a key component supporting a whole new level of home shopping experiences and virtual video libraries. As we move forward, digital video will include a mix of real and synthetic content with various possibilities for interaction. Virtual telepresence, for example, allows multiple people to "meet" in a virtual space and interact much as though the space were real. As real and synthetic content are merged in real time in an interactive experience, the realism of simulations can be greatly enhanced. Multiple participants can collaborate in live video simulations. Furthermore, availability of important simulation technology can be extended to a much wider audience, even to the general public using consumer televisions. Virtual reality technologies relying upon realistic motion image quality are needed to facilitate these advances. Stereo viewing has long been a desirable idea that has never found an appropriately practical solution.

Interactive Digital Video

1. Advanced interactive video applications and effects including telepresence, personal avatars, virtual teleconferencing, etc.

2. Compression technologies particularly those related to MPEG-4 hybrid synthesis (including VRML as applied to hybrid composition).

3. Tools and key technologies to enable datacasting technologies (interactive push technologies particularly to consumer televisions).

4. Tools and key technologies to enable interactive networked video applications for customer support, training, and other information services.

5. Interactive video applications for public networks, particularly for consumers, which can demonstrate inexpensive on-demand dynamically-controllable access to video-enabled consumer information.

(ADVENT is greeting the advent of digital video. The television of the future will be composed of zeros and ones.)

Called ADVENT (All-digital Video Encoding, Networking, and Transmission), the service is organized by CTR's Image and Advanced Television (IATV) Laboratory. Here, principal investigators and more than 20 graduate students perform pioneering research in the field of digital video and provide hands-on technical expertise.

The laboratory, which has been developed over the past seven years, contains the most advanced research equipment for displaying full-resolution HDTV signals in all formats. ADVENT affiliates from industry are encouraged to send researchers to the IATV Laboratory to participate in the research.

Work currently under way in the IATV Laboratory includes research on wavelets, a newer family of functions that provides an alternative to the classic Fourier functions. Other research focuses on the development of multimedia workstations that integrate digital video with data, voice, and graphics. Another area of research investigates a multiresolution technique for compression and transmission.
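To make the wavelet and multiresolution idea concrete, here is a minimal sketch using the Haar wavelet, the simplest wavelet there is. It is an illustration only and is not taken from the IATV Laboratory's work; the function names (haar_step, haar_inverse_step, haar_multires) are made up for this example. Each step splits a signal into a half-length coarse approximation and a set of detail coefficients, and repeating the step on the approximation yields the same signal at several resolutions.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar transform: scaled pairwise sums and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]   # coarse, half-length version
    detail = [(a - b) / SQRT2 for a, b in pairs]   # what the coarse version lost
    return approx, detail


def haar_inverse_step(approx, detail):
    """Rebuild the finer signal from one level of approximation and detail."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_multires(signal, levels):
    """Apply haar_step repeatedly; returns the coarsest approximation and
    the detail coefficients of every level, finest level first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    samples = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
    coarse, details = haar_multires(samples, 3)
    print("coarsest approximation:", coarse)
    print("detail sizes per level:", [len(d) for d in details])
```

In a compression setting, small detail coefficients can be quantized coarsely or dropped, and a receiver with limited bandwidth can stop after the coarse levels; the sketch above shows only the decomposition, not a codec.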

"Digital video will be to this decade what digital audio -- including compact disks -- was to the 1980s," says electrical engineering professor Dimitris Anastassiou in explaining the rationale for the new service. "What we have to offer is actual expertise, with well-trained graduate students working on the most advanced equipment."

The Evolution of the Analog Set-Top Terminal to a Digital Interactive Home Communications Terminal by H. Allen Ecker and J. Graham Mobley (selected sections)

While current cable systems use analog signals for video and audio, advancements in digital technology now allow cable systems to add a digital video layer to increase channel capacity with little or no increased distribution bandwidth.

Two technological breakthroughs in digital processing are clearing the way for digital video and audio program content. The first is the adoption of standards for digitizing, compressing, and decompressing video programming. In 1992, the Moving Picture Experts Group (MPEG) of the ISO set out to develop a set of standards for digitizing and compressing an analog video signal. This international standards group laid the groundwork for standardizing the algorithms, syntax, and transport format necessary to allow interoperability among different suppliers of video compression and decompression systems. The attitude at the outset was that if digital television were to flourish, equipment built by different vendors must be compatible and interchangeable. The international standards adopted by the MPEG committee allow freedom to be creative in the encoding process within the parameters of the defined "encoding language" while maintaining compatibility with standard MPEG decoders. The analogy in the computer programming world is that software programmers can approach a programming problem in many different ways; however, if their program is written in C, for example, any C compiler can compile the program.

The second development was a means for delivering the digital signals to the customer. Schemes for modulating and demodulating an RF carrier using either quadrature amplitude modulation (QAM) or vestigial sideband modulation (VSB) have been developed. These approaches are compatible with standard analog cable systems and can deliver information rates up to 36 Mbps in a single 6-MHz channel. The combination of digital compression of video and audio and digital transmission over cable can increase the number of video services in a single 6-MHz channel by a factor of approximately 5 to 10, depending on programming content and picture resolution.
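The arithmetic behind the "factor of approximately 5 to 10" can be sketched as follows. The 36 Mbps and 6 MHz figures come from the text; the per-program bit rates (3,600 and 7,200 kbps) are assumptions chosen only to reproduce that range, not rates stated by the authors, and the function names are hypothetical.

```python
def programs_per_channel(channel_kbps: int, program_kbps: int) -> int:
    """Whole number of compressed programs that fit in one channel's payload.

    Rates are integers in kbps so the division is exact.
    """
    if program_kbps <= 0:
        raise ValueError("program_kbps must be positive")
    return channel_kbps // program_kbps


def spectral_efficiency(bits_per_second: float, bandwidth_hz: float) -> float:
    """Information rate carried per hertz of channel bandwidth."""
    return bits_per_second / bandwidth_hz


CHANNEL_KBPS = 36_000      # upper information rate quoted in the text
CHANNEL_HZ = 6_000_000     # one 6-MHz cable channel

if __name__ == "__main__":
    # Assumed per-program rates: lower for simple content or lower
    # resolution, higher for demanding content.
    for program_kbps in (3_600, 7_200):
        count = programs_per_channel(CHANNEL_KBPS, program_kbps)
        print(f"{program_kbps} kbps per program -> {count} programs per channel")
    print(spectral_efficiency(CHANNEL_KBPS * 1000, CHANNEL_HZ), "bits/s per Hz")
```

Because one analog program occupies the whole 6-MHz channel, the program count is also the capacity multiplier, which is why the result depends so strongly on programming content and picture resolution.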

Not only does digital processing increase capacity, but it also allows greater flexibility in the types of services that can be provided. No longer will there be the restriction that the information provided be a video signal. Digital computer games, digital music, and other multimedia applications will all be likely candidates for the new broadband digital cable.

The cable set-top terminal needed to fully utilize these new services will evolve to a fully interactive home communications terminal (HCT) that allows client/server functionality.

Steve Case: (selected sections from Keynote Address of Internet World Convention Los Angeles, California, April 5, 2000)

If you just think about the four devices we rely on the most in our homes: the television, the PC, the telephone and the stereo, this trend becomes pretty interesting.

Already, the distinctions between these four devices are blurring -- and interactivity is starting to connect all of them together.

Soon, televisions will come equipped with portals, and people will be able to bookmark their favorite programs like they bookmark their favorite Web sites. They will even be able to access interactive services like e-mail and Instant Messaging while they're watching TV.

Through digital delivery, music and streaming videos will be available on demand. People will be able to store their music on servers in their homes, or keep it in online jukeboxes they can carry with them.

For the telephone, the distinction between long distance and local calling will disappear -- and communications will be integrated with the TV and the PC. You might be trading instant messages with somebody and then decide to switch to voice -- and you'll do it quickly and easily and inexpensively. Or you'll be able to answer the phone through the TV and make a video call.

The role of the PC in the home will change too. Just as the TV has evolved in many houses from a single console in the living room, people will have interactive devices all through the house. A recent AOL survey found that 52% of people online are already rearranging the furniture for the PC. By the way, that statistic even exceeded our favorite one that said the majority are skipping breakfast to log on.

The biggest change of all will likely come with home networks -- that, just like electricity and plumbing, will run through homes linking together all four of these devices as well as others. These networks will help transform the way people live -- dramatically increasing our choices and opportunities, and making everything simpler and more convenient.

The fact is, the first steps of convergence already are driving consumers' expectations -- and the more they get, the more they want.

And ultimately, we will see the rise of the multi-platform Webtop in fully connected households with a range of applications and appliances that will extend the power and convenience of the Internet to every room of the house, and make the benefits of the Internet accessible to every person in the house.

That's a big part of the idea behind our second announcement: the exciting new family of AOL/Gateway Internet appliances.

First, there will be a countertop appliance that provides access to a customized version of the AOL service over a device that is small and light enough to be placed in highly trafficked areas of the house like the kitchen or the family room.

Next, here is a wireless Web Pad that will make accessing the Internet even simpler and more convenient. By simply touching the web pad's screen or using the accompanying wireless keyboard, users will be able to experience a wide range of the Web's content and features from anywhere within the home.

Finally, we are very excited to unveil a desktop appliance -- a simplified Internet device that will be a great, lower-cost alternative for families which need more than one computer that is shared by everyone, and for people seeking alternative ways to connect to the Web.

How Education Will Change, by Crawford Kilian, July/August 1997 (selected section)

It's NOT the wave of the future. Within a decade, interactive digital video (maybe we'll call it "face mail") could make us pine for the good old days of e-mail, when literacy almost made a comeback. The F2F experience, whether live or asynchronous, will dominate the Net/Web/next damn thing because it will provide more information, especially nonverbal, than text-based media can ever hope to.

If the present text-based systems won't last, and interactive digital video is the real wave of the future, why are we killing ourselves to master the equivalent of the 78 rpm record? Why not wait until technology provides a real equivalent to the interactivity of the classroom?

We are trying to master a transient technology while superior technologies loom in the future. But remember the wonderful moon base in 2001: A Space Odyssey? Didn't quite happen like that, did it? In 30 years, moon landings have gone from SF to nostalgia item. Computer technology may not deliver all the bells and whistles it's promised; in the meantime, we've got a text-based system that lets us communicate with colleagues across the planet and students from next door to the Negev. We may as well get good at it.

If the classic student-teacher relationship is psychologically satisfying, so what? Slave-owners used to think they really benefited their slaves. Who among us hasn't enjoyed a quiet, sadistic thrill at announcing what the class would have to do on the big term paper, or to prepare for a quiz? To see all those people acknowledging our power, our superiority . . . it's a buzz, all right. Like a bottle of good vodka, and ultimately just as dangerous.

Because no one knows much about online learning, we're all de facto learners here; we're lucky if we're two jumps ahead of our students. I think that's much healthier for us, as teachers, than to feel we're the Lord High Pedagogues, exempt from having to learn anything more. If we consider ourselves continuing learners in our disciplines as well, we will deal with our students more constructively. We will learn more and better, and so will they.

One of the things we will learn about is our own role and the function of institutional education in an online world. We may sometimes be a stimulus to our offline colleagues, and sometimes just a pain in their collective ass. If we are to serve them well, we need to show them how the computer can help them do what they want to do anyway, instead of making them feel they've been shanghaied to destinations they didn't choose and don't care about. Technology should offer them choices, not requirements.

Exactly the same is true of our students. We should be helping them advance toward their own goals, not co-opting them for our own. This doesn't mean allowing them to fool around aimlessly; it means encouraging them to be self-propelled toward ambitious but realistic goals. "Cybernetics" comes from the Greek word for "steersman"; we should teach students how to steer for themselves.

If we are going to offer real online access to colleagues and students, sometimes we are going to have to be wet blankets, counseling conservatism and caution. Too many of us have rushed into ill-conceived projects, imagining that good intentions and elbow grease would make up for real lacks in planning or hardware or tech support. Serves us right if our offline colleagues begin to think that online learning really is glamorous, and jump off into the deep end only to get into trouble. We owe them better counsel; the ultimate payoff is too big to risk.

That payoff, I believe, will be a radical re-conception of the learning process and the roles of the participants. Somewhere in the fairly recent past, education fell into the hands of the bean counters. Nowhere in Plato do we learn how many evening symposiums were required for a Socratic certificate. Alexander the Great never had to send back to Aristotle for a transcript of his grades. When Paul had his revelation on the road to Damascus, he didn't hand in a term paper on what he'd learned (nor did he cite God's question as a "personal communication" in the footnotes), and his epistles did not appear in refereed journals. Custer went to West Point; Crazy Horse didn't.

J. Alfred Prufrock measured out his life with coffee spoons. We measure out our own in credit-hours and essays submitted and MLA-approved citation format. This bureaucratization generates a lot of clerical work and committee meetings, but I really doubt that it advances genuine self-propelled learning.

After all, what we learn ought to surprise us, open up unexpected opportunities, create whole new industries and cultures. Bureaucrats can deal only with the known, the predictable, the measurable. No bureaucrat anticipated the Internet; no bureaucrat can control the World Wide Web.

We online teachers are domesticated beasts suddenly at liberty, like the conquistadors' horses running wild on the Texas plains. If we can learn how to be free, and how to stay free, then we can teach the same freedom to our students. I can't imagine a nobler calling.

Internet2 Digital Video Project

Something that caught my attention lately was a newspaper article about a college that is getting on Internet2. They are putting computers with digital video client, server, and production software in all the dorm rooms, and the comparison made was that just as the web allowed everyone to become a publisher, this will allow everyone to become a producer. The universities may not even realize it yet, but in many ways I think university campuses are going to be the test beds during the next two or three years. By that, I mean a social test bed for how network video can inform instruction and research. One example is the Internet2 Digital Video Project, which envisions information exchange among the major universities, where they can show distance learning classes, special colloquia, or modeling and animation of scientific projects. There are many things that can be done as information exchange across the universities -- not only nationally but also internationally.

Internet2 (http://dv.internet2.edu/) is a project of the University Corporation for Advanced Internet Development. Membership in the I2 DV initiative is open to any individual or organization that is affiliated with an I2 member university and is seriously interested in digital video.

The Internet2 Digital Video initiative is developing a wide range of advanced digital video capabilities for the national research community. These efforts are enabling a new generation of digital video applications that take full advantage of the potential of high-performance networks. Participants in this initiative lead by undertaking technology research and development where it is required and by encouraging it where efforts already exist. Some of these efforts are directed at designing and deploying Internet2-guided services and capabilities on a national scale. Others are directed at exploring, reporting on, and recommending guidelines and standards for the I2 community, and at representing that community in standards development forums. The I2 Digital Video initiative engages cooperatively with other advanced digital video development consortiums, gathers and disseminates information about leading-edge projects (including regional and individual efforts), and creates best-practices guidelines. In addition, this initiative organizes efforts, if required, that focus on creating, gathering, managing, and transmitting digital video content.

Partners:

1. ResearchTV

A collaboration founded by a core group of leading accredited research universities and research organizations to experiment with opportunities to expand high bandwidth modes of delivery and exchanges in educational and research-oriented information.

2. The Video Development Initiative (ViDe)

The goal of The Video Development Initiative (ViDe) is to promote the deployment of digital video in higher education by leveraging collective resources and expertise to address the challenges to deployment: poor interoperability, volatile standards, and high cost. A multi-institutional effort, ViDe is composed of representatives from CANARIE; The Georgia Institute of Technology; George Washington University; North Carolina State University; NYSERNet (New York State Education and Research Network); Ohio State University; The University of Hawaii; The University of North Carolina at Chapel Hill; The University of Tennessee, Knoxville; The University of South Carolina; Vanderbilt University; The College of William and Mary; and Yale University.

In May 1999, ViDe expanded its membership from the four founding institutions (Georgia Institute of Technology, The University of North Carolina-Chapel Hill, The University of Tennessee, Knoxville, and North Carolina State University) to include nine additional institutions/organizations sharing ViDe technical directions and goals.

3. International Center for Advanced Internet Research (iCAIR)

The mission of the center is to accelerate leading-edge innovation and enhanced digital global communications through advanced Internet technologies, in partnership with the international community.

The Center accomplishes that mission by undertaking projects in four key areas: advanced Internet applications, advanced middleware and metasystems, advanced infrastructure, and policy. Virtually all sectors of the national economy require new types of network-based applications supported by an advanced information infrastructure -- an internetworking fabric capable of providing high-performance, reliable, high-capacity communication services that can be rapidly scaled and readily managed.

4. NYSERNet

NYSERNet has been involved in video-over-IP conferencing technologies since 1996 to learn how the technologies can assist New York's academic community in education and research. NYSERNet's focus has been to examine these technologies as they apply to collaborative research, telemedicine, and distance learning.

Its mission is to enable collaboration and promote technology transfer for research and education. The major means it uses to accomplish this mission is to advance network technologies and applications. Another goal of NYSERNet is to expand the benefits of advanced networking to government, industry, and the broader community.

5. North Carolina Networking Initiative

The North Carolina Networking Initiative (NCNI) is a next-generation information technology, networking, and Internet program with a hybrid focus. One focus is on the research, applications development, and infrastructure of the next-generation technologies, and the other is on early deployment of these technologies and capabilities to serve the production needs of education, business, and government. One of the first activities was to establish the infrastructure for the North Carolina GigaPop.

The NC GigaPop was initially established in May 1996 by Duke University, North Carolina State University (NCSU), UNC at Chapel Hill, MCNC, Cisco, IBM, Nortel and Time Warner Communications. In early 1998 Wake Forest University and Alcatel joined as partners, and discussions are being held with other potential partners from industry and universities.

The intent from the outset has been to address both academic and business interests, with the result that NCNI maintains a healthy tension between advanced research opportunities and early deployment of production services and applications. One of the goals is to exploit emerging technologies and applications to the competitive advantage of the universities and businesses participating in NCNI.

6. MREN/CIC

Founded in 1958, the CIC is the academic consortium of 12 major teaching and research universities. The continued goal of the CIC is to sustain and enhance current cooperative academic activities among the institutions and to address the ever-changing needs of higher education from curriculum to technology to research.

The MREN/CIC Digital Video Project emphasizes the investigation of leading-edge digital video technologies and the identification of applications that can benefit from high-end digital video technology.

7. CAVNER (Center for Advanced Video Network Engineering & Research)

CAVNER is an independent research initiative dedicated to the promotion and application of networked video technologies. Evolving from work done in the early 1990s at the University of North Carolina at Chapel Hill, CAVNER coordinates and promotes relationships between developers of video network technology and end users. CAVNER's primary goal is to effectively test and utilize new, experimental video network applications in real-world situations. Collaborators benefit from access to uncommercialized, yet practical, technology-in-development.

8. The Southeastern Universities Research Association (SURA)

SURA is a consortium of colleges and universities in the southern United States, including the District of Columbia, established in 1980 as a nonstock, nonprofit corporation. SURA serves as an entity through which colleges, universities, and other organizations may cooperate with one another and with government in acquiring, developing, and using laboratories and other research facilities and in furthering knowledge and the application of that knowledge in the physical, biological, and other natural sciences and engineering. SURA's IT Strategy is in direct support of the SURA mission, fostering collaboration through enhancements to regionally available network infrastructure and through explorations and/or sponsorship of investigations into key developing information technologies. SURA initially sponsored and continues to support ViDe (The Video Development Initiative) and also undertakes its own activities in digital video with staff serving as active members of ViDe and also as current co-chair of the Internet2 Digital Video Steering Committee.

RM3D Initiative and the MPEG Formats

(IDG) -- The Web3D Consortium on Tuesday announced the Rich Media 3D (RM3D) initiative, aimed at developing an open standard for rich media content, including 3-D graphics, video, and audio, to be transmitted over the Internet and used in digital broadcast applications.

The Web3D Consortium has created an RM3D working group that aims to develop a straightforward way of communicating content to ensure that new types of Internet devices, including Internet terminals and set-top boxes, can receive and present rich media, said Neil Trevett, president of the Web3D Consortium. The working group also aims to simplify the development of content, Trevett said in an interview.

The initiative is being driven by equipment makers and companies that supply graphics and content, the Web3D Consortium said in a release Tuesday. Representatives of 3Dlabs, ATI Technologies, Eyematic Interfaces, iVAST, OpenWorlds, Shout Interactive, Sony, SRI International, and the Austrian company Uma make up the working group. They intend to publish an initial draft standard next month and final specifications in December.

Much of the technology behind an RM3D specification already has been developed by Sony for the emerging interactive television market. Content developers have become involved in further development of the RM3D standard because a lack of open standards is an impediment to their business, the Web3D Consortium release said.

"Sony has done significant work in this area and is offering it to the group as a potential starting point," Trevett said.

One of the first tasks of the working group will be to determine if the work Sony has done so far -- under the project name Blendo -- is a reasonable starting point, Trevett said.

"I personally expect the group to embrace Blendo, which Sony has generously put into the open-standards track," Trevett said. "That will be the mechanics by which we can achieve rapid progress [on the standard]."

The RM3D initiative is intended to closely interoperate with the MPEG-4 standards group, to enhance the Web3D Consortium's contributions to that standard, so that in future versions of the MPEG-4 specification it will be possible to include 3-D graphics along with audio and video. RM3D will also interoperate with current Web3D technologies such as VRML (Virtual Reality Modeling Language).

Established in 1988, the Moving Picture Experts Group (MPEG) is a working group of the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC). MPEG is open to experts who are "duly accredited by an appropriate National Standards Body," according to the group's home page (www.cselt.it/mpeg/).

While MPEG-4 is one of the latest standards birthed by the group, MPEG-1 is the standard on which products such as Video CD and MP3 are based. MPEG-2, essentially developed for the compression and transmission of digital TV signals, is the standard on which products like digital TV set-tops and DVD are based. And what of MPEG-3? Targeted at HDTV (high definition television), it was folded into the MPEG-2 standard.

MPEG-4, formally referred to as the "Coding of Audiovisual Objects," is the standard for multimedia on the Web. In 1993, MPEG-4 started out as "Very Low Bitrate Audiovisual Coding."

While MPEG-1 and MPEG-2 deal with frame-based video and audio, the MPEG-4 standard describes digital AV scenes as "AV objects" that have certain relations in space and time. Because it's based on objects, MPEG-4 is a powerful standard that offers a new kind of interactivity (with each AV object, and at the levels of coding, decoding or object composition). It also allows for the integration of objects of different natures (e.g. natural video, graphics and text), according to the group.

The group says that targeted applications for the standard include: Internet multimedia, interactive video games, interpersonal communications (videoconferencing, videophone); interactive storage media (optical disks); multimedia mailing; networked database services (via ATM, etc.); remote emergency systems; remote video surveillance; wireless multimedia; and broadcasting.

In terms of specific functionality, MPEG-4 provides a standardized way to describe a scene. For example, it allows the technology user "to place media objects anywhere in a given coordinate system; apply transforms to change the geometrical or acoustical appearance of a media object; group primitive media objects in order to form compound media objects; apply streamed data to media objects, in order to modify their attributes (e.g. a sound, a moving texture belonging to an object; animation parameters driving a synthetic face); and change, interactively, the user's viewing and listening points anywhere in the scene," according to "Overview of the MPEG-4 Standard/Executive Overview," edited by Rob Koenen. (See figure 1, or original chart in Web overview.)
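The scene-description idea quoted above (placing media objects in a coordinate system, applying transforms, and grouping primitive objects into compound ones) can be illustrated with a small scene-graph sketch. This is not the MPEG-4 scene description format or any real MPEG-4 API; the MediaObject class, its fields, and the flatten function are hypothetical names invented for this example, and the transform is limited to a 2-D offset and a uniform scale.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


@dataclass
class MediaObject:
    """A node in a toy scene graph (hypothetical, not an MPEG-4 structure)."""
    name: str
    kind: str                         # e.g. "video", "graphics", "text", "group"
    position: Point = (0.0, 0.0)      # offset in the parent's coordinate system
    scale: float = 1.0                # uniform scale applied to this subtree
    children: List["MediaObject"] = field(default_factory=list)

    def add(self, child: "MediaObject") -> "MediaObject":
        """Attach a child, forming a compound object; returns self for chaining."""
        self.children.append(child)
        return self


def flatten(node: MediaObject, origin: Point = (0.0, 0.0),
            scale: float = 1.0) -> Iterator[Tuple[str, Point, float]]:
    """Yield (name, absolute position, absolute scale) for every node."""
    x = origin[0] + node.position[0] * scale
    y = origin[1] + node.position[1] * scale
    total_scale = scale * node.scale
    yield node.name, (x, y), total_scale
    for child in node.children:
        yield from flatten(child, (x, y), total_scale)


if __name__ == "__main__":
    # A compound object: a video of a presenter with a text caption below it.
    presenter = MediaObject("presenter", "group", position=(100.0, 50.0), scale=0.5)
    presenter.add(MediaObject("talking_head", "video"))
    presenter.add(MediaObject("caption", "text", position=(0.0, 120.0)))
    scene = MediaObject("scene", "group").add(presenter)

    for name, pos, s in flatten(scene):
        print(name, pos, s)

    # Interactivity in miniature: moving the group moves everything inside it.
    presenter.position = (300.0, 50.0)
    print(dict((n, p) for n, p, _ in flatten(scene))["caption"])
```

The design point is that the caption and the video are never positioned individually on screen; they inherit the group's transform, so a viewer-driven change to one compound object updates all of its parts.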

Of course, MPEG-4 is not the MPEG to end all MPEGs. Work on MPEG-21, "Multimedia Framework," began this past June, and current efforts are focused on MPEG-7, "Multimedia Content Description Interface."

MPEG-4 was originally convened to address videophone-type video, but the group soon realized this was not going anywhere. Instead, it turned its attention to video that anticipates the entirely digital creation and repurposing of video. It is not passive video, the way MPEG-1, MPEG-2, and other video streams are typically used on campus now, which is much like a tape player: you start the video at the beginning of your clip and play it to the end, and you can't do much other than watch it and maybe scroll back to the beginning or to the middle.

But in MPEG4, video is broken up into modular objects, so you can combine graphics, live video, and stored video. The video can even respond to cookies or profiles that it detects on the client, changing the appearance or other aspects of the video stream for the individual user.

Another question is, what is the use of MPEG2 today? Generally, it is used in distance education, where dedicated classrooms have decoder boards installed. That is, a program is set up to deliver high-quality education to specialized audiences, and MPEG2 suits those purposes well. You're not trying to distribute over a whole campus; you're doing dedicated, specific work. It is also good for such things as medical imaging classes, where you need very high resolution.

Finally, people want to know when the MPEG2 decoding software will be available; again, maybe the third quarter of this year. But what is going to happen when people start using MPEG4? That is a different format with a somewhat different purpose: it is meant more for interactive use than, for example, for store-and-forward or streaming.

Results of a Mini Survey

Question: where do you believe interactive digital video and visualization/VR/virtualized reality will be by 2010?

Security at Home:

Security system (automated and personnel) is redundant. Hidden cameras will be everywhere including within walls. Sensors in the floor will track a person to within 6 inches.

Interactive video will augment burglar alarms, augment fire alarms, and allow us to watch out for pests, our children, and act as a doorbell.

There will be miniature robots in the shapes of animals and cars traveling through the house during the night, equipped with cameras and alarms.

Security Outside Home:

For pedestrians to make sure their route is safe
For customers to make sure their ATM transactions are safe
For shoppers to keep a watch on their vehicles, while shopping in malls, at work, at school
For parents to find lost children in stores and interact with them

Accountability: Accessible Cameras:

For parents to make sure their children are being treated well, while in school
For children to make sure their parents are being treated well in retirement homes
For us to make sure patients are being treated well in hospital

Science/Education:

1. If everyone has video input/output capabilities, anytime you want to do something (say, mend a fence) you could do a lookup on the vidnet for someone else doing it, and get them to help you (sort of an interactive tutorial.)

2. Study Groups - study with other people around the world in real time.

3. Focused Study Enabler -- perhaps fill out a form/test to get a feel for your body of knowledge, then the machine gives content according to the gaps in your knowledge.

4. Being able to see missed classes

5. Guest Speakers/Lecturers

6. Use of camera-enabled microscopes for research/education

7. Interactive virtual high schools for the home schooled will become commonplace, since there are no barriers to prevent home schoolers from taking advantage of this new technology. Public schools, by contrast, can't adapt as quickly to new opportunities and dynamically changing circumstances.

Wearables:

1. Fashion: easily change color and textures of garments upon a whim, resulting in smaller fashion "seasons", with new styles downloaded from Internet, with new types of fashion designers

2. "PAN" - personal area network, that contains all of one's vital stats (heart rate monitor, blood pressure, breathing rate, temperature), which wirelessly ties into any other network when it enters its aura (the Vehicle Area Network, the Local Area Network, Municipal Area Network). These will be built into the clothing and driven by fabric circuitry. If the person is under medical supervision, an ambulance can immediately be summoned if their heart stops, and the computer can contact medical authorities. It will eventually have the capability to administer first aid (CPR, tourniquet, heart shockers.)

For military applications, there will be the ultimate chameleon suit, able to camouflage the wearer totally into their background. Also, could camouflage out any of their other traces (body heat perhaps powering the CPU rather than emanating into the air for any infrared pick-ups to notice.)

As an input for avatar technology: An electronic suit could easily be used as an I/O for controlling an avatar, raise your arm, and the avatar raises its arm. Also, for training, to have the clothes move your body through the "perfect" golf shot, over and over until it is hard-wired. Similarly, for working out, the suit could exercise your muscles by making your movements be working against whichever PSI you chose, or, it could simply send electric shocks to the muscles in need of exercise. It could also help prevent RSI (although, as an ultimately ergonomic input it already negates the concept of the mouse/keyboard tradition) by forcing you to exercise your muscles every once in a while.

Wearing a computer also opens doors to being a constant recorder, processor, with the ability to do work from anywhere, an all-around historian.

3. Eyeglasses -- glasses that allow one to see the world around them as well as data from many different sources, with the ability to change focus and interact on many different levels...being wired in, at all times.

Education/Lecturing/"Virtual Lectern" -- Can have access to texts and notes without having to look down at the lectern. Teachers can also have real-time input as to which students are actually paying attention and focus upon them, or be able to better answer questions and immediately give information, as well as link them through to sources for following up on their answer. Students can enrich the experience by having bookmarks to relevant databanks offered, and by having games and videos playing that show the classroom lesson with the teacher's narration.

Medicine -- Surgeons can have access to readouts about bodily functions without needing to look away from the work at hand. Also, cut-lines and under layers can be mapped over their real vision, thus cutting down on mistakes and subsequent infections.

Transportation -- "smart cars," which come before teleportation, can have all readouts mapped into the periphery of one's vision, with associated maps of exact distances to destinations, surrounding cars, interesting landmarks being passed, potential arrival times, and vehicle readouts.

"Living" -- a clock that allows for viewing other wavelengths of light...to see if the plants are okay, where the child has gone, if the water is about to boil.

Coffee machine knows what time your clock went off, so the coffee is ready when you enter the kitchen, or you might go there because of the pleasant coffee air.

The bathroom mirror shows a short version of today's news.

Look through the eyes of any of the multitude of cameras available at any given time ("if agitated, take ten deep breaths and look over Honolulu Bay.")

Keep track of things in the oven, while you are out of the house. If you notice the turkey is done through your cam, you instruct your oven to shut down.

Have your PIM flash messages across your vision ("shareholders' meeting in 30 minutes"). Allow it to work as a beeper/email, to flash messages from people contacting you.

Look in on any of the cameras you have set up (home security, retirement home, daycare center)

Allow it to be able to know your eyeglass prescription and make viewing through them unnecessary, also, can work as binoculars, magnifying glasses, satellite GPS output (same for earphones...)

Interesting Stuff

Interactive Video

Smart Rooms act like invisible butlers. They have cameras, microphones, and other sensors, and use these inputs to try to interpret what people are doing in order to help them. We have already built smart rooms that can recognize who is in the room and can interpret their hand gestures, and smart car interiors that know when drivers are trying to turn, stop, pass, etc., without being told. The Smart Room can provide an unencumbered user-interface to a virtual environment.

The Artificial Life Interactive Video Environment (ALIVE)

The Artificial Life Interactive Video Environment (ALIVE) is a virtual reality system where people can interact with virtual creatures without being constrained by headsets, goggles, or special sensing equipment. The system is based on a magic mirror metaphor: a person in the ALIVE space sees their own image on a large-screen TV as if in a mirror. Autonomous, animated characters join the user's own image in the reflected world.

MIT Interactive Cinema Group: Some Lessons Learned and Thoughts about the Future

The projects described in this article should have made it abundantly clear that video conferencing may indeed change the way we go about doing legal education and law practice. The transnational simulation project vividly illustrates the critical role that video conferencing can play in the increasingly global legal domain of the future and, equally important, how that technology can assist law students to prepare for the technologically based law practice of that future. In particular, the project demonstrated that video conferencing can facilitate negotiated solutions to complex legal disputes, especially those arising in an international setting.

The cross examination exercise provides powerful support for the belief that video conferencing can do much to reduce the exploding costs of litigation and do so without sacrificing the integrity of the judicial processes in connection with which the technology is used. More empirical study of such uses of video conferencing is needed, but the indications are that the results will be strongly positive. The virtual bankruptcy proceeding project similarly reveals how video conferencing can be used to reduce costs and wear and tear on judges, attorneys and clients, especially where courts are located at such distances from attorneys and clients that the administration of justice has been detrimentally affected.

So, what is video conferencing good for? Hopefully, I have demonstrated that video conferencing has many possible uses in legal education and law practice, only a few of which have been described here. Advances in video conferencing technology and wider deployment of higher quality and lower cost equipment and facilities should hasten efforts to harness the power of this exciting new technology and guarantee that it will change the way we think about and go about doing legal education and law practice. The introduction of quality desktop video conferencing, in particular, seems likely to ensure that this will happen sooner rather than later.

Interactive Video: Foundations of Multimedia/Hypermedia

Historically, the term interactive video has referred to a pioneering form of interactive multimedia. While there has never been a precise, agreed upon definition of interactive video (experts in the field have their own concepts and individual definitions), most definitions share the common notion that interactive video involves the use of a video delivery system, usually videodisc or sometimes videotape, designed in such a way that it responds to choices made by the individual user. (Note: the literature on distance education has a separate and distinct meaning of interactive video -- a video delivery system capable of full two-way audio and video interconnection between two or more sites. That meaning of interactive video is not addressed in this paper.)

Uses of Interactive Video

A few years ago, an estimate by The Videodisc and Multimedia Monitor, the leading publication in the field, suggested that the markets for interactive videodisc use broke down as follows:

31% Training
24% Point of purchase applications
17% Military and government use
10% Education
6% Medical
12% All others

Although percentages have likely shifted today, these categories remain important. Training in business and industry has traditionally been one of the major uses of interactive video and multimedia. The military and government, while making use of videodisc's archiving capabilities, have also used interactive video extensively for training.

The U.S. Army's Project EIDS, Electronic Information Delivery System, several years ago surpassed General Motors' interactive video network as the largest single interactive video project ever. Point of purchase (POP) applications remain important in a number of venues. Interactive video has been used in many retail outlets as well as in the real estate industry as a sales tool. And, interactive video continues to be used for information dissemination in places like Walt Disney World's EPCOT Center and the St. Louis Zoo.

Interactive Video in Education

While education lagged behind many other sectors (so what else is new?), interactive video made a significant impact in education. Over half of U.S. schools have videodisc players, and there are thousands of educational videodisc titles. In Indiana, a Texas-produced ninth grade physical science curriculum on interactive video, TLTG Physical Science, was introduced at Lawrence Central High School in Indianapolis. A group from Brownsburg was honored with a national award for a life sciences videodisc they created, Putting the Zoo in Zoology, using footage shot in part at the new Indianapolis Zoo. At Purdue, Professor Sandra Abell developed videodisc materials for elementary science teacher preparation. Videodiscs have also been created at other universities in the state for content area instruction. There have been many exciting projects.

There are a number of uses of interactive video in education. These include the following:

1. Archive or Visual Database

Many videodiscs are essentially archives or visual encyclopedias. For example, discs in art or biology can contain hundreds or thousands of excellent stills. These can be accessed by students or teachers using a level 1 system for research or review, or they can be used for repurposing in a level 3 system.

2. Lecture Demonstration / Illustration

Videodiscs can be an excellent lecture tool. Again, with only a level 1 system, the teacher can select appropriate video information to illustrate lectures. Different visuals can be selected for testing.

3. Interactive Teaching and Learning

With a level 3 system, students can receive interactive instruction. This can be large group, small group, or individualized as needed. One of the most innovative uses of the videodisc, the Jasper Woodbury series developed at Vanderbilt University, relies on videodisc-based stories to anchor students' learning about topics in mathematics.

4. Counseling Tool

Videodiscs can be a counseling tool. A program called College USA makes information about many colleges available to high school guidance counselors.

5. Others

The variety of options can go on and on. With appropriate hardware and software, for example, images from a videodisc can be "captured" and used in desktop publishing software to illustrate handouts or student reports. Students can learn to develop interactive video programs, teachers can use videodiscs for testing, and so on.

All it takes to get started is a little equipment and the desire. It is best to start simply. One can begin with a level 1 system and a small collection of videodiscs. After getting comfortable, you can move on to level 3. Interactive video is a fantastic tool, but like any educational tool it relies on the knowledge and experience of educators for best use.

Recent National Science Foundation Awards

1. Title: Interacting with the Visual World: Capturing, Understanding, and Predicting

This is the first year funding of a five-year continuing award. This research program is geared towards making significant advances to the science and engineering of visual information processing, and addresses fundamental problems in the fields of computational vision, computer graphics, and human-machine interactions. Today, images and video clips are ubiquitous on the Internet, digital video is changing the way entertainment is produced, distance learning is used in various facets of education, and advanced visual interfaces to machines are around the corner. However, at present there are severe limits to the extent to which a user can benefit from visual information, because virtually all of this information is presented in its raw form, that is, the way it was captured. The goal of this project is to develop the technical tools needed to achieve a variety of complex manipulations of visual data. These tools will enable a user to freely explore, interact with, and create variations of the physical world being presented. For instance, a user may remove and add objects to an image of a scene, vary lighting conditions, change the materials of surfaces, or view the scene from a novel perspective.

This project encompasses a comprehensive research program for creating the science and technology base required to enable such advanced manipulations of visual data. The general research problem may be stated as follows: Capturing, understanding, and predicting the appearance of our everyday world. Success in this domain of research necessitates a unified approach to open problems in two fields: computational vision and computer graphics. The research effort will focus on five pertinent areas: sensing, modeling, estimation, generation, and evaluation. The tangible contributions will be in the form of sensors that provide new types of visual information; complex models of materials, reflectances and textures; estimation algorithms that use the team's new models to recover scene properties from minimal data; advanced rendering techniques; and a set of comprehensive image/video databases for evaluation of work in this field. The results will impact numerous application domains, including digital imaging, entertainment, virtual environments, distance learning, e-commerce, interactive product design, art restoration, architectural modeling, restorative surgery, and surface inspection.

2. Title: CAREER: Bandwidth Allocation Techniques for Video-on-Demand Systems

As networking and multimedia technologies continue to advance, new video-based applications such as digital libraries and video-on-demand services will become increasingly common, requiring the efficient network transmission of compressed stored video streams. For constant-quality video streams, however, the efficient allocation of network resources is complicated by several factors. First, compressed digital video data is much larger than textual data and hence places much larger bandwidth requirements on the network. Second, compressed digital video data can exhibit significant burstiness in bandwidth requirements at multiple time scales. In particular, stored video content will typically be much more complex than "live" videoconference type video streams, resulting in much greater long-term burstiness. Finally, due to the continuous nature of digital video, stringent guarantees on network jitter and delay may be required (especially for live-video transmission).
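One common way to put a number on "burstiness at multiple time scales" is the peak-to-mean ratio of the bandwidth averaged over windows of different lengths. The sketch below assumes that measure (the abstract does not name one) and uses a made-up frame-size pattern, one large frame followed by three small ones, purely for illustration.

```python
def peak_to_mean(frame_sizes, window):
    """Peak-to-mean ratio of bandwidth averaged over `window` consecutive frames."""
    if window < 1 or window > len(frame_sizes):
        raise ValueError("window must be between 1 and the number of frames")
    mean = sum(frame_sizes) / len(frame_sizes)
    window_sum = sum(frame_sizes[:window])
    peak = window_sum
    # Slide the window one frame at a time, keeping a running sum.
    for i in range(window, len(frame_sizes)):
        window_sum += frame_sizes[i] - frame_sizes[i - window]
        peak = max(peak, window_sum)
    return (peak / window) / mean


# Hypothetical frame sizes (arbitrary units), not measured data.
frames = [12, 2, 2, 2] * 3

for window in (1, 2, 4):
    print(window, round(peak_to_mean(frames, window), 3))
```

A ratio near 1 at a given window length means the stream looks smooth at that time scale; a stream that stays bursty even over long windows is the harder case the abstract describes for stored video.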

To efficiently allocate network resources for the delivery of compressed video streams, bandwidth-smoothing techniques have been introduced to take advantage of a priori information that may be available for a video stream. The bandwidth smoothing techniques that have been presented thus far have not considered the interactions of bandwidth smoothing algorithms with network delivery mechanisms. The proposed research will address the integration of bandwidth smoothing techniques into current networking mechanisms in two key areas:

1. Bandwidth Smoothing over ATM - ATM networks are a popular networking technology because they provide some minimum rate guarantees as well as different classes of service. In this study, we will investigate the interactions between bandwidth smoothing techniques and three of ATM's classes: constant-bit-rate (CBR), variable-bit-rate (VBR), and the available-bit-rate (ABR) services. In the first part, we will investigate the efficient mapping of bandwidth smoothing algorithms into the CBR and VBR services. In the second part of this subtask, we will investigate the mapping of bandwidth smoothing techniques into the ABR service class. In particular, the ABR service class provides a minimum cell rate guarantee and, in times of relatively low network load, an available cell rate that allows applications to send more than their guaranteed minimum cell rate. We will investigate techniques for shaping video streams into the ABR minimum cell rate while taking advantage of the available cell rate to enhance the video stream when possible.

2. Bandwidth Allocation Techniques for Best-Effort Networks - Many of the current approaches for delivering video across best-effort networks such as the Internet do so by monitoring network feedback and altering the video frame rate or video frame quality to fit within the available resources. For stored video streams, these algorithms typically do not take advantage of the a priori information available from stored video, resulting in video frame rates delivered to the user that are more variable than necessary. In this study, we will investigate techniques for the efficient streaming of stored video across best-effort networks that take advantage of the a priori information to smooth the video frame rate delivered to the user.
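The ABR idea in the first area above (fit the stream into the guaranteed minimum cell rate, and use any extra available cell rate to enhance it) can be sketched as follows. The split into a "base" layer and an "enhancement" layer is an assumption made here for illustration; the abstract does not say how the enhancement would be carried. Rates are in arbitrary units per time slot.

```python
def plan_abr_slots(base_rate, enhancement_rate, min_cell_rate, available_rates):
    """Return a (base_sent, enhancement_sent) pair for each time slot.

    base_rate        -- rate of the layer that must always get through
    enhancement_rate -- extra rate that improves quality when it fits
    min_cell_rate    -- rate guaranteed by the network
    available_rates  -- rate the network reports as available in each slot
    """
    if base_rate > min_cell_rate:
        raise ValueError("base layer must fit within the minimum cell rate")
    plan = []
    for available in available_rates:
        # The guarantee means the usable rate never drops below the minimum.
        usable = max(available, min_cell_rate)
        spare = usable - base_rate
        plan.append((base_rate, min(enhancement_rate, spare)))
    return plan


# Hypothetical slot-by-slot available rates.
print(plan_abr_slots(base_rate=3, enhancement_rate=4, min_cell_rate=3,
                     available_rates=[3, 5, 10, 1]))
```

The base layer is never interrupted, while the enhancement layer grows and shrinks with network load.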

The solutions in this project will be evaluated using two methods. First, the solutions will be evaluated and refined through simulation studies using captured and compressed video data. In the later stages of this project, we will implement a simple streamed video delivery system that will demonstrate the effectiveness of the proposed solutions. We will evaluate the proposed solutions using (1) the Internet, (2) the Ohio Computing and Communication ATM Research Network (OCARNet) which connects six institutions within the state of Ohio, and (3) vBNS - the very high speed backbone network that we have access to through the Ohio Supercomputing Center.

The contribution of this project will be to answer the following question:

What is the best way to take advantage of a priori information in the delivery of stored video streams across networks, which may provide varying guarantees of network bandwidth?

Through the intelligent management of network resources, a client-side buffer, and the a priori information available from stored video streams, the solutions developed in this project will allow us to deliver the highest video quality to the user. We expect that this work can serve as a catalyst for the development of future stored-video applications such as digital libraries, distance learning, and video-on-demand.
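To make the role of the client-side buffer and the a priori frame sizes concrete, here is a minimal sketch of the simplest form of smoothing: sending at one constant rate chosen from the known frame sizes. It is not one of the algorithms proposed in the award. The timing model (one frame played per slot, playback starting after `startup_delay` slots) and the frame sizes are assumptions.

```python
from itertools import accumulate


def min_constant_rate(frame_sizes, startup_delay=0):
    """Smallest constant per-slot rate at which no frame arrives late.

    Frame k (1-based) is played at the end of slot k + startup_delay, so the
    first k frames must fit into rate * (k + startup_delay).
    """
    cumulative = accumulate(frame_sizes)
    return max(total / (k + startup_delay)
               for k, total in enumerate(cumulative, start=1))


def peak_buffer(frame_sizes, rate, startup_delay=0):
    """Largest client buffer occupancy when sending at a constant `rate`."""
    total = sum(frame_sizes)
    sent = consumed = peak = 0.0
    for slot in range(1, len(frame_sizes) + startup_delay + 1):
        sent = min(total, sent + rate)       # stop once everything is sent
        peak = max(peak, sent - consumed)    # occupancy just before playback
        k = slot - startup_delay             # frame played at end of this slot
        if k >= 1:
            consumed += frame_sizes[k - 1]
            if consumed > sent + 1e-9:
                raise ValueError("rate too low: client buffer underflows")
    return peak


# Hypothetical frame sizes (arbitrary units), not measured data.
frames = [12, 2, 2, 2] * 3

print(min_constant_rate(frames))                    # no startup delay
print(min_constant_rate(frames, startup_delay=3))   # three slots of prefetch
print(peak_buffer(frames, 4.0, startup_delay=3))    # buffer needed at that rate
```

The trade-off is visible in the numbers: a short startup delay and a modest client buffer let the server send at a much lower rate than the largest single frame would otherwise require.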

3. Title: DLI Phase 2: Informedia-II: Integrated Video Information Extraction and Synthesis for Adaptive Presentation and Summarization from Distributed Libraries

The Informedia-II Project continues the pursuit of search and discovery in the video medium. This phase will transform the paradigm for accessing digital video libraries through meaningful, manipulable overviews of video document sets, multimodal queries, and adaptive summarizations of very large amounts of video from heterogeneous distributed sources. Video information collages are the key technology in Informedia-II and will be built by advancing information visualization research to effectively deal with multiple video documents. A video information collage is a presentation of text, images, audio, and video derived from multiple video sources in order to summarize, provide context, and communicate aspects of the content for the originating set of sources. The collages to be investigated include chrono-collages emphasizing time, geo-collages emphasizing spatial relationships, and auto-documentaries which preserve video's temporal nature. Users will be able to interact with the video collages to generate multimodal queries across time, space, and sources.

Together with external partners, the project will also create an accessible, lasting digital video archive of historical, political and scientific relevance. Vast collections of video and audio recordings have captured the events of the last century, yet these remain a largely untapped resource of historical and scientific value.

4. Title: Textual Information Access for the Visually Impaired

An ever-increasing segment of the population suffers from low vision resulting from complications of disease and old age. Surveys conducted by one of the Co-PIs as part of a previous project have determined that the key information not available to people with low vision is textual information, usually of a directive or warning nature. For example, shopping in a large department store in a mall might involve looking for signs indicating where the store is, reading aisle signs in the store, and looking at product names, labels, and prices. This research will develop a "seeing-eye" computer to help people with low vision to observe and receive such information, so that they can participate more efficiently and comfortably in everyday activities, and thereby lead more fulfilling and productive lives. The system will be composed of a digital video camera, computer, user interface, and speech or magnified visual output that can detect textual information in the environment, understand it using OCR, and provide it to the user who either has low vision or is blind.

To achieve these goals, the PIs will, in collaboration with colleagues at Johns Hopkins University, build prototype systems, over the first six months and then over the first two years, using mostly existing technology and extensions to vision algorithms we have developed for identification of text regions in images and OCR. These prototypes can be evaluated on volunteer patients at the Wilmer Ophthalmological Institute and the National Federation of the Blind. The functionality and range of applicability of our prototypes will necessarily be limited. Simultaneously, the PIs will work on long-term research problems that must be addressed to develop next generation seeing-eye computers with greater scalability and capability. In year three, patient-volunteers at Wilmer and at the NFB will perform evaluations of the developed prototypes. Subsequently, successful results will be commercialized and brought to the larger patient body (as have previous developments at Wilmer).

Fundamental research problems to be addressed include: real-time algorithms for detection and rectification of text on planes and cylinders subject to perspective distortions; OCR from digital video, and OCR for text on textured backgrounds; and more robust and efficient algorithms and systems for stabilization and super-resolution of text blocks from video streams.
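For text on a plane, rectification of perspective distortion can be illustrated with a homography estimated from four corner correspondences. The sketch below is a textbook construction, not the project's algorithm. It assumes the four corners of a sign have already been located in the image (the detection step is not shown), and the coordinates used are made up.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 homography mapping four src points to four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one image point through the homography."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corners of a sign as seen by the camera (skewed quadrilateral)
# and the upright 100 x 50 rectangle they should map to.
seen = [(10.0, 20.0), (110.0, 40.0), (100.0, 90.0), (5.0, 70.0)]
upright = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]

h = homography(seen, upright)
for corner in seen:
    print(apply_homography(h, corner))
```

Once the homography is known, every pixel inside the quadrilateral can be resampled into the upright rectangle before the text is passed to OCR. Text on cylinders needs a different model and is not covered by this sketch.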

5. Title: The Digital Sky: Bringing Cosmology into the Classroom

It is becoming increasingly apparent that the next decade will mark the beginning of a golden age for observational cosmology. Technological advances have now made it feasible to map the properties and distributions of galaxies in the local and distant Universe with unprecedented detail. New astronomical surveys are underway, or nearing completion, that will produce images and spectroscopy for hundreds of millions of objects each with measurements covering the full electromagnetic spectrum (from X-ray wavelengths through to radio frequencies). In the near future we will no longer be restricted to applying for telescope time at national or private facilities. Instead, people will be able to "dial-up" a region of the sky (a virtual observatory is becoming a reality). This new paradigm for undertaking research will have profound implications for research at the undergraduate, graduate and postdoctoral levels.

One of the fundamental questions to be addressed with these new multi-frequency surveys is how do galaxies evolve as a function of redshift or lookback time. Extant photometric and spectroscopic surveys provide a framework for this evolution, finding a rapid increase in the total star formation rate from a redshift of z = 0 to redshift z = 1 (a factor of 10 increase in luminosity). The limitation of these surveys is that they extend over very small regions of the sky (for instance the deepest image ever undertaken, the Hubble Deep Field, subtends only 2 arcminutes on a side) and comprise only a few hundred galaxies. Consequently, we do not have the numbers of galaxies to identify anything more than the broad features of galaxy evolution (i.e. our analyses are shot noise limited) nor do we survey sufficient volume to be able to say whether we are seeing a true reflection of how the Universe evolves (i.e. we are sample variance limited).
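A rough calculation shows why a sample of a few hundred galaxies is shot noise limited and why a field 2 arcminutes on a side is a very small piece of sky. The sketch assumes simple Poisson counting statistics, where the fractional uncertainty on a count of N objects is 1/sqrt(N); the value 300 is only a stand-in for "a few hundred."

```python
import math

# Area of the whole sky in square degrees (about 41,253).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def poisson_fractional_error(n_objects):
    """Fractional counting uncertainty, 1 / sqrt(N), for N objects."""
    return 1.0 / math.sqrt(n_objects)


def sky_fraction(side_arcmin):
    """Fraction of the whole sky covered by a square field of the given side."""
    area_sq_deg = (side_arcmin / 60.0) ** 2
    return area_sq_deg / FULL_SKY_SQ_DEG


print(poisson_fractional_error(300))   # a few hundred galaxies (assumed 300)
print(poisson_fractional_error(1e8))   # a survey of 10^8 galaxies
print(sky_fraction(2.0))               # a field 2 arcminutes on a side
```

Dividing a few hundred galaxies into bins of redshift, type, and environment leaves only a handful per bin, so the per-bin uncertainty is far larger than the whole-sample figure.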

Many of these issues will be resolved by the new multicolor sky surveys. The Sloan Digital Sky Survey alone will contain more than 10^8 galaxies and cover a huge volume of the sky. For the first time, there will be access to large, statistically complete samples of galaxies that can provide an accurate census of the local and distant Universe. Through the use of novel statistical techniques we can use the multicolor information within these surveys to estimate the physical properties of galaxies (i.e. their redshifts, spectral types and luminosities). From this we can then directly trace how galaxies change as a function of redshift, galaxy type and environment and thereby guide the theories of galaxy formation and evolution.

Beyond the insights into the physics of galaxy formation and evolution, these data will open new avenues for introducing students into the astrophysical sciences. These multifrequency surveys can be combined to form a digital sky: a database of stars and galaxies covering a large fraction of the night sky that is accessible to undergraduate and graduate students alike. Students will, in essence, have access to their own telescope and can explore or analyze the data at their own pace. Given the enormous research and teaching potential of these new sky surveys, a program of research and education designed to build the graduate and undergraduate astrophysics programs at the University of Pittsburgh will be carried out. The principal goals of these programs are:

To quantify how the properties of galaxies (their spectral types, star formation and luminosities) evolve from a redshift of z = 0 to z = 1 and to determine what physical processes drive these changes.
To trace the clustering of galaxies at low and intermediate redshifts (z < 1) as a function of lookback time, galaxy type and environment and to relate these observations to current cosmological theories of galaxy formation and evolution.
To develop the new virtual observatories into an undergraduate and graduate research and teaching environment (a "virtual telescope"), one that enables students to explore online astrophysical databases at their own pace.
To integrate computational sciences into astrophysics to enable a cross-fertilization of ideas and techniques at the graduate and undergraduate level.

6. Title: ITR: Interacting with the Visual World: Capturing, Understanding, and Predicting Appearance

This research program is geared towards making significant advances to the science and engineering of visual information processing, and addresses fundamental problems in the fields of computational vision, computer graphics, and human-machine interactions. Today, images and video clips are ubiquitous on the Internet, digital video is changing the way entertainment is produced, distance learning is used in various facets of education, and advanced visual interfaces to machines are around the corner. However, at present there are severe limits to the extent to which a user can benefit from visual information, because virtually all of this information is presented in its raw form, that is, the way it was captured. The goal of this project is to develop the technical tools needed to achieve a variety of complex manipulations of visual data. These tools will enable a user to freely explore, interact with, and create variations of the physical world being presented. For instance, a user may remove and add objects to an image of a scene, vary lighting conditions, change the materials of surfaces, or view the scene from a novel perspective.

7. Title: Direct visualization of the structure and dynamics of complex fluids during flow by confocal and epifluorescence microscopy

Studies of the structure and dynamics of complex fluids, particularly in the presence of flow, address fundamental questions of engineering science in fluid physics, rheology and colloidal science. The direct visualization techniques of confocal laser scanning microscopy and epifluorescence microscopy promise new developments in these areas by capturing actual pictures of colloidal particle positions and displacements in three and two dimensions, respectively. We propose an integrated program of research and education with the research aim of using these methods to directly observe complex fluid structure and dynamics during flow. The materials studied will be colloidal particulate gels and associative polymer solutions containing colloidal particles. The work will impact the areas of ceramics, paints, inks, digital storage media and DNA sequencing. In addition, our recent scattering and rheological studies of these materials have identified research questions in which direct visualization methods generate opportunities for unprecedented discovery.

The research plan involves the use of a specially designed and constructed shear cell that will be mounted on a confocal laser scanning microscope and an epifluorescence microscope. Experiments will encompass transient and steady shear flow in both linear and non-linear regimes. The associative polymers and fluorescent colloids necessary for the work will be synthesized in house by previously tested methods. Training for two graduate students, and research opportunities for a number of undergraduates, will be provided.

The research efforts will also synergistically promote parallel efforts in education and outreach. An undergraduate elective in polymers will be redesigned to teach the skills necessary to manipulate polymeric complex fluids at the molecular level. A research project requiring original analysis will be instituted that will reinforce this new educational material. A graduate course in complex fluids will be incrementally revised to include new results of direct visualization methods, as they become available, thereby providing a conduit between the research program and graduate education. In an outreach project, hands-on complex fluid projects will be developed for a program in which middle school students from groups traditionally underrepresented in engineering come to the department for tutoring and instruction.

8. Title: SE Research Infrastructure: Digital Campus: Scalable Information Services on a Campus-wide Wireless Network

Researchers at the University of California at Santa Barbara will implement a wireless-networked, distributed heterogeneous environment on campus and use it to conduct research in databases, networking, distributed systems, and multimedia. The PIs will focus on large-scale systems in which data is the critical resource and system services are based on various data manipulation functions including data collection, movement/delivery, aggregation/processing, and presentation. A significant part of the research will be conducted using a digital classroom, a remote classroom, and individual and team kiosks. Services such as lecture on demand, virtual offices, and remote learning will be provided using this infrastructure. Specific research issues that will be investigated include content-based access, personalized views, multi-dimensional indexing, smart end-to-end applications, joint source-network coding, scalable storage, reliable network service, information summarization, distributed collaboration, multimedia annotation, and interactivity.

Some of the Players

Vingage Corporation is a technology company that has changed how companies manage and deliver video over the Internet. We are the first and only company to create a video server that streams all major formats through real-time transcoding and delivery from a single file.

Virage is the leading provider of software products and application services that enable media and entertainment companies, enterprises, and consumers to publish, manage and distribute their video content over the Internet or Intranets. The company's products and services enable Internet users to find and use the video content they want, and then to interact with it in a way that is familiar and comfortable to them.

Digital Video Ready for Net work

Digital video has now reached a stage of maturity characterized by the emergence of comprehensive platforms and solutions for broadband delivery, making video as easy to locate, manage and use as text.

The Internet has significantly raised the bar on what people expect from all types of digital content. Users now expect to easily browse, search, download and share information any time, anywhere. Until recently, these types of interactions applied primarily to text-based information because traditional video, even in digital form, did not lend itself to these new levels of interactivity.

To fully enable video for the Internet, it must be fully indexed at a fine level of granularity, intelligently prepared in a device-independent way, and delivered by an application platform that can exploit the index information. For the purposes of this article, we will refer to application-server-driven indexed video as "applied video." But before delving into the meaning of applied video, it would be worthwhile to understand where digital video has been and where it is going.

Clearly, analog video in the form of VHS tape and broadcast/cable has been with us for a long time. The migration to digital media assembly processes and digital TV delivery has been a lengthy and painful process, one that is not yet complete. Web-based video, on the other hand, has evolved much more rapidly. Beginning with QuickTime, Microsoft Video for Windows and MPEG-1, digital video files became available for download in the early '90s. The growth of the Internet brought forth new streaming-based video standards, such as RealVideo and Microsoft Media, the latter based on the real-time interactive networked multimedia standard MPEG-4.

The driving force behind the recent evolution of digital video is the opportunity to commercialize content in new ways that go beyond traditional delivery and commerce mechanisms. Applied video introduces opportunities for content owners to develop new revenue streams.

Many of the steps in the value chain are shared with traditional video production and delivery processes. However, several new mechanisms are required to fully enable video for intelligent, interactive delivery in the broadband paradigm of "what I want, when I want it, and where I want it." For example, a user might find something of interest using a wireless PDA while commuting on the train, and then send it to a desktop at work for viewing on arrival. If there is one central ingredient to the recipe, it is the capture and intelligent use of video metadata: index information about the contents of the video.

The preparation of video for film, broadcast, training or strategic enterprise purposes follows essentially the same production and media assembly process as the preparation of applied video. The key departure from the traditional preparation workflow occurs where the finished content is transformed into intelligent, indexed and purposefully annotated content. The transformation is accomplished using a new breed of video indexing technology and tools, typified by VideoLogger and MediaSite Publisher, which can perform state-of-the-art signal analysis on the video and audio to extract useful metadata.

Metadata Elements

Metadata consists of time-stamped data elements such as keyframes, spoken text, speaker and face identification, on-screen text reading, logo detection, and so on. Each of these metadata elements acts as a reference back into the video content in much the same way that a card catalog unlocks the wealth of information in a library. The video index enables searching, fine-grained navigation, preview and association with ancillary activities, such as personalization and content-rights management.
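
As a rough illustration of the card-catalog idea, a video index can be modeled as a list of time-stamped elements that are searched and then used as seek points. This is a minimal sketch, not the data model of any particular logging product: the `MetadataElement` class, its field names, the `search` helper and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataElement:
    """One time-stamped index entry (hypothetical structure)."""
    start_s: float  # offset into the video, in seconds
    kind: str       # e.g. "keyframe", "speech", "face", "ocr", "logo"
    value: str      # extracted text or label


def search(index, term, kind=None):
    """Return entries whose value contains `term` (case-insensitive), in time order.

    If `kind` is given, only entries of that kind are considered.
    """
    term = term.lower()
    hits = [
        e for e in index
        if term in e.value.lower() and (kind is None or e.kind == kind)
    ]
    return sorted(hits, key=lambda e: e.start_s)


# Invented sample index for a long-form debate recording.
index = [
    MetadataElement(12.0, "speech", "Welcome to tonight's debate"),
    MetadataElement(95.5, "speech", "My plan for Social Security is simple"),
    MetadataElement(96.0, "ocr", "SOCIAL SECURITY"),
    MetadataElement(240.0, "speech", "On education, we must invest"),
]

# Each hit's offset is a reference back into the video content.
offsets = [e.start_s for e in search(index, "social security")]
```

Each returned offset is a place where a player could begin playback, which is what makes fine-grained navigation and preview possible.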

Metadata also enables video content to be effectively managed and delivered with targeted advertising and content-relevant e-commerce opportunities. If a user is watching a skiing video, show them an ad for snowboards, an option to enroll in a drawing for a ski-weekend getaway, and allow a one-click buying opportunity for lift tickets at a resort near the user. For premium content owners, indexed video is the basic starting point for broadband delivery mechanisms such as syndication, pay-per-view and subscription business models.

But video indexing using a video logging tool is only part of the story. Metadata is captured and offered to users through an application-server mechanism, while the video content itself is distributed through content delivery networks (CDNs) and edge caching infrastructure. The metadata must refer, in a time-accurate manner, back to the video content itself. In a multiplatform delivery model, this actually means referring back to many different physical renditions of a given piece of content.

Modem users need 56-kbit/second streams, while broadband users need 300-kbit/s streams and above. Video content for set-top box delivery must be broadcast-quality, while wireless devices are currently best served with text and thumbnail images rather than actual video.

Therefore, the transformation process must not only produce a rich metadata index of the content, but must also prepare a wide variety of renditions of the content, all of which are time-synchronized with the metadata.
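
The practical consequence of time-synchronized renditions can be sketched as follows: one metadata offset is valid for every encoding, so the server only has to choose the rendition that fits the viewer's link. The 56-kbit/s and 300-kbit/s figures come from the text above. The set-top bit rate, the URLs, and the `Rendition`, `pick_rendition` and `seek_request` names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One physical encoding of a piece of content (hypothetical structure)."""
    name: str
    kbps: int  # encoded bit rate in kbit/s
    url: str   # hypothetical stream location


RENDITIONS = [
    Rendition("modem", 56, "rtsp://media.example/debate_56k"),
    Rendition("broadband", 300, "rtsp://media.example/debate_300k"),
    # Assumed figure standing in for "broadcast quality"; not from the text.
    Rendition("settop", 4000, "rtsp://media.example/debate_4m"),
]


def pick_rendition(renditions, available_kbps):
    """Return the highest-bit-rate rendition that fits the link, or None."""
    fitting = [r for r in renditions if r.kbps <= available_kbps]
    return max(fitting, key=lambda r: r.kbps) if fitting else None


def seek_request(rendition, offset_s):
    """Build a request that starts playback at a metadata offset.

    Because all renditions share one timeline, the same offset
    can be used whichever rendition was chosen.
    """
    return f"{rendition.url}?start={offset_s:.1f}"


chosen = pick_rendition(RENDITIONS, available_kbps=384)
request = seek_request(chosen, 95.5)
```

When `pick_rendition` returns `None`, no video rendition fits the link, which corresponds to the case described above where a device is better served with text and thumbnail images.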

One popular solution to this problem is the SmartEncode process, which orchestrates video indexing with any number of simultaneous encoding processes in various bit rates and formats. Such indexed video is the first step to searchability and interactivity, allowing users to pull snippets of video of interest to them from repositories of long-form video, such as, "I don't want to watch the entire debate, I just want to see what the candidate said about Social Security and education."
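
One way to picture pulling snippets from long-form video is to turn the offsets of search hits into a short list of playback ranges, merging hits that fall close together. This is a sketch under stated assumptions and does not describe how any named product works: the `hits_to_clips` function, the 5-second lead-in and the 30-second clip length are arbitrary choices for illustration.

```python
def hits_to_clips(hit_offsets, lead_in_s=5.0, clip_len_s=30.0, duration_s=None):
    """Convert hit offsets (seconds) into merged (start, end) playback ranges.

    Each hit becomes a clip starting `lead_in_s` before the hit and ending
    `clip_len_s` after it. Overlapping clips are merged so the viewer is
    not shown the same footage twice.
    """
    clips = []
    for t in sorted(hit_offsets):
        start = max(0.0, t - lead_in_s)
        end = t + clip_len_s
        if duration_s is not None:
            end = min(end, duration_s)  # do not run past the end of the video
        if clips and start <= clips[-1][1]:
            # Overlaps the previous clip: extend it instead of adding a new one.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


# Invented hit offsets: two close together, one much later in the recording.
clips = hits_to_clips([95.5, 96.0, 240.0])
```

The two nearby hits collapse into a single clip, so the viewer sees one continuous passage for that topic and a separate clip for the later one.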

To address the new requirements of increased interactivity and multiplatform delivery, the popular digital video and streaming formats are evolving rapidly and new standards are emerging to handle metadata. De facto standards like RealVideo, QuickTime, Microsoft Media, and the Virage VDF metadata format have a solid foothold today, but new formats will become a factor in the future. MPEG-7, formally called the Multimedia Content Description Interface, is an emerging standard that provides a standardized description of various types of multimedia information. The standard does not specify the (automatic) extraction of video metadata, nor does it specify the search mechanism that can make use of the description. Several other metadata standards are also gaining acceptance, such as the SMPTE metadata working group efforts and the Advanced Authoring Format, and the highly flexible MPEG-4 format opens up further possibilities.

Video Matures

An important sign that digital video has matured to the point where one can rightfully call it applied video is the emergence of complete video solutions that support the intelligent applications discussed above, in a platform-independent manner. Today, video application servers have matured to a level on par with traditional Web authoring and content management solutions for text and graphics media. Turnkey video publishing platforms can be licensed or outsourced as hosted applications that support all the advanced capabilities and delivery platforms of interest. Premier vendors have established relationships and tight integration with CDNs, and can provide not only the basic video indexing and application hosting features, but also value-added editorial assistance to fully exploit e-commerce and ad targeting opportunities. Some vendors even provide layered application frameworks that allow the addition of syndication engines, personalization support, and community building features.

"To date, home-grown, piecemeal streaming solutions have frustrated content providers and held back broadband video both within the enterprise and on the Internet," said Jeremy Schwartz, senior analyst at Forrester Research Inc. (Cambridge, Mass.). "But with the advent of integrated video application platforms, content providers can now stop experimenting and start efficiently exploiting their rich media assets, either as strategic information or monetized content."

Video metadata and streaming media, effectively managed by a video application server, are key ingredients in achieving device independence in the delivery chain. For example, most wireless devices today cannot receive and display streaming video at any bit rate. However, they can display thumbnail images and transcript text, as well as provide links that direct applied video back to your desktop. Or a set-top box can deliver quality video time-synched with auxiliary Web-based content.

Applied video pervades the convergence landscape, from PCs with broadband access to interactive television and set-top boxes to wireless devices. Underneath it all, video metadata and application servers are the central nervous system that places content in the right location, on the right device, at the right time.

Obstacles: Proprietary Systems

Almost all of the products that are generally available on the market are closed, proprietary systems, and what really is needed is something much more open. This is a practical concern for colleges and universities, because what they would like to do is simply give out a single video client that can read many of the formats. The problem today is that everyone has to have ten different clients on their desktop for reading ten different proprietary formats. That is a difficult situation for universities to administer, and it is very difficult for those who are trying to incorporate this technology into classes and research. So that is a problem, and I think that at some point it is going to be solved with a more open type of client. That is, things can be proprietary, but they ought to be interoperable.
