This document defines a set of APIs that allow local media, including audio and video, to be requested from a platform.
This document is not complete. It is subject to major changes and, while early experimentation is encouraged, it is not intended for implementation. The API is based on preliminary work done in the WHATWG. The Media Capture Task Force expects this specification to evolve significantly based on:
Access to multimedia streams (video, audio, or both) from local devices (video cameras, microphones, Web cams) can have a number of uses, such as real-time communication, recording, and surveillance.
This document defines the APIs used to get access to local devices (video cameras, microphones, Web cams) that can generate multimedia stream data. This document also defines the stream API by which JavaScript is able to manipulate the stream data or otherwise process it.
The MediaStream
interface is used to represent
streams of media data, typically (but not necessarily) of audio and/or
video content, e.g. from a local camera. The data from a
MediaStream
object does not necessarily have a
canonical binary form; for example, it could just be "the video currently
coming from the user’s video camera". This allows user agents to
manipulate media streams in whatever fashion is most suitable on the
user’s platform.
Each MediaStream
object can contain zero or more
tracks, in particular audio and video tracks. All tracks in a MediaStream
are intended to be synchronized when rendered. Different MediaStreams do
not need to be synchronized.
Each track in a MediaStream object has a corresponding
MediaStreamTrack
object.
A MediaStreamTrack
represents content comprising
one or more channels, where the channels have a defined well known
relationship to each other (such as a stereo or 5.1 audio signal).
A channel is the smallest unit considered in this API specification.
A MediaStream
object has an input and an output.
The input depends on how the object was created: a
LocalMediaStream
object generated by a
getUserMedia()
call (which is described later in this
document), for instance, might take its input from the user’s local
camera. The output of the object controls how the object is used, e.g.
what is saved if the object is written to a file, what is displayed if
the object is used in a video
element.
Each track in a MediaStream
object can be
disabled, meaning that it is muted in the object’s output. All tracks are
initially enabled.
A MediaStream
can be finished, indicating
that its inputs have forever stopped providing data.
The output of a MediaStream
object MUST correspond
to the tracks in its input. Muted audio tracks MUST be replaced with
silence. Muted video tracks MUST be replaced with blackness.
A new MediaStream
object can be created from
existing MediaStreamTrack
objects using the
MediaStream()
constructor.
The constructor takes two lists of MediaStreamTrack
objects as arguments; one for audio tracks and one for video tracks. The
lists can either be the track lists of another stream, subsets of such
lists, or compositions of MediaStreamTrack
objects
from different MediaStream
objects.
The ability to duplicate a MediaStream
, i.e.
create a new MediaStream
object from the track lists
of an existing stream, allows for greater control since separate
MediaStream
instances can be manipulated and
consumed individually.
The LocalMediaStream
interface is used when the
user agent is generating the stream’s data (e.g. from a camera or
streaming it from a local video file).
When a LocalMediaStream
object is being generated
from a local file (as opposed to a live audio/video source), the user
agent SHOULD stream the data from the file in real time, not all at once.
The MediaStream
object is also used in contexts outside
getUserMedia
, such as [[!WEBRTC10]]. Hence, ensuring a
realtime stream in both cases reduces the ease with which pages can
distinguish live video from pre-recorded video, which can help protect
the user’s privacy.
The MediaStream()
constructor takes two arguments. The arguments are two lists with
MediaStreamTrack
objects which will be used to
construct the audio and video track lists of the new
MediaStream
object. When the constructor is invoked,
the UA must run the following steps:
Let audioTracks be the constructor’s first argument.
Let videoTracks be the constructor’s second argument.
Let stream be a newly constructed
MediaStream
object.
Set stream’s label attribute to a newly generated value.
If audioTracks is not null, then run the following sub steps for each element track in audioTracks:
If track is of any other kind than
"audio
", then throw a SyntaxError
exception.
If track has the same underlying source as another element in stream’s audio track list, then abort these steps.
Add track to stream’s audio track list.
If videoTracks is not null, then run the following sub steps for each element track in videoTracks:
If track is of any other kind than
"video
", then throw a SyntaxError
exception.
If track has the same underlying source as another element in stream’s video track list, then abort these steps.
Add track to stream’s video track list.
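The constructor steps above can be sketched with plain objects. This is a non-normative illustration, not the browser API: `makeTrack`, `constructMediaStream`, the `source` field and the `stream-N` label format are hypothetical names introduced here, and "abort these steps" is read as "skip the duplicate track", which is one possible interpretation.

```javascript
// Plain-object model of the MediaStream() constructor steps.
let nextLabel = 0; // hypothetical stand-in for "a newly generated value"

function makeTrack(kind, source) {
  // `source` is a hypothetical identifier for the underlying media source.
  return { kind, source };
}

function constructMediaStream(audioTracks, videoTracks) {
  const stream = { label: `stream-${nextLabel++}`, audioTracks: [], videoTracks: [] };

  const fill = (tracks, kind, list) => {
    if (tracks === null || tracks === undefined) return;
    for (const track of tracks) {
      if (track.kind !== kind) {
        throw new SyntaxError(`expected a track of kind "${kind}", got "${track.kind}"`);
      }
      // Assumption: a track sharing a source with a listed track is skipped.
      if (list.some((t) => t.source === track.source)) continue;
      list.push(track);
    }
  };

  fill(audioTracks, "audio", stream.audioTracks);
  fill(videoTracks, "video", stream.videoTracks);
  return stream;
}

const mic = makeTrack("audio", "mic-1");
const micAgain = makeTrack("audio", "mic-1"); // same underlying source as `mic`
const cam = makeTrack("video", "cam-1");
const composed = constructMediaStream([mic, micAgain], [cam]);
console.log(composed.audioTracks.length, composed.videoTracks.length); // 1 1
```

Passing a video track in the audio list throws a `SyntaxError`, matching the kind check in the steps.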
A MediaStream
can have multiple audio and video
sources (e.g. because the user has multiple microphones, or because the
real source of the stream is a media resource with many media tracks).
The stream represented by a MediaStream
thus has zero
or more tracks.
The tracks of a MediaStream
are stored in two
track lists represented by MediaStreamTrackList
objects; one for audio tracks and one for video tracks. The two track
lists MUST contain the MediaStreamTrack
objects that
correspond to the tracks of the stream. The relative order of all tracks
in a user agent MUST be stable. Tracks that come from a media resource
whose format defines an order MUST be in the order defined by the format;
tracks that come from a media resource whose format does not define an
order MUST be in the relative order in which the tracks are declared in
that media resource. Within these constraints, the order is user-agent
defined.
An object that reads data from the output of a
MediaStream
is referred to as a
MediaStream
consumer. The list of
MediaStream
consumers currently includes the media
elements and PeerConnection
(specified in
[[!WEBRTC10]]).
MediaStream
consumers must be able to
handle tracks being added and removed. This behavior is specified per
consumer.
A MediaStream
object is said to be
finished when all tracks belonging to the stream have
ended. When this happens for any reason other than the
stop()
method being
invoked, the user agent MUST queue a task that runs the following
steps:
If the object’s ended
attribute has the value
true already, then abort these steps. (The stop()
method was probably called
just before the stream stopped for other reasons, e.g. the user
clicked an in-page stop button and then the user-agent-provided stop
button.)
Set the object’s ended
attribute to true.
Fire a simple event named ended
at the object.
If the end of the stream was reached due to a user request, the task source for this task is the user interaction task source. Otherwise the task source for this task is the networking task source.
When a LocalMediaStream
object is created,
the user agent MUST generate a globally unique identifier string, and
MUST initialize the object’s label
attribute to that string.
Such strings MUST only use characters in the ranges U+0021, U+0023 to
U+0027, U+002A to U+002B, U+002D to U+002E, U+0030 to U+0039, U+0041
to U+005A, U+005E to U+007E, and MUST be 36 characters long.
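As a non-normative illustration, the character and length restrictions above can be checked with a regular expression. The names `LABEL_PATTERN` and `isValidLabel` are hypothetical; the check covers only the format of a label, not its global uniqueness.

```javascript
// Character class built directly from the code point ranges listed above,
// repeated exactly 36 times.
const LABEL_PATTERN =
  /^[\u0021\u0023-\u0027\u002A-\u002B\u002D-\u002E\u0030-\u0039\u0041-\u005A\u005E-\u007E]{36}$/;

function isValidLabel(label) {
  return typeof label === "string" && LABEL_PATTERN.test(label);
}

console.log(isValidLabel("550e8400-e29b-41d4-a716-446655440000")); // true
console.log(isValidLabel("too-short"));                            // false
console.log(isValidLabel("550e8400 e29b 41d4 a716 446655440000")); // false (spaces)
```

A lowercase UUID string happens to satisfy the rule, since hyphens, digits and lowercase letters all fall inside the permitted ranges.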
When a MediaStream
is created from another
using the MediaStream()
constructor, the label
attribute is initialized to
a newly generated value.
The label
attribute MUST return the value to which it was initialized when the
object was created.
Returns a MediaStreamTrackList
object
representing the audio tracks that can be enabled and disabled.
The audioTracks
attribute MUST return an array host
object for objects of type
MediaStreamTrack
that is fixed length
and read only. The same object MUST be returned each time
the attribute is accessed.
Returns a MediaStreamTrackList
object
representing the video tracks that can be enabled and disabled.
The videoTracks
attribute MUST return an array host
object for objects of type
MediaStreamTrack
that is fixed length
and read only. The same object MUST be returned each time
the attribute is accessed.
The MediaStream.ended
attribute MUST return true if the MediaStream
has
finished, and false otherwise.
When a MediaStream
object is created, its
ended
attribute
MUST be set to false, unless it is being created using the
MediaStream()
constructor
whose arguments are lists of MediaStreamTrack
objects that are all ended, in which case the
MediaStream
object MUST be created with its
ended
attribute set
to true.
ended
, MUST be supported by all
objects implementing the MediaStream
interface.
Before the web application can access the user’s media input devices it
must let getUserMedia()
create a
LocalMediaStream
. Once the application is done using,
e.g., a webcam and a microphone, it may revoke its own access by calling
stop()
on the
LocalMediaStream
.
A web application may, once it has access to a
LocalMediaStream
, use the MediaStream()
constructor to construct
additional MediaStream
objects. Since a derived
MediaStream
object is created from the tracks of an
existing stream, it cannot use any media input devices that have not been
approved by the user.
When a LocalMediaStream
object’s stop()
method is invoked,
the user agent MUST queue a task that runs the following steps on
every track:
Let track be the current
MediaStreamTrack
object.
End track. The track starts outputting only silence and/or blackness, as appropriate.
Dereference track’s underlying media source.
If the reference count of track’s underlying media source is greater than zero, then abort these steps.
Permanently stop the generation of data for track’s source. If the data is being generated from a live source (e.g. a microphone or camera), then the user agent SHOULD remove any active "on-air" indicator for that source. If the data is being generated from a prerecorded source (e.g. a video file), any remaining content in the file is ignored.
The task source for the tasks
queued for the stop()
method is the DOM
manipulation task source.
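The reference counting in the stop() steps can be sketched as follows. This is a non-normative model under stated assumptions: `makeSource`, `makeLocalTrack`, `stopLocalStream`, `refCount` and `generating` are hypothetical names, the steps run synchronously instead of being queued as a task, and a track that has already ended is assumed to hold no reference.

```javascript
function makeSource(name) {
  // `generating` models whether the source is still producing data.
  return { name, refCount: 0, generating: true };
}

function makeLocalTrack(source) {
  source.refCount += 1; // each local track holds one reference to its source
  return { source, ended: false };
}

function stopLocalStream(stream) {
  for (const track of stream.tracks) {
    if (track.ended) continue; // assumption: nothing left to release
    track.ended = true; // End track.
    track.source.refCount -= 1; // Dereference the underlying media source.
    if (track.source.refCount > 0) continue; // other tracks still use the source
    track.source.generating = false; // Permanently stop the generation of data.
  }
}

const camera = makeSource("camera");
const first = { tracks: [makeLocalTrack(camera)] };
const second = { tracks: [makeLocalTrack(camera)] };

stopLocalStream(first);
console.log(camera.generating); // true: `second` still references the camera
stopLocalStream(second);
console.log(camera.generating); // false: the last reference is gone
```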
A MediaStreamTrack
object represents a media
source in the user agent. Several MediaStreamTrack
objects can represent the same media source, e.g., when the user chooses
the same camera in the UI shown by two consecutive calls to
getUserMedia()
.
A MediaStreamTrack
object can reference its media
source in two ways, either with a strong or a weak reference, depending
on how the track was created. For example, a track in a
MediaStream
, derived from a
LocalMediaStream
with the MediaStream()
constructor, has a weak
reference to a local media source, while a track in a
LocalMediaStream
has a strong reference. This means
that a track in a MediaStream
, derived from a
LocalMediaStream
, will end if there is no
non-ended track in a LocalMediaStream
which
references the same local media source.
The concept with strong and weak references to media
sources allows the web application to derive new
MediaStream
objects from
LocalMediaStream
objects (created via
getUserMedia()
),
and still be able to revoke all given permissions with LocalMediaStream.stop()
.
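A minimal, non-normative sketch of the strong and weak reference idea follows. All names (`createSource`, `createLocalTrack`, `deriveTrack`, `stopLocalTracks`, `strongTracks`, `weakTracks`) are hypothetical, and the bookkeeping is only one way a user agent might track the relationship.

```javascript
function createSource() {
  return { strongTracks: [], weakTracks: [] };
}

// A track in a LocalMediaStream holds a strong reference to its source.
function createLocalTrack(source) {
  const track = { source, ended: false };
  source.strongTracks.push(track);
  return track;
}

// A track in a derived MediaStream holds only a weak reference.
function deriveTrack(localTrack) {
  const track = { source: localTrack.source, ended: localTrack.ended };
  localTrack.source.weakTracks.push(track);
  return track;
}

// Models LocalMediaStream.stop() for the given local tracks.
function stopLocalTracks(localTracks) {
  for (const track of localTracks) {
    track.ended = true;
    const source = track.source;
    const stillHeld = source.strongTracks.some((t) => !t.ended);
    if (!stillHeld) {
      // No non-ended strong reference remains, so weakly referencing tracks end too.
      for (const weak of source.weakTracks) weak.ended = true;
    }
  }
}

const webcam = createSource();
const localTrack = createLocalTrack(webcam);
const derivedTrack = deriveTrack(localTrack);
stopLocalTracks([localTrack]);
console.log(derivedTrack.ended); // true
```

If a second LocalMediaStream track still referenced the same source, the derived track would stay live until that track ended as well.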
A MediaStreamTrack
object is said to end
when the user agent learns that no more data will ever be forthcoming for
this track.
When a MediaStreamTrack
object ends for any reason
(e.g. because the user rescinds the permission for the page to use the
local camera, or because the data comes from a finite file and the file’s
end has been reached and the user has not requested that it be looped, or
because the UA has instructed the track to end for any reason, or because
the reference count of the track’s underlying media source has reached
zero), it is said to be ended. When a track instance
track ends for any reason other than the stop()
method being invoked on the
LocalMediaStream
object that represents
track, the user agent MUST queue a task that runs the
following steps:
If the track’s readyState
attribute
has the value ENDED
(2) already, then
abort these steps.
Set track’s readyState
attribute to
ENDED
(2).
Fire a simple event named ended
at the object.
If the end of the stream was reached due to a user request, the event source for this event is the user interaction event source.
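The steps above are idempotent: a track that is already ENDED does not fire a second event. A non-normative sketch, using the numeric values LIVE (0), MUTED (1) and ENDED (2) given in this document; `createTrack` and `runEndedSteps` are hypothetical names, and the steps run synchronously here instead of being queued as a task.

```javascript
const LIVE = 0;
const MUTED = 1;
const ENDED = 2;

function createTrack() {
  return { readyState: LIVE, onended: null };
}

function runEndedSteps(track) {
  if (track.readyState === ENDED) return; // already ended: abort these steps
  track.readyState = ENDED;
  if (typeof track.onended === "function") {
    track.onended({ type: "ended", target: track }); // simplified stand-in for a simple event
  }
}

const track = createTrack();
let fired = 0;
track.onended = () => { fired += 1; };
runEndedSteps(track);
runEndedSteps(track); // second call is a no-op
console.log(track.readyState, fired); // 2 1
```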
The MediaStreamTrack.kind
attribute MUST return the string "audio
" if the object’s
corresponding track is or was an audio track, "video
" if
the corresponding track is or was a video track, and a user-agent
defined string otherwise.
User agents MAY label audio and video sources (e.g. "Internal
microphone" or "External USB Webcam"). The MediaStreamTrack.label
attribute MUST return the label of the object’s corresponding track,
if any. If the corresponding track has or had no label, the attribute
MUST instead return the empty string.
Thus the kind
and label
attributes do not
change value, even if the MediaStreamTrack
object
is disassociated from its corresponding track.
The MediaStreamTrack.enabled
attribute, on getting, MUST return the last value to which it was
set. On setting, it MUST be set to the new value, and then, if the
MediaStreamTrack
object is still associated with
a track, MUST enable the track if the new value is true, and disable
it otherwise.
Thus, after a MediaStreamTrack
is
disassociated from its track, its enabled
attribute still
changes value when set, it just doesn’t do anything with that new
value.
The track is active (the track’s underlying media source is making a best-effort attempt to provide data in real time).
The output of a track in the LIVE
state can be switched
on and off with the enabled
attribute.
The track is muted (the track’s underlying media source is temporarily unable to provide data).
A MediaStreamTrack
in a
LocalMediaStream
may be muted if the user
temporarily revokes the web application’s permission to use a media
input device.
The track has ended (the track’s underlying media source is no longer providing data, and will never provide more data for this track).
For example, a video track in a
LocalMediaStream
finishes if the user unplugs the
USB web camera that acts as the track’s media source.
The readyState
attribute represents the state of the track. It MUST return the value
to which the user agent last set it (as defined below). It can have
the following values: LIVE, MUTED or
ENDED.
When a MediaStreamTrack
object is created, its
readyState
is either
LIVE
(0) or
MUTED
(1),
depending on the state of the track’s underlying media source. For
example, a track in a LocalMediaStream
, created
with getUserMedia()
, MUST initially have its readyState
attribute
set to LIVE
(0).
muted
, MUST be supported by
all objects implementing the MediaStreamTrack
interface.
unmuted
, MUST be supported
by all objects implementing the MediaStreamTrack
interface.
ended
, MUST be supported by
all objects implementing the MediaStreamTrack
interface.
Mints a Blob URL to refer to the given
MediaStream
.
When the createObjectURL()
method
is called with a MediaStream
argument, the user
agent MUST return a unique Blob URL for the
given MediaStream
. [[!FILE-API]]
For audio and video streams, the data exposed on that stream MUST
be in a format supported by the user agent for use in
audio
and video
elements.
A Blob URL is the
same as what the File API specification calls a Blob URI, except that
anything in the definition of that feature that refers to
File
and Blob
objects is hereby extended to
also apply to MediaStream
and
LocalMediaStream
objects.
A MediaStreamTrackList
object’s corresponding
MediaStream
refers to the
MediaStream
object which the current
MediaStreamTrackList
object is a property of.
MediaStreamTrack
object at the
specified index.
Adds the given MediaStreamTrack
to this
MediaStreamTrackList
according to the ordering
rules for tracks.
When the add()
method is
invoked, the user agent MUST run the following steps:
Let track be the
MediaStreamTrack
argument.
Let stream be the
MediaStreamTrackList
object’s corresponding
MediaStream
object.
If stream is finished, throw an
INVALID_STATE_ERR
exception.
If track is already in the
MediaStreamTrackList
object’s internal list,
then abort these steps.
Add track to the end of the
MediaStreamTrackList
object’s internal
list.
Removes the given MediaStreamTrack
from this
MediaStreamTrackList
.
When the remove()
method
is invoked, the user agent MUST run the following steps:
Let track be the
MediaStreamTrack
argument.
Let stream be the
MediaStreamTrackList
object’s corresponding
MediaStream
object.
If stream is finished, throw an
INVALID_STATE_ERR
exception.
If track is not in the
MediaStreamTrackList
object’s internal list,
then abort these steps.
Remove track from the
MediaStreamTrackList
object’s internal
list.
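The add() and remove() steps can be sketched with a small non-normative model. `createTrackList` and the `finished` flag are hypothetical names, the `length` getter is an assumption added for convenience, and the exception is modelled as a plain `Error` whose `name` is `INVALID_STATE_ERR`.

```javascript
function createTrackList(stream) {
  const tracks = []; // the list object's internal list

  const assertNotFinished = () => {
    if (stream.finished) {
      const error = new Error("stream is finished");
      error.name = "INVALID_STATE_ERR"; // stand-in for the DOM exception
      throw error;
    }
  };

  return {
    get length() { return tracks.length; }, // assumption: not taken from the text above
    item(index) { return tracks[index]; },
    add(track) {
      assertNotFinished();
      if (tracks.includes(track)) return; // already present: abort these steps
      tracks.push(track); // add to the end of the internal list
    },
    remove(track) {
      assertNotFinished();
      const index = tracks.indexOf(track);
      if (index === -1) return; // not present: abort these steps
      tracks.splice(index, 1);
    },
  };
}

const stream = { finished: false };
const list = createTrackList(stream);
const voice = { kind: "audio" };
list.add(voice);
list.add(voice); // ignored, the track is already in the list
console.log(list.length); // 1
list.remove(voice);
console.log(list.length); // 0
```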
addtrack
, MUST be
supported by all objects implementing the
MediaStreamTrackList
interface.
removetrack
, MUST
be supported by all objects implementing the
MediaStreamTrackList
interface.
Prompts the user for permission to use their Web cam or other video or audio input.
The constraints argument is an object of type
MediaStreamConstraints
.
If the user accepts, the successCallback is invoked,
with a suitable LocalMediaStream
object
[[!WEBRTC10]] as its argument.
If the user declines, the errorCallback (if any) is invoked.
When the
getUserMedia()
method is called, the user agent
MUST run the following steps:
Let constraints be the method's first argument.
Let successCallback be the callback indicated by the method's second argument.
Let errorCallback be the callback indicated by the method's third argument, if any, or null otherwise.
Let requestedMediaTypes be the set of media types in constraints (at the time of writing, only "audio" and "video" are available).
Let finalSet be an (initially) empty set.
If successCallback is null, abort these steps.
For each media type in requestedMediaTypes,
Let candidateSet be all possible tracks of the current media type that the browser could return.
For each constraint key-value pair in the "mandatory" dictionary,
If the constraint is not supported by the browser, call the errorCallback with a NOT_SUPPORTED_ERR
error naming the constraint and abort these steps.
Remove from the candidateSet any track that cannot satisfy the value given for the constraint.
If the candidateSet no longer contains at least one track, call the errorCallback with a MANDATORY_UNSATISFIED_ERR
error naming the constraint and abort these steps. Otherwise, continue with the next mandatory constraint.
Let the secondPassSet be the current contents of the candidateSet.
For each constraint key-value pair in the "optional" sequence of the constraints that are for the current media type, in order,
If the constraint is not supported by the browser, skip it and continue with the next constraint.
Remove from the secondPassSet any tracks that cannot satisfy the value given for the constraint.
If the secondPassSet is now empty, let the secondPassSet be the current contents of the candidateSet. Otherwise, let the candidateSet be the current contents of the secondPassSet.
Select one track from the candidateSet to add to the finalSet. The decision of which track to choose from the candidateSet is completely up to the browser. Unless and until a new set of constraints is provided, the browser MAY change its choice of track at any point, provided that 1) the new choice does not violate the given user permission, and 2) it notifies the application code by raising the ******TBD****** event. It may wish to do this, for example, if the user interface or network congestion changes. Note that no such change will have an effect on the presence or absence of each type of track, merely the contents.
Return, and run the remaining steps asynchronously.
Optionally, e.g. based on a previously-established user preference, for security reasons, or due to platform limitations, jump to the step labeled failure below.
Prompt the user in a user-agent-specific manner for permission
to provide the entry script's origin with a LocalMediaStream
object
[[!WEBRTC10]] representing a media stream.
The provided media MUST include precisely the tracks in the finalSet.
User agents are encouraged to default to using the user's primary or system default camera and/or microphone (as appropriate) to generate the media stream. User agents MAY allow users to use any media source, including pre-recorded media files.
User agents MAY wish to offer the user more control over the provided media. For example, a user agent could offer to enable a camera light or flash, or to change settings such as the frame rate or shutter speed.
If the user grants permission to use local recording devices, user agents are encouraged to include a prominent indicator that the devices are "hot" (i.e. an "on-air" or "recording" indicator).
If the user denies permission, jump to the step labeled failure below. If the user never responds, this algorithm stalls on this step.
Let stream be the LocalMediaStream
object
[[!WEBRTC10]] for which the user granted permission.
Queue a task to invoke successCallback with stream as its argument.
Abort these steps.
Failure: If errorCallback is null, abort these steps.
Let error be a new NavigatorUserMediaError
object whose code
attribute has
the numeric value 1 (PERMISSION_DENIED
).
Queue a task to invoke errorCallback with error as its argument.
The task source for these tasks is the user interaction task source.
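The track-selection part of the steps above (mandatory constraints first, then optional constraints in order) can be sketched as follows. This is a non-normative illustration under several assumptions: the constraints object is taken to be keyed by media type with `mandatory` and `optional` members, a track "satisfies" a constraint when its hypothetical `settings` entry equals the requested value, the constraint names `facing` and `hd` are invented for the example, and the sketch picks the first remaining candidate where the text leaves the choice to the browser.

```javascript
// Hypothetical satisfaction test: exact match against a track's settings.
function satisfies(track, name, value) {
  return track.settings[name] === value;
}

function selectTracks(constraints, availableTracks, supportedConstraints) {
  const finalSet = [];
  for (const mediaType of Object.keys(constraints)) {
    const requested = constraints[mediaType];
    if (!requested) continue; // media type not requested
    const { mandatory = {}, optional = [] } = requested === true ? {} : requested;
    let candidateSet = availableTracks.filter((t) => t.kind === mediaType);

    // First pass: every mandatory constraint must be supported and satisfiable.
    for (const [name, value] of Object.entries(mandatory)) {
      if (!supportedConstraints.has(name)) {
        return { error: { name: "NOT_SUPPORTED_ERR", constraint: name } };
      }
      candidateSet = candidateSet.filter((t) => satisfies(t, name, value));
      if (candidateSet.length === 0) {
        return { error: { name: "MANDATORY_UNSATISFIED_ERR", constraint: name } };
      }
    }

    // Second pass: optional constraints narrow the set only when possible.
    let secondPassSet = candidateSet.slice();
    for (const constraint of optional) {
      const [name, value] = Object.entries(constraint)[0];
      if (!supportedConstraints.has(name)) continue;
      secondPassSet = secondPassSet.filter((t) => satisfies(t, name, value));
      if (secondPassSet.length === 0) {
        secondPassSet = candidateSet.slice(); // ignore this constraint
      } else {
        candidateSet = secondPassSet.slice(); // keep the narrowed set
      }
    }

    // Assumption: take the first candidate; the real choice is up to the browser.
    if (candidateSet.length > 0) finalSet.push(candidateSet[0]);
  }
  return { finalSet };
}

const tracks = [
  { kind: "video", id: "front", settings: { facing: "user", hd: false } },
  { kind: "video", id: "rear", settings: { facing: "environment", hd: true } },
  { kind: "audio", id: "mic", settings: {} },
];
const supported = new Set(["facing", "hd"]);
const result = selectTracks(
  { audio: true, video: { mandatory: { hd: true }, optional: [{ facing: "user" }] } },
  tracks,
  supported
);
console.log(result.finalSet.map((t) => t.id)); // logs the ids "mic" and "rear"
```

In the usage above the optional `facing` constraint cannot be met by the only HD camera, so it is dropped and the mandatory result stands.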
A MediaTrackConstraintSet is a dictionary containing one or more key-value pairs, where each key MUST be a valid registered constraint name in the IANA-hosted RTCWeb Media Constraints registry [[!RTCWEB-CONSTRAINTS]] and its value SHOULD be as defined in the associated reference[s] given in the registry.
A MediaTrackConstraint is a dictionary containing exactly one key-value pair, where the key MUST be a valid registered constraint name in the IANA-hosted RTCWeb Media Constraints registry [[!RTCWEB-CONSTRAINTS]] and the value SHOULD be as defined in the associated reference[s] given in the registry.
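The difference between the two dictionaries is only the number of key-value pairs they may hold. A non-normative sketch: `REGISTERED_NAMES`, `exampleWidth` and `exampleHeight` are hypothetical placeholders for names from the registry, and `isConstraintSet` and `isConstraint` are illustrative helpers, not part of any API.

```javascript
// Hypothetical stand-in for the registered constraint names.
const REGISTERED_NAMES = new Set(["exampleWidth", "exampleHeight"]);

// MediaTrackConstraintSet: one or more pairs, every key registered.
function isConstraintSet(value) {
  const keys = Object.keys(value);
  return keys.length >= 1 && keys.every((k) => REGISTERED_NAMES.has(k));
}

// MediaTrackConstraint: exactly one pair, with a registered key.
function isConstraint(value) {
  return Object.keys(value).length === 1 && isConstraintSet(value);
}

console.log(isConstraintSet({ exampleWidth: 640, exampleHeight: 480 })); // true
console.log(isConstraint({ exampleWidth: 640, exampleHeight: 480 }));    // false
console.log(isConstraint({ exampleWidth: 640 }));                        // true
```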
PERMISSION_DENIED
is defined.
This sample code exposes a button. When clicked, the button is disabled and the user is prompted to offer a stream. The user can cause the button to be re-enabled by providing a stream (e.g. giving the page access to the local camera) and then disabling the stream (e.g. revoking that access).
<input type="button" value="Start" onclick="start()" id="startBtn">
<script>
  var startBtn = document.getElementById('startBtn');
  function start() {
    navigator.getUserMedia({audio:true, video:true}, gotStream);
    startBtn.disabled = true;
  }
  function gotStream(stream) {
    stream.onended = function () {
      startBtn.disabled = false;
    };
  }
</script>
This example allows people to take photos of themselves from the local video camera.
<article>
  <style scoped>
    video { transform: scaleX(-1); }
    p { text-align: center; }
  </style>
  <h1>Snapshot Kiosk</h1>
  <section id="splash">
    <p id="errorMessage">Loading...</p>
  </section>
  <section id="app" hidden>
    <p><video id="monitor" autoplay></video> <canvas id="photo"></canvas>
    <p><input type=button value="📷" onclick="snapshot()">
  </section>
  <script>
    navigator.getUserMedia({video:true}, gotStream, noStream);
    var video = document.getElementById('monitor');
    var canvas = document.getElementById('photo');
    function gotStream(stream) {
      video.src = URL.createObjectURL(stream);
      video.onerror = function () {
        stream.stop();
      };
      stream.onended = noStream;
      video.onloadedmetadata = function () {
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        document.getElementById('splash').hidden = true;
        document.getElementById('app').hidden = false;
      };
    }
    function noStream() {
      document.getElementById('errorMessage').textContent = 'No camera available.';
    }
    function snapshot() {
      canvas.getContext('2d').drawImage(video, 0, 0);
    }
  </script>
</article>
IANA is requested to register the following constraints as specified in [[!RTCWEB-CONSTRAINTS]]:
This is an enum type constraint that can take the values "true" and "false". The default is a non-mandatory "true".
When one or more audio streams are being played in the processes of various microphones, it is often desirable to attempt to remove the sound being played from the input signals recorded by the microphones. This is referred to as echo cancellation. There are cases where it is not needed and it is desirable to turn it off so that no audio artifacts are introduced. This constraint allows the application to control this behavior.
This is an enum type constraint that can take the values "true" and "false". The default is a non-mandatory "true".
Indicates that the application would like to capture audio input from a microphone.
This is an enum type constraint that can take the values "true" and "false". The default is a non-mandatory "true".
Indicates that the application would like to capture video input from a camera.
This is an enum type constraint that can take the values "true" and "false". The default is a non-mandatory "true".
Indicates that the application would like to capture a single image from a camera.
This section will be removed before publication.