SRV3 is a proprietary captioning format that YouTube uses to store closed captions. It is XML-based and derived from the TTML standard, with major modifications. This document refers to files in this format as YTT files.
The structure of a YTT file is similar to that of HTML, but with a few key differences. As the format is designed specifically for closed captions and not general web content, it has a few unique elements and attributes.
The structure of a YTT file is as follows:
- Header tag (`<head>`)
  - Pens (`<pen>`): A pen is a variable to store a style for text. It can be referenced by a span to apply the style to the text, similar to the HTML `class` attribute. Pens are defined in the header section of the file, and referenced in the body inside a span as `p="<pen_id>"`.
  - Window styles (`<ws>`): A window style is a variable to store a style for the text "window" (the box that contains the text).
  - Window positions (`<wp>`): A window position is a variable to store the position of the text window on the screen, anchored to a specific corner of the screen, with X and Y offsets defined as `(ah, av)`.
- Body tag (`<body>`)
  - Lines (`<p>`): A line is a block of text that appears on the screen at a specific time, with a specific duration, at a specific position and with a specific style. The text content of the line is stored as the inner text of the `<p>` tag.
    - Spans (`<s>`): A span is a block of text with a specific style applied to it. Spans can be used to apply different styles to different parts of the text within a line. You can reference a pen style by using the `p="<pen_id>"` attribute on the span tag.
    - Breaks (`<br>`): A break is a line break within a line of text. It is used to split a line of text into multiple lines.
All the above data is then wrapped inside a `<timedtext>` tag.
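
The nesting described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is only a structural illustration: the function name `build_minimal_ytt` and all attribute values are made up for the example, and the result is not guaranteed to be a complete file that YouTube accepts.

```python
import xml.etree.ElementTree as ET


def build_minimal_ytt() -> ET.Element:
    """Build a tiny <timedtext> tree with one pen, one window style,
    one window position, and one line (illustrative values only)."""
    root = ET.Element("timedtext")

    # Header: style and position definitions.
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "pen", {"id": "1", "b": "1"})
    ET.SubElement(head, "ws", {"id": "1", "ju": "2"})
    ET.SubElement(head, "wp", {"id": "1", "ap": "7", "ah": "50", "av": "90"})

    # Body: one line that references the definitions above by ID.
    body = ET.SubElement(root, "body")
    line = ET.SubElement(body, "p", {"t": "1000", "d": "2500", "wp": "1", "ws": "1"})
    span = ET.SubElement(line, "s", {"p": "1"})  # span styled with pen 1
    span.text = "Hello"
    br = ET.SubElement(line, "br")  # line break inside the line
    br.tail = "world"  # text that follows the break
    return root


xml_text = ET.tostring(build_minimal_ytt(), encoding="unicode")
print(xml_text)
```

The definitions live in the header and the line refers to them by ID, so one pen or position can be reused by any number of lines.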
Key differences from TTML:
- `<tt>` is replaced by `<timedtext>`.
- Pen styles are styles for text. They are referenced by spans to be applied.
- Window styles can be used to style the text window.
- Custom window positioning using window position variables.
- Ruby text support.
Note: Type definitions are in the format `name: type = description`.
Boolean types are represented as integers 0 and 1, 0 being false and 1 being true.
`<timedtext>`: The root element of the YTT document.
`<head>`: The header section of the YTT file. Contains definitions for pens, window styles, and window positions.
`<wp>`: Window position. A variable that stores an on-screen position for a window.
id: enum = Position ID
ap: enum = Anchor point
ah: int = X offset (0-100)
av: int = Y offset (0-100)
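
As a rough sketch of how these attributes fit together, the hypothetical helper below builds a `<wp>` element and checks the ranges given in this document (`ah` and `av` in 0-100, and anchor point values 0-8 as in the anchor point list). The helper name and the use of integer IDs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def make_window_position(pos_id: int, anchor: int, x: int, y: int) -> ET.Element:
    """Build a <wp> element; anchor must be 0-8, x and y must be 0-100."""
    if not 0 <= anchor <= 8:
        raise ValueError(f"ap must be 0-8, got {anchor}")
    for name, value in (("ah", x), ("av", y)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be 0-100, got {value}")
    return ET.Element(
        "wp",
        {"id": str(pos_id), "ap": str(anchor), "ah": str(x), "av": str(y)},
    )


# Anchor 7 is "Bottom Center" in the anchor point list.
wp = make_window_position(1, anchor=7, x=50, y=90)
print(ET.tostring(wp, encoding="unicode"))
```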
`<ws>`: Window style. A variable that stores a style for the text window.
id: enum = Style ID
ju: enum = Justification ID
pd: enum = Pitch
sd: enum = Yaw/Skew
`<pen>`: A pen is a variable that stores text styles. It is defined by an ID and can be referenced by a span to apply those styles, similar to a CSS class.
id: int = Pen ID
fs: enum = Font style (0-7)
sz: int = Font scale
of: enum = Offset (0-2: subscript, normal, superscript)
b: bool = bold
i: bool = italic
u: bool = underline
fc: hex = Foreground color
fo: int = Foreground opacity
bc: hex = Background color
bo: int = Background opacity
ec: hex = Shadow (edge) color
et: enum = Shadow (edge) type
rb: enum = Ruby text (0-5)
hg: bool = Packed text
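
A pen definition could be generated along these lines. The sketch assumes a hypothetical `make_pen` helper and shows the boolean rule from the note above (booleans are written as `0` or `1`). The `#RRGGBB` color notation is an assumption; this document only states that colors are hex values.

```python
import xml.etree.ElementTree as ET


def make_pen(pen_id: int, **styles) -> ET.Element:
    """Build a <pen> element from keyword arguments named after the
    attributes listed above (b, i, u, fc, fs, ...)."""
    attrs = {"id": str(pen_id)}
    for name, value in styles.items():
        # Check bool before anything else: booleans are serialized as 0/1.
        if isinstance(value, bool):
            attrs[name] = "1" if value else "0"
        else:
            attrs[name] = str(value)
    return ET.Element("pen", attrs)


# Bold, non-italic, yellow text in font style 1 (Monospace Serif).
# "#FFFF00" assumes #RRGGBB notation, which the text does not confirm.
pen = make_pen(1, b=True, i=False, fc="#FFFF00", fs=1)
print(ET.tostring(pen, encoding="unicode"))
```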
`<body>`: The body section of the YTT file. Contains the lines of text to be displayed.
`<p>`: A line of text to be displayed on the screen.
t: int = line start (in ms at video time)
d: int = line duration (in ms)
wp: enum = position ID
ws: enum = window style ID
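
Because `t` is a start time and `d` is a duration, both in milliseconds, a caption given as start and end times in seconds has to be converted. The sketch below uses a hypothetical `make_line` helper to show that arithmetic; the helper name and its parameters are not part of the format.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_line(text: str, start_s: float, end_s: float,
              wp: Optional[int] = None, ws: Optional[int] = None) -> ET.Element:
    """Build a <p> element from start/end times given in seconds."""
    if end_s <= start_s:
        raise ValueError("end time must be after start time")
    t = round(start_s * 1000)        # line start in ms
    d = round(end_s * 1000) - t      # duration in ms, not the end time
    attrs = {"t": str(t), "d": str(d)}
    if wp is not None:
        attrs["wp"] = str(wp)        # reference to a window position ID
    if ws is not None:
        attrs["ws"] = str(ws)        # reference to a window style ID
    line = ET.Element("p", attrs)
    line.text = text
    return line


line = make_line("Hi", start_s=1.5, end_s=4.0, wp=1)
print(ET.tostring(line, encoding="unicode"))
```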
Anchor point (`ap`):
- 0: Top Left
- 1: Top Center
- 2: Top Right
- 3: Middle Left
- 4: Center
- 5: Middle Right
- 6: Bottom Left
- 7: Bottom Center
- 8: Bottom Right
Justification (`ju`):
- 0: Top Left, Middle Left, Bottom Left
- 1: Top Right, Middle Right, Bottom Right
- 2: Top Center, Center, Bottom Center
Ruby text (`rb`):
- 0: No ruby text
- 1: Base
- 2: Parentheses
- 4: Before text
- 5: After text
Font style (`fs`):
- 0: Default font (Roboto)
- 1: Monospace Serif (Courier New)
- 2: Proportional Serif (Times New Roman)
- 3: Monospace Sans (Lucida Console)
- 4: Proportional Sans (Roboto)
- 5: Casual (Comic Sans MS)
- 6: Cursive (Monotype Corsiva)
- 7: Small Capitals (Arial with font-variant small-caps)
Pitch and yaw (`pd`, `sd`): These are used to set the pitch and yaw of the text window. Values are listed as `pd,sd` pairs.
- 2,0: Characters above each other, columns right to left
- 2,1: Characters above each other, columns left to right
- 3,0: Subtitle rotated 90° CCW, columns left to right
- 3,1: Subtitle rotated 90° CCW, columns right to left
Offset (`of`):
- 0: Subscript
- 1: Normal
- 2: Superscript
Edge type (`et`):
- 0: No shadow
- 1: Hard shadow
- 2: Beveled shadow
- 3: Glow/Outline
- 4: Soft shadow
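
When generating files programmatically, numeric codes like these are easier to read as named constants. The sketch below mirrors three of the value lists using Python's `IntEnum`; the class and member names are invented for this example and are not defined by the format.

```python
from enum import IntEnum


class AnchorPoint(IntEnum):
    """Values for the ap attribute of a window position."""
    TOP_LEFT = 0
    TOP_CENTER = 1
    TOP_RIGHT = 2
    MIDDLE_LEFT = 3
    CENTER = 4
    MIDDLE_RIGHT = 5
    BOTTOM_LEFT = 6
    BOTTOM_CENTER = 7
    BOTTOM_RIGHT = 8


class Offset(IntEnum):
    """Values for the of attribute of a pen."""
    SUBSCRIPT = 0
    NORMAL = 1
    SUPERSCRIPT = 2


class EdgeType(IntEnum):
    """Values for the et attribute of a pen."""
    NONE = 0
    HARD_SHADOW = 1
    BEVELED_SHADOW = 2
    GLOW_OUTLINE = 3
    SOFT_SHADOW = 4


# Use .value when writing the attribute so the XML gets the plain number.
ap_attribute = str(AnchorPoint.BOTTOM_CENTER.value)
print(ap_attribute, AnchorPoint(4).name)
```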
- Android does not support foreground opacity (`fo`).
- Android and iOS do not support custom backgrounds (`bc` & `bo`).
- Android and iOS do not support edge color (`ec` & `et`).
- Android and iOS do not support custom fonts (`fs`).
- Android does not support font sizes (`sz`).
- Android and iOS do not support ruby text (`rb`).
- Android and iOS do not support subscript/superscript (`of`).
- Android and iOS do not support vertical text alignment (`pd` & `sd`).
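
The limitations above can be turned into a simple lookup to flag attributes that a given platform will ignore. This is a minimal sketch: the table is transcribed from the list above, while the `UNSUPPORTED` name, the platform keys, and the `unsupported_attributes` helper are hypothetical.

```python
# Attributes each platform does not support, per the list above.
UNSUPPORTED = {
    "android": {"fo", "bc", "bo", "ec", "et", "fs", "sz", "rb", "of", "pd", "sd"},
    "ios": {"bc", "bo", "ec", "et", "fs", "rb", "of", "pd", "sd"},
}


def unsupported_attributes(attrs: dict, platform: str) -> list:
    """Return the sorted attribute names in attrs that platform ignores."""
    return sorted(set(attrs) & UNSUPPORTED[platform])


pen_attrs = {"b": "1", "fo": "128", "sz": "150", "fc": "#FFFFFF"}
print(unsupported_attributes(pen_attrs, "android"))
print(unsupported_attributes(pen_attrs, "ios"))
```

A check like this could run before publishing to warn when a caption relies on styling that mobile viewers will not see.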