Transcription Formats

Transcript files don’t follow a single global standard. In fact, they can vary significantly depending on:

The transcription tool used
Whether the transcript was created manually or automatically
The formatting preferences of researchers or agencies
Industry-specific standards and legacy systems

As a result, transcripts may look very different from one another.

The sections below walk through the supported formats and how DoReveal works with them.

At its core, DoReveal focuses on conversations between speakers. To process any transcript effectively, it looks for:

Who is speaking
Where speaker changes occur
What each speaker is saying

Speaker labels can take many forms, such as:

Moderator / Participant
Speaker 1 / Speaker 2
Names like John / Sarah
Even generic labels or partial identifiers

As long as speaker separation can be inferred, DoReveal can structure the conversation for analysis.

Standard Transcript Files

These can be Word documents or .txt files. DoReveal looks for speaker labels and the spoken text. Additional information, such as section labels or metadata at the top of the file, can reduce accuracy. Simple formats like the following work best:

Example 1: Speaker names, followed by what they said

Example 2: A variation of the first example, with an additional separator character such as ":" after the speaker names

Example 3: Generic speaker labels instead of actual names
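
As a rough illustration of how labeled lines like these could be split into speaker turns, here is a minimal Python sketch. It is not DoReveal's implementation: the LABEL pattern, the parse_turns function, and the rule that unlabeled lines continue the previous turn are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Assumed label shape: a short label starting with a letter, followed by ":".
# This covers "Moderator:", "Speaker 1:" and names such as "John:".
LABEL = re.compile(r"^\s*([A-Za-z][\w .'-]{0,30}?)\s*:\s*(.*)$")


def parse_turns(text: str) -> List[Tuple[str, str]]:
    """Return (speaker, spoken text) pairs in the order they appear."""
    turns: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LABEL.match(line)
        if match:
            # A new label marks a speaker change.
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # No label: treat the line as a continuation of the current turn.
            speaker, said = turns[-1]
            turns[-1] = (speaker, (said + " " + line).strip())
        # Lines before the first label (for example metadata) are skipped.
    return turns


sample = "Moderator: How often do you shop online?\nParticipant: About twice a week.\nMostly for groceries."
for speaker, said in parse_turns(sample):
    print(speaker, "|", said)
```

A pattern this simple will misread any spoken line that happens to begin with a short phrase and a colon, which is one reason extra headers and section labels can reduce accuracy.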

Handling Non-Standard Formats

Some transcripts don’t follow structured speaker labeling. DoReveal provides utilities that convert two such formats into a usable one. They are available through the transcript format tools on the DoReveal website.

Supported Transcript Utilities

DoReveal provides tools to help standardize two common non-standard formats:

1. Font-Based Transcript Format

Some transcripts do not explicitly label speakers. Instead, formatting is used to distinguish them.
For example:

Moderator text may appear in bold
Participant text appears in regular font

DoReveal can interpret this structure and convert it into clear speaker labels.

This format is most useful for one-on-one interviews (IDIs).
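
The idea of turning formatting into speaker labels can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DoReveal does it: the input is assumed to be a list of (text, is_bold) pairs already extracted from the document (for a Word file, a library such as python-docx could supply them), and the label_by_font function and its default labels are hypothetical.

```python
from typing import Iterable, List, Tuple


def label_by_font(
    runs: Iterable[Tuple[str, bool]],
    bold_label: str = "Moderator",
    regular_label: str = "Participant",
) -> List[Tuple[str, str]]:
    """Map bold text to one speaker and regular text to the other."""
    turns: List[Tuple[str, str]] = []
    for text, is_bold in runs:
        text = text.strip()
        if not text:
            continue  # ignore empty fragments
        speaker = bold_label if is_bold else regular_label
        if turns and turns[-1][0] == speaker:
            # Same formatting as the previous fragment: same turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            # Formatting changed: a new speaker turn starts here.
            turns.append((speaker, text))
    return turns


runs = [
    ("What did you think", True),
    ("of the app?", True),
    ("It was easy to use.", False),
]
for speaker, said in label_by_font(runs):
    print(f"{speaker}: {said}")
```

Because only two formatting states are distinguished, a sketch like this suits two-person conversations, which matches the one-on-one interview case.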

2. VTT-Like Structured Format

Another common format uses structured blocks that contain:

Sequential IDs (1, 2, 3…)
Timestamp ranges
Speaker roles and names (or partial labels)

For example:

Moderator: Henry – spoken text
Respondent: John – spoken text
Or simply:
- Henry – spoken text
- John – spoken text

DoReveal can process these variations and normalize them into structured conversations.
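
To make the block structure concrete, here is a hedged Python sketch of parsing such a file. The exact layout is an assumption for this example: blocks separated by blank lines, a numeric ID on the first line, a timestamp range written with "-->" (as in WebVTT) on the second, and the spoken line after that. The Cue class, the regular expressions, and parse_blocks are hypothetical names, not part of DoReveal.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed timestamp line: "<start> --> <end>", as in WebVTT.
TIMESTAMPS = re.compile(r"^(\S+)\s*-->\s*(\S+)$")
# Assumed spoken line: "Role: Name – text" or "- Name – text"
# (an en dash or a hyphen separates the name from the text).
SPOKEN = re.compile(
    r"^(?:-\s*|(?P<role>[^:–-]+):\s*)?(?P<name>[^:–-]+?)\s+[–-]\s+(?P<text>.*)$"
)


@dataclass
class Cue:
    id: int
    start: str
    end: str
    role: Optional[str]  # None when the block gives only a name
    name: str
    text: str


def parse_blocks(raw: str) -> List[Cue]:
    """Parse blank-line-separated blocks into Cue records."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3 or not lines[0].isdigit():
            continue  # not a block in the assumed layout
        times = TIMESTAMPS.match(lines[1])
        spoken = SPOKEN.match(" ".join(lines[2:]))
        if not times or not spoken:
            continue
        role = spoken.group("role")
        cues.append(Cue(
            id=int(lines[0]),
            start=times.group(1),
            end=times.group(2),
            role=role.strip() if role else None,
            name=spoken.group("name").strip(),
            text=spoken.group("text").strip(),
        ))
    return cues


raw = """1
00:00:01.000 --> 00:00:04.000
Moderator: Henry – How was onboarding?

2
00:00:04.500 --> 00:00:07.000
- John – It was smooth."""
for cue in parse_blocks(raw):
    print(cue.id, cue.role, cue.name, cue.text)
```

Normalizing then amounts to reading the role (or the name, when no role is given) as the speaker label for each block.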

If you need support for an additional format, please contact us at support@doreveal.com.