Transcription Formats

Transcript files don’t follow a single global standard. In fact, they can vary significantly depending on:
  • The transcription tool used
  • Whether the transcript was created manually or automatically
  • The formatting preferences of researchers or agencies
  • Industry-specific standards and legacy systems

As a result, transcripts may look very different from one another.

This page walks you through the supported formats and how DoReveal works with them.

At its core, DoReveal focuses on conversations between speakers. To process any transcript effectively, it looks for:
  • Who is speaking
  • Where speaker changes occur
  • What each speaker is saying

Speaker labels can take many forms, such as:
  • Moderator / Participant
  • Speaker 1 / Speaker 2
  • Names like John / Sarah
  • Even generic labels or partial identifiers

As long as speaker separation can be inferred, DoReveal can structure the conversation for analysis.
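As a rough illustration of these three elements (who is speaking, where speaker changes occur, and what is said), here is a minimal Python sketch of a conversation after speakers have been identified. `Turn` and `group_turns` are hypothetical names used only for illustration, not part of DoReveal. The sketch merges consecutive segments from the same speaker, so a new turn starts exactly where the speaker label changes.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One uninterrupted stretch of speech by a single speaker (hypothetical model)."""
    speaker: str                                     # who is speaking
    lines: list[str] = field(default_factory=list)   # what they said, line by line

    @property
    def text(self) -> str:
        return " ".join(self.lines)


def group_turns(segments: list[tuple[str, str]]) -> list[Turn]:
    """Merge consecutive (speaker, text) segments into turns.

    A new Turn begins only where the speaker changes.
    """
    turns: list[Turn] = []
    for speaker, text in segments:
        if turns and turns[-1].speaker == speaker:
            turns[-1].lines.append(text)           # same speaker: extend the turn
        else:
            turns.append(Turn(speaker, [text]))    # speaker change: start a new turn
    return turns


segments = [
    ("Moderator", "Thanks for joining."),
    ("Moderator", "Let's start with shopping habits."),
    ("Participant", "Happy to help."),
]
for turn in group_turns(segments):
    print(f"{turn.speaker}: {turn.text}")
# Moderator: Thanks for joining. Let's start with shopping habits.
# Participant: Happy to help.
```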

Standard Transcript Files

Standard transcripts can be Word documents or plain-text (.txt) files. DoReveal looks for speaker labels and the text spoken under each label. Additional content, such as section labels or metadata at the top of the file, can reduce accuracy. Simple formats like the following work best:

Example 1: Speaker names followed by what they said

Example 2: A variation of Example 1, with a separator character such as ":" after each speaker name


Example 3: Generic speaker labels instead of actual names
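For plain-text files like these examples, speaker-label detection could be sketched in Python as follows. This is an illustration, not DoReveal's implementation: `parse_plain_transcript` is a hypothetical helper, and it assumes the speaker labels (names or generic labels) are known in advance. It also shows why extra content at the top of a file matters: lines that appear before the first speaker label cannot be attributed to anyone.

```python
import re


def parse_plain_transcript(text: str, speakers: list[str]) -> list[tuple[str, str]]:
    """Split a plain-text transcript into (speaker, text) segments.

    Hypothetical helper. `speakers` is an assumed, known-in-advance list of
    labels: names ("John") or generic labels ("Speaker 1").
    """
    if not speakers:
        raise ValueError("speakers must not be empty")
    # Longer labels first; \b also stops "Speaker 1" from matching "Speaker 10".
    names = sorted(speakers, key=len, reverse=True)
    label_re = re.compile(r"^(" + "|".join(map(re.escape, names)) + r")\b\s*:?\s*(.*)$")

    segments: list[tuple[str, str]] = []
    current = "UNATTRIBUTED"  # holds anything before the first label, e.g. metadata
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = label_re.match(line)
        if match:
            current, line = match.group(1), match.group(2)
            if not line:
                continue  # label on its own line; the spoken text follows below
        segments.append((current, line))  # unlabeled lines continue the current speaker
    return segments


sample = """Project X, Interview 3
John: I shop online most weekends.
Sarah
Mostly for groceries.
Speaker 10: Not me."""
for segment in parse_plain_transcript(sample, ["John", "Sarah", "Speaker 1", "Speaker 10"]):
    print(segment)
```

In this sketch, a line that happens to begin with a speaker's name is always read as a new turn; an explicit separator such as ":" is one way a parser could tell labels apart from ordinary text.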

Handling Non-Standard Formats


Some transcripts don’t use structured speaker labels. For two common cases, DoReveal provides utilities that convert these transcripts into a usable format. The utilities are available as transcript format tools on the DoReveal website.

Supported Transcript Utilities


DoReveal provides tools to help standardize two common non-standard formats:

1. Font-Based Transcript Format

Some transcripts do not explicitly label speakers. Instead, formatting is used to distinguish them.
For example:
  • Moderator text appears in bold
  • Participant text appears in regular font

DoReveal can interpret this structure and convert it into clear speaker labels.

This format is most useful for one-on-one interviews (in-depth interviews, or IDIs), where bold and regular text map onto the two speakers.
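The font-to-label conversion described above can be sketched in a few lines of Python. In a real Word file, the bold flag would have to be read from the document; the third-party python-docx library, for instance, exposes each paragraph's `runs`, and each run has a `bold` attribute (which can be `None` when the style is inherited). To stay self-contained, this sketch takes `(text, is_bold)` pairs instead. `label_by_font` and its default labels are hypothetical, not DoReveal's utility.

```python
def label_by_font(
    paragraphs: list[tuple[str, bool]],
    bold_label: str = "Moderator",
    regular_label: str = "Participant",
) -> list[str]:
    """Convert (text, is_bold) paragraphs into "Speaker: text" lines.

    Hypothetical helper assuming the convention above: bold text is the
    moderator, regular text is the participant.
    """
    labeled: list[str] = []
    for text, is_bold in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        speaker = bold_label if is_bold else regular_label
        labeled.append(f"{speaker}: {text}")
    return labeled


paragraphs = [
    ("How often do you shop online?", True),
    ("Almost every week.", False),
    ("What do you usually buy?", True),
]
print("\n".join(label_by_font(paragraphs)))
```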


2. VTT-Like Structured Format

Another common format is built from structured blocks that contain:
  • Sequential IDs (1, 2, 3…)
  • Timestamp ranges
  • Speaker roles and names (or partial labels)

For example:
  • Moderator: Henry – spoken text
  • Respondent: John – spoken text
  • Or simply:
    • Henry – spoken text
    • John – spoken text

DoReveal can process these variations and normalize them into structured conversations.
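One way to parse blocks like these is sketched below in Python. It assumes each block is a sequential ID line, a timing line of the form `start --> end` (as in WebVTT), and then the text, with an optional `Role:` prefix and a name separated from the spoken text by a spaced dash. `parse_vtt_like` and `Cue` are hypothetical names; DoReveal's actual utility may accept more variations than this sketch does.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    cue_id: int
    start: str
    end: str
    role: Optional[str]     # e.g. "Moderator"; None when only a name is given
    speaker: Optional[str]  # e.g. "Henry"; None when no speaker label is found
    text: str


TIMING_RE = re.compile(r"^(\S+)\s*-->\s*(\S+)")
# Optional "Role:" prefix, then a name, then a spaced dash (– or -), then the text.
SPEAKER_RE = re.compile(r"^(?:(?P<role>[^:–]+):\s*)?(?P<name>[^–]+?)\s+[–-]\s+(?P<text>.*)$")


def parse_vtt_like(source: str) -> list[Cue]:
    """Parse 'ID / timing / text' blocks separated by blank lines (hypothetical helper)."""
    cues: list[Cue] = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 3 or not lines[0].isdigit():
            continue  # not an ID/timing/text block, e.g. a "WEBVTT" header
        timing = TIMING_RE.match(lines[1])
        if timing is None:
            continue
        cue_id, start, end = int(lines[0]), timing.group(1), timing.group(2)
        body = " ".join(lines[2:])
        match = SPEAKER_RE.match(body)
        if match:
            role = match.group("role")
            cues.append(Cue(cue_id, start, end, role.strip() if role else None,
                            match.group("name").strip(), match.group("text")))
        else:
            cues.append(Cue(cue_id, start, end, None, None, body))
    return cues


sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Moderator: Henry – Thanks for joining.

2
00:00:05.000 --> 00:00:07.500
John – Happy to be here.
"""
for cue in parse_vtt_like(sample):
    label = f"{cue.speaker} ({cue.role})" if cue.role else cue.speaker
    print(f"[{cue.start}] {label}: {cue.text}")
```

Keeping the role and the name in separate fields lets both variations (with and without a role) normalize into the same structure.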

If you need support for an additional format, please contact us at support@doreveal.com.

© Synthefai Inc.