Session Transcription

What is a Transcript?

A transcript is a text version of everything said during a session by all users, combined with metadata such as timestamps and user information to provide a permanent record of what was said and when. It can be used for record-keeping or as input to other systems.

Transcription is billed per transcribed participant minute (ptpm). 1 ptpm equates to one minute of audio processed per person in the session. Note that, unlike some tiers of billing, these minutes disregard the number of users present at any given time: the full audio of all users will be transcribed for valid sessions.
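As a worked illustration of the ptpm definition, the sketch below sums each participant's processed audio and converts it to minutes. The function name and the input shape are hypothetical, and it assumes no rounding is applied, which this page does not specify.

```python
def transcribed_participant_minutes(audio_seconds_by_user: dict[str, float]) -> float:
    """Sum every participant's processed audio and convert seconds to minutes.

    Assumption: usage is not rounded; actual billing may round differently.
    """
    return sum(audio_seconds_by_user.values()) / 60.0


# Hypothetical session: three participants, each with 30 minutes of processed audio.
usage = transcribed_participant_minutes(
    {"tutor": 1800, "student_a": 1800, "student_b": 1800}
)
```

Under these assumptions, three participants with 30 minutes of audio each come to 90 ptpm.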

When does Lessonspace generate a session transcript?

We generate transcripts for sessions linked to spaces which have been explicitly flagged to be transcribed in their API call. By default, transcription is turned off for all spaces.

In addition, certain sessions are never transcribed, even if transcription has been requested for them:

Sessions less than 150 seconds in length.
Sessions with only one unique user in them.
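
The two exclusions above can be expressed as a simple client-side check, for example to predict whether a finished session will have a transcript. This is a minimal sketch: the function and constant names are hypothetical, and it assumes you already know the session duration and the user IDs that joined.

```python
MIN_SESSION_SECONDS = 150  # sessions shorter than this are never transcribed


def is_eligible_for_transcription(duration_seconds: float, user_ids: list[int]) -> bool:
    """Return True if a session passes both exclusion rules."""
    long_enough = duration_seconds >= MIN_SESSION_SECONDS
    # The same user joining twice still counts as one unique user.
    multiple_users = len(set(user_ids)) > 1
    return long_enough and multiple_users
```

A session must still have been flagged for transcription; this check only covers the length and participant rules.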

Enabling Transcription

To use transcription in any capacity, it must first be toggled on in your organisation settings. This is a global organisation flag that overrides any transcription flags passed to sessions, which allows you to disable transcription for your entire organisation on demand without altering any API calls.

Transcription for a session can only be enabled when calling the Launch API. Every Launch API call made for a space must explicitly enable the feature for it to remain enabled; if the field is omitted or explicitly set to false, transcription is disabled for that space. This prevents accidental transcription. For example, if two calls are made to generate two join links for a space, and the first enables transcription but the second disables or omits it, the feature reverts to off for that space. This holds irrespective of the order in which the join links are ultimately used.
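
The "most recent call wins" behaviour described above can be modelled as follows. This is only an illustrative sketch of the rule, not Lessonspace code: the function name is hypothetical, and each list entry stands for the transcribe value of one Launch API call for the same space, in the order the calls were made (None meaning the field was omitted).

```python
from typing import Optional


def transcription_state_after(calls: list[Optional[bool]], org_enabled: bool = True) -> bool:
    """Model the transcription state of a space after a series of Launch calls."""
    enabled = False  # transcription is off by default
    for transcribe in calls:
        # Omitting the field (None) or passing False both switch the feature off.
        enabled = transcribe is True
    # The organisation-wide flag overrides whatever the calls requested.
    return org_enabled and enabled


first_on_second_omitted = transcription_state_after([True, None])
```

Here the second call omits the field, so the modelled state is off, regardless of which join link is used first.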

Session transcription is set by passing the transcribe field to the Launch API as a boolean.

Recording of AV must also be enabled for the space for transcription to work. It can either be enabled in the same API call, or flagged as on by default for the whole organisation in your Lessonspace settings. If AV recording is not enabled, you will receive an error from the Launch API when calling it with `transcribe` set to true.
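
To avoid that error, the AV recording requirement can be checked before the request is sent. The helper below is a hypothetical sketch that builds a Launch request body using only the fields shown on this page; the org_records_av_by_default parameter is an assumption standing in for your organisation's setting.

```python
from typing import Any, Optional


def build_launch_body(
    space_id: str,
    user_name: str,
    transcribe: bool,
    record_av: Optional[bool] = None,
    org_records_av_by_default: bool = False,
) -> dict[str, Any]:
    """Build a Launch request body, rejecting transcribe without AV recording."""
    # An explicit record_av value takes precedence over the organisation default.
    av_on = record_av if record_av is not None else org_records_av_by_default
    if transcribe and not av_on:
        raise ValueError("transcribe=True requires AV recording to be enabled")
    body: dict[str, Any] = {
        "id": space_id,
        "transcribe": transcribe,
        "user": {"name": user_name},
    }
    if record_av is not None:
        body["record_av"] = record_av
    return body
```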

Sample Request Body

{
    "id": "your-space-id",
    "transcribe": true,
    "record_av": true,   // Only needed if AV recording is disabled for the org
    "user": {
        "name": "Teddy Transcriber"
    }
}

Example Response

{
    "client_url": "...",
    "api_base": "...",
    "room_id": "40a28b5f-5f37-41f5-8c71-2bb8be1c3956",
    "secret": "74027e20-ab04-45ba-a9c2-80b0c31f7f25",
    "session_id": "eff55401-b33a-4cc0-ba22-4aac0f6e9ec3",
    "user_id": 3077485,
    "room_settings": {
        "record_av": true,
        "record_content": true,
        "waiting_room": false,
        "transcribe": true  // This indicates that transcription is now turned on for the room
    }
}

You can use the room_settings object in the response to verify whether transcription is enabled for a space.
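
A minimal sketch of that check, assuming the Launch response has already been parsed into a Python dictionary (the function name is hypothetical):

```python
def transcription_enabled(launch_response: dict) -> bool:
    """Return True only if room_settings.transcribe is explicitly true."""
    settings = launch_response.get("room_settings") or {}
    return settings.get("transcribe") is True
```

A missing room_settings object or a missing transcribe key is treated as "not enabled".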

It is not possible to only enable transcription for some users in a space: the feature is either on for everyone or off for everyone.

Accessing Transcripts

The Lessonspace Dashboard

Transcripts can be downloaded from the Sessions page of the Lessonspace dashboard, under the “More Actions” menu for each session.

The Download Transcription option appears in the menu only once the transcript has been generated, and only for sessions that were flagged for transcription and met the length and participant requirements.

The Lessonspace API

Transcripts can be retrieved programmatically for an individual session by sending a GET request to the transcript endpoint with the session UUID as a query parameter.

Response

Once the transcription process is complete, the response will contain a pre-signed AWS S3 download URL in the transcriptionUrl field. If the process is still ongoing, an error will be returned. URLs will be valid for 12 hours, after which you will need to repeat the API call to receive a new link.

{ "transcriptionUrl": "<pre-signed S3 URL>" }
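
Because the endpoint returns an error while transcription is still in progress, a client may want to retry until the URL is available. The sketch below assumes a hypothetical fetch_transcript callable that performs the GET request and either returns the parsed JSON response or raises an exception; the exact form of the error is not specified here, so any exception is treated as "not ready yet".

```python
import time
from typing import Callable, Optional


def wait_for_transcript_url(
    fetch_transcript: Callable[[], dict],
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry fetch_transcript until it yields a transcriptionUrl or attempts run out."""
    for attempt in range(attempts):
        try:
            payload = fetch_transcript()
        except Exception:
            # Assumption: an error means the transcript is still being generated.
            payload = {}
        url = payload.get("transcriptionUrl")
        if url:
            return url
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

In practice you would narrow the except clause to the error your HTTP client raises, and call the function again once the 12-hour validity window has passed.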

Webhooks

It is possible to subscribe to a webhook to be notified the moment a session’s transcript is ready. You can read more about implementing webhooks generally in our documentation.

You define the webhook in the Launch API call:

{
    "id": "your-space-id",
    "transcribe": true,
    "record_av": true,
    "user": {
        "name": "Teddy Transcriber"
    },
    "webhooks": {
        "transcription": {
            "finish": "https://your.url.here"
        }
    }
}

The webhook payload is identical to the response from the Lessonspace API:

{ "transcriptionUrl": "<pre-signed S3 URL>" }

Note that links delivered via webhook are valid for 24 hours, rather than the 12 hours that apply to links retrieved through the API.
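
A webhook receiver might extract the URL and record roughly when it will stop working. This is a minimal sketch with hypothetical names; it assumes the 24-hour window can be approximated from the time the webhook was received, which is not confirmed here, so the real expiry may be slightly earlier.

```python
import json
from datetime import datetime, timedelta, timezone

WEBHOOK_LINK_LIFETIME = timedelta(hours=24)


def parse_transcription_webhook(raw_body: bytes, received_at: datetime) -> tuple[str, datetime]:
    """Return the download URL and an approximate expiry time for it."""
    payload = json.loads(raw_body)
    url = payload["transcriptionUrl"]
    # Assumption: validity is measured from receipt of the webhook.
    return url, received_at + WEBHOOK_LINK_LIFETIME


received = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
url, expires_at = parse_transcription_webhook(
    b'{"transcriptionUrl": "https://example.invalid/transcript.json"}', received
)
```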

Transcription Output Format

The downloaded JSON is an array of segment objects, each with the following structure:

[
    {
        "start_time": number,
        "end_time": number,
        "user": {
            "id": number,
            "name": string
        },
        "breakout_id": string,
        "text": string
    }
]

start_time: the number of seconds from the start of the session to the start of this segment.
end_time: the number of seconds from the start of the session to the end of this segment.
user.id: the Lessonspace user ID, matching the user_id returned in the Launch call and the ID in the payload for relevant webhooks.
user.name: the user’s displayName.
breakout_id: the string main if the user was in the main room, or a UUID if the user was in a breakout room. Users may appear to be speaking over one another or conversing nonsensically if you do not account for breakout separation.
text: the transcribed text for this segment.
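
To account for breakout separation, segments can be grouped by breakout_id before being read in time order. The sketch below assumes the downloaded JSON has already been parsed into a list of dictionaries with the fields described above; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_breakout(segments: list[dict]) -> dict[str, list[dict]]:
    """Group segments per room and sort each room's segments by start_time."""
    rooms: dict[str, list[dict]] = defaultdict(list)
    for segment in segments:
        rooms[segment["breakout_id"]].append(segment)
    for room_segments in rooms.values():
        room_segments.sort(key=lambda s: s["start_time"])
    return dict(rooms)


def format_room(segments: list[dict]) -> list[str]:
    """Render one room's segments as readable lines."""
    return [f'[{s["start_time"]:.1f}s] {s["user"]["name"]}: {s["text"]}' for s in segments]


sample = [
    {"start_time": 4.2, "end_time": 6.0, "user": {"id": 2, "name": "Ben"},
     "breakout_id": "main", "text": "Hi"},
    {"start_time": 1.0, "end_time": 3.5, "user": {"id": 1, "name": "Ada"},
     "breakout_id": "main", "text": "Hello"},
    {"start_time": 2.0, "end_time": 5.0, "user": {"id": 3, "name": "Cy"},
     "breakout_id": "hypothetical-breakout-uuid", "text": "Shall we start?"},
]
grouped = group_by_breakout(sample)
main_lines = format_room(grouped["main"])
```

Each room can then be read or processed as its own conversation.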

Known Limitations

Because multiple independent audio streams are merged, timestamps for users may be very slightly offset, so that it sometimes appears that someone responded to a question before it was asked.
Speech may be falsely detected in the audio at the very start of a session, especially if the audio begins with a prolonged period of silence.