Parsing Descript Output Into Individual Video Clips

At least once a week, I use Descript to find clips in lengthy video files (usually School Board or other meetings, downloaded from YouTube). Being able to edit/trim/find speech by just searching for and editing text (especially when the meeting runs several hours) is amazing.

For social media reuse, it’s helpful to have clips of individual speakers. I had been using Descript to find the start of each person’s speech, adding that timestamp to a cuts.txt file (hopefully without typos) and running a script over that to let ffmpeg separate out the individual speaker videos.
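
For context, that manual approach can be sketched roughly as below. The cuts.txt layout (one `HH:MM:SS Speaker Name` entry per line), the function names, and the choice of ffmpeg options are all assumptions for illustration, not the original script; the speaker names in the usage note are made up.

```python
from typing import List, Optional, Tuple


def to_seconds(stamp: str) -> int:
    """Convert 'HH:MM:SS' (or 'MM:SS') into whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_cuts(text: str) -> List[Tuple[int, Optional[int], str]]:
    """Turn cut lines into (start, end, name) tuples.

    Each clip ends where the next one begins; the last clip has no end
    (None), meaning it runs to the end of the source file.
    """
    cuts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        stamp, _, name = line.partition(" ")
        cuts.append((to_seconds(stamp), name.strip()))

    clips = []
    for i, (start, name) in enumerate(cuts):
        end = cuts[i + 1][0] if i + 1 < len(cuts) else None
        clips.append((start, end, name))
    return clips


def ffmpeg_args(source: str, start: int, end: Optional[int], name: str) -> List[str]:
    """Build (but do not run) an ffmpeg command line for one clip."""
    args = ["ffmpeg", "-ss", str(start), "-i", source]
    if end is not None:
        args += ["-t", str(end - start)]  # clip duration in seconds
    # Stream copy avoids re-encoding, so cuts land near keyframes.
    return args + ["-c", "copy", f"{name}.mp4"]
```

Feeding it `"0:05:10 Jane Doe\n0:08:40 John Roe"` yields two clips, the first running from 310 to 520 seconds. A single mistyped timestamp shifts two clips at once, which is exactly the weakness of this approach.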

That was fine as far as it went, but error-prone and tedious. What I wanted was to just mark the speakers in the Descript composition, export it, and let a script do the rest.

That script is now on GitHub. The new workflow:

Find the first Public Comment speaker, and drop a marker with their name at the beginning of their speech.
Do the same for the next speaker, and repeat until…
Drop an “End” marker at the end of the last speaker (or the last speaker I want).

Descript makes it very easy to delete speakers (or other chatter) I don’t want.

Once things are cleaned up to my satisfaction, I export the edited video (including those markers) and run this script to get a collection of clips, ordered and named per the markers.
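
The splitting step could look something like the following sketch. It assumes the exported file carries the markers as chapter metadata that `ffprobe -show_chapters` can read, which may not match how the actual script finds them. The function names and the numbered file-naming scheme are likewise illustrative.

```python
import json
import re
import subprocess
from typing import Dict, List


def read_chapters(video: str) -> List[Dict]:
    """Ask ffprobe for chapter metadata as JSON (needs ffprobe on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_chapters", video],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout).get("chapters", [])


def plan_clips(chapters: List[Dict], end_marker: str = "End") -> List[Dict]:
    """Turn chapters into numbered clip specs, stopping at the End marker."""
    clips = []
    for chapter in chapters:
        title = chapter.get("tags", {}).get("title", "").strip()
        if title.lower() == end_marker.lower():
            break  # nothing after the End marker becomes a clip
        # Keep file names safe: collapse anything unusual into underscores.
        safe = re.sub(r"[^\w.-]+", "_", title).strip("_") or "untitled"
        clips.append({
            "start": float(chapter["start_time"]),
            "end": float(chapter["end_time"]),
            "filename": f"{len(clips) + 1:02d}_{safe}.mp4",
        })
    return clips


def ffmpeg_command(video: str, clip: Dict) -> List[str]:
    """Build (but do not run) the ffmpeg call that extracts one clip."""
    duration = clip["end"] - clip["start"]
    return ["ffmpeg", "-ss", f"{clip['start']:.3f}", "-i", video,
            "-t", f"{duration:.3f}", "-c", "copy", clip["filename"]]
```

In use, each command from `ffmpeg_command` would be passed to `subprocess.run` in turn. The two-digit prefix keeps the clips in speaking order when the folder is sorted by name.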

I just ran it against the video of today’s Brevard County School Board meeting. Here are the markers I dropped:

[Screenshot: Descript, showing markers naming each speaker's starting point]

And the resulting videos:

[Screenshot: a list of numbered video clips, with names matching the markers from Descript]