Transcribing Poetry And Speeches With Wav2Vec2

Efficient transcription of audio files has been one of the major “missing links” in modern NLP — till now. Enter Hugging Face’s implementation of Facebook’s Wav2Vec2 model, which produces impressive out-of-the-box results.

Chua Chin Hon
Towards Data Science


Poet Amanda Gorman delivering the inauguration poem on Jan 20, 2021. Screen-capture via PBS NewsHour’s YouTube clip.

AI can’t write great poetry on its own (yet?). But it can now transcribe a poetry recital really well, if results from the Wav2Vec2 transformer model are anything to go by.

My trials using audio clips ranging in length from 62s to 12.5 minutes, including the evocative Inaugural Poem by youth poet Amanda Gorman, turned up pretty impressive results.

Efficient audio-to-text transcription has been one of the “missing links” in the modern Natural Language Processing (NLP) toolkit. Not anymore, it seems, thanks to Hugging Face’s implementation of the Wav2Vec2 model by Facebook.

What’s exciting about this is that it opens up new possibilities for “chain linking” NLP tasks, going from audio files to, say, textual sentiment analysis or even translation in one go. The caveat is that long audio clips are very memory-intensive when processed this way, and require additional workarounds for folks (like me) who don’t have access to a super-duper computer.

This post outlines results from my recent trials, and includes links to a repo with notebooks and sample files to run your own experiment. You can check out the output transcripts of the two longish clips here.

REPO, REQUIREMENTS, AND REFERENCES

My repo contains 3 notebooks (2 Jupyter, 1 Colab) and 3 sets of audio files. To run them, you’ll need:

The audio files in the repo have been split up and downsampled to 16kHz. If you wish to use your own audio clips, make sure they are sampled at 16kHz, as that’s the sampling rate the Wav2Vec2 base model was pre-trained and fine-tuned on.
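One quick way to confirm a clip’s sampling rate before handing it to the model is to read the file header. The sketch below uses only Python’s standard-library wave module, so it assumes uncompressed WAV input; the function names are illustrative and are not taken from the repo.

```python
import wave

TARGET_RATE = 16_000  # Hz, the sampling rate the Wav2Vec2 base model expects


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_model_ready(path, target_rate=TARGET_RATE):
    """Return True if the clip is already sampled at the target rate."""
    return wav_sample_rate(path) == target_rate
```

If is_model_ready returns False for one of your own clips, resample the clip to 16kHz before transcribing it.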

How long should each audio clip be? That would depend on the amount of compute at your disposal. The Wav2Vec2 transcription process is very memory intensive. Attempts to transcribe audio files longer than 90s have all crashed on me, even on Colab Pro.

As such, I’ve kept the audio clips to about 60s max. To process longer audio clips, you’ll have to split them into parts of about a minute each (or whatever length your machine can deal with). We’ll get into this later.
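To plan where the cuts go, it helps to work out the start and end time of each part first. This is a minimal sketch of that arithmetic; the function name and the 60s default are assumptions for illustration, and the actual cutting can be done in any audio tool.

```python
def chunk_bounds(total_seconds, chunk_seconds=60.0):
    """Return (start, end) times in seconds that cover the whole clip.

    Every part is chunk_seconds long except possibly the last one,
    which holds whatever is left over.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# A 12.5-minute clip (750s) yields twelve 60s parts and one 30s part.
parts = chunk_bounds(750)
```

Lower chunk_seconds if your machine still runs out of memory on minute-long parts.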

I used Audacity to split up the audio files in my trials. It’s free and more than adequate for the task at hand.

I won’t get into the technical details in this post. But the documentation and paper on Wav2Vec2 are worth a read:

There are several versions of the Wav2Vec2 model on Hugging Face’s model hub. For this exercise, I’m sticking with the wav2vec2-base-960h base model.
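For readers who have not used the model before, here is a rough sketch of loading it and transcribing one short clip with the Hugging Face transformers library and PyTorch. It assumes the hub ID facebook/wav2vec2-base-960h and the Wav2Vec2Processor class, which may differ from what the notebooks in the repo use; the function names are hypothetical.

```python
MODEL_ID = "facebook/wav2vec2-base-960h"  # assumed hub ID for the base model


def load_model(model_id=MODEL_ID):
    """Load the processor and model once so they can be reused across clips."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()  # inference mode
    return processor, model


def transcribe_clip(samples, processor, model, sample_rate=16_000):
    """Transcribe one short mono clip given as a 1-D array of float samples."""
    import torch

    inputs = processor(samples, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for transcription
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)  # greedy choice per time step
    return processor.batch_decode(predicted_ids)[0]
```

Calling load_model once and passing the result to transcribe_clip for every clip avoids reloading the weights each time.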

TRIAL #1: “ASK WHAT YOU CAN DO FOR YOUR COUNTRY”

Former US President John F Kennedy delivering his inaugural address on 20 January 1961. Screen-grab via video on the website of the JFK Presidential Library and Museum.

The simplest way to try out Wav2Vec2 is with a small audio clip that the model can transcribe relatively quickly in one go. I picked a 62s clip from the late US President John F Kennedy’s famous inaugural address in 1961 (13th-minute mark onwards), and this is the transcript from Wav2Vec2:

IN THE LONG HISTORY OF THE WORLD ONLY A FEW GENERATION HAVE BEEN GRANTED THE ROLE OF DEFENDING FREEDOM IN ITS OUR O MAXIMUM DANGER I DO NOT SHRINK FROM THIS RESPONSIBILITY I WELCOME IT I DO NOT BELIEVE THAT ANY OF US WOULD EXCHANGE PLACES WITH ANY OTHER PEOPLE OR ANY OTHER GENERATION THE ENERGY THE FAITH THE DEVOTION WHICH WANGE BRANG TO THIS END OF UP WILL NIT OUR COUNTRY AND ALL WHO SERVE IT AND THE GLOW FROM THAT FIRE AND TRULY LIKE THE WORLD AND SO MY FELLOW AMERICA ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY

The parts in bold in the Wav2Vec2 transcript are the problematic parts. Overall, the results are pretty impressive, in my view. But how would the model perform on a longer audio clip, with a change in accent and cadence of speaking?
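I did not compute an error rate for these trials, but one common way to put a number on “pretty impressive” is the word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python version; it assumes punctuation has already been stripped from the reference, since the model output has none.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.upper().split()
    hyp = hypothesis.upper().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing in output
            insertion = current[j - 1] + 1  # extra word in output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One wrong word ("GENERATION") out of seven reference words.
wer = word_error_rate(
    "only a few generations have been granted",
    "ONLY A FEW GENERATION HAVE BEEN GRANTED",
)
```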

TRIAL #2: “THE HILL WE CLIMB”

Poet Amanda Gorman delivering the inauguration poem on Jan 20, 2021. Screen-capture via PBS NewsHour’s YouTube clip.

For a second trial that would offer a distinct contrast with the first, I jumped 60 years ahead to another US Presidential Inauguration and picked a 5-minute 34s clip of Amanda Gorman delivering a beautiful and evocative poem from the steps of the US Capitol building.

However, attempts to transcribe the 334s clip in one go proved futile, with out-of-memory issues crashing my notebook repeatedly. I eventually settled on a simple, if clumsy, workaround: splitting the original clip into smaller 35s clips and transcribing them one at a time. See this notebook.
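The same idea can be expressed in code for audio that is already loaded in memory. This is only a sketch, not the code in the notebook: transcribe_fn is a stand-in for whatever function turns one short clip into text, and the 35s default simply mirrors the clip length I used.

```python
def transcribe_in_chunks(samples, transcribe_fn, sample_rate=16_000, chunk_seconds=35):
    """Transcribe a long clip piece by piece and join the partial transcripts.

    samples is a 1-D sequence of audio samples; transcribe_fn takes one
    slice of it and returns the text for that slice.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be at least 1")

    pieces = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        pieces.append(transcribe_fn(chunk))  # only one chunk is processed at a time
    return "\n".join(pieces)
```

A fixed-length cut can land in the middle of a word, which may be why a few words in the transcript below appear broken across lines.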

This is the transcript from Wav2Vec2:

MISTER PRESIDENT DOCTER BYDEN MADAM VICE PRESIDENT MISTER MHOFF AMERICANS AND THE WORLD WHEN DAY COMES WE ASK OURSELVES WHERE CAN WE FIND LIGHT IN THIS NEVER ENDING SHADE THE LOSS WE CARRY A SEA WE MUST WADE WE BRAVE THE BELLY OF THE BEAST WE'VE LEARNED THAT QUIET ISN'T ALWAYS PEACE IN THE NORMS IN NOTIONS OF WHAT JUST
IS ISN'T ALWAYS JUST IS AND YET THE DAWN IS HOURS BEFORE WE KNEW IT SOMEHOW WE DO IT SOMEHOW WE'VE WEATHERED AND WITNESSED A NATION THAT ISN'T BROKEN BUT SIMPLY UNFINISHED WE THE SUCCESSORS OF A COUNTRY AND A TIME WERE A SKINNY BLACK GIRL DESCENDED FROM SLAVES AND RAISED BY A SINGLE MOTHER CAN DREAM OF BECOMING PRESIDENT ONLY TO FIND HERSELF RECITING
FOR ONE AND YES WE ARE FAR FROM POLISHED FAR FROM PESTIM BUT THAT DOESN'T MEAN WE ARE STRIVING TO FORM A UNION THAT IS PERFECT WE OR STRIVING TO FORGE OR UNION WITH PURPOSE TO COMPOSE A COUNTRY COMMITTED TO ALL CULTURES COLORS CHARACTERS AND CONDITIONS OF MAN AND SO WE LIFT OUR GAZES NOT TO WHAT STANDS BETWEEN US BUT WHAT STANDS BEFORE US WE CLOSE THE DIVIDE BECAUSE WE KNOT A
PUT OUR FUTSURE
FIRST WE MUST FIRST PUT OUR DIFFERENCES ASIDE WE LAY DOWN OUR ARMS TIL WE CAN REACH OUT OUR ARMS TO ONE ANOTHER WE SEEK HARM TO NONE AND HARMONY FOR ALL LET THE GLOBE IF NOTHING ELSE SAY THIS IS TRUE THAT EVEN AS WE GRIEVED WE GREWE THAT EVEN AS WE HURT WE HOPED THAT EVEN AS WE TIRED WE TRIE THAT WILL FOREVER BE TIED TOGETHER VICTORIOUS NOT BECAUSE WE WILL NEVER AGAIN KNO DEFEAT BUT BECAUSE WE WILL N
EVER AGAIN SO DIVISION SKIPSER TELLS US TO INVISION THAT EVERYONE SHALL SIT UNDER THE OWN VINE AND FIG TREE AND NO ONE SHALL MAKE THEM AFRAID IF WE'RE TO LIVE UP TO OUR OWN TIME THAN VICTORY WON'T LIE IN THE BLADE BUT IN ALL THE BRIDGES WE'VE MADE THAT IS THE PROMISE TO GLADE THE HILL WE CLIME IF ONLY WE DARE IT BECAUSE BEING AMERICAN IS MORE THAN A PRIDE WE INHERIT IT'S THE PAST WE STEP INTO AND HOW WE RE
HAR
IT WE'VE SEEN A FOREST THAT WOULD SHATTER OR NATION RATHER THAN SHARE IT WELD DESTROY OUR COUNTRY IF IT MEANT DELAYING DEMOCRACY AND THIS EFFORT VERY NEARLY SUCCEEDED BUT WHILE DEMOCRACY CAN BE PERIODICALLY DELAYED IT CAN NEVER BE PERMANENTLY DEFEATED IN THIS TRUTH IN THIS FAITH WE TRUST FOR WHILE WE HAVE OUR EYES ON THE FUTURE OR HISTORY HAS ITS EYES ON US THIS IS THE ERA OF JUST REDEMPTION WE FEARED A
CEPTION
WE DID NOT FEEL PREPARED TO BE THE EIRS OF SUCH A TERRIFYING HOUR BUT WITHIN IT WE FOUND THE POWER TO AUTHOR A NEW CHAPTER TO OFFER HOPE AND LAUGHTER TO OURSELVES SO WHILE ONCE WE ASKED HOW COULD WE POSSIBLY PREVAIL OVER CATASTROPHE NOW WE ASSERT HOW COULD CATASTROPHE POSSIBLY PREVAIL OVER US WE WILL NOT MARCH BACK TO WHAT WAS BUT MOVE TO WHAT SHALL BE A COUNTRY THAT IS BRUISED BUT WHOLE BE
VOLENCE
BUT BOLD FIERCE AND FREE WE WILL NOT BE TURNED AROUND OR INTERRUPTED BY INTIMIDATION BECAUSE WE KNOW OUR INACTION AND INERTIA WILL BE THE INHERITANCE OF THE NEXT GENERATION OUR BLENDERS BECOME THEIR BURDENS BUT ONE THING IS CERTAIN IF WE MERGE MERCY WITH MIGHT AND MIGHT WITH MIGHT THEN LOVE BECOMES OUR LEGACY IN CHANGE OUR CHILDREN'S BIRTHRIGHT SO LET US LEAVE BEHIND A COUNTRY
BETTER THAN ONE WE WERE LEFT WITH EVERY BREATH FROM OUR BRONZE POUNDED CHEST WE WILL RAISE THIS WOUNDED WORLD INTO A WONDROUS ONE WE WILL RISE FROM THE GOLDLIMD HILLS OF THE WEST WE WILL RISE FROM THE WIND SWEPT NORTHEAST WHERE OUR FORFATHER'S FIRST REALISE REVOLUTION WE WILL RISE FROM THE LAKE RIMD CITIES OF THE MIDWESTERN STATES WE WILL RISE FROM THE SUNBAKED SOUTH WE WILL REBUILD RECONCILE AND BECOVER AND EVERY KNOWN NOOK OF OUR NATION IN EVERY CORNER CALLED OUR COUNTRY OUR PEOPLE DIVERSE AND BEAUTIFUL WILL EMMERGE BATTERED
AND BEAUTIFUL WHEN DAY COMES WE STEP OUT OF THE SHADE AFLAME AND UNAFRAID THE NEW DAWN BALLOONS AS WE FREE IT FOR THERE WAS ALWAYS LIGHT IF ONLY WERE BRAVE ENOUGH TO SEE IT IF ONLY WERE BRAVE ENOUGH TO BE IT

Again, the parts in bold are the problematic parts that the model failed to get right. But they are pretty minor issues I would say, and can be easily cleaned up. As the Wav2Vec2 models get more sophisticated, I’m pretty sure the results will be better.

TRIAL #3: SINGAPORE AND THE WEF

Screen-capture via Singapore Prime Minister’s Office’s YouTube Channel.

For the third trial, I wanted to assess the model on a clip longer than 10 minutes featuring a speaker with an Asian accent. I settled on a January 29, 2021 speech by Singapore Prime Minister Lee Hsien Loong at the World Economic Forum Davos Agenda Week.

This time round, I split the 12-minute 30s speech into 13 parts (12x60s clips and 1x30s clip) and ran it on Colab Pro. The transcript took 2 minutes 4s to produce:

IAM VERY HONOR TO SPEAK AT DISCLOSING ADDRESS AND I LIKE TO CONGRATULATE PROFESSOR SCHOEB YOURSELF AND THE WHOLE BLW  F TEAM FOR PUTTING TOGETHER A SUCCESSFUL PROGRAMM IT 'S BEEN A YEAR SINCE WE WERE ALL PHYSICALLY GATHERED IN DAVORCE FOR THE FIFTIETH ANNUAL MEETING OFER THE DBU F AT THAT TIME WE WERE JUST STARTING TO HEAR ABOUT THIS NEW VIRUS AND TRYING TO UNDERSTAND WHAT WAS HAPPENING NONE OF US ANTICIPATED HOW QUICKLY A FULL SCALE PANDAMIC WOULD BLOW UP AND DRAMATICALLY CHANGE OUR WORLD THE DESRUPTION TO LIVES AND LIVELIHOODS HAS BEEN MASSIVE AND UNPRECEDENTED THE VIRAS IS STILL RAGING IN MANY COUNTRIES IN THE DEVELOPED WORLD IN THE US AN EUROPE AND ALSO IN THE DEVELOPING WORLD IN AFRICA SOUTH AMERICA AND SOUTH ASIA THANKFULLY WITH BAXINES BECOMING AVAILABLE THERE IS SOME LIGHT AT THE END OF THE TUNNEL IT IS NOW
CRITICAL THAT VAXINES ARE RULED OUT QUICKLY ACROSS THE WORLD BUT EVEN WITH VAXINES THE PANDAMIC IS FAR FROM BEING QUELLED THE NEW VARIANCE DISCOVERED AND THE U K IN SOUTH AFRICA AND BRASIL ARE WARRYING AND FURTHER MUTATIONS WILL SURELY EMERGE UNTIL A LARGE PART OF THE WORLD'S POPULATION IS VACCINATED WE STILL NEEDS STRONG PUBLIC HEALTH MEASURES EVERYWHERE TO SUPPRESS THE SPREAD OF THE VIOLUS AND KEEP POPULATION SAFE WHAT WILL THE POSTCOVE NINETEEN WOLL LOOK LIKE WILL COUNTRIES EMERGE MORE RESOLVE TO BUILD A MORE RESILIANT BUT STILL GLOBALIZED WELL OR ARE WE HEADED TOWARDS A LESS INTEGRATED GLOBAL ECONOMY A LESS STABLE INTERNATIONAL ORDER THE ANSWER DEPENDS ON THE DECISIONS THAT COUNTRIES TAKE NOW EVEN BEFORE COVER NINETEEN GLOBILIZATION WAS ALREADY UNDER PRESSURE CONFIDENCE AND MALTILACERAL INSTITUTIONS AND RULES AND NORMES WAS ERODING POPULOUS POLITICS NATIVISM NATIONALISM PROTECTIONISM WERE ON THE RISE COUNTRYS INITIAL REACTIONS TO THE PENDEMIC SEEMED TO HERL GRUBILIZATIONS DEMIES BORDERS WERE CLOSE SUPPLY CHANGE WERE BADLY DISRUPTED EACH COUNTRY SCRAMBLE TO SECURE ITS OWN SUPPLIES OF ESSENTIAL GOODS ESPECIALLY IN MEDICINES FACE MASKS AND VENTILATORS IT WAS EACH MAN FOR HIMSELF BUT AS THEIR SITUATION UNFOLDED WE WERE FORCEFULLY REMINDED THAT OUR FATES WERE INTERTWINED AND THAT WE HAD TO WORK TOGETHER AND SO WE DID IN MANY AREAS WE RESTORED SUPPLY CHANGE WE REPATRATED EACH OTHER'S CITIZENS STUCK OVER SEAS WE SHARED TESTS AND MEDICAL SUPPLIES WE SUPPORTED VAXYNG MALTILATERALISM INITIATIVES LIKE THE KOVACS
SO THAT ALL COUNTRIES AV SPECIALLY THE LEAST DEVELOP ONES WOULD HAVE ACCESS TO VACSCENES AND AS WE GRADUALLY REBUILT CONFIDENCE IN ONE ANOTHER WE OPENED UP CONTROL CORRIDORS FOR TRAVEL AND TRADE BETWEEN COUNTRIES CRUCIALLY INTERNATIONAL SCIENTIFIC CO OPERATION IN THE FIGHT AGAINST COVERNINETEEN CONTINUE DOCTORS AND SCIENTISTS SHARED INFORMATION ABOUT THE DISEASE AND THE VIRUS STUDYING THEM DEVELOPING TREATMENTS AND TESTING VAKSCENES THIS ENABLE US TO IMPROVE PATIENT CARE AND TO PRODUCE EFFECTIVE ACTENES IN RECORD TIME SOME USING NEW ECKNOWLEDGIES SUCH INTERNATIONAL CO OPERATION AND MALTILATERAL EFFORTS REMAIN ESSENTIAL TO TACKLE THE GLOBAL PANDEMIC COHERENTLY WITH BORDER CLOSURES AND LOCK DOWNS ECONOMIES HAVE ALL TAKEN A DEEP PLUNGE THE LIBLIHOODS OF MILLIONS CAME UNDER IN
NORMOUS
STRESS ONLY UNPRECEDENTED LEVELS OF EMERGENCY SPENDING AND MAGETARY STIMULUS HAVE KEPT US AFLOAD PROVIDING A LIFE LINE TO COMPANIES WORKERS AND FAMILIES CENTRAL BANKS HAVE PLAYED THEIR PART TO PREVENT FINANCIAL SYSTEMS AND GLOBAL CAPITAL MARKETS FROM SEIZING UP UNLIKEN PREVIOUS CRISES THESE EXTRAORDINARY MEASURES CANNOT BE SUSTAINED INDEFINITELY IN FACT SPENDING PACKAGES ARE ALREADY TAPERING OFF BUT HOPEFULLY AS FASCINATION BECOMES MORE WIDESPREAD AND WE MAKE HEADWAYS SUPPRESSING THE VIOLUS COVET NINETEEN RESTRICTIONS CAN BE PROGRESSIVELY EASE AND ECONOMIES WILL REBOUN THE WORLD BANK AND I MY FORECAST GLOBAL GROWTH TO RECOVER THIS YEAR IT WILL NOT RESTORE OUTPUT TO PRICOVENINETEEN LEVELS BUT TI SOMETHING STILL TO BE THANKFUL FOR NOW WE ARE ENTERING A NEW PHASE
THE PANDEMIC HAS EXPOSED BUSINESSES AND JOBS WHICH ARE NOT GOING TO REMAIN VIABLE THEY HAVE TO BE LET GO TO ALLOW NEW GROWTH AND BETTER JOBS TO BE CREATED IN THEIR PLACE HARD DECISIONS HAVE TO BE MADE AND THIS WILL EXASCEBATE EXISTING STRESSES GOVERNMENTS WILL COME UNDER MORE PRESSURE TO ADOPT PROTECTIONISTS AND NATIVEST POSITIONS TO RESUME GROWTH WE MUST LOOK BEYOND RETURNING TO THE STATEST QUA ANTI WE MUST LOOK AHEAD WILEN COUNTRIES GOVERNMENTS AND BUSINESSES MUST COLLABORATE TO TACT NEW MARKETS AND DEVELOP NOVELT ACKNOWLEDGES EXTERNALLY COUNTRIES NEED TO STRENGTHEN THE FRAMEWORK FOR INTERNATIONAL CO OPERATION AS AN IMMEDIATE TASK COUNTRIES SHOULD COLLABORATE TO DEVELOP A STANDARDIZE ROBUST SYSTEM TO VERIFY THE AUTHENTICITY OF TESTS AND VACCINATIONS THIS IS ESSENTIAL TO REOPEN B
ORDERS AND RESUME INTERNATIONAL TRAVEL IN THE LONGER TERM COUNTRIES SHOULD WORK TOGETHER TO UPDATE AND STRENGTHENINTS TERNATIONAL INSTITUTIONS LIKE THE W TEOL AND CREATE NEW RULES TO GOVERN AND FOSTER NOVEL FORMS OF ECONOMIC ACTIVITY FOR EXAMPLE TO SUSTAIN THE GROWTH OF THE DIGITAL ECONOMY AND FACILITATE SAFE SECURE AND EFFICIENT CROSS BORDER E PAYMENTS AND DATA FLOWS WE HAVE TO DEVELOP NEW E TRADE REGULATIONS SINGAPOLE HAS CONCLUDED DIGITAL ECONOMY AGREEMENTS WITH LIKE MINDED COUNTRIES LIKE AUSTRALIA CHILLE AND NEW ZEALAN WE HOPE THAT THIS IS ONLY THE BEGINNING WE ENCOURAGE ALL COUNTRIES TO COME TOGETHER TO SHAPE AND GROW THE DIGITAL ECONOMY GLOBLY THUSSIGNING OF THE REGINAL COMPREHENSIVE ECONOMIC PARTNERSHIP OR THE ARLC E P LAST
TIA
BY FIFTEEN COUNTRIES IN ASIA WAS ALSO A MAJOR COLLECTIVE COMMITMENT TO TRADE AN ECONOMIC INTEGRATION AMIDST THE PANDEMIC THE ARC P WILL BROADEN TRAN OPEN AT MARKETS IN EAST AND SOUTH EASTATIA AND AUSTRALASIA AND HOPEFULLY PREVENT THE PUSH FOR RASILIANTS AND SELF RELIANCE FROM GOING TOO FAR WHILE DEALING WITH THE AFTERMATH OF COVET NINETEEN WE MUST NOT LOSE SIGHT OF OTHER LONG TERM CHALLENGES THAT AFFECT ALL OF US ONE MAJOR PROBLEM IS CLIMATE CHANGE TWENTY TWENTY WAS THE WORLD'S HOTTESEER ON RECORD EXTREME WEATHER EVENTS HAVE BECOME MUCH MORE FREQUENT LAST YIAR CARBON EMISSIONS WENT DOWN BUT ONLY BECAUSE OF COVET NINETEEN OTHERWISE THE TREN HAS BEEN INEXORABLY UPWARDS CLIMATE CHANGE IS CLEARLY EX
CELERATING
DANGEROUSLY AND IT IS LATE IN THE DAY BUT IF COUNTRIES ACT NOW AND IN CONCERT HUMAN KIND CAN STILL HOPE TO AVERT A CATASTROPHE WE ALL KNOW WHAT WE NEED TO DO WITHIN INDIVIDUAL COUNTRIES TO MUSTER SUPPORT FOR POLICIES AND MEASURES THAT WILL SLOW THE CHANGES AND LIMIT GLOBLE WARNING COLLECTIVELY TO SET HIGHER COMMON STANDARDS AND HOLD ONE ANOTHER TO OUR MUTUAL COMMITMENTS WHETHER IT'S TIGHTENING EMISSION RULES FASING OUT FOSSILFIELD SUBSIDIES OR PROMOTING RENEWABLE ENERGY WE CAN TAKE SOME COMFORT THAT COUNTRIES ARE NOW TAKING CLIMATE CHANGE MORE SERIOUSLY THE US HAS REJOINED THE PARIS AGREEMENT CHINA HAS ANNOUNCED A ZERO EMISSIONS TARGET BY TWENTY SIXLY BUT MUCH MORE STILL NEEDS TO BE DONE GOING BEYOND OUR PARIS COMMITMENTS OTHERWISE
WE RISK GRAVE CONSEQUENCES IN THE NOT TOO DISTANT FUTURE EVEN WITHIN OUR OWN LIFETIMES TO TACKLE THESE CHALLENGES COVET NINETEEN ECONOMIC RECOVERY AND CLIMATE CHANGE GLOBA CO OPERATION IS ESSENTIAL BUT GETTING COUNTRIES TO WORK TOGETHER IS NOT SIMPLY A MATTER OF NURTURING AND SHOWING GOODWILL THE INTERNATIONAL ORDER MUST BE UNDERPINNED BY STABLE GREAT POWER RELATIONS BIG COUNTRIES NATURALLY JOSTLE AND COMPETE WITH ONE ANOTHER FOR INFLUENCE AND POWER BUT THEY ALSO NEED TO WORK WITH ONE ANOTHER THROUGH ESTABLISH AN ACCEPTED RULES AND NORMS ON ISSUES WHICH AFFECT US ALL BE IT PANDAMICS ECONOMIC CO OPERATION OR CLIMATE CHANGE RECENT YEARS HAVE WITNESSED GROWING FRICTION AND DISTRUST RATHER THAN CO OPERATION AND CONFIDENCE BUILDING AMONG MAJOR POWERS THE MOST WORRYING TREND I
S THE W S CHINA RELATIONS THIS REMAINS THE MOST IMPORTANT BYLATURAL RELATIONSHIP FOR THE WORLD IN THE YEARS AHEAD OVER THE LAST FOUR YEARS TENSIONS BETWEEN THE U S AND CHINA HAVE INTENSIFIED SHARPLY BOTH POWERS HAVE ADOPTED MORE SERTIVE AND UNCOMPROMISING POSTURES THE U S NOW SEES CHINAS AS STRATEGIC RIVAL AND CHALLENGER TO ITS PRE EMINENT POSITION AND CHINA IS VIGOROUSLY ASSERTING WHAT IT CONSIDERS ITS RIGHTFUL PLACE IN THE WORLD ON BOTH SIDES DOMESTIC PRESSURES TO HARDEN THEIR EXTERNAL POSITIONS ARE CONSIDERABLE AND MODERATE VOICES HAVE BEEN MARGINALIZE GIVEN THE ENORMOUS STAKES DIFFICULT AS IT WILL BE IT CANNOT POSSIBLY BE TOO LATE FOR THE U S AND CHINA TO RESET THE TONE OF THEIR INTERACTIONS AND AVERT A CLASH
BETWEEN THEM WHICH WILL BECOME A GENERATIONAL TWILIGHT STRUGGLE THE NEW US ADMINISTRATION IS AN OPPORTUNITY TO STEER THE RELATIONSHIP TOWARDS SAFER WATERS AMID PRESIDENT BIDONS MANY URGENT PREOCCUPATIONS THE U S CHINA RELATIONSHIP SHOULD BECOME A KEY STRATEGIC PRORITY TO BILD A STABLE INTERNATIONAL ORDER REGULAR CONSTRUCTIVE DIALOGUE IS CRITICAL IAM THUS HAPPY TO SEE MANY DISTINGUISHED PARTICIPANTS TAKING PART IN THE DAVOSA GENDER WEEK THE WORLD ECONOMIC FORUM PLAYS AN IMPORTANT RULE PROMOTING DIALOGUE BRINGING TOGETHER LEADERS IN GOVERNMENT INDUSTRY AND CIVIL SOCIETY IT'S A FORUM WHERE LEADERS FROM COUNTRIES LARGE AND SMALLER LIKE CAN SPEAK AND BE HEARD AND THIS IS WHY WHEN PROFESSOR SHROB ASK ME WHETHER SINGO POET HOSTESS SPECIAL ANNUAL MEETING OF THE W E F I AGREED IT WAS NOT
A DECISION LIGHTLY TAKEN BUT WE ARE HAPPY TO MAKE A MODEST CONTRIBUTION TO THE GLOBAL DISCUSSION AS THE HOST COUNTRY WE WILL WORK WITH A W E F TO ENSURE THE HEALTH AND SAFETY OF ALL I WELCOME ALL OF YOU TO SING A POR IN MAY SO THAT WE CAN TAKE THESE DISCUSSIONS FORWARD AND FORGE A NEW PARTH AHEAD TOGETHER THANK YOU

As in the first two trials, the parts in bold indicate the problematic areas. The speech had a number of technical terms which clearly tripped up the model, as did the PM’s pronunciation of some words.

A lot more cleaning up is required this time. But the transcription took all of 2 minutes 4s. Just think about the amount of time one could save compared with transcribing the speech manually.

The Wav2Vec2 transcripts aren’t perfect by any means. The lack of punctuation could be an issue for some use cases, and the management of audio clips would require additional resources.

But all things considered, I would say the Wav2Vec2 model opens up exciting new possibilities in NLP, beyond just fast and accurate transcripts. I’m particularly excited by the prospects of using the transcripts as inputs for translation, sentiment analysis, or summaries via other transformer models.
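As a rough illustration of that kind of chaining, the sketch below cuts a transcript into fixed-size word windows and passes each one to the sentiment-analysis pipeline in the transformers library. I have not run this as part of the trials; the 200-word window is an arbitrary choice meant to stay under typical model input limits, and the function names are hypothetical.

```python
def word_windows(transcript, max_words=200):
    """Split a transcript into consecutive pieces of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = transcript.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def sentiment_per_window(transcript, max_words=200):
    """Run a sentiment classifier over each window of the transcript."""
    # Imported lazily; needs the transformers library and a model download.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")  # the library's default model
    return classifier(word_windows(transcript, max_words))
```

Since the transcripts have no punctuation, word windows are a crude substitute for sentence boundaries, so the per-window labels should be read with that in mind.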

This could get really interesting.

As always, if you spot mistakes in this or any of my earlier posts, ping me at:

The repo for this post, containing the data and notebooks for the charts, can be found here.
