AssemblyAI has introduced significant upgrades to its PII Redaction and Entity Detection features, aimed at enhancing data security and extracting key insights from audio transcripts. According to AssemblyAI, the latest updates include support for PII Text Redaction across 47 languages and the addition of 16 new entity types to its Entity Detection model, bringing the total to 44.
Enhanced PII Redaction Capabilities
The updated PII Text Redaction feature now supports 47 languages, ensuring comprehensive protection of personally identifiable information (PII) across diverse regions. This upgrade enables users to identify and remove sensitive data such as addresses, phone numbers, and credit card details from their transcripts. Additionally, users can generate transcripts with PII removed or use the tool to "beep out" sensitive information in audio files.
An example of how to use the API for PII redaction is provided by AssemblyAI:
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Group/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

config = aai.TranscriptionConfig(speaker_labels=True).set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

print(transcript.text)
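For context, the hash substitution policy used above masks each detected PII span with "#" characters in the returned transcript text. Here is a minimal sketch of that masking behavior; the sample sentence and character spans are hypothetical illustrations, not output from the AssemblyAI API:

```python
# Sketch of "hash"-style substitution: each detected PII character span
# is overwritten with "#". The spans below are hypothetical sample data.
def hash_redact(text: str, spans: list[tuple[int, int]]) -> str:
    """Replace each (start, end) character span in text with '#' characters."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = "#"
    return "".join(chars)

sample = "Dr. Jane Smith works at Acme Corp."
# Hypothetical spans covering "Jane Smith" and "Acme Corp"
spans = [(4, 14), (24, 33)]
print(hash_redact(sample, spans))  # Dr. ########## works at #########.
```

The real API performs the detection itself and returns the already-redacted text; this sketch only illustrates what the hash substitution looks like.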
Users can refer to AssemblyAI's documentation for more detailed examples and an in-depth look at the updates.
Expanded Entity Detection
The Entity Detection model has been upgraded with 16 new entity types, allowing for the automatic identification and categorization of critical information in transcripts. This brings the total number of supported entity types to 44, including names, organizations, addresses, and more. The model delivers 99% accuracy in major languages, making it a powerful tool for extracting valuable insights from audio data.
An example of how to use the API for Entity Detection is also provided:
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Group/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

config = aai.TranscriptionConfig(entity_detection=True)

transcript = aai.Transcriber().transcribe(audio_url, config)

for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")
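As a rough sketch of what can be done with the returned entities, the snippet below groups detected mentions by their type; the entity list here is hypothetical stand-in data, not live API output:

```python
from collections import defaultdict

# Hypothetical stand-in for transcript.entities; the real objects come
# from the AssemblyAI API and expose text / entity_type / start / end.
entities = [
    {"text": "Canada", "entity_type": "location"},
    {"text": "smoke advisory", "entity_type": "event"},
    {"text": "Maine", "entity_type": "location"},
]

# Group detected entity mentions by type for a quick per-type summary.
by_type = defaultdict(list)
for entity in entities:
    by_type[entity["entity_type"]].append(entity["text"])

for entity_type, mentions in sorted(by_type.items()):
    print(f"{entity_type}: {', '.join(mentions)}")
```

The same grouping works on real results by reading `entity.entity_type` and `entity.text` attributes instead of dictionary keys.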
Additional Resources
AssemblyAI has also shared several new blog posts and tutorials to help users get the most out of its products. Topics include using Claude 3.5 Sonnet with audio data, understanding Microsoft's Florence-2 image model, and building a real-time language translation service with AssemblyAI and DeepL in JavaScript.
For more information on these updates and to explore additional resources, visit AssemblyAI's official blog.
Image source: Shutterstock