PP-Structure v3

Key Features:

  • Live Video OCR with Tracking: Real-time text recognition and object tracking in video.

  • Multimodal Input (Image/Video/Audio): Speech-to-text integration for contextual analysis.

  • GDPR Compliance: Built-in redaction for sensitive data (e.g., IDs, bank numbers).

  • Cloud-Scale Processing: Dynamic token handling for large documents.

  • Tool Integration: Search as a tool and code execution for custom workflows.
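
The redaction feature listed above is not specified in detail here. The following is a minimal sketch of how pattern-based redaction of recognized text might look, assuming regular-expression matching on OCR output. The pattern set, the labels, and the function name redact are hypothetical and chosen for illustration only; they are not the product's actual rules or API.

```python
import re

# Hypothetical patterns for illustration only; real rules would be
# broader and would normally include checksum validation.
PATTERNS = {
    # IBAN-like string: country code, two check digits, 11-30 alphanumerics
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    # Card-like number: 13-19 digits, optionally separated by spaces or hyphens
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


if __name__ == "__main__":
    sample = "Pay to DE89370400440532013000 using card 4111 1111 1111 1111 today."
    print(redact(sample))
```

Matching on recognized text rather than on pixels keeps the sketch short. A production pipeline would presumably also need to mask the corresponding regions of the source image or video frame.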



Model Deployment Status:

  • General Availability: Yes (Enterprise-only)

  • Supported Data Types for Input: Image, Video, Audio, Live Feeds

  • Supported Data Types for Output: Text + Semantic Annotations

  • Supported Number of Output Tokens: 16k (context-aware)

  • Knowledge Cutoff: June 2024



Best For:

  • Surveillance video analytics

  • Automated compliance reporting


Availability:

  • Gemini API

  • AWS Bedrock

