PP-OCRv5

Key Features:

  • Multimodal Input (Image/Video/Audio): Speech-to-text integration for audio context.

  • Live Video OCR with Tracking: Real-time text recognition + object tracking.

  • NLP-Enhanced Output: Semantic tagging (e.g., "signature block", "total amount").

  • Diffusion Pre-Processing: Restores heavily degraded text (e.g., old scans).

  • GDPR Compliance: Built-in redaction for sensitive data.
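
As a rough illustration of what text redaction can look like downstream of OCR, the sketch below masks email addresses and phone-like numbers with regular expressions. It is not PP-OCRv5's built-in redaction, whose interface is not described here; the `PII_PATTERNS` table and the `redact` function are hypothetical names, and pattern matching alone does not establish GDPR compliance.

```python
import re

# Hypothetical patterns for illustration only; a real deployment would need
# a vetted, much broader rule set (names, addresses, IDs, and so on).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace every match of a PII pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Example: one line of recognized text, as an OCR step might return it.
ocr_line = "Contact: jane.doe@example.com, tel. +1 (555) 010-9999"
print(redact(ocr_line))  # prints "Contact: [EMAIL], tel. [PHONE]"
```

Running redaction on the recognized text, rather than on the image, keeps the example short; masking the matching regions in the source image would additionally need the bounding boxes for each recognized line.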



Model Deployment Status:

  • General Availability: Yes (Enterprise-only)

  • Supported Data Types for Input: Image, Video, Audio, Live Feeds

  • Supported Data Types for Output: Text + Semantic Annotations

  • Supported # Tokens for Input: Dynamic (cloud-scale)

  • Supported # Tokens for Output: 16k (context-aware)

  • Knowledge Cutoff: June 2024

  • Tool Use: Search as a tool, Code execution



Best For:

  • Legal contract analysis

  • Surveillance video analytics


Availability:

  • Gemini API

  • AWS Bedrock
