Engines#
PanoOCR uses dependency injection for OCR engines. Provide any object with a matching recognize() method.
OCREngine Protocol#
OCREngine#
Bases: Protocol
Protocol for OCR engines (structural typing).
Any class with a matching recognize() method can be used.
No inheritance required.
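As a rough illustration of the structural typing described above, the sketch below defines a stand-in protocol with typing.Protocol. The names OCREngineLike, Result, EchoEngine, and NotAnEngine are hypothetical, and image is typed as Any so the snippet runs without Pillow; the real OCREngine and FlatOCRResult live in panoocr and may differ in detail.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class Result:
    """Hypothetical stand-in for panoocr's FlatOCRResult."""
    text: str
    confidence: float


@runtime_checkable
class OCREngineLike(Protocol):
    """Hypothetical stand-in for the OCREngine protocol."""

    def recognize(self, image: Any) -> list[Result]: ...


class EchoEngine:
    """Does not inherit from OCREngineLike; it only matches its shape."""

    def recognize(self, image: Any) -> list[Result]:
        return [Result(text=str(image), confidence=1.0)]


class NotAnEngine:
    """Lacks recognize(), so it does not satisfy the protocol."""


# isinstance() on a runtime_checkable protocol only checks that the
# method exists, not that its signature matches.
is_engine = isinstance(EchoEngine(), OCREngineLike)
is_not_engine = isinstance(NotAnEngine(), OCREngineLike)
```

A static type checker would additionally compare the recognize() signature, which the runtime check does not do.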
recognize#
Recognize text in an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| list[FlatOCRResult] | List of FlatOCRResult objects with normalized bounding boxes (0-1 range). |
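To make the "normalized bounding boxes (0-1 range)" requirement concrete, here is a minimal sketch of converting a pixel-space box into fractions of the image size. The helper normalize_box is hypothetical and not part of panoocr; the key names mirror the BoundingBox fields used in the Custom Engines example, and clipping out-of-range values is an assumption rather than documented behavior.

```python
def normalize_box(left_px, top_px, right_px, bottom_px, image_width, image_height):
    """Convert pixel coordinates to fractions of the image size (0-1 range)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    def clamp(value):
        # Assumption: boxes that spill past the image edge are clipped to [0, 1].
        return min(1.0, max(0.0, value))

    left = clamp(left_px / image_width)
    top = clamp(top_px / image_height)
    right = clamp(right_px / image_width)
    bottom = clamp(bottom_px / image_height)
    return {
        "left": left, "top": top, "right": right, "bottom": bottom,
        "width": right - left, "height": bottom - top,
    }


# A 200x50 px word whose top-left corner is at (100, 50) in a 1000x500 px image.
box = normalize_box(100, 50, 300, 100, image_width=1000, image_height=500)
```

With a PIL image, image_width and image_height would come from image.size.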
Structured Engines#
Structured engines return per-word bounding boxes, enabling geographic text indexing.
MacOCREngine#
Uses Apple's Vision Framework for fast, accurate OCR on macOS. Requires the [macocr] extra.
MacOCREngine#
OCR engine using Apple Vision Framework via ocrmac.
This engine uses macOS's built-in Vision Framework for text recognition. It provides excellent accuracy for many languages on Apple Silicon.
Attributes:
| Name | Type | Description |
|---|---|---|
| language_preference | | List of language codes to use for recognition. |
| recognition_level | | Recognition accuracy level ("fast" or "accurate"). |
Example
from panoocr.engines.macocr import (
    MacOCREngine,
    MacOCRLanguageCode,
    MacOCRRecognitionLevel,  # assumed to be exported from the same module
)
engine = MacOCREngine(config={
    "language_preference": [MacOCRLanguageCode.ENGLISH_US],
    "recognition_level": MacOCRRecognitionLevel.ACCURATE,
})
results = engine.recognize(image)
Note
Requires macOS and the ocrmac package. Install with: pip install "panoocr[macocr]"
Initialize the MacOCR engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, Any] \| None | Configuration dictionary with optional keys: language_preference (list of MacOCRLanguageCode values); recognition_level (MacOCRRecognitionLevel value). | None |
Raises:
| Type | Description |
|---|---|
| ImportError | If ocrmac is not installed. |
| ValueError | If configuration values are invalid. |
Source code in src/panoocr/engines/macocr.py
recognize#
Recognize text in an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List of FlatOCRResult with normalized bounding boxes. |
Source code in src/panoocr/engines/macocr.py
RapidOCREngine#
Runs PaddleOCR PP-OCRv4 and PP-OCRv5 models via ONNX Runtime. Because it bypasses the PaddlePaddle framework, it avoids that framework's issues on Apple Silicon. Requires the [rapidocr] extra.
RapidOCREngine#
OCR engine using RapidOCR (PP-OCRv4/v5 via ONNX Runtime).
Attributes:
| Name | Type | Description |
|---|---|---|
| ocr_version | | Which PP-OCR model version to use ("PP-OCRv4" or "PP-OCRv5"). |
Example
from panoocr.engines.rapidocr_engine import RapidOCREngine
engine_v4 = RapidOCREngine()
engine_v5 = RapidOCREngine(config={"ocr_version": "PP-OCRv5"})
results = engine_v4.recognize(image)
Note
Install with: pip install "panoocr[rapidocr]" onnxruntime
Source code in src/panoocr/engines/rapidocr_engine.py
recognize#
Recognize text in an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List of FlatOCRResult with normalized bounding boxes. |
Source code in src/panoocr/engines/rapidocr_engine.py
EasyOCREngine#
Cross-platform OCR supporting 80+ languages. Requires the [easyocr] extra.
EasyOCREngine#
OCR engine using EasyOCR library.
EasyOCR supports 80+ languages and can run on CPU or GPU. It provides good accuracy for many scripts including CJK.
Attributes:
| Name | Type | Description |
|---|---|---|
| language_preference | | List of language codes to use. |
| reader | | EasyOCR Reader instance. |
Example
from panoocr.engines.easyocr import EasyOCREngine, EasyOCRLanguageCode
engine = EasyOCREngine(config={
    "language_preference": [EasyOCRLanguageCode.ENGLISH],
    "gpu": True,
})
results = engine.recognize(image)
Note
Install with: pip install "panoocr[easyocr]"
For GPU support, install PyTorch with CUDA.
Initialize the EasyOCR engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, Any] \| None | Configuration dictionary with optional keys: language_preference (list of EasyOCRLanguageCode values); gpu (whether to use GPU, default: True). | None |
Raises:
| Type | Description |
|---|---|
| ImportError | If easyocr is not installed. |
| ValueError | If configuration values are invalid. |
Source code in src/panoocr/engines/easyocr.py
recognize#
Recognize text in an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List of FlatOCRResult with normalized bounding boxes. |
Source code in src/panoocr/engines/easyocr.py
PaddleOCREngine#
PaddlePaddle-based OCR supporting multiple languages with automatic model management. Requires the [paddleocr] extra (includes both paddleocr and paddlepaddle).
PaddleOCREngine#
OCR engine using PaddleOCR library (v3.x).
PaddleOCR is developed by PaddlePaddle and supports multiple languages. It provides good accuracy with automatic model management.
Attributes:
| Name | Type | Description |
|---|---|---|
| language_preference | | Language code for recognition. |
| recognize_upside_down | | Whether to use textline orientation classifier. |
| ocr_version | | The PP-OCR model version to use. |
Example
from panoocr.engines.paddleocr import PaddleOCREngine, PaddleOCRLanguageCode
engine = PaddleOCREngine(config={
    "language_preference": PaddleOCRLanguageCode.CHINESE,
})
results = engine.recognize(image)
Note
Install with: pip install "panoocr[paddleocr]" paddlepaddle
For GPU support, install paddlepaddle-gpu instead.
Initialize the PaddleOCR engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, Any] \| None | Configuration dictionary with optional keys: language_preference (PaddleOCRLanguageCode value); recognize_upside_down (enable textline orientation classifier, default: False); ocr_version (PaddleOCRVersion or string like "PP-OCRv5", default: auto-selected by PaddleOCR based on language); text_detection_model_name (override detection model name); text_recognition_model_name (override recognition model name); text_det_limit_side_len (max side length for detection input); text_rec_score_thresh (minimum recognition score threshold). | None |
Raises:
| Type | Description |
|---|---|
| ImportError | If paddleocr or paddlepaddle is not installed. |
| ValueError | If configuration values are invalid. |
Source code in src/panoocr/engines/paddleocr.py
recognize#
Recognize text in an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List of FlatOCRResult with normalized bounding boxes. |
Source code in src/panoocr/engines/paddleocr.py
GoogleVisionEngine#
Google Cloud Vision API (TEXT_DETECTION). Uses REST API with an API key. Requires the [google-vision] extra and GOOGLE_VISION_API_KEY environment variable.
GoogleVisionEngine#
OCR engine using Google Cloud Vision API (TEXT_DETECTION).
Uses the REST API with an API key. Set GOOGLE_VISION_API_KEY in environment or .env file, or pass via config.
Example
from panoocr.engines.google_vision import GoogleVisionEngine
engine = GoogleVisionEngine()
results = engine.recognize(image)
Note
Install with: pip install "panoocr[google-vision]"
Requires GOOGLE_VISION_API_KEY environment variable.
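For orientation, the following sketch shows what a TEXT_DETECTION request to the public Cloud Vision REST endpoint (images:annotate) generally looks like, and how the API key might be read from the environment. The helpers load_api_key and build_text_detection_request are hypothetical; how GoogleVisionEngine actually builds its request, or reads the key from a .env file or config, is not shown in this page. Nothing is sent over the network here.

```python
import base64
import os

# Public REST endpoint; the API key is normally appended as ?key=<API_KEY>.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def load_api_key(env=None):
    """Read GOOGLE_VISION_API_KEY, failing early if it is missing (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_VISION_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_VISION_API_KEY is not set")
    return key


def build_text_detection_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


payload = build_text_detection_request(b"fake image bytes")
```

The payload would then be POSTed as JSON to VISION_ENDPOINT with the key as a query parameter.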
Source code in src/panoocr/engines/google_vision.py
recognize#
Recognize text in an image via Google Cloud Vision API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List of FlatOCRResult with normalized bounding boxes. |
Source code in src/panoocr/engines/google_vision.py
Florence2OCREngine#
Microsoft's Florence-2 vision-language model via transformers + torch. Requires the [florence2] extra.
Florence2OCREngine#
OCR engine using Microsoft's Florence-2 model.
Florence-2 is a vision-language model that can perform OCR with region detection. It provides good accuracy across many languages and can detect text in various orientations.
Attributes:
| Name | Type | Description |
|---|---|---|
| device | | Device to run inference on (cuda, mps, or cpu). |
| model | | The Florence-2 model. |
| processor | | The Florence-2 processor. |
Example
from panoocr.engines.florence2 import Florence2OCREngine
engine = Florence2OCREngine(config={
    "model_id": "microsoft/Florence-2-large",
})
results = engine.recognize(image)
Note
Install with: pip install "panoocr[florence2]"
For GPU support, install PyTorch with CUDA.
Initialize the Florence-2 engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, Any] \| None | Configuration dictionary with optional keys: model_id (HuggingFace model ID, default: "microsoft/Florence-2-large"); device (device to use: "cuda", "mps", "cpu", or None for auto). | None |
Raises:
| Type | Description |
|---|---|
| ImportError | If dependencies are not installed. |
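The device option accepts "cuda", "mps", "cpu", or None for automatic selection. A minimal sketch of how such auto-selection could work is shown below; pick_device is a hypothetical helper, and the preference order (CUDA, then MPS, then CPU) is an assumption, not the engine's documented behavior. Availability is passed in as flags so the snippet runs without PyTorch.

```python
from typing import Optional

VALID_DEVICES = ("cuda", "mps", "cpu")


def pick_device(requested: Optional[str], cuda_available: bool, mps_available: bool) -> str:
    """Return the explicitly requested device, or choose one automatically."""
    if requested is not None:
        if requested not in VALID_DEVICES:
            raise ValueError(f"unknown device: {requested!r}")
        return requested
    # Assumed preference order for auto mode: CUDA, then Apple MPS, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


auto_on_mac = pick_device(None, cuda_available=False, mps_available=True)
forced_cpu = pick_device("cpu", cuda_available=True, mps_available=False)
```

With PyTorch installed, the two flags could come from torch.cuda.is_available() and torch.backends.mps.is_available().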
Source code in src/panoocr/engines/florence2.py
recognize#
Recognize text in an image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List of FlatOCRResult with normalized bounding boxes. |
Source code in src/panoocr/engines/florence2.py
Florence2MLXEngine#
Florence-2 via mlx-vlm with <OCR_WITH_REGION> structured output. The only VLM engine that returns per-word bounding boxes. Requires macOS Apple Silicon with the [mlx-vlm] extra, plus torch and torchvision.
Florence2MLXEngine#
OCR engine using Florence-2 via mlx-vlm with <OCR_WITH_REGION>.
This is a structured engine -- it returns per-word bounding boxes.
Example
from panoocr.engines.florence2_mlx import Florence2MLXEngine
engine = Florence2MLXEngine()
results = engine.recognize(image)
Note
Install with: pip install "panoocr[mlx-vlm]" torch
Source code in src/panoocr/engines/florence2_mlx.py
recognize#
Source code in src/panoocr/engines/florence2_mlx.py
Unstructured Engines#
Unstructured engines return text without bounding boxes. Each detection gets a full-image bounding box for crop-level attribution in the panoocr pipeline.
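The full-image bounding box idea can be sketched as follows. Box, Detection, and lines_to_detections are hypothetical stand-ins rather than panoocr's own classes (which would be BoundingBox and FlatOCRResult), and the placeholder confidence of 1.0 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for BoundingBox, in normalized (0-1) coordinates."""
    left: float
    top: float
    right: float
    bottom: float
    width: float
    height: float


@dataclass
class Detection:
    """Hypothetical stand-in for FlatOCRResult."""
    text: str
    confidence: float
    bounding_box: Box
    engine: str


# A box spanning the whole image: the engine cannot localize the text,
# so each line is attributed to the entire crop it came from.
FULL_IMAGE = Box(left=0.0, top=0.0, right=1.0, bottom=1.0, width=1.0, height=1.0)


def lines_to_detections(raw_text: str, engine_name: str, confidence: float = 1.0):
    """Turn plain model output into one detection per non-empty line."""
    return [
        Detection(line.strip(), confidence, FULL_IMAGE, engine_name)
        for line in raw_text.splitlines()
        if line.strip()
    ]


detections = lines_to_detections("MAIN ST\n\n  OPEN 24H \n", engine_name="demo_vlm")
```

Because every detection shares the same box, the position of the text is only known to the level of the crop that was passed to the engine.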
GeminiEngine#
Google Gemini API (Gemini 2.5 Flash / Pro). Requires the [gemini] extra and GOOGLE_GEMINI_API_KEY environment variable.
GeminiEngine#
OCR engine using Gemini API with spatial grounding.
Returns structured text with per-detection bounding boxes by default. Set config "use_bbox" to False for plain text mode (crop-level attribution).
Example
from panoocr.engines.gemini import GeminiEngine
engine = GeminiEngine(config={"model": "gemini-2.5-flash"})
results = engine.recognize(image)
Note
Install with: pip install "panoocr[gemini]"
Requires GOOGLE_GEMINI_API_KEY environment variable.
Source code in src/panoocr/engines/gemini.py
recognize#
Source code in src/panoocr/engines/gemini.py
GlmOCREngine#
GLM-OCR (0.9B) via mlx-vlm. Document-focused VLM with limited effectiveness on scene text. Requires macOS Apple Silicon with the [mlx-vlm] extra.
GlmOCREngine#
OCR engine using GLM-OCR (0.9B) via mlx-vlm.
This is an unstructured engine -- it returns text without bounding boxes. Each text line gets a full-image bounding box for crop-level attribution.
Example
from panoocr.engines.glm_ocr import GlmOCREngine
engine = GlmOCREngine()
results = engine.recognize(image)
Note
Install with: pip install "panoocr[mlx-vlm]" torch
Source code in src/panoocr/engines/glm_ocr.py
recognize#
Source code in src/panoocr/engines/glm_ocr.py
DotsOCREngine#
DOTS.OCR (2.9B) via mlx-vlm. Document layout parser with limited effectiveness on scene text. Requires macOS Apple Silicon with the [mlx-vlm] extra.
DotsOCREngine#
OCR engine using DOTS.OCR (2.9B) via mlx-vlm.
This is an unstructured engine -- it returns text without bounding boxes. Each text line gets a full-image bounding box for crop-level attribution.
Example
from panoocr.engines.dots_ocr import DotsOCREngine
engine = DotsOCREngine()
results = engine.recognize(image)
Note
Install with: pip install "panoocr[mlx-vlm]" torch
Source code in src/panoocr/engines/dots_ocr.py
recognize#
Source code in src/panoocr/engines/dots_ocr.py
TrOCREngine (experimental)#
Microsoft's TrOCR transformer-based OCR. Requires the [trocr] extra.
TrOCREngine#
OCR engine using Microsoft's TrOCR model.
TrOCR is a transformer-based OCR model that excels at single-line text recognition. It does NOT provide bounding boxes - it reads the entire image as a single text line.
WARNING: This engine is experimental and may not work well for panorama OCR since it doesn't detect text regions. Consider using Florence2OCREngine or other engines that provide region detection.
Attributes:
| Name | Type | Description |
|---|---|---|
| model | | The TrOCR model. |
| processor | | The TrOCR processor. |
Example
from panoocr.engines.trocr import TrOCREngine, TrOCRModel
engine = TrOCREngine(config={
    "model": TrOCRModel.MICROSOFT_TROCR_LARGE_PRINTED,
})
# Note: Returns single result for entire image
results = engine.recognize(cropped_text_image)
Note
Install with: pip install "panoocr[trocr]"
For GPU support, install PyTorch with CUDA.
Initialize the TrOCR engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Dict[str, Any] \| None | Configuration dictionary with optional keys: model (TrOCRModel enum value or model ID string); device (device to use: "cuda", "mps", "cpu", or None for auto). | None |
Raises:
| Type | Description |
|---|---|
| ImportError | If dependencies are not installed. |
| ValueError | If configuration values are invalid. |
Source code in src/panoocr/engines/trocr.py
recognize#
Recognize text in an image.
NOTE: TrOCR treats the entire image as a single text line and does not provide bounding boxes. This makes it unsuitable for most panorama OCR use cases. The result will have a bounding box covering the entire image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Input image as PIL Image. | required |
Returns:
| Type | Description |
|---|---|
| List[FlatOCRResult] | List with single FlatOCRResult covering the entire image, or empty list if no text is recognized. |
Source code in src/panoocr/engines/trocr.py
Custom Engines#
Any class with a compatible recognize() method works:
from panoocr import PanoOCR, FlatOCRResult, BoundingBox
from PIL import Image

class MyEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Return list of FlatOCRResult with normalized bounding boxes (0-1)
        return [
            FlatOCRResult(
                text="Hello",
                confidence=0.95,
                bounding_box=BoundingBox(
                    left=0.1, top=0.2, right=0.4, bottom=0.3,
                    width=0.3, height=0.1
                ),
                engine="my_engine",
            )
        ]

pano = PanoOCR(engine=MyEngine())
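Because engines are injected rather than subclassed, one engine can also wrap another. The sketch below is one possible pattern, not something panoocr provides: MinConfidenceEngine, FakeEngine, and Result are hypothetical names, and Result stands in for FlatOCRResult so the snippet runs without panoocr installed.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Result:
    """Hypothetical stand-in for FlatOCRResult (only the fields used here)."""
    text: str
    confidence: float


class MinConfidenceEngine:
    """Wraps any object with recognize() and drops low-confidence results."""

    def __init__(self, inner: Any, min_confidence: float = 0.5):
        self.inner = inner
        self.min_confidence = min_confidence

    def recognize(self, image: Any) -> list[Result]:
        return [
            result
            for result in self.inner.recognize(image)
            if result.confidence >= self.min_confidence
        ]


class FakeEngine:
    """Test double that returns fixed results regardless of the image."""

    def recognize(self, image: Any) -> list[Result]:
        return [Result("EXIT", 0.92), Result("~~", 0.12)]


filtered = MinConfidenceEngine(FakeEngine(), min_confidence=0.5).recognize(image=None)
```

The wrapper itself exposes recognize(), so under the same assumptions it could be passed wherever an engine is expected.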