Engines

PanoOCR uses dependency injection for OCR engines. Provide any object with a matching recognize() method.

OCREngine Protocol

OCREngine

Bases: Protocol

Protocol for OCR engines (structural typing).

Any class with a matching recognize() method can be used. No inheritance required.

recognize

recognize(image: Image) -> list[FlatOCRResult]

Recognize text in an image.

Parameters:

image (Image, required): Input image as PIL Image.

Returns:

list[FlatOCRResult]: List of FlatOCRResult objects with normalized bounding boxes (0-1 range).

Source code in src/panoocr/api/models.py
def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
    """Recognize text in an image.

    Args:
        image: Input image as PIL Image.

    Returns:
        List of FlatOCRResult objects with normalized bounding boxes (0-1 range).
    """
    ...
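
Because OCREngine is a Protocol, conformance depends on the shape of a class rather than on its base classes. The sketch below illustrates that idea with stand-in definitions so it runs without PanoOCR installed: `Result`, `OCREngineLike`, `EchoEngine`, and `NotAnEngine` are hypothetical names, not part of the library. `runtime_checkable` is used only to make the check visible at runtime; the source does not say whether the real protocol enables it.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class Result:
    """Hypothetical stand-in for FlatOCRResult."""
    text: str


@runtime_checkable
class OCREngineLike(Protocol):
    """Hypothetical stand-in for the OCREngine protocol."""

    def recognize(self, image: Any) -> list[Result]: ...


class EchoEngine:
    """Satisfies the protocol without inheriting from it."""

    def recognize(self, image: Any) -> list[Result]:
        return [Result(text="hello")]


class NotAnEngine:
    """Has no recognize() method, so it does not conform."""


def run(engine: OCREngineLike, image: Any) -> list[str]:
    # A static type checker accepts any object with a matching recognize().
    return [r.text for r in engine.recognize(image)]


texts = run(EchoEngine(), image=None)
conforms = isinstance(EchoEngine(), OCREngineLike)
rejects = isinstance(NotAnEngine(), OCREngineLike)
```

Note that a runtime `isinstance` check against a protocol only verifies that the method exists, not that its signature or return type matches.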

MacOCREngine

Uses Apple's Vision Framework for fast, accurate OCR on macOS. Requires the [macocr] extra.

pip install "panoocr[macocr]"

MacOCREngine

MacOCREngine(config: Dict[str, Any] | None = None)

OCR engine using Apple Vision Framework via ocrmac.

This engine uses macOS's built-in Vision Framework for text recognition. It provides excellent accuracy for many languages on Apple Silicon.

Attributes:

language_preference: List of language codes to use for recognition.
recognition_level: Recognition accuracy level ("fast" or "accurate").

Example

from panoocr.engines.macocr import MacOCREngine, MacOCRLanguageCode, MacOCRRecognitionLevel

engine = MacOCREngine(config={
    "language_preference": [MacOCRLanguageCode.ENGLISH_US],
    "recognition_level": MacOCRRecognitionLevel.ACCURATE,
})
results = engine.recognize(image)

Note

Requires macOS and the ocrmac package. Install with: pip install "panoocr[macocr]"

Initialize the MacOCR engine.

Parameters:

config (Dict[str, Any] | None, default None): Configuration dictionary with optional keys:
- language_preference: List of MacOCRLanguageCode values.
- recognition_level: MacOCRRecognitionLevel value.

Raises:

ImportError: If ocrmac is not installed.
ValueError: If configuration values are invalid.

Source code in src/panoocr/engines/macocr.py
def __init__(self, config: Dict[str, Any] | None = None) -> None:
    """Initialize the MacOCR engine.

    Args:
        config: Configuration dictionary with optional keys:
            - language_preference: List of MacOCRLanguageCode values.
            - recognition_level: MacOCRRecognitionLevel value.

    Raises:
        ImportError: If ocrmac is not installed.
        ValueError: If configuration values are invalid.
    """
    # Check dependencies first
    _check_macocr_dependencies()

    config = config or {}

    # Parse language preference
    language_preference = config.get(
        "language_preference", DEFAULT_LANGUAGE_PREFERENCE
    )
    try:
        self.language_preference = [
            lang.value if isinstance(lang, MacOCRLanguageCode) else lang
            for lang in language_preference
        ]
    except (KeyError, AttributeError):
        raise ValueError("Invalid language code in language_preference")

    # Parse recognition level
    recognition_level = config.get("recognition_level", DEFAULT_RECOGNITION_LEVEL)
    try:
        self.recognition_level = (
            recognition_level.value
            if isinstance(recognition_level, MacOCRRecognitionLevel)
            else recognition_level
        )
    except (KeyError, AttributeError):
        raise ValueError("Invalid recognition level")
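
The constructor above accepts either enum members or plain strings for each setting and stores the underlying value. A minimal sketch of that normalization pattern follows; `LanguageCode` and its values are hypothetical stand-ins, and the real `MacOCRLanguageCode` members may differ.

```python
from enum import Enum


class LanguageCode(Enum):
    """Hypothetical stand-in; the real enum members and values may differ."""
    ENGLISH_US = "en-US"
    FRENCH_FR = "fr-FR"


def normalize_languages(language_preference):
    # Enum members are unwrapped to their .value; anything else passes through.
    return [
        lang.value if isinstance(lang, LanguageCode) else lang
        for lang in language_preference
    ]


mixed = normalize_languages([LanguageCode.ENGLISH_US, "de-DE"])
```

One consequence of this pattern, as far as the code shown goes, is that a raw string is passed through without being checked against the enum, so a misspelled code would only surface later, if at all.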

recognize

recognize(image: Image) -> List[FlatOCRResult]

Recognize text in an image.

Parameters:

image (Image, required): Input image as PIL Image.

Returns:

List[FlatOCRResult]: List of FlatOCRResult with normalized bounding boxes.

Source code in src/panoocr/engines/macocr.py
def recognize(self, image: Image.Image) -> List[FlatOCRResult]:
    """Recognize text in an image.

    Args:
        image: Input image as PIL Image.

    Returns:
        List of FlatOCRResult with normalized bounding boxes.
    """
    import ocrmac.ocrmac

    annotations = ocrmac.ocrmac.OCR(
        image,
        recognition_level=self.recognition_level,
        language_preference=self.language_preference,
    ).recognize()

    mac_ocr_results = [
        MacOCRResult(
            text=annotation[0],
            confidence=annotation[1],
            bounding_box=annotation[2],
        )
        for annotation in annotations
    ]

    return [result.to_flat() for result in mac_ocr_results]

EasyOCREngine

Cross-platform OCR supporting 80+ languages. Requires the [easyocr] extra.

pip install "panoocr[easyocr]"

EasyOCREngine

EasyOCREngine(config: Dict[str, Any] | None = None)

OCR engine using EasyOCR library.

EasyOCR supports 80+ languages and can run on CPU or GPU. It provides good accuracy for many scripts including CJK.

Attributes:

language_preference: List of language codes to use.
reader: EasyOCR Reader instance.

Example

from panoocr.engines.easyocr import EasyOCREngine, EasyOCRLanguageCode

engine = EasyOCREngine(config={
    "language_preference": [EasyOCRLanguageCode.ENGLISH],
    "gpu": True,
})
results = engine.recognize(image)

Note

Install with: pip install "panoocr[easyocr]"
For GPU support, install PyTorch with CUDA.

Initialize the EasyOCR engine.

Parameters:

config (Dict[str, Any] | None, default None): Configuration dictionary with optional keys:
- language_preference: List of EasyOCRLanguageCode values.
- gpu: Whether to use GPU (default: True).

Raises:

ImportError: If easyocr is not installed.
ValueError: If configuration values are invalid.

Source code in src/panoocr/engines/easyocr.py
def __init__(self, config: Dict[str, Any] | None = None) -> None:
    """Initialize the EasyOCR engine.

    Args:
        config: Configuration dictionary with optional keys:
            - language_preference: List of EasyOCRLanguageCode values.
            - gpu: Whether to use GPU (default: True).

    Raises:
        ImportError: If easyocr is not installed.
        ValueError: If configuration values are invalid.
    """
    # Check dependencies first
    _check_easyocr_dependencies()

    import easyocr

    config = config or {}

    # Parse language preference
    language_preference = config.get(
        "language_preference", DEFAULT_LANGUAGE_PREFERENCE
    )
    try:
        self.language_preference = [
            lang.value if isinstance(lang, EasyOCRLanguageCode) else lang
            for lang in language_preference
        ]
    except (KeyError, AttributeError):
        raise ValueError("Invalid language code in language_preference")

    # Parse GPU setting
    use_gpu = config.get("gpu", True)

    # Initialize reader
    self.reader = easyocr.Reader(self.language_preference, gpu=use_gpu)

recognize

recognize(image: Image) -> List[FlatOCRResult]

Recognize text in an image.

Parameters:

image (Image, required): Input image as PIL Image.

Returns:

List[FlatOCRResult]: List of FlatOCRResult with normalized bounding boxes.

Source code in src/panoocr/engines/easyocr.py
def recognize(self, image: Image.Image) -> List[FlatOCRResult]:
    """Recognize text in an image.

    Args:
        image: Input image as PIL Image.

    Returns:
        List of FlatOCRResult with normalized bounding boxes.
    """
    image_array = np.array(image)
    annotations = self.reader.readtext(image_array)

    easyocr_results = []
    for annotation in annotations:
        bounding_box = annotation[0]
        text = annotation[1]
        confidence = annotation[2]

        easyocr_results.append(
            EasyOCRResult(
                text=text,
                confidence=confidence,
                bounding_box=bounding_box,
                image_width=image.width,
                image_height=image.height,
            )
        )

    return [result.to_flat() for result in easyocr_results]
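
The method above passes pixel-space corner points together with the image width and height, and `to_flat()` returns normalized boxes. The conversion itself is not shown in the source, so the following is only a sketch of one plausible way to do it: `normalize_corners` is a hypothetical helper, and the assumption is that the flat box is the axis-aligned envelope of the four corners divided by the image size.

```python
def normalize_corners(corners, image_width, image_height):
    """Convert four [x, y] pixel corners to a normalized axis-aligned box."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    left, right = min(xs) / image_width, max(xs) / image_width
    top, bottom = min(ys) / image_height, max(ys) / image_height
    return {
        "left": left,
        "top": top,
        "right": right,
        "bottom": bottom,
        "width": right - left,
        "height": bottom - top,
    }


# A 200x100 image with one detection spanning x=20..100, y=10..50.
box = normalize_corners([[20, 10], [100, 10], [100, 50], [20, 50]], 200, 100)
```

Taking the min and max over all corners means a rotated quadrilateral is reduced to its bounding rectangle, which loses the rotation but keeps every value in the 0-1 range.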

PaddleOCREngine

PaddlePaddle-based OCR with optional V4 server model for Chinese text. Requires the [paddleocr] extra.

pip install "panoocr[paddleocr]"

PaddleOCREngine

PaddleOCREngine(config: Dict[str, Any] | None = None)

OCR engine using PaddleOCR library.

PaddleOCR is developed by PaddlePaddle and supports multiple languages. It provides good accuracy and can optionally use the V4 server model for better results on Chinese text.

Attributes:

language_preference: Language code for recognition.
recognize_upside_down: Whether to use the angle classifier.
use_v4_server: Whether to use the V4 server model.

Example

from panoocr.engines.paddleocr import PaddleOCREngine, PaddleOCRLanguageCode

engine = PaddleOCREngine(config={
    "language_preference": PaddleOCRLanguageCode.CHINESE,
    "use_gpu": True,
})
results = engine.recognize(image)

Note

Install with: pip install "panoocr[paddleocr]"
For GPU support, install paddlepaddle-gpu.

Initialize the PaddleOCR engine.

Parameters:

config (Dict[str, Any] | None, default None): Configuration dictionary with optional keys:
- language_preference: PaddleOCRLanguageCode value.
- recognize_upside_down: Enable angle classifier (default: False).
- use_v4_server: Use V4 server model for better Chinese OCR.
- use_gpu: Whether to use GPU (default: True).
- model_dir: Custom directory for V4 server models.

Raises:

ImportError: If paddleocr is not installed.
ValueError: If configuration values are invalid.

Source code in src/panoocr/engines/paddleocr.py
def __init__(self, config: Dict[str, Any] | None = None) -> None:
    """Initialize the PaddleOCR engine.

    Args:
        config: Configuration dictionary with optional keys:
            - language_preference: PaddleOCRLanguageCode value.
            - recognize_upside_down: Enable angle classifier (default: False).
            - use_v4_server: Use V4 server model for better Chinese OCR.
            - use_gpu: Whether to use GPU (default: True).
            - model_dir: Custom directory for V4 server models.

    Raises:
        ImportError: If paddleocr is not installed.
        ValueError: If configuration values are invalid.
    """
    # Check dependencies first
    _check_paddleocr_dependencies()

    from paddleocr import PaddleOCR

    config = config or {}

    # Parse language preference
    language = config.get("language_preference", DEFAULT_LANGUAGE)
    try:
        self.language_preference = (
            language.value if isinstance(language, PaddleOCRLanguageCode) else language
        )
    except (KeyError, AttributeError):
        raise ValueError("Invalid language code")

    # Parse other settings
    self.recognize_upside_down = config.get(
        "recognize_upside_down", DEFAULT_RECOGNIZE_UPSIDE_DOWN
    )
    if not isinstance(self.recognize_upside_down, bool):
        raise ValueError("recognize_upside_down must be a boolean")

    self.use_v4_server = config.get("use_v4_server", False)
    if not isinstance(self.use_v4_server, bool):
        raise ValueError("use_v4_server must be a boolean")

    use_gpu = config.get("use_gpu", True)
    self.model_dir = config.get("model_dir", "./models")

    # Initialize OCR engine
    if not self.use_v4_server:
        self.ocr = PaddleOCR(
            use_angle_cls=self.recognize_upside_down,
            lang=self.language_preference,
            use_gpu=use_gpu,
        )
    else:
        # Download and setup V4 server models
        self._download_v4_server_models()

        model_base = Path(self.model_dir) / "PP-OCRv4" / "chinese"
        self.ocr = PaddleOCR(
            use_angle_cls=self.recognize_upside_down,
            det_model_dir=str(model_base / "ch_PP-OCRv4_det_server_infer"),
            det_algorithm="DB",
            rec_model_dir=str(model_base / "ch_PP-OCRv4_rec_server_infer"),
            rec_algorithm="CRNN",
            cls_model_dir=str(model_base / "ch_ppocr_mobile_v2.0_cls_slim_infer"),
            use_gpu=use_gpu,
        )

recognize

recognize(image: Image) -> List[FlatOCRResult]

Recognize text in an image.

Parameters:

image (Image, required): Input image as PIL Image.

Returns:

List[FlatOCRResult]: List of FlatOCRResult with normalized bounding boxes.

Source code in src/panoocr/engines/paddleocr.py
def recognize(self, image: Image.Image) -> List[FlatOCRResult]:
    """Recognize text in an image.

    Args:
        image: Input image as PIL Image.

    Returns:
        List of FlatOCRResult with normalized bounding boxes.
    """
    image_array = np.array(image)

    # Use slicing for large images
    slice_config = {
        "horizontal_stride": 300,
        "vertical_stride": 500,
        "merge_x_thres": 50,
        "merge_y_thres": 35,
    }

    annotations = self.ocr.ocr(image_array, cls=True, slice=slice_config)

    paddle_results = []
    for annotation in annotations:
        if not isinstance(annotation, list):
            continue

        for res in annotation:
            bounding_box = res[0]
            text = res[1][0]
            confidence = res[1][1]

            paddle_results.append(
                PaddleOCRResult(
                    text=text,
                    confidence=confidence,
                    bounding_box=bounding_box,
                    image_width=image.width,
                    image_height=image.height,
                    use_v4_server=self.use_v4_server,
                )
            )

    return [result.to_flat() for result in paddle_results]
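
The loop above walks a nested result: an outer list of per-image entries, each holding rows of the form `[bounding_box, (text, confidence)]`, with non-list entries skipped. The sketch below isolates that traversal so the shape is easier to see. `flatten_annotations` is a hypothetical helper and the sample data is made up to match the indexing used in the method, not captured from a real PaddleOCR run.

```python
def flatten_annotations(annotations):
    """Return (bounding_box, text, confidence) tuples from a nested result."""
    rows = []
    for page in annotations:
        # Skip entries that are not lists, such as a None placeholder.
        if not isinstance(page, list):
            continue
        for bounding_box, (text, confidence) in page:
            rows.append((bounding_box, text, confidence))
    return rows


# Made-up sample shaped like the structure the method indexes into.
sample = [
    [
        [[[0, 0], [10, 0], [10, 5], [0, 5]], ("EXIT", 0.98)],
        [[[0, 8], [12, 8], [12, 14], [0, 14]], ("Gate 4", 0.91)],
    ],
    None,
]
rows = flatten_annotations(sample)
```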

Florence2OCREngine

Microsoft's Florence-2 vision-language model for OCR. Requires the [florence2] extra.

pip install "panoocr[florence2]"

Florence2OCREngine

Florence2OCREngine(config: Dict[str, Any] | None = None)

OCR engine using Microsoft's Florence-2 model.

Florence-2 is a vision-language model that can perform OCR with region detection. It provides good accuracy across many languages and can detect text in various orientations.

Attributes:

device: Device to run inference on (cuda, mps, or cpu).
model: The Florence-2 model.
processor: The Florence-2 processor.

Example

from panoocr.engines.florence2 import Florence2OCREngine

engine = Florence2OCREngine(config={
    "model_id": "microsoft/Florence-2-large",
})
results = engine.recognize(image)

Note

Install with: pip install "panoocr[florence2]"
For GPU support, install PyTorch with CUDA.

Initialize the Florence-2 engine.

Parameters:

config (Dict[str, Any] | None, default None): Configuration dictionary with optional keys:
- model_id: HuggingFace model ID (default: "microsoft/Florence-2-large").
- device: Device to use ("cuda", "mps", "cpu", or None for auto).

Raises:

ImportError: If dependencies are not installed.

Source code in src/panoocr/engines/florence2.py
def __init__(self, config: Dict[str, Any] | None = None) -> None:
    """Initialize the Florence-2 engine.

    Args:
        config: Configuration dictionary with optional keys:
            - model_id: HuggingFace model ID (default: "microsoft/Florence-2-large").
            - device: Device to use ("cuda", "mps", "cpu", or None for auto).

    Raises:
        ImportError: If dependencies are not installed.
    """
    # Check dependencies first
    _check_florence2_dependencies()

    import torch
    from transformers import AutoProcessor, AutoModelForCausalLM

    config = config or {}

    model_id = config.get("model_id", "microsoft/Florence-2-large")

    # Auto-detect device
    device = config.get("device")
    if device is None:
        if torch.cuda.is_available():
            device = "cuda"
        elif torch.backends.mps.is_available():
            device = "mps"
        else:
            device = "cpu"

    self.device = device

    # Select dtype based on device
    if torch.cuda.is_available() and device == "cuda":
        self.dtype = torch.float16
    else:
        self.dtype = torch.float32

    print(f"Loading Florence-2 model on {device}...")
    self.model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=self.dtype, trust_remote_code=True
    ).to(device)
    self.processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    print("Florence-2 model loaded successfully.")

    self.prompt = "<OCR_WITH_REGION>"
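
The device and dtype selection in the constructor above follows a simple priority order: an explicit `device` wins, then CUDA, then MPS, then CPU, with half precision used only on CUDA. The sketch below restates that logic as pure functions so it can be read and tested without PyTorch. `pick_device` and `pick_dtype` are hypothetical names, the availability flags stand in for `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and dtypes are returned as strings rather than torch objects.

```python
def pick_device(requested, cuda_available, mps_available):
    """Return the explicit device if given, else the best available one."""
    if requested is not None:
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def pick_dtype(device, cuda_available):
    # Half precision only when actually running on CUDA; float32 otherwise.
    return "float16" if cuda_available and device == "cuda" else "float32"


auto = pick_device(None, cuda_available=False, mps_available=True)
forced = pick_device("cpu", cuda_available=True, mps_available=True)
```

Passing `"device": "cpu"` in the config therefore forces CPU inference in float32 even on a machine with a GPU.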

recognize

recognize(image: Image) -> List[FlatOCRResult]

Recognize text in an image.

Parameters:

image (Image, required): Input image as PIL Image.

Returns:

List[FlatOCRResult]: List of FlatOCRResult with normalized bounding boxes.

Source code in src/panoocr/engines/florence2.py
def recognize(self, image: Image.Image) -> List[FlatOCRResult]:
    """Recognize text in an image.

    Args:
        image: Input image as PIL Image.

    Returns:
        List of FlatOCRResult with normalized bounding boxes.
    """
    import torch

    inputs = self.processor(
        text=self.prompt, images=image, return_tensors="pt"
    ).to(self.device, self.dtype)

    generated_ids = self.model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )

    generated_text = self.processor.batch_decode(
        generated_ids, skip_special_tokens=False
    )[0]

    parsed_answer = self.processor.post_process_generation(
        generated_text,
        task="<OCR_WITH_REGION>",
        image_size=(image.width, image.height),
    )

    florence2_results = []
    try:
        ocr_data = parsed_answer.get("<OCR_WITH_REGION>", {})
        quad_boxes = ocr_data.get("quad_boxes", [])
        labels = ocr_data.get("labels", [])

        for quad_box, label in zip(quad_boxes, labels):
            # Clean up text
            label = label.replace("</s>", "").replace("<s>", "")

            # Convert quad_box [x1,y1,x2,y2,x3,y3,x4,y4] to corner points
            bounding_box = [
                [quad_box[0], quad_box[1]],
                [quad_box[2], quad_box[3]],
                [quad_box[4], quad_box[5]],
                [quad_box[6], quad_box[7]],
            ]

            florence2_results.append(
                Florence2OCRResult(
                    text=label,
                    bounding_box=bounding_box,
                    image_width=image.width,
                    image_height=image.height,
                )
            )
    except KeyError:
        print("Error parsing OCR results, returning empty list")

    # Clean up to prevent memory leak
    del inputs
    del generated_ids
    del generated_text
    del parsed_answer
    gc.collect()

    if str(self.device).startswith("cuda"):
        import torch
        torch.cuda.empty_cache()

    return [result.to_flat() for result in florence2_results]

TrOCREngine

Microsoft's TrOCR transformer-based OCR. Requires the [trocr] extra.

pip install "panoocr[trocr]"

TrOCREngine

TrOCREngine(config: Dict[str, Any] | None = None)

OCR engine using Microsoft's TrOCR model.

TrOCR is a transformer-based OCR model that excels at single-line text recognition. It does NOT provide bounding boxes - it reads the entire image as a single text line.

WARNING: This engine is experimental and may not work well for panorama OCR since it doesn't detect text regions. Consider using Florence2OCREngine or other engines that provide region detection.

Attributes:

model: The TrOCR model.
processor: The TrOCR processor.

Example

from panoocr.engines.trocr import TrOCREngine, TrOCRModel

engine = TrOCREngine(config={
    "model": TrOCRModel.MICROSOFT_TROCR_LARGE_PRINTED,
})

# Note: Returns single result for entire image
results = engine.recognize(cropped_text_image)

Note

Install with: pip install "panoocr[trocr]"
For GPU support, install PyTorch with CUDA.

Initialize the TrOCR engine.

Parameters:

config (Dict[str, Any] | None, default None): Configuration dictionary with optional keys:
- model: TrOCRModel enum value or model ID string.
- device: Device to use ("cuda", "mps", "cpu", or None for auto).

Raises:

ImportError: If dependencies are not installed.
ValueError: If configuration values are invalid.

Source code in src/panoocr/engines/trocr.py
def __init__(self, config: Dict[str, Any] | None = None) -> None:
    """Initialize the TrOCR engine.

    Args:
        config: Configuration dictionary with optional keys:
            - model: TrOCRModel enum value or model ID string.
            - device: Device to use ("cuda", "mps", "cpu", or None for auto).

    Raises:
        ImportError: If dependencies are not installed.
        ValueError: If configuration values are invalid.
    """
    # Check dependencies first
    _check_trocr_dependencies()

    import torch
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    config = config or {}

    # Parse model
    model = config.get("model", DEFAULT_MODEL)
    try:
        model_id = model.value if isinstance(model, TrOCRModel) else model
    except (KeyError, AttributeError):
        raise ValueError("Invalid model specified")

    # Auto-detect device
    device = config.get("device")
    if device is None:
        if torch.cuda.is_available():
            device = "cuda"
        elif torch.backends.mps.is_available():
            device = "mps"
        else:
            device = "cpu"

    self.device = device

    print(f"Loading TrOCR model {model_id} on {device}...")
    self.processor = TrOCRProcessor.from_pretrained(model_id)
    self.model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    print("TrOCR model loaded successfully.")

recognize

recognize(image: Image) -> List[FlatOCRResult]

Recognize text in an image.

NOTE: TrOCR treats the entire image as a single text line and does not provide bounding boxes. This makes it unsuitable for most panorama OCR use cases. The result will have a bounding box covering the entire image.

Parameters:

image (Image, required): Input image as PIL Image.

Returns:

List[FlatOCRResult]: List with single FlatOCRResult covering the entire image, or empty list if no text is recognized.

Source code in src/panoocr/engines/trocr.py
def recognize(self, image: Image.Image) -> List[FlatOCRResult]:
    """Recognize text in an image.

    NOTE: TrOCR treats the entire image as a single text line and does
    not provide bounding boxes. This makes it unsuitable for most panorama
    OCR use cases. The result will have a bounding box covering the entire
    image.

    Args:
        image: Input image as PIL Image.

    Returns:
        List with single FlatOCRResult covering the entire image, or empty
        list if no text is recognized.
    """
    import torch

    # Convert to RGB if needed
    if image.mode != "RGB":
        image = image.convert("RGB")

    pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(self.device)

    with torch.no_grad():
        generated_ids = self.model.generate(pixel_values)

    generated_text = self.processor.batch_decode(
        generated_ids, skip_special_tokens=True
    )[0]

    # TrOCR doesn't provide bounding boxes - return full image bbox
    if generated_text.strip():
        return [
            FlatOCRResult(
                text=generated_text.strip(),
                confidence=1.0,  # TrOCR doesn't provide confidence
                bounding_box=BoundingBox(
                    left=0.0,
                    top=0.0,
                    right=1.0,
                    bottom=1.0,
                    width=1.0,
                    height=1.0,
                ),
                engine="TROCR",
            )
        ]

    return []

Custom Engines

Any class with a compatible recognize() method works:

from panoocr import PanoOCR, FlatOCRResult, BoundingBox
from PIL import Image

class MyEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Return list of FlatOCRResult with normalized bounding boxes (0-1)
        return [
            FlatOCRResult(
                text="Hello",
                confidence=0.95,
                bounding_box=BoundingBox(
                    left=0.1, top=0.2, right=0.4, bottom=0.3,
                    width=0.3, height=0.1
                ),
                engine="my_engine",
            )
        ]

pano = PanoOCR(engine=MyEngine())
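
Since an engine is just an object with `recognize()`, engines can also be composed by wrapping one in another. The sketch below shows a wrapper that drops low-confidence results. It uses stand-ins so it runs without PanoOCR installed: `Result`, `FixedEngine`, and `MinConfidenceEngine` are hypothetical names, and `Result` models only the `text` and `confidence` fields seen in the `FlatOCRResult` constructor above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Result:
    """Hypothetical stand-in for FlatOCRResult (text and confidence only)."""
    text: str
    confidence: float


class FixedEngine:
    """Toy engine returning canned results, for demonstration only."""

    def recognize(self, image: Any) -> list[Result]:
        return [Result("EXIT", 0.97), Result("l1I|", 0.31)]


class MinConfidenceEngine:
    """Wraps any engine and drops results below a confidence threshold."""

    def __init__(self, inner: Any, min_confidence: float = 0.5) -> None:
        self.inner = inner
        self.min_confidence = min_confidence

    def recognize(self, image: Any) -> list[Result]:
        return [
            r for r in self.inner.recognize(image)
            if r.confidence >= self.min_confidence
        ]


engine = MinConfidenceEngine(FixedEngine(), min_confidence=0.5)
kept = [r.text for r in engine.recognize(image=None)]
```

With the real library, the same wrapper shape should be usable as the `engine` argument to `PanoOCR`, assuming `FlatOCRResult` exposes a `confidence` attribute. A threshold would have no effect on TrOCREngine, which always reports a confidence of 1.0.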