OCR Models#
Data classes for OCR results in different coordinate systems.
BoundingBox#
BoundingBox dataclass#
Normalized bounding box with coordinates in the 0-1 range.
Coordinates are relative to the image dimensions: (0, 0) is the top-left corner and (1, 1) is the bottom-right corner.
Attributes:
| Name | Type | Description |
|---|---|---|
| left | float | Distance from the image's left edge to the box's left side (0-1). |
| top | float | Distance from the image's top edge to the box's top side (0-1). |
| right | float | Distance from the image's left edge to the box's right side (0-1). |
| bottom | float | Distance from the image's top edge to the box's bottom side (0-1). |
| width | float | Box width (0-1). |
| height | float | Box height (0-1). |
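To make the normalized convention concrete, here is a minimal sketch that scales a box to pixel coordinates. `NormalizedBox` and `to_pixels` are hypothetical stand-ins written for this example, not the library's `BoundingBox`; in particular, deriving `width` and `height` from the edges is an assumption about how the attributes relate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NormalizedBox:
    """Hypothetical stand-in for BoundingBox (not the library class)."""

    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        # Assumed relationship: width is the horizontal extent of the box.
        return self.right - self.left

    @property
    def height(self) -> float:
        # Assumed relationship: height is the vertical extent of the box.
        return self.bottom - self.top

    def to_pixels(self, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
        """Scale the normalized edges to pixel (left, top, right, bottom)."""
        return (
            round(self.left * image_width),
            round(self.top * image_height),
            round(self.right * image_width),
            round(self.bottom * image_height),
        )


box = NormalizedBox(left=0.25, top=0.5, right=0.75, bottom=0.625)
pixels = box.to_pixels(1920, 1080)  # (480, 540, 1440, 675)
```

Because the coordinates are fractions of the image size, the same box stays valid if the image is resized.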
FlatOCRResult#
OCR result from a flat (perspective) image with normalized bounding box coordinates.
FlatOCRResult dataclass#
FlatOCRResult(text: str, confidence: float, bounding_box: BoundingBox, engine: Optional[str] = None)
OCR result from a flat (perspective) image.
Attributes:
| Name | Type | Description |
|---|---|---|
| text | str | Recognized text content. |
| confidence | float | Recognition confidence (0-1). |
| bounding_box | BoundingBox | Normalized bounding box in image coordinates. |
| engine | Optional[str] | Name of the OCR engine used. |
to_dict#
from_dict classmethod#
Create from dictionary.
Source code in src/panoocr/ocr/models.py
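The dictionary layout used by `to_dict` and `from_dict` is not documented here, so the following is only a sketch of the usual round-trip pattern for nested dataclasses. `Box` and `FlatResult` are hypothetical stand-ins, and the nested-dict layout produced by `dataclasses.asdict` is an assumption rather than the library's actual format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Box:
    """Hypothetical stand-in for BoundingBox."""

    left: float
    top: float
    right: float
    bottom: float


@dataclass
class FlatResult:
    """Hypothetical stand-in for FlatOCRResult."""

    text: str
    confidence: float
    bounding_box: Box
    engine: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses, so bounding_box
        # becomes a plain dict that can be serialized as JSON.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "FlatResult":
        # Rebuild the nested box first, then the result itself.
        return cls(
            text=data["text"],
            confidence=data["confidence"],
            bounding_box=Box(**data["bounding_box"]),
            engine=data.get("engine"),
        )


original = FlatResult("EXIT", 0.97, Box(0.1, 0.2, 0.3, 0.25), engine="example-engine")
restored = FlatResult.from_dict(original.to_dict())
```

Dataclasses compare by field values, so `restored == original` holds after the round trip.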
to_sphere#
to_sphere(horizontal_fov: float, vertical_fov: float, yaw_offset: float, pitch_offset: float) -> 'SphereOCRResult'
Convert to spherical OCR result using camera parameters.
Uses proper 3D rotation via perspective_to_sphere() to correctly transform bounding box coordinates from perspective image space to world spherical coordinates. This accounts for the coupling between yaw and pitch that occurs when the camera has a non-zero pitch offset.
All parameters are in degrees.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| horizontal_fov | float | Horizontal field of view of the camera, in degrees. | required |
| vertical_fov | float | Vertical field of view of the camera, in degrees. | required |
| yaw_offset | float | Horizontal (yaw) offset of the camera, in degrees. | required |
| pitch_offset | float | Vertical (pitch) offset of the camera, in degrees. | required |
Returns:
| Type | Description |
|---|---|
| 'SphereOCRResult' | SphereOCRResult with spherical coordinates. |
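The implementation of `perspective_to_sphere()` is not reproduced on this page. The sketch below shows one common way to do such a conversion for a single point, to illustrate why yaw and pitch become coupled when the camera is tilted. The helper name `point_to_sphere`, the axis convention (x right, y up, z forward), and the rotation order (pitch, then yaw) are all assumptions made for this example and may differ from the library.

```python
import math
from typing import Tuple


def point_to_sphere(
    u: float,
    v: float,
    horizontal_fov: float,
    vertical_fov: float,
    yaw_offset: float,
    pitch_offset: float,
) -> Tuple[float, float]:
    """Hypothetical helper: map a normalized image point to (yaw, pitch) in degrees."""
    # Ray through the image plane at unit distance
    # (assumed axes: x right, y up, z forward; (u, v) = (0, 0) is top-left).
    x = (2.0 * u - 1.0) * math.tan(math.radians(horizontal_fov) / 2.0)
    y = (1.0 - 2.0 * v) * math.tan(math.radians(vertical_fov) / 2.0)
    z = 1.0

    # Tilt the camera up by pitch_offset (rotation about the x axis).
    p = math.radians(pitch_offset)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)

    # Turn the camera right by yaw_offset (rotation about the y axis).
    w = math.radians(yaw_offset)
    x, z = x * math.cos(w) + z * math.sin(w), -x * math.sin(w) + z * math.cos(w)

    # Convert the rotated ray to spherical angles.
    yaw = math.degrees(math.atan2(x, z))
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))
    return yaw, pitch


center = point_to_sphere(0.5, 0.5, 90.0, 60.0, yaw_offset=30.0, pitch_offset=10.0)
level_edge = point_to_sphere(1.0, 0.5, 90.0, 60.0, yaw_offset=0.0, pitch_offset=0.0)
tilted_edge = point_to_sphere(1.0, 0.5, 90.0, 60.0, yaw_offset=0.0, pitch_offset=60.0)
```

Under these assumptions the image center lands at roughly the camera offsets, (30, 10). The midpoint of the right edge sits at a yaw of about 45 degrees with a level camera but about 63.4 degrees once the camera is tilted up by 60 degrees, which is why simply adding the offsets to per-axis angles would give the wrong result.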
SphereOCRResult#
OCR result in spherical (panorama) coordinates.
SphereOCRResult dataclass#
SphereOCRResult(text: str, confidence: float, yaw: float, pitch: float, width: float, height: float, engine: Optional[str] = None)
Attributes:
| Name | Type | Description |
|---|---|---|
| text | str | Recognized text content. |
| confidence | float | Recognition confidence (0-1). |
| yaw | float | Horizontal angle in degrees (-180 to 180). |
| pitch | float | Vertical angle in degrees (-90 to 90). |
| width | float | Angular width in degrees. |
| height | float | Angular height in degrees. |
| engine | Optional[str] | Name of the OCR engine used. |
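As an illustration of how the yaw and pitch ranges above relate to a panorama image, the sketch below maps an angle pair to pixel coordinates in an equirectangular image. `sphere_to_equirect` is a hypothetical helper, not part of the library, and the conventions it uses (yaw 0 at the horizontal center and increasing to the right, pitch +90 at the top row) are assumptions that should be checked against the panorama source.

```python
from typing import Tuple


def sphere_to_equirect(
    yaw: float, pitch: float, image_width: int, image_height: int
) -> Tuple[float, float]:
    """Hypothetical helper: map (yaw, pitch) in degrees to equirectangular pixels."""
    # Yaw spans -180..180 across the full image width (assumed: 0 at the center).
    x = (yaw + 180.0) / 360.0 * image_width
    # Pitch spans 90..-90 from the top row to the bottom row (assumed: up is positive).
    y = (90.0 - pitch) / 180.0 * image_height
    return x, y


center_px = sphere_to_equirect(0.0, 0.0, 4096, 2048)      # (2048.0, 1024.0)
corner_px = sphere_to_equirect(-180.0, 90.0, 4096, 2048)  # (0.0, 0.0)
lower_px = sphere_to_equirect(90.0, -45.0, 4096, 2048)    # (3072.0, 1536.0)
```

The same linear scaling applies to angular `width` and `height` only near the horizon; closer to the poles, a fixed angular width covers more pixels in an equirectangular image.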