TMBLD-YOLO26m: Tibetan Modern Book Layout Detection

A fine-tuned YOLO26m object-detection model for layout detection in modern Tibetan books. The model detects four layout classes in page images of modern Tibetan books: header, Text area, footnote, and footer.

Model Description

This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the BDRC/TDLA-Training-Dataset, a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

Property	Value
Architecture	YOLO26m
Task	Object Detection
Image size	640 × 640
Number of classes	4
Training platform	Ultralytics HUB
Weights file	`Tibetan_modern_book_Layout_detection.pt`

Classes

ID	Class	Description
0	header	Page header region
1	Text area	Main body text region
2	footnote	Footnote region
3	footer	Page footer region

Performance

The model was evaluated on the validation split of the TDLA Training Dataset.

Metric	Value
Precision	0.966
Recall	0.970
mAP@0.5	0.982
mAP@0.5:0.95	0.799
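
The table reports precision and recall separately. As a rough single-number summary, an F1 score (the harmonic mean of the two) can be derived from the reported values. This figure is computed here for illustration and is not part of the original evaluation; an F1 built from class-averaged precision and recall is not the same as an average of per-class F1 scores.

```python
# Derive an F1 score from the reported validation precision and recall.
# F1 is not reported in the model card; it is computed here only as a summary.
precision = 0.966
recall = 0.970

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.968"
```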

Training Loss (final epoch)

Loss Component	Train	Val
Box loss	0.515	0.643
Classification loss	0.218	0.276
DFL loss	0.003	0.004

Training Details

Dataset

Dataset: BDRC/TDLA-Training-Dataset
Train images: 2,692
Val images: 103
Test images: 313
Total annotations: 14,705
Train/Val split: Iterative multi-label stratification (seed 42, 80/20 ratio)
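
For reference, the share of images in each split follows directly from the counts listed above. The sketch below only does that arithmetic. The card does not say how the listed counts relate to the 80/20 ratio, so no attempt is made to reconcile them here.

```python
# Image counts as listed in the Dataset section above.
split_counts = {"train": 2692, "val": 103, "test": 313}

total_images = sum(split_counts.values())
split_shares = {name: count / total_images for name, count in split_counts.items()}

for name, share in split_shares.items():
    print(f"{name}: {split_counts[name]} images ({share:.1%})")
```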

Hyperparameters

Parameter	Value
Epochs	150
Patience	100
Batch size	Auto (-1)
Image size	640
Optimizer	Auto (SGD)
Initial learning rate (lr0)	0.01
Final learning rate factor (lrf)	0.01
Momentum	0.937
Weight decay	0.0005
Warmup epochs	3.0
Warmup momentum	0.8
Warmup bias lr	0.1
AMP (mixed precision)	True
Pretrained	True
Deterministic	True
Seed	0
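
With lr0 = 0.01 and lrf = 0.01, the learning rate ends at lr0 × lrf = 0.0001. The sketch below illustrates that decay under two assumptions that the card does not confirm: the schedule is linear rather than cosine, and the listed lr0 is the value actually used by the automatically selected optimizer. Warmup is ignored for simplicity.

```python
# Hyperparameters taken from the table above.
LR0 = 0.01     # initial learning rate
LRF = 0.01     # final learning rate as a fraction of LR0
EPOCHS = 150


def linear_lr(epoch: int, lr0: float = LR0, lrf: float = LRF, epochs: int = EPOCHS) -> float:
    """Learning rate at a given epoch under a linear decay from lr0 to lr0 * lrf.

    Assumption: linear (not cosine) decay, with warmup ignored.
    """
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 75, 150):
    print(f"epoch {epoch:3d}: lr = {linear_lr(epoch):.6f}")
```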

Loss Weights

Component	Weight
Box	7.5
Classification	0.5
DFL	1.5
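
These weights scale the individual loss components before they are combined. A minimal sketch of that idea, assuming a plain weighted sum, is shown below. The raw component values are made up for illustration and are unrelated to the losses reported in the Training Loss table, which may already include these weights.

```python
# Loss gains from the table above.
LOSS_GAINS = {"box": 7.5, "cls": 0.5, "dfl": 1.5}


def weighted_loss(raw_losses: dict, gains: dict = LOSS_GAINS) -> float:
    """Combine per-component losses into one scalar, using the gains as weights."""
    return sum(gains[name] * value for name, value in raw_losses.items())


# Hypothetical unweighted component values, for illustration only.
example_raw = {"box": 0.10, "cls": 0.40, "dfl": 0.02}
print(f"total = {weighted_loss(example_raw):.3f}")
```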

Augmentation

Augmentation	Value
HSV-Hue	0.015
HSV-Saturation	0.7
HSV-Value	0.4
Translation	0.1
Scale	0.5
Flip left-right	0.5
Mosaic	1.0
Erasing	0.4
Close mosaic (last N epochs)	10
Auto augment	RandAugment
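
The settings in the Hyperparameters, Loss Weights, and Augmentation tables can be gathered into one set of Ultralytics training arguments. The sketch below is a local approximation, not the original training run, which was carried out on Ultralytics HUB. The checkpoint name `yolo26m.pt` and the `data_yaml` path are assumptions, and `train` is a hypothetical helper.

```python
# Training arguments collected from the tables above, using Ultralytics
# argument names.
TRAIN_ARGS = {
    "epochs": 150, "patience": 100, "batch": -1, "imgsz": 640,
    "optimizer": "auto", "lr0": 0.01, "lrf": 0.01,
    "momentum": 0.937, "weight_decay": 0.0005,
    "warmup_epochs": 3.0, "warmup_momentum": 0.8, "warmup_bias_lr": 0.1,
    "amp": True, "pretrained": True, "deterministic": True, "seed": 0,
    "box": 7.5, "cls": 0.5, "dfl": 1.5,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "translate": 0.1, "scale": 0.5, "fliplr": 0.5,
    "mosaic": 1.0, "erasing": 0.4, "close_mosaic": 10,
    "auto_augment": "randaugment",
}


def train(data_yaml: str, base_weights: str = "yolo26m.pt"):
    """Fine-tune with the arguments above (hypothetical helper).

    Assumptions: `data_yaml` points to a YOLO dataset config file, and
    `base_weights` is an assumed name for the pretrained checkpoint.
    """
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    model = YOLO(base_weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train("path/to/data.yaml")` with a real dataset config would start a run. Results may differ from the reported metrics because of hardware, library version, and the automatic batch-size and optimizer selection.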

Usage

Inference with Ultralytics

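from ultralytics import YOLO

# Load the fine-tuned weights (path relative to the working directory).
model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Run inference on a single page image at the training resolution.
results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])
        label = result.names[cls_id]  # class name, e.g. "Text area"
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
        print(f"Class: {label} ({cls_id}), Confidence: {conf:.3f}, Box: {xyxy}")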

Batch Inference

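from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Passing a directory runs inference on every image inside it.
# stream=True yields results one at a time instead of keeping them all in memory.
results = model.predict("path/to/images/", imgsz=640, conf=0.25, stream=True)

for result in results:
    print(f"{result.path}: {len(result.boxes)} regions detected")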
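
When batch predictions need to be stored in the same YOLO bounding-box format as the training data, each pixel-space box can be converted to a normalized label line. The helper below is a hypothetical, dependency-free sketch of that conversion. Ultralytics also exposes normalized boxes on its results (for example `box.xywhn`), which may be preferable in practice.

```python
def xyxy_to_yolo_line(cls_id, xyxy, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to a YOLO label line.

    The output is "class x_center y_center width height", with all four
    coordinates normalized to the range 0..1 by the image size.
    """
    x1, y1, x2, y2 = xyxy
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Made-up "Text area" detection on a 1000 x 1500 pixel page.
print(xyxy_to_yolo_line(1, (100.0, 150.0, 900.0, 1350.0), img_w=1000, img_h=1500))
# prints "1 0.500000 0.500000 0.800000 0.800000"
```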

Intended Use

This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:

OCR pipelines on Tibetan documents
Document digitization workflows
Structured text extraction from scanned Tibetan texts
Digital library cataloging and indexing
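
As one possible preprocessing step for an OCR pipeline, detected regions can be filtered by class and sorted top to bottom before cropping. The sketch below assumes single-column pages and uses a hypothetical plain-Python detection format, `(cls_id, conf, (x1, y1, x2, y2))`. The function name and the choice of which classes to keep are illustrative.

```python
# Class names as listed in the Classes table.
CLASS_NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


def regions_in_reading_order(detections, keep=("Text area", "footnote")):
    """Return the kept regions sorted top to bottom by their upper edge.

    Assumption: single-column pages, so vertical position alone gives a
    usable reading order.
    """
    kept = [
        {"label": CLASS_NAMES[cls_id], "conf": conf, "box": box}
        for cls_id, conf, box in detections
        if CLASS_NAMES[cls_id] in keep
    ]
    return sorted(kept, key=lambda region: region["box"][1])


# Made-up detections for a single page, deliberately out of order.
example = [
    (2, 0.91, (80, 1300, 920, 1400)),
    (0, 0.95, (80, 40, 920, 90)),
    (1, 0.98, (80, 120, 920, 1280)),
]
for region in regions_in_reading_order(example):
    print(region["label"], region["box"])
```

Each returned box can then be used to crop the page image before passing the crop to an OCR engine.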

Limitations

Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher imgsz values.
The footnote class has fewer training samples (456 annotations) than the other classes, which may reduce detection quality for that class.
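
For very high-resolution pages, the tiling mentioned above can be sketched as a grid of overlapping windows, each of which would be passed to the model separately. The tile size and overlap below are illustrative values, not settings from this card. Mapping tile-level detections back to page coordinates and merging regions that were split across tiles are not shown.

```python
def tile_windows(img_w, img_h, tile=640, overlap_px=128):
    """Return (x1, y1, x2, y2) windows that cover the image with overlap.

    Assumptions: square tiles and a fixed pixel overlap between neighbours;
    both defaults are illustrative.
    """
    stride = max(1, tile - overlap_px)

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in starts(img_h)
        for x in starts(img_w)
    ]


windows = tile_windows(1600, 2400)
print(len(windows), "tiles; first:", windows[0], "last:", windows[-1])
```

Because layout regions such as the Text area often span most of a page, raising `imgsz` may be simpler than tiling when memory allows.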

License

This model is released under the CC0 1.0 Universal (Public Domain Dedication) license. You are free to copy, modify, and distribute the model, even for commercial purposes, without asking permission.

Acknowledgements

This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

Citation

If you use this model, please cite it as follows:

@software{bdrc_tmbld_yolo26m_2026,
  title   = {tmbld-YOLO26m: Tibetan Modern book layout detection Model},
  author  = {Buddhist Digital Resource Center (BDRC)},
  year    = {2026},
  url     = {https://huggingface.co/BDRC/TDLA-YOLO26m},
  license = {CC0-1.0}
}
