InfoBay AI Ltd.

company

AI & ML interests

Accelerate the frontier of AI development with enterprise-grade, deeply curated datasets engineered to enhance pre-training, alignment, and real-world performance.

Recent Activity

RohitManglik updated a collection about 3 hours ago

Dual-channel Podcast Speech Audio Datasets

InfoBayAI's collections (13)

Thesis Dataset

InfoBayAI/English-Thesis-Dataset

Viewer • Updated 1 day ago • 1.35k • 20

Codebase Datasets

Sample coding datasets for benchmarking and domain-specific AI models.

InfoBayAI/Legacy-Code-Dataset

Viewer • Updated 1 day ago • 15.3k • 76
InfoBayAI/DSA-Coding-Problems-and-Solutions-Dataset

Viewer • Updated 2 days ago • 60 • 63
InfoBayAI/Product-Source-Code-Dataset

Viewer • Updated 1 day ago • 337 • 73

Dual Channel Global Customer-Agent Interaction Datasets

Sample datasets of dual-channel call-center audio, with separate agent and customer channels, for ASR, diarization, and conversational AI training.

InfoBayAI/English_United_States_Call_Center_Audio_Dataset_Dual_Channel

Viewer • Updated 2 days ago • 6 • 48 • 1
InfoBayAI/English_United_Kingdom_Call_Center_Audio_Dataset_Dual_Channel

Viewer • Updated 2 days ago • 9 • 35
InfoBayAI/Hindi_Call_Center_Audio_Dataset_Dual_Channel

Viewer • Updated about 8 hours ago • 8 • 39
InfoBayAI/Arabic_Call_Center_Audio_Dataset_Dual_Channel

Viewer • Updated about 8 hours ago • 6 • 41

Dual-channel Podcast Speech Audio Datasets

InfoBayAI/Arabic_Podcast_Audio_Dataset_Dual_Channel

Viewer • Updated about 3 hours ago • 5
InfoBayAI/Urdu_Podcast_Audio_Dataset_Dual_Channel

Viewer • Updated about 3 hours ago • 6
InfoBayAI/Tamil_Podcast_Audio_Dataset_Dual_Channel

Viewer • Updated about 3 hours ago • 4
InfoBayAI/Punjabi_Podcast_Audio_Dataset_Dual_Channel

Viewer • Updated about 3 hours ago • 2

UGC and STEM Video Datasets

InfoBayAI/stem-videos

Viewer • Updated 2 days ago • 5 • 54 • 1
InfoBayAI/User_Generated_Content

Viewer • Updated 2 days ago • 20 • 37 • 1
InfoBayAI/Storyline_Verticle_Videos

Viewer • Updated 1 day ago • 15 • 163

STEM & Non-STEM Q&A Datasets for LLM Training

Sample datasets from a 6.5M+ enterprise-grade Q&A corpus across STEM and Non-STEM domains, built for LLM training, instruction tuning, and evaluation.

InfoBayAI/Hindi-STEM-QA-MCQ-Dataset

Viewer • Updated 2 days ago • 200 • 68
InfoBayAI/English-STEM-QA-MCQ-Dataset

Viewer • Updated 2 days ago • 320 • 64
InfoBayAI/English-Non-STEM-QA-MCQ-Dataset

Viewer • Updated 2 days ago • 50 • 58
InfoBayAI/Arabic-STEM-QA-MCQ-Dataset

Viewer • Updated 2 days ago • 49 • 62

Egocentric videos

InfoBayAI/egocentric_video

Viewer • Updated 2 days ago • 10 • 83 • 1

Longitudinal Time Series Dataset

InfoBayAI/Longitudinal_Time_Series_Dataset

Viewer • Updated 2 days ago • 12 • 42 • 1

Healthcare AI Datasets for Clinical & LLM Training

Sample dataset from an enterprise-grade medical corpus built for clinical AI, diagnosis support, and healthcare LLM training.

InfoBayAI/MRI-Radiology-Reports-Without-Findings-Dataset

Viewer • Updated 2 days ago • 588 • 30
InfoBayAI/CT-Scan-Radiology-Reports-Without-Findings-Dataset

Viewer • Updated 3 days ago • 2.6k • 30 • 2
InfoBayAI/CT-Scan-Radiology-Reports-With-Findings-Dataset

Viewer • Updated 3 days ago • 6.3k • 23 • 2
InfoBayAI/X-Ray-Radiology-Reports-Without-Findings-Dataset

Viewer • Updated 3 days ago • 9 • 24 • 2

Single-Channel-Call-Center-Audio-Dataset

InfoBayAI/Arabic-Call-Center-Audio-Dataset-Single-Channel

Viewer • Updated about 6 hours ago • 3 • 3
InfoBayAI/Somali-Call-Center-Audio-Dataset-Single-Channel

Viewer • Updated about 6 hours ago • 6
InfoBayAI/Mizo-Call-Center-Audio-Dataset-Single-Channel

Viewer • Updated about 6 hours ago • 6
InfoBayAI/French-Call-Center-Audio-Dataset-Single-Channel

Viewer • Updated about 6 hours ago • 3 • 1

Single-channel Podcast Speech Audio Datasets

Samples from a podcast audio dataset of diverse, real-world spoken content, designed for automatic speech recognition (ASR) and conversational AI training.

InfoBayAI/Arabic_Podcast_Audio_Dataset

Viewer • Updated 2 days ago • 2 • 87
InfoBayAI/Punjabi_Podcast_Audio_Dataset

Viewer • Updated 3 days ago • 1 • 15
InfoBayAI/English_Podcast_Audio_Dataset

Viewer • Updated 2 days ago • 2 • 62
InfoBayAI/Hindi_Podcast_Audio_Dataset

Viewer • Updated 2 days ago • 2 • 79

Academic Textbook Corpora for LLM Training

Sample of a 2.6+ word textbook corpus across 39K+ books, 5K+ subjects, and 15 languages for LLM training and multilingual knowledge modeling.

InfoBayAI/Hindi-STEM-Textbook-Dataset

Viewer • Updated 1 day ago • 256k • 29
InfoBayAI/English-STEM-Textbook-Dataset

Viewer • Updated 1 day ago • 313k • 30
InfoBayAI/English-Non-STEM-Textbook-Dataset

Viewer • Updated 1 day ago • 88.1k • 29
InfoBayAI/Arabic-STEM-Textbook-Dataset

Viewer • Updated 1 day ago • 90.3k • 26

Computer Vision & Multimodal Datasets

Sample datasets from a multilingual image corpus covering medical, STEM, non-STEM, automobile, and complex domains for computer vision and multimodal AI.

InfoBayAI/Medical-Images-Dataset

Viewer • Updated 1 day ago • 5 • 31
InfoBayAI/Automobile-Images-Dataset

Viewer • Updated 2 days ago • 50 • 34
InfoBayAI/Complex-Images-L1-Dataset

Viewer • Updated 2 days ago • 11 • 27
InfoBayAI/Healthcare-Imaging-Dataset

Viewer • Updated 2 days ago • 298 • 29

Team members: 1