KanoonGPT Open Legal Data Initiative
Open legal data for Indian legal AI
We publish structured legal datasets so researchers, builders, and legal-tech teams can build search, RAG, analytics, and evaluation workflows on a common foundation.
Indian Case Laws
Open Indian case-law data for AI, search, and legal research. This dataset is part of the KanoonGPT Open Legal Data Initiative and is built for legal-tech, research, and production AI systems.
Free Original Judgments
Download original eCourts judgment PDFs for free. This open dataset helps make court records easier to find, access, and use.
- Rows: 17,074,214
- Total size: 10.9 GB
- Timeframe: 1950-2026 (rolling updates)
- Downloads (last month): 127
- Formats: Parquet
- Modalities: Tabular, Text
- Languages: English
- Coverage: Supreme Court of India + 25 High Courts
Why This Dataset Exists
- Auto-converted to Parquet and ready for analytics pipelines.
- Includes provenance links to source JSON and source PDF artifacts.
- Adds parser diagnostics and quality flags for safe downstream filtering.
- Supports retrieval, ranking, RAG, legal NLP benchmarks, and model evaluation.
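As a rough illustration of the quality-flag filtering mentioned above, the sketch below drops rows whose quality_json looks unreliable. The keys inside quality_json are not documented here, so has_text and parse_warnings are hypothetical names chosen for illustration, as are the placeholder cnr_number values. In practice the rows would come from a Parquet reader (for example pandas.read_parquet); in-memory dicts stand in for them so the sketch runs on its own.

```python
import json


def passes_quality(row, max_warnings=0):
    """Return True if the row's quality_json looks clean.

    ASSUMPTION: "has_text" and "parse_warnings" are hypothetical flag names;
    check the real quality_json payload before relying on them.
    """
    raw = row.get("quality_json")
    if not raw:
        return False  # no quality information: treat the row as unverified
    try:
        quality = json.loads(raw)
    except json.JSONDecodeError:
        return False  # malformed payload: safer to drop than to guess
    if not isinstance(quality, dict):
        return False
    has_text = bool(quality.get("has_text", False))
    warnings = quality.get("parse_warnings") or []
    return has_text and len(warnings) <= max_warnings


# Placeholder rows; the identifiers are invented for the example.
rows = [
    {"cnr_number": "EXAMPLE-0001",
     "quality_json": json.dumps({"has_text": True, "parse_warnings": []})},
    {"cnr_number": "EXAMPLE-0002",
     "quality_json": json.dumps({"has_text": False, "parse_warnings": ["empty_pdf"]})},
    {"cnr_number": "EXAMPLE-0003", "quality_json": None},
]

kept = [r["cnr_number"] for r in rows if passes_quality(r)]
print(kept)
```

Rows with missing or unparseable quality data are dropped rather than kept, which is the conservative choice for high-stakes pipelines; loosen max_warnings if recall matters more.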
Schema Highlights
- case_title, party_petitioner, party_respondent, party_caption
- docket_number, cnr_number, neutral_citation, law_report_citation
- court_name, court_code, bench_name, presiding_judge, coram_members
- decision_date, registration_date, decision_year, disposition_text
- source_json_s3_url, source_pdf_s3_url, source_relative_path
- indexable_text, headnote_text, normalized_record_json, parser_json, quality_json
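One way the fields listed above might feed a search or RAG index is sketched below: each row is projected into a small document object that keeps the text plus enough metadata to cite and trace the judgment back to its source PDF. The column names come from the schema list; the RetrievalDoc class, the to_retrieval_doc helper, the choice of cnr_number as the primary identifier, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalDoc:
    """Hypothetical container for one indexable judgment."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


# Schema columns carried along as citation and provenance metadata.
METADATA_FIELDS = (
    "case_title",
    "court_name",
    "decision_date",
    "neutral_citation",
    "source_pdf_s3_url",
)


def to_retrieval_doc(row):
    """Map one dataset row (a dict) to a RetrievalDoc, or None if unusable."""
    text = (row.get("indexable_text") or "").strip()
    # ASSUMPTION: cnr_number is preferred as the ID, with docket_number as fallback.
    doc_id = row.get("cnr_number") or row.get("docket_number") or ""
    if not text or not doc_id:
        return None  # nothing to index, or no stable identifier
    metadata = {k: row[k] for k in METADATA_FIELDS if row.get(k)}
    return RetrievalDoc(doc_id=doc_id, text=text, metadata=metadata)


# Placeholder row; every value here is invented for the example.
sample_row = {
    "cnr_number": "EXAMPLE-0001",
    "case_title": "A v. B",
    "court_name": "Example High Court",
    "decision_date": "2001-01-01",
    "indexable_text": "  Placeholder judgment text.  ",
    "source_pdf_s3_url": "s3://example-bucket/placeholder.pdf",
}

doc = to_retrieval_doc(sample_row)
print(doc.doc_id, sorted(doc.metadata))
```

Keeping source_pdf_s3_url in the metadata lets a downstream application link every retrieved passage back to the original court record for verification.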
Release Variants
- sample (Available): Non-partitioned representative subset for quick exploration, demos, and schema inspection.
- structured (Rolling release): Full flattened metadata for judgments, including parties, citations, court details, dates, and quality signals.
- full (Coming soon): Structured metadata plus judgment text payloads for retrieval, fine-tuning, and text-heavy downstream tasks.
Responsible Use
- Verify important legal facts against original court records and judgment PDFs.
- Do not treat dataset outputs as legal advice.
- Add your own validation, citations, and human review for high-stakes applications.
- Respect privacy, applicable law, and platform terms in downstream usage.