KanoonGPT Open Legal Data Initiative

Open legal data for Indian legal AI

We publish structured legal datasets so researchers, builders, and legal-tech teams can build search, RAG, analytics, and evaluation workflows on a common foundation.

Hugging Face | Dataset | apache-2.0

Indian Case Laws

Open Indian case-law data for AI, search, and legal research. This dataset is part of the KanoonGPT Open Legal Data Initiative, built for legal-tech, research, and production AI systems.

Free Original Judgments

Download original eCourts judgment PDFs for free. This open dataset helps make court records easier to find, access, and use.

Rows: 17,074,214
Total size: 10.9 GB
Timeframe: 1950-2026 (rolling updates)
Downloads (last month): 127
Formats: Parquet
Modalities: Tabular, Text
Languages: English
Coverage: Supreme Court of India + 25 High Courts

Tags: law, legal, india, case-law, judgments, legal-tech

Hugging Face repository: KanoonGPT/indian-case-laws

Why This Dataset Exists

Auto-converted to Parquet and ready for analytics pipelines.
Includes provenance links to source JSON and source PDF artifacts.
Adds parser diagnostics and quality flags for safe downstream filtering.
Supports retrieval, ranking, RAG, legal NLP benchmarks, and model evaluation.
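The quality flags are meant to be applied before indexing or training. The following is a minimal sketch that assumes each row arrives as a Python dict (for example, one record from `pandas.read_parquet(...).to_dict("records")`) and that `quality_json` is a JSON-encoded object. The `"flags"` key and the flag names below are hypothetical placeholders. Inspect the real contents of `quality_json` before relying on them.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical flag names: replace with the flags actually present in quality_json.
BLOCKED_FLAGS = frozenset({"empty_text", "parse_failed"})


def read_flags(row: dict) -> Optional[set]:
    """Parse the (assumed) "flags" list out of a row's quality_json.

    Returns None when quality_json is missing, not valid JSON, or not an
    object with a list of flags, so callers can decide how to treat rows
    whose quality is unknown.
    """
    raw = row.get("quality_json")
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return None
    if not isinstance(raw, dict):
        return None
    flags = raw.get("flags", [])
    if not isinstance(flags, list):
        return None
    return {str(flag) for flag in flags}


def filter_rows(rows: Iterable[dict], blocked: frozenset = BLOCKED_FLAGS) -> Iterator[dict]:
    """Yield rows whose flags are readable and contain none of the blocked flags."""
    for row in rows:
        flags = read_flags(row)
        # Conservative choice: drop rows whose quality information cannot be read.
        if flags is not None and not (flags & blocked):
            yield row


if __name__ == "__main__":
    rows = [
        {"case_title": "A v. B", "quality_json": '{"flags": []}'},
        {"case_title": "C v. D", "quality_json": '{"flags": ["parse_failed"]}'},
        {"case_title": "E v. F", "quality_json": "not json"},
    ]
    print([r["case_title"] for r in filter_rows(rows)])  # prints ['A v. B']
```

Dropping rows with unreadable quality data is a deliberately conservative default. Relax it if your pipeline values recall over precision.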

Schema Highlights

case_title, party_petitioner, party_respondent, party_caption
docket_number, cnr_number, neutral_citation, law_report_citation
court_name, court_code, bench_name, presiding_judge, coram_members
decision_date, registration_date, decision_year, disposition_text
source_json_s3_url, source_pdf_s3_url, source_relative_path
indexable_text, headnote_text, normalized_record_json, parser_json, quality_json
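For retrieval or RAG, `indexable_text` can be split into overlapping chunks that carry citation and provenance fields, so that every retrieved passage points back to its source judgment. The sketch below assumes rows are Python dicts keyed by the column names listed above. The chunk size, the overlap, and the `Chunk` structure are illustrative choices and not part of the dataset. It uses fixed-size character windows for simplicity. A production pipeline would more likely split on paragraph or sentence boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    # Citation and provenance fields copied from the source row.
    metadata: dict = field(default_factory=dict)


# Columns carried alongside every chunk (names taken from the schema highlights).
METADATA_COLUMNS = (
    "case_title",
    "court_name",
    "decision_date",
    "neutral_citation",
    "cnr_number",
    "source_pdf_s3_url",
)


def chunk_row(row: dict, size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split a row's indexable_text into overlapping character windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    text = (row.get("indexable_text") or "").strip()
    metadata = {col: row.get(col) for col in METADATA_COLUMNS}
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        # Record the character offset so a passage can be located in the full text.
        chunks.append(Chunk(text=piece, metadata={**metadata, "char_start": start}))
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Carrying `source_pdf_s3_url` on each chunk lets a RAG answer link straight to the original judgment PDF for verification.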

Release Variants

sample (Available)
Non-partitioned representative subset for quick exploration, demos, and schema inspection.

structured (Rolling release)
Full flattened metadata for judgments, including parties, citations, court details, dates, and quality signals.

full (Coming soon)
Structured metadata plus judgment text payloads for retrieval, fine-tuning, and text-heavy downstream tasks.

Responsible Use

Verify important legal facts against original court records and judgment PDFs.
Do not treat dataset outputs as legal advice.
Add your own validation, citations, and human review for high-stakes applications.
Respect privacy, applicable law, and platform terms in downstream usage.
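One way to anchor human review to the original record is to attach a readable reference and the source PDF link to every model output. The sketch below assumes rows are dicts keyed by the schema columns. `format_reference` is a hypothetical helper and not part of the dataset. Its preference order (neutral citation, then law report citation, then CNR number) is only an illustrative choice.

```python
def format_reference(row: dict) -> str:
    """Build a short, reviewable reference string for a judgment row.

    Prefers neutral_citation, then law_report_citation, then cnr_number,
    and appends the source PDF link when present so reviewers can check
    the original judgment.
    """
    parts = [str(row.get("case_title") or "Untitled case")]
    identifier = next(
        (row[key] for key in ("neutral_citation", "law_report_citation", "cnr_number") if row.get(key)),
        None,
    )
    if identifier:
        parts.append(str(identifier))
    # Court and decision date, skipping whichever is missing.
    details = ", ".join(str(v) for v in (row.get("court_name"), row.get("decision_date")) if v)
    if details:
        parts.append(f"({details})")
    reference = " ".join(parts)
    pdf = row.get("source_pdf_s3_url")
    if pdf:
        reference += f" [source PDF: {pdf}]"
    return reference
```

Any identifier this helper surfaces should still be checked against the original judgment PDF before it is cited.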