Pages Index (pages.json) Documentation

📖 Documentation page

This page documents a workflow, system, feature, tool, or editorial practice used by The Sunil Abraham Project (TSAP). It describes how the project operates and is not itself a primary content article.

The Pages Index is a machine-readable catalogue of content published on The Sunil Abraham Project (TSAP). It is generated automatically from page front matter and exported as a JSON file named pages.json.

The purpose of the Pages Index is to provide a structured representation of TSAP content that can be consumed by external tools, bots, scripts, search systems, and future applications without requiring direct access to Jekyll internals or repository source files.

The implementation was developed with support from ChatGPT. All design decisions, testing, debugging, editorial judgement, and final implementation choices were made by the project maintainer.

Background

As TSAP grew beyond one thousand pages, it became increasingly desirable to expose a structured list of published content.

Human readers can navigate the website through categories, internal links, search engines, and navigation menus. Software tools, however, require a structured source of information.

Several future use cases were identified:

A machine-readable index became increasingly useful as the project expanded.

Why the Pages Index Was Created

Without a dedicated index, external tools would need direct access to Jekyll internals or repository source files.

This creates unnecessary complexity and tightly couples external tools to repository internals.

The Pages Index was therefore created as a simple, portable, and machine-readable representation of published TSAP content.

Architecture

The Pages Index is generated from page front matter.

The script scans Markdown files throughout the repository and extracts selected metadata fields.

Only pages containing a created field are included.

This rule was chosen because:

The process is:

Markdown Files
        ↓
Front Matter Extraction
        ↓
Metadata Selection
        ↓
pages.json Generation
        ↓
Publication

The generated file becomes a structured catalogue of TSAP content.
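The pipeline above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual contents of scripts/generate_pages_json.py: the helper names (split_front_matter, select_metadata, build_index, write_index), the FIELDS tuple, and the assumption that front matter is delimited by lines containing only --- are all hypothetical. How the real script derives the full permalink URL is not modelled here.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed field list, taken from the typical entry documented on this page.
FIELDS = ("title", "description", "created", "date",
          "source", "authors", "categories", "permalink")


def split_front_matter(text: str) -> Optional[str]:
    """Return the raw YAML between the leading '---' lines, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return None  # opening delimiter without a closing one


def select_metadata(front_matter: dict) -> Optional[dict]:
    """Keep only the indexed fields; skip pages without 'created'."""
    if "created" not in front_matter:
        return None
    return {key: front_matter[key] for key in FIELDS if key in front_matter}


def build_index(root: str) -> list:
    """Scan Markdown files under root and collect one entry per indexed page."""
    import yaml  # PyYAML; imported here so the helpers above need only the stdlib

    entries = []
    for path in sorted(Path(root).rglob("*.md")):
        raw = split_front_matter(path.read_text(encoding="utf-8"))
        if raw is None:
            continue
        data = yaml.safe_load(raw)
        if not isinstance(data, dict):
            continue
        entry = select_metadata(data)
        if entry is not None:
            entries.append(entry)
    return entries


def write_index(entries: list, out_path: str = "pages.json") -> None:
    """Write the entries as JSON and report how many pages were indexed."""
    # default=str converts datetime.date values (how PyYAML reads unquoted
    # dates such as 2026-06-08) into plain strings.
    Path(out_path).write_text(
        json.dumps(entries, indent=2, ensure_ascii=False, default=str),
        encoding="utf-8",
    )
    print(f"Created {out_path} with {len(entries)} pages.")
```

In this sketch, a page whose front matter lacks a created field yields None from select_metadata and is left out of the index, which mirrors the inclusion rule described above.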

Script Location

The Pages Index is generated by:

scripts/generate_pages_json.py

The script scans Markdown files throughout the repository, extracts selected front matter metadata, and generates a machine-readable JSON index.

Generated File

The output file is:

pages.json

It is written to the repository root and published automatically by GitHub Pages.

Published URL:

https://sunilabraham.in/pages.json

Included Metadata

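Each indexed page may contain the following fields: title, description, created, date, source, authors, categories, and permalink.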

A typical entry looks like:

{
  "title": "Example Page",
  "description": "Example description",
  "created": "2026-06-08",
  "date": "2026-05-01",
  "source": "Example Source",
  "authors": ["Example Author"],
  "categories": ["Example Category"],
  "permalink": "https://sunilabraham.in/example-page/"
}
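To illustrate how a tool might consume entries of this shape, the sketch below groups page titles by category. It assumes, without confirmation from this page, that pages.json holds a top-level JSON array of such entries. The function name group_by_category and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def group_by_category(entries):
    """Map each category name to the titles of the pages filed under it."""
    groups = defaultdict(list)
    for entry in entries:
        # 'categories' is optional and may be null, so fall back to an empty list.
        for category in entry.get("categories") or []:
            groups[category].append(entry.get("title", ""))
    return dict(groups)


# Hypothetical sample in the shape of the typical entry.
sample = json.loads("""
[
  {"title": "Example Page",
   "created": "2026-06-08",
   "categories": ["Example Category"],
   "permalink": "https://sunilabraham.in/example-page/"}
]
""")

print(group_by_category(sample))
```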

Installation Requirements

The script was designed to remain lightweight and portable.

Requirements: Python 3 and the PyYAML package.

On Ubuntu:

sudo apt install python3-yaml

Alternatively, within a Python virtual environment:

pip install pyyaml

No database is required.

No Jekyll plugin is required.

No GitHub Actions workflow is required.

Running the Generator

Navigate to the root of the repository:

cd /path/to/your/repository

Example:

cd ~/Projects/sunilabraham

Run the generator:

python3 scripts/generate_pages_json.py

Typical output:

Created pages.json with 1048 pages.

First Successful Build

The first successful generation occurred on 8 June 2026.

Results:

Created pages.json with 1048 pages.

Generated file size:

576 KB

This demonstrated that a machine-readable index of the entire project could be generated efficiently while remaining small enough for rapid download.

Maintenance Workflow

A recommended workflow is:

  1. Create, edit, or publish content.
  2. Regenerate pages.json.
  3. Review the result.
  4. Commit the updated index.
  5. Push changes.

Typical usage:

python3 scripts/generate_pages_json.py
git add pages.json
git commit -m "Update pages index"
git push

This ensures that the published index remains synchronised with site content.
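The review step could be partly automated. The sketch below is one possible check and is not part of the project's tooling. It assumes pages.json is a JSON array of objects, and the function names find_problems and check_file, as well as the specific rules (a created value in YYYY-MM-DD form, a non-empty permalink, no duplicate permalinks), are illustrative assumptions.

```python
import json
from datetime import date


def find_problems(entries):
    """Return a list of human-readable problems found in the index entries."""
    problems = []
    seen_permalinks = set()
    for i, entry in enumerate(entries):
        created = entry.get("created")
        try:
            date.fromisoformat(str(created))
        except ValueError:
            problems.append(f"entry {i}: invalid created value {created!r}")

        link = entry.get("permalink")
        if not link:
            problems.append(f"entry {i}: missing permalink")
        elif link in seen_permalinks:
            problems.append(f"entry {i}: duplicate permalink {link}")
        else:
            seen_permalinks.add(link)
    return problems


def check_file(path="pages.json"):
    """Load the generated index from disk and print any problems found."""
    with open(path, encoding="utf-8") as handle:
        problems = find_problems(json.load(handle))
    for problem in problems:
        print(problem)
    return not problems
```

Calling check_file() from the repository root after regenerating the index, and before committing, would surface malformed entries early.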

Current Uses

The Pages Index was originally created to support retrieval systems and machine-readable access to TSAP content.

Current uses include:

Future tools may consume the same index without requiring direct access to repository source files.
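As a sketch of how such a tool might work, the following fetches the published index over HTTPS using only the Python standard library and runs a simple keyword search. The top-level array structure, the function names fetch_index and search, and the choice to search only the title and description fields are assumptions made for illustration.

```python
import json
from urllib.request import urlopen

INDEX_URL = "https://sunilabraham.in/pages.json"


def fetch_index(url=INDEX_URL):
    """Download and decode the index (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def search(entries, keyword):
    """Return entries whose title or description contains the keyword."""
    needle = keyword.casefold()
    return [
        entry for entry in entries
        if needle in str(entry.get("title") or "").casefold()
        or needle in str(entry.get("description") or "").casefold()
    ]
```

A caller would typically run search(fetch_index(), "some keyword") and follow the permalink of each match. No repository checkout is involved at any point.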

Advantages and Limitations

Advantages:

Limitations:

These limitations are considered acceptable given the project’s emphasis on simplicity and maintainability.

Future Improvements

Potential future enhancements include:

Any future development should continue to prioritise transparency, portability, and compatibility with GitHub Pages.

Development History

Development began on 8 June 2026.

The immediate goal was to create a machine-readable representation of TSAP content that could be consumed by external tools without requiring direct access to repository files.

The chosen approach was deliberately simple. Rather than introducing databases, search engines, build plugins, or external services, a standalone Python script was created to scan Markdown files and export selected front matter metadata into a single JSON file.

The first successful run generated:

Created pages.json with 1048 pages.

The resulting file was approximately 576 KB in size and was published at:

https://sunilabraham.in/pages.json

This established the first structured content index for the project.

Lessons Learned

The development of the Pages Index reinforced an important architectural lesson.

The most valuable improvement was not adding another AI model or another search interface.

The most valuable improvement was creating a structured representation of TSAP’s own content.

By transforming page metadata into a machine-readable index, TSAP content becomes accessible to future tools, bots, reports, and retrieval systems while remaining entirely within the project’s existing static-site architecture.

Future development should continue to favour structured project data and retrieval mechanisms wherever practical.

📄 This page was created on 8 June 2026.