Day 19 – May 20, 2026: Lingua Core Platform Initialization and Governance Foundation

Documenting governance-first repository initialization, TypeScript platform foundations, DevSecOps controls, and architecture decisions for lingua-core-platform.

Day 19 focused on initializing lingua-core-platform as a serious platform foundation before any runtime application code was allowed to accumulate.

The work was intentionally architectural and governance-heavy. The goal was not to ship a visible feature. The goal was to create a repository that could support disciplined AI-assisted development, long-term language-learning product expansion, and future Thai-English and Mandarin learning engines without drifting into an unreviewable pile of experiments.

Goal / Intent

The intent was to initialize lingua-core-platform as a governance-first, framework-agnostic foundation for a language-learning platform.

That order mattered. Runtime application code can create momentum quickly, but it can also lock in accidental architecture before the project has clear boundaries. Day 19 deliberately happened before feature work so the repository could establish review rules, documentation standards, dependency hygiene, security scanning, and architectural decision records while the system was still small enough to shape cleanly.

The platform direction is a modular monolith. That keeps operational complexity low while preserving room for multi-tenant hostname routing, reusable language-learning engines, and long-term expansion across Thai, Mandarin, and related learning surfaces.

The core product idea is bigger than one frontend. The repository needs to be able to hold shared language infrastructure: tokenization, search boundaries, content modeling, public/private separation, data-source policy, and portable TypeScript abstractions that can survive future framework decisions.

Work Performed

The public GitHub repository for lingua-core-platform was created and initialized with a feature-branch workflow rather than direct work on main. Protected main branch rules were established so foundational changes would flow through review instead of bypassing the governance model.

The technical baseline was set around modern JavaScript platform tooling:

Node 22+
pnpm 10
ES Modules
strict TypeScript 6 configuration
explicit package and validation scripts

TypeScript 6 surfaced an early compatibility issue: the existing path aliases depended on compilerOptions.baseUrl. Rather than carrying forward a deprecated pattern, the alias targets were changed to explicit relative paths (for example, ./src/...). That made the configuration clearer and kept the compiler setup aligned with the direction of TypeScript itself.
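As a rough illustration of that change, the sketch below rewrites baseUrl-relative alias targets into explicit relative targets. The alias name @core/* and the helper names are hypothetical, chosen only for this example; the repository's actual tsconfig.json entries are not reproduced in this post.

```typescript
// Hypothetical helper: shows how a baseUrl-relative "paths" target maps to an
// explicit relative target once baseUrl is removed from the configuration.
type PathsMap = Record<string, string[]>;

function joinRelative(baseUrl: string, target: string): string {
  const segments: string[] = [];
  for (const part of `${baseUrl}/${target}`.split("/")) {
    if (part === "" || part === ".") continue; // skip empty and current-dir parts
    if (part === "..") {
      // Step up one directory when possible, otherwise keep the leading "..".
      if (segments.length > 0 && segments[segments.length - 1] !== "..") segments.pop();
      else segments.push("..");
      continue;
    }
    segments.push(part);
  }
  const joined = segments.join("/");
  return segments[0] === ".." ? joined : `./${joined}`;
}

function toExplicitRelativePaths(baseUrl: string, paths: PathsMap): PathsMap {
  const result: PathsMap = {};
  for (const [alias, targets] of Object.entries(paths)) {
    result[alias] = targets.map((target) => joinRelative(baseUrl, target));
  }
  return result;
}

// Assumed "before" state: baseUrl "." with a bare target.
const legacyPaths: PathsMap = { "@core/*": ["src/core/*"] };
const explicitPaths = toExplicitRelativePaths(".", legacyPaths);
console.log(explicitPaths); // "@core/*" now maps to ["./src/core/*"]
```

When baseUrl is not set, TypeScript resolves paths targets relative to the tsconfig.json file itself, which is why the explicit ./ prefix is enough.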

Repository normalization was also part of the foundation work. .editorconfig and .gitattributes were added with LF normalization to prevent Windows and Linux line-ending drift. That is small infrastructure, but it matters in a cross-platform repo where local development, CI, and AI-assisted patches all touch the same files.

AI-assisted development guardrails were added early:

.cursorignore reduces noisy indexing from dependency folders and volatile lockfiles
AGENTS.md guides assistant behavior, repository conventions, and governance expectations
pull request expectations were standardized before feature velocity began

The DevSecOps baseline was created at the same time:

Dependabot was configured for weekly dependency monitoring
CodeQL was added for semantic security scanning
CODEOWNERS was added for core infrastructure, tokenization, workflow, and governance paths
a pull request template was added to make review expectations repeatable

Documentation was treated as part of the product foundation, not an afterthought. ARCHITECTURE.md and DATA_SOURCES.md were created to document system shape, data-source boundaries, and future ingestion caution.

Several ADRs were created for the core decisions:

ADR-0001: Modular Monolith Architecture
ADR-0002: TypeScript Platform Foundation
ADR-0003: Open-Core Public/Private Boundary
ADR-0004: Search and Tokenization Abstraction

The data-source documentation was intentionally conservative. Possible Thai-English resources such as LEXITRON and Volubilis were documented as evaluation candidates only. They were not treated as ingested data sources, committed assets, or approved redistribution material. That distinction is important for a language-learning platform where public portfolio code may eventually sit near licensed, proprietary, or restricted linguistic resources.
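One way to make that distinction enforceable in code, rather than only in documentation, is to model source status explicitly. The following is a minimal sketch and not the content of DATA_SOURCES.md: the status values, field names, and guard functions are all assumptions introduced for illustration.

```typescript
// Hypothetical status model; the documentation only describes these
// resources as evaluation candidates.
type DataSourceStatus = "evaluation-candidate" | "approved" | "rejected";

interface DataSourceRecord {
  sourceName: string;
  status: DataSourceStatus;
  // Stays false until a license review explicitly clears redistribution.
  redistributionApproved: boolean;
}

const dataSources: readonly DataSourceRecord[] = [
  { sourceName: "LEXITRON", status: "evaluation-candidate", redistributionApproved: false },
  { sourceName: "Volubilis", status: "evaluation-candidate", redistributionApproved: false },
];

// Only approved sources may be ingested.
function canIngest(source: DataSourceRecord): boolean {
  return source.status === "approved";
}

// Committing derived assets to a public repo needs approval AND redistribution clearance.
function canCommitAssets(source: DataSourceRecord): boolean {
  return canIngest(source) && source.redistributionApproved;
}

function assertIngestible(source: DataSourceRecord): void {
  if (!canIngest(source)) {
    throw new Error(`Data source "${source.sourceName}" is ${source.status}; ingestion is not approved.`);
  }
}

const ingestible = dataSources.filter(canIngest);
console.log(ingestible.length); // 0: both sources are evaluation candidates only
```

A guard like assertIngestible could sit in front of any future import script, so that promoting a source requires a reviewed change to the record instead of a quiet code path.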

The initial CodeQL setup also needed a small TypeScript surface area to analyze. An effectively empty TypeScript repository gave CodeQL too little meaningful code to inspect, so a placeholder tokenizer export was added at src/core/tokenizers/index.ts. That created enough platform surface for CodeQL to run while still staying aligned with the planned abstraction boundary.
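The actual placeholder is not reproduced here. As a sketch of what a tokenizer boundary at that path might look like, assuming a hypothetical Tokenizer interface and a deliberately naive whitespace implementation:

```typescript
// Hypothetical shape for a tokenizer abstraction; names are illustrative only.
interface Token {
  text: string;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
}

interface Tokenizer {
  readonly language: string; // BCP 47 tag; "und" means undetermined
  tokenize(input: string): Token[];
}

// Placeholder implementation: splits on whitespace and records offsets.
class WhitespaceTokenizer implements Tokenizer {
  readonly language = "und";

  tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    const pattern = /\S+/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(input)) !== null) {
      const text = match[0];
      tokens.push({ text, start: match.index, end: match.index + text.length });
    }
    return tokens;
  }
}

const placeholderTokenizer: Tokenizer = new WhitespaceTokenizer();
console.log(placeholderTokenizer.tokenize("learn  Thai").map((token) => token.text));
```

Whitespace splitting is only a stand-in. Thai is written without spaces between words, so a real Thai tokenizer needs dictionary or model-based segmentation, which is the reason to put an interface in front of the implementation from the start.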

Challenges

The first technical friction point was TypeScript 6. Older alias patterns that depended on baseUrl no longer fit the direction of the compiler. That forced the repository to deal with path alias clarity immediately instead of discovering the issue later during feature work.

CodeQL also struggled at first because the repository had almost no TypeScript implementation surface. Security tooling needs something real enough to analyze. A well-configured repository can still fail early automation if it contains too little code for the tool to analyze.

Dependency and lockfile normalization required careful validation. Because this was repository-foundation work, the lockfile, package manager version, module type, TypeScript configuration, and validation commands all needed to agree. Small inconsistencies at this layer become a recurring maintenance cost if they are allowed to remain ambiguous.

The larger architectural challenge was restraint. It would have been easy to start building screens, routes, or a specific app framework. That would have created visible progress, but it also could have locked the platform into a runtime shape before the domain boundaries were clear. Day 19 intentionally avoided premature framework lock-in.

Solutions / Decisions

The main decision was to put governance first.

That is not bureaucracy for its own sake. Fast AI-assisted development can accelerate useful work, but it can also bypass quality controls if the repository does not define its boundaries. Branch protection, CODEOWNERS, pull request templates, CI validation, and assistant instructions create a controlled path for speed.

The modular monolith decision keeps infrastructure simple while preserving future flexibility. A language-learning platform does not need distributed systems complexity on day one. It does need clear internal boundaries so that tenant routing, tokenization, search, lessons, content sources, and future language engines can evolve without becoming tangled.
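Tenant routing is one of those internal boundaries. A minimal sketch of hostname-based tenant resolution inside a single process might look like the following. The hostnames, tenant ids, and function names are hypothetical, and no routing code exists in the repository at this stage.

```typescript
// Hypothetical tenant model for a modular monolith with hostname routing.
interface Tenant {
  id: string;
  defaultLanguagePair: string;
}

// Example registry; example.com hostnames are placeholders.
const tenantsByHostname: ReadonlyMap<string, Tenant> = new Map<string, Tenant>([
  ["thai.example.com", { id: "thai-english", defaultLanguagePair: "th-en" }],
  ["mandarin.example.com", { id: "mandarin", defaultLanguagePair: "zh-en" }],
]);

// Lower-case the host, drop any port, and strip a trailing dot.
// Note: this simple split does not handle bracketed IPv6 literals.
function normalizeHostname(hostHeader: string): string {
  const withoutPort = hostHeader.trim().toLowerCase().split(":")[0] ?? "";
  return withoutPort.endsWith(".") ? withoutPort.slice(0, -1) : withoutPort;
}

// Returns undefined for unknown hosts so the caller decides how to respond.
function resolveTenant(hostHeader: string): Tenant | undefined {
  return tenantsByHostname.get(normalizeHostname(hostHeader));
}

console.log(resolveTenant("Thai.Example.com:3000")?.id); // "thai-english"
```

Because the resolver is a plain function over a map, it stays independent of any HTTP framework and can be called from whichever runtime surface is chosen later.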

Framework-agnostic TypeScript abstractions keep the core portable. The language-learning engine should not depend too early on one frontend, rendering model, hosting platform, or application framework. The core should be able to support future products rather than being shaped only by the first runtime surface.
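To make the idea concrete, here is a small sketch of a search abstraction with no framework or hosting dependency. The SearchIndex interface, the in-memory implementation, and the injected tokenize function are assumptions for illustration, not the design recorded in the ADRs.

```typescript
// Hypothetical search port: callers depend on this interface, not on an engine.
interface SearchDocument {
  id: string;
  text: string;
}

interface SearchHit {
  id: string;
  score: number; // number of distinct query terms found in the document
}

interface SearchIndex {
  add(doc: SearchDocument): void;
  search(query: string): SearchHit[];
}

type TokenizeFn = (text: string) => string[];

// Simple in-memory adapter backed by an inverted index (term -> document ids).
class InMemorySearchIndex implements SearchIndex {
  private readonly postings = new Map<string, Set<string>>();
  private readonly tokenize: TokenizeFn;

  constructor(tokenize: TokenizeFn) {
    this.tokenize = tokenize;
  }

  add(doc: SearchDocument): void {
    for (const term of this.tokenize(doc.text)) {
      const ids = this.postings.get(term) ?? new Set<string>();
      ids.add(doc.id);
      this.postings.set(term, ids);
    }
  }

  search(query: string): SearchHit[] {
    const scores = new Map<string, number>();
    for (const term of new Set(this.tokenize(query))) {
      for (const id of this.postings.get(term) ?? []) {
        scores.set(id, (scores.get(id) ?? 0) + 1);
      }
    }
    // Highest score first; ties broken by id for stable output.
    return [...scores.entries()]
      .map(([id, score]) => ({ id, score }))
      .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
  }
}

const lowerCaseWords: TokenizeFn = (text) =>
  text.toLowerCase().split(/\s+/).filter((term) => term.length > 0);

const index: SearchIndex = new InMemorySearchIndex(lowerCaseWords);
index.add({ id: "lesson-1", text: "Thai greetings and polite particles" });
index.add({ id: "lesson-2", text: "Mandarin greetings" });
console.log(index.search("thai greetings")); // lesson-1 (score 2), then lesson-2 (score 1)
```

Injecting the tokenizer keeps language-specific segmentation out of the index, and swapping the in-memory adapter for a real search engine later would not change calling code.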

The ADRs make decisions explicit and reviewable. They capture why the project is starting as a modular monolith, why TypeScript is the foundation, how open-core boundaries are expected to work, and why tokenization and search need abstraction points from the beginning.

The open-core boundary is especially important. Public portfolio-safe code can show architecture, governance, and reusable platform design. Future private or proprietary content, licensed dictionary assets, restricted language resources, or monetizable product material need a separate boundary. Establishing that early prevents accidental leakage later.

The AI guardrails also matter. AGENTS.md gives assistants the operating model for the repository. .cursorignore reduces context pollution from noisy or volatile files. Together, they make AI assistance more reliable by giving it less irrelevant material and clearer expectations.

Validation / Definition of Done

Day 19 was complete when the repository foundation had been created, reviewed, and validated through the expected local and CI paths.

Validation included commands such as:

git pull origin main
git diff --check
pnpm install
pnpm typecheck
pnpm lint
pnpm format:check
pnpm validate

CodeQL passed after the tokenizer placeholder was added at src/core/tokenizers/index.ts. That confirmed the semantic security scan had enough TypeScript surface area to analyze and that the initial repository shape worked with the intended DevSecOps baseline.

The definition of done was not feature completeness. It was repository readiness:

protected branch workflow established
TypeScript 6 baseline validated
dependency monitoring configured
CodeQL scanning enabled
CODEOWNERS review boundaries added
pull request template created
architecture and data governance documentation written
ADRs recorded for major platform decisions
AI-assisted development instructions added
public/private and data-source cautions documented

Portfolio Framing

Day 19 is the kind of work that does not look flashy in a product screenshot, but it is exactly the work that determines whether a platform can grow responsibly.

The value was in platform engineering maturity: setting the repository up so future contributions have a clear path, security scans run early, dependency automation has review boundaries, and architectural decisions are recorded before implementation pressure makes them harder to change.

It also showed DevSecOps discipline. CodeQL, Dependabot, CODEOWNERS, pull request templates, branch protection, TypeScript strictness, and validation scripts were treated as part of the foundation rather than cleanup tasks for later.

The AI-assisted development governance matters too. As tools like OpenAI ChatGPT and Cursor become part of the development workflow, repositories need clear instructions, ignored noise paths, and review gates. The goal is not to slow the assistants down. The goal is to make their output easier to trust, review, and integrate.

Most importantly, the work kept long-term product thinking connected to engineering restraint. lingua-core-platform is intended to support a Thai-English language-learning platform now and broader language expansion later. Starting with a framework-agnostic modular monolith, explicit data governance, and open-core boundaries gives that idea a stronger foundation than rushing directly into runtime code.

Day 19 was about preparing the ground so future product work can move faster without becoming less disciplined.