Day 19 – May 20, 2026: Lingua Core Platform Initialization and Governance Foundation
Documenting governance-first repository initialization, TypeScript platform foundations, DevSecOps controls, and architecture decisions for lingua-core-platform.
Day 19 focused on initializing lingua-core-platform as a serious platform
foundation before any runtime application code was allowed to accumulate.
The work was intentionally architectural and governance-heavy. The goal was not to ship a visible feature. The goal was to create a repository that could support disciplined AI-assisted development, long-term language-learning product expansion, and future Thai-English and Mandarin learning engines without drifting into an unreviewable pile of experiments.
Goal / Intent
The intent was to initialize lingua-core-platform as a governance-first,
framework-agnostic language learning platform foundation.
That order mattered. Runtime application code can create momentum quickly, but it can also lock in accidental architecture before the project has clear boundaries. Day 19 deliberately happened before feature work so the repository could establish review rules, documentation standards, dependency hygiene, security scanning, and architectural decision records while the system was still small enough to shape cleanly.
The platform direction is a modular monolith. That keeps operational complexity low while preserving room for multi-tenant hostname routing, reusable language-learning engines, and long-term expansion across Thai, Mandarin, and related learning surfaces.
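Because the platform ships as a single deployable, hostname-based tenant routing can be a plain in-process lookup rather than a separate routing service. The sketch below is illustrative only: the `Tenant` shape, the `example.com` hostnames, and `resolveTenant` are hypothetical and not taken from the repository.

```typescript
// Hypothetical tenant descriptor; field names are illustrative assumptions.
interface Tenant {
  id: string;
  primaryLanguage: "th" | "zh";
}

// Hypothetical hostname table. In a modular monolith this lookup is an
// in-process function call, not a network hop to a routing service.
const TENANTS: ReadonlyMap<string, Tenant> = new Map<string, Tenant>([
  ["thai.example.com", { id: "thai-english", primaryLanguage: "th" }],
  ["mandarin.example.com", { id: "mandarin", primaryLanguage: "zh" }],
]);

// Normalize a Host header value (lowercase, strip port and trailing dot)
// before the lookup, so "Thai.Example.com:443" still resolves.
function resolveTenant(hostHeader: string): Tenant | undefined {
  const host = hostHeader
    .trim()
    .toLowerCase()
    .replace(/:\d+$/, "")
    .replace(/\.$/, "");
  return TENANTS.get(host);
}

console.log(resolveTenant("Thai.Example.com:443")?.id); // thai-english
console.log(resolveTenant("unknown.example.com")); // undefined
```

Unknown hosts return `undefined` rather than falling back to a default tenant, which keeps misrouted traffic visible instead of silently serving the wrong product.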
The core product idea is bigger than one frontend. The repository needs to be able to hold shared language infrastructure: tokenization, search boundaries, content modeling, public/private separation, data-source policy, and portable TypeScript abstractions that can survive future framework decisions.
Work Performed
The public GitHub repository for lingua-core-platform was created and
initialized with a feature-branch workflow rather than direct work on main.
Protected main branch rules were established so foundational changes would
flow through review instead of bypassing the governance model.
The technical baseline was set around modern JavaScript platform tooling:
- Node 22+
- pnpm 10
- ES Modules
- strict TypeScript 6 configuration
- explicit package and validation scripts
TypeScript 6 surfaced an early compatibility issue around older
compilerOptions.baseUrl-dependent alias behavior. Rather than carrying
forward a deprecated pattern, the path targets were changed to explicit
relative aliases such as ./src/.... That made the configuration clearer and
kept the compiler setup aligned with the direction of TypeScript itself.
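The alias convention can be made checkable. The sketch below assumes a hypothetical `checkPathAliases` helper and a hypothetical `@core/*` alias; it encodes only the rule described here (no `baseUrl`, path targets written as explicit relative paths) and is not a script from the repository.

```typescript
// Minimal shape of the tsconfig fields this check cares about.
interface CompilerOptionsSubset {
  baseUrl?: string;
  paths?: Record<string, string[]>;
}

// Returns human-readable problems; an empty array means the config follows
// the "no baseUrl, explicit relative targets" convention.
function checkPathAliases(options: CompilerOptionsSubset): string[] {
  const problems: string[] = [];
  if (options.baseUrl !== undefined) {
    problems.push(`baseUrl is set to "${options.baseUrl}"; prefer explicit relative path targets`);
  }
  for (const [alias, targets] of Object.entries(options.paths ?? {})) {
    for (const target of targets) {
      // Without baseUrl, paths targets resolve relative to the tsconfig file,
      // so a leading "./" or "../" makes that relationship explicit.
      if (!target.startsWith("./") && !target.startsWith("../")) {
        problems.push(`alias "${alias}" target "${target}" is not an explicit relative path`);
      }
    }
  }
  return problems;
}

// Hypothetical alias written in the explicit form.
console.log(checkPathAliases({ paths: { "@core/*": ["./src/core/*"] } })); // []
// Older baseUrl-dependent form: one problem for baseUrl, one for the target.
console.log(checkPathAliases({ baseUrl: ".", paths: { "@core/*": ["src/core/*"] } }).length); // 2
```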
Repository normalization was also part of the foundation work. .editorconfig
and .gitattributes were added with LF normalization to prevent Windows and
Linux line-ending drift. That is small infrastructure, but it matters in a
cross-platform repo where local development, CI, and AI-assisted patches all
touch the same files.
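As a small illustration of the drift that LF normalization prevents, the hypothetical `findCrlfLines` helper below reports lines that still end in CRLF. It is a diagnostic sketch, not part of the repository; the normalization itself is performed by Git according to the `.gitattributes` rules.

```typescript
// Report 1-based line numbers that end with a carriage return before the
// newline (CRLF). Text normalized to LF should report none.
function findCrlfLines(text: string): number[] {
  const offenders: number[] = [];
  text.split("\n").forEach((line, index) => {
    if (line.endsWith("\r")) offenders.push(index + 1);
  });
  return offenders;
}

console.log(findCrlfLines("export {};\nconst a = 1;\n")); // []
console.log(findCrlfLines("line one\r\nline two\nline three\r\n")); // [ 1, 3 ]
```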
AI-assisted development guardrails were added early:
- .cursorignore reduces noisy indexing from dependency folders and volatile lockfiles
- AGENTS.md guides assistant behavior, repository conventions, and governance expectations
- pull request expectations were standardized before feature velocity began
The DevSecOps baseline was created at the same time:
- Dependabot was configured for weekly dependency monitoring
- CodeQL was added for semantic security scanning
- CODEOWNERS was added for core infrastructure, tokenization, workflow, and governance paths
- a pull request template was added to make review expectations repeatable
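CODEOWNERS boundaries are resolved by path matching, and GitHub gives precedence to the last matching pattern in the file. The sketch below models that precedence with simplified directory-prefix rules only (real CODEOWNERS entries use gitignore-style patterns); the paths, including the ADR directory, and the `@example-org` team handles are hypothetical.

```typescript
// One simplified CODEOWNERS rule: a directory prefix and its owners.
// This sketch only handles "/dir/" prefixes to illustrate precedence.
interface OwnerRule {
  prefix: string;
  owners: string[];
}

// Hypothetical rules mirroring the areas named above, broadest first.
const RULES: OwnerRule[] = [
  { prefix: "/", owners: ["@example-org/maintainers"] },
  { prefix: "/.github/workflows/", owners: ["@example-org/platform"] },
  { prefix: "/src/core/", owners: ["@example-org/platform"] },
  { prefix: "/src/core/tokenizers/", owners: ["@example-org/language-engine"] },
  { prefix: "/docs/adr/", owners: ["@example-org/governance"] },
];

// As in GitHub's CODEOWNERS, the last matching rule wins, so more specific
// rules must appear after broader ones.
function ownersFor(path: string): string[] {
  const normalized = path.startsWith("/") ? path : `/${path}`;
  let owners: string[] = [];
  for (const rule of RULES) {
    if (normalized.startsWith(rule.prefix)) owners = rule.owners;
  }
  return owners;
}

console.log(ownersFor("src/core/tokenizers/index.ts")); // [ '@example-org/language-engine' ]
console.log(ownersFor("README.md")); // [ '@example-org/maintainers' ]
```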
Documentation was treated as part of the product foundation, not an afterthought.
ARCHITECTURE.md and DATA_SOURCES.md were created to document system shape,
data-source boundaries, and cautions about any future data ingestion.
Several ADRs were created for the core decisions:
- ADR-0001: Modular Monolith Architecture
- ADR-0002: TypeScript Platform Foundation
- ADR-0003: Open-Core Public/Private Boundary
- ADR-0004: Search and Tokenization Abstraction
The data-source documentation was intentionally conservative. Possible Thai-English resources such as LEXITRON and Volubilis were documented as evaluation candidates only. They were not treated as ingested data sources, committed assets, or approved redistribution material. That distinction is important for a language-learning platform where public portfolio code may eventually sit near licensed, proprietary, or restricted linguistic resources.
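The same policy can be mirrored in code so that any future ingestion path has to check a source's status first. The statuses, the `DataSource` shape, and `assertIngestible` below are hypothetical; only the LEXITRON and Volubilis names and their evaluation-only standing come from the documentation described above.

```typescript
// Hypothetical lifecycle states for an external linguistic resource.
type SourceStatus = "evaluation-candidate" | "approved" | "rejected";

interface DataSource {
  name: string;
  status: SourceStatus;
}

// Both resources are recorded as evaluation candidates only: not ingested,
// not committed as assets, not approved for redistribution.
const DATA_SOURCES: readonly DataSource[] = [
  { name: "LEXITRON", status: "evaluation-candidate" },
  { name: "Volubilis", status: "evaluation-candidate" },
];

// Ingestion code would call this guard first, so an unknown or unapproved
// resource fails loudly instead of slipping into the repository.
function assertIngestible(name: string): DataSource {
  const source = DATA_SOURCES.find((s) => s.name === name);
  if (source === undefined) {
    throw new Error(`Unknown data source "${name}"; document it before ingestion`);
  }
  if (source.status !== "approved") {
    throw new Error(`Data source "${name}" is ${source.status}; ingestion is not allowed`);
  }
  return source;
}

try {
  assertIngestible("LEXITRON");
} catch (error) {
  // Prints: Data source "LEXITRON" is evaluation-candidate; ingestion is not allowed
  console.log((error as Error).message);
}
```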
The initial CodeQL setup also needed a small TypeScript surface area to analyze.
An effectively empty TypeScript repository gave CodeQL too little meaningful
code to inspect, so a placeholder tokenizer export was added at
src/core/tokenizers/index.ts. That created enough platform surface for CodeQL
to run while still staying aligned with the planned abstraction boundary.
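The post does not show what the placeholder contains. One plausible shape, sketched under assumptions, is an interface plus a trivial implementation; `Token`, `Tokenizer`, and `whitespaceTokenizer` are hypothetical names. Whitespace splitting is only a stand-in: Thai and Mandarin are written without spaces between words, so real engines would need proper segmentation behind the same interface.

```typescript
// A token with character offsets into the original text, so later features
// (highlighting, search indexing) can map results back to the source.
export interface Token {
  text: string;
  start: number;
  end: number;
}

// The abstraction boundary: engines depend on this interface, not on a
// specific segmentation library.
export interface Tokenizer {
  readonly id: string;
  tokenize(text: string): Token[];
}

// Placeholder implementation that splits on whitespace. Adequate for
// English, not for Thai or Mandarin.
export const whitespaceTokenizer: Tokenizer = {
  id: "whitespace-placeholder",
  tokenize(text: string): Token[] {
    const tokens: Token[] = [];
    const pattern = /\S+/g; // created per call, so lastIndex starts at 0
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      tokens.push({ text: match[0], start: match.index, end: match.index + match[0].length });
    }
    return tokens;
  },
};

console.log(whitespaceTokenizer.tokenize("hello  world").map((t) => t.text)); // [ 'hello', 'world' ]
```

Keeping offsets on every token lets downstream features stay independent of which segmenter produced the tokens.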
Challenges
The first technical friction point was TypeScript 6. Older alias patterns that
depended on baseUrl tolerance no longer fit the direction of the compiler.
That forced the repository to deal with path alias clarity immediately instead
of discovering the issue later during feature work.
CodeQL also struggled at first because the repository had almost no TypeScript implementation surface. Security tooling needs something real enough to analyze. A repository can have good intentions and still fail early automation if the code graph is too empty for the tool to understand.
Dependency and lockfile normalization required careful validation. Because this was repository-foundation work, the lockfile, package manager version, module type, TypeScript configuration, and validation commands all needed to agree. Small inconsistencies at this layer become recurring maintenance cost if they are allowed to remain ambiguous.
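That agreement can be checked mechanically. The sketch below assumes a hypothetical `checkBaseline` helper over a parsed `package.json`; `type`, `packageManager`, `engines`, and `scripts` are standard `package.json` fields, but the specific checks and the placeholder script bodies are illustrative assumptions, not the repository's configuration.

```typescript
// Subset of package.json fields that the baseline depends on.
interface PackageJsonSubset {
  type?: string;
  packageManager?: string;
  engines?: { node?: string };
  scripts?: Record<string, string>;
}

// Script names taken from the validation commands in this log.
const REQUIRED_SCRIPTS = ["typecheck", "lint", "format:check", "validate"];

// Returns disagreements with the Node 22+ / pnpm 10 / ES Modules baseline.
function checkBaseline(pkg: PackageJsonSubset): string[] {
  const problems: string[] = [];
  if (pkg.type !== "module") {
    problems.push('"type" should be "module" for ES Modules');
  }
  if (!(pkg.packageManager ?? "").startsWith("pnpm@10.")) {
    problems.push('"packageManager" should pin a pnpm 10 release');
  }
  // Crude on purpose: reads only the first major version in the range.
  const firstMajor = Number(/\d+/.exec(pkg.engines?.node ?? "")?.[0] ?? "0");
  if (firstMajor < 22) {
    problems.push('"engines.node" should require Node 22 or later');
  }
  for (const script of REQUIRED_SCRIPTS) {
    if (pkg.scripts?.[script] === undefined) {
      problems.push(`missing "${script}" script`);
    }
  }
  return problems;
}

// Script bodies are placeholders; the post does not show them.
console.log(
  checkBaseline({
    type: "module",
    packageManager: "pnpm@10.0.0",
    engines: { node: ">=22" },
    scripts: { typecheck: "placeholder", lint: "placeholder", "format:check": "placeholder", validate: "placeholder" },
  }),
); // []
```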
The larger architectural challenge was restraint. It would have been easy to start building screens, routes, or a specific app framework. That would have created visible progress, but it also could have locked the platform into a runtime shape before the domain boundaries were clear. Day 19 intentionally avoided premature framework lock-in.
Solutions / Decisions
The main decision was to put governance first.
That is not bureaucracy for its own sake. Fast AI-assisted development can accelerate useful work, but it can also bypass quality controls if the repository does not define its boundaries. Branch protection, CODEOWNERS, pull request templates, CI validation, and assistant instructions create a controlled path for speed.
The modular monolith decision keeps infrastructure simple while preserving future flexibility. A language-learning platform does not need distributed systems complexity on day one. It does need clear internal boundaries so that tenant routing, tokenization, search, lessons, content sources, and future language engines can evolve without becoming tangled.
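Internal boundaries like these are often enforced as an allow-list of which modules may import which. The module names and the `ALLOWED_DEPENDENCIES` table below are hypothetical, chosen to mirror the areas named in this paragraph; in practice the same rule is usually enforced by a lint rule or build tool rather than hand-written code.

```typescript
// Hypothetical internal modules of the monolith.
type ModuleName = "tenancy" | "tokenization" | "search" | "lessons" | "content-sources";

// Each module lists the only modules it may import from. Tokenization is a
// leaf; search builds on tokenization; lessons sit on top of everything.
const ALLOWED_DEPENDENCIES: Record<ModuleName, readonly ModuleName[]> = {
  tenancy: [],
  tokenization: [],
  search: ["tokenization"],
  "content-sources": [],
  lessons: ["tokenization", "search", "content-sources", "tenancy"],
};

// True when an import from module `from` into module `to` respects the boundary.
function isAllowedImport(from: ModuleName, to: ModuleName): boolean {
  return from === to || ALLOWED_DEPENDENCIES[from].includes(to);
}

console.log(isAllowedImport("lessons", "search")); // true
console.log(isAllowedImport("tokenization", "lessons")); // false
```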
Framework-agnostic TypeScript abstractions keep the core portable. The language-learning engine should not depend too early on one frontend, rendering model, hosting platform, or application framework. The core should be able to support future products rather than being shaped only by the first runtime surface.
The ADRs make decisions explicit and reviewable. They capture why the project is starting as a modular monolith, why TypeScript is the foundation, how open-core boundaries are expected to work, and why tokenization and search need abstraction points from the beginning.
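The search side of that abstraction can be sketched the same way: callers depend on a narrow interface, and the first implementation can be a naive in-memory index that a dedicated search engine could later replace. `SearchIndex`, `InMemorySearchIndex`, and the lowercase whitespace splitting are hypothetical and do not describe the contents of ADR-0004.

```typescript
interface SearchHit {
  id: string;
  score: number;
}

// The abstraction point: application code talks to this interface only.
interface SearchIndex {
  add(id: string, text: string): void;
  search(query: string): SearchHit[];
}

// Placeholder term splitter; a real implementation would delegate to the
// tokenizer abstraction so Thai and Mandarin text segment correctly.
const terms = (text: string): string[] =>
  text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);

// Naive in-memory index: scores a document by how many query terms it contains.
class InMemorySearchIndex implements SearchIndex {
  private readonly docs = new Map<string, Set<string>>();

  add(id: string, text: string): void {
    this.docs.set(id, new Set(terms(text)));
  }

  search(query: string): SearchHit[] {
    const queryTerms = terms(query);
    const hits: SearchHit[] = [];
    this.docs.forEach((docTerms, id) => {
      const score = queryTerms.filter((t) => docTerms.has(t)).length;
      if (score > 0) hits.push({ id, score });
    });
    return hits.sort((a, b) => b.score - a.score);
  }
}

const index: SearchIndex = new InMemorySearchIndex();
index.add("greeting", "hello and good morning");
index.add("farewell", "goodbye");
console.log(index.search("good morning")); // [ { id: 'greeting', score: 2 } ]
```

Because callers hold a `SearchIndex` rather than the concrete class, swapping in a different backend later touches only the construction site.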
The open-core boundary is especially important. Public portfolio-safe code can show architecture, governance, and reusable platform design. Future private or proprietary content, licensed dictionary assets, restricted language resources, or monetizable product material need a separate boundary. Establishing that early prevents accidental leakage later.
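One lightweight way to keep that boundary from leaking is a check that public source files never import from the private side. The `@lingua-private/` package scope and the `findBoundaryViolations` helper are hypothetical names for illustration; the regular expression only covers static `import` statements, not dynamic `import()` or `require`.

```typescript
// Hypothetical scope reserved for private, licensed, or proprietary modules.
const PRIVATE_SCOPE = "@lingua-private/";

// Returns the private module specifiers imported by a public source file.
function findBoundaryViolations(source: string): string[] {
  // Matches static imports such as `import { x } from "pkg";` and
  // `import "pkg";`, capturing the module specifier. Created per call so
  // the global regex state never leaks between calls.
  const importPattern = /^\s*import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/gm;
  const violations: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier !== undefined && specifier.startsWith(PRIVATE_SCOPE)) {
      violations.push(specifier);
    }
  }
  return violations;
}

const publicFile = [
  'import { whitespaceTokenizer } from "./tokenizers/index.js";',
  'import { licensedLexicon } from "@lingua-private/lexicon";',
].join("\n");
console.log(findBoundaryViolations(publicFile)); // [ '@lingua-private/lexicon' ]
```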
The AI guardrails also matter. AGENTS.md gives assistants the operating
model for the repository. .cursorignore reduces context pollution from noisy
or volatile files. Together, they make AI assistance more reliable by giving it
less irrelevant material and clearer expectations.
Validation / Definition of Done
Day 19 was complete when the repository foundation had been created, reviewed, and validated through the expected local and CI paths.
Validation included commands such as:
git pull origin main
git diff --check
pnpm install
pnpm typecheck
pnpm lint
pnpm format:check
pnpm validate
CodeQL passed after the tokenizer placeholder was added at
src/core/tokenizers/index.ts. That confirmed the semantic security scan had
enough TypeScript surface area to analyze and that the initial repository
shape worked with the intended DevSecOps baseline.
The definition of done was not feature completeness. It was repository readiness:
- protected branch workflow established
- TypeScript 6 baseline validated
- dependency monitoring configured
- CodeQL scanning enabled
- CODEOWNERS review boundaries added
- pull request template created
- architecture and data governance documentation written
- ADRs recorded for major platform decisions
- AI-assisted development instructions added
- public/private and data-source cautions documented
Portfolio Framing
Day 19 is the kind of work that does not look flashy in a product screenshot, but it is exactly the work that determines whether a platform can grow responsibly.
The value was in platform engineering maturity: setting the repository up so future contributions have a clear path, security scans run early, dependency automation has review boundaries, and architectural decisions are recorded before implementation pressure makes them harder to change.
It also showed DevSecOps discipline. CodeQL, Dependabot, CODEOWNERS, pull request templates, branch protection, TypeScript strictness, and validation scripts were treated as part of the foundation rather than cleanup tasks for later.
The AI-assisted development governance matters too. As tools like OpenAI ChatGPT and Cursor become part of the development workflow, repositories need clear instructions, ignored noise paths, and review gates. The goal is not to slow the assistants down. The goal is to make their output easier to trust, review, and integrate.
Most importantly, the work kept long-term product thinking connected to
engineering restraint. lingua-core-platform is intended to support a
Thai-English language learning platform now and broader language expansion
later. Starting with a framework-agnostic modular monolith, explicit data
governance, and open-core boundaries gives that idea a stronger foundation
than rushing directly into runtime code.
Day 19 was about preparing the ground so future product work can move faster without becoming less disciplined.