NexaCore/docs/nexacore-foundation.md
nessi 0da224325a chore: initialize NexaCore compiler workspace with basic frontend and CLI
Add initial project structure for NexaCore programming language compiler:
- Create Cargo workspace with 4 crates (cli, driver, frontend, runtime)
- Add lexer with indentation-based tokenization and keyword support
- Add parser for modules, functions, structs, and basic expressions
- Implement CLI with build command and placeholder subcommands
- Add driver crate to orchestrate compilation pipeline
- Include .gitignore for Rust build
2026-04-06 16:57:54 +02:00

# NexaCore Foundation
## 1. Language Vision
### What NexaCore is
NexaCore is a compiled backend language designed for APIs, database-heavy services, internal platforms, and system daemons. The language aims to keep code visually simple and readable while enforcing stronger correctness guarantees than Python. It prioritizes predictable performance, structured concurrency, explicit error handling, and batteries-included backend tooling.
### Target users
- backend engineers building REST APIs and service layers
- platform teams building internal tools and service orchestration
- companies replacing Python microservices that have grown too dynamic or too slow
- teams that want a simpler language than Rust for application-level backend work
### Why it is better suited for backend systems than Python
- compiled deployment artifact instead of shipping source trees and interpreter environments
- static typing with local inference catches failures before production
- explicit error model improves reliability for service code
- structured async runtime designed around network and database workloads
- first-class PostgreSQL and HTTP support as standard capabilities, not bolted-on frameworks
- smaller operational surface for packaging, startup, and deployment
- stronger encapsulation: compiled binaries are harder to read than shipped source code
### Design goals
- keep syntax approachable and easy to scan
- optimize for backend productivity, not language cleverness
- compile to efficient deployable artifacts
- make async IO, HTTP routing, and PostgreSQL first-class
- provide strong type safety with low annotation burden
- support Linux first with a path to Windows later
- keep tooling simple: new, build, run, test, fmt, add, doc
### Non-goals
- replacing C or Rust for kernel, driver, or embedded programming
- full zero-cost manual memory control in the MVP
- metaprogramming-heavy language features in version one
- multiple inheritance, operator overloading, or macros in the MVP
- universal frontend or browser runtime in the MVP
## 2. Language Features
### MVP feature set
- Variables: immutable by default with `let`, mutable with `var`
- Functions: named functions with return types and local type inference
- Structs: product types with methods and visibility control
- Modules/imports: file-based modules with package namespaces
- If/else: expression-friendly branching
- Match: exhaustive matching on enums, literals, and guards
- Loops: `for`, `while`, and iterator-based traversal
- Error handling: `Result<T, E>`, `?` propagation, `defer` later
- Async/await: structured async for network and database operations
- Database access: typed query APIs and row-to-struct mapping
- HTTP API: built-in routing and request/response abstractions in stdlib
- Package management: first-party package manifest and lockfile
### Typing model
Static typing with local inference is the right MVP choice. NexaCore should infer obvious local types while requiring type signatures on public functions, struct fields, and externally visible module boundaries. This gives Python-like authoring speed without Python's runtime ambiguity.
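To make the "infer obvious local types" rule concrete, here is a minimal Rust sketch of how a checker might type a `let` binding whose initializer is a literal. The names `Ty`, `Lit`, and `infer_let` are hypothetical and do not come from the starter code; a real checker would handle full expressions, not only literals.

```rust
/// Hypothetical subset of NexaCore types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Int,
    Bool,
    Str,
}

/// Hypothetical literal initializers.
pub enum Lit {
    Int(i64),
    Bool(bool),
    Str(String),
}

/// Returns the type of a local binding.
/// Without an annotation the literal's type is used; with one, both must agree.
pub fn infer_let(annotation: Option<Ty>, init: &Lit) -> Result<Ty, String> {
    let found = match init {
        Lit::Int(_) => Ty::Int,
        Lit::Bool(_) => Ty::Bool,
        Lit::Str(_) => Ty::Str,
    };
    match annotation {
        None => Ok(found),
        Some(declared) if declared == found => Ok(declared),
        Some(declared) => Err(format!(
            "type mismatch: declared {:?}, initializer is {:?}",
            declared, found
        )),
    }
}
```

Under these assumptions, `let host = "127.0.0.1"` would resolve to `Str` with no annotation, while `var port: Int = true` would be rejected at compile time.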
### Memory model
The MVP should use automatic memory management through reference-counted heap objects plus arena ownership inside the compiler and runtime internals. This is simpler than full tracing GC and easier to implement safely than Rust-like borrow checking in a new language. Long term, the language can evolve toward region-based optimization and escape analysis.
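As a rough illustration of reference-counted heap objects, the Rust sketch below models a runtime value whose heap payloads are shared through `std::rc::Rc`. The `Value` enum and `handle_count` helper are hypothetical. `Rc` is single-threaded, so a runtime that moves values between scheduler threads would likely need an atomic count instead, and plain reference counting does not reclaim cycles.

```rust
use std::rc::Rc;

/// Hypothetical runtime value; heap payloads are shared through reference counts.
#[derive(Debug, Clone)]
pub enum Value {
    Int(i64),
    Str(Rc<str>),
    List(Rc<Vec<Value>>),
}

/// Number of live handles to a value's heap payload (`None` for unboxed ints).
pub fn handle_count(v: &Value) -> Option<usize> {
    match v {
        Value::Int(_) => None,
        Value::Str(s) => Some(Rc::strong_count(s)),
        Value::List(l) => Some(Rc::strong_count(l)),
    }
}
```

Cloning a `Value::Str` copies only the handle and increments the count; dropping the last handle frees the payload.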
### Concurrency model
Structured async concurrency for IO-bound backend work is the default. The language runtime owns the async scheduler and task model. Shared-state threads are not part of the first language surface; background workers and task spawning go through runtime primitives.
## 3. Syntax Design
NexaCore should be indentation-aware for readability but use explicit block starters so the parser remains simple and code stays visually structured. A colon starts a block, and indentation ends it.
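The block rule above (a colon opens a block, a drop in indentation closes it) is commonly implemented with an indentation stack that emits synthetic indent and dedent tokens. The following Rust sketch shows the idea at line granularity. `BlockTok` and `block_tokens` are hypothetical names, not the `nxc-frontend` lexer, and the sketch assumes spaces-only indentation and ignores line continuation inside brackets.

```rust
/// Hypothetical layout tokens; not the actual `nxc-frontend` token set.
#[derive(Debug, Clone, PartialEq)]
pub enum BlockTok {
    Line(String), // trimmed text of one source line
    Indent,       // indentation increased: a block opens
    Dedent,       // indentation decreased: a block closes
}

/// Turns source text into layout tokens using an indentation stack.
/// Assumes spaces-only indentation and skips blank lines.
pub fn block_tokens(src: &str) -> Result<Vec<BlockTok>, String> {
    let mut stack = vec![0usize]; // open indentation widths, outermost first
    let mut out = Vec::new();
    let mut opens_block = false; // did the previous line end with ':'?

    for (idx, raw) in src.lines().enumerate() {
        if raw.trim().is_empty() {
            continue;
        }
        let width = raw.len() - raw.trim_start_matches(' ').len();
        if width > *stack.last().unwrap() {
            if !opens_block {
                return Err(format!("line {}: unexpected indent", idx + 1));
            }
            stack.push(width);
            out.push(BlockTok::Indent);
        } else {
            if opens_block {
                return Err(format!("line {}: expected an indented block", idx + 1));
            }
            // Close every block deeper than the current line.
            while width < *stack.last().unwrap() {
                stack.pop();
                out.push(BlockTok::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(format!("line {}: inconsistent dedent", idx + 1));
            }
        }
        let text = raw.trim();
        opens_block = text.ends_with(':');
        out.push(BlockTok::Line(text.to_string()));
    }
    if opens_block {
        return Err("unexpected end of input after ':'".to_string());
    }
    // Close blocks still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        out.push(BlockTok::Dedent);
    }
    Ok(out)
}
```

Because the parser only ever sees `Indent` and `Dedent`, it can treat blocks the same way a brace-based grammar would, which is what keeps the recursive descent parser simple.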
### Hello world
```nexa
fn main() -> Int:
    print("Hello, NexaCore")
```
### Variables
```nexa
let host = "127.0.0.1"
var port: Int = 8080
let debug = true
```
### Functions
```nexa
fn add(a: Int, b: Int) -> Int:
    a + b
```
### Structs and methods
```nexa
pub struct User:
    id: Int
    email: String

impl User:
    fn display(self) -> String:
        "{self.id}:{self.email}"
```
### REST API endpoint
```nexa
use web.http.{App, Request, Response}
fn health(_req: Request) -> Response:
    Response.json({
        "status": "ok"
    })
```
### PostgreSQL query
```nexa
use db.postgres.{Pool, query}
async fn load_user(pool: Pool, id: Int) -> Result<User, DbError>:
    let row = await query(pool,
        "select id, email from users where id = $1",
        [id]
    )?.one()
    row.into<User>()
```
### Async function
```nexa
async fn fetch_profile(user_id: Int) -> Result<Profile, AppError>:
    let profile = await profiles.load(user_id)?
    profile
```
### Error handling
```nexa
fn parse_port(raw: String) -> Result<Int, ConfigError>:
    match raw.to_int():
        ok(value) => value
        err(_) => fail ConfigError.invalid("PORT must be numeric")
```
### Imports and modules
```nexa
use core.env
use web.http.{App, Response}
use db.postgres.Pool
```
## 4. Technical Architecture
### Compiler pipeline
1. Lexer
   Converts UTF-8 source into tokens, including indentation-sensitive block tokens.
2. Parser
   Builds an AST from tokens using a recursive descent parser.
3. AST
   Stores module declarations, items, statements, expressions, types, and spans.
4. Semantic analyzer
   Resolves names, module symbols, scopes, and visibility.
5. Type checker
   Infers local types, validates function signatures, and resolves generic instantiations.
6. HIR and MIR
   HIR for resolved source-level structure, MIR for lowered control flow and typed operations.
7. Backend code generation
   Emits portable C in the MVP, then compiles it with a system C compiler.
8. Binary output
   Native executable or shared object linked with the NexaCore runtime.
### Module system
- one package contains a `nexa.toml` manifest
- source files live in `src/`
- `src/main.nx` builds an application binary
- `src/lib.nx` builds a library package
- `use` imports symbol paths
- package dependencies resolve through a first-party registry later; local path dependencies first
### Package manager
The `nexacore` CLI owns package management:
- `nexacore new api-service`
- `nexacore add postgres`
- `nexacore build`
- `nexacore test`
Manifest:
```toml
[package]
name = "orders-api"
version = "0.1.0"
edition = "2026"
[dependencies]
postgres = "0.1"
http = "0.1"
```
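For early tooling work, the manifest above can be read with a very small hand-written parser. The Rust sketch below handles only `[section]` headers and `key = "value"` lines. It is not a TOML implementation, and `Manifest` and `parse_manifest` are hypothetical names; a production build would more likely depend on a proper TOML library once third-party dependencies are explicitly added.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory manifest: section name -> key -> string value.
pub type Manifest = BTreeMap<String, BTreeMap<String, String>>;

/// Parses only `[section]` headers and `key = "value"` lines.
/// This is a deliberately tiny subset, not a TOML implementation.
pub fn parse_manifest(src: &str) -> Result<Manifest, String> {
    let mut manifest = Manifest::new();
    let mut section: Option<String> = None;

    for (idx, raw) in src.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // blank line or comment
        }
        if let Some(name) = line.strip_prefix('[').and_then(|l| l.strip_suffix(']')) {
            let name = name.trim().to_string();
            manifest.entry(name.clone()).or_default();
            section = Some(name);
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = \"value\"`", idx + 1))?;
        let value = value
            .trim()
            .strip_prefix('"')
            .and_then(|v| v.strip_suffix('"'))
            .ok_or_else(|| format!("line {}: value must be a quoted string", idx + 1))?;
        let name = section
            .as_ref()
            .ok_or_else(|| format!("line {}: key outside of a section", idx + 1))?;
        manifest
            .get_mut(name)
            .expect("section was inserted when its header was parsed")
            .insert(key.trim().to_string(), value.to_string());
    }
    Ok(manifest)
}
```

With the example manifest as input, `manifest["package"]["name"]` would hold `orders-api` and the `dependencies` section would contain two entries.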
### Standard library layout
- `core`: strings, collections, io, env, time, result, option
- `async`: tasks, channels, timers
- `web`: http server, routing, requests, responses, middleware
- `db`: postgres client, pooling, migrations later
- `json`: encode, decode, schema helpers
- `auth`: jwt and password utilities
- `log`: structured logging
### Best MVP implementation path
The best MVP path is `NexaCore -> AST/HIR/MIR -> C -> native binary`.
### Why C is the best first backend
- easier to implement than a full LLVM backend
- produces native binaries immediately
- lets the team focus first on language semantics, standard library shape, and runtime
- easier to debug generated output during compiler bring-up
- keeps a clean migration path to LLVM or a direct machine-code backend later
- avoids the operational and implementation overhead of designing a serious VM before validating the language
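To give a sense of how small a first C emitter can be, here is a Rust sketch that lowers a tiny expression tree to one C function. `Expr`, `Function`, and `emit_c` are hypothetical stand-ins for the eventual MIR and backend, and the mapping of NexaCore `Int` to `int64_t` is an assumption, not a decision recorded in this document.

```rust
/// Hypothetical, heavily reduced stand-in for a typed MIR expression.
pub enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// A function whose parameters and result are all `Int`.
pub struct Function {
    pub name: String,
    pub params: Vec<String>,
    pub body: Expr,
}

/// Renders an expression as C source, parenthesizing every addition.
fn emit_expr(e: &Expr) -> String {
    match e {
        Expr::Int(n) => format!("{}", n),
        Expr::Var(name) => name.clone(),
        Expr::Add(l, r) => format!("({} + {})", emit_expr(l), emit_expr(r)),
    }
}

/// Emits one C function; assumes NexaCore `Int` lowers to `int64_t`.
pub fn emit_c(f: &Function) -> String {
    let params = if f.params.is_empty() {
        "void".to_string()
    } else {
        f.params
            .iter()
            .map(|p| format!("int64_t {}", p))
            .collect::<Vec<_>>()
            .join(", ")
    };
    format!(
        "#include <stdint.h>\n\nint64_t {}({}) {{\n    return {};\n}}\n",
        f.name,
        params,
        emit_expr(&f.body)
    )
}
```

For the `add` function from the syntax section this yields a C function whose body is `return (a + b);`. One design point a real backend would have to settle is overflow behavior, since signed integer overflow is undefined in C.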
### Why not LLVM first
LLVM is powerful, but it significantly increases implementation surface area early. For an MVP language team, front-end maturity and runtime design are bigger risks than instruction selection.
### Why not a bytecode VM first
A VM is attractive for portability, but it weakens the deployment and code-protection story and requires designing both a language and a production runtime execution engine at once.
## 5. Security and Code Protection
No compiled format is impossible to reverse engineer. NexaCore should aim for strong practical resistance, not absolute secrecy.
### Realistic protection model
- compile to native binaries for deployment
- strip symbols in release mode
- minimize embedded reflection metadata
- avoid preserving source-like names unless needed for diagnostics
- separate debug symbols from production artifacts
- support link-time optimization and dead-code elimination
- optionally obfuscate private symbol names in hardened builds
- keep secrets out of binaries; load them from environment or secret managers
### Tradeoffs
- native binaries are materially harder to inspect than source, but still reversible with enough effort
- bytecode is easier to decompile than optimized native code
- aggressive obfuscation complicates debugging and incident response
- encrypted assets help with packaged resources, not code secrecy after runtime decryption
### Recommended release modes
- `debug`: symbols and source maps kept
- `release`: optimized and stripped
- `release-hardened`: stripped, symbol-minimized, optional control-flow obfuscation hooks later
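Since the MVP compiles through a system C compiler, the release modes could map onto compiler flags in the driver. The Rust sketch below assumes a GCC- or Clang-style driver on Linux; the `Profile` enum, the function names, and the exact flag selection are illustrative assumptions rather than defined NexaCore behavior.

```rust
/// Release modes named in this section.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Profile {
    Debug,
    Release,
    ReleaseHardened,
}

/// Parses the mode names used in this document.
pub fn parse_profile(name: &str) -> Option<Profile> {
    match name {
        "debug" => Some(Profile::Debug),
        "release" => Some(Profile::Release),
        "release-hardened" => Some(Profile::ReleaseHardened),
        _ => None,
    }
}

/// Suggested flags for a GCC/Clang-style driver on Linux (an assumption).
pub fn cc_flags(profile: Profile) -> Vec<&'static str> {
    match profile {
        // Keep debug info, skip optimization.
        Profile::Debug => vec!["-O0", "-g"],
        // Optimize, enable LTO, strip the symbol table at link time.
        Profile::Release => vec!["-O2", "-flto", "-s"],
        // Additionally hide non-exported symbols and drop unreferenced sections.
        Profile::ReleaseHardened => vec![
            "-O2",
            "-flto",
            "-s",
            "-fvisibility=hidden",
            "-ffunction-sections",
            "-fdata-sections",
            "-Wl,--gc-sections",
        ],
    }
}
```

Control-flow obfuscation is not represented here; the document defers it to later hooks.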
## 6. Web Backend Standard Library
### Core backend SDK modules
- HTTP server: TCP listener, HTTP parser integration, request lifecycle
- Routing: method/path routing, path params, nested groups
- Middleware: auth, logging, recovery, tracing
- JSON: serializer and parser with typed model mapping
- Environment config: `.env` loading later, env parsing, typed config helpers
- PostgreSQL driver: async client, pool, prepared statements
- Logging: structured logger with JSON output option
- File handling: streams, safe path utilities, upload helpers later
- JWT/auth helpers: token signing, verification, password hashing later
- Background jobs: runtime task spawning, scheduling, queues later
- WebSocket: defer until after HTTP core is stable
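As an illustration of the routing item, the Rust sketch below matches a route pattern with path parameters against a request path. The `:name` parameter syntax and the `match_route` function are assumptions; this document does not fix a pattern syntax, and method dispatch and nested groups are left out.

```rust
use std::collections::HashMap;

/// Matches a route pattern such as `/users/:id` against a request path.
/// Returns the captured parameters, or `None` if the path does not match.
/// The `:name` parameter syntax is an assumption made for this sketch.
pub fn match_route(pattern: &str, path: &str) -> Option<HashMap<String, String>> {
    let pat: Vec<&str> = pattern.trim_matches('/').split('/').collect();
    let got: Vec<&str> = path.trim_matches('/').split('/').collect();
    if pat.len() != got.len() {
        return None;
    }
    let mut params = HashMap::new();
    for (p, g) in pat.iter().zip(got.iter()) {
        match p.strip_prefix(':') {
            // Parameter segment: capture any non-empty value.
            Some(name) if !g.is_empty() => {
                params.insert(name.to_string(), g.to_string());
            }
            Some(_) => return None,
            // Literal segment: must match exactly.
            None if p == g => {}
            None => return None,
        }
    }
    Some(params)
}
```

A router built on this would try registered patterns in order and hand the captured parameters to the handler alongside the request.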
## 7. PostgreSQL Integration
NexaCore should treat PostgreSQL as a first-class backend primitive. The syntax stays explicit, but the API should be much tighter than Python ORMs and less ceremony-heavy than many async driver stacks.
### Opening a connection
```nexa
use db.postgres.Pool
let pool = Pool.connect(env.require("DATABASE_URL"), max: 20)?
```
### Select queries
```nexa
let users = await pool.query<User>(
    "select id, email from users order by id"
)?
```
### Inserts and updates
```nexa
let inserted = await pool.exec(
    "insert into users(email) values($1)",
    ["a@example.com"]
)?
```
### Transactions
```nexa
let tx = await pool.begin()?
await tx.exec("update accounts set balance = balance - $1 where id = $2", [10, from])?
await tx.exec("update accounts set balance = balance + $1 where id = $2", [10, to])?
await tx.commit()?
```
### Mapping rows to structs
```nexa
struct User:
    id: Int
    email: String

let user = await pool.query_one<User>(
    "select id, email from users where id = $1",
    [id]
)?
```
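Behind an API like `query_one<User>`, the runtime needs some way to turn a generic row into a typed struct. One possible shape, sketched in Rust, is a conversion trait implemented per struct. `SqlValue`, `Row`, and `FromRow` are hypothetical names; in NexaCore the compiler would presumably generate the equivalent of the `impl` shown here.

```rust
use std::collections::HashMap;

/// Hypothetical column value; a real driver would cover many more types.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlValue {
    Int(i64),
    Text(String),
    Null,
}

/// Hypothetical row: column name -> value.
pub type Row = HashMap<String, SqlValue>;

/// Conversion from a generic row into a typed struct.
pub trait FromRow: Sized {
    fn from_row(row: &Row) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct User {
    pub id: i64,
    pub email: String,
}

impl FromRow for User {
    fn from_row(row: &Row) -> Result<Self, String> {
        // Each field checks both presence and type of its column.
        let id = match row.get("id") {
            Some(SqlValue::Int(v)) => *v,
            other => return Err(format!("column `id`: expected Int, got {:?}", other)),
        };
        let email = match row.get("email") {
            Some(SqlValue::Text(v)) => v.clone(),
            other => return Err(format!("column `email`: expected Text, got {:?}", other)),
        };
        Ok(User { id, email })
    }
}
```

A missing column or a `Null` in a non-optional field surfaces as an error value rather than a runtime crash, which matches the explicit error model described earlier.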
### Connection pooling
```nexa
let pool = Pool.connect(url, max: 32, min: 4, idle_timeout_sec: 30)?
```
### Async queries
All PostgreSQL APIs are async-first. Blocking database access is not part of the standard application path.
## 8. Developer Experience
### CLI commands
- `nexacore new <name>`: create a new app or library
- `nexacore build`: compile package to binary or library
- `nexacore run`: build and execute
- `nexacore test`: run language and package tests
- `nexacore fmt`: format source code
- `nexacore add <package>`: add dependency
- `nexacore doc`: build documentation
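The command list above maps naturally onto an enum in the CLI crate. The Rust sketch below parses the arguments that follow the program name using only the standard library. `Command` and `parse_command` are hypothetical names, and the argument shapes are assumptions based on the list.

```rust
/// Subcommands listed above; argument shapes are assumptions.
#[derive(Debug, PartialEq)]
pub enum Command {
    New { name: String },
    Build,
    Run,
    Test,
    Fmt,
    Add { package: String },
    Doc,
}

/// Parses the arguments that follow the program name.
pub fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args {
        ["new", name] => Ok(Command::New { name: name.to_string() }),
        ["build"] => Ok(Command::Build),
        ["run"] => Ok(Command::Run),
        ["test"] => Ok(Command::Test),
        ["fmt"] => Ok(Command::Fmt),
        ["add", package] => Ok(Command::Add { package: package.to_string() }),
        ["doc"] => Ok(Command::Doc),
        ["new"] | ["add"] => Err(format!("`{}` requires one argument", args[0])),
        [] => Err("no command given".to_string()),
        [other, ..] => Err(format!("unknown or malformed command: {}", other)),
    }
}
```

In a binary, the input could come from `std::env::args().skip(1)` collected into a vector and borrowed as string slices.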
### Example backend project layout
```text
orders-api/
  nexa.toml
  src/
    main.nx
    api/
      routes.nx
      users.nx
    db/
      models.nx
      queries.nx
    config.nx
  tests/
    users_test.nx
```
## 9. Starter Implementation Plan
### Phase 1: language spec MVP
- freeze core syntax rules
- define token grammar and block structure
- define AST and type system MVP
- define package manifest format
### Phase 2: lexer/parser/AST
- implement token definitions
- implement indentation-aware lexer
- implement parser for modules, functions, structs, statements, and expressions
- snapshot parser test fixtures
### Phase 3: semantic analysis
- name resolution
- scope tracking
- visibility rules
- type inference for locals
- public API type validation
### Phase 4: code generation
- HIR and MIR lowering
- C backend emitter
- runtime ABI definition
- compile driver invoking system C compiler
### Phase 5: runtime and stdlib
- string and collection runtime
- result/option representations
- async task scheduler
- IO primitives
### Phase 6: PostgreSQL and HTTP framework
- async socket and HTTP runtime
- routing and JSON helpers
- PostgreSQL client and connection pooling
- example backend service
### Phase 7: package manager and tooling
- manifest parser
- dependency resolver
- formatter
- test runner
- docs generator
## 10. Repository Structure
```text
NexaCore/
  Cargo.toml
  README.md
  docs/
    nexacore-foundation.md
  crates/
    nxc-cli/
    nxc-driver/
    nxc-frontend/
    nxc-runtime/
  stdlib/
    core/
    db/
    http/
  packages/
  examples/
    backend-api/
  tests/
    compiler/
    integration/
  tools/
```
## 11. MVP Code Generation
Bootstrapping in Rust is the best choice:
- excellent fit for compiler engineering
- strong enums and pattern matching for token/AST modeling
- memory safety for a long-lived systems project
- good ecosystem for CLI, testing, and later LLVM/C toolchain integrations
The starter code in this repo includes:
- token definitions
- lexer
- AST nodes
- parser skeleton
- compiler driver
- CLI entrypoint
## 12. Example NexaCore Program
```nexa
use core.env
use db.postgres.Pool
use web.http.{App, Response}
struct AppState:
    pool: Pool

async fn health(state: AppState) -> Response:
    let version = env.get("APP_VERSION").or("dev")
    let row = await state.pool.query_one<Map>(
        "select now() as now"
    )?
    Response.json({
        "status": "ok",
        "version": version,
        "database_time": row["now"]
    })

async fn main() -> Result<Void, AppError>:
    let database_url = env.require("DATABASE_URL")?
    let port = env.get("PORT").or("8080").to_int()?
    let pool = Pool.connect(database_url, max: 16)?
    let app = App.new()
        .state(AppState { pool: pool })
        .get("/health", health)
    await app.listen("0.0.0.0", port)?
```
## 13. Codex Execution Rules
- prioritize correctness over false completeness
- implement compileable starter code, not pseudocode disguised as finished work
- leave clear TODOs where the compiler is intentionally incomplete
- keep package boundaries aligned with the compiler pipeline
- avoid inventing third-party dependencies unless they are explicitly added
## 14. Final Deliverable
### Recommended architecture choice
Rust front-end compiler with a C backend for the MVP.
### MVP scope
- parser and type-checked core language
- C code generation
- native binary build flow on Linux
- minimal runtime
- first-party HTTP and PostgreSQL runtime modules
### First coding step
Build the front-end pipeline end to end for a single file:
lex -> parse -> AST dump -> diagnostics.
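The diagnostics end of that pipeline depends on spans recorded by the lexer and parser. As a minimal sketch, the Rust code below converts a byte span into a line and column and renders a one-line error with a caret underline. `Span`, `line_col`, and `render` are hypothetical names, and the sketch assumes span offsets fall on character boundaries and counts columns in bytes.

```rust
/// Byte range into the source text (hypothetical; field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Converts a byte offset to a 1-based (line, column) pair.
/// Columns are counted in bytes; offsets must lie on char boundaries.
pub fn line_col(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset.min(src.len())];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    (line, before.len() - line_start + 1)
}

/// Renders `file:line:col: error: message`, the source line, and a caret underline.
pub fn render(file: &str, src: &str, span: Span, message: &str) -> String {
    let (line, col) = line_col(src, span.start);
    let text = src.lines().nth(line - 1).unwrap_or("");
    let width = span.end.saturating_sub(span.start).max(1);
    format!(
        "{}:{}:{}: error: {}\n{}\n{}{}",
        file,
        line,
        col,
        message,
        text,
        " ".repeat(col - 1),
        "^".repeat(width)
    )
}
```

Keeping spans as plain byte offsets keeps AST nodes small; line and column are computed only when a diagnostic is actually printed.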
### First files to generate
- workspace manifest
- `nxc-frontend` token/lexer/parser/AST
- `nxc-driver` compile pipeline
- `nxc-cli` entrypoint
- example `main.nx`
### Build order
1. tokens and spans
2. lexer
3. AST
4. parser
5. diagnostics
6. semantic resolver
7. type checker
8. HIR/MIR lowering
9. C backend
10. runtime and stdlib