erix/e2fsd

generated from erix/meta

e2fsd is the private ext-family filesystem provider daemon for EriX

Rust 99.9%


Erik Inkinen dc2b0b07e2 Install FAT fixture tools in e2fsd CI	2026-05-22 15:56:50 +03:00
.github	Install FAT fixture tools in e2fsd CI	2026-05-22 15:56:50 +03:00
src	Fix e2fsd CI fixture generator lookup	2026-05-22 15:54:37 +03:00
.editorconfig	Initial commit	2026-04-26 08:06:26 +02:00
.gitignore	Initial commit	2026-04-26 08:06:26 +02:00
.markdownlint-cli2.yaml	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00
ARCHITECTURE.md	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00
Cargo.toml	Support bounded ext4 fsverity files	2026-05-11 12:34:01 +03:00
CODE_OF_CONDUCT.md	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00
CONTRIBUTING.md	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00
LICENSE	Initial commit	2026-04-26 08:06:26 +02:00
linker.ld	Add private ext provider service	2026-04-26 12:25:11 +03:00
README.md	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00
ROADMAP.md	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00
rustfmt.toml	Initial commit	2026-04-26 08:06:26 +02:00
SECURITY.md	Tighten CI markdown policy	2026-05-22 15:10:00 +03:00

README.md

e2fsd

e2fsd is the private ext-family filesystem provider daemon behind vfsd. It is started by rootd after keyd and before fatd, and it is reachable only through the provider endpoint delegated to vfsd.

EriX is a clean-room, capability-based microkernel operating system written entirely in Rust.

Technical requirements are tracked in the EriX requirements, conventions, and project documentation.

See:

docs for design documents, specifications, and development plans.
Related architecture repositories for kernel, services, libraries, drivers, and integration tooling.

Purpose of This Repository

This repository implements the EriX ext-family filesystem provider. Its purpose in EriX is to serve validated ext2/ext3/ext4 media behind vfsd without exposing a public service.

Functionally, it parses ext media, validates features and journals, and implements provider file/directory operations. The repository keeps the implementation, interface contracts, tests, and documentation for that behavior in one reviewable ownership boundary.

The maintained responsibilities are:

serve ext-family filesystems only through the private filesystem-provider ABI
validate ext media, journals, checksums, names, and authority before exposing mounts
implement persistent file and directory operations through the assigned blockd endpoint
keep provider authority private with no named entry or public client endpoint

Clean-Room Policy

EriX follows a strict clean-room philosophy:

No external source code may be copied.
No external Rust crates are allowed.
No code generation tools that embed third-party code may be used.
All code must be authored within the project.

Violations will result in rejection of the contribution.

License

All EriX repositories are licensed under the ISC License.

Development Model

EriX development is modular, deterministic, reproducible, authority-explicit, security-first, and self-hosting oriented.

This repository follows the project roadmap and the validation rules documented in its own roadmap.

Current Status

The provider service, startup validation, generic ABI dispatch, CI, and block-backed media path are present.

Mount reads device metadata and the ext superblock through the private blockd endpoint, validates core superblock and group/inode geometry, and runs every Linux/e2fsprogs ext4 superblock feature bit through an explicit feature registry. It verifies supported metadata_csum / gdt_csum metadata using the shared lib-crc CRC-32C primitive and validates internal journals or explicit provider-local external journal mappings, including external JBD2 UUID, superblock, block-size, feature-bit, and sequence-state checks. Mount rejects obsolete, planned, or unknown ext feature flags; standalone journal_dev volumes offered as filesystem providers; unknown journal feature layouts; read-write mounts of read-only media or read-only-only ext4 feature media (readonly, shared_blocks); and mutation handles after a read-only mount.

The exact supported ext geometry envelope is 1 KiB, 2 KiB, and 4 KiB filesystem blocks, matching the current 4 KiB lib-block provider transfer ceiling. Larger ext block sizes are rejected deliberately as a platform ABI limit. Inode records are accepted at 128, 256, or 512 bytes when the record size is a power of two and does not exceed the filesystem block size; larger or irregular records fail closed. Media reads, metadata checksum updates, inline-data handling, xattr handling, quota/orphan cleanup, and mutation writeback preserve unknown bytes in the accepted inode tail.
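The geometry envelope described above can be illustrated with a small validation sketch. This is not the e2fsd implementation: the names `block_size_from_log`, `check_inode_size`, and `GeometryError` are hypothetical, and the only on-disk fact assumed is the standard ext encoding where the block size is 1024 shifted left by `s_log_block_size`.

```rust
/// Errors for the geometry checks in this sketch (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum GeometryError {
    /// `s_log_block_size` would give a block larger than 4 KiB.
    UnsupportedBlockSizeLog(u32),
    /// Inode record size outside the accepted 128/256/512-byte set.
    UnsupportedInodeSize(u16),
}

/// Decode `s_log_block_size` (block size = 1024 << log) and accept only
/// 1 KiB, 2 KiB, and 4 KiB blocks, mirroring the 4 KiB transfer ceiling.
pub fn block_size_from_log(log_block_size: u32) -> Result<u32, GeometryError> {
    if log_block_size > 2 {
        // Anything above 4 KiB fails closed; this also avoids shift overflow.
        return Err(GeometryError::UnsupportedBlockSizeLog(log_block_size));
    }
    Ok(1024u32 << log_block_size)
}

/// Accept inode records of 128, 256, or 512 bytes that are a power of two
/// and do not exceed the filesystem block size.
pub fn check_inode_size(inode_size: u16, block_size: u32) -> Result<u16, GeometryError> {
    let accepted = matches!(inode_size, 128 | 256 | 512)
        && inode_size.is_power_of_two()
        && u32::from(inode_size) <= block_size;
    if accepted {
        Ok(inode_size)
    } else {
        Err(GeometryError::UnsupportedInodeSize(inode_size))
    }
}
```

Under these assumptions, `block_size_from_log(3)` (an 8 KiB filesystem) and `check_inode_size(1024, 4096)` both return errors, which corresponds to the fail-closed behavior the text describes for oversized geometry.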

The media path stores only mount records and open handles in memory. Directory lookup scans media directories, inode metadata is read from group descriptor inode tables, file data follows extents or legacy direct/single/double/triple indirect block maps, and writes allocate/free blocks and inodes through ext bitmaps. New and grown files use inline extent records for small extent sets and spill into indexed extent tree leaf blocks when fragmentation exceeds the inode record capacity; clean ext2 and non-extent ext3 media use legacy direct and indirect block pointers instead. The positive Filesystem provider work VM scenario reads host-created ext media, creates a new marker through vfsd, reopens/stat/readdir-checks it, leaves it on disk, and verifies the partition with host e2fsck -fn after shutdown.

Journaled ext media now routes metadata mutations through a JBD2 transaction on the internal journal inode or an explicitly authorized external journal device. The provider writes file data first, sets the ext recovery-required incompatibility bit, emits descriptor/data/commit records, checkpoints the staged metadata blocks to their home locations, refreshes checksums, and clears recovery-required only after the checkpoint succeeds. If the journal or checkpoint step fails after the recovery bit is set, the bit is left set so the next mount fails closed instead of exposing potentially partial metadata. Provider-originated large mutations are split into bounded full-commit transactions using descriptor/ring capacity accounting, and large file writes publish data blocks before the final size metadata commit.
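The ordering of a full-commit transaction can be sketched as a fixed sequence of phases in which clearing recovery-required comes last. The `Step`, `Media`, `commit`, and `RecordingMedia` names below are hypothetical and exist only for illustration; the real provider talks to blockd and a JBD2 journal rather than an in-memory recorder.

```rust
/// Ordered phases of one full-commit transaction, following the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    WriteFileData,
    SetRecoveryRequired,
    WriteJournalRecords, // descriptor, data, and commit blocks
    CheckpointHome,
    RefreshChecksums,
    ClearRecoveryRequired,
}

/// Hypothetical media backend: each call performs one phase or fails.
pub trait Media {
    fn run(&mut self, step: Step) -> Result<(), ()>;
}

pub const ORDER: [Step; 6] = [
    Step::WriteFileData,
    Step::SetRecoveryRequired,
    Step::WriteJournalRecords,
    Step::CheckpointHome,
    Step::RefreshChecksums,
    Step::ClearRecoveryRequired,
];

/// Run the phases in order and stop at the first failure. Because the
/// clear step is last, any earlier failure leaves recovery-required set.
pub fn commit<M: Media>(media: &mut M) -> Result<(), Step> {
    for step in ORDER {
        media.run(step).map_err(|_| step)?;
    }
    Ok(())
}

/// In-memory stand-in that records phases and can fail at a chosen one.
pub struct RecordingMedia {
    pub log: Vec<Step>,
    pub fail_at: Option<Step>,
    pub recovery_required: bool,
}

impl RecordingMedia {
    pub fn new(fail_at: Option<Step>) -> Self {
        RecordingMedia { log: Vec::new(), fail_at, recovery_required: false }
    }
}

impl Media for RecordingMedia {
    fn run(&mut self, step: Step) -> Result<(), ()> {
        if self.fail_at == Some(step) {
            return Err(());
        }
        match step {
            Step::SetRecoveryRequired => self.recovery_required = true,
            Step::ClearRecoveryRequired => self.recovery_required = false,
            _ => {}
        }
        self.log.push(step);
        Ok(())
    }
}
```

With `RecordingMedia::new(Some(Step::CheckpointHome))`, `commit` returns the failing step and `recovery_required` stays `true`, which models why the next mount would fail closed instead of exposing partially checkpointed metadata.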

Current mutation support targets the deterministic clean ext fixtures used by the filesystem-provider VM matrix, including the ext4 metadata_csum + metadata_csum_seed + dir_index + extents + internal journal positive fixture and a fragmented ext4 extent-tree fixture that forces provider-created files out of inline inode extent records. The ext4 bigalloc fixture is mounted read-write with cluster geometry validation, cluster bitmap accounting, full cluster i_blocks accounting, cluster freeing on unlink, malformed bitmap rejection, and host e2fsck -fn verification. The ext4 meta_bg fixture uses e2fsprogs-created meta block groups with 1 KiB blocks and non-default blocks-per-group; descriptor lookup, inode-table reads, checksum refresh, and metadata reservation all use the same descriptor-placement helper, while an impossible s_first_meta_bg variant fails closed at mount.

The provider dispatch also parses the expanded generic filesystem-provider ABI for rename, truncate, symlink/readlink, hard-link, and metadata-update requests. Ext media now implements regular-file truncate shrink, no-op, zero-length shrink, and sparse growth for accepted legacy indirect, extent, inline-data, quota, journaled, and bigalloc media, with block/cluster freeing, inode-size and i_blocks updates, checksum refresh, quota sync, and full JBD2 commits where journaling is active. Write-past-EOF regular-file operations allocate only the written logical block range, read unmapped holes as zeroes, preserve legacy and extent holes, and convert only touched unwritten extents before writeback. Verity, immutable/append-only, unsupported encrypted, malformed xattr, read-only, and shared-block states remain denied before writeback begins.

Ext media also implements rename for same-directory and cross-directory moves of files, directories, symlinks, and metadata-only special nodes, including compatible overwrite, ".." repair for directory moves, HTree entry updates, metadata checksum refresh, journal writeback, quota sync where quota media is mounted, and fail-closed rejection for cycles or non-empty directory overwrite.

Ext media also surfaces symlink inode types through stat/readlink, supports fast and block-backed symlink targets, creates symlinks, creates and removes hard links for regular files and symlinks with checked link-count updates, and preserves FIFO/socket/device inode metadata during traversal and directory mutation. Device nodes remain metadata only: open and hard-link mutation for special-file entries return DENIED and do not grant device authority.

Ext stat responses expose decoded atime, mtime, ctime, and crtime values with ext extra-epoch and nanosecond fields when the inode size carries them. Metadata updates support controlled mode, uid/gid, atime, mtime, and user-settable filesystem flag changes on accepted writable media. The path preserves high uid/gid bits, generation, project ID, structural inode flags, and unknown inode-tail bytes, refreshes inode and filesystem checksums, routes updates through JBD2 when journaling is active, and denies mutation of current immutable, append-only, imagic, read-only, malformed-xattr, and root-inode states.

Integration's advanced ext corpus maintenance rows keep these metadata mutation checks tied to explicit implemented, rejected, obsolete-rejected, planned, and permanent non-goal feature classes so accepted feature bits cannot silently bypass mount, mutation, unit, VM, or documentation evidence. Ext4 stable-resize media carries stable_inodes together with resize_inode: stable-inode identity and UUID-bound encryption-state mutations are rejected, and allocation/freeing treats inode 7 reservation trees and reserved GDT blocks as metadata even if a bitmap is unsafe.
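As an illustration of the extra-epoch and nanosecond decoding mentioned for stat responses, the sketch below follows the Linux ext4 on-disk convention: the 32-bit base field is a signed seconds value, the low 2 bits of the extra field extend the seconds range, and the upper 30 bits hold nanoseconds. The `Timestamp` type and `decode_timestamp` function are hypothetical names, not the provider's API, and this sketch does not validate that the nanosecond field is below one billion.

```rust
/// Number of low bits in the extra field that extend the seconds value.
pub const EPOCH_BITS: u32 = 2;
pub const EPOCH_MASK: u32 = (1 << EPOCH_BITS) - 1;

/// Decoded timestamp (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct Timestamp {
    pub seconds: i64,
    pub nanoseconds: u32,
}

/// Decode a base/extra timestamp pair. `extra` is `None` when the inode
/// record is too small to carry the extra field (for example 128 bytes).
pub fn decode_timestamp(base: u32, extra: Option<u32>) -> Timestamp {
    // The base field is interpreted as a signed 32-bit seconds count.
    let mut seconds = i64::from(base as i32);
    let mut nanoseconds = 0;
    if let Some(extra) = extra {
        // Low 2 bits: epoch extension, added as multiples of 2^32 seconds.
        seconds += i64::from(extra & EPOCH_MASK) << 32;
        // Upper 30 bits: nanoseconds.
        nanoseconds = extra >> EPOCH_BITS;
    }
    Timestamp { seconds, nanoseconds }
}
```

For example, a base of `0x8000_0000` with no extra field decodes to a pre-1970 time, while the same base with epoch bits `1` decodes to 2^31 seconds after the epoch, which is how timestamps past 2038 are represented.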
The ext2 positive corpus includes a host-generated large-file fixture that crosses the direct-block boundary and exercises legacy indirect allocation, readback, free, and host e2fsck -fn validation. The broader ext2 corpus adds 1 KiB, 2 KiB, and 4 KiB block-size images, 128-byte and 256-byte inode tables, sparse superblocks, non-default blocks-per-group, grown directories, and a 70 MiB sparse file whose tail crosses into triple-indirect mapping while holes still read back as zeroes. The ext2 malformed corpus corrupts block maps, directory records, inode geometry, bitmap metadata, and reserved inode state so mount and mutation paths fail closed. The ext2 compat corpus also mounts media carrying dir_prealloc, imagic_inodes, ext_attr, and resize_inode: allocation skips reserved GDT metadata even if media bitmaps are unsafe, imagic inodes deny provider-originated mutation, and existing external xattr blocks are parsed and preserved for regular writes. Deleting an xattr-bearing inode validates the xattr header, rejects duplicate entry names, decrements shared xattr refcounts, and frees unshared xattr blocks before the inode is cleared. The ext2 xattr corpus carries both a user xattr and a POSIX ACL and is checked with host e2fsck -fn, getfattr, and getfacl. Ext4 xattr handling also accepts inode-body xattrs, metadata-checksummed external xattr blocks, ACL and unknown namespace payload preservation, ea_inode value references, checksum refresh after refcount updates, and deletion cleanup for unshared external xattr blocks. Public list/get/set/remove xattr and POSIX ACL mutation requests are a permanent non-goal for this provider ABI; user, system, trusted, security, and unknown namespaces are never exposed as caller authority and are only parsed for preservation, validation, or deletion cleanup. 
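To make the direct/indirect boundaries concrete, the following sketch classifies a logical block number by the legacy block-map level that would hold it. It assumes the standard ext2 layout of 12 direct pointers and 4-byte block pointers; `MapLevel` and `map_level` are hypothetical names used only here.

```rust
/// Which legacy block-map level holds a logical block (sketch type).
#[derive(Debug, PartialEq, Eq)]
pub enum MapLevel {
    Direct,
    Single,
    Double,
    Triple,
}

/// Classify `logical_block` for a filesystem with `block_size`-byte blocks.
/// Returns `None` when the block lies beyond triple-indirect reach.
pub fn map_level(logical_block: u64, block_size: u32) -> Option<MapLevel> {
    // Each indirect block holds block_size / 4 little-endian u32 pointers.
    let ptrs = u64::from(block_size / 4);
    let direct_end = 12;
    let single_end = direct_end + ptrs;
    let double_end = single_end + ptrs * ptrs;
    let triple_end = double_end + ptrs * ptrs * ptrs;

    if logical_block < direct_end {
        Some(MapLevel::Direct)
    } else if logical_block < single_end {
        Some(MapLevel::Single)
    } else if logical_block < double_end {
        Some(MapLevel::Double)
    } else if logical_block < triple_end {
        Some(MapLevel::Triple)
    } else {
        None
    }
}
```

With 1 KiB blocks there are 256 pointers per indirect block, so triple-indirect mapping starts at logical block 12 + 256 + 65536 = 65804, roughly 64.3 MiB into the file. The last block of a 70 MiB file is logical block 71679, which is consistent with the fixture's tail crossing into triple-indirect mapping on a 1 KiB-block image.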
Ext4 quota/project-quota media is accepted after mount-time comparison of user, group, and project quota files against live inode usage; provider mutations resync quota usage after create, write growth, unlink, and rmdir, and new files/directories inherit project IDs from PROJINHERIT parents.

The integration Linux/e2fsprogs interoperability matrix links ext2/ext3/ext4 host-generated images to the feature bits they prove, malformed companions, host tools, and provider-mutation VM scenarios. The targeted 512-byte-inode VM media covers ext2, ext3, and ext4 within the same 4 KiB block envelope. The ext4 geometry fixture combines descriptor and bitmap checksum handling, inode-table reads, internal journaling, extents, HTree directories, inode-body xattrs, and inline-data payloads under the larger inode record. Oversized-geometry VM media now proves that ext2/ext3 block sizes beyond the 4 KiB transfer envelope and ext4 1024-byte inode records fail closed before a VFS mount is exposed.

Ext4 inline_data media is accepted for regular files and directories: provider reads and writes inline inode bodies, preserves system.data and other xattrs, creates small files/directories inline when the feature is present, converts them to block-backed storage on growth, and rejects malformed inline payload sizes fail-closed.

Ext4 orphan recovery runs before writable mounts are exposed: legacy orphan inode chains and orphan_file records are validated, zero-link orphaned inodes are freed through the normal bitmap/xattr/checksum cleanup path, linked truncate orphans have their orphan pointers cleared, orphan_present is cleared after successful cleanup, and malformed orphan chains or orphan-file checksums fail closed.

Ext4 encrypt media is accepted for fscrypt v2 AES-256-XTS file contents and AES-256-CTS filename transforms. Encrypted inodes must carry a valid ext encryption xattr, the referenced key identifier must resolve through the provider-local keyd endpoint, and mismatched or missing key material fails closed before data or names are exposed. fscrypt v1 contexts are parsed for compatibility detection but are an explicit non-goal for data access under the current keyd material-id ABI; v1 media fails closed before key lookup or decryption. Other Linux fscrypt mode or policy variants, including AES-128-CBC/CTS, Adiantum, AES-256-HCTR2, direct-key, IV_INO_LBLK, and non-default data-unit-size policies, are likewise explicit fail-closed states.

Ext4 verity media is accepted for read-only fsverity files using the Linux ext4 post-EOF metadata layout. Verity inodes carry a 256-byte Linux v1 descriptor after the Merkle tree, store the descriptor-size footer in the last allocated filesystem block, resolve their SHA-256 root through the provider-local keyd trust-root operation, verify salt-aware Merkle tree blocks before reads are served, and deny every write to verity files. Missing roots, mismatched roots, malformed descriptors, unsupported hash algorithms, malformed built-in signature blobs, syntactically valid PKCS#7 built-in signatures without an explicit signature trust policy, tampered Merkle blocks, and tampered file data fail closed without ambient trust authority.

Ext4 readonly and shared_blocks ro-compat media is accepted only when the caller requests a read-only mount. On read-only mounts e2fsd exposes traversal operations (open without write intent, read, stat, and readdir) and denies create, mkdir, write, unlink, rmdir, and write-intent open handles before any media mutation can start. The shared-block fixture exercises host-marker and depth-2 HTree traversal without adding mutation authority.

Ext4 casefold media is accepted for UTF-8 encoding ID 1 (utf8-12.1) with encoding flags zero. Casefolded lookup, duplicate detection, HTree hash routing, readdir validation, create, unlink, and rmdir use the Unicode default casefold tables from lib-fs-name; the documented table version is UNICODE_CASEFOLD_VERSION (17.0.0). Unsupported encodings, unsupported casefold/hash combinations, and malformed UTF-8 directory-entry names fail closed before names are exposed or mutated. fscrypt v2 encrypted+casefold directories support read-only lookup and readdir by decrypting names through the private keyd authority and hashing folded plaintext where HTree routing is present; directory mutation under encrypted parents remains denied.

Ext4 MMP media is accepted with a conservative phase-4 policy: read-only mounts validate the MMP block and never write it, while writable mounts require a clean sequence, write a deterministic erix-e2fsd claim, verify the checksum and reread state, and reject active, fsck, stale-inconsistent, malformed, or checksum-bad blocks. Stale owner takeover is not attempted because this phase does not grant e2fsd ambient timer authority.

Mount-time replay is implemented for the supported JBD2 dialects: clean descriptor/data/revoke/commit streams can be replayed from internal journals or from an external journal device mapped in BootConfig. Standalone journal_dev volumes are not exposed as filesystem providers. The scanner validates journal UUID binding, journal geometry, legacy transaction checksum records, checksum-v1-compatible records, CRC32C checksum-v2/v3 descriptor tails, data-block tags, commit records, and journal superblocks, honors revoke records, handles sequence rollover and partially checkpointed transactions, walks transactions until the journal ring reaches a clean end instead of imposing a fixed transaction-count cap, checkpoints committed metadata, marks the journal clean, and clears ext recovery only after the checkpoint succeeds.
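Sequence rollover during replay can be handled with wrapping arithmetic on 32-bit transaction IDs, in the style Linux JBD2 uses for its transaction ID comparisons. The sketch below is an assumption about how such a comparison could look; `tid_after` and `next_tid` are hypothetical helper names.

```rust
/// True when transaction ID `a` is logically later than `b`, treating the
/// 32-bit sequence space as circular so rollover past u32::MAX still orders
/// correctly for IDs less than 2^31 apart.
pub fn tid_after(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) > 0
}

/// The sequence number expected after `tid`, wrapping at u32::MAX.
pub fn next_tid(tid: u32) -> u32 {
    tid.wrapping_add(1)
}
```

A replay scanner built on this idea would accept a block only when its sequence equals `next_tid` of the last committed transaction, so a journal whose sequence wraps from `0xFFFF_FFFF` to `0` still replays in order.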
Provider-originated writeback also splits large metadata sets across multiple descriptor/data/commit transactions and leaves recovery-required set on journal-capacity failure. JBD2_FEATURE_INCOMPAT_FAST_COMMIT media reserves the fast-commit tail from the full-commit journal ring, replays any pending full commits first, then accepts zeroed tail blocks and HEAD/PAD/TAIL records with feature bits zero, the expected checkpoint transaction ID, and valid CRC-32C tails. Supported host-originated mutating TLVs (ADD_RANGE, DEL_RANGE, CREAT, LINK, UNLINK, and INODE) can span multiple tail-delimited segments and replay through staged writeback. Dentry replay uses the normal checked directory mutation path for HTree parents, handles large linear directories one block at a time, preserves checksum tails, and denies encrypted parent directories without obtaining fscrypt key authority. Unsupported features, unknown tags, bad tails/checksums, malformed ranges, and inconsistent inode or dentry state fail closed without checkpointing partial replay. Provider-originated writes intentionally keep using full JBD2 commits and home checkpoints instead of emitting new fast-commit deltas.
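The CRC-32C checks referenced for descriptor tails and fast-commit tails rest on the Castagnoli polynomial. The bitwise sketch below shows the standard reflected CRC-32C; it is not the lib-crc implementation, and the function names are hypothetical. Note that ext4 and JBD2 checksums are typically computed by seeding the running CRC (for example with a UUID-derived value) and chaining fields, so the seeding and final-inversion conventions of a real checksum dialect may differ from the plain `crc32c` wrapper here.

```rust
/// Reflected Castagnoli polynomial used by CRC-32C.
const CRC32C_POLY_REFLECTED: u32 = 0x82F6_3B78;

/// Fold `data` into a running CRC state, one bit at a time.
/// No initial or final inversion is applied here, so calls can be chained.
pub fn crc32c_update(mut crc: u32, data: &[u8]) -> u32 {
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    crc
}

/// Standard CRC-32C: initial value all ones, final bitwise inversion.
pub fn crc32c(data: &[u8]) -> u32 {
    !crc32c_update(!0, data)
}
```

The standard check input `b"123456789"` yields `0xE306_9283` with this definition. A table-driven or hardware-assisted variant would be faster; the bitwise form is shown only because it is short and easy to audit.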

Metadata checksum writeback is implemented for the superblock, group descriptors, block/inode bitmaps, inodes, directory checksum tails, and HTree root/interior/leaf blocks that the current media path mutates. HTree indexed directories support bounded recursive lookup, readdir, insertion, removal, leaf splitting, parent split propagation, cycle rejection, depth-1 operation on ordinary indexed-directory media, and large_dir-gated depth-2 operation; casefolded HTree directories hash and route folded lookup names. The deterministic ext4 large-dir fixture carries a depth-2 HTree, while non-large_dir depth-2 metadata fails closed. The feature registry documents which current Linux/e2fsprogs ext4 feature bits are accepted, read-only-only, planned, obsolete-rejected, or unknown-rejected. Obsolete bits such as compression, dirdata, btree_dir, has_snapshot, replica, lazy_bg, exclude_inode, and exclude_bitmap are rejected with unit and VM negative coverage; undefined gap and high future bits are rejected as unknown, and planned advanced features continue to fail closed until their media semantics are implemented. Malformed/cyclic HTree layouts, unknown JBD2 feature bits, unsupported fast-commit records, malformed checksum dialects, unknown ext feature bits, and unsafe media continue to fail closed by policy.
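The feature registry idea can be sketched as a table that maps each feature bit to a class and a mount check that fails closed on anything not explicitly accepted. The table below is a deliberately tiny, illustrative subset of ro-compat bits using the Linux ext4 flag values; the names `FeatureClass`, `classify`, and `check_ro_compat` are hypothetical, and the real registry covers every compat, incompat, and ro-compat bit.

```rust
/// Classes named in the text for each feature bit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FeatureClass {
    Accepted,
    ReadOnlyOnly,
    Planned,
    ObsoleteRejected,
    UnknownRejected,
}

/// Illustrative subset of ro-compat bits (not the full registry).
const RO_COMPAT_REGISTRY: &[(u32, FeatureClass)] = &[
    (0x0004, FeatureClass::ObsoleteRejected), // btree_dir
    (0x0080, FeatureClass::ObsoleteRejected), // has_snapshot
    (0x0400, FeatureClass::Accepted),         // metadata_csum
    (0x1000, FeatureClass::ReadOnlyOnly),     // readonly
    (0x4000, FeatureClass::ReadOnlyOnly),     // shared_blocks
];

/// Look up one bit; anything absent from the table is unknown.
pub fn classify(bit: u32) -> FeatureClass {
    RO_COMPAT_REGISTRY
        .iter()
        .find(|(known, _)| *known == bit)
        .map(|(_, class)| *class)
        .unwrap_or(FeatureClass::UnknownRejected)
}

/// Check every set bit. Returns the first offending bit and its class.
pub fn check_ro_compat(flags: u32, read_write: bool) -> Result<(), (u32, FeatureClass)> {
    for shift in 0..32 {
        let bit = 1u32 << shift;
        if flags & bit == 0 {
            continue;
        }
        match classify(bit) {
            FeatureClass::Accepted => {}
            // Read-only-only features pass only on read-only mounts.
            FeatureClass::ReadOnlyOnly if !read_write => {}
            class => return Err((bit, class)),
        }
    }
    Ok(())
}
```

Because the default for an unlisted bit is `UnknownRejected`, adding a new feature requires an explicit registry entry, which matches the stated policy that undefined gap bits and future bits are rejected until their semantics are implemented.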

Validation

cargo fmt --all -- --check
strict cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets --all-features
integration image builds through the Filesystem provider work fixture path

Governance Principles

e2fsd governance is scoped to the private ext-family filesystem service behind vfsd.

The scoped governance rules are:

It serves ext media only through the generic private provider ABI.
It must fail closed on malformed metadata, unsupported feature combinations, missing keys, or journal/trust failures.
It performs media mutation only through the provider-local blockd endpoint, explicit journal mappings, and per-policy key material obtained from provider-local keyd.
It preserves ext xattrs and POSIX ACLs as media metadata only; no public xattr or ACL mutation endpoint is part of the provider contract.
It never exposes a public named service endpoint or peer-provider authority.

Authority Boundaries

e2fsd may hold only its provider endpoint, provider-local blockd, provider-local keyd, and authorized external-journal mappings.
Key and trust material must come from keyd; no filesystem key material is ambient or residual.

Contact

Development occurs in the EriX organization, and discussions happen in issues and design documents.

No decisions are considered valid without documented rationale.

Maintainers can be reached via email: admin@erikinkinen.fi.