e2fsd
e2fsd is the private ext-family filesystem provider daemon behind vfsd. It
is started by rootd after keyd and before fatd, and it is reachable only
through the provider endpoint delegated to vfsd.
EriX is a clean-room, capability-based microkernel operating system written entirely in Rust.
Technical requirements are tracked in the EriX requirements, conventions, and project documentation.
See:
- docs for design documents, specifications, and development plans.
- Related architecture repositories for kernel, services, libraries, drivers, and integration tooling.
Purpose of This Repository
This repository implements the EriX ext-family filesystem provider. Its purpose
in EriX is to serve validated ext2/ext3/ext4 media behind vfsd without
exposing a public service.
Functionally, it parses ext media, validates features and journals, and implements provider file/directory operations. The repository keeps the implementation, interface contracts, tests, and documentation for that behavior in one reviewable ownership boundary.
The maintained responsibilities are:
- serve ext-family filesystems only through the private filesystem-provider ABI
- validate ext media, journals, checksums, names, and authority before exposing mounts
- implement persistent file and directory operations through the assigned blockd endpoint
- keep provider authority private with no named entry or public client endpoint
Clean-Room Policy
EriX follows a strict clean-room philosophy:
- No external source code may be copied.
- No external Rust crates are allowed.
- No code generation tools that embed third-party code.
- All code must be authored within the project.
Violations will result in rejection of the contribution.
License
All EriX repositories are licensed under the ISC License.
Development Model
EriX development is modular, deterministic, reproducible, authority-explicit, security-first, and self-hosting oriented.
This repository follows the project roadmap and the validation rules documented in its own roadmap.
Current Status
The provider service, startup validation, generic ABI dispatch, CI, and
block-backed media path are present. Mount reads device metadata and the ext
superblock through the private blockd endpoint. It validates core superblock
and group/inode geometry, runs every Linux/e2fsprogs ext4 superblock feature
bit through an explicit feature registry, and verifies supported
metadata_csum / gdt_csum metadata using the shared lib-crc CRC-32C primitive.
It validates internal journals or explicit provider-local external journal
mappings, including external JBD2 UUID, superblock, block-size, feature-bit,
and sequence-state checks. Mount rejects obsolete, planned, or unknown ext
feature flags; standalone journal_dev volumes offered as filesystem providers;
unknown journal feature layouts; read-write mounts of read-only media or
read-only-only ext4 feature media (readonly, shared_blocks); and mutation
handles after a read-only mount. The exact supported ext geometry envelope is 1 KiB, 2 KiB, and
4 KiB filesystem blocks, matching the current 4 KiB lib-block provider
transfer ceiling. Larger ext block sizes are rejected deliberately as a platform
ABI limit. Inode records are accepted at 128, 256, or 512 bytes when the record
size is a power of two and does not exceed the filesystem block size; larger or
irregular records fail closed. Media reads, metadata checksum updates,
inline-data handling, xattr handling, quota/orphan cleanup, and mutation
writeback preserve unknown bytes in the accepted inode tail.
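The geometry envelope above can be sketched as a pair of checks. This is a minimal illustration, not the provider's code: the function and error names are hypothetical, and the only on-disk fact assumed is the standard ext encoding in which the block size is 1024 shifted left by the superblock's s_log_block_size.

```rust
/// Outcomes of the geometry checks. Names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum GeometryError {
    /// s_log_block_size decodes to more than 4 KiB.
    UnsupportedBlockSizeLog(u32),
    /// Inode record size is not 128, 256, or 512 bytes.
    UnsupportedInodeSize(u16),
    /// Inode record would not fit in one filesystem block.
    InodeLargerThanBlock { inode_size: u16, block_size: u32 },
}

/// Decode s_log_block_size (block size = 1024 << log) and accept only
/// 1 KiB, 2 KiB, and 4 KiB blocks.
pub fn checked_block_size(log_block_size: u32) -> Result<u32, GeometryError> {
    // Reject before shifting so a hostile value cannot overflow the shift.
    if log_block_size > 2 {
        return Err(GeometryError::UnsupportedBlockSizeLog(log_block_size));
    }
    Ok(1024u32 << log_block_size)
}

/// Accept 128-, 256-, or 512-byte inode records that fit in a block.
pub fn checked_inode_size(inode_size: u16, block_size: u32) -> Result<u16, GeometryError> {
    // All three accepted sizes are powers of two, so listing them also
    // covers the power-of-two requirement.
    if !matches!(inode_size, 128 | 256 | 512) {
        return Err(GeometryError::UnsupportedInodeSize(inode_size));
    }
    if u32::from(inode_size) > block_size {
        return Err(GeometryError::InodeLargerThanBlock { inode_size, block_size });
    }
    Ok(inode_size)
}
```

Both functions return an error instead of clamping, which mirrors the fail-closed behaviour described for larger or irregular geometry.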
The media path stores only mount records and open handles in memory. Directory
lookup scans media directories, inode metadata is read from group descriptor
inode tables, file data follows extents or legacy direct/single/double/triple
indirect block maps, and writes allocate/free blocks and inodes through ext
bitmaps. New and grown files use inline extent records for small extent sets and
spill into indexed extent tree leaf blocks when fragmentation exceeds the inode
record capacity; clean ext2 and non-extent ext3 media use legacy direct and
indirect block pointers instead. The positive Filesystem provider work VM
scenario reads host-created ext media, creates a new marker through vfsd,
reopens/stat/readdir-checks it, leaves it on disk, and verifies the partition
with host e2fsck -fn after shutdown.
Journaled ext media now routes metadata mutations through a JBD2 transaction on the internal journal inode or an explicitly authorized external journal device. The provider writes file data first, sets the ext recovery-required incompatibility bit, emits descriptor/data/commit records, checkpoints the staged metadata blocks to their home locations, refreshes checksums, and clears recovery-required only after the checkpoint succeeds. If the journal or checkpoint step fails after the recovery bit is set, the bit is left set so the next mount fails closed instead of exposing potentially partial metadata. Provider-originated large mutations are split into bounded full-commit transactions using descriptor/ring capacity accounting, and large file writes publish data blocks before the final size metadata commit.
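The ordering in the paragraph above can be illustrated with a small model. The sketch below is an assumption-laden stand-in: the Step enum, the JournalMedia trait, and MockMedia are hypothetical names invented for this example, and no real journal records are produced. It only demonstrates that a failure at any step leaves the recovery-required flag set.

```rust
/// Steps of one journaled metadata mutation, in the order described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    WriteFileData,
    SetRecoveryBit,
    EmitJournalRecords,
    Checkpoint,
    RefreshChecksums,
    ClearRecoveryBit,
}

/// Hypothetical media interface: each step either succeeds or fails.
pub trait JournalMedia {
    fn run(&mut self, step: Step) -> Result<(), ()>;
}

/// Run the steps in order and stop at the first failure. Because
/// ClearRecoveryBit is last, it is never reached after an earlier error.
pub fn commit_metadata<M: JournalMedia>(media: &mut M) -> Result<(), Step> {
    const ORDER: [Step; 6] = [
        Step::WriteFileData,
        Step::SetRecoveryBit,
        Step::EmitJournalRecords,
        Step::Checkpoint,
        Step::RefreshChecksums,
        Step::ClearRecoveryBit,
    ];
    for step in ORDER {
        media.run(step).map_err(|_| step)?;
    }
    Ok(())
}

/// Test double that fails at a chosen step and tracks the recovery flag.
pub struct MockMedia {
    pub fail_at: Option<Step>,
    pub recovery_required: bool,
    pub log: Vec<Step>,
}

impl JournalMedia for MockMedia {
    fn run(&mut self, step: Step) -> Result<(), ()> {
        if self.fail_at == Some(step) {
            return Err(());
        }
        match step {
            Step::SetRecoveryBit => self.recovery_required = true,
            Step::ClearRecoveryBit => self.recovery_required = false,
            _ => {}
        }
        self.log.push(step);
        Ok(())
    }
}
```

With fail_at set to Step::Checkpoint, commit_metadata returns that step as the error and recovery_required stays true, which is the state that makes the next mount fail closed.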
Current mutation support targets the deterministic clean ext fixtures used by
the filesystem-provider VM matrix, including the ext4 metadata_csum + metadata_csum_seed + dir_index + extents + internal journal positive fixture
and a fragmented ext4 extent-tree fixture that forces provider-created files out
of inline inode extent records. The ext4 bigalloc fixture is mounted read-write
with cluster geometry validation, cluster bitmap accounting, full cluster
i_blocks accounting, cluster freeing on unlink, malformed bitmap rejection,
and host e2fsck -fn verification. The ext4 meta_bg fixture uses
e2fsprogs-created meta block groups with 1 KiB blocks and non-default
blocks-per-group; descriptor lookup, inode-table reads, checksum refresh, and
metadata reservation all use the same descriptor-placement helper, while an
impossible s_first_meta_bg variant fails closed at mount. The provider
dispatch also parses the expanded generic filesystem-provider ABI for rename,
truncate, symlink/readlink, hard-link, and metadata-update requests. Ext media
now implements regular-file truncate shrink, no-op, zero-length shrink, and
sparse growth for accepted legacy indirect, extent, inline-data, quota,
journaled, and bigalloc media, with block/cluster freeing, inode-size and
i_blocks updates, checksum refresh, quota sync, and full JBD2 commits where
journaling is active. Write-past-EOF regular-file operations allocate only the
written logical block range, read unmapped holes as zeroes, preserve legacy and
extent holes, and convert only touched unwritten extents before writeback.
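To make the write-past-EOF behaviour concrete, the following sketch computes which logical blocks a write touches and models a sparse file in memory. SparseFile and touched_blocks are hypothetical names for illustration; a real provider maps blocks through extents or indirect pointers rather than a BTreeMap.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Logical blocks touched by a write of `len` bytes at `offset`.
/// Returns None for empty writes, a zero block size, or offset overflow.
pub fn touched_blocks(offset: u64, len: u64, block_size: u64) -> Option<RangeInclusive<u64>> {
    if len == 0 || block_size == 0 {
        return None;
    }
    let last_byte = offset.checked_add(len - 1)?;
    Some((offset / block_size)..=(last_byte / block_size))
}

/// In-memory stand-in for a sparse regular file: only written blocks exist.
pub struct SparseFile {
    block_size: u64,
    blocks: BTreeMap<u64, Vec<u8>>,
}

impl SparseFile {
    pub fn new(block_size: u64) -> Self {
        SparseFile { block_size, blocks: BTreeMap::new() }
    }

    /// Write `data` at `offset`, allocating only the touched logical blocks.
    /// Returns false and changes nothing if the range is empty or overflows.
    pub fn write(&mut self, offset: u64, data: &[u8]) -> bool {
        if touched_blocks(offset, data.len() as u64, self.block_size).is_none() {
            return false;
        }
        let size = self.block_size;
        for (i, byte) in data.iter().enumerate() {
            let pos = offset + i as u64;
            let block = self
                .blocks
                .entry(pos / size)
                .or_insert_with(|| vec![0u8; size as usize]);
            block[(pos % size) as usize] = *byte;
        }
        true
    }

    /// Read one logical block; unmapped holes read back as zeroes.
    pub fn read_block(&self, index: u64) -> Vec<u8> {
        match self.blocks.get(&index) {
            Some(data) => data.clone(),
            None => vec![0u8; self.block_size as usize],
        }
    }

    /// Number of logical blocks that actually hold data.
    pub fn allocated_blocks(&self) -> usize {
        self.blocks.len()
    }
}
```

Writing two bytes at offset 5000 with 1 KiB blocks allocates only logical block 4; blocks 0 through 3 stay unallocated and still read as zeroes.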
Verity, immutable/append-only, unsupported encrypted, malformed xattr,
read-only, and shared-block states remain denied before writeback begins. Ext
media also implements rename for same-directory and cross-directory moves of
files, directories, symlinks, and metadata-only special nodes, including
compatible overwrite, .. repair for directory moves, HTree entry updates,
metadata checksum refresh, journal writeback, quota sync where quota media is
mounted, and fail-closed rejection for cycles or non-empty directory overwrite.
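One way to picture the cycle rejection for directory moves is to walk the ".." chain from the destination parent up to the root and refuse the move if the moved directory appears on the way. The sketch below assumes a hypothetical in-memory map from directory inode to parent inode; the names are invented for the example, and the only ext fact used is that inode 2 is the root directory.

```rust
use std::collections::HashMap;

/// Root directory inode number in ext filesystems.
const ROOT_INO: u32 = 2;

#[derive(Debug, PartialEq, Eq)]
pub enum MoveCheck {
    /// The destination is not inside the moved directory.
    Allowed,
    /// The destination lies inside the moved directory (or is the directory itself).
    Cycle,
    /// The parent chain is broken or loops without reaching the root.
    MalformedParents,
}

/// `parents` maps each non-root directory inode to its ".." inode.
pub fn check_directory_move(
    parents: &HashMap<u32, u32>,
    moved_dir: u32,
    new_parent: u32,
) -> MoveCheck {
    let mut current = new_parent;
    // A valid chain visits at most every mapped directory plus the root,
    // so this bound turns a looping chain into a fail-closed result.
    for _ in 0..=parents.len() {
        if current == moved_dir {
            return MoveCheck::Cycle;
        }
        if current == ROOT_INO {
            return MoveCheck::Allowed;
        }
        match parents.get(&current) {
            Some(&up) => current = up,
            None => return MoveCheck::MalformedParents,
        }
    }
    MoveCheck::MalformedParents
}
```

The bounded walk means corrupt ".." links are treated as a rejection rather than causing an unbounded loop.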
Ext media also surfaces symlink inode types through stat/readlink, supports
fast and block-backed symlink targets, creates symlinks, creates and removes
hard links for regular files and symlinks with checked link-count updates, and
preserves FIFO/socket/device inode metadata during traversal and directory
mutation. Device nodes remain metadata only: open and hard-link mutation for
special-file entries return DENIED and do not grant device authority. Ext
stat responses expose decoded atime, mtime, ctime, and crtime values with ext
extra-epoch and nanosecond fields when the inode size carries them. Metadata
updates support controlled mode, uid/gid, atime, mtime, and user-settable
filesystem flag changes on accepted writable media. The path preserves high
uid/gid bits, generation, project ID, structural inode flags, and unknown
inode-tail bytes, refreshes inode and filesystem checksums, routes updates
through JBD2 when journaling is active, and denies mutation of current
immutable, append-only, imagic, read-only, malformed-xattr, and root-inode
states. Integration's advanced ext corpus maintenance rows keep these metadata
mutation checks tied to explicit implemented, rejected, obsolete-rejected,
planned, and permanent non-goal feature classes so accepted feature bits cannot
silently bypass mount, mutation, unit, VM, or documentation evidence. Ext4
stable-resize media carries stable_inodes together with resize_inode:
stable-inode identity and UUID-bound encryption-state mutations are rejected,
and allocation/freeing treats inode 7 reservation trees and reserved GDT blocks
as metadata even if a bitmap is unsafe. The ext2 positive corpus includes a
host-generated large-file fixture that crosses the direct-block boundary and
exercises legacy indirect allocation, readback, free, and host e2fsck -fn
validation. The broader ext2 corpus adds 1 KiB, 2 KiB, and 4 KiB block-size
images, 128-byte and 256-byte inode tables, sparse superblocks, non-default
blocks-per-group, grown directories, and a 70 MiB sparse file whose tail crosses
into triple-indirect mapping while holes still read back as zeroes. The ext2
malformed corpus corrupts block maps, directory records, inode geometry, bitmap
metadata, and reserved inode state so mount and mutation paths fail closed. The
ext2 compat corpus also mounts media carrying dir_prealloc, imagic_inodes,
ext_attr, and resize_inode: allocation skips reserved GDT metadata even if
media bitmaps are unsafe, imagic inodes deny provider-originated mutation, and
existing external xattr blocks are parsed and preserved for regular writes.
Deleting an xattr-bearing inode validates the xattr header, rejects duplicate
entry names, decrements shared xattr refcounts, and frees unshared xattr blocks
before the inode is cleared. The ext2 xattr corpus carries both a user xattr and
a POSIX ACL and is checked with host e2fsck -fn, getfattr, and getfacl.
Ext4 xattr handling also accepts inode-body xattrs, metadata-checksummed
external xattr blocks, ACL and unknown namespace payload preservation,
ea_inode value references, checksum refresh after refcount updates, and
deletion cleanup for unshared external xattr blocks. Public list/get/set/remove
xattr and POSIX ACL mutation requests are a permanent non-goal for this provider
ABI; user, system, trusted, security, and unknown namespaces are never exposed
as caller authority and are only parsed for preservation, validation, or
deletion cleanup. Ext4 quota/project-quota media is accepted after mount-time
comparison of user, group, and project quota files against live inode usage;
provider mutations resync quota usage after create, write growth, unlink, and
rmdir, and new files/directories inherit project IDs from PROJINHERIT parents.
The integration Linux/e2fsprogs interoperability matrix links ext2/ext3/ext4
host-generated images to the feature bits they prove, malformed companions, host
tools, and provider-mutation VM scenarios. The targeted 512-byte-inode VM media
covers ext2, ext3, and ext4 within the same 4 KiB block envelope. The ext4
geometry fixture combines descriptor and bitmap checksum handling, inode-table
reads, internal journaling, extents, HTree directories, inode-body xattrs, and
inline-data payloads under the larger inode record. Oversized-geometry VM media
now proves that ext2/ext3 block sizes beyond the 4 KiB transfer envelope and
ext4 1024-byte inode records fail closed before a VFS mount is exposed. Ext4
inline_data media is accepted for regular files and directories: provider
reads and writes inline inode bodies, preserves system.data and other xattrs,
creates small files/directories inline when the feature is present, converts
them to block-backed storage on growth, and rejects malformed inline payload
sizes fail-closed. Ext4 orphan recovery runs before writable mounts are exposed:
legacy orphan inode chains and orphan_file records are validated, zero-link
orphaned inodes are freed through the normal bitmap/xattr/checksum cleanup path,
linked truncate orphans have their orphan pointers cleared, orphan_present is
cleared after successful cleanup, and malformed orphan chains or orphan-file
checksums fail closed. Ext4 encrypt media is accepted for fscrypt v2
AES-256-XTS file contents and AES-256-CTS filename transforms. Encrypted inodes
must carry a valid ext encryption xattr, the referenced key identifier must
resolve through the provider-local keyd endpoint, and mismatched or missing
key material fails closed before data or names are exposed. fscrypt v1 contexts
are parsed for compatibility detection but are an explicit non-goal for data
access under the current keyd material-id ABI; v1 media fails closed before
key lookup or decryption. Other Linux fscrypt mode or policy variants, including
AES-128-CBC/CTS, Adiantum, AES-256-HCTR2, direct-key, IV_INO_LBLK, and
non-default data-unit-size policies, are likewise explicit fail-closed states.
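The accept/reject policy for fscrypt contexts can be sketched as a single classification function. The numeric constants below follow the Linux UAPI fscrypt header as I understand it and should be verified against that header; PolicyFields, PolicyDecision, and classify_policy are hypothetical names, and the sketch does not parse an on-disk xattr or perform any key lookup.

```rust
// Values assumed from the Linux UAPI fscrypt header; verify before relying on them.
const CONTEXT_V1: u8 = 1;
const CONTEXT_V2: u8 = 2;
const MODE_AES_256_XTS: u8 = 1;
const MODE_AES_256_CTS: u8 = 4;
/// Low two flag bits select filename padding; any other flag bit selects
/// a policy variant (direct-key, IV_INO_LBLK) that the text marks unsupported.
const FLAGS_PAD_MASK: u8 = 0x03;

/// Already-decoded context fields (hypothetical struct for this sketch).
pub struct PolicyFields {
    pub context_version: u8,
    pub contents_mode: u8,
    pub filenames_mode: u8,
    pub flags: u8,
    pub log2_data_unit_size: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyDecision {
    Accept,
    RejectV1,
    RejectUnknownVersion,
    RejectUnsupportedMode,
    RejectUnsupportedFlags,
    RejectDataUnitSize,
}

pub fn classify_policy(p: &PolicyFields) -> PolicyDecision {
    // v1 is recognised only so it can be rejected before any key lookup.
    if p.context_version == CONTEXT_V1 {
        return PolicyDecision::RejectV1;
    }
    if p.context_version != CONTEXT_V2 {
        return PolicyDecision::RejectUnknownVersion;
    }
    if p.contents_mode != MODE_AES_256_XTS || p.filenames_mode != MODE_AES_256_CTS {
        return PolicyDecision::RejectUnsupportedMode;
    }
    if p.flags & !FLAGS_PAD_MASK != 0 {
        return PolicyDecision::RejectUnsupportedFlags;
    }
    // Zero means the default data unit size (the filesystem block size).
    if p.log2_data_unit_size != 0 {
        return PolicyDecision::RejectDataUnitSize;
    }
    PolicyDecision::Accept
}
```

Every path other than the single accepted combination returns a distinct rejection, so an unrecognised field value can never fall through to Accept.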
Ext4 verity media is accepted for read-only fsverity files using the Linux
ext4 post-EOF metadata layout. Verity inodes carry a 256-byte Linux v1
descriptor after the Merkle tree, store the descriptor-size footer in the last
allocated filesystem block, resolve their SHA-256 root through the
provider-local keyd trust-root operation, verify salt-aware Merkle tree blocks
before reads are served, and deny every write to verity files. Missing roots,
mismatched roots, malformed descriptors, unsupported hash algorithms, malformed
built-in signature blobs, syntactically valid PKCS#7 built-in signatures without
an explicit signature trust policy, tampered Merkle blocks, and tampered file
data fail closed without ambient trust authority. Ext4 readonly and
shared_blocks ro-compat media is accepted only when the caller requests a
read-only mount. On read-only mounts e2fsd exposes traversal operations
(open without write intent, read, stat, and readdir) and denies
create, mkdir, write, unlink, rmdir, and write-intent open handles
before any media mutation can start. The shared-block fixture exercises
host-marker and depth-2 HTree traversal without adding mutation authority. Ext4
casefold media is accepted for UTF-8 encoding ID 1 (utf8-12.1) with encoding
flags zero. Casefolded lookup, duplicate detection, HTree hash routing, readdir
validation, create, unlink, and rmdir use the Unicode default casefold tables
from lib-fs-name; the documented table version is UNICODE_CASEFOLD_VERSION
(17.0.0). Unsupported encodings, unsupported casefold/hash combinations, and
malformed UTF-8 directory-entry names fail closed before names are exposed or
mutated. fscrypt v2 encrypted+casefold directories support read-only lookup and
readdir by decrypting names through the private keyd authority and hashing
folded plaintext where HTree routing is present; directory mutation under
encrypted parents remains denied. Ext4 MMP media is accepted with a conservative
phase-4 policy: read-only mounts validate the MMP block and never write it,
while writable mounts require a clean sequence, write a deterministic
erix-e2fsd claim, verify the checksum and reread state, and reject active,
fsck, stale-inconsistent, malformed, or checksum-bad blocks. Stale owner
takeover is not attempted because this phase does not grant e2fsd ambient
timer authority. Mount-time replay is implemented for the supported JBD2
dialects: clean descriptor/data/revoke/commit streams can be replayed from
internal journals or from an external journal device mapped in BootConfig.
Standalone journal_dev volumes are not exposed as filesystem providers. The
scanner validates journal UUID binding, journal geometry, legacy transaction
checksum records, checksum-v1-compatible records, CRC32C checksum-v2/v3
descriptor tails, data-block tags, commit records, and journal superblocks,
honors revoke records, handles sequence rollover and partially checkpointed
transactions, walks transactions until the journal ring reaches a clean end
instead of imposing a fixed transaction-count cap, checkpoints committed
metadata, marks the journal clean, and clears ext recovery only after the
checkpoint succeeds. Provider-originated writeback also splits large metadata
sets across multiple descriptor/data/commit transactions and leaves
recovery-required set on journal-capacity failure.
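Sequence rollover is the part of replay that is easiest to get wrong, so a small sketch may help. JBD2 transaction IDs are 32-bit and wrap; the comparison below uses the usual wrapping-difference technique. The function names are hypothetical, and the scanner here only counts consecutive commit IDs rather than parsing journal blocks.

```rust
/// True when transaction ID `a` is later than `b`, allowing for 32-bit
/// wraparound. Valid while the two IDs are less than 2^31 apart.
pub fn tid_after(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) > 0
}

/// Count how many commit IDs follow `first_expected` consecutively,
/// stepping across the u32::MAX -> 0 rollover. There is no fixed cap:
/// the walk ends at the first ID that breaks the sequence.
pub fn consecutive_commits(first_expected: u32, commit_ids: &[u32]) -> usize {
    let mut expected = first_expected;
    let mut count = 0;
    for &id in commit_ids {
        if id != expected {
            break;
        }
        count += 1;
        expected = expected.wrapping_add(1);
    }
    count
}
```

For example, a journal whose first expected transaction is u32::MAX - 1 and whose commits carry IDs u32::MAX - 1, u32::MAX, 0, 1, and then 7 yields four replayable transactions; the fifth is treated as the end of the valid stream.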
JBD2_FEATURE_INCOMPAT_FAST_COMMIT media reserves the fast-commit tail from the
full-commit journal ring, replays any pending full commits first, then accepts
zeroed tail blocks and HEAD/PAD/TAIL records with feature bits zero, the
expected checkpoint transaction ID, and valid CRC-32C tails. Supported
host-originated mutating TLVs (ADD_RANGE, DEL_RANGE, CREAT, LINK,
UNLINK, and INODE) can span multiple tail-delimited segments and replay
through staged writeback. Dentry replay uses the normal checked directory
mutation path for HTree parents, handles large linear directories one block at a
time, preserves checksum tails, and denies encrypted parent directories without
obtaining fscrypt key authority. Unsupported features, unknown tags, bad
tails/checksums, malformed ranges, and inconsistent inode or dentry state fail
closed without checkpointing partial replay. Provider-originated writes
intentionally keep using full JBD2 commits and home checkpoints instead of
emitting new fast-commit deltas.
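The fail-closed handling of unknown tags and malformed ranges can be sketched as a tag-length-value walker. The tag numbers and the four-byte little-endian header (16-bit tag, 16-bit length) are taken from my reading of the Linux ext4 fast-commit header and should be checked against it; FcTag, FcError, and walk_segment are hypothetical names. The sketch does not verify CRC-32C tails and does not cover the zeroed-tail-block case mentioned above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FcTag {
    AddRange,
    DelRange,
    Creat,
    Link,
    Unlink,
    Inode,
    Pad,
    Tail,
    Head,
}

impl FcTag {
    /// Tag numbering assumed from the Linux ext4 fast-commit header.
    fn from_raw(raw: u16) -> Option<Self> {
        Some(match raw {
            1 => FcTag::AddRange,
            2 => FcTag::DelRange,
            3 => FcTag::Creat,
            4 => FcTag::Link,
            5 => FcTag::Unlink,
            6 => FcTag::Inode,
            7 => FcTag::Pad,
            8 => FcTag::Tail,
            9 => FcTag::Head,
            _ => return None,
        })
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum FcError {
    TruncatedHeader,
    UnknownTag(u16),
    ValueOutOfBounds,
    MissingTail,
}

/// Walk one segment and return its records up to and including TAIL.
/// Any malformed record aborts the walk, so nothing is partially replayed.
pub fn walk_segment(buf: &[u8]) -> Result<Vec<(FcTag, &[u8])>, FcError> {
    let mut records = Vec::new();
    let mut pos = 0usize;
    while pos < buf.len() {
        let header = buf.get(pos..pos + 4).ok_or(FcError::TruncatedHeader)?;
        let raw_tag = u16::from_le_bytes([header[0], header[1]]);
        let len = u16::from_le_bytes([header[2], header[3]]) as usize;
        let tag = FcTag::from_raw(raw_tag).ok_or(FcError::UnknownTag(raw_tag))?;
        let start = pos + 4;
        let value = buf.get(start..start + len).ok_or(FcError::ValueOutOfBounds)?;
        records.push((tag, value));
        if tag == FcTag::Tail {
            return Ok(records);
        }
        pos = start + len;
    }
    Err(FcError::MissingTail)
}
```

Collecting the whole segment before acting on any record is one simple way to get the property that a bad tag or length leaves the media untouched.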
Metadata checksum writeback is implemented for the superblock, group
descriptors, block/inode bitmaps, inodes, directory checksum tails, and HTree
root/interior/leaf blocks that the current media path mutates. HTree indexed
directories support bounded recursive lookup, readdir, insertion, removal, leaf
splitting, parent split propagation, cycle rejection, depth-1 operation on
ordinary indexed-directory media, and large_dir-gated depth-2 operation;
casefolded HTree directories hash and route folded lookup names. The
deterministic ext4 large-dir fixture carries a depth-2 HTree, while
non-large_dir depth-2 metadata fails closed. The feature registry documents
which current Linux/e2fsprogs ext4 feature bits are accepted, read-only-only,
planned, obsolete-rejected, or unknown-rejected. Obsolete bits such as
compression, dirdata, btree_dir, has_snapshot, replica, lazy_bg,
exclude_inode, and exclude_bitmap are rejected with unit and VM negative
coverage; undefined gap and high future bits are rejected as unknown, and
planned advanced features continue to fail closed until their media semantics
are implemented. Malformed/cyclic HTree layouts, unknown JBD2 feature bits,
unsupported fast-commit records, malformed checksum dialects, unknown ext
feature bits, and unsafe media continue to fail closed by policy.
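A feature registry of the kind described above can be modelled as a table that assigns every known bit a class and treats every unlisted bit as unknown. The sketch below is illustrative only: the type and function names are hypothetical, the tables list just a few bits, and the bit values are the Linux ext4 incompat and ro_compat flag values as I recall them. The class assignments follow the text (extents, meta_bg, and inline_data accepted; compression and dirdata obsolete; readonly and shared_blocks read-only-only). No bit is marked Planned here because the text does not name one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FeatureClass {
    Accepted,
    ReadOnlyOnly,
    Planned,
    ObsoleteRejected,
}

pub struct FeatureEntry {
    pub bit: u32,
    pub name: &'static str,
    pub class: FeatureClass,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MountDecision {
    Allow,
    Reject { bit: u32, reason: &'static str },
}

/// Partial incompat table (illustrative subset).
pub const INCOMPAT: &[FeatureEntry] = &[
    FeatureEntry { bit: 0x0001, name: "compression", class: FeatureClass::ObsoleteRejected },
    FeatureEntry { bit: 0x0002, name: "filetype", class: FeatureClass::Accepted },
    FeatureEntry { bit: 0x0010, name: "meta_bg", class: FeatureClass::Accepted },
    FeatureEntry { bit: 0x0040, name: "extents", class: FeatureClass::Accepted },
    FeatureEntry { bit: 0x1000, name: "dirdata", class: FeatureClass::ObsoleteRejected },
    FeatureEntry { bit: 0x8000, name: "inline_data", class: FeatureClass::Accepted },
];

/// Partial ro_compat table (illustrative subset).
pub const RO_COMPAT: &[FeatureEntry] = &[
    FeatureEntry { bit: 0x1000, name: "readonly", class: FeatureClass::ReadOnlyOnly },
    FeatureEntry { bit: 0x4000, name: "shared_blocks", class: FeatureClass::ReadOnlyOnly },
];

/// Check every set bit in `word`; the first non-acceptable bit rejects the mount.
pub fn evaluate(word: u32, table: &[FeatureEntry], read_write: bool) -> MountDecision {
    for shift in 0..32 {
        let bit = 1u32 << shift;
        if word & bit == 0 {
            continue;
        }
        let class = table.iter().find(|e| e.bit == bit).map(|e| e.class);
        let reason = match class {
            Some(FeatureClass::Accepted) => continue,
            Some(FeatureClass::ReadOnlyOnly) if !read_write => continue,
            Some(FeatureClass::ReadOnlyOnly) => "read-only-only feature on read-write mount",
            Some(FeatureClass::Planned) => "planned feature not yet implemented",
            Some(FeatureClass::ObsoleteRejected) => "obsolete feature",
            None => "unknown feature bit",
        };
        return MountDecision::Reject { bit, reason };
    }
    MountDecision::Allow
}
```

Because the lookup defaults to rejection, adding a new bit to the on-disk format cannot widen what the provider accepts until someone adds an explicit table entry.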
Validation
- cargo fmt --all -- --check
- strict cargo clippy --all-targets --all-features -- -D warnings
- cargo test --all-targets --all-features
- integration image builds through the Filesystem provider work fixture path
Governance Principles
e2fsd governance is scoped to the private ext-family filesystem service behind
vfsd.
The scoped governance rules are:
- It serves ext media only through the generic private provider ABI.
- It must fail closed on malformed metadata, unsupported feature combinations, missing keys, or journal/trust failures.
- It performs media mutation only through the provider-local blockd endpoint, explicit journal mappings, and per-policy key material obtained from provider-local keyd.
- It preserves ext xattrs and POSIX ACLs as media metadata only; no public xattr or ACL mutation endpoint is part of the provider contract.
- It never exposes a public named service endpoint or peer-provider authority.
Authority Boundaries
- e2fsd may hold its provider endpoint, provider-local blockd, provider-local keyd, and authorized external-journal mappings only.
- Key and trust material must come from keyd; no filesystem key material is ambient or residual.
Contact
Development occurs in the EriX organization, and discussions happen in issues and design documents.
No decisions are considered valid without documented rationale.
Maintainers can be reached via email: admin@erikinkinen.fi.