HDFS edit

A highly resilient distributed cluster file system proven at extreme scale. Consists of a single NameNode service (that's responsible for all metadata management, including the filesystem namespace and block management) plus DataNode services that run on all storage nodes (that manage block IO). Supports NameNode high availability, metadata resilience (via a transaction log), data resilience (via block replication or erasure coding), user authentication, extended ACLs, snapshots, quotas, central caching, a REST API, an NFS gateway, rolling upgrades, rack awareness, transparent encryption, NameNode federation (support for multiple independant NameNodes on the same cluster serving different namespaces) and support for heterogeneous storage. Part of the original Hadoop code base, becoming an Apache Hadoop sub-project in July 2009. Currently being updated to run over the new HDDS (Hadoop Distributed Data Storage) layer, moving block management from the NameNode to a new Storage Container Manager to increase scalability.

Technology Information

Other Names	Hadoop Distributed File System
Type	Sub-Project
Parent Project	Apache Hadoop
Last Updated	September 2018

HDFS edit

Technology Information

HDFS Ecosystem

Links

Blog Posts