HDFS edit  

A highly resilient distributed cluster file system proven at extreme scale. Consists of a single NameNode service (that's responsible for all metadata management, including the filesystem namespace and block management) plus DataNode services that run on all storage nodes (that manage block IO). Supports NameNode high availability, metadata resilience (via a transaction log), data resilience (via block replication or erasure coding), user authentication, extended ACLs, snapshots, quotas, central caching, a REST API, an NFS gateway, rolling upgrades, rack awareness, transparent encryption, NameNode federation (support for multiple independant NameNodes on the same cluster serving different namespaces) and support for heterogeneous storage. Part of the original Hadoop code base, becoming an Apache Hadoop sub-project in July 2009. Currently being updated to run over the new HDDS (Hadoop Distributed Data Storage) layer, moving block management from the NameNode to a new Storage Container Manager to increase scalability.

Technology Information

Other NamesHadoop Distributed File System
TypeSub-Project
Parent ProjectApache Hadoop
Last UpdatedSeptember 2018

HDFS Ecosystem

HDFS Ecosystem

Blog Posts