Scale-out file systems have become very important in the era of big data, and, arguably, none has had a bigger impact than the Hadoop Distributed File System (HDFS).
Being a big data file system means being able to scale out across many machines and to perform reliably even on commodity hardware. Not only has HDFS met those requirements for years for users running Hadoop clusters, but it's now also facilitating the adoption of other technologies by acting, for example, as a persistence layer for Spark.
[embed]https://youtu.be/FcESXlAIawo[/embed]
During the Mesosphere HackWeek earlier this year, we worked to strengthen the integration of HDFS and Apache Mesos. Work included a configuration option to run HDFS with Mesos-DNS, and efforts to make HDFS more fault-tolerant by handling failover of JournalNodes and NameNodes.
As a result of this work, you can now benefit from a multiplexed setup in which Hadoop runs more reliably, while also sharing cluster space with other big data frameworks and everyday services.
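To make the failover handling above concrete, the following is a minimal sketch of what an HA-enabled hdfs-site.xml might look like when nodes are resolved through Mesos-DNS. The hostnames (namenode1.hdfs.mesos and so on), ports, and nameservice ID are illustrative assumptions, not the exact values the DCOS package generates:

```xml
<!-- Sketch of an HA hdfs-site.xml resolving nodes via Mesos-DNS.
     Hostnames like namenode1.hdfs.mesos are illustrative assumptions;
     actual names depend on the framework and Mesos-DNS setup. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs.nn1</name>
    <value>namenode1.hdfs.mesos:50071</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs.nn2</name>
    <value>namenode2.hdfs.mesos:50071</value>
  </property>
  <!-- JournalNodes keep a shared edit log so a standby NameNode can take over -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://journalnode1.hdfs.mesos:8485;journalnode2.hdfs.mesos:8485;journalnode3.hdfs.mesos:8485/hdfs</value>
  </property>
  <!-- Clients fail over between active and standby NameNodes automatically -->
  <property>
    <name>dfs.client.failover.proxy.provider.hdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

Because clients address the logical nameservice (hdfs://hdfs/) rather than a specific host, a NameNode failover is transparent to running jobs.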
Mesosphere's Datacenter Operating System (DCOS) lets users install HDFS with a single command
> dcos package install hdfs
and start taking advantage of these features almost immediately.
To learn more about managing data with HDFS on Mesos and the DCOS, check out the following resources: