Making HDFS more reliable on Mesos [Mesosphere Hackweek]

Jul 01, 2015

Michael Hausenblas

D2iQ


Scale-out file systems have become very important in the era of big data, and, arguably, none has had a bigger impact than the Hadoop Distributed File System (HDFS).

Being a big data file system means being able to scale out across many machines and to perform reliably even on commodity hardware. Not only has HDFS met those requirements for years for users running Hadoop clusters, but it is now also facilitating the adoption of other technologies by acting, for example, as a persistence layer for Spark.

[embed]https://youtu.be/FcESXlAIawo[/embed]

During the Mesosphere HackWeek earlier this year, we worked to strengthen the integration of HDFS and Apache Mesos. Work included a configuration option to run HDFS with Mesos-DNS, and efforts to make HDFS more fault-tolerant by handling failover of JournalNodes and NameNodes. As a result of this work, you can now benefit from a multiplexed setup in which Hadoop runs more reliably, while also sharing cluster space with other big data frameworks and everyday services.
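To illustrate why stable Mesos-DNS names help with NameNode failover, here is a minimal sketch that builds HDFS high-availability client properties from Mesos-DNS hostnames. The property keys are standard Hadoop HA settings and the `<task>.<framework>.<domain>` pattern is how Mesos-DNS names tasks, but the task names (`namenode1`, `namenode2`), the framework name `hdfs`, the RPC port 8020, and both helper functions are assumptions made for illustration, not the configuration the HDFS framework actually generates.

```python
def mesos_dns_name(task: str, framework: str, domain: str = "mesos") -> str:
    """Build a Mesos-DNS style hostname: <task>.<framework>.<domain>."""
    return f"{task}.{framework}.{domain}"


def ha_client_properties(nameservice, namenode_tasks, framework="hdfs", rpc_port=8020):
    """Return HDFS HA client properties that address NameNodes by DNS name.

    Task names, framework name, and port are hypothetical defaults.
    """
    # Logical NameNode IDs (nn1, nn2, ...) used inside the nameservice.
    ids = [f"nn{i}" for i in range(1, len(namenode_tasks) + 1)]
    props = {
        # Clients talk to the logical nameservice, not to a single host.
        "fs.defaultFS": f"hdfs://{nameservice}",
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(ids),
    }
    for nn_id, task in zip(ids, namenode_tasks):
        host = mesos_dns_name(task, framework)
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:{rpc_port}"
    return props


props = ha_client_properties("hdfs", ["namenode1", "namenode2"])
for key, value in props.items():
    print(f"{key} = {value}")
```

Because clients refer to NameNodes by name rather than by IP address, a NameNode that is restarted on a different machine can keep the same address in the client configuration, provided the DNS record is updated to point at the new host.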

Mesosphere's Datacenter Operating System (DCOS) allows users to install HDFS in one command.