Fast Data Science Projects with DC/OS Data Science Engine | D2iQ

Aug 21, 2019

Andrew Hatfield




While your Data Science teams deliver great value, they can also be an expensive resource if your organization doesn’t provide what they need to be successful. However, providing the right tools for development, training, and model deployment is a complex and involved process. Each data science project is unique and requires new environments, datasets, libraries, and applications, all of which are time-consuming and resource-intensive to set up.

In addition, there is a very real risk of data loss due to misconfigured access policies and lost laptops. Your Data Scientists are often limited to smaller, less useful snapshots of your data instead of the full data lake. As a result, providing data access that protects proprietary information without unnecessarily impeding productivity is a balancing act.

This leads to significant wait times for your Data Scientists as they, or other teams, build and maintain development environments. Managing version control and compatibility across all the required tools, while also integrating enterprise services such as authentication, authorization, and logging, takes time and requires expertise your Data Scientists are unlikely to have. In addition, when your Data Scientists are limited to testing their models on individual workstations rather than in a GPU-enabled, production-like environment, training takes longer and model accuracy is harder to tune. Transferring data from your corporate data lake to individual workstations and laptops also introduces a significant data security risk.

That’s why today we are pleased to announce the general availability of D2iQ DC/OS Data Science Engine (DSEngine).

Built upon the success and power of D2iQ DC/OS for Fast Data Services, DSEngine delivers a holistic approach to data science environments. It provides full support for Jupyter Notebooks, the leading open source modeling environment for exploratory data science work, integrated with Spark, TensorFlow, R, Scala, and Python tools to accelerate Data Scientist onboarding as well as the development, training, and deployment of models.

Data Scientists get full but controlled access to enterprise data lakes, which further enhances model accuracy while reducing the risk of data loss due to misplaced laptops.

Organizations can now enjoy greater efficiency by removing the burden of defining, building, and maintaining complex environments. Removing productivity barriers to onboarding new team members, and no longer pulling DevOps Engineers and Operations teams away from building cloud native applications, means you can go to market faster at a significantly reduced cost.

Giving Data Scientists full access to enterprise data for modeling, while ensuring compliance with data access policies to mitigate the risk of data loss, gives organizations confidence both in the accuracy of their models and in the protection of their intellectual property.

"The most impressive thing to me about this setup is that I can have a Spark cluster and a Tensorflow cluster at my fingertips all connected to a Jupyter Notebook in minutes. That is how I work (EMR and a single multi-GPU tensorflow machine), and this is simpler than even doing so on AWS, where multi-node tensorflow is not so easy to setup." — Russell Jurney, Principal Consultant, Data Syndrome

DSEngine is generally available today after being developed with key customers in highly demanding environments. See how DSEngine can accelerate your cloud native projects that depend on data science and give you an advantage over your competitors. Schedule a time to speak with one of our specialists for a demonstration of how DSEngine can help you today.
