All Machine Learning Models are Wrong But Some Make More Sense

Building machine learning models has become as easy as the push of a button, especially with a tool like DataRobot. But which model is the right one to choose? In addition to accuracy and calculation speed, interpretability is commonly cited as a deciding factor in whether ML models can be used in real-life applications. In this talk, I will illustrate innovative methods for explaining what models have learned and the reasoning behind the predictions they make.

Combining Watson and Spark For Analysis

The importance of using data effectively has long been recognized, and in that context the Hadoop/Spark technology components have recently been garnering much attention. The terms AI, Watson, and Cognitive Systems have also been in the spotlight. In this talk, we will demonstrate a system combining Watson and Spark as a best-practices example of machine learning and big data in the AI/Cognitive era.

Practical Machine Learning Pipeline using Streaming IoT Sensor Data

Come see real, physical sensors feed data into a working distributed machine learning pipeline. We will demonstrate a large-scale IoT ML pipeline implemented using the state-of-the-art H2O framework on the MapR Converged Data Platform. We will begin the talk with live IoT sensors made by a Tokyo-based startup. Then, we’ll take you step by step through how we built a real production ML pipeline that can make real-time predictions. This talk is intermediate level and accessible to most engineers and data scientists with a minimal understanding of machine learning. We will also publicly release the code and data needed to replicate our demo.

A Data Engineering and Data Science Platform Based On Hadoop/Spark

Using Cloudera Enterprise, it is possible to build and operate an enterprise-grade Hadoop/Spark platform. To make use of big data, what kind of platform is needed, and how do you get the most out of it? From the perspective of data engineering and data science, I will introduce machine learning that uses SQL-on-Hadoop, Spark, and Python.

NLP4L 〜 Using Corpus and Learning-to-Rank For Better Search Results

While recall and precision are two popular performance indicators in information retrieval, search engineers are usually not very aware of them in their daily work. You can, however, find clues for improving the search user experience by re-examining these two indicators. This presentation will show the process of improving information retrieval performance using them. As the means of achieving that improvement, it will include demonstrations of using a corpus (such as a repository of business documents) and learning-to-rank in practice, explaining the process with NLP4L.
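
As a reminder of what the two indicators measure, here is a minimal sketch in Python. It is not taken from NLP4L or from the talk; the function name precision_recall and the document IDs are hypothetical, chosen only for illustration. Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by the search engine (hypothetical data)
    relevant:  document IDs judged relevant for the query (hypothetical data)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were returned

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative example: 4 documents returned, 3 documents are actually relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example two of the four returned documents are relevant, and two of the three relevant documents were found. Returning more documents tends to raise recall while lowering precision, which is the trade-off that ranking improvements aim to ease.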