Reporting on the ML4ALL machine learning conference
The ML4ALL conference is going on in Portland today and tomorrow. I'm here to check it out, see what it's about, and maybe learn a little bit.
DAY 1 (Monday, 4/29)
12:00 - I caught the tail end of the first speaker: Tyrone Poole of OneApp Oregon, a for-profit company which uses data to help Oregonians find housing.
1:00 - A round of lightning talks, 5-minute presentations on a range of topics including adversarial AI, The Smartest Home In The World, ML Ethics, Bias, and ML and Economics.
2:00 - Erika Pelaez ‐ Building a Machine Learning Classifier to Listen to Killer Whales - interesting presentation on using ML models to identify orcas from hydrophone data. Valuable perspective on the real-world challenges of working with data. The original model was trained on data scraped from the web and appeared to perform well (99% accuracy). But when it was applied to newly collected data, performance plummeted and the model required a lot of reworking. Discussion of using convolutional neural networks to analyze audio data: audio can be processed into a spectrogram or other image representation that can then be fed into a CNN. Mention of some similar work being done by Google.
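To make the "audio as an image" idea concrete, here's a minimal NumPy-only sketch of turning a waveform into a log-magnitude spectrogram that a CNN could consume. The frame length (256), hop (128), Hann window, and the synthetic 1 kHz tone standing in for a hydrophone clip are all my own assumptions; the talk didn't go into the exact preprocessing, and the CNN itself isn't shown.

```python
import numpy as np

def spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Log-magnitude spectrogram with shape (frame_len // 2 + 1, n_frames)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one with a Hann window.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    # The real FFT of each frame gives frame_len // 2 + 1 frequency bins.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Log-compress (epsilon avoids log(0)) and transpose so frequency runs along the rows.
    return np.log(magnitudes + 1e-10).T

# Usage: a synthetic 1 kHz tone sampled at 16 kHz stands in for a hydrophone clip.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = spectrogram(tone)
print(spec.shape)  # (129, 124)

# A 2D CNN typically expects (batch, height, width, channels), e.g. the channels-last layout Keras uses by default.
cnn_input = spec[np.newaxis, :, :, np.newaxis]
```

With 16 kHz audio and 256-sample frames each frequency bin is 62.5 Hz wide, so the 1 kHz tone shows up as a bright horizontal line at bin 16; an orca call would instead show up as a curved or pulsed pattern, which is the kind of shape a CNN is good at picking out.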
2:30 - Dr. Catherine Nelson (Concur Labs) ‐ Practical Privacy in Machine Learning Systems - a very relevant discussion of data privacy issues in the area of ML. The speaker's organization (Concur Labs) works in the space of optimizing the customer experience for business travelers, so they work heavily with sensitive personal data and business data. Dr. Nelson gives an overview of some approaches being taken to protect customer and business privacy when using their data in machine learning models, including federated learning, k-anonymity and differential privacy. Presentation of results using the new TensorFlow Privacy (tf-privacy) package. Here's a blog post describing the work.
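Two of the techniques mentioned are easy to illustrate in a few lines. Below is a minimal sketch (not Concur Labs' code) of a k-anonymity check, where every combination of quasi-identifier values must be shared by at least k records, and of the classic Laplace mechanism for a differentially private count. The record fields are hypothetical. Note that TensorFlow Privacy takes a different route from the Laplace example: it applies differential privacy during training (DP-SGD, which clips and noises gradients) rather than adding noise to query results.

```python
from collections import Counter
import numpy as np

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def laplace_count(true_count, epsilon, rng):
    """Laplace mechanism for a count query, which has sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical travel-expense records; field names are illustrative only.
records = [
    {"city": "Portland", "age_band": "30-39", "amount": 120},
    {"city": "Portland", "age_band": "30-39", "amount": 95},
    {"city": "Seattle", "age_band": "40-49", "amount": 210},
    {"city": "Seattle", "age_band": "40-49", "amount": 180},
]
print(is_k_anonymous(records, ["city", "age_band"], k=2))  # True
print(is_k_anonymous(records, ["city", "age_band"], k=3))  # False

# Smaller epsilon means more noise and stronger privacy.
rng = np.random.default_rng(0)
portland_count = sum(r["city"] == "Portland" for r in records)
noisy_count = laplace_count(portland_count, epsilon=0.5, rng=rng)
```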
I could only attend for part of the afternoon but enjoyed the presentations I saw. I'll have time to check out a few more of them tomorrow so I'll report back again.
DAY 2 (Tuesday, 4/30)
2:30 - Zhi Yang - Hierarchical Topic Modeling in Cancer Research - Zhi is using topic modeling, specifically latent Dirichlet allocation (paper), applied to genomic data, with the goal of better understanding mutational processes.
Shiraishi et al. have proposed a topic model for somatic mutations that captures the characteristics and burdens contributed by mutational processes. By closely examining these burdens, we'd like to compare them across different categories, for example time, cancer subtype, ethnicity, smoking history, etc.
Then, we'd like to develop the statistical machinery to infer differences between the mutational profiles across categories and to associate those variations with known exposures. This tool is potentially useful for identifying novel and existing mutational processes and correlating them with risk factors, which can later be used to monitor treatment effects in personalized medicine and targeted therapy.
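Here's a minimal sketch of the topic-model idea: plain LDA fitted by collapsed Gibbs sampling, treating each tumor sample as a "document" and each mutation, coded by a hypothetical mutation-category id (standing in for types like C>A or C>T), as a "word". This is standard LDA, not Shiraishi et al.'s model or the hierarchical extension from the talk; the hyperparameters (alpha, beta, 200 iterations) and the toy data are my own assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of lists of word ids (here, hypothetical mutation-category ids per sample).
    Returns (theta, phi): per-document topic proportions and per-topic word distributions.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))  # topic counts per document
    n_kw = np.zeros((n_topics, n_words))    # word counts per topic
    n_k = np.zeros(n_topics)                # total tokens assigned to each topic
    z = []                                  # current topic assignment of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))  # random initial assignments
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi

# Toy data: three samples dominated by categories 0/1, three by categories 2/3.
docs = [[0, 1, 0, 1, 1, 0] for _ in range(3)] + [[2, 3, 3, 2, 3, 2] for _ in range(3)]
theta, phi = lda_gibbs(docs, n_topics=2, n_words=4)
print(theta.round(2))  # each row: one sample's mix of the two inferred "mutational processes"
```

In this framing, each row of `phi` plays the role of a mutational signature, and comparing rows of `theta` across groups of samples (by subtype, smoking history, etc.) is the kind of comparison the talk described.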
3:29 - I just found out that it's an open bar. Now I feel silly. I ordered a beer and handed the bartender my card and she laughed at me.
3:30 - Karl Weinmeister (Google) - Build, Train, and Serve Your ML Models on Kubernetes with Kubeflow.
"Distributing ML workloads across multiple nodes has become common. To achieve higher and higher levels of accuracy, data scientists are using more data and more complex models than ever before.
Kubeflow is an open-source platform for model building, serving, and training. It is built on industry standard Kubernetes infrastructure and runs in multiple clouds and on-premises."
The talk addresses the problem of deploying ML applications. A common gap in data science is that much of the focus goes to building and tuning models, when in many ways building a model is only the beginning. In the real world, models need to be deployed as applications that can be built and maintained as efficiently as possible. Kubeflow is an ML platform built on Kubernetes that aims to solve this problem, using Docker containers to handle dependencies and versioning. The talk included a hands-on demo of application deployment on GCP. Super-interesting talk about the cutting edge of application deployment in the cloud. Kubeflow is on GitHub.
4:00 - Damon Danieli (DocuSign) - Time-Series Behavioral Analysis for Churn Prediction.
This talk covers the data engineering, feature extraction, and processing for predicting future user behavior. By the end of the presentation, you should have a direction to explore if you are building your own system, as well as some concrete patterns that we found worked for us. We will use real-world examples for B2B SaaS churn prediction, but this talk is equally applicable to predicting any type of outcome that is correlated with user behavior, such as conversion to a paying customer, upsell to additional products, etc. We will present lessons learned from several iterations of productizing a system that can take product-usage telemetry events (from Mixpanel, Amplitude, Heap, homegrown, etc.) and combine them with business objects (Salesforce) and the application database.
This talk focuses more on the data warehousing and data modeling side of the data equation. Damon discusses issues of managing and leveraging business data to make predictions and gain insight, including a valuable discussion of using data modeling to answer key business questions like "which accounts are ripe for upsell?" or "which accounts are most likely to churn?" There's a specific focus on time-series usage data (aka telemetry), the messy data that comes from product instrumentation. The emphasis is on business applications rather than technical machine learning issues, but the talk is very valuable nonetheless.
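No code was shown in the talk, but as a hypothetical sketch of the kind of time-series telemetry feature it described, here's a small function that counts each account's events in the most recent 30-day window and the 30 days before it, then computes a smoothed trend ratio. The (account_id, date) event layout, the 30-day windows, and the add-one smoothing are my own assumptions, not DocuSign's pipeline.

```python
from collections import defaultdict
from datetime import date, timedelta

def usage_trend_features(events, as_of, window_days=30):
    """Per-account activity counts in the last window vs. the window before it.

    events: iterable of (account_id, event_date) pairs from product telemetry.
    Returns {account_id: {"recent": int, "prior": int, "trend": float}}.
    """
    recent_start = as_of - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    counts = defaultdict(lambda: {"recent": 0, "prior": 0})
    for account, day in events:
        if recent_start <= day < as_of:
            counts[account]["recent"] += 1
        elif prior_start <= day < recent_start:
            counts[account]["prior"] += 1
    features = {}
    for account, c in counts.items():
        # Ratio > 1 means usage is growing; near 0 means it has dropped off (a possible churn signal).
        # Add-one smoothing keeps the ratio finite when the prior window is empty.
        features[account] = {**c, "trend": (c["recent"] + 1) / (c["prior"] + 1)}
    return features

# Hypothetical events for two accounts.
as_of = date(2019, 4, 30)
events = [
    ("acme", date(2019, 3, 5)), ("acme", date(2019, 3, 20)), ("acme", date(2019, 4, 10)),
    ("globex", date(2019, 3, 25)), ("globex", date(2019, 4, 2)),
    ("globex", date(2019, 4, 15)), ("globex", date(2019, 4, 28)),
]
feats = usage_trend_features(events, as_of)
print(feats["globex"]["trend"])  # 2.0
```

Accounts with no events in either window don't appear in the output, so in practice you'd join these features against the full account table (the business objects mentioned in the talk) and treat missing accounts as zero usage before feeding anything to a churn model.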
Wrap-up: Attending ML4ALL was informative and enjoyable. The data community is very welcoming and supportive, and that was on show at this event. The organizers made it very accessible cost-wise for individuals to attend. There was a lot of great information on the most current tools and practices, and good opportunities for meeting people and networking. All in all it was definitely worth it for me. Keep your eyes open next year for a discount code (there was one going around that got individuals half off, which brought the price down to just $75).