125 Million Building Footprints

Microsoft just released 125 *million* building footprints as open data. These were generated from satellite imagery. This is huge for OpenStreetMap. https://t.co/gpbevTGFZq

— Waldo Jaquith (@waldojaquith) June 29, 2018

World Cup Doppelgangers #dataviz

When data viz is just as much a piece of art as it is informative https://t.co/sJmFpBxAD5 pic.twitter.com/LYPEyLhFrN

— Katie Marriner (@kemarriner) June 29, 2018

Research

Modelling Raw Audio at Scale

The challenge of realistic music generation: modelling raw audio at scale

- Paper: https://t.co/0NqNXVBa2n
- Listen to piano music samples here: https://t.co/qOOi34Jl3P

— DeepMind (@DeepMindAI) June 28, 2018

DeepMoji

πŸ‘©β€πŸ«πŸ“ƒ This is such a fun paper!#DeepMoji: Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. (@MIT @MediaLab, 2017). Built on a corpus of 1246 million tweets and 64 common emojis!https://t.co/YpGorr3Wr8 pic.twitter.com/PWa0z3rfbV

— πŸ‘©β€πŸ’» DynamicWebPaige @ Cloud City! ☁ (@DynamicWebPaige) June 29, 2018

Evaluating Feature Importance Estimates

Important work from @sarahookr, @doomie, @piekindermans, and @_beenkim on quantifying the extent to which various saliency methods *actually* find relevant portions of images.

Really happy to see more work towards bringing rigor into interpretability! https://t.co/AHvi2N3Ods pic.twitter.com/kSMONsOFB2

— Ari Morcos (@arimorcos) June 29, 2018

sounds relatively similar to permutation importance, but with removal instead of permutation (https://t.co/5fyL81kL1o)

— Sebastian Raschka (@rasbt) June 30, 2018
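To make Raschka's comparison concrete: permutation importance shuffles one feature column at a time and records the resulting drop in score, whereas the paper's approach removes the feature and retrains. A minimal, library-free sketch of the permutation variant, assuming a fitted `predict` callable and list-of-lists data (toy names, not from either paper):

```python
import random

def accuracy(predict, X, y):
    return sum(predict(row) == t for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, seed=0):
    """Score drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the feature/target association
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(predict, X_perm, y))
    return drops

# Toy check: feature 0 fully determines the label, feature 1 is noise.
X = [[i, i % 3] for i in range(50)]
y = [int(i >= 25) for i in range(50)]
predict = lambda row: int(row[0] >= 25)
print(permutation_importance(predict, X, y))
```

Shuffling the informative column hurts accuracy; shuffling the noise column leaves it untouched, so its importance comes out as zero.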

Tutorials / Reviews

What a Smartphone User is Doing

This week's #KernelAwards winner uses t-distributed Stochastic Neighbor Embedding (t-SNE) and a LGBMClassifier to determine what a smartphone user is doing: https://t.co/yL1RUxRr5m pic.twitter.com/EiH0ugS2TX

— Kaggle (@kaggle) June 29, 2018
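The kernel's two ingredients can be sketched in a few lines. This uses scikit-learn's GradientBoostingClassifier as a stand-in for LGBMClassifier, and synthetic two-class data in place of the real smartphone sensor features:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two synthetic "activities" with well-separated sensor readings.
X = np.vstack([rng.normal(0.0, 1.0, (40, 6)),
               rng.normal(3.0, 1.0, (40, 6))])
y = np.array([0] * 40 + [1] * 40)

# t-SNE is for visualization only: it cannot embed unseen samples,
# so the classifier below trains on the raw features instead.
embedding = TSNE(n_components=2, perplexity=10, init="random",
                 random_state=0).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

Keeping the 2-D t-SNE embedding separate from the classifier mirrors the usual split: t-SNE for eyeballing class structure, gradient boosting for the actual predictions.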

Visualize World Cup Pt1 #rstats #dataviz

⚽️ code-through!
"Visualize the #WorldCup with R! Pt 1: Recreating Goals w/ ggsoccer and ggplot2" ✍️ @R_by_Ryo https://t.co/uLy0Qi1WvY #rstats #dataviz pic.twitter.com/TnpcZ8oFcn

— Mara Averick (@dataandme) June 29, 2018

Understanding Latent Style

Black-box recommendations are common in industry. This is a guide on the opposite: how to do real science with that latent space. @erinselene @iPancreas @stitchfix_algo https://t.co/5FNztXepjG pic.twitter.com/8gCqhJfVWZ

— christopher e moody (@chrisemoody) June 29, 2018

Cooking Up Statistics

πŸ’‘ brill idea (cluster ingredients), 🎬 and πŸ“½!
"Cooking Up Statistics: The Science & the Art" πŸ‘©πŸΏβ€πŸ³ @LetishaAudrey https://t.co/DChrwfPUIG #rstats via @RLadiesNYC #RLadies pic.twitter.com/0ixdnoDxM5

— Mara Averick (@dataandme) June 29, 2018

Simple Representations for Learning

Slides for my #MLITRW talk – Simple representations for learning: factorizations and similarities

On how to scale matrix factorization to huge data and how to use string similarities to learn on dirty categorical data https://t.co/iyAFpbE2w9

— Gael Varoquaux (@GaelVaroquaux) June 29, 2018
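The "string similarities for dirty categories" idea can be sketched without any library: encode each category as its n-gram similarity to a set of prototype strings, so typo'd variants ("accountant" vs. "acountant") land close together in feature space. The function names here are illustrative, not the talk's actual API:

```python
# Encode a dirty categorical value as a vector of n-gram similarities
# to prototype category strings (a simplified similarity encoder).
def ngrams(s, n=3):
    s = f" {s.lower()} "  # pad so word boundaries contribute n-grams
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b, n=3):
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)  # Jaccard similarity of n-grams

def encode(value, prototypes):
    return [similarity(value, p) for p in prototypes]

prototypes = ["accountant", "engineer", "teacher"]
print(encode("acountant", prototypes))
```

A misspelled "acountant" still scores high against the "accountant" prototype and near zero against the unrelated ones, so downstream models need no manual deduplication of category spellings.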

Tools

Manipulating Columnar Data

We’ve open-sourced a little library for manipulating columnar data! https://t.co/GiWw2vVBgC

— Mike Bostock (@mbostock) June 29, 2018

Jack the Reader

We just released Jack the Reader - A Machine Reading Framework that allows for quick model prototyping via component reuse, and easy evaluation on new and existing datasets: https://t.co/GPq7nnNOo0 #ACL2018 - currently supporting QA, NLI, and Link Prediction! pic.twitter.com/eJmwnYREmN

— UCL Machine Reading (@uclmr) June 25, 2018

Miscellaneous

A new path towards general intelligence with better metrics, stronger priors and richer models discussed by @fchollet @GoogleAI https://t.co/uKhjfOepBF pic.twitter.com/QvNI1PcuVk

— Nathan Benaich (@NathanBenaich) June 29, 2018

Just think of the ad targeting once Amazon knows what meds we're all on https://t.co/w5gxcZV6iI

— Christopher Mims πŸŽ† (@mims) June 28, 2018

Attention model (via @PHDcomics ) pic.twitter.com/lEdfJh7xEd

— Delip Rao (@deliprao) June 29, 2018

Many people simply don’t want an algorithm to decide what they should see. And we should respect that. pic.twitter.com/UdcFm9SEKf

— hardmaru (@hardmaru) June 29, 2018

All models are wrong but some scientists share the code for their models and that makes them useful (the scientists, I mean)

— David Nicholson (@nicholdav) June 29, 2018

Ablation studies are crucial for deep learning research -- can't stress this enough.

Understanding causality in your system is the most straightforward way to generate reliable knowledge (the goal of any research). And ablation is a very low-effort way to look into causality.

— FranΓ§ois Chollet (@fchollet) June 29, 2018
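Chollet's point can be sketched as a small harness: evaluate the full configuration, then switch off one component at a time and record the score drop. The components and the `evaluate` function below are toy stand-ins, not a real training loop:

```python
# Hypothetical ablation harness: re-evaluate the system with each
# component disabled in turn and compare against the full setup.
def evaluate(config):
    # Toy stand-in for train-and-score; each enabled component adds a
    # fixed amount of "accuracy" so the loop is deterministic.
    contributions = {"augmentation": 0.05, "dropout": 0.02, "attention": 0.10}
    return 0.70 + sum(v for k, v in contributions.items() if config[k])

full = {"augmentation": True, "dropout": True, "attention": True}
baseline = evaluate(full)
for component in full:
    ablated = {**full, component: False}
    delta = baseline - evaluate(ablated)
    print(f"without {component}: score drops by {delta:.2f}")
```

The drop attributable to each component is exactly the causal question an ablation answers: how much does the system lose when this piece, and only this piece, is removed?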

Scientists on Twitter

Great new study about science outreach via Twitter: Initially, scientists mostly tweet to each other. But after accumulating about 1000 followers, scientists reach an increasing number of journalists, policy makers, and other members of the public. https://t.co/35sZgGfkgv pic.twitter.com/S8ybUQcdiA

— Jason Sheltzer (@JSheltzer) June 29, 2018

Tweeting, therefore, has the potential to disseminate scientific information widely.
[...] encourage scientists to invest in building a social media presence.
πŸ‘
Beyond a threshold of ∼1000 followers, the range of follower types became more diverse.
From https://t.co/33DvzMuRF4 pic.twitter.com/bdWXAq1iOC

— Richard (@RichardSocher) June 30, 2018

Whenever I say (paraphrased) "Twitter still houses the best community for #MachineLearning even though that theoretically and practically makes no sense", this is exactly what I mean. Ever more direct connections between those working in, reporting on, or learning about a field. https://t.co/2E9wQhnYkX

— Smerity (@Smerity) June 29, 2018

@ceshine_en

Inspired by @WTFJHT