125 Million Building Footprints
Microsoft just released 125 *million* building footprints as open data. These were generated from satellite imagery. This is huge for OpenStreetMap. https://t.co/gpbevTGFZq
— Waldo Jaquith (@waldojaquith) June 29, 2018
World Cup Doppelgangers #dataviz
When data viz is just as much a piece of art as it is informative https://t.co/sJmFpBxAD5 pic.twitter.com/LYPEyLhFrN
— Katie Marriner (@kemarriner) June 29, 2018
Research
Modelling Raw Audio at Scale
The challenge of realistic music generation: modelling raw audio at scale
— DeepMind (@DeepMindAI) June 28, 2018
- Paper: https://t.co/0NqNXVBa2n
- Listen to piano music samples here: https://t.co/qOOi34Jl3P
DeepMoji
This is such a fun paper! #DeepMoji: Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. (@MIT @MediaLab, 2017). Built on a corpus of 1246 million tweets and 64 common emojis! https://t.co/YpGorr3Wr8 pic.twitter.com/PWa0z3rfbV
— DynamicWebPaige @ Cloud City! (@DynamicWebPaige) June 29, 2018
Evaluating Feature Importance Estimates
Important work from @sarahookr, @doomie, @piekindermans, and @_beenkim on quantifying the extent to which various saliency methods *actually* find relevant portions of images.
— Ari Morcos (@arimorcos) June 29, 2018
Really happy to see more work towards bringing rigor into interpretability! https://t.co/AHvi2N3Ods pic.twitter.com/kSMONsOFB2
sounds relatively similar to permutation importance, but with removal instead of permutation (https://t.co/5fyL81kL1o)
— Sebastian Raschka (@rasbt) June 30, 2018
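The distinction Raschka draws is easy to see in a few lines. A minimal sketch of permutation importance (not code from the paper; scikit-learn's iris data and a random forest are stand-ins chosen for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

# Permutation importance: shuffle one feature at a time in the test set
# and record how far the held-out accuracy falls. The removal-based
# variant Raschka contrasts this with would instead drop the feature
# entirely (and typically retrain) rather than shuffle it.
rng = np.random.default_rng(0)
importances = []
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, j])
    importances.append(baseline - model.score(X_perm, y_test))
```

Features whose shuffling barely moves the score contribute little to the model's predictions.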
Tutorials / Reviews
What a Smartphone User is Doing
This week's #KernelAwards winner uses t-distributed Stochastic Neighbor Embedding (t-SNE) and a LGBMClassifier to determine what a smartphone user is doing: https://t.co/yL1RUxRr5m pic.twitter.com/EiH0ugS2TX
— Kaggle (@kaggle) June 29, 2018
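A rough illustration of the kernel's recipe (not its actual code: scikit-learn's digits data and GradientBoostingClassifier stand in for the smartphone sensor windows and LGBMClassifier):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split

# Stand-in data: 300 handwritten digits instead of accelerometer features.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# t-SNE maps the high-dimensional features to 2-D, which is what makes
# the kernel's cluster plots possible.
embedding = TSNE(n_components=2, random_state=0).fit_transform(X)

# A gradient-boosted classifier (LGBMClassifier in the kernel) then
# predicts the label from the original features.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```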
Visualize World Cup Pt1 #rstats #dataviz
⚽️ code-through!
— Mara Averick (@dataandme) June 29, 2018
"Visualize the #WorldCup with R! Pt 1: Recreating Goals w/ ggsoccer and ggplot2" ✍️ @R_by_Ryo https://t.co/uLy0Qi1WvY #rstats #dataviz pic.twitter.com/TnpcZ8oFcn
Understanding Latent Style
Black-box recommendations are common in industry. This is a guide on the opposite: how to do real science with that latent space. @erinselene @iPancreas @stitchfix_algo https://t.co/5FNztXepjG pic.twitter.com/8gCqhJfVWZ
— christopher e moody (@chrisemoody) June 29, 2018
Cooking Up Statistics
💡 brill idea (cluster ingredients), 🎬 and 🍽️!
— Mara Averick (@dataandme) June 29, 2018
"Cooking Up Statistics: The Science & the Art" 👩🏿‍🍳 @LetishaAudrey https://t.co/DChrwfPUIG #rstats via @RLadiesNYC #RLadies pic.twitter.com/0ixdnoDxM5
Simple Representations for Learning
Slides for my #MLITRW talk β Simple representations for learning: factorizations and similarities
— Gael Varoquaux (@GaelVaroquaux) June 29, 2018
On how to scale matrix factorization to huge data and how to use string similarities to learn on dirty categorical data https://t.co/iyAFpbE2w9
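The string-similarity idea can be sketched without the slides at hand. A minimal version (the category strings are hypothetical, and character n-gram counts with cosine similarity are one simple choice of similarity, not necessarily the one in the talk):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A "dirty" categorical column: several spellings per underlying category.
categories = [
    "senior engineer",
    "Senior Engineer II",
    "accountant",
    "senior eng.",
    "Accountant, payroll",
]

# Character 3-grams turn each string into a count vector, so abbreviated
# or misspelled variants of the same category land close together.
vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))
ngrams = vec.fit_transform(categories)

# Encoding each value by its similarity to every category gives a dense
# numeric feature vector a downstream model can learn from, without
# first deduplicating the messy strings by hand.
sim = cosine_similarity(ngrams)
print(np.round(sim, 2))
```

The "senior engineer" variants end up far more similar to each other than to the "accountant" entries.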
Tools
Manipulating Columnar Data
Weβve open-sourced a little library for manipulating columnar data! https://t.co/GiWw2vVBgC
— Mike Bostock (@mbostock) June 29, 2018
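The tweet doesn't name the library, but the columnar idea itself is easy to sketch in plain Python: store one array per field instead of one record per row, so whole-column operations become a single pass over contiguous values.

```python
# Row-oriented records, as you might read them from JSON or a database.
rows = [
    {"city": "NYC", "temp": 30},
    {"city": "LA", "temp": 75},
    {"city": "SF", "temp": 60},
]

# Columnar layout: one array per field.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Aggregates and filters now each touch a single array.
mean_temp = sum(columns["temp"]) / len(columns["temp"])
warm = [c for c, t in zip(columns["city"], columns["temp"]) if t > 50]
print(mean_temp, warm)  # → 55.0 ['LA', 'SF']
```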
Jack the Reader
We just released Jack the Reader - A Machine Reading Framework that allows for quick model prototyping via component reuse, and easy evaluation on new and existing datasets: https://t.co/GPq7nnNOo0 #ACL2018 - currently supporting QA, NLI, and Link Prediction! pic.twitter.com/eJmwnYREmN
— UCL Machine Reading (@uclmr) June 25, 2018
Miscellaneous
A new path towards general intelligence with better metrics, stronger priors and richer models discussed by @fchollet @GoogleAI https://t.co/uKhjfOepBF pic.twitter.com/QvNI1PcuVk
— Nathan Benaich (@NathanBenaich) June 29, 2018
Just think of the ad targeting once Amazon knows what meds we're all on https://t.co/w5gxcZV6iI
— Christopher Mims (@mims) June 28, 2018
Attention model (via @PHDcomics ) pic.twitter.com/lEdfJh7xEd
— Delip Rao (@deliprao) June 29, 2018
Many people simply donβt want an algorithm to decide what they should see. And we should respect that. pic.twitter.com/UdcFm9SEKf
— hardmaru (@hardmaru) June 29, 2018
All models are wrong but some scientists share the code for their models and that makes them useful (the scientists, I mean)
— David Nicholson (@nicholdav) June 29, 2018
Ablation studies are crucial for deep learning research -- can't stress this enough.
— FranΓ§ois Chollet (@fchollet) June 29, 2018
Understanding causality in your system is the most straightforward way to generate reliable knowledge (the goal of any research). And ablation is a very low-effort way to look into causality.
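Chollet means ablating architectural components of a deep model, but the logic applies anywhere: remove one piece, retrain the otherwise-identical model, and compare. A minimal stand-in using feature ablation (dataset, model, and the choice of which features to drop are all illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

def score(features):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, features, y).mean()

# Ablation: retrain the identical pipeline with one piece removed
# (here, the first ten features). The score gap estimates the causal
# contribution of the removed piece under this setup.
full = score(X)
ablated = score(X[:, 10:])
print(f"full: {full:.3f}  ablated: {ablated:.3f}")
```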
Scientists on Twitter
Great new study about science outreach via Twitter: Initially, scientists mostly tweet to each other. But after accumulating about 1000 followers, scientists reach an increasing number of journalists, policy makers, and other members of the public. https://t.co/35sZgGfkgv pic.twitter.com/S8ybUQcdiA
— Jason Sheltzer (@JSheltzer) June 29, 2018
Tweeting, therefore, has the potential to disseminate scientific information widely.
— Richard (@RichardSocher) June 30, 2018
[...] encourage scientists to invest in building a social media presence.
Beyond a threshold of ∼1000 followers, the range of follower types became more diverse.
From https://t.co/33DvzMuRF4 pic.twitter.com/bdWXAq1iOC
Whenever I say (paraphrased) "Twitter still houses the best community for #MachineLearning even though that theoretically and practically makes no sense", this is exactly what I mean. Ever more direct connections between those working in, reporting on, or learning about a field. https://t.co/2E9wQhnYkX
— Smerity (@Smerity) June 29, 2018