- Use Cases
- 3D recognition
- Audio recognition
- Data Augmentation
- Design
- Games
- Gesture Recognition
- Hyperparameter Tuning
- Image Recognition
- Interpretability
- Programming and ML
- NLP
- Performance
- Personality recognition
- Search
- Robotics
- Transfer Learning
- Uber
- Video recognition
- Visualization
- Multiple Modalities
- Open problems
- Tools
- Amazon SageMaker
- Google ARCore
- Apple Core ML
- Apple Create ML
- Apple Natural Language Framework
- Firebase ML Kit
- Google AutoML
- Google Datalab
- Google Dataprep
- Google ML Engine
- Google Natural Language
- Google Deep Learning Virtual Machine
- Google Mobile Vision
- Google Speech API
- Google Translation API
- Google Video Intelligence
- Google Vision API
- Experiments Frameworks
- Jupyter Notebook
- Lobe
- Microsoft Azure Bot Service
- Microsoft Azure Machine Learning
- Microsoft Cognitive Services
- Microsoft Cognitive Toolkit
- Supervisely
- Syn Bot Oscova
- Tableau
- TensorFlow
- Turi Create
- Playgrounds
- IDEs
- Repositories
- Models
- Guidelines
- Interview preparation
- Books
- MOOC
- Datasets
- Research groups
- Cartoons
- https://github.com/IsaacGuan/PointNet-Plane-Detection
- accuracy around 85% for 100 epochs using TensorFlow
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, 2017
- CNN Architectures for Large-Scale Audio Classification, S. Hershey et al, 2017
- Audio Set: An ontology and human-labeled dataset for audio events, 2017
- Large-Scale Audio Event Discovery in One Million YouTube Videos, A. Jansen et al, ICASSP 2017
- How do I listen for a sound that matches a pre-recorded sound?
- The Sound Sensor Alert App sentector
- Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone, 2018
- Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation by Ariel Ephrat et al, 2018
- https://github.com/facebookresearch/wav2letter
- Data Augmentation Techniques in CNN using Tensorflow, 2017
- How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native, 2017
- My solution for the Galaxy Zoo challenge, 2014
- Niantic is opening its AR platform so others can make games like Pokémon Go, 2018
- Facebook Open Sources ELF OpenGo, 2018
- Mastering the game of Go without human knowledge by David Silver et al, 2017
- Physical Human Activity Recognition Using Wearable Sensors by Ferhat Attal et al, 2015
- Activity Recognition with Smartphone Sensors by Xing Su et al, 2014
- Motion gesture detection using Tensorflow on Android
- Run or Walk : Detecting Motion Activity Type with Machine Learning and Core ML
- Android DetectedActivity class
- Android ActivityRecognitionApi
Apps
Code repositories
- https://github.com/droiddeveloper1/android-wear-gestures-recognition
- https://github.com/drejkim/AndroidWearMotionSensors
- Hyperparameter tuning on Google Cloud Platform is now faster and smarter
- Hyperparameter tuning in Cloud Machine Learning Engine using Bayesian Optimization, 2017
- MobileNetV2: The Next Generation of On-Device Computer Vision Networks, 2018
- Large-Scale Evolution of Image Classifiers by Esteban Real et al, 2017
- Rethinking the Inception Architecture for Computer Vision by Christian Szegedy et al, 2015
- Inception in TensorFlow - 1.4M images and 1000 classes
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard et al, 2017
- Deep Residual Learning for Image Recognition by Kaiming He et al, 2015
- Going Deeper with Convolutions by C. Szegedy et al, 2014
- ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky et al, 2012
- ImageNet
- the model is based on CNN
- Xception: Deep Learning with Depthwise Separable Convolutions by François Chollet, 2017
- ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, 2012
- Вы и Брэд Питт похожи на 99% (You and Brad Pitt are 99% alike)
- telegram bot telling you which celebrity your face is similar to
- dlib + resnet + nmslib
- Умные фотографии ВКонтакте, 2018 (Smart photos in Vkontakte)
- FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff et al, 2015
- the model: FaceNet
- NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment by Simon Mezgec et al, 2017
- uses 520 food and drink items (in Slovene) and the Google Custom Search API to search for these images
- Food Classification with Deep Learning in Keras / Tensorflow, 2017
- Im2Calories: towards an automated mobile vision food diary by Austin Myers et al, 2015
- Food 101 Dataset, 2014
- Calories nutrition dataset
- Building an image caption generator with Deep Learning in Tensorflow, 2018
- Exploring the Limits of Weakly Supervised Pretraining by Dhruv Mahajan et al, 2018
- https://github.com/neural-nuts/Cam2Caption
- An Android application which converts camera feed to natural language captions in real time
- tested: low accuracy, slow (big .pb file is used)
- Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge by Oriol Vinyals et al, 2016
- https://github.com/tensorflow/models/tree/master/research/im2txt
- training python scripts
- requires a pretrained Inception v3 checkpoint
- https://github.com/KranthiGV/Pretrained-Show-and-Tell-model with checkpoints
- https://github.com/LitleCarl/ShowAndTell - Swift app and training scripts using Keras
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention by Kelvin Xu et al, 2016
- Quantizing deep convolutional networks for efficient inference: A whitepaper by Raghuraman Krishnamoorthi, 2018
- Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits
- speedup of 2x-3x for quantized implementations compared to floating point on CPUs
- Fixed Point Quantization with tensorflow
- Graph transform with TensorFlow
- Removing training-only nodes with Tensorflow
- Optimize for inference with TensorFlow
- See example in TensorFlow for Poets 2: TFMobile codelab
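The 4x size reduction quoted above follows from storing each weight in 8 bits instead of 32. Below is a minimal sketch of affine (scale and zero-point) weight quantization in plain Python. It is not the TensorFlow implementation; the function names `quantize` and `dequantize` and the sample weights are assumptions made for illustration.

```python
import struct

def quantize(weights, num_bits=8):
    """Map floats to unsigned integers with an affine (scale, zero-point) scheme."""
    qmax = 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 when all weights are zero
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [(v - zero_point) * scale for v in q]

# Hypothetical weights, chosen only to illustrate the round trip.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)

float_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes per weight
int8_bytes = len(bytes(q))                                     # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The round-trip error stays within one quantization step (`scale`), which is the price paid for the smaller representation.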
- What do we learn from region based object detectors (Faster R-CNN, R-FCN, FPN)? 2018
- What do we learn from single shot object detectors (SSD, YOLOv3), FPN & Focal loss (RetinaNet)? 2018
- Design choices, lessons learned and trends for object detections?
- Semantic Image Segmentation with DeepLab in Tensorflow, 2018
- model DeepLab-v3+ built on top of CNN
- https://github.com/tensorflow/models/tree/master/research/deeplab
- has Checkpoints and frozen inference graphs
- DeepLab demo in Python
- supports MobileNetV2 for mobile devices and Xception for server-side deployment
- evaluates results in terms of mIOU (mean intersection-over-union)
- uses the PASCAL VOC 2012 and Cityscapes semantic segmentation benchmarks as examples in the code
- https://github.com/lankastersky/deeplab_background_segmentation (non-working Android app)
- Rethinking Atrous Convolution for Semantic Image Segmentation by Liang-Chieh Chen et al, 2017
- [MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features by Liang-Chieh Chen et al, 2017](https://arxiv.org/abs/1712.04837)
- presents a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction
- built on top of the Faster-RCNN object detector
- evaluated on the COCO instance segmentation benchmark and shows comparable performance with other state-of-the-art models
- Mask R-CNN by Kaiming He et al, 2017
- https://github.com/facebookresearch/Detectron
- see links to articles at the end of the page
- extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition
- simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps
- easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework
- outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners
- uses average precision (AP), the area under the precision-recall curve, as the metric
- A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN, 2017
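The mIOU metric mentioned above for DeepLab can be illustrated on a tiny example. This is a simplified sketch: it scores a single pair of flattened label maps, whereas benchmark tooling typically accumulates a confusion matrix over a whole dataset. The function name `mean_iou` and the sample labels are assumptions.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over the classes present in either label map."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Flattened 2x3 label maps: 0 = background, 1 = object.
truth = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
score = mean_iou(truth, pred, num_classes=2)
```

Each class here overlaps on 2 of the 4 pixels where it appears in either map, so both per-class IoU values are 0.5.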
- The Building Blocks of Interpretability, 2018
- GoogleNet for image classification is used as an example
- Attributing a deep network’s prediction to its input features by Mukund Sundararajan, 2017
- Integrated Gradients method
- A unified approach to interpreting model predictions by Scott M Lundberg et al, 2017
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier by Marco Tulio Ribeiro et al, 2016
- Monotonic Calibrated Interpolated Look-Up Tables by Maya Gupta et al, 2016
- see Decision trees
- see Distillation
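The Integrated Gradients method listed above attributes a prediction to input features by integrating the gradient along a straight path from a baseline input to the actual input. The sketch below applies it to a toy logistic model with an analytic gradient; the model, its weights, and the all-zero baseline are assumptions chosen for illustration, not taken from the linked article.

```python
import math

WEIGHTS = (2.0, -1.0, 0.5)  # made-up weights for a toy model

def model(x):
    """Toy differentiable model: sigmoid of a weighted sum."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    """Analytic gradient of `model` with respect to each input feature."""
    s = model(x)
    return [w * s * (1.0 - s) for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of the path integral from baseline to x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g / steps
    return [(xi - b) * t for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness check: attributions should sum to model(x) - model(baseline).
gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

The completeness check is a useful sanity test when porting the method to a real network, where the gradient would come from automatic differentiation instead.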
- Tree-to-Tree Neural Networks for Program Translation by Xinyun Chen et al, 2018
- Software is eating the world, but ML is going to eat software by Erik Meijer, Facebook, 2018
- A Survey of Machine Learning for Big Code and Naturalness by Miltiadis Allamanis et al, 2017
- To type or not to type: quantifying detectable bugs in JavaScript by Gao et al, 2017
- Predicting Defects for Eclipse by T Zimmermann et al, 2007
- used code complexity metrics as features and logistic regression for classification (if file/module has defects) and linear regression for ranking (how many defects)
- Predicting Component Failures at Design Time by Adrian Schroter et al, 2006
- showed that design data such as import relationships can predict failures
- used the number of failures in a component as dependent variable and the imported resources used from this component as input features
- Mining Version Histories to Guide Software Changes by T Zimmermann et al, 2004
- used the Apriori algorithm to predict likely changes in files/modules
- https://codescene.io
- 3 ways AI will change project management for the better, 2017
- A deep learning model for estimating story points by Morakot Choetkiertikul et al, 2016
- estimating story points based on long short-term memory and recurrent highway network
- Deep code search by Xiaodong Gu et al, 2018
- How To Create Natural Language Semantic Search For Arbitrary Objects With Deep Learning, 2018
- https://github.com/hamelsmu/code_search
- you can use similar techniques to search video, audio, and other objects
- Improving Language Understanding with Unsupervised Learning - OpenAI
- SentEval: An Evaluation Toolkit for Universal Sentence Representations by A. Conneau et al, 2018
- https://github.com/facebookresearch/SentEval
- the benchmarks may not be appropriate for domain-specific problems
- Text Embedding Models Contain Bias. Here's Why That Matters, 2018
- How to Clean Text for Machine Learning with Python
- https://ipavlov.ai/ - open-source conversational AI framework built on TensorFlow and Keras (En, Ru)
- Behind the Chat: How E-commerce Robot Assistant AliMe Works, 2018
- How I Used Deep Learning To Train A Chatbot To Talk Like Me (Sorta), 2017
- Short-text conversation generative model based on TensorFlow’s embedding_rnn_seq2seq() with a custom dataset. Deployed as a Facebook chatbot using Heroku (hosting) + Express (frontend) + Flask (backend)
- Deep Learning for Chatbots, Part 1 – Introduction, 2016
- Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow, 2016
- https://github.com/gunthercox/ChatterBot
- Retrieval-based model based on naive Bayesian classification and search algorithms
- see Sequence to sequence
- A Persona-Based Neural Conversation Model by Jiwei Li et al, 2016
- Smart reply
- Chatbot projects: https://github.com/fendouai/Awesome-Chatbot
- see Chatbot platforms
- Learning a Natural Language Interface with Neural Programmer by Arvind Neelakantan et al, 2017
- weakly supervised, end-to-end neural network model mapping natural language queries to logical forms or programs that provide the desired response when executed on the database
Also known as deduplication and record linkage (but not entity recognition which is picking up the names and classifying them in running text)
- Collective Entity Resolution in Familial Networks by Pigi Kouki et al, 2017
- combines machine learning (although not NNs) with collective inference
- Entity Resolution Using Convolutional Neural Network by Ram Deepak Gottapu et al, 2016
- Adaptive Blocking: Learning to Scale Up Record Linkage by Mikhail Bilenko et al, 2006
- extremely high recall but low precision
- https://stats.stackexchange.com/questions/136755/popular-named-entity-resolution-software
Also known as concept finders. They return the name of a concept given a definition or description:
- Learning to Understand Phrases by Embedding the Dictionary by Felix Hill et al, 2016
- used models: Bag-of-Words NLMs and LSTM
- compares definitions in a database to the input query and returns the word whose definition is ‘closest’ to that query
- see RNNs (with LSTMs)
- see bag-of-words
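The lookup described above (compare the query with every stored definition and return the word whose definition is closest) can be sketched with plain bag-of-words counts and cosine similarity. The paper learns neural representations instead; the count-based vectors, the toy dictionary, and the function names here are assumptions that only illustrate the retrieval step.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words representation: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy dictionary; the entries are made up for illustration.
definitions = {
    "piano": "a large musical instrument with a keyboard",
    "river": "a large natural stream of water flowing to the sea",
    "oven": "an enclosed compartment for cooking and heating food",
}

def reverse_lookup(query):
    """Return the word whose definition is most similar to the query."""
    q = bow(query)
    return max(definitions, key=lambda word: cosine(q, bow(definitions[word])))

answer = reverse_lookup("stream of water that flows into the sea")
```

Replacing `bow` with a learned sentence encoder keeps the same retrieval loop while handling paraphrases that share no surface words.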
- Smart Compose: Using Neural Networks to Help Write Emails, 2018
- Introducing Semantic Experiences with Talk to Books and Semantris by Ray Kurzweil et al, 2018
- Keras LSTM tutorial – How to easily build a powerful deep learning language model by Andy, 2018
- Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models by Louis Shao et al, 2017
- trained on a combined data set of over 2.3B conversation messages mined from the web
- The model: LSTM on tensorflow
- Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features by Matteo Pagliardini et al, 2017
- the model: Sent2Vec based on word2vec
- Skip-Thought Vectors by Ryan Kiros et al, 2015
- based on RNN encoder-decoder models
- Sequence to Sequence Learning with Neural Networks by Ilya Sutskever et al, 2014
- the model: seq2seq based on LSTM
- Distributed Representations of Sentences and Documents by Quoc V. Le and Tomas Mikolov, 2014
- Distributed Representations of Words and Phrases and their Compositionality by Tomas Mikolov et al, 2013
- word2vec based on Mikolov's Skip-gram model
- Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks by Richard Socher et al, 2010
- based on context-sensitive recursive neural networks (CRNN)
- see Reverse dictionaries
- How to calculate the sentence similarity using word2vec model
- Doc2Vec
- Average w2v vectors
- Weighted average w2v vectors (e.g. tf-idf)
- RNN-based embeddings (e.g. deep LSTM networks)
- Document Similarity With Word Movers Distance
- A Simple but Tough-to-Beat Baseline for Sentence Embeddings by Sanjeev Arora et al, 2017
- uses smooth inverse frequency
- computes the weighted average of the word vectors in each sentence and then removes the projection of the averaged vectors onto their first principal component
- example
- https://github.com/peter3125/sentence2vec - requires writing the get_word_frequency() method which can be easily accomplished by using Python's Counter() and returning a dict with keys: unique words w, values: #w/#total doc len
- Advances in Semantic Textual Similarity, 2018
- Semantic Textual Similarity Wiki, 2017
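The smooth inverse frequency (SIF) baseline listed above has two steps: a frequency-weighted average of word vectors, then removal of the first principal direction shared by all sentence vectors. The sketch below follows those steps in plain Python. The 3-dimensional word vectors, the value `a=1e-3`, and all function names are assumptions for illustration; it is not the code from the linked repository.

```python
import math
from collections import Counter

def word_frequency(sentences):
    """Unigram probability of each word: count / total number of tokens."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def weighted_averages(sentences, vectors, a=1e-3):
    """Per-sentence mean of word vectors, each weighted by a / (a + p(word))."""
    freq = word_frequency(sentences)
    rows = []
    for sentence in sentences:
        dim = len(vectors[sentence[0]])
        row = [0.0] * dim
        for word in sentence:
            weight = a / (a + freq[word])
            for i in range(dim):
                row[i] += weight * vectors[word][i] / len(sentence)
        rows.append(row)
    return rows

def first_principal_direction(rows, iterations=100):
    """Power iteration for the top right singular vector of the row matrix."""
    dim = len(rows[0])
    u = [1.0 / math.sqrt(dim)] * dim  # assumes the start is not orthogonal to the answer
    for _ in range(iterations):
        proj = [sum(r * c for r, c in zip(row, u)) for row in rows]
        v = [sum(p * row[i] for p, row in zip(proj, rows)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        u = [x / norm for x in v]
    return u

def remove_first_component(rows):
    """Subtract from each row its projection onto the first principal direction."""
    u = first_principal_direction(rows)
    result = []
    for row in rows:
        projection = sum(r * c for r, c in zip(row, u))
        result.append([r - projection * c for r, c in zip(row, u)])
    return result

# Hypothetical word vectors; real ones would come from word2vec or similar.
vectors = {
    "the": [0.5, 0.5, 0.5], "cats": [0.9, 0.1, 0.0], "purr": [0.7, 0.0, 0.3],
    "dogs": [0.8, 0.2, 0.1], "bark": [0.6, 0.3, 0.2],
    "stocks": [0.1, 0.9, 0.4], "fall": [0.2, 0.8, 0.5],
}
sentences = [["the", "cats", "purr"], ["the", "dogs", "bark"], ["the", "stocks", "fall"]]
rows = weighted_averages(sentences, vectors)
embeddings = remove_first_component(rows)
```

Frequent words such as "the" receive a small weight, so they contribute little to the sentence vector, which is the point of the weighting.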
- A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks by Soujanya Poria et al, 2017
- Twitter Sentiment Analysis Using Combined LSTM-CNN Models by SOSAVPM, 2018
- https://github.com/pmsosa/CS291K
- used pre-trained embeddings with LSTM-CNN model with dropouts
- 75.2% accuracy for binary classification (positive-negative tweet)
- doc2vec example, 2015
- How To Create Data Products That Are Magical Using Sequence-to-Sequence Models
- A tutorial on how to summarize text and generate features from Github Issues using deep learning with Keras and TensorFlow
- https://github.com/hamelsmu/Seq2Seq_Tutorial
- Generating Wikipedia by Summarizing Long Sequences by Peter J. Liu et al, 2018
- Universal Language Model Fine-tuning for Text Classification by Jeremy Howard et al, 2018
- outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets.
- with only 100 labeled examples, it matches the performance of training from scratch on 100× more data
- http://nlp.fast.ai/ulmfit
- ChatPainter: Improving text-to-image generation by using dialogue
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu et al, 2017
- Efficient Neural Audio Synthesis by Nal Kalchbrenner et al, 2018
- Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention by Hideyuki Tachibana et al, 2017
- https://github.com/r9y9/ (Ryuichi Yamamoto)
- https://github.com/keithito/
- WaveNet: A Generative Model for Raw Audio, 2016
- Mining Facebook Data for Predictive Personality Modeling (Dejan Markovikj,Sonja Gievska, Michal Kosinski, David Stillwell)
- Personality Traits Recognition on Social Network — Facebook (Firoj Alam, Evgeny A. Stepanov, Giuseppe Riccardi)
- The Relationship Between Dimensions of Love, Personality, and Relationship Length (Gorkan Ahmetoglu, Viren Swami, Tomas Chamorro-Premuzic)
- Grasp2Vec: Learning Object Representations from Self-Supervised Grasping
- Achieved a success rate of 80% on objects seen during data collection and 59% on novel objects the robot hasn’t encountered before
- Can word2vec be used for search?
- alternative search queries can be built using approximate nearest neighbors in the embedding vector space of terms (e.g. using https://github.com/spotify/annoy)
- Improving Document Ranking with Dual Word Embeddings by Eric Nalisnick et al, 2016
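The query-expansion idea above (suggest alternative search terms from the nearest neighbors of a term's embedding) can be sketched as follows. For brevity this uses an exact brute-force cosine search in place of an approximate index such as Annoy, and the 3-dimensional embeddings are made up; in practice they would come from a trained word2vec model.

```python
import math

# Hypothetical term embeddings, for illustration only.
embeddings = {
    "laptop": [0.90, 0.10, 0.05],
    "notebook": [0.85, 0.15, 0.10],
    "computer": [0.80, 0.20, 0.00],
    "banana": [0.05, 0.90, 0.30],
    "apple": [0.30, 0.80, 0.20],
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def expand_query(term, k=2):
    """Return the k terms closest to `term` by cosine similarity (exact search)."""
    target = embeddings[term]
    others = [t for t in embeddings if t != term]
    others.sort(key=lambda t: cosine(target, embeddings[t]), reverse=True)
    return others[:k]

alternatives = expand_query("laptop")
```

Exact search is linear in the vocabulary size, which is why an approximate nearest-neighbor index becomes attractive for large vocabularies.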
- Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies by Alessandro Achille et al, 2018
- possible solution of catastrophic forgetting
- Deep Learning & Art: Neural Style Transfer – An Implementation with Tensorflow in Python, 2018
- Image Classification using Flowers dataset on Cloud ML Engine, 2018
- Android & TensorFlow: Artistic Style Transfer, 2018 codelab
- The TensorFlow for Poets tutorial shows how to retrain a TensorFlow graph to classify images of flowers.
- Everybody Dance Now by Caroline Chan et al, 2018
- Real-time Human Pose Estimation in the Browser with TensorFlow.js, 2018 (Medium post)
- Enabling full body AR with Mask R-CNN2Go by Fei Yang et al, 2018
- PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model by George Papandreou et al, 2018
- Towards Accurate Multi-person Pose Estimation in the Wild by George Papandreou et al, 2017
- Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields by Zhe Cao et al, 2017
Here are video-specific methods. See also Semantic Segmentation.
- Training and serving a realtime mobile object detector in 30 minutes with Cloud TPUs, 2018
- includes checkpoints
- YOLO: Real-Time Object Detection
- Mobile Real-time Video Segmentation, 2018
- integrated into YouTube Stories
- The Instant Motion Tracking Behind Motion Stills AR, 2018
- Behind the Motion Photos Technology in Pixel 2, 2018
- Supercharge your Computer Vision models with the TensorFlow Object Detection API, 2017
- Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks by Michael Gygli, 2017
- Video Shot Boundary Detection based on Color Histogram by J. Mas and G. Fernandez, 2003
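Histogram-based shot boundary detection, the idea behind the last entry above, compares the histograms of consecutive frames and reports a cut when they differ sharply. The sketch below is a simplified illustration, not the method from either paper: it uses grayscale intensities instead of color, a fixed threshold, and synthetic frames, and the function names are assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Indices i where frame i starts a new shot (L1 histogram distance above threshold)."""
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries

# Synthetic grayscale "video": three dark frames followed by three bright frames.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 250]
video = [dark, dark, dark, bright, bright, bright]
cuts = shot_boundaries(video)
```

A fixed threshold handles hard cuts like this one; gradual transitions such as fades usually need a windowed or learned detector.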
Detects when one video (shot/scene/chapter) ends and another begins
- Recurrent Switching Linear Dynamical Systems by Scott W. Linderman et al, 2016
- Video Scene Segmentation Using Markov Chain Monte Carlo by Yun Zha et al, 2006
- Automatic Video Scene Segmentation based on Spatial-Temporal Clues and Rhythm by Walid Mahdi et al, 2000
- Temporal Relational Reasoning in Videos by Bolei Zhou et al, 2018 - recognizing and forecasting activities from a few frames
- DeepStory: Video Story QA by Deep Embedded Memory Networks by Kyung-Min Kim et al, 2017
- Video Understanding: From Video Classification to Captioning by Jiajun Sun et al, 2017
- Unsupervised Learning from Narrated Instruction Videos by Jean-Baptiste Alayrac et al, 2015
- Learnable pooling with Context Gating for video classification by Antoine Miech et al, 2018
- Rank #1 at Google Cloud & YouTube-8M Video Understanding Challenge
- Slow for inference/training
- NOT a sequential problem
- Needs lots of data for training
- not clear about very long videos
- The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge, 2017
- Hierarchical Deep Recurrent Architecture for Video Understanding by Luming Tang et al, 2017
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? by Kensho Hara et al, 2017
- https://github.com/kenshohara/video-classification-3d-cnn-pytorch
- trained on the Kinetics dataset from scratch using only RGB input
- pretrained ResNeXt-101 achieved 94.5% and 70.2% on UCF-101 and HMDB-51
- Appearance-and-Relation Networks for Video Classification by Limin Wang et al, 2017
- https://github.com/wanglimin/ARTNet
- trained on the Kinetics dataset from scratch using only RGB input
- 70.9% and 94.3% on HMDB-51 and UCF-101
- Five video classification methods implemented in Keras and TensorFlow by Matt Harvey, 2017
- Video Understanding: From Video Classification to Captioning by Jiajun Sun et al, 2017
- Video Classification using Two Stream CNNs, 2016 (code based on the articles below)
- Two-Stream Convolutional Networks for Action Recognition in Videos
- Fusing Multi-Stream Deep Networks for Video Classification
- Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
- Towards Good Practices for Very Deep Two-Stream ConvNets
- Beyond Short Snippets: Deep Networks for Video Classification by Joe Yue-Hei Ng et al, 2015
- In order to learn a global description of the video while maintaining a low computational footprint, we propose processing only one frame per second
- Large-scale Video Classification with Convolutional Neural Networks by Andrej Karpathy et al, 2014
- 63.3% on UCF-101
- Google Datalab
- Google Dataprep
- Tableau
- Google Brain: Big Picture Group
- Deeplearn.js - open source hardware-accelerated machine intelligence library for the web
- Facets - open source visualizations for machine learning datasets
- Embedding Projector - an open source visualization tool for high-dimensional data
- Recycled goods (not solved, no dataset)
- Recycling symbols explained
- similar to traffic signs recognition
- Safety symbols on cardboard boxes (not solved, no dataset)
- 50+ Useful Machine Learning & Prediction APIs, 2018
- Face and Image Recognition
- Text Analysis, NLP, Sentiment Analysis
- Language Translation
- Machine Learning and prediction
- Command-line tricks for data scientists
- Deep Video Analytics
- Data-centric platform for Computer Vision
- https://github.com/akshayubhat/deepvideoanalytics
- Distributed Training: You can’t choose the number of workers and parameter servers independently
- Job Startup Latency: Up to 5 minutes for a single node
- Hyper Parameters Tuning: In-Preview, and only supports the built-in algorithms
- Batch Prediction: Not supported
- GPU readiness: Bring your own Docker image with CUDA installed
- Auto-scale Online Serving: You need to specify the number of nodes
- Training Job Monitoring: No monitoring
- https://github.com/google-ar/arcore-android-sdk
- https://github.com/google-ar/sceneform-android-sdk
- Cloud Anchors android codelab
- https://github.com/google-ar/arcore-ios-sdk
iOS framework from Apple to integrate machine learning models into your app.
Apple framework used with familiar tools like Swift and macOS playgrounds to create and train custom machine learning models on your Mac.
- Introducing Create ML at WWDC 2018
- Introducing Natural Language Framework at WWDC 2018
- ML Kit: Machine Learning SDK for mobile developers (Google I/O '18)
- Uses Google Cloud APIs under the hood
- Uses custom TensorFlow Lite models
- Can compress TensorFlow to TensorFlow Lite models
- Runs on-device (fast, less accurate) or in the cloud
- Examples and codelabs
Pros:
- lets users train their own custom machine learning algorithms from scratch, without having to write a single line of code
- uses Transfer Learning (the more data and customers, the better results)
- is fully integrated with other Google Cloud services (Google Cloud Storage to store data, use Cloud ML or Vision API to customize the model etc.)
Cons:
- limited to image recognition (2018-Q1)
- doesn't allow downloading a trained model
- Powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services easily so you can focus on your data science tasks.
- Built on Jupyter (formerly IPython), which boasts a thriving ecosystem of modules and a robust knowledge base.
- Enables analysis of your data on Google BigQuery, Cloud Machine Learning Engine, Google Compute Engine, and Google Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions).
Intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Cloud Dataprep is serverless and works at any scale. Easy data preparation with clicks and no code.
- Samples & Tutorials
- Samples for usage
- Distributed Training: Specify number of nodes, types, (workers/PS), associated accelerators, and sizes
- Job Startup Latency: 90 seconds for single node
- Hyper Parameters Tuning: Grid Search, Random Search, and Bayesian Optimisation
- Batch Prediction: You can submit a batch prediction job for high throughputs
- GPU readiness: Out-of-the box, either via scale-tier, or config file
- Auto-scale Online Serving: Scaled up to your specified maximum number of nodes, down to 0 nodes if no requests for 5 minutes
- Training Job Monitoring: Full monitoring to the cluster nodes (CPU, Memory, etc.)
- Automation of ML: AutoML - Vision, NLP, Speech, etc.
- Specialised Hardware: Tensor Processing Units (TPUs)
- SQL-supported ML: BQML
- entity recognition: extract information about people, places, events, and much more mentioned in text documents, news articles, or blog posts
- sentiment analysis: understand the overall sentiment expressed in a block of text
- multilingual support
- syntax analysis: extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence
- VMs with CPU and GPU
- Detect Faces (finds facial landmarks such as the eyes, nose, and mouth; doesn't identify a person)
- Scan barcodes
- Recognize Text
- speech recognition
- word hints: Can provide context hints for improved accuracy. Especially useful for device and app use cases.
- noise robustness: No need for signal processing or noise cancellation before calling API; can handle noisy audio from a variety of environments
- realtime results: can stream text results, returning partial recognition results as they become available. Can also be run on buffered or archived audio files.
- over 80 languages
- can also filter inappropriate content in text results
- Supports more than 100 languages and thousands of language pairs
- automatic language detection
- continuous updates: Translation API is learning from logs analysis and human translation examples. Existing language pairs improve and new language pairs come online at no additional cost
- Label Detection - Detect entities within the video, such as "dog", "flower" or "car"
- Shot Change Detection - Detect scene changes within the video
- Explicit Content Detection - Detect adult content within a video
- Video Transcription - Automatically transcribes video content in English
- Object recognition: detect broad sets of categories within an image, ranging from modes of transportation to animals
- Facial sentiment and logos: Analyze facial features to detect emotions: joy, sorrow, anger; detect logos
- Extract text: detect and extract text within an image, with support of many languages and automatic language identification
- Detect inappropriate content: detect different types of inappropriate content, from adult to violent content
Tools to help you configure, organize, log and reproduce experiments
- https://www.reddit.com/r/MachineLearning/comments/5gyzqj/d_how_do_you_keep_track_of_your_experiments/, 2017
- How to Plan and Run Machine Learning Experiments Systematically by Jason Brownlee, 2017
- using a spreadsheet with a template
- https://github.com/IDSIA/sacred
Lobe is an easy-to-use visual tool that lets you build custom deep learning models, quickly train them, and ship them directly in your app without writing any code.
- Annotate images for computer vision tasks using AI
- https://github.com/supervisely/supervisely
- finds similarity between the expressions
- https://github.com/SynHub/syn-bot-samples
- MS Visual Studio is required (doesn't work with VS Code)
- activating the Deep Learning feature requires license activation
- number of requests to the server is limited by the license
- Data visualization tool created by Tableau Software.
- Connects to files, relational and Big Data sources, allows transforming data into dashboards that look amazing and are also interactive.
- TensorFlow Hub
- https://github.com/tensorflow/models/tree/master/research
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android
- TF Classify
- TF Detect
- TF Stylize
- TF Speech
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/examples
- TF Classify
- TF Detect
- TF Speech
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/java/demo
- TF classify using tflite model
- Freeze tensorflow model graph
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android
- TensorFlow Estimator APIs Tutorials
Apple Python framework that simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
- Export models to Core ML for use in iOS, macOS, watchOS, and tvOS apps.
- A Guide to Turi Create from WWDC 2018
- Training Data Analyst - Labs and demos for Google Cloud Platform courses
- SEEDBANK - Collection of Interactive Machine Learning Examples
- AI Lab: Learn to Code with the Cutting-Edge Microsoft AI Platform, 2018
- Teachable Machine by Google
- Vision Kit - Do-it-yourself intelligent camera. Experiment with image recognition using neural networks on Raspberry Pi.
- Voice Kit - Do-it-yourself intelligent speaker. Experiment with voice recognition and the Google Assistant on Raspberry Pi.
Pros:
- can model nonlinearities
- are highly interpretable
- do not require extensive feature preprocessing
- do not require enormous data sets
Cons:
- tend to overfit
- fixed by building a decision forest with boosting
- unstable/nondeterministic (can generate different results when trained on the same data)
- fixed by using bootstrap aggregation/bagging (a bagged forest, e.g. a random forest)
- do mapping directly from the raw input to the label
- better use neural nets that can learn intermediate representations
Hyperparameters:
- tree depth
- maximum number of leaf nodes
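The effect of the tree depth hyperparameter on overfitting can be seen with a tiny from-scratch classifier. This is an illustrative sketch on made-up one-dimensional data with Gini-based splits, not a production implementation; all function names and the data set are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(points, max_depth):
    """points: list of (x, label). Returns a nested dict, or a label for a leaf."""
    labels = [y for _, y in points]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or gini(labels) == 0.0:
        return majority
    best = None
    for threshold in sorted({x for x, _ in points})[1:]:
        left = [p for p in points if p[0] < threshold]
        right = [p for p in points if p[0] >= threshold]
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # all points share the same x, nothing to split on
        return majority
    _, threshold, left, right = best
    return {"threshold": threshold,
            "left": build_tree(left, max_depth - 1),
            "right": build_tree(right, max_depth - 1)}

def predict(tree, x):
    """Walk down the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree

def count_leaves(tree):
    if not isinstance(tree, dict):
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

# Toy data: mostly "a" below 5 and "b" above, with one noisy "b" at x=3.
data = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "b"), (7, "b")]
shallow = build_tree(data, max_depth=1)
deep = build_tree(data, max_depth=4)
```

The shallow tree ignores the noisy point and keeps two leaves, while the deep tree grows extra leaves to fit it exactly, which is the overfitting behavior that limiting depth or leaf count is meant to curb.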
- trains a model to mimic the behavior of a pretrained model so it can work independently of the pretrained model
- can train the smaller model with unlabeled examples
- not all target classes need to be represented in the distillation training set
- reduces the need for regularization
- Distilling the Knowledge in a Neural Network by Geoffrey Hinton et al, 2015
- “Why Should I Trust You?” Explaining the Predictions of Any Classifier by Marco Tulio Ribeiro et al, 2016
- Detecting Bias in Black-Box Models Using Transparent Model Distillation by Sarah Tan et al, 2017
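Distillation as described above trains the smaller model on the teacher's output distribution, usually softened with a temperature, which is why unlabeled examples are enough. The sketch below shows the temperature-scaled softmax and a cross-entropy loss between teacher and student; the logits, temperatures, and function names are assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

# Made-up teacher logits for a three-class problem.
teacher_logits = [6.0, 2.0, -1.0]
hard = softmax(teacher_logits)         # close to one-hot
soft = softmax(teacher_logits, 4.0)    # softened targets keep relative class similarity

matched = distillation_loss(teacher_logits, teacher_logits)
mismatched = distillation_loss([-1.0, 2.0, 6.0], teacher_logits)
```

The loss is lowest when the student reproduces the teacher's distribution, so minimizing it pushes the student to mimic the teacher without any ground-truth labels.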
- https://github.com/Hironsan/awesome-embedding-models
- gensim's word2vec (embedded words and phrases)
- gensim's doc2vec
- https://github.com/jhlau/doc2vec
- see recursive autoencoders
- see bag-of-words models
- Using Evolutionary AutoML to Discover Neural Network Architectures by Esteban Real, 2018
- Regularized Evolution for Image Classifier Architecture Search by Esteban Real et al, 2018
- Welcoming the Era of Deep Neuroevolution by Jeff Clune, 2017
- Hierarchical Representations for Efficient Architecture Search by Hanxiao Liu et al, 2017
- Learning Transferable Architectures for Scalable Image Recognition by Barret Zoph et al, 2017
- Large-Scale Evolution of Image Classifiers by Esteban Real et al, 2017
- Evolving Neural Networks through Augmenting Topologies by Stanley and Miikkulainen, 2002
- Statistical metrics
- descriptive statistics: dimensionality, unique subject counts, systematic replicate counts, PDFs, CDFs (probability density and cumulative distribution functions)
- cohort design
- power analysis
- sensitivity analysis
- multiple testing correction analysis
- dynamic range sensitivity
- Numerical analysis metrics
- number of clusters
- PCA dimensions
- MDS space dimensions/distances/curves/surfaces
- variance between buckets/bags/trees/branches
- informative/discriminative indices (i.e. how much do the top 10 features differ from one another and from the group)
- feature engineering differentiators
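
As an illustration of the multiple testing correction item in the statistical metrics above, here is a minimal sketch of two widely used procedures: Bonferroni (controls the family-wise error rate) and Benjamini-Hochberg (controls the false discovery rate). The p-values and the `alpha` level are made up for the example, and the list does not say which correction to use, so treat the choice as an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / number_of_tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with p <= rank/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical p-values from four tests.
p_values = [0.001, 0.02, 0.04, 0.3]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Bonferroni is the more conservative of the two: on these values it rejects only the first hypothesis, while Benjamini-Hochberg also rejects the second.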
Approaches when our model doesn’t work:
- Fetch more data
- Add more layers to the neural network
- Try a different neural network architecture or approach
- Train longer (increase the number of iterations)
- Change batch size
- Try Regularisation
- Check the bias-variance trade-off to avoid underfitting and overfitting
- Use more GPUs for faster computation
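
One way to decide which of the approaches above to try first is to compare training and validation error. The sketch below is a rough heuristic, not a rule: the function name `diagnose`, the target error, and the tolerance are hypothetical values chosen for illustration.

```python
def diagnose(train_error, validation_error, target_error=0.02, tolerance=0.03):
    """Suggest next steps from the gaps between target, training and validation error."""
    findings = []
    # Training error far above the target suggests the model is too weak (bias).
    if train_error - target_error > tolerance:
        findings.append(
            "high bias (underfitting): add layers, train longer, or try a new architecture"
        )
    # Validation error far above training error suggests poor generalization (variance).
    if validation_error - train_error > tolerance:
        findings.append(
            "high variance (overfitting): fetch more data or add regularisation"
        )
    return findings or ["errors are close to target: no obvious bias or variance problem"]

print(diagnose(train_error=0.15, validation_error=0.16))
print(diagnose(train_error=0.01, validation_error=0.12))
```

Both findings can appear at once, for example when training error is high and validation error is higher still.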
Back-propagation problems:
- it requires labeled training data, while almost all data is unlabeled
- the learning time does not scale well, which means it is very slow in networks with multiple hidden layers
- it can get stuck in poor local optima, so for deep nets the solutions found can be far from optimal
- Understanding Hinton’s Capsule Networks by Max Pechyonkin, 2017
- Capsule Networks (CapsNets) – Tutorial, 2017
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer by Jeff Dean et al
- PathNet: Evolution Channels Gradient Descent in Super Neural Networks by DeepMind
- Feature extraction - uses layers of a pretrained model as inputs to another model, effectively chaining two models together
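
The feature extraction pattern can be sketched without any framework. In the example below, `pretrained_features` is a hypothetical stand-in for the frozen layers of a pretrained model, and a small logistic-regression head is the second model trained on its outputs. The data, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math

def pretrained_features(x):
    """Stand-in for frozen pretrained layers (hypothetical): maps raw input to features."""
    return [x, x * x]

def train_head(inputs, labels, epochs=1000, lr=0.1):
    """Train a logistic-regression head on the frozen features with plain SGD."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            feats = pretrained_features(x)  # the extractor itself is never updated
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * grad * f for w, f in zip(weights, feats)]
            bias -= lr * grad
    return weights, bias

def predict(x, weights, bias):
    feats = pretrained_features(x)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return int(z > 0)

# Toy task: label is 1 when |x| > 1, which is not linearly separable in raw x.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1, 1, 0, 0, 0, 1, 1]
weights, bias = train_head(xs, ys)
print([predict(x, weights, bias) for x in xs])
```

Only the head's weights and bias change during training, which is what makes the two models a chain rather than one jointly trained network.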
- Perceptrons
- Exploring LSTMs, 2017
- Understanding LSTM Networks by Christopher Olah, 2015
- “Almost all exciting results based on recurrent neural networks are achieved with [LSTMs].”
- Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks by Graves & Schmidhuber, 2009
- showed that RNNs with LSTM are currently the best systems for reading cursive writing
- Long Short-Term Memory by Hochreiter & Schmidhuber, 1997
- The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy, 2015
- see Long Short-Term Memory Networks
- Hopfield Nets (without hidden units)
- Boltzmann machines (stochastic recurrent neural network with hidden units)
- Restricted Boltzmann Machines by Salakhutdinov and Hinton, 2014
- Deep Boltzmann Machines by Salakhutdinov and Hinton, 2012
- Deep Reinforcement Learning Doesn't Work Yet, 2018
- Introducing a New Framework for Flexible and Reproducible Reinforcement Learning Research, 2018
- Neural Architecture Search with Reinforcement Learning by Barret Zoph et al, 2017
- Stanford CS-230 cheatsheets
- The top concepts of Deep Learning, CNNs and RNNs summarized in 3 short pages
- AI Transformation Playbook by Andrew Ng, 2018
- Steps for transforming your enterprise with AI, which I will explain in this playbook:
- Execute pilot projects to gain momentum
- Build an in-house AI team
- Provide broad AI training
- Develop an AI strategy
- Develop internal and external communications
- AI at Google: our principles, 2018
- Rules of Machine Learning: Best Practices for ML Engineering by Martin Zinkevich, 2018
- Practical advice for analysis of large, complex data sets by Patrick Riley, 2016
- What’s your ML test score? A rubric for ML production systems by Eric Breck, 2016
- Machine Learning: The High Interest Credit Card of Technical Debt by D. Sculley et al, 2014
- Complex Models Erode Boundaries
- Entanglement
- Hidden Feedback Loops
- Undeclared Consumers
- Data Dependencies Cost More than Code Dependencies
- Unstable Data Dependencies
- Underutilized Data Dependencies
- Static Analysis of Data Dependencies
- Correction Cascades
- System-level Spaghetti
- Glue Code
- Pipeline Jungles
- Dead Experimental Codepaths
- Configuration Debt
- Dealing with Changes in the External World
- Fixed Thresholds in Dynamic Systems
- When Correlations No Longer Correlate
- Monitoring and Testing
- Principles of Research Code by Charles Sutton, 2012
- Patterns for Research in Machine Learning by Ali Eslami, 2012
- Lessons learned developing a practical large scale machine learning system by Simon Tong, 2010
- The Professional Data Science Manifesto
- Machine Learning Glossary
- Deep Learning: A Critical Appraisal by Gary Marcus, 2018
- Deep learning thus far is data hungry
- Deep learning thus far is shallow and has limited capacity for transfer
- Deep learning thus far has no natural way to deal with hierarchical structure
- Deep learning thus far has struggled with open-ended inference
- Deep learning thus far is not sufficiently transparent
- Deep learning thus far has not been well integrated with prior knowledge
- Deep learning thus far cannot inherently distinguish causation from correlation
- Deep learning presumes a largely stable world, in ways that may be problematic
- Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted
- Deep learning thus far is difficult to engineer with
- Software 2.0 by Andrej Karpathy, 2017
- 20 Questions to Detect Fake Data Scientists and How to Answer Them, 2018
- Data Science Interview: What Is Expected of You (in Russian), 2018
- Acing AI Interviews
- https://developers.google.com/machine-learning/crash-course/
- for beginners, explains hard concepts in simple words
- from Google experts
- uses TensorFlow and codelabs
- https://www.coursera.org/specializations/gcp-data-machine-learning
- shows how to use GCP for machine learning
- Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference by Cameron Davidson-Pilon, 2015
- Statistics is Easy! by Dennis Shasha, 2010
- https://ai.google/tools/datasets/
- https://toolbox.google.com/datasetsearch
- Microsoft Research Open Data
- users can also copy datasets directly to an Azure based Data Science virtual machine
- ScanNet - RGB-D video dataset annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations
- SceneNet - Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth
- The VU sound corpus - based on https://freesound.org/ database
- AudioSet - consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos
- Conceptual Captions: A New Dataset and Challenge for Image Captioning, 2018
- Landmarks 2018
- ImageNet
- COCO
- SUN
- Caltech 256
- Pascal
- CIFAR-10 - 60000 32x32 colour images in 10 classes, with 6000 images per class
- commonly used to train image classifiers
- Microsoft multimedia challenge dataset, 2017
- largest dataset in terms of sentences and vocabulary
- challenge: to automatically generate a complete and natural sentence to describe video content
- Kinetics, 2017
- YouTube-8M, 2017
- large, but annotations are slightly noisy and only video-level labels have been assigned (videos include frames that do not relate to the target actions)
- youtube-dl - Command-line program to download videos from YouTube.com and other video sites
- Sports-1M by A. Karpathy, 2016
- large, but annotations are slightly noisy and only video-level labels have been assigned (videos include frames that do not relate to the target actions)
- FCVID
- ActivityNet
- http://crcv.ucf.edu/data/UCF101.php, 2013
- Hollywood2
- HMDB-51
- CCV
- DeepMind
- Facebook AI Research (FAIR)
- Google Brain
- Microsoft Research AI
- OpenAI
- Sentient Labs
- Uber Labs
The Browser of a Data Scientist
A statistician drowned crossing a river that was only three feet deep on average