Machine Learning

Contents

Slide 2

Inductive logic programming (ILP) is an approach to rule-learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming language for representing hypotheses (and not only logic programming), such as functional programs.
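
The acceptance criterion described above (a hypothesis must entail all positive and no negative examples) can be sketched in a few lines. This is only an illustration, not an ILP system: it checks one hand-written candidate rule rather than searching for one, and the family facts, the `grandparent` rule, and the helper `is_consistent` are all hypothetical names introduced for the example.

```python
# Background knowledge: parent(X, Y) facts (hypothetical family data).
PARENT = {("ann", "bob"), ("bob", "cid"), ("bob", "dee"), ("eve", "ann")}


def grandparent(x, z, parent=PARENT):
    """Candidate hypothesis: grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    people = {p for pair in parent for p in pair}
    return any((x, y) in parent and (y, z) in parent for y in people)


def is_consistent(hypothesis, positives, negatives):
    """True if the hypothesis covers every positive and no negative example."""
    covers_all_positives = all(hypothesis(*e) for e in positives)
    covers_a_negative = any(hypothesis(*e) for e in negatives)
    return covers_all_positives and not covers_a_negative


positives = [("ann", "cid"), ("ann", "dee"), ("eve", "bob")]
negatives = [("ann", "bob"), ("bob", "ann"), ("cid", "dee")]

print(is_consistent(grandparent, positives, negatives))
```

A real ILP system would generate and refine many such candidate rules and keep one that passes this kind of check.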
Slide 3

Inductive logic programming is particularly useful in bioinformatics and natural language processing. Gordon Plotkin and Ehud Shapiro laid the initial theoretical foundation for inductive machine learning in a logical setting.[67][68][69] Shapiro built their first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples.[70] The term inductive here refers to philosophical induction, suggesting a theory to explain observed facts, rather than mathematical induction, proving a property for all members of a well-ordered set.
Slide 4

Models
Performing machine learning involves creating a model, which is trained on some training data and then can process additional data to make predictions. Various types of models have been used and researched for machine learning systems.
Slide 5

Artificial neural networks
Figure: An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one artificial neuron to the input of another.
Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules.
Slide 6

An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times.
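
As a rough illustration of the description above, the following sketch computes the output of a tiny layered network: each neuron applies a non-linear function to the weighted sum of its inputs, and signals pass from the input layer to the output layer. The sigmoid activation, the bias term, and all weight values are assumptions chosen for the example; no learning (weight adjustment) is shown.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a non-linear (sigmoid) activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Pass the signal through the layers in order, input layer to output layer."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal


# Hypothetical weights: a hidden layer with 2 neurons, then 1 output neuron.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),
    ([[1.2, -0.7]], [0.05]),
]

output = forward([1.0, 2.0], network)
print(output)
```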
Slide 7

The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games, and medical diagnosis.
Slide 8

Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[71]
Slide 9

Decision trees
Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.
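
To make the branch and leaf terminology concrete, here is a minimal sketch of prediction with a classification tree. The tree is written by hand rather than learned from data, and the feature names, thresholds, and class labels are hypothetical.

```python
# Internal node: (feature, threshold, left_subtree, right_subtree).
# Leaf: a class label (a plain string).
TREE = (
    "temperature_c", 15.0,
    "stay in",                                    # temperature <= 15
    ("humidity_pct", 70.0, "go out", "stay in"),  # temperature > 15
)


def predict(tree, item):
    """Follow branches according to the item's features until a leaf is reached."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if item[feature] <= threshold else right
    return tree


print(predict(TREE, {"temperature_c": 22.0, "humidity_pct": 50.0}))
```

Each path from the root to a leaf corresponds to a conjunction of feature tests, for example "temperature above 15 and humidity at most 70".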
Slide 10

Support-vector machines
Support-vector machines (SVMs), also known as support-vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[72] An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
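
The kernel trick mentioned above can be illustrated with a small numerical check. The sketch below is not an SVM; it only shows, for one assumed kernel (the degree-2 polynomial kernel on 2-D inputs), that evaluating the kernel in the input space gives the same value as a dot product in an explicitly constructed higher-dimensional feature space.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly in the 2-D input space."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit 3-D feature space that corresponds to poly_kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, z)                     # never builds the 3-D vectors
explicit = dot(feature_map(x), feature_map(z))   # builds them, then takes the dot product

print(implicit, explicit)
```

The two values agree up to floating-point rounding, which is why a kernel-based method can work in the larger feature space without ever materializing it.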
Slide 11

Regression analysis
Figure: Illustration of linear regression on a data set.
Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables (features) and an associated output variable. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[73]), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher-dimensional space.
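
For a single input variable, the ordinary least squares line has a simple closed form, sketched below. The optional `ridge` argument is an assumption added for illustration: it applies an L2 penalty to the slope only (the intercept is left unpenalized), which is one common way to set up ridge regression. The data points are made up.

```python
def fit_line(xs, ys, ridge=0.0):
    """Fit y = slope * x + intercept by least squares.

    With ridge > 0, the slope is shrunk by an L2 penalty (intercept unpenalized).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + ridge)
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(fit_line(xs, ys))             # plain ordinary least squares
print(fit_line(xs, ys, ridge=5.0))  # same data, slope shrunk toward zero
```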
Slide 12

Bayesian networks
Figure: A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.
A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
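
The rain, sprinkler, and wet-grass network described above can be queried by brute-force enumeration, as in the sketch below. The probability tables are illustrative assumptions (the source gives only the graph structure), and enumeration is used here for clarity; it is not one of the efficient inference algorithms the text refers to.

```python
from itertools import product

# Assumed conditional probability tables (illustrative values only).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # keyed by (sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet), factorized along the edges of the DAG."""
    p_r = P_RAIN if rain else 1 - P_RAIN
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p_s = p_s if sprinkler else 1 - p_s
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    p_w = p_w if wet else 1 - p_w
    return p_r * p_s * p_w


def prob_rain_given_wet():
    """P(rain | grass is wet), summing the joint over the unobserved sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / denominator


print(round(prob_rain_given_wet(), 4))
```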
Slide 13

Genetic algorithms
A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[74][75] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[76]
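
A minimal sketch of the mutation-and-crossover loop follows. Everything specific in it is an assumption made for the example: the toy problem (maximize the number of 1 bits in a bit string), tournament selection, single-point crossover, carrying the best individual over unchanged, and all parameter values.

```python
import random


def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm that tries to maximize the number of 1 bits."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a genotype = how many bits are set
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    for _ in range(generations):
        def pick():
            # Tournament selection: the fitter of two random individuals wins.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b

        # Keep the current best unchanged so the best fitness never decreases.
        next_gen = [max(population, key=fitness)]
        while len(next_gen) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)           # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]               # bit-flip mutation
            next_gen.append(child)
        population = next_gen

    return max(population, key=fitness)


best = evolve()
print(sum(best), "of", len(best), "bits set")
```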
Slide 14

Training models
Machine learning models usually require a lot of data to perform well. When training a model, one typically needs to collect a large, representative sample of data for the training set. Data in the training set can be as varied as a corpus of text, a collection of images, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Trained models derived from biased data can result in skewed or undesired predictions. Algorithmic bias is a potential result of data not being fully prepared for training.
Slide 15

Federated learning
Federated learning is an adapted form of distributed artificial intelligence for training machine learning models that decentralizes the training process, allowing users' privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices. For example, Gboard uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[77]
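
The idea of training without sending raw data to a server can be sketched as follows, in the style of federated averaging. This is an assumption-laden toy, not how Gboard or any particular system works: the model is a single weight `w` in `y = w * x`, each client runs a few gradient steps on its own data, and the server only ever sees the resulting weights and sample counts. All data values and hyperparameters are made up.

```python
def local_update(weight, data, lr=0.1, epochs=5):
    """Client side: a few gradient steps on y = w * x using only local data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(global_weight, clients):
    """Server side: average the client weights, weighted by local sample count."""
    updates = [(local_update(global_weight, data), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Each inner list stays on its own (hypothetical) device; all points lie on y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)

print(w)
```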
Slide 16

Model assessments
Classification models can be validated by accuracy estimation techniques like the holdout method, which splits the data into a training set and a test set (conventionally 2/3 training and 1/3 test) and evaluates the performance of the trained model on the test set. In comparison, the K-fold cross-validation method randomly partitions the data into K subsets and then performs K experiments, each using 1 subset for evaluation and the remaining K-1 subsets for training the model. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[108]
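
The three resampling schemes above differ only in how the data is divided, which the following sketch shows using the standard library alone. The function names and the fixed random seeds are choices made for this example; model training and scoring are left out.

```python
import random


def holdout_split(data, test_fraction=1 / 3, seed=0):
    """Shuffle once, then reserve a fraction of the data as the test set."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def k_fold_splits(data, k, seed=0):
    """Yield k (train, test) pairs; each item is in exactly one test fold."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def bootstrap_sample(data, seed=0):
    """Draw n items with replacement from a dataset of size n."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]


data = list(range(12))
train, test = holdout_split(data)
print(len(train), len(test))
for fold_train, fold_test in k_fold_splits(data, k=4):
    print(sorted(fold_test))
```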
Slide 17

In addition to overall accuracy, investigators frequently report sensitivity and specificity, meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[109]
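
The four rates are simple ratios of confusion-matrix counts, as the sketch below shows. The counts are hypothetical. The second call uses counts ten times smaller and produces identical rates, which illustrates the point above that the rates alone do not reveal their numerators and denominators.

```python
def rates(tp, fn, tn, fp):
    """Compute the four rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


large_study = rates(tp=40, fn=10, tn=45, fp=5)
small_study = rates(tp=4, fn=1, tn=9, fp=1)

print(large_study)
print(small_study)
```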
Slide 18

Ethics
Machine learning poses a host of ethical questions. Systems that are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[110] For example, in 1988, the UK's Commission for Racial Equality found that St. George's Medical School had been using a computer program trained on data from previous admissions staff, and that this program had denied nearly 60 candidates who were either women or had non-European-sounding names.[97] Using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants by similarity to previous successful applicants.[111][112] Responsible collection of data and documentation of the algorithmic rules used by a system is thus a critical part of machine learning.
Slide 19

AI can be well-equipped to make decisions in technical fields, which rely heavily on data and historical information. These decisions rely on objectivity and logical reasoning.[113] Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[114][115]
Other forms of ethical challenges, not related to personal biases, are seen in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines.[116] This is especially true in the United States, where there is a long-standing ethical dilemma of improving health care while also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is potential for machine learning in health care to provide professionals an additional tool to diagnose, medicate, and plan recovery paths for patients, but this requires these biases to be mitigated.[117]
Slide 20

Hardware
Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[118] By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[119] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[120][121]
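
The two figures quoted above can be related with a short calculation: a 300,000-fold increase corresponds to a certain number of doublings, and multiplying by the doubling time gives the implied time span. This is only an arithmetic check under the assumption of steady exponential growth, using no numbers beyond those in the text.

```python
import math

growth_factor = 300_000   # from the text
doubling_months = 3.4     # from the text

doublings = math.log2(growth_factor)      # how many times compute doubled
implied_months = doublings * doubling_months
implied_years = implied_months / 12

print(f"{doublings:.1f} doublings, about {implied_years:.1f} years")
```

The result is a little over 18 doublings and roughly five years, in line with the 2012 to 2017 range quoted for AlexNet and AlphaZero.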