Train with python. Predict with C++

July 30, 2022


Slide 2

Machine Learning everywhere!
Mobile
Embedded
Automotive
Desktops
Games
Finance
Etc.
Image from [1]

Slide 3

Dream team
Developer
Data Scientist

Slide 4

Dream team – synergy way
Developer
Data Scientist
Research Developer

Slide 5

Dream team – process way
Developer
Data Scientist
Communications

Slide 6

Machine learning sample cases
Energy efficiency prediction
Intrusion detection system
Image classification

Slide 7

Buildings Energy Efficiency
ref: [2]
Input attributes
Relative Compactness
Surface Area
Wall Area
etc.
Outcomes
Heating Load

Slide 8

Regression problem

Slide 9

Regression problem

Slide 10

Regression problem

Slide 11

Quality metric

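Slide 12

Baseline model

#include <vector>

// Common interface for all regression models.
class Predictor {
public:
    using features = std::vector<double>;
    virtual ~Predictor() = default;
    virtual double predict(const features&) const = 0;
};

// Baseline: always predicts the same constant, typically the mean of the training targets.
class MeanPredictor: public Predictor {
public:
    explicit MeanPredictor(double mean) : mean_{mean} {}
    double predict(const features&) const override { return mean_; }
protected:
    double mean_;
};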
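The constant handed to MeanPredictor would normally be computed from the training targets, and the error of that constant is the number every later model has to beat. The sketch below shows one way to do both. The slide's actual quality metric is not preserved in this transcript, so root-mean-square error is an assumption here, and `mean_of` and `rmse_of_constant` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper: arithmetic mean of the training targets.
double mean_of(const std::vector<double>& y) {
    assert(!y.empty());
    return std::accumulate(y.begin(), y.end(), 0.0) / static_cast<double>(y.size());
}

// Hypothetical helper: RMSE of a constant prediction against the targets.
// RMSE is an assumed metric; the slide's own metric is not shown.
double rmse_of_constant(const std::vector<double>& y, double prediction) {
    assert(!y.empty());
    double sum_sq = 0.0;
    for (double v : y) {
        const double err = v - prediction;
        sum_sq += err * err;
    }
    return std::sqrt(sum_sq / static_cast<double>(y.size()));
}
```

For targets {1, 2, 3, 4} the mean is 2.5 and the baseline RMSE is sqrt(1.25); a trained model is only useful if it scores below that.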

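Slide 13

Linear regression

#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

class LinregPredictor: public Predictor {
public:
    LinregPredictor(const std::vector<double>&);
    double predict(const features& feat) const override {
        // coef_[0] is the intercept; coef_[1..] are the feature weights.
        assert(feat.size() + 1 == coef_.size());
        return std::inner_product(feat.begin(), feat.end(), std::next(coef_.begin()), coef_.front());
    }
protected:
    std::vector<double> coef_;
};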
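The coefficient layout is easiest to see on a small worked example. The sketch below is a free-function version of the same computation; `linreg_predict` is a hypothetical name used only for illustration, not part of the presentation's code.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Same computation as predict() on the slide, as a free function:
// coef[0] is the intercept, coef[1..] are the per-feature weights.
double linreg_predict(const std::vector<double>& coef, const std::vector<double>& feat) {
    assert(feat.size() + 1 == coef.size());
    // The intercept is the initial value of the accumulated dot product.
    return std::inner_product(feat.begin(), feat.end(), std::next(coef.begin()), coef.front());
}
```

With coefficients {1, 2, 3} and features {10, 20}, the result is 1 + 2*10 + 3*20 = 81.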

Slide 14

Polynomial regression

Slide 15

Polynomial regression

class PolyPredictor: public LinregPredictor {
public:
    using LinregPredictor::LinregPredictor;
    double predict(const features& feat) const override {
        // Degree-2 expansion: the original features, then all products feat[i]*feat[j] with i <= j.
        features poly_feat{feat};
        const auto m = feat.size();
        poly_feat.reserve(m + m*(m+1)/2);
        for (size_t i = 0; i < m; ++i) {
            for (size_t j = i; j < m; ++j) {
                poly_feat.push_back(feat[i]*feat[j]);
            }
        }
        return LinregPredictor::predict(poly_feat);
    }
};
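The order of the expanded features matters, because the coefficients exported from Python must line up with it one to one. The sketch below isolates the expansion step so the order can be inspected; `expand_degree2` is a hypothetical name. The resulting order (a, b, a², ab, b² for two inputs) should match scikit-learn's `PolynomialFeatures` of degree 2 without the bias column, but which transformer the Python side used is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Degree-2 expansion in the same order as the slide's loop: the original
// features first, then all products feat[i]*feat[j] with i <= j.
std::vector<double> expand_degree2(const std::vector<double>& feat) {
    const std::size_t m = feat.size();
    std::vector<double> out(feat);
    out.reserve(m + m * (m + 1) / 2);  // m linear terms plus m(m+1)/2 products
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = i; j < m; ++j) {
            out.push_back(feat[i] * feat[j]);
        }
    }
    return out;
}
```

For the input {2, 3} this yields {2, 3, 4, 6, 9}, so a model with two raw features needs six coefficients including the intercept.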

Slide 16

Integration testing
you always have a lot of data for testing
use the Python model's output as expected values
beware of floating-point arithmetic problems

// coef and read_features are defined elsewhere in the test suite (not shown on the slide).
TEST(LinregPredictor, compare_to_python) {
    auto predictor = LinregPredictor{coef};
    Predictor::features features;
    double y_pred_expected = 0.0;
    std::ifstream test_data{"../train/test_data_linreg.csv"};
    while (read_features(test_data, features)) {
        test_data >> y_pred_expected;
        auto y_pred = predictor.predict(features);
        EXPECT_NEAR(y_pred_expected, y_pred, 1e-4);
    }
}
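EXPECT_NEAR checks an absolute difference, which works when predictions have a known, moderate magnitude. When expected values span several orders of magnitude, a comparison that also allows a relative tolerance may be more robust. The following is a minimal sketch under that assumption; `nearly_equal` and its default tolerances are hypothetical, not taken from the presentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b agree within an absolute tolerance,
// or within a tolerance relative to the larger of the two magnitudes.
bool nearly_equal(double a, double b, double abs_tol = 1e-9, double rel_tol = 1e-6) {
    const double diff = std::abs(a - b);
    if (diff <= abs_tol) {
        return true;  // covers values close to zero
    }
    return diff <= rel_tol * std::max(std::abs(a), std::abs(b));
}
```

The absolute branch handles values near zero, where a relative check alone would be too strict; the relative branch handles large values, where a fixed 1e-4 may be tighter than double-precision rounding can guarantee.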

Slide 17

Intrusion detection system
input - network traffic features
protocol_type
connection duration
src_bytes
dst_bytes
etc.
Output
normal
network attack
ref: [3]

Slide 18

Classification problem

Slide 19

Quality metrics
Receiver operating characteristic (ROC) curve

Slide 20

Baseline model
always predict the most frequent class
ROC area under the curve = 0.5
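The value 0.5 follows from the pairwise reading of ROC AUC: it is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. A model that gives every example the same score ties on every pair. The sketch below computes AUC this way; `roc_auc` is a hypothetical helper written for illustration, and its quadratic cost makes it unsuitable for large data sets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise ROC AUC: the fraction of (positive, negative) pairs in which the
// positive example has the higher score; ties count as half a win.
// Labels are expected to be 0 (negative) or 1 (positive).
double roc_auc(const std::vector<int>& labels, const std::vector<double>& scores) {
    assert(labels.size() == scores.size());
    double wins = 0.0;
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (labels[i] != 1) continue;
        for (std::size_t j = 0; j < labels.size(); ++j) {
            if (labels[j] != 0) continue;
            ++pairs;
            if (scores[i] > scores[j]) {
                wins += 1.0;
            } else if (scores[i] == scores[j]) {
                wins += 0.5;
            }
        }
    }
    assert(pairs > 0);  // needs at least one example of each class
    return wins / static_cast<double>(pairs);
}
```

A constant score for all examples gives exactly 0.5, while scores that rank every positive above every negative give 1.0.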

Slide 21

Logistic regression

Slide 22

Logistic regression
easy to implement

template <typename T>
auto sigma(T z) {
    return 1/(1 + std::exp(-z));
}

class LogregClassifier: public BinaryClassifier {
public:
    float predict_proba(const features_t& feat) const override {
        // coef_[0] is the intercept; coef_[1..] are the feature weights.
        auto z = std::inner_product(feat.begin(), feat.end(), std::next(coef_.begin()), coef_.front());
        return sigma(z);
    }
protected:
    std::vector<float> coef_;  // element type assumed (float, matching the return type)
};
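A self-contained version makes the computation easy to check by hand. The sketch below repeats `sigma` from the slide and adds two hypothetical free functions, `logreg_predict_proba` and `classify`; the 0.5 decision threshold is an assumption, since the presentation evaluates models by ROC AUC and does not state a threshold.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <numeric>
#include <vector>

template <typename T>
auto sigma(T z) {
    return 1 / (1 + std::exp(-z));
}

// coef[0] is the intercept, coef[1..] are the feature weights.
float logreg_predict_proba(const std::vector<float>& coef, const std::vector<float>& feat) {
    assert(feat.size() + 1 == coef.size());
    const float z = std::inner_product(feat.begin(), feat.end(),
                                       std::next(coef.begin()), coef.front());
    return sigma(z);
}

// Hypothetical decision rule: label 1 when the probability reaches the threshold.
int classify(float proba, float threshold = 0.5f) {
    return proba >= threshold ? 1 : 0;
}
```

With coefficients {0, 1, -1} and features {2, 2} the linear term is 0, so the predicted probability is 0.5.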

Slide 23

Gradient boosting
de facto standard universal method
multiple well-known C++ implementations with Python bindings
XGBoost
LightGBM
CatBoost
each implementation has its own custom model format

Slide 24

CatBoost
C API and C++ wrapper
own build system (ymake)

class CatboostClassifier: public BinaryClassifier {
public:
    CatboostClassifier(const std::string& model_path);
    ~CatboostClassifier() override;
    double predict_proba(const features_t& feat) const override {
        double result = 0.0;
        // No categorical features are passed (nullptr, 0); one output value is requested.
        if (!CalcModelPredictionSingle(model_, feat.data(), feat.size(), nullptr, 0, &result, 1)) {
            throw std::runtime_error{std::string{"CalcModelPredictionSingle error message: "} + GetErrorString()};
        }
        return result;
    }
private:
    ModelCalcerHandle* model_;
};

Slide 25

CatBoost
ROC-AUC = 0.9999

Slide 26

Image classification
Handwritten digits recognizer – MNIST
input – gray-scale pixels 28x28
output – digit on picture (0, 1, … 9)

Slide 27

Multilayer perceptron
Image from: [4]

Slide 28

Quality metrics

Slide 29

Multilayer perceptron

auto MlpClassifier::predict_proba(const features_t& feat) const {
    // Copy the input features into an Eigen vector.
    VectorXf x(feat.size());
    std::copy(feat.begin(), feat.end(), x.data());
    // Hidden layer with sigmoid activation, then softmax over the output classes.
    auto o1 = sigmav(w1_ * x);
    auto o2 = softmax(w2_ * o1);
    return o2;
}
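The helpers `sigmav` and `softmax` are not shown on the slide. The sketch below gives one plausible reading of them, written against `std::vector<float>` instead of Eigen so that it stays self-contained: an elementwise sigmoid and a standard softmax. The `argmax` helper that turns the probabilities into a digit is a hypothetical addition.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Elementwise logistic sigmoid (an assumption about what sigmav computes).
std::vector<float> sigmav(std::vector<float> v) {
    for (float& x : v) {
        x = 1.0f / (1.0f + std::exp(-x));
    }
    return v;
}

// Softmax with the maximum subtracted first, so exp() cannot overflow.
std::vector<float> softmax(std::vector<float> v) {
    assert(!v.empty());
    const float max_v = *std::max_element(v.begin(), v.end());
    float sum = 0.0f;
    for (float& x : v) {
        x = std::exp(x - max_v);
        sum += x;
    }
    for (float& x : v) {
        x /= sum;
    }
    return v;
}

// Hypothetical helper: the predicted digit is the index of the largest probability.
std::size_t argmax(const std::vector<float>& p) {
    assert(!p.empty());
    return static_cast<std::size_t>(
        std::distance(p.begin(), std::max_element(p.begin(), p.end())));
}
```

Subtracting the maximum leaves the softmax result unchanged mathematically but keeps every exponent at or below zero, which matters when the output layer produces large values.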

Slide 30

Convolutional networks
state-of-the-art algorithms in image processing
many C++ implementations with Python bindings
TensorFlow
Caffe
MXNet
CNTK

Slide 31

TensorFlow
C++ API
Bazel build system
Hint: use the prebuilt C API

Slide 32

Conclusion
Don’t be afraid of ML
Try simpler things first
Get the benefits of different languages

Slide 33

References
[1] Andrew Ng, Machine Learning, Coursera
[2] Energy efficiency Data Set
[3] KDD Cup 1999
[4] MNIST training with Multi Layer Perceptron
Code samples
