Deep Learning Framework Caffe (2): Model Training and Usage

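Contents
Deep Learning Framework Caffe (1): Build and Installation
Deep Learning Framework Caffe (2): Model Training and Usage
Deep Learning Framework Caffe (3): Defining Custom Networks with NetSpec
Deep Learning Framework Caffe (4): Visualization and Parameter Extraction
Deep Learning Framework Caffe (5): Converting Models to Other Frameworks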

Updated before 6.23

Training

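The CAFFE_ROOT/tools directory contains the source code (.cpp files, whose names are self-explanatory) for the common operations needed for training and testing. These files are compiled during the build, and the resulting executables are placed in build/tools, as shown in the figure below:

[Figure: executables generated in build/tools]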
  1. Data preparation before training
    here

  2. Training process
    here

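  3. Notes on several files
    xxx_train_test_full.prototxt (network definition used for training and testing)
    xxx_solver.prototxt (solver configuration)
    xxx_iter_xxx.caffemodel (trained weights saved at the given iteration)
    xxx_mean.binaryproto (training-set mean in binaryproto format)
    xxx_mean.npy (training-set mean as a NumPy array, used by the Python interface)
    xxx_classes.txt (note: a table mapping class names to index numbers, usually needed when classifying from Python or C++), for example: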

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
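
As a minimal sketch of how such a class table is typically consumed: the line number of each name (counting from 0) is taken as the class index, and the index of the largest probability is mapped back to a name. The file name voc_classes.txt, the helpers load_class_names and top_prediction, and the probability vector are all hypothetical and only serve the illustration.

```python
import os
import tempfile


def load_class_names(path):
    """Read one class name per line; the line position (from 0) is the class index."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def top_prediction(probs, class_names):
    """Return (class name, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs[best]


# Demo: a throwaway file holding the first four classes from the list above.
demo_path = os.path.join(tempfile.mkdtemp(), "voc_classes.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("aeroplane\nbicycle\nbird\nboat\n")

class_names = load_class_names(demo_path)
# A made-up probability vector standing in for the network's 'prob' output.
label, prob = top_prediction([0.05, 0.10, 0.80, 0.05], class_names)
print(label, prob)
```

The order of the lines must match the label indices used when the training list files were written, otherwise the names will not correspond to the network outputs.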
  1. Caffe directory layout
    Source home page
    ./src
    ./include
    ./docs
    ./python  Python interface library
    ./matlab
    ./models
    ./examples
    ./scripts
    ./tools

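Notes:
a. For the purpose of the three files train.txt, test.txt and val.txt required when running convert_imageset, see here
b. The scripts in the linked posts simply wrap each step of the training procedure in sh scripts. The usual flow is:
convert to LMDB (convert_imageset) -> train (caffe train) -> test (caffe test), and the scripts simplify setting the arguments of these commands. They are not ideal, however: when you retrain, you have to delete the two directories created during LMDB conversion by hand before the scripts will run again. It would be more convenient to merge the scripts into one that deletes and creates the necessary directories automatically.
c. There are many ways to script Caffe training; some open-source algorithms such as Faster R-CNN and DeepID also provide Python training scripts. This article covers only the most basic, native approach. To train Faster R-CNN you can use the training interface provided by its authors directly; the underlying principle is the same (running the programs under tools in order).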
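
The single-script idea from note b could be sketched roughly as follows. This is only an outline under stated assumptions: the directory names (/opt/caffe, /data/images, /data/work), the LMDB names train_lmdb and val_lmdb, and the helper names are hypothetical, and the xxx_ file names are the placeholders used earlier. The demo only prints the commands; the line that would execute them is left commented out.

```python
import os
import shutil
import subprocess


def reset_dirs(dirs):
    """Delete LMDB directories left over from a previous run, if present."""
    for d in dirs:
        if os.path.isdir(d):
            shutil.rmtree(d)


def build_commands(caffe_root, image_root, work_dir, weights, test_iterations=100):
    """Return the command lines for: convert to LMDB -> train -> test."""
    tools = os.path.join(caffe_root, "build", "tools")
    convert = os.path.join(tools, "convert_imageset")
    caffe = os.path.join(tools, "caffe")
    # convert_imageset prepends the root folder to each listed file name,
    # so make sure the root ends with a path separator.
    root = os.path.join(image_root, "")

    def in_work(name):
        return os.path.join(work_dir, name)

    return [
        [convert, "--shuffle", root, in_work("train.txt"), in_work("train_lmdb")],
        [convert, "--shuffle", root, in_work("val.txt"), in_work("val_lmdb")],
        [caffe, "train", "--solver=" + in_work("xxx_solver.prototxt")],
        [caffe, "test",
         "--model=" + in_work("xxx_train_test_full.prototxt"),
         "--weights=" + weights,
         "--iterations=%d" % test_iterations],
    ]


def run_pipeline(commands, stale_dirs):
    """Clean up old LMDB directories, then run each step in order."""
    reset_dirs(stale_dirs)
    for cmd in commands:
        subprocess.check_call(cmd)  # stops at the first failing step


commands = build_commands("/opt/caffe", "/data/images", "/data/work",
                          weights="/data/work/xxx_iter_2000.caffemodel")
for cmd in commands:
    print(" ".join(cmd))
# run_pipeline(commands, ["/data/work/train_lmdb", "/data/work/val_lmdb"])
```

Keeping command construction separate from execution makes it easy to inspect the exact arguments before anything is deleted or trained.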

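  1. Using Caffe from Python
    When Python imports a third-party library, it searches three kinds of locations: the system default third-party library directory (for example /usr/lib/python2.7/dist-packages), the directories listed in the $PYTHONPATH environment variable, and the directory of the script being run. When a script is executed, these directories are collected and assigned to sys.path. You therefore first need to make sure that Caffe's Python interface (in CAFFE_ROOT/python) is on Python's search path. A simple way to add it is to put the following statement in the Python script that calls Caffe: sys.path.insert(0, "CAFFE_ROOT/python")
  2. Using Caffe from C++
    After building from source, create a new project and configure its include and library paths according to the locations of Caffe's headers and libraries.
    Include paths:
CAFFE_ROOT/include
CAFFE_ROOT/src
CUDA_ROOT/include
/usr/include (headers of other dependencies: boost, protobuf, etc.)

Library paths:

CAFFE_ROOT/build/lib
CUDA_ROOT/lib64
/usr/lib  (libraries of other dependencies: boost, protobuf, etc.)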
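
For the Python item above, the sys.path adjustment might look like the following sketch. Reading CAFFE_ROOT from an environment variable and the "~/caffe" fallback are assumptions made for illustration, and add_caffe_to_path is a hypothetical helper name.

```python
import os
import sys


def add_caffe_to_path(caffe_root):
    """Put CAFFE_ROOT/python at the front of sys.path (only once) and return it."""
    python_dir = os.path.join(os.path.abspath(caffe_root), "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir


# "~/caffe" is only a placeholder default for when CAFFE_ROOT is not set.
caffe_root = os.environ.get("CAFFE_ROOT", os.path.expanduser("~/caffe"))
caffe_python = add_caffe_to_path(caffe_root)
print(caffe_python in sys.path)

# After this point `import caffe` can be resolved, provided pycaffe was built.
```

This must run before the first `import caffe` in the script; setting $PYTHONPATH in the shell achieves the same result without touching the code.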

Usage

for python

  1. See reference post 1 and reference post 2

  2. I wrote my own wrapper; the code is as follows:

import os
from functools import partial

import caffe
import cv2
import numpy as np

from synset_words import WordCode


class CnnClassify(object):
    def __init__(self, path='/trainedCaffeData/',
                 **kwargs):
        """
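        :param path: directory that contains the model files
        :param kwargs: model_file, params_file, mean_file and synset_file
            (file names relative to path), plus the optional use_gpu (bool)
            and img_size (height, width)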
        """
        print(os.path.abspath(path))
        if kwargs.get("use_gpu", False):
            caffe.set_mode_gpu()  # gpu or cpu
            caffe.set_device(0)
        else:
            caffe.set_mode_cpu()

        self.img_size = kwargs.get("img_size", (48, 48))
        join_func = partial(os.path.join, path)
        for k in ["model_file", "params_file", "mean_file", "synset_file"]:
            kwargs[k] = join_func(kwargs[k])

        self.net = caffe.Net(kwargs["model_file"],  # defines the structure of the model
                             kwargs["params_file"],  # contains the trained weights
                             caffe.TEST)  # use test mode (e.g., don't perform dropout)

        self.__setReadFormat(kwargs["mean_file"])
        self.synset_words = WordCode(filename=kwargs["synset_file"])

    def __setReadFormat(self, model_mean):
        """
        :param model_mean: path to the .npy file holding the training-set mean
        """
        print(self.net.blobs['data'].data.shape)
        self.transformer = caffe.io.Transformer({'data': self.net.blobs['data'].data.shape})
        # load the mean file and compute the per-channel (BGR) mean
        mu = np.load(model_mean).mean(1).mean(1)

        # move the channel axis first: H x W x C -> C x H x W
        self.transformer.set_transpose('data', (2, 0, 1))
        self.transformer.set_mean('data', mu)
        self.transformer.set_raw_scale('data', 255)  # rescale pixel values from [0, 1] to [0, 255]

        # swap channels from RGB to BGR
        self.transformer.set_channel_swap('data', (2, 1, 0))

    def predict_batch(self, img_arr):  # , tableList

        self.net.blobs['data'].reshape(len(img_arr), 3, self.img_size[0], self.img_size[1])  # image size is 48x48
        img_inputs = np.zeros((len(img_arr), 3, self.img_size[0], self.img_size[1]))
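        for ind, img_data in enumerate(img_arr):
            # NOTE: load_image_arr does not appear to be part of stock caffe.io;
            # it is presumably the author's own helper for in-memory image arrays.
            img_inputs[ind, :, :, :] = self.transformer.preprocess('data', caffe.io.load_image_arr(img_data))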

        self.net.blobs['data'].data[...] = img_inputs  # self.transformer.preprocess('data', img_input)  # read image
        out = self.net.forward()

        predictions = []
        for i in range(0, len(img_arr)):
            output_prob = out['prob'][i]  # the output probability vector for the i-th image in the batch
            pred_label = output_prob.argmax()
            word = self.synset_words.getUnicode(pred_label)
            predictions.append({"Label": word, "Prob": output_prob[pred_label]})
            # print "recognized", pred_label
        return predictions  # word, output_prob[pred_label]

    def predict(self, img_arr):
        self.net.blobs['data'].reshape(1, 3, self.img_size[0], self.img_size[1])  # image size is 48x48
        img_input = self.transformer.preprocess('data', caffe.io.load_image_arr(img_arr))

        self.net.blobs['data'].data[...] = img_input  # self.transformer.preprocess('data', img_input)  # read image
        output_prob = self.net.forward()['prob'][0]
        pred_label = output_prob.argmax()
        word = self.synset_words.getUnicode(pred_label)
        return word, output_prob[pred_label]


def testCaffeCnn():
    import glob
    test = CnnClassify(path='E:/TibetOCR/Models/tibet_0323/',
                       model_file='tibet_full_train_test.prototxt',
                       params_file='tibet_full_iter_2000.caffemodel',
                       mean_file='ocr_mean.npy',
                       synset_file='synsetWords_79.pkl',
                       use_gpu=True,
                       img_size=(48, 48)  # the constructor reads the "img_size" key
                       )

    imageBasePath = 'E:/TibetOCR/Data/samples/*.jpg'
    imageList = glob.glob(imageBasePath)

    predict_labels = []
    for imagefile in imageList:
        # imagefile_abs = os.path.join(imageBasePath, imagefile)
        im = cv2.imread(imagefile)

        label = test.predict(im)
        print("Prediction: {}, confidence: {}".format(label[0], label[1]))
        cv2.imshow('im', im)
        cv2.waitKey(0)
        predict_labels.append(label)
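
To make the Transformer settings in __setReadFormat more concrete, here is a NumPy-only sketch of the same steps, in the order I understand caffe.io.Transformer.preprocess to apply them (transpose, channel swap, raw scale, mean subtraction). Resizing is left out, and the function name preprocess_like_transformer, the 2x2 test image and the mean values are made up for illustration.

```python
import numpy as np


def preprocess_like_transformer(img_rgb01, mean_bgr):
    """Apply the four Transformer settings used above to one image.

    img_rgb01: float array in [0, 1], RGB channel order, shape (H, W, 3).
    mean_bgr:  per-channel mean in BGR order, three values.
    """
    x = img_rgb01.astype(np.float32)
    x = x.transpose(2, 0, 1)   # set_transpose: H x W x C -> C x H x W
    x = x[[2, 1, 0], :, :]     # set_channel_swap: RGB -> BGR
    x = x * 255.0              # set_raw_scale: [0, 1] -> [0, 255]
    mean = np.asarray(mean_bgr, dtype=np.float32).reshape(3, 1, 1)
    return x - mean            # set_mean: subtract the per-channel mean


# A 2x2 pure-red image and a made-up BGR mean.
img = np.zeros((2, 2, 3), dtype=np.float32)
img[..., 0] = 1.0  # R channel set to 1
out = preprocess_like_transformer(img, mean_bgr=(10.0, 20.0, 30.0))
print(out.shape)     # channels first
print(out[:, 0, 0])  # B, G, R values of the top-left pixel: -10, -20, 225
```

The mean is subtracted after scaling to [0, 255], which is why the mean file must hold values on the 0 to 255 scale and in BGR order.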

for C++

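  1. The C++ classification example shipped with Caffe is CAFFE_ROOT/examples/cpp_classification/classification.cpp

  2. Based on existing posts, I wrapped a C++ Classifier class; its declaration and implementation are as follows: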

//classifier.h
#pragma once

#include <algorithm>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

#include "caffe/caffe.hpp"
#include "caffe/util/io.hpp"
#include "caffe/blob.hpp"
#include "opencv2/opencv.hpp"
#include "boost/smart_ptr/shared_ptr.hpp"


// Caffe's required library
//#pragma comment(lib, "caffe.lib")


using namespace boost;
using namespace caffe;


/* Pair (label, confidence) representing a prediction. */
typedef std::pair<std::string, float> Prediction;
//#define CPU_ONLY      // run on the CPU only

class Classifier   
{
public:
    Classifier();
    Classifier(const std::string& model_file,
        const std::string& trained_file,
        const std::string& mean_file,
        const std::string& label_file);
    
    ~Classifier();

    //string classFaces(Rect face, Mat frame, int *w, string name);
    int LoadModelFile(std::string caffePath);
    
    Prediction Classify(const cv::Mat& img);
    std::vector<Prediction> ClassifyBatch(std::vector< cv::Mat>& img_batch);
private:
    void SetMean(const std::string& mean_file);

    int InitCaffeNet();

    std::vector<float> Predict(const cv::Mat& img);
    

    void WrapInputLayer(std::vector<cv::Mat>* input_channels);
    
    void Preprocess(const cv::Mat& img,
        std::vector<cv::Mat>* input_channels);

    std::string model_file_;
    std::string trained_file_;
    std::string mean_file_;
    std::string label_file_;
    boost::shared_ptr<Net<float> > net_;
    cv::Size input_geometry_;
    int num_channels_;
    cv::Mat mean_;
    std::vector<string> labels_;    
};

//classifier.cpp
#include "include/Classifier.h"
#include <iomanip>
#include <algorithm>
#include <time.h>
using namespace caffe;
/* Return the index of the largest value in vector v. */
int Argmax(std::vector<float>& v) {
    
    std::vector<float>::iterator biggest = std::max_element(v.begin(), v.end());
    return std::distance(v.begin(), biggest);
}
void imagePadding(cv::Mat src, cv::Mat &dst)
{
    int maxEdge = MAX(src.cols, src.rows);
    int paddingWidth = abs(src.cols - src.rows);
    int extraPaddingWidth = MIN(src.cols, src.rows) / 2;
    int xPaddingWidth = abs(src.cols - maxEdge) / 2 + extraPaddingWidth;
    int yPaddingWidth = abs(src.rows - maxEdge) / 2 + extraPaddingWidth;
    copyMakeBorder(src.clone(), dst, yPaddingWidth, yPaddingWidth, xPaddingWidth, xPaddingWidth, cv::BORDER_CONSTANT, cv::Scalar(255, 255, 255));

    //imshow("src", src);
    //imshow("dst", dst);
    //waitKey(0);
}
Classifier::~Classifier(){  }
Classifier::Classifier(){ }
int Classifier::LoadModelFile(std::string caffePath)
{
    model_file_ = caffePath + "tibet_full_train_test.prototxt";
    trained_file_ = caffePath + "tibet_full.caffemodel";
    mean_file_ = caffePath + "Tibet_mean.binaryproto";
    label_file_ = caffePath + "synsetWords.txt";

    // Returns 1 on success; a missing file aborts inside InitCaffeNet() via CHECK.
    return InitCaffeNet();
}

int Classifier::InitCaffeNet()
{

#ifdef CPU_ONLY
    Caffe::set_mode(Caffe::CPU);
#else
    Caffe::set_mode(Caffe::GPU);
#endif

    /* Load the network. */
    net_.reset(new Net<float>(model_file_, TEST));
    net_->CopyTrainedLayersFrom(trained_file_);

    CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
    CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

    Blob<float>* input_layer = net_->input_blobs()[0];
    int num_inputs = net_->num_inputs();
    int num_outputs = net_->num_outputs();


    num_channels_ = input_layer->channels();


    CHECK(num_channels_ == 3 || num_channels_ == 1) << "Input layer should have 1 or 3 channels.";
    input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

    /* Load the binaryproto mean file. */
    SetMean(mean_file_);

    /* Load labels. */
    std::ifstream labels(label_file_.c_str());
    CHECK(labels) << "Unable to open labels file " << label_file_;
    string line;
    while (std::getline(labels, line))
        labels_.push_back(string(line));

    Blob<float>* output_layer = net_->output_blobs()[0];

    CHECK_EQ(labels_.size(), output_layer->channels())
        << "Number of labels is different from the output layer dimension.";
    return 1;
}

Classifier::Classifier(const std::string& model_file,
                        const std::string& trained_file,
                        const std::string& mean_file,
                        const std::string& label_file)
{

    model_file_ = model_file;
    trained_file_ = trained_file;
    mean_file_ = mean_file;
    label_file_ = label_file;
    InitCaffeNet();
}



static bool PairCompare(const std::pair<float, int>& lhs,
    const std::pair<float, int>& rhs) 
{
    return lhs.first > rhs.first;
}

/* Return the top-1 prediction as (label, confidence). */
Prediction Classifier::Classify(const cv::Mat& img) {

    std::vector<float> output = Predict(img);
    int maxIdx = Argmax(output);
    //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
    return std::make_pair(labels_[maxIdx],output[maxIdx]);
    //stringstream stream;
    //stream << maxIdx;
    //return std::make_pair(stream.str(), output[maxIdx]);
}


/* Load the mean file in binaryproto format. */
void Classifier::SetMean(const std::string& mean_file) {

    Blob<float> mean_blob;
    BlobProto blob_proto;
    float *mean_ptr;
    unsigned int num_pixel;

    bool succeed = ReadProtoFromBinaryFile(mean_file, &blob_proto);
    if (succeed)
    {
        mean_blob.FromProto(blob_proto);
        CHECK_EQ(mean_blob.channels(), num_channels_)
            << "Number of channels of mean file doesn't match input layer.";


        num_pixel = mean_blob.count(); /* NCHW=1x3x256x256=196608 */
        //mean_ptr = (float *)mean_blob.cpu_data();
        mean_ptr = mean_blob.mutable_cpu_data();
        
        /* The format of the mean file is planar 32-bit float BGR or grayscale. */
        std::vector<cv::Mat> channels;
        for (int i = 0; i < num_channels_; ++i) 
        {
            /* Extract an individual channel. */
            cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, mean_ptr);
            //cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1);
            //memcpy(channel.data, data, mean_blob.width()*mean_blob.height()*sizeof(float));
            channels.push_back(channel);

            //imshow("img", channel);
            //waitKey(0);

            mean_ptr += mean_blob.height() * mean_blob.width();
        }
        
        /* Merge the separate channels into a single image. */
        //cv::Mat mean(mean_blob.height(), mean_blob.width(), CV_32FC1);//;//
        cv::Mat mean;
        cv::merge(channels, mean);
        
        /* Compute the global mean pixel value and create a mean image
        * filled with this value. */
        cv::Scalar channel_mean = cv::mean(mean);//mean);//channels[0]
        mean_ = cv::Mat(input_geometry_, mean.type(), channel_mean);
        
        //imshow("img1", mean_);
        //waitKey(0);
    }


}

std::vector<float> Classifier::Predict(const cv::Mat& img) 
{
    Blob<float>* input_layer = net_->input_blobs()[0];
    input_layer->Reshape(1, num_channels_,input_geometry_.height, input_geometry_.width);
    /* Forward dimension change to all layers. */
    net_->Reshape();

    std::vector<cv::Mat> input_channels;
    WrapInputLayer(&input_channels);

    Preprocess(img, &input_channels);
    net_->Forward(0);

    Blob<float>* output_layer = net_->output_blobs()[0];
    const float* begin = output_layer->cpu_data();
    const float* end = begin + output_layer->channels();
    return std::vector<float>(begin, end);
}


std::vector<Prediction> Classifier::ClassifyBatch(std::vector< cv::Mat>& img_batch)
{
    Blob<float>* input_layer = net_->input_blobs()[0];
    input_layer->Reshape(img_batch.size(), num_channels_, input_geometry_.height, input_geometry_.width);
    /* Forward dimension change to all layers. */
    net_->Reshape();

    std::vector<cv::Mat> input_data;
    
    WrapInputLayer(&input_data);
    //clock_t st_tm = clock();
    std::vector<cv::Mat>::iterator it = input_data.begin();
    for (int i = 0; i < img_batch.size(); i++)
    {
        std::vector<cv::Mat>tmp_channls(3);
        tmp_channls.assign(input_data.begin() + i*num_channels_, input_data.begin() + (i + 1)*num_channels_);
        Preprocess(img_batch[i], &tmp_channls);
    }
    //std::cout << "do imgPreprocess cost time : " << (double)(clock() - st_tm) / CLOCKS_PER_SEC << std::endl;
    net_->Forward(0);
    Blob<float>* output_layer = net_->output_blobs()[0];

    std::vector<Prediction>predictions;

    /* Copy the output layer to a std::vector */
    for (int i = 0; i < img_batch.size(); i++)
    {
        const float* begin = output_layer->cpu_data()+i*output_layer->channels();
        const float* end = begin + output_layer->channels();
        std::vector<float> output = std::vector<float>(begin, end);
        int maxIdx = Argmax(output);
        //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
        predictions.push_back(std::make_pair(labels_[maxIdx], output[maxIdx]));
    }

    return predictions;
}
/* Wrap the input layer of the network in separate cv::Mat objects
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
    Blob<float>* input_layer = net_->input_blobs()[0];

    int width = input_layer->width();
    int height = input_layer->height();
    float* input_data = input_layer->mutable_cpu_data();
    for (int j = 0; j < input_layer->num(); j++)
    {
        for (int i = 0; i < input_layer->channels(); ++i) {
            cv::Mat channel(height, width, CV_32FC1, input_data);
            input_channels->push_back(channel);
            input_data += width * height;
        }
    }

}

void Classifier::Preprocess(const cv::Mat& img,
    std::vector<cv::Mat>* input_channels) {
    /* Convert the input image to the input image format of the network. */
    cv::Mat img_padded=img;
    //imagePadding(img, img_padded);

    cv::Mat sample;
    if (img_padded.channels() == 3 && num_channels_ == 1)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGR2GRAY);
    else if (img_padded.channels() == 4 && num_channels_ == 1)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2GRAY);
    else if (img_padded.channels() == 4 && num_channels_ == 3)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2BGR);
    else if (img_padded.channels() == 1 && num_channels_ == 3)
        cv::cvtColor(img_padded, sample, cv::COLOR_GRAY2BGR);
    else
        sample = img_padded;

    cv::Mat sample_resized;
    if (sample.size() != input_geometry_)
        cv::resize(sample, sample_resized, input_geometry_);
    else
        sample_resized = sample;

    cv::Mat sample_float;
    if (num_channels_ == 3)
        sample_resized.convertTo(sample_float, CV_32FC3);
    else
        sample_resized.convertTo(sample_float, CV_32FC1);

    cv::Mat sample_normalized;
    cv::subtract(sample_float, mean_, sample_normalized);

    /* This operation will write the separate BGR planes directly to the
    * input layer of the network because it is wrapped by the cv::Mat
    * objects in input_channels. */
    cv::split(sample_normalized, *input_channels);

    //CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
    //  == net_->input_blobs()[0]->cpu_data())
    //  << "Input channels are not wrapping the input layer of the network.";
}

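To use it, include classifier.h in your own project, instantiate a Classifier object where you need it, and call its Classify method.
A problem that may occur in your own project (especially likely on Windows):

F0519 14:54:12.494139 14504 layer_factory.hpp:77] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Convolution (known types: MemoryData)

One workaround is to create another header file (cafferegister.h) that declares or registers the unknown layer types, as follows: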

#ifndef CAFFEREGISTER_H
#define CAFFEREGISTER_H
#include "caffe/common.hpp"
#include "caffe/layers/data_layer.hpp"
#include "caffe/layers/input_layer.hpp"
#include "caffe/layers/inner_product_layer.hpp"
#include "caffe/layers/conv_layer.hpp"
#include "caffe/layers/relu_layer.hpp"
#include "caffe/layers/pooling_layer.hpp"
#include "caffe/layers/softmax_layer.hpp"
#include "caffe/layers/lrn_layer.hpp"
#include "caffe/layers/dropout_layer.hpp"

namespace caffe
{
    extern INSTANTIATE_CLASS(DataLayer);
    //REGISTER_LAYER_CLASS(Data);
    extern INSTANTIATE_CLASS(InputLayer);
    //REGISTER_LAYER_CLASS(Input);
    extern INSTANTIATE_CLASS(InnerProductLayer);
    extern INSTANTIATE_CLASS(DropoutLayer);
    //REGISTER_LAYER_CLASS(Dropout);
    extern INSTANTIATE_CLASS(ConvolutionLayer);

    extern INSTANTIATE_CLASS(ReLULayer);

    extern INSTANTIATE_CLASS(PoolingLayer);

    extern INSTANTIATE_CLASS(LRNLayer);

    extern INSTANTIATE_CLASS(SoftmaxLayer);
#ifdef WINDOWS
    REGISTER_LAYER_CLASS(Convolution);
    REGISTER_LAYER_CLASS(ReLU);
    REGISTER_LAYER_CLASS(Pooling);
    REGISTER_LAYER_CLASS(Softmax);
    REGISTER_LAYER_CLASS(LRN);
#endif
}

#endif
