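1. Prepare the dataset
Preparing your own data
mmdetection supports datasets in COCO format and VOC format. The two are described in turn below.
COCO dataset
The official documentation recommends storing a COCO dataset in the following directory layout, using coco2017 as an example: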
mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
It is recommended to create the data folder with a symbolic link. The steps are:
cd mmdetection
mkdir data
ln -s $COCO_ROOT data
Here $COCO_ROOT must be replaced with the root directory of your COCO dataset.
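Before training it can help to confirm that the link really produced the layout shown above. The following is only a minimal sketch: the helper name missing_coco_dirs is my own, and the list of sub-directories is copied from the coco2017 tree above, so adjust it if your dataset uses other folder names.

```python
from pathlib import Path

# Sub-directories taken from the coco2017 tree shown above.
EXPECTED_COCO_DIRS = ("annotations", "train2017", "val2017", "test2017")


def missing_coco_dirs(coco_root):
    """Return the expected sub-directories that do not exist under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_COCO_DIRS if not (root / name).is_dir()]
```

Calling missing_coco_dirs("data/coco") from the mmdetection folder returns an empty list when all four folders are reachable through the link.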
VOC dataset
As with the COCO dataset, store the VOC dataset in the following directory layout, using VOC2007 as an example:
mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   │   ├── Annotations
│   │   │   ├── JPEGImages
│   │   │   ├── ImageSets
│   │   │   │   ├── Main
│   │   │   │   │   ├── test.txt
│   │   │   │   │   ├── trainval.txt
Again, a symbolic link is the recommended way to create it:
cd mmdetection
mkdir data
ln -s $VOC2007_ROOT data/VOCdevkit
Here $VOC2007_ROOT must be replaced with the root directory of your VOC2007 dataset.
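A quick consistency check on the VOC layout can catch missing files early. The sketch below is my own illustration, not part of mmdetection: the helper name missing_voc_files is hypothetical, and it assumes that each line of ImageSets/Main/trainval.txt holds one image id and that images use the .jpg extension.

```python
from pathlib import Path


def missing_voc_files(voc_root, split="trainval"):
    """List image ids from ImageSets/Main/<split>.txt that lack an .xml or .jpg file."""
    root = Path(voc_root)
    ids = (root / "ImageSets" / "Main" / f"{split}.txt").read_text().split()
    missing = []
    for image_id in ids:
        has_xml = (root / "Annotations" / f"{image_id}.xml").is_file()
        has_jpg = (root / "JPEGImages" / f"{image_id}.jpg").is_file()  # assumed extension
        if not (has_xml and has_jpg):
            missing.append(image_id)
    return missing
```

For the layout above the call would be missing_voc_files("data/VOCdevkit/VOC2007"); an empty list means every listed id has both files.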
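2. Modify some config files and code files
Modify the config file. The config files are under the configs folder; choose one according to your own situation.
I chose configs/mask_rcnn_r101_fpn_1x.py; the changes I made are annotated in the comments below. If you choose Faster R-CNN, adjust its config to your own situation in the same way: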
# model settings
model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet101',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        # Number of classes. The default is 81 because COCO has 80 classes + 1 (background).
        # My dataset has only 5 classes, so 6 including the background.
        num_classes=6,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
    mask_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=14, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    mask_head=dict(
        type='FCNMaskHead',
        num_convs=4,
        in_channels=256,
        conv_out_channels=256,
        # Same as in bbox_head: 5 classes + 1 background = 6 (the default is 81).
        num_classes=6,
        loss_mask=dict(
            type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        mask_size=28,
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_thr=0.5),
        max_per_img=100,
        mask_thr_binary=0.5))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    # Images trained per GPU; batch_size = gpu_num (GPUs used for training) * imgs_per_gpu
    imgs_per_gpu=2,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2014.json',
        img_prefix=data_root + 'train2014/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2014.json',
        img_prefix=data_root + 'val2014/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2014.json',
        img_prefix=data_root + 'val2014/',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)  # lr = 0.00125 * batch_size
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')  # uncomment this line to enable tensorboard
    ])
# yapf:enable
# runtime settings
total_epochs = 12  # total number of epochs
dist_params = dict(backend='nccl')
log_level = 'INFO'
# Training logs and weights are saved under this path during training; see below for details.
work_dir = './work_dirs/mask_rcnn_r101_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
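Note: the default learning rate in the config files is for 8 GPUs and 2 img/gpu (batch size = 8 * 2 = 16). According to the linear scaling rule, if you use a different number of GPUs or a different img/gpu, set the learning rate in proportion to the batch size. For example, lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu.
In the config above, imgs_per_gpu is the number of images trained on each GPU, so batch_size = gpu_num (number of GPUs used for training) * imgs_per_gpu.
This gives lr = 0.00125 * batch_size.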
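The scaling rule above is simple arithmetic, so it can be written down as a small helper. This is just a sketch of the rule as stated in the note (base lr 0.02 at batch size 16); the function name scaled_lr is my own and not an mmdetection API.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate linearly with the total batch size."""
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size
```

For a single GPU with imgs_per_gpu=2, scaled_lr(1, 2) works out to 0.0025, which is the value that would go into optimizer = dict(..., lr=...).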
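A few more places need to be changed:
Define the dataset classes. The place to modify is mmdetection/mmdet/datasets/coco.py: replace the CLASSES tuple with the tuple of classes of your own dataset. For example: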
CLASSES = ('bicycle', 'car', 'bus', 'person','tvmonitor')
Next, modify the class list in coco_classes in mmdetection/mmdet/core/evaluation/class_names.py. This determines the class names shown in the result images during testing later. For example:
def coco_classes():
    return [
        'bicycle', 'car', 'bus', 'person', 'tvmonitor'
    ]
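Since the class names now live in three places (the annotation file, CLASSES and coco_classes), a mismatch is easy to introduce. The sketch below reads the names from a COCO-style annotation file and compares them with a CLASSES tuple. The helper names are hypothetical, it assumes the standard top-level "categories" list with a "name" field, and it only compares the set of names, not their order.

```python
import json


def coco_category_names(ann_file):
    """Return the category names of a COCO-style annotation file, in file order."""
    with open(ann_file, "r", encoding="utf-8") as f:
        categories = json.load(f)["categories"]
    return tuple(category["name"] for category in categories)


def check_classes(ann_file, classes):
    """Return (names_match, num_classes) for a CLASSES tuple.

    num_classes counts the background as one extra class, as in the config above.
    """
    names = coco_category_names(ann_file)
    return set(names) == set(classes), len(classes) + 1
```

With the five classes above and a matching annotation file, check_classes returns (True, 6), which agrees with num_classes=6 in the config.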
3. Training
Single-GPU training:
The first training run downloads the pretrained model, as shown below:
(open-mmlab) bubble@XPS-8930:~/mmdetection/0827/mmdetection$ python tools/train.py configs/mask_rcnn_r101_fpn_1x.py
2019-09-16 22:14:50,684 - INFO - Distributed training: False
2019-09-16 22:14:51,399 - INFO - load model from: torchvision://resnet101
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /home/bubble/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth
100.0%
Log output during training:
(open-mmlab) bubble@XPS-8930:~/mmdetection/0827/mmdetection$ python tools/train.py configs/mask_rcnn_r101_fpn_1x.py
2019-09-18 21:31:10,284 - INFO - Distributed training: False
2019-09-18 21:31:11,067 - INFO - load model from: torchvision://resnet101
2019-09-18 21:31:13,031 - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
loading annotations into memory...
Done (t=0.17s)
creating index...
index created!
2019-09-18 21:31:23,956 - INFO - Start running, host: bubble@XPS-8930, work_dir: /home/bubble/mmdetection/0827/mmdetection/work_dirs/mask_rcnn_r101_fpn_1x
2019-09-18 21:31:23,956 - INFO - workflow: [('train', 1)], max: 12 epochs
2019-09-18 21:31:46,277 - INFO - Epoch [1][50/8001] lr: 0.00797, eta: 11:53:25, time: 0.446, data_time: 0.012, memory: 3241, loss_rpn_cls: 0.2262, loss_rpn_bbox: 0.0602, loss_cls: 0.7983, acc: 94.5195, loss_bbox: 0.0827, loss_mask: 0.6030, loss: 1.7704
2019-09-18 21:32:05,929 - INFO - Epoch [1][100/8001] lr: 0.00931, eta: 11:10:49, time: 0.393, data_time: 0.004, memory: 3241, loss_rpn_cls: 0.1831, loss_rpn_bbox: 0.0712, loss_cls: 0.5860, acc: 94.6523, loss_bbox: 0.1161, loss_mask: 0.4851, loss: 1.4414
..................
2019-09-19 08:42:04,113 - INFO - Epoch [12][7950/8001] lr: 0.00020, eta: 0:00:21, time: 0.427, data_time: 0.005, memory: 3290, loss_rpn_cls: 0.0112, loss_rpn_bbox: 0.0188, loss_cls: 0.1462, acc: 95.1758, loss_bbox: 0.0890, loss_mask: 0.1974, loss: 0.4626
2019-09-19 08:42:25,439 - INFO - Epoch [12][8000/8001] lr: 0.00020, eta: 0:00:00, time: 0.427, data_time: 0.004, memory: 3290, loss_rpn_cls: 0.0106, loss_rpn_bbox: 0.0215, loss_cls: 0.1168, acc: 95.9844, loss_bbox: 0.0715, loss_mask: 0.1791, loss: 0.3995
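If you want to pull numbers out of this text log yourself, the lines are regular enough for a regular expression. The sketch below is written against the line format shown above only; parse_train_line is a hypothetical helper of mine, and non-numeric fields such as eta are skipped on purpose.

```python
import re

# "Epoch [1][50/8001]" -> epoch, iteration, iterations per epoch
HEADER = re.compile(r"Epoch \[(\d+)\]\[(\d+)/(\d+)\]")
# "name: 0.123" pairs whose value is a plain number ended by a comma or end of line
FIELD = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)(?=,|$)")


def parse_train_line(line):
    """Parse one training log line into a dict, or return None for other lines."""
    header = HEADER.search(line)
    if header is None:
        return None
    epoch, iteration, total = map(int, header.groups())
    record = {"epoch": epoch, "iter": iteration, "iters_per_epoch": total}
    for key, value in FIELD.findall(line[header.end():]):
        record[key] = float(value)
    return record
```

Applied to the first Epoch line above, the record contains epoch 1, iter 50, lr 0.00797 and loss 1.7704, among the other numeric fields.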
During training, the whole mmdetection folder looks like this:
.
├── build
├── checkpoints
├── configs
├── data
├── demo
├── docker
├── docs
├── LICENSE
├── mmdet
├── mmdet.egg-info
├── README.md
├── requirements.txt
├── setup.py
├── tests
├── tools
└── work_dirs
Where the logs and weights are saved during training:
work_dirs/
└── mask_rcnn_r101_fpn_1x
    ├── 20190918_213123.log
    ├── 20190918_213123.log.json
    ├── epoch_10.pth
    ├── epoch_11.pth
    ├── epoch_12.pth
    ├── epoch_1.pth
    ├── epoch_2.pth
    ├── epoch_3.pth
    ├── epoch_4.pth
    ├── epoch_5.pth
    ├── epoch_6.pth
    ├── epoch_7.pth
    ├── epoch_8.pth
    ├── epoch_9.pth
    └── latest.pth -> epoch_12.pth
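Note that the listing above is in alphabetical order, so epoch_10.pth comes before epoch_1.pth. If a script needs the checkpoints in training order, sort by the epoch number instead of by name. A minimal sketch, assuming the epoch_N.pth naming shown above (checkpoints_by_epoch is my own helper name):

```python
import re
from pathlib import Path

EPOCH_FILE = re.compile(r"epoch_(\d+)\.pth$")


def checkpoints_by_epoch(work_dir):
    """Return (epoch, path) pairs for epoch_N.pth files, sorted by epoch number."""
    found = []
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = EPOCH_FILE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda pair: pair[0])
```

The last element of checkpoints_by_epoch("work_dirs/mask_rcnn_r101_fpn_1x") is then the same file that latest.pth points to after a finished run.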
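4. Visualizing the loss values and accuracy during training
4.1 tensorboard
To enable tensorboard, remember to uncomment dict(type='TensorboardLoggerHook') in the config file.
Run the following command in a new terminal: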
tensorboard --logdir=path --port=8090
# --port=8090 sets a port of your own choice; without --port the default port is 6006
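4.1.1 If mmdetection runs locally, open the following link directly in the browser on your PC (the port is the one passed to --port above, or 6006 if --port was omitted)
http://127.0.0.1:8090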
4.1.2 Accessing tensorboard on a remote server
bubble@XPS-8930:~$ ssh -p 10005 -L 16006:127.0.0.1:8090 root@192.168.1.162
# 10005 is the port of my own docker system image; -L 16006:127.0.0.1:8090 maps port 16006 on my PC to port 8090 of tensorboard inside docker
# Next, open the following link in the browser on the PC
http://127.0.0.1:16006
The process is shown in the figure below:
The tensorboard result looks like this:
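When the page does not load, the first thing worth checking is whether anything is listening on the port you expect (8090 locally, or 16006 on the PC side of the SSH tunnel). Here is a small sketch using only the standard library; port_is_open is a hypothetical helper, and it only tells you that a TCP connection succeeds, not that tensorboard is the program answering.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the tunnel above you would call port_is_open("127.0.0.1", 16006) on the PC while the ssh session is still running.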
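4.2 The log analysis tool that ships with mmdetection
mmdetection automatically collects log information and stores it under the work_dirs/ directory. The official tools/analyze_logs.py tool makes it easy to visualize the logs, for example plotting the loss values and saving them as a PDF file. The names passed to --keys have to be fields that actually appear in your log (the training log above contains loss_cls, loss_bbox and loss_mask, for instance), so replace loss_reg accordingly if your log has no such field. Run the following command: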
python tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_reg --out losses.pdf
The result is shown in the figure:
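The .log.json file in work_dirs can also be read directly if you want numbers rather than a plot. The sketch below assumes, as far as I can tell from the files produced, that each line of the file is one JSON object with a "mode" field ("train" or "val"), an "epoch" field and the metric fields; mean_per_epoch is my own helper and not part of tools/analyze_logs.py.

```python
import json
from collections import defaultdict


def mean_per_epoch(json_log_path, key="loss"):
    """Average one metric over the training iterations of each epoch."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(json_log_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            # Assumed fields: "mode", "epoch" and the metric named by key.
            if entry.get("mode") != "train" or key not in entry:
                continue
            sums[entry["epoch"]] += entry[key]
            counts[entry["epoch"]] += 1
    return {epoch: sums[epoch] / counts[epoch] for epoch in sorted(sums)}
```

Calling mean_per_epoch("work_dirs/mask_rcnn_r101_fpn_1x/20190918_213123.log.json", key="loss_mask") gives one averaged value per epoch, which is handy for a quick comparison between runs.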
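See also: visualizing training-set and validation-set metrics in mmdetection with tensorboard
See also: visualizing training-set and validation-set metrics in mmdetection with its built-in tools
For details, see the official documentation.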
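5. Testing
There are two ways to test.
5.1 If you only want to look at the results and do not need a quantitative evaluation, you can run the demo.py file from before, but change the checkpoint_file path to a .pth file under work_dirs produced by the training step above. For example: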
checkpoint_file = 'work_dirs/epoch_100.pth'
5.2 Use the test command to run the test and evaluate some metrics
5.2.1 COCO dataset, for example:
python tools/test.py configs/your_config.py work_dirs/your_model_.pth --out ./result/result_100.pkl --eval bbox
The output looks similar to this:
root@64e7169a4f30:/mmdetection# python tools/test.py configs/mask_rcnn_r101_fpn_1x.py work_dirs/mask_rcnn_r101_fpn_1x/latest.pth --eval bbox --out results1.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 67/67, 17.7 task/s, elapsed: 4s, ETA: 0s
writing results to results1.pkl
Starting evaluate bbox
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.04s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.698
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.917
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.826
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.742
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.734
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.744
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.744
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.744
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
5.2.2 VOC dataset, for example:
Modify the voc_eval function in tools/voc_eval.py.
Comment out the following code:
if hasattr(dataset, 'year') and dataset.year == 2007:
    dataset_name = 'voc07'
else:
    dataset_name = dataset.CLASSES
Add the following line just above the eval_map call:
dataset_name = dataset.CLASSES
The voc_eval function after the modification looks like this:
def voc_eval(result_file, dataset, iou_thr=0.5):
    det_results = mmcv.load(result_file)
    gt_bboxes = []
    gt_labels = []
    gt_ignore = []
    for i in range(len(dataset)):
        ann = dataset.get_ann_info(i)
        bboxes = ann['bboxes']
        labels = ann['labels']
        if 'bboxes_ignore' in ann:
            ignore = np.concatenate([
                np.zeros(bboxes.shape[0], dtype=np.bool),
                np.ones(ann['bboxes_ignore'].shape[0], dtype=np.bool)
            ])
            gt_ignore.append(ignore)
            bboxes = np.vstack([bboxes, ann['bboxes_ignore']])
            labels = np.concatenate([labels, ann['labels_ignore']])
        gt_bboxes.append(bboxes)
        gt_labels.append(labels)
    if not gt_ignore:
        gt_ignore = None
    # if hasattr(dataset, 'year') and dataset.year == 2007:
    #     dataset_name = 'voc07'
    # else:
    #     dataset_name = dataset.CLASSES
    dataset_name = dataset.CLASSES
    eval_map(
        det_results,
        gt_bboxes,
        gt_labels,
        gt_ignore=gt_ignore,
        scale_ranges=None,
        iou_thr=iou_thr,
        dataset=dataset_name,
        print_summary=True)
Run the following command:
python tools/test.py configs/your_config.py work_dirs/your_model_.pth --out results.pkl
When the test finishes, a results.pkl file is generated.
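Before running the evaluation it can be reassuring to peek into results.pkl. The sketch below assumes the file is an ordinary pickle holding one entry per test image, which is how it looked in my runs; summarize_results is a hypothetical helper, and numpy has to be installed for the arrays inside to unpickle.

```python
import pickle


def summarize_results(pkl_path):
    """Return (number of entries, type name of the first entry) of a results pickle."""
    with open(pkl_path, "rb") as f:
        results = pickle.load(f)
    first = results[0] if len(results) > 0 else None
    return len(results), type(first).__name__
```

The first number returned by summarize_results("results.pkl") should equal the number of images in your test set.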
Compute mAP using the VOC standard.
Run the following command:
python tools/voc_eval.py results.pkl configs/your_config.py
The output looks similar to the following:
Reposted from: http://drzlf.baihongyu.com/