[Test] MongoDB write efficiency on a mechanical hard drive: multi-threaded vs. multi-process

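Results

Number of threads/processes: 4
Multi-thread total time: 9049.68405 ms
Multi-process total time: 3236.43184 ms
Single-thread total time: 5241.13035 ms

Number of threads/processes: 20
Multi-thread total time: 56752.51198 ms
Multi-process total time: 20524.57404 ms
Single-thread total time: 29836.78603 ms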
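To make the numbers easier to compare, here is a small sketch that divides each timing by the single-thread timing of the same run. The values are copied from the results above; the names `timings_ms` and `relative_to_single` are just ones I made up for this illustration.

```python
# Timings copied from the results above, in milliseconds.
timings_ms = {
    4: {"threads": 9049.68405, "processes": 3236.43184, "single": 5241.13035},
    20: {"threads": 56752.51198, "processes": 20524.57404, "single": 29836.78603},
}


def relative_to_single(run):
    """Return each mode's elapsed time divided by the single-thread time."""
    return {mode: round(ms / run["single"], 2) for mode, ms in run.items()}


# A ratio above 1 means slower than single-threaded, below 1 means faster.
ratios = {workers: relative_to_single(run) for workers, run in timings_ms.items()}
for workers, ratio in ratios.items():
    print(workers, ratio)
```

In both runs the multi-process version takes roughly two thirds of the single-thread time, while the multi-threaded version takes somewhere between 1.7 and 1.9 times as long.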

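The experiment shows that when bulk-inserting data into MongoDB on a mechanical hard drive, multiple processes are the most efficient, while multiple threads actually perform worse than a single thread…
Wasn't multithreading supposed to work well for I/O-bound operations? Wasn't I/O supposed to release the GIL, so that I/O operations can run concurrently (I'm guessing on that second half)? I don't understand why… maybe I've misunderstood something?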
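One possible explanation, which this test does not verify: the workload is not purely I/O. Before anything reaches the disk, each worker builds 100,000 dicts in Python, and that kind of pure-Python work holds the GIL, so threads have to take turns on it. The sketch below isolates only that list-building step and runs it sequentially and in threads, without MongoDB. The helper `timed` is a hypothetical name, and `jump` is deliberately smaller than in the benchmark to keep memory use low.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_documents(start, end):
    """Pure-Python CPU work: the same list build the benchmark does before inserting."""
    return [{'uid': i} for i in range(start, end)]


def timed(fn):
    """Run fn() and return (result, elapsed milliseconds)."""
    t0 = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - t0) * 1000


jump = 20000      # assumption: smaller than the benchmark's 100000
worker_num = 4
ranges = [(i * jump, (i + 1) * jump) for i in range(worker_num)]

# Sequential: one thread builds every list in turn.
sequential, seq_ms = timed(lambda: [build_documents(s, e) for s, e in ranges])


def run_threaded():
    # Threads: the same work split over worker_num threads.
    with ThreadPoolExecutor(max_workers=worker_num) as pool:
        return list(pool.map(lambda r: build_documents(*r), ranges))


threaded, thr_ms = timed(run_threaded)

print('sequential: %.2f ms, threaded: %.2f ms' % (seq_ms, thr_ms))
```

If the threaded time comes out no better than the sequential time on your machine, that would be consistent with the GIL guess. It says nothing yet about the serialization and network parts of `insert_many`, which would need to be measured separately.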

Test machine environment

Ubuntu 17.04
Python 3.5
CPU: i5-4200H, 2 cores / 4 threads
Disk: Seagate Barracuda 1 TB, 5400 RPM

Code

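# -*- coding: utf-8 -*-
from pymongo import MongoClient
import time
from threading import Thread
from multiprocessing import Process

"""
Benchmark: insert documents into MongoDB with multiple threads,
multiple processes, and a single thread, and time each approach.
"""

__author__ = 'netAir'

# Shared client, used by the threaded and single-threaded runs
client = MongoClient()
db = client.test
collection = db.test
jump = 100000  # number of documents each worker inserts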


def thread_work(start, end):
    # Build the documents, then insert them through the shared client
    # time1 = time.time()
    data = [{'uid': i} for i in range(start, end)]
    # time2 = time.time() - time1
    # print('Time to build data list: ' + str(round(time2 * 1000, 5)) + ' ms')

    # time1 = time.time()
    collection.insert_many(data)
    # time2 = time.time() - time1
    # print('Time to insert data: ' + str(round(time2 * 1000, 5)) + ' ms')


def process_work(start, end):
    # Each process opens its own connection, because a MongoClient
    # should not be shared across a fork
    client = MongoClient()
    db = client.test
    collection = db.test
    # time1 = time.time()
    data = [{'uid': i} for i in range(start, end)]
    # time2 = time.time() - time1
    # print('Time to build data list: ' + str(round(time2 * 1000, 5)) + ' ms')

    # time1 = time.time()
    collection.insert_many(data)
    # time2 = time.time() - time1
    # print('Time to insert data: ' + str(round(time2 * 1000, 5)) + ' ms')


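worker_num = 4
print('Number of threads/processes: ' + str(worker_num))
# Note: there is no `if __name__ == '__main__':` guard, so the multi-process
# part relies on the fork start method (the default on Linux).

# Multi-threaded: worker i inserts the uids [i * jump, (i + 1) * jump),
# so the ranges do not overlap and each worker still inserts `jump` documents
workers = []
time1 = time.time()
for i in range(worker_num):
    workers.append(Thread(target=thread_work, args=(i * jump, (i + 1) * jump)))
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
time2 = time.time() - time1
print('Multi-thread total time: ' + str(round(time2 * 1000, 5)) + ' ms')

# Multi-process: same ranges, one process per range
workers = []
time3 = time.time()
for i in range(worker_num):
    workers.append(Process(target=process_work, args=(i * jump, (i + 1) * jump)))
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
time4 = time.time() - time3
print('Multi-process total time: ' + str(round(time4 * 1000, 5)) + ' ms')

# Single-threaded: same ranges, inserted one after another
time5 = time.time()
for i in range(worker_num):
    # time1 = time.time()
    data = [{'uid': uid} for uid in range(i * jump, (i + 1) * jump)]
    # time2 = time.time() - time1
    # print('Time to build data list: ' + str(round(time2 * 1000, 5)) + ' ms')

    # time1 = time.time()
    collection.insert_many(data)
    # time2 = time.time() - time1
    # print('Time to insert data: ' + str(round(time2 * 1000, 5)) + ' ms')
time6 = time.time() - time5
print('Single-thread total time: ' + str(round(time6 * 1000, 5)) + ' ms')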
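The script repeats the same `time.time()` bookkeeping for every phase. As a possible cleanup, here is a minimal sketch of a reusable timer built on `time.perf_counter`, which is a monotonic clock meant for measuring intervals. The names `stopwatch` and `results` are hypothetical and not part of the original script, and the timed work is a stand-in list build so the sketch runs without MongoDB.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Store the elapsed time of the with-block, in milliseconds, under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = round((time.perf_counter() - start) * 1000, 5)


results = {}

# Stand-in for one phase of the benchmark; in the real script the body
# would start and join the workers, or call collection.insert_many(data).
with stopwatch('build list', results):
    data = [{'uid': i} for i in range(1000)]

for label, ms in results.items():
    print(label + ': ' + str(ms) + ' ms')
```

One more thing that may be worth trying: all three phases write into the same collection, so each later phase inserts into a larger collection than the one before it. Calling `collection.drop()` between phases, or running the phases in a different order, might show whether that affects the comparison.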