CloudProse Blog

AWS Lambda Function Performance: Parallelism in Python with boto3 and aioboto3

To async, or not to async, that is the question

Joel Haubold, Trek10 | March 11, 2020

Parallelizing AWS API Calls with Python Lambda Functions

Having worked primarily with NodeJS over the past few years, one thing I have missed whenever I worked on a serverless Python project is how easily NodeJS lets you parallelize API calls to AWS. You can write:

let promises = listOfS3Keys.map(key => s3.getObjectAcl({
  Bucket: 'yourBucket',
  Key: key,
}).promise());
await Promise.all(promises);

and have parallel processing in just a few lines of code. This is not as easy to achieve in Python. In Python 2.x and early versions of Python 3, the most direct way to parallelize was to use threads, which added complexity to the code that can now be avoided. Python 3.4 introduced the asyncio library, and Python 3.5 added the async and await keywords, which together help parallelize network IO operations.
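For comparison, the older thread-based pattern looked roughly like this. This is a minimal sketch using concurrent.futures; fetch_acl is a stand-in for a blocking boto3 call such as s3.get_object_acl, since the point here is the fan-out pattern rather than the AWS call itself:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a blocking boto3 call such as s3.get_object_acl(...);
# any blocking IO-bound function is dispatched the same way.
def fetch_acl(key):
    return {'Key': key, 'Grants': []}

keys = ['a.txt', 'b.txt', 'c.txt']

# map() fans the blocking calls out across worker threads and
# returns results in the same order as the input keys.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch_acl, keys))
```

This works, but you now have to reason about thread pool sizing and thread safety, which asyncio-style code largely sidesteps.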

To use async/await in Python, you ideally should use non-blocking IO requests. Unfortunately, boto3 uses blocking IO requests. Fortunately, there is a library, aioboto3, that aims to be drop-in compatible with boto3 but uses async/non-blocking IO requests to make API calls. You can also use boto3 with asyncio if you run its blocking calls in a thread executor (e.g. loop.run_in_executor).

I decided to compare the performance and complexity of using aioboto3 versus boto3 with asyncio for parallelizing API calls. To get a baseline, I also created a function that made the same API calls in series (i.e. without parallelism).

The code in each function lists all the objects in an S3 bucket and then calls get_object_acl for each key. The bucket contained 100 objects.

Serial boto3 function code

import os

import boto3

s3 = boto3.client('s3')
BUCKET_NAME = os.getenv('BUCKET_NAME')

def main():
    bucket_contents = s3.list_objects_v2(Bucket=BUCKET_NAME)
    objects = [
        s3.get_object_acl(Bucket=BUCKET_NAME, Key=content_entry['Key'])
        for content_entry in bucket_contents['Contents']
    ]

def handler(event, context):
    return main()

Parallelized boto3 with asyncio function code

import asyncio
import functools
import os

import boto3

BUCKET_NAME = os.getenv('BUCKET_NAME')

s3 = boto3.client('s3')

async def main():
    loop = asyncio.get_running_loop()
    bucket_contents = s3.list_objects_v2(Bucket=BUCKET_NAME)

    objects = await asyncio.gather(
        *[
            loop.run_in_executor(None, functools.partial(s3.get_object_acl, Bucket=BUCKET_NAME, Key=content_entry['Key']))
            for content_entry in bucket_contents['Contents']
        ]
    )

def handler(event, context):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Parallelized aioboto3 with asyncio function code

import asyncio
import os

import aioboto3

BUCKET_NAME = os.getenv('BUCKET_NAME')

async def main():
    async with aioboto3.client('s3') as s3:
        bucket_contents = await s3.list_objects_v2(Bucket=BUCKET_NAME)
        objects = await asyncio.gather(
            *[
                s3.get_object_acl(Bucket=BUCKET_NAME, Key=content_entry['Key'])
                for content_entry in bucket_contents['Contents']
            ]
        )

def handler(event, context):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

I ran each function in a Python 3.8.x Lambda function, at several different memory sizes to see how much memory affected execution time. I ran each function 100 times and recorded the average runtime reported by the REPORT log entries in CloudWatch Logs. The table below shows the average function duration across those runs.

Memory (MB) | Sync boto3 | Async aioboto3 | Async boto3
128         | 4771.45    | 4792.22        | 6097.20
512         | 2020.62    | 1259.93        | 1446.13
1024        | 1888.41    | 734.59         | 707.98
1536        | 1921.05    | 615.03         | 486.31
2048        | 1824.93    | 682.80         | 483.95
3008        | 1799.03    | 616.14         | 572.16

Average execution time over 100 invocations. All times are in milliseconds.
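The averages above come from the Duration field of the Lambda REPORT log lines. As a rough illustration of how they could be computed, here is a minimal parsing sketch; the sample log lines are invented, but the REPORT format itself is what Lambda emits:

```python
import re

# Invented sample lines in the standard Lambda REPORT log format.
sample_logs = [
    "REPORT RequestId: 1a2b Duration: 4771.45 ms Billed Duration: 4800 ms Memory Size: 128 MB",
    "REPORT RequestId: 3c4d Duration: 4792.22 ms Billed Duration: 4800 ms Memory Size: 128 MB",
]

# The first "Duration:" on each line is the measured (not billed) duration.
durations = [
    float(m.group(1))
    for line in sample_logs
    if (m := re.search(r"Duration: ([\d.]+) ms", line))
]
average_ms = sum(durations) / len(durations)
```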

Surprisingly, at low memory (128 MB), sequential synchronous calls were faster than either async method. I suspect this was because there was less TLS handshake overhead, since Python keeps the connection to S3 open across multiple calls. At higher Lambda memory sizes, aioboto3 had no advantage over boto3 in an executor.

Conclusions:

Although many parameters affect the throughput of parallelized API calls, this test shows that:

  1. Synchronous calls are sometimes as fast as parallelized ones.
  2. boto3 is sufficient for basic parallelism and in some cases exceeds the performance of aioboto3.