Having worked primarily with NodeJS for the past few years, one thing I miss whenever I work on a serverless Python project is how easy NodeJS makes it to parallelize API calls to AWS. You can write
let promises = listOfS3Keys.map(key => s3.getObjectAcl({
  Bucket: 'yourBucket',
  Key: key,
}).promise());
await Promise.all(promises);
and get parallel processing in a few lines of code. This isn't nearly as easy in Python.
In Python 2.x and early versions of Python 3, the most straightforward way to parallelize was to use threads. That adds complexity to the code which, where possible, can now be avoided.
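For reference, here is a minimal sketch of that thread-based approach using concurrent.futures; the bucket name and list_of_s3_keys are placeholder assumptions, not code from this post:

import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client('s3')

def get_acl(key):
    # Each call still blocks, but runs in its own worker thread
    return s3.get_object_acl(Bucket='yourBucket', Key=key)

# list_of_s3_keys is a hypothetical list of keys to fetch ACLs for
with ThreadPoolExecutor(max_workers=10) as executor:
    # map() fans the calls out across the pool and yields results in order
    acls = list(executor.map(get_acl, list_of_s3_keys))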
Python 3.4 introduced the asyncio library, and Python 3.5 added the async and await keywords; together they can help parallelize network IO operations.
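To illustrate the idea before bringing AWS into it, here is a toy sketch where asyncio.sleep stands in for a non-blocking network request:

import asyncio

async def fake_request(i):
    # asyncio.sleep stands in for non-blocking network IO
    await asyncio.sleep(1)
    return i

async def main():
    # All ten coroutines run concurrently, so this takes about
    # one second in total rather than ten
    return await asyncio.gather(*[fake_request(i) for i in range(10)])

asyncio.run(main())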
To use async/await in Python you ideally need non-blocking IO requests. Unfortunately, boto3 uses blocking IO requests. Fortunately, there is a library, aioboto3, that aims to be drop-in compatible with boto3 but uses async/non-blocking IO requests to make API calls. Alternatively, you can still use boto3 if you run its calls in a thread executor (e.g. with loop.run_in_executor).
I decided to compare the performance and complexity of using aioboto3 versus boto3 together with asyncio for parallelizing API calls. To get a baseline, I also created a function that made the same API calls in series (i.e. without parallelism).
The code in each function lists all of the objects in an S3 bucket and then calls get_object_acl for each key. The bucket contains 100 objects.
Serial boto3 function code
import os

import boto3

s3 = boto3.client('s3')
BUCKET_NAME = os.getenv('BUCKET_NAME')

def main():
    # List every object in the bucket, then fetch each object's ACL in series
    bucket_contents = s3.list_objects_v2(Bucket=BUCKET_NAME)
    objects = [
        s3.get_object_acl(Bucket=BUCKET_NAME, Key=content_entry['Key'])
        for content_entry in bucket_contents['Contents']
    ]
    return objects

def handler(event, context):
    return main()
Parallelized boto3 with asyncio function code
import asyncio
import functools
import os

import boto3

BUCKET_NAME = os.getenv('BUCKET_NAME')
s3 = boto3.client('s3')

async def main():
    loop = asyncio.get_running_loop()
    bucket_contents = s3.list_objects_v2(Bucket=BUCKET_NAME)
    # boto3's calls are blocking, so run each one in the default thread
    # pool executor and gather the results concurrently
    objects = await asyncio.gather(
        *[
            loop.run_in_executor(
                None,
                functools.partial(s3.get_object_acl, Bucket=BUCKET_NAME, Key=content_entry['Key']),
            )
            for content_entry in bucket_contents['Contents']
        ]
    )
    return objects

def handler(event, context):
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(main())
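Passing None as the first argument to loop.run_in_executor uses asyncio's default ThreadPoolExecutor. If you want to cap the fan-out (for example, to stay within connection limits), you can pass your own executor instead. A sketch, with fetch_acls as a hypothetical helper:

import asyncio
import concurrent.futures
import functools

import boto3

s3 = boto3.client('s3')

async def fetch_acls(bucket, keys):
    loop = asyncio.get_running_loop()
    # A dedicated executor caps how many boto3 calls run at once
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return await asyncio.gather(
            *[
                loop.run_in_executor(
                    executor,
                    functools.partial(s3.get_object_acl, Bucket=bucket, Key=key),
                )
                for key in keys
            ]
        )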
Parallelized aioboto3 with asyncio function code
import asyncio
import os

import aioboto3

BUCKET_NAME = os.getenv('BUCKET_NAME')

async def main():
    # aioboto3 clients are async context managers and every API call is awaitable
    async with aioboto3.client('s3') as s3:
        bucket_contents = await s3.list_objects_v2(Bucket=BUCKET_NAME)
        # The calls are non-blocking coroutines, so gather runs them concurrently
        objects = await asyncio.gather(
            *[
                s3.get_object_acl(Bucket=BUCKET_NAME, Key=content_entry['Key'])
                for content_entry in bucket_contents['Contents']
            ]
        )
        return objects

def handler(event, context):
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(main())
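A caveat: the module-level aioboto3.client call above reflects older aioboto3 releases; newer versions (9.0 and later, if I remember correctly) moved to a session-based API, roughly:

import aioboto3

session = aioboto3.Session()

async def main():
    # In newer aioboto3 releases, clients are created from a Session
    async with session.client('s3') as s3:
        ...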
I ran each function in a Python 3.8.x Lambda function. I also used different memory sizes to see how much of an effect that had on execution time. I ran each function 100 times and recorded the average runtime reported by the REPORT log entries in CloudWatch Logs. The table below shows the average function duration across those runs.
Memory (MB) | Sync boto3 | Async aioboto3 | Async boto3
---|---|---|---
128 | 4771.45 | 4792.22 | 6097.20
512 | 2020.62 | 1259.93 | 1446.13
1024 | 1888.41 | 734.59 | 707.98
1536 | 1921.05 | 615.03 | 486.31
2048 | 1824.93 | 682.80 | 483.95
3008 | 1799.03 | 616.14 | 572.16

Average execution time across 100 invocations. All times are in milliseconds.
Surprisingly, at low memory (128 MB) sequential synchronous calls were faster than either async method. I suspect this was because there was less TLS overhead, since Python keeps the connection to S3 open across multiple calls. At higher Lambda memory settings aioboto3 had no advantage over boto3.
Although there are many parameters that affect the throughput of parallel API calls, this test shows that boto3 is sufficient for basic parallelism and in some cases exceeds the performance of aioboto3.
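One of those parameters worth calling out: botocore's connection pool defaults to 10 connections per client, which can bottleneck highly parallel calls with either library. A sketch of raising that limit (100 here is an arbitrary illustrative value):

import boto3
from botocore.config import Config

# max_pool_connections defaults to 10; raising it lets more
# simultaneous requests reuse pooled connections instead of waiting
s3 = boto3.client('s3', config=Config(max_pool_connections=100))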