Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - What is the Unicode normalization form for AWS S3 buckets?

While working with UTF-8 file names in an AWS S3 bucket, I found that some URL-quoted file names (in a link to a file in the bucket) differ from the same file names as quoted by my Python app (I'm using the boto library). It turns out they differ because of different Unicode normalization forms, and the problem goes away after applying unicodedata.normalize.

However, I haven't found any information about which normalization form AWS uses (NFC, NFKC, NFD or NFKD), so I would greatly appreciate a pointer to a trusted source that provides this information. Thanks.



1 Answer


It looks like S3 doesn't apply any normalization itself. If I upload a file with a Unicode name (e.g. Ärende.txt) through the S3 web console from a Mac and again from Windows, I end up with two files in S3. They look the same in the S3 console, but S3 treats them as distinct because the encoded names differ.
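To illustrate why two visually identical names can end up as distinct keys, here is a minimal sketch using only the Python standard library. It assumes the two uploads differ in the way NFC and NFD differ (a precomposed character versus a base letter plus a combining mark); which form a given platform or browser actually sends is an assumption, not something confirmed above.

```python
import unicodedata
from urllib.parse import quote

# Illustrative name only; the precomposed form uses U+00C4.
name = "\u00c4rende.txt"

nfc = unicodedata.normalize("NFC", name)  # one code point for the first letter
nfd = unicodedata.normalize("NFD", name)  # "A" followed by U+0308 (combining diaeresis)

print(nfc == nfd)           # False
print(nfc.encode("utf-8"))  # b'\xc3\x84rende.txt'
print(nfd.encode("utf-8"))  # b'A\xcc\x88rende.txt'

# The percent-encoded forms differ too, which matches the mismatch
# between quoted file names described in the question.
print(quote(nfc))           # %C3%84rende.txt
print(quote(nfd))           # A%CC%88rende.txt
```

Both strings render the same on screen, but their byte sequences differ, so a store that compares names byte for byte would see two different names.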

You will have to consider exactly how this affects your application and its users, and adjust accordingly. For example, if your users may switch between environments (Mac, Windows, Linux) and expect consistent cross-platform behaviour, then you will need to normalize the names yourself. If your users consistently work from a single platform, you most likely don't need to care.
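Normalizing the names yourself could look roughly like the sketch below. The helper name `normalize_key` is hypothetical, and choosing NFC as the canonical form is an assumption; any form works as long as it is applied consistently before every upload, lookup, or URL-quoting step.

```python
import unicodedata


def normalize_key(key: str, form: str = "NFC") -> str:
    """Return the key in a single Unicode normalization form.

    Hypothetical helper: NFC is an arbitrary but common choice.
    """
    return unicodedata.normalize(form, key)


# Two spellings of the same visible name, as different platforms might produce.
decomposed = "A\u0308rende.txt"   # "A" + combining diaeresis
precomposed = "\u00c4rende.txt"   # single precomposed code point

print(decomposed == precomposed)                                # False
print(normalize_key(decomposed) == normalize_key(precomposed))  # True
```

The normalized string would then be what you pass on as the key name to your S3 client, so every platform addresses the same object.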

