Using Blob storage GZip files in Python Azure Functions
I recently had a project where I needed to process a lot of GZipped files, and decided to give Azure Functions a whirl.
I had the data flowing into Azure Blob Storage, but when I tried to process the GZip files with the function below, I kept getting an error: Exception: OSError: Not a gzipped file (b'\x1f\xef')
import gzip
import logging

import azure.functions as func

def main(msg: func.QueueMessage, inputblob: func.InputStream) -> None:
    filename = msg.get_body().decode('utf-8')
    logging.info(f"Working on queue item: {filename}")
    with gzip.open(inputblob, mode='rt', encoding='utf-8') as gzf:
        # Do stuff
        ...
Logging the stream with print(inputblob.read()), I saw something completely unexpected: the first few bytes were \x1f\xef\xbf\xbd\x08\x00\x00\x00\x00\x00\x00\x00\xef\xbf\xbd, littered with \xef\xbf\xbd sequences. That byte sequence is the UTF-8 encoding of the Unicode replacement character (U+FFFD), a tell-tale sign that binary data has been put through a lossy UTF-8 decode.
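To convince yourself that's what happened, the mangling is easy to reproduce in plain Python: a GZip file starts with the magic number b'\x1f\x8b', and \x8b on its own is not valid UTF-8, so a lossy decode swaps it for the replacement character:

```python
# The GZip magic number, as it sits at the start of every GZip blob.
magic = b'\x1f\x8b'

# Decode as UTF-8 with replacement (what a lossy string conversion does),
# then re-encode: \x1f survives, but \x8b becomes U+FFFD -> b'\xef\xbf\xbd'.
mangled = magic.decode('utf-8', errors='replace').encode('utf-8')
print(mangled)  # b'\x1f\xef\xbf\xbd'
```

Those are exactly the first four bytes of the corrupted stream above, and the first two of them are the b'\x1f\xef' in the error message.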
Long story short: Azure was decoding the GZip binary as a UTF-8 string before handing it to the function, and there's an easy-to-miss option you can set on the binding in your function.json, called dataType, which stops this silliness.
{
    "type": "blob",
    "direction": "in",
    "name": "inputblob",
    "dataType": "binary",
    "path": "files/{queueTrigger}",
    "connection": "MyStorage"
},
Once I set the dataType to binary, gzip.open()
worked fine.