Python3 Download and Parse xml.gz Files
Scenario: You need to fetch a gzip-compressed XML file from the web and parse it with Python 3.
Requirement: Parse the downloaded data directly, without first saving it to a file.
Common approach: Save the download to a file first, then open and parse that file with Python 3's gzip module.
References: "Parsing a xml.gz file in python", "tmpfile and gzip combination problem"
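For comparison, the common file-based approach might look roughly like the sketch below. The helper name `parse_xml_gz_file` and the sample XML are made up for illustration; the demonstration writes its own small `.xml.gz` file so that it runs without network access.

```python
import gzip
import os
import tempfile
from xml.etree import ElementTree


def parse_xml_gz_file(path):
    """Parse a gzip-compressed XML file that already exists on disk (hypothetical helper)."""
    # gzip.open in binary mode yields decompressed bytes, which ElementTree can read.
    with gzip.open(path, "rb") as gz:
        return ElementTree.parse(gz)


# Demonstration with made-up data: write a small .xml.gz file, then parse it.
sample_xml = b"<repo><module name='demo'/></repo>"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.xml.gz")
    with open(sample_path, "wb") as out:
        out.write(gzip.compress(sample_xml))
    root = parse_xml_gz_file(sample_path).getroot()

print(root.tag)
```

The drawback is the extra step of writing the file and cleaning it up afterwards, which is what the requirement above tries to avoid.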
Implementation (using the Xposed Module Repository as an example)
import gzip
import requests
import tempfile
from xml.etree import ElementTree

# Download the response body as raw bytes
raw = requests.get("https://dl-xda.xposed.info/repo/full.xml.gz", stream=True).raw.data

# Temporary file (backed by the filesystem, not memory; removed automatically when closed)
with tempfile.TemporaryFile(mode='w+b') as f:
    f.write(raw)  # Write the downloaded bytes
    f.flush()
    f.seek(0)  # Rewind so that gzip reads from the start
    with gzip.GzipFile(mode='r', fileobj=f) as gzip_file:  # Decompress while parsing
        tree = ElementTree.parse(gzip_file)
print(tree.getroot())
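If the goal is to avoid touching the filesystem at all, one possible variation is to wrap the bytes in `io.BytesIO` instead of a temporary file. This is a minimal sketch, not the method used above: the helper name `parse_gzipped_xml` is hypothetical, and the sample payload is generated locally so the example does not depend on the network.

```python
import gzip
import io
from xml.etree import ElementTree


def parse_gzipped_xml(data):
    """Parse gzip-compressed XML held in a bytes object (hypothetical helper)."""
    # BytesIO gives GzipFile a file-like object that lives purely in memory.
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gz:
        return ElementTree.parse(gz)


# Made-up sample payload standing in for the downloaded bytes.
sample = gzip.compress(b"<repo><module name='demo'/></repo>")
tree = parse_gzipped_xml(sample)
print(tree.getroot().tag)
```

With the download from the implementation above, the call would presumably be `parse_gzipped_xml(raw)`. For payloads that fit comfortably in memory, `ElementTree.fromstring(gzip.decompress(data))` is an even shorter option, though it returns the root element rather than an `ElementTree` object.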