Python3 Download and Parse xml.gz Files


🤖This article was translated by AI (LLM). There may be errors or inaccuracies. For the original content, please refer to the original version.

Scenario: You need to fetch a gzip-compressed XML file from the web and parse it with Python 3.

Requirement: Parse the downloaded bytes directly, without first saving them to a named file.

Common approach: Save the download to a file first, then open and parse that file with Python 3's gzip module.
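For comparison, the common file-based approach might look like the minimal sketch below. The helper name `parse_xml_gz_file` and the assumption that the download has already been saved at `path` are illustrative and not part of the original article.

```python
import gzip
from xml.etree import ElementTree


def parse_xml_gz_file(path: str) -> ElementTree.Element:
    """Open a gzip-compressed XML file on disk and return its root element.

    Hypothetical helper: assumes the file was already downloaded to `path`.
    """
    # gzip.open in binary mode yields the decompressed bytes,
    # which ElementTree.parse can read like any other file object.
    with gzip.open(path, "rb") as gzip_file:
        return ElementTree.parse(gzip_file).getroot()
```

This works, but it leaves a file behind that has to be cleaned up afterwards, which is what the requirement above tries to avoid.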

References: "Parsing a xml.gz file in python"; "tmpfile and gzip combination problem"

Implementation (using the Xposed Module Repository as an example)

import gzip
import requests
import tempfile
from xml.etree import ElementTree

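# Fetch the compressed response body as bytes (.raw is the underlying urllib3 response)
raw = requests.get("https://dl-xda.xposed.info/repo/full.xml.gz", stream=True).raw.data

# TemporaryFile is an anonymous file in the OS temp directory (not guaranteed to be in memory);
# it is deleted automatically when closed
with tempfile.TemporaryFile(mode='w+b') as f:
    f.write(raw)  # Write the compressed bytes
    f.flush()
    f.seek(0)  # Rewind so gzip reads from the beginning
    with gzip.GzipFile(mode='r', fileobj=f) as gzip_file:  # Decompress while reading
        tree = ElementTree.parse(gzip_file)  # Parse the decompressed XML
        print(tree.getroot())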
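Since `tempfile.TemporaryFile` is normally backed by the file system, a variant that keeps everything in memory may fit the requirement more closely. The sketch below swaps the temporary file for `io.BytesIO`. The helper name `parse_gzipped_xml` is hypothetical, and the sketch assumes the whole compressed payload fits comfortably in memory.

```python
import gzip
import io
from xml.etree import ElementTree


def parse_gzipped_xml(data: bytes) -> ElementTree.Element:
    """Decompress gzip bytes in memory and return the XML root element.

    Hypothetical helper: assumes `data` holds a complete gzip stream.
    """
    # io.BytesIO wraps the bytes in a file-like object held in memory,
    # so no temporary file is created.
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gzip_file:
        return ElementTree.parse(gzip_file).getroot()
```

With the download line from the implementation above, this would be called as `root = parse_gzipped_xml(raw)`.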