Weibo's top searches JSON format | 叶某人的碎碎念

Since October 2019, I have had a Python script scraping Weibo's top searches (微博热搜榜) hourly. Each hour's results are stored in a JSON file named in the pattern YYYYMMDDHH.json, where YYYY is the year, MM is the zero-padded two-digit month, DD is the zero-padded two-digit day, and HH is the zero-padded two-digit hour on a 24-hour clock. Note that the time is in the UTC+8 timezone. For example, 2020100316.json stores the top search results for Oct. 3, 2020 at 4pm (UTC+8). The first available JSON file is 2019101510.json.
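To make the naming pattern concrete, here is a minimal sketch of converting between a file name and a timezone-aware timestamp. The helper names parse_snapshot_name and snapshot_name are made up for this illustration and are not part of the scraping script.

```python
import os
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset, matching the timezone used in the file names.
UTC8 = timezone(timedelta(hours=8))


def parse_snapshot_name(filename):
    """Turn 'YYYYMMDDHH.json' into a timezone-aware datetime (UTC+8)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return datetime.strptime(stem, "%Y%m%d%H").replace(tzinfo=UTC8)


def snapshot_name(moment):
    """Turn a timezone-aware datetime into the matching file name."""
    return moment.astimezone(UTC8).strftime("%Y%m%d%H") + ".json"


print(parse_snapshot_name("2020100316.json").isoformat())
print(snapshot_name(datetime(2020, 10, 3, 8, tzinfo=timezone.utc)))
```

Because the name encodes UTC+8, a timestamp in any other timezone is converted first, so 08:00 UTC maps to the file for hour 16.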

Each JSON file has the following layout:

[
    {
        "hotness": ###,
        "tag": "xxxx",
        "weibo": [
            {
                "content": "xxxx",
                "nickname": "xxxx"
            },
            ...
        ]
    },
    ...
]

hotness (热度): int. An integer indicating how popular the topic is.
tag: str. The name of the topic.
weibo: list. A list of the most recent tweets in this topic.
- content: str. The content of the tweet.
- nickname: str. The nickname of the tweet's author.

A sample is provided: 2019101510.json
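As a rough sketch of how a file in this layout could be read, the snippet below parses the structure and ranks topics by hotness. The SAMPLE data is invented for illustration (it is not real scraped output), the helper names load_topics and rank_by_hotness are hypothetical, and UTF-8 encoding of the files is an assumption.

```python
import json

# Invented data in the documented layout; not real scraped results.
SAMPLE = """
[
    {
        "hotness": 1200,
        "tag": "topic A",
        "weibo": [
            {"content": "first post", "nickname": "user1"},
            {"content": "second post", "nickname": "user2"}
        ]
    },
    {
        "hotness": 3400,
        "tag": "topic B",
        "weibo": [
            {"content": "another post", "nickname": "user3"}
        ]
    }
]
"""


def load_topics(path):
    """Read one hourly file; UTF-8 encoding is assumed here."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rank_by_hotness(topics):
    """Return the topics sorted from most to least popular."""
    return sorted(topics, key=lambda topic: topic["hotness"], reverse=True)


topics = json.loads(SAMPLE)
for topic in rank_by_hotness(topics):
    print(topic["tag"], topic["hotness"])
    for post in topic["weibo"]:
        print("   ", post["nickname"], ":", post["content"])
```

For a real file, load_topics("2019101510.json") would replace the json.loads(SAMPLE) call, assuming the file sits in the working directory.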