Since October 2019, I have been running a Python script that scrapes Weibo's top searches (微博热搜榜) every hour. Each hour's results are stored in a JSON file named YYYYMMDDHH.json, where YYYY is the four-digit year, MM the zero-padded two-digit month, DD the zero-padded two-digit day, and HH the zero-padded two-digit hour on a 24-hour clock. All times are in the UTC+8 timezone. For example, 2020100316.json stores the top-search results for 4 pm on Oct. 3, 2020 (UTC+8). The first available JSON file is 2019101510.json.
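The naming scheme can be converted to and from timestamps with the standard library. The following is a minimal sketch, not part of the scraper itself; the helper names filename_to_datetime and datetime_to_filename are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# The file names use UTC+8, as described above.
UTC8 = timezone(timedelta(hours=8))


def filename_to_datetime(filename: str) -> datetime:
    """Turn a name such as '2020100316.json' into a timezone-aware datetime."""
    stem = Path(filename).stem  # drops any directory part and the '.json' suffix
    return datetime.strptime(stem, "%Y%m%d%H").replace(tzinfo=UTC8)


def datetime_to_filename(moment: datetime) -> str:
    """Turn a timezone-aware datetime into the matching file name."""
    return moment.astimezone(UTC8).strftime("%Y%m%d%H") + ".json"


if __name__ == "__main__":
    print(filename_to_datetime("2020100316.json").isoformat())
    print(datetime_to_filename(datetime(2020, 10, 3, 8, tzinfo=timezone.utc)))
```

Keeping the datetime timezone-aware avoids off-by-eight-hours mistakes when the data is compared with sources recorded in UTC.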

Each JSON file has the following layout:

[
    {
        "hotness": ###,
        "tag": "xxxx",
        "weibo": [
            {
                "content": "xxxx",
                "nickname": "xxxx"
            },
            ...
        ]
    },
    ...
]

• hotness (热度): int. An integer indicating how popular the topic is.
• tag: str. The name of the topic.
• weibo: list. A list of the most recent tweets in this topic. Each entry has the two fields below.
• content: str. The content of the tweet.
• nickname: str. The nickname of the tweet's author.
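The fields above can be read with Python's json module. This is a minimal sketch under two assumptions: the files are UTF-8 encoded, and every topic carries all three keys. The function names load_top_searches and summarize are hypothetical.

```python
import json


def load_top_searches(path):
    """Load one hourly file and return its list of topic dictionaries."""
    # Assumption: the files are UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(topics, limit=10):
    """Return (tag, hotness, number of tweets) for the hottest topics."""
    ranked = sorted(topics, key=lambda topic: topic["hotness"], reverse=True)
    return [
        (topic["tag"], topic["hotness"], len(topic["weibo"]))
        for topic in ranked[:limit]
    ]


if __name__ == "__main__":
    # The sample file name comes from this document; adjust the path as needed.
    for tag, hotness, tweet_count in summarize(load_top_searches("2019101510.json")):
        print(f"{hotness:>10}  {tag}  ({tweet_count} tweets)")
```

Sorting by hotness is done explicitly here because the layout above does not state whether topics are already stored in ranked order.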

A sample is provided: 2019101510.json