Weibo's top searches JSON format
Since October 2019, a Python script of mine has been scraping Weibo's top searches (微博热搜榜) hourly. Each hour's results are stored in a JSON file named in the pattern YYYYMMDDHH.json, where YYYY is the four-digit year, MM is the zero-padded two-digit month, DD is the zero-padded two-digit day, and HH is the zero-padded two-digit hour on a 24-hour clock. Note that the time is in the UTC+8 timezone. For example, 2020100316.json stores the top search results of Oct. 3, 2020 at 4pm (UTC+8). The first available JSON file is 2019101510.json.
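The naming rule can be sketched in Python as follows. This is a minimal illustration, not part of the scraper itself: the helper names filename_for and time_of are hypothetical, and the only facts taken from the description above are the YYYYMMDDHH.json pattern and the UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone

# File names use UTC+8 wall-clock time, as described above.
UTC8 = timezone(timedelta(hours=8))


def filename_for(moment: datetime) -> str:
    """Return the file name covering `moment` (hypothetical helper).

    `moment` should be timezone-aware; a naive datetime would be
    interpreted as system local time by astimezone().
    """
    return moment.astimezone(UTC8).strftime("%Y%m%d%H") + ".json"


def time_of(filename: str) -> datetime:
    """Parse a name such as 2020100316.json into an aware UTC+8 datetime."""
    return datetime.strptime(filename, "%Y%m%d%H.json").replace(tzinfo=UTC8)


if __name__ == "__main__":
    # 08:00 UTC on Oct. 3, 2020 is 16:00 in UTC+8.
    print(filename_for(datetime(2020, 10, 3, 8, tzinfo=timezone.utc)))
    print(time_of("2019101510.json").isoformat())
```

Keeping the offset in one constant makes it harder to mix these names up with UTC or local-time timestamps when converting in either direction.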
Each JSON file has the following layout:
[
    {
        "hotness": ###,
        "tag": "xxxx",
        "weibo": [
            {
                "content": "xxxx",
                "nickname": "xxxx"
            },
            ...
        ]
    },
    ...
]
- hotness (热度): int. An integer indicating how popular the topic is.
- tag: str. The name of the topic.
- weibo: list. A list of the most recent tweets in this topic. Each item has two fields:
  - content: str. The content of the tweet.
  - nickname: str. The nickname of the tweet's author.
A sample is provided: 2019101510.json
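Reading a file with the layout above might look like the sketch below. The function names are hypothetical, UTF-8 encoding is an assumption (the files contain Chinese text, but the encoding is not stated here), and nothing is assumed about the order in which topics appear in a file, so the sketch sorts them explicitly.

```python
import json


def load_top_searches(path):
    """Load one hourly file and return its list of topic dicts.

    Assumes the file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rank_by_hotness(topics):
    """Return the topics ordered from most to least popular."""
    return sorted(topics, key=lambda topic: topic["hotness"], reverse=True)


def posts_of(topic):
    """Return (nickname, content) pairs for the tweets listed under a topic."""
    return [(post["nickname"], post["content"]) for post in topic["weibo"]]


if __name__ == "__main__":
    # The sample file named above; adjust the path to where it is stored.
    topics = load_top_searches("2019101510.json")
    for topic in rank_by_hotness(topics)[:10]:
        print(topic["hotness"], topic["tag"], len(posts_of(topic)))
```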