Handling Multiple Cities
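Getting a List of World Cities
We already know how to fetch the weather for a single city. Handling several cities works the same way; we only need to add a loop.
But where does the list of cities come from? One good source is https://simplemaps.com/data/world-cities
It provides a worldwide city dataset that includes not only city names but also the country, the latitude and longitude, and whether the city is a national or provincial capital.
After downloading and unzipping the file, we can read it with pandas.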
import pandas as pd
city_file = '../data/worldcities.csv'
city_df = pd.read_csv(city_file,encoding='utf-8')
city_df
 | city | city_ascii | lat | lng | country | iso2 | iso3 | admin_name | capital | population | id |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Tokyo | Tokyo | 35.6897 | 139.6922 | Japan | JP | JPN | Tōkyō | primary | 37977000.0 | 1392685764 |
1 | Jakarta | Jakarta | -6.2146 | 106.8451 | Indonesia | ID | IDN | Jakarta | primary | 34540000.0 | 1360771077 |
2 | Delhi | Delhi | 28.6600 | 77.2300 | India | IN | IND | Delhi | admin | 29617000.0 | 1356872604 |
3 | Mumbai | Mumbai | 18.9667 | 72.8333 | India | IN | IND | Mahārāshtra | admin | 23355000.0 | 1356226629 |
4 | Manila | Manila | 14.6000 | 120.9833 | Philippines | PH | PHL | Manila | primary | 23088000.0 | 1608618140 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
40996 | Tukchi | Tukchi | 57.3670 | 139.5000 | Russia | RU | RUS | Khabarovskiy Kray | NaN | 10.0 | 1643472801 |
40997 | Numto | Numto | 63.6667 | 71.3333 | Russia | RU | RUS | Khanty-Mansiyskiy Avtonomnyy Okrug-Yugra | NaN | 10.0 | 1643985006 |
40998 | Nord | Nord | 81.7166 | -17.8000 | Greenland | GL | GRL | Sermersooq | NaN | 10.0 | 1304217709 |
40999 | Timmiarmiut | Timmiarmiut | 62.5333 | -42.2167 | Greenland | GL | GRL | Kujalleq | NaN | 10.0 | 1304206491 |
41000 | Nordvik | Nordvik | 74.0165 | 111.5100 | Russia | RU | RUS | Krasnoyarskiy Kray | NaN | 0.0 | 1643587468 |
41001 rows × 11 columns
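The weather lookup later in this section only uses a few of these columns (`city`, `lat`, `lng`), plus `country` and `capital` for filtering. As a minimal sketch, the `usecols` parameter of `pd.read_csv` can load just those columns. The in-memory CSV below is a stand-in built from two rows of the table above so that the example runs without the downloaded file; it is not the real dataset.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for worldcities.csv (two rows copied from the table above).
sample_csv = io.StringIO(
    "city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id\n"
    "Tokyo,Tokyo,35.6897,139.6922,Japan,JP,JPN,Tōkyō,primary,37977000.0,1392685764\n"
    "Nordvik,Nordvik,74.0165,111.5100,Russia,RU,RUS,Krasnoyarskiy Kray,,0.0,1643587468\n"
)

# Load only the columns needed for filtering and for the weather lookup.
wanted = ["city", "lat", "lng", "country", "capital"]
small_df = pd.read_csv(sample_csv, usecols=wanted)
print(small_df)
```

An empty `capital` field is read as NaN, which matches how the missing values appear in the full table.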
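Filtering the Data
We are only interested in major cities in China, so we need to filter the DataFrame. Note that the test city_df['capital'] is not None checks the Series object itself rather than its elements, so it is always True. The expression below therefore keeps every Chinese city, including rows whose capital is NaN, as the output shows. An element-wise check would use city_df['capital'].notna() instead.
# The second condition is always True, so this effectively filters by country only.
capital_china = city_df[(city_df['country']=='China') & (city_df['capital'] is not None)]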
capital_china
 | city | city_ascii | lat | lng | country | iso2 | iso3 | admin_name | capital | population | id |
---|---|---|---|---|---|---|---|---|---|---|---|
5 | Shanghai | Shanghai | 31.1667 | 121.4667 | China | CN | CHN | Shanghai | admin | 22120000.0 | 1156073548 |
9 | Guangzhou | Guangzhou | 23.1288 | 113.2590 | China | CN | CHN | Guangdong | admin | 20902000.0 | 1156237133 |
10 | Beijing | Beijing | 39.9050 | 116.3914 | China | CN | CHN | Beijing | primary | 19433000.0 | 1156228865 |
17 | Shenzhen | Shenzhen | 22.5350 | 114.0540 | China | CN | CHN | Guangdong | minor | 15929000.0 | 1156158707 |
29 | Nanyang | Nanyang | 32.9987 | 112.5292 | China | CN | CHN | Henan | NaN | 12010000.0 | 1156192287 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
40725 | Taoyan | Taoyan | 34.7706 | 103.7903 | China | CN | CHN | Gansu | NaN | 5329.0 | 1156019900 |
40744 | Jingping | Jingping | 33.7844 | 104.3652 | China | CN | CHN | Gansu | NaN | 5149.0 | 1156005145 |
40776 | Dayi | Dayi | 33.8312 | 104.0362 | China | CN | CHN | Gansu | NaN | 5114.0 | 1156108713 |
40782 | Biancang | Biancang | 33.9007 | 104.0321 | China | CN | CHN | Gansu | NaN | 5040.0 | 1156724811 |
40938 | Nichicun | Nichicun | 29.5333 | 94.4167 | China | CN | CHN | Tibet | NaN | 100.0 | 1156860651 |
1498 rows × 11 columns
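The output above still contains rows whose `capital` is NaN. The sketch below shows two element-wise alternatives on a small hand-built DataFrame (five rows taken from the tables above): `notna()` keeps cities that have any capital designation, and `isin()` keeps only selected designations. Reading `primary` as the national capital and `admin` as a first-level administrative capital is an assumption suggested by the Beijing and Shanghai rows, not something this page states.

```python
import numpy as np
import pandas as pd

# Hand-built sample; values copied from the tables above.
city_df = pd.DataFrame({
    "city": ["Tokyo", "Shanghai", "Beijing", "Shenzhen", "Nanyang"],
    "country": ["Japan", "China", "China", "China", "China"],
    "capital": ["primary", "admin", "primary", "minor", np.nan],
})

in_china = city_df["country"] == "China"

# Element-wise test: True only where the capital column holds a value.
has_capital_value = in_china & city_df["capital"].notna()

# Stricter test: keep only the designations we care about (assumed meaning, see above).
major = in_china & city_df["capital"].isin(["primary", "admin"])

print(city_df.loc[has_capital_value, "city"].tolist())  # ['Shanghai', 'Beijing', 'Shenzhen']
print(city_df.loc[major, "city"].tolist())              # ['Shanghai', 'Beijing']
```

Applying either mask to the full dataset would return fewer than the 1498 rows shown above, so the outputs later in this section would change accordingly.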
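Tip
A good habit is to process a single item first, confirm that the result is correct, and only then loop over all the data.
So here we first check that our functions can fetch the weather for a single city.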
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
print(module_path)
if module_path not in sys.path:
    sys.path.append(module_path)
C:\Users\renb\PycharmProjects\weather_dashapp\weather_book
from weather_app.models.query_api import get_geo_from_city,generate_url,request_weather_info,transform_weather_raw,add_city_info
city_info = capital_china.iloc[0,:].to_dict()
city_info
{'city': 'Shanghai',
'city_ascii': 'Shanghai',
'lat': 31.1667,
'lng': 121.4667,
'country': 'China',
'iso2': 'CN',
'iso3': 'CHN',
'admin_name': 'Shanghai',
'capital': 'admin',
'population': 22120000.0,
'id': 1156073548}
city = city_info['city']
lon = city_info['lng']
lat = city_info['lat']
url = generate_url(longitude=lon,latitude=lat)
text_j = request_weather_info(url)
weather_info_df = transform_weather_raw(text_j)
weather_info_df = add_city_info(weather_info_df,lon,lat,city)
weather_info_df
 | cloudcover | lifted_index | prec_type | prec_amount | temp2m | rh2m | weather | timestamp | wind_direction | wind_speed | longitude | latitude | city |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9 | 15 | none | 0 | 5 | 53 | cloudyday | 2022-02-03 03:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
1 | 9 | 15 | none | 0 | 5 | 51 | cloudyday | 2022-02-03 06:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
2 | 9 | 15 | none | 1 | 4 | 72 | cloudyday | 2022-02-03 09:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
3 | 9 | 15 | rain | 1 | 4 | 81 | lightrainnight | 2022-02-03 12:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
4 | 9 | 15 | rain | 1 | 4 | 66 | lightrainnight | 2022-02-03 15:00:00 | NE | 2 | 121.4667 | 31.1667 | Shanghai |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
59 | 3 | 15 | none | 4 | 5 | 72 | pcloudynight | 2022-02-10 12:00:00 | E | 3 | 121.4667 | 31.1667 | Shanghai |
60 | 9 | 15 | none | 4 | 5 | 70 | cloudynight | 2022-02-10 15:00:00 | SE | 2 | 121.4667 | 31.1667 | Shanghai |
61 | 9 | 15 | none | 4 | 5 | 71 | cloudynight | 2022-02-10 18:00:00 | SE | 2 | 121.4667 | 31.1667 | Shanghai |
62 | 9 | 15 | none | 4 | 5 | 74 | cloudynight | 2022-02-10 21:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
63 | 9 | 15 | none | 4 | 5 | 63 | cloudyday | 2022-02-11 00:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
64 rows × 13 columns
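Once the four steps work for one city, it can help to wrap them in a single function so that the same code is reused unchanged inside a loop. The sketch below is one possible shape, not the project's own helper: `fetch_city_weather` is a hypothetical name, and the four `fake_*` functions are stand-ins that only mimic the call signatures used above so that the example runs without network access. The real functions in `weather_app.models.query_api` may return differently shaped data.

```python
import pandas as pd


def fetch_city_weather(city_info, generate_url, request_weather_info,
                       transform_weather_raw, add_city_info):
    """Run the four single-city steps and return one DataFrame (hypothetical helper)."""
    lon, lat, city = city_info["lng"], city_info["lat"], city_info["city"]
    url = generate_url(longitude=lon, latitude=lat)
    text_j = request_weather_info(url)
    weather_info_df = transform_weather_raw(text_j)
    return add_city_info(weather_info_df, lon, lat, city)


# Stand-ins (assumptions): they imitate the signatures only, not the real behaviour.
def fake_generate_url(longitude, latitude):
    return f"https://example.invalid/forecast?lon={longitude}&lat={latitude}"


def fake_request_weather_info(url):
    return {"dataseries": [{"temp2m": 5}, {"temp2m": 4}]}


def fake_transform_weather_raw(text_j):
    return pd.DataFrame(text_j["dataseries"])


def fake_add_city_info(df, lon, lat, city):
    return df.assign(longitude=lon, latitude=lat, city=city)


city_info = {"city": "Shanghai", "lat": 31.1667, "lng": 121.4667}
result = fetch_city_weather(city_info, fake_generate_url, fake_request_weather_info,
                            fake_transform_weather_raw, fake_add_city_info)
print(result)
```

Passing the step functions in as arguments keeps the wrapper testable offline; in the notebook one would pass the imported functions instead of the stand-ins.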
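Fetching Weather for Multiple Cities
Once we are sure that a single city works, we build the loop, using iterrows to process the DataFrame row by row.
Here we start with 10 cities to get a feel for the performance.
Each iteration spends most of its time waiting for the website to respond and takes 2 to 3 seconds. At that rate, 2,000 cities would take roughly 80 minutes.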
from tqdm.notebook import tqdm

subset = capital_china.iloc[0:10, :]  # test with the first 10 cities only
frames = []
for index, city_info in tqdm(subset.iterrows(), total=len(subset)):
    city = city_info['city']
    print(city)
    lon = city_info['lng']
    lat = city_info['lat']
    url = generate_url(longitude=lon, latitude=lat)
    text_j = request_weather_info(url)
    weather_info_df = transform_weather_raw(text_j)
    weather_info_df = add_city_info(weather_info_df, lon, lat, city)
    frames.append(weather_info_df)
# Concatenate once at the end instead of growing a DataFrame inside the loop.
all_cities_df = pd.concat(frames, axis=0)
Shanghai
Guangzhou
Beijing
Shenzhen
Nanyang
Baoding
Chengdu
Linyi
Tianjin
Shijiazhuang
all_cities_df.head()
 | cloudcover | lifted_index | prec_type | prec_amount | temp2m | rh2m | weather | timestamp | wind_direction | wind_speed | longitude | latitude | city |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9 | 15 | none | 0 | 5 | 50 | cloudyday | 2022-02-03 03:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
1 | 9 | 15 | none | 0 | 5 | 55 | cloudyday | 2022-02-03 06:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
2 | 9 | 15 | none | 1 | 4 | 73 | cloudyday | 2022-02-03 09:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
3 | 9 | 15 | rain | 1 | 4 | 83 | lightrainnight | 2022-02-03 12:00:00 | NE | 3 | 121.4667 | 31.1667 | Shanghai |
4 | 9 | 15 | rain | 1 | 4 | 69 | lightrainnight | 2022-02-03 15:00:00 | NE | 2 | 121.4667 | 31.1667 | Shanghai |
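The runtime figure quoted earlier in this section follows from simple arithmetic: total time is the number of cities multiplied by the time per request. The short sketch below reproduces it for the 2 to 3 second range; `estimate_minutes` is a hypothetical helper name, and the per-request times are the rough observations from this section rather than measured benchmarks.

```python
def estimate_minutes(n_cities, seconds_per_request):
    """Estimated wall time in minutes for a strictly sequential loop."""
    return n_cities * seconds_per_request / 60


for secs in (2.0, 2.4, 3.0):
    print(f"{secs:.1f} s/request -> {estimate_minutes(2000, secs):.0f} min")
```

At 2 to 3 seconds per request, 2,000 cities take between about 67 and 100 minutes; the 80-minute figure corresponds to roughly 2.4 seconds per request.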
Summary
A simple loop is enough to fetch the weather for multiple cities, but its performance is poor.
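Because the loop spends most of its time waiting for responses, one common way to shorten such I/O-bound work is to issue several requests at once from a thread pool. The sketch below only illustrates the idea with the standard library's `concurrent.futures`; it is not the approach taken by this project. `fake_fetch` is a hypothetical stand-in that sleeps instead of calling the weather API, and a real service may limit how many parallel requests it accepts.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(city):
    """Stand-in for one weather request: sleep to imitate network latency."""
    time.sleep(0.05)
    return city, f"forecast for {city}"


cities = ["Shanghai", "Guangzhou", "Beijing", "Shenzhen", "Nanyang"]

# Sequential baseline: one request after another.
start = time.perf_counter()
sequential = [fake_fetch(c) for c in cities]
sequential_secs = time.perf_counter() - start

# Threaded version: up to five requests wait at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    threaded = list(pool.map(fake_fetch, cities))  # map() keeps the input order
threaded_secs = time.perf_counter() - start

print(f"sequential: {sequential_secs:.2f} s, threaded: {threaded_secs:.2f} s")
```

The results come back in the same order as the input list, so they can be concatenated exactly as in the sequential loop.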