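Handling Multiple Cities

Getting the World Cities List

We already know how to fetch the weather for a single city. Handling several cities works the same way; it only takes a loop.

But where does the list of cities come from? One convenient source is this site: https://simplemaps.com/data/world-cities

The dataset covers cities worldwide. Besides the city name, each record includes the country, the latitude and longitude, whether the city is a national or provincial capital, and other fields.

After downloading and unzipping the archive, the file can be read with pandas.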

import pandas as pd
city_file = '../data/worldcities.csv'
city_df = pd.read_csv(city_file,encoding='utf-8')
city_df
city city_ascii lat lng country iso2 iso3 admin_name capital population id
0 Tokyo Tokyo 35.6897 139.6922 Japan JP JPN Tōkyō primary 37977000.0 1392685764
1 Jakarta Jakarta -6.2146 106.8451 Indonesia ID IDN Jakarta primary 34540000.0 1360771077
2 Delhi Delhi 28.6600 77.2300 India IN IND Delhi admin 29617000.0 1356872604
3 Mumbai Mumbai 18.9667 72.8333 India IN IND Mahārāshtra admin 23355000.0 1356226629
4 Manila Manila 14.6000 120.9833 Philippines PH PHL Manila primary 23088000.0 1608618140
... ... ... ... ... ... ... ... ... ... ... ...
40996 Tukchi Tukchi 57.3670 139.5000 Russia RU RUS Khabarovskiy Kray NaN 10.0 1643472801
40997 Numto Numto 63.6667 71.3333 Russia RU RUS Khanty-Mansiyskiy Avtonomnyy Okrug-Yugra NaN 10.0 1643985006
40998 Nord Nord 81.7166 -17.8000 Greenland GL GRL Sermersooq NaN 10.0 1304217709
40999 Timmiarmiut Timmiarmiut 62.5333 -42.2167 Greenland GL GRL Kujalleq NaN 10.0 1304206491
41000 Nordvik Nordvik 74.0165 111.5100 Russia RU RUS Krasnoyarskiy Kray NaN 0.0 1643587468

41001 rows × 11 columns
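Before filtering, it can help to confirm that the coordinate columns were parsed as numbers and to see which values the capital column takes. The following is only a sketch: it reads a three-row in-memory sample (values copied from the rows shown in this section) instead of the real worldcities.csv, and the names sample_csv, needed and sample_df are made up for illustration.

```python
import io

import pandas as pd

# Three rows in the same column layout as worldcities.csv.
# This in-memory sample stands in for the real file so the sketch runs anywhere.
sample_csv = io.StringIO(
    "city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id\n"
    "Tokyo,Tokyo,35.6897,139.6922,Japan,JP,JPN,Tōkyō,primary,37977000.0,1392685764\n"
    "Shanghai,Shanghai,31.1667,121.4667,China,CN,CHN,Shanghai,admin,22120000.0,1156073548\n"
    "Nanyang,Nanyang,32.9987,112.5292,China,CN,CHN,Henan,,12010000.0,1156192287\n"
)

# Load only the columns the weather lookup needs.
needed = ["city", "lat", "lng", "country", "capital"]
sample_df = pd.read_csv(sample_csv, usecols=needed)

# Coordinates should be floats; 'capital' may contain NaN for ordinary cities.
print(sample_df.dtypes)
print(sample_df["capital"].value_counts(dropna=False))
```

Running the same two print calls on city_df gives a quick picture of the real data before any filtering is applied.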

Filtering the Data

We are only interested in major cities in China, so the DataFrame needs to be filtered.

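# Caution: `is not None` is evaluated once on the whole Series and is always True,
# so only the country condition filters rows; city_df['capital'].notna() tests each element.
capital_china = city_df[(city_df['country']=='China') & (city_df['capital'] is not None)]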
capital_china
city city_ascii lat lng country iso2 iso3 admin_name capital population id
5 Shanghai Shanghai 31.1667 121.4667 China CN CHN Shanghai admin 22120000.0 1156073548
9 Guangzhou Guangzhou 23.1288 113.2590 China CN CHN Guangdong admin 20902000.0 1156237133
10 Beijing Beijing 39.9050 116.3914 China CN CHN Beijing primary 19433000.0 1156228865
17 Shenzhen Shenzhen 22.5350 114.0540 China CN CHN Guangdong minor 15929000.0 1156158707
29 Nanyang Nanyang 32.9987 112.5292 China CN CHN Henan NaN 12010000.0 1156192287
... ... ... ... ... ... ... ... ... ... ... ...
40725 Taoyan Taoyan 34.7706 103.7903 China CN CHN Gansu NaN 5329.0 1156019900
40744 Jingping Jingping 33.7844 104.3652 China CN CHN Gansu NaN 5149.0 1156005145
40776 Dayi Dayi 33.8312 104.0362 China CN CHN Gansu NaN 5114.0 1156108713
40782 Biancang Biancang 33.9007 104.0321 China CN CHN Gansu NaN 5040.0 1156724811
40938 Nichicun Nichicun 29.5333 94.4167 China CN CHN Tibet NaN 100.0 1156860651

1498 rows × 11 columns
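The result still contains rows whose capital is NaN. The difference between testing the Series object and testing its elements can be seen on a small made-up DataFrame. This is a minimal sketch; toy_df and its four rows are illustrative and only reuse column names and values that appear in this section.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "city": ["Shanghai", "Beijing", "Nanyang", "Tokyo"],
    "country": ["China", "China", "China", "Japan"],
    "capital": ["admin", "primary", None, "primary"],
})

# `is not None` tests the Series object itself, not its elements,
# so it yields one plain Python bool for the whole column.
series_check = toy_df["capital"] is not None
print(series_check)

# notna() tests each element, so rows with a missing capital are dropped.
is_china = toy_df["country"] == "China"
has_capital = toy_df["capital"].notna()
capitals_only = toy_df[is_china & has_capital]
print(capitals_only["city"].tolist())
```

Applying notna() to the real data would shrink capital_china, so the row count and the city list shown in the rest of this section would no longer match.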

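Tip

A good habit is to process a single record first, confirm that it works, and only then loop over all the data.

So we first check that our functions can fetch the weather for a single city.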

import os
import sys

module_path = os.path.abspath(os.path.join('..'))
print(module_path)
if module_path not in sys.path:
    sys.path.append(module_path)
C:\Users\renb\PycharmProjects\weather_dashapp\weather_book
from weather_app.models.query_api import get_geo_from_city,generate_url,request_weather_info,transform_weather_raw,add_city_info
city_info = capital_china.iloc[0,:].to_dict()
city_info
{'city': 'Shanghai',
 'city_ascii': 'Shanghai',
 'lat': 31.1667,
 'lng': 121.4667,
 'country': 'China',
 'iso2': 'CN',
 'iso3': 'CHN',
 'admin_name': 'Shanghai',
 'capital': 'admin',
 'population': 22120000.0,
 'id': 1156073548}
city = city_info['city']
lon = city_info['lng']
lat = city_info['lat']
url = generate_url(longitude=lon,latitude=lat)
text_j = request_weather_info(url)
weather_info_df = transform_weather_raw(text_j)
weather_info_df = add_city_info(weather_info_df,lon,lat,city)
weather_info_df
cloudcover lifted_index prec_type prec_amount temp2m rh2m weather timestamp wind_direction wind_speed longitude latitude city
0 9 15 none 0 5 53 cloudyday 2022-02-03 03:00:00 NE 3 121.4667 31.1667 Shanghai
1 9 15 none 0 5 51 cloudyday 2022-02-03 06:00:00 NE 3 121.4667 31.1667 Shanghai
2 9 15 none 1 4 72 cloudyday 2022-02-03 09:00:00 NE 3 121.4667 31.1667 Shanghai
3 9 15 rain 1 4 81 lightrainnight 2022-02-03 12:00:00 NE 3 121.4667 31.1667 Shanghai
4 9 15 rain 1 4 66 lightrainnight 2022-02-03 15:00:00 NE 2 121.4667 31.1667 Shanghai
... ... ... ... ... ... ... ... ... ... ... ... ... ...
59 3 15 none 4 5 72 pcloudynight 2022-02-10 12:00:00 E 3 121.4667 31.1667 Shanghai
60 9 15 none 4 5 70 cloudynight 2022-02-10 15:00:00 SE 2 121.4667 31.1667 Shanghai
61 9 15 none 4 5 71 cloudynight 2022-02-10 18:00:00 SE 2 121.4667 31.1667 Shanghai
62 9 15 none 4 5 74 cloudynight 2022-02-10 21:00:00 NE 3 121.4667 31.1667 Shanghai
63 9 15 none 4 5 63 cloudyday 2022-02-11 00:00:00 NE 3 121.4667 31.1667 Shanghai

64 rows × 13 columns
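The four calls used for a single city can be bundled into one function, which makes the later loop shorter and easier to test. The sketch below is an assumption, not part of weather_app: fetch_city_weather is a hypothetical helper, and the fake_* functions are offline stand-ins that only mimic the call signatures used above.

```python
from typing import Any, Callable, Dict


def fetch_city_weather(
    city_info: Dict[str, Any],
    generate_url: Callable[..., str],
    request_weather_info: Callable[[str], Any],
    transform_weather_raw: Callable[[Any], Any],
    add_city_info: Callable[..., Any],
) -> Any:
    """Run the four query steps for one city record (hypothetical helper)."""
    city = city_info["city"]
    lon = city_info["lng"]
    lat = city_info["lat"]
    url = generate_url(longitude=lon, latitude=lat)
    text_j = request_weather_info(url)
    weather = transform_weather_raw(text_j)
    return add_city_info(weather, lon, lat, city)


# Stand-ins so the sketch runs offline; they are NOT the real query_api functions.
def fake_generate_url(longitude: float, latitude: float) -> str:
    return f"https://example.invalid/?lon={longitude}&lat={latitude}"


def fake_request(url: str) -> dict:
    return {"url": url, "temp2m": [5, 4]}


def fake_transform(raw: dict) -> list:
    return [{"temp2m": t} for t in raw["temp2m"]]


def fake_add_city_info(rows: list, lon: float, lat: float, city: str) -> list:
    return [dict(r, longitude=lon, latitude=lat, city=city) for r in rows]


rows = fetch_city_weather(
    {"city": "Shanghai", "lng": 121.4667, "lat": 31.1667},
    fake_generate_url, fake_request, fake_transform, fake_add_city_info,
)
print(rows)
```

With the real functions imported from weather_app.models.query_api, the same helper could be called as fetch_city_weather(city_info, generate_url, request_weather_info, transform_weather_raw, add_city_info).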

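Fetching Weather for Multiple Cities

Once a single city works, we build the loop, processing the DataFrame row by row with iterrows.

We start with 10 cities to get a feel for the performance.

Each iteration is slow because it waits for the website to respond, taking about 2 to 3 seconds. At that rate, 2000 cities would take roughly 80 minutes (somewhere between 67 and 100).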
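The time estimate is simple arithmetic: number of cities times seconds per request, divided by 60. A small sketch, where estimate_minutes is a made-up helper name and the 2 to 3 second range is the figure observed above:

```python
def estimate_minutes(n_cities: int, seconds_per_request: float) -> float:
    """Total time in minutes for a sequential loop, one request per city."""
    return n_cities * seconds_per_request / 60


low = estimate_minutes(2000, 2.0)   # best case: 2 s per city
high = estimate_minutes(2000, 3.0)  # worst case: 3 s per city
print(f"{low:.0f} to {high:.0f} minutes")
```

The same function can be used with len(capital_china) to estimate the run time for the filtered list.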

from tqdm.notebook import tqdm

subset = capital_china.iloc[0:10, :]
frames = []  # collect one DataFrame per city and concatenate once at the end
# iterrows() has no length, so pass total= to let tqdm show real progress
for index, city_info in tqdm(subset.iterrows(), total=len(subset)):
    city = city_info['city']
    print(city)
    lon = city_info['lng']
    lat = city_info['lat']
    url = generate_url(longitude=lon, latitude=lat)
    text_j = request_weather_info(url)
    weather_info_df = transform_weather_raw(text_j)
    weather_info_df = add_city_info(weather_info_df, lon, lat, city)
    frames.append(weather_info_df)
all_cities_df = pd.concat(frames, axis=0)
Shanghai
Guangzhou
Beijing
Shenzhen
Nanyang
Baoding
Chengdu
Linyi
Tianjin
Shijiazhuang
all_cities_df.head()
cloudcover lifted_index prec_type prec_amount temp2m rh2m weather timestamp wind_direction wind_speed longitude latitude city
0 9 15 none 0 5 50 cloudyday 2022-02-03 03:00:00 NE 3 121.4667 31.1667 Shanghai
1 9 15 none 0 5 55 cloudyday 2022-02-03 06:00:00 NE 3 121.4667 31.1667 Shanghai
2 9 15 none 1 4 73 cloudyday 2022-02-03 09:00:00 NE 3 121.4667 31.1667 Shanghai
3 9 15 rain 1 4 83 lightrainnight 2022-02-03 12:00:00 NE 3 121.4667 31.1667 Shanghai
4 9 15 rain 1 4 69 lightrainnight 2022-02-03 15:00:00 NE 2 121.4667 31.1667 Shanghai

Summary

A simple loop is enough to fetch the weather for multiple cities, but its performance is poor.
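Because each iteration spends most of its time waiting for the network, one common way to shorten the total time is to overlap the requests with a thread pool. The sketch below only illustrates the idea: slow_fetch is a stand-in that sleeps instead of calling the weather service, and whether the service tolerates parallel requests is an assumption that would need checking, so max_workers should stay small.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(city: str) -> dict:
    """Stand-in for one weather request; sleeps briefly to mimic network latency."""
    time.sleep(0.2)
    return {"city": city, "ok": True}


cities = ["Shanghai", "Guangzhou", "Beijing", "Shenzhen", "Nanyang"]

start = time.perf_counter()
# pool.map() returns results in input order, even though the calls overlap in time.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(slow_fetch, cities))
elapsed = time.perf_counter() - start

print([r["city"] for r in results])
print(f"{elapsed:.2f} s for {len(cities)} simulated requests")
```

Run sequentially, the five simulated requests would take about one second; with five workers they finish in roughly the time of a single request.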