{ "cells": [ { "cell_type": "markdown", "id": "3e087ca9", "metadata": {}, "source": [ "# Handling Multiple Cities\n", "\n", "## Getting a worldwide city list\n", "We already know how to fetch the weather for a single city. Handling multiple cities works the same way: we simply wrap the same steps in a loop.\n", "\n", "But where do we get the list of cities?\n", "One useful source is this website: https://simplemaps.com/data/world-cities\n", "\n", "It provides detailed data on cities around the world: besides the city name, it includes the country, the latitude and longitude, and whether the city is a national or provincial capital, among other fields.\n", "\n", "After downloading and unzipping the file, we can read it with pandas." ] }, { "cell_type": "code", "execution_count": 1, "id": "1d1c95ce", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "c9adf614", "metadata": {}, "outputs": [], "source": [ "city_file = '../data/worldcities.csv'\n", "city_df = pd.read_csv(city_file, encoding='utf-8')" ] }, { "cell_type": "code", "execution_count": 3, "id": "400d50ce", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " city city_ascii lat lng country iso2 iso3 \\\n", "0 Tokyo Tokyo 35.6897 139.6922 Japan JP JPN \n", "1 Jakarta Jakarta -6.2146 106.8451 Indonesia ID IDN \n", "2 Delhi Delhi 28.6600 77.2300 India IN IND \n", "3 Mumbai Mumbai 18.9667 72.8333 India IN IND \n", "4 Manila Manila 14.6000 120.9833 Philippines PH PHL \n", "... ... ... ... ... ... ... ... \n", "40996 Tukchi Tukchi 57.3670 139.5000 Russia RU RUS \n", "40997 Numto Numto 63.6667 71.3333 Russia RU RUS \n", "40998 Nord Nord 81.7166 -17.8000 Greenland GL GRL \n", "40999 Timmiarmiut Timmiarmiut 62.5333 -42.2167 Greenland GL GRL \n", "41000 Nordvik Nordvik 74.0165 111.5100 Russia RU RUS \n", "\n", " admin_name capital population \\\n", "0 Tōkyō primary 37977000.0 \n", "1 Jakarta primary 34540000.0 \n", "2 Delhi admin 29617000.0 \n", "3 Mahārāshtra admin 23355000.0 \n", "4 Manila primary 23088000.0 \n", "... ... ... ... \n", "40996 Khabarovskiy Kray NaN 10.0 \n", "40997 Khanty-Mansiyskiy Avtonomnyy Okrug-Yugra NaN 10.0 \n", "40998 Sermersooq NaN 10.0 \n", "40999 Kujalleq NaN 10.0 \n", "41000 Krasnoyarskiy Kray NaN 0.0 \n", "\n", " id \n", "0 1392685764 \n", "1 1360771077 \n", "2 1356872604 \n", "3 1356226629 \n", "4 1608618140 \n", "... ... \n", "40996 1643472801 \n", "40997 1643985006 \n", "40998 1304217709 \n", "40999 1304206491 \n", "41000 1643587468 \n", "\n", "[41001 rows x 11 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "city_df" ] }, { "cell_type": "markdown", "id": "9284ef76", "metadata": {}, "source": [ "## Filtering the data\n", "\n", "We are only interested in the major cities in China, so we filter the DataFrame: we keep the rows whose `country` is `China` and whose `capital` field is filled in (for example `primary`, `admin` or `minor`)." ] }, { "cell_type": "code", "execution_count": 4, "id": "bc950707", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " city city_ascii lat lng country iso2 iso3 admin_name \\\n", "5 Shanghai Shanghai 31.1667 121.4667 China CN CHN Shanghai \n", "9 Guangzhou Guangzhou 23.1288 113.2590 China CN CHN Guangdong \n", "10 Beijing Beijing 39.9050 116.3914 China CN CHN Beijing \n", "17 Shenzhen Shenzhen 22.5350 114.0540 China CN CHN Guangdong \n", "29 Nanyang Nanyang 32.9987 112.5292 China CN CHN Henan \n", "... ... ... ... ... ... ... ... ... \n", "40725 Taoyan Taoyan 34.7706 103.7903 China CN CHN Gansu \n", "40744 Jingping Jingping 33.7844 104.3652 China CN CHN Gansu \n", "40776 Dayi Dayi 33.8312 104.0362 China CN CHN Gansu \n", "40782 Biancang Biancang 33.9007 104.0321 China CN CHN Gansu \n", "40938 Nichicun Nichicun 29.5333 94.4167 China CN CHN Tibet \n", "\n", " capital population id \n", "5 admin 22120000.0 1156073548 \n", "9 admin 20902000.0 1156237133 \n", "10 primary 19433000.0 1156228865 \n", "17 minor 15929000.0 1156158707 \n", "29 NaN 12010000.0 1156192287 \n", "... ... ... ... \n", "40725 NaN 5329.0 1156019900 \n", "40744 NaN 5149.0 1156005145 \n", "40776 NaN 5114.0 1156108713 \n", "40782 NaN 5040.0 1156724811 \n", "40938 NaN 100.0 1156860651 \n", "\n", "[1498 rows x 11 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `city_df['capital'] is not None` tests the Series object itself (always True), so use .notna() to test each row\n", "capital_china = city_df[(city_df['country'] == 'China') & city_df['capital'].notna()]\n", "capital_china" ] }, { "cell_type": "markdown", "id": "c16830e9", "metadata": {}, "source": [ "## Tip\n", "A good habit is to get the processing right for a single record first, and only then loop over all records.\n", "\n", "So before writing the loop, we check again that our functions can fetch the weather for a single city." ] }, { "cell_type": "code", "execution_count": 5, "id": "85187c7e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C:\\Users\\renb\\PycharmProjects\\weather_dashapp\\weather_book\n" ] } ], "source": [ "import os\n", "import sys\n", "\n", "module_path = os.path.abspath(os.path.join('..'))\n", "print(module_path)\n", "if module_path not in sys.path:\n", "    sys.path.append(module_path)" ] }, { "cell_type": "code", "execution_count": 6, "id": "77892dbe", "metadata": {}, "outputs": [], "source": [ "from weather_app.models.query_api import get_geo_from_city, generate_url, request_weather_info, transform_weather_raw, add_city_info" ] }, { "cell_type": "code", "execution_count": 7, "id": "0c7a3bcc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'city': 'Shanghai',\n", " 'city_ascii': 'Shanghai',\n", " 'lat': 31.1667,\n", " 'lng': 121.4667,\n", " 'country': 'China',\n", " 'iso2': 'CN',\n", " 'iso3': 'CHN',\n", " 'admin_name': 'Shanghai',\n", " 'capital': 'admin',\n", " 'population': 22120000.0,\n", " 'id': 1156073548}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "city_info = capital_china.iloc[0, :].to_dict()\n", "city_info" ] }, { "cell_type": "code", "execution_count": 8, "id": "b2a6ad3a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " cloudcover lifted_index prec_type prec_amount temp2m rh2m \\\n", "0 9 15 none 0 5 53 \n", "1 9 15 none 0 5 51 \n", "2 9 15 none 1 4 72 \n", "3 9 15 rain 1 4 81 \n", "4 9 15 rain 1 4 66 \n", ".. ... ... ... ... ... ... \n", "59 3 15 none 4 5 72 \n", "60 9 15 none 4 5 70 \n", "61 9 15 none 4 5 71 \n", "62 9 15 none 4 5 74 \n", "63 9 15 none 4 5 63 \n", "\n", " weather timestamp wind_direction wind_speed longitude \\\n", "0 cloudyday 2022-02-03 03:00:00 NE 3 121.4667 \n", "1 cloudyday 2022-02-03 06:00:00 NE 3 121.4667 \n", "2 cloudyday 2022-02-03 09:00:00 NE 3 121.4667 \n", "3 lightrainnight 2022-02-03 12:00:00 NE 3 121.4667 \n", "4 lightrainnight 2022-02-03 15:00:00 NE 2 121.4667 \n", ".. ... ... ... ... ... \n", "59 pcloudynight 2022-02-10 12:00:00 E 3 121.4667 \n", "60 cloudynight 2022-02-10 15:00:00 SE 2 121.4667 \n", "61 cloudynight 2022-02-10 18:00:00 SE 2 121.4667 \n", "62 cloudynight 2022-02-10 21:00:00 NE 3 121.4667 \n", "63 cloudyday 2022-02-11 00:00:00 NE 3 121.4667 \n", "\n", " latitude city \n", "0 31.1667 Shanghai \n", "1 31.1667 Shanghai \n", "2 31.1667 Shanghai \n", "3 31.1667 Shanghai \n", "4 31.1667 Shanghai \n", ".. ... ... \n", "59 31.1667 Shanghai \n", "60 31.1667 Shanghai \n", "61 31.1667 Shanghai \n", "62 31.1667 Shanghai \n", "63 31.1667 Shanghai \n", "\n", "[64 rows x 13 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "city = city_info['city']\n", "lon = city_info['lng']\n", "lat = city_info['lat']\n", "url = generate_url(longitude=lon, latitude=lat)\n", "text_j = request_weather_info(url)\n", "weather_info_df = transform_weather_raw(text_j)\n", "weather_info_df = add_city_info(weather_info_df, lon, lat, city)\n", "weather_info_df" ] }, { "cell_type": "markdown", "id": "96ee56a4", "metadata": {}, "source": [ "## Fetching weather for multiple cities\n", "\n", "Once we are sure a single city works, we build the loop, using `iterrows` to process the cities row by row.\n", "\n", "We first test the performance with 10 cities.\n", "\n", "Each iteration is slow because it has to wait for the website to respond: about 2-3 seconds per city. At that rate, 2000 cities would take roughly 80 minutes (2000 × 2.5 s ≈ 5000 s)." ] }, { "cell_type": "code", "execution_count": 9, "id": "f2ae20e5", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5cde173a3cc74f1a8e543f63ad0e88c5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "0it [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Shanghai\n", "Guangzhou\n", "Beijing\n", "Shenzhen\n", "Nanyang\n", "Baoding\n", "Chengdu\n", "Linyi\n", "Tianjin\n", "Shijiazhuang\n" ] } ], "source": [ "from tqdm.notebook import tqdm\n", "\n", "subset = capital_china.iloc[0:10, :]  # test with the first 10 cities only\n", "frames = []  # one DataFrame per city, concatenated once after the loop\n", "for _, city_info in tqdm(subset.iterrows(), total=len(subset)):\n", "    city = city_info['city']\n", "    print(city)\n", "    lon = city_info['lng']\n", "    lat = city_info['lat']\n", "    url = generate_url(longitude=lon, latitude=lat)\n", "    text_j = request_weather_info(url)  # waits for the server's reply: the slow step (2-3 s)\n", "    weather_info_df = transform_weather_raw(text_j)\n", "    weather_info_df = add_city_info(weather_info_df, lon, lat, city)\n", "    frames.append(weather_info_df)\n", "all_cities_df = pd.concat(frames, axis=0)" ] }, { "cell_type": "code", "execution_count": 10, "id": "11daba18", "metadata": {}, "outputs": [ { "data": { 
"text/html": [ "
" ], "text/plain": [ " cloudcover lifted_index prec_type prec_amount temp2m rh2m \\\n", "0 9 15 none 0 5 50 \n", "1 9 15 none 0 5 55 \n", "2 9 15 none 1 4 73 \n", "3 9 15 rain 1 4 83 \n", "4 9 15 rain 1 4 69 \n", "\n", " weather timestamp wind_direction wind_speed longitude \\\n", "0 cloudyday 2022-02-03 03:00:00 NE 3 121.4667 \n", "1 cloudyday 2022-02-03 06:00:00 NE 3 121.4667 \n", "2 cloudyday 2022-02-03 09:00:00 NE 3 121.4667 \n", "3 lightrainnight 2022-02-03 12:00:00 NE 3 121.4667 \n", "4 lightrainnight 2022-02-03 15:00:00 NE 2 121.4667 \n", "\n", " latitude city \n", "0 31.1667 Shanghai \n", "1 31.1667 Shanghai \n", "2 31.1667 Shanghai \n", "3 31.1667 Shanghai \n", "4 31.1667 Shanghai " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_cities_df.head()" ] }, { "cell_type": "markdown", "id": "44f8727c", "metadata": {}, "source": [ "## Summary\n", "A simple loop is enough to fetch weather information for multiple cities, but it performs poorly: the requests run one after another and each one waits 2-3 seconds for the server, so the total time grows linearly with the number of cities." ] } ], "metadata": { "kernelspec": { "display_name": "dash", "language": "python", "name": "dash" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }
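The filter and the loop in this notebook can be checked offline with a small sketch. It assumes a hypothetical `fetch_city_weather` helper (not part of `weather_app.models.query_api`) that only sleeps for a simulated latency instead of calling the weather service, plus a four-row sample shaped like `worldcities.csv` with values copied from the outputs above. It shows why `city_df['capital'] is not None` does not filter a Series, and how per-request latency adds up in a sequential loop.

```python
import time

import pandas as pd

# Assumption: a short sleep stands in for the real 2-3 s wait per request.
SIMULATED_LATENCY_S = 0.05


def fetch_city_weather(city: str, lon: float, lat: float) -> pd.DataFrame:
    """Hypothetical stand-in for the chain
    generate_url -> request_weather_info -> transform_weather_raw -> add_city_info.
    It only sleeps and returns two fake forecast rows for the city."""
    time.sleep(SIMULATED_LATENCY_S)
    return pd.DataFrame({
        "temp2m": [5, 4],
        "longitude": [lon] * 2,
        "latitude": [lat] * 2,
        "city": [city] * 2,
    })


# Four rows shaped like worldcities.csv (values taken from the notebook outputs).
city_df = pd.DataFrame({
    "city": ["Tokyo", "Shanghai", "Nanyang", "Beijing"],
    "lat": [35.6897, 31.1667, 32.9987, 39.9050],
    "lng": [139.6922, 121.4667, 112.5292, 116.3914],
    "country": ["Japan", "China", "China", "China"],
    "capital": ["primary", "admin", None, "primary"],
})

# `city_df["capital"] is not None` compares the Series object itself with None:
# it is always True and filters nothing. .notna() tests every row instead.
mask = (city_df["country"] == "China") & city_df["capital"].notna()
capital_china = city_df[mask]

# Sequential loop: one blocking "request" per city, frames concatenated once.
start = time.perf_counter()
frames = [
    fetch_city_weather(row["city"], row["lng"], row["lat"])
    for _, row in capital_china.iterrows()
]
all_cities_df = pd.concat(frames, ignore_index=True)
elapsed = time.perf_counter() - start

per_city_s = elapsed / len(capital_china)
print(capital_china["city"].tolist())  # ['Shanghai', 'Beijing']
print(f"{per_city_s:.2f} s per city -> about {2000 * per_city_s / 60:.1f} min for 2000 cities")
```

With the 2-3 s per request observed in the notebook, the same arithmetic gives 2000 × 2.5 s = 5000 s, a little over 80 minutes. Because the loop is sequential, the total grows linearly with the number of cities regardless of how the frames are combined.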