Cleaning Up the Code

Before going further, it is worth cleaning up the code so that the whole project is tidier.

Creating Functions

To make the code easier to reuse, we wrap the key steps in functions.

import requests
import json
import pandas as pd
from geopy.geocoders import Nominatim
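# Look up a city's longitude and latitude by name
def get_geo_from_city(city):
    geolocator = Nominatim(user_agent='baidu')
    location = geolocator.geocode(city)
    # geocode() returns None when the name cannot be resolved
    if location is None:
        raise ValueError(f'Could not geocode city: {city}')
    return location.longitude, location.latitude

# Build the 7Timer request URL for a pair of coordinates
def generate_url(longitude, latitude):
    url = f'https://www.7timer.info/bin/api.pl?lon={longitude}&lat={latitude}&product=civil&output=json'
    return url

# Fetch the raw weather forecast and parse the JSON response
def request_weather_info(url):
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # fail early on HTTP errors
    text_j = json.loads(r.text)
    return text_j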

Testing the Functions

Always test a function right after writing it. Otherwise bugs pile up on top of other bugs and become tiring to untangle later.

# test functions
city='shanghai'
lon,lat = get_geo_from_city(city)
url = generate_url(lon,lat)
text_j = request_weather_info(url)
text_j['dataseries'][0]
{'timepoint': 3,
 'cloudcover': 9,
 'lifted_index': 15,
 'prec_type': 'rain',
 'prec_amount': 1,
 'temp2m': 4,
 'rh2m': '70%',
 'wind10m': {'direction': 'NE', 'speed': 3},
 'weather': 'lightrainday'}
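
The live test above needs network access and two external services. As a complementary sketch, the URL builder can also be checked offline by parsing the query string back with the standard library. The `generate_url` body is repeated here only to keep the snippet self-contained, and the coordinates are the Shanghai values returned by the geocoder in this walkthrough.

```python
from urllib.parse import urlsplit, parse_qs

def generate_url(longitude, latitude):
    # Same URL template as the function defined in this section.
    return (
        "https://www.7timer.info/bin/api.pl"
        f"?lon={longitude}&lat={latitude}&product=civil&output=json"
    )

# Build the URL, then split its query string back into parameters.
url = generate_url(121.469207, 31.232276)
query = parse_qs(urlsplit(url).query)

# Each parameter should round-trip unchanged.
assert query["lon"] == ["121.469207"]
assert query["lat"] == ["31.232276"]
assert query["product"] == ["civil"]
assert query["output"] == ["json"]
```

A check like this catches typos in the URL template without waiting for a response from the weather service.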

Adding More Transformations

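We need to upgrade the original code slightly so that it keeps more information.

  1. Split the wind column into two columns: wind speed and wind direction.

  2. Store more information in the DataFrame, such as the city, because we will add data for multiple cities later.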
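
To make the first step concrete, here is a plain-Python sketch of the flattening applied to a single record. The helper name `flatten_wind` is hypothetical and used only for illustration; the tutorial's own code does this on the whole column with `pd.json_normalize`.

```python
def flatten_wind(record, prefix="wind_"):
    """Return a copy of record with the nested wind10m dict split into flat keys."""
    # Copy every field except the nested one.
    flat = {key: value for key, value in record.items() if key != "wind10m"}
    # Promote each nested key to a top-level key with the prefix.
    for key, value in record.get("wind10m", {}).items():
        flat[prefix + key] = value
    return flat

# A trimmed-down record in the shape of one 'dataseries' entry.
sample = {"temp2m": 4, "wind10m": {"direction": "NE", "speed": 3}}
flat = flatten_wind(sample)
print(flat)  # {'temp2m': 4, 'wind_direction': 'NE', 'wind_speed': 3}
```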

# transform data

def transform_weather_raw(text_j):
    weather_info = pd.DataFrame(text_j['dataseries'])
    # 'init' is the forecast start time; 'timepoint' is the offset from it in hours
    start_time = pd.to_datetime(text_j['init'], format='%Y%m%d%H')
    weather_info['timepoint'] = pd.to_timedelta(weather_info['timepoint'], unit='h')
    weather_info['timestamp'] = start_time + weather_info['timepoint']
    weather_info.drop('timepoint', axis=1, inplace=True)
    # split the nested wind10m dicts into wind_direction and wind_speed columns
    wind_df = pd.json_normalize(weather_info['wind10m'])
    wind_df.columns = ['wind_' + col for col in wind_df.columns]
    weather_info = pd.concat([weather_info, wind_df], axis=1)
    weather_info.drop('wind10m', axis=1, inplace=True)
    # strip the trailing '%' from relative humidity (values stay strings)
    weather_info['rh2m'] = weather_info['rh2m'].str.rstrip('%')
    return weather_info

def add_city_info(weather_info,longitude,latitude,city):
    weather_info['longitude'] = longitude
    weather_info['latitude'] = latitude
    weather_info['city'] = city
    return weather_info

weather_info_df = transform_weather_raw(text_j)
weather_info_df = add_city_info(weather_info_df,lon,lat,city)
weather_info_df
     cloudcover  lifted_index  prec_type  prec_amount  temp2m  rh2m  weather         timestamp            wind_direction  wind_speed  longitude   latitude   city
0    9           15            rain       1            4       70    lightrainday    2022-02-02 03:00:00  NE              3           121.469207  31.232276  shanghai
1    9           15            rain       2            4       70    lightrainday    2022-02-02 06:00:00  NE              3           121.469207  31.232276  shanghai
2    9           15            rain       2            4       70    lightrainday    2022-02-02 09:00:00  NE              3           121.469207  31.232276  shanghai
3    9           15            rain       2            3       87    lightrainnight  2022-02-02 12:00:00  NE              3           121.469207  31.232276  shanghai
4    9           15            rain       2            3       90    lightrainnight  2022-02-02 15:00:00  NE              3           121.469207  31.232276  shanghai
...  ...         ...           ...        ...          ...     ...   ...             ...                  ...             ...         ...         ...        ...
59   9           15            rain       4            5       74    rainnight       2022-02-09 12:00:00  N               3           121.469207  31.232276  shanghai
60   9           15            none       4            5       73    cloudynight     2022-02-09 15:00:00  N               3           121.469207  31.232276  shanghai
61   7           15            none       4            4       78    mcloudynight    2022-02-09 18:00:00  N               3           121.469207  31.232276  shanghai
62   2           15            none       4            4       83    clearnight      2022-02-09 21:00:00  N               3           121.469207  31.232276  shanghai
63   2           15            none       4            4       75    clearday        2022-02-10 00:00:00  N               3           121.469207  31.232276  shanghai

64 rows × 13 columns
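
The timestamp column is the forecast start time plus each row's `timepoint` offset in hours. The arithmetic can be sketched with the standard library alone. The init value `'2022020200'` is an assumption inferred from the first row (timepoint 3 giving 03:00), since the raw `init` field is not shown here, and `forecast_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def forecast_time(init, timepoint_hours):
    """Add an hour offset to an init string in YYYYMMDDHH format."""
    start = datetime.strptime(init, "%Y%m%d%H")
    return start + timedelta(hours=timepoint_hours)

# Assumed init value (not shown in the output above).
init = "2022020200"

first = forecast_time(init, 3)     # first row, 3 hours after init
last = forecast_time(init, 192)    # 64 rows at 3-hour steps end at 192 hours

print(first)  # 2022-02-02 03:00:00
print(last)   # 2022-02-10 00:00:00
```

Under that assumption, both values match the first and last timestamps in the table.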