Cleaning Up the Code
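It is worth tidying up our code so that the whole project is cleaner.
Creating Functions
To make the code easier to reuse, we turn the key pieces into functions.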
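import requests
import json
import pandas as pd
from geopy.geocoders import Nominatim

# Look up the (longitude, latitude) of a city by its name
def get_geo_from_city(city):
    geolocator = Nominatim(user_agent='baidu')
    location = geolocator.geocode(city)
    # geocode() returns None when the city cannot be found
    if location is None:
        raise ValueError(f'Could not geocode city: {city!r}')
    return location.longitude, location.latitude

# Build the 7timer API URL for the given coordinates
def generate_url(longitude, latitude):
    url = f'https://www.7timer.info/bin/api.pl?lon={longitude}&lat={latitude}&product=civil&output=json'
    return url

# Request the raw weather data and parse the JSON response
def request_weather_info(url):
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # fail early on HTTP errors
    text_j = json.loads(r.text)
    return text_j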
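As an alternative to formatting the query string by hand, the standard library's `urlencode` can assemble it from a dictionary. The following is only a minimal sketch: `generate_url_encoded` and `BASE_URL` are hypothetical names that are not part of the original code, and the parameter names are copied from `generate_url` above.

```python
from urllib.parse import urlencode

# Same endpoint as generate_url above; BASE_URL is a hypothetical name
BASE_URL = "https://www.7timer.info/bin/api.pl"


def generate_url_encoded(longitude, latitude, product="civil", output="json"):
    """Hypothetical variant of generate_url that lets urlencode build the query."""
    params = {
        "lon": longitude,
        "lat": latitude,
        "product": product,
        "output": output,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(generate_url_encoded(121.469207, 31.232276))
```

Keeping the parameters in a dictionary makes it easier to add or change one later without touching the URL template.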
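Testing the Functions
Always test newly written functions right away. Otherwise bugs pile up on top of other bugs later, and untangling them is exhausting.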
# Test the functions end to end for one city
city = 'shanghai'
lon, lat = get_geo_from_city(city)
url = generate_url(lon, lat)
text_j = request_weather_info(url)
text_j['dataseries'][0]
{'timepoint': 3,
'cloudcover': 9,
'lifted_index': 15,
'prec_type': 'rain',
'prec_amount': 1,
'temp2m': 4,
'rh2m': '70%',
'wind10m': {'direction': 'NE', 'speed': 3},
'weather': 'lightrainday'}
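Inspecting one record by eye works, but the same check can be made repeatable and run without calling the API. The sketch below is one possible approach, not part of the original code: `EXPECTED_KEYS` and `missing_keys` are hypothetical names, and the key set is taken only from the sample record printed above, not from the 7timer documentation.

```python
# Keys observed in the sample record above (an assumption, not an official schema)
EXPECTED_KEYS = {
    "timepoint", "cloudcover", "lifted_index", "prec_type", "prec_amount",
    "temp2m", "rh2m", "wind10m", "weather",
}


def missing_keys(record, expected=EXPECTED_KEYS):
    """Return the sorted names of expected keys that are absent from one record."""
    return sorted(expected - set(record))


# The record printed above, reused as an offline fixture
sample = {
    "timepoint": 3,
    "cloudcover": 9,
    "lifted_index": 15,
    "prec_type": "rain",
    "prec_amount": 1,
    "temp2m": 4,
    "rh2m": "70%",
    "wind10m": {"direction": "NE", "speed": 3},
    "weather": "lightrainday",
}

assert missing_keys(sample) == []
```

In practice the same function could be applied to every item in `text_j['dataseries']` before the data is transformed.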
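Adding More Transformations
We need to upgrade the original code slightly so that it keeps more information.
We split the wind column into two columns: wind speed and wind direction.
We also store more information in the DataFrame, such as the city, because we will add data for several cities later.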
# Transform the raw response into a tidy DataFrame
def transform_weather_raw(text_j):
    weather_info = pd.DataFrame(text_j['dataseries'])
    # 'init' is the forecast start time; 'timepoint' is an offset in hours
    start_time = pd.to_datetime(text_j['init'], format='%Y%m%d%H')
    weather_info['timepoint'] = pd.to_timedelta(weather_info['timepoint'], unit='h')
    weather_info['timestamp'] = start_time + weather_info['timepoint']
    weather_info.drop('timepoint', axis=1, inplace=True)
    # Split the nested wind10m dicts into wind_direction and wind_speed columns
    wind_df = pd.json_normalize(weather_info['wind10m'].tolist())
    wind_df.columns = ['wind_' + col for col in wind_df.columns]
    weather_info = pd.concat([weather_info, wind_df], axis=1)
    weather_info.drop('wind10m', axis=1, inplace=True)
    # Strip the trailing '%' from relative humidity (the values remain strings)
    weather_info['rh2m'] = weather_info['rh2m'].str.rstrip('%')
    return weather_info

# Attach the location columns so rows from several cities can be told apart
def add_city_info(weather_info, longitude, latitude, city):
    weather_info['longitude'] = longitude
    weather_info['latitude'] = latitude
    weather_info['city'] = city
    return weather_info

weather_info_df = transform_weather_raw(text_j)
weather_info_df = add_city_info(weather_info_df, lon, lat, city)
weather_info_df
|  | cloudcover | lifted_index | prec_type | prec_amount | temp2m | rh2m | weather | timestamp | wind_direction | wind_speed | longitude | latitude | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 15 | rain | 1 | 4 | 70 | lightrainday | 2022-02-02 03:00:00 | NE | 3 | 121.469207 | 31.232276 | shanghai |
| 1 | 9 | 15 | rain | 2 | 4 | 70 | lightrainday | 2022-02-02 06:00:00 | NE | 3 | 121.469207 | 31.232276 | shanghai |
| 2 | 9 | 15 | rain | 2 | 4 | 70 | lightrainday | 2022-02-02 09:00:00 | NE | 3 | 121.469207 | 31.232276 | shanghai |
| 3 | 9 | 15 | rain | 2 | 3 | 87 | lightrainnight | 2022-02-02 12:00:00 | NE | 3 | 121.469207 | 31.232276 | shanghai |
| 4 | 9 | 15 | rain | 2 | 3 | 90 | lightrainnight | 2022-02-02 15:00:00 | NE | 3 | 121.469207 | 31.232276 | shanghai |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59 | 9 | 15 | rain | 4 | 5 | 74 | rainnight | 2022-02-09 12:00:00 | N | 3 | 121.469207 | 31.232276 | shanghai |
| 60 | 9 | 15 | none | 4 | 5 | 73 | cloudynight | 2022-02-09 15:00:00 | N | 3 | 121.469207 | 31.232276 | shanghai |
| 61 | 7 | 15 | none | 4 | 4 | 78 | mcloudynight | 2022-02-09 18:00:00 | N | 3 | 121.469207 | 31.232276 | shanghai |
| 62 | 2 | 15 | none | 4 | 4 | 83 | clearnight | 2022-02-09 21:00:00 | N | 3 | 121.469207 | 31.232276 | shanghai |
| 63 | 2 | 15 | none | 4 | 4 | 75 | clearday | 2022-02-10 00:00:00 | N | 3 | 121.469207 | 31.232276 | shanghai |
64 rows × 13 columns
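To see what the transformation does to each row, the same steps can be traced on a single record using only the standard library. This is a rough sketch rather than a replacement for the pandas version: `transform_record` is a hypothetical helper, and the `init` value `"2022020200"` is inferred from the first timestamp in the table (03:00 for a timepoint of 3), not copied from a real response.

```python
from datetime import datetime, timedelta


def transform_record(record, init):
    """Hypothetical single-record counterpart of transform_weather_raw."""
    start_time = datetime.strptime(init, "%Y%m%d%H")
    # Copy every field except the two that are replaced below
    row = {k: v for k, v in record.items() if k not in ("timepoint", "wind10m")}
    # timepoint is an offset in hours from the forecast start time
    row["timestamp"] = start_time + timedelta(hours=record["timepoint"])
    # Flatten the nested wind dict into wind_direction and wind_speed
    for key, value in record["wind10m"].items():
        row[f"wind_{key}"] = value
    # Strip the trailing percent sign; the value stays a string
    row["rh2m"] = record["rh2m"].rstrip("%")
    return row


sample = {
    "timepoint": 3,
    "cloudcover": 9,
    "lifted_index": 15,
    "prec_type": "rain",
    "prec_amount": 1,
    "temp2m": 4,
    "rh2m": "70%",
    "wind10m": {"direction": "NE", "speed": 3},
    "weather": "lightrainday",
}

row = transform_record(sample, "2022020200")
print(row["timestamp"], row["wind_direction"], row["wind_speed"], row["rh2m"])
```

The resulting dictionary has the same fields as one row of the table above, apart from the longitude, latitude and city values that `add_city_info` attaches afterwards.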