{ "cells": [ { "cell_type": "markdown", "id": "904406ec", "metadata": {}, "source": [ "# Cleaning Up the Code\n", "\n", "It is worth tidying up our code so that the whole project stays clean.\n" ] },
{ "cell_type": "markdown", "id": "23419d67", "metadata": {}, "source": [ "## Creating Functions\n", "\n", "To make the code easier to reuse, we turn the key pieces into functions.\n" ] },
{ "cell_type": "code", "execution_count": 1, "id": "82ac2aeb", "metadata": {}, "outputs": [], "source": [ "import requests\n", "import json\n", "import pandas as pd\n", "from geopy.geocoders import Nominatim" ] },
{ "cell_type": "code", "execution_count": 2, "id": "5c3dd5a1", "metadata": {}, "outputs": [], "source": [
"# Look up a city's longitude and latitude by name\n",
"def get_geo_from_city(city):\n",
"    geolocator = Nominatim(user_agent='baidu')\n",
"    location = geolocator.geocode(city)\n",
"    # geocode() returns None when the city cannot be found\n",
"    if location is None:\n",
"        raise ValueError(f'Could not geocode city: {city}')\n",
"    return location.longitude, location.latitude\n",
"\n",
"# Build the 7timer API URL for a coordinate pair\n",
"def generate_url(longitude, latitude):\n",
"    url = f'https://www.7timer.info/bin/api.pl?lon={longitude}&lat={latitude}&product=civil&output=json'\n",
"    return url\n",
"\n",
"# Fetch the raw weather data and parse the JSON response\n",
"def request_weather_info(url):\n",
"    r = requests.get(url, timeout=30)\n",
"    r.raise_for_status()\n",
"    text_j = json.loads(r.text)\n",
"    return text_j" ] },
{ "cell_type": "markdown", "id": "339c3f1c", "metadata": {}, "source": [ "## Testing the Functions\n", "Always test newly created functions right away; otherwise bugs pile up on top of other bugs later, and untangling them is exhausting." ] },
{ "cell_type": "code", "execution_count": 3, "id": "49ea4d4e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'timepoint': 3,\n", " 'cloudcover': 9,\n", " 'lifted_index': 15,\n", " 'prec_type': 'rain',\n", " 'prec_amount': 1,\n", " 'temp2m': 4,\n", " 'rh2m': '70%',\n", " 'wind10m': {'direction': 'NE', 'speed': 3},\n", " 'weather': 'lightrainday'}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [
"# Test the functions end to end\n",
"city = 'shanghai'\n",
"lon, lat = get_geo_from_city(city)\n",
"url = generate_url(lon, lat)\n",
"text_j = request_weather_info(url)\n",
"text_j['dataseries'][0]" ] },
{ "cell_type": "markdown", "id": "d30161b8", "metadata": {}, "source": [ "## Adding More Transformations\n", "\n", "We need to upgrade the original code slightly so that it keeps more information.\n", "\n", "1. Split the wind column into two columns: wind speed and wind direction.\n", "2. Store more information in the DataFrame, such as the city, because we will add data for several cities later." ] },
{ "cell_type": "code", "execution_count": 4, "id": "dd4129e5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>cloudcover</th><th>lifted_index</th><th>prec_type</th><th>prec_amount</th><th>temp2m</th><th>rh2m</th><th>weather</th><th>timestamp</th><th>wind_direction</th><th>wind_speed</th><th>longitude</th><th>latitude</th><th>city</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>9</td><td>15</td><td>rain</td><td>1</td><td>4</td><td>70</td><td>lightrainday</td><td>2022-02-02 03:00:00</td><td>NE</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>1</th><td>9</td><td>15</td><td>rain</td><td>2</td><td>4</td><td>70</td><td>lightrainday</td><td>2022-02-02 06:00:00</td><td>NE</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>2</th><td>9</td><td>15</td><td>rain</td><td>2</td><td>4</td><td>70</td><td>lightrainday</td><td>2022-02-02 09:00:00</td><td>NE</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>3</th><td>9</td><td>15</td><td>rain</td><td>2</td><td>3</td><td>87</td><td>lightrainnight</td><td>2022-02-02 12:00:00</td><td>NE</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>4</th><td>9</td><td>15</td><td>rain</td><td>2</td><td>3</td><td>90</td><td>lightrainnight</td><td>2022-02-02 15:00:00</td><td>NE</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>59</th><td>9</td><td>15</td><td>rain</td><td>4</td><td>5</td><td>74</td><td>rainnight</td><td>2022-02-09 12:00:00</td><td>N</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>60</th><td>9</td><td>15</td><td>none</td><td>4</td><td>5</td><td>73</td><td>cloudynight</td><td>2022-02-09 15:00:00</td><td>N</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>61</th><td>7</td><td>15</td><td>none</td><td>4</td><td>4</td><td>78</td><td>mcloudynight</td><td>2022-02-09 18:00:00</td><td>N</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>62</th><td>2</td><td>15</td><td>none</td><td>4</td><td>4</td><td>83</td><td>clearnight</td><td>2022-02-09 21:00:00</td><td>N</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"    <tr><th>63</th><td>2</td><td>15</td><td>none</td><td>4</td><td>4</td><td>75</td><td>clearday</td><td>2022-02-10 00:00:00</td><td>N</td><td>3</td><td>121.469207</td><td>31.232276</td><td>shanghai</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>64 rows × 13 columns</p>\n",
"</div>
" ], "text/plain": [ " cloudcover lifted_index prec_type prec_amount temp2m rh2m \\\n", "0 9 15 rain 1 4 70 \n", "1 9 15 rain 2 4 70 \n", "2 9 15 rain 2 4 70 \n", "3 9 15 rain 2 3 87 \n", "4 9 15 rain 2 3 90 \n", ".. ... ... ... ... ... ... \n", "59 9 15 rain 4 5 74 \n", "60 9 15 none 4 5 73 \n", "61 7 15 none 4 4 78 \n", "62 2 15 none 4 4 83 \n", "63 2 15 none 4 4 75 \n", "\n", " weather timestamp wind_direction wind_speed longitude \\\n", "0 lightrainday 2022-02-02 03:00:00 NE 3 121.469207 \n", "1 lightrainday 2022-02-02 06:00:00 NE 3 121.469207 \n", "2 lightrainday 2022-02-02 09:00:00 NE 3 121.469207 \n", "3 lightrainnight 2022-02-02 12:00:00 NE 3 121.469207 \n", "4 lightrainnight 2022-02-02 15:00:00 NE 3 121.469207 \n", ".. ... ... ... ... ... \n", "59 rainnight 2022-02-09 12:00:00 N 3 121.469207 \n", "60 cloudynight 2022-02-09 15:00:00 N 3 121.469207 \n", "61 mcloudynight 2022-02-09 18:00:00 N 3 121.469207 \n", "62 clearnight 2022-02-09 21:00:00 N 3 121.469207 \n", "63 clearday 2022-02-10 00:00:00 N 3 121.469207 \n", "\n", " latitude city \n", "0 31.232276 shanghai \n", "1 31.232276 shanghai \n", "2 31.232276 shanghai \n", "3 31.232276 shanghai \n", "4 31.232276 shanghai \n", ".. ... ... 
\n", "59 31.232276 shanghai \n", "60 31.232276 shanghai \n", "61 31.232276 shanghai \n", "62 31.232276 shanghai \n", "63 31.232276 shanghai \n", "\n", "[64 rows x 13 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [
"# Transform the raw API response into a tidy DataFrame\n",
"\n",
"def transform_weather_raw(text_j):\n",
"    weather_info = pd.DataFrame(text_j['dataseries'])\n",
"    # 'init' is the forecast start time (YYYYMMDDHH); 'timepoint' is an offset in hours\n",
"    start_time = pd.to_datetime(text_j['init'], format='%Y%m%d%H')\n",
"    weather_info['timepoint'] = pd.to_timedelta(weather_info['timepoint'], unit='h')\n",
"    weather_info['timestamp'] = start_time + weather_info['timepoint']\n",
"    weather_info.drop('timepoint', axis=1, inplace=True)\n",
"    # Split the nested wind10m dicts into wind_direction and wind_speed columns\n",
"    wind_df = pd.json_normalize(weather_info['wind10m'])\n",
"    wind_df.columns = ['wind_' + col for col in wind_df.columns]\n",
"    weather_info = pd.concat([weather_info, wind_df], axis=1)\n",
"    weather_info.drop('wind10m', axis=1, inplace=True)\n",
"    # Strip the trailing '%' from relative humidity (values remain strings)\n",
"    weather_info['rh2m'] = weather_info['rh2m'].str.rstrip('%')\n",
"    return weather_info\n",
"\n",
"# Attach the location so rows from several cities can be told apart later\n",
"def add_city_info(weather_info, longitude, latitude, city):\n",
"    weather_info['longitude'] = longitude\n",
"    weather_info['latitude'] = latitude\n",
"    weather_info['city'] = city\n",
"    return weather_info\n",
"\n",
"weather_info_df = transform_weather_raw(text_j)\n",
"weather_info_df = add_city_info(weather_info_df, lon, lat, city)\n",
"weather_info_df" ] } ], "metadata": { "kernelspec": { "display_name": "dash", "language": "python", "name": "dash" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }