Python matplotlib + pandas クロス集計
やろうとしたこと(1):
天気と気温のクロス集計
使ったこと:
pandas.crosstab — pandas 0.16.2 documentation
参考:
エラー:
ValueError: If using all scalar values, you must pass an index
原因:
read.csvの代わりに appendを使って変数にデータを入れていた。
コード:
import MySQLdb
from pylab import *
import sys
import numpy as np
import matplotlib.pyplot as plt
import datetime
from matplotlib.dates import YearLocator, MonthLocator, DayLocator, DateFormatter
# new
from pandas import *
connection = MySQLdb.connect(host="localhost", db="agemono", user="root", passwd="password", charset="utf8")
cursor= connection.cursor()
weather=
avg_max_temp=
avg_sales=[]
cursor.execute("select avg_sales, weather, avg_max_temp from minatoa")
data = cursor.fetchall()
for row in data:
avg_sales.append(row[0])
weather.append(row[1])
avg_max_temp.append(row[2])
sales_weather_maxtemp = crosstab(weather, avg_max_temp)
# サンプルコードの場合: tips = read_csv('tips.csv')
# do cross examination
counts = crosstab(weather, avg_max_temp)
print( counts )
解決:
コード:
minatoa = read_csv('minatoA.csv')
# do cross examination
counts = crosstab(minatoa.weather, minatoa.avg_max_temp, aggfunc=[len, np.mean])
print( counts )
結果:
➜ matplotlib_test python cross_ex1.py
avg_max_temp 0 9 10 11 12 13 14 15 16 17 ... 23 24 25 26 27 \
weather ...
J 0 0 0 0 1 0 0 1 0 1 ... 1 0 1 0 0
J/ 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0
J/ 0 0 1 0 1 0 0 1 0 0 ... 1 0 1 0 1
/J 0 2 0 0 2 0 1 1 0 0 ... 1 0 1 1 0
/ 0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0
/ 0 3 10 1 1 3 2 1 3 3 ... 2 3 3 2 3
0 13 22 12 9 3 7 6 4 2 ... 0 4 6 3 0
0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0
0 2 4 3 0 1 1 2 0 0 ... 2 1 0 4 3
/J 0 1 4 4 2 1 3 2 2 1 ... 0 3 6 1 5
/ 0 0 2 0 3 0 1 1 0 0 ... 1 1 1 1 1
/ 0 0 1 1 0 0 0 0 0 0 ... 0 0 0 0 0
ݒ 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0
avg_max_temp 28 29 30 31 32
weather
J 0 0 0 0 0
J/ 0 0 0 0 0
J/ 0 0 1 0 0
/J 0 1 2 0 0
/ 0 0 0 0 0
/ 2 3 9 3 1
2 0 7 1 6
0 0 0 0 0
2 0 4 1 1
/J 4 8 9 3 0
/ 0 0 6 2 0
/ 0 0 0 0 0
ݒ 0 0 0 0 0
[13 rows x 25 columns]
ーーーーーーーーー
やろうとしたこと(2):
天気と気温の組み合わせごとに、売り上げの平均値を出す
使ったこと:
pandas.DataFrame.groupby — pandas 0.16.2 documentation
コード:
import MySQLdb
from pylab import *
import sys
import numpy as np
import matplotlib.pyplot as plt
from pandas import *
minatoa = read_csv('minatoA.csv')
m = minatoa.groupby(['weather', 'avg_max_temp'])['avg_sales'].mean()
print (m)
結果:
➜ matplotlib_test python cross_ex1.py
weather avg_max_temp
cloudy 9 209.500000
10 329.000000
11 309.666667
13 385.000000
14 394.000000
15 375.500000
18 570.000000
19 422.500000
20 393.500000
21 234.000000
22 246.000000
23 405.000000
24 288.000000
26 360.250000
27 362.333333
28 387.500000
30 327.750000
31 212.000000
32 350.000000
cloudy/rainy 9 327.000000
10 422.250000
11 320.500000
12 442.000000
13 446.000000
14 338.333333
15 392.000000
16 293.500000
17 578.000000
18 364.500000
19 466.000000
...
sunny/cloudy 17 335.000000
18 261.000000
19 339.833333
20 427.600000
21 377.333333
22 348.000000
23 456.500000
24 381.000000
25 312.333333
26 350.000000
27 361.666667
28 369.000000
29 313.333333
30 277.555556
31 326.000000
32 209.000000
sunny/rainy 9 324.000000
12 344.000000
14 365.000000
15 496.000000
18 303.000000
20 320.666667
21 298.000000
22 486.000000
23 397.000000
25 347.000000
26 295.000000
29 550.000000
30 239.500000
sunny/snowy 10 277.000000
Name: avg_sales, dtype: float64