Python matplotlib + pandas クロス集計

やろうとしたこと（１）：

天気と気温のクロス集計

使ったこと：

pandas.crosstab — pandas 0.16.2 documentation

参考：

<a href="http://qiita.com/ynakayama/items/e37c222771db53a0e629" data-mce-href="http://qiita.com/ynakayama/items/e37c222771db53a0e629">Python - matplotlib (+ pandas) によるデータ可視化の方法 (4) - Qiita</a>qiita.com

エラー：

ValueError: If using all scalar values, you must pass an index

原因：

read.csvの代わりに appendを使って変数にデータを入れていた。

コード：

import MySQLdb
from pylab import *
import sys
import numpy as np
import matplotlib.pyplot as plt
import datetime
from matplotlib.dates import YearLocator, MonthLocator, DayLocator, DateFormatter
# new
from pandas import *

connection = MySQLdb.connect(host="localhost", db="agemono", user="root", passwd="password", charset="utf8")
cursor= connection.cursor()

weather=
avg_max_temp=
avg_sales=[]

cursor.execute("select avg_sales, weather, avg_max_temp from minatoa")
data = cursor.fetchall()
for row in data:
avg_sales.append(row[0])
weather.append(row[1])
avg_max_temp.append(row[2])

sales_weather_maxtemp = crosstab(weather, avg_max_temp)

# サンプルコードの場合: tips = read_csv('tips.csv')

# do cross examination
counts = crosstab(weather, avg_max_temp)
print( counts )

解決：

コード：

minatoa = read_csv('minatoA.csv')
# do cross examination
counts = crosstab(minatoa.weather, minatoa.avg_max_temp, aggfunc=[len, np.mean])
print( counts )

結果：

➜ matplotlib_test python cross_ex1.py

avg_max_temp 0 9 10 11 12 13 14 15 16 17 ... 23 24 25 26 27 \

weather ...

J 0 0 0 0 1 0 0 1 0 1 ... 1 0 1 0 0

J/ 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0

J/ 0 0 1 0 1 0 0 1 0 0 ... 1 0 1 0 1

/J 0 2 0 0 2 0 1 1 0 0 ... 1 0 1 1 0

/ 0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0

/ 0 3 10 1 1 3 2 1 3 3 ... 2 3 3 2 3

0 13 22 12 9 3 7 6 4 2 ... 0 4 6 3 0

0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0

0 2 4 3 0 1 1 2 0 0 ... 2 1 0 4 3

/J 0 1 4 4 2 1 3 2 2 1 ... 0 3 6 1 5

/ 0 0 2 0 3 0 1 1 0 0 ... 1 1 1 1 1

/ 0 0 1 1 0 0 0 0 0 0 ... 0 0 0 0 0

ݒ 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0

avg_max_temp 28 29 30 31 32

weather

J 0 0 0 0 0

J/ 0 0 0 0 0

J/ 0 0 1 0 0

/J 0 1 2 0 0

/ 0 0 0 0 0

/ 2 3 9 3 1

2 0 7 1 6

0 0 0 0 0

2 0 4 1 1

/J 4 8 9 3 0

/ 0 0 6 2 0

/ 0 0 0 0 0

ݒ 0 0 0 0 0

[13 rows x 25 columns]

ーーーーーーーーー

やろうとしたこと(２)：

天気と気温の組み合わせごとに、売り上げの平均値を出す

使ったこと：

pandas.DataFrame.groupby — pandas 0.16.2 documentation

コード：

import MySQLdb
from pylab import *
import sys
import numpy as np
import matplotlib.pyplot as plt
from pandas import *

minatoa = read_csv('minatoA.csv')

m = minatoa.groupby(['weather', 'avg_max_temp'])['avg_sales'].mean()
print (m)

結果：

➜ matplotlib_test python cross_ex1.py

weather avg_max_temp

cloudy 9 209.500000

10 329.000000

11 309.666667

13 385.000000

14 394.000000

15 375.500000

18 570.000000

19 422.500000

20 393.500000

21 234.000000

22 246.000000

23 405.000000

24 288.000000

26 360.250000

27 362.333333

28 387.500000

30 327.750000

31 212.000000

32 350.000000

cloudy/rainy 9 327.000000

10 422.250000

11 320.500000

12 442.000000

13 446.000000

14 338.333333

15 392.000000

16 293.500000

17 578.000000

18 364.500000

19 466.000000

...

sunny/cloudy 17 335.000000

18 261.000000

19 339.833333

20 427.600000

21 377.333333

22 348.000000

23 456.500000

24 381.000000

25 312.333333

26 350.000000

27 361.666667

28 369.000000

29 313.333333

30 277.555556

31 326.000000

32 209.000000

sunny/rainy 9 324.000000

12 344.000000

14 365.000000

15 496.000000

18 303.000000

20 320.666667

21 298.000000

22 486.000000

23 397.000000

25 347.000000

26 295.000000

29 550.000000

30 239.500000

sunny/snowy 10 277.000000

Name: avg_sales, dtype: float64