{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CS 1656 – Introduction to Data Science (Spring 2017) \n", "\n", "## Instructor: Alexandros Labrinidis / Teaching Assistant: Anatoli Shein\n", "\n", "### Additional Credits: Zuha Agha\n", "---\n", "In this recitation you will learn pandas dataframe basics and plotting in Python. The packages you will need are:\n", "* pandas\n", "* matplotlib\n", "\n", "The first step is to import the packages above. If an import fails, the package is most likely not installed. " ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import datetime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To display plots inline in Jupyter, we enable matplotlib's inline mode." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataframe Basics\n", "\n", "A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table. A DataFrame accepts many different kinds of input:\n", "* Dict of 1D ndarrays, lists, dicts, or Series\n", "* 2-D numpy.ndarray\n", "* Structured or record ndarray\n", "* A Series\n", "* Another DataFrame\n", "\n", "Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.\n", "\n", "Now what is a Series?\n", "A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). You can think of it as a 1-dimensional dataframe. Series objects also have an index. \n", "### Creating a Dataframe\n", "We will start off by creating a dataframe from Weather Underground data retrieved from the URL below."
] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = pd.read_csv('https://www.wunderground.com/history/airport/KPIT/2016/1/1/MonthlyHistory.html?format=1',sep='\\\\s*,\\\\s*', engine='python',parse_dates=['EST'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To display the top 'n' rows of the dataframe, use the head() command below. The default is 5 rows." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
| | EST | Max TemperatureF | Mean TemperatureF | Min TemperatureF | Max Dew PointF | MeanDew PointF | Min DewpointF | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityMiles | Mean VisibilityMiles | Min VisibilityMiles | Max Wind SpeedMPH | Mean Wind SpeedMPH | Max Gust SpeedMPH | PrecipitationIn | CloudCover | Events | WindDirDegrees<br /> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-01-01 | 35 | 32 | 29 | 22 | 19 | 16 | 75 | 61 | 47 | ... | 10 | 10 | 10 | 23 | 13 | 30 | T | 7 | Snow | 260<br /> |
| 1 | 2016-01-02 | 36 | 31 | 26 | 22 | 20 | 18 | 75 | 66 | 56 | ... | 10 | 10 | 10 | 17 | 8 | 22 | 0.00 | 6 | NaN | 241<br /> |
| 2 | 2016-01-03 | 34 | 31 | 28 | 26 | 23 | 19 | 85 | 73 | 61 | ... | 10 | 9 | 2 | 17 | 11 | 25 | 0.01 | 7 | Snow | 271<br /> |
| 3 | 2016-01-04 | 28 | 20 | 12 | 21 | 12 | 3 | 81 | 63 | 45 | ... | 10 | 9 | 1 | 26 | 13 | 31 | 0.03 | 6 | Snow | 341<br /> |
| 4 | 2016-01-05 | 27 | 17 | 7 | 7 | 4 | 1 | 80 | 59 | 37 | ... | 10 | 10 | 10 | 9 | 3 | 12 | 0.00 | 0 | NaN | 116<br /> |
5 rows × 23 columns
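
The input types and the `index` and `columns` arguments described under Dataframe Basics can be tried without a network connection. The following is a minimal sketch, not part of the original recitation: the names `data`, `dates`, `mean_temp`, `sample_df`, and `first_two` are hypothetical, and the values are copied from the first three rows of the output above.

```python
import pandas as pd

# Hypothetical sample values, copied from the first three rows of the weather table.
data = {
    "Max TemperatureF": [35, 36, 34],
    "Mean TemperatureF": [32, 31, 31],
    "Min TemperatureF": [29, 26, 28],
}

# Row labels: one timestamp per day.
dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"])

# A Series is a one-dimensional labeled array; here the labels are the dates.
mean_temp = pd.Series(data["Mean TemperatureF"], index=dates, name="Mean TemperatureF")

# A DataFrame built from a dict of lists, with explicit row labels (index)
# and an explicit column order (columns).
sample_df = pd.DataFrame(
    data,
    index=dates,
    columns=["Min TemperatureF", "Mean TemperatureF", "Max TemperatureF"],
)

# head(n) returns the first n rows; with no argument it returns the first 5.
first_two = sample_df.head(2)
print(first_two)
```

Passing `columns` to a dict-based constructor selects and orders the dict keys, which is why the sketch lists the minimum temperature first even though the dict does not.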
| | EST | Mean TemperatureF |
|---|---|---|
| 0 | 2016-01-01 | 32 |
| 1 | 2016-01-02 | 31 |
| 2 | 2016-01-03 | 31 |
| 3 | 2016-01-04 | 20 |
| 4 | 2016-01-05 | 17 |
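
The two-column output above is consistent with selecting a subset of columns by name. The code that produced it is not shown here, so the following is only a sketch under that assumption: `weather`, `subset`, and `mean_col` are hypothetical names, and the small frame stands in for the full weather dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the weather dataframe, using values from the tables above.
weather = pd.DataFrame({
    "EST": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"]),
    "Max TemperatureF": [35, 36, 34],
    "Mean TemperatureF": [32, 31, 31],
})

# A list of column names inside the brackets returns a new DataFrame
# containing only those columns, in the order given.
subset = weather[["EST", "Mean TemperatureF"]]

# A single column name (no inner list) returns a Series instead.
mean_col = weather["Mean TemperatureF"]

print(subset.head())
```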