Thursday, September 27, 2018

Using Excel with pandas

it's hard to find an organization that doesn't work with it in some way. From analysts, to sales VPs, to CEOs, various professionals use Excel for both quick stats and serious data crunching.
With Excel being so pervasive, data professionals must be familiar with it. You'll also want a tool that can easily read and write Excel files — pandas is perfect for this.
Pandas has excellent methods for reading all kinds of data from Excel files. You can also export your results from pandas back to Excel, if that's preferred by your intended audience. Pandas is great for other routine data analysis tasks, such as:
  • quick Exploratory Data Analysis (EDA)
  • drawing attractive plots
  • feeding data into machine learning tools like scikit-learn
  • building machine learning models on your data
  • taking cleaned and processed data to any number of data tools
Pandas is better at automating data processing tasks than Excel, including processing Excel files.

 

System prerequisites

We will use Python 3 and Jupyter Notebook to demonstrate the code in this tutorial.
In addition to Python and Jupyter Notebook, you will need the following Python modules:
  • matplotlib - data visualization
  • NumPy - numerical data functionality
  • OpenPyXL - read/write Excel 2010 xlsx/xlsm files
  • pandas - data import, clean-up, exploration, and analysis
  • xlrd - read Excel data
  • xlwt - write to Excel
  • XlsxWriter - write to Excel (xlsx) files
    


The data set

In this tutorial, we will use a multi-sheet Excel file we created from Kaggle's IMDB Scores data





Read data from the Excel file

We need to first import the data from the Excel file into pandas. To do that, we start by importing the pandas module.

"We need to first import the data from the Excel file into pandas. To do that, we start by importing the pandas module."

import pandas as pd

excel method to read in data from the Excel file. The easiest way to call this method is to pass the file name. If no sheet name is specified then it will read the first sheet

excel_file = 'movies.xls'
movies = pd.read_excel(excel_file)

method read the data from the Excel file into a pandas DataFrame object. Pandas defaults to storing data in DataFrames. We then stored this DataFrame into a variable called movies.
Pandas has a built-in DataFrame.head() method that we can use to easily display the first few rows of our DataFrame. If no argument is passed, it will display first five rows. If a number is passed, it will display the equal number of rows from the top.


movies.head()

Output

cast TotalFacebook likes - MovieFacenumber in postersUser VotesReviews by UsersReviews by CrtiicsIMDB Score
0Intolerance: Love's Struggle Throughout the Ages1916Drama|History|WarNaNUSANot Rated1231.33385907.0NaN...436229.04816911107188869.08.0
1Over the Hill to the Poorhouse1920Crime|DramaNaNUSANaN1101.33100000.03000000.0...220.0401511.04.8
2The Big Parade1925Drama|Romance|WarNaNUSANot Rated1511.33245000.0NaN...81126.0108226048494548.08.3
3Metropolis1927Drama|Sci-FiGermanGermanyNot Rated1451.336000000.026435.0...1362318.0203120001111841413260.08.3
4Pandora's Box1929Crime|Drama|RomanceGermanGermanyNot Rated1101.33NaN9950.0...426203.0455926174318471.08.0


















 

No comments:

Post a Comment

JOB in 2019

Automated - Digital One o One Meetings- An Idea

                                            Automated - Digital One o One Discussion                                                       ...