Let's use some of read_csv()'s customizable options, particularly the ones that control how it deals with headers, incorrect data types, and missing data. Headers are the column names of your dataset. In some datasets, the headers may be completely missing or partially missing; in others they exist, but you may want to rename them.
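The three header situations described above can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the in-memory sample rows and the replacement column names are made up, and only the `header` and `names` parameters of `read_csv()` plus `DataFrame.rename()` are used.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
with_header = "name,mfr,calories\nCorn Flakes,K,100\nCheerios,G,110\n"
without_header = "Corn Flakes,K,100\nCheerios,G,110\n"

# Case 1: headers missing entirely. header=None stops pandas from
# treating the first data row as column names; names= supplies them.
df_missing = pd.read_csv(
    io.StringIO(without_header),
    header=None,
    names=["name", "mfr", "calories"],
)

# Case 2: headers exist but you want different ones. header=0 marks
# the first row as the existing header so that names= replaces it.
df_renamed = pd.read_csv(
    io.StringIO(with_header),
    header=0,
    names=["cereal", "manufacturer", "kcal"],
)

# Case 3: rename only some columns after a default read.
df_partial = pd.read_csv(io.StringIO(with_header)).rename(
    columns={"mfr": "manufacturer"}
)

print(df_missing.columns.tolist())
print(df_renamed.columns.tolist())
print(df_partial.columns.tolist())
```

Passing `names=` without `header=0` on a file that already has a header row would keep the old header as a row of data, which is why the second case sets both.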
(Note: the environment for every DataCamp session is temporary, so the working directory you saw in the previous section may not be identical to the one shown in the read_csv() code chunk.)

Continue on and see how else pandas makes importing CSV files easier.
Enter the magic commands one-by-one in the IPython Shell, and see if you can locate the dataset!

    # List contents in the current working directory
    ! ls

    # Navigate into the `data` sub-directory
    %cd ___

    # List contents of `data`
    ! ls

    # Print new working directory
    ! pwd

Did you find it in the data directory? Excellent work!

Loading your data

Now that you know what your current working directory is and where the dataset is in your filesystem, you can specify the file path to it. You're now ready to import the CSV file into Python using read_csv() from pandas:

    import pandas as pd

    cereal_df = pd.read_csv("/tmp/tmp07wuam09/data/cereal.csv")
    cereal_df2 = pd.read_csv("data/cereal.csv")

    print(pd.DataFrame.equals(cereal_df, cereal_df2))

As you can see in the code chunk above, the file path is the main argument to read_csv(), and it was specified in two ways. You can use the full file path, which is prefixed by a / and includes the working directory in the specification, or the relative file path, which doesn't. The read_csv() function is smart enough to decipher whether it's working with a full or relative file path and convert your flat file to a DataFrame without a problem.
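Because the session path in the tutorial is temporary, the comparison of full and relative paths may be easier to try with a throwaway directory. The sketch below assumes a hypothetical two-row stand-in for cereal.csv and builds its own `data/` folder with `tempfile`; the directory it creates will differ on every run.

```python
import os
import tempfile

import pandas as pd

# Build a throwaway directory with a data/ sub-directory, standing in
# for the tutorial's session directory (the real path will differ).
workdir = tempfile.mkdtemp()
os.mkdir(os.path.join(workdir, "data"))

# Hypothetical two-row stand-in for cereal.csv.
sample = pd.DataFrame({"name": ["Corn Flakes", "Cheerios"], "calories": [100, 110]})
sample.to_csv(os.path.join(workdir, "data", "cereal.csv"), index=False)

previous_dir = os.getcwd()
os.chdir(workdir)
try:
    relative_path = os.path.join("data", "cereal.csv")
    # The full path is the relative one resolved against the working directory.
    full_path = os.path.abspath(relative_path)

    cereal_df = pd.read_csv(full_path)
    cereal_df2 = pd.read_csv(relative_path)
    same = cereal_df.equals(cereal_df2)
finally:
    os.chdir(previous_dir)  # restore the original working directory

print(os.path.isabs(full_path), os.path.isabs(relative_path))
print(same)
```

A relative path only works while the working directory is the one it was written against, so scripts that change directories are usually safer with a full path.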
In your filesystem, there's a file called cereal.csv that contains nutrition data on 80 cereals. When you change into a directory in IPython, the new working directory is also printed, which isn't the case in the command line.
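Outside IPython, the same exploration can be done in plain Python. The following is a rough sketch using `os` and `pathlib`, under the assumption of a small temporary directory tree (the folder names and the one-row CSV are placeholders); unlike IPython's `%cd`, `os.chdir()` prints nothing, so the new working directory has to be requested explicitly.

```python
import os
import tempfile
from pathlib import Path

# Throwaway directory tree standing in for the exercise layout (assumed).
root = Path(tempfile.mkdtemp())
for folder in ("folder1", "folder2", "data"):
    (root / folder).mkdir()
(root / "data" / "cereal.csv").write_text("name,calories\nCorn Flakes,100\n")

previous_dir = Path.cwd()
os.chdir(root)
try:
    # Comparable to `ls`: list the contents of the working directory.
    top_level = sorted(p.name for p in Path.cwd().iterdir())

    # Comparable to `cd data` then `pwd`: change directory, then ask
    # for the new working directory because os.chdir() is silent.
    os.chdir("data")
    new_cwd = Path.cwd()

    # Comparable to a second `ls`: the dataset should be listed here.
    data_files = sorted(p.name for p in new_cwd.iterdir())
finally:
    os.chdir(previous_dir)  # restore the original working directory

print(top_level)
print(new_cwd.name)
print(data_files)
```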
Pandas Tutorial: Importing Data with read_csv()

The first step to any data science project is to import your data. Often, you'll work with data in Comma Separated Value (CSV) files and run into problems at the very start of your workflow.