Tuesday, April 26, 2022

Python - Handle CSV file and pandas (Day 25)

This is a 100 Days challenge to learn a new language (Python). 100 Days of Code - The Complete Python Pro Bootcamp 

I will post some notes to motivate myself to finish this challenge.


Read a CSV file



Using readlines of file object

Ex:
# Open file
with open("./weather_data.csv", encoding="utf-8") as csv_file:
    # Using readlines function of file object
    # to read all the lines of a file in a list
    data = csv_file.readlines()
    print(data)

Result:
 [
'day,temp,condition\n',
'Monday,12,Sunny\n',
'Tuesday,14,Rain\n',
'Wednesday,15,Rain\n',
'Thursday,14,Cloudy\n',
'Friday,21,Sunny\n',
'Saturday,22,Sunny\n',
'Sunday,24,Sunny'
]

According to the result, we need to spend lots of efforts to clean up the data (comma or newline character) before analyzing it.


Utilize csv module



The csv module implements classes to read and write tabular data in CSV format.

Ex:
import csv

# Open a file
with open("./weather_data.csv", encoding="utf-8") as csv_file:
    # csv.reader() will return a reader object
# which will iterate over lines
    data = csv.reader(csv_file)

    # Go through each line
    for row in data:
        print(row)

Result:
 ['day', 'temp', 'condition']
   ['Monday', '12', 'Sunny']
   ['Tuesday', '14', 'Rain']
   ['Wednesday', '15', 'Rain']
   ['Thursday', '14', 'Cloudy']
   ['Friday', '21', 'Sunny']
   ['Saturday', '22', 'Sunny']
   ['Sunday', '24', 'Sunny']

Now the result looks much better. Using csv module save lots of our time.

But if we want to get all temperatures from the previous result and save them into a list. Is there any better way to handle it?

Ex:
import csv

# Open a file
with open("./weather_data.csv", encoding="utf-8") as csv_file:
    # csv.reader() will return a reader object
# which will iterate over lines
    data = csv.reader(csv_file)

    # Define a list
    temperature = []

    # Go through each line
    for row in data:
        # Skip csv header row
        if row[1] != "temp":
            # 1 is a magic number
            temperature.append(row[1])

print(temperature)

Result:
 ['12', '14', '15', '14', '21', '22', '24']


pandas Library



The code above looks long and tedious.

There are some libraries can save our time, and pandas module is popular.

Ex:
import pandas

# Read a comma-separated values (csv) file into DataFrame.
data = pandas.read_csv("weather_data.csv")

# Read temp column
print(data["temp"])

Result:
  0    12
  1    14
  2    15
  3    14
  4    21
  5    22
  6    24
  Name: temp, dtype: int64

Now the code looks cleaner.

That is why python developers always use pandas module (or other library) to deal with csv file no matter how easy the tasks.


Exp 1 - Get the average temperature



Ex:
import pandas

# Read a comma-separated values (csv) file into DataFrame.
data = pandas.read_csv("weather_data.csv")

# Transform Series to List first
temp_list = data["temp"].to_list()
# Then using built-in function to finish the calculation
average = sum(temp_list) / len(temp_list)
print(average)

# Or Using Series.mean()
print(data["temp"].mean())


Exp 2 - Get rows with conditions



Ex:
import pandas

# Read a comma-separated values (csv) file into DataFrame.
data = pandas.read_csv("weather_data.csv")

# To select rows based on a conditional expression,
# use a condition inside the selection brackets [].
# Condition: data["day"] == "Monday"
monday_data = data[data["day"] == "Monday"]
print(monday_data)


Project - US State Game






* Load US Map image to Turtle Graphic
* Read all states and their relating x and y position from csv file
* Ask for user input (screen.textinput)
* Use title of screen.textinput to keep tracking the scores of this game
* Show correct guesses in specific position (which is read from csv)
* Use loop to allow users to keep guessing
* Track the correct guesses and output the missing states in csv when game is over

main.py
from turtle import Screen, Turtle
import pandas

# Constants
BG_IMAGE_PATH = "blank_states_img.gif"
INPUT_CSV_PATH = "50_states.csv"
OUTPUT_CSV_PATH = "states_to_learn.csv"

# Init Screen
screen = Screen()
screen.title("US State Game")
screen.bgpic(BG_IMAGE_PATH)

# Read state info from csv
state_data_frame = pandas.read_csv(INPUT_CSV_PATH)
states_list = state_data_frame["state"].to_list()
# Track the correct guesses
correct_guess_states_list = []

def write_state_to_screen(name, pos_x, pos_y):
    """Write state text in map with input position"""
    tim = Turtle()
    tim.penup()
    tim.color("black")
    tim.hideturtle()
    tim.setpos(pos_x, pos_y)
    tim.write(name)

def write_message_to_screen(text):
    """Show Message with input text"""
    tim = Turtle()
    tim.penup()
    tim.color("red")
    tim.hideturtle()
    tim.setpos(0, 0)
    tim.write(text, False, align="center", font=("Courier", 24, "normal"))

def get_user_input():
    """Generate a textinput and return the user_input to the caller"""
    textinput_title = "Guess the state"
    # Update textinput title if needed
    if len(correct_guess_states_list) > 0:
        textinput_title = f"{len(correct_guess_states_list)}/50 State
Correct"

    # Ask user input
    user_input = screen.textinput(
        title=textinput_title, prompt="What's another state's name?"
    )

    # If user clicked 'cancel'
    if user_input is None:
        return None

    # Uisng title() to get the title case of user input
    return user_input.title()

def write_the_missing_states_to_csv():
    """Write missing states to csv"""
    # Define a dictionary
    output_disc = {"state": []}

    for state in states_list:
        if state not in correct_guess_states_list:
            output_disc["state"].append(state)

    # Use a dictionary to initialize a DataFrame Object
    output_data_frame = pandas.DataFrame(output_disc)
    # Use 'to_csv' function of the DataFrame Object to output a csv file
    output_data_frame.to_csv(OUTPUT_CSV_PATH)


while len(correct_guess_states_list) < 50:
    # Get User Input
    answer = get_user_input()

    # Exit this game
    # 1. if users clicked cancel button of the input box
    # 2. if users typed 'exit'
    if answer is None or answer == "Exit":
        break

    if answer in states_list:
        # Select the state info from csv source
        answer_state_series = state_data_frame[
            state_data_frame["state"] == answer
        ].iloc[0]

        # Show the state
        write_state_to_screen(
            answer_state_series["state"],
            answer_state_series["x"],
            answer_state_series["y"],
        )

        # Append the correct guess to the tracking list
        correct_guess_states_list.append(answer)

# Determine the game result
if len(correct_guess_states_list) == 50:
    write_message_to_screen("You Win")
else:
    write_message_to_screen("You Lose")

    # Generate a csv file for the missing states for player to learn
    write_the_missing_states_to_csv()

screen.exitonclick()

No comments:

Post a Comment