Data parsing is the automated collection and processing of information, and CSV files are one of its most common targets. In this context, parsing means splitting a CSV file into rows, columns, and individual values so that the data can be analyzed, filtered, and extracted for further work. This article explains how to read CSV files in Python and how to parse and modify their contents.
CSV (Comma-Separated Values) is a plain-text file format in which each line is a record and the values within a record are separated by commas. Because the format is so simple, it is used in a variety of contexts, such as creating or modifying data in Excel.
One main strength of CSV files is the ease of accessing and sharing information. Because the format is plain text, a CSV file can be opened and processed regardless of the software being used. This makes it convenient for moving data into or out of a spreadsheet or a database.
Now let us look at how to open and read a CSV file in Python.
Python has a built-in csv module that reads and writes CSV data with ease. No external libraries need to be installed, which makes opening files and analyzing their content straightforward.
The following code shows how to open and print a CSV file called university_records.csv in Python. It opens the file in read mode, passes it to csv.reader, and prints each row with a for loop.
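import csv

# newline='' is recommended by the csv module documentation when opening CSV files
with open('university_records.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        # Each row is a list of strings
        print(row)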
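When a file has a header row, it is often more convenient to address values by column name than by position. The sketch below uses csv.DictReader from the same built-in module for that. The header names (Name, Branch, Year, CGPA) and the sample rows are assumptions made for illustration, and the data is written to a temporary file first so that the example runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical sample data; the header names are assumptions for this sketch.
sample = "Name,Branch,Year,CGPA\nDavid,MCE,3,7.8\nMonika,PIE,3,9.1\n"

# Write the sample to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "university_records.csv")
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    csv_file.write(sample)

# DictReader takes the first row as field names and yields one dict per data row.
with open(path, "r", newline="", encoding="utf-8") as csv_file:
    records = list(csv.DictReader(csv_file))

for record in records:
    # Values are always read as strings; convert them explicitly when needed.
    print(record["Name"], float(record["CGPA"]))
```

Note that the csv module does not guess data types: every value comes back as a string, so numeric columns need an explicit conversion such as float().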
The csv module can also write data. The code below uses two of its tools: csv.writer(), which wraps an open file in a writer object, and writerow(), which writes one list of values as a single row. The file is opened in append mode ('a'), so the new rows are added to the end of university_records.csv:
import csv

row = ['David', 'MCE', '3', '7.8']
row1 = ['Monika', 'PIE', '3', '9.1']
row2 = ['Raymond', 'ECE', '2', '8.5']

# Append mode ('a') adds rows to the end of the file;
# newline='' prevents extra blank lines between rows on Windows
with open('university_records.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(row)
    writer.writerow(row1)
    writer.writerow(row2)
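When several rows are ready at once, the writer object also offers writerows(), which accepts a list of rows in one call. The following is a minimal sketch of that variant. It writes to a temporary file (an assumption made here so that the example does not touch a real university_records.csv) and then reads the file back to check the result.

```python
import csv
import os
import tempfile

rows = [
    ["David", "MCE", "3", "7.8"],
    ["Monika", "PIE", "3", "9.1"],
    ["Raymond", "ECE", "2", "8.5"],
]

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "university_records.csv")

# writerows() writes every inner list as one CSV row.
with open(path, "a", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(rows)

# Read the file back to confirm what was written.
with open(path, "r", newline="", encoding="utf-8") as csv_file:
    written = list(csv.reader(csv_file))

print(written)
```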
Parsing CSV files with Python is a common task today, from financial spreadsheets to large datasets for machine learning. Working with such files can be painful, though, especially when you need more features than Python provides out of the box. In such cases, the Pandas library comes in handy.
The example below writes data to a CSV file from a DataFrame. DataFrame is one of the main data structures in the Pandas library and is used for working with tabular data.
import pandas as pd
data = {
    "Name": ["David", "Monika", "Raymond"],
    "Age": [30, 25, 40],
    "City": ["Kyiv", "Lviv", "Odesa"],
}
df = pd.DataFrame(data)
file_path = "data.csv"
df.to_csv(file_path, index=False, encoding="utf-8")
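To see exactly what to_csv produces without creating a file, the method can be called without a path, in which case it returns the CSV text as a string. The sketch below reuses the same sample data and shows the effect of index=False, which leaves out the DataFrame's row index so that only the header and the data columns are written.

```python
import pandas as pd

data = {
    "Name": ["David", "Monika", "Raymond"],
    "Age": [30, 25, 40],
    "City": ["Kyiv", "Lviv", "Odesa"],
}
df = pd.DataFrame(data)

# With no path given, to_csv returns the CSV content as a string.
csv_text = df.to_csv(index=False)

# Split into lines to inspect the header and the data rows.
lines = csv_text.splitlines()
for line in lines:
    print(line)
```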
Pandas is considered one of the most effective Python libraries for parsing CSV. It loads a whole file into a DataFrame with a single call, infers column data types, and offers built-in methods for previewing, selecting, filtering, and modifying data. This makes it well suited to quick analysis of CSV files, including files with large quantities of data.
Before you can work with a CSV file, the first step is loading it into a DataFrame:
import pandas as pd
df = pd.read_csv("data.csv")
Pandas is well suited to extensive datasets. Let's explore how a Python script can inspect a parsed CSV file:
df.head() # Shows the first 5 rows
df.tail(10) # Shows the last 10 rows
df.info() # Prints the columns, their data types, and the number of non-null values
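For files that are too large to load comfortably in one go, read_csv accepts a chunksize parameter and then returns an iterator of smaller DataFrames instead of a single one. The sketch below illustrates the idea on a tiny in-memory CSV (an assumption made so that the example is self-contained). With a real file, the path would be passed in place of the StringIO object.

```python
import io

import pandas as pd

# Small in-memory stand-in for a large CSV file.
csv_text = "Name,Age,City\nDavid,30,Kyiv\nMonika,25,Lviv\nRaymond,40,Odesa\n"

total_rows = 0
age_sum = 0

# chunksize=2 yields DataFrames of at most two rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_rows += len(chunk)
    age_sum += int(chunk["Age"].sum())

print("rows:", total_rows)
print("average age:", age_sum / total_rows)
```

Only one chunk is held in memory at a time, so running totals like these can be computed over files much larger than the available RAM.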
To select one or more columns, use:
df["Name"] # Get the column "Name"
df[["Name", "Age"]] # Extract only "Name" and "Age"
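Rows can be selected in a similar way by passing a boolean condition instead of a column name. The sketch below builds the same sample table in memory and filters it. The threshold of 28 and the city "Kyiv" are arbitrary values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["David", "Monika", "Raymond"],
    "Age": [30, 25, 40],
    "City": ["Kyiv", "Lviv", "Odesa"],
})

# Keep only the rows where the condition is True.
older = df[df["Age"] > 28]

# Combine a row condition with a column selection using .loc.
names_in_kyiv = df.loc[df["City"] == "Kyiv", "Name"].tolist()

print(older)
print(names_in_kyiv)
```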
Now let’s look at how to insert, modify, and remove particular rows.
Inserting a new row:
# Load the CSV file
df = pd.read_csv(file_path)
# Add a new row
new_row = pd.DataFrame([{"Name": "Denys", "Age": 35, "City": "Kharkiv"}])
df = pd.concat([df, new_row], ignore_index=True)
# Save
df.to_csv(file_path, index=False, encoding="utf-8")
Modifying a particular row:
df = pd.read_csv(file_path)
# Set Age to 26 in every row where Name == "Ivan" (nothing changes if no row matches)
df.loc[df["Name"] == "Ivan", "Age"] = 26
df.to_csv(file_path, index=False, encoding="utf-8")
Removing a row:
df = pd.read_csv(file_path)
# Remove the row where Name == "Mykhailo"
df = df[df["Name"] != "Mykhailo"]
df.to_csv(file_path, index=False, encoding="utf-8")
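The three operations can also be chained on one DataFrame before saving. The following sketch works on the sample table in memory and uses names that are present in it (choosing Monika and Raymond as targets is an assumption made for illustration). It also checks that the condition matches at least one row before modifying, since an assignment through .loc with no matching rows silently changes nothing.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["David", "Monika", "Raymond"],
    "Age": [30, 25, 40],
    "City": ["Kyiv", "Lviv", "Odesa"],
})

# Insert: append a new row and renumber the index.
new_row = pd.DataFrame([{"Name": "Denys", "Age": 35, "City": "Kharkiv"}])
df = pd.concat([df, new_row], ignore_index=True)

# Modify: update Age only if some row actually matches the name.
mask = df["Name"] == "Monika"
if mask.any():
    df.loc[mask, "Age"] = 26

# Remove: keep every row except the one being deleted.
df = df[df["Name"] != "Raymond"].reset_index(drop=True)

print(df)
```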
To sum up, this article showed how to open, read, and write CSV files in Python with the built-in csv module, and how to parse and modify them with Pandas. For basic tasks, the standard csv module is sufficient. When you need more powerful tools, for example to automate repetitive processing or to handle very large files, Pandas is the better fit.