[ML_2] Let's convert a pd.DataFrame into train/test datasets.

2025. 7. 10. 06:49 | python/ML

What if our dataset is a pd.DataFrame? In that case, we can split the DataFrame into training and test sets with the following code.

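from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the iris dataset and wrap the feature array in a DataFrame
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
# Append the class labels as the last column
iris_df['target'] = iris_data.target

# Features: every column except the last; label: the last column
iris_df_data = iris_df.iloc[:, :-1]
iris_df_label = iris_df.iloc[:, -1]
# Hold out 20% of the rows as a test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(iris_df_data, iris_df_label, test_size=0.2, random_state=121)

# Train a decision tree, predict on the test set, and measure accuracy
iris_model = DecisionTreeClassifier()
iris_model.fit(X_train, y_train)
y_pred = iris_model.predict(X_test)
print(accuracy_score(y_test, y_pred))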
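Because the inputs are pandas objects, train_test_split also hands back pandas objects. The sketch below rebuilds the same iris DataFrame and inspects the outputs of the split. It assumes recent versions of scikit-learn and pandas; the names features, label and X_train2 are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Rebuild the same iris DataFrame, with the label as the last column
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

features = iris_df.iloc[:, :-1]
label = iris_df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.2, random_state=121
)

# DataFrame in -> DataFrame out (column names kept); Series in -> Series out
print(type(X_train).__name__, type(y_train).__name__)
# 150 rows with test_size=0.2 -> 120 training rows and 30 test rows
print(X_train.shape, X_test.shape)

# The original row index survives the shuffle, so features and labels stay aligned
print(X_test.index.equals(y_test.index))

# Running the split again with the same random_state reproduces it exactly
X_train2, _, _, _ = train_test_split(features, label, test_size=0.2, random_state=121)
print(X_train.index.equals(X_train2.index))
```

Keeping the original index is handy when you later want to look up which rows of iris_df ended up in the test set.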

 

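First, we import the necessary modules, load the iris dataset with load_iris(), and build a DataFrame from its feature array, using feature_names as the column names. We then append the labels as a target column. Next, we separate features and labels with iloc: iloc[:, :-1] selects all rows and every column except the last (the features), and iloc[:, -1] selects only the last column (the labels). Finally, we split the data into training and test sets (test_size=0.2 keeps 20% of the rows for testing, and random_state=121 makes the split reproducible), fit a DecisionTreeClassifier on the training set, predict on the test set, and measure the accuracy with accuracy_score.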
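To see exactly what the two iloc slices return, here is a small sketch on a made-up three-row DataFrame (the column names f1, f2 and the values are hypothetical). It also shows a name-based alternative, drop(columns='target') and df['target'], which gives the same result but does not rely on the label being the last column.

```python
import pandas as pd

# A small, made-up DataFrame: two feature columns and a label column at the end
df = pd.DataFrame({
    'f1': [1.0, 2.0, 3.0],
    'f2': [4.0, 5.0, 6.0],
    'target': [0, 1, 0],
})

# iloc[:, :-1]: all rows, every column except the last -> a DataFrame of features
X = df.iloc[:, :-1]
# iloc[:, -1]: all rows, only the last column -> a Series of labels
y = df.iloc[:, -1]

print(list(X.columns))
print(type(y).__name__, y.tolist())

# Selecting by column name yields the same features and labels
X_by_name = df.drop(columns='target')
y_by_name = df['target']
print(X.equals(X_by_name), y.equals(y_by_name))
```

Note that iloc[:, -1] returns a one-dimensional Series, which is the shape scikit-learn expects for y; iloc[:, [-1]] would instead return a one-column DataFrame.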