Posts in the 'python/ML' category (page 2)

[ML_7] Classification by using DecisionTreeClassifier(+)

#%%
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score, f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import LabelEncoder
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

feature_name_df = pd.read_csv("data/har_dataset/features.txt..

2025.07.18

[ML_code] visualize_boundary(model, X, y)

import numpy as np
import matplotlib.pyplot as plt

# Visualize the decision boundary of a classifier
def visualize_boundary(model, X, y):
    fig, ax = plt.subplots()

    # Draw the training data as a scatter plot
    ax.scatter(X[:, 0], X[:, 1], c=y, s=25, cmap='rainbow', edgecolor='k', clim=(y.min(), y.max()), zorder=3)
    ax.axis('tight')
    ax.axis('off')
    xlim_start, xlim_end = ax.get_xlim()
    ylim_start, ylim_end = ax.get_yl..

2025.07.17

[ML_6] Prediction of pima diabetes using Scikitlearn

The procedure is as follows:
- Introduction to the confusion matrix, precision, recall, F1 score, and ROC AUC
- Data Preprocessing
- Data Splitting (Train/Test)
- Model Training and Prediction
- Evaluation (We will focus specifically on evaluation metrics.)

1. Introduction to Confusion Matrix, Precision, Recall, F1 Score, and ROC AUC
→ Confusion Matrix
This is a matrix composed of four quadrants: False Negative (FN),..

2025.07.17
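The metrics named in the [ML_6] excerpt above can be illustrated with a minimal sketch. The helper names `confusion_counts` and `precision_recall_f1` and the label lists are hypothetical and are not taken from the original post; the code only applies the standard definitions for binary labels where 1 is the positive class. ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for binary labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 from the confusion-matrix counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(counts, precision, recall, f1)
```

In practice, `sklearn.metrics` offers `confusion_matrix`, `precision_score`, `recall_score`, and `f1_score` for the same quantities; the hand-written version is only meant to make the definitions visible.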

[ML_5] Prediction of Titanic survival by Scikitlearn

Today, we are going to build a model using scikit-learn. Our workflow is as follows:
- Data Preprocessing: In this step, we will preprocess our dataset. For example, we will handle missing values, perform feature selection, and apply label encoding.
- Splitting the Data: Next, we will divide the data into training and test sets.
- Model Training: Here, we will train the model using the training data. We..

2025.07.15
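The workflow listed in the [ML_5] excerpt above might look roughly like the following sketch. The tiny DataFrame is made-up toy data rather than the real Titanic dataset, and the choice of `DecisionTreeClassifier` is an assumption, since the excerpt does not say which model the post uses.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (NOT the real Titanic dataset).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male",
            "female", "male", "female", "male", "female"],
    "Age": [22.0, 38.0, None, 35.0, None, 27.0, 54.0, 14.0, 20.0, 58.0],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1, 0, 1],
})

# 1. Preprocessing: fill missing ages with the mean, label-encode the string column.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Sex"] = LabelEncoder().fit_transform(df["Sex"])

# 2. Splitting: hold out 30% of the rows as a test set.
X = df[["Sex", "Age"]]
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 3. Training and prediction (the model choice is an assumption).
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# 4. Evaluation on the held-out rows.
accuracy = accuracy_score(y_test, pred)
print(f"test accuracy: {accuracy:.2f}")
```

With only ten rows the accuracy figure says nothing about real performance; the point is the order of the steps.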

[ML_4] Data preprocessing

Data preprocessing is said to be as important in ML as the algorithm itself, because data is ultimately what the model feeds on when it learns. This post looks at how to process data.

1. Data encoding
1.1 Label Encoding converts string or categorical data into numeric category values. For example, suppose we have product names such as "refrigerator", "TV", and "air conditioner". These values cannot be used directly in a machine learning model, so they must be converted to numbers. Label encoding can be used for this. scikit-learn provides a class called LabelEncoder ..

2025.07.13
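As a small illustration of the label encoding described in the [ML_4] excerpt above, the sketch below uses scikit-learn's `LabelEncoder` on the product names from the example. The list `items` is illustrative and not taken from the original post.

```python
from sklearn.preprocessing import LabelEncoder

# Product names from the example in the text (illustrative list).
items = ["refrigerator", "TV", "air conditioner", "TV"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(items)  # learn the classes and map each item to an integer

# classes_ holds the sorted unique labels; the index of each label is its code.
print(list(encoder.classes_))
print(list(encoded))

# inverse_transform maps the integer codes back to the original strings.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Note that the integer codes imply an ordering that the categories do not actually have, which can mislead models that treat the feature as a magnitude, such as linear regression.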

[ML_3] cross_validation

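1. Cross validation
Cross validation means exactly what the name says. Previously, we split the dataset into train and test sets only once, trained on the train data, and evaluated on the test data. With this approach, however, the measured performance can change depending on how the data happens to be split, and the model may overfit the training data so that its performance on new data drops. To mitigate these problems, we need an approach that splits the data several times and repeats training and evaluation, and this is called cross validation. Cross validation divides the whole dataset into several folds, uses a different fold as the validation data each time, and trains on the remaining folds, repeating the evaluation. This way, ..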

2025.07.12
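The fold-by-fold procedure described in the [ML_3] excerpt above can be sketched with scikit-learn's `KFold`. The iris dataset, the `DecisionTreeClassifier`, and the choice of 5 folds are assumptions made for illustration; the excerpt does not specify them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset and model (assumptions, not from the original post).
X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in kfold.split(X):
    # Each iteration uses a different fold for validation and the rest for training.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    fold_scores.append(accuracy_score(y[val_idx], pred))

# Averaging over folds gives a less split-dependent estimate than a single split.
mean_score = float(np.mean(fold_scores))
print("fold accuracies:", np.round(fold_scores, 4))
print("mean accuracy:", round(mean_score, 4))
```

For classification with imbalanced labels, `StratifiedKFold` keeps the class proportions similar in every fold, and `cross_val_score` wraps the same loop in a single call.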

rudgh99
