Classification datasets
| File | Description | Source link (with details) | Preprocessing applied | Label column |
|---|---|---|---|---|
| | Automatically generated dataset whose samples fall into very well-delineated categories. This can be considered a “best-case scenario” test case. | | | |
| | Defaults on credit card payments | | Minor (column name reformatting) | |
| | Quality ratings of Portuguese white wines | | Added binarized label column | |
| | Recognizing vehicle type from its silhouette | | None | |
| | EEG eye state measurements | | Dropped a few outlier rows | |
| | Kickstarter project state | | Dropped unnamed columns; minor column name reformatting; calculated the duration of the project and dropped start and end dates; dropped some rows with the wrong input type; dropped the main category column and kept the category column; randomly sampled 30% of the data; filled NA with 0 for numeric values | |
| | Classification of mushroom edibility based on physical features | | Renamed the column | |
| | Surgical cases related to complications | | None | |
| | Using hobbies to guess gender | | None | |
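As a rough illustration of the kind of preprocessing listed for the Kickstarter data, the sketch below applies the same steps to a tiny made-up DataFrame. The column names (`Launched At`, `Deadline`, `Main Category`, and so on) and all values are assumptions for illustration only; they are not taken from the actual file, and the row-filtering step for wrong input types is omitted.

```python
import pandas as pd

# Hypothetical raw data: column names and values are illustrative only.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "Launched At": ["2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01"],
    "Deadline": ["2016-01-31", "2016-03-02", "2016-03-16", "2016-05-01"],
    "Main Category": ["Art", "Games", "Music", "Art"],
    "Category": ["Painting", "Tabletop Games", "Rock", "Sculpture"],
    "Goal": [1000.0, None, 500.0, 2500.0],
    "State": ["successful", "failed", "failed", "successful"],
})

# Drop unnamed columns (often a stray index column from an earlier export).
df = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]

# Reformat column names: lower case, underscores instead of spaces.
df = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# Compute the project duration in days, then drop the start and end dates.
start = pd.to_datetime(df["launched_at"])
end = pd.to_datetime(df["deadline"])
df = df.assign(duration_days=(end - start).dt.days)
df = df.drop(columns=["launched_at", "deadline"])

# Keep the fine-grained category column and drop the main category column.
df = df.drop(columns=["main_category"])

# Fill missing numeric values with 0.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(0)

# Randomly sample 30% of the rows; the fixed seed is only for reproducibility.
sampled = df.sample(frac=0.3, random_state=0)
```

The order of the steps and the fixed random seed are choices made for this sketch. The same general pattern (rename, derive, drop, fill) would apply to the lighter preprocessing listed for the other datasets in the table.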
These can all be loaded using Pandas:
import pandas as pd
dataset = pd.read_csv("file.csv")
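Once a file is loaded, a typical next step is to separate the label column from the features. The following is a minimal sketch under stated assumptions: the in-memory CSV text stands in for one of the files, and the column names (`feature_a`, `feature_b`, `label`) are hypothetical. In practice, use the label column listed for the file in question.

```python
import io

import pandas as pd

# Stand-in for one of the CSV files; the columns are made up for illustration.
csv_text = """feature_a,feature_b,label
1.0,0.5,0
2.0,1.5,1
3.0,2.5,1
"""

# pd.read_csv accepts a file path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

# Hypothetical label column name; replace with the one listed for the file.
label_column = "label"

# Split into a feature matrix X and a label vector y.
X = dataset.drop(columns=[label_column])
y = dataset[label_column]

print(X.shape)
print(y.value_counts().to_dict())
```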