[pandas] Removing duplicate values when displaying data

2020. 2. 5. 03:50ㆍBIG DATA/Big Data

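When I want to transform data in different ways and test the result, printing a dataframe only shows a few rows from the start and the end, in order. That makes it awkward to check whether a rule I have in mind actually holds. To compare a wider range of cases, I wanted to remove duplicates so that the output shows a variety of distinct values.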

To view the data with duplicates removed based on a particular column, you can use the following.

data.drop_duplicates('column_name', keep='first')
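As a minimal sketch of the call above: the dataframe `df` and its `city` and `price` columns are made-up sample data for illustration, not from the original dataset.

```python
import pandas as pd

# Hypothetical sample data: several rows share the same "city" value.
df = pd.DataFrame({
    "city": ["Seoul", "Seoul", "Busan", "Busan", "Daegu"],
    "price": [100, 120, 90, 95, 80],
})

# Keep only the first row seen for each distinct value of "city".
# The original dataframe is left unchanged; a new one is returned.
unique_cities = df.drop_duplicates("city", keep="first")

print(unique_cities)
```

Each city now appears once, so a short printout covers every distinct case instead of only the first and last few rows.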

Signature: train.drop_duplicates(subset=None, keep='first', inplace=False)
Docstring:
Return DataFrame with duplicate rows removed, optionally only
considering certain columns. Indexes, including time indexes are ignored.

Parameters
----------
subset : column label or sequence of labels, optional
    Only consider certain columns for identifying duplicates, by
    default use all of the columns
keep : {'first', 'last', False}, default 'first'
    - ``first`` : Drop duplicates except for the first occurrence.
    - ``last`` : Drop duplicates except for the last occurrence.
    - False : Drop all duplicates.
inplace : boolean, default False
    Whether to drop duplicates in place or to return a copy

Returns
-------
DataFrame

File: /opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py
Type: method
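The effect of each `keep` option and of `inplace` may be easier to see side by side. The sketch below uses a made-up dataframe (`user` and `score` are hypothetical column names) and only the parameters listed in the docstring.

```python
import pandas as pd

# Hypothetical sample data: "a" and "b" each appear twice, "c" once.
df = pd.DataFrame({
    "user": ["a", "b", "a", "c", "b"],
    "score": [1, 2, 3, 4, 5],
})

# keep='first': keep the first occurrence of each user.
first = df.drop_duplicates(subset="user", keep="first")

# keep='last': keep the last occurrence of each user.
last = df.drop_duplicates(subset="user", keep="last")

# keep=False: drop every row whose user appears more than once.
no_dupes = df.drop_duplicates(subset="user", keep=False)

# subset as a list: rows count as duplicates only if all listed columns match.
pairs = df.drop_duplicates(subset=["user", "score"])

# inplace=True modifies the dataframe itself and returns None.
working = df.copy()
returned = working.drop_duplicates(subset="user", inplace=True)

print(first, last, no_dupes, sep="\n\n")
```

Note that the surviving rows keep their original index labels, so the index of the result has gaps.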
