Get a list from pandas DataFrame column headers

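I want to get a list of the column headers from a pandas DataFrame. The DataFrame comes from user input, so I don't know how many columns there will be or what they will be called.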

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would like to get a list like this:

>>> header_list
['y', 'gdp', 'cap']

Answers

list(my_dataframe.columns.values)
list(my_dataframe)
my_dataframe.columns.values.tolist()
my_dataframe.columns.tolist()
%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
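
As a quick sanity check, the four spellings listed above can be compared side by side. This is a minimal sketch: the three-row frame below is a hypothetical stand-in for the question's data, and the `candidates` dictionary is only there for illustration.

```python
import pandas as pd

# Hypothetical frame mirroring the column names from the question.
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# Each entry maps a spelling to the list it produces.
candidates = {
    "list(df.columns.values)": list(my_dataframe.columns.values),
    "list(df)": list(my_dataframe),
    "df.columns.values.tolist()": my_dataframe.columns.values.tolist(),
    "df.columns.tolist()": my_dataframe.columns.tolist(),
}

for label, result in candidates.items():
    print(label, "->", result)

# True when every spelling yields the same plain Python list.
all_equal = all(result == ["y", "gdp", "cap"] for result in candidates.values())
```

All four produce the same list here, so the choice between them mostly comes down to readability and the timing differences reported in this thread.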
list(df)
In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop
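
The `%timeit` lines above need IPython. Outside IPython, roughly the same comparison can be run with the standard-library `timeit` module. The sketch below is only illustrative: the helper name `time_statements` and the small test frame are assumptions, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Small hypothetical frame; timings on a wide frame may rank differently.
df = pd.DataFrame({"A": ["x"] * 5, "B": ["x"] * 5, "C": ["x"] * 5})

# The same expressions that are timed in the answers above.
statements = {
    "[column for column in df]": lambda: [column for column in df],
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "df.columns.tolist()": lambda: df.columns.tolist(),
    "list(df)": lambda: list(df),
    "list(df.columns.values)": lambda: list(df.columns.values),
}


def time_statements(number=1000):
    """Return the average seconds per call for each statement."""
    return {
        name: timeit.timeit(fn, number=number) / number
        for name, fn in statements.items()
    }


timings = time_statements()
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<30} {seconds * 1e6:8.2f} µs per call")
```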
df.columns.tolist()
>>> list(my_dataframe)
['y', 'gdp', 'cap']
>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']
>>> sorted(my_dataframe)
['cap', 'gdp', 'y']

The column labels are also available as my_dataframe.columns.
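
Note that `my_dataframe.columns` is a pandas `Index` rather than a plain list, which matters if the caller needs list methods. A minimal sketch, using a hypothetical two-row frame with the question's column names:

```python
import pandas as pd

# Hypothetical frame with the column names from the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

columns = my_dataframe.columns

is_index = isinstance(columns, pd.Index)  # the attribute is an Index object
is_list = isinstance(columns, list)       # it is not a Python list
has_gdp = "gdp" in columns                # membership tests still work directly
header_list = columns.tolist()            # convert only when a real list is needed

print(type(columns).__name__, header_list)
```

If the labels are only iterated over or tested for membership, the `Index` can be used as is; converting is needed only when a real list is required.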

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
[*df]
# ['A', 'B', 'C']
{*df}
# {'A', 'B', 'C'}
*df,  # Please note the trailing comma
# ('A', 'B', 'C')
*cols, = df  # A wild comma appears, again
cols
# ['A', 'B', 'C']
print(*df)
A B C

print(*df, sep='\n')
A
B
C
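
The unpacking forms above work because iterating over a DataFrame yields its column labels. The sketch below collects them in one runnable script and adds a head/tail split with starred assignment; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame("x", columns=["A", "B", "C"], index=range(5))

as_list = [*df]      # unpack the labels into a list
as_set = {*df}       # unpack into a set (ordering is not guaranteed)
as_tuple = (*df,)    # the trailing comma is what makes this a tuple

*cols, = df          # starred assignment always produces a list
first, *rest = df    # split off the first label from the remaining ones

print(as_list, as_tuple, cols, first, rest)
```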
In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop
my_dataframe.keys()
my_dataframe.keys().to_list()
list(my_dataframe.keys())
[column for column in my_dataframe]
import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)         # iterates over the column labels only; cost grows with the number of columns
list(xlarge.keys())  # keys() returns the same column Index, so this does comparable work
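
To check how `keys()` relates to the column index, the two can be compared on a small frame. This is a minimal sketch under the assumption that a 3x4 integer frame is representative; the name `small` is hypothetical.

```python
import numpy as np
import pandas as pd

# Small stand-in for the large frame; the columns get default integer labels 0..3.
small = pd.DataFrame(np.arange(12).reshape(3, 4))

same_labels = small.keys().equals(small.columns)  # keys() exposes the column labels
from_iter = list(small)                           # iterating the frame yields labels
from_keys = list(small.keys())                    # so does iterating keys()

print(same_labels, from_iter, from_keys)
```

Neither call iterates over the rows, so on this small frame both yield exactly the same labels.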
sorted(df)
df.columns