Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
245 views
in Technique[技术] by (71.8m points)

How to plot a histogram using Matplotlib in Python with a list of data?

I am trying to plot a histogram using the matplotlib.hist() function but I am not sure how to do it.

I have a list

probability = [0.3602150537634409, 0.42028985507246375, 
  0.373117033603708, 0.36813186813186816, 0.32517482517482516, 
  0.4175257731958763, 0.41025641025641024, 0.39408866995073893, 
  0.4143222506393862, 0.34, 0.391025641025641, 0.3130841121495327, 
  0.35398230088495575]

and a list of names(strings).

How do I make the probability as my y-value of each bar and names as x-values?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you want a histogram, you don't need to attach any 'names' to x-values, as on x-axis you would have data bins:

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

np.random.seed(42)
x = np.random.normal(size=1000)

plt.hist(x, density=True, bins=30)  # density=False would make counts
plt.ylabel('Probability')
plt.xlabel('Data');

enter image description here

Note, the number of bins=30 was chosen arbitrarily, and there is Freedman–Diaconis rule to be more scientific in choosing the "right" bin width:

![enter image description here , where IQR is Interquartile range and n is total number of datapoints to plot

So, according to this rule one may calculate number of bins as:

q25, q75 = np.percentile(x, [0.25, 0.75])
bin_width = 2 * (q75 - q25) * len(x) ** (-1/3)
bins = round((x.max() - x.min()) / bin_width)
print("Freedman–Diaconis number of bins:", bins)
plt.hist(x, bins=bins);

Freedman–Diaconis number of bins: 82

enter image description here

And finally you can make your histogram a bit fancier with PDF line, titles, and legend:

import scipy.stats as st

plt.hist(x, density=True, bins=82, label="Data")
mn, mx = plt.xlim()
plt.xlim(mn, mx)
kde_xs = np.linspace(mn, mx, 300)
kde = st.gaussian_kde(x)
plt.plot(kde_xs, kde.pdf(kde_xs), label="PDF")
plt.legend(loc="upper left")
plt.ylabel("Probability")
plt.xlabel("Data")
plt.title("Histogram");

enter image description here

If you're willing to explore other opportunities, there is a shortcut with seaborn:

# !pip install seaborn
import seaborn as sns
sns.displot(x, bins=82, kde=True);

enter image description here

Now back to the OP.

If you have limited number of data points, a bar plot would make more sense to represent your data. Then you may attach labels to x-axis:

x = np.arange(3)
plt.bar(x, height=[1,2,3])
plt.xticks(x, ['a','b','c']);

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...