I'd like to ask a question about KL divergence. Wikipedia says KL measures "the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the 'true' distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P."
The formula is

D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}
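
For concreteness, here is a minimal Python sketch of how I understand the formula, computing D_KL(P || Q) in bits for two discrete distributions over the same alphabet (the values of p and q below are made up purely for illustration):

import math

def kl_divergence(p, q):
    """Compute D_KL(P || Q) in bits for two discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p(x) = 0 contribute 0 by convention
            total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.3, 0.2]  # made-up "true" distribution P
q = [0.4, 0.4, 0.2]  # made-up model Q approximating P
print(kl_divergence(p, q))  # non-negative; equals 0 only when P == Q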
Usually P is unknown and Q is an approximation of P, which means the probabilities under Q can be computed. But when P is unknown, how do you compute the relative entropy? In other words, how is p(x) obtained here?
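
The only workaround I can think of is to estimate p(x) from observed samples with an empirical histogram, as in the hypothetical sketch below (the alphabet and sample data are invented), but I'm not sure whether that is the intended approach:

import math
from collections import Counter

def empirical_distribution(samples, alphabet):
    """Estimate p(x) as the relative frequency of x in the samples."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[x] / n for x in alphabet]

# Made-up data: samples assumed to be drawn from the unknown P
alphabet = ['a', 'b', 'c']
samples = ['a', 'a', 'b', 'a', 'c', 'b', 'a', 'b']
p_hat = empirical_distribution(samples, alphabet)

q = [0.5, 0.25, 0.25]  # the known model Q over the same alphabet

# Plug the empirical estimate in for p(x) in the KL formula
kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p_hat, q) if pi > 0)
print(kl)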
Thanks in advance for any guidance.