Correctly understanding logits in TensorFlow

[Question] I was going through the TensorFlow API docs here. In the TensorFlow documentation, they use a keyword called logits. What is it? A lot of methods in the API docs are written like

tf.nn.softmax(logits, name=None)

If these logits are just ordinary Tensors, why keep a different name like logits?

Another thing is that there are two methods I could not differentiate:

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.

Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.

y_hat  = ... # Predicted scores (logits), e.g. y_hat = tf.matmul(X, W) + b
y_true = ... # True labels, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
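
One practical note I'll add (it concerns your TensorFlow version and is not part of the original answer): newer TensorFlow releases may insist on keyword arguments for this function, and passing them by name works across versions, so the safer spelling of the same call is:

# Keyword-argument form of the same call; works whether or not your
# TensorFlow version still accepts the positional form above.
total_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))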

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created a y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.

import tensorflow as tf
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1],[2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])
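
If it helps to see exactly what softmax is doing, the same numbers can be reproduced directly from the definition exp(x) / sum(exp(x)) with plain NumPy (a quick cross-check I've added; it is not part of the original answer):

# Manual softmax: exponentiate each score, then normalize each row to sum to 1.
scores = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])
exp_scores = np.exp(scores)
exp_scores / exp_scores.sum(axis=1, keepdims=True)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])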

It's important to fully understand what the softmax output is saying. Below I've shown a table that more clearly represents the output above. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182
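
As a quick sanity check on that last point (again, an addition of mine rather than part of the original answer), the rows do indeed sum to 1:

# Each row of class probabilities sums to 1.0 (up to floating-point rounding).
sess.run(tf.reduce_sum(y_hat_softmax, 1))
# array([ 1.,  1.])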

So now we have class probabilities for each training instance, where we can take the argmax() of each row to generate a final classification. From above, we may predict that training instance 1 belongs to "Class 2" and training instance 2 belongs to "Class 1".
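
For example (a small sketch of mine, not part of the original answer), the argmax of each row gives the predicted class indices:

# Index of the highest-probability class per row: 1 -> "Class 2", 0 -> "Class 1".
sess.run(tf.argmax(y_hat_softmax, 1))
# array([1, 0])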

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])
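
For comparison with the predicted classes above (again just a quick check I've added, not part of the original answer), the true class indices are:

# True class index per row: 1 -> "Class 2", 2 -> "Class 3".
sess.run(tf.argmax(y_true, 1))
# array([1, 2])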

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed a highest probability for "Class 1", which does not match the true class "Class 3".

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])
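
Because y_true is one-hot, each row's cross-entropy reduces to minus the log of the probability assigned to the true class, which makes the two numbers easy to verify by hand (a check I've added; it is not part of the original answer):

# Instance 1: true class is "Class 2", predicted probability 0.61939586.
-np.log(0.61939586)   # ≈ 0.479
# Instance 2: true class is "Class 3", predicted probability 0.30129182.
-np.log(0.30129182)   # ≈ 1.200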

What we really want is the total loss over all the training instances. So we can compute:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944
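
This is simply the mean of the two per-instance losses above: (0.4790107 + 1.19967598) / 2 ≈ 0.8393.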

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below.

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error because the softmax is done for you inside of softmax_cross_entropy_with_logits().

From Stack Overflow: https://stackoverflow.com/questions/34240703/what-is-logits-softmax-and-softmax-cross-entropy-with-logits?noredirect=1&lq=1
