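Choosing and using an AAC encoder

I used to encode all my aac audio with ffmpeg. It is a convenient command-line tool, so I used it for both audio and video. In practice, though, the encoders built into ffmpeg are not necessarily the best available. For personal use, what matters most for audio is how much quality is lost to lossy compression, and encoders differ considerably here because their algorithms and implementations differ.

01. Codec choice: opus or aac

Before choosing an encoder there is another question: which audio codec to use. Besides aac there are newer formats such as opus, which, according to Wikipedia, has beaten the other formats in listening tests.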

Opus replaces both Vorbis and Speex for new applications, and several blind listening tests have ranked it higher-quality than any other standard audio format at any given bitrate until transparency is reached, including MP3, AAC, and HE-AAC.[[6]](https://en.wikipedia.org/wiki/Opus_(audio_format)#cite_note-testsummary-6)[7]

There is also this chart of test results.

搜狗截图20200210182218.png

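The sources I mainly encode are doujin voice works. For my purposes the biggest difference between opus and aac is that opus filters out high-frequency content: it discards everything above 20 kHz, on the grounds that frequencies above 20 kHz are generally considered inaudible to humans, whereas aac can keep the 20-22 kHz band. Opus also tops out at a 48 kHz sample rate, while aac supports higher rates.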
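The bandwidth argument follows from the Nyquist limit: a stream sampled at a given rate can represent frequencies up to half that rate. Below is a minimal sketch of the arithmetic, assuming the 20 kHz opus ceiling stated above. The function and constant names are made up for illustration, and whether a particular aac encoder really keeps everything up to the Nyquist limit depends on its own low-pass setting.

```python
OPUS_BANDWIDTH_LIMIT_HZ = 20_000  # opus ceiling as stated in the text (assumption)


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a stream at this sample rate can represent."""
    return sample_rate_hz / 2


def band_above_opus_limit_hz(sample_rate_hz: int) -> float:
    """Width of the band between the opus ceiling and the source's Nyquist limit."""
    return max(0.0, nyquist_hz(sample_rate_hz) - OPUS_BANDWIDTH_LIMIT_HZ)


if __name__ == "__main__":
    for rate in (44_100, 48_000):
        print(rate, nyquist_hz(rate), band_above_opus_limit_hz(rate))
```

For a 44.1 kHz source the band in question is the 20-22.05 kHz range mentioned above.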

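From the requirements side, what I want is a lossy format that can stand in for the lossless original (not the lossy format that sounds most pleasant; my ears are not that good and I probably could not tell aac from opus anyway), so the closer the result stays to the lossless source, the better. Even if the ear cannot hear high-frequency content, having it and not having it are qualitatively different, and it may later become possible to analyze the source of high-frequency sounds with software.

If the original audio is deleted, an aac file will clearly decode to something closer to the original: keeping the high band versus dropping it is a difference in kind.

From a listening standpoint that is not necessarily so. We are insensitive to high frequencies, so spending more of the bitrate on the range we hear best may well sound better.

There is a real-world example. macoto, a streamer I follow who does live ASMR broadcasts, and her viewers have concluded that ASMR streams sound better on niconico than on YouTube. (I have not tried this live myself and have only watched other people's recordings, because niconico is painful to use from China: the site itself is clumsy and the connection is poor.) As far as I can tell, the difference between the two audio streams is that niconico uses aac while YouTube uses opus, and opus filters out the high band. My guess is that this is the reason.

The music and speech we usually listen to contain little high-frequency content, but could some high frequencies be doing something in ASMR?

Also, I encode at 320 kbps, which is already high enough that neither aac nor opus should be distinguishable from the source. But opus drops the high band, so I still choose aac.

One more point, and the most important one: aac is far more widely supported than opus, so for device compatibility aac is the obvious choice.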

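02. Choosing an aac encoder

I searched for a long time and found very little material comparing encoders. I had hoped to find listening-test reports; since I do not research audio coding myself, all I can do is read other people's evaluations, and the ones with detailed data are the more credible.

Wikipedia has a fairly detailed summary of aac encoders, shown in the table below.

The table shows that the encoders with the most complete aac feature support are fdk_aac, nero_aac and qaac (Apple).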

image-20200209212020020.png

The table also shows that ffmpeg's VBR mode is weak, and I had made a big mistake here: I had been encoding with ffmpeg's VBR all along.

The official ffmpeg wiki says:

Effective range for -q:a is around 0.1-2. This VBR is experimental and likely to get even worse results than the CBR.

First, a 2011 test showed qaac to be the best encoder at 96 kbps.

image-20200209221819940.png

After opus appeared, qaac dropped to second place.

image-20200209221913536.png

I also found a table of recommendation levels for aac encoders on a wiki.

image-20200209222434378.png

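001. Encoders supported by ffmpeg

First, look for information in the ffmpeg documentation; we need to know how good ffmpeg's built-in aac encoder is.

The ffmpeg wiki page on aac encoding is at https://trac.ffmpeg.org/wiki/Encode/AAC

As of 2020-02-09, ffmpeg supports two aac encoders: libfdk_aac and the built-in (native) aac encoder.

The license of libfdk_aac is incompatible with the GPL, so an ffmpeg build that combines it with GPL components cannot be redistributed as a binary and we have to compile it ourselves. The prebuilt ffmpeg binaries we download therefore usually contain only the native aac encoder.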
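To see which of the two encoders a given ffmpeg binary was built with, one option is to scan the output of `ffmpeg -encoders`. The sketch below is only an illustration: the helper names are invented, and it assumes the usual row layout of that listing (flags, then encoder name, then description).

```python
import subprocess
from typing import List


def aac_encoders_in(listing: str) -> List[str]:
    """Return the AAC encoder names found in `ffmpeg -encoders` output."""
    wanted = {"aac", "libfdk_aac"}
    found: List[str] = []
    for line in listing.splitlines():
        fields = line.split()
        # Encoder rows are assumed to look like: "<flags> <name> <description...>"
        if len(fields) >= 2 and fields[1] in wanted and fields[1] not in found:
            found.append(fields[1])
    return found


def installed_aac_encoders(ffmpeg: str = "ffmpeg") -> List[str]:
    """Ask a local ffmpeg binary which of the two encoders it includes."""
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return aac_encoders_in(result.stdout)
```

Calling `installed_aac_encoders()` on a typical prebuilt binary would be expected to report only `aac`; a self-compiled build with libfdk_aac enabled should list both.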

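The native aac encoder does not support aac-he; its default profile is aac-lc.

libfdk_aac supports both aac-lc and aac-he.

According to the official documentation, libfdk_aac is better than the native encoder. In my own experience, I cannot tell doujin voice works encoded with ffmpeg's native aac from the source files. I give the encoder 320 kbps, which is already a very high bitrate, and I suspect that once the bitrate is high enough no encoder is distinguishable by ear.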

The Fraunhofer FDK AAC codec library. This is currently the highest-quality AAC encoder available with ffmpeg

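ffmpeg used to support libfaac as well. It was removed in later versions, apparently because the native encoder's quality had overtaken it. This suggests that for aac-lc the native encoder is now reasonably good, even though the documentation still ranks libfdk_aac above it.

aac-he is mainly for low-bitrate use, so in my use case the native encoder is not far behind libfdk_aac.

Below is Wikipedia's description: since the 2016 update, ffmpeg's native aac encoder has become usable and competitive with the other mainstream encoders.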

The native AAC encoder created in FFmpeg, and forked with Libav, was considered experimental and poor. A significant amount of work was done for the 3.0 release of FFmpeg (February 2016) to make its version usable and competitive with the rest of the AAC encoders. Libav has not merged this work and continues to use the older version of the AAC encoder. These encoders are LGPL-licensed open-source and can be built for any platform that the FFmpeg or Libav frameworks can be built.

Both FFmpeg and Libav can use the Fraunhofer FDK AAC library via libfdk-aac, and while the FFmpeg native encoder has become stable and good enough for common use, FDK is still considered the highest quality encoder available for use with FFmpeg. [[2]](https://trac.ffmpeg.org/wiki/Encode/AAC#fdk_aac) Libav also recommends using FDK AAC if it is available. [3]

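There is one caveat with libfdk_aac that affects sound quality, so the native aac encoder is the more convenient choice: there is less to worry about.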

Note: libfdk_aac defaults to a low-pass filter of around 14kHz (details). If you want to preserve higher frequencies, use -cutoff 18000. Adjust the number to the upper frequency limit only if you need to; keeping in mind that a higher limit may audibly reduce the overall quality.
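A small sketch of how the `-cutoff` advice in the note could be applied when assembling a command line. The helper name and its defaults are hypothetical; only the ffmpeg options that already appear in this article (`-c:a libfdk_aac`, `-b:a`, `-cutoff`) are used, and the binary must have been built with libfdk_aac.

```python
from typing import List, Optional


def fdk_aac_command(src: str, dst: str, bitrate: str = "320k",
                    cutoff_hz: Optional[int] = 18000) -> List[str]:
    """Build an ffmpeg argument list for libfdk_aac with an optional low-pass override."""
    cmd = ["ffmpeg", "-i", src, "-c:a", "libfdk_aac", "-b:a", bitrate]
    if cutoff_hz is not None:
        # Raise the encoder's default low-pass, as the wiki note suggests
        cmd += ["-cutoff", str(cutoff_hz)]
    cmd.append(dst)
    return cmd


if __name__ == "__main__":
    print(" ".join(fdk_aac_command("input.wav", "output.m4a")))
```

Passing `cutoff_hz=None` leaves the encoder's default filter in place. The list can be handed to `subprocess.run` directly, which avoids shell quoting problems with unusual file names.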

Examples of aac-he encoding with libfdk_aac:

Examples

HE-AAC version 1:

ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k output.m4a

HE-AAC version 2:

ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he_v2 -b:a 32k output.m4a

Using ffmpeg's native aac encoder.

In short, just make sure to use CBR and leave everything else at the defaults.

ffmpeg -i input.wav -c:a aac -b:a 160k output.m4a

Encoder aac [AAC (Advanced Audio Coding)]:
    General capabilities: delay small
    Threading capabilities: none
    Supported sample rates: 96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000 7350
    Supported sample formats: fltp
AAC encoder AVOptions:
  -aac_coder         <int>        E...A..... Coding algorithm (from 0 to 2) (default fast)
     anmr            0            E...A..... ANMR method
     twoloop         1            E...A..... Two loop searching method
     fast            2            E...A..... Default fast search
  -aac_ms            <boolean>    E...A..... Force M/S stereo coding (default auto)
  -aac_is            <boolean>    E...A..... Intensity stereo coding (default true)
  -aac_pns           <boolean>    E...A..... Perceptual noise substitution (default true)
  -aac_tns           <boolean>    E...A..... Temporal noise shaping (default true)
  -aac_ltp           <boolean>    E...A..... Long term prediction (default false)
  -aac_pred          <boolean>    E...A..... AAC-Main prediction (default false)
  -aac_pce           <boolean>    E...A..... Forces the use of PCEs (default false)

The twoloop option appears to spend more computation searching for a better encoding.

‘twoloop’

Two loop searching (TLS) method.

This method first sets quantizers depending on band thresholds and then tries to find an optimal combination by adding or subtracting a specific value from all quantizers and adjusting some individual quantizer a little. Will tune itself based on whether aac_is, aac_ms and aac_pns are enabled.
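If you want to select the coder explicitly rather than rely on the build's default, the `-aac_coder` option from the listing above can be added to the CBR command. A hedged sketch follows: the helper name is invented, and the set of accepted coder names is copied from the option listing in this article, which may differ in other ffmpeg versions.

```python
from typing import List

# Coder names as they appear in the option listing quoted in this article
KNOWN_CODERS = {"anmr", "twoloop", "fast"}


def native_aac_command(src: str, dst: str, bitrate: str = "320k",
                       coder: str = "twoloop") -> List[str]:
    """Build an ffmpeg argument list for the native aac encoder at a fixed bitrate."""
    if coder not in KNOWN_CODERS:
        raise ValueError(f"unknown aac_coder: {coder!r}")
    return ["ffmpeg", "-i", src, "-c:a", "aac",
            "-aac_coder", coder, "-b:a", bitrate, dst]


if __name__ == "__main__":
    print(" ".join(native_aac_command("input.wav", "output.m4a")))
```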

anmr is experimental, so do not use it.

‘anmr’

Average noise to mask ratio (ANMR) trellis-based solution.

This is an experimental coder which currently produces a lower quality, is more unstable and is slower than the default twoloop coder but has potential. Currently has no support for the aac_is or aac_pns options. Not currently recommended.

There is also a way to set metadata.

You can add metadata to any of the examples on this guide:

ffmpeg -i input ... \
-metadata author="FFmpeg Bayou Jug Band" \
-metadata title="Decode my Heart (Let's Mux)" \
output.mp4
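When the tags come from a script rather than being typed by hand, the repeated `-metadata key=value` pairs can be generated from a dictionary. This is just a sketch with a hypothetical helper name; passing the resulting list to `subprocess.run` avoids having to quote values that contain spaces or parentheses.

```python
from typing import Dict, List


def metadata_args(tags: Dict[str, str]) -> List[str]:
    """Turn a tag dictionary into repeated `-metadata key=value` arguments."""
    args: List[str] = []
    for key, value in tags.items():
        args += ["-metadata", f"{key}={value}"]
    return args


if __name__ == "__main__":
    tags = {"author": "FFmpeg Bayou Jug Band",
            "title": "Decode my Heart (Let's Mux)"}
    cmd = ["ffmpeg", "-i", "input.wav", "-c:a", "aac", "-b:a", "160k"]
    cmd += metadata_args(tags) + ["output.m4a"]
    print(cmd)
```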

aac_ms

Sets mid/side coding mode. The default value of "auto" will automatically use M/S with bands which will benefit from such coding. Can be forced for all bands using the value "enable", which is mainly useful for debugging or disabled using "disable".

aac_is

Sets intensity stereo coding tool usage. By default, it’s enabled and will automatically toggle IS for similar pairs of stereo bands if it’s beneficial. Can be disabled for debugging by setting the value to "disable".

aac_tns

Enables the use of a multitap FIR filter which spans through the high frequency bands to hide quantization noise during the encoding process and is reverted by the decoder. As well as decreasing unpleasant artifacts in the high range this also reduces the entropy in the high bands and allows for more bits to be used by the mid-low bands. By default it’s enabled but can be disabled for debugging by setting the option to "disable".

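ltp and pred are incompatible with each other. Both are off by default according to the option listing above, so the defaults can be left alone.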

aac_ltp

Enables the use of the long term prediction extension which increases coding efficiency in very low bandwidth situations such as encoding of voice or solo piano music by extending constant harmonic peaks in bands throughout frames. This option is implied by profile:a aac_low and is incompatible with aac_pred. Use in conjunction with -ar to decrease the samplerate.

aac_pred

Enables the use of a more traditional style of prediction where the spectral coefficients transmitted are replaced by the difference of the current coefficients minus the previous "predicted" coefficients. In theory and sometimes in practice this can improve quality for low to mid bitrate audio. This option implies the aac_main profile and is incompatible with aac_ltp.

The default profile is aac-lc (aac_low), and there is no need to change it either.

Sets the encoding profile, possible values:

‘aac_low’

The default, AAC "Low-complexity" profile. Is the most compatible and produces decent quality.
‘mpeg2_aac_low’

Equivalent to -profile:a aac_low -aac_pns 0. PNS was introduced with the MPEG4 specifications.
‘aac_ltp’

Long term prediction profile, is enabled by and will enable the aac_ltp option. Introduced in MPEG4.
‘aac_main’

Main-type prediction profile, is enabled by and will enable the aac_pred option. Introduced in MPEG2.

If this option is unspecified it is set to ‘aac_low’.

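002. The nero encoder

Nero is a software company focused on optical disc burning and audio/video encoding.

Nero released its aac codec as a command-line tool, but the final version is 1.5.4 from February 2010, so it has gone ten years without an update. I also find the command line inconvenient: above all it only accepts wav input, while I sometimes need to encode losslessly compressed formats such as flac, and not being able to feed those in directly is awkward.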

A commercial implementation of both LC AAC and HE AAC, Nero AAC is produced by Nero AG as part of their Nero Digital line of products. When it was new, it was generally perceived to have the highest quality VBR LC AAC implementation (although QuickTime AAC outperformed it in CBR mode at 128kbps). The codec can also create HEv1/v2 AAC streams for extremely low bitrates and supports multi-channel surround sound encoding. Nero AAC is available for free as a suite of command line tools called "Nero AAC Codec" [5] (formerly Nero Digital Audio).

The Nero AAC encoder was based on the earlier PsyTEL AAC encoder by Ivan Dimkovic.

https://www.videohelp.com/software/Nero-AAC-Codec

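003. The qaac encoder

This is currently the most widely recommended aac encoder.

I use the qaac command-line encoder bundled with a foobar2000 all-in-one package. According to the qaac website, normal use seems to require iTunes to be installed, but my package presumably already bundles the necessary modules, so it works out of the box.

There are four rate-control modes:

abr: average bitrate, a compromise between vbr and cbr. Given a target average, the bitrate of each segment floats above or below it according to the content. Compared with pure vbr the file size is predictable; compared with cvbr and true vbr, encoding is faster and the bitrate fluctuates less.
vbr: true vbr. Given a quality value, the encoder evaluates each part of the audio and allocates the bitrate that part needs. The final file size cannot be controlled.
cvbr: constrained vbr. You can specify an average or a range, so the file size can be controlled. The bitrate fluctuates about as much as with vbr.
cbr: constant bitrate. Every frame gets the same bitrate. This is the plainest mode; even a silent file comes out the same size as a normal one.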
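Each of the four modes corresponds to one qaac switch (`--abr`, `--tvbr`, `--cvbr`, `--cbr`, as listed in qaac's own usage text). The following sketch maps a mode name to a command line; the helper name, its defaults and the mode keys are illustrative assumptions, not part of qaac.

```python
from typing import List

# Long option names taken from qaac's usage text
QAAC_MODE_FLAGS = {
    "abr": "--abr",    # average bitrate
    "vbr": "--tvbr",   # true VBR, value is a quality in 0-127
    "cvbr": "--cvbr",  # constrained VBR
    "cbr": "--cbr",    # constant bitrate
}


def qaac_command(src: str, dst: str, mode: str = "cvbr", value: int = 320,
                 quality: int = 2, exe: str = "qaac64.exe") -> List[str]:
    """Build a qaac argument list for the chosen rate-control mode."""
    if mode not in QAAC_MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "vbr" and not 0 <= value <= 127:
        raise ValueError("true VBR quality must be in 0-127")
    return [exe, QAAC_MODE_FLAGS[mode], str(value),
            "--quality", str(quality), src, "-o", dst]


if __name__ == "__main__":
    print(" ".join(qaac_command("input.wav", "output.m4a")))
```

With the defaults this produces the long-option form of `-v320 -q2`; `qaac_command("input.wav", "output.m4a", mode="vbr", value=90)` would instead request true VBR at quality 90, which the usage text gives as the LC default.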

Let's look at the encoding settings iTunes uses, for reference.

image-20200210011643512.png

So we can simply follow iTunes and use the cvbr mode.

qaac64.exe -v320 -q2 input.wav

qaac 2.68
Usage: qaac [options] infiles....

"-" as infile means stdin.
On ADTS/WAV output mode, "-" as outfile means stdout.

Main options:
--formats              Show available AAC formats and exit
-a, --abr <bitrate>    AAC ABR mode / bitrate
-V, --tvbr <n>         AAC True VBR mode / quality [0-127]
-v, --cvbr <bitrate>   AAC Constrained VBR mode / bitrate
-c, --cbr <bitrate>    AAC CBR mode / bitrate
                       For -a, -v, -c, "0" as bitrate means "highest".
                       Highest bitrate available is automatically chosen.
                       For LC, default is -V90
                       For HE, default is -v0
--he                   HE AAC mode (TVBR is not available)
-q, --quality <n>      AAC encoding Quality [0-2]
--adts                 ADTS output (AAC only)
--no-smart-padding     Don't apply smart padding for gapless playback.
                       By default, beginning and ending of input is
                       extrapolated to achieve smooth transition between
                       songs. This option also works as a workaround for
                       bug of CoreAudio HE-AAC encoder that stops encoding
                       1 frame too early.
                       Setting this option can lead to gapless playback
                       issue especially on HE-AAC.
                       However, resulting bitstream will be identical with
                       iTunes only when this option is set.
-d <dirname>           Output directory. Default is current working dir.
--check                Show library versions and exit.
-A, --alac             ALAC encoding mode
-D, --decode           Decode to a WAV file.
--caf                  Output to CAF file instead of M4A/WAV/AAC.
--play                 Decode to a WaveOut device (playback).
-r, --rate <keep|auto|n>
                       keep: output sampling rate will be same as input
                             if possible.
                       auto: output sampling rate will be automatically
                             chosen by encoder.
                       n: desired output sampling rate in Hz.
--lowpass <number>     Specify lowpass filter cut-off frequency in Hz.
                       Use this when you want lower cut-off than
                       Apple default.
-b, --bits-per-sample <n>
                       Bits per sample of output (for WAV/ALAC only)
--no-dither            Turn off dither when quantizing to lower bit depth.
--peak                 Scan + print peak (don't generate output file).
                       Cannot be used with encoding mode or -D.
                       When DSP options are set, peak is computed
                       after all DSP filters have been applied.
--gain <f>             Adjust gain by f dB.
                       Use negative value to decrese gain, when you want to
                       avoid clipping introduced by DSP.
-N, --normalize        Normalize (works in two pass. can generate HUGE
                       tempfile for large piped input)
--drc <thresh:ratio:knee:attack:release>
                       Dynamic range compression.
                       Loud parts over threshold are attenuated by ratio.
                         thresh:  threshold (in dBFS, < 0.0)
                         ratio:   compression ratio (> 1.0)
                         knee:    knee width (in dB, >= 0.0)
                         attack:  attack time (in millis, >= 0.0)
                         release: release time (in millis, >= 0.0)
--limiter              Apply smart limiter that softly clips portions
                       where peak exceeds (near) 0dBFS
--start <[[hh:]mm:]ss[.ss..]|<n>s|<mm:ss:ff>f>
                       Specify start point of the input.
                       You specify either in seconds(hh:mm:ss.sss..form) or
                       number of samples followed by 's' or
                       cuesheet frames(mm:ss:ff form) followed by 'f'.
                       When negative value is given, instead of trimming,
                       specified amount of silence is prepended.
                       Example:
                         --start 4010160s : start at 4010160 samples
                         --start 1:30:70f : same as above, in cuepoint
                         --start 1:30.93333 : same as above
--end <[[hh:]mm:]ss[.ss..]|<n>s|<mm:ss:ff>f>
                       Specify end point of the input (exclusive).
--delay <[[hh:]mm:]ss[.ss..]|<n>s|<mm:ss:ff>f>
                       Same as --start, with the sign reversed.
                       Positive value will prepend silence.
                       (This option exists due to historical reason)
--no-delay             Compensate encoder delay by prepending 960 samples
                       of scilence, then trimming 3 AAC frames from
                       the beginning (and also tweak iTunSMPB).
                       This option is mainly intended for resolving
                       A/V sync issue of video.
--num-priming <n>      (Experimental). Set arbitrary number of priming
                       samples in range from 0 to 2112 (default 2112).
                       Applicable only for AAC LC.
                       --num-priming=0 is the same as --no-delay.
                       Doesn't work with --no-smart-padding.
--gapless-mode <n>     Encoder delay signaling for gapless playback.
                         0: iTunSMPB (default)
                         1: ISO standard (elst + sbgp + sgpd)
                         2: Both
--matrix-preset <name> Specify user defined preset for matrix mixer.
--matrix-file <file>   Matrix file for remix.
--no-matrix-normalize  Don't automatically normalize(scale) matrix
                       coefficients for the matrix mixer.
--chanmap <n1,n2...>   Rearrange input channels to the specified order.
                       Example:
                         --chanmap 2,1 -> swap L and R.
                         --chanmap 2,3,1 -> C+L+R -> L+R+C.
--chanmask <n>         Force input channel mask(bitmap).
                       Either decimal or hex number with 0x prefix
                       can be used.
                       When 0 is given, qaac works as if no channel mask is
                       present in the source and picks default layout.
--no-optimize          Don't optimize MP4 container after encoding.
--tmpdir <dirname>     Specify temporary directory. Default is %TMP%
-s, --silent           Suppress console messages.
--verbose              More verbose console messages.
-i, --ignorelength     Assume WAV input and ignore the data chunk length.
--threading            Enable multi-threading.
-n, --nice             Give lower process priority.
--sort-args            Sort filenames given by command line arguments.
--text-codepage <n>    Specify text code page of cuesheet/chapter/lyrics.
                       Example: 1252 for Latin-1, 65001 for UTF-8.
                       Use this when bogus values are written into tags
                       due to automatic encoding detection failure.
-S, --stat             Save bitrate statistics into file.
--log <filename>       Output message to file.

Option for output filename generation:
--fname-from-tag       Generate filename based on metadata of input.
                       By default, output filename will be the same as input
                       (only different by the file extension).
                       Name generation can be tweaked by --fname-format.
--fname-format <string>   Format string for output filename.

Option for single output:
-o <filename>          Specify output filename
--concat               Encodes whole inputs into a single file.
                       Requires output filename (with -o)

Option for cuesheet input only:
--cue-tracks <n[-n][,n[-n]]*>
                       Limit extraction to specified tracks.
                       Tracks can be specified with comma separated numbers.
                       Hyphen can be used to denote range of numbers.
                       Tracks non-existent in the cue are just ignored.
                       Numbers must be in the range 0-99.
                       Example:
                         --cue-tracks 1-3,6-9,11
                           -> equivalent to --cue-tracks 1,2,3,6,7,8,9,11
                         --cue-tracks 2-99
                           -> can be used to skip first track (and HTOA)

Options for Raw PCM input only:
-R, --raw              Raw PCM input.
--raw-channels <n>     Number of channels, default 2.
--raw-rate     <n>     Sample rate, default 44100.
--raw-format   <str>   Sample format, default S16L.
                       Sample format spec:
                       1st char: S(igned) | U(nsigned) | F(loat)
                       2nd part: Bitwidth
                       Last part: L(ittle Endian) | B(ig Endian)
                       Last part can be omitted, L is assumed by default.
                       Cases are ignored. u16b is OK.

Options for CoreAudio sample rate converter:
--native-resampler[=line|norm|bats,n]
                       Arguments followed by '=' are optional.
                       First argument before comma is complexity from
                       one of the following:
                         line: linear (worst, don't use this)
                         norm: normal
                         bats: mastering (best, default)
                       Second argument after comma is integer quality
                       between 0-127 (default 0).
                       Example:
                         --native-resampler
                         --native-resampler=norm,96

Tagging options:
 (same value is set to all files, so use with care for multiple files)
--title <string>
--artist <string>
--band <string>       This means "Album Artist".
--album <string>
--grouping <string>
--composer <string>
--comment <string>
--genre <string>
--date <string>
--track <number[/total]>
--disk <number[/total]>
--compilation[=0|1]
                      By default, iTunes compilation flag is not set.
                      --compilation or --compilation=1 sets flag on.
                      --compilation=0 is same as default.
--lyrics <filename>
--artwork <filename>
--artwork-size <n>    Specify maximum width or height of artwork in pixels.
                      If specified artwork (with --artwork) is larger than
                      this, artwork is automatically resized.
--copy-artwork        Copy front cover art(APIC:type 3) from the source.
                      When --artwork is also given, this option is ignored.
--chapter <filename>
                      Set chapter from file.
--tag <fcc>:<value>
                      Set iTunes pre-defined tag with fourcc key
                      and value.
                      1) When key starts with U+00A9 (copyright sign),
                         you can use 3 chars starting from the second char
                         instead.
                      2) Some known tags having type other than UTF-8 string
                         are taken care of. Others are just stored as UTF-8
                         string.
--tag-from-file <fcc>:<path>
                      Same as above, but value is read from file.
--long-tag <key>:<value>
                      Set long tag (iTunes custom metadata) with
                      arbitrary key/value pair. Value is always stored as
                      UTF8 string.

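There is one problem: when encoding hi-res music, aac always converts 96 kHz to 48 kHz, and the bit depth also drops from 24 bit to 16 bit.

These are two separate questions, answered in turn below.

The Apple aac encoder used by qaac is restricted to specific formats. The highest supported bitrate is effectively 320 kbps; higher bitrates are available only for multichannel layouts, so two-channel stereo tops out at 320 kbps.
Bit depth is a property of PCM formats such as wav. A lossy codec like aac stores quantized frequency-domain data rather than fixed-width samples, so it has no bit depth of its own; the 16 bit that tools report is, as far as I can tell, simply the format the audio is decoded to.

The command qaac --formats lists the formats qaac supports.

The listing shows that aac-lc goes up to 48 kHz only. aac-he alone supports 96 kHz, but its maximum stereo bitrate is still 320 kbps.

In other words, at 96 kHz the number of samples per second doubles compared with 48 kHz while the bitrate is still 320 kbps, so the information available per sample is halved. It is therefore better to encode 44.1 kHz audio with aac-lc.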
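The per-sample budget described above is simple arithmetic: bitrate divided by (sample rate × channels). A rough sketch with a hypothetical helper name is shown below. It ignores how aac-he actually splits the work between a half-rate core and SBR, so the figures are only an average budget, not a statement about how the encoder spends its bits.

```python
def bits_per_sample(bitrate_bps: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Average number of encoded bits available per PCM sample per channel."""
    return bitrate_bps / (sample_rate_hz * channels)


if __name__ == "__main__":
    for rate in (44_100, 48_000, 96_000):
        budget = bits_per_sample(320_000, rate)
        print(f"{rate} Hz stereo at 320 kbps: {budget:.2f} bits per sample")
```

At a fixed 320 kbps, doubling the sample rate from 48 kHz to 96 kHz halves the result, which is the point made in the paragraph above.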

 qaac64.exe --formats
LC 8000Hz Mono -- 8,12,16,20,24
LC 8000Hz Stereo -- 16,20,24,28,32,40,48
LC 8000Hz 3.0 (C L R) -- 24,28,32,40,48,56,64,72
LC 8000Hz 4.0 (L R Ls Rs) -- 32,40,48,56,64,72,80,96
LC 8000Hz 4.0 (C L R Cs) -- 32,40,48,56,64,72,80,96
LC 8000Hz 5.0 (C L R Ls Rs) -- 40,48,56,64,72,80,96,112
LC 8000Hz 5.1 (C L R Ls Rs LFE) -- 40,48,56,64,72,80,96,112
LC 8000Hz 6.0 (C L R Ls Rs Cs) -- 48,56,64,72,80,96,112,128,144
LC 8000Hz 6.1 (C L R Ls Rs Cs LFE) -- 48,56,64,72,80,96,112,128,144
LC 8000Hz 7.0 (C L R Ls Rs Rls Rrs) -- 56,64,72,80,96,112,128,144,160
LC 8000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 56,64,72,80,96,112,128,144,160
LC 8000Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 64,72,80,96,112,128,144,160,192
LC 11025Hz Mono -- 8,12,16,20,24,28,32
LC 11025Hz Stereo -- 16,20,24,28,32,40,48,56,64
LC 11025Hz 3.0 (C L R) -- 40,48,56,64,72,80,96
LC 11025Hz 4.0 (L R Ls Rs) -- 48,56,64,72,80,96,112,128
LC 11025Hz 4.0 (C L R Cs) -- 48,56,64,72,80,96,112,128
LC 11025Hz 5.0 (C L R Ls Rs) -- 64,72,80,96,112,128,144,160
LC 11025Hz 5.1 (C L R Ls Rs LFE) -- 64,72,80,96,112,128,144,160
LC 11025Hz 6.0 (C L R Ls Rs Cs) -- 72,80,96,112,128,144,160,192
LC 11025Hz 6.1 (C L R Ls Rs Cs LFE) -- 72,80,96,112,128,144,160,192
LC 11025Hz 7.0 (C L R Ls Rs Rls Rrs) -- 96,112,128,144,160,192,224
LC 11025Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 96,112,128,144,160,192,224
LC 11025Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 96,112,128,144,160,192,224,256
LC 12000Hz Mono -- 12,16,20,24,28,32
LC 12000Hz Stereo -- 24,28,32,40,48,56,64
LC 12000Hz 3.0 (C L R) -- 40,48,56,64,72,80,96
LC 12000Hz 4.0 (L R Ls Rs) -- 48,56,64,72,80,96,112,128
LC 12000Hz 4.0 (C L R Cs) -- 48,56,64,72,80,96,112,128
LC 12000Hz 5.0 (C L R Ls Rs) -- 64,72,80,96,112,128,144,160
LC 12000Hz 5.1 (C L R Ls Rs LFE) -- 64,72,80,96,112,128,144,160
LC 12000Hz 6.0 (C L R Ls Rs Cs) -- 72,80,96,112,128,144,160,192
LC 12000Hz 6.1 (C L R Ls Rs Cs LFE) -- 72,80,96,112,128,144,160,192
LC 12000Hz 7.0 (C L R Ls Rs Rls Rrs) -- 96,112,128,144,160,192,224
LC 12000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 96,112,128,144,160,192,224
LC 12000Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 96,112,128,144,160,192,224,256
LC 16000Hz Mono -- 12,16,20,24,28,32,40,48
LC 16000Hz Stereo -- 24,28,32,40,48,56,64,72,80,96
LC 16000Hz 3.0 (C L R) -- 40,48,56,64,72,80,96,112,128,144
LC 16000Hz 4.0 (L R Ls Rs) -- 48,56,64,72,80,96,112,128,144,160,192
LC 16000Hz 4.0 (C L R Cs) -- 48,56,64,72,80,96,112,128,144,160,192
LC 16000Hz 5.0 (C L R Ls Rs) -- 64,72,80,96,112,128,144,160,192,224
LC 16000Hz 5.1 (C L R Ls Rs LFE) -- 64,72,80,96,112,128,144,160,192,224
LC 16000Hz 6.0 (C L R Ls Rs Cs) -- 72,80,96,112,128,144,160,192,224,256,288
LC 16000Hz 6.1 (C L R Ls Rs Cs LFE) -- 72,80,96,112,128,144,160,192,224,256,288
LC 16000Hz 7.0 (C L R Ls Rs Rls Rrs) -- 96,112,128,144,160,192,224,256,288,320
LC 16000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 96,112,128,144,160,192,224,256,288,320
LC 16000Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 96,112,128,144,160,192,224,256,288,320,384
LC 22050Hz Mono -- 16,20,24,28,32,40,48,56,64
LC 22050Hz Stereo -- 32,40,48,56,64,72,80,96,112,128
LC 22050Hz 3.0 (C L R) -- 48,56,64,72,80,96,112,128,144,160,192
LC 22050Hz 4.0 (L R Ls Rs) -- 64,72,80,96,112,128,144,160,192,224,256
LC 22050Hz 4.0 (C L R Cs) -- 64,72,80,96,112,128,144,160,192,224,256
LC 22050Hz 5.0 (C L R Ls Rs) -- 80,96,112,128,144,160,192,224,256,288,320
LC 22050Hz 5.1 (C L R Ls Rs LFE) -- 80,96,112,128,144,160,192,224,256,288,320
LC 22050Hz 6.0 (C L R Ls Rs Cs) -- 96,112,128,144,160,192,224,256,288,320,384
LC 22050Hz 6.1 (C L R Ls Rs Cs LFE) -- 96,112,128,144,160,192,224,256,288,320,384
LC 22050Hz 7.0 (C L R Ls Rs Rls Rrs) -- 112,128,144,160,192,224,256,288,320,384,448
LC 22050Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 112,128,144,160,192,224,256,288,320,384,448
LC 22050Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 128,144,160,192,224,256,288,320,384,448,512
LC 24000Hz Mono -- 16,20,24,28,32,40,48,56,64
LC 24000Hz Stereo -- 32,40,48,56,64,72,80,96,112,128
LC 24000Hz 3.0 (C L R) -- 48,56,64,72,80,96,112,128,144,160,192
LC 24000Hz 4.0 (L R Ls Rs) -- 64,72,80,96,112,128,144,160,192,224,256
LC 24000Hz 4.0 (C L R Cs) -- 64,72,80,96,112,128,144,160,192,224,256
LC 24000Hz 5.0 (C L R Ls Rs) -- 80,96,112,128,144,160,192,224,256,288,320
LC 24000Hz 5.1 (C L R Ls Rs LFE) -- 80,96,112,128,144,160,192,224,256,288,320
LC 24000Hz 6.0 (C L R Ls Rs Cs) -- 96,112,128,144,160,192,224,256,288,320,384
LC 24000Hz 6.1 (C L R Ls Rs Cs LFE) -- 96,112,128,144,160,192,224,256,288,320,384
LC 24000Hz 7.0 (C L R Ls Rs Rls Rrs) -- 112,128,144,160,192,224,256,288,320,384,448
LC 24000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 112,128,144,160,192,224,256,288,320,384,448
LC 24000Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 128,144,160,192,224,256,288,320,384,448,512
LC 32000Hz Mono -- 24,28,32,40,48,56,64,72,80,96
LC 32000Hz Stereo -- 48,56,64,72,80,96,112,128,144,160,192
LC 32000Hz 3.0 (C L R) -- 72,80,96,112,128,144,160,192,224,256,288
LC 32000Hz 4.0 (L R Ls Rs) -- 96,112,128,144,160,192,224,256,288,320,384
LC 32000Hz 4.0 (C L R Cs) -- 96,112,128,144,160,192,224,256,288,320,384
LC 32000Hz 5.0 (C L R Ls Rs) -- 128,144,160,192,224,256,288,320,384,448
LC 32000Hz 5.1 (C L R Ls Rs LFE) -- 128,144,160,192,224,256,288,320,384,448
LC 32000Hz 6.0 (C L R Ls Rs Cs) -- 144,160,192,224,256,288,320,384,448,512,576
LC 32000Hz 6.1 (C L R Ls Rs Cs LFE) -- 144,160,192,224,256,288,320,384,448,512,576
LC 32000Hz 7.0 (C L R Ls Rs Rls Rrs) -- 192,224,256,288,320,384,448,512,576,640
LC 32000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 192,224,256,288,320,384,448,512,576,640
LC 32000Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 192,224,256,288,320,384,448,512,576,640,768
LC 44100Hz Mono -- 32,40,48,56,64,72,80,96,112,128,144,160,192,224,256
LC 44100Hz Stereo -- 64,72,80,96,112,128,144,160,192,224,256,288,320
LC 44100Hz 3.0 (C L R) -- 96,112,128,144,160,192,224,256,288,320,384,448
LC 44100Hz 4.0 (L R Ls Rs) -- 128,144,160,192,224,256,288,320,384,448,512,576,640
LC 44100Hz 4.0 (C L R Cs) -- 128,144,160,192,224,256,288,320,384,448,512,576,640
LC 44100Hz 5.0 (C L R Ls Rs) -- 160,192,224,256,288,320,384,448,512,576,640,768
LC 44100Hz 5.1 (C L R Ls Rs LFE) -- 160,192,224,256,288,320,384,448,512,576,640,768
LC 44100Hz 6.0 (C L R Ls Rs Cs) -- 192,224,256,288,320,384,448,512,576,640,768,960
LC 44100Hz 6.1 (C L R Ls Rs Cs LFE) -- 192,224,256,288,320,384,448,512,576,640,768,960
LC 44100Hz 7.0 (C L R Ls Rs Rls Rrs) -- 224,256,288,320,384,448,512,576,640,768,960
LC 44100Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 224,256,288,320,384,448,512,576,640,768,960
LC 44100Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 256,288,320,384,448,512,576,640,768,960,1280
LC 48000Hz Mono -- 32,40,48,56,64,72,80,96,112,128,144,160,192,224,256
LC 48000Hz Stereo -- 64,72,80,96,112,128,144,160,192,224,256,288,320
LC 48000Hz 3.0 (C L R) -- 96,112,128,144,160,192,224,256,288,320,384,448
LC 48000Hz 4.0 (L R Ls Rs) -- 128,144,160,192,224,256,288,320,384,448,512,576,640
LC 48000Hz 4.0 (C L R Cs) -- 128,144,160,192,224,256,288,320,384,448,512,576,640
LC 48000Hz 5.0 (C L R Ls Rs) -- 160,192,224,256,288,320,384,448,512,576,640,768
LC 48000Hz 5.1 (C L R Ls Rs LFE) -- 160,192,224,256,288,320,384,448,512,576,640,768
LC 48000Hz 6.0 (C L R Ls Rs Cs) -- 192,224,256,288,320,384,448,512,576,640,768,960
LC 48000Hz 6.1 (C L R Ls Rs Cs LFE) -- 192,224,256,288,320,384,448,512,576,640,768,960
LC 48000Hz 7.0 (C L R Ls Rs Rls Rrs) -- 224,256,288,320,384,448,512,576,640,768,960
LC 48000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 224,256,288,320,384,448,512,576,640,768,960
LC 48000Hz 8.0 (C L R Ls Rs Rls Rrs Cs) -- 256,288,320,384,448,512,576,640,768,960,1280
HE 16000Hz Mono -- 10,12,16,20,24,28,32
HE 16000Hz Stereo -- 20,24,28,32,40,48,56,64
HE 16000Hz 4.0 (L R Ls Rs) -- 40,48,56,64,80,96,112,128
HE 16000Hz 5.1 (C L R Ls Rs LFE) -- 56,64,80,96,112,128,160
HE 16000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 80,96,112,128,160,192,224
HE 22050Hz Mono -- 10,12,16,20,24,28,32
HE 22050Hz Stereo -- 20,24,28,32,40,48,56,64
HE 22050Hz 4.0 (L R Ls Rs) -- 40,48,56,64,80,96,112,128
HE 22050Hz 5.1 (C L R Ls Rs LFE) -- 56,64,80,96,112,128,160
HE 22050Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 80,96,112,128,160,192,224
HE 24000Hz Mono -- 12,16,20,24,28,32
HE 24000Hz Stereo -- 24,28,32,40,48,56,64
HE 24000Hz 4.0 (L R Ls Rs) -- 48,56,64,80,96,112,128
HE 24000Hz 5.1 (C L R Ls Rs LFE) -- 64,80,96,112,128,160
HE 24000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 96,112,128,160,192,224
HE 32000Hz Mono -- 12,16,20,24,28,32,40
HE 32000Hz Stereo -- 24,28,32,40,48,56,64,80
HE 32000Hz 4.0 (L R Ls Rs) -- 48,56,64,80,96,112,128,160
HE 32000Hz 5.1 (C L R Ls Rs LFE) -- 64,80,96,112,128,160,192
HE 32000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 96,112,128,160,192,224,256
HE 44100Hz Mono -- 16,20,24,28,32,40
HE 44100Hz Stereo -- 32,40,48,56,64,80
HE 44100Hz 4.0 (L R Ls Rs) -- 64,80,96,112,128,160
HE 44100Hz 5.1 (C L R Ls Rs LFE) -- 80,96,112,128,160,192
HE 44100Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 112,128,160,192,224,256
HE 48000Hz Mono -- 16,20,24,28,32,40
HE 48000Hz Stereo -- 32,40,48,56,64,80
HE 48000Hz 4.0 (L R Ls Rs) -- 64,80,96,112,128,160
HE 48000Hz 5.1 (C L R Ls Rs LFE) -- 80,96,112,128,160,192
HE 48000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 112,128,160,192,224,256
HE 88200Hz Mono -- 32,40,48,56,64,80,96,112,128,160
HE 88200Hz Stereo -- 64,80,96,112,128,160,192,224,256,320
HE 88200Hz 4.0 (L R Ls Rs) -- 128,160,192,224,256,320,448,640
HE 88200Hz 5.1 (C L R Ls Rs LFE) -- 160,192,224,256,320,448,640
HE 88200Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 224,256,320,448,640,1120
HE 96000Hz Mono -- 32,40,48,56,64,80,96,112,128,160
HE 96000Hz Stereo -- 64,80,96,112,128,160,192,224,256,320
HE 96000Hz 4.0 (L R Ls Rs) -- 128,160,192,224,256,320,448,640
HE 96000Hz 5.1 (C L R Ls Rs LFE) -- 160,192,224,256,320,448,640
HE 96000Hz 7.1 (C Lc Rc L R Ls Rs LFE) -- 224,256,320,448,640,1120
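A listing this long is easier to query than to read. The sketch below parses lines in the format shown above and looks up the highest bitrate and sample rate per profile. The function names are invented for illustration, and the sample text is a three-line excerpt copied from the listing.

```python
import re
from typing import Dict, List, Tuple

FormatKey = Tuple[str, int, str]  # (profile, sample rate in Hz, channel layout)

# Matches lines such as "LC 48000Hz Stereo -- 64,72,...,320"
FORMAT_LINE = re.compile(r"^(LC|HE) (\d+)Hz (.+?) -- ([\d,]+)$")

SAMPLE = """\
LC 48000Hz Stereo -- 64,72,80,96,112,128,144,160,192,224,256,288,320
LC 44100Hz 5.1 (C L R Ls Rs LFE) -- 160,192,224,256,288,320,384,448,512,576,640,768
HE 96000Hz Stereo -- 64,80,96,112,128,160,192,224,256,320
"""


def parse_formats(listing: str) -> Dict[FormatKey, List[int]]:
    """Map (profile, sample rate, layout) to the list of bitrates in kbps."""
    table: Dict[FormatKey, List[int]] = {}
    for line in listing.splitlines():
        match = FORMAT_LINE.match(line.strip())
        if match:
            profile, rate, layout, bitrates = match.groups()
            table[(profile, int(rate), layout)] = [int(b) for b in bitrates.split(",")]
    return table


def max_bitrate(table: Dict[FormatKey, List[int]], profile: str,
                rate: int, layout: str = "Stereo") -> int:
    """Highest bitrate (kbps) offered for one configuration."""
    return max(table[(profile, rate, layout)])


def highest_sample_rate(table: Dict[FormatKey, List[int]], profile: str) -> int:
    """Highest sample rate (Hz) listed for a profile."""
    return max(rate for (prof, rate, _layout) in table if prof == profile)


if __name__ == "__main__":
    formats = parse_formats(SAMPLE)
    print(max_bitrate(formats, "LC", 48000), highest_sample_rate(formats, "HE"))
```

Feeding the full `qaac64.exe --formats` output into `parse_formats` instead of the excerpt lets you check any profile, sample rate and layout combination.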

03. Summary

Use the qaac encoder with parameters like these:

qaac64.exe -v320 -q2 input.wav

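If you use ffmpeg's native aac encoder, use CBR; its VBR mode is not yet mature.

Alternatively, compile ffmpeg yourself with fdk_aac.

nero aac is not recommended: it has not been updated or maintained for a long time, and it is not open source.

04. Writing a PowerShell script

I threw together a quick script; it will do for now.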



param(
    # Options passed to qaac; the string is split on whitespace before use
    [string]$qaacParam='--verbose --rate keep -v320 -q2',
    [string]$targetPath='.',
    # Keep the source file after a successful conversion
    [switch]$nodelete,
    # Encode aac-he instead of aac-lc
    [switch]$he
)

if($he){
    $qaacParam='--verbose --copy-artwork --rate keep --he -v320 -q2'
}
# Pass the options as separate arguments instead of building a string for Invoke-Expression,
# so paths containing quotes, $ or backticks are handled safely
$qaacArgs = @($qaacParam -split '\s+' | Where-Object { $_ })

# @(...) forces an array, so .Count is correct even when only one file matches
# (on a single FileInfo object, .Length would return the file size instead)
$losslessFiles = @(Get-ChildItem -Recurse -File -LiteralPath $targetPath | Where-Object { $_.Extension -in '.flac','.wav' })
$fileCounts = $losslessFiles.Count
Write-Host -ForegroundColor Green ('Found {0} lossless audio files in total' -f $fileCounts)

$index = 0
foreach($losslessFile in $losslessFiles){
    $index+=1
    Write-Host "`n"
    $progressPercent=[int]($index/$fileCounts*100)
    $restCounts = $fileCounts-$index
    Write-Host -BackgroundColor Gray -ForegroundColor Black ('converting audio file {0}, progress {1}%, {2} files remaining' -f $index,$progressPercent,$restCounts)

    $audiofilePath = $losslessFile.FullName
    $audiofileExt = $losslessFile.Extension
    # Replace the source extension with .m4a
    $newfilepath=$audiofilePath.Substring(0,$audiofilePath.Length-$audiofileExt.Length)+'.m4a'
    Write-Host -ForegroundColor Green ('audio path:'+ $audiofilePath)

    # With the flac decoding module installed, qaac accepts flac input directly,
    # so flac needs no ffmpeg pipe and is handled exactly like wav
    Write-Host ('qaac64.exe {0} "{1}" -o "{2}"' -f ($qaacArgs -join ' '),$audiofilePath,$newfilepath)
    & qaac64.exe @qaacArgs $audiofilePath -o $newfilepath

    # Delete the source only if the output exists and qaac exited cleanly
    # (assumes qaac signals failure with a non-zero exit code)
    if(($LASTEXITCODE -eq 0) -and (Test-Path -LiteralPath $newfilepath)){
        if($nodelete){
            Write-Host -BackgroundColor Yellow -ForegroundColor Green 'no-delete flag is set, keeping source file'
        } else{
            Write-Host -ForegroundColor Cyan 'conversion finished, deleting source audio file...'
            Remove-Item -Force -LiteralPath $audiofilePath
        }
    } else {
        # The new file was not created or qaac reported an error, so the conversion failed
        Write-Host -ForegroundColor Red 'conversion failed'
    }
}
Write-Host -ForegroundColor Green 'done'
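For a dry run before letting the script delete anything, the file discovery and output naming can be reproduced separately. Below is a minimal Python sketch of just that part, with a hypothetical function name; it only lists what would be converted and does not call qaac or remove files.

```python
from pathlib import Path
from typing import List, Tuple

LOSSLESS_EXTENSIONS = {".flac", ".wav"}


def plan_conversions(root: str) -> List[Tuple[Path, Path]]:
    """List (source, target) pairs: every .flac/.wav under root and its .m4a path."""
    jobs: List[Tuple[Path, Path]] = []
    for path in sorted(Path(root).rglob("*")):
        # Compare extensions case-insensitively, like PowerShell's -eq does
        if path.is_file() and path.suffix.lower() in LOSSLESS_EXTENSIONS:
            jobs.append((path, path.with_suffix(".m4a")))
    return jobs


if __name__ == "__main__":
    for source, target in plan_conversions("."):
        print(f"{source} -> {target}")
```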