WebRTC AGC Algorithm (Part 2)

1. iOS Platform Concepts

1.1 Built-in AGC on iOS

You can check whether the iOS platform supports AGC, and enable it, with the following calls (code taken from WebRTC):

// 1. Create the component description and enable hardware audio processing.
  // Create an audio component description to identify the Voice Processing
  // I/O audio unit.
  AudioComponentDescription vpio_unit_description;
  vpio_unit_description.componentType = kAudioUnitType_Output;
  // Setting this subtype enables the system's built-in AEC, AGC, NS, etc.
  vpio_unit_description.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
  vpio_unit_description.componentManufacturer = kAudioUnitManufacturer_Apple;
  vpio_unit_description.componentFlags = 0;
  vpio_unit_description.componentFlagsMask = 0;

  // Obtain an audio unit instance given the description.
  AudioComponent found_vpio_unit_ref =
      AudioComponentFindNext(nullptr, &vpio_unit_description);

  // Create a Voice Processing IO audio unit.
  OSStatus result = noErr;
  result = AudioComponentInstanceNew(found_vpio_unit_ref, &vpio_unit_);
  if (result != noErr) {
    vpio_unit_ = nullptr;
    RTCLogError(@"AudioComponentInstanceNew failed. Error=%ld.", (long)result);
    return false;
  }


// 2. Check whether AGC is enabled; WebRTC wraps this in the GetAGCState method.
  UInt32 size = sizeof(*enabled);
  OSStatus result = AudioUnitGetProperty(vpio_unit_,
                                         kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                                         kAudioUnitScope_Global,
                                         kInputBus,
                                         enabled,
                                         &size);
  RTCLog(@"VPIO unit AGC: %u", static_cast<unsigned int>(*enabled));

  // Enable AGC.
    UInt32 enable_agc = 1;
    result =
        AudioUnitSetProperty(vpio_unit_,
                             kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                             kAudioUnitScope_Global, kInputBus, &enable_agc,
                             sizeof(enable_agc));
    if (result != noErr) {
      RTCLogError(@"Failed to enable the built-in AGC. "
                   "Error=%ld.",
                  (long)result);
      RTC_HISTOGRAM_COUNTS_SPARSE_100000(
          "WebRTC.Audio.SetAGCStateErrorCode", (-1) * result);
    }

// 3. Initialize the audio unit.
AudioUnitInitialize(vpio_unit_);

1.2 Setting the Capture Callback

A key step in live-stream recording is the render callback function (in WebRTC this is set to OnDeliverRecordedData). The AudioUnit processes audio one chunk at a time, and this callback is invoked each time a chunk has been processed. Code taken from WebRTC:

  // Disable AU buffer allocation for the recorder, we allocate our own.
  // TODO(henrika): not sure that it actually saves resource to make this call.
  UInt32 flag = 0;
  result = AudioUnitSetProperty(
      vpio_unit_, kAudioUnitProperty_ShouldAllocateBuffer,
      kAudioUnitScope_Output, kInputBus, &flag, sizeof(flag));
  if (result != noErr) {
    DisposeAudioUnit();
    RTCLogError(@"Failed to disable buffer allocation on the input bus. "
                 "Error=%ld.",
                (long)result);
    return false;
  }

  // Specify the callback to be called by the I/O thread to us when input audio
  // is available. The recorded samples can then be obtained by calling the
  // AudioUnitRender() method.
  AURenderCallbackStruct input_callback;
  input_callback.inputProc = OnDeliverRecordedData;
  input_callback.inputProcRefCon = this;
  result = AudioUnitSetProperty(vpio_unit_,
                                kAudioOutputUnitProperty_SetInputCallback,
                                kAudioUnitScope_Global, kInputBus,
                                &input_callback, sizeof(input_callback));
  if (result != noErr) {
    DisposeAudioUnit();
    RTCLogError(@"Failed to specify the input callback on the input bus. "
                 "Error=%ld.",
                (long)result);
    return false;
  }

1.3 Getting the PCM Data

WebRTC disables the AudioUnit's own buffer allocation for captured data and manages its buffers itself; after receiving the AudioUnit callback, it calls AudioUnitRender manually to pull out the captured samples. Through AudioUnitRender you can obtain a chunk of processed PCM audio from a node in the AUGraph. If you also need in-ear monitoring, this callback must copy the captured audio into the buffer referenced by the callback's last parameter, ioData. Code taken from WebRTC:

OSStatus VoiceProcessingAudioUnit::Render(AudioUnitRenderActionFlags* flags,
                                          const AudioTimeStamp* time_stamp,
                                          UInt32 output_bus_number,
                                          UInt32 num_frames,
                                          AudioBufferList* io_data) {
  RTC_DCHECK(vpio_unit_) << "Init() not called.";

  OSStatus result = AudioUnitRender(vpio_unit_, flags, time_stamp,
                                    output_bus_number, num_frames, io_data);
  if (result != noErr) {
    RTCLogError(@"Failed to render audio unit. Error=%ld", (long)result);
  }
  return result;
}

2. AGC in WebRTC

On iOS, VPIO itself already provides AEC, AGC, and NS, so WebRTC's software algorithms are not used. WebRTC for iOS implements no switch for the hardware AEC, but it does provide an ios_force_software_aec_HACK option to force the software AEC (echo_cancellation/extended_filter_aec) on devices where the hardware AEC is ineffective; WebRTC itself takes a wait-and-see attitude toward how well this hack behaves. For AGC and NS there is currently no such option. Similarly, if an Android device has built-in AEC, AGC, and NS, the platform implementations are used instead of WebRTC's software algorithms. For reference, the Android code that uses the platform's built-in hardware AEC and NS lives here:


(Figure: Android platform SDK directory)

In Java, AudioEffect is used to detect whether hardware AEC and NS are supported, and AcousticEchoCanceler and NoiseSuppressor are used to apply them. AGC has no platform hardware support, so WebRTC's own algorithm is used directly.

So how do you enable WebRTC's AGC? Through the WebRtcVoiceEngine::ApplyOptions(const AudioOptions& options_in) method in media/engine/webrtc_voice_engine.cc you can force the pipeline onto WebRTC's AGC. Test code:

...
// Set and adjust noise suppressor options.
#if defined(WEBRTC_IOS)
  // On iOS, VPIO provides built-in NS.
//  options.noise_suppression = false;
//  options.typing_detection = false;
//  options.experimental_ns = false;
    //lygtest
    options.noise_suppression = true;
    options.typing_detection = true;
    options.experimental_ns = true;
  RTC_LOG(LS_INFO) << "Always disable NS on iOS. Use built-in instead.";
#elif defined(WEBRTC_ANDROID)
  options.typing_detection = false;
  options.experimental_ns = false;
#endif

// Set and adjust gain control options.
#if defined(WEBRTC_IOS)
  // On iOS, VPIO provides built-in AGC.
//  options.auto_gain_control = false;
//  options.experimental_agc = false;
    //lygtest
    options.auto_gain_control = true;
    options.experimental_agc = true;
  RTC_LOG(LS_INFO) << "Always disable AGC on iOS. Use built-in instead.";
#elif defined(WEBRTC_ANDROID)
  options.experimental_agc = false;
#endif

2.1 Call Flow and Call Stack (diagrams I drew myself; sharing them here)

(Figure: interface call flow)
(Figure: call stack)

Key code:

int AudioProcessingImpl::ProcessCaptureStreamLocked() {
  HandleCaptureRuntimeSettings();

  // Ensure that not both the AEC and AECM are active at the same time.
  // TODO(peah): Simplify once the public API Enable functions for these
  // are moved to APM.
  RTC_DCHECK_LE(!!private_submodules_->echo_controller +
                    !!private_submodules_->echo_cancellation +
                    !!private_submodules_->echo_control_mobile,
                1);

  AudioBuffer* capture_buffer = capture_.capture_audio.get();  // For brevity.

  if (private_submodules_->pre_amplifier) {
    private_submodules_->pre_amplifier->ApplyGain(AudioFrameView<float>(
        capture_buffer->channels(), capture_buffer->num_channels(),
        capture_buffer->num_frames()));
  }

  capture_input_rms_.Analyze(rtc::ArrayView<const float>(
      capture_buffer->channels_const()[0],
      capture_nonlocked_.capture_processing_format.num_frames()));
  const bool log_rms = ++capture_rms_interval_counter_ >= 1000;
  if (log_rms) {
    capture_rms_interval_counter_ = 0;
    RmsLevel::Levels levels = capture_input_rms_.AverageAndPeak();
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureInputLevelAverageRms",
                                levels.average, 1, RmsLevel::kMinLevelDb, 64);
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureInputLevelPeakRms",
                                levels.peak, 1, RmsLevel::kMinLevelDb, 64);
  }

  if (private_submodules_->echo_controller) {
    // Detect and flag any change in the analog gain.
    int analog_mic_level = agc1()->stream_analog_level();
    capture_.echo_path_gain_change =
        capture_.prev_analog_mic_level != analog_mic_level &&
        capture_.prev_analog_mic_level != -1;
    capture_.prev_analog_mic_level = analog_mic_level;

    // Detect and flag any change in the pre-amplifier gain.
    if (private_submodules_->pre_amplifier) {
      float pre_amp_gain = private_submodules_->pre_amplifier->GetGainFactor();
      capture_.echo_path_gain_change =
          capture_.echo_path_gain_change ||
          (capture_.prev_pre_amp_gain != pre_amp_gain &&
           capture_.prev_pre_amp_gain >= 0.f);
      capture_.prev_pre_amp_gain = pre_amp_gain;
    }

    // Detect volume change.
    capture_.echo_path_gain_change =
        capture_.echo_path_gain_change ||
        (capture_.prev_playout_volume != capture_.playout_volume &&
         capture_.prev_playout_volume >= 0);
    capture_.prev_playout_volume = capture_.playout_volume;

    private_submodules_->echo_controller->AnalyzeCapture(capture_buffer);
  }

  if (constants_.use_experimental_agc &&
      public_submodules_->gain_control->is_enabled()) {
    private_submodules_->agc_manager->AnalyzePreProcess(
        capture_buffer->channels_f()[0], capture_buffer->num_channels(),
        capture_nonlocked_.capture_processing_format.num_frames());

    if (constants_.use_experimental_agc_process_before_aec) {
      private_submodules_->agc_manager->Process(
          capture_buffer->channels_const()[0],
          capture_nonlocked_.capture_processing_format.num_frames(),
          capture_nonlocked_.capture_processing_format.sample_rate_hz());
    }
  }

  if (submodule_states_.CaptureMultiBandSubModulesActive() &&
      SampleRateSupportsMultiBand(
          capture_nonlocked_.capture_processing_format.sample_rate_hz())) {
    capture_buffer->SplitIntoFrequencyBands();
  }

  const bool experimental_multi_channel_capture =
      config_.pipeline.experimental_multi_channel &&
      constants_.experimental_multi_channel_capture_support;
  if (private_submodules_->echo_controller &&
      !experimental_multi_channel_capture) {
    // Force down-mixing of the number of channels after the detection of
    // capture signal saturation.
    // TODO(peah): Look into ensuring that this kind of tampering with the
    // AudioBuffer functionality should not be needed.
    capture_buffer->set_num_channels(1);
  }

  if (private_submodules_->high_pass_filter) {
    private_submodules_->high_pass_filter->Process(capture_buffer);
  }
  RETURN_ON_ERR(
      public_submodules_->gain_control->AnalyzeCaptureAudio(capture_buffer));
  public_submodules_->noise_suppression->AnalyzeCaptureAudio(capture_buffer);

  if (private_submodules_->echo_control_mobile) {
    // Ensure that the stream delay was set before the call to the
    // AECM ProcessCaptureAudio function.
    if (!was_stream_delay_set()) {
      return AudioProcessing::kStreamParameterNotSetError;
    }

    if (public_submodules_->noise_suppression->is_enabled()) {
      private_submodules_->echo_control_mobile->CopyLowPassReference(
          capture_buffer);
    }

    public_submodules_->noise_suppression->ProcessCaptureAudio(capture_buffer);

    RETURN_ON_ERR(private_submodules_->echo_control_mobile->ProcessCaptureAudio(
        capture_buffer, stream_delay_ms()));
  } else {
    if (private_submodules_->echo_controller) {
      data_dumper_->DumpRaw("stream_delay", stream_delay_ms());

      if (was_stream_delay_set()) {
        private_submodules_->echo_controller->SetAudioBufferDelay(
            stream_delay_ms());
      }

      private_submodules_->echo_controller->ProcessCapture(
          capture_buffer, capture_.echo_path_gain_change);
    } else if (private_submodules_->echo_cancellation) {
      // Ensure that the stream delay was set before the call to the
      // AEC ProcessCaptureAudio function.
      if (!was_stream_delay_set()) {
        return AudioProcessing::kStreamParameterNotSetError;
      }

      RETURN_ON_ERR(private_submodules_->echo_cancellation->ProcessCaptureAudio(
          capture_buffer, stream_delay_ms()));
    }

    public_submodules_->noise_suppression->ProcessCaptureAudio(capture_buffer);
  }

  if (config_.voice_detection.enabled) {
    capture_.stats.voice_detected =
        private_submodules_->voice_detector->ProcessCaptureAudio(
            capture_buffer);
  } else {
    capture_.stats.voice_detected = absl::nullopt;
  }

  if (constants_.use_experimental_agc &&
      public_submodules_->gain_control->is_enabled() &&
      !constants_.use_experimental_agc_process_before_aec) {
    private_submodules_->agc_manager->Process(
        capture_buffer->split_bands_const_f(0)[kBand0To8kHz],
        capture_buffer->num_frames_per_band(), capture_nonlocked_.split_rate);
  }
  // TODO(peah): Add reporting from AEC3 whether there is echo.
  RETURN_ON_ERR(public_submodules_->gain_control->ProcessCaptureAudio(
      capture_buffer,
      private_submodules_->echo_cancellation &&
          private_submodules_->echo_cancellation->stream_has_echo()));

  if (submodule_states_.CaptureMultiBandProcessingPresent() &&
      SampleRateSupportsMultiBand(
          capture_nonlocked_.capture_processing_format.sample_rate_hz())) {
    capture_buffer->MergeFrequencyBands();
  }

  if (capture_.capture_fullband_audio) {
    const auto& ec = private_submodules_->echo_controller;
    bool ec_active = ec ? ec->ActiveProcessing() : false;
    // Only update the fullband buffer if the multiband processing has changed
    // the signal. Keep the original signal otherwise.
    if (submodule_states_.CaptureMultiBandProcessingActive(ec_active)) {
      capture_buffer->CopyTo(capture_.capture_fullband_audio.get());
    }
    capture_buffer = capture_.capture_fullband_audio.get();
  }

  if (config_.residual_echo_detector.enabled) {
    RTC_DCHECK(private_submodules_->echo_detector);
    private_submodules_->echo_detector->AnalyzeCaptureAudio(
        rtc::ArrayView<const float>(capture_buffer->channels()[0],
                                    capture_buffer->num_frames()));
  }

  // TODO(aluebs): Investigate if the transient suppression placement should be
  // before or after the AGC.
  if (capture_.transient_suppressor_enabled) {
    float voice_probability =
        private_submodules_->agc_manager.get()
            ? private_submodules_->agc_manager->voice_probability()
            : 1.f;

    public_submodules_->transient_suppressor->Suppress(
        capture_buffer->channels()[0], capture_buffer->num_frames(),
        capture_buffer->num_channels(),
        capture_buffer->split_bands_const(0)[kBand0To8kHz],
        capture_buffer->num_frames_per_band(),
        capture_.keyboard_info.keyboard_data,
        capture_.keyboard_info.num_keyboard_frames, voice_probability,
        capture_.key_pressed);
  }

  // Experimental APM sub-module that analyzes |capture_buffer|.
  if (private_submodules_->capture_analyzer) {
    private_submodules_->capture_analyzer->Analyze(capture_buffer);
  }

  if (config_.gain_controller2.enabled) {
    private_submodules_->gain_controller2->NotifyAnalogLevel(
        agc1()->stream_analog_level());
    private_submodules_->gain_controller2->Process(capture_buffer);
  }

  if (private_submodules_->capture_post_processor) {
    private_submodules_->capture_post_processor->Process(capture_buffer);
  }

  // The level estimator operates on the recombined data.
  if (config_.level_estimation.enabled) {
    private_submodules_->output_level_estimator->ProcessStream(*capture_buffer);
    capture_.stats.output_rms_dbfs =
        private_submodules_->output_level_estimator->RMS();
  } else {
    capture_.stats.output_rms_dbfs = absl::nullopt;
  }

  capture_output_rms_.Analyze(rtc::ArrayView<const float>(
      capture_buffer->channels_const()[0],
      capture_nonlocked_.capture_processing_format.num_frames()));
  if (log_rms) {
    RmsLevel::Levels levels = capture_output_rms_.AverageAndPeak();
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureOutputLevelAverageRms",
                                levels.average, 1, RmsLevel::kMinLevelDb, 64);
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureOutputLevelPeakRms",
                                levels.peak, 1, RmsLevel::kMinLevelDb, 64);
  }

  capture_.was_stream_delay_set = false;
  return kNoError;
}

ProcessRenderAudio is called on the far-end (render) side; its main purpose is to analyze the VAD properties of the far-end signal. It calls the AGC's WebRtcAgc_AddFarend, which in fact only validates parameters and then calls WebRtcAgc_AddFarendToDigital, which in turn calls down to WebRtcAgc_ProcessVad. In the input-check step, only the following sample rates and frame lengths are accepted:
1. 8 kHz sample rate, 10 or 20 ms of data, subFrames = 80;
2. 16 kHz sample rate, 10 or 20 ms of data, subFrames = 160;
3. 32 kHz sample rate, 5 or 10 ms of data, subFrames = 160;
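The supported-format check above can be sketched as a small helper. This is a standalone illustration of the rule, not WebRTC code; the function name and the "0 means unsupported" convention are mine:

```cpp
#include <cstddef>

// Returns the expected subFrames for a supported (sample_rate_hz, frame_ms)
// combination, or 0 if the combination is not accepted. Mirrors the input
// check described above.
size_t AgcSubFrames(int sample_rate_hz, int frame_ms) {
  switch (sample_rate_hz) {
    case 8000:
      return (frame_ms == 10 || frame_ms == 20) ? 80 : 0;
    case 16000:
      return (frame_ms == 10 || frame_ms == 20) ? 160 : 0;
    case 32000:
      return (frame_ms == 5 || frame_ms == 10) ? 160 : 0;
    default:
      return 0;  // Any other sample rate is rejected by the check.
  }
}
```
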
AnalyzeCaptureAudio analyzes the capture signal before any processing, dispatching to different handling depending on the mode. If the AGC mode is kFixedDigital, AnalyzeCaptureAudio does nothing.
ProcessCaptureAudio is the core of the processing and contains the key AGC call, WebRtcAgc_Process. Besides the usual parameter checks, it also handles volume adjustment. capture_levels_ is computed by WebRtcAgc_Process, after which analog_capture_level_ = capture_levels_. analog_capture_level_ is in turn used as the volume input to WebRtcAgc_VirtualMic inside AnalyzeCaptureAudio, which computes a new capture_levels_. Which path runs is controlled by the mode. In digital mode, WebRtcAgc_VirtualMic takes analog_capture_level_ and produces capture_levels_, which is fed into WebRtcAgc_Process; the level that WebRtcAgc_Process writes back into capture_levels_ is then discarded. In analog mode, WebRtcAgc_VirtualMic is not executed: capture_levels_ is fed into WebRtcAgc_Process and updated on every call, and the result is saved back into analog_capture_level_.
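The mode-dependent level bookkeeping described above can be modeled with a minimal sketch. All names here, and the stub gain decision, are stand-ins for illustration only; the real WebRtcAgc functions also process audio, while this models just the level flow:

```cpp
#include <algorithm>

// Simplified model of how capture_levels_ / analog_capture_level_ move
// between AnalyzeCaptureAudio and ProcessCaptureAudio. StubAdjust() stands
// in for the gain decision made by WebRtcAgc_VirtualMic / WebRtcAgc_Process.
struct LevelAgc {
  explicit LevelAgc(bool analog) : analog_mode(analog) {}

  bool analog_mode;             // true: analog mode, false: kAdaptiveDigital
  int analog_capture_level = 0;
  int capture_levels = 0;

  static int StubAdjust(int level) {  // stand-in for the real gain logic
    return std::min(level + 10, 255);
  }

  void AnalyzeCapture() {  // models AnalyzeCaptureAudio
    if (!analog_mode) {
      // Digital mode: VirtualMic consumes analog_capture_level and
      // produces capture_levels.
      capture_levels = StubAdjust(analog_capture_level);
    }
    // Analog mode: VirtualMic is skipped.
  }

  void ProcessCapture() {  // models ProcessCaptureAudio
    int new_level = StubAdjust(capture_levels);  // level from Process
    if (analog_mode) {
      // Analog mode: keep the returned level and mirror it back.
      capture_levels = new_level;
      analog_capture_level = capture_levels;
    }
    // Digital mode: the level written back by Process is discarded.
  }
};
```
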

2.2 AGC API

>Create the AGC: WebRtcAgc_Create
>Initialize the AGC: WebRtcAgc_Init
>Set the configuration: WebRtcAgc_set_config
>Initialize capture_level = 0
>For kAdaptiveDigital, call the virtual mic: WebRtcAgc_VirtualMic
>Process the buffer with capture_level: WebRtcAgc_Process
>Take the capture level returned by WebRtcAgc_Process and store it back into capture_level
>Repeat steps 5 through 7 for each audio buffer
>Destroy the AGC: WebRtcAgc_Free
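The sequence above, sketched as pseudocode. The exact WebRtcAgc_* argument lists vary across WebRTC revisions, so treat the calls below as approximate shape only, not as a compilable example:

```
// Pseudocode sketch of the call sequence; signatures are approximate.
agc = WebRtcAgc_Create();
WebRtcAgc_Init(agc, /*minLevel=*/0, /*maxLevel=*/255,
               kAgcModeAdaptiveDigital, /*fs=*/16000);
WebRtcAgc_set_config(agc, config);   // targetLevelDbfs, compressionGaindB, ...
capture_level = 0;
for each 10 ms capture frame:
    if mode == kAdaptiveDigital:
        WebRtcAgc_VirtualMic(agc, frame, ..., capture_level, &capture_level);
    WebRtcAgc_Process(agc, frame, ..., capture_level, &new_level, ...);
    capture_level = new_level;       // feed the returned level back in
WebRtcAgc_Free(agc);
```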

Welcome to follow the public account "音视频开发之旅" (Audio & Video Development Journey) to learn and grow together.

References:
WebRTC AGC Algorithm Fundamentals (Part 1)
