
Error in Conv.pad after converting to half #905

Closed

Rikorose opened this issue Dec 13, 2022 · 5 comments

@Rikorose (Contributor)

I converted my model to half using:

```rust
// ...
let mut m = pulsed.into_typed()?.into_optimized()?;
if half_floats {
    use tract_core::model::translator::Translate;
    m = tract_core::half::HalfTranslator.translate_model(&m)?;
}
// ...
let erb = to_f16(&self.erb_buf)?; // convert f32 -> f16 datum type
let cpl = to_f16(&self.cplx_buf)?.permute_axes(&[0, 3, 1, 2])?;
m.run(tvec!(erb, cpl))?
```

Which results in the following error:

```
Error: Evaluating #3 "/erb_conv0/1/Conv.pad" Pad

Caused by:
    Tensor datum type error: tensor is F32, accessed as F16
```

To reproduce, see: Rikorose/DeepFilterNet#211
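For illustration only (this is not tract's implementation, and none of the names below come from tract): the error above is the kind of check a dynamically typed tensor API performs when a value is accessed as a datum type other than the one it actually stores. A minimal stdlib-only sketch of such a check:

```rust
// Toy model of a datum-type-tagged tensor. Accessing the payload as a
// different datum type fails, mirroring the message in the error above.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DatumType {
    F16,
    F32,
}

struct Tensor {
    dt: DatumType,
    bytes: Vec<u8>,
}

impl Tensor {
    fn from_f32(v: f32) -> Self {
        Tensor { dt: DatumType::F32, bytes: v.to_le_bytes().to_vec() }
    }

    // Returns the raw f16 bits only if the tensor really holds f16 data.
    fn to_scalar_f16_bits(&self) -> Result<u16, String> {
        if self.dt != DatumType::F16 {
            return Err(format!(
                "Tensor datum type error: tensor is {:?}, accessed as F16",
                self.dt
            ));
        }
        Ok(u16::from_le_bytes([self.bytes[0], self.bytes[1]]))
    }
}

fn main() {
    // A graph translated to f16 that still contains an f32 tensor
    // (e.g. a Pad constant) trips exactly this kind of check.
    let t = Tensor::from_f32(1.0);
    println!("{}", t.to_scalar_f16_bits().unwrap_err());
}
```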

@kali (Collaborator) commented Dec 13, 2022

I'm having a look. There are a couple of issues. Support for f16 is experimental, so this is a good exercise. Congrats for venturing once again into an unbaked area of tract ;)

@kali (Collaborator) commented Dec 13, 2022

Some fixes are on branch #907. You'll also need a couple of fixes on your side. The gist of it: don't call the f16 translator *after* optimize; call it just after pulsing or declutter, and cast some f16 outputs back to f32.

Tell me how it goes.

```diff
diff --git a/libDF/src/tract.rs b/libDF/src/tract.rs
index 38543da..deaf504 100644
--- a/libDF/src/tract.rs
+++ b/libDF/src/tract.rs
@@ -411,8 +411,7 @@ impl DfTract {
         };
         #[cfg(feature = "timings")]
         let t2 = Instant::now();
-        let &lsnr = enc_emb.pop().unwrap().to_scalar::<f32>()?;
-        dbg!(lsnr);
+        let lsnr = enc_emb.pop().unwrap().cast_to_scalar::<f32>()?;
         let c0 = enc_emb.pop().unwrap().into_tensor();
         let emb = enc_emb.pop().unwrap().into_tensor();
         let (apply_erb, apply_erb_zeros, apply_df) = if lsnr < self.min_db_thresh {
@@ -454,6 +453,8 @@ impl DfTract {
                 .unwrap()
                 .into_tensor()
                 .into_shape(&[self.ch, self.nb_erb])?
+                .cast_to::<f32>()?
+                .into_owned()
                 .into_array()?;
             if self.ch > 1 {
                 m = match self.reduce_mask {
@@ -559,6 +560,7 @@ fn df(
             .into_dimensionality()?;
     // Zero relevant frequency bins of output
     o_f.slice_mut(s![.., ..nb_df]).fill(Complex32::default());
+    let coefs = coefs.cast_to::<f32>()?;
     let coefs_arr: ArrayView3<Complex32> =
         as_arrayview_complex(coefs.to_array_view::<f32>()?, &[ch, nb_df, df_order])
             .into_dimensionality()?;
@@ -615,12 +617,12 @@ fn init_encoder_impl(
     let pulsed = PulsedModel::new(&m, 1)?;
     let delay = pulsed.output_fact(0)?.delay;
     log::info!("Init encoder with delay: {}", delay);
-    let mut m = pulsed.into_typed()?.into_optimized()?;
-
+    let mut m = pulsed.into_typed()?;
     if half_floats {
         use tract_core::model::translator::Translate;
         m = tract_core::half::HalfTranslator.translate_model(&m)?;
     }
+    m = m.into_optimized()?;

     Ok((m, delay))
 }
@@ -691,12 +693,13 @@ fn init_erb_decoder_impl(
     let pulsed = PulsedModel::new(&m, 1)?;
     let delay = pulsed.output_fact(0)?.delay;
     log::info!("Init ERB decoder with delay: {}", delay);
-    let mut m = pulsed.into_typed()?.into_optimized()?;
+    let mut m = pulsed.into_typed()?;

     if half_floats {
         use tract_core::model::translator::Translate;
         m = tract_core::half::HalfTranslator.translate_model(&m)?;
     }
+    m = m.into_optimized()?;
     Ok((m, delay))
 }
 fn init_erb_decoder(
@@ -757,12 +760,13 @@ fn init_df_decoder_impl(
     let pulsed = PulsedModel::new(&m, 1)?;
     let delay = pulsed.output_fact(0)?.delay;
     log::info!("Init DF decoder with delay: {}", delay);
-    let mut m = pulsed.into_typed()?.into_optimized()?;
+    let mut m = pulsed.into_typed()?;

     if half_floats {
         use tract_core::model::translator::Translate;
         m = tract_core::half::HalfTranslator.translate_model(&m)?;
     }
+    m = m.into_optimized()?;
     Ok((m, delay))
 }
 fn init_df_decoder(
```
@Rikorose (Contributor, Author)

> I'm having a look. There are a couple of issues. Support for f16 is experimental, so this is a good exercise. Congrats for venturing once again into an unbaked area of tract ;)

Woops, I just found the option and wanted to try it out.

> Tell me how it goes.

Works now, thanks. But it is approx. 63 times slower than f32 on x86.

@kali (Collaborator) commented Dec 14, 2022

Lol, good to know it's working now. No big surprise about the performance on Intel: the only thing that has been optimised at this stage is matrix multiplication on armv8.2+.

@VariantXYZ (Contributor) commented Dec 15, 2022

@Rikorose it might also be worth profiling on your target. The last time I tried a half-precision model (especially one running on the order of 10 ms), a major bottleneck was the f32/f16 conversion, more so than the "useful work".
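That conversion overhead is easy to underestimate: Rust has no native f16 type, so outside of hardware fp16 paths (such as the armv8.2+ matmul mentioned above), every half-precision value is round-tripped through f32 in software. A stdlib-only sketch of the bit-level f32 -> f16 -> f32 round trip (in practice the `half` crate or tract's own routines do this; this simplified version truncates instead of rounding and flushes subnormals to zero):

```rust
// Simplified software f32 <-> f16 conversion, to illustrate the
// per-value work involved. Not production quality: round-toward-zero,
// subnormals flushed to zero, NaN payloads not preserved.

fn f32_to_f16_bits(x: f32) -> u16 {
    let b = x.to_bits();
    let sign = ((b >> 16) & 0x8000) as u16;
    let exp = ((b >> 23) & 0xff) as i32 - 127 + 15; // rebias exponent
    let frac = ((b >> 13) & 0x3ff) as u16;          // truncate mantissa
    if exp <= 0 {
        sign                  // too small: flush to (signed) zero
    } else if exp >= 0x1f {
        sign | 0x7c00         // too large: saturate to infinity
    } else {
        sign | ((exp as u16) << 10) | frac
    }
}

fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = ((h & 0x3ff) as u32) << 13;
    if exp == 0 {
        return f32::from_bits(sign); // zeros (subnormals were flushed)
    }
    if exp == 0x1f {
        return f32::from_bits(sign | 0x7f80_0000 | frac); // inf / NaN
    }
    f32::from_bits(sign | ((exp + 127 - 15) << 23) | frac)
}

fn main() {
    // Values exactly representable in f16 survive the round trip;
    // others lose precision (f16 has a 10-bit mantissa).
    let x = 1.5f32;
    let y = f16_bits_to_f32(f32_to_f16_bits(x));
    println!("{x} -> {y}");
}
```

Every load and store on a purely software f16 path pays for this bit twiddling, which is why profiling the conversion cost on the actual target is worthwhile.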
