
Error in Conv.pad after converting to half #905

Closed

Rikorose opened this issue Dec 13, 2022 · 5 comments

@Rikorose (Contributor)

I converted my model to half using:

```rust
// ...
let mut m = pulsed.into_typed()?.into_optimized()?;
if half_floats {
    use tract_core::model::translator::Translate;
    m = tract_core::half::HalfTranslator.translate_model(&m)?;
}
// ...
let erb = to_f16(&self.erb_buf)?; // convert f32 -> f16 datum type
let cpl = to_f16(&self.cplx_buf)?.permute_axes(&[0, 3, 1, 2])?;
m.run(tvec!(erb, cpl))?
```

Which results in the following error:

```
Error: Evaluating #3 "/erb_conv0/1/Conv.pad" Pad

Caused by:
    Tensor datum type error: tensor is F32, accessed as F16
```

To reproduce, see: Rikorose/DeepFilterNet#211
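For illustration only (this is not tract's implementation, and none of the names below come from tract): the error above is the kind of check a dynamically typed tensor API performs when a value is accessed as a datum type other than the one it actually stores. A minimal stdlib-only sketch of such a check:

```rust
// Toy model of a datum-type-tagged tensor. Accessing the payload as a
// different datum type fails, mirroring the message in the error above.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DatumType {
    F16,
    F32,
}

struct Tensor {
    dt: DatumType,
    bytes: Vec<u8>,
}

impl Tensor {
    fn from_f32(v: f32) -> Self {
        Tensor { dt: DatumType::F32, bytes: v.to_le_bytes().to_vec() }
    }

    // Returns the raw f16 bits only if the tensor really holds f16 data.
    fn to_scalar_f16_bits(&self) -> Result<u16, String> {
        if self.dt != DatumType::F16 {
            return Err(format!(
                "Tensor datum type error: tensor is {:?}, accessed as F16",
                self.dt
            ));
        }
        Ok(u16::from_le_bytes([self.bytes[0], self.bytes[1]]))
    }
}

fn main() {
    // A graph translated to f16 that still contains an f32 tensor
    // (e.g. a Pad constant) trips exactly this kind of check.
    let t = Tensor::from_f32(1.0);
    println!("{}", t.to_scalar_f16_bits().unwrap_err());
}
```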

@kali (Collaborator) commented Dec 13, 2022

I'm having a look. There are a couple of issues. Support for f16 is experimental, so this is a good exercise. Congrats for venturing once again into an unbaked area of tract ;)

@kali (Collaborator) commented Dec 13, 2022

Some fixes are on branch #907. You'll also need a couple of fixes on your side. The gist of it: don't call the f16 translator *after* optimize; call it just after pulsing or declutter, and cast some f16 outputs back to f32.

Tell me how it goes.

```diff
diff --git a/libDF/src/tract.rs b/libDF/src/tract.rs
index 38543da..deaf504 100644
--- a/libDF/src/tract.rs
+++ b/libDF/src/tract.rs
@@ -411,8 +411,7 @@ impl DfTract {
         };
         #[cfg(feature = "timings")]
         let t2 = Instant::now();
-        let &lsnr = enc_emb.pop().unwrap().to_scalar::<f32>()?;
-        dbg!(lsnr);
+        let lsnr = enc_emb.pop().unwrap().cast_to_scalar::<f32>()?;
         let c0 = enc_emb.pop().unwrap().into_tensor();
         let emb = enc_emb.pop().unwrap().into_tensor();
         let (apply_erb, apply_erb_zeros, apply_df) = if lsnr < self.min_db_thresh {
@@ -454,6 +453,8 @@ impl DfTract {
                 .unwrap()
                 .into_tensor()
                 .into_shape(&[self.ch, self.nb_erb])?
+                .cast_to::<f32>()?
+                .into_owned()
                 .into_array()?;
             if self.ch > 1 {
                 m = match self.reduce_mask {
@@ -559,6 +560,7 @@ fn df(
             .into_dimensionality()?;
     // Zero relevant frequency bins of output
     o_f.slice_mut(s![.., ..nb_df]).fill(Complex32::default());
+    let coefs = coefs.cast_to::<f32>()?;
     let coefs_arr: ArrayView3<Complex32> =
         as_arrayview_complex(coefs.to_array_view::<f32>()?, &[ch, nb_df, df_order])
             .into_dimensionality()?;
@@ -615,12 +617,12 @@ fn init_encoder_impl(
     let pulsed = PulsedModel::new(&m, 1)?;
     let delay = pulsed.output_fact(0)?.delay;
     log::info!("Init encoder with delay: {}", delay);
-    let mut m = pulsed.into_typed()?.into_optimized()?;
-
+    let mut m = pulsed.into_typed()?;
     if half_floats {
         use tract_core::model::translator::Translate;
         m = tract_core::half::HalfTranslator.translate_model(&m)?;
     }
+    m = m.into_optimized()?;

     Ok((m, delay))
 }
@@ -691,12 +693,13 @@ fn init_erb_decoder_impl(
     let pulsed = PulsedModel::new(&m, 1)?;
     let delay = pulsed.output_fact(0)?.delay;
     log::info!("Init ERB decoder with delay: {}", delay);
-    let mut m = pulsed.into_typed()?.into_optimized()?;
+    let mut m = pulsed.into_typed()?;

     if half_floats {
         use tract_core::model::translator::Translate;
         m = tract_core::half::HalfTranslator.translate_model(&m)?;
     }
+    m = m.into_optimized()?;
     Ok((m, delay))
 }
 fn init_erb_decoder(
@@ -757,12 +760,13 @@ fn init_df_decoder_impl(
     let pulsed = PulsedModel::new(&m, 1)?;
     let delay = pulsed.output_fact(0)?.delay;
     log::info!("Init DF decoder with delay: {}", delay);
-    let mut m = pulsed.into_typed()?.into_optimized()?;
+    let mut m = pulsed.into_typed()?;

     if half_floats {
         use tract_core::model::translator::Translate;
         m = tract_core::half::HalfTranslator.translate_model(&m)?;
     }
+    m = m.into_optimized()?;
     Ok((m, delay))
 }
 fn init_df_decoder(
```
@Rikorose (Contributor, Author)

> I'm having a look. There are a couple of issues. Support for f16 is experimental, so this is a good exercise. Congrats for venturing once again into an unbaked area of tract ;)

Woops, I just found the option and wanted to try it out.

> Tell me how it goes.

Works now, thanks. But it is approx. 63 times slower than f32 on x86.

@kali (Collaborator) commented Dec 14, 2022

Lol, good to know it's working now. No big surprise about the performance on Intel: the only thing that has been optimised at this stage is matrix multiplication on armv8.2+.

@VariantXYZ (Contributor) commented Dec 15, 2022

@Rikorose it might also be worth profiling on your target. The last time I tried a half-precision model (especially one running on the order of 10 ms), a major bottleneck was the f32/f16 conversion, more so than the "useful work".
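That conversion overhead is easy to underestimate: Rust has no native f16 type, so outside of hardware fp16 paths (such as the armv8.2+ matmul mentioned above), every half-precision value is round-tripped through f32 in software. A stdlib-only sketch of the bit-level f32 -> f16 -> f32 round trip (in practice the `half` crate or tract's own routines do this; this simplified version truncates instead of rounding and flushes subnormals to zero):

```rust
// Simplified software f32 <-> f16 conversion, to illustrate the
// per-value work involved. Not production quality: round-toward-zero,
// subnormals flushed to zero, NaN payloads not preserved.

fn f32_to_f16_bits(x: f32) -> u16 {
    let b = x.to_bits();
    let sign = ((b >> 16) & 0x8000) as u16;
    let exp = ((b >> 23) & 0xff) as i32 - 127 + 15; // rebias exponent
    let frac = ((b >> 13) & 0x3ff) as u16;          // truncate mantissa
    if exp <= 0 {
        sign                  // too small: flush to (signed) zero
    } else if exp >= 0x1f {
        sign | 0x7c00         // too large: saturate to infinity
    } else {
        sign | ((exp as u16) << 10) | frac
    }
}

fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = ((h & 0x3ff) as u32) << 13;
    if exp == 0 {
        return f32::from_bits(sign); // zeros (subnormals were flushed)
    }
    if exp == 0x1f {
        return f32::from_bits(sign | 0x7f80_0000 | frac); // inf / NaN
    }
    f32::from_bits(sign | ((exp + 127 - 15) << 23) | frac)
}

fn main() {
    // Values exactly representable in f16 survive the round trip;
    // others lose precision (f16 has a 10-bit mantissa).
    let x = 1.5f32;
    let y = f16_bits_to_f32(f32_to_f16_bits(x));
    println!("{x} -> {y}");
}
```

Every load and store on a purely software f16 path pays for this bit twiddling, which is why profiling the conversion cost on the actual target is worthwhile.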
