
FBGEMM version mismatch on ARM #304

Open
ayanchak1508 opened this issue Sep 27, 2024 · 12 comments

@ayanchak1508

I was trying to run the DLRMv2 benchmark of MLPerf Inference on an ARM server using the instructions here.

I run into the issue when the tool tries to install torchrec==0.3.2:
torchrec==0.3.2 requires fbgemm-gpu==0.3.2, but fbgemm-gpu only added ARM support starting with v0.5.0: https://download.pytorch.org/whl/cpu/fbgemm-gpu/

I tried two alternate approaches:

  1. Build fbgemm-gpu v0.3.2 from source. This does not work because the build requires AVX-512 support, which is absent on ARM.
  2. Try a newer version of fbgemm-gpu (v0.5.0 or above), but the cm tool remains inflexible and keeps searching for v0.3.2.

Previously, I did run the benchmark on ARM without any problems (without using the cm tool) using newer versions of fbgemm-gpu. (Note that I also needed fbgemm-gpu-cpu.)

Command to reproduce the issue:

cm run script --tags=run-mlperf,inference,_r4.1-dev \
    --model=dlrm-v2-99.9 \
    --implementation=reference \
    --framework=pytorch \
    --category=datacenter \
    --scenario=Server \
    --server_target_qps=10 \
    --execution_mode=valid \
    --device=cpu \
    --quiet --repro

Error message:

ERROR: Could not find a version that satisfies the requirement fbgemm-gpu==0.3.2 (from versions: none)
ERROR: No matching distribution found for fbgemm-gpu==0.3.2

The repro folder and the logfile are present in the attached tarball.
cm-repro.tar.gz

@arjunsuresh
Contributor

Hi @ayanchak1508, you can just remove the version requirement locally in this file, which should be inside $HOME/repos/mlcommons@cm4mlops/script/:

https://github.com/GATEOverflow/cm4mlops/blob/mlperf-inference/script/app-mlperf-inference-mlcommons-python/_cm.yaml#L1129

We never had success using a higher version of fbgemm with the available inference implementation. If you can share the exact versions which worked, we can test them.
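For example (just a sketch, assuming the yaml still pins the dependency as fbgemm-gpu==0.3.2; check the file contents before running), dropping the pin could be as simple as:

# hypothetical one-liner; verify the match in your local checkout first
sed -i 's/fbgemm-gpu==0.3.2/fbgemm-gpu/' \
    $HOME/repos/mlcommons@cm4mlops/script/app-mlperf-inference-mlcommons-python/_cm.yaml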

@ayanchak1508
Author

Thanks for the quick reply!
Yes, indeed after changing the version, it seems to be working.

These are the versions (that changed from the default) that work for me:
fbgemm_gpu==0.8.0+cpu
fbgemm_gpu-cpu==0.8.0
torch==2.4.0
torchrec==0.8.0

I have attached the full requirements.txt file in case it's needed:
requirements.txt
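For reference, a minimal sketch of installing these versions into a fresh venv on ARM (CPU-only wheels; the venv name is arbitrary) looks something like:

python3 -m venv dlrm-venv && source dlrm-venv/bin/activate
pip install torch==2.4.0 torchrec==0.8.0
pip install fbgemm-gpu==0.8.0 --index-url https://download.pytorch.org/whl/cpu/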

I sometimes run into a bus error (core dumped) afterward, but that seems to be more of a memory capacity issue unrelated to the toolchain/benchmark?

@arjunsuresh
Contributor

Thanks a lot @ayanchak1508. Let me check that. This issue might help with the bus error.

@arjunsuresh
Contributor

Yes, with PyTorch 2.4 we could use fbgemm_gpu==0.8.0 and it worked fine. We have removed the version dependency in the CM script now. You can just do cm pull repo and it should be visible.
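That is, refresh the local copy of the scripts with:

cm pull repo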

@arjunsuresh
Contributor

Just to add: ulimit=9999 was not enough to run 1000 inputs. I think it'll be incredibly hard to do a full run of 204800 inputs using the current reference implementation on CPUs.

@ayanchak1508
Author

ayanchak1508 commented Sep 27, 2024

Thanks a lot for the quick updates!

I did a fresh, clean setup to see the effects. I have two observations:

  1. pip doesn't automatically know where to find fbgemm-gpu for ARM; it needs to be installed via pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cpu/
  2. I actually ran into more dependency conflicts this time, and the benchmark started complaining about modules and functions it couldn't find (such as ModuleNotFoundError: No module named 'fbgemm_gpu.split_embedding_configs')

I'm not sure if I'm doing anything wrong, but if I create a new virtual environment and use the requirements file I posted earlier, the benchmark runs without problems. Maybe this is an ARM-specific problem?

Regarding the bus error problem, thank you again for the references. Is there any way to use the debug dataset or limit the max inputs, i.e., deviate from the official submission rules in any way? (of course I understand it wouldn't count as a valid submission, but I'm just interested in the model performance)

I guess one possible solution could be to edit the conf file manually, but is there a better way?
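For concreteness, the kind of manual edit I had in mind is something like this in user.conf (the keys are guessed from the usual LoadGen config format and not verified; such a run would of course not be compliant):

# hypothetical overrides for a short test run
dlrm-v2.Server.min_query_count = 1000
dlrm-v2.Server.min_duration = 60000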
(Sorry for bringing the bus error into this issue, we can move it to a separate issue if needed)

@arjunsuresh
Contributor

For 1, maybe the problem is with the .whl file?

"but if I create a new virtual environment and use the requirements file I posted earlier, the benchmark runs without problems."

Is it on the same ARM machine? If so, you can also try using a venv for the CM flow, as follows:

cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"

For the bus error - what's the available RAM on the system?

@ayanchak1508
Author

Sorry, I should have been more specific. The runs are in a clean, empty Docker container (ubuntu:22.04) on an ARM server.

I created two Python venvs (in the same container), one for installing packages through the CM-based flow and one for installing packages from the requirements file. I didn't use the command you mentioned, though; I simply created a normal Python venv as described here: https://docs.mlcommons.org/inference/install/ and ran the CM commands for the benchmark there. Does the command you mentioned do something more?

For the bus error, the RAM is not too big, about 250 GB (the Docker container has no resource constraints). I remember facing a similar problem when I processed the dataset myself some time back and had to move to a different machine with 512 GB of RAM. So I understand it's maybe not big enough to run the entire dataset, but it should be fine at least for the debug dataset?

@arjunsuresh
Contributor

Thank you.

Yes, the commands are a bit different. CM is a Python package, and when you use a venv for CM, CM itself gets installed in that venv. But when you run a workflow with CM, the flow can pick up any Python available on the system unless you force one using cm run script --tags=get,python and make the appropriate selection. The command I shared is the safer option, as long as the name you use is new.

Coming to 256 GB, it should be good enough. We have comfortably run the full DLRMv2 on 192 GB. It even worked on 64 GB, but we had to use a lot of swap space.

I believe your problem could be the shm size, since Docker is being used. Are you explicitly setting the shm size during docker run? We typically set a 32 GB shm size for DLRM.
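For example, something like (the image is just whatever you are already using):

docker run --shm-size=32g -it ubuntu:22.04 bash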

@ayanchak1508
Author

Thank you very much for the clarification!

I did not set the shm size, and the default seems to be 64MB, much smaller than the 32GB you mentioned.
I will try it out (both using the command you mentioned and increasing the shm size), and get back to you.

Thanks once again for all the quick help.

@arjunsuresh
Contributor

Sure @ayanchak1508. Just a correction to what I said earlier: the 64 GB system where we had run DLRMv2 used GPUs, not CPUs. On CPUs we could only do a test run of 10 inputs on 192 GB.

@ayanchak1508
Author

Update:

  1. Increasing the shm size to 32G fixes the bus error problem, thank you! I can now run the benchmark, albeit with a very low qps.
  2. Using the CM venv flow as you mentioned before doesn't help; it runs into the same problems:
ImportError: cannot import name 'DLRM_DCN' from 'torchrec.models.dlrm' (/root/CM/repos/local/cache/b1d060ef5c0c4217/mlperf/lib/python3.10/site-packages/torchrec/models/dlrm.py)
ModuleNotFoundError: No module named 'fbgemm_gpu.split_embedding_configs'

These are the packages it installs in the mlperf venv: current.txt
Doing a diff against the requirements file I posted before and then manually installing the correct package versions in the mlperf venv solves the problem:

pip install torch==2.4.0 torchrec==0.8.0
pip uninstall fbgemm-gpu
pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cpu/

I am not sure why I had to reinstall the same version of fbgemm-gpu, but otherwise it runs into the ModuleNotFoundError.
