
Measurements from C920 core #1

Open
camel-cdr opened this issue Feb 27, 2024 · 5 comments

Comments

@camel-cdr

Hi, I just read your lovely article.

I tried running the measurements on the more powerful C920 cores, and everything worked out of the box. 👍

Here are the results:

result

sgemm_riscv.csv

@Zhao-Dongyu
Owner

I haven’t verified it on other platforms yet, so happy to see your results!

There are still some optimizations that can be done, and we look forward to higher performance!

@Javipove

Javipove commented May 31, 2024

Which system and hardware were used for this testing? I imagine the SG2042 running Fedora? Thanks for sharing the CSV.

@camel-cdr
Author

@Javipove I ran it on the SG2042 server from perfXlab. I'm not sure which exact distro was on the system, but iirc it was Debian-based.

@Javipove

@Zhao-Dongyu Hello, I am also trying to run the bandwidth test for the floating-point and vector versions on the SG2042, but I don't really understand how the math is done to calculate it. Where does the number that you subtract from the result of the test program come from? Thanks (and sorry if it's a dumb question)

@Zhao-Dongyu
Owner

> @Zhao-Dongyu Hello, I am also trying to run the bandwidth test for the floating-point and vector versions on the SG2042, but I don't really understand how the math is done to calculate it. Where does the number that you subtract from the result of the test program come from? Thanks (and sorry if it's a dumb question)

This is a good question. I did not explain it in detail in the article or the code, because this part of the code is rather messy...

The memory bandwidth test is explained here: https://github.com/Zhao-Dongyu/sgemm_riscv/tree/main/prepare
(Sorry, I did not write the English version here. I used a lot of Chinese, which makes it difficult for you to read.)

For example, I tested with the flw method, and the result was:
flw: 4000 MB / (2678.298 − 1171.154) ms = 2.592 GB/s
You are probably wondering where the number 1171.154 comes from.

[Screenshot, 2024-06-11 10:26:34 AM]

In this file, you can see that the test code contains not only flw instructions, but also addi, slti, and bnez instructions. I don't think these instructions do any memory transfer work, so I commented out all the flw instructions in the assembly code, ran it again, and got the number 1171.154.
To be honest, I am not sure whether this calculation method is correct or whether it reflects how the hardware actually behaves. If you have a better answer, please let me know.
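
The arithmetic described above can be reproduced with a short sketch. The function name `net_bandwidth` and its parameters are made up for illustration and are not part of the repository. One observation from redoing the division: the quoted 2.592 GB/s comes out when 4000 MB is converted to GB with a factor of 1024; with a factor of 1000 the same timings give about 2.654 GB/s.

```python
def net_bandwidth(total_mb, loop_ms, baseline_ms, mb_per_gb=1024):
    """Bandwidth in GB/s after subtracting the loop-overhead time.

    total_mb    -- data moved by the load instructions, in MB
    loop_ms     -- run time of the full loop (flw + addi/slti/bnez), in ms
    baseline_ms -- run time of the same loop with the flw lines commented out, in ms
    mb_per_gb   -- conversion factor; 1024 is an assumption that matches the quoted figure
    """
    net_ms = loop_ms - baseline_ms
    if net_ms <= 0:
        raise ValueError("baseline time must be smaller than the full loop time")
    return (total_mb / mb_per_gb) / (net_ms / 1000.0)


# Timings quoted in the comment above.
flw_bw = net_bandwidth(4000, 2678.298, 1171.154)
print(f"flw: {flw_bw:.3f} GB/s")
```

Whether subtracting the baseline is valid depends on how much of the loop overhead overlaps with the loads on the actual core, which is exactly the open question raised above. The sketch only reproduces the calculation and does not settle that.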
