Rufous

Regarding optimization flags for latest Mesa build from source. : r ...

Format: json
Score: 10
Link: https://www.reddit.com
{
  "post": {
    "title": "Regarding optimization flags for latest Mesa build from source.",
    "selftext": "This is the first time I am thinking of building Mesa from source. I am very excited and wondering whether adding these fancy compiler optimization flags could make the code any faster. I am hoping to achieve even better minimum fps on my RX 5700 XT. Has anybody ever found a noticeable improvement in gaming with such an optimized Mesa build? Should I also use \"`-march=native -mtune=native`\"? Right now I have only enabled `-O3` and link time optimization as follows:\n\n    mesa 22.3.0-devel\n    backend     : ninja\n    optimization: 3\n    b_lto       : true\n    c_args      : -O3\n    cpp_args    : -O3\n\nMany thanks in advance. Not that it's needed, because the question here is independent of what hardware you have, but for information: I am using this old quad-core i7-5775C at 4.2 GHz + RX 5700 XT + 32 GB DDR3-2200 RAM.",
    "url": "https://www.reddit.com/r/linux_gaming/comments/woc07h/regarding_optimization_flags_for_latest_mesa/"
  },
  "comments": [
    {
      "body": "I have compiled Mesa many times, with various options, even things like link time optimization (`-flto`), automatic vectorization, tuning for a specific CPU (`-march=znver2`), and a few other things (like `-fno-semantic-interposition`). I didn't notice any difference. Maybe 1% in some places. Usually you are GPU limited, so it is not going to help much.\n\nI do recommend setting the SSE FPU for 32-bit Mesa tho. By default 32-bit binaries use the legacy x87 FPU, but SSE/SSE2 is preferred due to the number of registers, better parallelism, and more CPU optimisation. SSE2 is available on all 64-bit x86 CPUs. Just add `-mfpmath=sse` (along with `-msse2`, unless your `-march` already enables it).\n\nHere are my build scripts for Debian: https://gist.github.com/baryluk/1041204eff4cc4fad6f1508afe67b562",
      "replies": [
        {
          "body": "I think I am getting there too, in that \"maybe 1% in some places\" range.  \nThis is despite the following:  \n`User defined options`  \n`backend      : ninja`  \n`buildtype    : release`  \n`debug        : false`  \n`optimization : 3`  \n`b_lto        : true`  \n`c_args       : -O3 -march=broadwell -mtune=broadwell -flto`  \n`c_link_args  : -flto`  \n`cpp_args     : -O3 -march=broadwell -mtune=broadwell -flto`  \n`cpp_link_args: -flto`",
          "replies": [
            {
              "body": "P.S. Goddamn this editor creating different colors every time I paste and select the text as code lol. Never mind."
            }
          ]
        }
      ]
    },
    {
      "body": "FYI. `-march=native` automatically implies `-mtune=native` if not specified manually.",
      "replies": [
        {
          "body": "Now that I didn't know. Many thanks!",
          "replies": [
            {
              "body": "If you are interested in stuff like this, these are worth a read:\n\nhttps://gcc.gnu.org/onlinedocs/gcc/x86-Options.html\n\nAnd\n\nhttps://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options"
            }
          ]
        }
      ]
    },
    {
      "body": "Some Heaven benchmarks comparing different options. There's a little improvement under CPU contention, though it is hard to measure in actual gaming conditions. There is some benefit on 1% lows in all scenarios. Heaven only tests radeonsi, not radv. This is mesa 22.3.0_devel.158059.5fd8ae15415b.d41d8cd98f00b204e9800998ecf8427e-1, and native here is znver1.\n\nhttps://wtf.roflcopter.fr/pics/jh8EEvPG/nojlnHft.png\n\nhttps://wtf.roflcopter.fr/pics/L8pGbrDK/kacppzmr.png",
      "replies": [
        {
          "body": "Many thanks for the results.\n\nI am using a low-latency kernel, and with that and -O3 -march=native -flto, I am getting better lows and minimums too. The overall experience feels smoother, beyond what I would put down to placebo. As someone already mentioned, -Ofast may create graphics glitches, and with -Ofast I did get graphics glitches in 'Ryse: Son of Rome'. Reverted to -O3, issue solved."
        }
      ]
    },
    {
      "body": "I use `-Ofast -march=native -flto` without issues. I haven't bothered measuring before/after.\n\n>Hoping to achieve even better minimum fps\n\nTry any tweaks that reduce latency, for example `DXVK_ASYNC=1`, esync/fsync, `mitigations=off preempt=full`, the performance governor for CPU/GPU, chrt/ionice, putting the game (and your Mesa shader cache) on an SSD, disabling compositing, etc.",
      "replies": [
        {
          "body": "Yeah, I was kinda afraid of the aggressiveness of `-Ofast`, but since you had no issues, I am gonna use that instead.\n\nI am already using all the tweaks from your second paragraph, except `preempt=full`, which I had never heard of until now.\n\nEdit: Ah, `preempt=full` means low-latency kernel, got it. Already using it.",
          "replies": [
            {
              "body": "`-Ofast` is a bit risky, but I guess for graphical stuff and gaming the risk is small. The worst that could happen is a slight rendering issue from time to time. Unlikely tho."
            }
          ]
        }
      ]
    },
    {
      "body": "here use this https://github.com/AdelKS/LinuxGamingGuide"
    },
    {
      "body": "Have you even checked whether your issues are on the CPU side of things? Besides, I wouldn't bother with optimization flags. It's highly unlikely that they'll make a noticeable difference, besides potentially breaking stuff.",
      "replies": [
        {
          "body": "Actually there are no issues here. I have an i7-5775C, and in GTA V I get 120-140+ avg fps at 1080p max settings minus the AA. The GPU usage stays at around 70-90% on this RX 5700 XT. I was just bored and thought, \"how about compiling the Mesa driver?\". This is supposed to be just a fun experiment to see if such a custom-compiled driver would process more draw calls (even lower CPU overhead)."
        }
      ]
    },
    {
      "body": "I doubt it would change much. I build Mesa with `-march=znver3` though to target the CPU better.",
      "replies": [
        {
          "body": "Yeah, I didn't notice anything beyond some placebo like \"game IS kinda running smoother lol\". I will need more time before I can draw any conclusions.\n\n`-march=native` actually already picks the proper value for the CPU the build runs on: for your CPU `-march=native` is `-march=znver3`, and for mine it's `-march=broadwell`."
        }
      ]
    }
  ]
}