Benchmarking and Tuning

This guide explains how to tune rieMiner's parameters to get the best performance, and how to benchmark and compare settings. It assumes that you have already read one of the other rieMiner guides explaining how to mine or search for records.

Benchmarking

Comparing Riecoin mining performance is relatively difficult; here is what you should know before comparing performance or tuning the settings. The benchmarks further below give an idea of how a given computer should perform and illustrate the following remarks.

Metrics

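In rieMiner, the performance is based on two metrics: the candidates per second c, and the ratio r between the numbers of k-tuples and (k + 1)-tuples found (counting the candidates as 0-tuples).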

If you are looking for k-tuples, you can calculate the k-tuple find rate (tuples per second) as c/r^k. Multiplying this rate by 86400 gives the estimated average number of k-tuples found per day. This is the relevant metric for comparing performance. The inverse of the rate gives the average time to find a k-tuple, which is how rieMiner estimates the time to find a block.
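A minimal Python sketch of these calculations, assuming c and r are read from rieMiner's statistics; the helper names are hypothetical and not part of rieMiner. The example values come from the AMD Ryzen R7 3700X benchmark further below (Difficulty 1024, 7-tuples).

```python
def tuple_rate(c, r, k):
    """Estimated number of k-tuples found per second: c / r^k."""
    return c / r**k


def tuples_per_day(c, r, k):
    """Estimated average number of k-tuples found per day (86400 s)."""
    return 86400 * tuple_rate(c, r, k)


def mean_time_to_find(c, r, k):
    """Average time in seconds to find one k-tuple: the inverse of the rate."""
    return 1.0 / tuple_rate(c, r, k)


if __name__ == "__main__":
    # Candidates/s and ratio of the 3700X benchmark, 7-tuples.
    c, r, k = 19385.4, 18.546, 7
    print(f"{tuples_per_day(c, r, k):.3f} 7-tuples per day")
    print(f"{mean_time_to_find(c, r, k) / 3600:.1f} hours per 7-tuple on average")
```

Comparing two settings then means comparing tuples_per_day (or mean_time_to_find), not c alone.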

In general, then, do not consider the candidates/s metric alone! If it is lower after changing a setting (in particular the PrimeTableLimit), it does not necessarily mean that the mining performance was reduced: look at the k-tuple rate or at the average time to find one instead. Similarly, a lower candidates/s at a higher Difficulty does not mean that the mining performance is lower.

There are some specific situations where it is enough to consider the candidates/s. This is the case if you can guarantee that the ratio and the Difficulty are always the same across the different benchmarks.

Convergence

The performance metrics take some time to converge, so do not draw conclusions about the performance too quickly! Test actual mining or use the Benchmark Mode for 10 to 20 minutes or more; a couple of minutes is generally not enough. Note that when testing during actual mining, more frequent new blocks reduce the candidates/s a bit, so take this into account when comparing the metrics.

Benchmark Mode

Use this mode to compare the performance of different computers or settings. Measuring performance during mining is subject to the random occurrence of new blocks, which, as said above, affects the performance. The Benchmark Mode performs "dummy mining" under reproducible conditions, which makes performance comparisons easier.

Here is a configuration template for the Benchmark Mode. It runs a benchmark at Difficulty 1024 for 16 minutes (960 s), with a new dummy block every 150 s.

Mode = Benchmark
Difficulty = 1024
BenchmarkBlockInterval = 150
BenchmarkTimeLimit = 960
BenchmarkPrimeCountLimit = 0
# PrimorialNumber = 70

You must reproduce the current mining conditions (set Difficulty to the current network Difficulty). You should also use the same PrimorialNumber as the one used when mining, as the automatically guessed value differs slightly between the modes.

The Search Mode is an alternative for benchmarking, but it is less reproducible and does not simulate dummy blocks. Conversely, do not use the Benchmark Mode to search for new records!

Tuning

Relevant configuration options

The options that affect mining performance are PrimeTableLimit, SieveWorkers, SieveBits and SieveIterations. Threads can also be set to reduce the number of threads used, if desired. Here is a template to append to the templates from the other guides or to the Benchmark template above.

Threads = 0
PrimeTableLimit = 0
SieveWorkers = 0
SieveBits = 0
SieveIterations = 0

You can learn what these settings actually mean by reading the mining algorithm explanation. 0 is a special value that makes the miner choose an initial, rough guess. Start the miner once with these automatic settings and note the guessed values shown at startup. Then, use them as starting points, tune the parameters as explained below, and progressively fill the configuration file with manual values.

PrimeTableLimit and SieveWorkers

They are the main parameters for rieMiner tuning. Generally,

To tune them, first look at your CPU usage during mining. It should be maxed out most of the time; if not, you are experiencing CPU Underuse. For example, the CPU usage graph of the Windows 10 Task Manager may look like this:

(Figure: No CPU Underuse. Occasional drops, especially when a new block appears, are fine.)
(Figure: CPU Underuse.)

If there is no CPU Underuse, try both, not in a particular order (you can use your intuition after few tries):

If there is CPU Underuse, do the inverse operations.

Repeat the process until you feel that the settings are optimal. In all cases, this is trial and error, and there is no precise amount by which to increase or decrease the values: multiply or divide the PrimeTableLimit by a factor such as 1.1, 1.5, 2 or 3, but vary the SieveWorkers only in steps of 1 or 2.
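The trial-and-error loop above can be sketched in Python as follows. This is not a rieMiner feature: the hypothetical neighbor_settings helper only lists (PrimeTableLimit, SieveWorkers) pairs to benchmark next, using the step sizes suggested above, and best_setting picks the pair with the highest measured blocks/day. Which direction to try first still depends on whether you observe CPU Underuse.

```python
from itertools import product


def neighbor_settings(prime_table_limit, sieve_workers,
                      factors=(1.1, 1.5, 2.0, 3.0), worker_steps=(1, 2)):
    """List (PrimeTableLimit, SieveWorkers) pairs to try next (hypothetical helper).

    The PrimeTableLimit is multiplied or divided by each factor, while the
    SieveWorkers change by small integer steps. Only one parameter changes
    at a time (a design choice), so each benchmark isolates one setting.
    """
    limits = {prime_table_limit}
    for f in factors:
        limits.add(round(prime_table_limit * f))
        limits.add(round(prime_table_limit / f))
    workers = {sieve_workers}
    for step in worker_steps:
        workers.add(sieve_workers + step)
        # Assumption: keep at least 1 Sieve Worker, as 0 means "automatic guess".
        if sieve_workers - step >= 1:
            workers.add(sieve_workers - step)
    current = (prime_table_limit, sieve_workers)
    return sorted(
        (limit, w) for limit, w in product(limits, workers)
        if (limit == prime_table_limit or w == sieve_workers) and (limit, w) != current
    )


def best_setting(blocks_per_day_by_setting):
    """Return the setting with the highest measured blocks/day (not candidates/s)."""
    return max(blocks_per_day_by_setting, key=blocks_per_day_by_setting.get)
```

Each pair returned would be benchmarked in the Benchmark Mode, and the best one becomes the starting point of the next round.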

Other parameters

If you change other parameters such as SieveBits or SieveIterations, retune the PrimeTableLimit and SieveWorkers to see whether you can gain more performance.

Remarks for record attempters

The instructions above also apply when using the Search Mode or mining for records. Here are a few additional remarks for these cases:

Benchmark Results

This section shows some rieMiner benchmark results to help compare different processors, give an idea of how to tune the parameters, and highlight some observations about current Riecoin mining.

Unless mentioned otherwise, the benchmarks were done with an AMD Ryzen R7 3700X using all its 16 threads and the default settings; the constellation pattern is 0, 2, 4, 2, 4, 6, 2 (7-tuples). The benchmarks were done during a Debian 10 Live USB session, and rieMiner was recompiled for the machine during the live session just before the benchmarks.

Ratios and blocks per day

Before showing the actual results, it is worth recalling that the ratio is an essential metric of Riecoin mining; the candidates/s metric alone usually does not mean much. Due to how the mining algorithm is constructed, the expected ratio r* can actually be computed with the formula

$$r^{*} = \log(2^{D}) \prod_{p=2,\ p \text{ prime}}^{L} \frac{p-1}{p} = D \log(2) \prod_{p=2,\ p \text{ prime}}^{L} \frac{p-1}{p}$$
where D is the Difficulty (the searched numbers are around 2^D), L the PrimeTableLimit, and log the natural logarithm (log(2) ≈ 0.69314718056).
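A minimal Python sketch of this formula, with hypothetical function names. prime_product sieves the primes up to L exactly, which is only practical for small limits in pure Python. For limits around 2^31, the sketch can instead use Mertens' third theorem (the product is approximately e^(−γ)/ln L), an approximation that is not part of this guide or of rieMiner.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def prime_product(limit):
    """Exact product of (p - 1) / p over all primes p <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return 1.0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Mark the multiples of i, starting at i * i, as composite.
            is_prime[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    result = 1.0
    for p in range(2, limit + 1):
        if is_prime[p]:
            result *= (p - 1) / p
    return result


def prime_product_mertens(limit):
    """Approximation from Mertens' third theorem: e^(-gamma) / ln(limit)."""
    return math.exp(-EULER_GAMMA) / math.log(limit)


def expected_ratio(difficulty, prime_table_limit, exact=True):
    """r* = D * log(2) * product of (p - 1) / p over the primes p <= L."""
    if exact:
        product = prime_product(prime_table_limit)
    else:
        product = prime_product_mertens(prime_table_limit)
    return difficulty * math.log(2) * product
```

For example, expected_ratio(1024, 2**31, exact=False) gives about 18.546, the r* used in the processor benchmarks below.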

In normal circumstances, it is not obvious that the ratios between the counts (or rates) of k-tuples and (k + 1)-tuples are the same for every k, though this tendency can be observed after long mining sessions or when generating very large numbers of tuples in a benchmark. Here are some values of the product for various PrimeTableLimits.

L | Product
2^35 = 34359738368 | 0.0231432770
2^34 = 17179869184 | 0.0238239564
2^33 = 8589934592 | 0.0245458897
2^32 = 4294967296 | 0.0253129494
2^31 = 2147483648 | 0.0261294878
2^30 = 1073741824 | 0.0270004472

The calculated ratios r* are used in the benchmarks below. The blocks/day for k-tuples is then given by

$$\text{Blocks per day} = 86400 \cdot \frac{\text{Candidates per second}}{(r^{*})^{k}}$$
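As a worked example of this formula, using the AMD Ryzen R7 3700X benchmark from the processor table below (19385.4 candidates/s, r* = 18.546 at Difficulty 1024, 7-tuples):

```latex
\text{Blocks per day} = 86400 \cdot \frac{19385.4}{18.546^{7}} \approx \frac{1.6749 \times 10^{9}}{7.5466 \times 10^{8}} \approx 2.219
```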

Results for different processors

Here are benchmarks with different CPUs.

The turbo/boost features were disabled and the CPU always ran at the mentioned frequency.

a is a normalized metric: the candidates/s without HT/SMT divided by the number of cores and by the frequency in GHz. It can be interpreted as the performance of the architecture (the speed of a single core at 1 GHz for this benchmark). This number is useful for Riecoin profitability calculators, as different processors with the same architecture should have a similar a. The list is roughly sorted by this metric.
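A minimal Python sketch of the metric a and of the HT/SMT speedup quoted in the Remarks column. estimate_cps is a hypothetical helper that assumes the candidates/s scale linearly with the number of cores and the frequency for a given architecture, which is only a rough approximation.

```python
def smt_speedup(cps_all_threads, cps_one_thread_per_core):
    """HT/SMT speedup: c/s with all threads divided by c/s with one thread per core."""
    return cps_all_threads / cps_one_thread_per_core


def normalized_a(cps_without_smt, cores, ghz):
    """Architecture metric a: c/s without HT/SMT, per core and per GHz."""
    return cps_without_smt / (cores * ghz)


def estimate_cps(a, cores, ghz, speedup=1.0):
    """Hypothetical rough c/s estimate for another CPU with the same architecture.

    Assumes that c/s scales linearly with the number of cores and the frequency.
    """
    return a * cores * ghz * speedup


if __name__ == "__main__":
    # AMD Ryzen R7 3700X at 4 GHz: 19385.4 c/s with 16 threads, 14897.8 c/s with 8.
    print(round(normalized_a(14897.8, 8, 4.0), 1), round(smt_speedup(19385.4, 14897.8), 3))
```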

Rows marked "Extrapolated" in the Remarks column were not measured on actual hardware; the other rows are actual benchmarks. Do not compare these values with the ones that you currently obtain while mining! To compare your CPU, you must run the Benchmark Mode under the same conditions as these benchmarks (see above)!

Processor (memory) | Architecture | c/s | r* | b/d | a | Remarks or specific parameters
AMD Ryzen R9 5950X @ 4 GHz (DDR4 3200 CL14) | Zen 3 | 46137.3 | 18.546 | 5.282 | 554.0 | Extrapolated from 3700X using a 19% IPC improvement over Zen 2. 35456.6 c/s extrapolated without SMT (speedup 1.301x).
Intel Core i9-10900K @ 4 GHz (DDR4 3200 CL14) | Skylake | 21162.5 | 18.546 | 2.422 | 472.4 | Extrapolated using old rieMiner benchmarks for 6700K. HT speedup assumed to be 1.12x (18895.1 c/s).
AMD Ryzen R7 3700X @ 4 GHz (DDR4 3200 CL14) | Zen 2 | 19385.4 | 18.546 | 2.219 | 465.6 | rieMiner 0.92, 4 Sieve Workers. 14897.8 c/s for 8 Threads (3 Sieve Workers), meaning that the SMT speedup is about 1.301x.
AMD Ryzen R7 2700X @ 4 GHz (DDR4 3200 CL14) | Zen+ | 16446.4 | 18.546 | 1.882 | 395.0 | Extrapolated from 3700X using old rieMiner benchmarks. 12639.2 c/s extrapolated without SMT (speedup 1.301x).
AMD Ryzen R7 1800X @ 4 GHz (DDR4 3200 CL14) | Zen | 15663.2 | 18.546 | 1.793 | 376.2 | Extrapolated from 2700X assuming a 5% IPC improvement over Zen. 12037.3 c/s extrapolated without SMT (speedup 1.301x).
Intel Core i7-5775C @ 4 GHz (DDR3 1600 CL8) | Broadwell | 7614.8 | 18.546 | 0.872 | 427.5 | rieMiner 0.92, 2 Sieve Workers. 6839.5 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.113x.
Intel Core i7-4790K @ 4 GHz (DDR3 1600 CL8) | Haswell | 6406.5 | 18.546 | 0.733 | 369.1 | rieMiner 0.92, 2 Sieve Workers. 5905.0 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.0849x.
Intel Core i7-3770K @ 4 GHz (DDR3 1600 CL8) | Ivy Bridge | 5910.4 | 18.546 | 0.677 | 327.9 | rieMiner 0.92, 2 Sieve Workers. 5245.7 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.127x.
Intel Core i7-2700K @ 4 GHz (DDR3 1600 CL8) | Sandy Bridge | 5628.9 | 18.546 | 0.644 | 312.2 | Extrapolated from 3770K assuming a 5% IPC improvement over Sandy Bridge. 4995.9 c/s extrapolated without HT (speedup 1.127x).
AMD Phenom II X6 1100T @ 13 x 0.3 = 3.9 GHz (DDR3 1600 CL8) | K10 | 6933.53 | 18.546 | 0.794 | 296.2 | 2 Sieve Workers, 24 Sieve Bits.
Intel Core i7-875K @ 4 GHz (DDR3 1600 CL8) | Nehalem | 4690.8 | 18.546 | 0.537 | 261.8 | Extrapolated from 2700K assuming a 20% IPC improvement over Nehalem. HT speedup assumed to be 1.12x (4188.2 c/s).
AMD Athlon 64 X2 6400+ @ 3.2 GHz (DDR2 800 CL5) | K8 | 1498.2 | 18.546 | 0.172 | 234.1 | 23 Sieve Bits.
Intel Core 2 Quad QX9650 @ 4 GHz (DDR3 1600 CL8) | Core 2 | 3707.1 | 18.546 | 0.424 | 231.7 | rieMiner 0.92.
AMD FX-8350 @ 13.5 x 0.3 = 4.05 GHz (DDR3 1600 CL8) | Piledriver | 7308.9 | 18.546 | 0.837 | 225.7 | 2 Sieve Workers.
Broadcom BCM2712 @ 2.4 GHz | Cortex-A76 | 1653.2 | 18.546 | 0.189 | 172.2 | Raspberry Pi 5, Raspberry Pi OS 64-bit, 24 Sieve Bits.
Broadcom BCM2711 @ 1.6 GHz | Cortex-A72 | 918.1 | 18.546 | 0.105 | 143.5 | rieMiner 0.92, Raspberry Pi 4, rieMinerL, Raspberry Pi OS 64-bit, 23 Sieve Bits, 24 Sieve Iterations.
Intel Pentium D 965 @ 4 GHz (DDR3 1067 CL6) | Netburst | 806.6 | 18.546 | 0.0492 | 65.4 | 24 Sieve Bits. 523.3 c/s for 2 Threads, meaning that the HyperThreading speedup is about 1.54x.
Intel Atom D525 @ 1.8 GHz (DDR3 800 CL6) | Bonnell | 294.1 | 18.546 | 0.0336 | 40.1 | 24 Sieve Bits. 144.4 c/s for 2 Threads, meaning that the HyperThreading speedup is about 2x!

Results for different memory speeds

Memory speed does not matter much (even though rieMiner uses a lot of memory): with a much worse frequency and latency (DDR4 2400 CL18 vs 3200 CL14), mining is only about 3% slower.

Memory speed | c/s | r* | b/d
DDR4 3200 CL14 | 19385.4 | 18.546 | 2.219
DDR4 3200 CL18 | 19025.1 | 18.546 | 2.178
DDR4 2400 CL14 | 19011.2 | 18.546 | 2.176
DDR4 2400 CL18 | 18794.4 | 18.546 | 2.152

The prime table generation is more sensitive to memory performance (especially the frequency).

Memory speed | Prime table generation time (s)
DDR4 3200 CL14 | 5.37404
DDR4 3200 CL18 | 5.63299
DDR4 2400 CL14 | 6.31868
DDR4 2400 CL18 | 6.55031
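A minimal Python sketch that quantifies both tables relative to DDR4 3200 CL14, using only the values above; the dictionary and function names are hypothetical. A rate (c/s) and a duration (generation time) need opposite formulas for "percent slower".

```python
MINING_CPS = {  # candidates/s, from the memory speed table
    "DDR4 3200 CL14": 19385.4,
    "DDR4 3200 CL18": 19025.1,
    "DDR4 2400 CL14": 19011.2,
    "DDR4 2400 CL18": 18794.4,
}
PRIME_TABLE_TIME = {  # prime table generation time in seconds
    "DDR4 3200 CL14": 5.37404,
    "DDR4 3200 CL18": 5.63299,
    "DDR4 2400 CL14": 6.31868,
    "DDR4 2400 CL18": 6.55031,
}
REFERENCE = "DDR4 3200 CL14"


def throughput_slowdown(values, ref=REFERENCE):
    """Percent slowdown of a rate (higher is better) relative to the reference."""
    return {mem: 100 * (1 - v / values[ref]) for mem, v in values.items()}


def duration_slowdown(values, ref=REFERENCE):
    """Percent slowdown of a duration (lower is better) relative to the reference."""
    return {mem: 100 * (v / values[ref] - 1) for mem, v in values.items()}


if __name__ == "__main__":
    mining = throughput_slowdown(MINING_CPS)
    table = duration_slowdown(PRIME_TABLE_TIME)
    for mem in MINING_CPS:
        print(f"{mem}: mining {mining[mem]:.1f}% slower, prime table {table[mem]:.1f}% slower")
```

With these values, DDR4 2400 CL18 makes mining about 3% slower but the prime table generation about 22% slower.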

Results for Different Difficulties

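The main observation is that the ratio is proportional to the Difficulty and follows the formula above (r is the measured ratio, r* the calculated one). The results also give an idea of how the candidates/s metric depends on the Difficulty, though the exact relation is difficult to establish. It can be approximated by assuming that the candidates/s is proportional to about D^−2.2 to D^−2.6 (D^−2.3 is used in the Riecoin protocol). The last column gives the factor by which the candidates/s decreases compared to Difficulty 1024 and, in parentheses, the corresponding exponent (the logarithm of this factor in base D/1024).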

Difficulty | c/s | r | r* | b/d | Inverse c/s factor (log base D/1024)
8192 | 100.2 | 156.58 | 148.370 | 0.00000000547 | 197.537 (2.542)
6144 | 205.0 | 111.10 | 111.278 | 0.0000000839 | 96.541 (2.551)
4096 | 561.4 | 74.04 | 74.185 | 0.00000392 | 35.260 (2.570)
3072 | 1256.7 | 56.47 | 55.639 | 0.0000658 | 15.751 (2.509)
2048 | 3703.0 | 37.10 | 37.093 | 0.00331 | 5.346 (2.418)
1536 | 7909.3 | 27.75 | 27.819 | 0.0530 | 2.503 (2.263)
1024 | 19795.2 | 18.54 | 18.546 | 2.266 | 1.000
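The exponent in parentheses in the last column can be reproduced with the following minimal Python sketch. predict_cps is a hypothetical helper that assumes the power law holds exactly, using the D^−2.3 exponent of the Riecoin protocol by default, so its predictions are only rough.

```python
import math


def cps_exponent(d_ref, cps_ref, d, cps):
    """Exponent a such that c/s scales like D^(-a): log base (d / d_ref) of cps_ref / cps."""
    return math.log(cps_ref / cps) / math.log(d / d_ref)


def predict_cps(d_ref, cps_ref, d, exponent=2.3):
    """Rough c/s prediction at Difficulty d, assuming c/s is proportional to D^(-exponent)."""
    return cps_ref * (d_ref / d) ** exponent


if __name__ == "__main__":
    # Difficulty 1024 vs 8192, from the table above.
    print(round(cps_exponent(1024, 19795.2, 8192, 100.2), 3))
```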

Results for Different Prime Table Limits

These benchmarks highlight the importance of the PrimeTableLimit parameter, and show again that the candidates/s metric must not be considered alone. They were run at Difficulty 2048, where there is no CPU Underuse with only 1 Sieve Worker in every case. The higher the PrimeTableLimit, the lower the ratio, but also the lower the candidates/s.

PrimeTableLimit | c/s | r | r* | b/d
2^34 = 17179869184 | 3334.6 | 33.95 | 33.820 | 0.005693
2^33 = 8589934592 | 3536.9 | 34.89 | 34.844 | 0.004900
2^32 = 4294967296 | 3641.8 | 35.96 | 35.933 | 0.004068
2^31 = 2147483648 | 3703.7 | 37.10 | 37.093 | 0.003312
2^30 = 1073741824 | 3738.7 | 38.38 | 38.329 | 0.002658
2^24 = 16777216 | 3806.8 | 47.81 | 47.911 | 0.000567
2^16 = 65536 | 3843.1 | 71.99 | 71.849 | 0.000033

Despite the candidates/s being lower at higher PrimeTableLimits, the blocks/day are better.
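This minimal Python sketch recomputes the blocks/day from the PrimeTableLimit table and shows that the limit giving the highest candidates/s (2^16) is not the one giving the most blocks/day (2^34). The helper names are hypothetical; as in the benchmarks, the blocks/day use the calculated r* and 7-tuples.

```python
# (PrimeTableLimit, candidates/s, r*) at Difficulty 2048, from the table above.
RESULTS = [
    (2**34, 3334.6, 33.820),
    (2**33, 3536.9, 34.844),
    (2**32, 3641.8, 35.933),
    (2**31, 3703.7, 37.093),
    (2**30, 3738.7, 38.329),
    (2**24, 3806.8, 47.911),
    (2**16, 3843.1, 71.849),
]


def blocks_per_day(cps, ratio, k=7):
    """Estimated blocks/day for k-tuples: 86400 * c / r^k."""
    return 86400 * cps / ratio**k


def best_limit(results, k=7):
    """Return the PrimeTableLimit with the highest blocks/day, not the highest c/s."""
    return max(results, key=lambda row: blocks_per_day(row[1], row[2], k))[0]


if __name__ == "__main__":
    for limit, cps, ratio in RESULTS:
        print(f"{limit:>12} {cps:8.1f} c/s {blocks_per_day(cps, ratio):.6f} b/d")
    print("Best PrimeTableLimit:", best_limit(RESULTS))
```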

Results for Different Constellation Patterns

Length | Pattern | c/s | r* | b/d | Remarks
5 | 0, 2, 6, 8, 12 | 3778.6 | 37.093 | 4.649 | -
6 | 0, 4, 6, 10, 12, 16 | 3767.9 | 37.093 | 0.125 | -
7 | 0, 2, 6, 8, 12, 18, 20 | 3703.7 | 37.093 | 0.00331 | -
8 | 0, 2, 6, 8, 12, 18, 20, 26 | 3534.3 | 37.093 | 0.0000852 | -
9 | 0, 2, 6, 8, 12, 18, 20, 26, 30 | 3002.0 | 37.093 | 0.00000195 | 3 Sieve Workers
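The 7-tuple pattern given at the beginning of the Benchmark Results section (0, 2, 4, 2, 4, 6, 2) appears to be written as gaps between consecutive numbers, while this table lists offsets from the first number (0, 2, 6, 8, 12, 18, 20); the cumulative sums of the former give the latter. A minimal Python sketch of both conversions, with hypothetical function names:

```python
from itertools import accumulate


def differences_to_offsets(diffs):
    """Convert a pattern written as gaps (0, 2, 4, 2, ...) into offsets from the first number."""
    return list(accumulate(diffs))


def offsets_to_differences(offsets):
    """Inverse conversion: offsets from the first number back to consecutive gaps."""
    return [offsets[0]] + [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(differences_to_offsets([0, 2, 4, 2, 4, 6, 2]))
```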