Benchmarking and Tuning

This guide explains how to tune rieMiner's parameters to get the best performance, and how to benchmark and compare results. We assume that you have already read one of the other rieMiner guides that explain how to mine or find a record.

Benchmarking

Comparing Riecoin mining performance is relatively difficult. Here is what you should know before comparing performance or tuning the settings. The benchmark results further below give an idea of how a given computer should perform and illustrate the following remarks.

Metrics

In rieMiner, performance is described by two metrics:

The candidates/s c: how many candidates (numbers that could be the first member of a prime constellation) are generated and tested every second. Higher is better. The higher the Difficulty is, the lower the candidates/s will be for the same computing power.
The ratio r: the ratio of the number of candidates tested to the number of primes found among them. Lower is better, because it means that you will find more blocks for the same c during mining. The ratio is proportional to the Difficulty and is independent of the computing power.

If you are looking for k-tuples, you can calculate the k-tuple find rate (tuples per second) as c/r^k. Multiplying this by 86400 gives the estimated average number of k-tuples per day. This is the relevant metric for comparing performance. The inverse of the rate gives the average time to find a k-tuple. This is how the time to find a block is estimated in rieMiner.
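The relation above can be sketched in a few lines of Python. This is only an illustration of the c/r^k formula, not rieMiner code: the function names are made up for this example, and the sample values (c = 19385.4, r = 18.546259, k = 7) are taken from the Ryzen 3700X row of the benchmark results further below.

```python
SECONDS_PER_DAY = 86400

def tuple_rate_per_second(candidates_per_second, ratio, k):
    """k-tuple find rate c / r^k, in tuples per second."""
    return candidates_per_second / ratio ** k

def tuples_per_day(candidates_per_second, ratio, k):
    """Estimated average number of k-tuples found per day."""
    return SECONDS_PER_DAY * tuple_rate_per_second(candidates_per_second, ratio, k)

def average_time_to_find(candidates_per_second, ratio, k):
    """Average time in seconds to find one k-tuple (inverse of the rate)."""
    return 1.0 / tuple_rate_per_second(candidates_per_second, ratio, k)

# Sample values from the 3700X benchmark row (Difficulty 1024, 7-tuples).
per_day = tuples_per_day(19385.4, 18.546259, 7)
hours = average_time_to_find(19385.4, 18.546259, 7) / 3600
print(f"{per_day:.3f} blocks/day, about {hours:.1f} hours per block")
```

Because r is raised to the power k, a small change in the ratio has a much larger effect on the find rate than the same relative change in c.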

That means, in general, do not just consider the candidates/s metric! If it is lower after changing a setting (in particular, the PrimeTableLimit), it does not always mean that the mining performance was reduced. You must look at the k-tuple rate or average time to find one instead. Similarly, a lower candidates/s with a higher Difficulty does not mean that the mining performance is lower.

There are some specific situations where it is enough to consider the candidates/s. This is the case if you can guarantee that the ratio and the Difficulty are always the same across the different benchmarks.

Convergence

The performance metrics take some time to converge, so do not draw conclusions about the performance too quickly! Test actual mining or use the Benchmark Mode for 10-20 minutes or more. Testing for a couple of minutes is generally not enough. Note that if you test during mining and more blocks are found, the candidates/s will be slightly lower, so take this into account when comparing the metrics.

Benchmark Mode

You should use this mode to compare the performance of different computers or settings. Measuring performance during mining is subject to random block occurrences, which, as said above, affect the performance. The Benchmark Mode performs "dummy mining" under reproducible conditions, which makes performance easier to compare.

Here is a configuration template for the Benchmark Mode. It runs a benchmark at Difficulty 1024 for 16 minutes (960 s), with a dummy block appearing every 150 s.

Mode = Benchmark
Difficulty = 1024
BenchmarkBlockInterval = 150
BenchmarkTimeLimit = 960
BenchmarkPrimeCountLimit = 0
# PrimorialNumber = 70

You must reproduce the current mining conditions (put the current Difficulty). You should also use the same PrimorialNumber as the one used in mining (the guessed value is slightly different between the modes).

The Search Mode is an alternative for benchmarking, but it is less reproducible and does not provide dummy blocks. Conversely, do not use the Benchmark Mode to find new records!

Tuning

Relevant configuration options

The options that affect mining performance are PrimeTableLimit, SieveWorkers, SieveBits and SieveIterations. Threads can also be set to use fewer threads if wanted. Here is a template (to append to the templates from the other guides or to the Benchmark template above).

Threads = 0
PrimeTableLimit = 0
SieveWorkers = 0
SieveBits = 0
SieveIterations = 0

You can learn what these settings actually mean by reading the mining algorithm explanation. 0 is a special value that makes rieMiner take an initial but rough guess. Start the miner once with the automatic settings and note the guessed values, shown at startup. Then use these values as starting points, tune the parameters as explained below, and progressively fill the configuration file with manual values.

PrimeTableLimit and SieveWorkers

They are the main parameters for rieMiner tuning. Generally,

Higher PrimeTableLimit is better up to a certain point, though increasing it will also increase the memory usage and may cause CPU Underuse. When increasing the PrimeTableLimit, the candidates/s metric will be lower, but so will the ratio. So, do not assume that the mining is slower due to a lower candidates/s: you must use the estimated time to find a block instead, as explained above.
Fewer SieveWorkers is better, as more will increase the memory usage and reduce the candidates/s a bit. However, there is a required minimum, as not having enough SieveWorkers will cause CPU Underuse.

To tune them, first look at your CPU usage during mining. It should be maxed out most of the time. If not, then you are experiencing CPU Underuse. For example, the CPU usage graph of the Windows 10 Task Manager may look like this:

[Figure: Task Manager CPU usage graph showing no CPU Underuse. Occasional drops, especially when a new block appears, are fine.]

If there is no CPU Underuse, try both of the following, in no particular order (you can use your intuition after a few tries):

If you have available free memory, increase the PrimeTableLimit until you get some CPU Underuse, run out of memory, or lose performance.
Try to decrement the SieveWorkers until there is CPU Underuse.

If there is CPU Underuse, do the inverse operations.

Repeat the process until you feel that the settings are optimal. In all cases, it is trial and error and there is no precise quantity to increase or decrease. Multiply or divide the PrimeTableLimit by something like 1.1, 1.5, 2, 3 or something else. But you should vary the SieveWorkers only by steps of 1 or 2.
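One step of this trial-and-error loop can be sketched as follows. This is a hypothetical helper, not part of rieMiner: the function name and the default factor of 1.5 are illustrative choices, the lower bound of 1 Sieve Worker is an assumption (0 means automatic), and the cap of 64 is the default Sieve Worker limit mentioned in this guide. It only proposes settings to benchmark next; you still have to run each benchmark and compare the estimated time to find a block.

```python
MAX_SIEVE_WORKERS = 64  # default limit mentioned in this guide

def propose_next_settings(prime_table_limit, sieve_workers, cpu_underuse, ptl_factor=1.5):
    """Return (PrimeTableLimit, SieveWorkers) pairs worth benchmarking next."""
    candidates = []
    if cpu_underuse:
        # Inverse operations: lower the PrimeTableLimit or add a Sieve Worker.
        candidates.append((int(prime_table_limit / ptl_factor), sieve_workers))
        if sieve_workers < MAX_SIEVE_WORKERS:
            candidates.append((prime_table_limit, sieve_workers + 1))
    else:
        # No underuse: try a larger PrimeTableLimit or one Sieve Worker less.
        candidates.append((int(prime_table_limit * ptl_factor), sieve_workers))
        if sieve_workers > 1:
            candidates.append((prime_table_limit, sieve_workers - 1))
    return candidates

# Example: 4 Sieve Workers, PrimeTableLimit 2^31, CPU fully used.
print(propose_next_settings(2**31, 4, cpu_underuse=False, ptl_factor=2.0))
```

Each proposal changes only one parameter at a time, which makes it easier to tell which change caused a gain or a loss.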

Other parameters

SieveBits: higher is better until a certain point, but normally, 25 is already a good value. If you have a CPU with less than 8 MiB of L3 Cache, or have a lot of SieveWorkers (more than 4), you can try to decrement this. If you have a lot of L3 Cache (for example with a server CPU), you may also try 26.
SieveIterations: normally, 16 is a good value and you should not have to touch this. It is unclear how this affects performance. You can try to change the value a bit and see if there is any improvement. Smaller values will reduce memory usage.

If you change these values, you should try to retune the PrimeTableLimit and SieveWorkers to see if you can still gain more performance.

Remarks for record attempters

The instructions above are also valid for those using the Search Mode or mining for records. Here are a few additional remarks for these cases:

The longer the constellation pattern is, the lower the PrimeTableLimit should be. While it could be well over a billion for 5-tuples and shorter, it should not exceed a few million or tens of millions for 10-tuples (at Difficulty ~540) and 9-tuples (~725), for example.
Longer tuples will also usually require a lot of SieveWorkers, so do not be surprised if you need to raise this number considerably. However, you cannot by default use more than 64 Sieve Workers. If you need more, you will have to manually add some PrimorialOffsets in the options, though in that case you should rather look for shorter tuples.

Benchmark Results

This section shows some rieMiner benchmark results in order to help compare different processors, give an idea of how to tune the parameters, and highlight some observations about current Riecoin mining.

Except when mentioned, an AMD Ryzen R7 3700X was used for the benchmarks, using all the 16 threads, and default settings were used; the constellation pattern is 0, 2, 4, 2, 4, 6, 2 (7-tuples). The benchmarks were done during a Debian 10 Live USB session. rieMiner was recompiled for the machine during the live session just before the benchmarks.
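The pattern 0, 2, 4, 2, 4, 6, 2 quoted above appears to list the gaps between consecutive members of the constellation, whereas the results table for different constellation patterns lists offsets from the first member (0, 2, 6, 8, 12, 18, 20 for 7-tuples). Assuming that reading is right, a short sketch shows the conversion between the two notations; the function names are chosen for this example only.

```python
from itertools import accumulate

def gaps_to_offsets(gaps):
    """Convert successive gaps into offsets from the first member."""
    return list(accumulate(gaps))

def offsets_to_gaps(offsets):
    """Convert offsets from the first member back into successive gaps."""
    return [offsets[0]] + [b - a for a, b in zip(offsets, offsets[1:])]

print(gaps_to_offsets([0, 2, 4, 2, 4, 6, 2]))
print(offsets_to_gaps([0, 2, 6, 8, 12, 18, 20]))
```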

Ratios and blocks per day

Before showing the actual results, it is worth recalling that the ratio is an essential metric of Riecoin mining; the candidates/s metric alone usually does not mean much. Due to how the mining algorithm is constructed, the ratio can actually be computed using the formula

$$r^{*} = \log(2^{D}) \prod_{p \le L,\ p \text{ prime}} \frac{p-1}{p} = D \log(2) \prod_{p \le L,\ p \text{ prime}} \frac{p-1}{p}$$

where D is the Difficulty (searched numbers will be around 2^D), L is the Prime Table Limit, and log(2) ≈ 0.69314718056.
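As a minimal sketch, the calculated ratio can be reproduced with a small sieve. The function names are illustrative and this is not how rieMiner computes anything internally. Sieving up to 2^31 in pure Python would be slow, so the exact product is only evaluated for a small limit here; for large limits the sketch falls back on Mertens' third theorem, which approximates the product by e^(-γ)/log(L), where γ is the Euler-Mascheroni constant. That approximation is an addition of this example, not something stated in the guide.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def exact_product(limit):
    """Product of (p - 1) / p over all primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= (p - 1) / p
    return product

def mertens_product(limit):
    """Approximation of the same product by Mertens' third theorem."""
    return math.exp(-EULER_GAMMA) / math.log(limit)

def calculated_ratio(difficulty, product):
    """r* = D * log(2) * product."""
    return difficulty * math.log(2) * product

print(calculated_ratio(2048, exact_product(2**16)))    # small limit, exact product
print(calculated_ratio(1024, mertens_product(2**31)))  # large limit, approximation
```

The first call can be checked against the 2¹⁶ row of the Prime Table Limit results and the second against the ratio used for the processor benchmarks.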

It is not obvious in normal circumstances that the ratios between k and (k + 1)-tuples counts or rates are the same for any k, though the tendency may be observed after long mining sessions or if generating very large numbers of tuples in a benchmark. Here are some values of the product for various PrimeTableLimits.

L	Product
2³⁵ = 34359738368	0.0231432770
2³⁴ = 17179869184	0.0238239564
2³³ = 8589934592	0.0245458897
2³² = 4294967296	0.0253129494
2³¹ = 2147483648	0.0261294878
2³⁰ = 1073741824	0.0270004472

Calculated ratios will be used in the benchmarks below. The blocks/day for k-tuples is then given by

$$\text{Blocks per day} = 86400 \, \frac{\text{Candidates per second}}{(r^{*})^{k}}$$

Results for different processors

Here are benchmarks with different CPUs.

rieMiner 0.93 except if mentioned otherwise
Difficulty 1024
150 s Block Interval, during 16 minutes
Prime Table Limit 2³¹. By default, 1 Sieve Worker and 25 Sieve Bits
Using the calculated ratio r^* ≈ 1024*log(2)*0.0261294878 ≈ 18.546259

The turbo/boost features were disabled and the CPU always ran at the mentioned frequency.

a is a normalized metric, and corresponds to the candidates/s without HT/SMT divided by the number of cores and the GHz, yielding a result that can be interpreted as the architecture performance (speed of a single core at 1 GHz for this benchmark). This number is useful to make Riecoin profitability calculators as various processors with the same architecture should have a similar a. The list is sorted by this metric.
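The normalization can be written out as a short sketch. The function names are illustrative; the sample numbers come from the Ryzen 3700X row of the table below (14897.8 c/s on 8 threads without SMT, 4 GHz, SMT speedup about 1.301x).

```python
def architecture_metric(cs_without_smt, cores, ghz):
    """a = candidates/s without HT/SMT, per core and per GHz."""
    return cs_without_smt / (cores * ghz)

def estimate_candidates_per_second(a, cores, ghz, smt_speedup=1.0):
    """Rough c/s estimate for another CPU of the same architecture."""
    return a * cores * ghz * smt_speedup

a_zen2 = architecture_metric(14897.8, cores=8, ghz=4.0)
print(round(a_zen2, 1))

# Round trip: applying the measured SMT speedup should land close to the
# 19385.4 c/s measured with all 16 threads.
print(estimate_candidates_per_second(a_zen2, cores=8, ghz=4.0, smt_speedup=1.301))
```

Such an estimate is only as good as the assumption that c/s scales linearly with the core count and the frequency, which is the premise behind this metric.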

Highlighted lines are benchmarks done with actual hardware. The others were extrapolated, as stated in their remarks. Do not compare these values with the ones that you currently obtain while mining! To compare your CPU, you must run the Benchmark Mode in the same conditions as these benchmarks (see above)!

Processor (memory)	Architecture	c/s	r*	b/d	a	Remarks or specific parameters
AMD Ryzen R9 5950X @ 4 GHz (DDR4 3200 CL14)	Zen 3	46137.3	18.546	5.282	554.0	Extrapolated from 3700X using 19% IPC improvement over Zen 2. 35456.6 c/s extrapolated without SMT (speedup 1.301x).
Intel Core i9-10900K @ 4 GHz (DDR4 3200 CL14)	Skylake	21162.5	18.546	2.422	472.4	Extrapolated using old rieMiner benchmarks for 6700K. HT speedup assumed to be 1.12x (18895.1 c/s).
AMD Ryzen R7 3700X @ 4 GHz (DDR4 3200 CL14)	Zen 2	19385.4	18.546	2.219	465.6	rieMiner 0.92, 4 Sieve Workers. 14897.8 c/s for 8 Threads (3 Sieve Workers), meaning that the SMT speedup is about 1.301x.
AMD Ryzen R7 2700X @ 4 GHz (DDR4 3200 CL14)	Zen+	16446.4	18.546	1.882	395.0	Extrapolated from 3700X using old rieMiner benchmarks. 12639.2 c/s extrapolated without SMT (speedup 1.301x).
AMD Ryzen R7 1800X @ 4 GHz (DDR4 3200 CL14)	Zen	15663.2	18.546	1.793	376.2	Extrapolated from 2700X assuming 5% IPC improvement over Zen. 12037.3 c/s extrapolated without SMT (speedup 1.301x).
Intel Core i7-5775C @ 4 GHz (DDR3 1600 CL8)	Broadwell	7614.8	18.546	0.872	427.5	rieMiner 0.92, 2 Sieve Workers. 6839.5 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.113x.
Intel Core i7-4790K @ 4 GHz (DDR3 1600 CL8)	Haswell	6406.5	18.546	0.733	369.1	rieMiner 0.92, 2 Sieve Workers. 5905.0 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.0849x.
Intel Core i7-3770K @ 4 GHz (DDR3 1600 CL8)	Ivy Bridge	5910.4	18.546	0.677	327.9	rieMiner 0.92, 2 Sieve Workers. 5245.7 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.127x.
Intel Core i7-2700K @ 4 GHz (DDR3 1600 CL8)	Sandy Bridge	5628.9	18.546	0.644	312.2	Extrapolated from 3770K assuming 5% IPC improvement over Sandy Bridge. 4995.9 c/s extrapolated without HT (speedup 1.127x).
AMD Phenom II X6 1100T @ 13 x 0.3 = 3.9 GHz (DDR3 1600 CL8)	K10	6933.53	18.546	0.794	296.2	2 Sieve Workers, 24 Sieve Bits.
Intel Core i7-875K @ 4 GHz (DDR3 1600 CL8)	Nehalem	4690.8	18.546	0.537	261.8	Extrapolated from 2700K assuming 20% IPC improvement over Nehalem. HT speedup assumed to be 1.12x (4188.2 c/s).
AMD Athlon 64 X2 6400+ @ 3.2 GHz (DDR2 800 CL5)	K8	1498.2	18.546	0.172	234.1	23 Sieve Bits.
Intel Core 2 Quad QX9650 @ 4 GHz (DDR3 1600 CL8)	Core 2	3707.1	18.546	0.424	231.7	rieMiner 0.92
AMD FX-8350 @ 13.5 x 0.3 = 4.05 GHz (DDR3 1600 CL8)	Piledriver	7308.9	18.546	0.837	225.7	2 Sieve Workers.
Broadcom BCM2712 @ 2.4 GHz	Cortex-A76	1653.2	18.546	0.189	172.2	Raspberry Pi 5, Raspberry Pi OS 64 bits, 24 Sieve Bits
Broadcom BCM2711 @ 1.6 GHz	Cortex-A72	918.1	18.546	0.105	143.5	rieMiner 0.92, Raspberry Pi 4, rieMinerL, Raspberry Pi OS 64 bits, 23 Sieve Bits, 24 Sieve Iterations
Intel Pentium D 965 @ 4 GHz (DDR3 1067 CL6)	Netburst	806.6	18.546	0.0492	65.4	24 Sieve Bits. 523.3 c/s for 2 Threads, meaning that the HyperThreading speedup is about 1.54x.
Intel Atom D525 @ 1.8 GHz (DDR3 800 CL6)	Bonnell	294.1	18.546	0.0336	40.1	24 Sieve Bits. 144.4 c/s for 2 Threads, meaning that the HyperThreading speedup is about 2x!

Results for different memory speeds

We notice that memory speed does not matter much (despite rieMiner using a lot of memory) as much worse frequency and latency (DDR4 2400 CL18 vs 3200 CL14) is only about 3% slower.

Difficulty 1024
Prime Table Limit 2³¹. 4 Sieve Workers, 150 s Block Interval, during 16 minutes
Using the calculated ratio r^* ≈ 18.546259

Memory Speed	c/s	r*	b/d
DDR4 3200 CL14	19385.4	18.546	2.219
DDR4 3200 CL18	19025.1	18.546	2.178
DDR4 2400 CL14	19011.2	18.546	2.176
DDR4 2400 CL18	18794.4	18.546	2.152

The prime table generation is more sensitive to memory performance (especially the frequency).

Memory Speed	Prime table generation time (s)
DDR4 3200 CL14	5.37404
DDR4 3200 CL18	5.63299
DDR4 2400 CL14	6.31868
DDR4 2400 CL18	6.55031

Results for Different Difficulties

The notable observation is that the ratio is proportional to the difficulty and follows the formula above. It also gives an idea about how the candidates/s metric depends on the difficulty, though the relation is difficult to establish. It can be approximated by the assumption that it is proportional to about D^−2.2 to D^−2.6 (D^−2.3 is used in the Riecoin protocol).

Difficulty	c/s	r	r*	b/d	Inverse c/s factor ( $\log_{\frac{D}{1024}}$ )
8192	100.2	156.58	148.370	0.00000000547	197.537 (2.542)
6144	205.0	111.10	111.278	0.0000000839	96.541 (2.551)
4096	561.4	74.04	74.185	0.00000392	35.260 (2.570)
3072	1256.7	56.47	55.639	0.0000658	15.751 (2.509)
2048	3703.0	37.10	37.093	0.00331	5.346 (2.418)
1536	7909.3	27.75	27.819	0.0530	2.503 (2.263)
1024	19795.2	18.54	18.546	2.266	1.000
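The last column of the table can be recomputed from the c/s values alone. The sketch below uses the Difficulty 1024 row as the reference, as the table does; the function names are illustrative.

```python
import math

REFERENCE_DIFFICULTY = 1024
REFERENCE_CS = 19795.2  # c/s measured at Difficulty 1024

# Measured candidates/s from the table above, keyed by Difficulty.
measured_cs = {
    8192: 100.2,
    6144: 205.0,
    4096: 561.4,
    3072: 1256.7,
    2048: 3703.0,
    1536: 7909.3,
}

def inverse_cs_factor(cs):
    """How many times slower the candidates/s is than at Difficulty 1024."""
    return REFERENCE_CS / cs

def scaling_exponent(difficulty, cs):
    """Exponent e such that c/s scales like (D / 1024)^(-e)."""
    return math.log(inverse_cs_factor(cs)) / math.log(difficulty / REFERENCE_DIFFICULTY)

for difficulty, cs in measured_cs.items():
    print(difficulty, round(inverse_cs_factor(cs), 3), round(scaling_exponent(difficulty, cs), 3))
```

The exponents stay within the D^−2.2 to D^−2.6 range quoted above, and they grow with the Difficulty in these measurements rather than staying constant.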

Results for Different Prime Table Limits

These benchmarks highlight the importance of the PrimeTableLimit parameter and show why you should not just look at the candidates/s metric. They were run at Difficulty 2048 as there is no CPU Underuse with only 1 Sieve Worker in every case. The higher the PrimeTableLimit is, the lower the ratio is, but also the candidates per second.

Difficulty 2048
1 Sieve Worker, no blocks, during 15 minutes
r is the ratio, r* the calculated ratio, the latter is used to calculate the blocks/day

PrimeTableLimit	c/s	r	r*	b/d
2³⁴ = 17179869184	3334.6	33.95	33.820	0.005693
2³³ = 8589934592	3536.9	34.89	34.844	0.004900
2³² = 4294967296	3641.8	35.96	35.933	0.004068
2³¹ = 2147483648	3703.7	37.10	37.093	0.003312
2³⁰ = 1073741824	3738.7	38.38	38.329	0.002658
2²⁴ = 16777216	3806.8	47.81	47.911	0.000567
2¹⁶ = 65536	3843.1	71.99	71.849	0.000033

Despite the candidates/s being lower at higher Prime Table Limits, the blocks/day are better.
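To make the comparison concrete, the blocks/day column can be recomputed from the c/s and r* columns of the table above and the settings ranked by each metric. This is a sketch with illustrative names; k = 7 is assumed because these benchmarks use 7-tuples.

```python
SECONDS_PER_DAY = 86400
TUPLE_LENGTH = 7  # the benchmarks use 7-tuples

# (log2 of the PrimeTableLimit, candidates/s, calculated ratio r*) from the table.
rows = [
    (34, 3334.6, 33.820),
    (33, 3536.9, 34.844),
    (32, 3641.8, 35.933),
    (31, 3703.7, 37.093),
    (30, 3738.7, 38.329),
    (24, 3806.8, 47.911),
    (16, 3843.1, 71.849),
]

def blocks_per_day(cs, ratio, k=TUPLE_LENGTH):
    """86400 * c / r^k."""
    return SECONDS_PER_DAY * cs / ratio ** k

results = {exp: blocks_per_day(cs, ratio) for exp, cs, ratio in rows}
best_by_cs = max(rows, key=lambda row: row[1])[0]
best_by_blocks = max(results, key=results.get)

print("highest c/s at 2^%d, highest blocks/day at 2^%d" % (best_by_cs, best_by_blocks))
for exp, value in results.items():
    print("2^%d: %.6f blocks/day" % (exp, value))
```

Ranking by candidates/s alone would pick the smallest table, which is the worst setting by a factor of more than 100 in blocks/day.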

Results for Different Constellation Patterns

Difficulty 2048
No blocks, during 15 minutes
Prime Table Limit 2³¹. By default, 1 Sieve Worker, 25 Sieve Bits
Using the calculated ratio r^* ≈ 2048*log(2)*0.0261294878 ≈ 37.092517

Length	Pattern	c/s	r*	b/d	Remarks
5	0, 2, 6, 8, 12	3778.6	37.093	4.649
6	0, 4, 6, 10, 12, 16	3767.9	37.093	0.125
7	0, 2, 6, 8, 12, 18, 20	3703.7	37.093	0.00331
8	0, 2, 6, 8, 12, 18, 20, 26	3534.3	37.093	0.0000852
9	0, 2, 6, 8, 12, 18, 20, 26, 30	3002.0	37.093	0.00000195	3 Sieve Workers