Sunday, January 12, 2020

Rust channel vs Go channel

There is an interesting article demonstrating how to use channels in Rust:

Multithreading in Rust with MPSC (Multi-Producer, Single Consumer) channels

I rewrote this code in Go to compare the coding style and the runtime performance.

A Go channel folds the notions of sender and receiver into a single value: the same channel is used both to send a message (data) and to receive it.
The termination condition also relies on a separate library type, sync.WaitGroup (wg in the code).
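
Go can still express the sender/receiver split at the type level: a function parameter can be declared send-only (chan<- T) or receive-only (<-chan T). A minimal sketch, using hypothetical produce and consume helpers that are not part of the program in this post:

```go
package main

import "fmt"

// produce may only send on out; a receive from out would not compile.
func produce(out chan<- int, n int) {
	for i := 0; i < n; i++ {
		out <- i * i
	}
	close(out) // closing is allowed on a send-only channel
}

// consume may only receive from in; a send on in would not compile.
func consume(in <-chan int) int {
	sum := 0
	for v := range in {
		sum += v
	}
	return sum
}

func main() {
	ch := make(chan int) // one bidirectional channel value
	go produce(ch, 4)    // converted implicitly to chan<- int
	fmt.Println(consume(ch)) // 0 + 1 + 4 + 9 = 14
}
```

The underlying channel is still one shared value, so this is a restriction on how each function may use it rather than two separate endpoints as in Rust's mpsc.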

I definitely like the clean, data-race-free Rust approach.

As for performance, Rust is faster than Go, as expected, but not by much.
This is a rather simple case, though. For other kinds of workloads that allocate heavily from the heap in Go, the difference might be more significant.

For DIFFICULTY '0000000': Go 12 sec, Rust 9.7 sec

GO sample:
BASE: 42, THREADS 8, DIFFICULTY: 0000000
&{7 50443823 411e3c717da473d023d6c5aa11d330ffed3fd4c641bd75eafcc779b5e0000000}

[Done] exited with code=0 in 11.982 seconds
RUST sample:
    Finished release [optimized] target(s) in 0.03s
     Running `target\release\mpsc-crypto-mining.exe`
Attempting to find a number, which - while multiplied by 42 and hashed using SHA-256 - will result in a hash ending with 0000000.
Please wait...
Found the solution.
The number is: 50443823.
Result hash: 411e3c717da473d023d6c5aa11d330ffed3fd4c641bd75eafcc779b5e0000000.

real    0m9.661s

-----

For DIFFICULTY '00000004': Go 1551 sec, Rust 1444 sec

GO sample:
BASE: 42, THREADS 8, DIFFICULTY: 00000004
&{0 6829102344 aa60ef885f7d41903661d03a55aca85ae195fdb63bb1b4cbc03e804d00000004}

[Done] exited with code=0 in 1551.796 seconds
RUST sample:
$ time cargo run --release
   Compiling mpsc-crypto-mining v0.1.0 (C:\Users\nnaka\rust_projects\mpsc-crypto-mining)
    Finished release [optimized] target(s) in 1.34s
     Running `target\release\mpsc-crypto-mining.exe`
Attempting to find a number, which - while multiplied by 42 and hashed using SHA-256 - will result in a hash ending with 00000004.
Please wait...
Found the solution.
The number is: 6829102344.
Result hash: aa60ef885f7d41903661d03a55aca85ae195fdb63bb1b4cbc03e804d00000004.

real    24m4.707s

----

Go code:

package main

import (
 "crypto/sha256"
 "fmt"
 "strconv"
 "strings"
 "sync"
)

var BASE = 42
var THREADS = 8
var DIFFICULTY = "0000000" // 7 digit

func compute_sha256(number int) string {
 s := strconv.Itoa(number * BASE)
 h := sha256.New()
 h.Write([]byte(s))
 return fmt.Sprintf("%x", h.Sum(nil))
}

type Solution struct {
 start_at int
 x        int
 hash     string
}

func create_solution() chan *Solution {
 solution := make(chan *Solution)
 return solution
}

func verify_number(start_at int, number int) *Solution {
 hash := compute_sha256(number)
 if strings.HasSuffix(hash, DIFFICULTY) {
  return &Solution{start_at: start_at, x: number, hash: hash}
 } else {
  return nil
 }
}

func search_for_solution(start_at int, sender chan *Solution) {
 for number := start_at; ; number += THREADS {
  var solution = verify_number(start_at, number)
  if solution != nil {
   sender <- solution
   return
  }
 }
}

func main() {
 sender := create_solution()
 go func() {
  defer close(sender)
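  var wg sync.WaitGroup
  // With a counter of 1, Wait() returns as soon as the first worker
  // finishes, so the channel is closed after the first solution.
  wg.Add(1)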

  for i := 0; i < THREADS; i++ {
   go func(start_at int) {
    defer wg.Done()
    search_for_solution(start_at, sender)
   }(i)
  }
  wg.Wait()
 }()

 for i := range sender { // infos_reveiver.recv()
  fmt.Printf("BASE: %d, THREADS %d, DIFFICULTY: %s\n", BASE, THREADS, DIFFICULTY)
  fmt.Println(i)
 }
}

Test Machine:

I used a laptop I purchased recently: Lenovo Flex 14 2-in-1 Convertible Laptop, 14 Inch FHD Touchscreen Display, AMD Ryzen 5 3500U Processor, 12GB DDR4 RAM, 256GB NVMe SSD, Windows 10. (I did not know AMD would release the Ryzen 4000H, but this may not be such a bad buy.)

This test was also useful for gauging the performance of this CPU. With 8 threads, the program utilizes all CPU threads; with 4 threads, it mainly uses 4. The 4-thread run takes 15 sec versus 12 sec for 8 threads. Although the CPU has only 4 physical cores, the 8-thread version is faster than the 4-thread version, but nowhere near twice as fast.


BASE: 42, THREADS 4, DIFFICULTY: 0000000
&{3 50443823 411e3c717da473d023d6c5aa11d330ffed3fd4c641bd75eafcc779b5e0000000}

[Done] exited with code=0 in 15.137 seconds
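
The ratios behind these timings can be worked out directly. A small sketch using the wall-clock numbers reported in this post (speedup is a hypothetical helper name):

```go
package main

import "fmt"

// speedup returns how many times faster the second run is than the first.
func speedup(slower, faster float64) float64 {
	return slower / faster
}

func main() {
	// Go, 4 threads (15.137 s) vs Go, 8 threads (11.982 s)
	fmt.Printf("8 vs 4 threads: %.2fx\n", speedup(15.137, 11.982)) // 1.26x
	// Go (11.982 s) vs Rust (9.661 s), 7-digit difficulty
	fmt.Printf("Rust vs Go, 7 digits: %.2fx\n", speedup(11.982, 9.661)) // 1.24x
	// Go (1551.796 s) vs Rust (24m4.707s = 1444.707 s), 8-digit difficulty
	fmt.Printf("Rust vs Go, 8 digits: %.2fx\n", speedup(1551.796, 1444.707)) // 1.07x
}
```

These are single runs on one machine, so the ratios are rough indications rather than careful benchmarks.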

And when running on all threads, the frequency is boosted to 2.8 GHz. Not so fast.

So with a Threadripper 3990X (64 cores versus 4), this might run about 16 × (4/3) ≈ 21 times faster.
I wonder which would give the faster result: a GPU-accelerated version or a multicore version.

