Thursday, July 21, 2022

What does the "lock" instruction mean in x86 assembly?

 


Code

#include <atomic>
#include <cassert>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Four counters: two incremented atomically, two not.
std::atomic_ulong my_atomic_ulong(0);
unsigned long my_non_atomic_ulong = 0;
unsigned long my_arch_atomic_ulong = 0;
unsigned long my_arch_non_atomic_ulong = 0;
size_t niters;

void threadMain() {
    for (size_t i = 0; i < niters; ++i) {
        my_atomic_ulong++;      // atomic read-modify-write
        my_non_atomic_ulong++;  // data race: increments can be lost
        // Plain incq: the load, add, and store are not atomic across cores.
        __asm__ __volatile__ (
            "incq %0;"
            : "+m" (my_arch_non_atomic_ulong)
            :
            :
        );
        // lock prefix: the whole read-modify-write is performed atomically.
        __asm__ __volatile__ (
            "lock;"
            "incq %0;"
            : "+m" (my_arch_atomic_ulong)
            :
            :
        );
    }
}

int main(int argc, char **argv) {
    // Usage: ./main.out [nthreads] [niters]
    size_t nthreads;
    if (argc > 1) {
        nthreads = std::stoull(argv[1], nullptr, 0);
    } else {
        nthreads = 2;
    }
    if (argc > 2) {
        niters = std::stoull(argv[2], nullptr, 0);
    } else {
        niters = 10000;
    }
    std::vector<std::thread> threads(nthreads);
    for (size_t i = 0; i < nthreads; ++i)
        threads[i] = std::thread(threadMain);
    for (size_t i = 0; i < nthreads; ++i)
        threads[i].join();
    // Only the atomic counters are guaranteed to reach nthreads * niters.
    assert(my_atomic_ulong.load() == nthreads * niters);
    assert(my_atomic_ulong == my_atomic_ulong.load());
    std::cout << "my_atomic_ulong " << my_atomic_ulong << std::endl;
    std::cout << "my_non_atomic_ulong " << my_non_atomic_ulong << std::endl;
    assert(my_arch_atomic_ulong == nthreads * niters);
    std::cout << "my_arch_atomic_ulong " << my_arch_atomic_ulong << std::endl;
    std::cout << "my_arch_non_atomic_ulong " << my_arch_non_atomic_ulong << std::endl;
}


Results

g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o main.out main.cpp -pthread
./main.out 2 10000
ricky@ricky-gn41:~/play/playatomicadd$ ./main.out 2 10000
my_atomic_ulong 20000
my_non_atomic_ulong 19494
my_arch_atomic_ulong 20000
my_arch_non_atomic_ulong 19405
ricky@ricky-gn41:~/play/playatomicadd$ ./main.out 2 10000
my_atomic_ulong 20000
my_non_atomic_ulong 17804
my_arch_atomic_ulong 20000
my_arch_non_atomic_ulong 17977
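
The runs above show that only the atomic and lock-prefixed counters reach 20000. As a minimal sketch of the same idea in portable C++, the helper below (countAtomically is a hypothetical name, not part of the original program) uses std::atomic::fetch_add; on x86-64, compilers typically implement this with a lock-prefixed instruction, though the exact instruction chosen is up to the compiler.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs nthreads threads, each adding 1 to a shared
// atomic counter niters times, and returns the final value.
unsigned long countAtomically(std::size_t nthreads, std::size_t niters) {
    std::atomic<unsigned long> counter{0};
    std::vector<std::thread> threads;
    threads.reserve(nthreads);
    for (std::size_t t = 0; t < nthreads; ++t) {
        threads.emplace_back([&counter, niters] {
            for (std::size_t i = 0; i < niters; ++i) {
                // Atomic read-modify-write: no increment can be lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

Calling countAtomically(2, 10000) should always return 20000, matching the my_atomic_ulong lines in the output above.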




End





Wednesday, July 20, 2022

MESI Protocol Cache Invalidate 2022


Storage is divided into several levels

Register, L1 Cache, L2 Cache, L3 Cache, Memory, SSD or HDD

L1P for program (sometimes L1I, for L1 instruction)
L1D for data
L2 cache is usually private to each CPU core (on some chips, such as the J4105 below, a single L2 is shared by all cores)
L3 cache is shared among cores (public)

Bus

Memory 16GB

J4105

https://www.cpu-world.com/CPUs/Celeron/Intel-Celeron%20J4105.html

L1$ 224 KiB (128 KiB L1I + 96 KiB L1D)
L1I$ 128 KiB 4x32 KiB 8-way set associative
L1D$ 96 KiB 4x24 KiB 6-way set associative write-back
L2$ 4 MiB 1x4 MiB 16-way set associative write-back

Problem

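If both cores execute x = x + 1, each one updates the copy of x in its own private L1D cache, and that update is not visible to the other core. Starting from x == 0, running the two increments in parallel can end with x == 1 instead of x == 2.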
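
To make the lost update concrete, here is a small deterministic model (not real multi-core code) of the interleaving described above. The Core struct and both function names are hypothetical; each core's private copy is represented by a plain variable.

```cpp
// Models a core's private copy of x (its register or L1D line).
struct Core {
    long copy = 0;
};

// Both cores load x before either one stores: one increment is lost.
long lostUpdate() {
    long x = 0;
    Core a, b;
    a.copy = x;   // core A loads 0
    b.copy = x;   // core B loads 0, before A has stored
    a.copy += 1;  // A computes 1
    b.copy += 1;  // B computes 1
    x = a.copy;   // A stores 1
    x = b.copy;   // B stores 1, overwriting A's result
    return x;
}

// Each core finishes its load-add-store before the other starts.
long serializedUpdate() {
    long x = 0;
    Core a, b;
    a.copy = x; a.copy += 1; x = a.copy;  // A: 0 -> 1
    b.copy = x; b.copy += 1; x = b.copy;  // B: 1 -> 2
    return x;
}
```

In this model lostUpdate() returns 1 and serializedUpdate() returns 2, which is the difference the two solutions below are meant to address.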

Solution 1

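Lock the bus. Whichever core accesses data in memory places a lock on the bus, and until it finishes, no other core can load or modify data in memory, so the CPUs block each other. This solves the problem, but performance is too low.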

Solution 2

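The MESI cache coherence protocol. It locks an individual cache line rather than the whole bus, so reads and writes of other data in memory are not affected.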

MESI

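The letters stand for M (modified), E (exclusive), S (shared), and I (invalid).

M (modified): the cache line is valid; the data has been modified and differs from memory, and it exists only in this cache.
E (exclusive): the cache line is valid; the data matches memory, and it exists only in this cache.
S (shared): the cache line is valid; the data exists in multiple caches.
I (invalid): the cache line is invalid.

There are four states, so each cache line needs 2 bits to represent its current state.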
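
As a rough sketch of how the four states change, the following is a simplified textbook-style state machine for a single cache line as seen by one cache. It ignores bus messages and write-backs, and the names Mesi, Event, nextState, and othersHaveCopy are all assumptions made for illustration.

```cpp
#include <cstdint>

// Four states, so the values 0..3 fit in 2 bits.
enum class Mesi : std::uint8_t { Modified = 0, Exclusive = 1, Shared = 2, Invalid = 3 };

// What happens to the line: this core reads or writes it,
// or another core is observed reading or writing it.
enum class Event { LocalRead, LocalWrite, RemoteRead, RemoteWrite };

// Returns the next state of the line in this cache.
// othersHaveCopy only matters when a read misses (state Invalid).
Mesi nextState(Mesi s, Event e, bool othersHaveCopy = false) {
    switch (e) {
    case Event::LocalRead:
        if (s == Mesi::Invalid) {
            return othersHaveCopy ? Mesi::Shared : Mesi::Exclusive;
        }
        return s;               // a hit keeps the current state
    case Event::LocalWrite:
        return Mesi::Modified;  // other copies must be invalidated
    case Event::RemoteRead:
        if (s == Mesi::Invalid) return Mesi::Invalid;
        return Mesi::Shared;    // M and E are downgraded to S
    case Event::RemoteWrite:
        return Mesi::Invalid;   // our copy is now stale
    }
    return s;
}
```

For example, a line loaded when no other cache holds it goes from Invalid to Exclusive, a local write then makes it Modified, and a write by another core makes it Invalid again.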

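CPU cache line


The most common cache line size is 64 bytes. When multiple threads modify variables that are independent of each other but happen to share one cache line, they unintentionally hurt each other's performance. This is false sharing.

A cache is made up of cache lines, usually 64 bytes each, which can hold 8 variables of type long (at 8 bytes each).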
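
A common way to avoid false sharing is to pad or align each hot variable to its own cache line. The sketch below assumes a 64-byte line (kCacheLine is an assumption, as are the struct and function names) and compares a layout where two counters share a line with one where they do not.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// Both counters sit in the same 64-byte line: writers to a and b
// would contend for that line even though the data is independent.
struct alignas(kCacheLine) PackedCounters {
    std::atomic<std::int64_t> a{0};
    std::atomic<std::int64_t> b{0};
};

// Each counter starts on its own 64-byte boundary.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::int64_t> a{0};
    alignas(kCacheLine) std::atomic<std::int64_t> b{0};
};

// Index of the cache line that an address falls into.
std::size_t lineOf(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}
```

With these definitions, lineOf(&x.a) and lineOf(&x.b) are equal for a PackedCounters object and different for a PaddedCounters object, at the cost of PaddedCounters taking 128 bytes instead of 64.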

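Why Array? To get the benefit of free cache loading: reading one element brings its neighbours in the same cache line along with it.

Remember that we must work in units of whole cache lines (translator's note: this is dictated by the CPU implementation).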
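
The effect of working in whole cache lines can be estimated with a little arithmetic. This sketch counts how many distinct 64-byte lines an access pattern touches, assuming 8-byte elements and a line-aligned array; linesTouched and its parameters are hypothetical names, and it models offsets only rather than measuring real hardware.

```cpp
#include <cstddef>
#include <set>

constexpr std::size_t kLineBytes = 64;  // assumed cache line size

// Counts the distinct cache lines touched when reading `count` elements
// that are `strideElems` elements apart, starting at a line-aligned base.
std::size_t linesTouched(std::size_t count, std::size_t strideElems,
                         std::size_t elemBytes = 8) {
    std::set<std::size_t> lines;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t byteOffset = i * strideElems * elemBytes;
        lines.insert(byteOffset / kLineBytes);
    }
    return lines.size();
}
```

Under these assumptions, reading 8 consecutive elements touches a single line, while reading 8 elements that are 8 apart touches 8 lines, which is why contiguous arrays benefit from the neighbours being loaded for free.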







Read


1. CPU cache lines, by 胖虎大哥 https://www.jianshu.com/
2. A thorough article on false sharing, cache line padding, and CPU caches https://blog.csdn.net/qq_27680317/article/details/78486220
3. Some thoughts prompted by the MESI cache coherence protocol https://segmentfault.com/a/1190000040984124
4. Visualize MESI protocol https://www.scss.tcd.ie/Jeremy.Jones/vivio/caches/MESIHelp.htm
5. https://www.ics.uci.edu/~aburtsev/cs5460/lectures/lecture13-memory-ordering/lecture13-memory-barriers.pdf

TBC since 2022-07-20

1. Linux kernel memory barriers (hard; not yet read in full) https://www.kernel.org/doc/Documentation/memory-barriers.txt













More

End

Saturday, July 2, 2022

C++11 Review on 2022 + C++14

 C++ Weekly EP 176 C++11 in 12 Minutes by Jason Turner 85.6K

1. auto

2. range based for loop

3. lambda [](int i){return i<3;}

4. variadic templates

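// Base case: a single argument is returned unchanged.
template<class Pack>
Pack call(const Pack& v) {
    return v;
}

// Recursive case: add the first argument to the result for the rest.
template<class Pack, class... Packs>
Pack call(const Pack& pack, const Packs&... packs) {
    std::cout << __PRETTY_FUNCTION__ << "\n";
    return pack + call(packs...);
}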
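
As a usage sketch for the recursion pattern above, here is a self-contained variant without the printing (sumAll is a hypothetical name). Note that the return type is the type of the first argument, so mixed argument types are converted to it.

```cpp
#include <string>

// Base case: one argument left.
template <class First>
First sumAll(const First& v) {
    return v;
}

// Recursive case: peel off the first argument and recurse on the rest.
template <class First, class... Rest>
First sumAll(const First& first, const Rest&... rest) {
    return first + sumAll(rest...);
}
```

For example, sumAll(1, 2, 3) yields 6, and sumAll(std::string("a"), std::string("b")) yields "ab".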

5. unique_ptr

6. constexpr int get_value() {return 5*3; }

That's all 6 of them.
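
A compact sketch that touches the other items in the list (auto, range-based for, lambda, unique_ptr, constexpr) could look like the following. Everything except get_value is a hypothetical name, and the code sticks to C++11 features.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// constexpr: usable in constant expressions (single return in C++11).
constexpr int get_value() { return 5 * 3; }

// auto + range-based for loop.
int countBelow(const std::vector<int>& values, int limit) {
    int n = 0;
    for (const auto& v : values) {
        if (v < limit) ++n;
    }
    return n;
}

// Lambda passed as the predicate of a standard algorithm.
long countSmall(const std::vector<int>& values) {
    return std::count_if(values.begin(), values.end(),
                         [](int i) { return i < 3; });
}

// unique_ptr: sole ownership, freed automatically (no make_unique in C++11).
std::unique_ptr<int> makeOwned() {
    return std::unique_ptr<int>(new int(get_value()));
}
```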

But not mentioned

std::move
T&&
thread, mutex, condition_variable, packaged_task, std::future
std::tuple, chrono, decltype, override, nullptr, enum class, unordered_map, std::ref

C++ Weekly EP 178 C++14 in 9 Minutes by Jason Turner 85.6K

Like a bugfix to C++11

1. auto return type for template

template<typename T>
auto function(const T& vec, int value) {
    const auto count = std::count(begin(vec), end(vec), value);
    return count;
}

2. lambda generic

const auto count = std::count_if(begin(vec), end(vec), [](const auto i){return i<3;});

3. lambda generalized capture

[value = [](){return 3;}()](const auto i){return i<value;};
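
Putting the first three C++14 items together, a minimal sketch might look like this. countBelowThree and limit are hypothetical names; the capture is initialised by calling an inner lambda immediately, so limit holds the value 3 rather than the lambda itself.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Deduced return type (item 1), generic lambda (item 2),
// and generalized capture (item 3) in one function.
template <typename T>
auto countBelowThree(const T& vec) {
    const auto pred = [limit = []() { return 3; }()](const auto i) {
        return i < limit;
    };
    return std::count_if(std::begin(vec), std::end(vec), pred);
}
```

Because the lambda parameter is const auto, the same function works for a std::vector<int> and a std::vector<double>.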

4. make_unique

auto pointer(std::make_unique<int>(5));

5. constexpr with loops, switch, multiple returns

constexpr int get_value() {
    int x = 5;
    int y = 3;
    return x*y;  // C++14 allows multiple statements; C++11 allowed only a single return statement
}
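
The heading also mentions loops. As a small sketch, a C++14 constexpr function can contain a for loop and still be evaluated at compile time (sumTo is a hypothetical name):

```cpp
// Sum of 1..n, written with a loop and a local variable (C++14 and later).
constexpr int sumTo(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Checked by the compiler, not at run time.
static_assert(sumTo(4) == 10, "sumTo is evaluated at compile time");
```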

C++ Weekly EP 190 C++17 in 10 Minutes by Jason Turner 85.6K

1. Guaranteed copy or move elision

#include <memory>
auto factory() {return std::make_unique<int>();}
int main() {auto object=factory();}  // must have no copy here
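
One way to see that the elision is guaranteed is to return a type that can be neither copied nor moved. The sketch below (Pinned, makePinned, and readPinned are hypothetical names) compiles under C++17 and later, but is rejected under C++14, where a move constructor must at least be accessible.

```cpp
// A type with both copy and move constructors deleted.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// Returning a prvalue: in C++17 the object is constructed directly
// in the caller's storage, so no copy or move constructor is needed.
Pinned makePinned(int v) {
    return Pinned{v};
}

int readPinned() {
    auto p = makePinned(7);  // no copy or move happens here
    return p.value;
}
```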

2. constexpr on non-const reference accessors

reference operator[](size_type pos);
constexpr reference operator[](size_type pos);
const_reference operator[](size_type pos) const;
constexpr const_reference operator[](size_type pos) const;
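
The signatures above look like those of std::array::operator[]; assuming that is the container meant, the practical effect is that a constexpr function can now write to array elements. A minimal sketch (makeSquares is a hypothetical name, C++17 or later):

```cpp
#include <array>
#include <cstddef>

// Fills an array at compile time using the non-const operator[],
// which is constexpr for std::array since C++17.
constexpr std::array<int, 3> makeSquares() {
    std::array<int, 3> a{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<int>(i * i);
    }
    return a;
}

static_assert(makeSquares()[2] == 4, "computed at compile time");
```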

3. constexpr lambdas

constexpr auto l = [](){};

4. std::string_view

constexpr std::string_view name = "hello";
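
As a short sketch of why a constexpr std::string_view is useful, its member functions can be used in compile-time checks without allocating. startsWithHe is a hypothetical helper; the name constant is repeated so that the block stands on its own.

```cpp
#include <string_view>

constexpr std::string_view name = "hello";

// Non-owning view: substr and == work at compile time in C++17.
constexpr bool startsWithHe(std::string_view s) {
    return s.substr(0, 2) == "he";
}

static_assert(name.size() == 5, "length is known at compile time");
static_assert(startsWithHe(name), "prefix check at compile time");
```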



















End

2023 Proxmox on Morefine N6000 16GB 512GB

2023 Proxmox on Morefine N6000 16GB 512GB Software Etcher 100MB (not but can be rufus-4.3.exe 1.4MB) Proxmox VE 7.4 ISO Installer (1st ISO re...