Global Memory
The main, large memory space on a GPU accessible by all threads
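Global memory is the largest memory space on a GPU and resides in off-chip device DRAM (i.e., VRAM). It is accessible by all threads across all blocks, and allocations persist across kernel launches until they are freed. The trade-off is latency: a global memory access is far slower than an access to on-chip shared memory or registers.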
Example
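// Allocate n floats in global memory and copy the host array h_data into it
float *d_data = nullptr;
cudaMalloc(&d_data, n * sizeof(float));
cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);
// ... launch kernels that read or write d_data ...
cudaFree(d_data);  // release the global memory allocation when done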
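The fragment above leaves h_data and n undefined and does not check for errors. A fuller sketch of the same allocate-and-copy pattern might look like the following. The kernel name scale, the CHECK macro, and the array size are assumptions made for illustration, not part of the original example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",     \
                         cudaGetErrorString(err_),               \
                         __FILE__, __LINE__);                    \
            std::exit(EXIT_FAILURE);                             \
        }                                                        \
    } while (0)

// Hypothetical kernel: every thread in every block reads and writes
// d_data, which lives in global memory.
__global__ void scale(float *d_data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= factor;
}

int main() {
    const int n = 1 << 20;               // illustrative element count
    std::vector<float> h_data(n, 1.0f);  // host-side input

    // Allocate global memory and copy the host data into it.
    float *d_data = nullptr;
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK(cudaMemcpy(d_data, h_data.data(), n * sizeof(float),
                     cudaMemcpyHostToDevice));

    // Launch enough blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    CHECK(cudaGetLastError());

    // Copy the result back (this call waits for the kernel) and clean up.
    CHECK(cudaMemcpy(h_data.data(), d_data, n * sizeof(float),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_data));

    std::printf("h_data[0] = %.1f\n", h_data[0]);
    return 0;
}
```

Saved as a .cu file, this should build with nvcc and run on any machine with a CUDA-capable GPU.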
Key Characteristics
- Size: Gigabytes (bounded by the device's VRAM capacity)
- Latency: ~400-800 cycles for an access that misses the caches (exact figures vary by architecture)
- Accessible by: All threads in all blocks
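- Access pattern: Coalesced access (consecutive threads in a warp reading consecutive addresses) is critical for performance, because the hardware can combine such accesses into fewer memory transactions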
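To make the coalescing point concrete, here is a rough sketch that times a coalesced copy against a strided one. The kernel names copy_coalesced and copy_strided, the stride of 32, and the array size are assumptions chosen for illustration. Error checking is omitted for brevity, and the measured gap will depend on the GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced: thread i reads element i, so the threads of a warp
// touch one contiguous span of global memory.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: neighboring threads read addresses `stride` elements apart
// (wrapping around at n), so a warp's reads are scattered.
// The writes to out[i] remain coalesced.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);
        out[i] = in[j];
    }
}

int main() {
    const int n = 1 << 24;   // illustrative size: 16M floats (64 MiB)
    const int stride = 32;   // illustrative stride
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms_coalesced = 0.0f, ms_strided = 0.0f;

    // Warm-up launch so one-time startup cost is not measured.
    copy_coalesced<<<blocks, threads>>>(in, out, n);

    // Time the coalesced copy.
    cudaEventRecord(start);
    copy_coalesced<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_coalesced, start, stop);

    // Time the strided copy.
    cudaEventRecord(start);
    copy_strided<<<blocks, threads>>>(in, out, n, stride);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_strided, start, stop);

    std::printf("coalesced: %.3f ms, strided: %.3f ms\n",
                ms_coalesced, ms_strided);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Both kernels move the same number of bytes, so any difference in the reported times comes from the read access pattern rather than the amount of data.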