These are the star and fork stats for the /BlinkDL/RWKV-CUDA repository. As of 03 May 2024, the repository has 135 stars and 27 forks.
RWKV-CUDA

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM).

Towards RWKV-4 (see the wkv folder)

I have a basic RWKV-4 kernel in the wkv folder. Let's optimize it.

Experiment 1: depthwise_conv1d, 20x faster than PyTorch

The formula:

w.shape = (C, T)
k.shape = (B, C, T)
out.shape = (B, C, T)
out[b][c][t] = sum_u{ w[c][(T-1)-(t-u)] * k[b][c][u] }

PyTorch = fwd 94ms bwd 529ms
CUDA kernel v0 = fwd 45ms bwd 84ms (simple)
CUDA kernel v1 = fwd 17ms bwd 43ms (shared memory)
CUDA...
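To make the depthwise_conv1d indexing concrete, here is a minimal pure-Python sketch of the formula. The function name `depthwise_conv1d_ref` and the nested-list layout are assumptions for illustration, not code from the repository. The weight index (T-1)-(t-u) is only valid in 0..T-1, so the sum runs over u = 0..t. That makes this a causal, per-channel convolution in which w[c][T-1] weights the current time step. The naive loop costs O(B·C·T²) and is meant only as a correctness baseline for checking faster kernels, not as one of the benchmarked implementations.

```python
# Hypothetical naive reference for the depthwise_conv1d formula:
#   out[b][c][t] = sum_u w[c][(T-1)-(t-u)] * k[b][c][u]
# w has shape (C, T); k and out have shape (B, C, T), stored as nested lists.

def depthwise_conv1d_ref(w, k):
    B = len(k)
    C = len(w)
    T = len(w[0])
    out = [[[0.0] * T for _ in range(C)] for _ in range(B)]
    for b in range(B):
        for c in range(C):  # depthwise: channel c only uses its own weights w[c]
            for t in range(T):
                acc = 0.0
                # (T-1)-(t-u) lies in 0..T-1 only when 0 <= u <= t,
                # so each output depends on the current and past inputs (causal).
                for u in range(t + 1):
                    acc += w[c][(T - 1) - (t - u)] * k[b][c][u]
                out[b][c][t] = acc
    return out


if __name__ == "__main__":
    # An impulse at t=0 returns w reversed in time: w[c][T-1] is the lag-0 weight.
    print(depthwise_conv1d_ref([[1.0, 2.0, 3.0]], [[[1.0, 0.0, 0.0]]]))
    # prints [[[3.0, 2.0, 1.0]]]
```

Feeding an impulse is a quick way to confirm the weight layout before comparing a CUDA kernel's output against this baseline.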
repo | languages | stars | stars (weekly change) | forks | forks (weekly change) |
---|---|---|---|---|---|
dbsystel/jl23-rp2040 | Groovy, Shell, CSS | 4 | 0 | 1 | 0 |
michigan-traffic-lab/Dense-Deep-Reinforcement-Learning | Jupyter Notebook, Python | 237 | 0 | 37 | 0 |
hikariming/alpaca_chinese_dataset | Jupyter Notebook, Python | 970 | 0 | 78 | 0 |
Mitek-Systems/MiSnap-iOS | C, Objective-C, Swift | 9 | 0 | 6 | 0 |
openai/chatgpt-retrieval-plugin | Python, Other | 19.8k | 0 | 3.6k | 0 |
cisagov/untitledgoosetool | Python, PowerShell | 839 | 0 | 69 | 0 |
gururise/AlpacaDataCleaned | Python, HTML, JavaScript | 1.3k | 0 | 133 | 0 |
binary-husky/chatgpt_academic | Python, CSS, Other | 42.9k | +437 | 5.6k | +41 |
sahil280114/codealpaca | Python | 1.3k | 0 | 96 | 0 |
feizc/MLE-LLaMA | Python | 292 | 0 | 19 | 0 |