BlinkDL/RWKV-CUDA

The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )

Cuda, Python, C++
These are the star and fork stats for the BlinkDL/RWKV-CUDA repository. As of 03 May 2024, this repository has 135 stars and 27 forks.

Towards RWKV-4 (see the wkv folder)

I have a basic RWKV-4 kernel in the wkv folder. Let's optimize it.

Experiment 1 - depthwise_conv1d - 20x faster than pytorch

The formula:

w.shape = (C, T)
k.shape = (B, C, T)
out.shape = (B, C, T)
out[b][c][t] = sum_u{ w[c][(T-1)-(t-u)] * k[b][c][u] }

pytorch = fwd 94ms bwd 529ms
CUDA kernel v0 = fwd 45ms bwd 84ms (simple)
CUDA kernel v1 = fwd 17ms bwd 43ms (shared memory)
CUDA...
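To make the depthwise_conv1d formula concrete, here is a minimal pure-Python sketch of the forward pass. It is a slow reference for checking indices, not the repository's kernel. The function name `depthwise_conv1d_reference` is hypothetical, and the summation range u = 0..t is an assumption: the formula does not state it, but it is the only range that keeps the index (T-1)-(t-u) inside w.

```python
def depthwise_conv1d_reference(w, k):
    """Naive forward pass for
        out[b][c][t] = sum_u w[c][(T-1)-(t-u)] * k[b][c][u]
    w is a nested list of shape (C, T); k is a nested list of shape (B, C, T).
    """
    B = len(k)
    C = len(w)
    T = len(w[0])
    out = [[[0.0] * T for _ in range(C)] for _ in range(B)]
    for b in range(B):
        for c in range(C):
            for t in range(T):
                acc = 0.0
                # Assumption: u runs over 0..t, so (T-1)-(t-u) stays in [0, T-1].
                for u in range(t + 1):
                    acc += w[c][(T - 1) - (t - u)] * k[b][c][u]
                out[b][c][t] = acc
    return out


# Usage: one batch, one channel, T = 3.
w = [[1.0, 2.0, 3.0]]
k = [[[1.0, 1.0, 1.0]]]
print(depthwise_conv1d_reference(w, k))  # [[[3.0, 5.0, 6.0]]]
```

Under that assumption each channel is filtered independently and output position t only sees inputs at positions u <= t, with w[c][T-1] weighting the current position and earlier entries of w weighting older inputs.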