Luna_12B_V.3
This is a merge of pre-trained language models.
I'm tired of Mistral 24B, but sadly, there seem to be no consumer-grade models that roleplay as well as it does.
However, some smaller models are worth trying. 12B Gemma is one of them.
The goal of this merge was to create a non-thinking, all-around model for everyday use, primarily in Russian (RU).
The model is capable of roleplaying and is not overly censored. (Extreme content wasn't tested; in my use scenarios, I haven't encountered refusals or strong indirect censorship.)
In RP, the model can act as an antagonist or describe harmful content.
The model works well as an assistant. I tested formatted responses, document citation, etc. (For this usage, a low temperature is better, around T0.2.)
The model is good at context attention and at perceiving small details such as part numbers or dates. In roleplay, its context attention is comparable to Mistral 24B.
RU output is excellent. RU roleplay was tested only briefly, but RU assistant use was tested extensively. For a 12B model, the performance is great.
Of course, it's only a 12B model, with the limitations and problems that come with its size, but within that class it works very well.
Vision is present and works, but vision isn't my use case, so I can't say more.
I've noticed a small repetition problem when the model gives one long answer: the context becomes poisoned by the long, monotonous answer, and because of the model's excellent attention, it starts to repeat itself more and more. It's probably fixable with a repetition-penalty setting, but I haven't found the sweet spot where it works without degrading the response too much. With shorter answers, this problem didn't occur.
Tested with the GemmaT4 preset and a modified Shingane sysprompt at T0.8-1.04 for roleplaying, and with a custom assistant sysprompt at T0.21 for work.
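As an illustration, the roleplay-vs-assistant temperature split above can be encoded as a request payload for a local OpenAI-compatible server (a minimal sketch; the endpoint style, the model name string, and the `build_request` helper are my assumptions, not part of any official API for this model):

```python
import json

def build_request(prompt: str, mode: str = "rp") -> dict:
    # Temperatures from the card: T0.8 for roleplay, T0.21 for assistant work.
    temperature = 0.8 if mode == "rp" else 0.21
    return {
        "model": "Luna_12B_V.3",  # placeholder name for the local model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Example: an assistant-mode request body, ready to POST to a local server.
print(json.dumps(build_request("Summarize this document.", mode="assistant"), indent=2))
```

The same dictionary shape works with most local backends that expose an OpenAI-compatible chat endpoint; a repetition-penalty field could be added here once a working value is found.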
Also tested, as it happened, on an RX 6600 XT and on a GTX 1060 6GB (with offloading, of course); it works, and the speed was even bearable.