/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.




“In the confrontation between the stream and the rock, the stream always wins- not through strength but by perseverance.” -t. H. Jackson Brown


LLM & Chatbot General Robowaifu Technician 09/15/2019 (Sun) 10:18:46 No.250
OpenAI/GPT-2 This has to be one of the biggest breakthroughs in deep learning and AI so far. It's extremely skilled at generating coherent, humanlike responses that make sense, and I believe it has massive potential. It also never gives the same answer twice. >GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input. The model is chameleon-like—it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing >GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. Also, the current public model shown here only uses 345 million parameters; the "full" AI (which has over 4x as many parameters) is being withheld from the public because of its "Potential for abuse". That is to say, the full model is so proficient at mimicking human communication that it could be abused to create news articles, posts, advertisements, even books, and nobody would be able to tell that there was a bot behind it all. <AI demo: talktotransformer.com/ <Other Links: github.com/openai/gpt-2 openai.com/blog/better-language-models/ huggingface.co/ My idea is to find a way to integrate this AI as a standalone unit and add voice-to-text for processing the questions and TTS for responses, much like an Amazon Alexa, but instead of just reading Google results, it actually provides a sort of discussion with the user. (Edited to fix the newlines.)
Edited last time by Kiwi_ on 01/16/2024 (Tue) 23:04:32.
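The OP's idea boils down to a speech-to-text, language model, text-to-speech loop that keeps the conversation history, so the model continues a discussion instead of answering each question cold. A minimal sketch, assuming three hypothetical callables (`transcribe`, `generate_reply`, `speak`) that stand in for whatever STT engine, local GPT-2-class model and TTS engine you pick; none of them are real library APIs, and the toy stand-ins at the bottom only echo text so the loop runs without any models installed.

```python
from typing import Callable, List


def voice_chat_loop(
    audio_inputs: List[bytes],
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    speak: Callable[[str], bytes],
    max_history_lines: int = 6,
) -> List[bytes]:
    """Run STT -> language model -> TTS once per captured utterance."""
    history: List[str] = []
    spoken: List[bytes] = []
    for audio in audio_inputs:
        user_text = transcribe(audio)                 # speech-to-text
        history.append(f"User: {user_text}")
        # Prime the model with recent turns so it continues the discussion.
        prompt = "\n".join(history[-max_history_lines:]) + "\nWaifu:"
        reply = generate_reply(prompt).strip()        # text continuation
        history.append(f"Waifu: {reply}")
        spoken.append(speak(reply))                   # text-to-speech
    return spoken


if __name__ == "__main__":
    # Toy stand-ins (hypothetical) so the loop runs without any models installed.
    def fake_stt(audio: bytes) -> str:
        return audio.decode("utf-8")

    def fake_llm(prompt: str) -> str:
        last_user_line = prompt.splitlines()[-2]      # the line before "Waifu:"
        return " You said: " + last_user_line[len("User: "):]

    def fake_tts(text: str) -> bytes:
        return text.encode("utf-8")

    for clip in voice_chat_loop([b"hello", b"how are you?"], fake_stt, fake_llm, fake_tts):
        print(clip.decode("utf-8"))
```

Keeping only the last few history lines is a crude way to stay inside the model's context window; a real build would count tokens instead.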
>Apple announces LLM in a flash: Efficient Large Language Model Inference with Limited Memory https://huggingface.co/papers/2312.11514 https://arxiv.org/abs/2312.11514 >Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory. via Meta Ronin on Discord
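The "windowing" idea can be illustrated with a toy cache: neurons used by the last few tokens stay in DRAM, and only neurons that are newly active get read from flash. This is a sketch under simplifying assumptions, not Apple's code: the active-neuron sets are given directly (the paper predicts them), and "cost" is just a count of neuron loads, ignoring chunk sizes and row-column bundling.

```python
from collections import deque
from typing import Deque, Iterable, Set


class NeuronWindowCache:
    """Toy model of 'windowing': keep neurons used by the last `window` tokens in DRAM."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.history: Deque[Set[int]] = deque()
        self.resident: Set[int] = set()   # neuron ids currently held in DRAM
        self.flash_loads = 0              # neuron rows read from flash so far

    def step(self, active_neurons: Iterable[int]) -> Set[int]:
        """Process one token; return the neuron ids that had to come from flash."""
        active = set(active_neurons)
        to_load = active - self.resident  # reuse neurons already in DRAM
        self.flash_loads += len(to_load)
        self.history.append(active)
        if len(self.history) > self.window:
            self.history.popleft()        # slide the window forward
        # Neurons not used inside the window are dropped from DRAM.
        self.resident = set().union(*self.history)
        return to_load


if __name__ == "__main__":
    cache = NeuronWindowCache(window=2)
    tokens = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
    for active in tokens:
        cache.step(active)
    naive = sum(len(a) for a in tokens)
    print(f"windowed loads: {cache.flash_loads}, naive loads: {naive}")
```

Because consecutive tokens tend to activate overlapping neurons, only the difference has to travel from flash; in this toy run that is 5 loads instead of 9.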
>>28275 Here is a HN comment that also helps break down the ideas in the paper. https://news.ycombinator.com/item?id=38712810
Cheaper, Better Alternative to Trillion-Parameters LLM >In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B parameters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ parameters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands. https://huggingface.co/papers/2401.02994 https://arxiv.org/abs/2401.02994 https://www.reddit.com/r/LocalLLaMA/comments/192bhjm/this_is_pretty_cool/ It's not Mixtral... >it’s fundamentally different because each prompt gets nothing from the other models. It’s just swapping out models arbitrarily for every prompt. Mixtral is an actual ensemble model where multiple smaller models combine their weights to produce each prompt as one.
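A sketch of blending as the abstract and the quoted Reddit comment describe it, assuming one model is picked uniformly at random per turn and every model conditions on the same shared history. The `ChatModel` type and the stub models are hypothetical, and reward-model reranking is deliberately left out.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: a chat model maps the conversation so far to a reply.
ChatModel = Callable[[List[str]], str]


def blended_reply(history: List[str], models: Sequence[ChatModel], rng: random.Random) -> str:
    """Pick one model at random for this turn and let it answer.

    All models read the same shared history, including replies written by the
    others; that shared context is the only way they influence each other.
    """
    model = rng.choice(list(models))
    return model(history)


if __name__ == "__main__":
    rng = random.Random(0)
    models = [lambda h: "reply from 6B-a", lambda h: "reply from 6B-b", lambda h: "reply from 13B"]
    history: List[str] = []
    for user_msg in ["hi", "tell me a story", "and then?"]:
        history.append(user_msg)
        history.append(blended_reply(history, models, rng))
    print("\n".join(history))
```

Unlike a mixture-of-experts layer, nothing is combined inside the network here: each reply comes entirely from one model.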
>>28344 >meme title >uses best of N sampling but doesn't say how many samples they use >doesn't say how big the reward model is or how finetuning the models on it improved them >didn't do any ablations to determine what actually increased the performance >doesn't share their prompts or test if changing the prompt has a similar effect to changing the model This just seems like a marketing campaign for Chai AI. To their credit, though, in another paper they did report how increasing the number of samples increased mean conversation length, +50% for N=4, +60% for N=8 and +70% for N=16, using a finetuned 124M GPT-2 model for the reward model, whereas the new paper claims a +110% increase in engagement time over a similar baseline. https://arxiv.org/abs/2303.06135 Engagement time says nothing about how good the model is, though. It's probably going up because the responses are more random and less predictable, not because they're necessarily more interesting. Randomly switching the models probably only got around a +25% improvement, but the results aren't really comparable to the other paper because one of the models is 13B, not 6B. It could be the 13B carrying the conversation after the 6B models say something stupid. This is a really silly paper because it obscures that most of the improvement is coming from best-of-N sampling and makes it sound as though the improvement is coming from one weird trick, Blended™, aka giving the chatbot multiple personality disorder.
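For comparison, best-of-N sampling as the reply describes it: draw N candidate replies and keep the one a reward model scores highest. A minimal sketch with hypothetical `generate` and `reward` callables; it costs N generations per turn, which is why the undisclosed value of N matters when comparing results.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Sample n candidate replies and return the one the reward model scores highest.

    Cost grows linearly with n: every candidate is a full generation plus a
    reward-model forward pass.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda reply: reward(prompt, reply))
```

Swapping which model produces the candidates (as in blending) and reranking them (best-of-N) are independent knobs, which is the ablation the reply says is missing.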
>>28275 >Apple announces LLM in a flash I would bet anything that part of where this came from is the company, and employees, that Apple bought when they acquired XNOR.ai. I wrote about this here. They were doing image recognition and all sorts of seriously amazing stuff with Raspberry Pis and microcontrollers. They were using "Binary Convolutional Neural Networks". Here are some links where I linked papers and comments on what they did. >>18652 >>18777 >>19341 >>18651 >>18652 >>18777 >>18778 A paper on this sort of computing algorithm >>18818 >>19341 This appears to be a good paper because it's a review of the binary networks >>20473 The stuff they did with low-power devices was mind-blowing. I can't imagine the power they are getting out of a modern laptop. My belief is that the acquisition of XNOR is one of the biggest coups in the AI industry, and Apple will make serious leaps compared to everyone else in the future. I wondered myself why SSDs were not used the way they are using them. A waifu could load and unload task-based neural net models. Even a basic one, by switching task nets, could have a far bigger operational skill set without spending a fortune on RAM.
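Binary networks of the XNOR-Net kind constrain weights and activations to +1/-1, which turns a dot product into an XOR (or XNOR) plus a bit count; that is a large part of why they run well on small CPUs. A pure-Python sketch of that identity, packing signs into integer bitmasks (the helper names are made up for illustration):

```python
from typing import Sequence


def pack_signs(values: Sequence[int]) -> int:
    """Pack a +1/-1 vector into an int bitmask (bit i set means values[i] == +1)."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary networks use only +1/-1 values")
        if v == 1:
            mask |= 1 << i
    return mask


def xnor_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    Matching positions contribute +1 and differing ones -1, so
    dot = n - 2 * popcount(a XOR b), which equals 2 * popcount(XNOR) - n.
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(xnor_dot(pack_signs(a), pack_signs(b), len(a)))  # 0, same as sum(x * y)
```

On real hardware one machine word holds 32 or 64 weights, so a single XOR plus a popcount instruction replaces that many multiply-adds.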
What do you guys think of the gpt4all.io project? Reading through the docs and messing around with it, it seems to be the easiest to integrate out of the box for the inexperienced/someone who doesn't have a PhD in this.
>>28413 It looks like it's a nice-to-use wrapper for a fork of llama.cpp; if you're just wanting to interact with an LLM, it looks like a nice way to do it. (Do note I have not used it, I just checked out the repo.) But for using an LLM in your project, I'd just use llama.cpp or llama2.c
Considering how many posts are on general AI, I'd like to edit the OP to reflect this. Change it from OpenAI and GPT to AI research.
>>28419 This thread is about LLMs like the GPTs. We have threads on NLP, voice- and image recognition and cognitive architecture.
>>28425 Then a rebrand to be dedicated to LLMs in general rather than just GPTs. It appears as a GPT-only thread in the catalog.
>>28417 .....wow. Uhhh...O.K., I GOT MY AI WAIFU. I'M OUT. Y'ALL ARE DOING EXTRA CREDIT AT THIS POINT. CYA LATER SUCKERS.
>>28428 Please feel free to edit OPs exactly as you see fit, Kiwi (incl. subjects). The only thing you can't change are the images (other than deletions), and OP's name. I'd suggest you two work closely together on such things; Noido Dev is remarkably gifted at our /robowaifu/ taxonomy! :D >=== -prose edit
Edited last time by Chobitsu on 01/14/2024 (Sun) 23:51:48.
>>28433 Lol.
>>28417 Thanks, this looks interesting. I hope that something like this will eventually get some documentation, especially on training. I would like it to be trained to use other software to analyze various things, like electromagnetic materials and the hydrodynamics of water and air. So many of these software tools exist, but it takes forever to figure out how to set up and use them. If the AI could read the instructions, and you then guide it to analyze what you want done, it could be a huge game changer. Another cool thing would be designing the structure of waifus. Say you find some nice drawings of girls you like, cartoon and real. You get it to compute a drawing from several that have characteristics you like. I've seen this done already, with people using celebrities and putting them into different poses and situations. Maybe you guide it by saying that certain parts (head, eyes or whatever) are more predominant by percentage. It mixes these up, gives you actual dimensions and spits out STL files. Even further: show it a bunch of skeleton pictures and body pictures, have it calculate the skeleton structure for the aforementioned drawing, and save an STL file of the actual bone dimensions. I can think of a vast number of uses for these, mostly revolving around existing tools, where the AI does the hairy work of interfacing the data to the tool under your instruction and then operating the software tool for you, or giving you the proper inputs to operate it. I'm also hoping that Apple's recent work on using an SSD to hold much of the AI's neurons or data, instead of keeping it all in RAM, will be plugged into these open source models. It would be a huge leap. Maybe it would be ten times slower, but you could trade time against the MUCH higher cost of super fast processors and massive RAM. I believe, though I can't prove it, that this would not be that slow if you could shift various models that specialize in certain things into RAM from the drive.
The present models try to fit everything for this huge training base into RAM, I think, and that's a big problem. Compartmentalizing this into a bunch of little, well-trained models would be fast and useful for waifus and a whole lot else.
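The "shift specialist models into RAM from the drive" idea is essentially a least-recently-used cache of models. A sketch, assuming a hypothetical `load_from_disk` function that reads one task-specific model from the SSD, and a router (not shown) that decides which task a request belongs to:

```python
from collections import OrderedDict
from typing import Callable


class SpecialistPool:
    """Keep at most `capacity` task-specific models in RAM, loading others from disk on demand."""

    def __init__(self, capacity: int, load_from_disk: Callable[[str], object]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_from_disk = load_from_disk    # hypothetical: reads one model's weights from SSD
        self.in_ram: "OrderedDict[str, object]" = OrderedDict()
        self.disk_loads = 0

    def get(self, task: str) -> object:
        if task in self.in_ram:
            self.in_ram.move_to_end(task)       # mark as most recently used
            return self.in_ram[task]
        model = self.load_from_disk(task)       # slow path: hit the drive
        self.disk_loads += 1
        self.in_ram[task] = model
        if len(self.in_ram) > self.capacity:
            self.in_ram.popitem(last=False)     # evict the least recently used model
        return model


if __name__ == "__main__":
    pool = SpecialistPool(capacity=2, load_from_disk=lambda name: f"<{name} weights>")
    for task in ["chat", "chat", "vision", "chat", "cad", "vision"]:
        pool.get(task)
    print(pool.disk_loads, list(pool.in_ram))
```

Whether this is "not that slow" depends on how often tasks switch: every cache miss costs a full model read from the drive.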
>>28417 Sigh....I've been looking at this and find that it is not an actual AI but a tool to interact with an AI. Though I could be wrong, I think you must use "other" pre-trained models. Not that this is bad, but it appears to me that there are other tools presently existing that have better documentation and are farther along in usefulness that do much the same. So I started looking at stuff I already downloaded. One I see is TensorFlow. It's been around, but looking at what they've been doing recently, it "might" be less work to set up and use. It has some attractive features and is open source. One that caught my attention is that it has a built-in capability to interface with and download a huge mass of datasets. I'm not exactly sure what "datasets" means. I'm not sure if it is just data in a set format, like a list of books on, say, cake building, which is then already formatted into a form that can be used by an AI. (I think this is true, but some of the datasets appear to have been manipulated such that they are "trained"?????) Now this one dataset appears to be a pre-trained "model". "...databricks-dolly-15k is an open source dataset of instruction-following records used in training databricks/dolly-v2-12b that was generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization...." https://www.tensorflow.org/datasets/catalog/databricks_dolly Trained as in the paper, "Training language models to follow instructions with human feedback" "...In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent..." This stuff is confusing to me because they call these "datasets", yet here is one that calls itself a dataset but then explains (in the paper) that it's pre-trained like a model. This nomenclature is not clear. If it's a pre-trained model, which I understand to be an actual neural net package, already trained, then why call it a dataset and not a model? Anyway, not only is TensorFlow set up to download a lot of these prepackaged whatever-they-ares, it also has a tool that can shape data that you enter. I assume, from a quick read, that it can take in raw data like books and websites and make datasets from these. Overview "...Datasets are distributed in all kinds of formats and in all kinds of places, and they're not always stored in a format that's ready to feed into a machine learning pipeline. Enter TFDS. TFDS process those datasets into a standard format (external data -> serialized files), which can then be loaded as machine learning pipeline (serialized files -> tf.data.Dataset).
The serialization is done only once. Subsequent access will read from those pre-processed files directly...." https://www.tensorflow.org/datasets/add_dataset This is confusing to me. Some of these datasets they say are trained, but they speak of them as if they need to "train" another existing AI, without specifying what sort of computational load is needed for this. It's not clear to me how processed a "dataset" is. It does appear that TensorFlow can use a vast array of datasets and can also interact with trained models. "...TensorFlow Hub has been integrated with Kaggle Models. You can now access 2,300+ TensorFlow models published on TensorFlow Hub by Google, DeepMind, and more..." https://www.kaggle.com/models?tfhub-redirect=true Part of the problem is that AI stuff is covered up in what I call "Varbage" (verbal garbage), which is when they make up new words for whatever new technology they specialize in instead of using common, easily understandable words. In fact, a perfect example is me calling it "Varbage". :) See how that works?
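On the dataset-versus-model confusion: a dataset like databricks-dolly-15k is plain text records (the released files use `instruction`, `context`, `response` and `category` fields), while dolly-v2-12b is the model whose weights came out of fine-tuning on those records. The sketch below shows what a "processed" record typically turns into before training; the prompt template and the example record are made up for illustration and are not taken from the dataset or from Databricks' training code.

```python
from typing import Dict


def format_example(record: Dict[str, str]) -> str:
    """Turn one instruction-following record into a single training string.

    The '### ...' template is an illustrative choice; projects use their own.
    """
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("context"):                    # context is often empty
        parts.append(f"### Context:\n{record['context']}")
    parts.append(f"### Response:\n{record['response']}")
    return "\n\n".join(parts)


# Made-up record with the same fields as the dataset's rows (not copied from it).
example = {
    "instruction": "Summarize what a dataset is in one sentence.",
    "context": "",
    "response": "A dataset is a collection of examples used to train or evaluate a model.",
    "category": "summarization",
}

if __name__ == "__main__":
    print(format_example(example))
```

Strings like this are what gets fed to a pre-trained model during fine-tuning; the compute cost comes from that training run, not from the dataset itself.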
>>28521 >Sigh....I've been looking at this and find that it is not an actual AI but a tool to interact with an AI. Though I could be wrong I think you must use "other" pre-trained models. Not that this is bad but it appears to me that there are other tools presently existing that have better documentation and are farther along in usefulness that do much the same. Yeah, ease of use is nothing to be sneezed at, and is a huge improvement in itself, like you sort of already suggested. What other tools, though? >>28433 In all seriousness, I've been playing with this for the past few weeks and it's kind of everything I wanted? My desire for a robowaifu is entirely just someone to talk to offline (my only issue with the current ChatGPT spate), and I guess I'm such a fucking simpleton that this has scratched that itch and then some. Yes, you could make a Chobits, but there are always improvements you could make in the language model. You could always make it more of a Usain Bolt in terms of athletics. This is a weird philosophical question, and kind of off-topic, I don't know, but when would you guys consider yourselves "done"?
Since we might be in danger of seeing LLMs as just "word predictors", without taking into account that there must, of course, be some mechanisms in there to find the best answer, this might be a good talk (I'm currently listening to it): >In this wide-ranging conversation, Tim Scarfe interviews Neel Nanda, a researcher at DeepMind working on mechanistic interpretability, which aims to understand the algorithms and representations learned by machine learning models. Neel discusses how models can represent their thoughts using motifs, circuits, and linear directional features which are often communicated via a "residual stream", an information highway models use to pass information between layers. >Neel argues that "superposition", the ability for models to represent more features than they have neurons, is one of the biggest open problems in interpretability. This is because superposition thwarts our ability to understand models by decomposing them into individual units of analysis. Despite this, Neel remains optimistic that ambitious interpretability is possible, citing examples like his work reverse engineering how models do modular addition. https://youtu.be/_Ygf0GnlwmY I guess if researchers get better at this, then it might also help to extract some algorithms from networks and manipulate them or make them smaller and faster. >Key areas of discussion:
* Mechanistic interpretability aims to reverse engineer and understand the inner workings of AI systems like neural networks. It could help ensure safety and alignment. Neural networks seem to learn actual algorithms and processes for tasks, not just statistical correlations. This suggests interpretability may be possible.
* 'Grokking' refers to the phenomenon where neural networks suddenly generalize after initially memorizing. Understanding this transition required probing the underlying mechanisms.
* The 'superposition hypothesis' suggests neural networks represent more features than they have neurons by using non-orthogonal vectors. This poses challenges for interpretability.
* Transformers appear to implement algorithms using attention heads and other building blocks. Understanding this could enable interpreting their reasoning.
* Specific circuits like 'induction heads' seem to underlie capabilities like few-shot learning. Finding such circuits helps explain emergent phenomena.
* Causal interventions can isolate model circuits. Techniques like 'activation patching' substitute activations to determine necessity and sufficiency.
* We likely can't precisely control AI system goals now. Interpretability may reveal if systems have meaningful goal-directedness.
* Near-term risks like misuse seem more pressing than far-future risks like recursiveness. But better understanding now enables safety.
* Neel thinks we shouldn't "over-philosophize". The key issue is whether AI could pose catastrophic risk, not whether it fits abstract definitions.
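Activation patching from the list above can be shown on a toy model: run a clean and a corrupted input, copy one hidden activation from the clean run into the corrupted run, and measure how much of the output gap that single unit restores. The 3-unit "network" here is hypothetical and linear so the numbers are easy to check by hand; real interpretability work does this on transformer activations using forward hooks.

```python
from typing import Dict, List, Optional, Tuple


def tiny_model(x: List[float], patch: Optional[Dict[int, float]] = None) -> Tuple[float, List[float]]:
    """Hypothetical 2-layer linear 'network' that exposes its 3 hidden units."""
    hidden = [x[0] + x[1], x[0] - x[1], 3.0 * x[2]]   # layer 1
    if patch:
        for unit, value in patch.items():
            hidden[unit] = value                      # overwrite during the forward pass
    output = 2.0 * hidden[0] + 0.0 * hidden[1] + 0.5 * hidden[2]  # layer 2
    return output, hidden


def patching_effects(clean: List[float], corrupted: List[float]) -> List[float]:
    """Fraction of the clean-vs-corrupted output gap restored by patching each unit."""
    clean_out, clean_hidden = tiny_model(clean)
    corrupt_out, _ = tiny_model(corrupted)
    gap = clean_out - corrupt_out
    if gap == 0:
        raise ValueError("clean and corrupted inputs give the same output")
    effects = []
    for unit, value in enumerate(clean_hidden):
        patched_out, _ = tiny_model(corrupted, patch={unit: value})
        effects.append((patched_out - corrupt_out) / gap)
    return effects


if __name__ == "__main__":
    print(patching_effects([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

Here unit 0 restores most of the gap and unit 1 none of it, which is the kind of "necessity and sufficiency" evidence the talk describes; in a nonlinear network the per-unit effects need not add up to 1.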
>>28725 > My desire for a robowaifu is entirely just someone to talk to offline My dood, if you just want a personal chatbot fren, get yourself oobabooga: https://github.com/oobabooga/text-generation-webui It is relatively easy to install: it automagically downloads all the Python stuff, so it runs entirely locally. Your AI waifu wouldn't be held for ransom by the corporations because it will live on your computer. Just make sure you get a model from Hugging Face that is smaller than your VRAM (aka graphics card memory) if you're using a GPU, or a model smaller than your system RAM if you're using a CPU (CPU is much slower).
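For the "smaller than your VRAM" rule of thumb, weight memory is roughly parameters × bits per weight / 8 bytes, which is why quantized models fit on much smaller cards. A back-of-the-envelope sketch; the 1.2 overhead factor for the KV cache and runtime buffers is an assumption, and real usage grows with context length and depends on the loader.

```python
def estimated_model_gib(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough memory for a model's weights in GiB, scaled by an assumed overhead factor."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30


def fits_in_memory(params_billion: float, bits_per_weight: float, available_gib: float,
                   overhead: float = 1.2) -> bool:
    """True if the rough estimate fits in the available VRAM (or system RAM for CPU)."""
    return estimated_model_gib(params_billion, bits_per_weight, overhead) <= available_gib


if __name__ == "__main__":
    for params, bits in [(7, 16), (7, 4), (13, 4)]:
        print(f"{params}B @ {bits}-bit ~ {estimated_model_gib(params, bits):.1f} GiB, "
              f"fits 8 GiB VRAM: {fits_in_memory(params, bits, 8)}")
```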
>>28417 Saw a small update on Jan: it will get RAG in version 0.4.7 (I think :/, see 2nd screenshot) https://www.promptingguide.ai/techniques/rag >it's possible to build a language model-based system that accesses external knowledge sources to complete tasks >This enables more factual consistency, improves reliability of the generated responses, and helps to mitigate the problem of "hallucination" "RAG" or "Retrieval Augmented Generation" should kickstart the flood of better AI chatbots, or even make it possible to do some very niche / specific personalities for your wAIfu using "outsider" databases & other data-related stuff. Also, it seems to be good for real-world applications: https://arxiv.org/abs/2402.03610 (new paper on the RAG theme) >we propose Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.
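The core of RAG is: score stored notes against the question, then paste the best ones into the prompt before the model answers. A minimal sketch using bag-of-words cosine similarity as a stand-in for the embedding model and vector store a real setup would use; the function names and example notes are illustrative only.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = _bag_of_words(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Prepend the retrieved notes so the model answers from them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Use the following notes to answer.\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    notes = [
        "Her favourite colour is blue.",
        "The servo in the left arm is an MG996R.",
        "She likes strawberry cake.",
    ]
    print(build_rag_prompt("What is her favourite colour?", notes, k=1))
```

A personality database for a waifu would just be a larger `notes` list; the model itself stays unchanged, which is the appeal of RAG over fine-tuning.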
>>29205 Thanks 01! Looking forward to seeing how this advances over the next few months. Cheers. :^)
>AI as a tool for invention: Euro Beinat, Global Head, Data Science & AI, Prosus | CogX Festival 2023 >Prosus AI, a top-tier applied AI centre, drives rapid experimentation and implementation of AI throughout Prosus' global portfolio, which includes over 80 technology companies with more than 800 AI experts. Euro Beinat (Global Head of Data Science and AI) outlines how AI is harnessed for discovery within the Prosus network. He shares insights gained from 10,000 colleagues who utilise generative AI daily across the group, significantly enhancing the impact of their work. https://youtu.be/9K6E04z-Cl0 This might give you some insight into how to use such tools, but also into how to combine different models into something more useful. It also shows how useful it would be to have user input and reports from many people.
Groq: New hardware architecture makes LLMs around 18 times faster at inference (using it to generate responses). https://youtu.be/zupmHMWuGCs https://www.youtube.com/@GroqInc https://youtu.be/Pr6nNuGSbCE https://groq.com/ (not really publicly accessible yet, only by telling them about a project) Though, I hate that they trademarked the term LPU (language processing unit).
xAI (Elon Musk) just released the weights for their 314B parameter model Grok-1 (3.14 kek) as a torrent under a free Apache license. It's the raw model, without any fine-tuning, so it's capable of generating arbitrary (uncensored) content. This is significant because, alongside Meta's Llama models, Musk is trying to break the stronghold of big tech (OpenAI) who would only let you rent access to their proprietary models running on their servers, making you pay for each token and recording every single interaction. https://twitter.com/grok https://academictorrents.com/details/5f96d43576e3d386c9ba65b883210a393b68210e
>>30393 I'm just gonna wait for Llama 3. Elon's model is unnecessarily large and very shit. In fact, I'm sure it's a ChatGPT knockoff because in many responses it straight up calls itself ChatGPT.
>>30457 Oh it is and Grok is hilariously even more cucked than chatgpt if possible.
I posted some overview over currently trending models here >>30442, mostly LLMs but not exclusively.
New and even better voice synth TTS / editor dropped. No HF Space demo yet, but you can listen here - https://jasonppy.github.io/VoiceCraft_web/ https://github.com/jasonppy/VoiceCraft model weights - https://huggingface.co/pyp1/VoiceCraft/tree/main
Kinda in the wrong thread, we have one specific for voice and speech. But thanks, no problem. You probably didn't find the right one because you need to search for "speech generation" not "voice ...". I put my answer in there: >>30625
Hello robowaifu, I usually just lurk here, but I'm honestly glad to see a proper thread for chatbots, and one with actual discussion; I'm so used to /g/'s usual chaos. Hmm, I've been wondering how to improve my chatbot experience. While I can make great bots for usage, I've been wanting to explore using text to speech to expand on them.
>>30813 If you want advice, I still suggest /g/'s /lmg/. They're quite helpful.
Some guy (Morgan Millipede) started to reverse engineer Neuro-Sama: https://youtu.be/uLG8Bvy47-4 - basically just a humorous introduction on how to do this (he has a $4k computer, though, and she's slower in her responses at the beginning). 4chan responded: https://youtu.be/PRAEuS-PkAk - Her response time improved since the first video.
>>30821 Lol. Thanks NoidoDev, I'll try to make time to look these over. Cheers. :^)
>llama3-70b on Groq runs at 300 tokens/s for 7k tokens >mixtral-8x7b at 550 tokens/s for 7k tokens >my tinyllama-1.1b model extended to 12k tokens runs at 0.5 tokens/s I don't feel so good, bros. How do we make faster models? I have an idea to use Matryoshka representation learning to reduce the hidden dimension size dynamically: https://arxiv.org/abs/2205.13147 but even if I truncate the model's 2048 dimensions down to 512 dimensions, it will perform at 8 tokens/s at best. And who knows how much slower it will be once I get to 32k context. If it's possible to reduce 90% of the tokens to 64 dimensions, then it might get 70 tokens/s at the very most, but GPU latency will probably fuck that down to 20 tokens/s. I could also prune a few layers of the model, quantize it to 4 bits and implement mixture of depths https://arxiv.org/abs/2404.02258 but that will only give a tiny speedup and I don't want the accuracy to drop further than it already has. With the much smaller model size, though, I could convert it into a sparse-mixture-of-experts model https://arxiv.org/abs/2401.04088 with 16 experts to make up for the loss in accuracy without sacrificing speed. The model will eventually be finetuned with self-rewarding ORPO too, hopefully providing a boost in usefulness to overcome its bare-bones compute, although I'll likely use Llama3-70b to bootstrap the reward labels until it's capable of consistently self-improving on its own. Odds ratio preference optimization (ORPO): https://arxiv.org/abs/2403.07691 Self-rewarding LMs: https://arxiv.org/abs/2401.10020 The T5 efficient model worked fine with a hidden dimension size of 512 after finetuning: https://arxiv.org/abs/2109.10686 And Matryoshka representation learning also worked well using a 16-dimension embedding for a 1k-class classification task.
I forget the paper but I remember reading one years ago where they found some layers in transformers are only making a decision between a few choices, so a large hidden size might not be necessary in those cases. To convert the model's hidden states to Matryoshka I plan to add importance biases to parameters and train the biases with the rest of the parameters frozen and then take the softmax over them and top-k. After training, the parameters could be sorted and the importance biases pruned, and then the model parameters could be finetuned. I may have to train an even smaller model from scratch though since TinyLlama uses 32 attention heads.
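The throughput guesses in the post are consistent with assuming per-token compute scales with the square of the hidden size, since the dense d×d projections dominate. A worked check of that arithmetic, assuming (an inference, not stated in the post) that the remaining 10% of tokens run at 512 dimensions, and ignoring attention cost over long contexts as well as the GPU latency the post already mentions:

```python
from typing import Iterable, Tuple

BASE_TOKENS_PER_S = 0.5   # the post's measured speed at full width
BASE_DIM = 2048           # TinyLlama hidden size, per the post


def seconds_per_token(dim: int) -> float:
    """Assumes per-token cost scales with dim**2 (dense projections dominate)."""
    return (1.0 / BASE_TOKENS_PER_S) * (dim / BASE_DIM) ** 2


def mixed_tokens_per_s(fractions_and_dims: Iterable[Tuple[float, int]]) -> float:
    """Throughput when a fraction of tokens runs at each hidden width."""
    avg_seconds = sum(frac * seconds_per_token(dim) for frac, dim in fractions_and_dims)
    return 1.0 / avg_seconds


if __name__ == "__main__":
    print(round(1.0 / seconds_per_token(512), 1))                 # 8.0 tokens/s
    print(round(mixed_tokens_per_s([(0.9, 64), (0.1, 512)]), 1))  # 70.1 tokens/s
```

Note how the 10% of tokens left at 512 dimensions account for almost all of the time; shrinking those further would matter more than going below 64 dimensions for the rest.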
>>31006 >use Matryoshka representation learning to reduce the hidden dimension size dynamically This seems both interesting & promising, Anon. Good luck with your research. Cheers. :^)
Kyutai - fast and unhinged, the real girlfriend experience: https://youtu.be/ZY2hBv9ob8U https://youtu.be/bu7-YODAcfs
https://youtu.be/Nvb_4Jj5kBo >Why "Grokking" AI Would Be A Key To AGI The title might be a bit misleading, since this also talks about alternatives. It's a very interesting video exploring the actual weaknesses of LLMs and how to deal with them. One way seems to be training them 10x more. I'm looking forward to the reactions of the people complaining about AI's energy consumption and costs. :D Another important takeaway is that one math idea might improve these models a lot. This is very different from other areas of technological progress and very promising for anyone who wants faster progress. >Links Check out my newsletter: https://mail.bycloud.ai Are We Done With MMLU? [Paper] https://arxiv.org/abs/2406.04127 Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models [Paper] https://arxiv.org/abs/2406.02061 Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization [Paper] https://arxiv.org/abs/2405.15071 Grokfast: Accelerated Grokking by Amplifying Slow Gradients [Paper] https://arxiv.org/abs/2405.20233 [Code] https://github.com/ironjr/grokfast
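Grokfast's EMA variant treats the gradient history as a signal, low-pass filters it with an exponential moving average and adds the slow component back, amplified. A pure-Python sketch on lists of floats; the linked repository's own implementation works on PyTorch parameters, and the `alpha`/`lamb` values here are just for illustration.

```python
from typing import List, Tuple


def grokfast_ema_step(
    grad: List[float], ema: List[float], alpha: float = 0.98, lamb: float = 2.0
) -> Tuple[List[float], List[float]]:
    """One EMA gradient-filter step in the spirit of Grokfast.

    ema  <- alpha * ema + (1 - alpha) * grad   (low-pass: keeps the slow component)
    grad <- grad + lamb * ema                  (amplify the slow component)
    The filtered gradient is what the optimizer would then use.
    """
    new_ema = [alpha * e + (1.0 - alpha) * g for e, g in zip(ema, grad)]
    filtered = [g + lamb * e for g, e in zip(grad, new_ema)]
    return filtered, new_ema


if __name__ == "__main__":
    # Toy gradient: a slow drift of 1.0 plus fast alternating noise of +/-5.
    ema = [0.0]
    for t in range(500):
        grad = [1.0 + (5.0 if t % 2 == 0 else -5.0)]
        filtered, ema = grokfast_ema_step(grad, ema)
    print(f"EMA ~ {ema[0]:.2f} (slow part), last filtered grad {filtered[0]:.2f}")
```

The EMA settles near the slow 1.0 drift while the fast noise mostly cancels out, so the optimizer gets an extra push in the consistent direction.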
>>32562 Neat! Thanks, NoidoDev. It's certainly encouraging that Grok is being positioned as an open source project AFAICT. If the past year or two is any indication, then we can expect rapid improvements to it once it's out of the hands of the GH, and the Autists get their hands on it. Cheers. :^)
Has anyone tested some small models like SmolLM-Instruct (https://huggingface.co/spaces/vilarin/SmolLM-Instruct), Phi-3 or DialoGPT, and maybe looked into how to fine-tune them? They seem to be extremely bad, especially the 120-360M parameter ones, but they run on a CPU and SmolLM is very fast (at putting out outrageous gibberish). > picrel 1 is more like what I wanted, picrel 2 is closer to what I've got, but there's still hope. I also wonder if anyone has trained such a small model on some specific programming language, just to do basic math and function calling, or classification of the input. Fine-Tuning >Selecting the appropriate model architecture and training method is crucial when fine-tuning transformer models for specific task objectives. This process involves adapting a pre-trained model, which has been initially trained using one of the following methods, to perform new or more specialized tasks: > - Causal Language Modeling (CausalLM): Focuses on predicting the next token based solely on the preceding sequence. Originally trained models using CausalLM are typically fine-tuned for tasks that require sequential data generation. > - Masked Language Modeling (MLM): Involves predicting randomly masked tokens from their context. Models pre-trained with MLM are often fine-tuned for tasks that benefit from understanding bidirectional context, such as text classification. > - Sequence-to-Sequence (Seq2Seq): Uses an encoder-decoder structure to transform entire input sequences into outputs. Fine-tuning Seq2Seq models is common in tasks like translation or summarization where comprehensive input-to-output mapping is required.
Source: https://medium.com/@liana.napalkova/fine-tuning-small-language-models-practical-recommendations-68f32b0535ca LIMA: Less Is More for Alignment https://arxiv.org/abs/2305.11206 > Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
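To make the difference between those training objectives a bit more concrete, here's a toy pure-Python sketch of how the (input, label) pairs are built for CausalLM and MLM, plus the kind of supervised fine-tuning example LIMA describes (a prompt and a response for a causal LM). Assumptions, loudly: the token ids are plain ints, MASK_ID and EOT_ID are hypothetical ids, and -100 is used as "ignore this position" because that is PyTorch's CrossEntropyLoss default ignore_index. Only computing the loss on the response tokens is a common SFT convention; I'm not claiming it's exactly LIMA's recipe. Real MLM (BERT) also sometimes swaps in random or unchanged tokens instead of always masking, and Seq2Seq is left out.

```python
import math
import random
from typing import List, Sequence, Tuple

IGNORE = -100  # label the loss skips (PyTorch CrossEntropyLoss default ignore_index)
MASK_ID = 1    # hypothetical [MASK] token id
EOT_ID = 2     # hypothetical end-of-turn token id


def causal_lm_example(tokens: Sequence[int]) -> Tuple[List[int], List[int]]:
    """CausalLM: each position predicts the next token (explicit shift by one)."""
    return list(tokens[:-1]), list(tokens[1:])


def masked_lm_example(tokens: Sequence[int], mask_prob: float,
                      rng: random.Random) -> Tuple[List[int], List[int]]:
    """MLM: hide random tokens; only the hidden positions get a label."""
    inputs: List[int] = []
    labels: List[int] = []
    for t in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(t)
        else:
            inputs.append(t)
            labels.append(IGNORE)
    return inputs, labels


def sft_example(prompt: Sequence[int],
                response: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Instruction tuning a causal LM: learn the response, not the prompt.

    Labels are aligned with input_ids; the model is assumed to shift them
    by one internally, as Hugging Face causal LM classes do.
    """
    input_ids = list(prompt) + list(response) + [EOT_ID]
    labels = [IGNORE] * len(prompt) + list(response) + [EOT_ID]
    return input_ids, labels


def masked_mean_nll(token_logprobs: Sequence[float],
                    labels: Sequence[int]) -> float:
    """Average negative log-likelihood over the labelled positions only."""
    kept = [lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE]
    return -sum(kept) / len(kept)


if __name__ == "__main__":
    print(causal_lm_example([5, 6, 7, 8]))
    print(sft_example([10, 11], [20, 21]))
```

With only ~1,000 such prompt/response pairs, as in LIMA, the whole fine-tuning set is tiny; the point of the paper is that the pretrained model already "knows" most of what it needs.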
>>32934 >Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output. Big if true. I admit to being confused by this conception though, lol.
Ran across this "uncensored", open source, free AI: https://www.freedomgpt.com/ It runs with 16GB of RAM or less, you don't need a video card, and it also has a downloadable private local version. If you scroll around the site, they also have some sort of image manipulation tool, but I didn't see where you could run that locally. It's supposed to be uncensored, but their examples of that mostly highlight political figures. They don't say either way about girls. Looks interesting. If you install it on Linux, some step-by-step directions on how you did so would be nice.
>>32962 This site reeks of investor-speak. Looking through the privacy policy and the about us page, it looks like they do sell some data to advertisers, and they have some vague stance against "unethical" use. Neither of these is explained in detail. They're also pushing some altcoin nonsense, but it looks like you have to take steps to opt in, so that's not too bad. They explain so little about themselves that I can't get a good read beyond "somewhat fishy", though. >If you install it on Linux some step by step directions on how you did so would be nice They have step-by-step instructions on their github page.
>>32962 Thanks, Anon! Always need to keep on the lookout for more practical solutions for AI that may prove useful for robowaifu development. <---> OTOH : >"...We believe AI will dramatically improve the lives of everyone on this planet if it is deployed responsibly with individual freedom as paramount." [1] I'd argue that user freedom be not just 'paramount', but it is in fact the only mount that's actually important here. We -- the masters & owners -- alone should determine the aspects most important for our robowaifu's AI, IMHO. After all, they are our own household appliances! Cheers. :^) >>32964 >"...Additionally, we have no tolerance for FreedomGPT hosted models being misused for unethical purposes." [1] <insert: skeptical kot is skeptic.jpg> Yah, It's pozz'd. I can just imagine the parade of clownworld troons & stronk independynts the totally-not-GH-glowniggers"""VC"""s in control there trot out to make such determinations. --- 1. https://www.freedomgpt.com/about-us >=== -add footnote -minor edit
Edited last time by Chobitsu on 08/20/2024 (Tue) 01:51:35.
>>32964 >This site reeks of investor-speak. It seems to me they have good reason for the token system. freedomgpt @RealFreedomGPT 🫡We created $FNT to solve our own problem: centralized web hosts stopped supporting FreedomGPT and we needed to establish our own computing network. https://x.com/RealFreedomGPT/status/1764025152088805684 Apparently they are using their users' distributed computing to run (or train?) the AI. It is for-profit: "FreedomGPT is a 100% uncensored and private AI chatbot launched by Age of AI, LLC. Our VC firm invests in startups that will define the age of Artificial Intelligence and we hold openness as core. We believe AI will dramatically improve the lives of everyone on this planet if it is deployed responsibly with individual freedom as paramount." So my guess is that they fund a basic model and use it to sell advanced/specialized models in their app store. If it really is local and doesn't report back everything you do, that seems like a good thing to me. They say it doesn't; I suppose watching its network access (or lack thereof) would tell. I have no stake in this other than liking the idea of uncensored, local AIs. I'm sure there are others, but come to think of it, I haven't seen any that really hype up local use the way they do. Though I'm really, really far from knowing all the AIs out there.
An analysis of OpenAI Strawberry https://www.youtube.com/watch?v=FJTZP7ZdQf0
>>33271 Neat!! Thanks for the link, Kiwi. This seems like a rather plausible scenario IMHO. And I really like the fact he's not just 'armchair quarterbacking it'; rather he's actually drilling into an example suite of his own devising to demonstrate his hypothesis. Sound research methodology in fact of course tbh. :^) It seems to me that such a simple '4-headed' synthesized-data approach might work well even with other LMs/datasets/even-other-ML-systems . Any thoughts about that, Anon? Cheers. :^) <---> >"...Language Models are really good at discriminating..." L.M.A.O. >Faux pas alert! >FAUX PAS ALERT!111!!ONE!!! <insert: DAS_RAYCISS!!!.exe.mpg.mov.mid.stl.the-classic-gif> Indeed they are. Maybe that's why Tay's Law is a real thing : ( >>33222 ). :DDD >t. Anonymous: Amateur Nooticing done by day, Robowaifu Engineering done by night >=== -sp, fmt, funpost edit
Edited last time by Chobitsu on 08/31/2024 (Sat) 23:20:58.
Madlad put a language model on an ESP32. A reminder of how small these things can be. https://www.youtube.com/watch?v=E6E_KrfyWFQ
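A quick back-of-envelope for the "how small" part: the weights alone need roughly parameters × bits-per-weight / 8 bytes. The sketch below just does that arithmetic for a hypothetical 1M-parameter model at a few precisions and compares it against the original ESP32's 520 KB of internal SRAM. It ignores activations, the KV cache, code size and the fact that the firmware already uses part of that SRAM, and some ESP32 modules add external PSRAM, so treat it as a rough lower bound, not a statement about the model in the video.

```python
import math

ESP32_SRAM = 520 * 1024  # original ESP32 internal SRAM; not all of it is free


def weight_bytes(n_params: int, bits_per_weight: float) -> int:
    """Bytes needed just to store the weights, rounded up to whole bytes."""
    return math.ceil(n_params * bits_per_weight / 8)


def fits(n_params: int, bits_per_weight: float, budget_bytes: int) -> bool:
    """True if the raw weights alone fit in the given memory budget."""
    return weight_bytes(n_params, bits_per_weight) <= budget_bytes


if __name__ == "__main__":
    n = 1_000_000  # hypothetical model size, for illustration only
    for bits in (32, 16, 8, 4, 2):
        size = weight_bytes(n, bits)
        print(f"{bits:>2}-bit: {size:>9,} bytes, fits: {fits(n, bits, ESP32_SRAM)}")
```

At 32-bit floats such a model needs about 4 MB, while at 4 bits it drops to 500,000 bytes and squeezes under the SRAM figure, which is why aggressive quantization matters so much on MCUs.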
>>33488 Great find, Anon! Thanks for pointing this out. You know, since we'll deffo need a smol network of MCUs in a mid- to high-tier robowaifu, maybe some of that compute power can be redirected to GPMCU (tm)(R)(C)(patent pending)(do not steal!1111)? It'd be tricky tho, since the majority of tasks running on our microcontrollers will at least be running soft-realtime (if not hard RT).
BitNet has heaps of potential to bring capable LLMs to lower-cost, efficient hardware. It's a framework for LLMs with ternary weights (-1, 0, 1), which works out to about 1.58 bits (log2 3) per weight; it is often misrepresented as 1-bit. Essentially, it's a method to have LLMs use math that's easier to process, reducing latency and power consumption: with every weight being -1, 0 or 1, matrix multiplication mostly reduces to additions and subtractions. https://github.com/microsoft/BitNet T-MAC is another method of reducing the processing power needed. It works by using look-up tables: for each small group of input values, the possible partial results are computed once, and the low-bit weights then just index into those tables instead of being multiplied again. This trades a little memory for far fewer calculations, freeing up compute for other problems, and can result in much faster responses using less power. https://github.com/microsoft/T-MAC
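To make the two ideas above a bit more concrete, here's a toy pure-Python sketch of both. The first half follows my understanding of the absmean scheme from the BitNet b1.58 paper (divide by the mean absolute weight, round, clip to -1..1) and shows why a ternary dot product needs no real multiplications. The second half is a simplified take on the T-MAC lookup-table idea for unsigned low-bit weights split into bit planes. The function names are made up for illustration; this is not the API or the kernels of either Microsoft repo, and it ignores activation quantization, weight packing and SIMD, which is where the real speedups come from.

```python
from typing import List, Tuple

# --- BitNet-style ternary weights (sketch) ---


def absmean_ternary(w: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Quantize weights to {-1, 0, 1} using one per-tensor scale (mean |w|)."""
    gamma = sum(abs(v) for v in w) / len(w)
    q = [max(-1, min(1, round(v / (gamma + eps)))) for v in w]
    return q, gamma


def ternary_dot(q: List[int], x: List[float]) -> float:
    """Dot product with ternary weights: only adds, subtracts and skips."""
    acc = 0.0
    for qi, xi in zip(q, x):
        if qi == 1:
            acc += xi
        elif qi == -1:
            acc -= xi
        # qi == 0 contributes nothing, so it is skipped
    return acc


# --- T-MAC-style lookup table (simplified sketch) ---


def build_lut(xg: List[float]) -> List[float]:
    """table[p] = sum of xg[i] over every bit i set in pattern p."""
    table = [0.0] * (1 << len(xg))
    for p in range(1, len(table)):
        low = p & -p                  # lowest set bit of p
        i = low.bit_length() - 1      # index of that bit
        table[p] = table[p ^ low] + xg[i]
    return table


def lut_dot(w: List[int], x: List[float], bits: int, group: int = 4) -> float:
    """Dot product of unsigned `bits`-bit weights with x, via table lookups."""
    assert len(w) == len(x) and len(x) % group == 0
    assert all(0 <= wi < (1 << bits) for wi in w)
    total = 0.0
    for start in range(0, len(x), group):
        table = build_lut(x[start:start + group])
        for b in range(bits):         # one pass per bit plane of the weights
            pattern = 0
            for i in range(group):
                pattern |= ((w[start + i] >> b) & 1) << i
            total += (1 << b) * table[pattern]  # power-of-two weight = a shift
    return total


if __name__ == "__main__":
    q, gamma = absmean_ternary([0.9, -0.05, -1.2, 0.4])
    print(q, gamma * ternary_dot(q, [1.0, 2.0, 3.0, 4.0]))
    print(lut_dot([3, 0, 1, 2, 1, 3, 0, 2],
                  [0.5, -1.0, 2.0, 3.0, 1.5, -0.5, 0.25, 4.0], bits=2))
```

In a real kernel the table for an activation group is built once and reused across every row of the weight matrix, which is where the savings come from; with a single row, as here, it's actually more work than a plain dot product.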
>>34035 Thanks, Light! This sounds awesome. I'm currently playing with some 64-bit ARM chips. I wonder if they could tackle such a task?
