/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.


“Stubbornly persist, and you will find that the limits of your stubbornness go well beyond the stubbornness of your limits.” -t. Robert Brault


Python General Robowaifu Technician 09/12/2019 (Thu) 03:29:04 No.159
Python Resources general

Python is by far the most common scripting language for AI/Machine Learning/Deep Learning frameworks and libraries. Post info on using it effectively.

wiki.python.org/moin/BeginnersGuide
https://archive.is/v9PyD

On my Debian-based distro, here's how I set up Python, PIP, TensorFlow, and the Scikit-Learn stack for use with AI development:
sudo apt-get install python python-pip python-dev
python -m pip install --upgrade pip
pip install --user tensorflow numpy scipy scikit-learn matplotlib ipython jupyter pandas sympy nose
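After an install like the one above, a quick sanity check is to ask Python which of the packages it can actually find. This is only a sketch: the helper name `check_packages` is made up for illustration, and the list uses import names, which differ from the pip names in two cases (scikit-learn imports as `sklearn`, ipython as `IPython`).

```python
import importlib.util

# Import names for the packages installed above.
PACKAGES = ["tensorflow", "numpy", "scipy", "sklearn", "matplotlib",
            "IPython", "jupyter", "pandas", "sympy", "nose"]

def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Nothing gets imported here, so the check stays fast even for heavy packages like TensorFlow.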


LiClipse is a good Python IDE choice, and there are a number of others.
www.liclipse.com/download.html
https://archive.is/glcCm
>>7919 Hey there, since it's trivial enough to do for me now, I went ahead and created a plain text file for you of every post on robowaifu (as of a few minutes ago, so you can even find your post there). I have a rudimentary approach to cleaning up the text just a bit before inserting it into the data containers so it's not an exact match to the posts. But, you might find it useful and I hope you can. Cheers Anon. https://files.catbox.moe/e37qds.7z
>>7922 Thanks, that's great. Except that you now filter out all non-alphabetical characters? A reference to the board like "/robowaifu/" becomes "robowaifu" now, which reads like a mention of a robowaifu. Cleaning these extra chars out should probably be done later, before counting singular words, but some then still need to be a special case. Only when used as a key for a dict, "/" can't be in there. Some of the quotes are also being used to show that terms belong together, like the name of a movie. Such things should be able to be filtered out and stored in an extras list or file.
>>7923 Heh, in a word: Yes. We have two different agendas going on here: A) atm, mine is to create a search system that responds much faster than human perception, and to provide reasonably plausible links for the user to use in pursuing further information. For example, an AI such as B) your project that's trying to pin down exact semantic meanings, etc. Think of my tool as one that your tool can use as a front-end to quickly locate research material to use. For that material, you have the full JSON files to parse exactly as you see fit ofc.
Open file (491.22 KB 1192x894 gambatte.jpg)
>>7923 Hey Anon, so as a Christmas present to you I created another version of the all_posts.txt file that doesn't have much in the way of filtering at all. I hope it can be of some help to you. Good luck with your project. online text: https://files.catbox.moe/h6jp5n.txt zip you can download: https://files.catbox.moe/ub266f.7z Merry Christmas!
>>7937 Thanks. I'm just trying occasionally if and how we can use this to find the most interesting terms on the board and per posting, so that the index thread can be, at least in part, created automatically.
>>7916
# cleaning the whitespace from the terms in the entry
parsed3 = [(token[0], token[-1].strip()) for token in parsed2]
# also removing everything which is not a letter or number
parsed4 = [(token[0], ''.join(c for c in token[-1] if c.isalpha() or c.isnumeric())) for token in parsed3]
# creating a dict, adding together the counts of entries which appear twice due to cleaning
better_dict = {}
for token in parsed4:
    if token[-1] in better_dict:
        better_dict[token[-1]] = better_dict[token[-1]] + int(token[0])
    else:
        better_dict[token[-1]] = int(token[0])
# Test
better_dict['wholesome']
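The same merge can be sketched with collections.Counter, which drops the if/else bookkeeping. The sample data and the helper names `clean` and `merge_counts` are made up for illustration, and the (count, term) layout of the real parsed2 list is an assumption based on the snippet above.

```python
from collections import Counter

def clean(term):
    """Keep only letters and digits, as in the snippet above."""
    return "".join(c for c in term if c.isalpha() or c.isnumeric())

def merge_counts(pairs):
    """Sum the counts of terms that become identical after cleaning.

    `pairs` is an iterable of (count, term) tuples (assumed layout).
    """
    totals = Counter()
    for count, term in pairs:
        key = clean(term.strip())
        if key:  # skip terms that clean down to an empty string
            totals[key] += int(count)
    return totals

# Hypothetical sample, not real board data.
sample = [("3", " wholesome "), ("2", "wholesome!"), ("1", "/robowaifu/")]
totals = merge_counts(sample)
print(totals["wholesome"])  # 5
```

A Counter also gives `totals.most_common(20)` for free, which may help when looking for the most interesting terms.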
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
# finding all the terms lemmatizing didn't
# if the simple form is already in the set "lemmatized_longerthan3" then the longer one can be removed. First step is to have a list or set with those.
stemmeable = {token for token in lemmatized_longerthan3 if stemmer.stem(token) in lemmatized_longerthan3 and token != stemmer.stem(token)}
# Then the one is subtracted from the other. Which kills around 3000 terms.
longerthan3 = list(set(lemmatized_longerthan3) - stemmeable)
# For a side by side peek into it, uncomment the following lines:
# comp = [(token, stemmer.stem(token)) for token in stemmeable]
# comp[42]
# The Lancaster stemmer takes even some more out, I think it's good to have both results to work with. For now I subtract both from the data.
from nltk.stem.lancaster import LancasterStemmer
st = LancasterStemmer()
lancastered = {token for token in lemmatized_longerthan3 if st.stem(token) in lemmatized_longerthan3 and token != st.stem(token)}
# subtracting from the Porter result so that both stemmers are applied
longerthan3 = list(set(longerthan3) - lancastered)
We are down to only around 17k terms now. Hurray. Next, I'll try to use a list of the most common terms in English to get rid of those. If that doesn't bring down the number a lot, this whole approach has failed. We currently have more than 7k terms appearing only once and 1.5k terms appearing more than 20 times. Looking into these lists, I can't see any pattern emerging.
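The filtering idea above (drop a term when its stem is a different word that is already in the vocabulary) can be tried without NLTK installed. In this sketch `toy_stem` is a deliberately crude, made-up suffix stripper standing in for PorterStemmer.stem, so the results will not match the real stemmers; only the set logic is the point.

```python
def toy_stem(word):
    """Very crude suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def drop_stemmable(terms, stem):
    """Remove terms whose stem differs from them and is already in the set."""
    terms = set(terms)
    stemmable = {t for t in terms if stem(t) != t and stem(t) in terms}
    return terms - stemmable

# Hypothetical vocabulary for illustration.
vocab = {"robot", "robots", "walk", "walking", "walked", "meidos"}
print(sorted(drop_stemmable(vocab, toy_stem)))  # ['meidos', 'robot', 'walk']
```

Passing the stemmer in as a function makes it easy to run the same filter once per stemmer and compare what each one removes.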
Open file (197.83 KB 974x449 pcgrad.png)
Found a PyTorch implementation of PCGrad: https://github.com/WeiChengTseng/Pytorch-PCGrad Based on the paper Gradient Surgery for Multi-Task Learning: https://arxiv.org/pdf/2001.06782.pdf PCGrad fixes conflicts between two gradients that are causing learning interference, to ensure that when combined they move towards the optimal loss instead of a random direction. It performs best with a high learning rate, which is particularly interesting for training with extremely large batch sizes since higher learning rates can be used in those circumstances to improve training efficiency. PCGrad greatly enhances both accuracy and data efficiency and also works with reinforcement learning. I'm not sure how compatible it is with gradient accumulation. It seems no one has reported results on that yet, but I'll do some tests to find out. It's also fairly simple to implement. Implementing this in mlpack should be a piece of cake and could enable older hardware to outperform what a modern desktop can do today without PCGrad. Of course one with PCGrad would still rip older hardware to shreds, but I think it's cool this might enable older hardware to train voice synthesis models and such, given they have enough memory to hold the model. The official TensorFlow implementation is also available here: https://github.com/tianheyu927/PCGrad
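The core projection step is small enough to sketch in plain Python. This is a simplified illustration rather than either linked implementation: gradients are plain lists of floats, the function names are made up, and the other tasks are visited in a fixed order, whereas the paper samples that order randomly. When two gradients have a negative dot product, the conflicting component of one along the other is subtracted.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project_conflict(g_i, g_j):
    """Return g_i with its component that conflicts with g_j removed.

    If the gradients do not conflict (dot product >= 0), g_i is unchanged.
    """
    d = dot(g_i, g_j)
    if d >= 0:
        return list(g_i)
    scale = d / dot(g_j, g_j)
    return [x - scale * y for x, y in zip(g_i, g_j)]

def pcgrad(grads):
    """Project each task gradient against the others, then sum the results."""
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, other in enumerate(grads):
            if i != j:
                g = project_conflict(g, other)
        projected.append(g)
    return [sum(col) for col in zip(*projected)]

g1, g2 = [1.0, 0.0], [-1.0, 1.0]   # conflicting: dot(g1, g2) == -1
print(project_conflict(g1, g2))    # [0.5, 0.5]
print(pcgrad([g1, g2]))            # [0.5, 1.5]
```

After projection, g1 is orthogonal to g2, so adding the two no longer cancels progress along either task's direction.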
Python will soon have switch statements, resulting in even more readable code. Some comments say this resembles finite-state machines and will also increase the quality of the code. https://hackaday.com/2021/04/02/python-will-soon-support-switch-statements/
>>9476 >match That looks really nice. I hope C++ will adopt something like that soon. They have occasionally demonstrated a willingness to step outside the beaten path of diehard C fanatics and do something a lot better. Full regex pattern matching for the switch() statement in C++ would be a very, very powerful mechanism indeed. Typically I don't consider Python's typeless variable paradigm an advantage -- quite the opposite, in fact. However this match qualifies in my mind as an exception to that rule. Thanks Python-Anon! :^)
>>9476 This will be immensely useful for scripting symbolic AI. Using elif statement trains or dictionaries for switch cases is a train wreck.
>>9478 Python has had typing annotations since 3.5 and they've generally become standard practice because debugging without them is a nightmare.
def greeting(name: str) -> str:
    return 'Hello ' + name
The Python runtime doesn't enforce these types but they can be checked with the IDE or other tools like mypy before running.
python -m pip install --user mypy
mypy test.py && python test.py
Jupyter notebooks can also be checked using nbqa with mypy:
python -m pip install --user nbqa
nbqa mypy notebook.ipynb
Another useful package for Jupyter notebooks is nbconvert which can convert them to Python scripts and HTML.
python -m pip install --user nbconvert
jupyter nbconvert --to script notebook.ipynb
jupyter nbconvert --to html notebook.ipynb
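To make the "runtime doesn't enforce these types" point concrete, here is a tiny sketch; the function `double` is made up for illustration. The call with a string runs without complaint, which is exactly the kind of mistake a checker like mypy is meant to flag before the script is run.

```python
def double(x: int) -> int:
    return x * 2

# The annotation says int, but nothing stops a str from being passed at runtime.
print(double(4))      # 8
print(double("ab"))   # abab

# Annotations are just metadata stored on the function object.
print(double.__annotations__)
```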
>>9484 >typing annotations I see. I didn't realize any of that. Thanks for the detailed explanation, Anon!
Any nontrivial materials for learning llvmlite? I want to write a compiler for a MetaTrader-like language, meaning programs that don't run once sequentially. Instead, certain parts of the program run every "tick" (every time the price changes) and everything is conditional on the price change. Most examples are hello world languages, and I'm not sure how to begin writing a standard library for my hello world language that will support what I want.
>>9558 >any nontrivial materials for learning llvmlite? You can hardly get more 'nontrivial ' for learning some code than, well, the code itself, Anon. https://github.com/numba/llvmlite
>>9484 >Python has had typing annotations since 3.5 Starting with 3.9 this here seems to be interesting: >Type annotations for a tensor's shape, dtype, names, ... >Bye-bye bugs! Say hello to enforced, clear documentation of your code. >If (like me) you find yourself littering your code with comments like # x has shape (batch, hidden_state) or statements like assert x.shape == y.shape , just to keep track of what shape everything is, then this is for you. https://github.com/patrick-kidger/torchtyping
>>9673 Great find, anon. This is incredibly useful. I had no idea about typeguard either that does run-time type checking.
>>9673 Thanks, Anon. Much appreciated.
Here's a thread on Reddit about a problem with random in NumPy and PyTorch: https://www.reddit.com/r/MachineLearning/comments/mocpgj/p_using_pytorch_numpy_a_bug_that_plagues - It's about setting seeds in the data loader when using NumPy's random number generator. A lot of people got it wrong: of 100k+ projects on GitHub that were tested, 95% got it wrong. It's not a bug, but it's easy to overlook.
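As far as I understand the linked thread, the trap is that every data loader worker can start from the same random state, so they all draw the same "random" values. The sketch below only illustrates that idea with the standard library's random module: the helpers `worker_seed` and `sample_augmentations` are made up, and no PyTorch or NumPy is involved. In PyTorch the place to do the per-worker reseeding is the DataLoader's worker_init_fn argument.

```python
import random

def worker_seed(base_seed, worker_id):
    """Derive a distinct seed per worker (hypothetical helper)."""
    return base_seed + worker_id

def sample_augmentations(seed, n=3):
    """Stand-in for random augmentation parameters drawn inside a worker."""
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(n)]

base = 1234
# Problem case: every worker starts from the same RNG state.
same = [sample_augmentations(base) for _ in range(4)]
# Fix: each worker reseeds with its own id mixed in.
distinct = [sample_augmentations(worker_seed(base, wid)) for wid in range(4)]

print(same)      # four identical lists
print(distinct)  # lists that differ per worker
```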
>>9950 Surprising something like that could go unnoticed so long. C++'s standard library has a generator that uses hardware entropy device if available (almost always is in past 10 years). I wonder if Python provides something similar.
>The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets. >In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography. https://docs.python.org/3/library/secrets.html
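A short sketch of what using it looks like. All calls are from the standard library's secrets module, which draws from the operating system's randomness source; the variable names are just for illustration.

```python
import secrets
import string

token = secrets.token_hex(16)          # 32 hex characters (16 random bytes)
url_token = secrets.token_urlsafe(16)  # URL-safe text token
roll = secrets.randbelow(6) + 1        # integer from 1 to 6

# A 12-character password drawn from letters and digits.
alphabet = string.ascii_letters + string.digits
password = "".join(secrets.choice(alphabet) for _ in range(12))

print(token, url_token, roll, password)
```

For simulation or ML work where reproducibility matters, a seeded generator is still the right tool; secrets is for values an attacker must not be able to guess.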
Here are 70+ Python coding projects, to learn Python or get better at it. Some also strike me as useful for any home assistant. https://www.theinsaneapp.com/2021/06/list-of-python-projects-with-source-code-and-tutorials.html
>>10810 Thanks anon, this is good stuff! That one about age and gender detection with computer vision looks especially useful.
I'm actually working on a sentiment analysis neural network with TensorFlow, but it's actually pretty bad because I can't find a proper dataset to train it. Has anyone ever found something that could prove useful in the future?
>>10893 There's a 'dataset' thread Anon, maybe you can find something useful there? (>>2300)
Open file (351.60 KB 1920x1080 Workspace1_001.jpg)
Neat! I went for months (at least a year maybe?) despondent, angry & frustrated with Python because it's freakin hard to get anything to work with it. I even made my system entirely unusable one time trying to do a downgrade (had to nuke my whole box and start over). I tried for the 1'000th time it seems to get Spyder installed and running, and this time, have actually made some forward progress!
> I discovered this one stupid Python trick yesterday:
pip3 list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip3 install -U
Now I expect all the bitches will be clamoring for my attention of course. I had to re-run and re-run it, but finally after maybe a dozen times, enough of the system was upgraded sufficiently to actually bring the IDE up. Now if I can only figure out a way to downgrade the three libraries in the warning dialog, I can actually apply myself to learning Python soon. I'm not hopeful even now though, and I'm terribly gunshy after all my previous unsuccessful efforts getting Python dependencies sorted out.
Open file (505.69 KB 1920x1080 Workspace1_002.jpg)
>>11584 >Update: Well whadya know? I seemed to find a way that a) didn't destroy my machine in the process (always important!), and b) got rid of the error warning in Spyder.
pip uninstall jedi
pip install --upgrade jedi==0.17.2
pip uninstall parso
pip install --upgrade parso==0.7.0
pip uninstall pyls_spyder
pip install --upgrade pyls_spyder==0.3.2
> It came from a Lua site, of all places. code.luasoftware.com/tutorials/python/python-pip-downgrade-package/
>>11584 You probably should use virtual environments: https://realpython.com/python-virtual-environments-a-primer/
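For reference, the standard library can create such an environment from Python itself, which is what python -m venv does under the hood. This is a minimal sketch: the helper `make_env` and the directory name "myenv" are placeholders, and with_pip=False is used only to keep the example fast, so a real environment would normally be created with pip included.

```python
import tempfile
import venv
from pathlib import Path

def make_env(path):
    """Create a virtual environment at `path` and return its config file path."""
    # with_pip=False keeps the sketch fast; use with_pip=True for real use.
    venv.create(path, with_pip=False)
    return Path(path) / "pyvenv.cfg"

# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg = make_env(Path(tmp) / "myenv")
    print(cfg.exists())  # True
```

Packages installed inside an activated environment stay in that directory, so a broken dependency set can be fixed by deleting the folder instead of the whole system Python.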
Open file (44.83 KB 504x227 Warning_001.jpg)
Open file (6.22 KB 209x37 Selection_016.jpg)
Open file (313.37 KB 300x300 tableflip.exe.gif)
>>11589 >Update-update Turns out, the exact reason I wanted an IDE (rather than just a Python console) doesn't work on my machine even after downgrade-patching the lib dependencies. > I tried restarting several times, but didn't help. I noticed the IDE had a little message at the bottom 'LSP Python restarting' with a little orbiting icon. Finally the orbit stopped and the message changed to > maybe it's related to code-completion not working, etc. I don't want to hate Python, I really don't. It's simply impossible not to do, of course. D:< >pic related
>>11591 There seems to be an Anaconda installer, did anyone try that one? https://anaconda.org/anaconda/spyder Spyder really looks promising. Most machine learning scientists and engineers seem to use Jupyter notebooks, though. In case you didn't know. These two config frameworks were also mentioned, for managing configuration of ML projects without the notebooks: https://medium.com/pytorch/hydra-a-fresh-look-at-configuration-for-machine-learning-projects-50583186b710 https://github.com/toml-lang/toml
>>11591 What OS or distro are you using? Also:
python --version
pip --version
python -m pip --version
pip show torch tensorflow spyder
uname -a
>>11592 Jupyter notebooks are great for experimenting but not so much for writing libraries. Personally I write most of my code in Mousepad, which only has syntax highlighting, and only touch Spyder when I'm writing clean code for other people to use.
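The same information can be gathered from inside Python, which is handy when the python and pip on the PATH turn out to belong to different installs. A rough sketch using only the standard library (Python 3.8+ for importlib.metadata); the function names and the default package list are made up for illustration.

```python
import platform
import sys
from importlib import metadata

def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def environment_report(packages=("pip", "torch", "tensorflow", "spyder")):
    """Collect interpreter, platform and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "platform": platform.platform(),
    }
    for name in packages:
        report[name] = package_version(name)
    return report

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```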
Open file (20.03 KB 560x321 GottaPyFast.png)
Sentdex created an open-source Python code transformer model like GitHub Copilot. It has only been trained for 2 epochs so it's not great, but it's interesting and fun to play around with. I feel it's gonna be important to stay on top of these developments to keep up with AI-accelerated productivity, so I made an easy-to-use GUI to run inference with these models (just press Control+Tab to generate.)
GottaPyFast: https://gitlab.com/robowaifudev/gottapyfast/
Sentdex Demo: https://www.youtube.com/watch?v=1PMECYArtuk
Sentdex Model: https://huggingface.co/Sentdex/GPyT
How to use the model:
from transformers import AutoTokenizer, AutoModelWithLMHead
# set device to cuda if you have a GPU
device = "cpu"
tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
model = AutoModelWithLMHead.from_pretrained("Sentdex/GPyT").to(device)
def generate(model, query, max_length=100, top_p=0.9, temperature=1.0, do_sample=True):
    # the model was trained with newlines replaced by a special token
    newlinechar = "<N>"
    query = query.replace("\n", newlinechar)
    tokenized = tokenizer.encode(query, return_tensors="pt").to(model.device)
    response = model.generate(tokenized, max_length=max_length, do_sample=do_sample, top_p=top_p, temperature=temperature).to(model.device)
    decoded = tokenizer.decode(response[0])
    return decoded.replace("<N>","\n")
print(generate(model, "import torch", max_length=256, top_p=0.95, temperature=1.0))
Trying to package PyTorch and Transformers for Windows is a nightmare. I put together some notes here on how to get it to work: https://gitlab.com/robowaifudev/cookbook/-/wikis/python/pyinstaller I think using MLPack would be much more practical. PyTorch takes up a massive 3.1 GB which is completely unnecessary. Another option could be to rewrite the GPT2 transformer model to use NumPy instead which is only 5.4 MB. I'll look more into this later.
>>11683 Thanks for all your hard work here Anon. I apologize to you and everyone else here for being such a pussified faggot about Python. I recognize it's important to all of us, or else I wouldn't even consider picking it up. Please look into mlPack sooner rather than later if you at all can. It's probably our only real hope for doing waifu AI on a shoestring budget hardware-wise.
Open file (13.19 KB 849x445 chainer.png)
>>11684 MLPack's documentation is really lacking, especially for newer features, and it seems to be missing essential features. I'd have to sit down with it for 3-6 months to get transformers and text-to-speech models working in it. I'm looking into using Chainer which is built on top of NumPy and quite popular in Japan. A basic application with Chainer packaged with PyInstaller compresses down to 14 MB. On top of that there's already lots of ML models implemented in it. I think if I roll out some waifu tech with Chainer to garner interest we could get some more help to build things in MLPack, which will be particularly useful for embedded systems and actual physical robowaifu.
>>11688 Migration guide from PyTorch to Chainer https://chainer.github.io/migration-guide/
I've been compressing datasets with zstd and using them with streaming decompression to save space, reduce SSD wear and speed up access to compressed data. It's also useful for previewing datasets saved as zst as they download. I couldn't find anything readily available on the net on how to do it so hopefully this saves someone else some time:
# installation: python -m pip install zstandard ijson
import zstandard as zstd
import ijson
# streaming decompression for JSON files compressed with zstd --long=31
with open("laion_filtered.json.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648) # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        for record in ijson.items(reader, "item"):
            print(record)
import io
import json
# streaming decompression for NDJSON/JSONL files compressed with zstd --long=31
with open("00.jsonl.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648) # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        for line in io.BufferedReader(reader):
            record = json.loads(line)
            print(record)
import csv
# streaming decompression for TSV files compressed with zstd --long=31
with open("Image_Labels_Subset_Train_GCC-Labels-training.tsv.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648) # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        buffered_reader = io.BufferedReader(reader)
        wrapper = io.TextIOWrapper(buffered_reader)
        csv_reader = csv.reader(wrapper, delimiter='\t') # or csv.DictReader if it has fieldnames
        for record in csv_reader:
            print(record)
Also I recommend building zstd from source since the latest version has improved performance and compression.
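For anyone who wants to try the streaming pattern before installing zstandard, the same shape works with the standard library's gzip module. To be clear, this swaps in gzip as a stand-in and is not zstd: compression ratio and speed will differ, and there is no equivalent of --long=31. The file name and the helper names are made up for illustration.

```python
import gzip
import json
import os
import tempfile

def write_jsonl_gz(path, records):
    """Write records as gzip-compressed JSONL, one JSON object per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def iter_jsonl_gz(path):
    """Yield records one at a time without decompressing the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl.gz")  # hypothetical file name
    write_jsonl_gz(path, [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}])
    for record in iter_jsonl_gz(path):
        print(record)
```

Because iter_jsonl_gz is a generator, only one line is held in memory at a time, which is the property that matters for large datasets.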
>>18103 Excellent work Anon, thank you. Nice clean-looking code too, BTW.
Open file (79.43 KB 483x280 Screenshot_70.png)
Open file (124.54 KB 638x305 Screenshot_71.png)
Open file (124.48 KB 686x313 Screenshot_73.png)
Open file (186.01 KB 636x374 Screenshot_74.png)
Advanced use of exceptions in Python for reliability and debugging: >I Take Exception to Your Exceptions: Using Custom Errors to Get Your Point Across https://youtu.be/wJ5EO7tnDiQ (audio quality is a bit suboptimal)
>>24658 Great stuff NoidoDev. Exceptions are based.
I watched this video here in 1.75x speed as a refresher, since I had some gaps from not writing Python in quite a while. It might also work for beginners with experience in some other language: > Python As Fast as Possible - Learn Python in ~75 Minutes https://youtu.be/VchuKL44s6E As a beginner you should of course go slower and test the code you've learned. While we're at it, Python got a lot of new features, including compiling better to C-code now using Cython: https://youtu.be/e6zFlbEU76I
https://automatetheboringstuff.com/ Free online ebook. >Practical Programming for Total Beginners >If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you?
Related: Some anon wants to get started >>26616
>Python dependency-hell I think this might be one of the biggest barriers to entry. There are a lot of pre-compiled solutions like Backyard, but they all have limitations for me and are hard to tweak, like adding new voices\changing whisper models. My issue is most developers use Mac\Linux and I'm just a gamer on Windows. Getting started with Python on Windows is mostly just understanding environments, but a lot of programs require the WSL (Linux subsystem) and Triton. Here are the steps I use to get my basic bot going on Windows:
1.) Download\Install Visual Studio Code
2.) Download\Install Python and Git
3.) In Visual Studio Code, open the folder your project is in or the directory you cloned a GitHub repo to.
4.) Open a Terminal within Visual Studio Code and create a new Python environment for your project by running - python -m venv myenv - then activate it with myenv\Scripts\activate This step is crucial to avoid dependency issues in the future, as all packages you download will now be isolated to that environment. If you do not do this, all your package dependencies will become part of the "Global\System" Python environment and conflict with each other.
5.) Install PyTorch - https://pytorch.org/get-started/locally/ For my project, I used CUDA 12.1, so the command is - pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
6.) Once you have these installed, most other packages should install fine unless they require things like WSL\Triton. This should be enough to get Hugging Face transformers, Whisper and many TTS systems working for a basic chatbot.
Please add anything I missed!
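A common slip with the steps above is installing packages while the environment is not actually active. A small check like the following can be dropped at the top of a script to catch that; the function name `in_virtualenv` is made up, and the comparison relies on sys.prefix differing from sys.base_prefix inside a venv.

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    if in_virtualenv():
        print("running inside a virtual environment:", sys.prefix)
    else:
        print("running in the global/system Python, packages will install globally")
```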
>>35923 POTD <---> Super-helpful information, Barf! Especially the little trick about running the python -m venv myenv statement. This type of tutorial-style, concrete example is exactly what a community like this needs. It's why I spent long hours creating C++ learning resources here. Also, it's seemingly too easy for the old hands to forget what it was like being a newfag 'still wet behind the ears', and therefore failing to share the basics with other Anons here; 'one generation to the next' as it were. And that's a shame, because not only do the newcomers here often never get over these basic hurdles, but there's a real risk that the important information itself gets lost to the winds of time (or -- more likely -- cloistered behind info-barriers like D*xxcord, thereafter only available to the Globohomo itself all by-design on their parts, of course). >tl;dr I'm simply encouraging every Anon here yes, this means you! :D to share their knowledge one with the other. And don't worry even if you think it's been said here before! This board is absolutely packed with information, and it can be especially intimidating to the newcomer trying to find some pertinent bit of info. Please just share it on the spot (again) when & where it's most useful. (Also, crosslinks are good for this.) >ttl;dr Some things bear repeating here. :^) <---> Cheers, Anons. >=== -prose edit
Edited last time by Chobitsu on 01/21/2025 (Tue) 02:03:18.
Thanks for all this work, all.
On the chatbot front, I've been working to update my old Python chatbot to actually be a good companion. Here's a sample of my work, but I'm far from done. https://files.catbox.moe/pf85ai.zip
>>35955 Thanks, GreerTech! Personally, I'm a big fan of Cleverbot! [1] :DD JK, good luck with revamping it, Anon. Cheers. :^) --- 1. Remarkably, it's still available!! https://www.cleverbot.com/
>>35928 Fun bot and easy to install. Another suggestion for Python is using pre-compiled C binaries and just calling the CLI via python. It should make it a little faster, more modular and gets around having to install PyTorch so much easier to ship. Here are links to the latest Whisper and Piper binaries https://github.com/ggerganov/whisper.cpp/actions/runs/12886572193 https://github.com/rhasspy/piper/releases/tag/2023.11.14-2 It's kind of the best of both worlds since you can code in python and still get the speed\ease of pre-compiled binaries for the STT\TTS system.
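Calling a pre-compiled binary from Python mostly comes down to subprocess.run. The sketch below is deliberately generic: `run_cli` is a made-up helper, and since the exact command-line flags of the Whisper and Piper binaries are not given here, the demo uses the Python interpreter itself as a stand-in executable. Swap in the real binary path and its documented arguments.

```python
import subprocess
import sys

def run_cli(args, input_text=None, timeout=60):
    """Run an external program and return its stdout as text.

    Raises subprocess.CalledProcessError if the program exits non-zero.
    """
    result = subprocess.run(
        args,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout

# Stand-in for a real STT/TTS binary: a child Python that upper-cases stdin.
out = run_cli(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    input_text="hello",
)
print(out.strip())  # HELLO
```

Passing the arguments as a list avoids shell quoting problems, which matters on Windows where paths often contain spaces.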
>>35976 Thanks very kindly for the links here, Barf. Cheers. :^)
>>35976 I wonder, is it possible to make an AppImage (Linux) https://appimage.org/ or 0install (Linux, Windows and macOS) https://0install.net/ download of the program? These have all the files needed to run whatever program is installed, all in one place. No additional installations needed.
