/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.




Python General Robowaifu Technician 09/12/2019 (Thu) 03:29:04 No.159
Python Resources general

Python is by far the most common scripting language for AI/Machine Learning/Deep Learning frameworks and libraries. Post info on using it effectively.

wiki.python.org/moin/BeginnersGuide
https://archive.is/v9PyD

On my Debian-based distro, here's how I set up Python, PIP, TensorFlow, and the Scikit-Learn stack for use with AI development:
sudo apt-get install python python-pip python-dev
python -m pip install --upgrade pip
pip install --user tensorflow numpy scipy scikit-learn matplotlib ipython jupyter pandas sympy nose
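
After installing, a quick sanity check that the stack imports and reports its versions can save headaches later (a minimal sketch assuming the packages above installed cleanly; version numbers will differ per system):

import tensorflow as tf
import numpy as np
import sklearn

# if any of these imports fail, the corresponding pip install above didn't take
print("tensorflow:", tf.__version__)
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)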


LiClipse is a good Python IDE choice, and there are a number of others.
www.liclipse.com/download.html
https://archive.is/glcCm
>>5767 Once robowaifus capture the public's attention, maybe we can start a foundation that crowdfunds & maintains a good accelerator serverfarm, serving as the core linchpin for a robowaifu@home system?
I'll put the code for going through this textfile here >>7907 into this thread, since I'm using Python 3 for my work. It can serve as an example. I'm using IPython3 as my dev environment, which is sufficient here.
>33K line, reverse-sorted text file of word_post_counts (one count for a word per post, excluding dupes).
>https://files.catbox.moe/o38sax.txt
A little bit of criticism: the text file uses spaces between the counter and the term, and not always the same amount. A tab would have been better for parsing, no big deal though. Also, after working on it for a bit I realized that the counting should happen later; also not a big deal, since I got the data. In general the whole thing turned out to be more difficult than I thought at the beginning.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

file = open('/path/RobowaifuBoard/o38sax.txt', 'r')
data = file.read()
file.close()

parsed = data.split('\n')
parsed2 = [term.split(' ') for term in parsed]

# just a peek into the data, list entry no. 4223
parsed2[4223]
# returns: ['10', ' wholesome'] - LOL

# this removes the counter, only the term will be in the list
# we'll use the counter later, by making a dict out of the parsed2 data
tokens = [token[-1] for token in parsed2]

# removing some common words from the list
clean_tokens = tokens[:]
sr = stopwords.words('english')
for token in tokens:
    if token in sr:
        clean_tokens.remove(token)

# the first line here is unnecessary, since the data was lower case already
# the second removes all non-alphabetical characters from every entry
# this could be a problem with names, so maybe keep the numbers or make a diff later?
# the { } makes it return a set, not a list, which means every duplicate entry will be removed
lower_tokens_set = {token.lower() for token in clean_tokens}
alpha_only_set = {''.join(c for c in string if c.isalpha()) for string in lower_tokens_set if len(string) > 3}

# this breaks down every KNOWN entry to its simple grammatical form
# it uses the WordNet corpus for that and again removes duplicate entries
lemmatizer = WordNetLemmatizer()
lemmatized = {lemmatizer.lemmatize(token) for token in list(alpha_only_set)}

len(lemmatized)  # will return 21016, which is still waaay too much, but at least a start.
>>7911 I'm the author of that system. Give me an exact layout you'd prefer for your work and I can post a link to that ITT.
>>7913 I might add again that this is an optimization approach for millisecond-fast searches across all of /robowaifu/ . Our shitposting waifus will need something like that available to them. It doesn't represent a clear ordering (for example, each word is only counted once per post). But by the same token, all the text is available as simple text (and I fully parse that as well during the 2-3 second startup).
>>7913 Thanks, I'm working on other things as well and for now I should stop, so this isn't so urgent. But handing out the data as clean and homogeneous as possible is always a good thing, like keeping the spacing between entries consistent, or not having empty ones. However, it's not a big problem; I worked around it.
- Filtering out everything which starts with a capital letter but isn't at the beginning of a sentence might be worthwhile.
- Also all tokens which consist only of upper-case letters.
- Then, returning all pairs or triplets of words might be great, though the file will be bigger then. Maybe this could help us filter out common phrases.
This might be more complicated as well, so are you really sure that there isn't already a library doing all of this?
>>7911

# cleaning empty string(s)
parsed2 = [token for token in parsed2 if token[0] != '']

# putting all the terms with their counters in a dict, so I can call the counters (as an integer)
adict = {}
for token in parsed2:
    adict[token[-1].strip()] = int(token[0])

adict['wholesome']  # will return 10
>>7916
>so are you really sure that there isn't already a library doing all of this?
Haha, you tell me. I'm inventing this as I go along, but yeah, probably. All I know is that we need blazing-fast performance and efficiency if we are to succeed at getting robowaifus working with older and smaller computers. But that's a far better approach than being dragged along by """TPTB""" with a hook in our jaws, unable to run even a modest algorithm w/o handing over a lot of gold for the latest in botnet hardware loaded up with full-bloat pozzware. Only our own software can be trusted. And if I had even the slightest chance of creating effective computers for our waifus, then you could bet I would be learning how to create open-source hardware too. Sure, I can set up sliding-window datasets of the texts. I figured that 3-, 5-, and 7-word sequences might be useful as a modest beginning. I guess I can probably generate all 3 sets in roughly two or three seconds now, but we'd have to see. I'll need to finish getting this system knocked into shape over the next week or so and then I'll have a look at your idea, Anon.
>>7916 I still need to work on this. I need to filter out the tokens with quotation marks and add the counters together if some term appears twice. Terms with "/" need to be handled as a special case, because "/robowaifu/" isn't the same as robowaifu, and "alogs/robowaifu" can't become "alogsrobowaifu". For the search on topics this might not be relevant, though. Since dicts seem not to allow special chars in their keys, I might need to handle these entries separately.
>>7919 Hey there, since it's trivial enough to do for me now, I went ahead and created a plain text file for you of every post on robowaifu (as of a few minutes ago, so you can even find your post there). I have a rudimentary approach to cleaning up the text just a bit before inserting it into the data containers so it's not an exact match to the posts. But, you might find it useful and I hope you can. Cheers Anon. https://files.catbox.moe/e37qds.7z
>>7922 Thanks, that's great. Except that you now filter out all non-alphabetical characters? A reference to the board like "/robowaifu/" becomes "robowaifu" now, which reads like a mention of a robowaifu. Cleaning these extra chars out should probably be done later, before counting singular words, but some will then still need to be a special case; only when used as a key for a dict can the "/" not be in there. Some of the quotes are also being used to show that terms belong together, like the name of a movie. Such things should be filtered out and stored into an extras list or file.
>>7923 Heh, in a word: Yes. We have two different agendas going on here: A) atm, mine is to create a search system that responds much faster than human perception and provides reasonably plausible links for the user to pursue further information; B) yours is an AI project trying to pin down exact semantic meanings, etc. Think of my tool as one that your tool can use as a front-end to quickly locate research material. For that material, you have the full JSON files to parse exactly as you see fit, ofc.
Open file (491.22 KB 1192x894 gambatte.jpg)
>>7923 Hey Anon, so as a Christmas present to you I created another version of the all_posts.txt file that doesn't have much in the way of filtering at all. I hope it can be of some help to you. Good luck with your project. online text: https://files.catbox.moe/h6jp5n.txt zip you can download: https://files.catbox.moe/ub266f.7z Merry Christmas!
>>7937 Thanks. I'm just occasionally trying out if and how we can use this to find the most interesting terms on the board and per posting, so that the index thread can be, at least in part, created automatically.
>>7916
# cleaning the space from the terms in the entry
parsed3 = [(token[0], token[-1].strip()) for token in parsed2]

# also removing everything which is not a letter or number
parsed4 = [(token[0], ''.join(c for c in token[-1] if c.isalpha() or c.isnumeric())) for token in parsed3]

# creating a dict, adding together the counters of entries which appear twice due to cleaning
better_dict = {}
for token in parsed4:
    if token[-1] in better_dict:
        better_dict[token[-1]] = better_dict[token[-1]] + int(token[0])
    else:
        better_dict[token[-1]] = int(token[0])

# Test
better_dict['wholesome']
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

# finding all the terms that lemmatizing didn't reduce:
# if the simple form is already in the set "lemmatized_longerthan3", then the longer one can be removed.
# First step is to have a set with those.
stemmeable = {token for token in lemmatized_longerthan3 if stemmer.stem(token) in lemmatized_longerthan3 and token != stemmer.stem(token)}

# Then the one is subtracted from the other, which kills around 3000 terms.
longerthan3 = list(set(lemmatized_longerthan3) - stemmeable)

# For a side-by-side peek into it, uncomment the following lines:
# comp = [(token, stemmer.stem(token)) for token in stemmeable]
# comp[42]

# The Lancaster stemmer takes even some more out; I think it's good to have both results to work with.
# For now I subtract both from the data.
from nltk.stem.lancaster import LancasterStemmer
st = LancasterStemmer()
lancastered = {token for token in lemmatized_longerthan3 if st.stem(token) in lemmatized_longerthan3 and token != st.stem(token)}
longerthan3 = list(set(lemmatized_longerthan3) - stemmeable - lancastered)

We are down to only around 17k terms now. Hurray. Next, I'll try to use a list with the most common terms in English to get rid of those. If that doesn't bring the number down a lot, this whole approach has failed. We currently have more than 7k terms appearing only once and 1.5k terms appearing more than 20 times. Looking into these lists, I can't see any pattern emerging.
Open file (197.83 KB 974x449 pcgrad.png)
Found a PyTorch implementation of PCGrad: https://github.com/WeiChengTseng/Pytorch-PCGrad Based on the paper Gradient Surgery for Multi-Task Learning: https://arxiv.org/pdf/2001.06782.pdf PCGrad fixes conflicts between two gradients that cause learning interference, ensuring that when combined they move towards the optimal loss instead of a random direction. It performs best with a high learning rate, which is particularly interesting for training with extremely large batch sizes, since higher learning rates can be used in those circumstances to improve training efficiency. PCGrad greatly enhances both accuracy and data efficiency and also works with reinforcement learning. I'm not sure how compatible it is with gradient accumulation. It seems no one has reported results on that yet, but I'll do some tests to find out. It's also fairly simple to implement. Implementing this in mlpack should be a piece of cake and could enable older hardware to outperform what a modern desktop can do today without PCGrad. Of course, a modern desktop with PCGrad would still rip older hardware to shreds, but I think it's cool this might enable older hardware to train voice synthesis models and such, given it has enough memory to hold the model. The official TensorFlow implementation is also available here: https://github.com/tianheyu927/PCGrad
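
To make the usage concrete, here is a rough sketch of how the linked PyTorch wrapper is typically applied to two task losses. The PCGrad class and pc_backward method names are assumptions taken from that repo's README as I recall it, so check the repo for the exact API:

import torch
import torch.nn as nn
import torch.optim as optim
from pcgrad import PCGrad  # assumed module layout of the linked repo

model = nn.Linear(16, 2)  # toy shared model; each output column is one task head
optimizer = PCGrad(optim.Adam(model.parameters(), lr=1e-3))  # wrap the base optimizer

x = torch.randn(8, 16)
y1, y2 = torch.randn(8), torch.randn(8)
out = model(x)

# one loss per task; PCGrad projects away the conflicting gradient components
losses = [nn.functional.mse_loss(out[:, 0], y1),
          nn.functional.mse_loss(out[:, 1], y2)]

optimizer.zero_grad()
optimizer.pc_backward(losses)  # replaces the usual loss.backward() for the multi-task losses
optimizer.step()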
Python will soon have switch statements, resulting in even more readable code. Some comments say this resembles finite-state machines and will also increase code quality. https://hackaday.com/2021/04/02/python-will-soon-support-switch-statements/
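
For anyone curious what the new syntax looks like, here is a minimal sketch of the structural pattern matching ("match/case") that landed in Python 3.10 (the example command handler is just an illustration):

def handle_command(command: str) -> str:
    # match/case dispatches on the structure of the value, not just equality
    match command.split():
        case ["stop"]:
            return "stopping"
        case ["move", direction]:
            return f"moving {direction}"
        case ["say", *words]:
            return "saying " + " ".join(words)
        case _:
            return "unknown command"

print(handle_command("move left"))       # moving left
print(handle_command("say hello anon"))  # saying hello anon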
>>9476 >match That looks really nice. I hope C++ will adopt something like that soon. They have occasionally demonstrated a willingness to step outside the beaten path of diehard C fanatics and do something a lot better. Full regex pattern matching for the switch() statement in C++ would be a very, very powerful mechanism indeed. Typically I don't consider Python's typeless variable paradigm an advantage -- quite the opposite, in fact. However this match qualifies in my mind as an exception to that rule. Thanks Python-Anon! :^)
>>9476 This will be immensely useful for scripting symbolic AI. Using elif statement trains or dictionaries for switch cases is a train wreck.
>>9478 Python has had typing annotations since 3.5 and they've generally become standard practice because debugging without them is a nightmare.

def greeting(name: str) -> str:
    return 'Hello ' + name

The Python runtime doesn't enforce these types, but they can be checked with the IDE or other tools like mypy before running.

python -m pip install --user mypy
mypy test.py && python test.py

Jupyter notebooks can also be checked using nbqa with mypy:

python -m pip install --user nbqa
nbqa mypy notebook.ipynb

Another useful package for Jupyter notebooks is nbconvert, which can convert them to Python scripts and HTML.

python -m pip install --user nbconvert
jupyter nbconvert --to script notebook.ipynb
jupyter nbconvert --to html notebook.ipynb
>>9484 >typing annotations I see. I didn't realize any of that. Thanks for the detailed explanation, Anon!
Are there any nontrivial materials for learning llvmlite? I want to write a compiler for a MetaTrader-like language, meaning programs that don't just run once sequentially; instead, certain parts of the program run every "tick" (i.e. every time the price changes), and everything is conditional on price changes. Most examples are hello-world languages, and I'm not sure how to begin writing a standard library for my hello-world language that will support what I want.
>>9558
>any nontrivial materials for learning llvmlite?
You can hardly get more 'nontrivial' for learning some code than, well, the code itself, Anon. https://github.com/numba/llvmlite
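
As a starting point, here is a small sketch using llvmlite's IR builder: it emits an LLVM function the host engine could call on every price tick, which keeps the "run this on every tick" logic on the host side rather than in the language itself. The function name and toy body are just illustrations:

from llvmlite import ir

module = ir.Module(name="metatrader_like")
double = ir.DoubleType()

# double on_tick(double price) -- the host would call this on every price change
fnty = ir.FunctionType(double, [double])
on_tick = ir.Function(module, fnty, name="on_tick")
block = on_tick.append_basic_block(name="entry")
builder = ir.IRBuilder(block)

price = on_tick.args[0]
doubled = builder.fmul(price, ir.Constant(double, 2.0), name="doubled")  # toy body: return price * 2.0
builder.ret(doubled)

print(module)  # dumps the generated LLVM IR; JIT-compile it with llvmlite.binding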
>>9484
>Python has had typing annotations since 3.5
Starting with 3.9, this here seems to be interesting:
>Type annotations for a tensor's shape, dtype, names, ...
>Bye-bye bugs! Say hello to enforced, clear documentation of your code.
>If (like me) you find yourself littering your code with comments like # x has shape (batch, hidden_state) or statements like assert x.shape == y.shape, just to keep track of what shape everything is, then this is for you.
https://github.com/patrick-kidger/torchtyping
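
A short sketch of what that looks like in practice. The TensorType, patch_typeguard and typeguard usage follow the project's README as I remember it, so treat the exact names as assumptions:

import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked

patch_typeguard()  # enables run-time shape checking through typeguard

@typechecked
def mean_over_hidden(x: TensorType["batch", "hidden"]) -> TensorType["batch"]:
    # the annotation documents (and checks) that we reduce over the hidden dimension
    return x.mean(dim=-1)

print(mean_over_hidden(torch.randn(4, 16)).shape)  # torch.Size([4])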
>>9673 Great find, anon. This is incredibly useful. I had no idea about typeguard either, which does run-time type checking.
>>9673 Thanks, Anon. Much appreciated.
Here's a thread on Reddit about a problem with random numbers in NumPy and PyTorch: https://www.reddit.com/r/MachineLearning/comments/mocpgj/p_using_pytorch_numpy_a_bug_that_plagues - It's about setting seeds in the data loader when using NumPy's random number generator. A lot of people got it wrong: of the 100k+ projects tested on GitHub, 95% got it wrong. It's not a bug, but it's easy to overlook.
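
For context, the issue is that NumPy's global RNG state gets duplicated into every DataLoader worker process, so all workers produce identical "random" values. A sketch of the commonly recommended fix, per-worker reseeding via worker_init_fn (the dataset class here is just illustrative):

import random
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class NoisyDataset(Dataset):
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        # problematic pattern: relies on NumPy's global RNG inside worker processes
        return np.random.rand()

def seed_worker(worker_id):
    # derive a distinct NumPy/random seed from PyTorch's per-worker seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

loader = DataLoader(NoisyDataset(), num_workers=2, worker_init_fn=seed_worker)
for batch in loader:
    print(batch)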
>>9950 Surprising something like that could go unnoticed for so long. C++'s standard library has a generator that uses a hardware entropy device if available (which it almost always is in machines from the past 10 years). I wonder if Python provides something similar.
>The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.
>In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography.
https://docs.python.org/3/library/secrets.html
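
A quick sketch of the module in use (all calls shown are documented standard-library functions):

import secrets

token = secrets.token_hex(16)            # 32-char hex string from the OS CSPRNG
url_token = secrets.token_urlsafe(16)    # URL-safe token, e.g. for session IDs
pin = secrets.randbelow(10_000)          # cryptographically strong int in [0, 10000)
pick = secrets.choice(["red", "green", "blue"])  # secure random selection

print(token, url_token, pin, pick)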
Here are 70+ Python coding projects for learning Python or getting better at it. Some also strike me as useful for any home assistant. https://www.theinsaneapp.com/2021/06/list-of-python-projects-with-source-code-and-tutorials.html
>>10810 Thanks anon, this is good stuff! That one about age and gender detection with computer vision looks especially useful.
I'm working on a sentiment analysis neural network with TensorFlow, but it's pretty bad because I can't find a proper dataset to train it on. Has anyone ever found something that could prove useful in the future?
>>10893 There's a 'dataset' thread Anon, maybe you can find something useful there? (>>2300)
Open file (351.60 KB 1920x1080 Workspace1_001.jpg)
Neat! I went for months (at least a year, maybe?) despondent, angry & frustrated with Python because it's freakin hard to get anything to work with it. I even made my system entirely unusable one time trying to do a downgrade (had to nuke my whole box and start over). I tried for the 1'000th time it seems to get Spyder installed and running, and this time, have actually made some forward progress!
>I discovered this one stupid Python trick yesterday:
pip3 list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip3 install -U
Now I expect all the bitches will be clamoring for my attention, of course. I had to re-run and re-run it, but finally after maybe a dozen times, enough of the system was upgraded sufficiently to actually bring the IDE up. Now if I can only figure out a way to downgrade the three libraries in the warning dialog, I can actually apply myself to learning Python soon. I'm not hopeful even now though, and I'm terribly gunshy after all my previous unsuccessful efforts getting Python dependencies sorted out.
Open file (505.69 KB 1920x1080 Workspace1_002.jpg)
>>11584
>Update: Well whaddya know? I seemed to find a way that a) didn't destroy my machine in the process (always important!), and b) got rid of the error warning in Spyder.

pip uninstall jedi
pip install --upgrade jedi==0.17.2
pip uninstall parso
pip install --upgrade parso==0.7.0
pip uninstall pyls_spyder
pip install --upgrade pyls_spyder==0.3.2

It came from a Lua site, of all places. code.luasoftware.com/tutorials/python/python-pip-downgrade-package/
>>11584 You probably should use virtual environments: https://realpython.com/python-virtual-environments-a-primer/
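
A minimal sketch of that workflow, in the same spirit as the pip commands above (the environment path and packages are just examples): create an isolated environment so IDE and library upgrades can't break the system Python.

# create and activate a dedicated environment
python3 -m venv ~/venvs/robowaifu
source ~/venvs/robowaifu/bin/activate

# everything installed now stays inside ~/venvs/robowaifu
python -m pip install --upgrade pip
python -m pip install spyder tensorflow

# leave the environment when done
deactivate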
Open file (44.83 KB 504x227 Warning_001.jpg)
Open file (6.22 KB 209x37 Selection_016.jpg)
Open file (313.37 KB 300x300 tableflip.exe.gif)
>>11589
>Update-update
Turns out, the exact feature I wanted an IDE for (rather than just a Python console) doesn't work on my machine even after downgrade-patching the lib dependencies.
>
I tried restarting several times, but it didn't help. I noticed the IDE had a little message at the bottom, 'LSP Python restarting', with a little orbiting icon. Finally the orbit stopped and the message changed to
>
Maybe it's related to code completion not working, etc. I don't want to hate Python, I really don't. It's simply impossible not to, of course. D:<
>pic related
>>11591 There seems to be an Anaconda installer, has anyone tried that one? https://anaconda.org/anaconda/spyder Spyder really looks promising. Most machine learning scientists and engineers seem to use Jupyter notebooks, though, in case you didn't know. These two config frameworks were also mentioned for managing the configuration of ML projects without notebooks: https://medium.com/pytorch/hydra-a-fresh-look-at-configuration-for-machine-learning-projects-50583186b710 https://github.com/toml-lang/toml
>>11591 What OS or distro are you using? Also:

python --version
pip --version
python -m pip --version
pip show torch tensorflow spyder
uname -a

>>11592 Jupyter notebooks are great for experimenting but not so much for writing libraries. Personally I write most of my code in Mousepad, which only has syntax highlighting, and only touch Spyder when I'm writing clean code for other people to use.
Open file (20.03 KB 560x321 GottaPyFast.png)
Sentdex created an open-source Python code transformer model like GitHub Copilot. It has only been trained for 2 epochs so it's not great, but it's interesting and fun to play around with. I feel it's gonna be important to stay on top of these developments to keep up with AI-accelerated productivity, so I made an easy-to-use GUI to inference these models (just press Control+Tab to generate).
GottaPyFast: https://gitlab.com/robowaifudev/gottapyfast/
Sentdex Demo: https://www.youtube.com/watch?v=1PMECYArtuk
Sentdex Model: https://huggingface.co/Sentdex/GPyT
How to use the model:

from transformers import AutoTokenizer, AutoModelWithLMHead

# set device to "cuda" if you have a GPU
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
model = AutoModelWithLMHead.from_pretrained("Sentdex/GPyT").to(device)

def generate(model, query, max_length=100, top_p=0.9, temperature=1.0, do_sample=True):
    newlinechar = "<N>"
    query = query.replace("\n", newlinechar)
    tokenized = tokenizer.encode(query, return_tensors="pt").to(model.device)
    response = model.generate(tokenized, max_length=max_length, do_sample=do_sample,
                              top_p=top_p, temperature=temperature)
    decoded = tokenizer.decode(response[0])
    return decoded.replace("<N>", "\n")

print(generate(model, "import torch", max_length=256, top_p=0.95, temperature=1.0))
Trying to package PyTorch and Transformers for Windows is a nightmare. I put together some notes here on how to get it to work: https://gitlab.com/robowaifudev/cookbook/-/wikis/python/pyinstaller I think using MLPack would be much more practical. PyTorch takes up a massive 3.1 GB, which is completely unnecessary. Another option could be to rewrite the GPT-2 transformer model to use NumPy instead, which is only 5.4 MB. I'll look more into this later.
>>11683 Thanks for all your hard work here, Anon. I apologize to you and everyone else here for being such a pussified faggot about Python. I recognize it's important to all of us, or else I wouldn't even consider picking it up. Please look into MLPack sooner rather than later if you can at all. It's probably our only real hope for doing waifu AI on shoestring-budget hardware.
Open file (13.19 KB 849x445 chainer.png)
>>11684 MLPack's documentation is really lacking, especially for newer features, and it seems to be missing essential features. I'd have to sit down with it for 3-6 months to get transformers and text-to-speech models working in it. I'm looking into using Chainer, which is built on top of NumPy and quite popular in Japan. A basic application with Chainer packaged with PyInstaller compresses down to 14 MB. On top of that, there are already lots of ML models implemented in it. I think if I roll out some waifu tech with Chainer to garner interest, we could get more help to build things in MLPack, which will be particularly useful for embedded systems and actual physical robowaifus.
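
For anyone who hasn't seen it, a small sketch of what a Chainer model looks like (standard Chainer Chain/Link API; the toy network itself is just an illustration):

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class TinyNet(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, 32)  # input size inferred on first call
            self.l2 = L.Linear(32, 2)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

model = TinyNet()
x = np.random.rand(4, 16).astype(np.float32)
print(model(x).shape)  # (4, 2)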
>>11688 Migration guide from PyTorch to Chainer https://chainer.github.io/migration-guide/
I've been compressing datasets with zstd and using them with streaming decompression to save space, reduce SSD wear and speed up access to compressed data. It's also useful for previewing datasets saved as zst as they download. I couldn't find anything readily available on the net on how to do it, so hopefully this saves someone else some time:

# installation: python -m pip install zstandard ijson
import zstandard as zstd
import ijson

# streaming decompression for JSON files compressed with zstd --long=31
with open("laion_filtered.json.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648)  # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        for record in ijson.items(reader, "item"):
            print(record)

import io
import json

# streaming decompression for NDJSON/JSONL files compressed with zstd --long=31
with open("00.jsonl.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648)  # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        for line in io.BufferedReader(reader):
            record = json.loads(line)
            print(record)

import csv

# streaming decompression for TSV files compressed with zstd --long=31
with open("Image_Labels_Subset_Train_GCC-Labels-training.tsv.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648)  # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        buffered_reader = io.BufferedReader(reader)
        wrapper = io.TextIOWrapper(buffered_reader)
        csv_reader = csv.reader(wrapper, delimiter='\t')  # or csv.DictReader if it has fieldnames
        for record in csv_reader:
            print(record)

Also, I recommend building zstd from source, since the latest version has improved performance and compression.
>>18103 Excellent work Anon, thank you. Nice clean-looking code too, BTW.
Open file (79.43 KB 483x280 Screenshot_70.png)
Open file (124.54 KB 638x305 Screenshot_71.png)
Open file (124.48 KB 686x313 Screenshot_73.png)
Open file (186.01 KB 636x374 Screenshot_74.png)
Advanced use of exceptions in Python for reliability and debugging: >I Take Exception to Your Exceptions: Using Custom Errors to Get Your Point Across https://youtu.be/wJ5EO7tnDiQ (audio quality is a bit suboptimal)
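
In the spirit of the talk, a small sketch of a custom exception hierarchy (all the names here are illustrative, not taken from the video):

class RobowaifuError(Exception):
    """Base class so callers can catch everything from this package at once."""

class ServoTimeoutError(RobowaifuError):
    def __init__(self, joint: str, timeout_s: float):
        super().__init__(f"servo '{joint}' did not respond within {timeout_s:.1f}s")
        self.joint = joint
        self.timeout_s = timeout_s

def move_arm(joint: str):
    # pretend the hardware never answered
    raise ServoTimeoutError(joint, 0.5)

try:
    move_arm("left_elbow")
except RobowaifuError as e:  # one except clause covers the whole hierarchy
    print(f"recovering from: {e} (joint={e.joint})")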
>>24658 Great stuff NoidoDev. Exceptions are based.
I watched this video at 1.75x speed as a refresher, since I had some gaps from not writing Python in quite a while. It might also work for beginners with experience in some other language:
>Python As Fast as Possible - Learn Python in ~75 Minutes
https://youtu.be/VchuKL44s6E
As a beginner you should of course go slower and test the code you've learned. While we're at it, Python has gained a lot of new features, and it now compiles better to C code using Cython: https://youtu.be/e6zFlbEU76I
https://automatetheboringstuff.com/
Free online ebook.
>Practical Programming for Total Beginners
>If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you?
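
In that spirit, a tiny sketch of the kind of chore the book automates: batch-renaming text files with a date prefix (the directory and naming scheme are just examples):

import datetime
import os

prefix = datetime.date.today().isoformat()
for name in os.listdir("."):
    # only touch .txt files that haven't already been renamed
    if name.endswith(".txt") and not name.startswith(prefix):
        os.rename(name, f"{prefix}_{name}")
        print("renamed", name)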
Related: Some anon wants to get started >>26616
