/robowaifu/ - Python General

Robowaifu Technician 09/19/2019 (Thu) 10:58:04 No.463

Learn Python the Hard Way

learnpythonthehardway.org/book/
https://archive.is/Ah7jU

Robowaifu Technician 09/19/2019 (Thu) 10:58:56 No.464

>Best online resource to learn Python? [closed] (StackOverflow)
stackoverflow.com/questions/70577/best-online-resource-to-learn-python
https://archive.is/rxB7U

Python Resources (UC Berkeley)
https://archive.is/qcd2R

docs.python-guide.org/en/latest/intro/learning/
https://archive.is/D3yge

Robowaifu Technician 04/17/2020 (Fri) 01:52:14 No.2428

Does the Python Anon have a recommendation for a good IDE for the language? The OP here has LiClipse; what is your recommendation? Linux ofc.

Robowaifu Technician 04/17/2020 (Fri) 04:00:30 No.2429

>>2428 Recently I've started using Spyder, since it can show function argument tooltips for imported scripts without problems and has an interactive console with command history. It's also open-source. >Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It offers a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package. https://www.spyder-ide.org/

sudo apt-get install spyder3

Robowaifu Technician 04/17/2020 (Fri) 23:07:49 No.2435

>>2429 Thanks Anon, got it. Looks good so far! :^) BTW, I had to use Anaconda to install it, and I'm glad I did now.

sudo conda install -c anaconda spyder

Robowaifu Technician 04/17/2020 (Fri) 23:08:16 No.2436

>>2435

Robowaifu Technician 10/17/2020 (Sat) 18:53:39 No.5747

I've been trying to find Tensorflow wheels with no AVX. I found one for Tensorflow 1.14.0 on Python 3.7 here: https://github.com/yaroslavvb/tensorflow-community-wheels/issues/135

But there still appear to be some AVX instructions in pywrap, tf2xla and libtensorflow_framework, similar to the problem I had with my own build. I found them using this script: https://gitlab.com/kokubunji/cookbook/-/raw/master/check_avx.sh

find -L . -iname "*.so*" -exec check_avx.sh {} \; > avx.log

However, someone on that page said it worked for them. If anyone feels like testing it, I'm leaving some instructions here on how to install it, along with my attempt to build Tensorflow 2.3.1 for Python 3.8 with SSE4 and no AVX: https://anonfiles.com/ncG12cg2pb/tensorflow-2.3.1-cp38-cp38-linux_x86_64_whl

A wheel only works with the Python version it was built for. If your system Python is 3.8, you will have to build Python 3.7 from source to use the 1.14.0 wheel; likewise, if your system Python is 3.7 and you want Tensorflow 2.3.1, you will have to build Python 3.8.

On Debian Buster, building Python requires: zlib1g-dev libffi-dev libssl-dev libbz2-dev libncursesw5-dev libgdbm-dev liblzma-dev libsqlite3-dev tk-dev uuid-dev libreadline-dev

To build Python 3.7.9 from source:

wget https://www.python.org/ftp/python/3.7.9/Python-3.7.9.tar.xz
tar -xvf Python-3.7.9.tar.xz
cd Python-3.7.9
./configure --enable-optimizations
make -j4

Next install pip into it:

wget https://bootstrap.pypa.io/get-pip.py
./python get-pip.py

To install Tensorflow into this Python with pip:

# use the path to wherever you downloaded the Tensorflow wheel
./python -m pip install -U -t Lib ../tensorflow-1.14.0-cp37-cp37m-linux_x86_64.whl

Or from URL:

./python -m pip install -U -t Lib https://furas.pl/projects/tensorflow-no-avx/bin/tensorflow-1.14.0-cp37-cp37m-linux_x86_64.whl

Python source downloads: https://www.python.org/downloads/source/ Tensorflow community wheels: https://github.com/yaroslavvb/tensorflow-community-wheels/issues

Robowaifu Technician 10/17/2020 (Sat) 19:08:01 No.5748

>>5747 Wow, thanks for all the nice instructions, and for what must have been a metric ton of hard work getting through all this, Anon. It gives me more confidence about Python tbh.

Robowaifu Technician 10/17/2020 (Sat) 20:53:24 No.5749

>>5748 It was more time-consuming than anything. I also put up some instructions on how to build Tensorflow from source here: https://gitlab.com/kokubunji/cookbook/-/wikis/machine-learning/building-tensorflow-from-source When I have some time I'll do PyTorch next. According to the Steam Hardware Survey, only 78% of Steam users have CPUs that support AVX2, while 98% support SSE4. If we're going to create virtual waifus in Godot, we'll need to deploy them with working AVX-free wheels.

Robowaifu Technician 10/18/2020 (Sun) 05:14:19 No.5758

>>5749 > only 78% of Steam users have CPUs that support AVX2 I admire what you're doing. But tbh, for a moment I wondered how many people would have a CPU without any AVX. You or someone else wrote that Tensorflow needs AVX in general, not AVX2. The oldest processors with AVX came out in 2011: https://en.m.wikipedia.org/wiki/Advanced_Vector_Extensions However, I think I get it: if there are some million CPUs which could still be used, at least for inference, it would be a waste to exclude them. Also, if future versions require a newer version of AVX, one could still fall back to the version without that requirement. Are there similar restrictions on old graphics cards?

Robowaifu Technician 10/18/2020 (Sun) 05:37:32 No.5761

>>5758 >I was wondering how many people would have a CPU without any AVX. I'm certainly one of them, and I've been here on /robowaifu/ since its very beginning (and even earlier haha)

Robowaifu Technician 10/18/2020 (Sun) 06:55:00 No.5765

>>5758 Tensorflow and PyTorch are built with SSE4.1 SSE4.2 AVX AVX2 FMA. You can find which instructions people have under Other Settings: https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam AVX is 93.67%.

The official binaries of PyTorch 1.6.0 and Tensorflow 2.3.0 dropped support for my GPU (which only came out in January 2019). I'm probably gonna have a Frankenstein build of Linux in a few years just to keep my GPU working with the latest versions. In the meantime, to get them to work I have to build the CUDA toolkit from source and then PyTorch and Tensorflow. For older hardware it'll be easier to just use older versions. There isn't a whole lot in the new version of PyTorch that's worth switching over for anyway. The only problem is trying to maintain your system with all these out-of-date packages. Someone here almost wrecked his system just trying to downgrade from Python 3.8 to 3.7.

The way I envision going forward is people training PyTorch or Tensorflow models on the latest hardware and deploying the weights to be used or fine-tuned by mlpack models (or possibly other machine learning libraries) that can run on older hardware and embedded systems. In the future we'll likely have distributed computing groups working on training AI models, contributing our unused resources, including integrated GPUs running OpenCL, to improve AI for robowaifus.
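To check which of those instruction sets a given machine actually reports, here's a minimal sketch that reads the `flags` line Linux exposes in `/proc/cpuinfo`. It assumes an x86 Linux box (the flag spellings `sse4_1`, `sse4_2`, `avx`, `avx2`, `fma` are the kernel's), and the function names are made up for illustration. It checks the CPU only; the `check_avx.sh` script from earlier in the thread is what inspects the `.so` files.

```python
from typing import List, Sequence, Set

# Instruction sets the official Tensorflow/PyTorch binaries are built with,
# spelled the way the Linux kernel lists them in /proc/cpuinfo (x86 only)
REQUIRED_FLAGS = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(flags: Set[str], required: Sequence[str] = REQUIRED_FLAGS) -> List[str]:
    """List the required instruction sets the CPU does not report, in order."""
    return [flag for flag in required if flag not in flags]
```

On Linux, `missing_flags(cpu_flags(open('/proc/cpuinfo').read()))` returns what's missing; an empty list suggests the official wheels should run on that CPU.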

Chobitsu Board owner 10/18/2020 (Sun) 07:27:00 No.5767

>>5765 >The way I envision going forward is people training PyTorch or Tensorflow models on the latest hardware and deploying the weights to be used or fine-tuned by mlpack models (or possibly other machine learning libraries) that can run on older hardware and embedded systems. In the future we'll likely have distributed computing groups working on training AI models, contributing our unused resources, including integrated GPUs running OpenCL, to improve AI for robowaifus. >/thread. This is a good vision for the future IMO. One that should deal with the practical realities of advanced AI in the future, while still carrying along everyone using inexpensive embedded robowaifu systems, including new ones of the future. I also envision new frameworks & libraries beyond just PyTorch and Tensorflow. This distributed model should support those as well. Robowaifu@Home when?

Robowaifu Technician 10/18/2020 (Sun) 07:39:59 No.5768

>>5767 Once robowaifus capture the public's attention, maybe we can start a foundation that crowdfunds & maintains a good accelerator server farm, serving as the core linchpin for a robowaifu@home system?

Robowaifu Technician 12/23/2020 (Wed) 00:12:06 No.7911

I'll put the code for going through this text file here >>7907 into this thread, since I'm using Python 3 for my work. It can serve as an example. I'm using IPython 3 as my dev environment, which is sufficient here. >33K-line, reverse-sorted text file of word_post_counts (one count for a word per post, excluding dupes). >https://files.catbox.moe/o38sax.txt A little criticism: the text file uses spaces between the counter and the term, and not always the same number of them. A tab would have been better for parsing it, no big deal though. Also, after working on it for a bit I realized that the counting should happen later; also not a big deal, since I have the data. In general the whole thing turned out to be more difficult than I thought at the beginning.


import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# one-time downloads of the corpora used below
nltk.download('stopwords')
nltk.download('wordnet')

with open('/path/RobowaifuBoard/o38sax.txt', 'r') as file:
    data = file.read()

parsed = data.split('\n')
parsed2 = [term.split('   ') for term in parsed]

# just a peek into the data, list entry no 4223
parsed2[4223]
# returns: ['10', '  wholesome'] - LOL

# this removes the counter, only the term will be in the list
# the terms are stripped, since the split leaves leading spaces; empty lines are skipped
# we'll use the counter later, by making a dict out of the parsed2 data
tokens = [token[-1].strip() for token in parsed2 if token[-1].strip()]

# removing some common words from the list
# (a set is built once, so each lookup is fast)
sr = set(stopwords.words('english'))
clean_tokens = [token for token in tokens if token not in sr]

# the first line here is unnecessary, since the data was lower case already
# the second removes all non-alphabetical characters from every entry
# this could be a problem with names, so mb keep the numbers or make a diff later?
# the { } makes it return a set, not a list, which means every double entry will be removed
lower_tokens_set = {token.lower() for token in clean_tokens}
alpha_only_set = {''.join(c for c in string if c.isalpha()) for string in lower_tokens_set if len(string) > 3}

# this breaks down every KNOWN entry to its simple grammatical form
# it uses the WordNet corpus for that and again removes double entries
lemmatizer = WordNetLemmatizer()
lemmatized = {lemmatizer.lemmatize(token) for token in alpha_only_set}

len(lemmatized)
# the original version (without stripping) returned 21016 here; now that stopwords are
# actually removed, expect a somewhat lower number. Still waaay too much, but at least a start.

Robowaifu Technician 12/23/2020 (Wed) 00:16:07 No.7913

>>7911 I'm the author of that system. Give me an exact layout you'd prefer for your work and I can post a link to that ITT.

Robowaifu Technician 12/23/2020 (Wed) 00:19:56 No.7914

>>7913 I might add again that this is an optimization approach for millisecond-fast searches across all of /robowaifu/. Our shitposting waifus will need something like that available to them. It doesn't represent a clear ordering (for example, each word is only counted once per post). But by the same token, all the text is available as simple text (and I also fully parse that during the 2-3 second startup).

Robowaifu Technician 12/23/2020 (Wed) 01:04:48 No.7916

>>7913 Thanks, I'm working on other things as well and for now I should stop, so this isn't urgent. But making it hand out the data as clean and homogeneous as possible is always a good thing, like keeping the distance between entries the same, or not having empty ones. However, it's not a big problem; I worked around it.
- Having everything which starts with a capital letter, but isn't at the beginning of a sentence, might be worth filtering out.
- Also all tokens which consist only of upper-case letters.
- Then returning all pairs or triplets of words might be great, though the file will be bigger then. Maybe this could help us filter out common phrases.
This might be more complicated as well, so are you really sure that there isn't already a library doing all of this? >>7911


# cleaning empty string(s); != compares values, while "is not" would only compare identity
parsed2 = [token for token in parsed2 if token[0] != '']

# putting all the terms with their counters in a dict, so I can call the counters (as an integer)
adict = {}
for token in parsed2:
    adict[token[-1].strip()] = int(token[0])

adict['wholesome']
# will return 10
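The first two filtering ideas from the post above (capitalized words that don't start a sentence, and all-uppercase tokens) could look roughly like this. It's a sketch under loose assumptions: sentences are split naively on `.`, `!` or `?` followed by whitespace, single letters like "I" are ignored, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def capitalized_candidates(text: str) -> Tuple[List[str], List[str]]:
    """Return (mid-sentence capitalized words, all-uppercase words)."""
    proper: List[str] = []
    upper: List[str] = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for position, word in enumerate(words):
            if len(word) > 1 and word.isupper():
                upper.append(word)       # e.g. acronyms like ROS
            elif position > 0 and word[0].isupper():
                proper.append(word)      # likely a name, not a sentence start
    return proper, upper
```

Both lists could then be kept aside as an "extras" set instead of being lowercased and lemmatized with everything else.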

Robowaifu Technician 12/23/2020 (Wed) 01:24:22 No.7917

>>7916 >so are you really sure that there isn't already a library doing all of this? Haha, you tell me. I'm inventing this as I go along, but yea probably. All I know is that we need blazing fast performance and efficiency if we are to succeed at getting robowaifus working with older and smaller computers. But that's a far better approach than being dragged along by """TPTB""" with a hook in our jaws, unable to run even a modest algorithm w/o handing over a lot of gold for the latest in botnet hardware and loaded up with full-bloat pozzware. Only our own software can be trusted. And if I had even the slightest chance of creating effective computers for our waifus, then you could be I would be learning how to create open-source hardware too. Sure, I can set up sliding window datasets of the texts. I figured that 3, 5, and 7 word sequences might be useful as a modest beginning. I guess I can probably generate all 3 sets in roughly two or three seconds now, but we'd have to see. I'll need to finish getting this system knocked into shape over the next week or so and then I'll have a look at your idea Anon.
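A minimal sketch of the sliding-window idea: every contiguous 3-, 5- and 7-word sequence from a post, plus a `Counter` to surface phrases that repeat across posts. Lowercasing and whitespace tokenization are assumptions here, not how the board's search system actually tokenizes, and the function names are made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Ngram = Tuple[str, ...]


def ngrams(words: Sequence[str], n: int) -> List[Ngram]:
    """All contiguous n-word windows, sliding one word at a time."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def sliding_windows(text: str, sizes: Sequence[int] = (3, 5, 7)) -> Dict[int, List[Ngram]]:
    """Windows of each requested size for a single post."""
    words = text.lower().split()
    return {n: ngrams(words, n) for n in sizes}


def common_phrases(texts: Iterable[str], n: int, top: int = 10) -> List[Tuple[Ngram, int]]:
    """Most frequent n-word sequences across many posts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts.most_common(top)
```

Posts shorter than the window size simply produce no windows for that size.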

Robowaifu Technician 12/23/2020 (Wed) 01:26:31 No.7918

>>7917 >then you could bet*

Robowaifu Technician 12/23/2020 (Wed) 11:05:22 No.7919

>>7916 I still need to work on this. I need to filter out the tokens with quotation marks and add the counters together if some term appears twice. Terms with "/" need to be handled as a special case, because "/robowaifu/" isn't the same as robowaifu, and "alogs/robowaifu" can't become "alogsrobowaifu". For the topic search this might not be relevant, though. Since dicts seem to not allow special chars in their keys, I might need to handle these entries separately.
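For what it's worth, Python dict keys can be any hashable value, so strings containing "/" or quotation marks work as keys without special handling; the special-casing is only needed in the character-cleaning step. A quick check with `collections.Counter` (a dict subclass), which also takes care of adding up the counters for repeated terms:

```python
from collections import Counter

# terms with slashes and quotes are ordinary string keys
terms = ["/robowaifu/", "robowaifu", '"Chobits"', "alogs/robowaifu", "robowaifu"]

counts = Counter()
for term in terms:
    counts[term] += 1   # missing keys start at 0, so repeats are summed

# "/robowaifu/" and "robowaifu" stay separate entries
print(counts["/robowaifu/"], counts["robowaifu"])
```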

Robowaifu Technician 12/23/2020 (Wed) 11:42:22 No.7922

>>7919 Hey there, since it's trivial enough to do for me now, I went ahead and created a plain text file for you of every post on robowaifu (as of a few minutes ago, so you can even find your post there). I have a rudimentary approach to cleaning up the text just a bit before inserting it into the data containers so it's not an exact match to the posts. But, you might find it useful and I hope you can. Cheers Anon. https://files.catbox.moe/e37qds.7z

Robowaifu Technician 12/23/2020 (Wed) 12:41:09 No.7923

>>7922 Thanks, that's great. Except that you now filter out all non-alphabetical characters? A reference to the board like "/robowaifu/" becomes "robowaifu" now, which reads like a mention of a robowaifu. Cleaning these extra chars out should probably be done later, before counting single words, but some would still need to be a special case. Only when used as a key for a dict can "/" not be in there. Some of the quotes are also being used to show that terms belong together, like the name of a movie. Such things should be filtered out and stored in an extras list or file.

Robowaifu Technician 12/24/2020 (Thu) 03:11:24 No.7927

>>7923 Heh, in a word: Yes. We have two different agendas going on here: A) atm, mine is to create a search system that responds much faster than human perception, and to provide reasonably plausible links for the user to use in pursuing further information. For example, an AI such as B) your project that's trying to pin down exact semantic meanings, etc. Think of my tool as one that your tool can use as a front-end to quickly locate research material to use. For that material, you have the full JSON files to parse exactly as you see fit ofc.

Robowaifu Technician 12/24/2020 (Thu) 22:42:49 No.7937

>>7923 Hey Anon, so as a Christmas present to you I created another version of the all_posts.txt file that doesn't have much in the way of filtering at all. I hope it can be of some help to you. Good luck with your project. online text: https://files.catbox.moe/h6jp5n.txt zip you can download: https://files.catbox.moe/ub266f.7z Merry Christmas!

Robowaifu Technician 12/25/2020 (Fri) 13:10:18 No.7942

>>7937 Thanks. I'm occasionally trying out whether and how we can use this to find the most interesting terms on the board and per post, so that the index thread can be created automatically, at least in part.

Robowaifu Technician 12/25/2020 (Fri) 18:56:37 No.7946

>>7916


# cleaning the spaces around the term in each entry
parsed3 = [(token[0], token[-1].strip()) for token in parsed2]
# also removing everything which is not a letter or number
parsed4 = [(token[0], ''.join(c for c in token[-1] if c.isalnum())) for token in parsed3]

# creating a dict, adding up the counters of entries which appear twice due to cleaning
better_dict = {}
for count, term in parsed4:
    better_dict[term] = better_dict.get(term, 0) + int(count)

# Test
better_dict['wholesome']

Robowaifu Technician 12/25/2020 (Fri) 20:33:51 No.7947


from nltk.stem import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer

# lemmatized_longerthan3: the lemmatized terms longer than 3 letters from the earlier steps
# (not defined in this snippet)
stemmer = PorterStemmer()

# finding all the terms lemmatizing didn't catch
# if the stemmed form is already in "lemmatized_longerthan3", the longer one can be removed.
# First step is to collect those in a set.
stemmeable = {token for token in lemmatized_longerthan3
              if stemmer.stem(token) != token and stemmer.stem(token) in lemmatized_longerthan3}
# Then the one is subtracted from the other, which removes around 3000 terms.
longerthan3 = list(set(lemmatized_longerthan3) - stemmeable)

# For a side-by-side peek into it, uncomment the following lines:
# comp = [(token, stemmer.stem(token)) for token in stemmeable]
# comp[42]

# The Lancaster stemmer takes out even more; I think it's good to have both results to work with.
# For now I subtract both from the data.
st = LancasterStemmer()
lancastered = {token for token in lemmatized_longerthan3
               if st.stem(token) != token and st.stem(token) in lemmatized_longerthan3}
longerthan3 = list(set(lemmatized_longerthan3) - stemmeable - lancastered)

We are down to only around 17k terms now. Hooray. Next, I'll try to use a list of the most common terms in English to get rid of those. If that doesn't bring the number down a lot, this whole approach has failed. We currently have more than 7k terms appearing only once and 1.5k terms appearing more than 20 times. Looking into these lists, I can't see any pattern emerging.

Robowaifu Technician 03/29/2021 (Mon) 19:11:22 No.9299

Found a PyTorch implementation of PCGrad: https://github.com/WeiChengTseng/Pytorch-PCGrad Based on the paper Gradient Surgery for Multi-Task Learning: https://arxiv.org/pdf/2001.06782.pdf PCGrad fixes conflicts between task gradients that cause learning interference: when two gradients conflict, it projects each one onto the normal plane of the other, so that when combined they move towards the optimal loss instead of in a random direction. It performs best with a high learning rate, which is particularly interesting for training with extremely large batch sizes, since higher learning rates can be used in those circumstances to improve training efficiency. PCGrad greatly enhances both accuracy and data efficiency and also works with reinforcement learning. I'm not sure how compatible it is with gradient accumulation. It seems no one has reported results on that yet, but I'll do some tests to find out. It's also fairly simple to implement. Implementing this in mlpack should be a piece of cake and could enable older hardware to outperform what a modern desktop can do today without PCGrad. Of course, a modern desktop with PCGrad would still rip older hardware to shreds, but I think it's cool this might enable older hardware to train voice synthesis models and such, given it has enough memory to hold the model. The official Tensorflow implementation is also available here: https://github.com/tianheyu927/PCGrad
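The projection step itself fits in a few lines. This is a minimal sketch of the paper's rule on plain Python lists (one flattened gradient vector per task), not the linked repo's API: for each task gradient, visit the other tasks' original gradients in random order, and whenever the dot product is negative, subtract the component along the conflicting gradient; the projected gradients are then summed. The function names are made up for illustration.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads: Sequence[Sequence[float]], seed: Optional[int] = None) -> Vector:
    """Combine per-task gradients with PCGrad-style projection, then sum them."""
    rng = random.Random(seed)
    projected: List[Vector] = []
    for i, g_i in enumerate(grads):
        g = list(g_i)
        # the other tasks' ORIGINAL gradients, visited in random order
        others = [g_j for j, g_j in enumerate(grads) if j != i]
        rng.shuffle(others)
        for g_j in others:
            d = dot(g, g_j)
            if d < 0:  # conflicting directions
                norm_sq = dot(g_j, g_j)  # nonzero, since d < 0 implies g_j != 0
                # remove the component of g along g_j: g - (g.g_j / |g_j|^2) g_j
                g = [x - d / norm_sq * y for x, y in zip(g, g_j)]
        projected.append(g)
    return [sum(column) for column in zip(*projected)]
```

With `[1, 0]` and `[-1, 1]` the first gradient becomes `[0.5, 0.5]` (orthogonal to the second), the second becomes `[0, 1]`, and the combined update is `[0.5, 1.5]`; non-conflicting gradients are simply added.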

Robowaifu Technician 04/03/2021 (Sat) 07:46:32 No.9476

Python will soon have switch statements, resulting in even more readable code. Some commenters say this resembles finite-state machines and will also increase the quality of the code. https://hackaday.com/2021/04/02/python-will-soon-support-switch-statements/

Robowaifu Technician 04/03/2021 (Sat) 08:17:05 No.9478

>>9476 >match That looks really nice. I hope C++ will adopt something like that soon. They have occasionally demonstrated a willingness to step outside the beaten path of diehard C fanatics and do something a lot better. Full regex pattern matching for the switch() statement in C++ would be a very, very powerful mechanism indeed. Typically I don't consider Python's typeless variable paradigm an advantage -- quite the opposite, in fact. However this match qualifies in my mind as an exception to that rule. Thanks Python-Anon! :^)

Robowaifu Technician 04/03/2021 (Sat) 16:24:50 No.9484

>>9476 This will be immensely useful for scripting symbolic AI. Using elif statement trains or dictionaries for switch cases is a train wreck. >>9478 Python has had typing annotations since 3.5 and they've generally become standard practice because debugging without them is a nightmare.

def greeting(name: str) -> str:
    return 'Hello ' + name

The Python runtime doesn't enforce these types but they can be checked with the IDE or other tools like mypy before running.

python -m pip install --user mypy
mypy test.py && python test.py

Jupyter notebooks can also be checked using nbqa with mypy:

python -m pip install --user nbqa
nbqa mypy notebook.ipynb

Another useful package for Jupyter notebooks is nbconvert which can convert them to Python scripts and HTML.

python -m pip install --user nbconvert
jupyter nbconvert --to script notebook.ipynb
jupyter nbconvert --to html notebook.ipynb

Robowaifu Technician 04/03/2021 (Sat) 19:30:39 No.9487

>>9484 >typing annotations I see. I didn't realize any of that. Thanks for the detailed explanation, Anon!

Robowaifu Technician 04/07/2021 (Wed) 08:16:52 No.9558

any nontrivial materials for learning llvmlite? I want to write a compiler for a metatrader-like language, meaning programs that don't run once sequentially; instead, certain parts of the program run every "tick" (every time the price changes), and everything is conditional on price changes. Most examples are hello world languages, and I'm not sure how to begin writing a standard library for my hello world language that will support what I want
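One way to frame the "runs every tick" part, independent of llvmlite itself: the host program owns the event loop and the price feed, and the compiled code only exposes entry points that the host calls on each price change. A minimal sketch of such a host runtime in Python; the class name and the `(previous_price, new_price)` handler signature are hypothetical. In an llvmlite-based compiler, the llvmlite docs' example gets a JIT-compiled function's address with `get_function_address` and wraps it with `ctypes.CFUNCTYPE`, and a callable obtained that way could be registered here as a handler.

```python
from typing import Callable, List, Optional

# hypothetical entry-point signature: (previous_price, new_price) -> None
TickHandler = Callable[[float, float], None]


class TickRuntime:
    """Host loop: call every registered handler once per price change."""

    def __init__(self) -> None:
        self.handlers: List[TickHandler] = []
        self.last_price: Optional[float] = None

    def on_tick(self, handler: TickHandler) -> TickHandler:
        """Register a handler; returns it so this also works as a decorator."""
        self.handlers.append(handler)
        return handler

    def feed(self, price: float) -> None:
        # the first price only initializes state; later prices fire handlers if they differ
        if self.last_price is not None and price != self.last_price:
            for handler in self.handlers:
                handler(self.last_price, price)
        self.last_price = price
```

The "standard library" side then becomes the set of functions the runtime offers to compiled code (current price, order placement and so on), while the control flow stays in the host.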

Robowaifu Technician 04/07/2021 (Wed) 11:03:13 No.9565

>>9558 >any nontrivial materials for learning llvmlite? You can hardly get more 'nontrivial' for learning some code than, well, the code itself, Anon. https://github.com/numba/llvmlite

Robowaifu Technician 04/10/2021 (Sat) 00:13:03 No.9673

>>9484 >Python has had typing annotations since 3.5 Starting with 3.9 this here seems to be interesting: >Type annotations for a tensor's shape, dtype, names, ... >Bye-bye bugs! Say hello to enforced, clear documentation of your code. >If (like me) you find yourself littering your code with comments like # x has shape (batch, hidden_state) or statements like assert x.shape == y.shape , just to keep track of what shape everything is, then this is for you. https://github.com/patrick-kidger/torchtyping

Robowaifu Technician 04/10/2021 (Sat) 00:37:32 No.9676

>>9673 Great find, anon. This is incredibly useful. I had no idea about typeguard either that does run-time type checking.
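To illustrate what run-time type checking adds over annotations alone, here's a tiny decorator that checks the arguments and the return value against plain-class annotations. It's a simplified stand-in, not typeguard's implementation or API: generic annotations like `List[int]` or `Optional[str]` are simply skipped, and the name `check_types` is made up.

```python
import functools
import inspect
from typing import Any, Callable, get_type_hints


def check_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Raise TypeError when an argument or the return value doesn't match a plain-class hint."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # only plain classes are checked; generics like List[int] are skipped
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}, got {type(value).__name__}")
        result = func(*args, **kwargs)
        expected = hints.get("return")
        if isinstance(expected, type) and not isinstance(result, expected):
            raise TypeError(f"return value must be {expected.__name__}, got {type(result).__name__}")
        return result

    return wrapper


@check_types
def greeting(name: str) -> str:
    return 'Hello ' + name
```

`greeting(42)` now fails before the function body runs, with a message naming the offending parameter.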

Robowaifu Technician 04/10/2021 (Sat) 03:18:25 No.9688

>>9673 Thanks, Anon. Much appreciated.

Robowaifu Technician 04/19/2021 (Mon) 13:26:31 No.9950

Here's a thread on Reddit about a problem with random numbers in NumPy and PyTorch: https://www.reddit.com/r/MachineLearning/comments/mocpgj/p_using_pytorch_numpy_a_bug_that_plagues - It's about seeding NumPy's random number generator in the data loader's worker processes. A lot of people get it wrong: of 100k+ projects on GitHub that were tested, 95% got it wrong. It's not a bug, but it's easy to overlook.
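The mechanism is easy to reproduce without PyTorch. A forked data-loader worker starts with a copy of the parent's RNG state, so unless each worker reseeds, all workers produce the same "random" augmentations. This sketch simulates the fork by copying the state of a stdlib `random.Random` into each worker; the function name and the `base_seed + worker_id` scheme are illustrative assumptions.

```python
import random
from typing import List


def worker_draws(base_seed: int, num_workers: int, reseed: bool) -> List[List[float]]:
    """Simulate forked workers: each starts from the parent's RNG state."""
    parent_state = random.Random(base_seed).getstate()
    draws = []
    for worker_id in range(num_workers):
        rng = random.Random()
        rng.setstate(parent_state)  # what a forked process inherits
        if reseed:
            # the fix: give each worker its own seed (the job of worker_init_fn)
            rng.seed(base_seed + worker_id)
        draws.append([rng.random() for _ in range(3)])
    return draws
```

Without reseeding every worker returns identical numbers. In PyTorch itself, the reproducibility notes recommend passing a `worker_init_fn` to the `DataLoader` that calls `numpy.random.seed(torch.initial_seed() % 2**32)`, since each worker's PyTorch seed is already distinct.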

Robowaifu Technician 04/19/2021 (Mon) 16:25:34 No.9955

>>9950 Surprising that something like that could go unnoticed for so long. C++'s standard library has a generator (std::random_device) that uses a hardware entropy device if available (it almost always is on machines from the past 10 years). I wonder if Python provides something similar.

Robowaifu Technician 04/19/2021 (Mon) 19:42:47 No.9961

>The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets. >In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography. https://docs.python.org/3/library/secrets.html
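On the C++ comparison: Python's rough counterpart to `std::random_device` is `os.urandom()`, which reads the operating system's entropy source; `random.SystemRandom` and the `secrets` module are both built on it. A short sketch of the three common uses:

```python
import random
import secrets

# 16 random bytes rendered as 32 hex characters, e.g. for a session token
token = secrets.token_hex(16)

# OS-entropy-backed generator with the familiar random.Random interface
system_rng = random.SystemRandom()
roll = system_rng.randint(1, 6)

# a 32-bit seed drawn from OS entropy, used to seed a fast (non-crypto) PRNG
seed = secrets.randbits(32)
fast_rng = random.Random(seed)
```

The last pattern is the usual compromise for simulations: unpredictable seeding, but a fast and reproducible generator once the seed is logged.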

Robowaifu Technician 06/04/2021 (Fri) 19:52:28 No.10810

Here are 70+ Python coding projects for learning Python or getting better at it. Some also strike me as useful for a home assistant. https://www.theinsaneapp.com/2021/06/list-of-python-projects-with-source-code-and-tutorials.html

SophieDev 06/04/2021 (Fri) 21:53:24 No.10813

>>10810 Thanks anon, this is good stuff! That one about age and gender detection with computer vision looks especially useful.

Robowaifu Technician 06/11/2021 (Fri) 04:06:08 No.10893

I'm working on a sentiment analysis neural network with TensorFlow, but it's pretty bad because I can't find a proper dataset to train it on. Has anyone found something that could prove useful?

Robowaifu Technician 06/12/2021 (Sat) 22:54:32 No.10906

>>10893 There's a 'dataset' thread Anon, maybe you can find something useful there? (>>2300)

Robowaifu Technician 07/18/2021 (Sun) 19:38:32 No.11584

Neat! I went for months (at least a year maybe?) despondent, angry & frustrated with Python because it's freakin hard to get anything to work with it. I even made my system entirely unusable one time trying to do a downgrade (had to nuke my whole box and start over). For what seems like the 1'000th time I tried to get Spyder installed and running, and this time I've actually made some forward progress! >I discovered this one stupid Python trick yesterday:

pip3 list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip3 install -U

Now I expect all the bitches will be clamoring for my attention, of course. I had to re-run it again and again, but finally, after maybe a dozen times, enough of the system was upgraded to actually bring the IDE up. Now if I can only figure out a way to downgrade the three libraries in the warning dialog, I can actually apply myself to learning Python soon. I'm not hopeful even now though, and I'm terribly gunshy after all my previous unsuccessful efforts getting Python dependencies sorted out.
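The same upgrade loop can be driven from Python using pip's JSON output instead of parsing the freeze format with grep and cut; newer pip releases refuse to combine `--format=freeze` with `--outdated`, while `--format=json` still works with it. A sketch with made-up helper names that only parses and builds commands, so nothing is installed until you run them:

```python
import json
import sys
from typing import List


def outdated_names(pip_json: str) -> List[str]:
    """Package names from `pip list --outdated --format=json` output."""
    return [pkg["name"] for pkg in json.loads(pip_json)]


def upgrade_commands(names: List[str]) -> List[List[str]]:
    """One `python -m pip install --upgrade NAME` command per package, for this interpreter."""
    return [[sys.executable, "-m", "pip", "install", "--upgrade", name] for name in names]
```

To use it, feed `outdated_names` the stdout of `subprocess.run([sys.executable, "-m", "pip", "list", "--outdated", "--format=json"], capture_output=True, text=True, check=True)` and pass each command to `subprocess.run`. Going through `sys.executable -m pip` ties the upgrades to the interpreter running the script, which matters when several Pythons are installed.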

Robowaifu Technician 07/18/2021 (Sun) 20:49:46 No.11589

>>11584 >Update: Well whadya know? I seemed to find a way that a) didn't destroy my machine in the process (always important!), and b) got rid of the error warning in Spyder.

pip uninstall jedi
pip install --upgrade jedi==0.17.2

pip uninstall parso
pip install --upgrade parso==0.7.0

pip uninstall pyls_spyder
pip install --upgrade pyls_spyder==0.3.2

> It came from a LUA site, of all places. code.luasoftware.com/tutorials/python/python-pip-downgrade-package/

Robowaifu Technician 07/18/2021 (Sun) 21:06:36 No.11590

>>11584 You probably should use virtual environments: https://realpython.com/python-virtual-environments-a-primer/
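The standard library's `venv` module can also create those environments from Python. A minimal sketch (the `make_env` helper is hypothetical) that builds an environment and returns the path of its interpreter; packages installed with that interpreter stay out of the system Python:

```python
import os
import venv


def make_env(path: str, with_pip: bool = True) -> str:
    """Create an isolated environment at `path` and return its python executable."""
    venv.create(path, with_pip=with_pip)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(bin_dir and os.path.join(path, bin_dir), exe)
```

From a shell the equivalent is `python3 -m venv ~/envs/spyder` followed by `source ~/envs/spyder/bin/activate` (the path is just an example); if an upgrade goes wrong, deleting that directory removes it cleanly.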

Robowaifu Technician 07/18/2021 (Sun) 21:13:31 No.11591

>>11589 >Update-update Turns out the exact reason I wanted an IDE (rather than just a Python console) doesn't work on my machine, even after downgrade-patching the lib dependencies. > I tried restarting several times, but it didn't help. I noticed the IDE had a little message at the bottom, 'LSP Python restarting', with a little orbiting icon. Finally the orbit stopped and the message changed to > maybe it's related to code-completion not working, etc. I don't want to hate Python, I really don't. It's simply impossible not to do, of course. D:< >pic related

Robowaifu Technician 07/18/2021 (Sun) 21:28:10 No.11592

>>11591 There seems to be an Anaconda installer; did anyone try that one? https://anaconda.org/anaconda/spyder Spyder really looks promising. Most machine learning scientists and engineers seem to use Jupyter notebooks, though, in case you didn't know. A config framework (Hydra) and a config file format (TOML) were also mentioned for managing the configuration of ML projects without notebooks: https://medium.com/pytorch/hydra-a-fresh-look-at-configuration-for-machine-learning-projects-50583186b710 https://github.com/toml-lang/toml

Robowaifu Technician 07/19/2021 (Mon) 02:09:13 No.11598

>>11591 What OS or distro are you using? Also:

python --version
pip --version
python -m pip --version
pip show torch tensorflow spyder
uname -a

>>11592 Jupyter notebooks are great for experimenting, but not so much for writing libraries. Personally I write most of my code in Mousepad, which only has syntax highlighting, and only touch Spyder when I'm writing clean code for other people to use.

Robowaifu Technician 07/23/2021 (Fri) 21:19:30 No.11679

Sentdex created an open-source Python code transformer model like GitHub Copilot. It has only been trained for 2 epochs, so it's not great, but it's interesting and fun to play around with. I feel it's gonna be important to stay on top of these developments to keep up with AI-accelerated productivity, so I made an easy-to-use GUI for running inference with these models (just press Control+Tab to generate).
GottaPyFast: https://gitlab.com/robowaifudev/gottapyfast/
Sentdex Demo: https://www.youtube.com/watch?v=1PMECYArtuk
Sentdex Model: https://huggingface.co/Sentdex/GPyT
How to use the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# use the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2-style models
model = AutoModelForCausalLM.from_pretrained("Sentdex/GPyT").to(device)

def generate(model, tokenizer, query, max_length=100, top_p=0.9, temperature=1.0, do_sample=True):
    # the model uses "<N>" in place of newlines
    newlinechar = "<N>"
    query = query.replace("\n", newlinechar)
    tokenized = tokenizer.encode(query, return_tensors="pt").to(model.device)
    response = model.generate(tokenized, max_length=max_length, do_sample=do_sample,
                              top_p=top_p, temperature=temperature)
    decoded = tokenizer.decode(response[0])
    # turn the "<N>" markers back into real newlines
    return decoded.replace(newlinechar, "\n")

print(generate(model, tokenizer, "import torch", max_length=256, top_p=0.95, temperature=1.0))

Robowaifu Technician 07/24/2021 (Sat) 11:24:59 No.11683

Trying to package PyTorch and Transformers for Windows is a nightmare. I put together some notes here on how to get it to work: https://gitlab.com/robowaifudev/cookbook/-/wikis/python/pyinstaller I think using MLPack would be much more practical. PyTorch takes up a massive 3.1 GB which is completely unnecessary. Another option could be to rewrite the GPT2 transformer model to use NumPy instead which is only 5.4 MB. I'll look more into this later.
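For a sense of how small the NumPy route could be, here's a minimal sketch of GPT-2-style masked multi-head self-attention, the core of each transformer block. It assumes GPT-2's weight layout where inputs are multiplied as `x @ weight` (combined QKV weight of shape `(d_model, 3*d_model)`); it's not a full model (no embeddings, layer norm, MLP or weight loading), and the function names are my own.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # subtract the max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(x, w_qkv, b_qkv, w_out, b_out, n_head):
    """x: (seq_len, d_model); w_qkv: (d_model, 3*d_model); w_out: (d_model, d_model)."""
    seq_len, d_model = x.shape
    head_dim = d_model // n_head
    q, k, v = np.split(x @ w_qkv + b_qkv, 3, axis=-1)
    # (seq_len, d_model) -> (n_head, seq_len, head_dim)
    q, k, v = (t.reshape(seq_len, n_head, head_dim).transpose(1, 0, 2) for t in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (n_head, seq_len, seq_len)
    # each position may only attend to itself and earlier positions
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -1e10, scores)
    out = softmax(scores) @ v  # (n_head, seq_len, head_dim)
    # merge the heads back: (n_head, seq_len, head_dim) -> (seq_len, d_model)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_out + b_out
```

Because of the causal mask, changing later tokens never changes the outputs for earlier positions, which is what makes cached, token-by-token generation possible.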

Robowaifu Technician 07/24/2021 (Sat) 15:23:11 No.11684

>>11683 Thanks for all your hard work here Anon. I apologize to you and everyone else here for being such a pussified faggot about Python. I recognize it's important to all of us, or else I wouldn't even consider picking it up. Please look into mlPack sooner rather than later if you at all can. It's probably our only real hope for doing waifu AI on a shoestring budget hardware-wise.

Robowaifu Technician 07/24/2021 (Sat) 22:23:24 No.11688

>>11684 MLPack's documentation is really lacking, especially for newer features, and the library seems to be missing essential features. I'd have to sit down with it for 3-6 months to get transformers and text-to-speech models working in it. I'm looking into using Chainer instead, which is built on top of NumPy and quite popular in Japan. A basic application with Chainer packaged with PyInstaller compresses down to 14 MB. On top of that, there are already lots of ML models implemented in it. I think if I roll out some waifu tech with Chainer to garner interest, we could get some more help to build things in MLPack, which will be particularly useful for embedded systems and actual physical robowaifu.

Robowaifu Technician 07/24/2021 (Sat) 23:10:36 No.11689

>>11688 Migration guide from PyTorch to Chainer https://chainer.github.io/migration-guide/

Robowaifu Technician 12/10/2022 (Sat) 17:40:59 No.18103

I've been compressing datasets with zstd and using them with streaming decompression to save space, reduce SSD wear and speed up access to compressed data. It's also useful for previewing datasets saved as zst as they download. I couldn't find anything readily available on the net on how to do it so hopefully this saves someone else some time:

# installation: python -m pip install zstandard ijson

import zstandard as zstd
import ijson
# streaming decompression for JSON files compressed with zstd --long=31
with open("laion_filtered.json.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648) # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        for record in ijson.items(reader, "item"):
            print(record)

import io
import json
# streaming decompression for NDJSON/JSONL files compressed with zstd --long=31
with open("00.jsonl.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648) # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        for line in io.BufferedReader(reader):
            record = json.loads(line)
            print(record)

import csv
# streaming decompression for TSV files compressed with zstd --long=31
with open("Image_Labels_Subset_Train_GCC-Labels-training.tsv.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor(max_window_size=2147483648) # max_window_size required for --long=31
    with dctx.stream_reader(f) as reader:
        # wrap the binary stream so the csv module receives decoded text (newline="" as the csv docs recommend)
        wrapper = io.TextIOWrapper(reader, encoding="utf-8", newline="")
        csv_reader = csv.reader(wrapper, delimiter='\t') # or csv.DictReader if it has fieldnames
        for record in csv_reader:
            print(record)

Also I recommend building zstd from source since the latest version has improved performance and compression.

Robowaifu Technician 12/10/2022 (Sat) 20:04:48 No.18106

>>18103 Excellent work Anon, thank you. Nice clean-looking code too, BTW.

NoidoDev ##eCt7e4 08/16/2023 (Wed) 17:36:40 No.24658

Advanced use of exceptions in Python for reliability and debugging: >I Take Exception to Your Exceptions: Using Custom Errors to Get Your Point Across https://youtu.be/wJ5EO7tnDiQ (audio quality is a bit suboptimal)
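As a generic sketch of the idea (not taken from the talk): a small hierarchy of custom exceptions lets callers catch one base class, read structured fields instead of parsing error messages, and keep the original cause attached with raise ... from. All class and function names here (RobotError, ConfigError, ServoError, parse_limit, move_servo, safe_move) are hypothetical.

```python
class RobotError(Exception):
    """Base class, so callers can catch every error from this package at once."""


class ConfigError(RobotError):
    """Raised when a configuration value can't be parsed."""


class ServoError(RobotError):
    """Raised when a servo is asked to move outside its allowed range."""

    def __init__(self, servo_id, position, limit):
        # keep the details as attributes so handlers don't have to parse the message
        self.servo_id = servo_id
        self.position = position
        self.limit = limit
        super().__init__(f"servo {servo_id}: position {position} is outside 0..{limit}")


def parse_limit(text):
    try:
        return int(text)
    except ValueError as exc:
        # "raise ... from" chains the original error as __cause__ for debugging
        raise ConfigError(f"bad servo limit: {text!r}") from exc


def move_servo(servo_id, position, limit=180):
    if not 0 <= position <= limit:
        raise ServoError(servo_id, position, limit)
    return position


def safe_move(servo_id, position, limit=180):
    try:
        return move_servo(servo_id, position, limit)
    except ServoError as exc:
        # recover using the structured fields: clamp into the valid range
        clamped = min(max(exc.position, 0), exc.limit)
        return move_servo(exc.servo_id, clamped, exc.limit)
```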

Chobitsu 08/16/2023 (Wed) 20:35:45 No.24660

>>24658 Great stuff NoidoDev. Exceptions are based.

NoidoDev ##pTGTWW 09/22/2023 (Fri) 03:19:28 No.25441

I watched this video here at 1.75x speed as a refresher, since I had some gaps from not writing Python in quite a while. It might also work for beginners with experience in some other language: > Python As Fast as Possible - Learn Python in ~75 Minutes https://youtu.be/VchuKL44s6E As a beginner you should of course go slower and test the code you've learned. While we're at it, Python has gained a lot of new features, and Cython can now compile Python code to C better: https://youtu.be/e6zFlbEU76I

Robowaifu Technician 10/23/2023 (Mon) 13:46:07 No.26089

https://automatetheboringstuff.com/ Free online ebook. >Practical Programming for Total Beginners >If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you?
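In the spirit of that book, a minimal sketch of the "renaming files" chore using the standard library's pathlib. add_prefix and its parameters are hypothetical; it defaults to a dry run that only returns the planned renames, skips files that already have the prefix, and never overwrites an existing file.

```python
from pathlib import Path


def add_prefix(folder, prefix, pattern="*.txt", dry_run=True):
    """Rename every file matching `pattern` in `folder` to `prefix + old name`."""
    renamed = []
    # sorted() collects the matches up front, so renaming can't disturb the iteration
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file() or path.name.startswith(prefix):
            continue  # skip directories and files that already have the prefix
        target = path.with_name(prefix + path.name)
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Calling add_prefix("notes", "2023_") first shows what would change; add_prefix("notes", "2023_", dry_run=False) actually renames.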

NoidoDev ##pTGTWW 11/29/2023 (Wed) 19:24:45 No.26623

Related: Some anon wants to get started >>26616

Barf 01/20/2025 (Mon) 18:15:03 No.35923

>Python dependency-hell
I think this might be one of the biggest barriers to entry. There are a lot of pre-compiled solutions like Backyard, but they all have limitations for me and are hard to tweak, like adding new voices/changing Whisper models. My issue is that most developers use Mac/Linux and I'm just a gamer on Windows. Getting started with Python on Windows is mostly just understanding environments, but a lot of programs require WSL (Windows Subsystem for Linux) and Triton.
Here are the steps I use to get my basic bot going on Windows:
1.) Download/install Visual Studio Code
2.) Download/install Python and Git
3.) In Visual Studio Code, open the folder your project is in or the directory you cloned a GitHub repo to.
4.) Open a terminal within Visual Studio Code and create a new Python environment for your project by running - python -m venv myenv - then activate it with .\myenv\Scripts\activate
This step is crucial to avoid dependency issues in the future, as all packages you download will now be isolated to that environment. If you do not do this, all your package dependencies will become part of the global/system Python environment and conflict with each other.
5.) Install PyTorch - https://pytorch.org/get-started/locally/ For my project, I used CUDA 12.1, so the command is - pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
6.) Once you have these installed, most other packages should install fine unless they require things like WSL/Triton. This should be enough to get transformers, OpenAI's Whisper and many TTS systems working for a basic chatbot.
Please add anything I missed!
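A quick way to confirm steps 4 and 5 worked is to run a small check script from the VS Code terminal. This is a minimal sketch (the file name, e.g. check_env.py, is up to you): it uses the standard sys.prefix / sys.base_prefix comparison to detect a venv, and only reports on PyTorch if it happens to be installed.

```python
import sys


def in_virtualenv():
    # inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


print("Python:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())

try:
    import torch
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    print("PyTorch", torch.__version__, "CUDA available:", torch.cuda.is_available())
```

If "Inside a virtual environment" prints False, packages you pip install will land in the global environment.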

Chobitsu 01/20/2025 (Mon) 23:35:06 No.35927

>>35923 POTD <---> Super-helpful information, Barf! Especially the little trick about running the python -m venv myenv statement. This type of tutorial-style, concrete example is exactly what a community like this needs. It's why I spent long hours creating C++ learning resources here. Also, it's seemingly too easy for the old hands to forget what it was like being a newfag 'still wet behind the ears', and therefore failing to share the basics with other Anons here; 'one generation to the next' as it were. And that's a shame, because not only do the newcomers here often never get over these basic hurdles, but there's a real risk that the important information itself gets lost to the winds of time (or -- more likely -- cloistered behind info-barriers like D*xxcord, thereafter only available to the Globohomo itself all by-design on their parts, of course). >tl;dr I'm simply encouraging every Anon here (yes, this means you! :D) to share their knowledge one with the other. And don't worry even if you think it's been said here before! This board is absolutely packed with information, and it can be especially intimidating to the newcomer trying to find some pertinent bit of info. Please just share it on the spot (again) when & where it's most useful. (Also, crosslinks are good for this.) >ttl;dr Some things bear repeating here. :^) <---> Cheers, Anons. >=== -prose edit

Edited last time by Chobitsu on 01/21/2025 (Tue) 02:03:18.

Grommet 01/21/2025 (Tue) 01:34:26 No.35928

Thanks for all this work, all.

Classic Chatbot Connoisseur GreerTech 01/21/2025 (Tue) 14:13:37 No.35955

On the chatbot front, I've been working to update my old Python chatbot to actually be a good companion. Here's a sample of my work, but I'm far from done. https://files.catbox.moe/pf85ai.zip

Chobitsu 01/22/2025 (Wed) 03:04:39 No.35971

>>35955 Thanks, GreerTech! Personally, I'm a big fan of Cleverbot! [1] :DD JK, good luck with revamping it, Anon. Cheers. :^) --- 1. Remarkably, it's still available!! https://www.cleverbot.com/

Barf 01/22/2025 (Wed) 04:44:42 No.35976

>>35928 Fun bot and easy to install. Another suggestion for Python is to use pre-compiled C/C++ binaries and just call their CLIs from Python. It should make things a little faster and more modular, and it avoids having to install PyTorch, so it's much easier to ship. Here are links to the latest Whisper and Piper binaries: https://github.com/ggerganov/whisper.cpp/actions/runs/12886572193 https://github.com/rhasspy/piper/releases/tag/2023.11.14-2 It's kind of the best of both worlds, since you can code in Python and still get the speed/ease of pre-compiled binaries for the STT/TTS system.
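A minimal sketch of the "call the CLI from Python" idea using only the standard library's subprocess module. run_cli is a hypothetical helper name. The commented Piper call follows the usage shown in the Piper README (text on stdin, --model and --output_file); the binary path and voice file name are placeholders, so check the flags against the release you actually download.

```python
import subprocess
import sys


def run_cli(cmd, input_text=None, timeout=None):
    """Run an external program, optionally feeding it stdin, and return its stdout."""
    result = subprocess.run(
        cmd,
        input=input_text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Example (assumed paths): synthesize speech with a downloaded Piper binary
# run_cli(["./piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "reply.wav"],
#         input_text="Hello, Anon!")

# Self-contained demo using the current Python interpreter as the "binary"
print(run_cli([sys.executable, "-c", "print('hello from a subprocess')"]))
```

Passing the command as a list (rather than a shell string) avoids quoting problems on Windows and keeps user text from being interpreted by a shell.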

Chobitsu 01/24/2025 (Fri) 17:51:22 No.36034

>>35976 Thanks very kindly for the links here, Barf. Cheers. :^)

Grommet 01/28/2025 (Tue) 22:51:32 No.36210

>>35976 I wonder, is it possible to make an AppImage (Linux) https://appimage.org/ or a 0install (Linux, Windows and macOS) https://0install.net/ package for the program? These keep all the files needed to run a program in one place, so no additional installations are needed.

GreerTech 05/30/2025 (Fri) 20:07:54 No.38837

I found this book while looking on Amazon https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167/

Chobitsu 05/30/2025 (Fri) 22:14:00 No.38848

>>38837 I've bought numerous books from that imprint. Think you'll pursue this sometime, Anon?

GreerTech 05/30/2025 (Fri) 22:18:20 No.38850

>>38848 Yeah, it's definitely a good route for the future.

Chobitsu 05/30/2025 (Fri) 22:23:46 No.38851

>>38850 Great! Please let us all know how it goes once you're underway with that, GreerTech. Cheers. :^)