doordesk-js/dennis/static/blog/20220701-progress.md

85 lines
4.4 KiB
Markdown

Content_Type: blog
Title: It's a post about nothing!
Date: 2022 7 1
The progress update
<p style='text-align: center;'>
<img src="https://old.doordesk.net/pics/plates.gif" />
</p>
### Bots
After finding a number of ways not to begin the project formerly known as my capstone,
I've finally settled on a
[dataset](https://www.kaggle.com/datasets/bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows). The project is about detecting bots, starting with twitter. I've
[studied](https://old.doordesk.net/projects/bots/docs/debot.pdf) a
[few](https://old.doordesk.net/projects/bots/docs/botwalk.pdf)
[different](https://old.doordesk.net/projects/bots/docs/smu.pdf)
[methods](https://old.doordesk.net/projects/bots/docs/div.pdf) of bot detection and particularly like the
[DeBot](https://old.doordesk.net/projects/bots/docs/debot.pdf) and
[BotWalk](https://old.doordesk.net/projects/bots/docs/botwalk.pdf) methods and think I will try to mimic them,
in that order.
Long story short, DeBot uses a fancy method of time correlation to group accounts
together based on their posting habits. By identifying accounts that all have identical
posting habits that are beyond what a human could do by coincidence, this is a great
first step to identifying an inital group of seed bots. This can then be expanded by
using BotWalk's method of checking all the followers of the bot accounts and comparing
anomalous behavior to separate humans from non-humans. Rinse and repeat. I'll begin this
on twitter but hope to make it platform independent.
### The Real Capstone
The bot project is too much to complete in this short amount of time, so instead I'm
working with a
[small dataset](https://archive-beta.ics.uci.edu/ml/datasets/auto+mpg)
containing info about cars with some specs and I'll predict MPG. The problem itself for
me is trivial from past study/experience as an auto mechanic so I should have a nice
playground to focus completely on modeling. It's a very small data set too at < 400
lines, I should be able to test multiple models in depth very quickly. It may or may not
be interesting, expect a write-up anyway.
### Cartman
Well I guess I've adopted an 8 year old. Based on
[this project](https://github.com/RuolinZheng08/twewy-discord-chatbot)
I've trained a chat bot with the personality of Eric Cartman. He's a feature of my
Discord bot living on a Raspberry Pi 4B, which I would say is probably the slowest
computer you would ever want to run something like this on. It takes a somewhat
reasonable amount of time to respond, almost feeling human if you make it think a bit.
The project uses [PyTorch](https://pytorch.org/) to train the model. I'd like
to re-create it using [TensorFlow](https://www.tensorflow.org/) as an
exercise to understand each one better, but that's a project for another night. It also
only responds to one line at a time so it can't carry a conversation with context,
yet...
### Website
I never thought I'd end up having a blog. I had no plans at all actually when I set up
this server, just to host a silly page that I would change from time to time whenever I
was bored. I've been looking at
[Hugo](https://gohugo.io/) as a way to organize what is now just a list of
divs in a single html file slowly growing out of control. Basically you just dump each
post into its own file, create a template of how to render them, and let it do its
thing. I should be able to create a template that recreates exactly what you see right
now, which is beginning to grow on me.
If you haven't noticed yet, (and I don't blame you if you haven't because only a handful
of people even visit this page) each time there is an update there is a completely new
background image, color scheme, a whole new theme. This is because this page is a near
identical representation of terminal windows open my computer and each time I update the
page I also update it with my current wallpaper, which generates the color scheme
dynamically using
[Pywal](https://github.com/dylanaraps/pywal).
TODO:
* Code blocks with syntax highlighting
* Develop an easy workflow to dump a jupyter notebook into the website and have it display nicely with minimal effort
* Find a way to hack plots generated with matplotlib to change colors with the page color scheme (or find another way to do the same thing)
* Automate generating the site - probably [Hugo](https://gohugo.io/)
* Separate from blog, projects, etc.
* Add socials, contact, about
* A bunch of stuff I haven't even thought of yet
That's all for now