

  • Many, many, many subnets, so many subnets:

    - different subnets for VMs, for jailed services, for guest wifi, ‘secure’ wifi, and ‘normal’ wifi (i.e. phones and shit)
    - a routed subnet on my workstation for its LXC containers
    - remote subnets for my wifi routers over VPN when I travel (with restrictions similar to home access and the same 3 SSIDs)
    - an unrouted subnet for stuff like BMCs, switches and infrastructure
    - a subnet in my DMZ with statics, the backside of that subnet, and the subnet that subnet uses for upstream access

    I have a lot of subnets.


  • Have a video dataset with a 1M recordsize, primarycache=metadata, secondarycache=metadata, and a general dataset as its parent with a 128K recordsize, primarycache=secondarycache=all (the default), and compression=lz4 or zstd or something.
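
    In zfs terms that's just two datasets with per-dataset properties. A minimal sketch of the layout (hypothetical pool/dataset names; needs root and an existing pool):

    ```python
    import subprocess

    def zfs_create(name, **props):
        # Build "zfs create -o key=value ... name" and run it.
        opts = [arg for k, v in props.items() for arg in ("-o", f"{k}={v}")]
        subprocess.run(["zfs", "create", *opts, name], check=True)

    # Hypothetical pool/dataset names.
    zfs_create("tank/media", recordsize="128K", compression="lz4")   # general parent
    zfs_create("tank/media/video", recordsize="1M",
               primarycache="metadata", secondarycache="metadata")   # video child
    ```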

    Works like a monster; I don’t worry about things like .srt files and such, though your symlinks idea looks interesting.

    I’m reworking my entire system to get off the filesystem structure anyway and use Python and some other DB, possibly reading from Sonarr for metadata seeding, but I haven’t got to it yet.

    Actually, you make a good point: it would be nice if Sonarr put NFOs in a different structure, but since I’m going to read Sonarr metadata I can just delete them anyway.
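
    For the metadata-seeding part, a minimal sketch of what I mean, assuming Sonarr’s v3 REST API and a hypothetical host/key:

    ```python
    import requests

    SONARR = "http://localhost:8989"   # hypothetical host
    API_KEY = "your-api-key"           # hypothetical key

    # Pull every series Sonarr knows about, with its on-disk path.
    series = requests.get(f"{SONARR}/api/v3/series",
                          headers={"X-Api-Key": API_KEY}).json()
    for s in series:
        print(s["title"], "->", s["path"])
    ```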



  • That’s interesting. I spent a decade doing HPC and other optimizations for large software on 2-socket systems; there are degenerate cases, which can be fixed, I just doubt they’re here.

    FreeCAD sounds like it was poorly written, with a lot of hopping around RAM and poor cache locality, which happens but is pretty ugly.

    ML tends to be better behaved; it’s actually very close to DSP code, and the compilers try to enforce locality. More importantly, a lot of the modules are hand-coded for extreme performance.

    I’m not trying to be discouraging; I’m saying this as someone who originally looked for performance in the OS, and often found it there, but later found more in the loops themselves or the compiler. Basically, Linux is a lot smarter than it used to be, and many applications are too.

    Just my 2c: there are performance tools (perf, for example) that can tell you how much of your time goes to the OS versus everything else, and in ML you shouldn’t be swapping so much that it hurts you a lot.
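
    To make that concrete, the cheapest version of that measurement on Linux is one stdlib call; a rough sketch:

    ```python
    import resource

    # Rough split of where the time went: your own code (user) vs the kernel (system).
    u = resource.getrusage(resource.RUSAGE_SELF)
    print(f"user time:    {u.ru_utime:.2f}s")
    print(f"system time:  {u.ru_stime:.2f}s")
    print(f"major faults: {u.ru_majflt}")   # consistently nonzero here usually means you're paging
    ```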


  • K, for that, look at a kernel feature called CPU isolation (isolcpus / nohz_full); a friend of mine implemented/upstreamed it. Basically you take cores half out of Linux and can use them for heavy workloads.

    But I doubt you’d see more than a 1% improvement; Linux doesn’t do that much without you asking.

    You can try setting RT priority, but I’ve never found that to matter much.
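
    If you do go down that road, the user-space half is tiny; a sketch, assuming cores 2–3 were isolated at boot (isolcpus=2,3 nohz_full=2,3) and you’re on Linux:

    ```python
    import os

    # Pin this process onto the (hypothetically) isolated cores.
    os.sched_setaffinity(0, {2, 3})

    # Optionally give it FIFO real-time priority (needs root or CAP_SYS_NICE).
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(50))
    ```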

    Listen, this is the kind of thing I would have tried a decade ago, but the thing to remember is: time spent improving the algorithm is generally more effective than time spent trying to optimize kernel overhead that millions of people have been optimizing for decades.


  • Stop. Go back. This is the wrong way.

    If you’re running Python you basically need a full OS.

    There are projects that run on an RTOS, and in fact I worked on an ML SoC that ran Linux, but there are 2 levels here:

    1. The ML processing itself, i.e. the math. This is simple in software and very complex otherwise. The software just says “copy this block and start running a matrix multiply”; the hard logic is in moving data around efficiently.

    2. The stack. This is high level, Python or so, and has graph-processing overhead too. This needs a lot of “overhead” by its nature.

    In either case, don’t worry about any of this; the overhead won’t be very noticeable and you’ll be CPU-gated hard. The main thing is finding an optimized PyTorch library.

    If you have an AMD CPU, or somehow have an NVIDIA GPU in your laptop, you might be able to use their PyTorch library, which would improve performance by roughly 1.5–2 orders of magnitude.
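
    Whatever backend you end up with, the selection logic on your side is tiny; a minimal sketch (the heavy math happens in hand-optimized kernels either way):

    ```python
    import torch

    # Use the GPU build if one is actually available, otherwise fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    x = torch.randn(1024, 1024, device=device)
    y = x @ x   # dispatches to cuBLAS / oneDNN / etc., not Python loops
    ```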

    Unfortunately there isn’t a PyTorch implementation for Intel iGPUs, but there is an OpenCL backend for PyTorch, and apparently this madlad got it working on an Intel iGPU: https://dev-discuss.pytorch.org/t/implementing-opencl-backend-for-pytorch/283/9

    But don’t worry about overhead; it’s fractions of a percent in these kinds of tasks, and there are ways to bypass it completely.




  • Lots of good choices:

    One of the 4-port Atom PCs on Amazon, or even one of the ARM ones; the key is Ethernet ports, and remember you’ll need to handle your wifi. Put Debian, pfSense, OpenWrt, whatever you like on it, and it’ll be great.

    One of the OpenWrt systems; a high-end GL.iNet isn’t bad, just pick any of the better ones.

    Had a FreeBSD server that ran a VNET jail for routing; it was glorious, no notes, just perfect.

    Running a UniFi Dream Machine SE right now, mostly because I want someone else to handle security (I know it’s not much, I just don’t have any bandwidth for that now). Works fine, but I’m using UniFi wifi, so there’s a tie-in there.

    If you want a retail system, go either OpenWrt or UniFi. I know why people have issues with Ubiquiti, but it’s probably the best prosumer hardware and software you can get without rolling your own. I haven’t used pfSense much; maybe that would change my mind.