Hi all, looking for my next major project/frustration. I’ve been forcing myself to learn the new AI tools and I think I’m ready for the next step. I’m familiar with image generation and have dabbled in a bit of chatbot stuff.

I’ve read a few blogs but I want to find something that could work with my existing setup. My dream setup would be:

A voice assistant that runs locally (preferably Dockerized, with plain Linux as the fallback and Windows as the last resort), can run a decent model, and preferably lets me train a custom voice for it.

I currently have:

  • Home Assistant set up already. I’ve seen the OpenAI integrations but would like to migrate off of those
  • Google Minis lying around. I’m willing to sacrifice one of them if it means I can use my own stuff
  • Spare GTX 1650 GPU. Not the best, I know, but hopefully enough to get this off the ground before deciding whether to go in on a larger GPU dedicated to it

Needs/wants/nice to haves would be:

  • Basic chat functionality, e.g. “What’s the weather like?”
  • Play music from my Plex or Jellyfin server
  • HA integrations so I could say stuff like “Turn off the lights”

Sorry for dumping all of this. Like I said, I’ve seen blog posts around, and some are doing parts of this, but I wonder if anyone has done something like this. I’m sure people have tried. Guides, jumping-off points, even GitHub repos/projects you know of would be helpful.

Thanks all!

  • motsu@lemmy.world
    11 months ago

    Rhasspy. Idk if Rhasspy 3 is fully out, but I would wait for that and then set it up. (I have begun to see the Home Assistant side being released. It’s supposed to tie in a lot better than Rhasspy 2, and they even brought the dev onto the HA project.)

    • Yowasa@lemmy.world
      11 months ago

      Didn’t Home Assistant hire the guy who made Rhasspy so he could work on their voice assistant?

      • cwagner@lemmy.cwagner.me
        11 months ago

        Yeah, he switched from Mycroft to Nabu Casa. Not a lot of things happening with Rhasspy directly from what I can see (been following it for a while and had a semi-working setup before the world ran out of Pi Zero 2 W’s and I stopped), but HA has been getting more and more features. I think satellites are still missing, though.

          • cwagner@lemmy.cwagner.me
            11 months ago

            That’s my current plan ;) I absolutely need the satellite feature and the option to use voice commands for playing music.

            • Scrubbles@poptalk.scrubbles.techOP
              11 months ago

              Awesome, I’ll look more into this. Do you know if they’ll let us use our own voice models? Will it be natural, ChatGPT style, or more scripted like Alexa? And the satellites, I assume that’s like what I was talking about, where I (hopefully someday) can flash my Google Minis and put HA on them instead?

              • cwagner@lemmy.cwagner.me
                11 months ago

                > do you know if they’ll let us use our own voice models?

                Probably? I don’t know what the tech in that area looks like.

                > Will it be natural like chatgpt style or more scripted like Alexa?

                Everything is about scripted commands, but you can use templates and variables. It requires more setup but is more reliable.
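
                To make "templates and variables" concrete, here is a minimal Python sketch of template-based command matching. It is not how Rhasspy or Home Assistant implement it: the `{slot}` syntax, the intent names, and the `match_intent` helper are all made up for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical sentence templates; {name} marks a variable slot.
TEMPLATES: Dict[str, List[str]] = {
    "light_off": ["turn off the {area} lights", "turn off the lights"],
    "play_music": ["play {artist} on {player}"],
}


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn a template into a regex with one named group per {slot}."""
    # re.split with a capturing group alternates: literal, slot, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)      # literal text
        else:
            pattern += f"(?P<{part}>.+?)"   # variable slot
    return re.compile(f"^{pattern}$", re.IGNORECASE)


COMPILED = {
    intent: [compile_template(t) for t in templates]
    for intent, templates in TEMPLATES.items()
}


def match_intent(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for the first matching template, else None."""
    cleaned = text.strip().rstrip(".!?")
    for intent, patterns in COMPILED.items():
        for pattern in patterns:
            match = pattern.match(cleaned)
            if match:
                return intent, match.groupdict()
    return None
```

                Anything that does not fit a template returns `None`, which is the trade-off described above: more setup up front, but no guessing about what a command meant.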

                > And the satellites, I assume that’s like what I was talking about where I (hopefully someday) can flash my Google Minis and put HA on them instead?

                I’d guess the chances of that (or of me flashing my Alexas) are close to zero; those are far too locked down.

                Satellites simply means that you can put a lower-power device somewhere and let your central server do the heavy processing. So with Rhasspy, you’d have one more powerful device (like a Raspberry Pi 4) that does the speech-to-text, and smaller devices (like those Pi Zero 2 W’s I was never able to get back then) that only do wake-word recognition on-device (that part has to happen too fast to wait for the network) and, upon waking, simply send the audio to the central server for processing.
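
                The split can be sketched in a few lines of Python. This is only an illustration of the idea, not Rhasspy's actual code or protocol: the `Satellite` class, its callbacks, and the fixed `chunks_per_command` cutoff (standing in for real end-of-speech detection) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Satellite:
    """Low-power device: listens locally, offloads everything else."""

    detect_wake_word: Callable[[bytes], bool]  # runs on-device, must be fast
    send_to_server: Callable[[bytes], None]    # ships audio to the central server
    chunks_per_command: int = 3                # hypothetical stand-in for silence detection
    _awake: bool = False
    _buffer: List[bytes] = field(default_factory=list)

    def feed(self, chunk: bytes) -> None:
        """Process one chunk of microphone audio."""
        if not self._awake:
            # Asleep: nothing leaves the device, we only check for the wake word.
            if self.detect_wake_word(chunk):
                self._awake = True
            return
        # Awake: collect the command audio, then hand it off for speech-to-text.
        self._buffer.append(chunk)
        if len(self._buffer) >= self.chunks_per_command:
            self.send_to_server(b"".join(self._buffer))
            self._buffer.clear()
            self._awake = False


# Usage with fake audio and a fake "server" that just records what it receives.
received: List[bytes] = []
satellite = Satellite(
    detect_wake_word=lambda chunk: chunk == b"WAKE",
    send_to_server=received.append,
    chunks_per_command=2,
)
for chunk in [b"noise", b"WAKE", b"turn", b"off", b"ignored"]:
    satellite.feed(chunk)
```

                Only the audio captured after the wake word reaches the server; everything before it, and everything after the command ends, stays on the satellite.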

                • Scrubbles@poptalk.scrubbles.techOP
                  11 months ago

                  Great, thanks, I think that’s all I needed. I’ll start playing with it, but I’ll hold off on a major implementation until that’s all finished.