I haven't had time to look into it, ok?
from plankton@programming.dev to selfhosted@lemmy.world on 16 Mar 05:51
https://programming.dev/post/47279690
is this Little Bobby Tables?
#selfhosted
Just cron a restart
Turns out you already did that and that’s why it’s going down
I actually have a similar issue. My server goes unresponsive/freezes after N hours of uptime, where N varies; so far I’ve measured between 6 and 72 hours. I tried working around it by auto-rebooting the server each night, but it still sometimes happens before the 24-hour mark.
Nothing in logs, so my best option is to auto-reboot at this time. 😆
I just solved this exact issue after living with it for a few months.
For me it was a bad PSU: a voltage drop probably stopped the HDDs and SSDs, which knocked over the kernel.
Hm. The PSU is the one delivered with the system. And the system is rated to handle this and more. I really hope it’s not a bad PSU.
It’d be better than bad RAM! Or a bad CPU, probably.
Hey, I have the same thing with my second router, which works as an extender to cover a remote area. I auto-reboot it every 3 hours during the day (not at night). Sometimes it stops transmitting data before the 3-hour mark, so I have to go and physically reboot it. That always helps, although on very rare occasions the scheduled software reboot does not.
I have no idea what’s going on. I bought it cheap as a broken unit, but re-flashing it with OpenWrt seems to have solved all its issues. However, I’m not qualified to say there are no issues with it; it’s just that from a user perspective it works exceptionally well. I see no issues, except this forced auto-reboot thing, and that could be me not understanding networking properly and doing something wrong or suboptimal. It gets the signal wirelessly over the 5 GHz band (for speed) and shares it over the 2.4 GHz band (for range). I fixed some obvious mistakes with the help of a GPT, and it seems to work better now, but I’m not really sure. It could also be that it’s winter and it was cold in there; I’ll have to see how it behaves during the summer.
Honestly, I’ve even started thinking it may have no issues now and I could remove that cron job. But I can live with being offline for a minute or two a few times a day when I’m in that remote location.
Yeah, I mean. Tried to complement your story with mine.
I had a bad NVMe drive that caused that on two separate computers.
On one of them I slowly replaced every single piece of hardware except the NVMe drive, and it still crashed about once a day. I finally sucked it up and bought a new drive, and magically everything stopped crashing.
When it started happening on my server, I immediately replaced the NVMe drive and, magically, no more crashes.
Zero issues in the logs, no failures on bootup, no issues with any hardware scanners, just hard freeze randomly.
Do you have an Intel Ethernet NIC? That’s a known issue, in particular with more recent Linux kernels used in Debian-based distros, which means it extends to TrueNAS, Proxmox, etc. There’s a known fix for it too (or you could just downgrade the kernel).
My latest example of this was a memory leak in some old software I wrote. It grew linearly with traffic, so it wasn’t until traffic randomly spiked that I started seeing problems.
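The commenter doesn’t say what the leak was. As a purely hypothetical illustration of a leak that “grows linearly with traffic”, a cache keyed per request and never evicted keeps one entry per request forever; a bounded LRU cache is one common fix. The class and function names below are made up for this sketch.

```python
from collections import OrderedDict


class UnboundedCache:
    """Hypothetical leaky cache: one entry per distinct key, never evicted."""

    def __init__(self):
        self._data = {}

    def get_or_compute(self, key, compute):
        if key not in self._data:
            self._data[key] = compute(key)
        return self._data[key]

    def __len__(self):
        return len(self._data)


class BoundedCache:
    """Same interface, but evicts the least recently used entry past max_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = compute(key)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the oldest entry
        return value

    def __len__(self):
        return len(self._data)


def simulate(cache, n_requests):
    """Feed n_requests requests, each with a distinct key (like a request ID)."""
    for i in range(n_requests):
        cache.get_or_compute(f"req-{i}", lambda k: k.upper())
    return len(cache)
```

With 10,000 requests, `simulate(UnboundedCache(), 10_000)` ends with 10,000 entries while `simulate(BoundedCache(1_000), 10_000)` stays at 1,000; doubling traffic doubles the first number, which is exactly the kind of growth that only bites when traffic spikes.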
Dylan Beattie talks about this in one of his presentations. He built a website that did actor and actress recommendations, I believe for, like, extras to be hired. They kept getting alerts around 7:00-7:15 am and again around 12:30-12:45 pm.
Eventually they figured out that casting agents typically woke up, checked email, and submitted the requests on the website. The actors and actresses typically had other jobs and had to wait till lunch to check and apply.
They had to redesign the system and buy more bandwidth to handle these spikes.
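Spotting a pattern like that usually comes down to grouping alerts by time of day, ignoring the date. A minimal sketch, assuming you already have the alert timestamps as `datetime` objects; the function name and the 15-minute bucket size are illustrative choices, not anything from the talk.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def bucket_by_time_of_day(timestamps: Iterable[datetime], minutes: int = 15) -> Counter:
    """Count events per time-of-day window (e.g. "07:00"), across all dates."""
    if minutes <= 0 or 60 % minutes:
        raise ValueError("minutes must evenly divide 60")
    counts = Counter()
    for ts in timestamps:
        start = (ts.minute // minutes) * minutes  # floor to the window start
        counts[f"{ts.hour:02d}:{start:02d}"] += 1
    return counts
```

Calling `bucket_by_time_of_day(alert_times).most_common(5)` on a few weeks of alerts would surface recurring windows such as "07:00" or "12:30" even when each individual day looks random.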
He’s a great presenter and I highly recommend watching all of his talks, even if you’re not a programmer. I probably got most of the details wrong, but it was a talk at one of the NDC conferences.
To “fix” the memory leak, just restart your server every day with a cron job /s
One of these days I’ll get around to figuring out why Proxmox won’t start up unless I hit enter on the splash page.
One of these days I’ll figure out why NetSuite ERP’s SuiteQL API fails with a 4xx when you start the day by querying it at current_time (c) for all the data updated from c-1day to c-1sec.
There’s a silly bug where, if you use Ventoy to install Proxmox, it fucks up the EFI partition and won’t boot without the USB stick plugged in. I left that USB stick in for probably 6 months before bothering to fix it, lol.
Unintended safety feature
Won’t start if the key isn’t in.
Bro, that’s a car.
I had a server that would lose internet connectivity approximately every 36 hours. Could never figure out why. Ended up running a script every 2 min or so to check connectivity. If it failed, then it would trigger a reboot.
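A minimal sketch of that kind of watchdog, assuming a systemd host where `systemctl reboot` is available and the script runs as root. The probe targets (two public DNS resolvers on TCP port 53), the function names, and the `failures_before_reboot` parameter are illustrative assumptions, not the commenter’s actual script.

```python
import socket
import subprocess
import time

# Assumption: reachable hosts to probe; any stable TCP endpoint would do.
CHECK_TARGETS = [("1.1.1.1", 53), ("8.8.8.8", 53)]


def is_online(targets=CHECK_TARGETS, timeout=3.0):
    """Return True if a TCP connection to any target succeeds."""
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue  # try the next target
    return False


def do_reboot():
    # Assumption: systemd host, running with root privileges.
    subprocess.run(["systemctl", "reboot"], check=False)


def watchdog(check=is_online, reboot=do_reboot, interval=120.0,
             failures_before_reboot=1, max_checks=None, sleep=time.sleep):
    """Run `check` every `interval` seconds and call `reboot` after
    `failures_before_reboot` consecutive failures.

    Returns True if a reboot was triggered, or False if `max_checks`
    checks ran without triggering one (None means run forever).
    """
    consecutive = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            consecutive = 0
        else:
            consecutive += 1
            if consecutive >= failures_before_reboot:
                reboot()
                return True
        sleep(interval)
    return False
```

The defaults mirror the description above: check every 2 minutes and reboot on the first failure. Raising `failures_before_reboot` to 2 or 3 is one way to avoid rebooting over a single dropped probe. Passing stub `check`, `reboot`, and `sleep` functions lets you dry-run the loop without touching the network or the machine.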