Uncategorized – www.gadjev.com
https://www.gadjev.com
IT does matter

Synology not booting correctly after DSM 7.2.2 Update
https://www.gadjev.com/2024/12/15/synology-not-booting-correctly-after-dsm-7-2-2-update/
Sat, 14 Dec 2024 23:56:07 +0000

In this post, I’ll share the key lessons learned from a lengthy troubleshooting session after upgrading from DSM 7.1.1 to DSM 7.2.2.

The Symptoms:

  • The DSM upgrade from version 7.1.1 to 7.2.2 appeared successful.
  • After the upgrade, DSM booted fine, allowing login and access to SMB shares.
  • However, after the subsequent reboot, DSM got stuck at the “System is getting ready…” screen at the login prompt.
  • Multiple reboots didn’t resolve the issue.

SSH was unavailable, but fortunately, the serial connection was still functional, providing the only way in.

Initial Observations: Upon logging in, it became clear that:

  • No disks were mounted.
  • /etc/fstab was empty.
  • cat /proc/mdstat showed all drives and RAID arrays were healthy.
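
The checks above can be repeated from the serial console with a few commands. The following is only a minimal sketch, assuming a POSIX shell with grep available; the helper name fstab_has_entries is hypothetical and not part of DSM.

```shell
# Hypothetical helper: succeeds if the given fstab-style file contains
# at least one line that is neither blank nor a comment.
fstab_has_entries() {
    grep -Eqv '^[[:space:]]*(#|$)' "$1"
}

# Usage on the NAS (paths taken from the observations above):
#   fstab_has_entries /etc/fstab || echo "/etc/fstab has no entries"
#   cat /proc/mdstat      # state of the RAID arrays
#   mount                 # what is currently mounted
```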

The first step was to investigate what was supposed to populate /etc/fstab and why it wasn’t working. It turns out that the process is controlled by a binary called /usr/syno/bin/synocfgen, with a symlink to /usr/syno/cfgen/s00_synocheckfstab.

Running the binary manually (/usr/syno/cfgen/s00_synocheckfstab), followed by mount -a, successfully mounted all the volumes.

The Next Step: The question then shifted to determining what was responsible for executing /usr/syno/cfgen/s00_synocheckfstab. After some digging, I found the culprit:

  • A script called volume.sh located in /usr/syno/lib/systemd/scripts/volume.sh.
  • This script is associated with the syno-volume.service, defined in /usr/lib/systemd/system/syno-volume.service.

ash-4.4# cat /usr/lib/systemd/system/syno-volume.service;
[Unit]
Description=Synology volume service
Wants=syno-space.target
After=syno-space.target

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStartSec=1800s
TimeoutStopSec=600s
ExecStartPre=-/usr/syno/lib/systemd/scripts/volume.sh --bootup-pre-start
ExecStart=/usr/syno/lib/systemd/scripts/volume.sh --bootup-start
ExecStop=/usr/syno/lib/systemd/scripts/volume.sh --stop-all

[X-Synology]

The Problem: The syno-volume.service wasn’t running! Running systemctl status syno-volume.service confirmed this.

Next question: why wasn’t the syno-volume service running? I spent quite some time going through various logs looking for messages or errors, all without result. I then looked at the dependencies of the syno-volume service:

  • all the units it depends on were present and running (syno-spaces and some others…)
  • as expected, none of the services that depend on syno-volume had been started (systemctl list-dependencies --reverse syno-volume)

Manually starting the systemd services returned by the reverse list-dependencies command brought back a fully functional NAS… until the next reboot, when it got stuck at the login prompt again.
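
For reference, the reverse dependency list can be turned into a plain list of unit names. This is a rough sketch rather than the exact procedure I used: the helper name list_reverse_deps is hypothetical, and it assumes that systemctl's --plain output puts the queried unit on the first line and one dependent unit per following line.

```shell
# Hypothetical helper: print the units that depend on the given unit.
# Assumption: with --plain, the first output line is the queried unit itself
# and every following non-empty line starts with one dependent unit name.
list_reverse_deps() {
    systemctl list-dependencies --reverse --plain "$1" |
        awk 'NR > 1 && NF > 0 { print $1 }'
}

# Usage on the NAS; review the list before starting anything:
#   list_reverse_deps syno-volume.service
#   list_reverse_deps syno-volume.service | while read -r unit; do
#       systemctl start "$unit"
#   done
```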

After another round of thinking, I decided to check the default systemd target. It turned out not to be Synology’s usual default!

On the broken system:

systemctl get-default

pkg-synobrm-keep-session.target

On a healthy system:

systemctl get-default
syno-bootup-done.target
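
The same comparison can be scripted. This is a minimal sketch, assuming that syno-bootup-done.target (the value seen on the healthy system) is the expected default; the helper name check_default_target is hypothetical.

```shell
# Expected default target, taken from the healthy system shown above.
EXPECTED_TARGET="syno-bootup-done.target"

# Hypothetical helper: prints a verdict and returns non-zero on a mismatch.
check_default_target() {
    current=$(systemctl get-default) || return 2
    if [ "$current" = "$EXPECTED_TARGET" ]; then
        echo "default target OK: $current"
    else
        echo "unexpected default target: $current (expected $EXPECTED_TARGET)"
        return 1
    fi
}

# Usage on the NAS:
#   check_default_target
```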

The final piece of the puzzle was to identify why the default systemd target had been changed (and of course, changing it manually was lost on the next reboot, so another round of reverse engineering the Synology scripts and configs was necessary). Grepping here and there led this time to /usr/syno/lib/systemd/generators/syno-brm-restore-generator, which had been updated very recently and likely came from the latest DSM.

Upon examining the syno-brm-restore-generator, it became evident that two files could trigger the logic that changes the default systemd target:

  • /var/run/.brm_restoring
  • /var/lib/abb_recovery_status

These files appear to play a role in altering the system’s default target, which was disrupting the boot process.

When comparing with a healthy system that had also gone through the same upgrade to DSM 7.2.2, it was found that the files /var/lib/abb_recovery_status and /var/run/.brm_restoring were absent. Removing these files allowed systemd to restore its default target, and the NAS became fully functional again.
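
Before removing anything, it is worth checking which of the two files actually exist. A small sketch, assuming a POSIX shell; the helper name list_existing_markers is hypothetical, and it only lists files, it does not delete them.

```shell
# Hypothetical helper: print every path from its arguments that exists.
# Returns 0 if at least one was found, 1 otherwise.
list_existing_markers() {
    found=1
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}

# Usage on the NAS, with the two paths named above:
#   list_existing_markers /var/run/.brm_restoring /var/lib/abb_recovery_status
```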

However, the reason why these files were present in the first place, despite the system having been upgraded correctly, remains a mystery.

AWS CodeBuild fails when changing to a folder directory with a space
https://www.gadjev.com/2021/10/15/aws-codebuild-fails-when-changing-to-a-folder-directory-with-a-space/
Fri, 15 Oct 2021 18:08:23 +0000

I write my articles when some technical issue really frustrates me and when there’s not much information available about it on the Internet.

There’s an issue I hit recently in AWS CodeBuild: when you cd into a directory with a space in its name, CodeBuild breaks on the very next command and “resets” your current directory. If your next commands depend on your current directory, this can lead to build failures.

Error hit: “/codebuild/output/tmp/script.sh: 1: cd: can’t cd to XXXXXX”

The reason seems to be a wrapper script (/codebuild/output/tmp/script.sh) that AWS CodeBuild uses to keep track of the directory you’re currently in. It stores the directory path in a temporary text file, and on the next command it tries to read the path back from that file and cd into it. The problem arises when the directory name contains a space: the unquoted command substitution is split on whitespace, so cd receives the path as several arguments rather than one.

This is how the wrapper script “collects” the current directory:
pwd > /codebuild/output/tmp/dir.txt

And that’s how it tries to “load it back”:
cd $(cat /codebuild/output/tmp/dir.txt)

When dir.txt contains a space, bash throws “-bash: cd: too many arguments” and sh throws “sh: 1: cd: can’t cd to”.
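
The difference between the unquoted and the quoted form of this restore step can be reproduced locally. The sketch below is only an illustration of shell word splitting: it uses a scratch directory from mktemp as a stand-in for the real CodeBuild files, and it does not claim to show how the wrapper script should or will be fixed.

```shell
# Reproduce the save/restore pattern in a scratch directory.
# All paths below are temporary stand-ins, not the real CodeBuild files.
scratch=$(mktemp -d)
mkdir "$scratch/folder with space"
(cd "$scratch/folder with space" && pwd) > "$scratch/dir.txt"

# Unquoted: the path is split into several words and cd fails.
if (cd $(cat "$scratch/dir.txt")) 2>/dev/null; then
    unquoted_result="ok"
else
    unquoted_result="failed"
fi

# Quoted: the whole path is passed as a single argument and cd succeeds.
if (cd "$(cat "$scratch/dir.txt")"); then
    quoted_result="ok"
else
    quoted_result="failed"
fi

# On bash and dash this prints: unquoted: failed, quoted: ok
echo "unquoted: $unquoted_result, quoted: $quoted_result"
rm -rf "$scratch"
```

Both cd calls run in a subshell so the demonstration does not change the caller's working directory.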

Here’s a sample buildspec to reproduce the issue.

version: 0.2
phases:
  build:
    commands:
      - pwd && echo "codebuild starts initially in this directory"
      - cat /codebuild/output/tmp/script.sh
      - cd /tmp
      - pwd && echo "obviously we are in tmp now"
      - mkdir /tmp/folder\ with\ space
      - touch /tmp/folder\ with\ space/somefile.txt
      - cd /tmp/folder\ with\ space
      - ls -lah && echo "expecting this to list somefile.txt"
      - pwd && echo "codebuild has reset the current directory"

I posted the same on the AWS Forums: https://forums.aws.amazon.com/thread.jspa?messageID=997823&tstart=0

Hopefully Amazon will fix it soon.
