[APP-7364] [APP-7366] [RSDK-9684] [APP-7154] Add more logging, reduce monitoring loop time, misc small fixes. #56

Open
wants to merge 1 commit into
base: main
2 changes: 2 additions & 0 deletions cmd/viam-agent/main.go
@@ -140,6 +140,8 @@ func main() {
// If the local /etc/viam.json config is corrupted, invalid, or missing (due to a new install), we can get stuck here.
// Rename the file (if it exists) and wait to provision a new one.
if !errors.Is(err, fs.ErrNotExist) {
globalLogger.Error(errors.Wrapf(err, "reading %s", absConfigPath))
globalLogger.Warn("renaming %s to %s.old", absConfigPath, absConfigPath)
if err := os.Rename(absConfigPath, absConfigPath+".old"); err != nil {
// if we can't rename the file, we're up a creek, and it's fatal
globalLogger.Error(errors.Wrapf(err, "removing invalid config file %s", absConfigPath))
2 changes: 1 addition & 1 deletion manager.go
@@ -24,7 +24,7 @@ import (
)

const (
-minimalCheckInterval = time.Second * 60
+minimalCheckInterval = time.Second * 5
Member

What's the monitoring loop checking for at a very high level? Is this monitoring for FTDC data in the viam-server? What other things are "monitored" on a regular basis via the agent?

Member Author

This is how often it fetches a new config from the cloud. The cloud itself can say "check again after X time", but that's currently not implemented, so it falls to this.

The main loop is basically: check for new config > apply changes/updates (if any) > start subsystems that should be started > check health of subsystems > repeat

So the check interval determines (roughly) the overall timing. At 5 seconds, it may have trouble keeping up if there are slow responses or other work to do, but that should be fine.

Member

Thanks

defaultNetworkTimeout = time.Second * 15
// stopAllTimeout must be lower than systemd subsystems/viamagent/viam-agent.service timeout of 4mins
// and higher than subsystems/viamserver/viamserver.go timeout of 2mins.
6 changes: 3 additions & 3 deletions subsystems/provisioning/templates/index.html
@@ -37,7 +37,7 @@ <h2>Smart Machine Setup</h2>
<div class="form-group">
<label for="network">Network</label>
{{if eq (len .VisibleSSIDs) 0}}
-<input type="text" name="ssid" placeholder="Enter Wifi SSID" id="network" required>
+<input type="text" name="ssid" placeholder="Enter Wifi SSID" id="network" required autocorrect="off" autocapitalize="off">
Member

For context, these default to "on" when omitted, and that was preventing you from connecting with an SSID that was lowercase? Nice find if so 😄

Member Author

Micheal Lee found/suggested this fix. But yeah, I guess some phones/browsers see an empty text field, and if you type a password that's a normal "word", they auto-capitalize it, which isn't ideal for passwords.

We don't use a "password" type field here, because I wanted the value to be visible before submitting. Otherwise it's way too easy to make a typo, and then it takes several minutes for the device to try the credentials, fail, and restart the hotspot.

{{else}}
<div class="select-border">
<select name="ssid" id="network" required>
@@ -52,14 +52,14 @@ <h2>Smart Machine Setup</h2>

<div class="form-group">
<label for="password">Password</label>
-<input type="text" name="password" id="password" placeholder="Password">
+<input type="text" name="password" id="password" placeholder="Password" autocorrect="off" autocapitalize="off">
</div>
{{end}}

{{if not .IsConfigured}}
<div class="form-group">
<label for="viamconfig">Device Config</label>
-<textarea type="textarea" name="viamconfig" id="viamconfig" required placeholder="No config found on device. Paste your viam.json file here."></textarea>
+<textarea type="textarea" name="viamconfig" id="viamconfig" required placeholder="No config found on device. Paste your viam.json file here." autocorrect="off" autocapitalize="off"></textarea>
</div>
{{end}}

34 changes: 19 additions & 15 deletions subsystems/viamagent/viamagent.go
@@ -123,14 +123,14 @@ func Install(logger logging.Logger) error {
return errw.Wrap(err, "getting service file path")
}

//nolint:gosec
if err := os.MkdirAll(filepath.Dir(serviceFilePath), 0o755); err != nil {
return errw.Wrapf(err, "creating directory %s", filepath.Dir(serviceFilePath))
}
// use this later to avoid re-enabling an existing agent service a user might have disabled
Member

Want to confirm my understanding...there are users who have Viam agent installed but who don't want it to automatically start up on boot, so they've "disabled" auto start. When they upgrade their Viam agent, the configuration file where they've "disabled" the auto start gets overwritten such that startup on boot becomes true...and that's what you're fixing in this PR.

Copied the following snippet from the ticket - is this the final product (i.e. the user now has to manually restart Viam agent after an upgrade)?

Jan 07 16:11:09 pluto viam-agent[911]: 2025-01-07T22:11:09.496Z        INFO        viam-agent        agent/subsystem.go:355        viam-agent updated from 0.11.0 to 0.12.0
Jan 07 16:11:10 pluto viam-agent[911]: 2025-01-07T22:11:09.515Z        INFO        viam-agent        viamagent/viamagent.go:131        writing systemd service file to /usr/local/lib/systemd/system/viam-agent.service
Jan 07 16:11:10 pluto viam-agent[911]: 2025-01-07T22:11:09.515Z        INFO        viam-agent        viamagent/viamagent.go:146        enabling systemd viam-agent service
Jan 07 16:11:10 pluto viam-agent[911]: 2025-01-07T22:11:10.091Z        INFO        viam-agent        viamagent/viamagent.go:168        Install complete. Please (re)start the service with 'systemctl restart viam-agent' when ready.

Member Author

If someone installs viam-agent and then runs systemctl disable viam-agent, it won't start at boot. The problem is that, previously, when we did an upgrade, this same code got called, and it ALWAYS enabled the service again. Now it should only enable it on a first/new install.

The snippet shows the old behavior; the "enabling systemd viam-agent service" line was the problem. That should only happen on new installs now.

Member

Working on getting manual testing set up. Let me know if I'm moving in the right direction. Using an RPi5 because the agent runs on Linux (and I use a Mac). Should I be:

  1. pulling a config from prod (without actually starting the viam-server)
  2. checking out your branch and running make
  3. running the binary I've built, which includes the changes in your PR
  4. validating that the viam-agent starts on reboot of the RPi5
  5. running systemctl disable viam-agent
  6. upgrading the viam-agent on the RPi5 by doing what exactly?
  7. validating that the viam-agent service isn't running after the 2nd reboot

Please let me know of any gaps here. Thanks

Member

@maxhorowitz Jan 10, 2025

Validated by disabling the agent and upgrading from a stable version:

{
  "agent": {
    "viam-agent": {
      "pin_version": "0.12.0"
    },
    "viam-server": {
      "release_channel": "stable",
      "attributes": {}
    }
  }
}

to a pinned URL:

{
  "agent": {
    "viam-agent": {
      "pin_url": "file:///home/jeep/dev/agent/bin/viam-agent-custom-aarch64"
    },
    "viam-server": {
      "release_channel": "stable",
      "attributes": {}
    }
  }
}

After the upgrade and subsequent reboot, I validated the viam-agent system service wasn't running and was still disabled:

○ viam-agent.service - Viam Services Agent
     Loaded: loaded (/usr/local/lib/systemd/system/viam-agent.service; disabled; preset: enabled)
     Active: inactive (dead)

Member

After re-enabling and rebooting, the viam-agent starts up on its own 👍

● viam-agent.service - Viam Services Agent
     Loaded: loaded (/usr/local/lib/systemd/system/viam-agent.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-01-10 13:56:50 EST; 1min 22s ago
   Main PID: 758 (viam-agent)
      Tasks: 10 (limit: 9247)
        CPU: 4.953s
     CGroup: /system.slice/viam-agent.service
             └─758 /opt/viam/bin/viam-agent --config /etc/viam.json

_, err = os.Stat(serviceFilePath)
newInstall := err != nil
Member

Worth checking specifically for a file-does-not-exist error?

Member Author

This covers edge cases like file corruption too. If we can't stat, then it's best to treat it as a new install, IMHO. But a good thought! In many other cases it'd be worth differentiating.


logger.Infof("writing systemd service file to %s", serviceFilePath)
//nolint:gosec
if err := os.WriteFile(serviceFilePath, serviceFileContents, 0o644); err != nil {

newFile, err := agent.WriteFileIfNew(serviceFilePath, serviceFileContents)
if err != nil {
return errw.Wrapf(err, "writing systemd service file %s", serviceFilePath)
}

@@ -143,17 +143,21 @@ func Install(logger logging.Logger) error {
}
}

logger.Infof("enabling systemd viam-agent service")
cmd = exec.Command("systemctl", "daemon-reload")
output, err = cmd.CombinedOutput()
if err != nil {
return errw.Wrapf(err, "running 'systemctl daemon-reload' output: %s", output)
if newFile {
cmd = exec.Command("systemctl", "daemon-reload")
output, err = cmd.CombinedOutput()
if err != nil {
return errw.Wrapf(err, "running 'systemctl daemon-reload' output: %s", output)
}
}

cmd = exec.Command("systemctl", "enable", "viam-agent")
output, err = cmd.CombinedOutput()
if err != nil {
return errw.Wrapf(err, "running 'systemctl enable viam-agent' output: %s", output)
if newInstall {
logger.Infof("enabling systemd viam-agent service")
cmd = exec.Command("systemctl", "enable", "viam-agent")
output, err = cmd.CombinedOutput()
if err != nil {
return errw.Wrapf(err, "running 'systemctl enable viam-agent' output: %s", output)
}
}

_, err = os.Stat("/etc/viam.json")
3 changes: 3 additions & 0 deletions subsystems/viamserver/viamserver.go
@@ -172,6 +172,9 @@ func (s *viamServer) Start(ctx context.Context) error {
s.logger.Errorw("non-zero exit code", "exit code", s.lastExit)
}
}
if s.shouldRun {
s.logger.Infof("%s exited unexpectedly and will be restarted shortly", SubsysName)
}
close(s.exitChan)
}()
