Count command prints the number of records with a field combination

This is similar to `--output tsv --tsv-omit-header | sort | uniq -c` but can apply ADIF equality rules (e.g. case-insensitive, 7.200 = 7.2, etc.), doesn't get awkward with a TSV header, and can output any ADIF format.
flwyd · Jan 27, 2025 · commit 7caea37 · 1 parent ce6eaa2

Showing 4 changed files with 372 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -544,7 +544,6 @@ to use for string comparisons, using the
 [BCP-47 format](https://en.wikipedia.org/wiki/IETF_language_tag).  Available
 comparisons are
 
-
 * `field = value`: Case-insensitive equality, e.g. `contest_id=ARRL-field-day`
 * `field < value`: Less than, `freq<29.701`
 * `field <= value`: Less than or equal, `band<=10m`
@@ -616,6 +615,7 @@ or file names: `adifmt command --some-option --other=value file1.adi file2.csv`
 Name       | Description |
 ---------- | ----------- |
 `cat`      | Concatenate all input files to standard output |
+`count`    | Count records or unique field combinations |
 `edit`     | Add, change, remove, or adjust field values |
 `find`     | Include only records matching a condition |
 `fix`      | Correct field formats to match the ADIF specification |
@@ -647,6 +647,31 @@ format to another, e.g. `adifmt cat --output=csv mylog.adi` to convert from ADI
 format to CSV.  (If `--input` is not specified the file type is inferred from
 the file name; if `--output` is not specified ADI is used.)
 
+#### count
+
+`adifmt count` groups equal field values and adds a field with the number of
+times each group appears in the input.  If no `--fields` are given, it outputs
+a single record with a single field holding the number of records in the
+input, not including headers.  If one or more `--fields` are given, it outputs
+one record for each unique combination of those fields, with a count of its
+occurrences.  Fields which are unset in a record contribute an empty string to
+the combination; "all empty" is a valid group.
+
+The result of `count` is valid ADIF in the format specified by `--output`; TSV
+or CSV output may be the easiest to consume.  The added field's name can be
+set, e.g. `--count-field=NUM`; it defaults to `APP_ADIFMT_COUNT`.  Output is
+sorted by the fields in `--fields` order, using each field's natural order
+(strings are case-insensitive, numeric fields sort by value, band sorts
+numerically, and longitude and latitude sort in the same order as gridsquare).
+
+This can be combined with `find` to discover duplicate QSOs with a given
+uniqueness criterion:
+
+```sh
+adifmt count --fields qso_date,call,band,mode --count-field num mylog.adi | \
+  adifmt find --if 'num>1'
+```
+
 #### edit
 
 `adifmt edit` adds, changes, or removes fields in each input record.
@@ -729,12 +754,7 @@ included unchanged in each record.  This can be useful when processing the
 output with tools which don’t expect a list of values in a field, e.g. counting
 the number of contacts you’ve made with each grid square while treating
 contacts on the border of a square as separate:
-
-```sh
-adifmt flatten --fields VUCC_GRIDS --output tsv \
-  | adifmt select --fields VUCC_GRIDS --output tsv --tsv-omit-header \
-  | sort | uniq -c
-```
+`adifmt flatten --fields VUCC_GRIDS | adifmt count --fields VUCC_GRIDS --count-field NUM --output tsv`
 
 The `flatten` command will turn
 
@@ -759,10 +779,11 @@ AH1Z	FM08
 and the rest of the pipeline will produce grid counts like
 
 ```
-1 EM97
-1 EN98
-2 FM07
-2 FM08
+NUM VUCC_GRIDS
+1   EM97
+1   EN98
+2   FM07
+2   FM08
 ```
 
 If multiple fields are flattened and each has multiple instances, a Cartesian
@@ -881,7 +902,8 @@ adifmt select --fields call --fields qso_date --fields time_on,time_off mylog.ad
 `select` can be effectively combined with other standard Unix utilities.  To
 find duplicate QSOs by date, band, and mode, use
 [sort](https://man7.org/linux/man-pages/man1/sort.1.html) and
-[uniq](https://man7.org/linux/man-pages/man1/uniq.1.html):
+[uniq](https://man7.org/linux/man-pages/man1/uniq.1.html).
+(See [`count`](#count) for another approach.)
 
 ```sh
 adifmt select --fields call,qso_date,band,mode --output tsv --tsv-omit-header mylog.adi \
@@ -961,24 +983,21 @@ simple tools that do one thing and can be easily composed together to build more
 powerful expressions.
 
 There are a lot of things that a ham radio log file program could do, and I
-would like `adifmt` to do many of them. The program is nearing feature maturity
-for an initial release.  If you've got a use case for working with ADIF files
+would like `adifmt` to do many of them. The program now has most of the core
+features I planned.  If you've got a use case for working with ADIF files
 that `adifmt` can’t do yet, please create a GitHub issue to discuss how it
 might work.
 
-Features I plan to add:
+Further features I plan to add:
 
 *   Validate more fields.
 *   Identify duplicate records using flexible criteria, e.g., two contacts with
     the same callsign on the same band with the same mode on the same Zulu day
-    and the same `MY_SIG_INFO` value.
+    and the same `MY_SIG_INFO` value.  (`count` plus `find` can identify these,
+    but that approach doesn't print the full records.)  I would also like a way
+    to combine duplicate records into one, e.g. reversing the `flatten` operation.
 *   Option for `save` to append records to an existing ADIF file.
 *   [FLE (fast log entry)](https://df3cb.com/fle/documentation/) format support.
-*   Count the total number of records or the number of distinct values of a
-    field.  (The total number of records can currently be counted with
-    `--output=tsv --tsv-omit-header` and piping the output to `wc -l`.)  This
-    could match the format of the “Report” comment in the test QSOs file
-    produced with the ADIF spec.
 *   Support for Cabrillo 2.0 format if needed.
 
 See the [issues page](https://github.com/flwyd/adif-multitool/issues) for more
@@ -989,7 +1008,8 @@ ideas or to suggest your own.
 I don't expect ADIF Multitool to support the following use cases. A different
 piece of software will be needed.
 
-*   Upload logs to any service like QRZ, eQSL, or LotW.
+*   Upload logs to any service like QRZ, eQSL, or LotW.  `adifmt` is a useful
+    tool in preparing logs for upload, though.
 *   Log-editing GUI. `adifmt` is a command-line tool; a GUI could be built which
     uses it to make edits, but that would be a separate program and project. I
     am open to the idea of an interactive console mode, though.
@@ -1044,7 +1064,7 @@ to ensure it’s present when adding files: `addlicense .`
 Apache header:
 
 ```
-Copyright 2024 Google LLC
+Copyright 2025 Google LLC
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

diff --git a/adifmt/commands.go b/adifmt/commands.go
@@ -30,6 +30,15 @@ type cmdConfig struct {
 var (
 	catConf = cmdConfig{Command: cmd.Cat}
 
+	countConf = cmdConfig{Command: cmd.Count,
+		Configure: func(ctx *cmd.Context, fs *flag.FlagSet) {
+			cctx := cmd.CountContext{}
+			fs.StringVar(&cctx.CountFieldName, "count-field", "APP_ADIFMT_COUNT", "Field `name` for record counts")
+			fs.Var(&cctx.Fields, "fields", "Comma-separated or multiple instance field `names` to group by")
+			ctx.CommandCtx = &cctx
+		},
+	}
+
 	editConf = cmdConfig{Command: cmd.Edit,
 		Configure: func(ctx *cmd.Context, fs *flag.FlagSet) {
 			cctx := cmd.EditContext{
@@ -134,6 +143,7 @@ var (
 
 	cmds = []cmdConfig{
 		catConf,
+		countConf,
 		editConf,
 		findConf,
 		fixConf,

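`countConf` registers `--fields` with `fs.Var`, so `cmd.FieldList` has to implement Go's `flag.Value` interface. Its implementation isn't part of this diff; the following is a hypothetical stand-in (`fieldList`) sketching how a single `flag.Value` can accept both comma-separated names and repeated flag instances:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// fieldList is a hypothetical stand-in for cmd.FieldList; the real type is
// not shown in this commit. Each Set call may carry one or more
// comma-separated names, and repeated flags append to the same list.
type fieldList []string

func (f *fieldList) String() string { return strings.Join(*f, ",") }

func (f *fieldList) Set(s string) error {
	for _, name := range strings.Split(s, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			return fmt.Errorf("empty field name in %q", s)
		}
		*f = append(*f, name)
	}
	return nil
}

func main() {
	fs := flag.NewFlagSet("count", flag.ContinueOnError)
	var fields fieldList
	countField := fs.String("count-field", "APP_ADIFMT_COUNT", "Field `name` for record counts")
	fs.Var(&fields, "fields", "Comma-separated or multiple instance field `names` to group by")
	args := []string{"--fields", "qso_date,call", "--fields", "band", "mylog.adi"}
	if err := fs.Parse(args); err != nil {
		fmt.Println(err)
		return
	}
	// Remaining positional arguments are the input files.
	fmt.Println(*countField, fields.String(), fs.Args())
}
```

Go's `flag` package calls `Set` once per occurrence, so `--fields qso_date,call --fields band` yields the list `qso_date`, `call`, `band`.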
diff --git a/cmd/count.go b/cmd/count.go
@@ -0,0 +1,167 @@
+// Copyright 2025 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package cmd
+
+import (
+	"fmt"
+	"sort"
+	"strconv"
+	"strings"
+
+	"github.com/flwyd/adif-multitool/adif"
+	"github.com/flwyd/adif-multitool/adif/spec"
+	"golang.org/x/text/collate"
+	"golang.org/x/text/language"
+)
+
+var Count = Command{Name: "count", Run: runCount, Help: helpCount,
+	Description: "Count records or unique field combinations"}
+
+type CountContext struct {
+	CountFieldName string
+	Fields         FieldList
+}
+
+func helpCount() string {
+	return `If no fields are specified:
+  Outputs a single record with a single number field with the number of records in
+  all input files.
+If fields are specified:
+  Outputs each unique combination of those fields with the number of times that
+  combination occurs in the input records, sorted by those field values.
+`
+}
+
+func runCount(ctx *Context, args []string) error {
+	cctx := ctx.CommandCtx.(*CountContext)
+	if len(cctx.Fields) > 128 {
+		return fmt.Errorf("field list length %d is greater than maximum 128", len(cctx.Fields))
+	}
+	countName := cctx.CountFieldName
+	if countName == "" {
+		countName = "COUNT"
+	}
+	names := make([]string, len(cctx.Fields))
+	for i, f := range cctx.Fields {
+		names[i] = strings.ToUpper(f)
+	}
+	acc, err := newAccumulator(ctx)
+	if err != nil {
+		return err
+	}
+	all := make([]*adif.Record, 0, 128)
+	for _, file := range filesOrStdin(args) {
+		l, err := acc.read(file)
+		if err != nil {
+			return err
+		}
+		for _, r := range l.Records {
+			cr := adif.NewRecord()
+			for _, n := range cctx.Fields {
+				f, _ := r.Get(n)
+				cr.Set(f)
+			}
+			all = append(all, cr)
+		}
+	}
+	if len(all) == 0 {
+		r := adif.NewRecord(adif.Field{Name: countName, Value: "0", Type: adif.TypeNumber})
+		for _, n := range cctx.Fields {
+			r.Set(adif.Field{Name: n, Value: ""})
+		}
+		acc.Out.AddRecord(r)
+		if err := acc.prepare(); err != nil {
+			return err
+		}
+		return write(ctx, acc.Out)
+	}
+	comps := make([]spec.FieldComparator, len(cctx.Fields))
+	for i, n := range cctx.Fields {
+		if f, ok := spec.FieldNamed(n); ok {
+			comps[i] = spec.ComparatorForField(f, ctx.Locale)
+		} else if u, ok := acc.Out.GetUserdef(n); ok {
+			f := spec.Field{Name: n, Type: spec.DataTypes[u.Type.Indicator()]}
+			comps[i] = spec.ComparatorForField(f, ctx.Locale)
+		} else {
+			comps[i] = spec.ComparatorForField(spec.Field{Name: n, Type: spec.StringDataType}, ctx.Locale)
+		}
+	}
+	col := collate.New(language.Und, collate.IgnoreCase)
+	comp := func(a, b *adif.Record) int {
+		for i, c := range comps {
+			n := cctx.Fields[i]
+			af, _ := a.Get(n)
+			bf, _ := b.Get(n)
+			v, err := c(af.Value, bf.Value)
+			if err != nil {
+				v = col.CompareString(af.Value, bf.Value)
+			}
+			if v != 0 {
+				return v
+			}
+		}
+		return 0
+	}
+	sort.SliceStable(all, func(i, j int) bool { return comp(all[i], all[j]) < 0 })
+	for i := 0; i < len(all); {
+		cur := all[i]
+		vals := make([]map[adif.Field]int, len(cctx.Fields))
+		for j, n := range cctx.Fields {
+			vals[j] = make(map[adif.Field]int)
+			if f, ok := cur.Get(n); ok {
+				vals[j][f] = 1
+			}
+		}
+		num := 1
+		i++
+		for i < len(all) {
+			if comp(cur, all[i]) != 0 {
+				break
+			}
+			for j, n := range cctx.Fields {
+				if f, ok := all[i].Get(n); ok {
+					vals[j][f]++
+				}
+			}
+			num++
+			i++
+		}
+		r := adif.NewRecord(adif.Field{Name: countName, Value: strconv.Itoa(num), Type: adif.TypeNumber})
+		for j, m := range vals {
+			if len(m) == 0 {
+				r.Set(adif.Field{Name: cctx.Fields[j], Value: ""})
+			} else {
+				r.Set(mostFrequent(m))
+			}
+		}
+		acc.Out.AddRecord(r)
+	}
+	if err := acc.prepare(); err != nil {
+		return err
+	}
+	return write(ctx, acc.Out)
+}
+
+func mostFrequent(counts map[adif.Field]int) adif.Field {
+	var f adif.Field
+	var m int
+	for k, v := range counts {
+		if v > m || (m == v && k.Value > f.Value) {
+			f = k
+			m = v
+		}
+	}
+	return f
+}