Count command prints the number of records with a field combination
This is similar to `--output tsv --tsv-omit-header | sort | uniq -c` but can
apply ADIF equality rules (e.g. case-insensitive comparison, 7.200 = 7.2),
doesn't get awkward with a TSV header, and can output any ADIF format.
flwyd committed Jan 27, 2025
1 parent ce6eaa2 commit 7caea37
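The commit message describes `count` as `sort | uniq -c` with ADIF equality rules applied. A minimal Go sketch of that idea follows, assuming records are plain `map[string]string` values and that only `FREQ` is numeric. The names `numericFields`, `normalize`, and `countGroups` are hypothetical; the real command looks up each field's type through `spec.ComparatorForField` and groups by sorting rather than hashing. Each value is normalized before it becomes part of the grouping key, so `7.200` and `7.2`, or `CW` and `cw`, land in the same group.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// numericFields lists fields compared by numeric value in this sketch.
// Hypothetical subset: the real tool uses each field's ADIF data type.
var numericFields = map[string]bool{"FREQ": true}

// normalize returns a grouping key for one field value. Numeric fields are
// canonicalized so "7.200" and "7.2" match; other fields are upper-cased so
// the comparison is case-insensitive.
func normalize(field, value string) string {
	value = strings.TrimSpace(value)
	if numericFields[field] {
		if f, err := strconv.ParseFloat(value, 64); err == nil {
			return strconv.FormatFloat(f, 'f', -1, 64)
		}
	}
	return strings.ToUpper(value)
}

// countGroups counts records by the normalized values of fields. A missing
// field contributes an empty string, so "all empty" is a valid group.
// Keys are joined with a tab, assuming values never contain tabs.
func countGroups(records []map[string]string, fields []string) map[string]int {
	counts := make(map[string]int)
	for _, r := range records {
		key := make([]string, len(fields))
		for i, f := range fields {
			key[i] = normalize(f, r[f])
		}
		counts[strings.Join(key, "\t")]++
	}
	return counts
}

func main() {
	records := []map[string]string{
		{"FREQ": "7.200", "MODE": "CW"},
		{"FREQ": "7.2", "MODE": "cw"},
		{"FREQ": "14.074", "MODE": "FT8"},
		{}, // a record with neither field set
	}
	counts := countGroups(records, []string{"FREQ", "MODE"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%d\t%s\n", counts[k], k)
	}
}
```

Plain `uniq -c` would report the first two records as separate lines; here they form one group with a count of 2.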
Showing 4 changed files with 372 additions and 23 deletions.
66 changes: 43 additions & 23 deletions README.md
@@ -544,7 +544,6 @@ to use for string comparisons, using the
[BCP-47 format](https://en.wikipedia.org/wiki/IETF_language_tag). Available
comparisons are
* `field = value`: Case-insensitive equality, e.g. `contest_id=ARRL-field-day`
* `field < value`: Less than, `freq<29.701`
* `field <= value`: Less than or equal, `band<=10m`
@@ -616,6 +615,7 @@ or file names: `adifmt command --some-option --other=value file1.adi file2.csv`
Name | Description |
---------- | ----------- |
`cat` | Concatenate all input files to standard output |
`count` | Count records or unique field combinations |
`edit` | Add, change, remove, or adjust field values |
`find` | Include only records matching a condition |
`fix` | Correct field formats to match the ADIF specification |
@@ -647,6 +647,31 @@ format to another, e.g. `adifmt cat --output=csv mylog.adi` to convert from ADI
format to CSV. (If `--input` is not specified the file type is inferred from
the file name; if `--output` is not specified ADI is used.)
#### count
`adifmt count` groups equal field values and adds a field with the number of
times the input had each group. If no `--fields` are given, it outputs a single
record with a single field containing the number of records in the input, not
including headers. If one or more `--fields` are given, it outputs one record
for each unique combination of those fields, with a count of the occurrences.
Fields which are unset in a record contribute an empty string to the
combination; "all empty" is a valid group.
The result of `count` is valid ADIF in the format specified by `--output`; TSV
or CSV output makes the results easy to consume with other tools. The name of
the added field defaults to `APP_ADIFMT_COUNT` and can be changed with
`--count-field`, e.g. `--count-field=NUM`. The output is sorted in the order
given by `--fields` and in the natural order for each field (strings sort
case-insensitively, numeric fields sort by value, bands sort numerically, and
longitude and latitude sort in the same order as gridsquare).
This can be combined with `find` to discover duplicate QSOs with a given
uniqueness criterion:
```
adifmt count --fields qso_date,call,band,mode --count-field num mylog.adi | \
adifmt find --if 'num>1'
```
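The grouping described in this section can be done by sorting on `--fields` in order and then counting runs of equal records, which is the approach `runCount` takes later in this commit. The Go sketch below shows that sort-then-scan technique plus the `num>1` filter from the pipeline above. It is simplified under stated assumptions: `record`, `group`, `compareRecords`, and `countSorted` are hypothetical names, every field is compared case-insensitively instead of with per-field comparators, and the first record in a run serves as its representative (the commit instead prints the most frequent spelling).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// record is a simplified QSO: field name -> value (hypothetical type).
type record map[string]string

// compareRecords compares a and b field by field in the order given and
// returns the first non-zero result. All fields are compared
// case-insensitively here; the real command uses per-field comparators.
func compareRecords(a, b record, fields []string) int {
	for _, f := range fields {
		if c := strings.Compare(strings.ToUpper(a[f]), strings.ToUpper(b[f])); c != 0 {
			return c
		}
	}
	return 0
}

// group is one unique field combination and how many records had it.
type group struct {
	rep   record // first record in the run, used for display
	count int
}

// countSorted sorts a copy of recs by fields, then walks runs of equal
// records, emitting one group per run.
func countSorted(recs []record, fields []string) []group {
	sorted := append([]record(nil), recs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return compareRecords(sorted[i], sorted[j], fields) < 0
	})
	var groups []group
	for i := 0; i < len(sorted); {
		j := i + 1
		for j < len(sorted) && compareRecords(sorted[i], sorted[j], fields) == 0 {
			j++
		}
		groups = append(groups, group{rep: sorted[i], count: j - i})
		i = j
	}
	return groups
}

func main() {
	fields := []string{"QSO_DATE", "CALL", "BAND", "MODE"}
	qsos := []record{
		{"QSO_DATE": "20250127", "CALL": "W1AW", "BAND": "20m", "MODE": "CW"},
		{"QSO_DATE": "20250127", "CALL": "K1ABC", "BAND": "40m", "MODE": "SSB"},
		{"QSO_DATE": "20250127", "CALL": "w1aw", "BAND": "20M", "MODE": "cw"},
	}
	// Like piping into `adifmt find --if 'num>1'`: keep only repeated groups.
	for _, g := range countSorted(qsos, fields) {
		if g.count > 1 {
			fmt.Println(g.count, g.rep["CALL"], g.rep["BAND"], g.rep["MODE"]) // prints: 2 W1AW 20m CW
		}
	}
}
```

Sorting first means equal records are adjacent, so a single linear pass counts every group and the output is already in the documented order.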
#### edit
`adifmt edit` adds, changes, or removes fields in each input record.
@@ -729,12 +754,7 @@ included unchanged in each record. This can be useful when processing the
output with tools which don’t expect a list of values in a field, e.g. counting
the number of contacts you’ve made with each grid square while treating
contacts on the border of a square as separate:
-```sh
-adifmt flatten --fields VUCC_GRIDS --output tsv \
-| adifmt select --fields VUCC_GRIDS --output tsv --tsv-omit-header \
-| sort | uniq -c
-```
+`adifmt flatten --fields VUCC_GRIDS | adifmt count --fields VUCC_GRIDS --count-field NUM --output tsv`
The `flatten` command will turn
@@ -759,10 +779,11 @@ AH1Z FM08
and the rest of the pipeline will produce grid counts like
```
-1 EM97
-1 EN98
-2 FM07
-2 FM08
+NUM VUCC_GRIDS
+1 EM97
+1 EN98
+2 FM07
+2 FM08
```
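The flatten-then-count pipeline counts each grid listed in `VUCC_GRIDS` as a separate contact. A rough Go equivalent follows, assuming the field holds comma-separated grid squares; the input values are hypothetical (not the log from the README example) and `countGrids` is an illustrative name, not part of `adifmt`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// countGrids mimics `flatten --fields VUCC_GRIDS | count --fields VUCC_GRIDS`:
// each comma-separated grid in a record is counted as its own contact, and
// grids are upper-cased so "fm07" and "FM07" count together.
func countGrids(vuccGrids []string) map[string]int {
	counts := make(map[string]int)
	for _, v := range vuccGrids {
		for _, g := range strings.Split(v, ",") {
			if g = strings.TrimSpace(g); g != "" {
				counts[strings.ToUpper(g)]++
			}
		}
	}
	return counts
}

func main() {
	// Hypothetical VUCC_GRIDS values from three QSOs.
	counts := countGrids([]string{"FM07,FM08", "EN98", "fm07,FM08,EM97"})
	grids := make([]string, 0, len(counts))
	for g := range counts {
		grids = append(grids, g)
	}
	sort.Strings(grids)
	fmt.Println("NUM\tVUCC_GRIDS")
	for _, g := range grids {
		fmt.Printf("%d\t%s\n", counts[g], g)
	}
}
```

With these inputs it prints a count of 1 for EM97 and EN98 and 2 for FM07 and FM08.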
If multiple fields are flattened and each has multiple instances, a Cartesian
@@ -881,7 +902,8 @@ adifmt select --fields call --fields qso_date --fields time_on,time_off mylog.ad
`select` can be effectively combined with other standard Unix utilities. To
find duplicate QSOs by date, band, and mode, use
[sort](https://man7.org/linux/man-pages/man1/sort.1.html) and
-[uniq](https://man7.org/linux/man-pages/man1/uniq.1.html):
+[uniq](https://man7.org/linux/man-pages/man1/uniq.1.html).
+(See [`count`](#count) for another approach.)
```sh
adifmt select --fields call,qso_date,band,mode --output tsv --tsv-omit-header mylog.adi \
@@ -961,24 +983,21 @@ simple tools that do one thing and can be easily composed together to build more
powerful expressions.
There are a lot of things that a ham radio log file program could do, and I
-would like `adifmt` to do many of them. The program is nearing feature maturity
-for an initial release. If you've got a use case for working with ADIF files
+would like `adifmt` to do many of them. The program has most core features I've
+planned to add. If you've got a use case for working with ADIF files
that `adifmt` can’t do yet, please create a GitHub issue to discuss how it
might work.
-Features I plan to add:
+Further features I plan to add:
* Validate more fields.
* Identify duplicate records using flexible criteria, e.g., two contacts with
the same callsign on the same band with the same mode on the same Zulu day
-and the same `MY_SIG_INFO` value.
+and the same `MY_SIG_INFO` value. (`count` plus `find` can do this, but they
+do not print the full records.) I would also like a way to combine
+duplicate records into one, e.g. reversing the `flatten` operation.
* Option for `save` to append records to an existing ADIF file.
* [FLE (fast log entry)](https://df3cb.com/fle/documentation/) format support.
-* Count the total number of records or the number of distinct values of a
-field. (The total number of records can currently be counted with
-`--output=tsv --tsv-omit-header` and piping the output to `wc -l`.) This
-could match the format of the “Report” comment in the test QSOs file
-produced with the ADIF spec.
* Support for Cabrillo 2.0 format if needed.
See the [issues page](https://github.com/flwyd/adif-multitool/issues) for more
@@ -989,7 +1008,8 @@ ideas or to suggest your own.
I don't expect ADIF Multitool to support the following use cases. A different
piece of software will be needed.
-* Upload logs to any service like QRZ, eQSL, or LotW.
+* Upload logs to any service like QRZ, eQSL, or LotW. `adifmt` is a useful
+tool in preparing logs for upload, though.
* Log-editing GUI. `adifmt` is a command-line tool; a GUI could be built which
uses it to make edits, but that would be a separate program and project. I
am open to the idea of an interactive console mode, though.
@@ -1044,7 +1064,7 @@ to ensure it’s present when adding files: `addlicense .`
Apache header:
```
-Copyright 2024 Google LLC
+Copyright 2025 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
10 changes: 10 additions & 0 deletions adifmt/commands.go
@@ -30,6 +30,15 @@ type cmdConfig struct {
var (
catConf = cmdConfig{Command: cmd.Cat}

countConf = cmdConfig{Command: cmd.Count,
Configure: func(ctx *cmd.Context, fs *flag.FlagSet) {
cctx := cmd.CountContext{}
fs.StringVar(&cctx.CountFieldName, "count-field", "APP_ADIFMT_COUNT", "Field `name` for record counts")
fs.Var(&cctx.Fields, "fields", "Comma-separated or multiple instance field `names` to group by")
ctx.CommandCtx = &cctx
},
}

editConf = cmdConfig{Command: cmd.Edit,
Configure: func(ctx *cmd.Context, fs *flag.FlagSet) {
cctx := cmd.EditContext{
@@ -134,6 +143,7 @@ var (

cmds = []cmdConfig{
catConf,
countConf,
editConf,
findConf,
fixConf,
167 changes: 167 additions & 0 deletions cmd/count.go
@@ -0,0 +1,167 @@
// Copyright 2025 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package cmd

import (
"fmt"
"sort"
"strconv"
"strings"

"github.com/flwyd/adif-multitool/adif"
"github.com/flwyd/adif-multitool/adif/spec"
"golang.org/x/text/collate"
"golang.org/x/text/language"
)

var Count = Command{Name: "count", Run: runCount, Help: helpCount,
Description: "Count records or unique field combinations"}

type CountContext struct {
CountFieldName string
Fields FieldList
}

func helpCount() string {
return `If no fields are specified:
Outputs a single record with a single number field with the number of records in
all input files.
If fields are specified:
Outputs each unique combination of those fields with the number of times that
combination occurs in the input records. Records are sorted by the given fields.
`
}

func runCount(ctx *Context, args []string) error {
cctx := ctx.CommandCtx.(*CountContext)
if len(cctx.Fields) > 128 {
return fmt.Errorf("field list length %d is greater than maximum 128", len(cctx.Fields))
}
countName := cctx.CountFieldName
if countName == "" {
countName = "COUNT"
}
acc, err := newAccumulator(ctx)
if err != nil {
return err
}
all := make([]*adif.Record, 0, 128)
for _, file := range filesOrStdin(args) {
l, err := acc.read(file)
if err != nil {
return err
}
for _, r := range l.Records {
cr := adif.NewRecord()
for _, n := range cctx.Fields {
// Only copy fields the record actually has; missing fields compare as "".
if f, ok := r.Get(n); ok {
cr.Set(f)
}
}
all = append(all, cr)
}
}
if len(all) == 0 {
r := adif.NewRecord(adif.Field{Name: countName, Value: "0", Type: adif.TypeNumber})
for _, n := range cctx.Fields {
r.Set(adif.Field{Name: n, Value: ""})
}
acc.Out.AddRecord(r)
if err := acc.prepare(); err != nil {
return err
}
return write(ctx, acc.Out)
}
comps := make([]spec.FieldComparator, len(cctx.Fields))
for i, n := range cctx.Fields {
if f, ok := spec.FieldNamed(n); ok {
comps[i] = spec.ComparatorForField(f, ctx.Locale)
} else if u, ok := acc.Out.GetUserdef(n); ok {
f := spec.Field{Name: n, Type: spec.DataTypes[u.Type.Indicator()]}
comps[i] = spec.ComparatorForField(f, ctx.Locale)
} else {
comps[i] = spec.ComparatorForField(spec.Field{Name: n, Type: spec.StringDataType}, ctx.Locale)
}
}
col := collate.New(language.Und, collate.IgnoreCase)
comp := func(a, b *adif.Record) int {
for i, c := range comps {
n := cctx.Fields[i]
af, _ := a.Get(n)
bf, _ := b.Get(n)
v, err := c(af.Value, bf.Value)
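if err != nil {
// The field's comparator returned an error; fall back to
// case-insensitive collation of the raw values.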
v = col.CompareString(af.Value, bf.Value)
}
if v != 0 {
return v
}
}
return 0
}
sort.SliceStable(all, func(i, j int) bool { return comp(all[i], all[j]) < 0 })
for i := 0; i < len(all); {
cur := all[i]
vals := make([]map[adif.Field]int, len(cctx.Fields))
for j, n := range cctx.Fields {
vals[j] = make(map[adif.Field]int)
if f, ok := cur.Get(n); ok {
vals[j][f] = 1
}
}
num := 1
i++
for i < len(all) {
if comp(cur, all[i]) != 0 {
break
}
for j, n := range cctx.Fields {
if f, ok := all[i].Get(n); ok {
vals[j][f]++
}
}
num++
i++
}
r := adif.NewRecord(adif.Field{Name: countName, Value: strconv.Itoa(num), Type: adif.TypeNumber})
for j, m := range vals {
if len(m) == 0 {
r.Set(adif.Field{Name: cctx.Fields[j], Value: ""})
} else {
r.Set(mostFrequent(m))
}
}
acc.Out.AddRecord(r)
}
if err := acc.prepare(); err != nil {
return err
}
return write(ctx, acc.Out)
}

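// mostFrequent returns the field seen most often in a group of equal values.
// Ties are broken by choosing the greater Value, so the result does not
// depend on map iteration order.
func mostFrequent(counts map[adif.Field]int) adif.Field {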
var f adif.Field
var m int
for k, v := range counts {
if v > m || (m == v && k.Value > f.Value) {
f = k
m = v
}
}
return f
}
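Because equal-comparing values can be spelled differently (`7.200` vs `7.2`, `cw` vs `CW`), `runCount` tallies each distinct `adif.Field` in a group and `mostFrequent` chooses which one to print. The string-keyed Go sketch below is a simplification with a hypothetical `mostFrequentValue` name; it applies the same rule: the highest count wins, and ties go to the greater string in byte order, which keeps output stable despite Go's randomized map iteration.

```go
package main

import "fmt"

// mostFrequentValue returns the key with the highest count. Ties are broken
// by choosing the lexicographically greater string, so the result does not
// depend on map iteration order.
func mostFrequentValue(counts map[string]int) string {
	var best string
	var bestN int
	for v, n := range counts {
		if n > bestN || (n == bestN && v > best) {
			best, bestN = v, n
		}
	}
	return best
}

func main() {
	fmt.Println(mostFrequentValue(map[string]int{"7.200": 2, "7.2": 1})) // 7.200
	fmt.Println(mostFrequentValue(map[string]int{"cw": 1, "CW": 1}))     // cw
}
```

One consequence of byte-order tie-breaking is that lowercase spellings win ties against uppercase ones, since `'c'` sorts after `'C'`.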