Sourcery Starbot ⭐ refactored dialvarezs/alphafold #4

Open
wants to merge 1 commit into main

Conversation

SourceryAI

Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch https://github.com/sourcery-ai-bot/alphafold main
git merge --ff-only FETCH_HEAD
git reset HEAD^
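
After the reset, the refactoring is left as unstaged changes in your working tree; an optional check before committing (standard git, not part of the bot's instructions):

git status
git diff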

@SourceryAI left a comment


Due to GitHub API limits, only the first 60 comments can be shown.

@@ -183,7 +183,6 @@ def predict_structure(
models_to_relax: ModelsToRelax):
"""Predicts structure using AlphaFold for the given sequence."""
logging.info('Predicting %s', fasta_name)
timings = {}

Function predict_structure refactored with the following changes:

Comment on lines -352 to +350
-if FLAGS.model_preset == 'monomer_casp14':
-  num_ensemble = 8
-else:
-  num_ensemble = 1
+num_ensemble = 8 if FLAGS.model_preset == 'monomer_casp14' else 1

Function main refactored with the following changes:

Comment on lines -59 to +65
-command_args = []

 # Mount each fasta path as a unique target directory
 target_fasta_paths = []
 for i, fasta_path in enumerate(args.fasta_paths):
     mount, target_path = _generate_mount(f"fasta_path_{i}", Path(fasta_path))
     mounts.append(mount)
     target_fasta_paths.append(target_path)
-command_args.append(f"--fasta_paths={','.join(target_fasta_paths)}")
+command_args = [f"--fasta_paths={','.join(target_fasta_paths)}"]

Function main refactored with the following changes:

Comment on lines -183 to +180
-default=datetime.today().strftime("%Y-%m-%d"),
+default=datetime.now().strftime("%Y-%m-%d"),

Function parse_arguments refactored with the following changes:
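
For context (not stated in the diff): datetime.today() and datetime.now() both return the current local datetime, so the formatted default is unchanged; now() additionally accepts an optional tzinfo argument. A minimal illustration:

from datetime import datetime
# Both yield the same local date string in YYYY-MM-DD form.
datetime.today().strftime("%Y-%m-%d")
datetime.now().strftime("%Y-%m-%d")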

Comment on lines 158 to -159

pdb_lines = []

Function to_pdb refactored with the following changes:

This removes the following comments ( why? ):

# Close the final chain.

Comment on lines -174 to +173
-all_paired_msa_rows = list(np.array(all_paired_msa_rows).transpose())
-return all_paired_msa_rows
+return list(np.array(all_paired_msa_rows).transpose())

Function _match_rows_by_sequence_similarity refactored with the following changes:

Comment on lines -363 to +361
-merged_example[feature_name] = np.sum(x for x in feats).astype(np.int32)
+merged_example[feature_name] = np.sum(iter(feats)).astype(np.int32)

Function _merge_features_from_multiple_chains refactored with the following changes:
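
Both the original generator form and the iter() form hand NumPy a lazy iterable. An explicit way to express the apparent intent, assuming feats is a list of equal-shaped integer arrays (a sketch with hypothetical values, not the AlphaFold code):

import numpy as np

feats = [np.array([3, 1]), np.array([2, 4])]               # hypothetical per-chain feature arrays
merged = np.sum(np.stack(feats), axis=0).astype(np.int32)  # element-wise sum -> array([5, 5], dtype=int32)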

Comment on lines -450 to +452
-sequence_set = set(tuple(s) for s in chain['msa_all_seq'])
-keep_rows = []
-# Go through unpaired MSA seqs and remove any rows that correspond to the
-# sequences that are already present in the paired MSA.
-for row_num, seq in enumerate(chain['msa']):
-  if tuple(seq) not in sequence_set:
-    keep_rows.append(row_num)
+sequence_set = {tuple(s) for s in chain['msa_all_seq']}
+keep_rows = [
+    row_num for row_num, seq in enumerate(chain['msa'])
+    if tuple(seq) not in sequence_set
+]

Function deduplicate_unpaired_sequences refactored with the following changes:

This removes the following comments ( why? ):

# Go through unpaired MSA seqs and remove any rows that correspond to the
# sequences that are already present in the paired MSA.

-if line[:4] == '#=GS':  # Description lines - keep if sequence in list.
+if line.startswith('#=GS'):  # Description lines - keep if sequence in list.

Function _keep_line refactored with the following changes:

  • Replace str prefix/suffix check with call to startswith/endswith (str-prefix-suffix)
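
For illustration, the two checks agree here because the literal prefix is exactly four characters long; startswith avoids the hard-coded slice length (a standalone sketch, not AlphaFold code):

line = '#=GS 5g3r_A DE description'   # hypothetical Stockholm metadata line
line[:4] == '#=GS'                    # True, but the 4 must match len('#=GS')
line.startswith('#=GS')               # True, with no magic number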

-for line in f:
-  if _keep_line(line, seqnames):
-    filtered_lines.append(line)
+filtered_lines.extend(line for line in f if _keep_line(line, seqnames))

Function truncate_stockholm_msa refactored with the following changes:

Comment on lines -363 to +366
-else:
-  seen_sequences.add(masked_alignment)
-  seqnames.add(seqname)
-
-filtered_lines = []
-for line in stockholm_msa.splitlines():
-  if _keep_line(line, seqnames):
-    filtered_lines.append(line)
+seen_sequences.add(masked_alignment)
+seqnames.add(seqname)
+
+filtered_lines = [
+    line for line in stockholm_msa.splitlines()
+    if _keep_line(line, seqnames)
+]

Function deduplicate_stockholm_msa refactored with the following changes:

Comment on lines -504 to +501
-for i in range(len(block_starts) - 1):
-  hits.append(_parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]]))
+hits.extend(
+    _parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]])
+    for i in range(len(block_starts) - 1))

Function parse_hhr refactored with the following changes:

Comment on lines -554 to -569
-# Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217  Free text
-# Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
-match = re.match(
-    r'^>?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+).*protein length:([0-9]+) *(.*)$',
-    description.strip())
-
-if not match:
-  raise ValueError(f'Could not parse description: "{description}".')
-
-return HitMetadata(
-    pdb_id=match[1],
-    chain=match[2],
-    start=int(match[3]),
-    end=int(match[4]),
-    length=int(match[5]),
-    text=match[6])
+if match := re.match(
+    r'^>?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+).*protein length:([0-9]+) *(.*)$',
+    description.strip(),
+):
+  return HitMetadata(
+      pdb_id=match[1],
+      chain=match[2],
+      start=int(match[3]),
+      end=int(match[4]),
+      length=int(match[5]),
+      text=match[6])
+else:
+  raise ValueError(f'Could not parse description: "{description}".')

Function _parse_hmmsearch_description refactored with the following changes:

This removes the following comments ( why? ):

# Example 1: >4pqx_A/2-217 [subseq from] mol:protein length:217  Free text
# Example 2: >5g3r_A/1-55 [subseq from] mol:protein length:352
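
The new form uses an assignment expression (PEP 572, Python 3.8+), which binds the match object and tests it in a single step. A minimal sketch of the same pattern (not AlphaFold code):

import re

if match := re.match(r'([0-9]+)', '217 residues'):
  length = int(match[1])   # 217
else:
  raise ValueError('no leading number')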

Comment on lines -598 to +591
-aligned_cols = sum([r.isupper() and r != '-' for r in hit_sequence])
+aligned_cols = sum(r.isupper() and r != '-' for r in hit_sequence)

Function parse_hmmsearch_a3m refactored with the following changes:

Comment on lines -39 to +46
-features = {}
-features['aatype'] = residue_constants.sequence_to_onehot(
-    sequence=sequence,
-    mapping=residue_constants.restype_order_with_x,
-    map_unknown_to_x=True)
+features = {
+    'aatype':
+        residue_constants.sequence_to_onehot(
+            sequence=sequence,
+            mapping=residue_constants.restype_order_with_x,
+            map_unknown_to_x=True,
+        )
+}

Function make_sequence_features refactored with the following changes:

Comment on lines -733 to +727
-cif_path = os.path.join(mmcif_dir, hit_pdb_code + '.cif')
+cif_path = os.path.join(mmcif_dir, f'{hit_pdb_code}.cif')

Function _process_single_hit refactored with the following changes:

Comment on lines -880 to +875
-template_features = {}
-for template_feature_name in TEMPLATE_FEATURES:
-  template_features[template_feature_name] = []
+template_features = {
+    template_feature_name: []
+    for template_feature_name in TEMPLATE_FEATURES
+}

Function HhsearchHitFeaturizer.get_templates refactored with the following changes:

Comment on lines -942 to +937
-template_features = {}
-for template_feature_name in TEMPLATE_FEATURES:
-  template_features[template_feature_name] = []
+template_features = {
+    template_feature_name: []
+    for template_feature_name in TEMPLATE_FEATURES
+}

Function HmmsearchHitFeaturizer.get_templates refactored with the following changes:

-if not glob.glob(database_path + '_*'):
+if not glob.glob(f'{database_path}_*'):

Function HHBlits.__init__ refactored with the following changes:

-db_cmd.append('-d')
-db_cmd.append(db_path)
+db_cmd.extend(('-d', db_path))

Function HHBlits.query refactored with the following changes:

-if not glob.glob(database_path + '_*'):
+if not glob.glob(f'{database_path}_*'):

Function HHSearch.__init__ refactored with the following changes:

-db_cmd.append('-d')
-db_cmd.append(db_path)
+db_cmd.extend(('-d', db_path))

Function HHSearch.query refactored with the following changes:

Comment on lines -127 to +129
-template_hits = parsers.parse_hmmsearch_a3m(
-    query_sequence=input_sequence,
-    a3m_string=a3m_string,
-    skip_first=False)
-return template_hits
+return parsers.parse_hmmsearch_a3m(query_sequence=input_sequence,
+                                   a3m_string=a3m_string,
+                                   skip_first=False)

Function Hmmsearch.get_template_hits refactored with the following changes:

Comment on lines -157 to +161
-raw_output = dict(
-    sto=sto,
-    tbl=tbl,
-    stderr=stderr,
-    n_iter=self.n_iter,
-    e_value=self.e_value)
-
-return raw_output
+return dict(sto=sto,
+            tbl=tbl,
+            stderr=stderr,
+            n_iter=self.n_iter,
+            e_value=self.e_value)

Function Jackhmmer._query_chunk refactored with the following changes:

Comment on lines -179 to +178
-single_chunk_results = []
-for input_fasta_path in input_fasta_paths:
-  single_chunk_results.append([self._query_chunk(
-      input_fasta_path, self.database_path, max_sequences)])
-return single_chunk_results
+return [[
+    self._query_chunk(input_fasta_path, self.database_path, max_sequences)
+] for input_fasta_path in input_fasta_paths]

Function Jackhmmer.query_multiple refactored with the following changes:

Comment on lines -139 to +133
-restype_atom14_to_atom37 = np.array(restype_atom14_to_atom37, dtype=np.int32)
-return restype_atom14_to_atom37
+return np.array(restype_atom14_to_atom37, dtype=np.int32)

Function _make_restype_atom14_to_atom37 refactored with the following changes:

Comment on lines -184 to +177
-restype_rigidgroup_base_atom37_idx = np.vectorize(lambda x: lookuptable[x])(
-    base_atom_names)
-return restype_rigidgroup_base_atom37_idx
+return np.vectorize(lambda x: lookuptable[x])(base_atom_names)

Function _make_restype_rigidgroup_base_atom37_idx refactored with the following changes:

Comment on lines -229 to +220
-assert len(atom14_data.shape) in [2, 3]
+assert len(atom14_data.shape) in {2, 3}

Function atom14_to_atom37 refactored with the following changes:

Comment on lines -434 to +425
-# Create the global frames.
-# shape (N, 8)
-all_frames_to_global = backb_to_global[:, None] @ all_frames_to_backb
-
-return all_frames_to_global
+return backb_to_global[:, None] @ all_frames_to_backb

Function torsion_angles_to_frames refactored with the following changes:

This removes the following comments ( why? ):

# Create the global frames.
# shape (N, 8)

Comment on lines -778 to +765
-# Decide for each residue, whether alternative naming is better.
-# shape (N)
-alt_naming_is_better = (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)
-
-return alt_naming_is_better  # shape (N)
+return (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)

Function find_optimal_renaming refactored with the following changes:

This removes the following comments ( why? ):

# Decide for each residue, whether alternative naming is better.
# shape (N)
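
If the shape annotations are worth keeping, one option (a suggestion, not part of this diff) is to attach them to the new return expressions themselves:

return backb_to_global[:, None] @ all_frames_to_backb          # all_frames_to_global, shape (N, 8)
return (alt_per_res_lddt < per_res_lddt).astype(jnp.float32)   # alt_naming_is_better, shape (N)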
