Use DeepSmith to generate programs in other languages #47
Hi Jose,
Sorry for the slow response! I was away when you sent this. So, you reported a couple of problems:
"Importing 0 $LANG repos ..." when trying to scrape files
I suspect there may be something janky in your config file. Could you please paste the contents of the "clone list" you're using here?
"No such file or directory" with CLgen local_tar_archive
You're absolutely right about setting the path in
$ bazel run //deeplearning/clgen -- --clgen_debug --config=/path/to/the/config/file
Cheers,
Hi Chris,
No worries about the slow response. I already fixed the error "Importing 0 $LANG repos ..." when trying to scrape files. Now the problem seems to be with exporting the corpus. When I run:
It builds perfectly, but it doesn't export any files to the export directory. While debugging the code, I found that on line 84 of export_corpus.py the condition
is always false. By running the command with
INFO: Analysed target //deeplearning/clgen:clgen (0 packages loaded).
Traceback (most recent call last):
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/clgen.py", line 294, in <module>
    app.run(main)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/pypi__absl_py_0_1_10/absl/app.py", line 274, in run
    _run_main(main, argv)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/pypi__absl_py_0_1_10/absl/app.py", line 238, in _run_main
    sys.exit(main(argv))
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/clgen.py", line 290, in main
    RunWithErrorHandling(DoFlagsAction)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/clgen.py", line 200, in RunWithErrorHandling
    return function_to_run(*args, **kwargs)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/clgen.py", line 244, in DoFlagsAction
    instance = Instance(config)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/clgen.py", line 100, in __init__
    self.model: models.Model = models.Model(config.model)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/models/models.py", line 67, in __init__
    self.corpus = corpuses.Corpus(config.corpus)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/corpuses/corpuses.py", line 113, in __init__
    self.content_id = ResolveContentId(self.config, hc)
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/corpuses/corpuses.py", line 361, in ResolveContentId
    path_prefix=FLAGS.clgen_local_path_prefix))
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/phd/deeplearning/clgen/corpuses/corpuses.py", line 416, in GetHashOfArchiveContents
    return checksumdir.dirhash(d, 'sha1')
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/pypi__checksumdir_1_0_5/checksumdir/__init__.py", line 40, in dirhash
    hash_func) for f in files if not
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/pypi__checksumdir_1_0_5/checksumdir/__init__.py", line 41, in <listcomp>
    f.startswith('.') and not re.search(r'/\.', f)])
  File "/home/jwesley/.cache/bazel/_bazel_jwesley/4a1188fa51f277d88b59f17a8b59eb11/execroot/phd/bazel-out/k8-py3-opt/bin/deeplearning/clgen/clgen.runfiles/pypi__checksumdir_1_0_5/checksumdir/__init__.py", line 48, in _filehash
    with open(filepath, 'rb') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/clgen_corpus_zn4ptcvg/corpus/vphn.c'
My config file looks like this:
Ah! My mistake. It seems I accidentally committed some debugging code :)
This region of the code should be uncommented:
and this section should be commented out:
That should fix the corpus exporting :)
For the CLgen corpus error, I tried your config and couldn't reproduce your error. Can you check that the corpus archive contains nothing but the text files you want to train on? Here are the steps I took to try to reproduce it:
# Create a single file "corpus"
$ cat <<EOF > main.c
int main() {
int x = 1;
int y = 2;
return x + y;
}
EOF
# Create the corpus tarball
$ tar cjvf c_corpus.tar.bz2 main.c
# Remove any cached files from previous failed runs
$ rm -rf /tmp/phd/deeplearning/clgen
# Run CLgen on the config.
$ bazel run //deeplearning/clgen -- --config=$PWD/config.pbtxt
...
I0724 17:45:08.671088 4568839616 preprocessed.py:188] Preprocessing 1 of 1 content files
...
I0724 17:45:09.225190 4568839616 encoded.py:226] Encoding 1 of 1 preprocessed files
...
I0724 17:45:09.271282 4568839616 encoded.py:173] Encoded corpus: 53 tokens, 1 files.
E0724 17:45:09.278481 4568839616 clgen.py:246] Not enough data. Use a smaller sequence_length and batch_size (UserError)
As you can see, the above commands won't train a model (you need more than a single file to train on), but hopefully it'll be enough to confirm that we're both running the same commands.
Cheers,
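The archive check suggested above (that the corpus contains nothing but the text files to train on) could be sketched in Python roughly as follows. This is only an illustration using the standard library's tarfile module; the helper name find_suspect_members is hypothetical and is not part of CLgen. It flags symlinks, other non-regular members, and hidden files, which are possible (but not confirmed in this thread) causes of a "No such file or directory" error after the archive is unpacked.

```python
import tarfile


def find_suspect_members(archive_path):
    """Return names of archive members that are not plain, visible files.

    Hypothetical helper for illustration; not part of CLgen.
    """
    suspects = []
    # "r:*" lets tarfile detect the compression (gzip, bzip2, ...) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            # Directories only hold other members, so they are not flagged.
            if member.isdir():
                continue
            basename = member.name.rsplit("/", 1)[-1]
            # Flag symlinks, hard links, devices, and hidden files.
            if not member.isfile() or basename.startswith("."):
                suspects.append(member.name)
    return suspects
```

An empty list from find_suspect_members("c_corpus.tar.bz2") would mean every member is a regular, non-hidden file.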
After commenting out the lines you suggested in export_corpus.py, I get this error:
I thought this might be caused by the Python version, but I'm using version 3.6.8.
About the error with CLgen, I kind of fixed it, and I can run it now. It trains a model!!
void A(bnp (%c3),%k5");
or this:
void A(void))
}
My tar file contains only source (.c) files that I extracted from the repositories I cloned using this. I extracted them with a shell script. Do you have any idea why the kernels look like that? Is there a problem with my corpus?
The line at https://github.com/ChrisCummins/phd/blob/master/datasets/github/scrape_repos/export_corpus.py#L76 should be:
db = contentfiles.ContentFiles(f'sqlite:///{d}')
Sorry about that :)
Your progress is promising! But clearly the model has not learned anything useful yet. There's a whole bunch of potential issues to narrow down; some starting things to consider include:
It'll be quicker and easier to go through them on a call. Shoot me an email and we can have a Google Hangouts chat. Of course, there could also be something broken in CLgen's model training/sampling logic. I'm currently working on a private fork which has a handful of improvements, but it isn't ready for release yet.
Cheers,
Hi @ChrisCummins .
I'm trying to use your neural network to generate kernels in other languages. I'm currently following the comment #30 (comment). Is that the best way to do it?
I tried to create a corpus following this: https://github.com/ChrisCummins/phd/tree/master/datasets/github/scrape_repos.
All the commands ran, I scraped and cloned the repos, and when I run the importer it gives me
Importing 0 $LANG repos ...
Is that right? Am I missing something?
Another question: given a tar.gz file with source code files, how do I create a CLgen model? How can I specify a config file for it? I tried following this file, changing the work_dir, the local_tar_archive and the sampler, but when I run
bazel run //deeplearning/clgen -- --config=/path/to/the/config/file
it gives me the error: clgen.py:176] [Errno 2] No such file or directory: '/tmp/clgen_corpus_lh_2nyrr/corpus/some_file.c' (FileNotFoundError)
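For the question of turning a directory of source files into a corpus archive, one possible approach is sketched below in Python. It is an assumption-laden illustration, not CLgen's own tooling: the helper name build_corpus_archive, the flat archive layout, and the index prefix used to avoid name clashes are all hypothetical choices. It writes a .tar.bz2 archive (the format produced by tar cjvf) containing only regular, non-hidden files with the chosen suffix.

```python
import pathlib
import tarfile


def build_corpus_archive(source_dir, archive_path, suffix=".c"):
    """Pack regular files ending in `suffix` into a flat .tar.bz2 archive.

    Hypothetical helper for illustration; not part of CLgen.
    Returns the list of member names written to the archive.
    """
    source_dir = pathlib.Path(source_dir)
    added = []
    with tarfile.open(str(archive_path), "w:bz2") as tar:
        for path in sorted(source_dir.rglob("*" + suffix)):
            # Skip hidden files, symlinks, and anything that is not a file.
            if path.name.startswith("."):
                continue
            if path.is_symlink() or not path.is_file():
                continue
            # Flatten the layout; the index prefix avoids name clashes
            # between files with the same name in different directories.
            arcname = "{}_{}".format(len(added), path.name)
            tar.add(str(path), arcname=arcname)
            added.append(arcname)
    return added
```

Calling build_corpus_archive("cloned_repos", "c_corpus.tar.bz2") would then produce an archive whose path can be set as local_tar_archive in the config; the directory name here is only a placeholder.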