trouble getting started #2

@cnjr2

Thank you for developing ProFET!

I wanted to try it out but I ran into some trouble. It would be great if you could point me towards where I am going wrong.

I am using Python 3.4, and I have installed all the dependencies mentioned in the README.md. I have the following folder structure, where feat_extract is my working directory:

    feat_extract/
    |_pipeline.py
    |_other ProFET files...
    |_test_seq/...
    |_train/
    | |_A/
    | | |_train_sequences_A.fasta
    | |_B/
    |   |_train_sequences_B.fasta
    |_test/
      |_A/
      | |_test_sequences_A.fasta
      |_B/
        |_test_sequences_B.fasta
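One way to double-check this layout independently of ProFET is a small script, run from feat_extract/, that lists the FASTA files under each class subdirectory. This is a sketch, not ProFET's own loader: `list_fasta_by_class` is a hypothetical helper, and the `*.fasta` glob is an assumption about the extension the pipeline looks for.

```python
from pathlib import Path


def list_fasta_by_class(root, pattern="*.fasta"):
    """Map each class subdirectory of `root` to the sorted FASTA file names inside it.

    `pattern` is an assumed glob (not taken from ProFET); adjust it if the
    pipeline expects a different file extension.
    """
    root_path = Path(root)
    result = {}
    # Only immediate subdirectories count as classes (e.g. A/ and B/);
    # files sitting directly in `root` are ignored.
    for class_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        result[class_dir.name] = sorted(f.name for f in class_dir.glob(pattern))
    return result
```

For the layout above, `list_fasta_by_class("./train")` should return `{'A': ['train_sequences_A.fasta'], 'B': ['train_sequences_B.fasta']}`. If it does while the pipeline still prints an empty list, that would suggest the difference lies in how the file list is built rather than in the files themselves.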

The fasta files were created with the following set of commands:

    cd ./test_seq/Extracellular/
    tail -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../train/A/train_sequences_A.fasta
    tail -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../train/B/train_sequences_B.fasta
    head -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../test/A/test_sequences_A.fasta
    head -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../test/B/test_sequences_B.fasta
    cd ../../
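Note that `head -n` and `tail -n` count lines, not FASTA records. If sequences span several lines, 1000 lines hold fewer than 1000 records, and the `tail` output can start in the middle of a sequence with no `>` header line. Whether ProFET tolerates such a file is not established here. A record-aware alternative could look like the sketch below; `read_fasta_records` and `write_fasta` are hypothetical helpers, and the 60-character line width is an arbitrary choice.

```python
def read_fasta_records(path):
    """Return a list of (header, sequence) tuples from a FASTA file.

    Lines before the first '>' header are skipped, so a file that was cut
    mid-record by a line-based tool does not yield a headerless record.
    """
    records = []
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, path, width=60):
    """Write (header, sequence) tuples to `path`, wrapping sequences at `width` characters."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">" + header + "\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
```

For example, `records = read_fasta_records("location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta")` followed by `write_fasta(records[:n], "../../test/A/test_sequences_A.fasta")` and `write_fasta(records[-n:], "../../train/A/train_sequences_A.fasta")` mirrors the `head`/`tail` split by record, where `n` is a record count you choose. As with the original commands, the two parts overlap if the file holds fewer than 2n records.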

When running the command:

    python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir

I get the following error message:

    <cProfile.Profile object at 0x107745db0>
    Starting to extract features from training set
    dirr change to: ./train
    Multiclass fasta_files list found: []
    Features generated
    Removing any all zero features
    df.shape:  (0, 0)
    df_cleaned shape:  (0, 0)
    Done
    Extracted training data features
    Training predictive model
    Traceback (most recent call last):
      File "pipeline.py", line 171, in <module>
        res = profiler.runcall(pipeline)
      File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
        return func(*args, **kw)
      File "pipeline.py", line 90, in pipeline
        model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
      File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 114, in trainClassifier
        features, labels, lb_encoder,featureNames = load_data(filename, 'file')
      File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 36, in load_data
        df = pd.read_csv(dataFrame, index_col=[0,1]) # is index column 0 in multiindex as well?
      File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 474, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 250, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 566, in __init__
        self._make_engine(self.engine)
      File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 705, in _make_engine
        self._engine = CParserWrapper(self.f, **self.options)
      File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 1072, in __init__
        self._reader = _parser.TextReader(src, **kwds)
      File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3173)
      File "pandas/parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5912)
    OSError: File b'./train/trainingSetFeatures.csv' does not exist

It complains that './train/trainingSetFeatures.csv' does not exist. However, I can see that a file with this name is created in the train folder; it is a table with only column names (no rows).
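To confirm the header-only state described above, the data rows can be counted with Python's standard `csv` module. `count_data_rows` is a hypothetical helper, not part of ProFET; it returns 0 for a file that contains only a header line (or nothing at all).

```python
import csv


def count_data_rows(path):
    """Return the number of non-empty rows after the header in a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for row in reader if row)
```

Calling `count_data_rows("./train/trainingSetFeatures.csv")` should return 0 if the file holds only column names, as described.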

Thank you for your help.
