Scott Sievert edited this page Oct 30, 2019 · 15 revisions

Table of contents

Setup

Basics, and some tips

Issues


I get an error when I try to launch an experiment

Specifically, this is seen:

$ python launch.py strange_fruit_triplet/init.yaml strange_fruit_triplet/strangefruit30.zip                   
# ...
An error occured launching the experiment:

The difference between the request time and the current time is too large.

Details:
Traceback (most recent call last):
# ...

Traceback (most recent call last):
# ...
ValueError: Experiment didn't launch successfully

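To resolve this, restart Docker. This error usually means the clock inside the Docker VM has drifted away from the real time, and a restart resynchronizes it. More is covered in this SO answer: https://stackoverflow.com/questions/27674968/amazon-s3-docker-403-forbidden-the-difference-between-the-request-time-and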

I'm running into Docker issues. How do I resolve these?

Try removing all Docker containers and images, then resetting Docker to factory defaults. [A guide on this](https://techoverflow.net/blog/2013/10/22/docker-remove-all-images-and-containers/) and the related GitHub issue suggest the following:

#!/bin/bash
# Delete all containers
docker rm $(docker ps -a -q)
# Delete all images
docker rmi $(docker images -q)

On Mac, I reset Docker for Mac to factory defaults via Docker for Mac > Preferences > Reset.

Note: this will delete all Docker data! Be sure to back up anything you need first.

Experiments hang at initialization. How do I fix this?

The issue: you launch an experiment (possibly through test_api.py), but all that's printed to stdout is Lauching experiment... and the Docker logs don't get past the initial printing. The experiment appears to hang at initialization.

A method we've tried with some success has been

docker-compose stop  # typically through Ctrl-C
docker-compose rm
docker-compose up  # or ./docker_up.sh

If you launched via the AMI, you can stop and then start your machine. This restarts the Docker machines, which fixes this issue.

How do I get many participants for some study?

We have used Mechanical Turk. We set up a machine (and obtain its URL), then direct workers to this URL. There they answer 50 questions, with no interaction with MTurk. At the end, we ask them to copy and paste their User ID (shown by default at the end of the study) back into MTurk. Using this, we can verify that they responded.

How do I specify the participant ID?

Instead of going to [next-url]:8000/.../[exp-uid], go to [next-url]:8000/.../[exp-uid]?participant=[id].

e.g., instead of http://localhost:8000/query/query_page/query_page/368de69569286ce0ba8a3f40b58a2a go to http://localhost:8000/query/query_page/query_page/368de69569286ce0ba8a3f40b58a2a?participant=scott
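
When generating many such links (one per participant, say), the query string can be built programmatically. A minimal sketch, assuming the experiment URL is already known; the helper name participant_url is hypothetical and not part of NEXT:

```python
from urllib.parse import urlencode


def participant_url(experiment_url, participant_id):
    """Append ?participant=<id> to an experiment query-page URL.

    Hypothetical helper (not part of NEXT). urlencode escapes characters
    that are not safe in a query string, such as spaces.
    """
    return experiment_url + "?" + urlencode({"participant": participant_id})


base = "http://localhost:8000/query/query_page/query_page/368de69569286ce0ba8a3f40b58a2a"
print(participant_url(base, "scott"))
```

Escaping the ID matters mostly when participant IDs come from free-form input; for simple alphanumeric IDs the result is the same as writing the URL by hand.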

How do I get response and experiment information?

The response information includes participant and experiment ID, timestamp, query information, etc. Experiment information includes the results and any internal variables.

To access the response information, download the responses as JSON (or CSV if your application supports it).

Note: The participant ID shown to each user at the end of the study is available in this information. This JSON/CSV includes a field participant_uid which is [experiment-id]_[participant-id].
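
To check MTurk submissions against this field, the combined ID can be split back into its two parts. A minimal sketch, assuming the experiment ID contains no underscores (true of the hexadecimal ID in the example URL above) and that the response records have already been loaded from the JSON. The helper names and sample records are hypothetical:

```python
def split_participant_uid(participant_uid):
    """Split '[experiment-id]_[participant-id]' into its two parts.

    Splits on the first underscore only, so participant IDs that contain
    underscores survive intact. Assumes the experiment ID has none.
    """
    exp_uid, participant_id = participant_uid.split("_", 1)
    return exp_uid, participant_id


def participants_who_responded(responses):
    """Return the set of participant IDs found in the response records."""
    return {split_participant_uid(r["participant_uid"])[1] for r in responses}


# Hypothetical records standing in for the downloaded responses
responses = [
    {"participant_uid": "368de69569286ce0ba8a3f40b58a2a_scott"},
    {"participant_uid": "368de69569286ce0ba8a3f40b58a2a_worker_17"},
]
submitted_on_mturk = ["scott", "worker_17", "never_answered"]

responded = participants_who_responded(responses)
for worker_id in submitted_on_mturk:
    print(worker_id, worker_id in responded)
```

This assumes workers paste only the participant part of the ID; if the study shows the full participant_uid, compare against that field directly instead.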

How do I access all the targets?

targets = butler.targets.get_targetset(butler.exp_uid)

How do I restart a machine I stopped on EC2?

  • Launched via the next_ec2.py script?
    • Use the next_ec2.py script. It has start and stop commands, which mirror the EC2 GUI commands in the "Actions > Instance State" menu.
  • Launched via the AMI?
    • The "Actions > Instance State" menu should work.

Please make backups before stopping machines.

How do I time every line in a function?

You add the lines

from decorator import decorator
from line_profiler import LineProfiler

@decorator
def profile_each_line(func, *args, **kwargs):
    profiler = LineProfiler()
    profiled_func = profiler(func)
    retval = None
    try:
        retval = profiled_func(*args, **kwargs)
    finally:
        profiler.print_stats()
    return retval

somewhere along with your other imports at the beginning of the file. You can then use @profile_each_line as a decorator on any function you wish to profile. It reports how long each line takes to run, what percentage of the total runtime each line accounts for, etc. For example:

@profile_each_line
def f(x):
    # ... perform lots of computation
    y = x**2 + x + 1
    return y

This will print the timing of each line to stdout after the function runs.

How do I include feature vectors with targets?

There are three options:

  1. NEXT accepts a list of dictionaries as targets. These dictionaries get stored in the butler and are accessible. This requires launching the experiment yourself (and writing any necessary scripts).
  2. Enforcing that feature vectors be passed in to your app in initExp can be done in the YAML. This requires developing your own app.
  3. We have also developed a feature that adds feature vectors to image targets in the examples in examples/. The example below illustrates adding feature vectors to an existing application, the primary use case we have seen in practice.

The third option in detail:

Advantages of this approach include using your algorithm with an existing application/framework. You can easily compare your algorithm with other algorithms. A new algorithm has a choice of paying attention to feature vectors or not; it's up to the developer of that algorithm.

We need to modify the file that launches the experiment on NEXT (e.g., examples/strangefruit/experiment_triplet.py). In this file, if we include a key target_features in the experiment dictionary, feature vectors will be added by examples/launch_experiment.py (note: only for images ending in .png or .jpg).

The dictionary we add maps each filename to its feature vector, i.e., it has the form {filename: feature_vector}:

import zipfile
import numpy as np

experiment['primary_type'] = 'image'
target_zip = 'strangefruit30.zip'
experiment['primary_target_file'] = target_zip
experiment['target_features'] = {filename.split('/')[-1]: np.random.rand(2).tolist()  # tolist() because NumPy arrays are not JSON serializable
                                 for filename in zipfile.ZipFile(target_zip).namelist()}
# filename.split above removes 'strangefruit/' from 'strangefruit/image.png'. Required for
# use of launch_experiment.py (which the examples in next/examples use)

Note: This only covers putting features in targets. It does not cover how to load feature vectors (although I would use np.load or scipy.io.loadmat).
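
The filename handling above can be sketched with the standard library alone. This is a minimal illustration, not the code NEXT runs: build_target_features and the feature_for callback are hypothetical names, the archive is built in memory so the example is self-contained, and random numbers stand in for real feature vectors.

```python
import io
import os
import random
import zipfile


def build_target_features(zip_file, feature_for):
    """Map each image's base filename to a JSON-serializable feature vector.

    zip_file may be a path or a file-like object. Only .png/.jpg entries
    are kept, which also skips directory entries such as 'strangefruit/'.
    """
    features = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith((".png", ".jpg")):
                # 'strangefruit/image.png' -> 'image.png'
                features[os.path.basename(name)] = list(feature_for(name))
    return features


# Build a tiny archive in memory to stand in for strangefruit30.zip
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("strangefruit/", "")
    archive.writestr("strangefruit/i0000.png", b"")
    archive.writestr("strangefruit/notes.txt", b"")
buffer.seek(0)

# Placeholder features: two random numbers per image
target_features = build_target_features(
    buffer, lambda name: [random.random(), random.random()]
)
print(sorted(target_features))
```

In a real launch script, feature_for would look up a precomputed vector for each filename, and the resulting dictionary would be assigned to experiment['target_features'].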

Then to access these in myAlg.py, in initExp we include these lines:

import numpy as np

class myAlg:
    def initExp(self, butler, ...):
        targets = butler.targets.get_targetset(butler.exp_uid)
        feature_matrix = [target['feature_vector'] for target in targets]
        feature_matrix = np.array(feature_matrix)
        # ...
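
Before converting to an array, it can help to check that every target actually carries a feature vector of the same length, since only .png and .jpg targets receive one. A minimal sketch in plain Python: the list of dictionaries below is a hypothetical stand-in for what butler.targets.get_targetset returns, and the helper name feature_rows is not part of NEXT.

```python
def feature_rows(targets, key="feature_vector"):
    """Return one feature vector per target, failing loudly on bad input."""
    missing = [i for i, target in enumerate(targets) if key not in target]
    if missing:
        raise KeyError("targets without %r: %s" % (key, missing))
    rows = [list(target[key]) for target in targets]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("feature vectors have differing lengths")
    return rows


# Hypothetical target set; real targets carry additional keys
targets = [
    {"target_id": 0, "feature_vector": [0.1, 0.9]},
    {"target_id": 1, "feature_vector": [0.4, 0.2]},
]
feature_matrix = feature_rows(targets)
print(len(feature_matrix), len(feature_matrix[0]))
```

The returned list of rows can be passed straight to np.array, as in the snippet above.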

Launching an experiment takes a long time. How do I debug with this?

If you run docker-compose stop; docker-compose start, your experiment will remain (docker-compose stop is typically run via Ctrl-C). For more detail, see the wiki page on debugging.

If you run docker-compose rm it will remove your experiments. This will reset NEXT to look like it just launched with no experiments. Only run this if you're okay with removing all your experiment data.
