Information for members of the EIO organizing team

The main activities of the EIO organizing committee are:

  1. Creating tasks for various contests, and organizing these contests.
  2. Organizing workshops ("õppesessioonid") for students to learn about contest programming.

Tasks and contests

There are 4 main contests during the school year:

In total, there are 24 tasks per season.

Task creation process

Our process is as follows:

Creating a good task

For beginners, it is fine to use any creative programming task (example: count valid moves for a chess piece).

For advanced tasks, anything covered by the IOI syllabus is fair game. Usually there is a brute-force way of solving the problem and a cleverer algorithmic solution. The best tasks have multiple tiers of possible solutions (e.g. O(2^N), O(N^3) and O(N log N) complexity).

"Implement a well-known standard algorithm" is not a good task. A good task should either have an interesting and somewhat original solution idea or a "natural" statement.

Preparing the task

So, you had a good task idea and the jury decided to use it in a contest. Now it's time to prepare it.

The folder of each task in the contest repository has the following structure:

Creating tests

Make your tests as thorough as possible, covering various special cases, minimum/maximum inputs, etc. If your task has multiple possible solutions with different complexities, make sure the scoring reflects that.

For example, if there is a brute-force solution and a clever solution, a good rule of thumb is to give 50% of the points for the brute-force (but correct) solution and 50% for the fast solution. If there are slow, medium and fast solutions, you can do 40%-30%-30%, and so on.

GroupMin and GroupSum

Test cases are generally divided into subtasks. There are two basic types of scoring systems:

In some tasks you might want to give partial credit for a test case. For example, if the task is "find the shortest path in a graph", you might want to give half of the points if the length of the path is correct and the other half if the path itself is also correctly printed in the output. In such cases, similar logic applies to GroupSum and GroupMin. Suppose a contestant correctly calculates the length of the path in every test case of a subtask, but outputs the wrong path in one test case. With GroupSum, they will get full credit for every test case except the one with the mistake, and half credit for that one. With GroupMin, they will only get half of the points for the entire subtask. Whether the task uses GroupSum or GroupMin is defined in task.yaml.
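
To make the difference concrete, here is a small standalone sketch (not part of CMS, just an illustration of the aggregation logic described above) that computes one subtask's score under both systems, using the shortest-path example:

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

// Per-test scores are fractions in [0, 1]; the subtask is worth maxPoints.
double groupSum(const std::vector<double>& fractions, double maxPoints) {
    // Every test contributes an equal share of the subtask's points.
    double total = std::accumulate(fractions.begin(), fractions.end(), 0.0);
    return maxPoints * total / fractions.size();
}

double groupMin(const std::vector<double>& fractions, double maxPoints) {
    // The whole subtask is scored by its weakest test.
    return maxPoints * *std::min_element(fractions.begin(), fractions.end());
}

int main() {
    // Five tests worth 20 points in total; the path length is correct
    // everywhere, but the path itself is wrong in one test (half credit).
    std::vector<double> scores = {1.0, 1.0, 1.0, 0.5, 1.0};
    std::cout << groupSum(scores, 20) << "\n"; // 18
    std::cout << groupMin(scores, 20) << "\n"; // 10
}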

In easier tasks (tasks 1, 2 and possibly 3 of a typical contest) we usually use GroupSum. These tasks are aimed at beginners and we try to be friendly. However, there are some cases in which GroupSum can't be used. For example, if the task has a yes/no answer (or the output format is "yes, here's a solution: ..."/"no, there is no solution"), then a contestant can potentially get a large number of points by submitting a program that simply outputs "no" in every case. GroupSumCond is designed precisely for this case: it divides groups into unconditional and conditional ones. The scoring for unconditional groups works the same as in GroupSum, but the scores for conditional groups are only awarded to programs that score at least some points for the unconditional groups.

In harder tasks, however, GroupMin is strongly recommended. Why?

If you use GroupMin, also document that in the problem statement's scoring section. There is a standard text we use ("Selles ülesandes on testid jagatud gruppidesse...", i.e. "In this task, the tests are divided into groups...").

The GEN file

The grouping of tests to subtasks is defined in the gen/GEN file. In its simplest form, it looks like this:

# ST: 0
0
1
# ST: 2
2
# ST: 4
3
4
# ST: 20
5
6
7
8
9
# ST: 4
10
11

There are 12 test cases in this problem, divided into five subtasks, worth 0, 2, 4, 20 and 4 points, respectively. For example, the subtask worth 20 points consists of tests 5...9. Tests 0 and 1 are the examples shown in the task statement; since these tests do not give points (but we still want to judge them on the server), they are in a separate subtask, worth 0 points.

However, this is not very good, because it doesn't tell us where the tests came from. In most cases you probably aren't typing the test cases up by hand, especially since it is common to have test cases with some 100 000 lines of data. It's a good idea to put the test generation scripts in the repository as well: if there are issues with the test cases and you aren't available, others can figure out where the problems come from and fix them. GEN is usually the perfect place to document which generation scripts and which parameters correspond to which test cases, because its syntax allows using it as a shell script. Here is an example from a recent contest.

# ST: 0
# Examples
touch ../input/input0.txt
touch ../input/input1.txt
touch ../input/input2.txt
# ST: 20
# 1-bigk
touch ../input/input3.txt
../bin/gen_simple -n=10 -m=15 -wnx=3 -K=10 > ../input/input4.txt
../bin/gen_simple -n=10 -m=20 -wnx=3 -K=10 > ../input/input5.txt
(some lines omitted)
../bin/gen_simple -n=1000 -m=300000 -wnx=1 -K=1000 > ../input/input10.txt
../bin/gen_dense -n=1000 -m=300000 > ../input/input11.txt
../bin/gen_simple -n=100000 -m=300000 -wnx=200 -K=100000 > ../input/input12.txt
../bin/gen_simple -n=300000 -m=300000 -wnx=200 -K=300000 > ../input/input13.txt
# ST: 20
# 2-20
../bin/gen_simple -n=7 -m=8 -wnx=3 -K=1 > ../input/input14.txt
(some lines omitted)
../bin/gen_dense -n=20 -m=150 > ../input/input20.txt
# ST: 20
# 3-1000
../bin/gen_simple -n=900 -m=1000 -wnx=10 -K=3 > ../input/input21.txt
../bin/gen_simple -n=990 -m=1000 -wnx=10 -K=10 > ../input/input22.txt
../bin/gen_simple -n=999 -m=1000 -wnx=10 -K=20 > ../input/input23.txt
../bin/gen_dense -n=45 -m=980 > ../input/input24.txt
# ST: 40
# 4-full
../bin/gen_simple -n=100000 -m=300000 -wnx=100 -K=1 > ../input/input25.txt
../bin/gen_simple -n=100000 -m=300000 -wnx=100 -K=10 > ../input/input26.txt
(some lines omitted)
../bin/gen_simple -n=299900 -m=300000 -wnx=100 -K=1000 > ../input/input36.txt
../bin/gen_dense -n=1000 -m=300000 again > ../input/input37.txt
../bin/comet -n=300000 -head=100 -K=10 > ../input/input38.txt
../bin/comet -n=300000 -head=100 -K=100 > ../input/input39.txt

Every line corresponds to a test case. In most cases we execute a generation script and redirect its output to the corresponding input file. Some test cases are handwritten; those we simply touch. Lines starting with # that are not # ST: headers can be used as regular comments.

In tasks using the GroupSumCond scoring method, the # ST: headings carry an additional flag: # ST: 30 U is an unconditional group worth 30 points, # ST: 20 C is a conditional group worth 20 points (awarded only to programs that get at least some points for the unconditional groups), and # ST: 0 E is used for the examples.
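
For illustration, a GEN file for a GroupSumCond task might look like the following; the generator names and parameters here are made up:

# ST: 0 E
# Examples
touch ../input/input0.txt
touch ../input/input1.txt
# ST: 30 U
# Small cases, brute force suffices
../bin/gen_small -n=100 > ../input/input2.txt
../bin/gen_small -n=1000 > ../input/input3.txt
# ST: 20 C
# Large cases; scored only if some unconditional group gives points
../bin/gen_large -n=100000 -ans=no > ../input/input4.txt
../bin/gen_large -n=100000 -ans=yes > ../input/input5.txt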

Advice for writing strong tests

Implement every wrong heuristic you can think of and make sure all of them fail. If your solution has many special cases, try commenting out each case in turn and check that the solution then fails.

Randomize all numbers that can be randomized. For example, when generating a graph, randomize the labels of the vertices, the order in which edges are given and the order of vertices in each edge.
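
For example, a graph generator in the spirit of this advice might look like the sketch below. It assumes a reasonably recent testlib.h (discussed later on this page) with opt() parsing of -n=... style options; the structure of the graph and the parameters are made up for illustration:

#include "testlib.h"
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

int main(int argc, char* argv[]) {
    registerGen(argc, argv, 1);
    int n = opt<int>("n");
    int m = opt<int>("m"); // assumes m >= n - 1

    // Relabel the vertices randomly so that vertex 1 is not special.
    vector<int> label(n + 1);
    for (int i = 1; i <= n; i++) label[i] = i;
    shuffle(label.begin() + 1, label.end());

    // A random tree plus extra random edges; duplicates and self-loops are
    // not filtered here, a real generator should respect the task's constraints.
    vector<pair<int, int>> edges;
    for (int v = 2; v <= n; v++)
        edges.push_back({rnd.next(1, v - 1), v});
    while ((int)edges.size() < m)
        edges.push_back({rnd.next(1, n), rnd.next(1, n)});

    // Randomize the order of the edges and of the endpoints within each edge.
    shuffle(edges.begin(), edges.end());
    cout << n << " " << m << "\n";
    for (auto [u, v] : edges) {
        if (rnd.next(2)) swap(u, v);
        cout << label[u] << " " << label[v] << "\n";
    }
}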

In some cases you might want to consider putting multiple test cases in the same file, as is common in university competitions. In the task statement it might look something like this: "The first line of the input contains a single integer T (1 <= T <= 1000): the number of test cases. Each test case is described as follows. (...) It is guaranteed that the sum of N over all test cases doesn't exceed 100 000." It can be very hard to fail all possible wrong solutions (even solutions that are total nonsense) using only a dozen or so test cases. Putting multiple test cases in one file allows you to use thousands of them; a wrong solution will almost surely produce a wrong output on at least one. This can't be used in all tasks, however:

Creating custom checkers

A checker is a program that consumes the input file, the jury's output file and the contestant's output file and decides how many points should be given. If the correct output is unique in all test cases, you don't need to write a checker. However, if there are multiple possible correct outputs, we need a custom checker to assess the solution's correctness.

testlib.h

testlib.h is a header-only C++ library for preparing programming contest problems. It is particularly useful for writing checkers, but it can also be used for generators. We have a variant of testlib.h suitable for use in our CMS.
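
As an illustration, a minimal checker skeleton in testlib.h style might look like this. It is loosely based on the shortest-path example from the scoring section above; the path verification is left as a placeholder, and the exact reporting interface of our CMS variant of testlib.h may differ slightly:

#include "testlib.h"

int main(int argc, char* argv[]) {
    registerTestlibCmd(argc, argv);
    // inf = input file, ans = jury's answer, ouf = contestant's output.
    long long expectedLen = ans.readLong();
    long long foundLen = ouf.readLong();
    if (foundLen != expectedLen)
        quitf(_wa, "wrong length: expected %lld, found %lld", expectedLen, foundLen);
    // ... read the contestant's path from ouf and verify it against inf here ...
    bool pathOk = true; // placeholder for the actual path verification
    if (!pathOk)
        quitp(0.5, "length is correct, but the path is not");
    quitf(_ok, "length %lld and path are correct", foundLen);
}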

Why use it?

Sample contest timeline

Workshops

Workshops usually have 6 academic hours of work per day. Ideally, about 40% of the time is spent by the lecturer explaining the material and 60% on practical work. Seek out sample problems for the topics; https://codeforces.com/ and https://uva.onlinejudge.org/ are good resources.

Below is a sample division of topics between the workshops. This is not set in stone; bits and pieces may be shifted around. The "UVA" tag denotes task numbers at https://uva.onlinejudge.org/

Traditionally, we have organized the following workshops:

See also: https://people.ksp.sk/~misof/ioi-syllabus/ioi-syllabus.pdf
