module Set
Library functions for sets.
Usage
import Set;
Dependencies
import Exception;
import List;
import util::Math;
Description
The following library functions are defined for sets:
- classify
- getFirstFrom
- getOneFrom
- getSingleFrom
- group
- index
- isEmpty
- itoString
- jaccard
- mapper
- max
- min
- power
- power1
- reducer
- size
- sort
- sum
- takeFirstFrom
- takeOneFrom
- toList
- toMap
- toMapUnique
- toString
- top
- union
function classify
Classify elements in a set.
map[&K,set[&V]] classify(set[&V] input, &K (&V) getClass)
Examples
We classify animals by their number of legs.
rascal>import Set;
ok
Create a map from animals to number of legs.
rascal>legs = ("bird": 2, "dog": 4, "human": 2, "snake": 0, "spider": 8, "millepede": 1000, "crab": 8, "cat": 4);
map[str, int]: ("snake":0,"spider":8,"human":2,"crab":8,"cat":4,"bird":2,"dog":4,"millepede":1000)
Define function nLegs
that returns the number of legs for each animal (or 0
when the animal is unknown):
rascal>int nLegs(str animal){
>>>>>>> return legs[animal] ? 0;
>>>>>>>}
int (str): function(|prompt:///|(0,53,<1,0>,<3,1>))
Now classify a set of animals:
rascal>classify({"bird", "dog", "human", "spider", "millepede", "zebra", "crab", "cat"}, nLegs);
map[int, set[str]]: (
8:{"spider","crab"},
2:{"human","bird"},
4:{"cat","dog"},
1000:{"millepede"},
0:{"zebra"}
)
function group
Group elements in a set given an equivalence function.
set[set[&T]] group(set[&T] input, bool (&T a, &T b) similar)
Examples
We classify animals by their number of legs.
rascal>import Set;
ok
Create a map from animals to number of legs.
rascal>legs = ("bird": 2, "dog": 4, "human": 2, "snake": 0, "spider": 8, "millepede": 1000, "crab": 8, "cat": 4);
map[str, int]: ("snake":0,"spider":8,"human":2,"crab":8,"cat":4,"bird":2,"dog":4,"millepede":1000)
Define function nLegs
that returns the number of legs fro each animal (or 0
when the animal is unknown):
rascal>int nLegs(str animal){
>>>>>>> return legs[animal] ? 0;
>>>>>>>}
int (str): function(|prompt:///|(0,53,<1,0>,<3,1>))
rascal>bool similar(str a, str b) = nLegs(a) == nLegs(b);
bool (str, str): function(|prompt:///|(0,50,<1,0>,<1,50>))
Now group a set of animals:
rascal>group({"bird", "dog", "human", "spider", "millepede", "zebra", "crab", "cat"}, similar);
set[set[str]]: {
{"spider"},
{"zebra"},
{"human"},
{"crab"},
{"cat"},
{"bird"},
{"dog"},
{"millepede"}
}
WARNING: check compiler.
function index
Map set elements to a fixed index.
map[&T,int] index(set[&T] s)
Examples
rascal>import Set;
ok
rascal>index({"elephant", "zebra", "snake"});
map[str, int]: ("snake":0,"zebra":1,"elephant":2)
function isEmpty
Test whether a set is empty.
bool isEmpty(set[&T] st)
Yields true
if s
is empty, and false
otherwise.
Examples
rascal>import Set;
ok
rascal>isEmpty({1, 2, 3});
bool: false
rascal>isEmpty({});
bool: true
function mapper
Apply a function to all set elements and return set of results.
set[&U] mapper(set[&T] st, &U (&T) fn)
Return a set obtained by applying function fn
to all elements of set s
.
Examples
rascal>import Set;
ok
rascal>int incr(int x) { return x + 1; }
int (int): function(|prompt:///|(0,33,<1,0>,<1,33>))
rascal>mapper({1, 2, 3, 4}, incr);
set[int]: {5,3,2,4}
function max
Determine the largest element of a set.
&T max(set[&T] st)
Examples
rascal>import Set;
ok
rascal>max({1, 3, 5, 2, 4});
int: 5
rascal>max({"elephant", "zebra", "snake"});
str: "zebra"
---
zebra
---
function min
Smallest element of a set.
&T min(set[&T] st)
Examples
rascal>import Set;
ok
rascal>min({1, 3, 5, 2, 4});
int: 1
rascal>min({"elephant", "zebra", "snake"});
str: "elephant"
---
elephant
---
Examples
rascal>import Set;
ok
rascal>min({1, 3, 5, 4, 2});
int: 1
function power
Determine the powerset of a set.
set[set[&T]] power(set[&T] st)
Returns a set with all subsets of s
.
Examples
rascal>import Set;
ok
rascal>power({1,2,3,4});
set[set[int]]: {
{},
{1,2,4},
{1},
{3,2,4},
{3},
{1,3,2,4},
{1,3},
{2},
{4},
{1,2},
{1,4},
{3,2},
{3,4},
{1,3,2},
{1,3,4},
{2,4}
}
function power1
The powerset (excluding the empty set) of a set value.
set[set[&T]] power1(set[&T] st)
Returns all subsets (excluding the empty set) of s
.
Examples
rascal>import Set;
ok
rascal>power1({1,2,3,4});
set[set[int]]: {
{1,2,4},
{1},
{3,2,4},
{3},
{1,3,2,4},
{1,3},
{2},
{4},
{1,2},
{1,4},
{3,2},
{3,4},
{1,3,2},
{1,3,4},
{2,4}
}
function reducer
Apply a function to successive elements of a set and combine the results.
deprecated: marked for future deletion
Use a reducer expression instead, such as (init | fn(it,e) | e <- st)
.
&T reducer(set[&T] st, &T (&T,&T) fn, &T unit)
&T reducer(set[&T] _:{})
Apply the function fn
to successive elements of set s
starting with unit
.
Examples
rascal>import Set;
ok
rascal>int add(int x, int y) { return x + y; }
int (int, int): function(|prompt:///|(0,39,<1,0>,<1,39>))
rascal>reducer({10, 20, 30, 40}, add, 0);
int: 100
function size
Determine the number of elements in a set.
int size(set[&T] st)
Examples
rascal>import Set;
ok
rascal>size({1,2,3,4});
int: 4
rascal>size({"elephant", "zebra", "snake"});
int: 3
rascal>size({});
int: 0
function sum
(&T <:num) sum(set[(&T <:num)] _:{})
function sum
Sum the elements of a set.
default (&T <:num) sum({(&T <: num) e, *(&T <: num) r})
Examples
rascal>import Set;
ok
rascal>sum({3, 1, 4, 5});
int: 13
rascal>sum({3, 1.5, 4, 5});
num: 13.5
function getOneFrom
Pick an arbitrary element from a set.
&T getOneFrom(set[&T] st)
This randomly picks one element from a set, unless the set is empty.
Use Get Single From if you want the element from a singleton set. Get One From will silently continue even if there are more element present, which can be a serious threat to the validity of your analysis algorithm (arbitrary data is not considered).
Examples
rascal>import Set;
ok
rascal>getOneFrom({"elephant", "zebra", "snake"});
str: "elephant"
---
elephant
---
rascal>getOneFrom({"elephant", "zebra", "snake"});
str: "elephant"
---
elephant
---
rascal>getOneFrom({"elephant", "zebra", "snake"});
str: "snake"
---
snake
---
rascal>getOneFrom({"elephant", "zebra", "snake"});
str: "zebra"
---
zebra
---
Benefits
- Random sampling can be an effective test input selection strategy.
Pitfalls
- The name Get One From does not convey randomness.
- Get One From drops all the other elements. If you are sure there is only one element and you need it, then use Get Single From. It will fail fast if your assumption is wrong.
- If you need more then one element, then repeatedly calling Get One From will be expensive. Have a look at Sampling for more effective sampling utilities.
function getFirstFrom
Get "first" element from a set.
&T getFirstFrom(set[&T] st)
Get "first" element of a set. Of course, sets are unordered and do not have a first element.
However, we could assume that sets are internally ordered in some way and this ordering is reproducible (it's deterministic up to hashing collisions).
Applying getFirstFrom
on the same set will always returns the same element.
Use Get Single From if you want the element from a singleton set. Get First From will silently continue even if there are more element present, which can be a serious threat to the validity of your analysis algorithm (arbitrary data is not considered).
Benefits
This function helps to make set-based code more deterministic, for instance, for testing purposes.
Pitfalls
- The deterministic order is undefined. This means it may be stable between runs, but not between releases of Rascal.
- There are much better ways to iterate over the elements of a set:
- Use the
<-
enumerator operator in afor
loop or a comprehension. - Use list matching
- Use the
- Get First From drops all the other elements
- If you are sure there is only one element and you need it, then use Get Single From. It will fail fast if your assumption is wrong.
function getSingleFrom
Get the only element from a singleton set.
&T getSingleFrom(set[&T] st)
Get the only element of a singleton set. This fails with a Call Failed exception when the set is not a singleton.
Benefits
- Get Single From fails fast if the assumption (parameter
st
is a singleton) is not met. - If a binary relation
r
is injective (i.e. it models a function where each key only has one value) then all projectionsr[key]
should produce singleton values:{v}
. Using Get Single From to get the element out makes sure we fail fast in case our assumptions were wrong, or they have changed. - Never use Get First From or Take One From if you can use Get Single From.
Pitfalls
- Call Failed exceptions are sometimes hard to diagnose. Look at the stack trace to see that it was Get Single From that caused it, and then look at the parameter of Call Failed to see that the set was not a singleton.
function takeOneFrom
Remove an arbitrary element from a set, returns the element and a set without that element.
tuple[&T, set[&T]] takeOneFrom(set[&T] st)
Remove an arbitrary element from set s
and return a tuple consisting of the element and a set without that element.
Also see GetOneFrom.
Examples
rascal>import Set;
ok
rascal>takeOneFrom({1, 2, 3, 4});
tuple[int,set[int]]: <4,{1,3,2}>
rascal>takeOneFrom({1, 2, 3, 4});
tuple[int,set[int]]: <4,{1,3,2}>
rascal>takeOneFrom({1, 2, 3, 4});
tuple[int,set[int]]: <3,{1,2,4}>
function takeFirstFrom
Remove "first" element from a set, returns the element and a set without that element.
tuple[&T, set[&T]] takeFirstFrom({&T f, *&T r})
tuple[&T, set[&T]] takeFirstFrom(set[&T] _:{})
element of a set.
function toList
Convert a set to a list.
list[&T] toList(set[&T] st)
Examples
rascal>import Set;
ok
rascal>toList({1, 2, 3, 4});
list[int]: [1,3,2,4]
rascal>toList({"elephant", "zebra", "snake"});
list[str]: ["snake","zebra","elephant"]
Note that the same result can be obtained using splicing:
rascal>s = {1,2,3,4};
set[int]: {1,3,2,4}
rascal>l = [*s];
list[int]: [1,3,2,4]
Pitfalls
Recall that the elements of a set are unordered and that there is no guarantee in which order the set elements will be placed in the resulting list.
function toMap
Convert a set of tuples to a map; each key is associated with a set of values.
map[&A,set[&B]] toMap(rel[&A, &B] st)
Convert a set of tuples to a map in which the first element of each tuple is associated with the set of second elements of all tuples with the same first element.
Examples
rascal>import Set;
ok
rascal>toMap({<"a", 1>, <"b", 2>, <"a", 10>});
map[str, set[int]]: (
"a":{10,1},
"b":{2}
)
function toMapUnique
Convert a set of tuples to a map (provided that there are no multiple keys).
map[&A,&B] toMapUnique(rel[&A, &B] st) throws MultipleKey
Convert a set of tuples to a map. The result should be a legal map (i.e., without multiple keys).
Examples
rascal>import Set;
ok
rascal>toMapUnique({<"a", 1>, <"b", 2>, <"c", 10>});
map[str, int]: ("a":1,"b":2,"c":10)
Now explore an erroneous example:
rascal>toMapUnique({<"a", 1>, <"b", 2>, <"a", 10>});
|file:///home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/Set.rsc|(11682,520,<412,0>,<427,70>): MultipleKey("a",10,1)
at *** somewhere ***(|file:///home/runner/actions-runner/_work/rascal/rascal/src/org/rascalmpl/library/Set.rsc|(11682,520,<412,0>,<427,70>))
at toMapUnique(|prompt:///|(39,2,<1,39>,<1,41>))
ok
function toString
Convert a set to a string.
str toString(set[&T] st)
Examples
rascal>import Set;
ok
rascal>toString({1, 2, 3});
str: "{1,3,2}"
---
{1,3,2}
---
rascal>toString({"elephant", "zebra", "snake"});
str: "{\"snake\",\"zebra\",\"elephant\"}"
---
{"snake","zebra","elephant"}
---
Pitfalls
Recall that the elements of a set are unordered and that there is no guarantee in which order the set elements will be placed in the resulting string.
function itoString
Convert a set to an indented string.
str itoString(set[&T] st)
Examples
rascal>import Set;
ok
rascal>toString({1, 2, 3});
str: "{1,3,2}"
---
{1,3,2}
---
rascal>toString({"elephant", "zebra", "snake"});
str: "{\"snake\",\"zebra\",\"elephant\"}"
---
{"snake","zebra","elephant"}
---
Pitfalls
Recall that the elements of a set are unordered and that there is no guarantee in which order the set elements will be placed in the resulting string.
function sort
Sort the elements of a set. Sort the elements of a set: Use the built-in ordering on values to compare list elements. Give an additional lessThan
function that will be used to compare elements. This function lessThan
(<) function should implement a strict partial order, meaning: that it is not reflexive, i.e. never a < a
is anti-symmetric, i.e. never a < b && b < a
. * is transitive, i.e. if a < b
and b < c
then a < c
.
list[&T] sort(set[&T] s)
list[&T] sort(set[&T] l, bool (&T a, &T b) less)
Examples
rascal>import Set;
ok
rascal>import String;
ok
rascal>sort({10, 4, -2, 11, 100, 5});
list[int]: [-2,4,5,10,11,100]
rascal>fruits = {"mango", "strawberry", "pear", "pineapple", "banana", "grape", "kiwi"};
set[str]: {"mango","banana","pear","pineapple","grape","strawberry","kiwi"}
rascal>sort(fruits);
list[str]: ["banana","grape","kiwi","mango","pear","pineapple","strawberry"]
rascal>sort(fruits, bool(str a, str b){ return size(a) > size(b); });
list[str]: ["strawberry","pineapple","banana","mango","grape","kiwi","pear"]
function top
Produce the smallest k
elements of a set as sorted by the less
function.
list[&T] top(int k, set[&T] l, bool (&T a, &T b) less)
list[&T] top(int k, set[&T] l)
This function is fast if k
is relatively small, say 10 out of a 1000 elements.
It operates in O(n*k) time where n is the size of the set.
If k
is a larger value, say k > 10
, then it's perhaps better to just sort the entire set
using the asympotically faster (n*log^2(n)) sort function and take the first k
elements of the resulting list.
If k
is a negative number, top
will return the largest abs(k)
elements of the set instead of the smallest.
function union
Flatten a set of sets into a single set.
set[&T] union(set[set[&T]] sets)
function jaccard
Compute the Jaccard similarity between two sets.
real jaccard(set[value] x, set[value] y)