Swift’s built-in split function returns an array of subsequences, split from a sequence or collection’s elements. For example, you may have a string and want to split it into words. Here’s a simple approach that consumes whitespace, newline, and punctuation characters:

import Foundation

let string = """
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Praesent in purus a ante semper congue posuere at lacus.
    Nullam faucibus sem vel sem vestibulum, a ullamcorper nunc
    auctor.  
    """

extension Character {
    var firstScalar: UnicodeScalar {
        return self.unicodeScalars.first ?? UnicodeScalar(0)
    }
}

// NOTE: This code demonstrates a simple way to split a collection.
// It's not a great way to actually tokenize a string, lacking both
// localization and in-word punctuation handling such as
// hyphens and single quotes.
let targetSet = CharacterSet.whitespacesAndNewlines
    .union(CharacterSet.punctuationCharacters)
let isWordSeparator: (Character) -> Bool = { c in targetSet.contains(c.firstScalar) }
let words = string.split(whereSeparator: isWordSeparator)

But what happens when you want to split a sequence or collection without consuming any elements? There’s no built-in Swift solution to turn to.

A few months ago, Soroush Khanlou and I were playing around with this problem, trying to return subsequences around predicate boundaries. When the predicate failed, a new subsequence would begin. We wanted to break sequences along logical lines, for example, where a value changed or a slope trend updated from increasing to decreasing:

let x = [1, 2, 2, 3, 3, 3, 1]
    .sliced(where: { $0 != $1 })
// [ArraySlice([1]), ArraySlice([2, 2]), ArraySlice([3, 3, 3]), ArraySlice([1])]
let z = [1, 2, 2, 1, 3, 3, 1].sliced(where: >)
// [ArraySlice([1, 2, 2]), ArraySlice([1, 3, 3]), ArraySlice([1])]

We decided to work with arrays and use ArraySlice instances. Each ArraySlice provides a view onto a larger array:

Instead of copying over the elements of a slice to new storage, an `ArraySlice` instance presents a view onto the storage of a larger array. And because `ArraySlice` presents the same interface as `Array`, you can generally perform the same operations on a slice as you could on the original array.

Slices don’t allocate new storage, so the splits are efficient. Be careful not to hold onto slices long term. Each slice holds a reference to the entire storage of the array it describes, not just the portion it presents. A slice that outlives its array keeps all of that storage alive, which can look like a memory leak.

Reference the start and end of each slice with startIndex and endIndex. A slice shares its indices with the array it came from, so these values are often not zero or the count, the way you’d expect with actual array instances:

let array = ["a", "b", "c", "d", "e"]
let slice = array[1...3]
print(slice.startIndex) // 1
print(slice.endIndex) // 4
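
To make the index behavior concrete, here is a small sketch that walks a slice using its own indices and then copies it into a fresh array. The names letters, middle, and rebased are placeholders chosen for illustration. Copying with Array(_:) is one way to get zero-based indices back, and it also lets go of the reference to the original array's storage.

```swift
let letters = ["a", "b", "c", "d", "e"]
let middle = letters[1...3]

// Iterate with the slice's own indices rather than 0..<count.
for index in middle.indices {
    print(index, middle[index])
}

// Note: middle[0] would trap at run time, because 0 lies outside 1..<4.

// Copying into a new Array resets the indices so they start at zero.
let rebased = Array(middle)
print(rebased.startIndex, rebased.endIndex) // 0 3
```

If you need to keep the elements around for a long time, storing the copied array rather than the slice avoids pinning the larger array's storage.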

Here’s the code I eventually settled on. It extends ArraySlice with a recursive slicing function that ends when it runs out of elements or exceeds the maximum partition count. For each slice, it walks the elements until it finds a member element that fails the predicate test, at which point it recurses with a new slice. The ArraySlice code is hidden behind an Array entry point, which wraps the slice functionality away from public view.
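
A minimal sketch of the approach described above might look like the following. It is an approximation under stated assumptions, not a definitive listing: the closure is assumed to receive each adjacent pair of elements and to return true where a new slice should begin (which matches the examples in this post), and the maxPartitions parameter and the partitioned(where:remaining:) helper name are placeholders chosen for illustration.

```swift
extension ArraySlice {
    /// Hypothetical recursive worker. Walks this slice until
    /// `isBoundary(previous, current)` returns true, emits the leading
    /// run, then recurses on the remainder.
    fileprivate func partitioned(
        where isBoundary: (Element, Element) -> Bool,
        remaining: Int
    ) -> [ArraySlice<Element>] {
        // Out of elements: nothing left to emit.
        guard !isEmpty else { return [] }
        // Partition budget used up: return the rest as one final slice.
        guard remaining > 1 else { return [self] }

        // Slices keep their base array's indices, so start at startIndex.
        var index = startIndex
        var next = index + 1
        while next < endIndex {
            if isBoundary(self[index], self[next]) {
                let head = self[startIndex..<next]
                let tail = self[next..<endIndex]
                return [head] + tail.partitioned(
                    where: isBoundary, remaining: remaining - 1)
            }
            index = next
            next += 1
        }
        // No boundary found: the whole slice is a single run.
        return [self]
    }
}

extension Array {
    /// Entry point that hides the ArraySlice worker.
    /// `maxPartitions` is an assumed name for the partition limit.
    func sliced(
        where isBoundary: (Element, Element) -> Bool,
        maxPartitions: Int = .max
    ) -> [ArraySlice<Element>] {
        return self[startIndex..<endIndex]
            .partitioned(where: isBoundary, remaining: maxPartitions)
    }
}

let runs = [1, 2, 2, 3, 3, 3, 1].sliced(where: !=)
print(runs.map { Array($0) }) // [[1], [2, 2], [3, 3, 3], [1]]
```

Because the worker recurses once per partition, the call depth grows with the number of slices produced. For inputs that split into a very large number of runs, an iterative loop would be a safer choice.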

Here are some examples of how the code could be used to break apart arrays. They show creating subsequences and run-length encoding. You could easily adapt this to more complex data. For example, you might use keypaths to chunk data structures based on time stamps or look for inflection points for rising and falling values.

// Same value
let x = [1, 2, 2, 3, 3, 3, 1]
    .sliced(where: { $0 != $1 })
// [ArraySlice([1]), ArraySlice([2, 2]), ArraySlice([3, 3, 3]), ArraySlice([1])]

// Run-length encoding
let xx = [1, 2, 2, 3, 3, 3, 3, 3, 1, 1, 2, 1, 1]
  .sliced(where: !=)
  .map { ($0.count, $0.first!) }
// [(1, 1), (2, 2), (5, 3), (2, 1), (1, 2), (2, 1)]

// Strings
let y = Array("aaaabbbbcccdef")
  .sliced(where: !=)
  .map({ String($0) })
// ["aaaa", "bbbb", "ccc", "d", "e", "f"]

// Increasing subsequences
let z = [1, 2, 2, 1, 3, 3, 1].sliced(where: >)
// [ArraySlice([1, 2, 2]), ArraySlice([1, 3, 3]), ArraySlice([1])]

// Decreasing subsequences
let w = [1, 2, 2, 1, 3, 3, 1].sliced(where: <)
// [ArraySlice([1]), ArraySlice([2, 2, 1]), ArraySlice([3, 3, 1])]
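
The keypath idea mentioned earlier could be sketched along these lines. Everything here is hypothetical and only for illustration: the Reading type, its day property (standing in for a coarse time stamp), and the sliced(whereChanges:) helper are not part of the code described in this post. The helper is written as a self-contained loop so the example runs on its own.

```swift
// Hypothetical sample type; `day` stands in for a coarse time stamp.
struct Reading {
    let day: Int
    let value: Double
}

extension Array {
    /// Hypothetical helper: starts a new slice whenever the value at
    /// `keyPath` differs between two adjacent elements.
    func sliced<Value: Equatable>(
        whereChanges keyPath: KeyPath<Element, Value>
    ) -> [ArraySlice<Element>] {
        var result: [ArraySlice<Element>] = []
        var runStart = startIndex
        for index in indices.dropFirst() {
            let previous = self[index - 1][keyPath: keyPath]
            let current = self[index][keyPath: keyPath]
            if previous != current {
                // Close the current run and begin a new one here.
                result.append(self[runStart..<index])
                runStart = index
            }
        }
        // Emit the final run, if the array had any elements at all.
        if runStart < endIndex {
            result.append(self[runStart..<endIndex])
        }
        return result
    }
}

let readings = [
    Reading(day: 1, value: 2.0),
    Reading(day: 1, value: 2.5),
    Reading(day: 2, value: 1.0),
    Reading(day: 3, value: 4.0),
    Reading(day: 3, value: 4.5),
]
let byDay = readings.sliced(whereChanges: \.day)
let counts = byDay.map { $0.count }
print(counts) // [2, 1, 2]
```

With a two-element predicate version like the sliced(where:) calls above, the same grouping could be expressed as a closure comparing $0.day and $1.day.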

Got suggestions for improvements? Drop a note in the comments. Thanks!


