02. String Operations

Posted by datahunter on Sun, 09/09/2012 - 00:42

Last updated: 2020-02-10

Introduction

String is a built-in class; the class name is str.

Strings are immutable.
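A minimal sketch of what immutability means in practice, written for Python 3 (the variable names are illustrative only): item assignment on a str raises TypeError, and methods such as upper() return a new string instead of changing the original.

```python
s = "hello"

try:
    s[0] = "H"          # item assignment is not supported on str
    mutated = True
except TypeError:
    mutated = False

upper = s.upper()       # returns a new string; s itself is unchanged
print(mutated, s, upper)
```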

String Function

Basic

len(s)
s.lower()
s.upper()
s.strip() # strip leading and trailing whitespace

checking

s.isalpha()
s.isdigit()
s.isspace()
s.startswith('start') # True, if s starts with 'start'
s.endswith('end')

search & replace

s.find('other') # return -1 if not found
s.replace('old', 'new' [,max]) # replace at most max occurrences of 'old' with 'new'; by default all occurrences are replaced
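A short sketch of find() and replace() on a made-up sample string (Python 3 syntax), showing the -1 result for a missing substring and the effect of the optional count argument:

```python
s = "old old old"

pos_found = s.find("old")                   # index of the first match
pos_missing = s.find("other")               # -1 when the substring is absent

replaced_all = s.replace("old", "new")      # no count: every occurrence
replaced_one = s.replace("old", "new", 1)   # count=1: only the first one

print(pos_found, pos_missing, replaced_all, replaced_one)
```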

split & join

s.split('delim') # 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']

s.join(list)

e.g.

', '.join("abc") # 'a, b, c'

'-'.join(['a1', 'b2', 'c3']) # 'a1-b2-c3'

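Conversion

chr(i) # returns the character c for code i
ord(c) # returns the integer code i for character c

cmp comparison (Python 2 only; cmp() was removed in Python 3):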

print cmp(2,1) # 1

print cmp(1,1) # 0

print cmp(1,2) # -1
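cmp() is not available in Python 3. A possible stand-in, assuming the operands support < and >, is sketched below; the helper simply reuses the name cmp for illustration and is not a built-in.

```python
def cmp(a, b):
    """Return 1, 0 or -1, mirroring the Python 2 built-in cmp()."""
    # Booleans are integers, so True - False == 1 and False - True == -1.
    return (a > b) - (a < b)

print(cmp(2, 1), cmp(1, 1), cmp(1, 2))
```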

String attribute

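string.lowercase # Python 2 only; see ascii_lowercase
string.uppercase # Python 2 only; see ascii_uppercase
string.letters (lowercase + uppercase) # Python 2 only; see ascii_letters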
string.digits
string.hexdigits
string.printable
string.whitespace # space, tab, linefeed, return, formfeed, and vertical tab

e.g.

>>> import string
>>> string.lowercase[:14]
'abcdefghijklmn'

Placeholder

Methods

%X
str.format()
f-strings

%X

%d # int
%s # string
%f # floating point

e.g.

>>> format = "Hello, %s. %s enough for ya?"
>>> values = ('world', 'Hot')
>>> print format % values

Hello, world. Hot enough for ya?

Width and Precision

print '%.3s' % 'abcdefg' # abc

print '%.*s' % (3, 'abcdefg') # abc

e.g.

from math import pi

"%10.3f" % pi     # width 10, 3 decimals   '     3.142'

'%010.3f' % pi    # zero-padded            '000003.142'

'%-10.2f' % pi    # left-aligned           '3.14      '

%s left align

%-6s # left-align the string in a field 6 characters wide
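A small sketch of width, alignment and precision applied to %s, using made-up values (this form works in both Python 2 and Python 3):

```python
# Left-aligned in 6 columns, then right-aligned in 6 columns.
row = "%-6s|%6s|" % ("ab", "cd")

# Precision 3 truncates the string, width 6 then pads it on the right.
truncated = "%-6.3s|" % "abcdefg"

print(row)
print(truncated)
```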

str.format()

print("My name is {}.".format("Datahunter"))

# Multiple placeholders

print('{0} : {1}, {2}'.format("string0", "string1", "string2"))

# Variable substitutions

print("{who} name is {name}.".format(name="Datahunter", who="My"))

# Value formatting

print('{:.2f}'.format(12345.12345))

print('{0:.2f}'.format(12345.12345))

import datetime
today = datetime.datetime.today()
print("{t:%B %d, %Y}".format(t=today))

# Padding Substitutions

<     left-align text in the field
^     center text in the field
>     right-align text in the field

# Dictionary for string formatting

acl = {"uid": 0, "username": "datahunter", "is_admin": "True"}

print('{uid}, {username}: {is_admin}'.format(**acl))

* is_admin may be the string "True" or the boolean True; both are rendered as True
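The alignment flags listed under "Padding Substitutions" (<, ^, >) are combined with a field width inside the braces. A minimal sketch with an arbitrary width of 9 and a sample value:

```python
left = "{:<9}".format("abc")     # pad on the right
center = "{:^9}".format("abc")   # pad on both sides
right = "{:>9}".format("abc")    # pad on the left
filled = "{:*^9}".format("abc")  # a fill character may precede the flag

print(repr(left), repr(center), repr(right), repr(filled))
```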

f-strings

f-strings were introduced by PEP 498 (Python 3.6) and provide the same functionality as str.format().

e.g. a netdata API URL

api_url = "http://localhost:19999/api/v1/data?chart="
chart = "net.ens192"
after = 86400
url = f'{api_url}{chart}&after=-{after}'

e.g. datetime format

import datetime
today = datetime.datetime.today()
print(f"{today:%B %d, %Y}")

Out: "February 20, 2024"
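f-strings accept the same format specifications after the colon as str.format(). A brief sketch with sample values (requires Python 3.6 or later):

```python
from math import pi

name = "Datahunter"

# Left-align in 12 columns, a float in width 10 with 3 decimals, an int as hex.
line = f"{name:<12}|{pi:10.3f}|{255:#x}"
print(line)
```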

Unicode

unicode type

* built-in types

unicode(string[, encoding, errors])

string = 8-bit strings

>>> unicode('abcdef')
u'abcdef'

which is equivalent to

>>> u'abcdef'

Different len() results

<1>

# len() counts the length of the byte sequence, not the number of characters

# 4

# tells Python that the source code is encoded in Big5
# coding=Big5

text = '測試'

print len(text)

<2>

# len() of a unicode instance counts characters

# 2

# coding=Big5

text = u'測試'
print len(text)

<3>

# 6 (each of these characters takes 3 bytes in UTF-8)

# coding=utf-8

text = '中文'
print len(text)

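Reading a UTF-8 file

# Read a UTF-8 text file; the encoding can be e.g. Big5 or UTF-8
# (the encoding argument needs Python 3; in Python 2 use io.open or codecs.open)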

file = open(name, 'r', encoding='UTF-8')
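A self-contained sketch of the same idea for Python 3. It writes a temporary Big5 file (the path and sample text are illustrative only), then reads it back once as decoded text and once as raw bytes:

```python
import os
import tempfile

text = "中文"

# Create a throwaway file so the example does not depend on existing data.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="big5") as f:
        f.write(text)

    # Reading with the matching encoding returns decoded text.
    with open(path, "r", encoding="big5") as f:
        content = f.read()

    # Binary mode returns the stored bytes: two per character in Big5.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(content, len(raw))
```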

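Encode and Decode

codecs - base classes for standard Python codecs

# When Python converts between encodings

source encoding -> internal encoding -> target encoding

* The default source encoding is 'ascii'
* Internally Python uses unicode

Python 2 has two internal unicode representations

UCS-2 (65536)
UCS-4 (2147483648)

* Selected at build time with --enable-unicode=ucs2 or --enable-unicode=ucs4

# Check whether the build uses UCS-2 or UCS-4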

import sys
print sys.maxunicode

Output

# Linux (typically UCS-4): 1114111

# Windows (UCS-2): 65535

decode and encode

Through the codecs interface, both take an input object and return a tuple (output, length consumed)

decode: original encoding -> Python internal encoding

converts a plain string encoded using a particular character set encoding to a Unicode object

encode: Python internal encoding -> original encoding

converts a Unicode object to a plain string using a particular character set encoding

Usage

decode([codec])

codec = ascii, utf-8, big5 ... # the default is ascii

Using the wrong codec raises

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
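The same error can be reproduced and handled in Python 3, where decode() is a method of bytes. A minimal sketch with a sample character; the errors="replace" call is one possible fallback and not something the note prescribes:

```python
data = "中".encode("utf-8")   # three bytes, the first one is 0xe4

try:
    data.decode("ascii")     # wrong codec for these bytes
    error = None
except UnicodeDecodeError as exc:
    error = exc

ok = data.decode("utf-8")    # the matching codec succeeds

# errors="replace" substitutes U+FFFD for each undecodable byte.
lenient = data.decode("ascii", errors="replace")

print(error)
```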

# Decode mystr from utf-8 into an internal (unicode) string

newstr = mystr.decode('utf-8')

# Encode the internal string as big5 for output

newstr.encode('big5')

len: big5 -> internal encoding

#!/usr/bin/env python
# coding=big5

mytext = "中"

# 2 (this character takes 2 bytes in Big5)
print len(mytext)

mybig5 = mytext.decode("big5")

# 1
print len(mybig5)
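For comparison, a sketch of the same length check in Python 3, where a str literal is already Unicode text and encode() produces bytes. The sample character is the one used above:

```python
text = "中"

big5_bytes = text.encode("big5")    # 2 bytes in Big5
utf8_bytes = text.encode("utf-8")   # 3 bytes in UTF-8

# len() of text counts characters; len() of bytes counts bytes.
print(len(text), len(big5_bytes), len(utf8_bytes))
```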

"codecs" Usage

import codecs

codecs.encode(obj[, encoding[, errors]])
codecs.decode(obj[, encoding[, errors]])

usage:

#!/usr/bin/env python
# coding=big5
import codecs

mytext = "中"

# 2
print len(mytext)

big5toutf8=codecs.lookup("big5")

# (u'\u4e2d', 2)
# (data, len)
print big5toutf8.decode(mytext)

Open a file with codecs.open and its content is converted to internal unicode automatically as it is read.
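A minimal sketch of codecs.open(), written for Python 3 with a temporary Big5 file (the path and sample text are assumptions for illustration). On Python 3 the built-in open() with encoding= is the more usual choice.

```python
import codecs
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Store raw Big5 bytes on disk.
    with open(path, "wb") as f:
        f.write("中文".encode("big5"))

    # codecs.open decodes the bytes while reading.
    with codecs.open(path, "r", encoding="big5") as f:
        content = f.read()
finally:
    os.remove(path)

print(content)
```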

ascii type()

<1>

#coding=utf-8

mytext=u"中文"

def isEng(msg):
    lang="en"
    for n in msg:
        if ord(n) >127:
            lang="zh"
    return lang

print isEng(mytext)

def is_ascii(s):
    return all(ord(c) < 128 for c in s)

<2>

In Python 2, a string may be of type str or of type unicode

def whatisthis(s):

    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

# int
x = 5
print isinstance(x, int)

# ordinary string
y = '測試'
print isinstance(y, str)

# unicode string
z = u'測試'
print isinstance(z, unicode)

# str or unicode (both derive from basestring)
print isinstance(y, basestring)

==================

type(s)

One will say unicode, the other will say str.

text = '測試'
print len(text) # 6

text = u'測試'
print len(text) # 2

type(s)

==================

s.decode('ascii')

# if it raises an exception, the string is not 100% ASCII.

s.decode('ascii')
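In Python 3 a str has no decode() method, so the equivalent check encodes instead. A sketch under that assumption; the helper name is_ascii_text is hypothetical:

```python
def is_ascii_text(s):
    """Return True if every character of the text string s is ASCII."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        # At least one character falls outside the 0-127 range.
        return False
    return True

print(is_ascii_text("abc"), is_ascii_text("中文"))
```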

ascii_*

>>> import string
>>> string.ascii_lowercase[:14]
'abcdefghijklmn'
>>> string.ascii_lowercase[:14:2]
'acegikm'

Newline char

import os
print "line 1" + os.linesep + "line 2"

Remove newline in string

>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'

>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'

>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'

Using '\r\n' as the parameter to rstrip means that it will strip out any trailing combination of '\r' or '\n'.

That's why it works in all three cases above.

Check for empty lines

with open(myfile) as f:
    for line in f:
        if not line.strip():
            continue
        print line

line.strip() removes whitespace from both ends of the string.

If the line is completely empty, or it's only spaces and tabs,

line.strip() will return an empty string '' which evaluates to false.
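The same test as a Python 3 sketch, using an in-memory buffer with sample text in place of a real file so it can be run as is:

```python
import io

# Stands in for an opened text file; the content is made up.
sample = io.StringIO("first\n\n   \t\nsecond\n")

# Keep only lines that still contain something after strip().
non_blank = [line.rstrip("\r\n") for line in sample if line.strip()]
print(non_blank)
```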

Print without newline

import sys
sys.stdout.write('some thing')
sys.stdout.flush()

print "this should be",
print "on the same line"
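The trailing-comma form is Python 2 syntax. In Python 3, print() takes an end parameter instead. A small sketch; the StringIO buffer is only there so the result can be inspected:

```python
import io

buf = io.StringIO()

# end=" " replaces the default newline with a space.
print("this should be", end=" ", file=buf)
print("on the same line", file=buf)

result = buf.getvalue()
print(result, end="")
```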

StringIO

Introduction

- Supports writing to a string buffer multiple times

Usage:

class StringIO.StringIO([buffer])

* If no string is given, the StringIO will start empty

# Retrieve the entire contents

StringIO.getvalue()

# Free the memory buffer.

StringIO.close()

Example:

import StringIO

output = StringIO.StringIO()

# write to it several times
output.write('First line.\n')
print >> output, 'Second line.'

contents = output.getvalue()

output.close()
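The StringIO module above exists only in Python 2. A sketch of the same example for Python 3, where the class lives in the io module:

```python
import io

output = io.StringIO()

# Write to the buffer several times.
output.write("First line.\n")
print("Second line.", file=output)

# Retrieve everything written so far, then free the buffer.
contents = output.getvalue()
output.close()

print(contents, end="")
```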
