m3 / crawlersNoticias

Commit a96d6232, authored 7 years ago by Renán Sosa Guillen

Commit message: merge foraneos

Parents: 5f865616, 1033155b
Showing 1 changed file (noticias.py) with 7 additions and 0 deletions.
descarga_hacia_atras/foraneos/prensaGrafica/prensaGrafica/spiders/noticias.py (at a96d6232)
@@ -22,6 +22,7 @@ def remove_tags(text):

```python
    return TAG_RE.sub('', text)

DAT_RE = re.compile(r'-\d{8}-')
RE = re.compile(r'\n\xa0')

class ImportantData(scrapy.Item):
```
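The hunk above defines `RE` next to the existing `DAT_RE`. As a rough illustration of what the two patterns match, here is a standalone sketch. The sample URL and body text are hypothetical, and reading `DAT_RE` as an eight-digit, date-like URL segment (for example `YYYYMMDD`) is an assumption; the diff does not say how it is used.

```python
import re

# Patterns copied from the diff.
DAT_RE = re.compile(r'-\d{8}-')  # a hyphen, exactly eight digits, a hyphen
RE = re.compile(r'\n\xa0')       # a newline followed by a non-breaking space (U+00A0)

# Hypothetical inputs, for illustration only.
sample_url = "https://example.com/noticia/titulo-20180115-123.html"
sample_body = "Body text.\n\xa0Footer"

date_match = DAT_RE.search(sample_url)
print(date_match.group(0))             # the matched "-dddddddd-" segment
print(RE.search(sample_body).start())  # index where "\n\xa0" begins

# Seven digits between hyphens do not satisfy \d{8}.
print(DAT_RE.search("a-2018011-b"))
```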
@@ -147,6 +148,12 @@ class QuotesSpider(scrapy.Spider):

```python
        for p in response.css('div.news-body').css('p').extract():
            text += remove_tags(p) + "\n"
        if text == '':
            t = remove_tags(response.xpath('//div[@class="news-body"]').extract_first())
            res = RE.search(t)
            if res:
                text = t[:t.rfind(res.group(0))]
        item['text'] = text.strip()
        item['url'] = response.url
```
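The second hunk adds a fallback: when no `<p>` elements are found inside `div.news-body`, the spider strips tags from the whole div and keeps only the text before the last newline plus non-breaking space. The sketch below reproduces that logic without Scrapy, so it can be tried on plain strings. The `TAG_RE` pattern is an assumption (its definition is outside the diff), `extract_text` is a hypothetical helper name, and the `body_html is not None` guard is an addition of this sketch: in the spider, `extract_first()` can return `None`, which `remove_tags` would not accept.

```python
import re
from typing import List, Optional

# Assumption: TAG_RE is not shown in the diff; a common tag-stripping pattern is used here.
TAG_RE = re.compile(r'<[^>]+>')
RE = re.compile(r'\n\xa0')  # copied from the diff


def remove_tags(text: str) -> str:
    """Drop anything that looks like an HTML tag."""
    return TAG_RE.sub('', text)


def extract_text(paragraphs: List[str], body_html: Optional[str]) -> str:
    """Hypothetical stand-in for the spider logic: join <p> texts, else fall back."""
    text = ''
    for p in paragraphs:
        text += remove_tags(p) + "\n"
    # Fallback (guarded against None in this sketch only).
    if text == '' and body_html is not None:
        t = remove_tags(body_html)
        res = RE.search(t)
        if res:
            # res.group(0) is always "\n\xa0", so rfind finds its LAST occurrence;
            # everything before that point is kept.
            text = t[:t.rfind(res.group(0))]
    return text.strip()


print(extract_text(["<p>One</p>", "<p>Two</p>"], None))
print(repr(extract_text([], '<div class="news-body">Story\n\xa0Credit 1\n\xa0Credit 2</div>')))
```

Because the cut uses `rfind`, only the text after the final `"\n\xa0"` is discarded; earlier occurrences stay in the result. If the marker never appears, the fallback leaves `text` empty, as in the original spider code.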